Pratyush Maini (@pratyushmaini)'s Twitter Profile
Pratyush Maini

@pratyushmaini

Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi

ID: 1191440736517939200

Link: http://pratyushmaini.github.io | Joined: 04-11-2019 19:43:22

296 Tweets

1.1K Followers

346 Following

DatologyAI (@datologyai):

🎉 Thrilled to announce that DatologyAI has been named to the CB Insights AI 100 list! 🏆

The DatologyAI team is committed to continuing to advance the field of AI and empowering organizations with high-quality data. Stay tuned for more exciting updates! 😀

DatologyAI (@datologyai):

It's Monday, my dudes, which means we're going to highlight some great data research that enables everyone to be a datologist.

Today we're highlighting amazing work from our very own Pratyush Maini: Scaling Laws for Data Filtering.

tl;dr: in a finite data regime (i.e. compute…

Lucas Beyer (bl16) (@giffmana):

Been preaching (not writing) this for a while: ANY data filter is a decision to not learn about something. Think carefully: do you really want that? For small runs? For large runs? What else could you be inadvertently removing? Do you really know your filter? Have you looked at the examples it removes?
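
For anyone who wants to act on that advice, here is a small, hypothetical sketch of such a filter audit. `audit_filter` and `quality_score` are stand-in names for whatever filter you actually use (classifier score, perplexity, heuristic), not any specific tool.

```python
# Hypothetical sketch: before committing to a data filter, read a random
# sample of the documents it would throw away.
import random

def audit_filter(docs, quality_score, threshold, n_samples=20, seed=0):
    """Print a random sample of the documents the filter would remove."""
    removed = [d for d in docs if quality_score(d) < threshold]
    print(f"filter keeps {len(docs) - len(removed)}/{len(docs)} docs; "
          f"showing {min(n_samples, len(removed))} removed examples:")
    random.seed(seed)
    for doc in random.sample(removed, min(n_samples, len(removed))):
        print("---")
        print(doc[:300])  # the first few hundred characters are usually enough to judge
```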

Pratyush Maini (@pratyushmaini):

💎💎 Gems dropped by Sachin Goyal on the challenges of developing new scaling laws in academia, and with just ~10k GPU hours at that. Every training decision involved sooo much discussion because we had to pick our runs very frugally & wisely :)

Zico Kolter (@zicokolter):

How do you balance repeat training on high quality data versus adding more low quality data to the mix? And how much do you train on each type? Pratyush Maini and Sachin Goyal provide scaling laws for such settings. Really excited about the work!

Aran Komatsuzaki (@arankomatsuzaki):

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Argues that data curation cannot be agnostic of the total compute that a model will be trained for

repo: github.com/locuslab/scali…
abs: arxiv.org/abs/2404.07177

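A toy illustration of that headline claim, that the best filtering choice can flip as total training compute grows. This is not the paper's actual formulation: the functional form and every constant below are made up, and only the qualitative behavior (repeating a small, heavily filtered pool pays off at low compute, while a larger, noisier pool wins at high compute) is the point.

```python
# Hypothetical loss model: a power law in "effective" samples, where repeating
# a finite pool contributes with geometrically decaying utility per epoch.
def effective_samples(pool_size, tokens_trained, decay=0.6):
    epochs = tokens_trained / pool_size
    full = int(epochs)
    eff_epochs = (1 - decay**full) / (1 - decay) + (epochs - full) * decay**full
    return pool_size * eff_epochs

def loss(pool_size, quality, tokens, a=1.0, b=0.3, irreducible=1.8):
    n_eff = quality * effective_samples(pool_size, tokens)  # quality scales sample utility
    return a * n_eff ** (-b) + irreducible

for tokens in [1e9, 1e10, 1e11]:
    strict = loss(pool_size=2e9,  quality=1.5, tokens=tokens)   # aggressive filter: small, clean pool
    loose  = loss(pool_size=2e10, quality=1.0, tokens=tokens)   # light filter: big, noisier pool
    winner = "strict filter" if strict < loose else "loose filter"
    print(f"{tokens:.0e} tokens -> strict {strict:.5f}, loose {loose:.5f}  ({winner})")
```

With these made-up constants the strict filter wins the smallest budget and the loose filter wins the larger ones, which is the compute-dependence the title refers to.
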
Yiding Jiang (@yidingjiang):

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/

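A minimal sketch of the central quantity in that thread: the agreement rate between two models trained on the same data with different random seeds, computed on a toy test set. The function names and toy predictions below are mine, not the paper's code.

```python
import numpy as np

def agreement_rate(preds_a, preds_b):
    """Fraction of test inputs on which the two models predict the same class."""
    return float(np.mean(preds_a == preds_b))

def accuracy(preds, labels):
    return float(np.mean(preds == labels))

# Toy stand-ins for two models trained with different seeds: each has ~25%
# of its predictions replaced by a random class.
rng = np.random.default_rng(0)
labels  = rng.integers(0, 10, size=1000)
preds_a = np.where(rng.random(1000) < 0.25, rng.integers(0, 10, size=1000), labels)
preds_b = np.where(rng.random(1000) < 0.25, rng.integers(0, 10, size=1000), labels)

print("accuracy A :", accuracy(preds_a, labels))
print("accuracy B :", accuracy(preds_b, labels))
print("agreement  :", agreement_rate(preds_a, preds_b))
```

Whether and when this agreement (measurable without labels) tracks accuracy is the question the thread connects to which features each model happened to learn, and to calibration.
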
Pratyush Maini (@pratyushmaini):

Enjoyed giving a talk on Rephrasing the Web at SambaNova Systems & Together AI over the past few days. Excited to see interest in synth data for pre-training LLMs.

Lots of interesting Qs concerning bias & factuality of synth data. Very important problems for future research!

Maksym Andriushchenko 🇺🇦 (@maksym_andr):

🚨 Are leading safety-aligned LLMs adversarially robust? 🚨

❗In our new work, we jailbreak basically all of them with ≈100% success rate (according to GPT-4 as a semantic judge):
- Claude 1.2 / 2.0 / 2.1 / 3 Haiku / 3 Sonnet / 3 Opus,
- GPT-3.5 / GPT-4,
- R2D2-7B from…

Urmish Thakker (@UrmishThakker):

We recently hosted Pratyush Maini at SambaNova Systems to talk about their work “Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling”

TLDR: By just rephrasing your existing datasets, you can achieve the same pre-training accuracy 3x faster with far lesser…

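For context, a hedged sketch of the high-level recipe the talk describes: pair raw web documents with LLM rewrites and pre-train on the mixture. The prompt text and the `call_llm` helper below are placeholders, not the paper's actual prompt or pipeline.

```python
from typing import Callable

# Placeholder prompt; the paper describes style-specific rephrasing prompts
# (e.g. Wikipedia-like or Q&A style), this one is purely illustrative.
REPHRASE_PROMPT = (
    "Paraphrase the following text in a clear, high-quality style, "
    "keeping all of its factual content:\n\n{document}"
)

def build_rephrased_mix(raw_docs: list[str], call_llm: Callable[[str], str]) -> list[str]:
    """Return a training mix of raw web documents plus their synthetic rewrites."""
    mix = []
    for doc in raw_docs:
        mix.append(doc)                                              # keep the original web text
        mix.append(call_llm(REPHRASE_PROMPT.format(document=doc)))   # add the synthetic rewrite
    return mix
```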