Pratyush Maini (@pratyushmaini)'s Twitter Profile
Pratyush Maini

@pratyushmaini

Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi

ID: 1191440736517939200

Link: http://pratyushmaini.github.io | Joined: 04-11-2019 19:43:22

296 Tweets

1.1K Followers

346 Following

DatologyAI (@datologyai):

🎉 Thrilled to announce that DatologyAI has been named to the CB Insights AI 100 list! 🏆

The DatologyAI team is committed to continuing to advance the field of AI and empowering organizations with high-quality data. Stay tuned for more exciting updates! 😀

DatologyAI (@datologyai):

It's Monday, my dudes, which means we're going to highlight some great data research that enables everyone to be a datologist.

Today we're highlighting amazing work from our very own Pratyush Maini: Scaling Laws for Data Filtering.

tl;dr: in a finite data regime (i.e. compute…

Lucas Beyer (bl16) (@giffmana):

Been preaching (not writing) this for a while: ANY data filter is a decision to not learn about something. Think carefully: do you really want that? For small runs? For large runs? What else could you be inadvertently removing? Do you really know your filter? Have you looked at the examples it removes?
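
For anyone who wants to act on that advice, here is a small, hypothetical sketch of such a filter audit. `audit_filter` and `quality_score` are stand-in names for whatever filter you actually use (classifier score, perplexity, heuristic), not any specific tool.

```python
# Hypothetical sketch: before committing to a data filter, read a random
# sample of the documents it would throw away.
import random

def audit_filter(docs, quality_score, threshold, n_samples=20, seed=0):
    """Print a random sample of the documents the filter would remove."""
    removed = [d for d in docs if quality_score(d) < threshold]
    print(f"filter keeps {len(docs) - len(removed)}/{len(docs)} docs; "
          f"showing {min(n_samples, len(removed))} removed examples:")
    random.seed(seed)
    for doc in random.sample(removed, min(n_samples, len(removed))):
        print("---")
        print(doc[:300])  # the first few hundred characters are usually enough to judge
```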

Pratyush Maini (@pratyushmaini):

💎💎 Gems dropped by Sachin Goyal on the challenges of developing new scaling laws in academia, and with just ~10k GPU hours at that. Every training decision involved sooo much discussion because we had to pick our runs very frugally & wisely :)

Zico Kolter (@zicokolter):

How do you balance repeat training on high quality data versus adding more low quality data to the mix? And how much do you train on each type? Pratyush Maini and Sachin Goyal provide scaling laws for such settings. Really excited about the work!

Aran Komatsuzaki (@arankomatsuzaki):

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Argues that data curation cannot be agnostic of the total compute that a model will be trained for

repo: github.com/locuslab/scali…
abs: arxiv.org/abs/2404.07177

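A toy illustration of that headline claim, that the best filtering choice can flip as total training compute grows. This is not the paper's actual formulation: the functional form and every constant below are made up, and only the qualitative behavior (repeating a small, heavily filtered pool pays off at low compute, while a larger, noisier pool wins at high compute) is the point.

```python
# Hypothetical loss model: a power law in "effective" samples, where repeating
# a finite pool contributes with geometrically decaying utility per epoch.
def effective_samples(pool_size, tokens_trained, decay=0.6):
    epochs = tokens_trained / pool_size
    full = int(epochs)
    eff_epochs = (1 - decay**full) / (1 - decay) + (epochs - full) * decay**full
    return pool_size * eff_epochs

def loss(pool_size, quality, tokens, a=1.0, b=0.3, irreducible=1.8):
    n_eff = quality * effective_samples(pool_size, tokens)  # quality scales sample utility
    return a * n_eff ** (-b) + irreducible

for tokens in [1e9, 1e10, 1e11]:
    strict = loss(pool_size=2e9,  quality=1.5, tokens=tokens)   # aggressive filter: small, clean pool
    loose  = loss(pool_size=2e10, quality=1.0, tokens=tokens)   # light filter: big, noisier pool
    winner = "strict filter" if strict < loose else "loose filter"
    print(f"{tokens:.0e} tokens -> strict {strict:.5f}, loose {loose:.5f}  ({winner})")
```

With these made-up constants the strict filter wins the smallest budget and the loose filter wins the larger ones, which is the compute-dependence the title refers to.
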
Yiding Jiang (@yidingjiang):

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/

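A minimal sketch of the central quantity in that thread: the agreement rate between two models trained on the same data with different random seeds, computed on a toy test set. The function names and toy predictions below are mine, not the paper's code.

```python
import numpy as np

def agreement_rate(preds_a, preds_b):
    """Fraction of test inputs on which the two models predict the same class."""
    return float(np.mean(preds_a == preds_b))

def accuracy(preds, labels):
    return float(np.mean(preds == labels))

# Toy stand-ins for two models trained with different seeds: each has ~25%
# of its predictions replaced by a random class.
rng = np.random.default_rng(0)
labels  = rng.integers(0, 10, size=1000)
preds_a = np.where(rng.random(1000) < 0.25, rng.integers(0, 10, size=1000), labels)
preds_b = np.where(rng.random(1000) < 0.25, rng.integers(0, 10, size=1000), labels)

print("accuracy A :", accuracy(preds_a, labels))
print("accuracy B :", accuracy(preds_b, labels))
print("agreement  :", agreement_rate(preds_a, preds_b))
```

Whether and when this agreement (measurable without labels) tracks accuracy is the question the thread connects to which features each model happened to learn, and to calibration.
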
Pratyush Maini (@pratyushmaini):

Enjoyed giving a talk on Rephrasing the Web at SambaNova Systems & Together AI over the past few days. Excited to see interest in synth data for pre-training LLMs.

Lots of interesting Qs concerning bias & factuality of synth data. Very important problems for future research!

Maksym Andriushchenko 🇺🇦 (@maksym_andr):

🚨 Are leading safety-aligned LLMs adversarially robust? 🚨

❗In our new work, we jailbreak basically all of them with ≈100% success rate (according to GPT-4 as a semantic judge):
- Claude 1.2 / 2.0 / 2.1 / 3 Haiku / 3 Sonnet / 3 Opus,
- GPT-3.5 / GPT-4,
- R2D2-7B from…

Urmish Thakker (@UrmishThakker):

We recently hosted Pratyush Maini at SambaNova Systems to talk about their work “Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling”

TLDR: By just rephrasing your existing datasets, you can achieve the same pre-training accuracy 3x faster with far lesser…

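For context, a hedged sketch of the high-level recipe the talk describes: pair raw web documents with LLM rewrites and pre-train on the mixture. The prompt text and the `call_llm` helper below are placeholders, not the paper's actual prompt or pipeline.

```python
from typing import Callable

# Placeholder prompt; the paper describes style-specific rephrasing prompts
# (e.g. Wikipedia-like or Q&A style), this one is purely illustrative.
REPHRASE_PROMPT = (
    "Paraphrase the following text in a clear, high-quality style, "
    "keeping all of its factual content:\n\n{document}"
)

def build_rephrased_mix(raw_docs: list[str], call_llm: Callable[[str], str]) -> list[str]:
    """Return a training mix of raw web documents plus their synthetic rewrites."""
    mix = []
    for doc in raw_docs:
        mix.append(doc)                                              # keep the original web text
        mix.append(call_llm(REPHRASE_PROMPT.format(document=doc)))   # add the synthetic rewrite
    return mix
```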