Zico Kolter (@zicokolter)'s Twitter Profile
Zico Kolter

@zicokolter

Associate professor at Carnegie Mellon, VP and Chief Scientist at Bosch Center for AI. Researching (deep) machine learning, robustness, implicit layers.

ID: 841499391508779008

Link: http://zicokolter.com · Joined: 14-03-2017 04:01:04

524 Tweets

14.9K Followers

499 Following

Zico Kolter (@zicokolter):

There's been a lot of discussion on LLMs 'memorizing' training data, but we argue for more nuance in the definition of 'memorize'. This work advocates for adversarial prompts (and whether they can be shorter than the output) as a metric for assessing memorization.
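
A minimal sketch of the metric described above, assuming a Hugging Face causal LM ("gpt2" here as a stand-in) and assuming an adversarial prompt has already been found by some search procedure (the search itself is not shown): an output counts as memorized when a prompt shorter than the output reproduces it, i.e., the token-level compression ratio exceeds 1.

from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in model; `prompt` is assumed to come from an adversarial
# prompt search (e.g., a GCG-style optimizer), which is not shown here.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def compression_ratio(prompt: str, target: str) -> float:
    """Token length of the target divided by token length of the prompt (> 1 suggests memorization)."""
    return len(tok(target)["input_ids"]) / len(tok(prompt)["input_ids"])

def elicits(prompt: str, target: str) -> bool:
    """Check whether greedy decoding from `prompt` reproduces `target`."""
    ids = tok(prompt, return_tensors="pt")
    n_new = len(tok(target)["input_ids"])
    out = model.generate(**ids, max_new_tokens=n_new, do_sample=False)
    completion = tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)
    return completion.strip().startswith(target.strip())

# Under this metric, a string counts as memorized if some eliciting prompt is
# shorter than the string itself: elicits(p, s) and compression_ratio(p, s) > 1.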

Zico Kolter (@zicokolter):

How do you balance repeated training on high-quality data against adding more low-quality data to the mix? And how much should you train on each type? Pratyush Maini and Sachin Goyal provide scaling laws for such settings. Really excited about the work!

Pratyush Maini (@pratyushmaini):

1/ 🥁Scaling Laws for Data Filtering 🥁

TLDR: Data Curation *cannot* be compute agnostic!
In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data.

w/ Sachin Goyal, Zachary Lipton, Aditi Raghunathan, Zico Kolter
📝: arxiv.org/abs/2404.07177
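
As an illustration of what fitting a scaling law looks like in practice, the sketch below fits a saturating power law L(n) = E + A·n^(−α) to synthetic (data size, loss) points with SciPy. The functional form is the standard one from the scaling-law literature, not the heterogeneous-data law derived in the paper, and the data points are made up for the example.

import numpy as np
from scipy.optimize import curve_fit

def power_law(n, E, A, alpha):
    # Saturating power law: irreducible loss E plus a term that decays with data size.
    return E + A * n ** (-alpha)

# Synthetic (data size in millions of examples, loss) points, for illustration only.
n = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0])
loss = 2.0 + 1.0 * n ** (-0.3)

(E, A, alpha), _ = curve_fit(power_law, n, loss, p0=[1.0, 1.0, 0.5])
print(f"L(n) ≈ {E:.2f} + {A:.2f} * n^(-{alpha:.2f})")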

Aran Komatsuzaki (@arankomatsuzaki):

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Argues that data curation cannot be agnostic of the total compute that a model will be trained for

repo: github.com/locuslab/scali…
abs: arxiv.org/abs/2404.07177

Pratyush Maini (@pratyushmaini):

🤯The TOFU dataset (locuslab.github.io/tofu) had 300k+ downloads last month, and is in Top 20 most downloaded datasets on Hugging Face📈. This is crazy given how small the LLM unlearning community is compared to, say, LLM evals (for GSM8k). Excited to see what y'all are building!
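
A minimal way to load it with the `datasets` library, assuming the Hugging Face dataset id is locuslab/TOFU and that a forget10 config exists (check the dataset card for the canonical config names):

from datasets import load_dataset

# Assumed dataset id and config; see locuslab.github.io/tofu for the canonical names.
forget = load_dataset("locuslab/TOFU", "forget10", split="train")
print(len(forget))
print(forget[0])  # question/answer pairs about fictitious authors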

Yiding Jiang (@yidingjiang):

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/
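
A toy version of the setup (illustrative only, not the paper's experiments): train two copies of the same model that differ only in their random seed, then measure how often their test predictions agree alongside their accuracies.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data; the two models below differ only in their random seed.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed).fit(X_tr, y_tr)
    for seed in (1, 2)
]
preds = [m.predict(X_te) for m in models]

agreement = float(np.mean(preds[0] == preds[1]))
accs = [m.score(X_te, y_te) for m in models]
print(f"agreement = {agreement:.3f}, accuracies = {accs[0]:.3f} / {accs[1]:.3f}")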

Patrick Chao (@patrickrchao):

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n
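
A sketch of pulling the benchmark's behavior set from the Hugging Face Hub, assuming the dataset id JailbreakBench/JBB-Behaviors and a "behaviors" config; the project also ships a jailbreakbench Python package, whose API is documented on the site and not reproduced here.

from datasets import load_dataset

# Assumed dataset id and config; see jailbreakbench.github.io for the canonical names.
behaviors = load_dataset("JailbreakBench/JBB-Behaviors", "behaviors")
print(behaviors)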

Sander Dieleman (@sedielem):

This blog post is an amazing exposition and analysis of consistency models, and how they relate to diffusion models, leading to several suggested improvements to the training procedure that look very promising. Definitely worth a read!

Adrian Weller (@adrian_weller):

ICML authors: rebuttals are due by Mar 28, 23:59 AoE.
If you think it would be helpful, you can use an anonymous public link to share additional material such as figures (even a whole revised manuscript if you want). Reviewers are not required to look at this additional material.

Zhengyang Geng (@ZhengyangGeng):

🚀Our latest blog post unveils the power of Consistency Models and introduces Easy Consistency Tuning (ECT), a new way to fine-tune pretrained diffusion models to consistency models.

SoTA fast generative models at 1/32 of the training cost! 🔽
Get ready to speed up your generative…
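
A toy sketch of the consistency condition being tuned (not the exact ECT recipe from the blog post): predictions of the same noisy sample at two nearby noise levels are pulled together, with a stop-gradient on the lower-noise branch.

import torch
import torch.nn as nn

# Toy denoiser f(x, t) on 2-D data; a real run would start from a pretrained diffusion model.
class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

f = Denoiser()
opt = torch.optim.Adam(f.parameters(), lr=1e-4)

x0 = torch.randn(128, 2)              # stand-in "clean" data
t = torch.rand(128, 1) * 0.99 + 0.01  # noise level
r = 0.5 * t                           # a nearby, smaller noise level (ECT anneals this gap)
eps = torch.randn_like(x0)
x_t = x0 + t * eps                    # same noise direction at both levels
x_r = x0 + r * eps

# Consistency loss: the prediction at the higher noise level is pulled toward
# the stop-gradient prediction at the lower noise level.
loss = ((f(x_t, t) - f(x_r, r).detach()) ** 2).mean()
loss.backward()
opt.step()
print(float(loss))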

Samuel Sokota (@ssokota):

SOTA AI for games like poker & Hanabi relies on search methods that don’t scale to games w/ large amounts of hidden information.

In our ICLR paper, we introduce simple search methods that scale to large games & get SOTA for Hanabi w/ 100x less compute. 1/N

arxiv.org/abs/2304.13138

Vaishnavh Nagarajan (@_vaishnavh):

🗣️ “Next-token predictors can’t plan!” ⚔️ “False! Every distribution is expressible as a product of next-token probabilities!” 🗣️

In work w/ Gregor Bachmann, we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
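
For reference, the "every distribution is expressible" side of the debate is just the chain rule of probability:

p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)

Expressivity alone doesn't settle the debate; the tweet's point is that a distinct failure mode exists despite this factorization.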

Victor Akinwande (@aknvictor):

For large-scale causal discovery, there's no need to settle for NOTEARS just for its speed. Consider using LiNGAM instead: we've parallelized it, achieving a 32x speed-up on GPUs.

NOTEARS:
Scalable: ✅
Identifiability guarantees: ❌

AcceleratedLiNGAM:
Scalable: ✅
Identifiability guarantees: ✅
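
For reference, a minimal CPU sketch using the existing lingam Python package (DirectLiNGAM) on synthetic linear non-Gaussian data; AcceleratedLiNGAM, the GPU-parallelized version announced above, is not shown here and its API may differ.

import numpy as np
import lingam

# Synthetic linear non-Gaussian data with ground-truth graph x0 -> x1 -> x2.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.uniform(-1, 1, n)
x1 = 2.0 * x0 + rng.uniform(-1, 1, n)
x2 = -1.5 * x1 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)      # estimated causal ordering of the variables
print(model.adjacency_matrix_)  # estimated weighted adjacency matrix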

Mingjie Sun (@_mingjiesun):

Excited to share our new paper where we study the intriguing phenomenon of massive activations in LLMs.

I hope our findings can offer a fresh perspective into understanding the internal representations of these powerful models.

Work with Xinlei Chen, Zico Kolter, and Zhuang Liu.

Christopher De Sa (@chrismdesa):

We are excited to announce this year's keynote speakers for MLSys: Jeff Dean, Zico Kolter, and Yejin Choi! MLSys this year will be held in Santa Clara on May 13–16. More details at mlsys.org.

Zhuang Liu (@liuzhuang1234):

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in our paper

“Massive Activations in Large Language Models”

LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
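
An easy way to look for this kind of effect yourself: compare the maximum and median hidden-state magnitudes per layer. The sketch below uses gpt2 as a small stand-in; the paper studies a range of larger LLMs and pins down the specific dimensions involved.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tok("Summer is warm. Winter is cold.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Print per-layer max vs. median hidden-state magnitude; in many LLMs a handful
# of entries dwarf everything else.
for i, h in enumerate(out.hidden_states):
    a = h.abs()
    print(f"layer {i:2d}: max |act| = {a.max().item():9.2f}   median |act| = {a.median().item():7.4f}")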

Zico Kolter (@zicokolter):

The ICML 2024 Ethics Chairs, Kristian Lum and Lauren Oakden-Rayner 🏳️‍⚧️, wrote a blog post about the ethics review. Helpful for all authors and reviewers at ICML to better understand the process! medium.com/@icml2024pc/et…

Zico Kolter (@zicokolter):

To any mid-senior ML researchers who want to start coding again (see also: x.com/zicokolter/sta…), consider volunteering to serve as a program chair for ICML! Clearly it's a trend...

Zico Kolter (@zicokolter):

I've made some substantial updates to my chatllm-vscode extension (long-form LLM chats as VSCode notebooks):
1. GPT-4 Vision + DALL·E support
2. Ollama support to use local LLMs (including LLaVA for vision)
3. Azure API support (including via SSO)

Link: marketplace.visualstudio.com/items?itemName…

Runtian Zhai (@RuntianZhai):

Unlabeled data is crucial for modern ML. It provides information about the data distribution P, but how can we exploit that information?

Given a kernel K, our spotlight paper gives a general & principled way: Spectrally Transformed Kernel Regression (STKR). Camera-ready 👇
arxiv.org/abs/2402.00645
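
Schematically, the construction the name refers to (a sketch only; see the paper for the precise setting): take the spectral decomposition of the base kernel and apply a transform s(·) to its eigenvalues,

K(x, x') = \sum_i \lambda_i\, \phi_i(x)\, \phi_i(x')
\quad\longrightarrow\quad
K_s(x, x') = \sum_i s(\lambda_i)\, \phi_i(x)\, \phi_i(x')

with kernel regression then run using K_s in place of K; the role of the unlabeled data is to make working with the transformed kernel practical.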
