Zico Kolter (@zicokolter)'s Twitter Profile
Zico Kolter

@zicokolter

Associate professor at Carnegie Mellon, VP and Chief Scientist at Bosch Center for AI. Researching (deep) machine learning, robustness, implicit layers.

ID: 841499391508779008

Link: http://zicokolter.com · Joined: 14-03-2017 04:01:04

524 Tweets

14.9K Followers

499 Following

Zico Kolter (@zicokolter):

There's been a lot of discussion on LLMs 'memorizing' training data, but we argue for more nuance in the definition of 'memorize'. This work advocates for adversarial prompts (and whether they can be shorter than the output) as a metric for assessing memorization.
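
A minimal sketch of the metric described above, assuming a Hugging Face causal LM ("gpt2" here as a stand-in) and assuming an adversarial prompt has already been found by some search procedure (the search itself is not shown): an output counts as memorized when a prompt shorter than the output reproduces it, i.e., the token-level compression ratio exceeds 1.

from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in model; `prompt` is assumed to come from an adversarial
# prompt search (e.g., a GCG-style optimizer), which is not shown here.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def compression_ratio(prompt: str, target: str) -> float:
    """Token length of the target divided by token length of the prompt (> 1 suggests memorization)."""
    return len(tok(target)["input_ids"]) / len(tok(prompt)["input_ids"])

def elicits(prompt: str, target: str) -> bool:
    """Check whether greedy decoding from `prompt` reproduces `target`."""
    ids = tok(prompt, return_tensors="pt")
    n_new = len(tok(target)["input_ids"])
    out = model.generate(**ids, max_new_tokens=n_new, do_sample=False)
    completion = tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)
    return completion.strip().startswith(target.strip())

# Under this metric, a string counts as memorized if some eliciting prompt is
# shorter than the string itself: elicits(p, s) and compression_ratio(p, s) > 1.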

Zico Kolter (@zicokolter):

How do you balance repeated training on high-quality data against adding more low-quality data to the mix? And how much should you train on each type? Pratyush Maini and Sachin Goyal provide scaling laws for such settings. Really excited about the work!

Pratyush Maini (@pratyushmaini):

1/ 🥁Scaling Laws for Data Filtering 🥁

TLDR: Data Curation *cannot* be compute agnostic!
In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data.

w/ Sachin Goyal, Zachary Lipton, Aditi Raghunathan, Zico Kolter
📝: arxiv.org/abs/2404.07177
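
As an illustration of what fitting a scaling law looks like in practice, the sketch below fits a saturating power law L(n) = E + A·n^(−α) to synthetic (data size, loss) points with SciPy. The functional form is the standard one from the scaling-law literature, not the heterogeneous-data law derived in the paper, and the data points are made up for the example.

import numpy as np
from scipy.optimize import curve_fit

def power_law(n, E, A, alpha):
    # Saturating power law: irreducible loss E plus a term that decays with data size.
    return E + A * n ** (-alpha)

# Synthetic (data size in millions of examples, loss) points, for illustration only.
n = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0])
loss = 2.0 + 1.0 * n ** (-0.3)

(E, A, alpha), _ = curve_fit(power_law, n, loss, p0=[1.0, 1.0, 0.5])
print(f"L(n) ≈ {E:.2f} + {A:.2f} * n^(-{alpha:.2f})")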

Aran Komatsuzaki (@arankomatsuzaki):

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Argues that data curation cannot be agnostic of the total compute that a model will be trained for

repo: github.com/locuslab/scali…
abs: arxiv.org/abs/2404.07177

Pratyush Maini (@pratyushmaini):

🤯The TOFU dataset (locuslab.github.io/tofu) had 300k+ downloads last month, and is in Top 20 most downloaded datasets on Hugging Face📈. This is crazy given how small the LLM unlearning community is compared to, say, LLM evals (for GSM8k). Excited to see what y'all are building!
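
A minimal way to load it with the `datasets` library, assuming the Hugging Face dataset id is locuslab/TOFU and that a forget10 config exists (check the dataset card for the canonical config names):

from datasets import load_dataset

# Assumed dataset id and config; see locuslab.github.io/tofu for the canonical names.
forget = load_dataset("locuslab/TOFU", "forget10", split="train")
print(len(forget))
print(forget[0])  # question/answer pairs about fictitious authors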

Yiding Jiang (@yidingjiang):

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/
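
A toy version of the setup (illustrative only, not the paper's experiments): train two copies of the same model that differ only in their random seed, then measure how often their test predictions agree alongside their accuracies.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data; the two models below differ only in their random seed.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed).fit(X_tr, y_tr)
    for seed in (1, 2)
]
preds = [m.predict(X_te) for m in models]

agreement = float(np.mean(preds[0] == preds[1]))
accs = [m.score(X_te, y_te) for m in models]
print(f"agreement = {agreement:.3f}, accuracies = {accs[0]:.3f} / {accs[1]:.3f}")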

Patrick Chao (@patrickrchao):

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n
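
A sketch of pulling the benchmark's behavior set from the Hugging Face Hub, assuming the dataset id JailbreakBench/JBB-Behaviors and a "behaviors" config; the project also ships a jailbreakbench Python package, whose API is documented on the site and not reproduced here.

from datasets import load_dataset

# Assumed dataset id and config; see jailbreakbench.github.io for the canonical names.
behaviors = load_dataset("JailbreakBench/JBB-Behaviors", "behaviors")
print(behaviors)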

Sander Dieleman (@sedielem):

This blog post is an amazing exposition and analysis of consistency models, and how they relate to diffusion models, leading to several suggested improvements to the training procedure that look very promising. Definitely worth a read!

Adrian Weller (@adrian_weller):

ICML authors: rebuttals are due by Mar 28, 23:59 AoE.
If you think it would be helpful, you can use an anonymous public link to share additional material such as figures (even a whole revised manuscript if you want). Reviewers are not required to look at this additional material.

Zhengyang Geng (@ZhengyangGeng):

🚀Our latest blog post unveils the power of Consistency Models and introduces Easy Consistency Tuning (ECT), a new way to fine-tune pretrained diffusion models to consistency models.

SoTA fast generative models at 1/32 of the training cost! 🔽
Get ready to speed up your generative…
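
A toy sketch of the consistency condition being tuned (not the exact ECT recipe from the blog post): predictions of the same noisy sample at two nearby noise levels are pulled together, with a stop-gradient on the lower-noise branch.

import torch
import torch.nn as nn

# Toy denoiser f(x, t) on 2-D data; a real run would start from a pretrained diffusion model.
class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

f = Denoiser()
opt = torch.optim.Adam(f.parameters(), lr=1e-4)

x0 = torch.randn(128, 2)              # stand-in "clean" data
t = torch.rand(128, 1) * 0.99 + 0.01  # noise level
r = 0.5 * t                           # a nearby, smaller noise level (ECT anneals this gap)
eps = torch.randn_like(x0)
x_t = x0 + t * eps                    # same noise direction at both levels
x_r = x0 + r * eps

# Consistency loss: the prediction at the higher noise level is pulled toward
# the stop-gradient prediction at the lower noise level.
loss = ((f(x_t, t) - f(x_r, r).detach()) ** 2).mean()
loss.backward()
opt.step()
print(float(loss))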

Samuel Sokota (@ssokota):

SOTA AI for games like poker & Hanabi relies on search methods that don’t scale to games w/ large amounts of hidden information.

In our ICLR paper, we introduce simple search methods that scale to large games & get SOTA for Hanabi w/ 100x less compute. 1/N

arxiv.org/abs/2304.13138

Vaishnavh Nagarajan (@_vaishnavh):

🗣️ “Next-token predictors can’t plan!” ⚔️ “False! Every distribution is expressible as a product of next-token probabilities!” 🗣️

In work w/ Gregor Bachmann, we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
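
For reference, the "every distribution is expressible" side of the debate is just the chain rule of probability:

p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)

Expressivity alone doesn't settle the debate; the tweet's point is that a distinct failure mode exists despite this factorization.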

Victor Akinwande (@aknvictor):

For large-scale causal discovery, there's no need to settle for NOTEARS just for its speed. Consider using LiNGAM instead: we've parallelized it, achieving a 32x speed-up on GPUs.

NOTEARS:
Scalable: ✅
Identifiability guarantees: ❌

AcceleratedLiNGAM:
Scalable: ✅
Identifiability guarantees: ✅
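
For reference, a minimal CPU sketch using the existing lingam Python package (DirectLiNGAM) on synthetic linear non-Gaussian data; AcceleratedLiNGAM, the GPU-parallelized version announced above, is not shown here and its API may differ.

import numpy as np
import lingam

# Synthetic linear non-Gaussian data with ground-truth graph x0 -> x1 -> x2.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.uniform(-1, 1, n)
x1 = 2.0 * x0 + rng.uniform(-1, 1, n)
x2 = -1.5 * x1 + rng.uniform(-1, 1, n)
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)      # estimated causal ordering of the variables
print(model.adjacency_matrix_)  # estimated weighted adjacency matrix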

Mingjie Sun (@_mingjiesun):

Excited to share our new paper where we study the intriguing phenomenon of massive activations in LLMs.

I hope our findings can offer a fresh perspective into understanding the internal representations of these powerful models.

Work with Xinlei Chen, Zico Kolter, and Zhuang Liu.

Christopher De Sa (@chrismdesa):

We are excited to announce this year's keynote speakers for MLSys: Jeff Dean, Zico Kolter, and Yejin Choi! MLSys this year will be held in Santa Clara on May 13–16. More details at mlsys.org.

Zhuang Liu (@liuzhuang1234):

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in our paper

“Massive Activations in Large Language Models”

LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
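
An easy way to look for this kind of effect yourself: compare the maximum and median hidden-state magnitudes per layer. The sketch below uses gpt2 as a small stand-in; the paper studies a range of larger LLMs and pins down the specific dimensions involved.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tok("Summer is warm. Winter is cold.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Print per-layer max vs. median hidden-state magnitude; in many LLMs a handful
# of entries dwarf everything else.
for i, h in enumerate(out.hidden_states):
    a = h.abs()
    print(f"layer {i:2d}: max |act| = {a.max().item():9.2f}   median |act| = {a.median().item():7.4f}")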

Zico Kolter (@zicokolter):

The ICML 2024 Ethics Chairs, Kristian Lum and Lauren Oakden-Rayner 🏳️‍⚧️, wrote a blog post about the ethics review. Helpful for all authors and reviewers at ICML to better understand the process! medium.com/@icml2024pc/et…

Zico Kolter (@zicokolter):

To any mid-senior ML researchers who want to start coding again (see also: x.com/zicokolter/sta…), consider volunteering to serve as a program chair for ICML! Clearly it's a trend...

Zico Kolter (@zicokolter):

I've made some substantial updates to my chatllm-vscode extension (long-form LLM chats as VSCode notebooks):
1. GPT-4 Vision + DALL·E support
2. Ollama support to use local LLMs (including LLaVA for vision)
3. Azure API support (including via SSO)

Link: marketplace.visualstudio.com/items?itemName…

Runtian Zhai (@RuntianZhai):

Unlabeled data is crucial for modern ML. It provides information about the data distribution P, but how can we exploit that information?

Given a kernel K, our spotlight paper gives a general & principled way: Spectrally Transformed Kernel Regression (STKR). Camera-ready 👇
arxiv.org/abs/2402.00645
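
Schematically, the construction the name refers to (a sketch only; see the paper for the precise setting): take the spectral decomposition of the base kernel and apply a transform s(·) to its eigenvalues,

K(x, x') = \sum_i \lambda_i\, \phi_i(x)\, \phi_i(x')
\quad\longrightarrow\quad
K_s(x, x') = \sum_i s(\lambda_i)\, \phi_i(x)\, \phi_i(x')

with kernel regression then run using K_s in place of K; the role of the unlabeled data is to make working with the transformed kernel practical.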
