Noam Brown (@polynoamial)'s Twitter Profile
Noam Brown

@polynoamial

Researching reasoning @OpenAI | Co-created Libratus/Pluribus, the first superhuman no-limit poker AIs | Co-created CICERO | PhD from @SCSatCMU

ID:825088493764407298

Link: http://www.noambrown.com
Joined: 27-01-2017 21:10:01

870 Tweets

35.8K Followers

621 Following

Noam Brown (@polynoamial)

I’m at ICLR! Looking forward to catching up with friends, meeting new folks, and trying some Viennese schnitzel

Noam Brown (@polynoamial)

Well said. There is a big opportunity for a neutral third party like Scale AI to step in as the 'Moody's of LLMs' and provide rigorous and comprehensive evals of all models.

Hugh Zhang (@hughbzhang)

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Noam Brown (@polynoamial)

Someone on the admissions committee for a top CS PhD program told me they no longer filter based on paper count because too many of the applicants already have multiple publications. Instead, they now filter by citation count. Not sure if he was joking but I believed it.

lmsys.org (@lmsysorg)

🔥Exciting news -- GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah!

We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to OpenAI for this incredible launch!

To offer…

Naman Jain (@StringChaos)

The new GPT-4-Turbo improves an impressive 4.5 points on LiveCodeBench (comprising competition-style programming problems).

These problems are quite challenging for current LLMs and this improvement highlights a considerable improvement in reasoning!!

x.com/polynoamial/st…

Noam Brown (@polynoamial)

I tried to get a friend to read the scaling laws paper but they said it was too long so I sent them this gif instead

Leo Gao (@nabla_theta)

Eliezer Yudkowsky ⏹️ while computers may excel at soft skills like creativity and emotional understanding, they will never match human ability at dispassionate, mechanical reasoning
