Noam Brown (@polynoamial) Twitter Tweets • TwiCopy

Well said. There is a big opportunity for a neutral third party like Scale AI to step in as the 'Moody's of LLMs' and provide rigorous and comprehensive evals of all models.

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

Noam Brown

@polynoamial

1 month ago

Llama 3 is out in 8B and 70B sizes! (400B still training) Congrats to the AI at Meta team! ai.meta.com/blog/meta-llam…

Noam Brown

@polynoamial

1 month ago

Too many startups focused on what GPT-4 isn't, not enough startups focused on what future models could be

Noam Brown

@polynoamial

1 month ago

Someone on the admissions committee for a top CS PhD program told me they no longer filter based on paper count because too many of the applicants already have multiple publications. Instead, they now filter by citation count. Not sure if he was joking but I believed it.

Noam Brown

@polynoamial

1 month ago

Eval numbers for the new GPT-4 Turbo are out

lmsys.org

@lmsysorg

1 month ago

🔥Exciting news -- GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah!

We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to OpenAI for this incredible launch!

To offer…

Naman Jain

@StringChaos

1 month ago

The new GPT-4-Turbo improves an impressive 4.5 points on LiveCodeBench (comprising competition-style programming problems).

These problems are quite challenging for current LLMs and this improvement highlights a considerable improvement in reasoning!!

x.com/polynoamial/st…

Noam Brown

@polynoamial

1 month ago

I tried to get a friend to read the scaling laws paper but they said it was too long so I sent them this gif instead

Leo Gao

@nabla_theta

2 months ago

Eliezer Yudkowsky ⏹️ while computers may excel at soft skills like creativity and emotional understanding, they will never match human ability at dispassionate, mechanical reasoning
