Ben Schmidt / @benmschmidt@sigmoid.social(@benmschmidt) 's Twitter Profile
Ben Schmidt / @benmschmidt@sigmoid.social

@benmschmidt

VP of Information Design @nomic_ai, building new ways to interpret and shape embedding models. Onetime history/digital humanities prof. @bschmidt.bsky.social

ID:222618390

Link: https://benschmidt.org
Joined: 03-12-2010 23:11:42

7,8K Tweets

10,1K Followers

1,2K Following

Alexander Doria(@Dorialexander) 's Twitter Profile Photo

Overall Common Crawl is a snapshot of some part of what was published, primarily in English and other European languages from 2010-2024 (and even then, stuff prior to 2020 is often disregarded). More data has been dreamt before or aside that than you can ever conceive.

Alexander Doria(@Dorialexander) 's Twitter Profile Photo

Yes but no. If we completely disregard copyright (I won’t), LLM labs are only running out of data if they refuse to become what they should be: digitization and data orgs. Books alone are 150-200m and this is still dwarfed by the massive amount of magazine and newspaper content.

Dmitry Kobak(@hippopedoid) 's Twitter Profile Photo

Great to see the openTSNE paper finally published! Pavlin's openTSNE is by far the best t-SNE implementation out there. Very fast, very easy to use, but also very flexible and easy to extend and build upon. We use it in all our t-SNE-related projects. Great job Pavlin Poličar!

Ben Schmidt / @benmschmidt@sigmoid.social(@benmschmidt) 's Twitter Profile Photo

Job alert at Nomic AI! I'm hiring a front end/Web engineer to build the next generation of data interfaces for curating, exploring, and model-building from text and image data. Apply here or pass it on: jobs.ashbyhq.com/nomic.ai/42e7a…

Jonathan 🦬 jsench.bsky.social(@jsench) 's Twitter Profile Photo

In last decade’s renovation of the Library Mall space at UW-Madison, the open speaker’s podium that was built into campus post-Vietnam specifically to facilitate protest & assembly was torn down & replaced with a non-functional sculpture called “Both/And – Tolerance/Innovation.”

Phil Gentry(@pmgentry) 's Twitter Profile Photo

Very rarely do I get to pull a “I wrote a book on this subject and you are very mistaken” card, but the moment has arrived.

Ben Schmidt / @benmschmidt@sigmoid.social(@benmschmidt) 's Twitter Profile Photo

As a constant cynic about Harvard, my model for humanities PhDs at hyper-wealthy schools is not cheap labor, but a perk that admins give to faculty. Great researchers love teaching grad seminars, advising disses, etc. more than they want, say, a foosball table in the lounge.

Rob Townsend (also @rbtownsend.bsky.social)(@rbthisted) 's Twitter Profile Photo

'In 2020, the number of humanities bachelor’s degrees awarded (for the entire range of disciplines) fell below 200,000 for the first time since 2002—and then fell again in both 2021 and 2022' bit.ly/3xENm32

AndriyMulyar(@andriy_mulyar) 's Twitter Profile Photo

Just a reminder you can literally train nomic embed from scratch, the training data is public.

The only model above 62% on MTEB below 500M params to do so 😀

Dmitry Kobak(@hippopedoid) 's Twitter Profile Photo

Our paper 'The landscape of biomedical research' is out in Patterns journal! Great job by Rita González Márquez.

cell.com/patterns/fullt…

Amazing interactive explorer by Ben Schmidt / @benmschmidt@sigmoid.social from Nomic AI: static.nomic.ai/pubmed.html

For details see my original Twitter thread: x.com/hippopedoid/st….

Ben Schmidt / @benmschmidt@sigmoid.social(@benmschmidt) 's Twitter Profile Photo

The fake accounts here seem to be getting a lot worse, no? I've now got multiple bots with *the same profile picture* liking old reposts. Seems hard for me to believe that this is either useful spamming, *or* something that would be hard for functioning auto-moderation to find.
