OcciGlot (@occiglot)'s Twitter Profile
OcciGlot

@occiglot

Open Source Language Models for Europe

ID:1757339431432822784

13-02-2024 09:42:27

18 Tweets

197 Followers

10 Following

DFKI (@DFKI)

OcciGlot - New Open Source Language Models for Europe released 🇪🇺

Researchers from DFKI and hessian.AI have launched the OcciGlot initiative to develop generative open source language models for European languages.

👉🏼 dfki.de/en/web/news/oc…

Hai Duong 'Čan' Tran (@PhoBoAI)

📢 We have too few non-English LLM benchmarks! Join the Hugging Face 🤗 community project and let's translate 500 English prompts into Czech together for subsequent automatic evaluation.

More at huggingface.co/spaces/DIBT-Cz….

Alexander Doria (@Dorialexander)

Common Corpus is an international initiative coordinated by @pleias_fr with the support of the state start-up LANGU:IA, backed by the French Ministry of Culture and the Direction interministérielle du numérique. It also involves the open-science LLM community (Occiglot, Eleuther AI) and cultural heritage communities (@storytracer).

OcciGlot (@occiglot)

500B public domain dataset released by pleias today. And of course it’s multilingual.

We’re very excited about our ongoing collaboration. More cool things to come 🚀

Jan P. Harries (@jphme)

Interesting comparison of 'German' LLMs on Reddit - finds that DiscoResearch DiscoLM German 7b outputs best 'native-sounding' quality despite being behind in benchmarks - followed by Occiglot DE EN Instruct which also uses our data 🙂🙌 1/2
OcciGlot (@occiglot)

We just made a large-scale evaluation sweep of tokenizer performance across European languages.
We’re sharing it publicly as part of our commitment to transparent research, and hoping it might be helpful for others.

occiglot.github.io/occiglot/posts…

Marktechpost AI Research News ⚡ (@Marktechpost)

Meet Occiglot: A Large-Scale Research Collective for Open-Source Development of Large Language Models by and for Europe

marktechpost.com/2024/03/08/mee…

OcciGlot

DiscoResearch (@DiscoResearchAI)

Check out a large-scale research collective for open-source development of LLMs: huggingface.co/occiglot

A great open model initiative by many of our friends at hessian.AI, DFKI, and TU Darmstadt, and we are happy to contribute a (small) dataset for instruction tuning 🙂.
