OcciGlot (@occiglot) Twitter Tweets • TwiCopy

DFKI

1 month ago

OcciGlot - New Open Source Language Models for Europe released 🇪🇺

Researchers from DFKI and hessian.AI have launched the OcciGlot initiative to develop generative open source language models for European languages.

👉🏼 dfki.de/en/web/news/oc…

12 likes, 0 replies, 6 reposts

Hai Duong 'Čan' Tran

@PhoBoAI

1 month ago

📢 We have too few non-English LLM benchmarks! Join the Hugging Face 🤗 community project and let's translate 500 English prompts into Czech together for subsequent automatic evaluation.

More at huggingface.co/spaces/DIBT-Cz….

5 likes, 0 replies, 5 reposts

Alexander Doria

@Dorialexander

1 month ago

Common Corpus is an international initiative coordinated by @pleias_fr with the support of the state start-up LANGU:IA, supported by the French Ministry of Culture and the Direction interministérielle du numérique. It also involves the open science LLM (Occiglot, Eleuther AI) and cultural heritage communities (@storytracer).

33 likes, 0 replies, 6 reposts

OcciGlot

@occiglot

1 month ago

500B public domain dataset released by pleias today. And of course it’s multilingual.

We’re very excited about our ongoing collaboration. More cool things to come 🚀

18 likes, 0 replies, 3 reposts

Jan P. Harries

@jphme

1 month ago

Interesting comparison of 'German' LLMs on Reddit - finds that @DiscoResearchAI DiscoLM German 7b outputs best 'native-sounding' quality despite being behind in benchmarks - followed by #Occiglot DE EN Instruct which also uses our data 🙂🙌 1/2

35 likes, 0 replies, 7 reposts

OcciGlot

@occiglot

2 months ago

We just made a large-scale evaluation sweep of tokenizer performance across European languages.
We’re sharing it publicly as part of our commitment to transparent research, and hoping it might be helpful for others.

occiglot.github.io/occiglot/posts…

24 likes, 0 replies, 8 reposts

Marktechpost AI Research News ⚡

@Marktechpost

2 months ago

Meet Occiglot: A Large-Scale Research Collective for Open-Source Development of Large Language Models by and for Europe

marktechpost.com/2024/03/08/mee…

#ArtificialIntelligence #DataScience #LLMs OcciGlot

21 likes, 0 replies, 8 reposts

DiscoResearch

@DiscoResearchAI

2 months ago

Check out #Occiglot, a large-scale research collective for open-source development of LLMs: huggingface.co/occiglot

A great open model initiative by many of our friends at hessian.AI, DFKI, and TU Darmstadt, and we are happy to contribute a (small) dataset for instruction tuning 🙂.

13 likes, 0 replies, 6 reposts