Gregor Bachmann (@GregorBachmann1)'s Twitter Profile
Gregor Bachmann

@GregorBachmann1

I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊.

https://t.co/yWdDEV6Z15

ID: 1527256391806746624

Joined: 19-05-2022 11:54:49

84 Tweets

230 Followers

272 Following

Ayça Takmaz (@aycatakmaz):

📢Call for papers

If you work on open-vocabulary 3D scene understanding, consider submitting your work to our #CVPR2024 workshop OpenSUN3D!

⌛️Deadline: April 1st, 2024🏃
Only 1 week left to submit your 8-page full papers or 4-page abstracts!

More info: opensun3d.github.io

Gregor Bachmann (@GregorBachmann1):

From stochastic parrot 🦜 to Clever Hans 🐴? In our work with Vaishnavh Nagarajan we carefully analyse the debate surrounding next-token prediction and identify a new failure of LLMs due to teacher-forcing 👨🏻‍🎓! Check out our work arxiv.org/abs/2403.06963 and the linked thread!

Vaishnavh Nagarajan (@_vaishnavh):

🗣️ “Next-token predictors can’t plan!” ⚔️ ​​“False! Every distribution is expressible as product of next-token probabilities!” 🗣️

In work w/ Gregor Bachmann, we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
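
For context, the identity behind the second voice's claim: by the chain rule, any joint distribution over a token sequence factorizes exactly into next-token conditionals, so expressivity alone cannot settle the debate:

```latex
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)
```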

Lorenzo Noci (@lorenzo_noci):

Why can the learning rate in neural networks transfer from small to large models (in both width and depth)? It turns out that the sharpness dynamics can explain it. Check out our new work! arxiv.org/abs/2402.17457

w/ Alex Meterez (co-first), Antonio Orvieto and T. Hofmann
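
For readers outside the loop: "sharpness" here usually means the largest eigenvalue of the training-loss Hessian. A minimal PyTorch sketch of estimating it via Hessian-vector products and power iteration (an illustrative toy with a stand-in model, not the paper's code):

```python
# Estimate sharpness (top Hessian eigenvalue of the loss) by power iteration.
# Toy stand-ins throughout; not the paper's code.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)                      # stand-in for a real network
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]     # random starting direction
for _ in range(50):
    # Hessian-vector product: differentiate <grad, v> w.r.t. the parameters.
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    hv = torch.autograd.grad(gv, params, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]                # re-normalize for the next step

print("estimated sharpness:", norm.item())    # ~ |largest Hessian eigenvalue|
```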

Dimitri von Rütte (@dvruette):

🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵
arxiv.org/abs/2402.14433
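
A minimal sketch of the general mechanism (steering hidden activations with a concept vector), using a toy layer and a hypothetical concept direction; not the paper's method or code:

```python
# Steer a layer's hidden activations by adding a fixed "concept vector"
# through a forward hook. All names and values here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 16
layer = nn.Linear(hidden_dim, hidden_dim)   # stand-in for a transformer block

concept_vector = torch.randn(hidden_dim)    # hypothetical direction, e.g. "humor"
strength = 2.0                              # how strongly the concept is applied

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return output + strength * concept_vector

handle = layer.register_forward_hook(steer)
x = torch.randn(1, hidden_dim)
steered = layer(x)                          # activations shifted along the concept
handle.remove()
plain = layer(x)                            # unmodified activations, for comparison
print((steered - plain).norm())             # = strength * ||concept_vector||
```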

Ayça Takmaz (@aycatakmaz):

Our workshop ☀️OpenSUN 3D🌍 on Open-Vocabulary 3D Scene Understanding will be held in conjunction with #CVPR2024 in Seattle!

Call for papers is out!

Dimitri von Rütte (@dvruette):

🚨 Calling on all FABRIC users! We need your help to learn about how you’ve been using FABRIC. Help us by taking 5 minutes to fill out the survey.

Haven’t tried FABRIC yet? Just try it using our Gradio demo! ✨👨‍🎨

📊 Survey: forms.gle/aMWLDW8xvyhkLb…
👾 Demo:

Enis Simsar (@enisimsar):

🌟 Excited to present LIME, localized image editing via cross-attention regularization without extra data, re-training, or fine-tuning!

Collaboration with Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

📄 Paper: arxiv.org/pdf/2312.09256
🔗 Project: enisimsar.github.io/LIME

Gregor Bachmann (@GregorBachmann1):

I’ll be presenting 'Scaling MLPs' at #NeurIPS2023, tomorrow (Wed) at 10:45am!
Hyped to discuss things like inductive bias, the bitter lesson, compute-optimality and scaling laws 👷⚖️📈

Ayça Takmaz (@aycatakmaz):

Today Elisabetta Fedele and I will present our work OpenMask3D at NeurIPS Conference 🎷

Visit our poster to learn more about OpenMask3D or to chat with us!

📍 Great Hall & Hall B1+B2 (level 1) #906
🕰️ 10:45-12:45
🌎 openmask3d.github.io

Francis Engelmann, Federico Tombari, Marc Pollefeys

Yuhui Ding (@yuhui_ding):

Inspired by recent breakthroughs in SSMs, we propose a new architecture, Graph Recurrent Encoding by Distance (GRED), for long-range graph representation learning: arxiv.org/abs/2312.01538
with Antonio Orvieto, Bobby and Thomas Hofmann (1/4)
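
A rough toy reading of the name, purely my guess from the tweet (not the authors' implementation): aggregate node features by shortest-path distance from a target node, then run a linear recurrence over the distance "rings", farthest first:

```python
# Toy sketch guessed from the name "Recurrent Encoding by Distance";
# the real GRED architecture may differ substantially.
import torch

def encode_by_distance(features, dist, max_dist, decay=0.9):
    """features: (N, d) node features; dist: (N,) hop distances to a target node."""
    # Sum-aggregate the features at each hop distance around the target node.
    rings = [features[dist == k].sum(dim=0) for k in range(max_dist + 1)]
    # Linear recurrence over distances, walking inward from the farthest ring.
    h = torch.zeros(features.shape[1])
    for ring in reversed(rings):
        h = decay * h + ring
    return h

feats = torch.randn(6, 4)
dist = torch.tensor([0, 1, 1, 2, 2, 3])     # toy shortest-path distances
print(encode_by_distance(feats, dist, max_dist=3))
```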

Gregor Bachmann (@GregorBachmann1):

Want to train a compute-optimal model but get there faster?
Try shape-adaptive training and follow the optimal curve for different 'shape' configurations 🏎️💨!
Check out Sotiris Anagnostidis's and my work for more!
📝arxiv.org/abs/2311.03233

Sotiris Anagnostidis (@SAnagnostidis):

Scaling laws predict the minimum required amount of compute to reach a given performance, but can we do better? Yes, if we allow for a flexible 'shape' of the model! 🤸
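
For context, the standard parametric form behind such compute-optimal analyses (the Chinchilla fit of Hoffmann et al.; not necessarily the form used in this paper): model the loss as a function of parameter count N and training tokens D, then minimize it under a compute budget C ≈ 6ND:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \text{minimized subject to } C \approx 6ND
```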

Vaishnavh Nagarajan (@_vaishnavh):

Isn’t it arbitrary that a Transformer must produce the (K+1)th token by attending to only K vectors in each layer?

In work led by Sachin Goyal, we explore a way to break this rule: by appending copies of a *single* “pause” token to delay the output.

arxiv.org/abs/2310.02226 1/
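
The input-side mechanics are simple; a minimal sketch with a hypothetical token id (the paper also trains with pauses, which this toy skips):

```python
# Append copies of a single dedicated <pause> token before the model answers.
# PAUSE_ID and the prompt ids are illustrative placeholders.
import torch

PAUSE_ID = 50257                            # hypothetical id for the <pause> token
num_pauses = 8                              # extra positions before answering

prompt_ids = torch.tensor([[101, 2023, 2003, 1037, 3231]])   # toy prompt (K tokens)
pauses = torch.full((1, num_pauses), PAUSE_ID)
input_ids = torch.cat([prompt_ids, pauses], dim=1)
# The model now attends over K + num_pauses vectors before emitting its next
# output token; outputs produced at pause positions are simply discarded.
print(input_ids)
```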

Vaishaal Shankar (@Vaishaal):

I had an argument with Preetum Nakkiran about MLPs 4 years ago. He said with enough data + compute the MLP/ConvNet gap would go to 0. I was convolution-pilled and convinced this wasn't possible. He was right: arxiv.org/abs/2306.13575

Ayça Takmaz (@aycatakmaz):

Our ICCV Workshop ☀️OpenSUN3D🌍 on Open-Vocabulary 3D Scene Understanding will take place tomorrow afternoon at #ICCV2023!

Date: October 3rd, Tuesday
Time: 13:20-17:30
Location: E06

More info: OpenSUN3D.github.io

Ayça Takmaz (@aycatakmaz):

We will be at #ICCV2023 to present Human3D 🧑‍🤝‍🧑!

📌Poster: Wednesday, October 4th - 10:30-12:30, Paper ID 4949 - Room 'Nord' - 103

Project page: human-3d.github.io
Code & data: github.com/human-3d

Jonas Schult, Irem Kaftan, Mertcan Akçay, Francis Engelmann, Siyu Tang, @VLG-ETHZ
