Leo Boytsov (@srchvrs)'s Twitter Profile
Leo Boytsov

@srchvrs

Sr. Research Scientist @AWS Labs (PhD @LTIatCMU) working on unnatural language processing, speaking πtorch & C++. Opinions sampled from MY OWN 100T param LM.

ID: 87473622

Link: http://searchivarius.org/about · Joined: 04-11-2009 16:22:33

23.9K Tweets

7.4K Followers

1.9K Following

Leo Boytsov (@srchvrs):

An interesting mixture-of-experts variant, strangely called an "MoE-like" architecture here. Apparently, MoE is becoming a term for one very specialized variant rather than for the MoE concept in general.
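(For context, a minimal sketch of the general MoE concept the tweet contrasts with the narrower usage: a learned router mixes several expert sub-networks. This soft-routing layer is purely illustrative, not from the referenced paper, and all names are made up.)

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """General mixture-of-experts: a router produces weights over experts,
    and the output is the weighted mixture of the expert outputs."""
    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft routing: every expert contributes, weighted by the router.
        # "Sparse MoE" variants instead keep only the top-k router weights.
        w = torch.softmax(self.router(x), dim=-1)              # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (batch, dim, n_experts)
        return (outs * w.unsqueeze(1)).sum(-1)                 # (batch, dim)

x = torch.randn(2, 8)
print(TinyMoE(8)(x).shape)  # torch.Size([2, 8])
```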

Aran Komatsuzaki (@arankomatsuzaki):

RULER: What's the Real Context Size of Your Long-Context Language Models?

- new task categories, multi-hop tracing and aggregation, to test behaviors beyond searching within the context
- all models exhibit large performance drops as the context length increases

arxiv.org/abs/2404.06654
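(To make "multi-hop tracing" concrete, here is a hedged, illustrative generator in the spirit of such tasks; it is not RULER's actual code, and all names are invented. A chain of variable assignments is buried in filler text, and the model must follow the chain to recover the final value.)

```python
import random
import string

def multi_hop_tracing_example(n_hops: int = 3, filler_len: int = 50):
    """Illustrative only (not RULER's generator): scatter a chain of
    variable assignments X1 -> X2 -> ... through filler text; the model
    must trace the chain to answer."""
    names = random.sample(string.ascii_uppercase, n_hops + 1)
    value = str(random.randint(0, 99999))
    facts = [f"VAR {names[0]} = {value}."]
    facts += [f"VAR {names[i]} = VAR {names[i-1]}." for i in range(1, n_hops + 1)]
    random.shuffle(facts)  # hops appear out of order in the context
    filler = " ".join(["lorem"] * filler_len)
    context = f" {filler} ".join(facts)
    question = f"What is the value of VAR {names[-1]}?"
    return context, question, value
```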

AK (@_akhaliq):

In this work, we address this concern for tabular data. Specifically, we introduce a variety of techniques to assess whether a language model has seen a tabular dataset during training. This investigation reveals that LLMs have memorized many popular tabular…
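(One way to probe such memorization, sketched under assumptions rather than taken from the paper: a row-completion test. `query_llm` is a hypothetical stand-in for whatever completion API is used.)

```python
def row_completion_test(csv_rows: list[str], query_llm) -> float:
    """Fraction of rows the model reproduces exactly when prompted with
    all preceding rows of the CSV. High values suggest the table was
    seen during training. Illustrative sketch, not the paper's code."""
    hits = 0
    for i in range(1, len(csv_rows)):
        prompt = "\n".join(csv_rows[:i]) + "\n"
        # Take the first line of the completion; guard against empty output.
        completion = (query_llm(prompt).splitlines() or [""])[0].strip()
        hits += completion == csv_rows[i].strip()
    return hits / (len(csv_rows) - 1)
```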

Aran Komatsuzaki (@arankomatsuzaki):

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Shows superb capability, matching or even surpassing GPT-4V and Gemini Pro in 10 of the 16 benchmarks

repo: github.com/InternLM/Inter…
abs: arxiv.org/abs/2404.06512

Leo Boytsov (@srchvrs):

I have been recalling this factoid from time to time. One should continue by saying that only a few survived. The current startup craze (with most companies dying quickly) is not unprecedented; there was a similar 'attrition' rate 100 years ago.

Ahmad Beirami (@abeirami):

Common robustness methods:
1) augment data with natural/synthetic perturbations and a consistency loss (sketched below)
2) reweight samples to improve generalization (as in DRO)

We do it differently!
We show significant robustness with a simple tweak of the first layer and a loss motivated by communications theory.
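(A minimal sketch of the consistency-loss baseline in method (1), assuming a PyTorch classifier; `perturb` is a placeholder for any natural/synthetic perturbation function, and none of this is the authors' code.)

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, perturb, alpha: float = 1.0):
    """Penalize divergence between the model's predictions on clean and
    perturbed inputs; added on top of the usual task loss (not shown)."""
    logits_clean = model(x)
    logits_pert = model(perturb(x))
    # KL between the two predictive distributions as the consistency term.
    return alpha * F.kl_div(
        F.log_softmax(logits_pert, dim=-1),
        F.softmax(logits_clean, dim=-1),
        reduction="batchmean",
    )
```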

Ahmad Beirami (@abeirami):

A periodic reminder to reviewers:

If you ask authors for more experiments, you need to communicate the clear hypothesis those experiments would verify (e.g., effectiveness on imbalanced data, generalization beyond a certain modality, scalability, etc.).

Otherwise don't!

Delip Rao e/σ (@deliprao):

If this were a science paper, you would expect a country that picks its science workforce at random to serve as the “weak baseline,” and a leading nation like the US to actively experiment toward the state of the art, or at least to beat that baseline.

Not providing a guaranteed path for…
