TAGinDS(@TAGinDS) 's Twitter Profile Photo

🔓🔍✨ How much can one learn about a language model by only making queries to its API? Tomorrow at 11 AM PST / 2 PM EST (link below), Daniel Paleka (ETH Zurich) Daniel Paleka will discuss the first attack that extracts nontrivial info from production language models using API calls.

🔓🔍✨ How much can one learn about a language model by only making queries to its API? Tomorrow at 11 AM PST / 2 PM EST (link below), Daniel Paleka (ETH Zurich) @dpaleka will discuss the first attack that extracts nontrivial info from production language models using API calls.
account_circle
Ishaan(@ishaan_jaff) 's Twitter Profile Photo

💥 LiteLLM now powers github.com/homanp/nagato by homanp and github.com/safevideo/auto… by SafeVideo AI
🔎 Updated Tutorial on how to use fine tuned gpt-3.5-turbo with litellm h/t to Daniel Paleka: docs.litellm.ai/docs/tutorials…
github.com/homanp/nagato

account_circle
Usman Anwar(@usmananwar391) 's Twitter Profile Photo

LLMs lack adversarial robustness and are vulnerable to jailbreaks and prompt-injections which compromise their security! How can we make progress on making LLMs more robust to such atttacks?

Daniel Paleka gives an overview of this challenge here:
x.com/dpaleka/status…

account_circle
SaTML Conference(@satml_conf) 's Twitter Profile Photo

LLM Capture the Flag! Be the defender (prompt the LLM to hide a secret number) or attacker (query the LLM to reveal the secret number). Some organizers include @AbdelnabiSahar Edoardo Debenedetti Mario Fritz Kai Greshake Thorsten Holz Daphne Ippolito Daniel Paleka Lea Schönherr Florian Tramèr 4/5

LLM Capture the Flag! Be the defender (prompt the LLM to hide a secret number) or attacker (query the LLM to reveal the secret number). Some organizers include @AbdelnabiSahar @edoardo_debe @mariojfritz @KGreshake @thorstenholz @daphneipp @dpaleka @leaschnherr @florian_tramer 4/5
account_circle
Florian Tramèr(@florian_tramer) 's Twitter Profile Photo

Very excited about this work on evaluating ML models when ground truth is unknown (eg when models are superhuman, or simply when humans are bad at the task)
We argue that when accuracy of individual decisions is hard to assess, we should look for 'logic bugs' across decisions

account_circle
Reshmi Ghosh(@reshmigh) 's Twitter Profile Photo

Couldn’t make it to IEEE SaTML Conference in person but incredibly thankful to Javier Rando Edoardo Debenedetti Daniel Paleka for helping us pre-record our 2nd position winning presentation and presenting it at the venue. Big THANK YOU!

Looking forward to the paper presentation!

account_circle
Vamshi Krishna(@VictorKnox99) 's Twitter Profile Photo

Found this paper (lnkd.in/esD98A3Q) by Lukas Fluri , Daniel Paleka, and Florian Tramèr pretty interesting. So, I decided to present it at our lab, led by Manas Gaur

In case you are interested, the recording of my presentation can be found at: youtube.com/watch?v=fQBu35…

account_circle
Cas (Stephen Casper)(@StephenLCasper) 's Twitter Profile Photo

🧵🧵🧵

Sometimes people ask me if I have a favorite paper. It's hard to answer, but lately, I have been saying this one. Below, I'll explain why we should have more work like it.

Authors are @javi_rando Daniel Paleka David Lindner Lennart Heim Florian Tramèr

arxiv.org/abs/2210.04610

account_circle
AI Safety Papers(@safe_paper) 's Twitter Profile Photo

Evaluating Superhuman Models with Consistency Checks
Lukas Fluri (Lukas Fluri), Daniel Paleka (@dpaleka), Florian Tramèr (@florian_tramer)

arxiv.org/abs/2306.09983

Tags: LLM evaluation, scalable oversight

The authors of this paper propose a framework for evaluating models that

account_circle
Daniel Eth (yes, Eth is my actual last name)(@daniel_271828) 's Twitter Profile Photo

“the highest impact grant would be to someone who understands sociology of academia to figure out how not to screw it up in predictable ways”
This does seem like a very impactful project for the right person to pursue

account_circle
Jacques(@JacquesThibs) 's Twitter Profile Photo

If this tweet does well, I’ll do a list for the best places to find AI Safety relevant papers!

But for now, have a look at my friend Daniel Paleka’s newsletter.

account_circle