“Stealing Part of a Production Language Model.”,
25th May, led by Sumit Kumar and Shiven Sinha \c A. Feder Cooper Arthur Conmy David Rolnick Daniel Paleka Krishnamurthy (Dj) Dvijotham Jonathan Hayase Thomas Steinke [4/N]
🔓🔍✨ How much can one learn about a language model by only making queries to its API? Tomorrow at 11 AM PST / 2 PM EST (link below), Daniel Paleka (ETH Zurich) Daniel Paleka will discuss the first attack that extracts nontrivial info from production language models using API calls.
💥 LiteLLM now powers github.com/homanp/nagato by homanp and github.com/safevideo/auto… by SafeVideo AI
🔎 Updated Tutorial on how to use fine tuned gpt-3.5-turbo with litellm h/t to Daniel Paleka: docs.litellm.ai/docs/tutorials…
github.com/homanp/nagato
Cool stuff going on at the SaTML Conference poster session! Come and check posters from Edoardo Debenedetti, Daniel Paleka and Lukas Fluri
LLMs lack adversarial robustness and are vulnerable to jailbreaks and prompt-injections which compromise their security! How can we make progress on making LLMs more robust to such atttacks?
Daniel Paleka gives an overview of this challenge here:
x.com/dpaleka/status…
norvid_studies Daniel Paleka AI Impacts I'm guessing the 'sorted by overall optimism' also doesn't do what you want?
LLM Capture the Flag! Be the defender (prompt the LLM to hide a secret number) or attacker (query the LLM to reveal the secret number). Some organizers include @AbdelnabiSahar Edoardo Debenedetti Mario Fritz Kai Greshake Thorsten Holz Daphne Ippolito Daniel Paleka Lea Schönherr Florian Tramèr 4/5
Couldn’t make it to IEEE SaTML Conference in person but incredibly thankful to Javier Rando Edoardo Debenedetti Daniel Paleka for helping us pre-record our 2nd position winning presentation and presenting it at the venue. Big THANK YOU!
Looking forward to the paper presentation!
Found this paper (lnkd.in/esD98A3Q) by Lukas Fluri , Daniel Paleka, and Florian Tramèr pretty interesting. So, I decided to present it at our lab, led by Manas Gaur
In case you are interested, the recording of my presentation can be found at: youtube.com/watch?v=fQBu35…
🧵🧵🧵
Sometimes people ask me if I have a favorite paper. It's hard to answer, but lately, I have been saying this one. Below, I'll explain why we should have more work like it.
Authors are @javi_rando Daniel Paleka David Lindner Lennart Heim Florian Tramèr
arxiv.org/abs/2210.04610
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri (Lukas Fluri), Daniel Paleka (@dpaleka), Florian Tramèr (@florian_tramer)
arxiv.org/abs/2306.09983
Tags: LLM evaluation, scalable oversight
The authors of this paper propose a framework for evaluating models that
Special thanks to all of my co-authors: Abu Saparov Javier Rando Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh, Erik Jenner, Cas (Stephen Casper), Oliver Sourbut, Ben Edelman, Zhaowei (Jarvis) Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo
If this tweet does well, I’ll do a list for the best places to find AI Safety relevant papers!
But for now, have a look at my friend Daniel Paleka’s newsletter.