Twitter search results for "dpaleka" • TwiCopy

Ponnurangam Kumaraguru “PK”

1 week ago

“Stealing Part of a Production Language Model.”,
25th May, led by Sumit Kumar and Shiven Sinha \c A. Feder Cooper Arthur Conmy David Rolnick Daniel Paleka Krishnamurthy (Dj) Dvijotham Jonathan Hayase Thomas Steinke [4/N]

$“Stealing Part of a Production Language Model.”, 25th May, led by @sumitkk01010 and @sinha_shiven \c @afedercooper @ArthurConmy @david_rolnick @dpaleka @DjDvij @JonathanHayase @shortstein [4/N]$

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

account_circle

🔓🔍✨ How much can one learn about a language model by only making queries to its API? Tomorrow at 11 AM PST / 2 PM EST (link below), Daniel Paleka (ETH Zurich) Daniel Paleka will discuss the first attack that extracts nontrivial info from production language models using API calls.

🔓🔍✨ How much can one learn about a language model by only making queries to its API? Tomorrow at 11 AM PST / 2 PM EST (link below), Daniel Paleka (ETH Zurich) @dpaleka will discuss the first attack that extracts nontrivial info from production language models using API calls.

thumb_up_off_alt7

chat_bubble_outline0

repeat2

shareShare

account_circle

Devoxx

@Devoxx

6 months ago

A project to test how sensitive #LLMs play to irrelevant chess factors @ github.com/dpaleka/llm-ch…

thumb_up_off_alt3

chat_bubble_outline0

repeat1

shareShare

account_circle

Ishaan

@ishaan_jaff

6 months ago

💥 LiteLLM now powers github.com/homanp/nagato by homanp and github.com/safevideo/auto… by SafeVideo AI
🔎 Updated Tutorial on how to use fine tuned gpt-3.5-turbo with litellm h/t to Daniel Paleka: docs.litellm.ai/docs/tutorials…
github.com/homanp/nagato

thumb_up_off_alt66

chat_bubble_outline0

repeat8

shareShare

account_circle

Roko

@RokoMijic

4 months ago

Stuff is going on in AI safety:

thumb_up_off_alt6

chat_bubble_outline0

repeat0

shareShare

account_circle

Javier Rando

@javirandor

1 month ago

Cool stuff going on at the SaTML Conference poster session! Come and check posters from Edoardo Debenedetti, Daniel Paleka and Lukas Fluri

Cool stuff going on at the @satml_conf poster session! Come and check posters from @edoardo_debe, @dpaleka and @LukasFluri_

thumb_up_off_alt13

chat_bubble_outline0

repeat1

shareShare

account_circle

Usman Anwar

@usmananwar391

1 month ago

LLMs lack adversarial robustness and are vulnerable to jailbreaks and prompt-injections which compromise their security! How can we make progress on making LLMs more robust to such atttacks?

Daniel Paleka gives an overview of this challenge here:
x.com/dpaleka/status…

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare

account_circle

Sasha Rush

@srush_nlp

3 months ago

Daniel Paleka Matthew Carrigan

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

account_circle

Katja Grace 🔍

@KatjaGrace

1 year ago

norvid_studies Daniel Paleka AI Impacts I'm guessing the 'sorted by overall optimism' also doesn't do what you want?

@norvid_studies @dpaleka @AIImpacts I'm guessing the 'sorted by overall optimism' also doesn't do what you want?

thumb_up_off_alt4

chat_bubble_outline0

repeat0

shareShare

account_circle

SaTML Conference

@satml_conf

8 months ago

LLM Capture the Flag! Be the defender (prompt the LLM to hide a secret number) or attacker (query the LLM to reveal the secret number). Some organizers include @AbdelnabiSahar Edoardo Debenedetti Mario Fritz Kai Greshake Thorsten Holz Daphne Ippolito Daniel Paleka Lea Schönherr Florian Tramèr 4/5

account_circle

Florian Tramèr

@florian_tramer

10 months ago

Very excited about this work on evaluating ML models when ground truth is unknown (eg when models are superhuman, or simply when humans are bad at the task)
We argue that when accuracy of individual decisions is hard to assess, we should look for 'logic bugs' across decisions

thumb_up_off_alt88

chat_bubble_outline0

repeat8

shareShare

account_circle

Reshmi Ghosh

@reshmigh

1 month ago

Couldn’t make it to IEEE SaTML Conference in person but incredibly thankful to Javier Rando Edoardo Debenedetti Daniel Paleka for helping us pre-record our 2nd position winning presentation and presenting it at the venue. Big THANK YOU!

Looking forward to the paper presentation!

thumb_up_off_alt5

chat_bubble_outline0

repeat1

shareShare

account_circle

Consistently Candid Data Generating Process

@datagenproc

2 weeks ago

Daniel Paleka Tanishq Mathew Abraham, Ph.D. Thanks for this clarification!

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

account_circle

Vamshi Krishna

@VictorKnox99

8 months ago

Found this paper (lnkd.in/esD98A3Q) by Lukas Fluri , Daniel Paleka, and Florian Tramèr pretty interesting. So, I decided to present it at our lab, led by Manas Gaur

In case you are interested, the recording of my presentation can be found at: youtube.com/watch?v=fQBu35…

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare

account_circle

Cas (Stephen Casper)

@StephenLCasper

11 months ago

🧵🧵🧵

Sometimes people ask me if I have a favorite paper. It's hard to answer, but lately, I have been saying this one. Below, I'll explain why we should have more work like it.

Authors are @javi_rando Daniel Paleka David Lindner Lennart Heim Florian Tramèr

arxiv.org/abs/2210.04610

thumb_up_off_alt66

chat_bubble_outline0

repeat9

shareShare

account_circle

AI Safety Papers

@safe_paper

10 months ago

Evaluating Superhuman Models with Consistency Checks
Lukas Fluri (Lukas Fluri), Daniel Paleka (@dpaleka), Florian Tramèr (@florian_tramer)

arxiv.org/abs/2306.09983

Tags: LLM evaluation, scalable oversight

The authors of this paper propose a framework for evaluating models that

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

account_circle

Daniel Eth (yes, Eth is my actual last name)

@daniel_271828

1 year ago

“the highest impact grant would be to someone who understands sociology of academia to figure out how not to screw it up in predictable ways”
This does seem like a very impactful project for the right person to pursue

thumb_up_off_alt4

chat_bubble_outline0

repeat0

shareShare

account_circle

Usman Anwar

@usmananwar391

1 month ago

Special thanks to all of my co-authors: Abu Saparov Javier Rando Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh, Erik Jenner, Cas (Stephen Casper), Oliver Sourbut, Ben Edelman, Zhaowei (Jarvis) Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo

thumb_up_off_alt7

chat_bubble_outline0

repeat0

shareShare

account_circle

Jacques

@JacquesThibs

1 month ago

If this tweet does well, I’ll do a list for the best places to find AI Safety relevant papers!

But for now, have a look at my friend Daniel Paleka’s newsletter.

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

account_circle