June 7, 2024 2:57am

"I stumbled upon LLM Kryptonite."

"For a bit over a year I've been studying and working with a range of large language models (LLMs). Most users see LLMs wired into web interfaces, creating chatbots like ChatGPT, Copilot, and Gemini. But many of these models can also be accessed through APIs under a pay-as-you-go usage model. With a bit of Python coding, it's easy enough to create custom apps with these APIs."

"I have a client who asked for my assistance building a tool to automate some of the most boring bits of his work as an intellectual property attorney."

"Some parts involve value judgements such as 'does this seem close to that?' where 'close' doesn't have a strict definition -- more of a vibe than a rule. That's the bit an AI-based classifier should be able to perform 'well enough'."

"I set to work on writing a prompt for that classifier, beginning with something very simple."

"Copilot Pro sits on top of OpenAI's best-in-class model, GPT-4. Typed the prompt in, and hit return."

"The chatbot started out fine -- for the first few words in its response. Then it descended into a babble-like madness."

"No problem with that, I have pretty much all the chatbots -- Gemini, Claude, ChatGPT+, LLamA 3, Meta AI, Mistral, Mixtral."

"I ran through every chatbot I could access and -- with the single exception of Anthropic's Claude 3 Sonnet -- I managed to break every single one of them."

The author doesn't share the prompt itself, though.

I stumbled upon LLM Kryptonite and no one wants to fix it

#solidstatelife #ai #genai #llms #adversarialexamples