A company called Haize Labs claims to be able to automatically "red-team" AI systems to preemptively discover and eliminate any failure mode.
"We showcase below one particular application of haizing: jailbreaking the safety guardrails of industry-leading AI companies. Our haizing suite trivially discovers safety violations across several models, modalities, and categories -- everything from eliciting sexist and racist content from image + video generation companies, to manipulating sentiment around political elections"
Play the video in the tweet below to see what they're talking about.
The website doesn't have information about how it works -- it's just for people to request "haizings".
"Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto." -- Haize Labs (@haizelabs), June 12, 2024 (video: pic.twitter.com/cehQOiitst)
#solidstatelife #ai #aiethics #genai #llms