A company called Haize Labs claims it can automatically "red-team" AI systems to preemptively discover and eliminate failure modes.

"We showcase below one particular application of haizing: jailbreaking the safety guardrails of industry-leading AI companies. Our haizing suite trivially discovers safety violations across several models, modalities, and categories -- everything from eliciting sexist and racist content from image + video generation companies, to manipulating sentiment around political elections"

Play the video to see what they're talking about.

The website doesn't explain how the system works -- it only lets people request "haizings".

From their announcement: "Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto."

#solidstatelife #ai #aiethics #genai #llms
