Giskard is a tool I just learned exists for "automatic vulnerability detection for LLMs."
"With Giskard, data scientists can scan their model (tabular, NLP and LLMs) to find dozens of hidden vulnerabilities, instantaneously generate domain-specific tests, and leverage the Quality Assurance best practices of the open-source community."
"According to the Open Worldwide Application Security Project, some of the most critical vulnerabilities that affect LLMs are Prompt Injection (when LLMs are manipulated to behave as the attacker wishes), Sensitive Information Disclosure (when LLMs inadvertently leak confidential information), and Hallucination (when LLMs generate inaccurate or inappropriate content)."
"Giskard's scan feature ensures the automatic identification of such vulnerabilities, and many others. The library generates a comprehensive report which quantifies these into interpretable metrics."
"Issues detected include: Hallucinations, harmful content generation, prompt injection, robustness issues, sensitive information disclosure, stereotypes & discrimination, many more..."
I wonder how it does all that? Very intriguing. I had a glance at the source code repository, but it looks like I'd have to really dig in in order to figure out how this system works.
I found it exists from reading this article about how to use it with MLflow, "an open-source platform for managing end-to-end machine learning (ML) workflows."