Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording

source: https://arxiv.org/abs/2306.06199

Large language models (LLMs) have become mainstream technology thanks to their versatile use cases and impressive performance. Despite countless out-of-the-box applications, LLMs are still not reliable. Considerable work is under way to improve the factual accuracy, consistency, and ethical standards of these models through fine-tuning, prompting, and Reinforcement Learning from Human Feedback (RLHF). However, no systematic analysis is available of how these models respond to different categories of statements, or of how vulnerable they are to simple changes in prompt wording.

#problem #truth #reality #llm #technology #ai #openAI #chatgpt #science #software