September 26, 2024 2:43am

DataGemma is a new family of "Gemma" models from Google that are integrated with Data Commons, an integration that is supposed to "reduce hallucination".

"Data Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables. It sources this public information from trusted organizations like the United Nations (UN), the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC) and Census Bureaus. Combining these datasets into one unified set of tools and AI models empowers policymakers, researchers and organizations seeking accurate insights."

I clicked over to the Data Commons website and clicked "Demographics", then "What languages are spoken at home in California". It gave me a page, "California: Non-English Languages", which said: Spanish 10.5M, Chinese 1.12M, Tagalog 788K, Vietnamese 559K, Korean 360K, Arabic 205K, Hindi 203K, ... Hmm, I would not have expected a language from the Philippines (Tagalog) to exceed a language from India (Hindi).

Anyway, in case you're wondering how the integration works, there are two methods: RIG and RAG.

"RIG (Retrieval-Interleaved Generation) enhances the capabilities of our language model, Gemma 2, by proactively querying trusted sources and fact-checking against information in Data Commons. When DataGemma is prompted to generate a response, the model is programmed to identify instances of statistical data and retrieve the answer from Data Commons."
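Here's a toy sketch of the RIG idea as that quote describes it, not Google's actual implementation. It assumes the model's draft marks each statistic with a lookup query plus its own guess, and a post-processing step swaps in the retrieved value. The marker format, the `interleave` function and the `FAKE_DATA_COMMONS` table are all hypothetical stand-ins (no real Data Commons API is called); the figures are the California ones from the page I looked at.

```python
import re

# Hypothetical stand-in for Data Commons: a tiny lookup table.
# The figures are the California numbers quoted earlier in this post.
FAKE_DATA_COMMONS = {
    "tagalog speakers at home in california": "788K",
    "hindi speakers at home in california": "203K",
}

# Hypothetical marker format emitted by the model: [[query || model's own guess]]
MARKER = re.compile(r"\[\[(.+?)\s*\|\|\s*(.+?)\]\]")


def interleave(draft: str) -> str:
    """Replace each marked statistic with the retrieved value when one exists."""

    def substitute(match: re.Match) -> str:
        query, guess = match.group(1), match.group(2)
        retrieved = FAKE_DATA_COMMONS.get(query.strip().lower())
        if retrieved is None:
            # Nothing found: keep the model's guess but flag it.
            return f"{guess} (unverified)"
        return f"{retrieved} (retrieved)"

    return MARKER.sub(substitute, draft)


draft = (
    "About [[Tagalog speakers at home in California || 500K]] "
    "Californians speak Tagalog at home."
)
print(interleave(draft))
```

The point of the design is that the model's guess (500K here, made up for the demo) never reaches the reader when a retrieved number is available.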

"RAG (Retrieval-Augmented Generation) enables language models to incorporate relevant information beyond their training data, absorb more context, and enable more comprehensive and informative outputs. With DataGemma, this was made possible by leveraging Gemini 1.5 Pro's long context window. DataGemma retrieves relevant contextual information from Data Commons before the model initiates response generation, thereby minimizing the risk of hallucinations and enhancing the accuracy of responses."
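And a similarly hedged toy sketch of the RAG flow: retrieve first, then hand the retrieved facts to the model as context before it writes anything. The keyword-overlap scoring below is a deliberately crude stand-in for whatever retrieval DataGemma really uses, and `FACTS`, `retrieve` and `build_prompt` are hypothetical names, not part of any Google or Data Commons API. The final call to a language model is left out; the sketch stops at the prompt.

```python
import re

# Hypothetical stand-in for Data Commons, using figures quoted earlier in this post.
FACTS = {
    "Spanish speakers at home in California": "10.5M",
    "Tagalog speakers at home in California": "788K",
    "Hindi speakers at home in California": "203K",
}


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, facts: dict[str, str], limit: int = 2) -> list[tuple[str, str]]:
    """Rank facts by word overlap with the question (crude stand-in for real retrieval)."""
    q = tokens(question)
    scored = [(len(q & tokens(name)), name) for name in facts]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps table order on ties
    return [(name, facts[name]) for _, name in scored[:limit]]


def build_prompt(question: str, facts: dict[str, str]) -> str:
    """Put the retrieved facts in front of the question, before any generation happens."""
    context = "\n".join(f"- {name}: {value}" for name, value in retrieve(question, facts))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How many people speak Hindi at home in California?", FACTS))
```

The contrast with RIG is the ordering: here the lookup happens once, up front, and the model only ever sees the question together with the retrieved numbers.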

DataGemma: Using real-world data to address AI hallucinations

#solidstatelife #ai #genai #llms #gemini #gemma

Introducing DataGemma, the first open models designed to connect LLMs with extensive real-world data drawn from Google's Data Commons.
