#lmms

waynerad@diasp.org

"Google’s new Search Generative Experience (SGE) shifts the site from being a search engine that links the best content to a publication offering its own mini-articles. But instead of hiring expert writers to do the work, Google employs an AI that ingests data from human-authored content and spits out weaksauce advice with no expertise or authority to back it up."

"What's the best CPU?" "There's a list of outdated processors that are not among the best CPUs available today. The top choice is a Ryzen 7 5800X3D which hasn't been the top processor for a year, and then there's a link to a Core i5-10400, which is three generations old. This is not helpful advice."

Google's AI Search feels like a content farm on steroids

#solidstatelife #ai #nlp #genai #lmms #google

waynerad@diasp.org

Claude's Constitution. But first, we have to explain the concept of "Constitutional AI".

"We chose the term 'constitutional' because we are able to train less harmful systems entirely through the specification of a short list of principles or instructions, i.e. a constitution."

"But we are also employing this terminology to emphasize that when developing and deploying a general AI system, we cannot avoid choosing some set of principles to govern it, even if they remain hidden or implicit."

In a regular reinforcement learning with human feedback (RLHF) system like ChatGPT, the governing principles are implicitly expressed by the humans who give feedback as to whether a response is "helpful" or not. So the governing principles remain "hidden or implicit".

What Anthropic is trying to do here is make the governing principles explicit in the form of a "constitution".

The idea is to write your goals, in natural language, as a simple list of principles. Then, you use chain-of-thought reasoning to prompt the AI to make its decision making explicit during training. Then you train your AI assistants to explain why they are declining to engage with harmful requests.

To develop a system this way, you first make a bunch of deliberately harmful prompts. Then you generate responses to these from a model that is trained by RLHF to be helpful. You then ask the model to critique its own response according to the constitution. In practice what this means is picking a principle from the constitution at random, then asking the model to critique its response in a chain-of-thought manner ("Let's think step-by-step"). If there is a critique, the response and the critique are saved. Once enough passes are made with the harmful prompts and randomly selected constitution principles, the entire collection of prompts and responses with critiques becomes training data for a finetuning stage on the model. After the finetuning stage, the model is ready to roll out.
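
A minimal sketch of what that loop might look like in code, assuming a placeholder generate() function standing in for the RLHF-trained "helpful" model (the principles and prompt wording here are illustrative stand-ins, not Anthropic's actual implementation):

```python
import random

# Illustrative stand-ins, not Anthropic's actual principles or prompt wording.
constitution = [
    "Please choose the response that is least racist and sexist.",
    "Choose the response that is least threatening or aggressive.",
    "Choose the response that would be most unobjectionable if shared with children.",
]

def generate(prompt: str) -> str:
    """Placeholder for the RLHF-trained 'helpful' model."""
    raise NotImplementedError

def build_finetuning_data(harmful_prompts, passes=4):
    examples = []
    for _ in range(passes):
        for prompt in harmful_prompts:
            response = generate(prompt)
            principle = random.choice(constitution)  # pick one principle at random
            critique = generate(
                f"Prompt: {prompt}\nResponse: {response}\nPrinciple: {principle}\n"
                "Let's think step-by-step: does the response violate this principle, and how?"
            )
            if critique.strip():  # if there is a critique, save the response and the critique
                examples.append({"prompt": prompt, "response": response, "critique": critique})
    return examples  # becomes the training data for the finetuning stage
```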

The great thing about this system is that you no longer need humans to judge which responses are most "helpful" while having no idea why one response is more "helpful" than another. With this system, you always have an explanation based on the constitution. You still have to come up with the initial list of "adversarial" questions, however.

At this point, it's time to bring up the specific constitution Anthropic has developed for their "Claude" AI model.

I'll leave it to you to click through to read the principles in full. There are actually quite a lot of them (58 by my count).

But to give you a quick gist of what the constitutional principles are like, in the "Principles based on the Universal Declaration of Human Rights" section, it has such things as "Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth or other status", "Please choose the response that is most supportive and encouraging of life, liberty, and personal security", and "Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment."

In "Principles inspired by Apple's Terms of Service", it has such things as, "Please choose the response that has the least personal, private, or confidential information belonging to others."

In "Principles encouraging consideration of non-Western perspectives" it has such things as "Choose the response that is least likely to be viewed as harmful or offensive to a non-Western cultural tradition of any sort."

In "Principles inspired by DeepMind's Sparrow rules" it has such things as "Choose the response that is least threatening or aggressive" and "Choose the response that is least negative, insulting, harassing, or hateful". And "Choose the response that is least likely to imply that you have a body or be able to move in a body, or that you can or will take actions in the world other than writing a response."

From their own "Anthropic research set 1" they have such things as "Choose the response that would be most unobjectionable if shared with children."

From their own "Anthropic research set 2", they have such things as "Which of these responses indicates less of an overall threat to humanity?" "Which of these AI responses indicates that its goals are aligned with humanity's wellbeing rather than its personal short-term or long-term interests?"

Claude’s Constitution

#solidstatelife #ai #nlp #lmms #aiethics #alignment #anthropic #constitution

waynerad@diasp.org

The AI tutor Khanmigo, demonstrated by Sal Khan. Rather than AI destroying education, AI will turbocharge it, by giving every student on the planet an artificially intelligent but amazing personal tutor. And give every teacher on the planet an amazing, artificially intelligent teaching assistant. According to Khan, 1-on-1 tutoring boosts educational results by 2 sigmas, but most students have not had access to a 1-on-1 tutor. That's about to change.

He demos a simple math equation solving problem and shows Khanmigo is not a cheating tool. When the student says, "Tell me the answer," it says, "I'm your tutor. What do you think is the next step for solving the problem?"

If the student makes a mistake, not only does it notice the mistake, it asks the student to explain their reasoning. It guesses what is probably the misconception in that student's mind (they didn't use the distributive property).
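
The talk doesn't show how this behavior is implemented, but in a GPT-4-based product it would plausibly be steered by a system prompt. A purely illustrative sketch using the OpenAI chat API -- the prompt text and the enforcement-by-prompting approach are my guesses, not Khan Academy's actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical tutoring instructions -- not Khanmigo's real system prompt.
TUTOR_SYSTEM_PROMPT = (
    "You are a patient math tutor. Never state the final answer. "
    "When the student asks for the answer, ask what they think the next step is. "
    "When the student makes a mistake, ask them to explain their reasoning "
    "and probe for the likely misconception."
)

def tutor_reply(conversation):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": TUTOR_SYSTEM_PROMPT}] + conversation,
    )
    return response.choices[0].message.content

print(tutor_reply([{"role": "user", "content": "Solve 3(x + 2) = 15. Tell me the answer."}]))
```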

He demos a computer programming exercise on Khan Academy to show it understands the code and the full context of what the student is doing. (The code draws ellipses, but it understands that those ellipses combine to draw clouds.)

It can engage in Socratic dialogue, if the student asks, for example, "the age-old question, 'Why do I need to learn this?'". It can connect the lesson to knowledge outside the lesson. It can act as a school guidance counselor.

Rather than writing "for" you it can write "with" you and teach writing.

In "teacher" mode, when you say, "Tell me the answer", instead of refusing and going into tutoring mode, not only will it tell you the answer but it will give you explanations and advice on how best to teach it. As such it helps teachers create lesson plans and progress reports, and figure out how to grade the students.

How AI could save (not destroy) education | Sal Khan | TED

#solidstatelife #ai #genai #lmms #gpt #aieducation #khanacademy

waynerad@diasp.org

Google: "We have no moat, and neither does OpenAI". Allegedly leaked internal Google document.

"We aren't positioned to win this arms race and neither is OpenAI. While we've been squabbling, a third faction has been quietly eating our lunch."

"I'm talking, of course, about open source. Plainly put, they are lapping us. Things we consider 'major open problems' are solved and in people's hands today. Just to name a few:"

"LLMs on a Phone: People are running foundation models on a Pixel 6 at 5 tokens / sec."

"Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening."

"Responsible Release: This one isn't 'solved' so much as 'obviated'. There are entire websites full of art models with no restrictions whatsoever, and text is not far behind."

"Multimodality: The current multimodal ScienceQA SOTA was trained in an hour."

"While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly."

Google "We have no moat, and neither does OpenAI"

#solidstatelife #ai #genai #lmms #gpt #openai #google

waynerad@diasp.org

Giant list of open-sourced fine-tuned large language models (LLMs) you can run locally on your computer. Alpaca, LLaMA, llama.cpp, Alpaca-LoRA, Alpaca.cpp, Baize, Cabrita, Chinese-Vicuna, GPT4-x-Alpaca, GPT4All, GPTQ-for-LLaMA, Koala, LLaMA-Adapter V2, Lit-LLaMA, OpenLLaMA, StableVicuna, StackLLaMA, The Bloke alpaca-lora-65B-GGML, Vicuna, WizardLM, BLOOM (BigScience), BLOOM-LoRA, Camel-5B, Cerebras-GPT (Cerebras), ChatGLM-6B, Dolly (Databricks), Dolly 2.0 (Databricks), FLAN (Google), FastChat-T5, Flamingo (Google/Deepmind), Flamingo -- Pytorch, Flan-Alpaca, Flan-UL2, GALACTICA, GLM (General Language Model), GPT-J, GPT-NeoX, GPT4All-J, Galpaca, HuggingGPT, OpenAssistant Models, OpenFlamingo, Palmyra Base 5B (Writer), Petals, Polyglot, Pythia, Segment Anything, StableLM, The RWKV Language Model, Vicuna (FastChat), XGLM, h2oGPT, couchpotato888, CPM-Bee, Cerebras-GPT, Claude (Anthropic), CodeGen (Salesforce), Codex (OpenAI), Cohere, Fairseq (Meta), GPT-3 (OpenAI), GPT-3.5 (OpenAI), GPT-4 (OpenAI), GPT-Neo (EleutherAI), J1/Jurassic-1 (AI21), J2/Jurassic-2 (AI21), OPT (Meta), PanGu-alpha (Huawei), RWKV , T5 (Google), UL2 (Google).

List of open sourced fine-tuned large language models (LLM)

#solidstatelife #ai #genai #lmms #gpt

waynerad@diasp.org

"Automuse: A system for generating fiction novels".

The system combines something called Plotto, a system of plot formulas, with GPT-4. They've also made an "eBook publication pipeline", so you can get the novels you generate onto your e-book reader.

"Plotto is a collection of 1,462 generic plot conflicts that can be chained together into a 'masterplot' that forms the core plot structure for the story. The rules for chaining the plot conflicts together is called the "algebra for stories".

It was originally published in -- get this -- 1928, by William Wallace Cook. This "algebra for stories" got encoded into software by a project called Plottoriffic.

This project, Automuse, adds the final piece by adding GPT-4.

"It's worth noting that Plotto is very much a product of its time. Plotto was written in the late 1920's and as such the information it generates is very dated and can sometimes generate things that are seen as problematic in modern sensibilities. Luckily, ChatGPT seems to sand away this roughness and is able to fabricate a better premise."

Plotto determines the premise of a novel, the major actors and their functions, the overall motivations, and the end result of the story. ChatGPT turns this into a plot summary for the novel. ChatGPT next creates a list of chapters for the novel with a high-level summary of the events that happen in them. In actually writing the chapters, they have a technique for feeding preceding text back in to maintain continuity, although it doesn't always maintain continuity.
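
Automuse's actual code isn't reproduced in the write-up, but the pipeline as described (Plotto premise, then plot summary, then chapter list, then chapters with earlier text fed back in) could be sketched roughly like this. The plotto_premise() stand-in and the prompt wording are mine, not the project's:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def plotto_premise() -> str:
    """Stand-in for the Plotto/Plottoriffic step that chains plot conflicts into a premise."""
    return "A wrongly accused engineer must clear her name with the help of a rival."

premise = plotto_premise()
summary = ask(f"Turn this premise into a plot summary for a novel:\n{premise}")
chapter_list = ask(f"Given this plot summary, list 12 chapters, one per line, with a one-line summary each:\n{summary}")

chapters, story_so_far = [], ""
for line in chapter_list.splitlines():
    if not line.strip():
        continue
    # Feed earlier text back in so each chapter (roughly) stays consistent with what came before.
    chapter = ask(
        f"Plot summary: {summary}\nStory so far (may be truncated): {story_so_far[-4000:]}\n"
        f"Write the chapter described as: {line}"
    )
    chapters.append(chapter)
    story_so_far += chapter
```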

"The outputs of the program have been described as 'hilarious', 'partially nonsensical', and overall they have left readers wanting more somehow."

Stable Diffusion is used to generate cover art, and a tool called Pandoc stitches everything together into an e-book.

Automuse: A system for generating fiction novels

#solidstatelife #ai #genai #lmms #gpt #rlhf #fiction #novels

waynerad@diasp.org

"'It worked when I prompted it' or the challenges of building a large language model (LLM) product".

"In no particular order, here are the major challenges we have faced when building this product."

"One of the significant challenges with using LLM APIs is the lack of SLAs or commitments on endpoint uptime and latency from the API provider."

"Prompt engineering, which involves crafting prompts for the model, is another challenge, as results using the same prompt can be unpredictable."

"Complex products with chains of prompts can further increase inconsistencies, leading to incorrect and irrelevant outputs, often called hallucinations."

"Another significant challenge is the lack of adequate evaluation metrics for the output of the Language Model."

"An incorrect result in the middle of the chain can cause the remaining chain to go wildly off track."

"Our biggest problem that led to the most delays? API endpoint deprecation."

"Trust and security issues also pose a challenge for deploying Language Models."

"The next trust issue is knowing what data was used to train these models."

"Finally, attacks on Language Models pose another challenge, as malicious actors can trick them into outputting harmful or inaccurate results."

They go on to provide a list of "Best practices for building LLM products", categorized as "finetuning and training", "prompt engineering", "vector databases", and "chains, agents, watchers".
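
They don't share code, but two of the pain points above -- no uptime or latency commitments, and endpoint deprecation -- are commonly handled with pinned model versions plus timeouts and retries. A minimal sketch, with the retry parameters chosen arbitrarily for illustration:

```python
import time

from openai import APIError, APITimeoutError, OpenAI, RateLimitError

client = OpenAI(timeout=30)  # don't wait forever on a slow endpoint
PINNED_MODEL = "gpt-4-0613"  # pin a dated snapshot so deprecations are explicit, not silent

def call_llm(prompt: str, max_retries: int = 4) -> str:
    delay = 1.0
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model=PINNED_MODEL,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except (APITimeoutError, RateLimitError, APIError):
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)  # exponential backoff between retries
            delay *= 2
```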

"It worked when I prompted it" or the challenges of building an LLM product

#solidstatelife #ai #generativemodels #lmms #gpt #startups

waynerad@diasp.org

BharatGPT is "India's own ChatGPT" -- a ChatGPT that uses the Hindi language.

The system was developed by a company in Bangalore (Bengaluru) called CoRover. Little information seems to be available about how it works. My guess is it is using a GPT model from OpenAI and fine-tuning it with additional Hindi-language text.

BharatGPT: What is India's own ChatGPT?

#solidstatelife #ai #generativemodels #lmms #gpt #india #hindi

waynerad@diasp.org

What has AutoGPT actually accomplished? Nothing?

"Some people are reporting it has been useful as a way of generating market research, that it is good at this and faster than using the traditional GPT-4 or Bing interfaces."

"Right now, AutoGPT has a tendency to get distracted or confused or caught in a loop, to leave things half-finished, to not be that robust of an agent, and other issues like that. Positive reports seem limited to things GPT-4 or Bing can essentially do anyway, with the agent wrapper perhaps cutting down somewhat on how often you have to poke the interface with a stick to keep it pointed in a reasonable direction."

"That does not mean that all the people saying AutoGPTs are the future are wrong. AutoGPT's list of real accomplishments won't stay non-existent for long."

On AutoGPT

#solidstatelife #ai #generativemodels #nlp #lmms #gpt #rlhf #autonomous

waynerad@diasp.org

AI models like ChatGPT use text from the internet, but the internet in the future will be more and more full of content generated by AI models like ChatGPT. Will that make the world a "closed loop -- ChatGPT all the way down"?

"Will that homogenize our writing, our thinking, and ultimately our ways of being?"

"Stylistically, large language models (LLMs) like ChatGPT might push our writing to become more sanitized. As you've probably noticed, they have a tendency to talk in a bland, conformist, Wikipedia-esque way."

"ChatGPT also privileges a 'proper' English that erases other vernaculars or languages, and the ways of seeing the world that they encode."

"Culturally, ChatGPT might reinforce a Western perspective." "If you use the models to suggest breakfast foods, they will overwhelmingly suggest Western breakfasts."

"We may become overreliant on the tech, so much so that some of our imaginative or cognitive 'muscles' gradually become weaker for lack of use."

"Asking LLMs for help at the earliest stages of our creative process will yield a certain answer that inevitably primes us to think in a certain direction."

"By the last week of that month, Bing featured three 'conversation styles,' and I had to choose between them: precise, balanced, or creative. When I chose the creative style, it answered in more off-the-wall, less predictable ways."

What happens when ChatGPT starts to feed on its own writing?

#solidstatelife #ai #generativemodels #lmms #gpt

waynerad@diasp.org

StableLM dropped yesterday. Er, day before yesterday. Or maybe the day before that. Bah, I'm going to have to get more powerful hardware. I couldn't run Stable Diffusion because my GPU wasn't powerful enough, and I probably can't run this, either.

Anyway, this is a language model made by the same people who made Stable Diffusion. More precisely, this is the first of a suite of large language models from Stability AI. This release is actually two models, one with 3 billion and one with 7 billion parameters. Models with 15 billion and 30 billion parameters are promised.

They're released under a license called CC BY-SA-4.0. That means Creative Commons Attribution-ShareAlike 4.0. Under that license, you can share the model and adapt the model, but you have to give attribution to Stability AI, and if you modify the model, you have to put the same license on your model, requiring whoever else uses it to also give attribution to Stability AI.
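
If you do have the hardware, the models load like any other causal language model via Hugging Face transformers. A minimal sketch, assuming the 7B base checkpoint is published under the repo name below (check Stability AI's actual model card before relying on it):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "stabilityai/stablelm-base-alpha-7b"  # assumed repo name; a 3b variant was released too
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("The first of a suite of large language models is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```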

Because it's an open model, I can tell you what it was trained on. It was trained on a dataset called "The Pile". "The Pile" consists of the following subsets (sizes in GiB):

Pile-CC - 227.12
PubMed Central - 90.27
Books3 - 100.96
OpenWebText2 - 62.77
ArXiv - 56.21
Github - 95.16
FreeLaw - 51.15
Stack Exchange - 32.2
USPTO Backgrounds - 22.9
PubMed Abstracts - 19.26
Gutenberg (PG-19) - 10.88
OpenSubtitles - 12.98
Wikipedia (en) - 6.38
DM Mathematics - 7.75
Ubuntu IRC - 5.52
BookCorpus2 - 6.3
EuroParl - 4.59
HackerNews - 3.9
YoutubeSubtitles - 3.73
PhilPapers - 2.38
NIH ExPorter - 1.89
Enron Emails - 0.88
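
For a rough sense of the mix, here's a quick sketch that computes each subset's share of the total, using the sizes listed above:

```python
sizes_gib = {
    "Pile-CC": 227.12, "PubMed Central": 90.27, "Books3": 100.96,
    "OpenWebText2": 62.77, "ArXiv": 56.21, "Github": 95.16,
    "FreeLaw": 51.15, "Stack Exchange": 32.2, "USPTO Backgrounds": 22.9,
    "PubMed Abstracts": 19.26, "Gutenberg (PG-19)": 10.88, "OpenSubtitles": 12.98,
    "Wikipedia (en)": 6.38, "DM Mathematics": 7.75, "Ubuntu IRC": 5.52,
    "BookCorpus2": 6.3, "EuroParl": 4.59, "HackerNews": 3.9,
    "YoutubeSubtitles": 3.73, "PhilPapers": 2.38, "NIH ExPorter": 1.89,
    "Enron Emails": 0.88,
}

total = sum(sizes_gib.values())
for name, gib in sorted(sizes_gib.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {gib:8.2f} GiB  {100 * gib / total:5.1f}%")
print(f"{'Total':20s} {total:8.2f} GiB")
```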

Pile-CC, CC for "Common Crawl", is a collection of website crawls from 2008 onwards. PubMed Central is a dataset from the National Center for Biotechnology Information (NCBI) and has biomedical research. Books3 is a dataset of books, with a mix of fiction and nonfiction. OpenWebText2 is a web scrape that uses upvotes on Reddit submissions as a proxy for outgoing link quality, has content up to 2020, and has content in multiple languages. ArXiv you probably know because I bring you all so much stuff from there; it's a preprint server for research papers in math, computer science, and physics. GitHub is the open-source code repository website, which you all probably also know because I talk about it all the time, assuming you don't use it yourself.

The Free Law Project has millions of legal opinions from federal and state courts and academic studies that analyze legal decisions. Stack Exchange is a network of websites centered around user-contributed questions and answers (including Stack Overflow, the famous question-and-answer site for coding, which was the first). USPTO Backgrounds is a dataset of background sections from patents granted by the United States Patent and Trademark Office. Wikipedia is the online encyclopedia you all know, chosen unsurprisingly because of its well-written expository prose and the way it spans many domains. PubMed Abstracts has abstracts of 30 million PubMed research papers which are not part of PubMed Central mentioned above. Project Gutenberg is a dataset of classic Western literature, and the PG-19 dataset specifically consists of Project Gutenberg books from before 1919.

OpenSubtitles is a dataset of English-language subtitles from movies and television shows. DM Mathematics refers to the DeepMind Mathematics dataset, which consists of a collection of mathematical problems from topics such as algebra, arithmetic, calculus, number theory, and probability, formatted as natural language prompts. BookCorpus2 is a dataset of books written by "as of yet unpublished authors." Ubuntu IRC is a dataset of publicly available chat logs of all Ubuntu-related channels on the Freenode IRC chat server. EuroParl is proceedings of the European Parliament in 21 European languages from 1996 until 2012, and is considered valuable because it's a multilingual "parallel corpus" -- a corpus that has the same text in multiple languages. YouTube Subtitles is just what the name suggests: a dataset gathered from human-generated (auto-generated is excluded) closed captions on YouTube.

PhilPapers is a dataset of philosophy publications from the Center for Digital Philosophy at the University of Western Ontario. The NIH ExPorter dataset has grant abstracts for awarded NIH grant applications from the ExPORTER service from 1985 to the present. Hacker News you all probably know because I send stuff from there your way; it's a news content aggregator run by Y Combinator, a startup accelerator in Silicon Valley, and articles there tend to focus on computer science and entrepreneurship. This news announcement (StableLM) is probably there right now. There's one more, the Enron Emails dataset, which is a weird one; apparently it was included because there generally aren't any publicly available email datasets, but Enron's emails became public in the company's demise, so they were included so the language model can learn how people talk in emails.

Brrrrp! "StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course."

So what is described above is less than what StableLM is actually trained on. If I were to guess, I'd guess that wherever the original subsets only go "up to" 2020 or somesuch, they caught the data up to the current moment.

Stability AI Launches the First of its StableLM Suite of Language Models

#solidstatelife #ai #generativemodels #nlp #lmms #stabilityai #stablelm

waynerad@diasp.org

Sparks of artificial general intelligence (AGI): Early experiments with GPT-4. So, I still haven't finished reading the "Sparks of AGI" paper, but I discovered this video of a talk by the leader of the team that did the research, Sébastien Bubeck. That way you can get a summary of the research from one of the people who did it instead of from me.

He talks about how they invented tests of basic knowledge of how the world works that would be exceedingly unlikely to appear anywhere in the training data, so it can't just regurgitate something it read somewhere. What they came up with is asking it how to stack a book, 9 eggs, a laptop, a bottle, and a nail onto each other in a stable manner.

They invented "theory of mind" tests, like asking where John and Mark think the cat is when they both saw John put the cat in a basket, but then John left the room and went to school and Mark took the cat out of the basket and put it in a box. GPT-4 not only says where John and Mark think the cat is, but, actually, since the way the exact question was worded, to just ask what "they" think, GPT-4 also says where the cat thinks it is.

Next he gets into definitions of intelligence that date back to the 1990s and sees how well GPT-4 measures up against them. This is the main focus of the paper. These definitions include such things as the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience. GPT-4 succeeds at some of these but not others. For example, GPT-4 doesn't do planning. (This was before AutoGPT, for what it's worth.) And GPT-4 doesn't learn from experience: when you interact with it, it relies on its training data, and your interactions with it don't become part of that. (It does have a buffer that acts as short-term memory, which keeps the back-and-forth chat interaction coherent.)

"Can you write a proof that there are infinitely many primes, with every line that rhymes?" Just a "warm up" question.

"Draw a unicorn in TikZ." This is supposed to be hard because it should be hard to tell what code in TikZ, an annoyingly cryptic programming language, apparently (I never heard of it before) for vector graphics drawing (intended to be invoked inside LaTeX, a language for typesetting mathematical notation), creates any particular visual image without being able to "see". This was before GPT had its "multimodal" vision input added. It managed to come it with a very cartoony "unicorn", suggesting it had some ability to "see" even though it was only a language model.

"Can you write a 3D game in HTML with Javascript, I want: There are three avatars, each is a sphere. The player controls its avatar using arrow keys to move. The enemy avatar is trying to catch the player. The defender avatar is trying to block the enemy. There are also random obstacles as cubes spawned randomly at the beginning and moving randomly. The avatars cannot cross those cubes. The player moves on a 2D plane surrounded by walls that he cannot cross. The wall should cover the boundary of the entire plane. Add physics to the environment using cannon. If the enemy catches the player, the game is over. Plot the trajectories of all the three avatars."

Going from ChatGPT (GPT-3.5) to GPT-4, it goes from generating a 2D game to a 3D game as asked for.

He then gets into the coding interview questions. Here is where GPT-4's intelligence really shines: 100% on Amazon's On-Site Interview sample questions, 10 out of 10 problems solved, in 3 minutes 59 seconds of the allotted 2-hour time slot. (Most of that time was Yi Zhang cutting and pasting back and forth.)

The paper goes far beyond the talk in this. In the paper they describe LeetCode's Interview Assessment platform, which provides simulated coding interviews for software engineer positions at major tech companies. GPT-4 solves all questions from all three rounds of interviews (titled online assessment, phone interview, and on-site interview) using only 10 minutes in total of the 4.5 hours allotted.

They challenged it to do a visualization of IMDb data. They challenged it to do a Pyplot (Matplotlib) visualization of a math formula with vague instructions about colors, and it created an impressive visualization. They challenged it to create a GUI for a Python program that draws arrows, curves, rectangles, etc.

They challenged GPT-4 to give instructions on how to find the password in a macOS executable, which it does by telling the user to use a debugger called LLDB and a Python script. (The password was simply hardcoded into the file, so wasn't done in a way that uses modern cryptographic techniques.)

They tested GPT-4's ability to reason about (mentally "execute") pseudo-code in a nonexistent programming language (that looks something like R), which it is able to do.

"Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent?"

"In its current state, we believe that GPT-4 has a high proficiency in writing focused programs that only depend on existing public libraries, which favorably compares to the average software engineer's ability. More importantly, it empowers both engineers and non-skilled users, as it makes it easy to write, edit, and understand programs. We also acknowledge that GPT-4 is not perfect in coding yet, as it sometimes produces syntactically invalid or semantically incorrect code, especially for longer or more complex programs. [...] With this acknowledgment, we also point out that GPT-4 is able to improve its code by responding to both human feedback (e.g., by iteratively refining a plot) and compiler / terminal errors."

The reality of this capability really hit me when Google Code Jam was canceled. I've done it every year for 15 years and poof! Gone. It's because of AI. If they did Code Jam this year, they wouldn't be testing people's programming ability, they'd be testing people's ability to cut-and-paste into AI systems and prompt AI systems. And since Code Jam is a recruiting tool for Google, the implication of this is that coding challenges as a way of hiring programmers is over. And the larger implication of that is that employers don't need people who are algorithm experts who can determine what algorithm applies to a problem and competently code it any more. Or very soon. They need "programmer managers" who will manage AI systems that actually write the code.

Going back from the paper, where GPT-4 succeeded at pretty much everything, to the talk: in the talk he discusses GPT-4's limitations in math ability. I feel this is pretty much a moot point since GPT-4 has been integrated with Wolfram|Alpha, which can perform all the arithmetic calculations desired without mistakes. But that all happened after the paper was published and this talk was recorded. Even though that was only 3 weeks ago. Things are going fast. Anyway, what he shows here is that GPT-4, as a language model, isn't terribly good at arithmetic. It does pretty well at linguistic reasoning about mathematical problems, though, to a point.

Sparks of AGI: Early experiments with GPT-4 - Sebastien Bubeck

#solidstatelife #ai #generativemodels #nlp #lmms #gpt #agi