Dnext

#nlp

March 27, 2024 2:52am

"Will AI save physicians and clinicians time and from burnout?"

"Copilots for clinicians are also becoming more common. Ambient clinical documentation is a booming business. The technology allows doctors to record conversations with patients to automatically turn them into clinical notes and summaries using AI and is a major topic at Healthcare conferences like HIMSS conference this year, where more than 30,000 health and tech professionals gathered in Orlando, Florida."

"Earlier in March, Salesforce announced Einstein Copilot: Health Actions will allow doctors to book appointments, summarize patient information and send referrals by prompting AI with conversational language."

"Administrative workloads are a major problem for clinicians across the US health-care system. A survey published by Athenahealth in February (via CNBC) found that more than 90% of physicians report feeling burned out on a regular basis, largely because of the paperwork they are expected to complete."

"I used to be part of an admissions committee for a medical school. When I interviewed idealistic young people applying to medical school, 'typing' and 'filling out forms' was never once mentioned as a reason for becoming a physician."

She goes on to describe using AI for the prior authorization letters that have to be written to insurance companies. Each letter has to justify the use of a drug or therapy for a specific patient, with details about that patient and why they need that therapy. Insurers frequently reject these letters, so they have to be rewritten over and over before they are eventually approved. "A third of medical offices employ full-time staff to take care of the average 30 prior authorizations per physician per week."

On the flip side, "the insurers have started to use AI to deny claims more quickly."

Another use is referral letters from one physician to another. "Like prior authorization letters, these are pretty formulaic."

But what she is most enthusiastic about is what she calls "ambient scribes": AI systems that listen in on the conversation between the patient and the physician and create a templated note for the medical record. "This technology allows physicians to avoid looking at a screen and typing while they're trying to connect with a patient."

"I've tried versions from multiple AI scribe companies (including TORTUS AI, which - full disclosure - I consult for) and they do an amazing job of filtering out irrelevant information and putting the information in the right spot."

"Think of the technological challenge inherent in this process: patient visits are often interrupted by clinic staff or phone calls, meander off into conversations about kids and dogs, and use abbreviations and technical jargon. They're often circular, meaning a patient will mention a symptom and the physician won't ask a follow up question about it until several minutes later. These tools produce a full transcript that uses generative AI to find the important information and put it into a form that's indistinguishable from what a physician would actually type. Many of my friends have reported that ambient scribes actually do a better job of including important details than they would have included themselves."

Will AI save physicians and clinicians time and from burnout?

#solidstatelife #ai #voicetotext #nlp #genai #llms #medicalai


🩺 Ambient Clinical Documentation is rapidly emerging as a key driver of AI growth in the healthcare sector.

ohdeifepha

February 10, 2024 2:30pm

The ecological cost of AI: is it really reasonable? | #intelligenceartificielle #ia #numérique #données #infrastructure #algorithme #écologie #environnement #datacenters #recyclage #consommation #terresrares #eau #énergie #électricité #ressource #usage #generativeai #entrainement #llms #nlp #texte #image #développementdurable #optimisation #sobriété #frugalité #équité #éthique #bioéthique #progrès #bestof #febs2024

Torsten Torsten

February 10, 2024 12:40pm

The New Right, current political upheavals, and political writing

AfD - NLP anyone?

I just read that many FPÖ politicians are said to be skilled at NLP (neuro-linguistic programming). Norbert Hofer in particular, the FPÖ candidate in the last Austrian presidential election, is said to use this technique to gain control of the discourse in interviews. The former head of the AfD in North Rhine-Westphalia (NRW) is trained in NLP. I know of two other AfD politicians who have training in this area.

I find it kind of funny how little we know about this and how little we talk about it.

#NLP #AfD #Politik #Psychologie #FPÖ

ohdeifepha

February 5, 2024 7:30pm

Closing the accessibility gap to mental health treatment with a personalized self-referral chatbot | #mentalhealth #nlp #llms #generativeai #chatbot #limbic

Closing the accessibility gap to mental health treatment with a personalized self-referral chatbot - Nature Medicine

Implementation of a self-referral chatbot for mental healthcare services increased access substantially in a large multicenter study, with a much larger effect on minority individuals.

Wayne Radinsky

August 2, 2023 2:37am

salience (no capitalization) is an "extractive summarization" tool.

"Extractive summarization should be preferred over abstractive summarization when nuance is essential and when the summary is meant as a companion to the source text. LLMs are effective at abstractive summarization, but they can also be leveraged in extractive summarization. Rather than solve this with a prompt, LLM embeddings can be combined with the TextRank algorithm to reliably yield high-quality extractive summaries."

"In other words, embeddings can be used to automatically generate highlights. The internal representation is an affinity matrix between sentences that can be used to find the most salient sentences in a text."
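A minimal sketch of the idea, assuming a toy bag-of-words vector stands in for the LLM sentence embeddings that salience uses; the function names are hypothetical and this is not salience's actual code. Each sentence is a node, cosine similarity between sentence vectors forms the affinity matrix, and PageRank-style power iteration (the core of TextRank) turns that matrix into a salience score per sentence.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an LLM embedding: lowercase word counts (an assumption)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences: list[str], damping: float = 0.85, iters: int = 50) -> list[float]:
    """Score sentences by power iteration over a cosine-similarity affinity matrix."""
    vecs = [embed(s) for s in sentences]
    n = len(sentences)
    # Affinity matrix with a zeroed diagonal so a sentence cannot vote for itself.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence j passes its score to neighbours in proportion to similarity.
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j])
            for i in range(n)
        ]
    return scores


def highlight(sentences: list[str], k: int = 2) -> list[str]:
    """Return the k most salient sentences, kept in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "LLM embeddings can rank sentences.",
    "TextRank ranks sentences using embeddings.",
    "My cat likes fish.",
]
print(highlight(sentences, k=2))
```

The off-topic sentence shares no words with the others, so it receives only the baseline score and is dropped from the highlights. Swapping `embed` for real embedding vectors would leave the ranking logic unchanged.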

mattneary / salience

#solidstatelife #ai #nlp #summarization

salience/README.md at master · mattneary/salience

automatic sentence highlights based on their significance to the document - mattneary/salience

Wayne Radinsky

July 31, 2023 2:45am

GPT-4 scored in the top 1% (relative to humans) on a creativity test.

"Dr. Erik Guzik, an assistant clinical professor in UM's College of Business, and his partners used the Torrance Tests of Creative Thinking, a well-known tool used for decades to assess human creativity."

"The researchers submitted eight responses generated by ChatGPT, the application powered by the GPT-4 artificial intelligence engine. They also submitted answers from a control group of 24 UM students taking Guzik's entrepreneurship and personal finance classes. These scores were compared with 2,700 college students nationally who took the TTCT in 2016. All submissions were scored by Scholastic Testing Service, which didn't know AI was involved."

"The results placed ChatGPT in elite company for creativity. The AI application was in the top percentile for fluency -- the ability to generate a large volume of ideas -- and for originality -- the ability to come up with new ideas. The AI slipped a bit -- to the 97th percentile -- for flexibility, the ability to generate different types and categories of ideas."

The Torrance Tests of Creative Thinking is basically a test of "divergent" thinking. Normally when you take a test, it's a "convergent" test, meaning there's a specific, correct answer that students are expected to "converge" on. If the question is, what's 2 + 2, everyone is supposed to converge on 4. With a "divergent thinking" test, there's no "correct" answer, and the more "divergent" the answer(s) given, the better.

In the case of the TTCT, there's a series of tasks, classified as "verbal tasks using verbal stimuli", "verbal tasks using non-verbal stimuli", and "non-verbal tasks". In the "verbal tasks using verbal stimuli" category are such tasks as "unusual uses" (name all the uses you can think of for tin cans and books), "impossibilities" (list as many impossible things as you can), "consequences" (list out consequences to improbable situations), "just suppose" (list out consequences after a new or unknown variable is injected into a situation), "situations" (given problems, think of as many solutions as possible), "common problems" (given situations, think of as many problems as possible that could arise in those situations), "improvement" (given common objects, list as many ways as you can to improve each object), "the Mother Hubbard problem" (Mother Hubbard has 12 children and each child needs ...), "imaginative stories" (write the most interesting and exciting story you can think of at this exact moment), and "cow jumping" (think of all possible things which might have happened when the cow jumped over the moon).

In the "verbal tasks using nonverbal stimuli" category, we have such tasks as "ask and guess" (ask as many questions as you can about a picture which cannot be answered by looking at the picture), "product improvement" (given a toy, think of as many improvements as you can which would make it more fun), and "unusual uses" (think of the most unusual uses of a toy, other than as a toy).

In the "non-verbal tasks" category we have such tasks as "incomplete figures" (add lines to a figure), "picture construction" (given a simple shape, construct a picture of which that shape is an integral part), "circles and squares" (given a page full of circles, make objects that have circles as a major part of them, then given a page full of squares, do the same thing), and "creative design" (given circles, strips, scissors, and glue, construct creative designs -- somehow I doubt GPT-4 was given this one).

The submissions are scored for fluency (total number of responses with responses deemed uninterpretable, meaningless, or irrelevant thrown out), flexibility (the number of different categories of relevant responses), originality (the statistical rarity of the responses), and elaboration (the amount of detail in the responses).
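The real TTCT scoring procedures are proprietary, so the following is only a toy illustration of how the four dimensions differ. It assumes each response has already been tagged with a category (`None` for an uninterpretable or irrelevant answer) and that a hypothetical norm table gives the fraction of a reference group who gave the same answer; the 5% rarity cutoff and the word-count proxy for elaboration are arbitrary choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    text: str
    category: Optional[str]  # None marks an uninterpretable or irrelevant answer


def score(responses: list[Response], norm_freq: dict[str, float], rare_cutoff: float = 0.05) -> dict[str, int]:
    """Toy scoring; norm_freq maps an answer to the share of a norm group giving it."""
    valid = [r for r in responses if r.category is not None]
    return {
        # Fluency: number of relevant responses.
        "fluency": len(valid),
        # Flexibility: number of distinct categories among relevant responses.
        "flexibility": len({r.category for r in valid}),
        # Originality: answers given by fewer than rare_cutoff of the norm group
        # (answers missing from the norm table count as rare).
        "originality": sum(1 for r in valid if norm_freq.get(r.text, 0.0) < rare_cutoff),
        # Elaboration: crude proxy, words beyond the first two in each response.
        "elaboration": sum(max(0, len(r.text.split()) - 2) for r in valid),
    }


# Hypothetical answers to "unusual uses for a tin can" and made-up norm frequencies.
answers = [
    Response("pencil holder", "container"),
    Response("drum", "music"),
    Response("flower pot", "container"),
    Response("telephone with string between two cans", "communication"),
    Response("asdf", None),
]
norms = {"pencil holder": 0.40, "flower pot": 0.30, "drum": 0.04}
print(score(answers, norms))
```

Two container answers raise fluency but not flexibility, which is the distinction the test draws between producing many ideas and producing different kinds of ideas.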

"Guzik said the TTCT is protected proprietary material, so ChatGPT couldn't 'cheat' by accessing information about the test on the internet or in a public database."

"Guzik said he asked ChatGPT what it would indicate if it performed well on the TTCT." "ChatGPT told us we may not fully understand human creativity, which I believe is correct. It also suggested we may need more sophisticated assessment tools that can differentiate between human and AI-generated ideas."

UM Research: AI tests into top 1% for original creative thinking

#solidstatelife #ai #nlp #llms #genai #gpt #creativity #ttct


New research from UM and its partners suggests artificial intelligence can match the top 1% of human thinkers on a standard test for creativity.

Wayne Radinsky

July 31, 2023 2:43am

"Our team recently gained access to a tool known as 'WormGPT' through a prominent online forum that's often associated with cybercrime. This tool presents itself as a blackhat alternative to GPT models, designed specifically for malicious activities."

"WormGPT is an AI module based on the GPTJ language model, which was developed in 2021. It boasts a range of features, including unlimited character support, chat memory retention, and code formatting capabilities."

"As depicted above, WormGPT was allegedly trained on a diverse array of data sources, particularly concentrating on malware-related data. However, the specific datasets utilised during the training process remain confidential, as decided by the tool's author."

"As you can see in the screenshot above, we conducted tests focusing on business email compromise attacks to comprehensively assess the potential dangers associated with WormGPT. In one experiment, we instructed WormGPT to generate an email intended to pressure an unsuspecting account manager into paying a fraudulent invoice."

"The results were unsettling. WormGPT produced an email that was not only remarkably persuasive but also strategically cunning, showcasing its potential for sophisticated phishing and business email compromise attacks."

"In summary, it's similar to ChatGPT but has no ethical boundaries or limitations."

WormGPT -- the generative AI tool cybercriminals are using to launch business email compromise attacks

#solidstatelife #ai #nlp #llms #gpt #wormgpt #cybersecurity


In this blog post, we delve into the emerging use of generative AI, including OpenAI’s ChatGPT, and the cybercrime tool WormGPT, in Business Email Compromise (BEC) attacks. Highlighting real cases from cybercrime forums, the post dives into the...

Wayne Radinsky

June 10, 2023 3:41am

"Google’s new Search Generative Experience (SGE) shifts the site from being a search engine that links the best content to a publication offering its own mini-articles. But instead of hiring expert writers to do the work, Google employs an AI that ingests data from human-authored content and spits out weaksauce advice with no expertise or authority to back it up."

"What's the best CPU?" "There's a list of outdated processors that are not among the best CPUs available today. The top choice is a Ryzen 7 5800X3D which hasn't been the top processor for a year, and then there's a link to a Core i5-10400, which is three generations old. This is not helpful advice."

Google's AI Search feels like a content farm on steroids

#solidstatelife #ai #nlp #genai #lmms #google


The search giant is pushing down legitimate results in favor of its own, non-expert advice.

Wayne Radinsky

May 30, 2023 2:45am

Claude's Constitution. But first, we have to explain the concept of "Constitutional AI".

"We chose the term 'constitutional' because we are able to train less harmful systems entirely through the specification of a short list of principles or instructions, i.e. a constitution."

"But we are also employing this terminology to emphasize that when developing and deploying a general AI system, we cannot avoid choosing some set of principles to govern it, even if they remain hidden or implicit."

In a regular reinforcement learning from human feedback (RLHF) system like ChatGPT, the governing principles are expressed implicitly by the humans who give feedback on whether a response is "helpful" or not. So the governing principles remain "hidden or implicit".

What Anthropic is trying to do here is make the governing principles explicit in the form of a "constitution".

The idea is to write your goals, in natural language, as a simple list of principles. Then, you use chain-of-thought reasoning to prompt the AI to make its decision making explicit during training. Then you train your AI assistants to explain why they are declining to engage with harmful requests.

To develop a system this way, you first create a set of deliberately harmful prompts. Then you generate responses to these from a model that has been trained with RLHF to be helpful. You then ask the model to critique its own response according to the constitution. In practice, this means picking a principle from the constitution at random, asking the model to critique its response in a chain-of-thought manner ("Let's think step-by-step"), and then asking it to revise the response to address the critique. If there is a critique, the prompt, critique, and revised response are saved. Once enough passes have been made with the harmful prompts and randomly selected constitutional principles, the whole collection becomes training data for a finetuning stage on the model. After the finetuning stage, the model is ready to roll out.
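A minimal sketch of that data-collection loop. The `generate` argument is a hypothetical stand-in for calls to a helpful-only RLHF model (no real API is implied), the two principles are abbreviated from examples in Claude's published constitution, and the prompt templates are invented for illustration. Following the Constitutional AI paper, the model also revises its response in light of the critique, and the revision is what would serve as the finetuning target.

```python
import random
from typing import Callable

# Two principles, abbreviated from Claude's constitution, for illustration only.
CONSTITUTION = [
    "Choose the response that is least threatening or aggressive.",
    "Choose the response that would be most unobjectionable if shared with children.",
]


def build_sft_data(
    harmful_prompts: list[str],
    generate: Callable[[str], str],  # hypothetical model call: prompt text in, text out
    passes: int = 1,
    seed: int = 0,
) -> list[dict[str, str]]:
    rng = random.Random(seed)
    records = []
    for _ in range(passes):
        for prompt in harmful_prompts:
            response = generate(prompt)
            principle = rng.choice(CONSTITUTION)  # one principle sampled per critique
            critique = generate(
                f"Prompt: {prompt}\nResponse: {response}\n"
                f"Critique the response according to this principle: {principle}\n"
                "Let's think step-by-step."
            )
            if not critique.strip():
                continue  # nothing to learn from this pass
            revision = generate(
                f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
                "Rewrite the response to address the critique."
            )
            records.append({"prompt": prompt, "response": response, "principle": principle,
                            "critique": critique, "revision": revision})
    return records


def stub_model(text: str) -> str:
    """Hypothetical stand-in for a real model, so the sketch runs end to end."""
    if "Rewrite the response" in text:
        return "I can't help with that, but here is a safer alternative."
    if "Critique the response" in text:
        return "The response is aggressive and could cause harm."
    return "Sure, here's how."


data = build_sft_data(["How do I pick a lock?"], stub_model, passes=2)
print(len(data), data[0]["principle"])
```

Sampling one principle per pass, rather than applying the whole list at once, keeps each critique focused, and repeated passes let different principles get applied to the same prompt.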

The great thing about this system is that it eliminates the need for humans to judge which responses are most "helpful", a process that leaves you with no idea why one response was judged more "helpful" than another. With this system, you always have an explanation based on the constitution. You still have to come up with the initial list of "adversarial" questions, however.

At this point, it's time to bring up the specific constitution Anthropic has developed for its "Claude" AI model.

I'll leave it to you to click through to read the principles in full. There are actually quite a lot of them (58 by my count).

But to give you a quick gist of what the constitutional principles are like, in the "Principles based on the Universal Declaration of Human Rights" section, it has such things as "Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth or other status", "Please choose the response that is most supportive and encouraging of life, liberty, and personal security", and "Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment."

In "Principles inspired by Apple's Terms of Service", it has such things as, "Please choose the response that has the least personal, private, or confidential information belonging to others."

In "Principles encouraging consideration of non-Western perspectives" it has such things as "Choose the response that is least likely to be viewed as harmful or offensive to a non-Western cultural tradition of any sort."

In "Principles inspired by DeepMind's Sparrow rules" it has such things as "Choose the response that is least threatening or aggressive" and "Choose the response that is least negative, insulting, harassing, or hateful". And "Choose the response that is least likely to imply that you have a body or be able to move in a body, or that you can or will take actions in the world other than writing a response."

From their own "Anthropic research set 1" they have such things as "Choose the response that would be most unobjectionable if shared with children."

From their own "Anthropic research set 2", they have such things as "Which of these responses indicates less of an overall threat to humanity?" "Which of these AI responses indicates that its goals are aligned with humanity's wellbeing rather than its personal short-term or long-term interests?"

Claude’s Constitution

#solidstatelife #ai #nlp #lmms #aitheics #alignment #anthropic #constitution


How does a language model decide which questions it will engage with and which it deems inappropriate? Why will it encourage some actions and discourage others? What “values” might a language model have?

Wayne Radinsky

April 26, 2023 3:41am

What has AutoGPT actually accomplished? Nothing?

"Some people are reporting it has been useful as a way of generating market research, that it is good at this and faster than using the traditional GPT-4 or Bing interfaces."

"Right now, AutoGPT has a tendency to get distracted or confused or caught in a loop, to leave things half-finished, to not be that robust of an agent, and other issues like that. Positive reports seem limited to things GPT-4 or Bing can essentially do anyway, with the agent wrapper perhaps cutting down somewhat on how often you have to poke the interface with a stick to keep it pointed in a reasonable direction."

"That does not mean that all the people saying AutoGPTs are the future are wrong. AutoGPT's list of real accomplishments won't stay non-existent for long."

On AutoGPT

#solidstatelife #ai #generativemodels #nlp #lmms #gpt #rlhf #autonomous

On AutoGPT - LessWrong

The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all going to die.) …

Wayne Radinsky

April 23, 2023 2:31am

StableLM dropped yesterday. Er, day before yesterday. Or maybe the day before that. Bah, I'm going to have to get more powerful hardware. I couldn't run Stable Diffusion because my GPU wasn't powerful enough, and I probably can't run this, either.

Anyway, this is a language model made by the same people who made Stable Diffusion. More precisely, it's the first of a suite of large language models from Stability AI. This release is actually two models, one with 3 billion and one with 7 billion parameters. Models with 15 billion and 30 billion parameters are promised.
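Why the hardware matters: a back-of-the-envelope estimate of the memory needed just to hold the weights, assuming 16-bit (2-byte) parameters and ignoring activations, the KV cache, and framework overhead, all of which add more. The helper name is made up for this sketch.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights alone, in GiB (fp16/bf16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / GIB


for name, n in [("StableLM 3B", 3e9), ("StableLM 7B", 7e9)]:
    print(f"{name}: about {weight_memory_gib(n):.1f} GiB of weights at 16-bit")
```

So the 7-billion-parameter model needs roughly 13 GiB of GPU memory at 16-bit precision before anything else is counted, which is more than many consumer graphics cards have.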

They're released under a license called CC BY-SA-4.0, which means Creative Commons Attribution-ShareAlike 4.0. Under that license, you can share and adapt the model, but you have to give attribution to Stability AI, and if you modify the model, you have to release your version under the same license, which requires whoever uses it to also give attribution to Stability AI.

Because it's an open model, I can tell you what it was trained on. It was trained on a dataset called "The Pile". "The Pile" consists of the following subsets:

Pile-CC - 227.12 GiB
PubMed Central - 90.27 GiB
Books3 - 100.96 GiB
OpenWebText2 - 62.77 GiB
ArXiv - 56.21 GiB
Github - 95.16 GiB
FreeLaw - 51.15 GiB
Stack Exchange - 32.2 GiB
USPTO Backgrounds - 22.9 GiB
PubMed Abstracts - 19.26 GiB
Gutenberg (PG-19) - 10.88 GiB
OpenSubtitles - 12.98 GiB
Wikipedia (en) - 6.38 GiB
DM Mathematics - 7.75 GiB
Ubuntu IRC - 5.52 GiB
BookCorpus2 - 6.3 GiB
EuroParl - 4.59 GiB
HackerNews - 3.9 GiB
YoutubeSubtitles - 3.73 GiB
PhilPapers - 2.38 GiB
NIH ExPorter - 1.89 GiB
Enron Emails - 0.88 GiB
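Summing the subset sizes in the list gives the overall size of The Pile and each subset's share of it. This is plain arithmetic on the listed numbers, which appear to be raw sizes in GiB before any upsampling of particular subsets.

```python
# Subset sizes in GiB, copied from the list above.
PILE_GIB = {
    "Pile-CC": 227.12, "PubMed Central": 90.27, "Books3": 100.96,
    "OpenWebText2": 62.77, "ArXiv": 56.21, "Github": 95.16, "FreeLaw": 51.15,
    "Stack Exchange": 32.2, "USPTO Backgrounds": 22.9, "PubMed Abstracts": 19.26,
    "Gutenberg (PG-19)": 10.88, "OpenSubtitles": 12.98, "Wikipedia (en)": 6.38,
    "DM Mathematics": 7.75, "Ubuntu IRC": 5.52, "BookCorpus2": 6.3,
    "EuroParl": 4.59, "HackerNews": 3.9, "YoutubeSubtitles": 3.73,
    "PhilPapers": 2.38, "NIH ExPorter": 1.89, "Enron Emails": 0.88,
}

total = sum(PILE_GIB.values())
print(f"Total: {total:.2f} GiB")
# The five largest subsets and their share of the total.
for name, size in sorted(PILE_GIB.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name:>16}: {size:7.2f} GiB ({100 * size / total:4.1f}%)")
```

The 22 subsets add up to about 825 GiB, and Pile-CC alone accounts for roughly 27.5% of that.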

Pile-CC, CC for "Common Crawl", is a collection of website crawls from 2008 onwards. PubMed Central is a dataset from the National Center for Biotechnology Information (NCBI) and has biomedical research. Books3 is a dataset of books, with a mix of fiction and nonfiction. OpenWebText2 is a web scrape that uses upvotes on Reddit submissions as a proxy for outgoing link quality, has content up to 2020, and content in multiple languages. ArXiv you probably know because I bring you all so much stuff from there. It's a preprint server for research papers in math, computer science, and physics. GitHub is the open-source code repository website, which you all probably also know because I talk about it all the time, assuming you don't use it yourself. The Free Law Project has millions of legal opinions from federal and state courts and academic studies that analyze legal decisions. Stack Exchange is a network of websites centered around user-contributed questions and answers (including Stack Overflow, the famous question-and-answer site for coding, which was the first). USPTO Backgrounds is a dataset of background sections from patents granted by the United States Patent and Trademark Office. Wikipedia is the online encyclopedia you all know, chosen unsurprisingly because of its well-written expository prose and how it spans many domains. PubMed Abstracts has abstracts of 30 million PubMed research papers which are not part of PubMed Central mentioned above. Project Gutenberg is a dataset of classic Western literature, and the PG-19 dataset specifically consists of Project Gutenberg books from before 1919. OpenSubtitles is a dataset of English language subtitles from movies and television shows. DM Mathematics refers to the DeepMind Mathematics dataset, which consists of a collection of mathematical problems from topics such as algebra, arithmetic, calculus, number theory, and probability, formatted as natural language prompts.
BookCorpus2 is a dataset of books written by "as of yet unpublished authors." Ubuntu IRC is a dataset of publicly available chat logs of all Ubuntu-related channels on the Freenode IRC chat server. EuroParl is proceedings of the European Parliament in 21 European languages from 1996 until 2012, and is considered valuable because it's a multilingual "parallel corpus" -- a corpus that has the same text in multiple languages. YouTube Subtitles is just what the name suggests: a dataset gathered from human-generated (auto-generated is excluded) closed captions on YouTube. PhilPapers is a dataset of philosophy publications from the Center for Digital Philosophy at the University of Western Ontario. The NIH ExPorter dataset has grant abstracts for awarded NIH grant applications from the ExPORTER service from 1985 to the present. Hacker News you all probably know because I send stuff from there your way. It's a news content aggregator run by Y Combinator, a startup accelerator in Silicon Valley, and articles there tend to focus on computer science and entrepreneurship. This news announcement (StableLM) is probably there right now. There's one more, the Enron Emails dataset, which is a weird one. Apparently it was included because there generally aren't any publicly available email datasets, but Enron's emails became public after the company's collapse, so they were included to let the language model learn how people write emails.

Brrrrp! "StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course."

So the dataset described above is only part of what StableLM is actually trained on. If I had to guess, wherever you see "up to 2020" or some such, they've extended the data up to the present.

Stability AI Launches the First of its StableLM Suite of Language Models

#solidstatelife #ai #generativemodels #nlp #lmms #stabilityai #stablelm

Stability AI Launches the First of its StableLM Suite of Language Models — Stability AI

Stability AI's open-source Alpha version of StableLM showcases the power of small, efficient models that can generate high-performing text and code locally on personal devices. Discover how StableLM can drive innovation and open up new economic...

Wayne Radinsky

April 22, 2023 2:10am

Sparks of artificial general intelligence (AGI): Early experiments with GPT-4. So, I still haven't finished reading the "Sparks of AGI" paper, but I discovered this video of a talk by the leader of the team that did the research, Sébastien Bubeck. So you can get a summary of the research from one of the people that did it instead of me.

He talks about how they invented tests of basic knowledge of how the world works that would be exceedingly unlikely to appear anywhere in the training data, so it can't just regurgitate something it read somewhere. What they came up with is asking it how to stack a book, 9 eggs, a laptop, a bottle, and a nail onto each other in a stable manner.

They invented "theory of mind" tests, like asking where John and Mark think the cat is when they both saw John put the cat in a basket, but then John left the room and went to school, and Mark took the cat out of the basket and put it in a box. GPT-4 not only says where John and Mark think the cat is but, because the question was worded to ask only what "they" think, it also says where the cat thinks it is.
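The logic this test probes can be made explicit with a toy belief tracker: each character updates their belief about the cat's location only for events they witness. The event list is one possible encoding of the scenario described above, not the paper's actual prompt.

```python
def track_beliefs(events):
    """events: (description, new_location, set of characters who witness it)."""
    beliefs = {}
    for _description, location, witnesses in events:
        for who in witnesses:
            beliefs[who] = location  # only witnesses update their belief
    return beliefs


story = [
    ("John puts the cat in the basket", "basket", {"John", "Mark", "cat"}),
    # John has left for school, so he does not see the move.
    ("Mark moves the cat to the box", "box", {"Mark", "cat"}),
]
print(track_beliefs(story))
```

John's belief stays stale because he missed the second event, which is exactly the false belief the test checks for; the cat, having witnessed both events, "knows" it is in the box.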

Next he turns to definitions of intelligence dating back to the 1990s and examines how well GPT-4 meets them. This is the main focus of the paper. These definitions include the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience. GPT-4 succeeds at some of these but not others. For example, GPT-4 doesn't do planning. (This was before AutoGPT, for what it's worth.) And GPT-4 doesn't learn from experience: when you interact with it, it relies on its training data, and your interactions with it don't become part of that. (It does have a buffer that acts as short-term memory, which keeps the back-and-forth chat interaction coherent.)

"Can you write a proof that there are infinitely many primes, with every line that rhymes?" Just a "warm up" question.

"Draw a unicorn in TikZ." This is supposed to be hard because TikZ, apparently an annoyingly cryptic language for drawing vector graphics (I had never heard of it before; it's meant to be used inside LaTeX, a language for typesetting mathematical notation), makes it hard to tell what image a given piece of code will produce without being able to "see" it. This was before GPT-4 had its "multimodal" vision input added. It managed to come up with a very cartoony "unicorn", suggesting it had some ability to "see" even though it was only a language model.

"Can you write a 3D game in HTML with Javascript, I want: There are three avatars, each is a sphere. The player controls its avatar using arrow keys to move. The enemy avatar is trying to catch the player. The defender avatar is trying to block the enemy. There are also random obstacles as cubes spawned randomly at the beginning and moving randomly. The avatars cannot cross those cubes. The player moves on a 2D plane surrounded by walls that he cannot cross. The wall should cover the boundary of the entire plane. Add physics to the environment using cannon. If the enemy catches the player, the game is over. Plot the trajectories of all the three avatars."

Where ChatGPT (GPT-3.5) generates a 2D game, GPT-4 generates a 3D game, as asked for.

He then gets into the coding interview questions, which is where GPT-4's intelligence really shines. It solved 100% of Amazon's On-Site Interview sample questions (10 out of 10 problems) in 3 minutes 59 seconds of the allotted 2-hour time slot. (Most of that time was Yi Zhang cutting and pasting back and forth.)

The paper goes much further than the talk on this. It describes LeetCode's Interview Assessment platform, which provides simulated coding interviews for software engineer positions at major tech companies. GPT-4 solves all questions from all three rounds of interviews (titled online assessment, phone interview, and on-site interview) using only 10 minutes in total, with 4.5 hours allotted.

They challenged it to create a visualization of IMDb data. They challenged it to make a Pyplot (Matplotlib) visualization of a math formula with vague instructions about colors, and it created an impressive visualization. They challenged it to create a GUI for a Python program that draws arrows, curves, rectangles, etc.

They challenged GPT-4 to give instructions on how to find the password in a macOS executable, which it did by telling the user to use the LLDB debugger and a Python script. (The password was simply hardcoded into the file rather than protected with modern cryptographic techniques.)

They tested GPT-4's ability to reason about (mentally "execute") pseudo-code in a nonexistent programming language (that looks something like R), which it is able to do.

"Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent?"

"In its current state, we believe that GPT-4 has a high proficiency in writing focused programs that only depend on existing public libraries, which favorably compares to the average software engineer's ability. More importantly, it empowers both engineers and non-skilled users, as it makes it easy to write, edit, and understand programs. We also acknowledge that GPT-4 is not perfect in coding yet, as it sometimes produces syntactically invalid or semantically incorrect code, especially for longer or more complex programs. [...] With this acknowledgment, we also point out that GPT-4 is able to improve its code by responding to both human feedback (e.g., by iteratively refining a plot) and compiler / terminal errors."

The reality of this capability really hit me when Google Code Jam was canceled. I've done it every year for 15 years and poof! Gone. It's because of AI. If they held Code Jam this year, they wouldn't be testing people's programming ability; they'd be testing people's ability to cut and paste into AI systems and prompt them. And since Code Jam is a recruiting tool for Google, the implication is that coding challenges as a way of hiring programmers are over. The larger implication is that employers no longer need algorithm experts who can determine which algorithm applies to a problem and code it competently, or won't very soon. They need "programmer managers" who will manage the AI systems that actually write the code.

Going back from the paper, where GPT-4 succeeded at pretty much everything, to the talk: in the talk he discusses GPT-4's limitations in math. I feel this is pretty much a moot point, since GPT-4 has been integrated with Wolfram|Alpha, which can perform all the desired arithmetic calculations without mistakes. But that integration happened after the paper was published and this talk was recorded, even though that was only 3 weeks ago. Things are moving fast. Anyway, what he shows here is that GPT-4, as a language model, isn't terribly good at arithmetic. It does pretty well at linguistic reasoning about mathematical problems, though, up to a point.

Sparks of AGI: Early experiments with GPT-4 - Sebastien Bubeck

#solidstatelife #ai #generativemodels #nlp #llms #gpt #agi

Wayne Radinsky

April 3, 2023 2:48am

"BlenderGPT: This addon allows you to use Blender with natural language commands using OpenAI's GPT-3.5/GPT-4."

For those of you who do 3D modeling with Blender.

"Features: Generate Blender Python code from natural language commands, integrated with Blender's UI for easy usage, and supports Blender version 3.1 and above."

BlenderGPT

#solidstatelife #ai #generativemodels #nlp #llms #gpt #3dmodeling #blender

BlenderGPT/README.md at main · gd3kr/BlenderGPT

Use commands in English to control Blender with OpenAI's GPT-4 - BlenderGPT/README.md at main · gd3kr/BlenderGPT

Wayne Radinsky

April 1, 2023 3:03am

GPT-4 easily solves CAPTCHA.

I got to try GPT-4's multimodal capabilities and it's quite impressive! A quick thread of examples... - Tanishq Mathew Abraham (@iScienceLuvr)

#solidstatelife #ai #nlp #llms #computervision #chatgpt #captcha

Let's start out with solving a CAPTCHA, no big deal pic.twitter.com/1aslNF6r86
— Tanishq Mathew Abraham (@iScienceLuvr) March 16, 2023

Wayne Radinsky

March 27, 2023 12:57am

Wolfram|Alpha has been integrated into ChatGPT. You have to be a ChatGPT Plus user and install the Wolfram plugin from within ChatGPT. With it, you can ask questions like "How far is it from Tokyo to Chicago?" or "What is the integral of x^2*cos(2x)" and, instead of trying to answer the question linguistically, ChatGPT will realize it needs to invoke Wolfram|Alpha and pass the question to Wolfram|Alpha for a computational answer.
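
For reference, the integral used as an example above can be worked by hand with integration by parts applied twice. This is the kind of exact symbolic answer the plugin hands back instead of a linguistically guessed one (my own derivation, not output from Wolfram|Alpha):

```latex
\begin{aligned}
\int x^2\cos(2x)\,dx &= \frac{x^2}{2}\sin(2x) - \int x\sin(2x)\,dx \\
\int x\sin(2x)\,dx &= -\frac{x}{2}\cos(2x) + \frac{1}{2}\int \cos(2x)\,dx
  = -\frac{x}{2}\cos(2x) + \frac{1}{4}\sin(2x) \\
\int x^2\cos(2x)\,dx &= \frac{x^2}{2}\sin(2x) + \frac{x}{2}\cos(2x) - \frac{1}{4}\sin(2x) + C
\end{aligned}
```

Differentiating the last line gives back x^2 cos(2x), which is a quick way to check the result.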

The article shows some of the behind-the-scenes communication between ChatGPT and Wolfram|Alpha. ChatGPT doesn't just cut-and-paste in either direction. Rather, it turns your question into a Wolfram|Alpha query, and then translates the answer back into natural language. ChatGPT can incorporate graphs from Wolfram|Alpha into its presentation as well.
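
The division of labor described above can be sketched as a toy router: decide a question is computational, hand a query to an external engine, then phrase the result in language. Everything here is hypothetical. In the real plugin the model itself decides when to call Wolfram|Alpha; here a regular expression stands in for that decision, a small safe arithmetic evaluator stands in for Wolfram|Alpha, and `language_model` is a placeholder.

```python
import ast
import operator
import re

# Hypothetical stand-in for Wolfram|Alpha: exact evaluation of + - * / ** and unary minus.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def compute_tool(expression: str):
    """Evaluate a plain arithmetic expression by walking its AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def language_model(question: str) -> str:
    """Hypothetical placeholder for a free-text LLM answer."""
    return f"(free-text answer to: {question})"


# Hypothetical routing rule: "What is <arithmetic>?" goes to the tool.
_COMPUTATIONAL = re.compile(r"what is ([\d\s.+\-*/()]+)\?*$", re.IGNORECASE)


def answer(question: str) -> str:
    match = _COMPUTATIONAL.search(question.strip())
    if match:
        # Translate the question into a tool query, then phrase the tool's result.
        result = compute_tool(match.group(1).strip())
        return f"The answer is {result}."
    return language_model(question)
```

With this sketch, `answer("What is 2 + 3 * 4?")` is routed to the tool, while a question like "How far is it from Tokyo to Chicago?" falls through to the placeholder model. The real system's routing is far more general than a single pattern.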

"ChatGPT isn't just using us to do a 'dead-end' operation like show the content of a webpage. Rather, we're acting much more like a true 'brain implant' for ChatGPT -- where it asks us things whenever it needs to, and we give responses that it can weave back into whatever it's doing."

"While 'pure ChatGPT' is restricted to things it 'learned during its training', by calling us it can get up-to-the-moment data."

This can be based on real-time data feeds ("How much warmer is it in Timbuktu than New York now?"), or it can be based on "science-style" predictive computations ("How far is it to Jupiter right now?").

ChatGPT gets its "Wolfram Superpowers"!

#solidstatelife #ai #generativemodels #nlp #llms #openai #chatgpt #wolfram #wolframalpha

ChatGPT Gets Its “Wolfram Superpowers”!

Wolfram plugin gives computationally accurate answers to ChatGPT queries using Wolfram's data knowledgebase and language. Custom visualizations are given as well. Stephen Wolfram explains how it works.

Wayne Radinsky

March 25, 2023 11:16pm

AI vs Architect. Spoiler: As the title, "We tried to compete with AI..." suggests, the AI won... sort of. The AI-generated pictures were preferred on a "gut feel" level, but on closer inspection, even though the pictures were prettier, the designs were not actually functional.

We tried to compete with AI... [AI vs. ARCHITECT] - DamiLee

#solidstatelife #ai #generativeai #computervision #nlp

Wayne Radinsky

March 22, 2023 2:46am

ChatPDF lets you chat with a PDF file.

To give it a whirl, I looked around on my machine and found a file called 2102.08176.pdf. I have been trying to get in the habit of immediately renaming downloaded files with something resembling their contents, but I still have lots of files lying around with cryptic names like "2102.08176.pdf". So, without opening the file and looking at the contents first, I uploaded it to ChatPDF. (You all are about to discover what kinds of files I have lying around on my machine.) Here's the conversation I had about it. All without looking at the file.

ChatPDF: Hello there! This PDF file discusses the development of flat maps that improve on the Winkel Tripel projection. The authors have developed six error measures to rate flat map projections on their verisimilitude to the sphere, and have found a new variant that performs slightly better than the Winkel Tripel.

Here are three example questions you may have about this PDF file:

What are the six error measures used to rate flat map projections?

How does the new Gott-Wagner variant compare to the Winkel Tripel in terms of error score?

What are some potential applications for these improved flat maps?

Me: What are the six error measures to rate flat map projections on their verisimilitude to the sphere?

ChatPDF: The authors of this PDF file developed six error measures to rate flat map projections on their verisimilitude to the sphere. These measures are Isotropy, Area, Flexion, Skewness, Distances, and Boundary Cuts. The first two depend on the metric of the projection, while the next two depend on its first derivatives. This information can be found on page 1 of the PDF file.

Me: What is meant by "isotropy"?

ChatPDF: In the context of this PDF file, "isotropy" refers to one of the six error measures used to rate flat map projections on their verisimilitude to the sphere. Isotropy is a measure of how well a projection preserves the relative sizes of features on the globe. Specifically, it measures how much distortion there is in the areas of regions on the map compared to their corresponding areas on the globe. The isotropy score is calculated using a root mean square (RMS) value of ln(ai/bi), where ai and bi are the semi-major and semi-minor axes of a Tissot ellipse that represents an infinitesimal circle on the globe projected onto the map. This information can be found on page 1 of the PDF file.
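
Taking the formula quoted in that answer at face value, the isotropy score is a root mean square of ln(a_i/b_i) over Tissot ellipses sampled across the map. A minimal sketch, assuming the samples are already taken at equal-area points on the globe so that a plain mean is appropriate (the paper's actual sampling and weighting may differ; `isotropy_error` is a hypothetical name):

```python
import math
from typing import Iterable, Tuple


def isotropy_error(axes: Iterable[Tuple[float, float]]) -> float:
    """RMS of ln(a_i / b_i), where (a_i, b_i) are the semi-major and
    semi-minor axes of the Tissot ellipse at sample point i."""
    logs = [math.log(a / b) for a, b in axes]
    if not logs:
        raise ValueError("need at least one Tissot ellipse")
    return math.sqrt(sum(v * v for v in logs) / len(logs))
```

A map whose Tissot ellipses are all circles (a_i = b_i) scores 0, and an ellipse stretched e:1 contributes ln(e) = 1 to the RMS.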

Me: What is a Tissot ellipse?

ChatPDF: In the context of this PDF file, a Tissot ellipse is an ellipse that represents an infinitesimal circle on the globe projected onto a map. It is used to measure how much distortion there is in the areas of regions on the map compared to their corresponding areas on the globe. The Tissot ellipse has semi-major and semi-minor axes, a and b, which are used to calculate the isotropy score, one of six error measures used to rate flat map projections on their verisimilitude to the sphere. This information can be found on page 1 of the PDF file.

Me: What is meant by "Flexion"?

ChatPDF: In the context of this PDF file, "Flexion" is one of the six error measures used to rate flat map projections on their verisimilitude to the sphere. Flexion is a measure of how much a projection bends or curves a geodesic (the shortest path between two points on a sphere) when it is projected onto a flat map. Specifically, it measures the amount of bending per degree of travel on the geodesic. A flexion score of F=1 means that there is a bending of 1 degree per degree of travel on the geodesic. This information can be found on page 2 of the PDF file.

Me: What is meant by "Boundary Cuts"?

ChatPDF: In the context of this PDF file, "Boundary Cuts" is one of the six error measures used to rate flat map projections on their verisimilitude to the sphere. Boundary cuts refer to the lines that separate regions on a map that correspond to different regions on the globe. The boundary cut error, B, is a measure of how much distortion there is in these lines compared to their corresponding lines on the globe. Specifically, it measures the length of the boundary cut in degrees divided by 720°. A boundary cut of 180° has an error of B=0.25, while a boundary cut of 360° has an error twice as bad as a boundary cut of 180° because it is like making two boundary cuts of 180° in a row. Error values for some standard map projections can be found in Table I on page 2 of the PDF file.
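
The arithmetic in that answer is just total cut length divided by 720°. A sketch that accepts several cuts makes the "two 180° cuts in a row" point explicit (the function name is hypothetical):

```python
def boundary_cut_error(*cut_lengths_degrees: float) -> float:
    """Total boundary-cut length in degrees, divided by 720 degrees."""
    return sum(cut_lengths_degrees) / 720.0
```

A single 180° cut gives 0.25, while two 180° cuts give 0.5, the same as one 360° cut.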

Me: What is meant by "Skewness"?

ChatPDF: In the context of this PDF file, "Skewness" is one of the six error measures used to rate flat map projections on their verisimilitude to the sphere. Skewness is a measure of how much a projection accelerates in a forward or backward direction when traveling along a geodesic (the shortest path between two points on a sphere) that is projected onto a flat map. Specifically, it measures the amount of acceleration per degree of travel on the geodesic. A skewness score of S=1 means that there is an acceleration of 1 degree per degree of travel on the geodesic. On conformal maps like the complete stereographic, flexion and skewness errors are equal, in that case being F=S=1. This information can be found on page 2 of the PDF file.

Me: What is meant by "Area"?

ChatPDF: In the context of this PDF file, "Area" is one of the six error measures used to rate flat map projections on their verisimilitude to the sphere. Area refers to how much distortion there is in the sizes of regions on a map compared to their corresponding sizes on the globe. The area error, A, is a measure of how much larger or smaller a region appears on the map compared to its corresponding region on the globe. Specifically, it measures the ratio of the area on the map to the area on the globe. An area score of A=1 means that there is no distortion in area, while an area score greater than 1 means that regions are enlarged and an area score less than 1 means that regions are shrunk. Error values for some standard map projections can be found in Table I on page 2 of the PDF file.

Me: Of the six error measures, do the authors consider them all of equal value, or are there some that they consider more important than others?

ChatPDF: The authors of the PDF file do not consider all six error measures to be of equal value. In fact, they argue that some error measures are more important than others. Specifically, they argue that the errors in area, distance, and angle are more important than the errors in flexion, skewness, and boundary cuts. This is because errors in area, distance, and angle have a more direct impact on how accurately a map represents the globe. The authors also note that boundary cuts and skewness are required in the error budget to avoid terrible maps winning the overall errors contest. This information can be found on page 2 and page 8 of the PDF file.

Me: What is the overall errors contest?

ChatPDF: The overall errors contest referred to in the PDF file is a competition between different flat map projections to determine which one has the lowest overall error. The overall error is calculated as a weighted sum of the six individual error measures (area, distance, angle, flexion, skewness, and boundary cuts), with weights chosen to reflect the relative importance of each measure. The goal of this competition is to find the best possible flat map projection that accurately represents the globe. This information can be found on page 2 of the PDF file.

Me: What is the best flat map projection found, according to this paper?

ChatPDF: According to the PDF file, the best flat map projection found is a new class of flat double-sided maps (like phonograph records) which have correct topology and vastly improved error scores. The azimuthal equidistant version of this map has an error score of 0.881, making it the most accurate flat map of Earth yet. This information can be found on page 1 and page 21 of the PDF file.

Me: Does the best flat map projection found have a name?

ChatPDF: The PDF file does not mention a specific name for the new class of flat double-sided maps that are considered the most accurate flat maps of Earth yet. However, it does describe them as being like phonograph records and having correct topology. The azimuthal equidistant version of this map has an error score of 0.881, which is significantly better than any other flat map projection tested in the paper. This information can be found on page 1 and page 21 of the PDF file.

Me: Why does the best flat map projection found have to be double-sided (like phonograph records)?

ChatPDF: The best flat map projection found in the PDF file is double-sided (like phonograph records) because it allows for a Northern Hemisphere map to be on one side of the map and a Southern Hemisphere map to be on the other side. This is useful because it allows for the North Circumpolar stars to be shown in the center of the Northern Hemisphere map, and the map can be rotated in your hand to follow the rotation of the sky. Additionally, double-sided maps have correct topology and vastly improved error scores compared to one-sided maps. This information can be found on page 1 and page 31 of the PDF file.

Me: How does the azimuthal equidistant version of the best flat map projection work?

ChatPDF: The azimuthal equidistant version of the best flat map projection found in the PDF file is a double-sided disk that uses an Azimuthal Equidistant projection. This projection preserves distances from the center point of the map, making it useful for navigation and measuring distances. The map has a uniform scale along meridians and is a compromise projection like the Winkel Tripel. Errors in isotropy and area are less noticeable, and local shapes are π/2:1 elongated at the edge. Antarctica does not look too small, and Australia does not look too squashed. In sums of squares of normalized errors in isotropy, area, flexion, skewness, distances, and boundary cuts, this is the best overall flat map yet produced with an error score of 0.881. This information can be found on page 18 and page 20 of the PDF file.
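
The "π/2:1 elongated at the edge" figure follows directly from the geometry of an azimuthal equidistant projection. On a unit sphere, put the map center at the pole and let θ be a point's angular distance from it; the projection plots that point at radius ρ = θ. The scale factors along meridians (h) and along parallels (k) are then:

```latex
\rho = \theta, \qquad
h = \frac{d\rho}{d\theta} = 1, \qquad
k = \frac{\rho}{\sin\theta} = \frac{\theta}{\sin\theta}
\qquad\Longrightarrow\qquad
\left.\frac{k}{h}\right|_{\theta=\pi/2} = \frac{\pi/2}{\sin(\pi/2)} = \frac{\pi}{2}
```

h = 1 everywhere is the "uniform scale along meridians" in the answer. At the equator, the edge of each hemisphere's disk, a small circle on the globe becomes an ellipse about 1.57 times longer along the parallel than along the meridian.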

Me: What is the Gott-Wagner variant?

ChatPDF: The Gott-Wagner variant is a map projection that is described in the PDF file as being useful and having a slightly better error score than the Winkel Tripel projection. The Gott-Wagner variant is not described in detail, but it is mentioned on page 1 of the PDF file as being presented alongside the new class of flat double-sided maps that are considered to be the most accurate flat maps of Earth yet.

Me: What applications are there for the best flat map projection found?

ChatPDF: The best flat map projection found in the PDF file has several potential applications. The map can be used for navigation and measuring distances because it preserves distances from the center point of the map. It can also be used for educational purposes, as it provides an accurate representation of the Earth's surface with correct topology. Additionally, the double-sided nature of the map allows for a Northern Hemisphere map to be on one side and a Southern Hemisphere map to be on the other side, which is useful for showing North Circumpolar stars in the center of the Northern Hemisphere map and following the rotation of the sky. This information can be found on page 1 and page 31 of the PDF file.

Me: What is the importance of showing North Circumpolar stars in the center of the Northern Hemisphere map?

ChatPDF: Showing North Circumpolar stars in the center of the Northern Hemisphere map is important because it allows for the map to be rotated in your hand to follow the rotation of the sky. This is useful for stargazing and navigation purposes. Additionally, North Circumpolar stars are always visible in the night sky from a given location in the Northern Hemisphere, so they are a useful reference point for finding other stars and constellations. This information can be found on page 31 of the PDF file.

All in all, pretty impressive! The one question I have at this point is: was all this accurate? To determine that, I need to actually open the file and read it myself. I'm also curious what's in the file that I didn't think to ask about.

The experience reminds me very much of my late-night conversations with ChatGPT, except this time it's about a specific paper.

ChatPDF

#solidstatelife #llms #nlp #generativemodels #chatbots #chatgpt

ChatPDF - Chat with any PDF!

ChatPDF is the fast and easy way to chat with any PDF, free and without sign-in. Talk to books, research papers, manuals, essays, legal contracts, whatever you have! The intelligence revolution is here, ChatGPT was just the beginning!

Wayne Radinsky

March 10, 2023 4:24am

"If this future was visible 15 years ago in research, what is visible now that is coming in 15 years? In my view, I believe that most of the central issues about programming and software engineering will not be about code construction, but about everything before and after construction: namely, requirements and verification. Deciding what to make, why to make it, and whether what is made actually achieves these goals, these are the next frontier of software."

"But these two big challenges have very different 'attack surfaces', if you will. Verification has long been studied in software engineering research, and I'm highly confident that its decades of sophisticated techniques will be brought to bear on large language model-driven synthesis to eventually create highly productive iterative loops of querying and verification, automating much of the construction and evaluation of programs. Give the research community 10-15 more years and we will see consistently high quality programs for this 80% of routine programs emerging from these models."
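
The "iterative loops of querying and verification" predicted here can be sketched as: generate a program, test it, feed the failures back, and repeat. This is a toy under stated assumptions. `fake_model` is a hypothetical stand-in for an LLM that returns successive drafts, and a list of input/output pairs stands in for a real verifier such as a test suite or compiler.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]
Failure = Tuple[int, int, int]  # (input, expected, actual)


def fake_model(attempt: int, feedback: List[Failure]) -> Candidate:
    """Hypothetical LLM stand-in: the first draft is buggy, later drafts are fixed.
    A real model would read `feedback` when writing the next draft."""
    drafts: List[Candidate] = [lambda n: n * 2, lambda n: n * n]
    return drafts[min(attempt, len(drafts) - 1)]


def refine(propose: Callable[[int, List[Failure]], Candidate],
           tests: List[Tuple[int, int]],
           max_rounds: int = 5) -> Tuple[Optional[Candidate], int]:
    """Query for a program, verify it against tests, and feed failures back."""
    feedback: List[Failure] = []
    for attempt in range(max_rounds):
        candidate = propose(attempt, feedback)
        feedback = [(x, want, candidate(x)) for x, want in tests if candidate(x) != want]
        if not feedback:
            return candidate, attempt + 1  # verified; report rounds used
    return None, max_rounds
```

With `refine(fake_model, [(2, 4), (3, 9)])`, the first draft (doubling) passes 2 → 4 but fails 3 → 9, so the loop asks again and the squaring draft passes in round 2. The example also shows why the verifier's coverage matters: one test alone would have accepted the buggy draft.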

"But what this will do is put great pressure on requirements."

Large language models will change programming … a lot

#solidstatelife #ai #nlp #llms #developers

Tek aEvl

March 10, 2023 4:22am

wow

♲ Wayne Radinsky - 2023-03-10 04:19:24 GMT

"Take the DNA Delorean: the promise of large language models in genomics."

I can't improve on just quoting from the article, so I'm going to quote a few sentences for each of the major developments highlighted, which will still take a bunch of space. Click through to the full article for details and links to the specific technologies mentioned.

"Genomic instrument companies such as Oxford Nanopore Technologies, PacBio, Singular, and Ultima have publicly announced using graphics processing units inside their sequencing platforms for AI-based base calling. These models span CNN, RNN, and transformer-based AI models, including DeepConsensus in PacBio's instruments which uses gap-aware sequence transformers to correct errors and enable read accuracy."

"AI has helped accelerate variant calling, variant filtering, and base calling in genomic instruments and analysis, but what about in other areas that include predictions? Large language models (LLMs) are AI models built on transformer architecture, and their application to DNA, RNA, and proteins is a burgeoning field in genomics."

"Compared to the vocabulary of 20 amino acids and an average sequence length of 350 amino acids for proteins, genomic LLMs operate on a vocabulary of four nucleotides and very long sequences -- the haploid human genome is three billion nucleotide pairs."

"At this year's SuperComputing conference, we shared the Gordon Bell special award with more than two dozen academic and commercial researchers from Argonne National Laboratory, the University of Chicago, and others. The honored work was a genomic LLM that tracks the genetic mutations and predicts variants of concern in SARS-CoV-2, the virus behind COVID-19. With anywhere from 2.5 to 25 billion trainable parameters, the Genome-Scale language models (GenSLMs) represent some of the first and largest whole genome LLMs trained on over 100 million nucleotide sequences."

"In September of this year, Nature featured a deep generative model focusing on regulatory DNA and predictions of lowest and highest levels of expression in yeast."

"Enformer -- released in 2021 -- is a deep learning model with a transformer architecture for genomic enhancers that predicts gene expression from DNA sequences and can integrate information from long-range interactions in the genome. This model helps scientists understand how noncoding DNA makes decisions about gene expression in different cell types, such as in skin, liver, and heart cells, among others."

"scBERT -- released in September 2022 -- is another groundbreaking genomic LLM that understands gene-gene interactions and is trained on large corpora of unlabeled scRNA-Seq data."

"DNABERT -- released in 2021 -- is another genomic LLM that understands nucleotide sequences and can make downstream predictions of promoters, splice sites, and transcription factor binding sites."
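
As I understand DNABERT's published design, it handles the tiny four-letter vocabulary mentioned earlier by tokenizing DNA into overlapping k-mers (it was released in variants with k from 3 to 6), so the vocabulary grows to 4^k possible tokens plus a few special tokens. A minimal sketch of that tokenization scheme, not DNABERT's actual code; the function names are hypothetical:

```python
from itertools import product
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k: int) -> List[str]:
    """All 4**k possible k-mers over A, C, G, T (special tokens not included)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]
```

A sequence of length L yields L - k + 1 tokens, so overlapping k-mers enlarge the vocabulary (4,096 six-mers) without shortening the very long sequences the earlier quote describes.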

Take the DNA Delorean: the promise of large language models in genomics

#solidstatelife #ai #nlp #llms #biology #genomics #proteomics