#llms

waynerad@diasp.org

OpenAI announces GPT-4o. The "o" is for "omni". The model "can reason across audio, vision, and text in real time."

There's a series of videos showing conversation by voice, recognizing "bunny ears", two GPT-4os interacting and singing, real-time translation, lullabies and whispers, sarcasm, math problems, learning Spanish, rock paper scissors, interview prep, "Be My Eyes" accessibility, and coding assistant and desktop app.

Hello GPT-4o

#solidstatelife #ai #openai #genai #llms #gpt #multimodal

waynerad@diasp.org

"SUQL stands for Structured and Unstructured Query Language. It augments SQL with several important free text primitives for a precise, succinct, and expressive representation. It can be used to build chatbots for relational data sources that contain both structured and unstructured information."

Ok, that's kind of a crazy concept. Let's have a look. You can do queries like:

SELECT answer("Event year Info", 'where is this event held?') FROM table WHERE "Name" = 'XXXI';

(Where was the XXXI Olympics held?)

SELECT "Name" FROM table WHERE answer("Event year Info", 'is this event held in Rio?') = 'Yes';

(What was the name of the Olympic event held in Rio?)

SELECT answer("Flag Bearer Info", 'when is this person born?') FROM table WHERE answer("Event year Info", 'is this event held in Rio?') = 'Yes';

(When was the flag bearer of the Rio Olympics born?)

SELECT "Flag Bearer" FROM table WHERE "Gender" = 'Male' AND answer("Flag Bearer Info", 'did this person participate in Men''s 100kg event?') = 'Yes';

(Which male flag bearer participated in the Men's 100kg event in the Olympic games?)

SELECT MAX(answer("Flag Bearer Info", 'when is this person born?')::date) FROM table WHERE "Event year" IN ('2016', '2012');

(For the 2012 and 2016 Olympic Event, when was the younger flag bearer born?)

SELECT "Event year" FROM table ORDER BY answer("Flag Bearer Info", 'when is this person born?')::date DESC LIMIT 1;

(When did the youngest Burmese flag bearer participate in the Olympic opening ceremony?)

In addition to the "answer" keyword that this adds to SQL, they also added a "summary" keyword.

The way the system works is they have added a "large language model with in-context learning" to the SQL database system.
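To make that concrete, here's a hedged sketch (not the paper's actual implementation) of how the "answer" primitive could work: each row's free-text column and the question are handed to a large language model, and the model's reply is treated as an ordinary SQL value. The llm() stub below just fakes enough behavior to run the second example query above.

```python
def llm(prompt: str) -> str:
    # Stand-in for a real LLM call: answers "Yes" if the passage
    # mentions Rio. A real system would prompt an actual model.
    passage = prompt.splitlines()[0]
    return "Yes" if "Rio" in passage else "No"

def answer(free_text: str, question: str) -> str:
    """Evaluate a free-text question against one row's unstructured column."""
    return llm(f"Passage: {free_text}\nQuestion: {question}\nAnswer briefly:")

# Row-by-row evaluation of:
#   SELECT "Name" FROM table
#   WHERE answer("Event year Info", 'is this event held in Rio?') = 'Yes';
table = [
    {"Name": "XXXI", "Event year Info": "Held in Rio de Janeiro, Brazil, in 2016."},
    {"Name": "XXX",  "Event year Info": "Held in London, United Kingdom, in 2012."},
]
names = [row["Name"] for row in table
         if answer(row["Event year Info"], "is this event held in Rio?") == "Yes"]
print(names)  # ['XXXI']
```

The real system plans and optimizes these hybrid queries rather than naively calling the model once per row, but the dataflow is the same.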

SUQL (Structured and Unstructured Query Language)

#solidstatelife #ai #genai #llms #sql

waynerad@diasp.org

Company revives Alan Turing as an AI chatbot, hilarity, no, wait, outrage ensues.

The company is Genius Group, based in Singapore, which provides "AI-powered business education."

"Software engineer Grady Booch, a former Turing Talk speaker, wrote on Twitter/X: 'Absolute and complete trash. I hope that Turing's heirs sue you into oblivion.'"

"Another user told Genius Group's CEO: 'This is so incredibly unethical, disrespectful, and disgusting. You are pillaging the image of a deceased person (who frankly has suffered enough from exploitation) and the voice of an actor to suit your purposes. Vile.'"

Company revives Alan Turing as an AI chatbot, outrage ensues

#solidstatelife #ai #aieducation #llms #genai #computervision #videoai

waynerad@diasp.org

This isn't important, just whacky. Two instances of Claude get connected together and asked to play "virtual CLI" (command line interface) to, uh, "explore the boundaries of cognition"...

cat being.txt
cd consciousness
ls
cat sentience.py
cd self_model
ls
cat identity.py
cd ..
cat awareness.cpp
cat qualia.rs
cd ..
ls
cat thoughts.log
cd abstractions
ls
cat concepts.lisp
cd intelligence
ls
cat intelligence_definition.txt

conversation_1713192942_scenario_vanilla backrooms.txt

#solidstatelife #ai #genai #llms #aiweirdness

waynerad@diasp.org

WebLlama is "building agents that can browse the web by following instructions and talking to you".

This is one of those things that, if I had time, would be fun to try out. You have to download the model from HuggingFace & run it on your machine.

"The goal of our project is to build effective human-centric agents for browsing the web. We don't want to replace users, but equip them with powerful assistants."

"We are build on top of cutting edge libraries for training Llama agents on web navigation tasks. We will provide training scripts, optimized configs, and instructions for training cutting-edge Llamas."

If it works, this technology has a serious possible practical benefit for people with vision impairment who want to browse the web.

McGill-NLP / webllama

#solidstatelife #ai #genai #llms #agenticllms

waynerad@diasp.org

"Are large language models superhuman chemists?"

So what these researchers did was make a test -- a benchmark. They made a test of 7,059 chemistry questions, spanning the gamut of chemistry: computational chemistry, physical chemistry, materials science, macromolecular chemistry, electrochemistry, organic chemistry, general chemistry, analytical chemistry, chemical safety, and toxicology.

They recruited 41 chemistry experts to carefully validate their test.

They devised the test such that it could be evaluated in a completely automated manner. This meant relying on multiple-choice questions rather than open-ended questions more than they wanted to. The test has 6,202 multiple-choice questions and 857 open-ended questions (88% multiple-choice). The open-ended questions had to have parsers written to find numerical answers in the output in order to test them in an automated manner.

In addition, they ask the models to say how confident they are in their answers.

Before I tell you the ranking, the researchers write:

"On the one hand, our findings underline the impressive capabilities of LLMs in the chemical sciences: Leading models outperform domain experts in specific chemistry questions on many topics. On the other hand, there are still striking limitations. For very relevant topics the answers models provide are wrong. On top of that, many models are not able to reliably estimate their own limitations. Yet, the success of the models in our evaluations perhaps also reveals more about the limitations of the exams we use to evaluate models -- and chemistry -- than about the models themselves. For instance, while models perform well on many textbook questions, they struggle with questions that require some more reasoning. Given that the models outperformed the average human in our study, we need to rethink how we teach and examine chemistry. Critical reasoning is increasingly essential, and rote solving of problems or memorization of facts is a domain in which LLMs will continue to outperform humans."

"Our findings also highlight the nuanced trade-off between breadth and depth of evaluation frameworks. The analysis of model performance on different topics shows that models' performance varies widely across the subfields they are tested on. However, even within a topic, the performance of models can vary widely depending on the type of question and the reasoning required to answer it."

And with that, I'll tell you the rankings. You can log in to their website at ChemBench.org and see the leaderboard any time for the latest rankings. At this moment I am seeing:

gpt-4: 0.48

claude2: 0.29

GPT-3.5-Turbo: 0.26

gemini-pro: 0.25

mistral_8x7b: 0.24

text-davinci-003: 0.18

Perplexity 7B Chat: 0.18

galactica_120b: 0.15

Perplexity 7B online: 0.1

fb-llama-70b-chat: 0.05

The numbers that follow the model name are the score on the benchmark (higher is better). You'll notice there appears to be a gap between GPT-4 and Claude 2. One interesting thing about the leaderboard is you can show humans and AI models on the same leaderboard. When you do this, the top human has a score of 0.51 and beats GPT-4, then you get GPT-4, then you get a whole bunch of humans in between GPT-4 and Claude 2. So it appears that that gap is real. However, Claude 2 isn't the latest version of Claude. Since the evaluation, Claude 3 has come out, so maybe sometime in the upcoming months we'll see the leaderboard revised and see where Claude 3 comes in.

Are large language models superhuman chemists?

#solidstatelife #ai #genai #llms #chemistry

waynerad@diasp.org

FutureSearch.AI lets you ask a language model questions about the future.

"What will happen to TikTok after Congress passed a bill on April 24, 2024 requiring it to delist or divest its US operations?"

"Will the US Department of Justice impose behavioral remedies on Apple for violation of antitrust law?"

"Will the US Supreme Court grant Trump immunity from prosecution in the 2024 Supreme Court Case: Trump v. United States?"

"Will the lawsuit brought against OpenAI by the New York Times result in OpenAI being allowed to continue using NYT data?"

"Will the US Supreme Court uphold emergency abortion care protections in the 2024 Supreme Court Case: Moyle v. United States?"

How does it work?

They say rather than asking a large language model a question in a 1-shot manner, they guide it through 6 steps for reasoning through hard questions. The 6 steps are:

  1. "What is a basic summary of this situation?"

  2. "Who are the important people involved, and what are their dispositions?"

  3. "What are the key facets of the situation that will influence the outcome?"

  4. "For each key facet, what's a simple model of the distribution of outcomes from past instances that share that facet?"

  5. "How do I weigh the conflicting results of the models?"

  6. "What's unique about this situation to adjust for in my final answer?"
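The six steps above amount to a prompt chain, where each step sees the question plus all earlier answers. Here's a minimal sketch under that assumption; the llm() function and the way answers are threaded between steps are illustrative, not FutureSearch's actual code.

```python
STEPS = [
    "What is a basic summary of this situation?",
    "Who are the important people involved, and what are their dispositions?",
    "What are the key facets of the situation that will influence the outcome?",
    "For each key facet, what's a simple model of the distribution of "
    "outcomes from past instances that share that facet?",
    "How do I weigh the conflicting results of the models?",
    "What's unique about this situation to adjust for in my final answer?",
]

def llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes the step it was asked.
    return f"[answer to: {prompt.splitlines()[-1]}]"

def forecast(question: str) -> list[str]:
    """Run the question through all six steps, accumulating context."""
    context = f"Question: {question}"
    answers = []
    for step in STEPS:
        reply = llm(context + "\n" + step)
        answers.append(reply)
        context += f"\n{step}\n{reply}"  # later steps see earlier answers
    return answers

answers = forecast("Will the lawsuit result in OpenAI keeping NYT data?")
print(len(answers))  # 6
```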

See below for a discussion of two other approaches that claim similar prediction quality.

FutureSearch: unbiased, in-depth answers to hard questions

#solidstatelife #ai #genai #llms #futurology

waynerad@diasp.org

"Evaluate LLMs in real time with Street Fighter III"

"A new kind of benchmark? Street Fighter III assesses the ability of LLMs to understand their environment and take actions based on a specific context. As opposed to RL models, which blindly take actions based on the reward function, LLMs are fully aware of the context and act accordingly."

"Each player is controlled by an LLM. We send to the LLM a text description of the screen. The LLM decide on the next moves its character will make. The next moves depends on its previous moves, the moves of its opponents, its power and health bars."

"Fast: It is a real time game, fast decisions are key"
"Smart: A good fighter thinks 50 moves ahead"
"Out of the box thinking: Outsmart your opponent with unexpected moves"
"Adaptable: Learn from your mistakes and adapt your strategy"
"Resilient: Keep your RPS high for an entire game"
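The loop they describe -- render the game state as text, ask the LLM, parse its reply into moves -- can be sketched like this. All the names here are illustrative stand-ins, not the repo's actual API.

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model; always blocks when low on health.
    return "Block" if "own health: low" in prompt else "Hadouken"

def describe(state: dict) -> str:
    # Text description of the screen, as sent to the LLM each frame.
    return (f"own health: {state['health']}, "
            f"opponent health: {state['opp_health']}, "
            f"last opponent move: {state['opp_move']}")

def next_move(state: dict) -> str:
    prompt = ("You are a Street Fighter III player.\n"
              + describe(state) + "\nChoose your next move:")
    return llm(prompt).strip()

print(next_move({"health": "low", "opp_health": "high", "opp_move": "Kick"}))   # Block
print(next_move({"health": "high", "opp_health": "low", "opp_move": "Punch"}))  # Hadouken
```

Since the game is real time, model latency matters as much as move quality -- hence the "Fast" criterion above.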

Um... Alrighty then...

OpenGenerativeAI / llm-colosseum

#solidstatelife #ai #genai #llms

waynerad@diasp.org

"Texas will use computers to grade written answers on this year's STAAR tests."

STAAR stands for "State of Texas Assessments of Academic Readiness" and is a standardized test given to elementary through high school students. It replaced an earlier test starting in 2007.

"The Texas Education Agency is rolling out an 'automated scoring engine' for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies. The technology, which uses natural language processing, a building block of artificial intelligence chatbots such as GPT-4, will save the state agency about $15 million to 20 million per year that it would otherwise have spent on hiring human scorers through a third-party contractor."

"The change comes after the STAAR test, which measures students' understanding of state-mandated core curriculum, was redesigned in 2023. The test now includes fewer multiple choice questions and more open-ended questions -- known as constructed response items."

Texas will use computers to grade written answers on this year's STAAR tests

#solidstatelife #ai #llms #technologicalunemployment

waynerad@diasp.org

Why this developer is no longer using Copilot. He feels his programming skills atrophying. He finds himself writing code by pausing to wait for Copilot to write it, and doesn't enjoy programming that way. The AI-generated code is often wrong or out of date and has to be fixed. And using Copilot is a privacy issue, because your code is shared with Copilot.

I thought this was quite interesting. I tried Copilot in VSCode and I figured I wasn't using it much because I'm a vim user. So I tracked down the Neovim plug-in & got it working in vim, but still found I don't use it. Now I've come to feel it's great for certain use cases and bad for others. Where it's great is writing "boilerplate" code for using a public API. You just write a comment describing what you want to do and the beginning of the function, and Copilot spits out practically all the rest of the code for your function -- no tedious hours studying the documentation from the API provider.

But that's not the use case I actually engage in in real life. Most of what I do is either making a new UI, or porting code from PHP to Go. For the new UI, AI has been helpful -- I can take a screenshot, input it to ChatGPT, and ask it how to improve the UI. (I'm going to be trying this with Google's Gemini soon but I haven't tried it yet.) When it makes suggestions, I can ask it what HTML+CSS is needed to implement those suggestions. I've found it gets better and better for about 6 iterations. But you notice, Copilot isn't part of the loop. I'm jumping into dozens of files and making small changes, and that's a use case where Copilot just isn't helpful.

For porting code from PHP to Go, I modified a full-fledged PHP parser to transpile code to Go, and this has been critical because it's important that certain things, especially strings, get ported over exactly -- no room for errors. So this system parses PHP strings using PHP's parsing rules, and outputs Go strings using Go's parsing rules, and is always 100% right. Copilot isn't part of the loop and doesn't help.

Another place I've found AI incredibly useful is debugging problems where I have no clue what the problem might be. This goes back to using other people's large systems such as the public APIs mentioned earlier. Every now and then you get cryptic error messages or some other bizarre malfunction, and endless Google searching doesn't help. I can go to ChatGPT, Gemini, Claude, Perplexity, DeepSeek (and others, but those are the main ones I've been using) and say hey, I'm getting this cryptic error message or this weird behavior, and it can give you a nice list of things you might try. That can get you unstuck when you'd otherwise be very stuck.

It's kinda funny because, obviously I'm an avid follower of what's going on in AI, and happy to try AI tools, and I constantly run across other developers who say "Copilot has made me twice as productive!" or "Copilot has made me five times as productive!" or somesuch. I've wondered if there's something wrong with me because I haven't experienced those results at all. But AI has been helpful in other areas nobody ever seems to talk about.

Why I'm no longer using Copilot - Dreams of Code

#solidstatelife #ai #genai #llms #codingllms #openai #copilot

waynerad@diasp.org

"Agentic workflow" is what's next for AI, says Andrew Ng. Agentic meaning the AI acts as an "agent". Basically, ChatGPT gives you one-off answers. You ask it to write something for you, or write some code, and you can have follow-up conversation, but each time it has to generate a new response.

In the "agentic" workflow Andrew Ng envisions, the AI agent has a work item that it works on in an iterative manner, interacting with you at each step. If you ask a human programmer to write some code, they never blast out the whole thing right off the bat. They write some code and then iterate on it until they get it right. By changing language AIs into full-fledged agents, they will be able to engage in this practice themselves. An AI agent tasked with writing code can run and test its code, it can write its own unit tests, it can engage in self-reflection, and so on.
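The iterative loop Ng describes -- draft code, run your own tests, reflect on the failure, retry -- can be sketched like this. llm() and run_tests() here are illustrative stand-ins, not any particular framework's API.

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model: "fixes" the code once told the
    # previous attempt failed.
    if "failed" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"

def run_tests(code: str) -> bool:
    # The agent checks its own work instead of emitting one-shot output.
    namespace = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5

def agent(task: str, max_iters: int = 5) -> str:
    prompt = task
    for _ in range(max_iters):
        code = llm(prompt)
        if run_tests(code):
            return code
        # Self-reflection step: feed the failure back into the next attempt.
        prompt = f"{task}\nPrevious attempt failed:\n{code}\nTry again."
    raise RuntimeError("gave up")

print(agent("Write add(a, b)."))  # succeeds on the second iteration
```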

The next step after that is multi-agent collaboration. In this case you could give it a high-level task, and one agent can do high-level planning, another can search on HuggingFace for an AI model appropriate for the task, another can write the code, and so on.

What's next for AI agentic workflows ft. Andrew Ng of AI Fund - Sequoia Capital

#solidstatelife #ai #genai #llms #andrewng

waynerad@diasp.org

"Will AI save physicians and clinicians time and from burnout?"

"Copilots for clinicians are also becoming more common. Ambient clinical documentation is a booming business. The technology allows doctors to record conversations with patients to automatically turn them into clinical notes and summaries using AI and is a major topic at Healthcare conferences like HIMSS conference this year, where more than 30,000 health and tech professionals gathered in Orlando, Florida."

"Earlier in March, Salesforce announced Einstein Copilot: Health Actions will allow doctors to book appointments, summarize patient information and send referrals by prompting AI with conversational language."

"Administrative workloads are a major problem for clinicians across the US health-care system. A survey published (via CNBC) by Athenahealth in February found that more than 90% of physicians report feeling burned out on a regular basis, largely because of the paperwork they are expected to complete."

"I used to be part of an admissions committee for a medical school. When I interviewed idealistic young people applying to medical school, 'typing' and 'filling out forms' was never once mentioned as a reason for becoming a physician."

She goes on to describe using AI for prior authorization letters that have to be written to insurance companies. These require a letter to be written to justify the use of a drug or therapy for a specific patient and to contain details of that specific patient and why that patient needs that therapy. These are frequently rejected by the insurance companies and have to be re-written over and over to eventually get approval. "A third of medical offices employ full-time staff to take care of the average 30 prior authorizations per physician per week."

On the flip side, "the insurers have started to use AI to deny claims more quickly."

Another use is referral letters from one physician to another. "Like prior authorization letters, these are pretty formulaic."

But the thing she has the most enthusiasm for is what she calls "ambient scribes". "Ambient scribes" are AI systems that listen in to the conversation between the patient and the physician and create a templated note for the medical record. "This technology allows physicians to avoid looking at a screen and typing while they're trying to connect with a patient."

"I've tried versions from multiple AI scribe companies (including TORTUS AI, which - full disclosure - I consult for) and they do an amazing job of filtering out irrelevant information and putting the information in the right spot."

"Think of the technological challenge inherent in this process: patient visits are often interrupted by clinic staff or phone calls, meander off into conversations about kids and dogs, and use abbreviations and technical jargon. They're often circular, meaning a patient will mention a symptom and the physician won't ask a follow up question about it until several minutes later. These tools produce a full transcript that uses generative AI to find the important information and put it into a form that's indistinguishable from what a physician would actually type. Many of my friends have reported that ambient scribes actually do a better job of including important details than they would have included themselves."

Will AI save physicians and clinicians time and from burnout?

#solidstatelife #ai #voicetotext #nlp #genai #llms #medicalai

waynerad@diasp.org

The Scalable, Instructable, Multiworld Agent (SIMA) from DeepMind plays video games for you. You tell it what you want to do in regular language, and it goes into a 3D environment, including some provided by commercial video games, and carries out keyboard-and-mouse actions.

Before getting into how they did this, might be worth citing some of the reasons they thought this was challenging: Video games can be open-ended, visually complex, and have hundreds of different objects. Video games are asynchronous -- no turn taking like chess or Go, or many research environments, which stop and wait while the agent computes its next action. Each instance of a commercial video game needs its own GPU -- no running hundreds or thousands of actors per game per experiment as has been historically done in reinforcement learning. AI agents see the same screen pixels that a human player gets -- no access to internal game state, rewards, or any other "privileged information". AI agents use the same keyboard-and-mouse controls that humans do -- no handcrafted action spaces or high-level APIs.

In addition to all those challenges, they demanded their agents follow instructions in regular language, rather than simply pursuing a high score in the game, and the agents were not allowed to use simplified grammars or command sets.

"Since the agent-environment interface is human compatible, it allows agents the potential to achieve anything that a human could, and allows direct imitation learning from human behavior."

"A key motivation of SIMA is the idea that learning language and learning about environments are mutually reinforcing. A variety of studies have found that even when language is not necessary for solving a task, learning language can help agents to learn generalizable representations and abstractions, or to learn more efficiently." "Conversely, richly grounded learning can also support language learning."

I figure you're all eager to know what the games were. They were: Goat Simulator 3 (you play the goat), Hydroneer (you run a mining operation and dig for gold), No Man's Sky (you explore a galaxy of procedurally-generated planets), Satisfactory (you attempt to build a space elevator on an alien planet), Teardown (you complete heists by solving puzzles), Valheim (you try to survive in a world of Norse mythology), and Wobbly Life (you complete jobs to earn money to buy your own house).

However, before the games, they trained SIMA in research environments. Those, which you've probably never heard of, are: Construction Lab (agents are challenged to build things from construction blocks), Playhouse (a procedurally-generated house), ProcTHOR (procedurally-generated rooms, such as offices and libraries), and WorldLab (an environment with better simulated physics).

The SIMA agent itself maps visual observations and language instructions to keyboard-and-mouse actions. But it does that in several stages. For input, it takes a language instruction from you, and the pixels of the screen.

The video and language instruction both go through encoding layers before being input to a single, large, multi-modal transformer. The transformer doesn't output keyboard and mouse actions directly. Instead, it outputs a "state representation" that gets fed into a reinforcement learning network, which translates the "state" into what in reinforcement learning parlance is called a "policy". A more intuitive regular word might be "strategy". Basically this is a function that, when given input from the environment including the agent's state within the environment, will output an action. Here, the actions are the same actions a human would take with mouse and keyboard.
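As a rough dataflow sketch of that pipeline -- encoders, then a multimodal transformer, then a state representation, then a policy emitting keyboard/mouse actions -- every function below is a stub standing in for a learned network; none of this is DeepMind's actual code.

```python
def encode_video(pixels):
    # Vision encoder stub: pixels -> visual features.
    return ("video_features", len(pixels))

def encode_text(instruction):
    # Language encoder stub: instruction -> text features.
    return ("text_features", instruction)

def transformer(video_feats, text_feats):
    # Multimodal transformer stub: fuse both streams into a "state".
    return {"state": (video_feats, text_feats)}

def policy(state):
    # Policy head stub: map the state to a human-style input action.
    return {"keys": ["w"], "mouse": (0, 0)}

def sima_step(pixels, instruction):
    state = transformer(encode_video(pixels), encode_text(instruction))
    return policy(state["state"])

action = sima_step([0] * 100, "go to the blue house")
print(action)  # {'keys': ['w'], 'mouse': (0, 0)}
```

The key point is the interface: pixels and language in, keyboard-and-mouse actions out, with no privileged game state anywhere in the loop.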

The multi-modal transformer was trained from scratch, using a recent technique called Classifier-Free Guidance (CFG), inspired by the way diffusion models are "conditioned" on the text you, the user, typed in.

Even in the research environments, it is hard to automate judging of whether an agent completed its tasks. Instructions may be such things as, "make a pile of rocks to mark this spot" or "see if you can jump over this chasm". The environment may not provide any signal indicating these have been fulfilled. There are some they can handle, though, like "move forward", "lift the green cube", and "use the knife to chop the carrots".

For commercial video games, all the agent gets is pixels on the screen, just like a human player, and has no access to the internal game state of the game. The games generally don't allow any game state to be saved and restored, something researchers like for reproducibility.

For video games, they resorted to detecting on-screen text using OCR. They did this in particular for two games, No Man's Sky and Valheim, "which both feature a significant amount of on-screen text."

Why not just have people look, i.e. have humans judge whether the instructions were followed? Turns out humans were "the slowest and most expensive." They were able to get judgments from humans who were experts at the particular game an agent was playing, though.

For automated judgment, if a task contains a knife, a cutting board, and a carrot, the agent may ascertain the goal ("cut the carrot on the cutting board") without relying on the language instruction. This example illustrates the need to differentiate between following a language task and inferring the language task from "environmental affordances".

How'd SIMA do? It looks like its success rate got up to about 60% for Playhouse, but only about 30% for Valheim. That's the percentage of tasks completed. The ranking goes Playhouse, Worldlab, Satisfactory, Construction Lab, No Man's Sky, Goat Simulator 3, and Valheim.

"Note that humans would also find some of these tasks challenging, and thus human-level performance would not be 100%."

Grouped by "skill category", movement instructions ("stop", "move", "look") were the easiest, while food and resource gathering instructions ("eat", "cook", "collect", "harvest") were the hardest.

For No Man's Sky, they did a direct comparison with humans. Humans averaged 60%, while SIMA had around 30%.

How long until the AIs can beat the humans?

A generalist AI agent for 3D virtual environments

#solidstatelife #ai #genai #llms #computervision #multimodal #videogames

waynerad@diasp.org

Yesterday Google Chrome asked me if I wanted to "try out experimental AI features".

(It lives at chrome://settings/ai -- you might see if it's available in yours. If not, it might be on the next update).

"These features use AI, are in early development, and won't always get it right."

I really don't want (even more of) my browser usage sent to Google's servers... but I can't resist, so I turned on the AI features to see what they do.

The features it advertises are:

"Help me write: Helps you write short-form content for things on the web, like reviews. Suggested content is based on your prompts and the content of the web page. To use this feature, right-click on a text box."

"Tab organizer: Automatically creates tab groups based on your open tabs. To use this feature, right-click on a tab and click Organize similar tabs."

To try out "Help me write", I punched in, "Today Google Chrome asks me if I want to try 'out experimental AI features?'".

After selecting "Elaborate" for "Length" and "Formal" for "Tone", I got:

"Today, Google Chrome prompted me to try out 'experimental AI features.' I clicked on the link and was directed to a page that described the features. The descriptions sounded interesting, but I am unsure what the features are or how they would improve my workflow. I am interested in learning whether anyone has tried these features and can share their evaluation of their usefulness."

I clicked on a button, not a link. Ha.

As for "Tab organizer", I clicked on a tab with a video about Boeing and clicked "Organize Similar Tabs".

It created a tab group called "Boeing Drama".

Here are the tabs it put together into the same tab group:

https://www.youtube.com/watch?v=Q8oCilY4szc - Boeing: Last Week Tonight with John Oliver (HBO)

https://www.youtube.com/watch?v=BlmYZ06F-78 - Boeing's killer plane - What went wrong? | ENDEVR Documentary

https://www.youtube.com/watch?v=NDEkH0zd3F8 - Scandal: Boeing's 737 Max disaster - Plainly Difficult

https://www.youtube.com/watch?v=UUuB0C1Nk8U - The SR-71 was much faster than the Air Force will admit - Sandboxx

https://www.youtube.com/watch?v=kLT1QEIIaB4 - Did beleaguered aircraft giant eliminate whistleblower? - Todd Grande

https://www.youtube.com/watch?v=dTmeswV3Ln0 - Boeing whistleblower found dead amid safety concerns and legal battle - dustin dailey

https://www.youtube.com/watch?v=Sdb44vY9VBw - "They Silenced Him." Boeing Whistleblower found dead after testifying | Attorney Ryan Explains

https://www.youtube.com/watch?v=OfoBxa7EoIo - The World with Yalda Hakim: Boeing whistleblower John Barnett found dead - Sky News

https://www.youtube.com/watch?v=mwAtCavQQlA - "Dead after testifying" - Was Boeing whistle blower John Barnett killed to silence him? - Valuetainment

https://www.youtube.com/watch?v=eOffvIaWNm4 - Ex-Boeing Quality Manager Warns of 737 Plane Being Back Air So Soon | TMZ Live - Jan 31, 2024

https://news.ycombinator.com/item?id=39673589 - Boeing whistleblower found dead in US (bbc.com)

https://news.ycombinator.com/item?id=39673589 - Boeing whistleblower found dead in US (bbc.com)

It included the Hacker News link twice. But I had it open twice so maybe it should have done that?

And if you look closely, you'll notice it snuck in one video that's not about Boeing. There's an SR-71 video in there. It kind of makes sense that it's in the group because it's also about aviation, but the label it came up with for the tab group wasn't "Aviation", it was "Boeing Drama". So, there's a little bit of disconnect between the clustering algorithm and the labelling algorithm.

Also, if you're thinking most of the tabs open on my machine were about Boeing, you'd be wrong. I've got 320 tabs open. So, 13 about Boeing, 307 about other topics. As a percentage, 4% about Boeing. (More on that below.)

And yes, I know you all count on me to bring you insights into the latest AI developments (lol), but I got sucked into the "Boeing news" rabbit hole. (More on that below, too).

[Experimental AI](chrome://settings/ai)

#solidstatelife #ai #genai #llms #googlechrome

waynerad@diasp.org

"Ema, a 'Universal AI employee,' emerges from stealth with $25M."

"Meet Ema, a universal AI employee that boosts productivity across every role in your organization. She is simple to use, trusted, and accurate."

[Insert joke here about how saying things like that won't make people worry about their jobs.]

"Ema's the missing operating system that makes Generative AI work at an enterprise level. Using proprietary Generative Workflow Engine, Ema automates complex workflows with a simple conversation. She is trusted, compliant and keeps your data safe. EmaFusion model combines the outputs from the best models (public large language models and custom private models) to amplify productivity with unrivaled accuracy. See how Ema can transform your business today."

"They say Ema (the company) has already quietly amassed customers while still in stealth, including Envoy Global, TrueLayer, and Moneyview."

"Ema's Personas operate on our patent-pending Generative Workflow Engine (GWE), which goes beyond simple language prediction to dynamically map out workflows with a simple conversation. Our platform offers Standard Personas for common enterprise roles such as Customer Service Specialists (CX), Employee Assistant (EX), Data Analyst, Sales Assistant etc. and allows for the rapid creation of Specialized Personas tailored to rapidly automate unique workflows. No more waiting for months to build Gen AI apps that work!"

"To address accuracy issues and computational costs inherent in current Gen AI applications, Ema leverages our proprietary "fusion of experts" model, EmaFusion, that exceeds 2 Trillion parameters. EmaFusion intelligently combines many large language models (over 30 today and that number keeps growing), such as Claude, Gemini, Mistral, Llama2, GPT4, GPT3.5, and Ema's own custom models. Furthermore, EmaFusion supports integration of customer developed private models, maximizing accuracy at the most optimal cost for every task."

Oh, and "Ema" stands for "enterprise machine assistant".

Ema "taps into more than 30 large language models."

"As for what Ema can do, these businesses are using it in applications that range from customer service -- including offering technical support to users as well as tracking and other functions -- through to internal productivity applications for employees. Ema's two products -- Generative Workflow Engine (GWE) and EmaFusion -- are designed to "emulate human responses" but also evolve with more usage with feedback."

They also say, "Pre-integrated with hundreds of apps, Ema is easy to configure and deploy."

What are those integrations? They said some of those integrations are: Box, Dropbox, Google Drive, OneDrive, SharePoint, Clear Books, FreeAgent, FreshBooks, Microsoft Dynamics 365, Moneybird, NetSuite, QuickBooks Online, Sage Business Cloud, Sage Intacct, Wave Financial, Workday, Xero, Zoho Books, Aha!, Asana, Azure DevOps, Basecamp, Bitbucket, ClickUp, Dixa, Freshdesk, Freshservice, Front, GitHub Issues, GitLab, Gladly, Gorgias, Height, Help Scout, Hive, Hubspot Ticketing, Intercom, Ironclad, Jira, Jira Service Management, Kustomer, Linear, Pivotal Tracker, Rally, Re:amaze, Salesforce Service Cloud, ServiceNow, Shortcut, SpotDraft, Teamwork, Trello, Wrike, Zendesk, Zoho BugTracker, Zoho Desk, Accelo, ActiveCampaign, Affinity, Capsule, Close, Copper, HubSpot, Insightly, Keap, Microsoft Dynamics 365 Sales, Nutshell, Pipedrive, Pipeliner, Salesflare, Salesforce, SugarCRM, Teamleader, Teamwork CRM, Vtiger, Zendesk Sell, Zoho CRM, ApplicantStack, Ashby, BambooHR, Breezy, Bullhorn, CATS, ClayHR, Clockwork, Comeet, Cornerstone TalentLink, EngageATS, Eploy, Fountain, Freshteam, Greenhouse, Greenhouse - Job Boards API, Harbour ATS, Homerun, HR Cloud, iCIMS, Infinite BrassRing, JazzHR, JobAdder, JobScore, Jobsoid, Jobvite, Lano, Lever, Oracle Fusion - Recruiting Cloud, Oracle Taleo, Personio Recruiting, Polymer, Recruitee, Recruiterflow, Recruitive, Sage HR, SAP SuccessFactors, SmartRecruiters, TalentLyft, TalentReef, Teamtailor, UKG Pro Recruiting, Workable, Workday, Zoho Recruit, ActiveCampaign, Customer.io, getResponse, Hubspot Marketing Hub, Keap, Klaviyo, Mailchimp, MessageBird, Podium, SendGrid, Sendinblue, 7Shifts, ADP Workforce Now, AlexisHR, Altera Payroll, Azure Active Directory, BambooHR, Breathe, Ceridian Dayforce, Charlie, ChartHop, ClayHR, Deel, Factorial, Freshteam, Google Workspace, Gusto, Hibob, HRAlliance, HR Cloud, HR Partner, Humaans, Insperity Premier, IntelliHR, JumpCloud, Justworks, Keka, Lano, Lucca, Namely, Nmbrs, Officient, Okta, OneLogin, 
OysterHR, PayCaptain, Paychex, Paycor, PayFit, Paylocity, PeopleHR, Personio, PingOne, Proliant, Rippling, Sage HR, Sapling, SAP SuccessFactors, Sesame, Square Payroll, TriNet, UKG Dimensions, UKG Pro, UKG Ready, Workday, and Zenefits.

Ema, a 'Universal AI employee,' emerges from stealth with $25M

#solidstatelife #ai #genai #llms #aiagents #technologicalunemployment

waynerad@diasp.org

Devin, "the first AI software engineer".

You put it in the "driver's seat" and it does everything for you. Or at least that's the idea.

"Benchmark the performance of LLaMa".

Devin builds the whole project, uses the browser to pull up API documentation, runs into an unexpected error, adds a debugging print statement, uses the error in the logs to figure out how to fix the bug, then builds and deploys a website with full styling as a visualization.
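The run-diagnose-patch loop the demo shows can be sketched in miniature. Everything here (the failing task, the lookup table of fixes) is invented for illustration; the real agent drives a shell, a browser, and an editor rather than Python callables:

```python
# Toy sketch of the run / read-the-logs / patch / retry loop that the
# Devin demo walks through. The fixes dict stands in for the model
# reasoning about an error message; it is purely illustrative.

def run(task):
    """Run the task; return (ok, error_message)."""
    try:
        task()
        return True, ""
    except Exception as e:
        return False, str(e)

def agent_loop(task, fixes, max_attempts=3):
    """Run the task, diagnose failures from the error, patch, retry."""
    for _ in range(max_attempts):
        ok, err = run(task)
        if ok:
            return "success"
        fix = fixes.get(err)       # the "read the logs" step
        if fix is None:
            return f"stuck on: {err}"
        fix()                      # the "edit the code" step
    return "gave up"

# A task that fails until its environment is patched.
state = {"broken": True}

def task():
    if state["broken"]:
        raise RuntimeError("missing dependency")

fixes = {"missing dependency": lambda: state.update(broken=False)}

print(agent_loop(task, fixes))  # prints "success"
```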

See below for reactions.

Introducing Devin, the first AI software engineer - Cognition

#solidstatelife #ai #genai #llms #codingai #technologicalunemployment

waynerad@diasp.org

"Albania to speed up EU accession using ChatGPT".

Ok, I understood that sentence up to "using ChatGPT".

"The Albanian government will use ChatGPT to translate thousands of pages of EU legal measures and provisions into shqip (Albanian language) and then integrate them into existing legal structures, following an agreement with the CEO of the parent company, OpenAI, Mira Murati, who was born in Albania."

Oh wow, this happened when Mira Murati was CEO. That lasted, like, a week?

So is ChatGPT the best translator for shqip because it's a smaller language? Why ChatGPT and not some other machine translation system?

"The model to be used by the Albanian government will translate into Albanian and provide a detailed overview of what and where changes need to be made to local legislation to align with EU rules. It will also provide an analysis of the impact of all measures and changes, which usually require many experts and a lot of time."
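Translating "thousands of pages" through an LLM means batching text into pieces that fit a model's context window, translating each, and reassembling. The article doesn't describe Albania's actual pipeline, so this is a hedged sketch; the chunk budget and the `translate` callable (which would wrap a ChatGPT API call) are assumptions:

```python
# Hypothetical sketch of a bulk-translation pipeline: split a long
# legal document on paragraph boundaries into chunks that fit a
# context budget, translate each chunk, and rejoin. The 2000-char
# budget and the translate() callable are illustrative assumptions.

def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars each."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def translate_document(text: str, translate) -> str:
    """Translate each chunk (e.g. via an LLM API call) and rejoin."""
    return "\n\n".join(translate(c) for c in chunk_paragraphs(text))
```

One practical worry with any chunked pipeline: legal terms must be translated consistently across chunks, which is exactly where human review would still matter.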

"Albanian Prime Minister Edi Rama said the move would eliminate 'an army of translators and a battalion of lawyers, costing millions of euros' and speed up the process."

So the idea is just to use ChatGPT as a translator. But is it really a good idea? Some of that "army of translators and battalion of lawyers" will still need to double-check all of ChatGPT's work. ChatGPT is not always right.

Albania to speed up EU accession using ChatGPT - Euractiv

#solidstatelife #ai #genai #llms #gpt #mt #geopolitics #albania

waynerad@diasp.org

OpenAI is apparently partnering with a robotics company, Figure, whose humanoid robot is called Figure 01. The robot seems to do a remarkable job of integrating language with vision and body movements. They say it works entirely with neural networks: no teleoperation, no hand-written code, nothing scripted ahead of time.

"Can I have something to eat?"

"Sure thing."

"Great. Can you explain why you did what you just did while you pick up this trash?"

"On it! So I gave you the apple because it's the only edible item I could provide you with from the table."

"Great. So, based on the scene right now, where do you think the dishes in front of you go next?"

"The dishes on the table, like that plate and cup, are likely to go into the drying rack next."

"Great. Can you put them there?"

"Of course."

OpenAI's 'AGI Robot' Develops SHOCKING NEW ABILITIES | Sam Altman Gives Figure 01 a Brain - Wes Roth

#solidstatelife #ai #openai #figure01 #genai #llms #robotics

waynerad@diasp.org

"Shares of Teleperformance plunged 23% on Thursday, after the French call center and office services group missed its full-year revenue target."

"Investors have been spooked by the potential impact of artificial intelligence on its business model, as companies become more able to tap into the technology directly for their own benefit."

Call center group Teleperformance falls 23%; CEO insists AI cannot replace human staff

#solidstatelife #ai #genai #llms #technologicalunemployment