#llms

waynerad@diasp.org

Grunty is a "self-hosted desktop app to have AI control your computer, powered by the new Claude computer use capability. Allow Claude to take over your laptop and do your tasks for you (or at least attempt to, lol). Written in Python, using PyQt."

"If it wipes your computer, sends weird emails, or orders 100 pizzas... that's on you."

Grunty

#solidstatelife #ai #llms #genai #agenticai #anthropic

waynerad@diasp.org

OpenAI o1 isn't as good as an experienced professional programmer, but... "the set of tasks that o1 can do is impressive, and it's becoming more and more difficult to find easily demonstrated examples of things it can't do."

"There's a ton of things it can't do. But a lot of them are so complicated they don't really fit in a video."

"There are a small number of specific kinds of entry level developer jobs it could actually do as well, or maybe even better, than new hires."

Carl of "Internet of Bugs" recounts how he spent the last 3 weeks experimenting with the o1 model to try to find its shortcomings.

"I've been saying for months now that AI couldn't do the work of a programmer, and that's been true, and to a large extent it still is. But in one common case, that's less true than it used to be, if it's still true at all."

"I've worked with a bunch of new hires that were fresh out with CS degrees from major colleges. Generally these new hires come out of school unfamiliar with the specific frameworks used on active projects. They have to be closely supervised for a while before they can work on their own. They have to be given self-contained pieces of code so they don't screw up something else and create regressions. A lot of them have never actually built anything that wasn't in response to a homework assignment.

"This o1 thing is more productive than most, if not all, of those fresh CS graduates I've worked with.

"Now, after a few months, the new grads get the hang of things, and from then on, for the most part, they become productive enough that I'd rather have them on a project than o1."

"When I have a choice, I never hire anyone who only has an academic and theoretical understanding of programming and has never actually built anything that faces a customer, even if they only built it for themselves. But in the tech industry, many companies specifically create entry-level positions for new grads."

"In my opinion, those positions where people can get hired with no practical experience, those positions were stupid to have before and they're completely irrelevant now. But as long as those kinds of positions still exist, and now that o1 exists, I can no longer honestly say that there aren't any jobs that an AI could do better than a human, at least as far as programming goes."

"o1 Still has a lot of limitations."

Some of the limitations he cited were writing tests and writing a SQL RDBMS in Zig.

ChatGPT-O1 Changes Programming as a Profession. I really hated saying that. - Internet of Bugs

#solidstatelife #ai #genai #llms #codingai #openai #technologicalunemployment

waynerad@diasp.org

"Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku."

For safety reasons, the last thing we'd allow an AI to do is take full control over a computer, looking at the screen and typing keys and moving the mouse and doing mouse clicks, just like a human, enabling it to do literally everything on a computer a human can do. Oh wait...

"Available today on the API, developers can direct Claude to use computers the way people do -- by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental -- at times cumbersome and error-prone. We're releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time."

"Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete. For example, Replit is using Claude 3.5 Sonnet's capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they're being built for their Replit Agent product."

But unlike me, everyone else seems to be reacting very positively.

"It doesn't get said enough: Not only is Claude the most capable LLM, but they also have the best character. Great work Claude and Team!"

"Just imagine the accessibility possibilities. For those with mobility or visual impairments, Claude can assist with tasks by simply asking, like helping in usage with apps and systems that often lack proper accessibility features."

That's a good point, actually.

Still, you might want to run it in a VM for now?

"Wow, this is going to be quite game-changing!"

"Impressive to see Claude navigating screens like a human! Though still in beta, this could be a game-changer for automating tedious tasks. Can't wait to see how it develops!"

"What I found particularly noteworthy in this demo was that the information wasn't copied from the CRM, but typed letter by letter. Purely speculating, but perhaps because there are rare cases where websites do not accept copied input, which often also affects password managers."

"This is RPA-like functionality. Wow, Will this be a game-changer?"

RPA stands for Robotic Process Automation.

"What are the security implications of this? Could a bad actor use this to ask Claude to go into other people's computers and access their confidential information?"

Ok, at least one person besides me is feeling a little worry.

"That's epic, you guys have the best AI. This company is something special."

"Computer Use is truly a pivotal advancement. Enabling AI to interact with computers like humans do is a significant leap towards AGI. Exciting times ahead!"

"Looks like Siri on screen awareness but two (or more) years early and available for use now (but meanwhile, on server.) WOW. Well done guys."

"Absolutely incredible -- Super excited to build with this & see what others build!"

"Immediately prompting: 'Do all my work'"

If Claude can do all your work, why will you get paid?

"This could be huge for companies struggling with legacy systems and modernization."

"This is one more pivotal point in AI's evolution. In 2025, more innovation and use cases will emerge, and human involvement is slowly being eliminated. It looks like a small improvement, but it's huge at its core and will significantly impact how AI will be used in a few years. Kudos Claude Team!"

Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

#solidstatelife #genai #llms #agenticai #anthropic

waynerad@diasp.org

"LegalFast: Create legal documents, fast."

"Not using AI."

"LegalFast uses AI to power some functionality, but there's a difference between using AI as a tool and having ChatGPT generate complete documents."

So there you have it: Uses AI, but doesn't use AI. I wonder if this is going to become a thing.

Personally, I think a lot of what determines whether AI is appropriate is the reliability requirement. AI is great for things like brainstorming where you only need one great idea and it can generate some bad ones. AI would be bad to generate software for a spacecraft or a medical device. What reliability is required for legal documents?

LegalFast | Create legal documents fast

#solidstatelife #ai #genai #llms #aiethics

waynerad@diasp.org

"BabyAGI 2o is an exploration into creating the simplest self-building autonomous agent. Unlike its sibling project BabyAGI 2, which focuses on storing and executing functions from a database, BabyAGI 2o aims to iteratively build itself by creating and registering tools as required to complete tasks provided by the user. As these functions are not stored, the goal is to integrate this with the BabyAGI 2 framework for persistence of tools created."

The naming might be confusing. OpenAI came out with a model called "o1", and the name "2o" might get you thinking this BabyAGI is using the "o" model. That's not the case.

What this is, is a variant of BabyAGI 2 that automatically installs anything it likes and runs code generated by LLMs, continuously trying to update itself and its tools in order to accomplish whatever task you give it. It works with a variety of LLMs -- it uses a system called LiteLLM that lets you choose between more than 100 LLMs. It tries to do everything without human intervention, so when errors happen, it will try to learn from them and continue iterating toward task completion.
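
As a sketch of that iterate-on-errors loop (my own illustration using the real LiteLLM package, with a made-up prompt -- babyagi-2o's actual loop also creates and registers tools, which is omitted here):

```python
import traceback
from litellm import completion  # assumes an API key for the chosen model is configured

def attempt(task: str, max_iterations: int = 5) -> None:
    error = ""
    for _ in range(max_iterations):
        prompt = f"Write Python code to accomplish this task: {task}"
        if error:
            prompt += f"\nYour previous attempt failed with:\n{error}\nFix it."
        code = completion(model="gpt-4o",
                          messages=[{"role": "user", "content": prompt}],
                          ).choices[0].message.content
        try:
            exec(code, {})  # runs LLM-generated code with no sandbox, as babyagi-2o does
            return          # completed without raising; stop iterating
        except Exception:
            error = traceback.format_exc()  # learn from the error and try again
```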

babyagi-2o

#solidstatelife #ai #genai #llms #agenticai

waynerad@diasp.org

A "prompt testing exercise" in India "involving Meta AI, Gemini, Co-pilot and Adobe Firefly" showed "Meta AI's text-to-image generation feature is being weaponised to create harmful AI images targeting Muslims in India by reinforcing negative stereotypes."

I'm reminded that we don't have the "real" LLMs. All the companies have LLMs that can do things that the public LLMs we have access to can't, because of the "guardrail adding" effort. So any time you say, "LLMs can't do X", well, maybe they can, just not the public ones you're using.

Exclusive: Meta AI's text-to-image feature weaponised in India to generate harmful imagery

#solidstatelife #ai #genai #llms #computervision

waynerad@diasp.org

This research paper, published May 5th of this year, says "People cannot distinguish GPT-4 from a human in a Turing Test". Can we declare May 5, 2024, as the date machines passed the Turing Test?

I asked ChatGPT to tell me what all the implications are of AI passing the Turing Test. (Is it ironic that I'm asking an AI that passed the Turing Test what the implications are of AI passing the Turing Test?)

It said, for "Philosophical and ethical implications", that we'd have to redefine what it means to be "intelligent", what "human intelligence" means, and what it means to be "conscious", and that the ability to simulate human conversation could lead to ethical dilemmas (deceptive automated customer service systems or deceptive medical or legal automated systems).

For "Social implications", it said "Impact on employment", especially roles that involve interpersonal communication (e.g., customer service, therapy, teaching), "AI in media and entertainment" -- writing novels, creating art, or generating music -- and "Public trust and misinformation" such as deepfakes and fake identities.

For "Legal and regulatory implications", it said "Legal accountability" -- who is accountable for an AI's actions -- "Regulation and oversight" especially in sectors where trust and human judgment are paramount (e.g., healthcare, legal advice, financial trading), and "Personhood and rights" -- does AI deserve rights?

Under "Technological implications", "Advances in human-AI interaction" -- sophisticated, seamless, and natural interaction using language -- resulting in personal assistants, customer service, and virtual companions, "Enhanced autonomous systems" (self driving cars, really?), and "AI as creative agents", but not just content creation, also "emotional work" such as therapy and help with decision-making

Under "Economic implications", "Market disruption", disruption of many industries, particularly those reliant on human communication and customer service, "Increased AI investment", well that's certainly happened, hasn't it, look how many billions OpenAI spends per year, but people will seek to capitalize on AI in specific sectors, e.g., healthcare, education, finance, and "AI-driven Productivity", particularly "in sectors where human-like interaction or decision-making is critical".

Under "Cultural Implications", it listed "Changing social interactions", meaning developing social bonds with AI entities, and over-reliance on AI, and "Education and knowledge", transform education and enabling personalized learning. (Except the point of education isn't really learning, it's credentials, but that's a story for another time).

Under "Security implications", it listed "Cybersecurity threats", especially social engineering attacks (e.g., phishing, fraud) enabled by AI's conversational abilities, and "Autonomous decision-making in security", in areas like national defense or policing, where bias could be a problem.

And finally under "Scientific implications", it listed "Advances in cognitive science", how understanding and building AI that can pass the Turing Test might yield insights into human cognition and intelligence -- eh, not yet, not that I've seen anyway -- and "AI in research", with AI taking on hypothesis generation, data analysis, or even autonomous experimentation.

I put the same question to Google's Gemini and (after the disclaimer that "The assertion that AI has recently passed the Turing Test is debatable") it... mostly listed the same items with slightly different categorization. The one new item it put in was "New benchmarks for AI", "Passing the Turing Test may necessitate the development of new, more comprehensive tests to evaluate AI's capabilities beyond just mimicking human conversation." That's a good point, Gemini.

I put the same question to Claude and it listed the same items, as short points, inviting me to ask it to elaborate.

I asked Meta.AI (from the company formerly known as Facebook), but it didn't seem to yield any new items.

I asked Grok (from X, Elon Musk's company), and it gave me the same list without any categorization.

I asked Perplexity and it mentioned "Multimodal AI development": "Success in language-based Turing Tests could accelerate progress in other areas of AI, such as visual reasoning and abstract problem-solving." Similarly under "Scientific and research implications", it listed "Shift in AI research focus: Passing the Turing Test might redirect AI research towards other aspects of intelligence beyond language processing." It also listed "Interdisciplinary collaboration": "There would likely be increased collaboration between AI researchers, cognitive scientists, and ethicists."

Perplexity also added "New Business Models: Industries could emerge around creating, managing, and regulating human-like AI interactions". Other systems highlighted increased investment in existing "sectors".

I also put the question to DeepSeek, the Chinese LLM, which gave a similar list but put legitimacy and ethical considerations first. It was also the first to mention "Data privacy": "Increased reliance on AI systems" "may raise concerns about data privacy and the collection of sensitive information."

Finally, I put the question to ChatGLM, another Chinese LLM, which, under "Educational Implications", added "Pedagogical shifts: Educators may need to adapt their teaching methods to incorporate AI that can engage in dialogue with students." Also under "Security implications", it listed "Defense and military": "The use of AI in military applications could become more sophisticated, leading to new arms races."

People cannot distinguish GPT-4 from a human in a Turing test

#solidstatelife #ai #llms #genai #turingtest

waynerad@diasp.org

"Hacker plants false memories in ChatGPT to steal user data in perpetuity"

Flaw in long-term memory in chatbots that try too hard to be personal assistants?

"Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing -- all of which could be created by a malicious attacker."

Hacker plants false memories in ChatGPT to steal user data in perpetuity

#solidstatelife #ai #genai #llms #cybersecurity

waynerad@diasp.org

Looks like OpenAI has an answer to Claude's "Artifacts", which they call "Canvas".

"People use ChatGPT every day for help with writing and code. Although the chat interface is easy to use and works well for many tasks, it's limited when you want to work on projects that require editing and revisions. Canvas offers a new interface for this kind of work."

"With canvas, ChatGPT can better understand the context of what you're trying to accomplish. You can highlight specific sections to indicate exactly what you want ChatGPT to focus on. Like a copy editor or code reviewer, it can give inline feedback and suggestions with the entire project in mind."

"You control the project in canvas. You can directly edit text or code. There's a menu of shortcuts for you to ask ChatGPT to adjust writing length, debug your code, and quickly perform other useful actions. You can also restore previous versions of your work by using the back button in canvas."

"Coding shortcuts include:"
"Review code: ChatGPT provides inline suggestions to improve your code."
"Add logs: Inserts print statements to help you debug and understand your code."
"Add comments: Adds comments to the code to make it easier to understand."
"Fix bugs: Detects and rewrites problematic code to resolve errors."
"Port to a language: Translates your code into JavaScript, TypeScript, Python, Java, C++, or PHP."

Wait, into PHP? No, no, no, you should only be translating code out of PHP. PHP is one of the worst languages ever. It might even be worse than JavaScript.

No Go. Alrighty, let's continue.

"A second challenge involved tuning the model's editing behavior once the canvas was triggered -- specifically deciding when to make a targeted edit versus rewriting the entire content. We trained the model to perform targeted edits when users explicitly select text through the interface, otherwise favoring rewrites. This behavior continues to evolve as we refine the model."

It seems to me like this could be a first step in transforming "coders" into "managers" who "manage" an AI system that actually does the code writing.

For those of you who aren't coders and use regular language, they say:

"Writing shortcuts include:"
"Suggest edits: ChatGPT offers inline suggestions and feedback."
"Adjust the length: Edits the document length to be shorter or longer."
"Change reading level: Adjusts the reading level, from Kindergarten to Graduate School."
"Add final polish: Checks for grammar, clarity, and consistency."
"Add emojis: Adds relevant emojis for emphasis and color."

Introducing canvas

#solidstatelife #ai #genai #llms #chatgpt #openai

waynerad@diasp.org

"Scaling up self-attention inference."

This webpage outlines the mathematics behind the "attention" mechanism used in large language models, then describes a new mathematical technique that allows the context window of a large language model to be split into pieces that can be computed independently and then combined. The end result is the same as computing the "attention" results from the entire context window.

This should enable large language models (LLMs) to continue to have larger and larger context windows, because now the computation requirement scales logarithmically with the size of the context window instead of linearly. So each time you increase your CPUs and GPUs by some linear increment, you double the size of the context window you can handle.
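
The webpage's own derivation isn't reproduced here, but the standard trick for combining independently computed attention pieces is to carry a running max and running softmax denominator per chunk. A sketch (my own illustration, not the article's code):

```python
import numpy as np

def chunked_attention(q, K, V, chunk=4):
    # m = running max of scores (numerical stability), s = running softmax
    # denominator, o = running weighted sum of values.
    m, s, o = -np.inf, 0.0, np.zeros(V.shape[1])
    for i in range(0, K.shape[0], chunk):
        scores = K[i:i+chunk] @ q          # attention scores for this chunk only
        m_new = max(m, scores.max())
        correction = np.exp(m - m_new)     # rescale the old accumulators
        p = np.exp(scores - m_new)
        s = s * correction + p.sum()
        o = o * correction + p @ V[i:i+chunk]
        m = m_new
    return o / s

# Check: identical to attention over the entire context window at once.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
w = np.exp(K @ q - (K @ q).max())
assert np.allclose(chunked_attention(q, K, V), (w / w.sum()) @ V)
```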

Scaling up self-attention inference

#solidstatelife #ai #genai #llms #transformers

waynerad@diasp.org

OpenAI o1 is so smart, humans are no longer smart enough to create test questions to measure how smart it is. Discussion between Alan D. Thompson and Cris Sheridan. OpenAI o1 beats PhD-level experts across the board on tests we humans have made to test how intelligent other humans are. PhD-level humans are trying to come up with new questions, but it is hard for other PhD-level humans to understand the questions and verify the answers.

OpenAI reset the numbering, instead of continuing with the "GPT" series, because they think this is a new type of model. The "o" actually just means "OpenAI" so when I say "OpenAI o1", I'm really saying "OpenAI OpenAI 1".

You might think, if this is a new type of model, we'd know what type of model it is. Nope. OpenAI has not told us anything. We don't know what the model architecture is. We don't know how many parameters it has. We don't know how much compute was used to train it, or how much training data it used. We don't know what token system is used or how many tokens.

All we really know is that "chain-of-thought" reasoning has been built into the model in a way previous models never had built into them. (Called "hidden chain of thought", but not necessarily hidden -- you are allowed to see it.) This "chain-of-thought" system is guided by reinforcement learning in some way, but we don't know how that works.

The "system card" that OpenAI published mainly focuses on safety tests. Jailbreak evaluations, hallucinations, fairness and bias, hate speech, threats, and violence, chain-of-thought deception, self-knowledge, theory of mind, political persuasion, "capture-the-flag" (CTF) computer security challenges, reverse engineering, network exploits, biological threat creation.

It has some evaluation of "agentic" tasks (things like installing Docker containers), and multi-lingual capabilities.

Anyway, OpenAI is called "Open" AI but is becoming increasingly secretive.

That and we appear to have entered a new era where AI systems are smarter than the humans that make the tests to test how smart they are.

Interview about AI - Dr Alan D. Thompson on OpenAI's New o1 Model Is a Really Big Deal (Sep/2024) - Dr Alan D. Thompson

#solidstatelife #ai #genai #llms #agi

waynerad@diasp.org

Question: "When will an AI achieve a 98th percentile score or higher in a MENSA admission test?"

Sept. 2020: 2042 (22 years away)
Sept. 2021: 2031 (10 years away)
Sept. 2022: 2028 (6 years away)
Sept. 2023: 2026 (3 years away)

Resolved September 12, 2024

These are the median prediction times from Metaculus, an online forecasting platform, based on 275 predictions.

The AI did it via a law test.

MENSA considers a 95% score on the Law School Admission Test (LSAT) to correspond to a 98% score on a general IQ test.

OpenAI released o1, which scores 95.6% on the LSAT "raw score", which is above the threshold.

I'd be interested to see when an AI system could pass the 98% threshold on one of MENSA's regular IQ tests, though.

When will an AI achieve a 98th percentile score or higher in a Mensa admission test?

#solidstatelife #ai #genai #llms #mensa

waynerad@diasp.org

llm-interpolate interpolates between embeddings. As an example of what that means, if you ask it to interpolate between "MyRapSong.wav" and "MyContrySong.wav" (misspelled?) with five intermediate points, it gives you:

"MyRapSong.wav",
"HipHopMeetsCountry.wav",
"SmoothCountryRap.wav",
"CountryVibes.wav",
"MyCountrySong.wav"

It's just interpolating filenames? Not generating the actual songs?

Ok, weird idea.

If you try this, let me know how it goes. It looks like you have to install an llm embed tool for it to work.
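
Mechanically, "interpolating between embeddings" presumably looks something like this sketch -- the random stand-in embeddings and the choice of spherical interpolation are my assumptions, not necessarily what llm-interpolate does:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in embeddings; a real setup would embed each file with an embedding model.
catalog = {name: rng.normal(size=8) for name in
           ["MyRapSong.wav", "HipHopMeetsCountry.wav", "SmoothCountryRap.wav",
            "CountryVibes.wav", "MyCountrySong.wav"]}

def slerp(a, b, t):
    # Spherical interpolation: walk along the arc between two unit vectors.
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a @ b, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def nearest(vec):
    # Map an interpolated point back to the closest known item (cosine similarity).
    return max(catalog, key=lambda k: vec @ catalog[k] / np.linalg.norm(catalog[k]))

start, end = catalog["MyRapSong.wav"], catalog["MyCountrySong.wav"]
print([nearest(slerp(start, end, t)) for t in np.linspace(0, 1, 5)])
```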

#solidstatelife #ai #genai #llms #embeddings

https://github.com/vagos/llm-interpolate

waynerad@diasp.org

DataGemma refers to a new family of "Gemma" models from Google that are integrated with Data Commons, which is supposed to "reduce hallucination".

"Data Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables. It sources this public information from trusted organizations like the United Nations (UN), the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC) and Census Bureaus. Combining these datasets into one unified set of tools and AI models empowers policymakers, researchers and organizations seeking accurate insights."

I clicked over to the Data Commons website and clicked "Demographics", then "What languages are spoken at home in California". It gave me a page, "California: Non-English Languages", which said: Spanish 10.5M, Chinese 1.12M, Tagalog 788K, Vietnamese 559K, Korean 360K, Arabic 205K, Hindi 203K, ... Hmm, I would not have expected a language from the Philippines (Tagalog) to exceed a language from India (Hindi).

Anyway, in case you're wondering how the integration works, there are two methods: RIG and RAG.

"RIG (Retrieval-Interleaved Generation) enhances the capabilities of our language model, Gemma 2, by proactively querying trusted sources and fact-checking against information in Data Commons. When DataGemma is prompted to generate a response, the model is programmed to identify instances of statistical data and retrieve the answer from Data Commons."

"RAG (Retrieval-Augmented Generation) enables language models to incorporate relevant information beyond their training data, absorb more context, and enable more comprehensive and informative outputs. With DataGemma, this was made possible by leveraging Gemini 1.5 Pro's long context window. DataGemma retrieves relevant contextual information from Data Commons before the model initiates response generation, thereby minimizing the risk of hallucinations and enhancing the accuracy of responses."

DataGemma: Using real-world data to address AI hallucinations

#solidstatelife #ai #genai #llms #gemini #gemma

waynerad@diasp.org

InstaDeep, a startup from Tatooine, was acquired for $682 million by BioNTech -- er, make that Tataouine, the city in Tunisia that Tatooine, Luke Skywalker's home planet in the fictional universe of Star Wars, was named after.

"When the Covid-19 pandemic ground the world to a halt, InstaDeep trained a large language model to accurately predict new, dangerous variants before they spread."

"We took transformers already pre-trained on all proteins that existed and then altered the language of SARS-CoV-2. We tried to see if we could predict to some level of accuracy whether or not something could be high-risk. It turned out that it worked. All the variants we identified as potentially dangerous were later confirmed as concerning by the World Health Organization."

How InstaDeep became Africa’s biggest AI startup success

#solidstatelife #ai #genai #llms #medicalai

waynerad@diasp.org

The future of polling: AI chatbots predict elections better than humans.

"In a closely watched New York Democratic primary in June, centrist George Latimer ousted incumbent Jamaal Bowman by a wide margin of 58.7% to 41.3%."

"Ahead of the vote, two 19-year-old college dropouts in Manhattan conducted a poll that accurately predicted the results within 371 votes. Their secret? They didn't survey a single person. Instead, they asked thousands of AI chatbots which candidate they preferred."

"For election results, a seven-person company called Aaru uses census data to replicate voter districts, creating AI agents essentially programmed to think like the voters they are copying. Each agent is given hundreds of personality traits, from their aspirations to their family relationships. The agents are constantly surfing the internet and gathering information meant to mimic the media diets of the humans they're replicating, which sometimes causes them to change their voting preferences."

A disturbing implication of this is that all of our voting choices are a more-or-less straightforward consequence of our media diets. So the real choice isn't who to vote for, it's what media to pay attention to.

No people, no problem: AI chatbots predict elections better than humans

#solidstatelife #ai #genai #llms #futurology

waynerad@diasp.org

"Homeworkify by Tutor AI: Your AI-powered learning companion"

I was interested to know whether this would give you answers and explanations, like ChatGPT etc. (which are notorious for being used to do homework), or whether, instead of just directly giving you answers and explanations, it would try to guide you to figuring out the answer on your own, the way a good human tutor would. To find out, I figured I would need an actual homework problem, so I cracked open my chemistry book and grabbed a problem:


An aqueous solution has an iodine concentration of 2.00 * 10^-3 mol/L. Calculate the percentage of iodine remaining in the aqueous phase after extraction of 0.100 L of this aqueous solution with 0.050 L of CCl4 at 25 degrees C. The equilibrium constant K for [I2]CCl4 / [I2]aq = 85 at 25 degrees C.


Here's the reply:


Let's break down this problem step by step to find the percentage of iodine remaining in the aqueous phase after the extraction.

Step 1: Define the Equilibrium Constant

The equilibrium constant K for the partitioning of iodine between carbon tetrachloride (CCl4) and water is given by the expression:

K = [I2]CCl4 / [I2]aq = 85

Here [I2]CCl4 is the concentration of iodine in the CCl4 phase and [I2]aq is the concentration of iodine in the aqueous phase.

Step 2: Initial Concentrations

The initial concentration of iodine in the aqueous solution is [I2]aq,initial = 2.00 x 10^-3 mol/L.

The initial volume of the aqueous solution is V_aq = 0.100 L.

The volume of CCl4 used for extraction is V_CCl4 = 0.050 L.

Step 3: Set Up the Equilibrium Expressions

Let x be the amount of iodine (in moles) that transfers from the aqueous phase to the CCl4 phase.

The iodine concentration in the aqueous phase after extraction is:

[I2]aq,final = (2.00 x 10^-3 mol/L x 0.100 L - x) / 0.100 L

The iodine concentration in the CCl4 phase will be:

[I2]CCl4 = x / 0.050 L

Step 4: Apply the Equilibrium Constant

Using the equilibrium constant expression, K = [I2]CCl4 / [I2]aq,final = 85, and substituting the expressions for [I2]CCl4 and [I2]aq,final:

85 = (x / 0.050) / ((2.00 x 10^-4 - x) / 0.100)

Step 5: Solve for x

Simplify the equation:

85 = 2x / (2.00 x 10^-4 - x)

Now solve for x:

85 (2.00 x 10^-4 - x) = 2x

17 x 10^-3 - 85x = 2x

17 x 10^-3 = 87x

x = (17 x 10^-3) / 87

x ≈ 1.95 x 10^-4 mol

Step 6: Calculate the Final Concentration in Aqueous Phase

The final concentration of iodine in the aqueous phase is:

[I2]aq,final = (2.00 x 10^-3 x 0.100 - 1.95 x 10^-4) / 0.100

[I2]aq,final = (2.00 x 10^-4 - 1.95 x 10^-4) / 0.100

[I2]aq,final = (0.05 x 10^-4) / 0.100

[I2]aq,final = 0.50 x 10^-3 mol/L

Step 7: Calculate the Percentage of Iodine Remaining

The percentage of iodine remaining in the aqueous phase is:

Percentage remaining = ([I2]aq,final / [I2]aq,initial) x 100

Percentage remaining = (0.50 x 10^-3 / 2.00 x 10^-3) x 100

Percentage remaining = 25%

Conclusion

After the extraction, 25% of the iodine remains in the aqueous phase.


Yes, it really did give me a malformatted mishmash of markdown and LaTeX, with words mashed together and line breaks in the wrong places and the whole deal. So, that's one strike against it right there. I will have to try the AI LaTeX helper next to see if I can make this all nicely formatted.

Secondly, it definitely didn't do any interactive back-and-forth like a human tutor would to guide me into figuring out the answer on my own and learning the key concepts in the process. So I definitely see an opening for an AI tutor that can do that.

Thirdly, it got the answer wrong. It said 25% but the correct answer is 2.3%.

However, it did have the right general idea. You define a variable x for the amount of iodine that remains in the aqueous phase -- or you do the reverse and define x as the amount that transfers to the CCl4 phase, which is what it actually did. Either way, 2.00 x 10^-4 - x is the amount in the other phase.

So, it set the solution up correctly but somewhere in the plugging and chugging it did the algebra wrong.
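
For the record, here's the correct plug-and-chug. Rearranging the equilibrium expression gives a closed form for the fraction remaining:

```python
# 85 = (x/0.050) / ((2.00e-4 - x)/0.100) rearranges so that the fraction of
# iodine remaining in water is 1 / (1 + K * V_CCl4 / V_aq).
K, V_aq, V_ccl4 = 85, 0.100, 0.050
remaining = 1 / (1 + K * V_ccl4 / V_aq)
print(f"{remaining:.1%}")  # 2.3% -- the textbook answer, not 25%
```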

Homeworkify by Tutor AI: Your AI-powered learning companion

#solidstatelife #ai #genai #llms

waynerad@diasp.org

"At a glance, SocialAI -- which is billed as a pure 'AI Social Network' -- looks like Twitter, but there's one very big twist on traditional microblogging: There are no other human users here. Just you."

"In a nutshell, SocialAI lets you share your thoughts with an infinite supply of ever-available AI-powered bots that can endlessly chat back."

"Think about it: No remark you post to SocialAI will ever be greeted with silence nor fail to engage en masse. You simply can't get ghosted. The app's faux users exist to hang on your every word -- leveraging programmed enthusiasm to chip canned commentary into your replies (even the sarcastic, snarky, and pessimistic bots can't resist joining these continuous scroll comment pile-ons)."

Alrighty then.

SocialAI offers a Twitter-like diary where AI bots respond to your posts | TechCrunch

#solidstatelife #ai #genai #llms #chatbots #socialnetworking

waynerad@diasp.org

AI can't cross this line on a graph and we don't know why.

The graph has the "error" that the neural net is trying to minimize as part of its training (also called the "loss") on the vertical axis.

On the horizontal axis, it has the amount of computing power thrown at the training process.

When switched to a log-log graph -- logarithmic on both axes -- a straight line emerges.

This is actually one of 3 observed neural network scaling laws. The other two look at model size and dataset size, and see a similar pattern.
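
For the curious: a straight line on log-log axes is the signature of a power law, L(C) = a * C^(-b), since log L = log a - b log C. You can see it in a few lines (the constants here are made up for illustration):

```python
import numpy as np

C = np.logspace(18, 24, 7)  # training compute in FLOPs (illustrative values)
L = 2.6 * C ** -0.05        # loss following a hypothetical power-law scaling law
slope, _ = np.polyfit(np.log10(C), np.log10(L), 1)
print(slope)                # recovers -0.05: the line's slope is the exponent
```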

Have we discovered some fundamental law of nature, like the ideal gas law in chemistry, or is this an artifact of the particular methods we are using now to train neural networks?

You might think someone knows but no one knows.

That didn't stop this YouTuber from making some good animations of the graphs and various concepts in neural network training, such as cross-entropy. It introduces the interesting concept that language may have a certain inherent entropy.

The best theory as to why the scaling laws hold tries to explain it in terms of neural networks learning high-dimensional manifolds.

AI can't cross this line and we don't know why. - Welch Labs

#solidstatelife #ai #llms #genai #deeplearning #neuralnetworks #scalinglaws

waynerad@diasp.org

Alexis Conneau, OpenAI's research lead for GPT-4o/GPT-5, has left OpenAI to start a new company to create "Her", as in, from the movie. (Alrighty then.)

Career update: After an amazing journey at @OpenAI building #Her, I’ve decided to start a new company

#solidstatelife #ai #genai #llms #chatbots #her