#ai

florida_ted@diasp.org

AI may have a place in professional communications if used properly

For all its faults, ChatGPT “does a pretty good job” responding to customer complaints, Natasha said.

“One [response] was much better than what I would have done,” she said. But “it has to be checked ... you have to read through it.”

https://www.cnbc.com/2024/05/05/travel-companies-use-chatgpt-for-complaints-negative-online-reviews.html

#AI #response #complaints #travel #tourism #online #reviews

waynerad@diasp.org

WebLlama is "building agents that can browse the web by following instructions and talking to you".

This is one of those things that, if I had time, would be fun to try out. You have to download the model from HuggingFace & run it on your machine.

"The goal of our project is to build effective human-centric agents for browsing the web. We don't want to replace users, but equip them with powerful assistants."

"We are build on top of cutting edge libraries for training Llama agents on web navigation tasks. We will provide training scripts, optimized configs, and instructions for training cutting-edge Llamas."

If it works, this technology has a serious possible practical benefit for people with vision impairment who want to browse the web.

McGill-NLP / webllama

#solidstatelife #ai #genai #llms #agenticllms

waynerad@diasp.org

"Are large language models superhuman chemists?"

So what these researchers did was build a benchmark: a test of 7,059 chemistry questions spanning the gamut of the field, from computational chemistry, physical chemistry, and materials science to macromolecular chemistry, electrochemistry, organic chemistry, general chemistry, analytical chemistry, chemical safety, and toxicology.

They recruited 41 chemistry experts to carefully validate their test.

They devised the test so that it could be evaluated in a completely automated manner. This meant relying on multiple-choice questions more heavily than they wanted to, rather than open-ended questions. The test has 6,202 multiple-choice questions and 857 open-ended questions (88% multiple-choice). For the open-ended questions, they had to write parsers that find the numerical answer in the model's output so grading could stay automated.
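To give a flavor of what such a parser does, here's a minimal sketch. It's my own illustration, not ChemBench's actual code, and the regex and the 1% relative tolerance are arbitrary choices:

```python
# Minimal sketch of a numeric-answer parser for automated grading.
# Not ChemBench's actual parser; regex and tolerance are arbitrary.
import re

def parse_numeric_answer(text: str) -> float | None:
    """Return the last number in the model's output, or None if there isn't one."""
    matches = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", text)
    return float(matches[-1]) if matches else None

def grade(model_output: str, target: float, rel_tol: float = 0.01) -> bool:
    """Mark the answer correct if it falls within rel_tol of the target value."""
    value = parse_numeric_answer(model_output)
    return value is not None and abs(value - target) <= rel_tol * abs(target)

print(grade("The enthalpy change is approximately -57.3 kJ/mol", -57.3))  # True
```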

In addition, they asked the models to say how confident they were in their answers.

Before I tell you the ranking, the researchers write:

"On the one hand, our findings underline the impressive capabilities of LLMs in the chemical sciences: Leading models outperform domain experts in specific chemistry questions on many topics. On the other hand, there are still striking limitations. For very relevant topics the answers models provide are wrong. On top of that, many models are not able to reliably estimate their own limitations. Yet, the success of the models in our evaluations perhaps also reveals more about the limitations of the exams we use to evaluate models -- and chemistry -- than about the models themselves. For instance, while models perform well on many textbook questions, they struggle with questions that require some more reasoning. Given that the models outperformed the average human in our study, we need to rethink how we teach and examine chemistry. Critical reasoning is increasingly essential, and rote solving of problems or memorization of facts is a domain in which LLMs will continue to outperform humans."

"Our findings also highlight the nuanced trade-off between breadth and depth of evaluation frameworks. The analysis of model performance on different topics shows that models' performance varies widely across the subfields they are tested on. However, even within a topic, the performance of models can vary widely depending on the type of question and the reasoning required to answer it."

And with that, I'll tell you the rankings. You can log in to their website at ChemBench.org and see the leaderboard any time for the latest rankings. At this moment I am seeing:

gpt-4: 0.48

claude2: 0.29

GPT-3.5-Turbo: 0.26

gemini-pro: 0.25

mistral_8x7b: 0.24

text-davinci-003: 0.18

Perplexity 7B Chat: 0.18

galactica_120b: 0.15

Perplexity 7B online: 0.1

fb-llama-70b-chat: 0.05

The number after each model name is its score on the benchmark (higher is better). You'll notice there appears to be a gap between GPT-4 and Claude 2. One interesting feature of the leaderboard is that you can show humans and AI models together. When you do, the top human has a score of 0.51 and beats GPT-4, then you get GPT-4, then you get a whole bunch of humans in between GPT-4 and Claude 2. So it appears that that gap is real. However, Claude 2 isn't the latest version of Claude. Since the evaluation, Claude 3 has come out, so maybe sometime in the upcoming months we'll see the leaderboard revised and see where Claude 3 comes in.

Are large language models superhuman chemists?

#solidstatelife #ai #genai #llms #chemistry

anonymiss@despora.de

#Google #Search results polluted by buggy AI-written code frustrate coders

#source: https://www.theregister.com/2024/05/01/pulumi_ai_pollution_of_search/

Google has indexed inaccurate infrastructure-as-code samples produced by Pulumi AI – a developer assistant that uses an #AI #chatbot to generate infrastructure – and the rotten recipes are already appearing at the top of search results.

#news #internet #SearchEngine #problem #result #fail #software #technology

waynerad@diasp.org

FutureSearch.AI lets you ask a language model questions about the future.

"What will happen to TikTok after Congress passed a bill on April 24, 2024 requiring it to delist or divest its US operations?"

"Will the US Department of Justice impose behavioral remedies on Apple for violation of antitrust law?"

"Will the US Supreme Court grant Trump immunity from prosecution in the 2024 Supreme Court Case: Trump v. United States?"

"Will the lawsuit brought against OpenAI by the New York Times result in OpenAI being allowed to continue using NYT data?"

"Will the US Supreme Court uphold emergency abortion care protections in the 2024 Supreme Court Case: Moyle v. United States?"

How does it work?

They say that rather than asking a large language model a question in a one-shot manner, they guide it through 6 steps for reasoning through hard questions. The 6 steps, sketched in code after the list, are:

  1. "What is a basic summary of this situation?"

  2. "Who are the important people involved, and what are their dispositions?"

  3. "What are the key facets of the situation that will influence the outcome?"

  4. "For each key facet, what's a simple model of the distribution of outcomes from past instances that share that facet?"

  5. "How do I weigh the conflicting results of the models?"

  6. "What's unique about this situation to adjust for in my final answer?"

See below for a discussion of two other approaches that claim similar prediction quality.

FutureSearch: unbiased, in-depth answers to hard questions

#solidstatelife #ai #genai #llms #futurology

waynerad@diasp.org

MyBestAITool: "The Best AI Tools Directory in 2024".

"Ranked by monthly visits as of April 2024".

"AI Chatbot": ChatGPT, Google Gemini, Claude AI, Poe.

"AI Search Engine": Perplexity AI, You, Phind, metaso.

"AI Photo & Image Generator": Leonardo, Midjourney, Fotor, Yodayo.

"AI Character": CharacterAI, JanitorAI, CrushonAI, SpicyChat AI.

"AI Writing Assistants": Grammarly, LanguageTool, Smodin, Obsidian.

"AI Photo & Image Editor": Remove.bg, Fotor, Pixlr, PhotoRoom.

"AI Model Training & Deployment": civitai, Huggingface, Replicate, google AI.

"AI LLM App Build & RAG": LangChain, Coze, MyShell, Anakin.

"AI Image Enhancer": Cutout Pro, AI Image Upscaler, ZMO.AI, VanceAI.

"AI Video Generator": Runway, Vidnoz, HeyGen, Fliki.

"AI Video Editor": InVideo, Media io, Opus Clip, Filmora Wondershare.

"AI Music Generator": Suno, Moises App, Jammable, LANDR.

No Udio? Really? Maybe it'll show up on next month's stats.

"AI 3D Model Generator": Luma AI, Recraft, Deepmotion, Meshy.

"AI Presentation Generator": Prezi AI, Gamma, Tome, Pitch.com.

"AI Design Assistant": Firefly Adobe, What font is, Hotpot, Vectorizer.

"AI Copywriting Tool": Simplified, Copy.ai, Jasper.ai, TextCortex.

"AI Story Writing": NovelAI, AI Novellist, Dreampress AI, Artflow.

"AI Paraphraser": QuillBot, StealthWriter, Paraphraser, Linguix.

"AI SEO Assistant": vidIQ, Writesonic, Content At Scale, AISEO.

"AI Email Assistant": Klaviyo, Instantly, Superhuman, Shortwave.

"AI Summarizer": Glarity, Eightify, Tactiq, Summarize Tech.

"AI Prompt Tool": FlowGPT, Lexica, PromptHero, AIPRM.

"AI PDF": ChatPDF, Scispace, UPDF, Ask Your PDF.

"AI Meeting Assistant": Otter, Notta, Fireflies, Transkriptor.

"AI Customer Service Assistant": Fin by Intercom, Lyro, Sapling, ChatBot.

"AI Resume Builder": Resume Worded, Resume Builder, Rezi, Resume Trick.

"AI Speech Recognition": Adobe Podcast, Transkriptor, Voicemaker, Assemblyai.

"AI Website Builder": B12.io, Durable AI Site Builder, Studio Design, WebWave AI.

"AI Art Generator": Leonardo, Midjourney, PixAI Art, NightCafe.

"AI Developer Tools": Replit, Blackbox, Weights & Biases, Codeium.

"AI Code Assistant": Blackbox, Phind, Codeium, Tabnine.

"AI Detector Tool": Turnitin, GPTZero, ZeroGPT, Originality.

You can view full lists on all of these and there are even more if you go through the categories on the left side.

No idea where they get their data. I would guess Comscore, but they don't say.

The Best AI Tools Directory in 2024 | MyBestAITool

#solidstatelife #ai #aitools #genai

waynerad@diasp.org

"Xaira, an AI drug discovery startup, launches with a massive $1B, says it's 'ready' to start developing drugs."

$1 billion, holy moly, that's a lot.

"The advances in foundational models come from the University of Washington's Institute of Protein Design, run by David Baker, one of Xaira's co-founders. These models are similar to diffusion models that power image generators like OpenAI's DALL-E and Midjourney. But rather than creating art, Baker's models aim to design molecular structures that can be made in a three-dimensional, physical world."

Xaira, an AI drug discovery startup, launches with a massive $1B, says it's 'ready' to start developing drugs

#solidstatelife #ai #medicalai #drugdiscovery #chemistry

rhysy@diaspora.glasswings.com

Similar to the Matrix, Nozick's experience machine would be able to provide the person plugged into it with any experiences they wanted – like "writing a great novel, or making a friend, or reading an interesting book". No one who entered the machine would remember doing so, or would realise at any point that they were within it. But in Nozick's version, there were no malevolent AIs; it would be "provided by friendly and trustworthy beings from another galaxy". If you knew all that, he asked, would you enter the experience machine for the rest of your life?

Nozick believed people would not. The thought experiment was intended to demonstrate that reality, or authenticity, has some inherent value to us. While Cypher makes the decision to live in the Matrix when the alternative is continued resistance, Nozick proposed that most people would prefer the real world, in spite of the fact that the machine would definitively offer a more pleasurable life.

I would think that knowing it wasn't real (before you go in) would undermine things. I mean, if you were to write a "great" novel in the machine, what does that mean? That you actually did write a great work that people in the real world would have enjoyed? In which case you could have done so anyway, unless the machine actually boosted your brainpower (in which case, why trap you inside it forever?). Or does it only give you the sensation of what it feels like to write a great novel without actually writing one? In which case the hollowness of the experience would seem abundantly obvious. Surely it would be amusing for a bit, but not a whole-life thing.

In 2016, Hindriks and Igor Douven of Sorbonne University in France attempted to verify that intuition by surveying people's responses to the original thought experiment. They also asked whether participants would take an "experience pill" that operates similarly to the machine but allows the user to remain in the world, and a functioning pill that enhances the user's capabilities but not their perception of reality.

"Our first major finding was that people actually do respond in this way, by and large," Hindriks confirms. "Overall, people are rather reluctant to go along with this scenario where they would be hooked up to an experience machine." In their study, about 70% of participants rejected the experience machine, as originally constructed by Nozick.

"This is a rather extreme scenario, so we thought of two more realistic cases," Hindriks says. Their goal was to test whether versions of the experience machine that kept participants more in contact with reality would be more acceptable to them. They found that respondents were significantly more willing to take an experience pill – 53% agreed – and even more eager to take the functioning pill, with 89% opting in. "We think this fits quite well with Nozick's intuitions," Hindriks says "so, in that respect, it was more or less expected – but it's nice to have some evidence for it."

I can't imagine many people rejecting being able to actually have greater abilities at the flick of a switch, like uploading kung fu skills a la The Matrix. This is likely not possible though, as in Eagleman's Livewired the author makes it clear that knowledge isn't encoded in the same way in everyone's brain: it depends on all your other life experiences. So at the very least, the idea of straightforwardly uploading knowledge and skills isn't happening any time soon. It would have to account for the immense complexity of every single individual brain and adapt accordingly. Ain't happening.

As for those who are so desperate for companionship that they think AI chatbots really care about them, that's honestly a bit sad. That's not to say that AI/VR can't provide meaningful experiences: of course they can. If an AI teaches you something which you didn't know before, that's no different to if you read it in a book. If you accomplish something in VR that you find challenging, that's no different to overcoming a physical problem. It's just that it can't do everything the real world can. I for one have no problem at all with having spent tens of hours playing Skyrim, but good lord I would never say I made any friends there.

#AI
#Philosophy

https://www.bbc.com/future/article/20240321-experience-machines-thought-experiment-that-inspired-matrixs-greatest-question

rhysy@diaspora.glasswings.com

Doesn't actually contain anything much relating to the title, but interesting (if a bit on the short side) nonetheless.

I'm happy to offload navigational skills to my phone, but I hate it when my phone starts auto-suggesting answers to people's messages. I don't really want to offload my social cognition to a computer – I'd rather engage in real communication from my mind to another person's.

The question is, what tasks are so dangerous, dull, demeaning or repetitive that we're delighted to outsource them, and what do we feel are important to be done ourselves or by other humans? If I was going to be judged in a trial, I don't necessarily want an algorithm to pass a verdict on me, even if the algorithm is demonstrably very fair, because there's something about the human solidarity of people in society standing in judgement of other people. At work, I might prefer to have a relationship with human colleagues – to talk to and explain myself to other people – rather than just getting the work done more efficiently.

Well I'd certainly want such an algorithm's output to be at least considered in the trial! Dunno if I'd want it to be the only deciding factor... probably not, but if such a rational truth engine could be devised (it probably can't), I want the jury to know what it came up with. But the point stands - some things we want to offload, some we don't.

There's a double danger to anthropomorphism. The first is that we treat machines like people, and project personalities, intentions and thoughts onto artificial intelligences. Although these systems are extraordinarily sophisticated, they don't possess anything like the human sense. And it's very dangerous to act as though they do. For a start, they don't have a consistent worldview; they are miraculously brilliant forms of autocomplete, working on pattern recognition, working on prediction. This is very powerful, but they tend to hallucinate and make up details that don't exist, and they will often contain various forms of bias or exclusion based upon a particular training set. But an AI can respond fast and plausibly to anything, and as human beings, we are very predisposed to equate speed and plausibility with truth. And that's a very dangerous thing.

The other danger of anthropomorphising technology is that it can lead us to think of and treat ourselves like we're machines. But we are nothing like large language models: we are emotional creatures with minds and bodies who are deeply influenced by our physical environment, by our bodily health and well-being. Perhaps most importantly, we shouldn’t see [a machine’s] efficiency as a model for human thriving. We don't want to optimise ourselves with perfectible components, within some vast consequentialist system. The idea that humans can have dignity and autonomy and potential is very ill-served by the desire to optimise, maximise and perfect ourselves.

#Technology
#AI
#Sociology

https://www.bbc.com/future/article/20240404-why-we-have-co-evolved-with-technology-tom-chatfield-wise-animals

anonymiss@despora.de

Catholic #priest #AI causes irritation

source: https://www.pillarcatholic.com/p/i-just-have-to-take-my-lumps
Twitter: https://nitter.esmailelbob.xyz/atholiccom/status/1782829157245923598

Some said the priest avatar was inappropriate, misleading, or just plain creepy. Some said the priest simulated virtual sacraments — indeed, “Fr. Justin” gladly heard The Pillar's “confession” before giving some spiritual guidance and reciting the words of absolution. And some said that an AI apologetics project leans too heavily into unreliable, controversial, and still-confusing technology.

#Religion #technology #news #future #confession #absolution #church

psychmesu@diaspora.glasswings.com

https://mastodon.social/@o_simardcasanova/112344136942637483

o_simardcasanova@mastodon.social - Do you have an account on #photography platform #EyeEm?

If so, you should know that it plans to sell your photos to train #AI models.

And the only way to opt out is to delete the photos: a long and tedious process, which calls the honesty of EyeEm's approach into question.

https://techcrunch.com/2024/04/26/photo-sharing-community-eyeem-will-license-users-photos-to-train-ai-if-they-dont-delete-them/