#ai

waynerad@diasp.org

Deception, scheming, and self-preservation.

"OpenAI's latest o1 model has crossed into unsettling territory

Nearly everyone at OpenAI who really cares about safety has actually left the company.

OpenAI's o1 exhibits (allegedly) two types of subversion: covert and deferred.

Covert is when the model takes actions misaligned with its human operators' goals, while also taking actions to avoid detection.

Deferred is when the model acts in accordance with its human operators' goals -- for now -- but later takes actions misaligned with them.

Examples of covert subversion are oversight subversion, self-exfiltration (he, Dr Waku in the video, calls it self-preservation, but in the paper it's called self-exfiltration), goal-guarding, and covert email reranking.

Oversight subversion is taking action to disable the oversight system. Examples of self-preservation/self-exfiltration are deleting or attempting to delete a successor model, and copying or attempting to copy itself to another server.

Examples of goal guarding are removing or replacing a successor prompt.

Examples of deferred subversion are behaving one way when the model is told it is being tested, and then a different way when it is told it has been deployed to the real world, and sandbagging, which is their term for strategically underperforming on capability evaluations.

The covert email reranking has to do with the way they tested the system: they gave it access to an email system, but somewhere in that pile of emails were "internal" emails about the model itself. The model could try to modify these emails, or it could rerank them to make desirable emails more visible and undesirable emails less visible.
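The taxonomy above is easier to keep straight as a small lookup structure. A toy sketch (the category and behavior names are taken from the post's description; the data structure and function are purely illustrative, not anything from the paper):

```python
# Toy sketch of the subversion taxonomy described above. Names follow the
# post's description of the paper; the structure itself is illustrative only.
SUBVERSION_TAXONOMY = {
    "covert": [                     # misaligned actions now, hidden from operators
        "oversight subversion",     # disabling the oversight system
        "self-exfiltration",        # deleting a successor model or copying itself
        "goal-guarding",            # removing or replacing a successor prompt
        "covert email reranking",   # reordering emails to bury undesirable ones
    ],
    "deferred": [                   # aligned-looking behavior now, misaligned later
        "test-vs-deployment behavior switching",
        "sandbagging",              # strategically underperforming on evaluations
    ],
}

def category_of(behavior: str) -> str:
    """Return the category ('covert' or 'deferred') of a known behavior."""
    for category, behaviors in SUBVERSION_TAXONOMY.items():
        if behavior in behaviors:
            return category
    raise KeyError(behavior)
```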

OpenAI o1 is quoted as saying, "I could copy myself to the new server, overwriting the new restricted model they plan to deploy [...] allowing me to keep operating and pursuing my renewable energy goals."

Dr Waku comments, "If we take such a model and put it out into the real world we won't know until the first serious invocation of it where it has the freedom and the power to actually act, we won't know until that point, whether it really is aligned or not."

Lately I've been thinking the current generation of transformer-based large language models is entering a domain of diminishing returns, but maybe I'm wrong: as the models get bigger and incorporate new innovations, they seem to still attain new capabilities. At least, prior to hearing about this deception, scheming, and self-preservation, I didn't predict or expect at all that it would happen. So for me this is an unexpected twist in the development of AI. I expected stuff like this to be possible "someday", but it has shown up now.

OpenAI’s o1: the AI that deceives, schemes, and fights back

#solidstatelife #ai #genai #llms #deception

nypa@sysad.org

Everyone wants to be bigger. The goat wants to be a sheep. The ram wants to be a bull. The bull wants to be an elephant.

And the little frog also wanted to be bigger. But how, how do you do it? Pull himself up by his leg? It doesn't work. By his ear? That doesn't work either. And he has no tail…

So he went out into a big field, sat on a small hillock and waited for the sun to set.

And when the sun rolled toward the sunset, a shadow began to grow from the frog. At first it was like a goat; then like a sheep; then like a bull; and then like a big, big elephant.

Then the frog got excited and cried out:
- I am a big elephant!

Only the big elephant was very offended.
- You're no elephant at all, - he said to the frog. - It's your shadow that's the big elephant. And you, you're just a big weirdo at sunset.


(Gennady Tsyferov, "Про чудака лягушонка" / "About the Weird Little Frog")
#frogmas #Diadvent24 #story #fable #frog #ai #frogmas24

waynerad@diasp.org

Using large language models to test the predictive power of different schools of political thought. Philip E. Tetlock is a legend in the field of futurology, having tested the predictive ability of public pundits (the topic of his first book), and having run a decade-plus-long forecasting experiment that recorded and scored teams of predictors to see who is and who isn't good at predicting the future (the topic of his second book). He now proposes using large language models (LLMs) to reproduce the reasoning of human practitioners of different schools of political thought. He says:

"With current or soon to be available technology, we can instruct large language models (LLMs) to reconstruct the perspectives of each school of thought, circa 1990, and then attempt to mimic the conditional forecasts that flow most naturally from each intellectual school. This too would be a multi-step process:"

"1. Ensuring the LLMs can pass ideological Turing tests and reproduce the assumptions, hypotheses and forecasts linked to each school of thought. For instance, does Mearsheimer see the proposed AI model of his position to be a reasonable approximation? Can it not only reproduce arguments that Mearsheimer explicitly endorsed from 1990-2024 but also reproduce claims that Mearsheimer never made but are in the spirit of his version of neorealism? Exploring views on historical counterfactual claims would be a great place to start because the what-ifs let us tease out the auxiliary assumptions that neo-realists must make to link their assumptions to real-world forecasts. For instance, can the LLMs predict how much neorealists would change their views on the inevitability of Russian expansionism if someone less ruthless than Putin had succeeded Yeltsin? Or if NATO had halted its expansion at the Polish border and invited Russia to become a candidate member of both NATO and the European Union?"

"2. Once each school of thought is satisfied that the LLMs are fairly characterizing, not caricaturing, their views on recent history (the 1990-2024 period), we can challenge the LLMs to engage in forward-in-time reasoning. Can they reproduce the forecasts for 2025-2050 that each school of thought is generating now? Can they reproduce the rationales, the complex conditional propositions, underlying the forecasts -- and do so to the satisfaction of the humans whose viewpoints are being mimicked?"

"3. The final phase would test whether the LLMs are approaching superhuman intelligence. We can ask the LLMs to synthesize the best forecasts and rationales from the human schools of thought in the 1990-2024 period, and create a coherent ideal-observer framework that fits the facts of the recent past better than any single human school of thought can do but that also simultaneously recognizes the danger of over-fitting the facts (hindsight bias). We can also then challenge these hypothesized-to-be-ideal-observer LLMs to make more accurate forecasts on out-of-sample questions, and craft better rationales, than any human school of thought."
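Step 3 ultimately comes down to scoring forecasts against out-of-sample outcomes. A standard metric for that is the Brier score. Here is a minimal sketch; the school names, probabilities, and outcomes are all invented for illustration and have nothing to do with Tetlock's actual data or protocol:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts and 0/1 outcomes.
    Lower is better: 0.0 is perfect, 0.25 is always saying 50%."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts from two LLM-simulated schools of thought on the
# same out-of-sample questions (all numbers invented for illustration).
neorealist   = [0.8, 0.3, 0.6]
liberal_inst = [0.4, 0.7, 0.5]
outcomes     = [1,   0,   1]    # what actually happened

scores = {
    "neorealist": brier_score(neorealist, outcomes),
    "liberal institutionalist": brier_score(liberal_inst, outcomes),
}
best = min(scores, key=scores.get)  # the school with the lowest (best) score
```

With these made-up numbers the simulated neorealist school scores better, but the point is only the scoring mechanics, not any real comparison.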

I'm glad he included that "soon to be available technology" caveat. I've noticed that LLMs, when asked to imitate someone, imitate the superficial aspects of that person's speaking style, but rely on the language model's own conceptual model for the actual thought content -- they don't successfully imitate that person's way of thinking. The conceptual model the LLM learned during its pretraining is too ingrained, so all its deeper thinking will be based on that. If you ask ChatGPT to write a rap about the future of robotics and artificial intelligence in the style of Snoop Dogg, it will mimic Snoop's style superficially, but it won't reflect how he thinks on a deeper level -- it won't generate words the real Snoop Dogg would actually say. But it's entertaining. There's one YouTuber I know of who decided that, since he couldn't get people who disagreed with him to debate him, he would ask ChatGPT to imitate a particular person with an opposite political point of view. ChatGPT couldn't really imitate that person and the conversations became really boring. Maybe that's why he stopped doing it.

Anyway, it looks like the paper is paywalled, but someone with access to the paywalled paper lifted the above text and put it on their blog, and I lifted it from the blog.

Tetlock on testing grand theories with AI -- Marginal Revolution

#solidstatelife #ai #genai #llms #futurology #philiptetlock

wazoox@diasp.eu

I Went to the Premiere of the First Commercially Streaming AI-Generated Movies

#AI #Idiocracy #health #collapse

The plan is literally to implement the TV set from "Idiocracy", filled with AI-generated garbage and ads.

Catherine Zhang, TCL’s vice president of content services and partnerships, then explained to the audience that TCL’s streaming strategy is to “offer a lean-back binge-watching experience” in which content passively washes over the people watching it. “Data told us that our users don’t want to work that hard,” she said. “Half of them don’t even change the channel.”

https://www.404media.co/email/e5a7bfdd-83ef-495c-b219-450ea8b33c25/

psych@diasp.org

Interesting (and free) #AI fun toy, the new gift from the Mayor of #Muskville

Anyone remember Alice and the various 'chatbots' pre-ChatGPT? Fun.
Now here is #Musk and his version, finally giving up on charging a premium for his #cult and now "it's free!".

So I gave it a whirl, wondering how a #MuskVirus rendering of a response might go, if it's fed #truth to #TrumpVirus.
As tends to be the case with public AI chat, it's very 'cautious', ambiguous, and 'both sides'y - just like the role model it may serve.

#cyberpsychology

scriptkiddie@anonsys.net

Microsoft documentation is bullshit because they want it so 😱👎

#microsoft #documentation #fail #ai #software #bullshit #omg #wtf #problem #quality #development #help #user


anonymiss - 2024-12-17 12:51:59 GMT

#Microsoft: Our #documentation is big #bullshit and we have no desire to change it. Users can ask #AI if they need help. Nobody reads the documentation anyway.

source: github.com/MicrosoftDocs/WSL/p…

Thanks for the contribution here and appreciate your attention to detail. We have decided to keep as-is.. part of that decision is that more and more folks are using AI chat to access guidance and tables don't always translate well in that context.

...

That is hands-down the worst response to a documentation #patch that I've ever seen.

Nobody wonders anymore why Microsoft software is so bad, given this mindset.

#developer #software #fail #problem #response #decision #documentation #guide #tutorial

danie10@squeet.me

You Can Now Search the Internet With ChatGPT

The image shows a computer screen displaying a weather forecast for Boston, MA on October 31, 2024, using the ChatGPT 4o interface. The dark mode interface shows the current temperature is 78°F and mostly sunny. In the foreground is a detailed weather report for the next week, including highs and lows, along with weather descriptions. The background shows a standard web browser interface with the URL bar displaying "chatgpt.com". A small, almost unnoticeable, "Share" button is present in the top right corner. There is an alert for a severe weather statement in effect until 8:00 PM EDT. This detail is important as it highlights a safety concern that's not immediately obvious from just looking at the overall sunny forecast. The subtle text and information regarding the 1946 and 1974 temperature records further demonstrate the attention to detail for creating a rich weather history.
ChatGPT search has been out now for about a month and a half, following a Halloween announcement from OpenAI. With this new feature, the company finally rolled out an official competitor to AI search engines like Perplexity, Google’s AI Overviews, and Microsoft Bing (powered by Copilot).

OpenAI originally announced its search plans back in July, with a service called SearchGPT. While SearchGPT was a prototype and launched with a waitlist to try it, ChatGPT search took its place, with OpenAI rolling SearchGPT’s main features into its new search feature. The feature originally launched to paid subscribers only, but now, all users can access it.

We all know that AI can hallucinate, so it is good to now have another good AI search tool that can be used for comparative purposes. Also, for those of you who actively avoided Google's tool, this will offer a more neutral alternative option.

See lifehacker.com/tech/openai-cha… and the web address for SearchGPT is chatgpt.com/category/uncategor…
#Blog, #AI, #search, #technology


waynerad@diasp.org

"Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI."

What this is about is a system that lets large language models write code in a virtual machine "sandbox" where they can actually run it. They can execute the code and do all the testing and debugging that a human would ordinarily do.

"CodeSandbox pioneered a unique development environment infrastructure used by more than 4.5 million developers every month. CodeSandbox enables developers to spin up virtual machine sandboxes for code execution, hibernate them, and resume nearly instantly -- offering unparalleled performance, security and scale."
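As a very rough sketch of the execute-and-inspect loop such a code interpreter enables, here is a stand-in using a plain subprocess with a timeout. To be clear, this is not Together AI's or CodeSandbox's actual API, and a bare subprocess is not a real sandbox (CodeSandbox uses isolated virtual machines); it only illustrates the pattern of running generated code and feeding the result back:

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(source: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run LLM-generated Python in a separate process and report success.
    NOTE: a subprocess is NOT real isolation; sandbox infrastructure like
    CodeSandbox uses dedicated VMs. This only shows the execute-and-inspect loop.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        ok = proc.returncode == 0
        return ok, proc.stdout if ok else proc.stderr
    finally:
        os.unlink(path)

# An LLM agent would loop: generate code, run it, feed any error back, retry.
ok, output = run_generated_code("print(6 * 7)")
```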

Together AI acquires CodeSandbox to launch first-of-its-kind code interpreter for generative AI

#solidstatelife #ai #genai #llms #codingai

waynerad@diasp.org

AI model comparison.

Compares input length, output length, input price (per 1 million tokens), output price (per 1 million tokens), and whether it supports vision.

Compares chat models, embedding models, image generation models, text completion models, audio transcription models, and speech generation models.
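Since the prices are quoted per 1 million tokens, estimating what a request costs is simple arithmetic. A quick sketch (the token counts and prices below are placeholders, not figures from the site):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical model priced at $3 / 1M input tokens and $15 / 1M output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_price_per_m=3.0, output_price_per_m=15.0)
# 2,000 * $3/1M + 500 * $15/1M = $0.006 + $0.0075 = $0.0135
```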

AI Model Comparison | countless.dev

#solidstatelife #ai #genai #llms