#openai

waynerad@diasp.org

Reaction video to OpenAI Sora, OpenAI's system for generating video from text.

I encountered the reaction video first; in fact, I discovered Sora exists from seeing the reaction video. But see below for the official announcement from OpenAI.

It's actually kind of interesting and amusing to compare the guesses in the reaction videos about how the system works with the way it actually works. People are guessing based on their knowledge of traditional computer graphics and 3D modeling. However...

The way Sora works is quite fascinating. We don't know the nitty-gritty details, but OpenAI has described the system at a high level.

Basically it combines ideas from their image generation and large language model systems.

Their image generation systems, DALL-E 2 and DALL-E 3, are diffusion models. Their large language models, GPT-2, GPT-3, GPT-4, GPT-4-Vision, etc., are transformer models. (In fact, "GPT" stands for "generative pre-trained transformer".)

I haven't seen diffusion and transformer models combined before.

Diffusion models work by having a set of parameters in what they call "latent space" that describe the "meaning" of the image. The word "latent" is another way of saying "hidden". The "latent space" parameters are "hidden" inside the model but they are created in such a way that the images and text descriptions are correlated, which is what makes it possible to type in a text prompt and get an image out. I've elsewhere given high-level hand-wavey descriptions of how the latent space parameters are turned into images through the diffusion process, and how the text and images are correlated (a training method called CLIP), so I won't repeat that here.
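To make the diffusion idea concrete, here is a toy sketch of the iterative denoising loop at the heart of a diffusion model. Everything here is a stand-in: the real denoiser is a trained neural network conditioned on the prompt's latent-space representation, while this hypothetical `toy_denoiser` just nudges a noise vector toward a fixed target so the loop structure is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, target, strength=0.2):
    """Stand-in for a trained denoising network: remove a fraction of the
    difference between the current noisy sample and the 'clean' target."""
    return x + strength * (target - x)

target = np.array([1.0, -2.0, 0.5])   # pretend this encodes "the image"
x = rng.normal(size=3)                # start from pure Gaussian noise

for step in range(50):                # many small denoising steps
    x = toy_denoiser(x, target)

print(np.allclose(x, target, atol=1e-2))  # prints True: the noise is gone
```

The point is only the shape of the process: generation starts from pure noise and arrives at a sample through many small denoising steps, rather than in one shot.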

Large language models, on the other hand, work by turning words and word pieces into "tokens". The "tokens" are vectors constructed in such a way that the numerical values in the vectors are related to the underlying meaning of the words.
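A minimal sketch of that token-to-vector step, with a made-up three-word vocabulary and a random embedding table (in a real LLM the table is learned so the vectors capture meaning):

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}               # hypothetical tiny vocabulary
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))   # one 4-d vector per token

def tokenize(text):
    """Map words to integer token ids."""
    return [vocab[w] for w in text.split()]

tokens = tokenize("the cat sat")
vectors = embedding_table[tokens]    # one vector per token
print(vectors.shape)                 # prints (3, 4)
```

Real tokenizers also split rare words into word pieces, but the principle is the same: text becomes a sequence of vectors the transformer can operate on.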

To make a model that combines both of these ideas, they figured out a way of doing something analogous to "tokens" but for video. They call their video equivalent of "tokens" "patches". So Sora works with visual "patches".

One way to think of "patches" is as video compression, both spatially and temporally. Unlike a video compression algorithm such as MPEG, which does this using pre-determined mathematical formulas (discrete cosine transforms and such), in this system the "compression" process is learned and is made entirely of neural networks.
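The shape bookkeeping behind "spacetime patches" can be sketched with a fixed reshape. This is only an illustration: in Sora the carving-up is done by a learned encoder network, not a hard-coded reshape, and the sizes below are invented.

```python
import numpy as np

T, H, W, C = 8, 16, 16, 3            # a tiny hypothetical video: frames, height, width, channels
video = np.zeros((T, H, W, C))

pt, ph, pw = 2, 4, 4                 # patch size in time, height, width

# Split each axis into (number of patches, patch size), bring the three
# "number of patches" axes together, then flatten each patch to one vector.
patches = (video
           .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
           .transpose(0, 2, 4, 1, 3, 5, 6)
           .reshape(-1, pt * ph * pw * C))

print(patches.shape)                 # prints (64, 96): 64 patches of 96 values each
```

Each row is one spacetime block: 2 frames by 4x4 pixels by 3 channels, so the whole clip becomes a sequence of 64 patch vectors a transformer can consume, exactly analogous to a sequence of text tokens.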

So with a large language model, you type in text and it outputs tokens which represent text, which are decoded to text for you. With Sora, you type in text and it outputs tokens, except here the tokens represent visual "patches", and the decoder turns the visual "patches" into pixels for you to view.

Because the "compression" works both ways, in addition to "decoding" patches to get pixels, you can also input pixels and "encode" them into patches. This enables Sora to input video and perform a wide range of video editing tasks. It can create perfectly looping video, it can animate static images (why no Mona Lisa examples, though?), it can extend videos, either forward or backward in time. Sora can gradually interpolate between two input videos, creating seamless transitions between videos with entirely different subjects and scene compositions. I found these to be the most freakishly fascinating examples on their page of sample videos.

They list the following "emerging simulation capabilities":

"3D consistency." "Sora can generate videos with dynamic camera motion. As the camera shifts and rotates, people and scene elements move consistently through three-dimensional space."

This is where they have the scene everyone is reacting to in the reaction videos, where the couple is walking down the street in Japan with the cherry blossoms.

By the way, I was wondering what kind of name is "Sora" so I looked it up on behindthename.com. It says there are two Japanese kanji characters both pronounced "sora" and both of which mean "sky".

"Long-range coherence and object permanence." "For example, our model can persist people, animals and objects even when they are occluded or leave the frame. Likewise, it can generate multiple shots of the same character in a single sample, maintaining their appearance throughout the video."

"Interacting with the world." "Sora can sometimes simulate actions that affect the state of the world in simple ways. For example, a painter can leave new strokes along a canvas that persist over time, or a man can eat a burger and leave bite marks."

"Simulating digital worlds." "Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity."

However they say, "Sora currently exhibits numerous limitations as a simulator." "For example, it does not accurately model the physics of many basic interactions, like glass shattering."

This is incredible - ThePrimeTime

#solidstatelife #ai #genai #diffusionmodels #gpt #llms #computervision #videogeneration #openai

waynerad@diasp.org

GitHub Copilot causes code churn? The term "code churn" is a fancy way of saying Copilot writes crappy code. Copilot writes crappy code, developers fail to notice it (at first), check it in, then discover it's crappy (within 2 weeks -- that's the arbitrary time window chosen for the study), causing them to go in and fix it, thus causing the code to "churn", get it?

Copilot Causes Code Churn? This Study Is Concerning... Theo - t3.gg

#solidstatelife #ai #genai #llms #openai #copilot #developers

anonymiss@despora.de

The Dawn of the AI-Military Complex

source: https://goodinternet.substack.com/p/the-dawn-of-the-ai-military-complex

Two weeks ago, #OpenAI deleted its ban on using #ChatGPT for "military and warfare" and revealed that it's working with the military on "cybersecurity tools". It's clear to me that the darlings of generative AI want in on the wargames, and I'm very confident they are not the only ones. With ever more international conflicts turning hot, from Israel's war on Hamas after the massacre on October 7th to Russia's invasion of Ukraine to local conflicts like the Houthis attacking US trade ships with drones and the US retaliating, plus the competitive pressure from China, who surely have their own versions of AI-powered automated weapon systems in place, I absolutely think that automatic war pipelines are in high demand from many, many international players with very, very deep pockets, and #SiliconValley seems more than eager to exploit that demand.

#wargame #war #terror #military #ai #news #complex #politics #economy #conflict

waynerad@diasp.org

Sam Altman says over the next 2 years, they expect to roll out "multimodality", meaning speech in, speech out, with video; great increases in their models' reasoning ability; improvements in reliability, customizability, and personalizability; and the ability to use your own data.

In the long term, they expect to be able to combine their models with language and vision and adapt them for robotics.

Sam Altman just revealed key details about GPT-5... (GPT-5 robot, AGI + More)

#solidstatelife #ai #openai

anonymiss@despora.de

Reliability #Check: An #Analysis of #GPT-3's Response to Sensitive Topics and Prompt Wording

source: https://arxiv.org/abs/2306.06199

Large language models (LLMs) have become mainstream technology with their versatile use cases and impressive performance. Despite the countless out-of-the-box applications, LLMs are still not reliable. A lot of work is being done to improve the factual accuracy, consistency, and ethical standards of these models through fine-tuning, prompting, and Reinforcement Learning with Human Feedback (RLHF), but no systematic analysis of the responses of these models to different categories of statements, or on their potential vulnerabilities to simple prompting changes is available.

#problem #truth #reality #llm #technology #ai #openAI #chatgpt #science #software

anonymiss@despora.de

Report that "a stranger obtained my #email address from the large language model behind #ChatGPT"

source: https://gigazine.net/gsc_news/en/20231225-chatgpt-model-delivered-email-personal-information

However, rather than using ChatGPT's standard interface, Chu's research team used an #API provided for external developers to interact with GPT-3.5 Turbo, and succeeded in bypassing its defenses through a process called fine-tuning. Normally, the purpose of fine-tuning is to impart knowledge in a specific field, such as medicine or finance, to a large language model, but it can also be used to remove defense mechanisms built into these tools.

#security #privacy #ai #technology #problem #news #openAI #exploit

aljazeera@squeet.me

2023 in Review: The human cost of ChatGPT | The Take

As the year wraps up, we're looking back at ten of the episodes that defined our year at The Take. This originally aired on February 1. ChatGPT is taking the ...

#aljazeeralive #aljazeeraenglish #aljazeera #aljazeeralivenews #aljazeeralatest #latestnews #newsheadlines #aljazeeravideo #chatgpt #openai #AI #artificialintelligence #technology #thetake #podcast