#computervision

waynerad@diasp.org

"How Kpopalypse determines the use of AI-generated imagery in k-pop music videos."

"Hyuna sorry I mean IU's 'Holssi' has a video which is mainly not AI, but the floating people certainly are AI."

"The dog/wolf/whatever the fuck that is at the start of Kiss Of Life's 'Get Loud', that's AI-generated for sure -- no, not CGI."

"There's lots of floaty AI-generated crap in Odd Youth's 'Best Friendz' video, like random bubbles, confetti, and... people having accidents, how aegyo, much heart shape."

"There's also a technique in AI image generation that I like to call 'detail spam'. Watch the sequence of images in Achii's 'Fly' video from 2:30 to 2:36. This is all AI-generation at work."

"Same again with Jay 'where's my soju' Park and 'Gimme A Minute (to type in this prompt for exploding cars)'."

"XG use AI in their imagery all the time. For an example, check out the 'Princess Mononoke'-inspired foot imagery at 1:20 in the video [to "Howling"]."

"Speaing of all things environment, I'll leave you with environmental expert Chuu's 'Strawberry Rush' which is almost certainly using a fair bit of AI-generated imagery for all the more boilerplate-looking background cartoon shit."

How Kpopalypse determines the use of AI-generated imagery in k-pop music videos

#solidstatelife #computervision #diffusionmodels #aidetection

waynerad@diasp.org

"The open source project DeFlock is mapping license plate surveillance cameras all over the world."

"On his drive to move from Washington state to Huntsville, Alabama, Will Freeman began noticing lots of cameras."

"Once I started getting into the South, I saw a ton of these black poles with a creepy looking camera and a solar panel on top. I took a picture of it and ran it through Google, and it brought me to the Flock website. And then I knew like, 'Oh, that's a license plate reader.' I started seeing them all over the place and realized that they were for the police."

"Flock is one of the largest vendors of automated license plate readers (ALPRs) in the country. The company markets itself as having the goal to fully 'eliminate crime' with the use of ALPRs and other connected surveillance cameras."

"And so he made a map, and called it DeFlock. DeFlock runs on Open Street Map, an open source, editable mapping software."

The open source project DeFlock is mapping license plate surveillance cameras all over the world

#solidstatelife #ai #computervision #alprs #surveillance

waynerad@diasp.org

Generative AI is being added to Notepad.exe. No, I'm not making this up.

"With this update, we are introducing the ability to rewrite content in Notepad with the help of generative AI. You can rephrase sentences, adjust the tone, and modify the length of your content based on your preferences to refine your text."

And MS Paint.

New AI experiences for Paint and Notepad begin rolling out to Windows Insiders

#solidstatelife #ai #genai #llms #computervision

waynerad@diasp.org

"Inside 'Project Rodeo,' the Tesla effort pushing the limits of self-driving technology."

"Operating on open streets with other vehicles, cyclists, and pedestrians, test drivers on Project Rodeo have tested unreleased software that will be crucial to Tesla's push into autonomous driving."

"Test drivers said they sometimes navigated perilous scenarios, particularly those drivers on Project Rodeo's 'critical intervention' team, who say they're trained to wait as long as possible before taking over the car's controls. Tesla engineers say there's a reason for this: The longer the car continues to drive itself, the more data they have to work with."

Inside 'Project Rodeo,' the Tesla effort pushing the limits of self-driving technology

#solidstatelife #ai #computervision #autonomousvehicles #tesla

waynerad@diasp.org

In a follow-up to the neural network GameNGen version of Doom, we now have the DIAMOND diffusion world model version of Counter-Strike: Global Offensive. "DIAMOND" stands for "DIffusion As a Model Of eNvironment Dreams". It was trained on vastly less training data than the Doom model, but it runs at only 10 frames per second. Like the Doom model, it isn't just generating video that looks like the game, it's actually playable. But it can do some weird things (see below).

Counter-Strike's Dust II runs purely within a neural network on an RTX 3090 -- performance is disappointing at only 10 FPS

#solidstatelife #ai #genai #computervision

waynerad@diasp.org

A "prompt testing exercise" in India "involving Meta AI, Gemini, Co-pilot and Adobe Firefly" showed "Meta AI's text-to-image generation feature is being weaponised to create harmful AI images targeting Muslims in India by reinforcing negative stereotypes."

I'm reminded that we don't have the "real" LLMs. All the companies have LLMs that can do things that the public LLMs we have access to can't, because of the "guardrail adding" effort. So any time you say, "LLMs can't do X", well, maybe they can, just not the public ones you're using.

Exclusive: Meta AI's text-to-image feature weaponised in India to generate harmful imagery

#solidstatelife #ai #genai #llms #computervision

waynerad@diasp.org

"Introducing the AI-Powered Electronic Component Classifier: The ultimate in intelligent component management."

Possibly useful for those of you who work a lot with electronic components.

"The Electronic Component Classifier is a project that uses machine learning and artificial intelligence to automate the identification and classification of electrical and electronic components."

"Features: Component classification: Resistors, capacitors, LEDs, transistors, potentiometers, diodes, and integrated circuits are the seven classes into which electronic and electrical components can be simply categorised, using multilayer categorization. Further details: With only one click, you may find out more details about integrated Circuits, transistors, and capacitors. User-friendly design: The interface is simple to use and navigate, thanks to its clear headings, buttons, and text boxes."

Introducing the AI-Powered Electronic Component Classifier: The Ultimate in Intelligent Component Management

#solidstatelife #ai #computervision #electronics

waynerad@diasp.org

Diffusion Illusions: Flip illusions, rotation overlays, twisting squares, hidden overlays, Parker puzzles...

If you've never heard of "Parker puzzles", Matt Parker, the math YouTuber, asked this research team to make him a jigsaw puzzle with two solutions: one is a teacup, and the other is a doughnut.

The system they made starts with diffusion models, which are the models you use when you type a text prompt in and it generates the image for you. Napoleon as a cat or unicorn astronauts or whatever.

What if you could generate two images at once that are mathematically related somehow?

That's what the Diffusion Illusions system does. Actually it can even do more than two images.

First I must admit, the system uses an image parameterization technique called Fourier Features Networks, and I clicked through to the research paper for Fourier Features Networks, but I couldn't understand it. The "Fourier" part suggests sines and cosines, and yes, there's sine and cosine math in there, but there's also "bra-ket" notation, like you normally see in quantum physics, with partial differential equations written in the bra-ket notation, and such. So, I don't understand how Fourier Features works.
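For what it's worth, the core trick in the Fourier Features paper is compact even if the theory isn't: map each input coordinate through random sinusoids before feeding it to the network. Here's a minimal sketch of that mapping; the frequency count and scale are hyperparameters I'm assuming, not values from the Diffusion Illusions paper:

```python
import numpy as np

# Fourier feature mapping (Tancik et al. 2020): project pixel coordinates
# through a random Gaussian matrix B, then take sin and cos. An MLP trained
# on these features can fit high-frequency image detail that it can't learn
# from raw (x, y) coordinates alone.

def fourier_features(coords, B):
    """coords: (N, 2) pixel coordinates in [0, 1]; B: (num_freqs, 2)."""
    proj = 2.0 * np.pi * coords @ B.T              # (N, num_freqs)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
num_freqs, scale = 256, 10.0                       # assumed hyperparameters
B = rng.normal(0.0, scale, size=(num_freqs, 2))

xs = np.linspace(0, 1, 64)
coords = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
features = fourier_features(coords, B)             # (4096, 512), fed to an MLP
```

The paper's claim is that a plain MLP on raw coordinates only learns blurry, low-frequency content, and this mapping fixes that. How that connects to "adversarial artifacts" in the diffusion setting, I still can't tell you.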

There's a video of a short talk from SIGGRAPH, and in it (at about 4:30 in), they claim that diffusion models, all by themselves, have "adversarial artifacts" that Fourier Features fixes. I have no idea why diffusion models on their own would have any kind of "adversarial artifacts" problems. So obviously if I have no idea what might cause the problems, I have no idea why Fourier Features might fix them.

Ok, with that out of the way, here's how the system works. There are the output images that the system generates, which they call "prime" images. The fact that they give them a name implies there's an additional type of image in the system, and there is: they call these other images the "dream target" images. Central to the whole thing is the "arrangement process" formulation. The only requirement of the arrangement process function is that it is differentiable, so deep learning methods can be applied to it. It is this arrangement process that decides whether you're generating flip illusions, rotation overlay illusions, hidden overlay illusions, twisting squares illusions, Parker puzzles, or something else -- you could define your own.
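To make the "arrangement process" concrete, here's what an arrangement function for a flip illusion could plausibly look like -- a sketch in PyTorch, which is my assumption, not code from the paper:

```python
import torch

def arrange_flip(prime: torch.Tensor):
    """Hypothetical arrangement function for a flip illusion.

    prime: (C, H, W) image tensor being optimized. Returns the two
    "arranged" views: the image as-is, and rotated 180 degrees. Both
    are built from differentiable ops, so a loss computed on either
    view backpropagates into the single underlying prime image.
    """
    right_side_up = prime
    upside_down = torch.flip(prime, dims=[1, 2])   # 180-degree rotation
    return right_side_up, upside_down
```

Each arranged view then gets pushed toward its own text prompt during training, which is what forces one image to satisfy both prompts at once.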

After this, it runs two training processes concurrently. The first is the standard way diffusion illusions are trained: it calculates an "error", also called a loss, from the target text conditioning. This is called the score distillation loss.
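Score distillation comes from the text-to-3D literature (DreamFusion). The gist: noise the arranged image, ask a frozen pretrained diffusion model to predict that noise given the text prompt, and use the prediction error directly as a gradient on the image. A rough sketch, with `unet` standing in for the frozen diffusion model and the usual timestep weighting w(t) omitted:

```python
import torch

def sds_gradient(image, text_embedding, unet, alphas, sample_t):
    """One score-distillation step: returns a gradient for `image`.

    image: (1, C, H, W) arranged prime image being optimized.
    unet: frozen pretrained diffusion model that predicts noise.
    alphas: cumulative noise schedule; sample_t: random timestep sampler.
    """
    t = sample_t()                                     # random timestep
    noise = torch.randn_like(image)
    a = alphas[t]
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise  # forward-diffuse
    with torch.no_grad():
        eps_pred = unet(noisy, t, text_embedding)      # frozen model's guess
    # The SDS trick: skip backpropagating through the U-Net and use
    # (eps_pred - noise) directly as the gradient on the image.
    return eps_pred - noise
```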

Apparently, however, circumstances exist where it is not trivial for prime images to follow the gradients from the score distillation loss to give you images that create the illusion you are asking for. To get the system unstuck, they added the "dream target loss" training system. The "dream target" images are images made from your text prompts individually. So, let's say you want to make a flip illusion that is a penguin viewed one way and a giraffe when flipped upside down. In this instance, the system will take the "penguin" prompt and create an image from it, and take the "giraffe" prompt and create a separate image for it, and flip it upside down. These become the "dream target" images.

The system then computes a loss between the arranged prime images and the "dream target" images, in addition to the original score distillation loss. If the system has trouble converging on the "dream target" images, new "dream target" images are generated from the same original text prompts.
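Putting the pieces together, the combined training loop might look something like this, reusing the hypothetical arrange_flip and sds_gradient sketches from above (the dream target loss as plain mean-squared error is my assumption):

```python
import torch
import torch.nn.functional as F

def train_step(prime, prompt_embs, dream_targets, unet, alphas, sample_t,
               opt, dream_weight=0.1):
    """One optimization step on the prime image.

    Score distillation pushes each arranged view toward its prompt;
    the dream target loss pulls each view toward a concrete reference
    image generated from the same prompt, to get the system unstuck.
    """
    views = arrange_flip(prime)
    opt.zero_grad()
    dream_loss = 0.0
    for view, emb, target in zip(views, prompt_embs, dream_targets):
        grad = sds_gradient(view.unsqueeze(0), emb, unet, alphas, sample_t)
        view.backward(grad.squeeze(0), retain_graph=True)   # SDS "gradient"
        dream_loss = dream_loss + F.mse_loss(view, target)  # dream target loss
    (dream_weight * dream_loss).backward()
    opt.step()
```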

In this way, the system creates visual illusions. You can even print the images and turn them into real-life puzzles. For some illusions, you print on transparent plastic and overlap the images using an overhead projector.

Diffusion Illusions

#solidstatelife #ai #computervision #genai #diffusionmodels

waynerad@diasp.org

This looks like the video game Doom, but it is actually the output of a diffusion model.

Not only that, but the idea here isn't just to generate video that looks indistinguishable from Doom gameplay, but to create a "game engine" that actually lets you play the game. In fact this diffusion model "game engine" is called "GameNGen", which you pronounce "game engine".

To do this, they actually made two neural networks. The first is a reinforcement learning agent that plays the actual game Doom. As it does so, its output gets ferried over to the second neural network as "training data". In this manner, the first neural network creates unlimited training data for the second neural network.

The second neural network is the actual diffusion model. They started with Stable Diffusion 1.4, a diffusion model "conditioned on" text, which is what enables it to generate images when you input text. They ripped out the "text" stuff, and replaced it with conditioning on "actions", which are the buttons and mouse movements you make to play the game, and previous frames.

Inside the diffusion model, it creates "latent state" that represents the state of the game -- sort of. That's the idea, but it doesn't actually do a good job of it. It does a good job of remembering state that is actually represented on the screen (health, ammo, available weapons, etc.), because it's fed the previous 3 frames of video at every time step to generate the next frame of video, but it's not so good at remembering anything that goes off the screen. Oh, probably should mention: this diffusion model runs fast enough to generate images at "real time" video frame rates.
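As a rough sketch of what that conditioning swap amounts to -- module names and sizes here are my assumptions, not GameNGen's actual code:

```python
import torch
import torch.nn as nn

NUM_ACTIONS, EMB_DIM, CONTEXT_FRAMES = 32, 768, 3   # assumed sizes

class ActionConditioning(nn.Module):
    """Replaces the text encoder: an embedding over discrete game
    actions (buttons, mouse movements), fed to the U-Net's
    cross-attention in place of text embeddings."""
    def __init__(self):
        super().__init__()
        self.action_emb = nn.Embedding(NUM_ACTIONS, EMB_DIM)

    def forward(self, actions):                     # (batch, context) ints
        return self.action_emb(actions)             # (batch, context, EMB_DIM)

def denoiser_input(noisy_next_frame, prev_frames):
    """Condition on past frames by stacking them with the noisy frame
    being denoised along the channel dimension.
    noisy_next_frame: (B, C, H, W); prev_frames: (B, CONTEXT_FRAMES, C, H, W).
    """
    return torch.cat([noisy_next_frame, prev_frames.flatten(1, 2)], dim=1)
```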

Because it doesn't use the actual Doom game engine state code -- or otherwise represent the game state with conventional code -- but instead represents state inside the neural network, and does so imperfectly for anything that goes off the screen, the game seems like real Doom for short periods of play. Over any extended length of time, though, humans can tell it's not real Doom.

GameNGen - Michael Kan

#solidstatelife #ai #genai #computervision #diffusionmodels #videogames #doom

waynerad@diasp.org

"In the 1930s, Disney invented the multiplane camera and was the first to create sound-synchronized, full color cartoons -- eventually leading to the groundbreaking animated film Snow White and the Seven Dwarfs."

"Marvel and DC Comics rose to prominence in the 1940s, dubbed the 'golden age of comics,' enabled by the mass availability of the 4-color rotary letterpress and offset lithography for printing comics at scale."

"Similarly, Pixar was uniquely positioned in the 1980s to leverage a new technology platform -- computers and 3D graphics."

"We believe the Pixar of the next century won't emerge through traditional film or animation, but rather through interactive video. This new storytelling format will blur the line between video games and television/film -- fusing deep storytelling with viewer agency and 'play,' opening up a vast new market."

So says Jonathan Lai of Andreessen Horowitz, the Silicon Valley investment firm.

"The promise of interactive video lies in blending the accessibility and narrative depth of TV/film, with the dynamic, player-driven systems of video games."

"The biggest remaining technical hurdle for interactive video is reaching frame generation speeds fast enough for content generation on the fly. Dream Machine currently generates ~1 frame per second. The minimum acceptable target for games to ship on modern consoles is a stable 30 FPS, with 60 FPS being the gold standard. With the help of advancements such as Pyramid Attention Broadcast (PAB), this could go up to 10-20 FPS on certain video types, but is still not quite fast enough."

("By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular diffusion transformer-based video generation models including Open-Sora, Open-Sora-Plan, and Latte.")

"Given the rate at which we've seen underlying hardware and model improvements, we estimate that we may be ~2 years out from commercially viable, fully generative interactive video."

"In February 2024, Google DeepMind released its own foundation model for end-to-end interactive video named Genie. The novel approach to Genie is its latent action model, which infers a hidden action in between a pair of video frames."

"We've seen teams incorporate video elements inside AI-native game engines." "Latens by Ilumine is building a 'lucid dream simulator' where users generate frames in real-time as they walk through a dream landscape." "Developers in the open-source community Deforum are creating real-world installations with immersive, interactive video. Dynamic is working on a simulation engine where users can control robots in first person using fully generated video." "Fable Studio is building Showrunner, an AI streaming service that enables fans to remix their own versions of popular shows." "The Alterverse built a D&D inspired interactive video RPG where the community decides what happens next. Late Night Labs is a new A-list film studio integrating AI into the creative process. Odyssey is building a visual storytelling platform powered by 4 generative models." "Series AI has developed Rho Engine, an end-to-end platform for AI game creation." "We're also seeing AI creation suites from Rosebud AI, Astrocade, and Videogame AI enable folks new to coding or art to quickly get started making interactive experiences."

"Who will build the Interactive Pixar?"

The Next Generation Pixar: How AI will Merge Film & Games

#solidstatelife #ai #genai #computervision #videoai #startups

waynerad@diasp.org

Somebody made an AI watermark remover. Not a remover for the new watermarking systems that are supposed to invisibly "watermark" AI-generated images so it's possible to tell they were generated by AI -- nobody is using those yet. No, we're talking about a system to remove old-fashioned regular watermarks on images.

Is this actually a good idea? Seems like people watermark images to keep people from bypassing their licensing terms.

It looks like this comes from China. So maybe this is something someone in China wants. (Languages available are English, Chinese, Spanish, Portuguese, Russian, and Bahasa Indonesia.)

Watermark Remover

#solidstatelife #ai #genai #computervision

waynerad@diasp.org

Drones and AI transforming war.

Electromagnetic warfare "has turned the drone war into a game of cat and mouse." "One side develops a signal jammer capable of interfering with the frequency used by the other side's drones, so the other side develops drones that communicate using a different frequency, then the first side adapts their electronic warfare capabilities, and so on."

"But there's an obvious solution." "These red boxes represent the first days of a new epoch of warfare. That's because this drone, developed by startup Ukrainian company Saker, is autonomously identifying targets." "While all indications suggest that there's not yet wide-scale use of AI drones in Ukraine, Saker's scrappy autonomous drones have reportedly already destroyed Russian targets in autonomous mode, meaning the era of AI warfare has quietly begun."

He goes on to predict not just autonomous drones, but autonomous drone swarms, will be the future of warfare.

The terrifying efficiency of drone warfare - Wendover Productions

#solidstatelife #ai #robotics #computervision

waynerad@diasp.org

Anyone got a video to look for?

Reverse Video Search can not only find a specific video if you're looking for something specific, it can also find similar videos that aren't the one you started with. Say you have a clip of a scene from a movie: it can find the movie it came from, or the full scene. Or if you just want similar videos -- not the same exact scene, not even from that movie -- it can find those too.

In this demo video he uses a Jupyter notebook.

Apparently the way the system works is, it breaks the video into chunks, for example 5-frame chunks, then calculates an "embedding" vector for each chunk, and uses those vectors to look for similar vectors in the database it is searching against. So it's basically vector search for video.
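A minimal sketch of that pipeline, with a toy stand-in for the real embedding model (a production system would use a learned video encoder and an approximate-nearest-neighbor vector database):

```python
import numpy as np

CHUNK_SIZE = 5  # frames per chunk, as in the example above

def embed_chunk(frames):
    """Stand-in for a real video embedding model (e.g. a CLIP-style
    encoder averaged over frames). frames: (CHUNK_SIZE, C, H, W)."""
    v = frames.mean(axis=(0, 2, 3))        # toy feature: mean per channel
    return v / (np.linalg.norm(v) + 1e-8)  # unit length for cosine similarity

def index_video(video):
    """video: (num_frames, C, H, W) -> list of one embedding per chunk."""
    return [embed_chunk(video[i:i + CHUNK_SIZE])
            for i in range(0, len(video) - CHUNK_SIZE + 1, CHUNK_SIZE)]

def search(query_vec, index, top_k=5):
    """Cosine similarity of the query chunk against every indexed chunk."""
    sims = [(float(np.dot(query_vec, v)), i) for i, v in enumerate(index)]
    return sorted(sims, reverse=True)[:top_k]
```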

Reverse Video Search: Find ANY Video Clip Instantly - Mixpeek

#solidstatelife #ai #computervision