#computervision

waynerad@diasp.org

GPT-4 easily solves CAPTCHA.

I got to try GPT-4's multimodal capabilities and it's quite impressive! A quick thread of examples... - Tanishq Mathew Abraham (@iScienceLuvr)

#solidstatelife #ai #nlp #llms #computervision #chatgpt #captcha

waynerad@diasp.org

Zork + Imagen. What does the world of Zork, the 1981 video game, look like? A Google team uses Imagen to create a visual version of the game. They don't simply pipe the text into Imagen; instead, they modify the game to reveal internal game state that the game tracks but doesn't include in every text output, which helps maintain visual continuity as the player interacts with the game.

AdventurImagen - Zork meets Google's Imagen generative imagery - Matt Walsh

#solidstatelife #ai #generativemodels #computervision #imogen

waynerad@diasp.org

Facial recognition software flagged a mother going to see a Rockettes show with her daughter and a pack of Girl Scouts. Evidently the face recognition system was programmed to flag everyone working for law firms suing the entertainment company running the show, and she worked for a law firm that was representing a client suing a restaurant that was owned by the same entertainment company that was putting on the show.

"Kelly Conlon, a senior associate with the New Jersey personal injury firm Davis, Saperstein and Salomon -- which is representing a client suing a restaurant owned by the parent company, MSG Entertainment -- told NBC New York that security guards approached her and asked for identification as soon as she arrived at Radio City Music Hall on the weekend after Thanksgiving. The guards ultimately turned her away from the show even though she is not involved in her firm's litigation against the company. Conlon's daughter and the rest of the Girl Scouts were able to attend the performance, she told the station."

Girl Scout mom kicked out of Radio City and barred from seeing Rockettes after facial recognition tech identified her

#solidstatelife #ai #computervision #facerecognition

waynerad@diasp.org

Automatic Observer in StarCraft. The idea here is to replace the humans who control the in-game "cameras" for esports audiences with an automated system. Rule-based automated systems already exist, but this system uses an object detection model (Mask R-CNN) to turn video of the gameplay into objects, which are then fed into a supervised learning system that uses camera control from humans as its training data. The result is an automated system that controls the in-game cameras the way human observers do.

In this example video from StarCraft, it did seem to me like the Automatic Observer did a better job of "anticipating" action and showing it coming, for example by showing attackers marching towards a defended position before they get there, while the rule-based system would jerk you over to the action after it had already started.
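As a rough illustration of the pipeline described above (detected objects in, human camera positions as training labels), here is a minimal sketch in Python. It is not the authors' method: the hand-written detection lists stand in for Mask R-CNN output, a simple per-axis linear fit is swapped in for whatever model the real system uses, and all function names (centroid, fit_line, train, predict) are hypothetical.

```python
from statistics import mean


def centroid(detections):
    """Average (x, y) of the detected object centers in one frame."""
    xs = [x for x, _ in detections]
    ys = [y for _, y in detections]
    return mean(xs), mean(ys)


def fit_line(inputs, targets):
    """Closed-form least squares for target ~ slope * input + intercept."""
    mx, my = mean(inputs), mean(targets)
    var = sum((x - mx) ** 2 for x in inputs)
    cov = sum((x - mx) * (y - my) for x, y in zip(inputs, targets))
    slope = cov / var
    return slope, my - slope * mx


def train(frames, human_cameras):
    """Fit one line per axis, mapping object centroid to human camera center."""
    cents = [centroid(f) for f in frames]
    model_x = fit_line([c[0] for c in cents], [h[0] for h in human_cameras])
    model_y = fit_line([c[1] for c in cents], [h[1] for h in human_cameras])
    return model_x, model_y


def predict(model, detections):
    """Predict a camera center for a new frame of detections."""
    (ax, bx), (ay, by) = model
    cx, cy = centroid(detections)
    return ax * cx + bx, ay * cy + by


# Toy data (assumed): in every frame the human observer aims 5 units
# to the right of the units on screen.
frames = [
    [(0, 0), (2, 2)],
    [(10, 4), (12, 6)],
    [(20, 10), (22, 10)],
]
human_cameras = [(6, 1), (16, 5), (26, 10)]

model = train(frames, human_cameras)
camera = predict(model, [(30, 20), (32, 22)])
```

On this toy data the fitted model reproduces the human habit of aiming slightly ahead of the units, which is the kind of behavior a fixed rule that centers on the action would miss.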

Automatic Observer using Human Observing Data in StarCraft

#solidstatelife #ai #computervision #esports

waynerad@diasp.org

AI model to populate virtual worlds with 3D objects, such as vehicles, furniture, and animals, as well as characters. The model, called GET3D, is trained only on 2D images and generates 3D objects. It outputs triangle meshes with high-fidelity textures. You can also use text to further modify the objects, and it can generate unlimited random variations of an object.

NVIDIA GET3D: AI model to populate virtual worlds with 3D objects and characters - NVIDIA Developer

#solidstatelife #ai #computervision #generativemodels #videogames #nvidia

waynerad@diasp.org

"'Salt' resembles many science-fiction films from the '70s and early '80s, complete with 35mm footage of space freighters and moody alien landscapes. But while it looks like a throwback, the way it was created points to what could be a new frontier for making movies."

"Fabian Stelzer creates images with image-generation tools such as Stable Diffusion, Midjourney and DALL-E 2. He makes voices mostly using AI voice generation tools such as Synthesia or Murf. And he uses GPT-3, a text-generator, to help with the script writing."

"There's an element of audience participation, too."

This guy is using AI to make a movie -- and you can help decide what happens next | CNN Business

#solidstatelife #ai #nlp #computervision #generativemodels #filmmaking

waynerad@diasp.org

Take 4 images of a subject, give them to this new Google AI, and it can change the background behind your little doggy... and show your little doggy swimming, sleeping, in a bucket, or getting a haircut. Similarly, if you have a pair of stylish sunglasses, you can ask a bear to wear them, make a cool product photo, or put them in front of the Eiffel Tower. Put your favorite teapot into different contexts, see it in use, or see what it would look like if it were transparent. Create art renditions of your test subject in the style of legendary artists of the past.

Google's new AI: dog goes in, statue comes out! - Two Minute Papers

#solidstatelife #ai #computervision #generativemodels #aiart

waynerad@diasp.org

"The difference between human-drawn bad bicycles and AI-generated photorealistic 5-6 legged horses is important and insightful. Humans are largely unable to reproduce the visual likeness of something. But they know what the parts are (2 wheels + 2 pedals + handbar + saddle). On the other hand, a deep learning model is excellent at reproducing local visual likeness (what it's fitted on), yet it has no understanding of the parts & their organization."

"A 5-year old that draws disproportionate stick figures will still draw horses with 4 legs and 1 head and 2 eyes."

"This is the difference between discrete and continuous world models. Between a graph and a differentiable curve."

The difference between human-drawn bad bicycles and AI-generated photorealistic 5-6 legged horses

#solidstatelife #ai #computervision #generativemodels

waynerad@diasp.org

"The face search engine PimEyes heralds the end of anonymity in public spaces. All it takes is a photo from a cell phone or security camera, and PimEyes provides links to similar or identical faces on the web -- all for a monthly fee. The linked pages can then reveal the name, profession or further personal details about a person."

"After critical reports from netzpolitik.org in 2020, the once Polish company fled first to the Seychelles, then to Belize -- and did no longer respond to any press inquiries. Politicians from Germany and the EU have sharply criticized the search engine. A local German data protection commissioner has initiated proceedings and, according to his own statement, is still waiting for a response from PimEyes."

"Now PimEyes has a new owner, who is a 34-year-old security scholar from Georgia. In an interview with netzpolitik.org, Giorgi Gobronidze explains why he, of all people, bought the search engine -- and what he intends to do to make PimEyes less attractive to stalkers."

"The Russian invasion of Georgia was a main driver for me to study security studies. I started to learn more about technology because the Russian army carried out a massive cyber attack which closed down the whole country for almost three days. Later, I also started to study how artificial intelligence based solutions affect not only the security landscape, but our everyday life."

PimEyes-CEO: The user is the stalker, not the search engine

#solidstatelife #ai #computervision #facerecognition

waynerad@diasp.org

OpenArt.AI is another AI art library and search engine. I tried my "driver's license" search from last week that Lexica.art excelled at but Libraire.ai totally got tripped up over. OpenArt.AI also flopped, but I got a bunny driving a school bus and a couple of images of dogs driving sports cars, so at least the result was entertaining.

But who needs a search engine when you can just generate images yourself?

OpenArt.AI

#solidstatelife #ai #computervision #generativemodels #aiart