#computervision

waynerad@diasp.org

"OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall or Limitless' Rewind.ai. With OpenRecall, you can easily access your digital history, enhancing your memory and productivity without compromising your privacy."

"OpenRecall captures your digital history through regularly taken snapshots, which are essentially screenshots. The text and images within these screenshots are analyzed and made searchable, allowing you to quickly find specific information by typing relevant keywords into OpenRecall. You can also manually scroll back through your history to revisit past activities."

openrecall / openrecall

#solidstatelife #ai #computervision

waynerad@diasp.org

"Opinion: It's time for the Biden Campaign to embrace AI"

"By Kaivan Shroff, Guest Writer"

"The stakes of the 2024 presidential election cannot be overstated. With Donald Trump promising to act as a dictator 'on day one,' it is not hyperbolic to say the future of American democracy hangs in the balance. Against this backdrop, the Biden campaign faces a critical challenge: conveying a strong and effective image of President Joe Biden to a population and media ecosystem increasingly focused on optics over substance. Given the president's concerning performance last week, it's time for the Biden campaign to consider leveraging artificial intelligence (AI) to effectively reach the voting public."

"Reasonably, some may challenge the use of AI as dishonest and deceptive, but the current information ecosystem is arguably no better." "We must ask the question, are augmented AI videos that present Biden in his best form -- while sharing honest and accurate information -- really more socially damaging than our information ecosystem's current realities?"

"AI-generated content can be tailored to highlight President Biden's accomplishments, clearly articulate his policies, and present a consistent, compelling message. In an era where visual mediums and quick, digestible content dominate public perceptions, AI offers an opportunity for more effective communication. These AI-enhanced videos could ensure that the public does not make decisions about the future of our democracy based on an inconveniently timed cough, stray stutter, or healthy but hobbled walk (Biden suffers from a 'stiff gait')."

"The use of AI renderings in political campaigns is becoming increasingly common, and the Republican Party has already embraced this technology and is using AI in their attack ads against the president. Instead of a race to the bottom, the Biden campaign could consider an ethical way to deploy the same tools."

Opinion: It's time for the Biden Campaign to embrace AI | HuffPost Opinion

#solidstatelife #ai #genai #llms #computervision #deepfakes #domesticpolitics

waynerad@diasp.org

"Kosmos-2.5 is a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format."

"This unified multimodal literate capability is achieved through a shared decoder-only auto-regressive Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images."

Kosmos-2.5: A multimodal literate model

#solidstatelife #ai #genai #llms #computervision #multimodal

waynerad@diasp.org

Hunyuan-DiT is an image generator that generates art with "Chinese elements" from Chinese prompts. It's an open-source model created by Chinese tech giant Tencent. It's a diffusion model; the diffusion model itself is trained with a denoising objective, while the text encoders that guide it (CLIP-style models) are trained on image-text pairs with "contrastive" learning. Hunyuan-DiT was first trained on an English dataset, and then "fine-tuned" from there on a Chinese image and Chinese text dataset. Because of this, even though it is optimized for generating Chinese images from Chinese text, it is still capable of generating images from English text. It knows Chinese places, Chinese painting styles, Chinese food, Chinese dragons, traditional Chinese attire, and so on. It looks like if you ask it to generate images of people, it will generate images of Chinese people unless you ask otherwise.

Hunyuan-DiT: A powerful multi-resolution diffusion transformer with fine-grained Chinese understanding

#solidstatelife #ai #genai #computervision #diffusionmodels

waynerad@diasp.org

"Avian eye-inspired perovskite artificial vision system for foveated and multispectral imaging".

The paper is paywalled, but from the abstract what I'm able to figure out is that these researchers have developed what is, essentially, a digital camera, except unlike a regular digital camera, which produces pictures with a uniform density of pixels everywhere, this camera has a ton of extra pixels in the center of the image, giving it a "fovea" like birds of prey have. Also, like bird eyes, this "foveated" digital camera can see ultraviolet light in addition to regular visible light.
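
To make "foveated" concrete, here's a toy sketch of my own (not anything from the paper): sample density is highest at the center of the frame and falls off toward the periphery.

```python
# Toy illustration of foveated sampling, NOT from the paper.
import numpy as np

def foveated_sample_mask(h, w, keep_center=1.0, keep_edge=0.05):
    # True where a pixel is sampled; density falls off away from the center
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = np.hypot(ys - cy, xs - cx) / np.hypot(cy, cx)  # 0 at center, 1 at corner
    keep_prob = keep_center * (1 - r) + keep_edge * r  # linear falloff
    return np.random.rand(h, w) < keep_prob

mask = foveated_sample_mask(480, 640)
print(mask.mean())  # overall fraction of pixels retained
```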

If any of you can get around the paywall, it would be interesting to find out whether the reason it's made with perovskites is to provide this additional ultraviolet detection. They say its construction is a "vertically stacked perovskite photodetector". This may mean it's not a charge-coupled device (CCD) or CMOS sensor like regular digital cameras.

This is obviously a research prototype, so no word yet on commercialization, but it seems obvious to me that this has immediate applications in military drones.

Avian eye-inspired perovskite artificial vision system for foveated and multispectral imaging

#solidstatelife #computervision #photodetectors #perovskites

waynerad@diasp.org

ToonCrafter: Generative Cartoon Interpolation.

Check out the numerous examples. This looks like something that could really help human animators make cartoons faster without losing their own hand-drawn animation style.

The way the system works is you input two frames, and ask the system to interpolate all the frames in between. You can optionally further augment the input with a sketch.

Under the hood, it uses a diffusion model for video generation called DynamiCrafter. DynamiCrafter has an internal "latent representation" that encodes something of the meaning of the frames, which it uses to generate the video frames.

This system, ToonCrafter, uses the first and last frames to work backward to the "latent representations", then interpolates the "latent representations" to get the intermediate frames.
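
In other words, something like this conceptual sketch (my toy code, not ToonCrafter's; the real system runs a video diffusion model over the latents rather than linearly blending them):

```python
# Conceptual sketch of latent-space frame interpolation, NOT ToonCrafter's code.
def interpolate_frames(frame_a, frame_b, n_mid, encode, decode):
    za, zb = encode(frame_a), encode(frame_b)  # work backward to the latents
    frames = []
    for i in range(1, n_mid + 1):
        t = i / (n_mid + 1)           # position between the two endpoint frames
        z = (1 - t) * za + t * zb     # blend in latent space
        frames.append(decode(z))      # decode back to pixels
    return frames
```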

Because DynamiCrafter was trained on live-action video, and there's a huge gap in visual style between live-action and cartoons, such as exaggerated expressions and simplified textures, they had to take pains to "fine tune" the system with a lot of additional training on a high-quality cartoon dataset they constructed themselves.

In addition to the DynamiCrafter video generator, they also added a "detail-injecting" 3D decoder. This is an additional complex part of the system, with multiple 3D residual network layers and upsampling layers.

ToonCrafter: Generative Cartoon Interpolation

#solidstatelife #ai #genai #computervision #diffusionmodels #animation

waynerad@diasp.org

Windrecorder is a "personal memory search engine".

"Windrecorder is a memory search app by records everything on your screen in small size, to let you rewind what you have seen, query through OCR text or image description, and get activity statistics."

Sounds like Microsoft Recall, only open source.

I haven't posted anything about Microsoft Recall before. I'm guessing you've all heard about it. If you haven't: it's a new AI feature that screenshots everything you do and enables you to ask the AI questions, which it answers by comprehending the screenshots. Privacy concerns about it seem to be encouraging people to switch to Linux.

Windrecorder | Personal Memory Search Engine

#solidstatelife #ai #computervision #genai

waynerad@diasp.org

"SignWave: An easy-to-use program that transcribes text or audio files into a sign language animation."

"Given how much society has advanced technologically, the fact that there still isn't enough attention given to making communication more accessible for the deaf community is inexcusable. One of our teammates spoke of his first-hand experience with this issue, as his grandfather is a deaf individual who communicates primarily through sign language and visual cues. That's when we had the idea of automating translation to sign language, similar to closed captions on videos. As a result, we have created SignWave, an accessible and convenient translator from English to American Sign Language (ASL)."

Uses OpenAI's Whisper to convert speech to text.
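
The speech-to-text step is easy to reproduce with the open-source whisper package; this is its standard API, though exactly how SignWave wires the transcript into its animation step is my assumption, not shown here:

```python
# Standard usage of OpenAI's open-source whisper package
# (pip install openai-whisper); the text-to-ASL animation step that
# SignWave feeds this into is not shown.
import whisper

model = whisper.load_model("base")        # small general-purpose model
result = model.transcribe("speech.wav")   # returns {"text": ..., "segments": ...}
print(result["text"])                     # transcript to be rendered as ASL
```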

tan-ad / SignWave

#solidstatelife #ai #computervision #signlanguage #asl

waynerad@diasp.org

"Piecing together the secrets of the Stasi".

"In the weeks before the Wall fell, Stasi agents destroyed as many documents as they could. Many were pulped, shredded, or burned, and lost forever. But between forty and fifty-five million pages were just torn up, and later stuffed in paper sacks."

"There were reports on television about a small team manually reconstructing the files. So I thought, This is a very interesting field for machine vision." "At the time, Bertram Nickolay, a Berlin-based engineer and expert in machine vision, was a lead engineer at a member institute of the Fraunhofer-Gesellschaft, the German technology giant that helped invent the MP3. With the right scanner and software, he reckoned, a computer could identify the fragments of a page and piece them together digitally."

"The reality proved more frustrating."

Piecing Together the Secrets of the Stasi | The New Yorker

#solidstatelife #ai #surveillance #computervision #eastgermany #stasi

waynerad@diasp.org

Kaspersky, the company that makes anti-virus software that some of you out there probably use (although maybe you'll rethink that after reading this), has been accused of making neural net software that's been added to Iranian drones and sent into battle in Ukraine.

So you'll see this link is to "Part 2" of a story. "Part 1" is about a company called Albatross, located in the Alabuga special economic zone in "the Republic of Tatarstan". Tatarstan is not a separate country but a state within Russia that is called a "Republic" anyway, instead of "Oblast", the usual word for what would correspond approximately to a "state" in our country (well, assuming your country is the US, which it might not be, as there are people from everywhere here on FB, but you probably have something analogous in your country, a "Province" for example). If you've ever heard of the city of Kazan, Kazan is the capital of Tatarstan. Anyway: Albatross got hacked, and the leaked documents revealed that the company was making "motor boats", but "motor boats" was a code name for drones (and "bumpers" was the code name for the warheads they carried). More specifically, the "Dolphin 632 motor boat" was really the Iranian Shahed-136 UAV, which was renamed the Geran-2 when procured by the Russian military.

"Part 2" which is the link here goes into the Kaspersky connection. Allegedly two people at Kaspersky previously took part in a contest, called ALB-search, to make a neural network on a drone that could find a missing person. In the military adaptation, it finds enemy soldiers. Kaspersky Lab made a subdivision called Kaspersky Neural Networks.

The article links to a presentation regarding a neural network for a drone for agriculture, with slides about assessment of crop quality, crop counting, weed detection, land inventory, and such, but it goes on to describe searching for people and animals, UAV detection (detection of other drones in its surroundings), and even traffic situation analysis.

There's also a system called Kaspersky Antidrone, which is supposed to be able to, basically, hijack control of someone else's drone within a controlled airspace.

The article alleges Kaspersky was working with Albatross not only to deploy their neural networks to Albatross drones and to use them for detection of enemy soldiers but to develop them into artillery spotters as well. This is all with an on-board neural network that runs directly on the drone.

If true, this would indicate an advancement of drones in the Ukraine war (so far I've heard very little about neural networks running on board drones), as well as a deepening of cooperation between Russia and Iran and of the integration of civilian companies such as Kaspersky into the war effort.

This information comes from a website called InformNapalm, which I hadn't seen before, but which says it was created by Ukrainians as a "citizen journalism" site following the Russian annexation of Crimea in 2014.

Kaspersky has denied the allegations (article on that below).

AlabugaLeaks Part 2: Kaspersky Lab and neural networks for Russian military drones

#solidstatelife #ai #computervision #uavs

waynerad@diasp.org

Facial recognition AI has come to the TSA (Transportation Security Administration).

"TSA is using facial identification to verify a passenger's identity at its security checkpoints using the US Customs and Border Protection (CBP) Traveler Verification Service (TVS), which creates a secure biometric template of a passenger's live facial image taken at the checkpoint and matches it against a gallery of templates of pre-staged photos that the passenger previously provided to the government (e.g., US Passport or Visa). Participation is optional. Passengers who have consented to participate may choose to opt-out at any time and instead go through the standard identity verification process by a Transportation Security Officer (TSO)."

TSA PreCheck(R): Touchless Identity Solution

#solidstatelife #ai #computervision #facialrecognition

waynerad@diasp.org

Company revives Alan Turing as an AI chatbot, hilarity, no, wait, outrage ensues.

The company is Genius Group, based in Singapore, which provides "AI-powered business education."

"Software engineer Grady Booch, a former Turing Talk speaker, wrote on Twitter/X: 'Absolute and complete trash. I hope that Turing's heirs sue you into oblivion.'"

"Another user told Genius Group's CEO: 'This is so incredibly unethical, disrespectful, and disgusting. You are pillaging the image of a deceased person (who frankly has suffered enough from exploitation) and the voice of an actor to suit your purposes. Vile.'"

Company revives Alan Turing as an AI chatbot, outrage ensues

#solidstatelife #ai #aieducation #llms #genai #computervision #videoai

waynerad@diasp.org

"EyeEm, the Berlin-based photo-sharing community that exited last year to Spanish company Freepik after going bankrupt, is now licensing its users' photos to train AI models. Earlier this month, the company informed users via email that it was adding a new clause to its Terms & Conditions that would grant it the rights to upload users' content to 'train, develop, and improve software, algorithms, and machine-learning models.' Users were given 30 days to opt out by removing all their content from EyeEm's platform."

AI says: All your photos are belong to us.

Photo-sharing community EyeEm will license users' photos to train AI if they don't delete them - techcrunch.com

#solidstatelife #ai #genai #computervision

waynerad@diasp.org

Vidu is a Chinese video generation AI competitive with OpenAI's Sora, according to rumor (neither is available for the public to use). It's a collaboration between Tsinghua University in Beijing and a company called Shengshu Technology.

"Vidu is capable of producing 16-second clips at 1080p resolution -- Sora by comparison can generate 60-second videos. Vidu is based on a Universal Vision Transformer (U-ViT) architecture, which the company says allows it to simulate the real physical world with multi-camera view generation. This architecture was reportedly developed by the Shengshu Technology team in September 2022 and as such would predate the diffusion transformer (DiT) architecture used by Sora."

"According to the company, Vidu can generate videos with complex scenes adhering to real-world physics, such as realistic lighting and shadows, and detailed facial expressions. The model also demonstrates a rich imagination, creating non-existent, surreal content with depth and complexity. Vidu's multi-camera capabilities allows for the generation of dynamic shots, seamlessly transitioning between long shots, close-ups, and medium shots within a single scene."

"A side-by-side comparison with Sora reveals that the generated videos are not at Sora's level of realism."

Meet Vidu, A New Chinese Text to Video AI Model - Maginative

#solidstatelife #ai #genai #computervision #videogeneration

waynerad@diasp.org

Creating sexually explicit deepfakes is to become a criminal offence in the UK. Under the new legislation, if the images or videos were never intended to be shared, the person will face a criminal record and an unlimited fine. If the images are shared, they face jail time.

Creating sexually explicit deepfakes to become a criminal offence

#solidstatelife #ai #genai #computervision #deepfakes #aiethics

waynerad@diasp.org

"The rise of generative AI and 'deepfakes' -- or videos and pictures that use a person's image in a false way -- has led to the wide proliferation of unauthorized clips that can damage celebrities' brands and businesses."

"Talent agency WME has inked a partnership with Loti, a Seattle-based firm that specializes in software used to flag unauthorized content posted on the internet that includes clients' likenesses. The company, which has 25 employees, then quickly sends requests to online platforms to have those infringing photos and videos removed."

This company Loti has a product called "Watchtower", which watches for your likeness online.

"Loti scans over 100M images and videos per day looking for abuse or breaches of your content or likeness."

"Loti provides DMCA takedowns when it finds content that's been shared without consent."

They also have a license management product called "Connect", and a "fake news protection" program called "Certify".

"Place an unobtrusive mark on your content to let your fans know it's really you."

"Let your fans verify your content by inspecting where it came from and who really sent it."

They don't say anything about how their technology works.

Hollywood celebs are scared of deepfakes. This talent agency will use AI to fight them.

#solidstatelife #ai #genai #computervision #deepfakes #aiethics

waynerad@diasp.org

Photorealistic AI-generated talking humans. "VLOGGER" is a system for generating video to match audio of a person talking. So you can make video of any arbitrary person saying any arbitrary thing. You just supply the audio (which could itself be AI-generated) and a still image of a person (which also could itself be AI-generated).

Most of the sample videos wouldn't play for me, but the ones in the top section did and seem pretty impressive. You have to "unmute" them to hear the audio and see that the video matches the audio.

They say the system works using a 2-step approach, where the first step is to take just the audio signal and use a neural network to predict what facial expressions, gaze, gestures, pose, body language, etc., would be appropriate to associate with that audio, and the second step is to combine the output of the first step with the image you provide to generate the video. Perhaps surprisingly (at least to me), both of these are done with diffusion networks. I would've expected the second step to be done with a diffusion network, but the first to be done with some sort of autoencoder network. But no, they say they used a diffusion network for that step, too.

So the first step is taking the audio signal and converting it to spectrograms. In parallel, the input image is fed into a "reference pose" network that analyzes it to determine what the person looks like and what pose the rest of the system has to deal with as a starting point.

These are fed into the "motion generation network". The output of this network is "residuals" that describe face and body positions. It generates one set of all these parameters for each frame that will be in the resulting video.

The result of the "motion generation network", along with the reference image and the pose of the person in the reference image is then passed to the next stage, which is the temporal diffusion network that generates the video. A "temporal diffusion" network is a diffusion network that generates images, but it has been modified so that it maintains consistency from frame to frame, hence the "temporal" word tacked on to the name. In this case, the temporal diffusion network has undergone the additional step of being trained to handle the 3D motion "residual" parameters. Unlike previous non-diffusion-based image generators that simply stretched images in accordance with motion parameters, this network incorporates the "warping" parameters into the training of the neural network itself, resulting in much more realistic renditions of human faces stretching and moving.

This neural network generates a fixed number of frames. They use a technique called "temporal outpainting" to extend the video to any number of frames. The "temporal outpainting" system re-inputs the previous frames, minus 1, and uses that to generate the next frame. In this manner they can generate a video of any length with any number of frames.
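
As a sketch (my reading of their description, not their code):

```python
# Sketch of temporal outpainting: re-input the previous frames, minus one,
# and let the model fill in the rest; repeat until the video is long enough.
def extend_video(generate_clip, seed_clip, total_frames):
    # generate_clip: placeholder for the temporal diffusion network; given
    # some known frames as context, it returns a full fixed-size clip.
    video = list(seed_clip)
    clip_len = len(seed_clip)
    while len(video) < total_frames:
        context = video[-(clip_len - 1):]    # previous frames, minus 1
        clip = generate_clip(context)        # model regenerates a full clip
        video.extend(clip[clip_len - 1:])    # keep only the newly made frame
    return video
```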

As a final step they incorporate an upscaler to increase the pixel resolution of the output.

VLOGGER: Multimodal diffusion for embodied avatar synthesis

#solidstatelife #ai #computervision #generativeai #diffusionmodels