#solidstatelife

waynerad@diasp.org

AI is simultaneously overhyped and underhyped, alleges Dagogo Altraide, aka "ColdFusion" (technology history YouTube channel). For AI, we're at the "peak of inflated expectations" stage of the Gartner hype cycle.

At the same time, tech companies are doing mass layoffs of tech workers, and it's no longer because of overhiring during the pandemic, and it's not the regular business cycle -- companies with record revenues and profits are doing mass layoffs of tech workers. "The truth is slowly coming out" -- the layoffs are because of AI, but tech companies want to keep it secret.

So despite the inflated expectations, AI isn't underperforming when it comes to reducing employment.

AI deception: How tech companies are fooling us - ColdFusion

#solidstatelife #ai #technologicalunemployment

waynerad@diasp.org

Will AI combined with the mathematics of category theory enable AI systems to have powerful and accurate reasoning capabilities and ability to explain their reasoning?

While I'm kind of doubtful, I'm not well-versed enough in category theory to have an opinion. I tried to read the research paper, but I didn't understand it; I think one needs to be well-versed in category theory before reading the paper. Below is a YouTube video of a discussion with one of the researchers (Paul Lessard), which I actually stumbled upon first, along with an introductory video on category theory.

Apparently they have gotten millions of dollars in investment for a startup to bring category-theory-based AI to market, which surprises me because it seems so abstract, I would not expect VCs to understand it and become strong enough believers in it to make millions in investments. Then again, maybe VCs see their job as taking huge risks for potentially huge returns, in which case, if this technology is successful and successfully takes over the AI industry, they win big.

As best I understand it, set theory, which most of us learned (a little bit of) in school, and group theory (which most of us didn't learn in school) are foundations of category theory, which uses them as building blocks and extends the degree of abstraction out further. Set theory has to do with "objects" being members of "sets", from which you build concepts like subsets, unions, intersections, and so on.

Group theory is all about symmetry. I have this book called "The Symmetry of Things", which is absolutely gorgeous, with pictures of symmetrical tilings on planes and spheres using translation, reflection, rotation, and so on. The first half introduces a notation you can use to represent all the symmetries, and that part I understood, but the second half abstracts all that and goes deep into group theory, and it got so abstract that I got lost and could not understand it. From what I understand, group theory is incredibly powerful, though -- such that, for example, all the computer algebra systems that perform integrals of complex functions symbolically do it with group theory, not with trial-and-error or exhaustive brute-force search or any of the ways you as a human would probably try to do it with pencil and paper and tables of integrals from the back of your calculus book.

Category theory I have not even tried to study, and it is supposed to be even more abstract than that.

Anyway, I thought I would pass this along on the possibility that some of you understand it and on the possibility that it might revolutionize AI as its proponents claim. If it does you heard it from me first, eh?

Categorical deep learning: An algebraic theory of architectures

#solidstatelife #ai #categorytheory #startups

waynerad@diasp.org

David Graeber was right about BS jobs, says Max Murphy. Basically, our economy is bifurcating into two kinds of jobs: "essential" jobs that, despite being "essential", are lowly paid and unappreciated, and "BS" (I'm just going to abbreviate) jobs that are highly paid but accomplish nothing useful for anybody. The surprise, perhaps, is that these BS jobs, despite being well paid, are genuinely soul-crushing.

My question, though, is how much of this is due to technological advancement, and will the continued advancement of technology (AI etc) increase the ratio of BS jobs to essential jobs further in favor of the BS jobs?

David Graeber was right about bullsh*t jobs - Max Murphy

#solidstatelife #ai #technologicalunemployment

waynerad@diasp.org

"AI could actually help rebuild the middle class," says David Autor.

"Artificial intelligence can enable a larger set of workers equipped with necessary foundational training to perform higher-stakes decision-making tasks currently arrogated to elite experts, such as doctors, lawyers, software engineers and college professors. In essence, AI -- used well -- can assist with restoring the middle-skill, middle-class heart of the US labor market that has been hollowed out by automation and globalization."

"Prior to the Industrial Revolution, goods were handmade by skilled artisans: wagon wheels by wheelwrights; clothing by tailors; shoes by cobblers; timepieces by clockmakers; firearms by blacksmiths."

"Unlike the artisans who preceded them, however, expert judgment was not necessarily needed -- or even tolerated -- among the 'mass expert' workers populating offices and assembly lines."

"As a result, the narrow procedural content of mass expert work, with its requirement that workers follow rules but exercise little discretion, was perhaps uniquely vulnerable to technological displacement in the era that followed."

"Stemming from the innovations pioneered during World War II, the Computer Era (AKA the Information Age) ultimately extinguished much of the demand for mass expertise that the Industrial Revolution had fostered."

"Because many high-paid jobs are intensive in non-routine tasks, Polanyi's Paradox proved a major constraint on what work traditional computers could do. Managers, professionals and technical workers are regularly called upon to exercise judgment (not rules) on one-off, high-stakes cases."

Polanyi's Paradox, named for Michael Polanyi who observed in 1966, "We can know more than we can tell," is the idea that "non-routine" tasks involve "tacit knowledge" that can't be written out as procedures -- and hence coded into a computer program. But AI systems don't have to be coded explicitly and can learn this "tacit knowledge" like humans.

"Pre-AI, computing's core capability was its faultless and nearly costless execution of routine, procedural tasks."

"AI's capacity to depart from script, to improvise based on training and experience, enables it to engage in expert judgment -- a capability that, until now, has fallen within the province of elite experts."

Commentary: I feel like I had to make the mental switch from expecting AI to automate "routine" work to expecting it to automate "mental" work -- i.e., what matters is mental-vs-physical, not creative-vs-routine. Now we're right back to talking about the creative-vs-routine distinction.

AI could actually help rebuild the middle class | noemamag.com

#solidstatelife #ai #technologicalunemployment

waynerad@diasp.org

"The rise of generative AI and 'deepfakes' -- or videos and pictures that use a person's image in a false way -- has led to the wide proliferation of unauthorized clips that can damage celebrities' brands and businesses."

"Talent agency WME has inked a partnership with Loti, a Seattle-based firm that specializes in software used to flag unauthorized content posted on the internet that includes clients' likenesses. The company, which has 25 employees, then quickly sends requests to online platforms to have those infringing photos and videos removed."

This company Loti has a product called "Watchtower", which watches for your likeness online.

"Loti scans over 100M images and videos per day looking for abuse or breaches of your content or likeness."

"Loti provides DMCA takedowns when it finds content that's been shared without consent."

They also have a license management product called "Connect", and a "fake news protection" program called "Certify".

"Place an unobtrusive mark on your content to let your fans know it's really you."

"Let your fans verify your content by inspecting where it came from and who really sent it."

They don't say anything about how their technology works.

Hollywood celebs are scared of deepfakes. This talent agency will use AI to fight them.

#solidstatelife #ai #genai #computervision #deepfakes #aiethics

waynerad@diasp.org

"Texas will use computers to grade written answers on this year's STAAR tests."

STAAR stands for "State of Texas Assessments of Academic Readiness" and is a standardized test given to elementary through high school students. It replaced an earlier test starting in 2012.

"The Texas Education Agency is rolling out an 'automated scoring engine' for open-ended questions on the State of Texas Assessment of Academic Readiness for reading, writing, science and social studies. The technology, which uses natural language processing, a building block of artificial intelligence chatbots such as GPT-4, will save the state agency about $15 million to 20 million per year that it would otherwise have spent on hiring human scorers through a third-party contractor."

"The change comes after the STAAR test, which measures students' understanding of state-mandated core curriculum, was redesigned in 2023. The test now includes fewer multiple choice questions and more open-ended questions -- known as constructed response items."

Texas will use computers to grade written answers on this year's STAAR tests

#solidstatelife #ai #llms #technologicalunemployment

waynerad@diasp.org

The Daily Show with Jon Stewart did a segment on AI and jobs. Basically, we're all going to get helpful assistants which will make us more productive, so it's going to be great, except, more productive means fewer humans employed, but don't worry, that's just the 'human' point of view. (First 8 minutes of this video.)

Jon Stewart on what AI means for our jobs & Desi Lydic on Fox News's Easter panic | The Daily Show

#solidstatelife #ai #aiethics #technologicalunemployment

waynerad@diasp.org

Udio is an AI music generator. I went through the staff picks. I was impressed that it rendered "acoustic" music well, with lyrics -- the singing seemed actually good and the lyrics made sense. It handles genres like jazz and country.

They don't say anything about how the system works. They say the program is free during the beta program.

Udio | Make your music

#solidstatelife #ai #genai #audioai #musicai

waynerad@diasp.org

Why this developer is no longer using Copilot: he feels his programming skills atrophy. He writes code by pausing to wait for Copilot to write code, and doesn't enjoy programming that way. The AI-generated code is often wrong or out-of-date and has to be fixed. And using Copilot is a privacy issue, because your code is sent to Copilot's servers.

I thought this was quite interesting. I tried Copilot in VSCode and I figured I wasn't using it much because I'm a vim user. So I tracked down the Neovim plug-in and got it working in vim, but still found I don't use it. Now I've come to feel it's great for certain use cases and bad for others. Where it's great is writing "boilerplate" code for using a public API. You just write a comment describing what you want to do and the beginning of the function, and Copilot spits out practically all the rest of the code for your function -- no tedious hours studying the documentation from the API provider.

But that's not the use case I actually engage in in real life. Most of what I do is either making a new UI, or porting code from PHP to Go. For the new UI, AI has been helpful -- I can take a screenshot, input it to ChatGPT, and ask it how to improve the UI. (I'm going to be trying this with Google's Gemini soon but I haven't tried it yet.) When it makes suggestions, I can ask it what HTML+CSS is needed to implement those suggestions. I've found it gets better and better for about 6 iterations. But you'll notice Copilot isn't part of the loop. I'm jumping into dozens of files and making small changes, and that's a use case where Copilot just isn't helpful.

For porting code from PHP to Go, I modified a full-fledged PHP parser to transpile code to Go, and this has been critical because it's important that certain things, especially strings, get ported over exactly -- no room for errors. So this system parses PHP strings using PHP's parsing rules, and outputs Go strings using Go's parsing rules, and is always 100% right. Copilot isn't part of the loop and doesn't help.
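
To give a flavor of the string-exactness problem, here's a minimal, hypothetical sketch (in Python, not the actual transpiler) of converting the body of a PHP single-quoted string literal into a Go double-quoted literal. PHP single quotes treat only \\ and \' as escapes, so a literal \n in PHP must survive as backslash-then-n in the Go output rather than becoming a newline:

```python
# Hypothetical sketch: convert the contents of a PHP single-quoted
# string literal into a Go double-quoted string literal.
# PHP single-quoted strings recognize only \\ and \' as escapes;
# everything else (including \n) is two literal characters.

def php_single_quoted_to_go(php_body: str) -> str:
    # First undo PHP's two escape sequences to recover the raw characters.
    raw = []
    i = 0
    while i < len(php_body):
        if php_body[i] == "\\" and i + 1 < len(php_body) and php_body[i + 1] in ("\\", "'"):
            raw.append(php_body[i + 1])
            i += 2
        else:
            raw.append(php_body[i])
            i += 1
    # Then re-escape for a Go double-quoted (interpreted) string literal.
    out = []
    for ch in raw:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

# PHP source: 'it\'s a \n literal backslash-n'
print(php_single_quoted_to_go("it\\'s a \\n literal backslash-n"))
# prints: "it's a \\n literal backslash-n"  (a Go literal for the same raw text)
```

The point is that each side's quoting rules are applied exactly, rather than eyeballing the escapes -- which is the part Copilot can't guarantee.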

Another place I've found AI incredibly useful is debugging problems where I have no clue what the problem might be. This goes back to using other people's large systems such as the public APIs mentioned earlier. Every now and then you get cryptic error messages or some other bizarre malfunction, and endless Google searching doesn't help. I can go to ChatGPT, Gemini, Claude, Perplexity, DeepSeek (and others, but those are the main ones I've been using) and say hey, I'm getting this cryptic error message or this weird behavior, and it can give you a nice list of things you might try. That can get you unstuck when you'd otherwise be very stuck.

It's kinda funny because, obviously I'm an avid follower of what's going on in AI, and happy to try AI tools, and I constantly run across other developers who say "Copilot has made me twice as productive!" or "Copilot has made me five times as productive!" or somesuch. I've wondered if there's something wrong with me because I haven't experienced those results at all. But AI has been helpful in other areas nobody ever seems to talk about.

Why I'm no longer using Copilot - Dreams of Code

#solidstatelife #ai #genai #llms #codingllms #openai #copilot

waynerad@diasp.org

The XZ attack has taken the world of cybersecurity by storm. This video provides a concise overview. (If you prefer text, there is a link to a text-based FAQ below.)

It begins with a clever "social engineering" attack, where two people play "good cop bad cop" to guilt-trip the maintainer of XZ. First I should probably mention that XZ Utils is a compression system used by Linux, in lots of places including package managers, build (code compilation) systems, and ssh, the "secure shell" system that enables people to log in to remote servers and run commands. (I myself use ssh dozens of times every day -- if you don't work with servers you wouldn't know, but this is how servers are managed all over the internet.) Getting back to the "social engineering" attack, the attackers successfully demoralized the project maintainer, who was an open source developer working in his spare time and not paid. He eventually gave up and made the "good cop" co-maintainer of the project.

The attack itself is pretty interesting, too. The attacker did not touch ssh, or at least not the code for ssh itself. He changed test code. And not in an obvious way -- he changed a "binary blob" that is opaque to people examining changes to the code to decide whether to accept the changes on their systems or not. The binary blob would get decompressed at build time, and it turned out there was a bash script inside it (bash is another one of those Linux shells), and the bash script would get executed. The bash script would modify the ssh system in such a way that a certain public key would be replaced by a different one. The purpose of the original public key was to make sure only trusted people with the corresponding private key could update a running ssh system. With the attacker's key in place, the attacker could now change running ssh systems. Not only that, but because an ssh installation on a server runs with root privileges (it has to, since it must be able to authenticate any user and then launch a command-line shell for that user with that user's privileges), the attacker became able to log in as root on any Linux server infected with the attack -- which could have eventually become more or less all of them, had the attack not been discovered.
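
To make the opacity point concrete, here's a harmless Python illustration -- not the actual XZ payload or mechanism -- of why a compressed "test fixture" reads as noise to a human reviewer but as a perfectly legible script to whatever build step decompresses it:

```python
# Harmless illustration (NOT the actual XZ payload): a script hidden in a
# compressed blob looks like meaningless bytes in a code-review diff, but
# the build step that decompresses it recovers the script exactly.
import base64
import lzma

script = b"echo 'any command could run at build time'"

# What gets checked in as a "test file": opaque bytes.
blob = base64.b64encode(lzma.compress(script))
print("reviewer sees:", blob[:32], b"...")

# What the build step recovers (and, in the real attack, executed).
recovered = lzma.decompress(base64.b64decode(blob))
print("build step sees:", recovered)
```

The real attack was far more layered than this, but the core review problem is the same: a diff viewer shows you the bytes, not what they become.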

To me, this attack is interesting on so many levels:

1) It comes through the "supply chain" -- attacking open source at the point where contributors (often unpaid) submit their contributions.

2) It involves a "social engineering" attack on the supply chain, something it had never occurred to me was even possible before.

3) There was a long delay between the social engineering attack and the technical attack -- about 2 years. The attackers spent 2 years building trust to exploit later.

4) It attacks one piece of software (ssh) by attacking a completely different and apparently unrelated piece of software (XZ Utils).

5) It attacks the software not through its main code directly, but through its test code.

6) It carries out the attack by running malicious code at build time instead of runtime. (The build of XZ Utils is part of the build of ssh.)

7) It attacks a cryptosystem by replacing a legitimate key with the attacker's key and getting the attacker's key "officially" distributed.

8) Had it been successful, the implications would have been huge -- it would have given the attacker access to practically every Linux server everywhere. (Well, every Linux server, pretty much, uses ssh but the attack initially targeted RedHat & Debian, so maybe it wouldn't have spread to everywhere.)

9) The attack was discovered accidentally, because it modified its target's performance, not any other aspect of its behavior.

I hadn't mentioned that last one yet, but yeah, the attack was discovered by a person who was doing performance benchmarks on a completely unrelated project (to do with the Postgres database), which just happened to include automated ssh logins as part of the testing system, and the ssh logins suddenly slowed down for no apparent reason. In trying to figure out what had gone wrong, he discovered the attack.

This has huge implications for the future for open source software and trust in all the projects and maintainers and regular software updates that are done on a daily basis all over the world. Some are predicting wholesale abandonment of the package distribution systems used currently throughout the Linux world. At the very least, everyone contributing to projects that become standard parts of Linux distributions is going to come under much greater scrutiny.

And in case you're wondering, no, nobody knows who the attackers were, at least as far as I know. And no, no one knows how many other attacks might exist "out there" in the Linux software supply chain.

XZ backdoor: Timeline and overview - Seytonic

#solidstatelife #cybersecurity

waynerad@diasp.org

In 2010, the US government secretly built a social network to destabilize Cuba -- allegedly. Actually, not one but two. Whenever I read stuff like this, I wonder what is going on now that I won't know about for 10 or 20 years.

"The existence of ZunZuneo and its ties to the US government only came to the American public's attention after the Associated Press published a bombshell story in April 2014. It detailed the true goal behind this planned infiltration with information technologies in the Communist country:"

"ZunZuneo's organizers wanted the social network to grow slowly to avoid detection by the Cuban government. Eventually, documents and interviews reveal, they hoped the network would reach critical mass so that dissidents could organize 'smart mobs' -- mass gatherings called at a moment's notice -- that could trigger political demonstrations, or 'renegotiate the balance of power between the state and society.'"

"After ZunZuneo was shut down, but before the existence of the social media network was made public by the AP in 2014, the US government launched a new network called Piramideo, or roughly 'pyramid' in English. This effort was more above-board in the sense that it was run through the Broadcasting Board of Governors, America's largest public-facing media outlet with a mandate to help spread American versions of truth and democracy around the world. Or, to put it more bluntly, the BBG was America's biggest foreign propaganda arm. A descendant of the Cold War's USIA, the BBG billed Piramideo as a network that, 'makes it easier for people to connect with each other, free from government control.'"

Relevant to the current discussion about whether TikTok can be used for political influence, or whether the Chinese government would attempt to use it that way. Interesting how what caused these social networks to fail is the same as what caused hundreds of others to fail: they just couldn't get popular enough for the positive network effects to kick in. TikTok has surpassed that hurdle.

Remember when the US secretly built a social network to destabilize Cuba?

#solidstatelife #socialnetworks

waynerad@diasp.org

Lumen Orbit is a new startup that wants to "put hundreds of satellites in orbit, with the goal of processing data in space before it's downlinked to customers on Earth."

"Lumen's business plan calls for deploying about 300 satellites in very low Earth orbit, at an altitude of about 315 kilometers (195 miles). The first satellite would be a 60-kilogram (132-pound) demonstrator that's due for launch in May 2025."

"We started Lumen with the mission of launching a constellation of orbital data centers for in-space edge processing, Essentially, other satellites will send our constellation the raw data they collect. Using our on-board GPUs, we will run AI models of their choosing to extract insights, which we will then downlink for them. This will save bandwidth downlinking large amounts of raw data and associated cost and latency."

If you're wondering who wants this, there's a bunch of investors listed in the article, and it says they've raised $2.4 million to start with.

Lumen Orbit emerges from stealth and raises $2.4M to put data centers in space - GeekWire

#solidstatelife #ai #aihardware #startups #space

waynerad@diasp.org

Aethero is a new startup promising "edge computing" AI systems in satellites. "The next generation of space rated edge computing".

"Aethero's onboard software includes a thin, headless system containing a minimal set of packages and tools needed to boot the system. The onboard system will allow you to run containerized applications and will provide services such as Over the Air software updating. The Over the Air testing or updating and fleet management or system level testing is accessed through Aethero's unified Aether Software. We provide an automated framework for platform testing; it includes support for the Hardware, Board Support Packages and Software -- this allows users to develop, debug and test multi-node device systems reliably, scalably and effectively."

"Users can use Aethero's AMATDT (Automated Model Annotation, Training & Deployment Tool) that is integrated within the Aether Software to customize models deployed on Aethero's Edge Computing Modules. Current imagery support includes RGB, Multispectral, and Hyperspectral data."

This is tailored to Aethero's hardware.

"Aethero is leveraging standard architectures such as the CubeSat or PC104 framework, state-of-the-art radiation-hardened commercial components, and modern software operating systems to provide a high performance, highly capable single board computers, or multiple redundant or distributed computing configurations. The versatile architecture means that the system can be used for a variety of applications such as Autonomous Spacecraft Operation, Machine Vision Operations such as [VPS] Visual Positioning Systems or Optical Navigation, Imagery Processing ([UV/EO/IR] Ultraviolet/Electro-Optical/Infrared, [MSI] Multi-Spectral, [HSI] Hyperspectral, [SAR] Synthetic Aperture Radar, [LIDAR] Light Detection and Ranging, [TIR] Thermal Infrared), [RF] Radio Frequency Signal Processing such as Link-Budget Optimization, Video/Image processing such as Manipulation or Segmentation, Object Detection with Classification, [AI] Artifical Intelligence/[ML] Machine Learning Applications, [SDR] Software Defined Radio, Data Compression/Management, etc."

Aethero -- space data, re-imagined

#solidstatelife #ai #edgecomputing #aihardware #space #satellites

waynerad@diasp.org

"Researchers have developed a photoacoustic imaging watch for high-resolution imaging of blood vessels in the skin. The wearable device could offer a non-invasive way to monitor hemodynamic indicators such as heart rate, blood pressure and oxygen saturation that can indicate how well a person's heart is working."

But it requires "a backpack housing the laser and power supply."

Sounds like a step up from my FitBit, which measures blood oxygen fluctuations. But the difference is that the FitBit calculates a crude estimate, while this device does actual imaging.

They say it has a resolution of 8.7 micrometers, with a field of view of 3 mm in diameter. They say it has a "motorized adjustable focus for optimizing the imaging plane for different individuals."

Wearable tech captures real-time hemodynamics on the go

#solidstatelife #medicalimaging

waynerad@diasp.org

"Accessibility has failed: Try generative UI = individualized UX". Says legendary usability expert Jakob Nielsen. What he means by "accessibility" is borrowing the concept of "accessibility" in the physical world, where wheelchair ramps on buildings and busses are built, and so on, and applying it to the world of computing. This means, for example, making screen readers that translate screens into speech or braile, so blind or hearing-impared people can use computers. If you're a web developer you're supposed to fill in your "alt" attributes for all your image tags so screen readers can tell users what the image is. (Finishing out the headline, "UI" stands for "user interface" and "UX" stands for "user experience" -- most of you probably already know that.) Jakob Nielsen says:

"Accessibility has failed as a way to make computers usable for disabled users. My metrics for usable design are the same whether the user is disabled or not: whether it's easy to learn the system, whether productivity is high when performing tasks, and whether the design is pleasant -- even enjoyable -- to use."

"Assessed this way, the accessibility movement has been a miserable failure. Computers are still difficult, slow, and unpleasant for disabled users, despite about 30 years of trying. (I started promoting accessibility in 1996 when I worked at Sun Microsystems, but by no means claim to have been the first accessibility advocate.)"

"There are two reasons accessibility has failed:"

"Accessibility is too expensive for most companies to be able to afford everything that's needed with the current, clumsy implementation. There are too many different types of disabilities to consider for most companies to be able to conduct usability testing with representative customers with every kind of disability. Most companies either ignore accessibility altogether because they know that they won't be able to create a UX that's good enough to attract sufficient business from disabled customers, or they spend the minimum necessary to pass simplistic checklists but never run the usability studies with disabled users to confirm or reject the usability of the resulting design."

"Accessibility is doomed to create a substandard user experience, no matter how much a company invests, particularly for blind users who are given a linear (one-dimensional) auditory user interface to represent the two-dimensional graphical user interface (GUI) designed for most users."

"'Generative UI' is simply the application of artificial intelligence to automatically generate user interface design."

But he doesn't stop there. He goes on to envision "first-generation" and "second-generation" generative UI:

"'First-generation generative UI' for frozen designs where the AI only modifies the UI before shipping the product."

"I foresee a much more radical approach to generative UI to emerge shortly -- maybe in 5 years or so. In this second-generation generative UI, the user interface is generated afresh every time the user accesses the app. Most important, this means that different users will get drastically different designs. This is how we genuinely help disabled users."

Accessibility has failed: Try generative UI = individualized UX

#solidstatelife #hci #usability #ai #genai

waynerad@diasp.org

FlowState: The FPV Drone Documentary. FPV as in "first-person view". The out-of-body experience of being in a state of 'flow', one with your drone. Blow the video up to full screen and crank up the resolution... unless you're prone to motion sickness. Jaw-dropping footage of stunts in 3 dimensions around "bandos" (abandoned buildings), skyscrapers, parking garages, and other urban structures; mountains, lakes, waterfalls, and other natural scenery; skateboarders, mountain bikers, skiers, snowboarders, base jumpers, wing suit jumpers, drift cars, and more. Delves into the hobbyist culture behind FPV. Shows drone racing. Explores the worry that FAA "RemoteID" rules may bring an end to the hobby. Introduces you to all the key players in the FPV community, which is smaller and more tight-knit than you may have thought.

FlowState: The FPV Drone Documentary (Full Film Official Release) - Joshua Bardwell

#solidstatelife #inventions #uavs #drones #fpv

waynerad@diasp.org

Photorealistic AI-generated talking humans. "VLOGGER" is a system for generating video to match audio of a person talking. So you can make video of any arbitrary person saying any arbitrary thing. You just supply the audio (which could itself be AI-generated) and a still image of a person (which also could itself be AI-generated).

Most of the sample videos wouldn't play for me, but the ones in the top section did and seem pretty impressive. You have to "unmute" them to hear the audio and see that the video matches the audio.

They say the system works using a 2-step approach where the first step is to take just the audio signal, and use a neural network to predict what facial expressions, gaze, gestures, pose, body language, etc, would be appropriately associated with that audio, and the second step is to combine the output of the first step with the image you provide to generate the video. Perhaps surprisingly (at least to me), both of these are done with diffusion networks. I would've expected the second step to be done with diffusion networks, but the first to be done with some sort of autoencoder network. But no, they say they used a diffusion network for that step, too.

So the first step is taking the audio signal and converting it to spectrograms. In parallel, the input image is fed into a "reference pose" network that analyzes it to determine what the person looks like and what pose the rest of the system has to deal with as a starting point.

These are fed into the "motion generation network". The output of this network is "residuals" that describe face and body positions. It generates one set of all these parameters for each frame that will be in the resulting video.

The result of the "motion generation network", along with the reference image and the pose of the person in the reference image is then passed to the next stage, which is the temporal diffusion network that generates the video. A "temporal diffusion" network is a diffusion network that generates images, but it has been modified so that it maintains consistency from frame to frame, hence the "temporal" word tacked on to the name. In this case, the temporal diffusion network has undergone the additional step of being trained to handle the 3D motion "residual" parameters. Unlike previous non-diffusion-based image generators that simply stretched images in accordance with motion parameters, this network incorporates the "warping" parameters into the training of the neural network itself, resulting in much more realistic renditions of human faces stretching and moving.

This neural network generates a fixed number of frames. They use a technique called "temporal outpainting" to extend the video to any number of frames: the system re-inputs the previously generated frames, dropping the oldest one, and uses them as context to generate the next frame. In this manner they can generate a video of any length.
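
As I understand it, the sliding-window loop looks something like the toy sketch below. This is my own illustration, not the paper's code: the window size is made up, a "frame" is just an integer, and the stand-in function takes the place of the temporal diffusion network:

```python
# Toy sketch of "temporal outpainting": repeatedly condition on the most
# recent window of frames (dropping the oldest) to generate the next one.
# A real implementation would pass image tensors through the diffusion
# model; here a "frame" is an int and the generator is a stub.

WINDOW = 4  # frames the model sees at once (assumed, not the paper's value)

def fake_diffusion_step(context):
    # Stand-in for the temporal diffusion network "generating" a frame.
    return context[-1] + 1

def outpaint(initial_frames, total_frames):
    frames = list(initial_frames)
    while len(frames) < total_frames:
        context = frames[-(WINDOW - 1):]  # previous frames, minus the oldest
        frames.append(fake_diffusion_step(context))
    return frames

print(outpaint([0, 1, 2, 3], 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The design point is that each new frame is conditioned on recent output, which is what keeps the video temporally consistent across an arbitrary length.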

As a final step they incorporate an upscaler to increase the pixel resolution of the output.

VLOGGER: Multimodal diffusion for embodied avatar synthesis

#solidstatelife #ai #computervision #generativeai #diffusionmodels