Meta (the company formerly known as Facebook) has created a video generation model, called "Meta Movie Gen". Have a look at the sample videos.
"In the 1930s, Disney invented the multiplane camera and was the first to create sound-synchronized, full color cartoons -- eventually leading to the groundbreaking animated film Snow White and the Seven Dwarfs."
"Marvel and DC Comics rose to prominence in the 1940s, dubbed the 'golden age of comics,' enabled by the mass availability of the 4-color rotary letterpress and offset lithography for printing comics at scale."
"Similarly, Pixar was uniquely positioned in the 1980s to leverage a new technology platform -- computers and 3D graphics."
"We believe the Pixar of the next century won't emerge through traditional film or animation, but rather through interactive video. This new storytelling format will blur the line between video games and television/film -- fusing deep storytelling with viewer agency and 'play,' opening up a vast new market."
So says Jonathan Lai of Andreessen Horowitz, the Silicon Valley investment firm.
"The promise of interactive video lies in blending the accessibility and narrative depth of TV/film, with the dynamic, player-driven systems of video games."
"The biggest remaining technical hurdle for interactive video is reaching frame generation speeds fast enough for content generation on the fly. Dream Machine currently generates ~1 frame per second. The minimum acceptable target for games to ship on modern consoles is a stable 30 FPS, with 60 FPS being the gold standard. With the help of advancements such as Pyramid Attention Broadcast (PAB), this could go up to 10-20 FPS on certain video types, but is still not quite fast enough."
("By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular diffusion transformer-based video generation models including Open-Sora, Open-Sora-Plan, and Latte.")
"Given the rate at which we've seen underlying hardware and model improvements, we estimate that we may be ~2 years out from commercially viable, fully generative interactive video."
"In February 2024, Google DeepMind released its own foundation model for end-to-end interactive video named Genie. The novel approach to Genie is its latent action model, which infers a hidden action in between a pair of video frames."
"We've seen teams incorporate video elements inside AI-native game engines." "Latens by Ilumine is building a 'lucid dream simulator' where users generate frames in real-time as they walk through a dream landscape." "Developers in the open-source community Deforum are creating real-world installations with immersive, interactive video. Dynamic is working on a simulation engine where users can control robots in first person using fully generated video." "Fable Studio is building Showrunner, an AI streaming service that enables fans to remix their own versions of popular shows." "The Alterverse built a D&D inspired interactive video RPG where the community decides what happens next. Late Night Labs is a new A-list film studio integrating AI into the creative process. Odyssey is building a visual storytelling platform powered by 4 generative models." "Series AI has developed Rho Engine, an end-to-end platform for AI game creation." "We're also seeing AI creation suites from Rosebud AI, Astrocade, and Videogame AI enable folks new to coding or art to quickly get started making interactive experiences."
"Who will build the Interactive Pixar?"
The Next Generation Pixar: How AI will Merge Film & Games
#solidstatelife #ai #genai #computervision #videoai #startups
Company revives Alan Turing as an AI chatbot, hilarity, no, wait, outrage ensues.
The company is Genius Group, based in Singapore, which provides "AI-powered business education."
"Software engineer Grady Booch, a former Turing Talk speaker, wrote on Twitter/X: 'Absolute and complete trash. I hope that Turing's heirs sue you into oblivion.'"
"Another user told Genius Group's CEO: 'This is so incredibly unethical, disrespectful, and disgusting. You are pillaging the image of a deceased person (who frankly has suffered enough from exploitation) and the voice of an actor to suit your purposes. Vile.'"
Company revives Alan Turing as an AI chatbot, outrage ensues
#solidstatelife #ai #aieducation #llms #genai #computervision #videoai
MagicAnimate animates humans based on a reference image. See the Mona Lisa jogging or doing yoga.
The way it works is, well, first of all it uses a diffusion network to generate the video. Systems for generating video using GANs (generative adversarial networks) have also been developed. Diffusion networks, however, have recently shown themselves to be better at taking a human-entered text prompt and turning that into an image. The problem, though, is that if you want to make a video, you go frame by frame, and since each frame is independent of the others, that inevitably leads to flickering.
The key insight here is that instead of doing the "diffusion" process frame-by-frame, you do it on the entire video all at once. This enables "temporal consistency" across frames. A couple more elements are necessary to get the whole system to work, though.
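A common way to denoise a whole clip at once is to add attention layers that operate along the frame axis, so every frame sees the others during denoising. The sketch below illustrates that idea; it is my own minimal example, not MagicAnimate's actual temporal layers, and the tensor shapes are assumptions.

```python
# Hedged sketch: temporal attention over a whole video latent, so frames are
# denoised jointly instead of independently.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttention(nn.Module):
    """Attention over the frame axis: each spatial location attends to the same
    location in every other frame, which is what enforces temporal consistency."""
    def __init__(self, channels: int):
        super().__init__()
        self.qkv = nn.Linear(channels, channels * 3)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        out = F.scaled_dot_product_attention(q, k, v)
        out = self.proj(out).reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)
        return x + out  # residual connection

# A frame-by-frame model would process x[:, i] independently for each i, which
# is what causes flicker; layers like this treat the clip as a single sample.
x = torch.randn(1, 16, 64, 32, 32)  # 16-frame latent video
y = TemporalAttention(64)(x)
```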
One is discarding the internal encoding that diffusion networks normally tie to a text prompt. In this system a reference image is provided instead, so there is no text prompt; the whole system is trained to use an internal encoding based on appearance. This lets the system maintain the appearance of the reference image, for both the human being animated and the background, throughout the generated video.
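In practice this amounts to swapping what the denoiser's cross-attention attends to: appearance features from the reference image rather than text-prompt embeddings. The sketch below shows that swap with a placeholder appearance encoder; MagicAnimate's actual appearance encoder is a heavier network that is not reproduced here, and all dimensions are illustrative.

```python
# Hedged sketch: appearance conditioning in place of text conditioning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(cond_dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, video tokens, dim); cond: (batch, conditioning tokens, cond_dim)
        q = self.to_q(x)
        k, v = self.to_kv(cond).chunk(2, dim=-1)
        return self.proj(F.scaled_dot_product_attention(q, k, v))

# Text-conditioned model:        cond = text_encoder(prompt)
# Appearance-conditioned model:  cond = appearance_encoder(reference_image)
appearance_encoder = nn.Sequential(nn.Flatten(start_dim=2), nn.Linear(64 * 64, 768))
reference_image = torch.randn(1, 3, 64, 64)
cond = appearance_encoder(reference_image)   # (1, 3, 768) appearance tokens
latents = torch.randn(1, 1024, 320)          # noisy video tokens
out = CrossAttention(dim=320, cond_dim=768)(latents, cond)
```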
The other key piece that gets the system to work is incorporating a prior system called ControlNet. ControlNet analyzes the provided pose and converts it into a motion signal, which is a dense set of body "keypoints". The first stage of the process analyzes these control points; the second stage performs joint diffusion of the control points and the reference image.
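The general ControlNet pattern is a separate control branch that encodes the per-frame pose signal and injects it into the denoiser as additive residuals through zero-initialized projections. The sketch below shows that pattern in miniature; the module shapes, channel counts, and the pose-map stand-in are assumptions, not MagicAnimate's actual configuration.

```python
# Hedged sketch of ControlNet-style pose conditioning.
import torch
import torch.nn as nn

class ControlBranch(nn.Module):
    def __init__(self, pose_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(pose_channels, feat_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1))
        # Zero-initialized so the control signal starts as a no-op and does not
        # disturb the pretrained denoiser at the beginning of training.
        self.zero_proj = nn.Conv2d(feat_channels, feat_channels, 1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, pose_map: torch.Tensor) -> torch.Tensor:
        return self.zero_proj(self.encode(pose_map))

# Stage 1: turn the driving motion into per-frame pose maps (pose estimation,
# not shown). Stage 2: add the control features to the denoiser's own features
# at matching resolutions during joint diffusion with the reference image.
pose_maps = torch.randn(16, 3, 32, 32)     # 16 frames of rendered pose
control = ControlBranch()(pose_maps)       # (16, 64, 32, 32) residual features
unet_features = torch.randn(16, 64, 32, 32)
conditioned = unet_features + control
```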
If you're wondering how the system manages to hold the entire video in memory to run the diffusion process on it all at once, the answer is that it doesn't. Because they needed the system to work on GPUs with limited memory, the researchers devised a "sliding window" scheme that generates overlapping segments of video. The overlapping frames are close enough that they can be combined with simple averaging, and the end result looks okay.
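Here is a minimal sketch of that sliding-window idea: generate overlapping temporal segments and blend the overlapping frames by averaging, so the whole clip never has to fit in GPU memory at once. The window size, stride, and the `generate_segment` stand-in are assumptions for illustration, not the paper's actual settings.

```python
# Hedged sketch of sliding-window video generation with overlap averaging.
import torch

def generate_long_video(generate_segment, total_frames: int,
                        window: int = 16, stride: int = 12,
                        channels: int = 4, height: int = 32, width: int = 32):
    out = torch.zeros(total_frames, channels, height, width)
    counts = torch.zeros(total_frames, 1, 1, 1)
    start = 0
    while start < total_frames:
        end = min(start + window, total_frames)
        segment = generate_segment(start, end)   # (end - start, C, H, W)
        out[start:end] += segment
        counts[start:end] += 1
        if end == total_frames:
            break
        start += stride                          # windows overlap by window - stride frames
    return out / counts                          # average the overlapping frames

# Toy stand-in "generator" so the sketch runs end to end.
fake_segment = lambda s, e: torch.randn(e - s, 4, 32, 32)
video = generate_long_video(fake_segment, total_frames=64)
```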
Speaking of the researchers, this was a joint team from ByteDance and the National University of Singapore. ByteDance, as in the maker of TikTok. The application of this to TikTok is obvious.