Google Research recently unveiled VideoPoet, an exciting new artificial intelligence (AI) model that can generate stunning videos from a variety of inputs. VideoPoet represents a quantum leap in AI-generated video technology.
VideoPoet is a large language model (LLM) trained by Google to produce video content, rather than just text or code like most other LLMs. The researchers trained it on a dataset of 270 million videos and more than 1 billion text-image pairs pulled from the Internet. This allowed VideoPoet to learn the complex task of video generation.
At the heart of VideoPoet is the transformer architecture, a neural network design optimized for processing sequential data such as text or video frames. The model converts its inputs into text, visual and audio tokens that serve as conditioning for the video output. A text prompt, for example, yields a video matching its description.
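To make this conditioning idea concrete, here is a minimal, heavily simplified sketch of LLM-style video generation: condition tokens from the modality tokenizers form a prefix, and the transformer autoregressively samples video tokens after it. The model size, vocabulary size and random "prompt" tokens below are illustrative stand-ins, not Google's actual configuration, and the real system would decode the sampled tokens back to pixels with a video detokenizer.

```python
# A toy decoder-only transformer, sketching how a prefix of condition
# tokens (text/image/audio) can steer autoregressive video-token sampling.
# All sizes are hypothetical; this is not VideoPoet's implementation.
import torch
import torch.nn as nn

VOCAB = 1024            # shared discrete-token vocabulary (illustrative)
D_MODEL, N_HEAD = 128, 4

class TinyVideoLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier tokens.
        n = tokens.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

@torch.no_grad()
def generate(model, condition_tokens, n_video_tokens):
    """Sample video tokens one at a time, conditioned on a prefix of
    tokens produced by the text/image/audio tokenizers."""
    seq = condition_tokens
    for _ in range(n_video_tokens):
        logits = model(seq)[:, -1]                    # next-token logits
        nxt = torch.multinomial(logits.softmax(-1), 1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, condition_tokens.size(1):]          # video tokens only

model = TinyVideoLM()
prompt = torch.randint(0, VOCAB, (1, 16))   # stand-in for tokenized text
video_tokens = generate(model, prompt, 32)  # would be detokenized to frames
```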
What sets VideoPoet apart from other video generation models is its ability to produce smooth, high-fidelity motion across longer, 16-frame videos. Diffusion-based models often introduce artifacts or glitches when movements become complex; VideoPoet maintains coherent motion without sacrificing realism.
Additionally, VideoPoet can emulate different camera movements and visual styles, and generate matching audio. This versatility allows it to accept text, images and video as input when creating new videos, as sketched below. By integrating all of these capabilities into a single LLM, VideoPoet offers a streamlined video creation experience.
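One way to picture this unification: each task becomes a different conditioning prefix fed to the same model. The task names and layouts below are a hypothetical illustration of that idea (tokens represented as plain lists of ints), not VideoPoet's actual token format.

```python
# Illustrative sketch: one LLM, many tasks, differing only in the
# conditioning prefix. Layouts are assumptions, not Google's format.
def build_prefix(task, text_toks=None, image_toks=None, video_toks=None):
    if task == "text_to_video":
        return text_toks                    # describe the desired clip
    if task == "image_to_video":
        return image_toks                   # animate a still image
    if task == "stylization":
        return text_toks + video_toks       # restyle an existing clip
    if task == "video_to_audio":
        return video_toks                   # generate a matching soundtrack
    raise ValueError(f"unknown task: {task}")

# e.g. build_prefix("stylization", text_toks=[3, 41], video_toks=[7, 9, 12])
```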
Google Research compared VideoPoet to leading diffusion models such as Show-1, VideoCrafter and Phenaki. In head-to-head comparisons, human raters consistently favored VideoPoet's results:
“On average, people selected 24-35% of examples from VideoPoet as following prompts better than a competing model, versus 8-11% for the competing models. Raters also preferred 41-54% of VideoPoet’s examples for more interesting motion, compared to 11-21% for other models,” Google wrote in an official blog post.
Currently, VideoPoet specializes in vertical mobile video generation, similar to Snap and TikTok. However, Google plans to expand it to support any-to-any generation across text, images, video and audio. This could push the boundaries of what is possible with generative media models even further.
The main caveat for now is that VideoPoet is not publicly accessible. Google has not indicated a timeline for releasing it as a practical tool, but the demonstration clearly signals major advances in AI-generated video on the horizon.
VideoPoet represents an exciting preview of the future of AI creativity. Its versatile approach to modeling the complexity of video could power new applications in animation, visual effects, advertising and more. For now, we must await further developments from Google Research and other labs working on similar generative video models. But if VideoPoet is any indication, the future looks incredibly promising.