Google has announced VideoPoet, a new multimodal LLM for video generation that can create high-quality videos from images, text, and audio. Most models are trained separately for each task, whereas VideoPoet is a single multimodal model trained across all of them. Following the announcement, there has been some skepticism within the community about certain applications and how effective they will prove in practice. Still, its advances in text fidelity and motion coherence push generated video quality further.
Under the hood, VideoPoet uses a decoder-only architecture that can generate content for tasks it was not specifically trained on. Like other LLMs, it follows the recipe of large-scale pre-training followed by task-specific adaptation. If you have ever tried DALL-E 3 or MidJourney, the concept behind Google VideoPoet will feel familiar. At the same time, the announcement raises questions about the model's real capabilities, real-world applications, and overall effectiveness. Let's learn more about it.
Google VideoPoet
Google's VideoPoet is a pre-trained LLM framework that can be adapted for various video generation tasks, such as text-to-video, image-to-video, video inpainting and outpainting, video stylization, and video-to-audio generation. It is an autoregressive model: a single multimodal model trained on video, audio, images, and text.
It uses a decoder-only architecture that can handle text, image, video, and audio inputs, and it has been trained on billions of images and more than 270 million publicly available videos. Unlike most common video models, which are diffusion models that add noise to training data and then learn to reconstruct it, VideoPoet integrates multiple video generation capabilities into a single language model. Its main components are:
- A pre-trained MAGVIT-v2 video tokenizer and SoundStream audio tokenizer, which convert video and audio into discrete tokens compatible with text-based language models.
- An autoregressive language model that learns to predict the next video or audio token in order to generate output.
- A mix of multimodal generative learning objectives within the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio.
In other words, it handles different modalities through multiple tokenizers: MAGVIT-v2 for video and images, and SoundStream for audio. Since the LLM is a proven framework, Google's approach of adapting it to these varied tasks is widely seen as a sound choice.
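The core idea described above can be sketched in a few lines. VideoPoet is not publicly available, so everything below is invented for illustration: a toy stand-in "model" learns bigram statistics over a stream of discrete tokens (standing in for MAGVIT-v2 video tokens and SoundStream audio tokens interleaved with text tokens) and then autoregressively predicts the next token, which is the mechanism the article attributes to VideoPoet's language model.

```python
# Illustrative sketch only: VideoPoet's real tokenizers and model are not
# public. The class and token names here are hypothetical; the point is the
# pipeline the article describes: modalities -> discrete tokens -> a single
# autoregressive model predicting the next token in the combined stream.

from collections import Counter, defaultdict


class ToyAutoregressiveModel:
    """Stand-in for the LLM: learns bigram counts over token sequences
    and greedily emits the most frequent successor of the last token."""

    def __init__(self):
        self.successors = defaultdict(Counter)

    def train(self, sequences):
        # Count how often each token follows each other token.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.successors[prev][nxt] += 1

    def generate(self, prompt, num_tokens):
        # Extend the prompt one token at a time (greedy decoding).
        seq = list(prompt)
        for _ in range(num_tokens):
            candidates = self.successors.get(seq[-1])
            if not candidates:
                break
            seq.append(candidates.most_common(1)[0][0])
        return seq


# Hypothetical training data: text tokens followed by "video" tokens,
# mimicking a text-to-video pair flattened into one token stream.
training_data = [
    ["<text>", "a", "cat", "<video>", "v1", "v2", "v3"],
    ["<text>", "a", "dog", "<video>", "v1", "v4", "v5"],
]

model = ToyAutoregressiveModel()
model.train(training_data)

# Condition on a text prompt, then autoregressively emit video tokens.
out = model.generate(["<text>", "a", "cat", "<video>"], num_tokens=3)
print(out)
```

In the real system the emitted video tokens would be decoded back into frames by the video tokenizer's decoder; this toy version only shows why one decoder-only model can serve many tasks, since every task is just a different arrangement of tokens in the same stream.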
By default, VideoPoet generates portrait-orientation videos, which sets it apart from competitors such as Imagen Video, RunwayML, Stable Video Diffusion, Pika, Animate Anyone, and others. Where other models struggle to generate content with high motion coherence, VideoPoet handles large motion convincingly. It also supports square-orientation video and audio generation conditioned on video input.
Features of Google VideoPoet
It offers multimodal functionality, accepting text, images, video, and audio as input to generate a video. You can even use Google VideoPoet to create a short film by writing a short screenplay as a prompt. Its capabilities include:
- Text to video
- Video with text prompt
- Zero-shot video stylization
- Video to audio
- Long video editing and camera movements
This AI model pushes the boundaries of existing technology; Microsoft currently offers nothing comparable, having only recently added text-prompted audio generation through Suno AI. Google instead trains an LLM-based model built on the transformer architecture, rather than a diffusion model, to produce high-quality output.
VideoPoet can generate longer videos with more consistent motion at 16 FPS. Users can also choose from a wide range of options, including different camera movements and visual and aesthetic styles, and the model can generate audio to match a video clip. Google Research reports that approximately 24-35% of users preferred content generated by VideoPoet over that of competing models.
Conclusion
Google has not yet released VideoPoet to the public. The company has announced the model and says it will soon be integrated into its products and services, but despite the impressive demos, users cannot try it yet. Microsoft, meanwhile, has already shipped visual and audio generation capabilities.
We can expect VideoPoet to be integrated with Bard Advanced, and possibly with the upcoming Google Pixel 9 series. Google VideoPoet also has limitations: you cannot generate very long videos or replace parts of a video frame with imaginative elements of your own choosing. I will share my feedback and publish a how-to guide once it becomes available to the public. Stay tuned for further updates.