ByteDance debuts Vidi2, a 12B-parameter multimodal LLM for video understanding and editing that can generate TikTok videos from simple prompts.
On December 1, 2025, ByteDance announced Vidi2, its latest multimodal large language model, built specifically for video understanding and editing. The 12-billion-parameter model, based on the Gemma-3 architecture, can process hours of raw footage and localize objects and people at one-second intervals, enabling fine-grained edits such as tracking an object through a dynamic scene. An adaptive token-compression scheme preserves key visual detail while keeping long videos computationally tractable.

Together, these capabilities let users generate complete TikTok shorts or movie clips from a plain text prompt, a notable shift in video content creation workflows. By leveraging TikTok's vast daily active user base and its enormous store of video training data, ByteDance aims to build an AI flywheel that challenges traditional video-editing and AI companies alike. Vidi2 is still in the research phase, with a demonstration expected soon, underscoring the growing role of domain-specialized multimodal LLMs in media production.[4]
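The announcement does not explain how Vidi2's adaptive token compression works. A common approach in long-video LLMs is to merge or prune visual tokens where consecutive frames are nearly identical, so static shots cost few tokens while fast-changing scenes keep full detail. The sketch below illustrates that general idea only; the function name `compress_video_tokens`, the cosine-similarity threshold, and the run-merging rule are hypothetical assumptions, not ByteDance's actual method or API.

```python
import numpy as np

def compress_video_tokens(frame_tokens: np.ndarray, sim_threshold: float = 0.95) -> list[np.ndarray]:
    """Merge runs of visually similar consecutive frames into single token sets.

    frame_tokens: (num_frames, tokens_per_frame, dim) array of visual tokens,
    e.g. one frame sampled per second of video. While consecutive frames stay
    close in embedding space (a low-motion shot), they accumulate in a run;
    when the scene changes, the run is collapsed to one averaged frame's
    tokens, so long static segments contribute few tokens downstream.
    """
    kept: list[np.ndarray] = []
    run: list[np.ndarray] = [frame_tokens[0]]
    for frame in frame_tokens[1:]:
        a = run[-1].mean(axis=0)  # summary embedding of the last frame in the run
        b = frame.mean(axis=0)    # summary embedding of the incoming frame
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        if cos >= sim_threshold:
            run.append(frame)                   # low motion: extend the current run
        else:
            kept.append(np.mean(run, axis=0))   # scene change: collapse the run
            run = [frame]
    kept.append(np.mean(run, axis=0))
    return kept

# Toy usage: an hour of one-second frames (3600), 64 tokens each, 256-dim.
rng = np.random.default_rng(0)
video = np.repeat(rng.normal(size=(60, 64, 256)), 60, axis=0)  # 60 distinct "shots"
compressed = compress_video_tokens(video)
print(f"{video.shape[0]} frames -> {len(compressed)} token groups")
```

Under this scheme the token budget scales with the number of distinct shots rather than with raw duration, which is one plausible way a model could keep hour-long footage within a fixed context while still localizing content at one-second granularity.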
Source: AIbase