ByteDance Launches Vidi2: Multimodal AI Revolutionizing Video Editing

ByteDance debuts Vidi2, a 12-billion-parameter multimodal LLM for video understanding and editing that can turn simple text prompts into finished TikTok-style videos.

On December 1, 2025, ByteDance announced Vidi2, its latest multimodal large language model, built specifically for video understanding and editing. With 12 billion parameters and built on the Gemma-3 architecture, Vidi2 can process hours of raw footage, localizing objects and people at one-second granularity to support fine-grained edits such as tracking an object through a dynamic scene. An adaptive token-compression scheme preserves key visual detail while keeping long videos computationally tractable. The result is that users can generate complete TikTok shorts or movie clips from a text prompt alone, a disruptive shift in video content creation workflows. By pairing the model with TikTok's vast daily active user base and its enormous store of video training data, ByteDance aims to build a potent AI flywheel that challenges traditional video-editing and AI companies alike. Vidi2 remains in the research phase, with a demonstration expected soon, underscoring the growing role of domain-specialized multimodal LLMs in media production.[4]
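ByteDance has not published the details of Vidi2's adaptive token compression, but the general idea of dropping tokens for near-duplicate frames can be illustrated with a minimal sketch. The Python below is an assumed, illustrative heuristic only (the function compress_frame_tokens, the cosine-similarity test, and the threshold value are all hypothetical, not ByteDance's method): it samples one frame embedding per second and keeps a frame's tokens only when the frame differs enough from the last kept frame.

    import numpy as np

    def compress_frame_tokens(frame_embeddings: np.ndarray,
                              similarity_threshold: float = 0.95) -> list[int]:
        """Greedy token compression: keep a frame's tokens only when its
        cosine similarity to the last kept frame falls below the threshold,
        so runs of near-duplicate frames collapse to a single frame."""
        kept = [0]                        # always keep the first frame
        last = frame_embeddings[0]
        for i in range(1, len(frame_embeddings)):
            cur = frame_embeddings[i]
            cos = float(np.dot(last, cur)
                        / (np.linalg.norm(last) * np.linalg.norm(cur) + 1e-8))
            if cos < similarity_threshold:  # scene changed enough: keep it
                kept.append(i)
                last = cur
        return kept

    # Synthetic 2-hour video sampled at one frame per second:
    # 120 "scenes" of 60 near-identical frames each (7200 frames total).
    rng = np.random.default_rng(0)
    scenes = rng.standard_normal((120, 512))
    frames = np.repeat(scenes, 60, axis=0) + 0.05 * rng.standard_normal((7200, 512))
    kept = compress_frame_tokens(frames)
    print(f"kept {len(kept)} of {len(frames)} frames")  # roughly one per scene

On this synthetic input, the sketch retains roughly one frame per scene, shrinking the token budget for a 7,200-second video by about 60x while preserving the moments where the visual content actually changes.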

Source: AIbase

