ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

Click to Play the Long Animations (8 or 32 seconds)!

Methodology

Although video generation has made great progress in capacity and controllability and is gaining increasing attention, currently available video generation models still make minimal progress in the video length they can generate. Due to the lack of well-annotated long video data, high training/inference cost, and flaws in the model designs, current video generation models can only generate videos of $2 \sim 4$ seconds, greatly limiting their applications and the creativity of users. We present ZoLA, a zero-shot method for creative long animation generation with short video diffusion models and even with short video consistency models~(a new family of generative models known for the fast generation with top-performing quality). In addition to the extension for long animation generation~(dozens of seconds), ZoLA as a zero-shot method, can be easily combined with existing community adapters~(developed only for image or short video models) for more innovative generation results, including control-guided animation generation/editing, motion customization/alternation, and multi-prompt conditioned animation generation, etc. And, importantly, all of these can be done with commonly affordable GPU~(12 GB for 32-second animations) and inference time~(90 seconds for denoising 32-second animations with consistency models). Experiments validate the effectiveness of ZoLA, bringing great potential for creative long animation generation.


Comparisons

Here we demonstrate long animations generated by different methods.
Click to Play the Long Animations.

Align Your Latents
Gen-L-Video
ZoLA

Gallery Ⅰ: Standard Generation

Here we demonstrate long animations generated by ZoLA.
Click to Play the Long Animations.

Gallery Ⅱ: Controllable Generation

Here we demonstrate long animations generated by ZoLA with ControlNet.
Click to Play the Long Animations.

Gallery Ⅲ: Multi-Prompt Conditions

Here we demonstrate long animations generated by ZoLA conditioned with multi-prompts.
Click to Play the Long Animations.

Prompt Change of Video Ⅰ: Smile ➜ Cry

Prompt Change of Video Ⅱ: Camera Left ➜ Camera Right ➜ Camera Up

Prompt Change of Video Ⅲ: Play the Guitar ➜ Sing the Song

Prompt Change of Video Ⅳ: Look at the view ➜ Smile ➜ Hands Up

Project page template is borrowed from DreamBooth.