Abstract
Generating high-quality videos that synthesize desired realistic content is a challenging task because of the high dimensionality and complexity of video data. Several recent diffusion-based methods have shown competitive performance by compressing videos into a lower-dimensional latent space using a traditional video autoencoder architecture. However, such methods, which employ standard frame-wise 2D or 3D convolutions, fail to fully exploit the spatio-temporal nature of videos. To address this issue, we propose a novel hybrid video diffusion model, called HVDM, which captures spatio-temporal dependencies more effectively. HVDM is trained with a hybrid video autoencoder that extracts a disentangled representation of the video, including: (i) global context information captured by a 2D projected latent, (ii) local volume information captured by 3D convolutions with wavelet decomposition, and (iii) frequency information for improving the video reconstruction. Based on this disentangled representation, our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details. Experiments on standard video generation benchmarks such as UCF101, SkyTimelapse, and TaiChi demonstrate that the proposed approach achieves state-of-the-art video generation quality and supports a wide range of video applications (e.g., long video generation, image-to-video, and video dynamics control).
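To make the two representations named in the abstract more concrete, the following is a minimal NumPy sketch, not the HVDM implementation. It assumes a single-channel video stored as a (T, H, W) array, uses mean pooling as a stand-in for the learned 2D projection, and uses a one-level Haar transform as one possible wavelet decomposition. The function names `triplane_project`, `haar_split`, and `haar_dwt3d` are hypothetical and chosen only for illustration.

```python
import numpy as np


def triplane_project(video):
    """Project a (T, H, W) volume onto three 2D planes.

    Assumption: mean pooling stands in for the learned projection
    that a trained autoencoder would use.
    """
    plane_hw = video.mean(axis=0)  # collapse time   -> (H, W)
    plane_tw = video.mean(axis=1)  # collapse height -> (T, W)
    plane_th = video.mean(axis=2)  # collapse width  -> (T, H)
    return plane_hw, plane_tw, plane_th


def haar_split(x, axis):
    """One-level orthonormal Haar split along `axis` (length must be even)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # local averages (low frequency)
    high = (even - odd) / np.sqrt(2.0)  # local differences (high frequency)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_dwt3d(video):
    """One-level 3D Haar decomposition into 8 subbands.

    Keys such as 'LLH' give the filter applied along (T, H, W) in order.
    """
    bands = {"": video}
    for axis in range(3):
        next_bands = {}
        for key, band in bands.items():
            low, high = haar_split(band, axis)
            next_bands[key + "L"] = low
            next_bands[key + "H"] = high
        bands = next_bands
    return bands


# Toy input: 8 frames of 16x16 noise (illustrative data only).
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16))

planes = triplane_project(video)
subbands = haar_dwt3d(video)
```

In this sketch the three planes summarize the clip globally along each axis, while the eight subbands keep local volume structure at half resolution and separate low-frequency from high-frequency content. Because the Haar split is orthonormal, the subbands together preserve the energy of the input.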
Overall architecture of our hybrid video autoencoder (HVDM)
Diverse latent video diffusion models
Main Results
Short Video Generation
[Video comparison panels: DIGAN, LVDM, PVDM, HVDM]
Long Video Generation
[Video comparison panels: DIGAN, LVDM, PVDM, HVDM]
Applications
Image-to-Video
[Four panels, each pairing an input image with the generated video]
Video Dynamics Control
[Video panels: Slow Motion, Medium Motion, Fast Motion]
BibTeX
@article{kim2024hybrid,
title={Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation},
author={Kim, Kihong and Lee, Haneol and Park, Jihye and Kim, Seyeon and Lee, Kwanghee and Kim, Seungryong and Yoo, Jaejun},
year={2024},
eprint={2402.13729},
archivePrefix={arXiv},
primaryClass={cs.CV}
}