At the time of its release, CogVideo was the largest pretrained model for general-domain text-to-video generation, with 9.4 billion parameters. It builds on the pretrained text-to-image model CogView2, adapting it into a text-to-video model, and is trained with a multi-frame-rate hierarchical training strategy.

CogVideo: Generating videos from text descriptions.

