A pipeline to generate long videos from a text prompt
Xinchen Zhang
Tsinghua University
Sample results for the prompts "A spectacular waterfall", "A car driving down the road", "Astronauts traveling in space", and "A cat looking out the window".
Before inference, use an LLM to split the prompt into segments and to write a detailed description of each segment.
We provide a prompt template in `template.txt`. Copy and paste the template into ChatGPT to obtain the generated prompts.
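The README does not specify the layout of `template.txt` or of ChatGPT's reply. The minimal sketch below prepares the request text and reads the segments back out under two stated assumptions: the user prompt is simply appended to the template, and the reply lists one segment description per numbered line (`1. ...` or `2) ...`). `load_template`, `build_llm_request`, and `parse_segments` are hypothetical helpers, not part of this repository. No LLM API is called; the request is still pasted into ChatGPT by hand.

```python
import re
from pathlib import Path
from typing import List


def load_template(path: str = "template.txt") -> str:
    """Read the prompt template shipped with the repository."""
    return Path(path).read_text(encoding="utf-8")


def build_llm_request(template_text: str, user_prompt: str) -> str:
    """Append the user's prompt to the template (assumed layout, hypothetical)."""
    return f"{template_text.rstrip()}\n\nPrompt: {user_prompt.strip()}\n"


def parse_segments(llm_reply: str) -> List[str]:
    """Collect segment descriptions from numbered lines such as '1. ...' or '2) ...'.

    Assumes one segment per numbered line; any other line is ignored.
    """
    segments: List[str] = []
    for line in llm_reply.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            segments.append(match.group(1).strip())
    return segments
```

For example, a reply containing `1. A wide shot of a waterfall` and `2) Water crashing on rocks` yields a two-element list, one description per segment.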
We offer two ways to generate a long video. If you choose I2VGen-XL as the backbone, run:
```bash
python pipeline_i2vgenxl.py --seed 1234 --fps 16
```

If you choose SVD as the backbone, run:

```bash
python pipeline_svd.py --seed 1234 --fps 16
```

After that, we use EMA-VFI to interpolate the frames of the generated video.
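Frame interpolation inserts new frames between consecutive generated frames. EMA-VFI predicts those frames with a learned model. The sketch below swaps in naive linear blending purely to illustrate the effect on frame count. It is not EMA-VFI, and `blend_interpolate` and the `factor` parameter are hypothetical. Frames are represented as flattened lists of pixel intensities to keep the example dependency-free.

```python
from typing import List

Frame = List[float]  # one frame, flattened to a list of pixel intensities


def blend_interpolate(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Insert (factor - 1) linearly blended frames between each consecutive pair.

    Naive stand-in for a learned interpolator such as EMA-VFI. With n input
    frames the output has (n - 1) * factor + 1 frames.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if len(frames) < 2:
        return [list(f) for f in frames]
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        for k in range(factor):
            t = k / factor  # blend weight toward the next frame
            out.append([(1 - t) * pa + t * pb for pa, pb in zip(a, b)])
    out.append(list(frames[-1]))  # keep the last original frame
    return out
```

With `factor=2`, three frames `[[0.0], [1.0], [2.0]]` become `[[0.0], [0.5], [1.0], [1.5], [2.0]]`. Playing the result at `factor` times the original frame rate keeps the clip's duration roughly unchanged while making motion smoother.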




