
GenTron: Diffusion Transformers for Image and Video Generation

Unofficial PyTorch Implementation

Paper | Project Page

GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
The University of Hong Kong, Meta

This repository contains:

🪐 A simple PyTorch implementation of Text-to-Image GenTron
🪐 A simple PyTorch implementation of Text-to-Video GenTron
⚡️ An ImageNet feature-extraction script
🛸 A GenTron training script
🛸 A GenTron training script that uses stored features

Setup

conda create -n gentron python=3.10
conda activate gentron
pip install -r requirements.txt

Sampling

python sample.py --image_size 512 --seed 1

python sample.py --model GenTron-T2I-XL/2 --image_size 256 --ckpt /path/to/model.pt

python sample_t2v.py --model GenTron-T2V-XL/2 --image_size 256 --ckpt /path/to/model.pt

GenTron Model	Train Steps	Image Resolution
B/2	150000	256x256

Training T2I Model

Preparation

torchrun --nnodes=1 --nproc_per_node=1 extract_features.py --data_path /path/to/ImageNet/train --features_path /path/to/ImageNet/features

Training

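Train the GenTron-T2I model directly on images. The first command launches a single process; the second launches N processes across multiple GPUs.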

accelerate launch --mixed_precision fp16 train.py --model GenTron-T2I-XL/2 --data_path /path/to/ImageNet/train

accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train.py --model GenTron-T2I-XL/2 --data_path /path/to/ImageNet/train

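Train the GenTron-T2I model on pre-extracted features. As above, the first command launches a single process and the second launches N processes across multiple GPUs.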

accelerate launch --mixed_precision fp16 train_v2.py --model GenTron-T2I-XL/2 --features_path /path/to/ImageNet/features

accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_v2.py --model GenTron-T2I-XL/2 --features_path /path/to/ImageNet/features

Training T2V Model

Preparation

WebVid-10M Dataset.

The WebVid data is assumed to be structured as follows.
Webvid/
    videos/
        000001_000050/      ($page_dir)
            1.mp4           (videoid.mp4)
            ...
            5000.mp4
        ...
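
As a minimal sketch of how this layout maps to file paths: assuming each metadata row provides a `page_dir` and a `videoid` value (the names used in the tree above; the actual loader in this repository may differ), a clip's location could be resolved as follows. The helper name `webvid_video_path` is hypothetical and is not part of this repository.

```python
import os


def webvid_video_path(data_dir: str, page_dir: str, videoid) -> str:
    """Build the expected path of one clip under the layout shown above.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # <data_dir>/videos/<page_dir>/<videoid>.mp4
    return os.path.join(data_dir, "videos", str(page_dir), f"{videoid}.mp4")


# Example using the placeholder names from the directory tree above.
print(webvid_video_path("Webvid", "000001_000050", 1))
```

Checking that the resolved path exists before training (for example with `os.path.exists`) is a cheap way to catch a mismatch between the metadata file and the downloaded videos.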

MSR-VTT Dataset.

The official data and video links can be found in link.

For convenience, you can also download the splits and captions with:

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

The raw videos are also available from the copy shared by Frozen in Time:

wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip
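
To sanity-check the captions after downloading, a rough sketch like the one below can group them by video. It assumes that `MSRVTT_data.json` has a top-level `sentences` list whose entries carry `video_id` and `caption` fields; that layout is an assumption here, so verify it against your copy of the file. The function name `group_captions` is hypothetical and is not part of this repository.

```python
import json
from collections import defaultdict


def group_captions(annotation: dict) -> dict:
    """Map each video_id to its list of captions.

    Assumes a top-level "sentences" list of {"video_id", "caption"} entries;
    this layout is an assumption, not confirmed by this repository.
    """
    captions = defaultdict(list)
    for entry in annotation.get("sentences", []):
        captions[entry["video_id"]].append(entry["caption"])
    return dict(captions)


# Tiny in-memory stand-in for the real file, for illustration only.
sample = json.loads(
    '{"sentences": ['
    '{"video_id": "video0", "caption": "a car is shown"},'
    '{"video_id": "video0", "caption": "a person drives a car"},'
    '{"video_id": "video1", "caption": "a woman is cooking"}]}'
)
print(group_captions(sample))
```

With the real file, replace `sample` with the result of `json.load` on `/path/to/msrvtt_data/MSRVTT_data.json`.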

Training

Train the GenTron-T2V model directly. The first command trains on WebVid-10M; the second trains on MSR-VTT.

accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_t2v.py --model GenTron-T2V-XL/2 --meta_path /path/to/webvid/results_10M_train.csv --data_dir /path/to/webvid

accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_t2v.py --model GenTron-T2V-XL/2 --meta_path /path/to/msrvtt_data/MSRVTT_data.json --data_dir /path/to/MSRVTT

Acknowledgments

Citation

@article{chen2023gentron,
  title={Gentron: Delving deep into diffusion transformers for image and video generation},
  author={Chen, Shoufa and Xu, Mengmeng and Ren, Jiawei and Cong, Yuren and He, Sen and Xie, Yanping and Sinha, Animesh and Luo, Ping and Xiang, Tao and Perez-Rua, Juan-Manuel},
  journal={arXiv preprint arXiv:2312.04557},
  year={2023}
}
