VidStamp is a watermarking framework that embeds high-capacity, flexible, and imperceptible watermarks directly into the latent space of temporally-aware video diffusion models. It fine-tunes the decoder in two stages, first on images and then on videos, to achieve strong ownership verification, frame-level tamper localization, and minimal visual degradation.
Our paper: "VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models"
- Download the COCO 2017 dataset.
- Set the paths to the train and val image folders in `configs/SVD_Finetune_First_Stage_config.json`.
- Use the prompts from `train_prompts.txt` and `val_prompts.txt`.
- Generate corresponding images using Stable Diffusion 2.1.
- Save the images inside appropriate folders (e.g., `dataset/train_images/` and `dataset/val_images/`).
- Use the generated images to synthesize videos using Stable Video Diffusion.
- Save the videos inside appropriate folders (e.g., `dataset/train_videos/` and `dataset/val_videos/`).
- In `configs/SVD_Evaluate_decoder.json`:
  - Set the path to the validation images folder.
- In `configs/SVD_Finetune_Second_Stage_config.json`:
  - Set the paths to the train videos and val videos folders.
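
The generation steps above (prompts to images, then images to videos) could be scripted roughly as follows. This is a minimal sketch built on the Hugging Face `diffusers` pipelines, not the repository's own generation code: the checkpoint IDs, the zero-padded file naming, the 1024x576 resize, and the `fps=7` export setting are all assumptions, and the function names `read_prompts`, `generate_images`, and `generate_videos` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path


def read_prompts(prompt_file):
    """Return the non-empty lines of a prompt file, one prompt per line."""
    lines = Path(prompt_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def generate_images(prompts, image_dir,
                    model_id="stabilityai/stable-diffusion-2-1"):  # assumed checkpoint
    """Generate one image per prompt. Needs a CUDA GPU, torch, and diffusers."""
    import torch
    from diffusers import StableDiffusionPipeline

    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    paths = []
    for index, prompt in enumerate(prompts):
        image = pipe(prompt).images[0]
        path = image_dir / f"{index:05d}.png"  # assumed naming scheme
        image.save(path)
        paths.append(path)
    return paths


def generate_videos(image_paths, video_dir,
                    model_id="stabilityai/stable-video-diffusion-img2vid-xt"):  # assumed checkpoint
    """Animate each image with Stable Video Diffusion and save it as an MP4."""
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    video_dir = Path(video_dir)
    video_dir.mkdir(parents=True, exist_ok=True)
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    for image_path in image_paths:
        # Resize to the resolution the SVD checkpoint was trained on (assumption).
        image = load_image(str(image_path)).resize((1024, 576))
        frames = pipe(image, decode_chunk_size=8).frames[0]
        out_path = video_dir / f"{Path(image_path).stem}.mp4"
        export_to_video(frames, str(out_path), fps=7)
```

A typical call chain would be `generate_videos(generate_images(read_prompts("train_prompts.txt"), "dataset/train_images/"), "dataset/train_videos/")`, repeated for the validation prompts. The heavy imports sit inside the functions so that the prompt helper can be used without a GPU environment.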
- Follow the instructions in the Stable Signature GitHub repository to download the TorchScript version of the watermark extractor.
- Set the path to the downloaded watermark extractor model in:
  - `configs/SVD_Finetune_First_Stage_config.json`
  - `configs/SVD_Finetune_Second_Stage_config.json`
  - `configs/SVD_Evaluate_decoder.json`
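
Because the same extractor path has to appear in three config files, it may be convenient to set it from a small script instead of editing each file by hand. The sketch below assumes the configs are flat JSON objects and uses `msg_decoder_path` as a placeholder key name; that key is hypothetical, so check the actual field name in the config files before using it. The same `set_config_value` helper (also hypothetical) could be used for the dataset paths mentioned earlier.

```python
import json
from pathlib import Path

# Config files named in the steps above.
CONFIG_FILES = [
    "configs/SVD_Finetune_First_Stage_config.json",
    "configs/SVD_Finetune_Second_Stage_config.json",
    "configs/SVD_Evaluate_decoder.json",
]

# Hypothetical key name: replace with the field the configs actually use.
EXTRACTOR_KEY = "msg_decoder_path"


def set_config_value(config_path, key, value):
    """Load a JSON config, set one top-level key, and write the file back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config[key] = str(value)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def link_extractor(extractor_path, config_files=CONFIG_FILES, key=EXTRACTOR_KEY):
    """Point every listed config at the same extractor checkpoint."""
    for config_file in config_files:
        set_config_value(config_file, key, extractor_path)
```

Calling `link_extractor("path/to/extractor.pt")` from the repository root would then update all three files in one step, leaving every other setting untouched.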
- First Stage Fine-tuning (Image-based):
python scripts/finetune_first_stage.py --config configs/SVD_Finetune_First_Stage_config.json
- Second Stage Fine-tuning (Video-based):
python scripts/finetune_second_stage.py --config configs/SVD_Finetune_Second_Stage_config.json
- To extract and evaluate the embedded watermarks:
python scripts/evaluate_watermark.py --config configs/SVD_Evaluate_decoder.json
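
The three commands above can also be chained from one driver script so that a failure in an earlier stage stops the run. This is only a convenience sketch: it assumes the commands are launched from the repository root and that each config already points at whatever the previous stage produced. `build_commands` and `run_all` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# (script, config) pairs taken from the commands listed above, in run order.
STAGES = [
    ("scripts/finetune_first_stage.py", "configs/SVD_Finetune_First_Stage_config.json"),
    ("scripts/finetune_second_stage.py", "configs/SVD_Finetune_Second_Stage_config.json"),
    ("scripts/evaluate_watermark.py", "configs/SVD_Evaluate_decoder.json"),
]


def build_commands(python=sys.executable):
    """Return one argument list per stage, ready for subprocess.run."""
    return [[python, script, "--config", config] for script, config in STAGES]


def run_all():
    """Run the stages in order; check=True raises on the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Using `sys.executable` keeps every stage in the same Python environment as the driver script.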
- The COCO dataset is required only for the first-stage fine-tuning.
- Generated images and videos are required for the second-stage fine-tuning and evaluation.
- VidStamp supports both per-frame and segment-wise watermarking.
- The watermark extractor must be correctly downloaded and linked before training or evaluation.
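
To make the difference between per-frame and segment-wise watermarking concrete, the sketch below shows one way keys could be assigned to frames and how a per-frame bit accuracy could be used to flag tampered frames. It is an illustration under stated assumptions, not VidStamp's implementation: the cyclic key assignment, the 0.9 threshold, and the helper names `assign_keys`, `bit_accuracy`, and `flag_tampered_frames` are all hypothetical.

```python
def assign_keys(num_frames, keys, segment_len=1):
    """Map each frame index to a key.

    segment_len=1 gives per-frame watermarking (a new key every frame);
    a larger value gives segment-wise watermarking (one key per block of
    consecutive frames). Keys are reused cyclically if they run out.
    """
    return [keys[(frame // segment_len) % len(keys)] for frame in range(num_frames)]


def bit_accuracy(extracted, expected):
    """Fraction of bits on which the extracted and expected keys agree."""
    matches = sum(a == b for a, b in zip(extracted, expected))
    return matches / len(expected)


def flag_tampered_frames(extracted_per_frame, expected_per_frame, threshold=0.9):
    """Return indices of frames whose bit accuracy falls below the threshold."""
    return [
        index
        for index, (extracted, expected) in enumerate(
            zip(extracted_per_frame, expected_per_frame)
        )
        if bit_accuracy(extracted, expected) < threshold
    ]
```

For example, with two keys and `segment_len=2`, a four-frame clip receives the first key on frames 0 and 1 and the second key on frames 2 and 3. If the bits extracted from frame 2 no longer match the second key, `flag_tampered_frames` reports index 2.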