WARNING: THIS IS NOT SUPPORTED DURING DEVELOPMENT AND YOU USE IT AT YOUR OWN RISK.
Consider the main repo a development environment: things could break and be left in that state for a while. I'm a hobbyist messing about with this to see if it can speed up sound-stage duties for short videos in ComfyUI. It tested working for one-shot image results, but needs some more work.
Adaptation is underway to see if it can work with ComfyUI for image, or ideally video, ambience when applying audio to a scene clip created in ComfyUI. I would recommend not installing this version at this stage while it is being worked on. I will post updates here if it becomes something I think is going to be useful and usable - mdkberry (August 2025)
Nikhil Singh, Jeff Mentch, Jerry Ng, Matthew Beveridge, Iddo Drori
Code for the ICCV 2021 paper [arXiv]. Image2Reverb is a method for generating audio impulse responses from a 2D image of an environment, simulating that environment's acoustic reverberation.
Updated to work in a conda environment with PyTorch 2.7 and CUDA 12.8; see requirements.txt and environment.yml for more information.
- Images should be 512x512 pixels for best results
- Supported formats: JPG, PNG, BMP, TIFF
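If you want to check or resize images yourself before running the model, here is a minimal sketch using Pillow (file paths are placeholders; the script described below also resizes automatically):

```python
# Sketch: prepare an input image for Image2Reverb.
# Paths are placeholders; 512x512 and the supported formats follow the notes above.
from PIL import Image

def prepare_image(src_path, dst_path, size=(512, 512)):
    img = Image.open(src_path).convert("RGB")   # accepts JPG/PNG/BMP/TIFF
    if img.size != size:
        img = img.resize(size, Image.LANCZOS)   # resize to 512x512 for best results
    img.save(dst_path)

prepare_image("photos/cathedral.jpg", "inputs/cathedral_512.png")
```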
To generate an impulse response from a single image, you can use the provided run_single_image.py script:
python run_single_image.py --image_path path/to/your/image.jpg --output_dir ./results

This script will:
- Resize your image to 512x512 pixels if needed
- Create a temporary dataset structure
- Run the model on your image
- Save the results in the specified output directory
- Clean up temporary files
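If you need impulse responses for several images (for example, one per scene clip), one option is to call the script in a loop. Here is a minimal sketch using subprocess, assuming only the flags documented above:

```python
# Sketch: run run_single_image.py over a folder of images.
# Assumes the script accepts --image_path and --output_dir as documented above;
# folder names are placeholders.
import subprocess
from pathlib import Path

image_dir = Path("inputs")
for image_path in sorted(image_dir.glob("*.jpg")):
    out_dir = Path("results") / image_path.stem
    subprocess.run(
        ["python", "run_single_image.py",
         "--image_path", str(image_path),
         "--output_dir", str(out_dir)],
        check=True,
    )
```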
The required pre-trained models should be placed in the models folder:
- Places365 ResNet50 model: models/resnet50_places365.pth.tar
- Monodepth2 models: models/mono_640x192/ folder containing encoder.pth and depth.pth
- Image2Reverb checkpoint: models/model.ckpt
If you haven't already downloaded these models, you can get them from:
- Places365 ResNet50 model: http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar
- Monodepth2 models: From https://github.com/nianticlabs/monodepth2
- Image2Reverb checkpoint: https://media.mit.edu/~nsingh1/image2reverb/model.ckpt
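As a convenience, the two directly linked files can be fetched with a short script; a minimal sketch is below (the monodepth2 weights must still be obtained from the monodepth2 repository and placed in models/mono_640x192/):

```python
# Sketch: download the two directly linked checkpoints into the models/ folder.
# URLs are the ones listed above; the monodepth2 weights are not downloaded here.
import urllib.request
from pathlib import Path

models = Path("models")
models.mkdir(exist_ok=True)

downloads = {
    "resnet50_places365.pth.tar":
        "http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar",
    "model.ckpt":
        "https://media.mit.edu/~nsingh1/image2reverb/model.ckpt",
}

for name, url in downloads.items():
    target = models / name
    if not target.exists():
        print(f"Downloading {name}...")
        urllib.request.urlretrieve(url, str(target))
```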
Here's what each of the generated files contains and how to use them:
- results/test/test.wav: The main output - the impulse response (IR) audio file generated from your input image. It simulates how sound would reverberate in the environment depicted by your image. You can use this IR in audio processing software or a digital audio workstation (DAW) to apply realistic reverb to your audio.
- results/test/input.png: A visualization of your input image as processed by the model, including any preprocessing or modifications the model applies.
- results/test/depth.png: The estimated depth map of your input image, which the model uses to understand the 3D structure of the environment.
- results/test/spec.png: A spectrogram of the generated impulse response, showing how the frequency content of the reverb changes over time.
- results/t60.json: RT60 values (reverberation times) for different frequency bands. RT60 is the time it takes for sound to decay by 60 dB, an important acoustic parameter.
- results/t60.png: A box plot of the RT60 values.
- results/t60_err.npy: Numerical data for the T60 error metrics.
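To inspect the RT60 outputs programmatically, something like the following works; note that the exact structure of t60.json is an assumption here, so check it against your own output:

```python
# Sketch: inspect the RT60 outputs. The exact layout of t60.json is not
# documented here, so treat the access pattern below as an assumption.
import json
import numpy as np

with open("results/t60.json") as f:
    t60 = json.load(f)
print(t60)  # reverberation times per frequency band

t60_err = np.load("results/t60_err.npy")
print("T60 error metrics:", t60_err)
```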
To use the generated impulse response:
- Take the test.wav file from the results/test/ directory
- Import it into your audio software or DAW
- Use it as an impulse response in a convolution reverb plugin
- Process your dry audio with the reverb to simulate the acoustics of the environment in your input image
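If you want to apply the reverb in Python rather than in a DAW plugin, ordinary convolution is enough for a quick check. Here is a minimal sketch using soundfile and scipy, with placeholder file names:

```python
# Sketch: apply the generated IR to a dry recording with convolution reverb.
# File names are placeholders; a DAW convolution reverb plugin gives more control.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, sr = sf.read("dry_dialogue.wav")          # your dry audio
ir, ir_sr = sf.read("results/test/test.wav")   # generated impulse response
assert sr == ir_sr, "resample one of the files so the sample rates match"

# Convolve each channel with a mono IR and normalize to avoid clipping.
if dry.ndim == 1:
    dry = dry[:, None]
if ir.ndim > 1:
    ir = ir.mean(axis=1)
wet = np.stack([fftconvolve(dry[:, c], ir) for c in range(dry.shape[1])], axis=1)
wet /= np.max(np.abs(wet)) + 1e-9

sf.write("wet_dialogue.wav", wet, sr)
```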
The model converts your 2D image of a space into an impulse response audio file that simulates the acoustics of that environment.
Here are some examples of the Image2Reverb model's output for different environments:
Cathedral Impulse Response (WAV)
Bedroom Impulse Response (WAV)
Empty Field Impulse Response (WAV)
We borrow and adapt code snippets from GANSynth (and this PyTorch re-implementation), additional snippets from this PGGAN implementation, monodepth2, this GradCAM implementation, and more.
If you find the code, data, or models useful for your research, please consider citing our paper:
@InProceedings{Singh_2021_ICCV,
author = {Singh, Nikhil and Mentch, Jeff and Ng, Jerry and Beveridge, Matthew and Drori, Iddo},
title = {Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {286-295}
}