
Torch-TensorRT v2.9.0


@lanluo-nvidia released this 17 Oct 15:58 · 8767d9b

PyTorch 2.9, CUDA 13.0, TensorRT 10.13, Python 3.13

Torch-TensorRT 2.9.0 for Linux x86-64 and Windows targets PyTorch 2.9, TensorRT 10.13, CUDA 13.0/12.8/12.6, and Python 3.10 through 3.13.

Python

x86-64 Linux and Windows

aarch64 SBSA Linux and Jetson Thor

NOTE: On aarch64 platforms you must explicitly install the TensorRT wheel or use a system-installed TensorRT:

uv pip install torch torch-tensorrt tensorrt 

aarch64 Jetson Orin

  • There is no torch_tensorrt 2.9 release for Jetson Orin; please continue using the torch_tensorrt 2.8 release.

C++

x86-64 Linux and Windows

  • CUDA 13.0 Tarball / Zip

Deprecations

FX Frontend

The FX frontend was the precursor to the Dynamo frontend, and a number of Dynamo components were shared between the two. Now that the Dynamo frontend is stable and all shared components have been decoupled, we will no longer ship the FX frontend in binary releases starting in H1 2026. The FX frontend will remain in the source tree for the foreseeable future, so source builds can reinstall the frontend if necessary.

New Features

LLM and VLM improvements

In this release, we’ve introduced several key enhancements:

  • Sliding Window Attention in the SDPA Converter: Added support for sliding window attention, enabling successful compilation of the Gemma3 model (Gemma3-1B); see the compile sketch after this list.
  • Dynamic Custom Lowering Passes: Refactored the lowering framework so that users can dynamically register custom passes based on the configuration of Hugging Face models; see the registration sketch below.
  • Vision-Language Model (VLM) Support
    • Added support for the Eagle2 and Qwen2.5-VL models via the new run_vlm.py utility.
    • run_vlm.py compiles both the vision and language components of a VLM and supports KV caching for efficient VLM generation.
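As a rough illustration of the sliding-window support, here is a minimal sketch (not an official recipe) of compiling Gemma3-1B through the Torch-TensorRT backend for torch.compile; the model id, precision option, and prompt are assumptions:

import torch
import torch_tensorrt  # registers the "torch_tensorrt" backend for torch.compile
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-3-1b-it"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).eval().cuda()

prompt = tokenizer("Sliding window attention lets Gemma3", return_tensors="pt").to("cuda")

# Compile the forward pass with the Torch-TensorRT Dynamo backend; the new
# SDPA converter lowers Gemma3's sliding-window attention pattern.
model.forward = torch.compile(
    model.forward,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float16}},
)

with torch.inference_mode():
    logits = model(**prompt).logits  # first call triggers compilation
print(logits.shape)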

See the documentation for detailed instructions on running these models.
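For the dynamic lowering passes, the sketch below follows the decorator-based registration described in the lowering-pass documentation; the import path and pass signature have varied across versions, so treat both as assumptions and check the docs for your build:

import torch
# Assumed import path; see the "Writing Dynamo ATen Lowering Passes" docs
from torch_tensorrt.dynamo.lowering.passes import _aten_lowering_pass

@_aten_lowering_pass  # registers the pass with the Dynamo lowering framework
def log_graph_size(gm: torch.fx.GraphModule, *args) -> torch.fx.GraphModule:
    # Example no-op pass: inspect the graph and return it unchanged. A real
    # pass would rewrite nodes (e.g., based on a Hugging Face model config).
    print(f"lowering pass saw {len(gm.graph.nodes)} nodes")
    return gm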

TensorRT-RTX

TensorRT-RTX is a JIT-first version of TensorRT. Whereas standard TensorRT performs tactic selection and fusions during a build phase, TensorRT-RTX lets you build prior to specializing for specific hardware, so a single GPU-agnostic package can be distributed to all users of your builds. Then, on first use, TensorRT-RTX tunes for the specific hardware your users are running. Torch-TensorRT-RTX is a build of Torch-TensorRT that uses the TensorRT-RTX compiler stack in place of standard TensorRT. All APIs are identical to Torch-TensorRT; however, some features, such as weak typing and compile-time post-training quantization, are not supported. A minimal sketch of the shared compile-and-save flow follows.
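Because the APIs are identical, a standard compile-and-save flow carries over unchanged; the toy model and file name below are illustrative:

import torch
import torch_tensorrt

# Toy model standing in for a real workload
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval().cuda()

example_inputs = [torch.randn(8, 64, device="cuda")]

# The same call works under Torch-TensorRT and Torch-TensorRT-RTX; under the
# RTX build, hardware-specific tactic tuning is deferred to first use on the
# target GPU rather than happening here.
trt_module = torch_tensorrt.compile(model, ir="dynamo", inputs=example_inputs)

# Save a distributable artifact; reload on the target machine with
# torch_tensorrt.load("model.ep").
torch_tensorrt.save(trt_module, "model.ep", inputs=example_inputs)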

Improvements

  • Closed a number of performance gaps between graphs built by Torch-TensorRT and equivalent graphs built via the ONNX → TensorRT path

What's Changed

Full Changelog: v2.8.0...v2.9.0