DarkKowalski/SparkTTS.cpp

SparkTTS inference with C++

Windows:

  • ONNX Runtime (DirectML backend) for BiCodec/Wav2Vec etc.
  • llama.cpp (Vulkan backend) for Qwen2.5-0.5B

macOS:

  • CoreML for BiCodec/Wav2Vec etc.
  • llama.cpp (Metal backend) for Qwen2.5-0.5B

Performance

With a Q4-K quantized transformer, it achieves a Real-Time Factor (RTF) of approximately 0.15 and a first-audio-sample latency of about 300 ms on an NVIDIA RTX 4070 GPU.
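
To make the RTF figure concrete, here is a minimal sketch that assumes the usual definition of RTF: synthesis time divided by the duration of the generated audio. The function names `real_time_factor` and `synthesis_time_seconds` are illustrative and are not part of this project's API.

```cpp
#include <cassert>

// Assumed definition: RTF = synthesis time / duration of the audio produced.
// Values below 1.0 mean audio is generated faster than it plays back.
// Hypothetical helper, not part of the SparkTTS.cpp C API.
double real_time_factor(double synthesis_seconds, double audio_seconds) {
    assert(audio_seconds > 0.0);  // RTF is undefined for empty audio
    return synthesis_seconds / audio_seconds;
}

// Estimated compute time needed to synthesize a clip at a given RTF.
// Hypothetical helper, not part of the SparkTTS.cpp C API.
double synthesis_time_seconds(double rtf, double audio_seconds) {
    return rtf * audio_seconds;
}
```

Under this definition, an RTF of 0.15 means a 10-second clip needs roughly `synthesis_time_seconds(0.15, 10.0)`, about 1.5 seconds of compute. First-sample latency is a separate measure of how soon playback can begin.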

How to build

Install Rust

Rust

Set up vcpkg

vcpkg

Build llama.cpp

Windows

  1. Make sure you are using the x64 Native Tools Command Prompt for VS 2022

  2. Set up the Vulkan dependencies (see the llama.cpp build doc)

  3. Build and install with CMake

cd third_party\llama.cpp
cmake -B build -G Ninja -DGGML_VULKAN=ON -DLLAMA_CURL=OFF -DCMAKE_INSTALL_PREFIX=..\..\lib\llama -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release
cd ..\..

macOS (Apple Silicon)

pushd third_party/llama.cpp
cmake -B build -G Ninja -DLLAMA_CURL=OFF -DCMAKE_INSTALL_PREFIX=../../lib/llama -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release
popd

Build ONNX Runtime (Windows only, with DirectML)

cd third_party\onnxruntime
python .\tools\ci_build/build.py ^
    --update ^
    --build ^
    --config Release ^
    --build_shared_lib ^
    --parallel ^
    --build_dir ./build ^
    --cmake_extra_defines "CMAKE_POLICY_VERSION_MINIMUM=3.5" ^
    --skip_tests ^
    --enable_lto ^
    --use_dml
cmake --install build\Release --config Release --prefix ..\..\lib\onnxruntime
cd ..\..

Build with CMake and Ninja

Windows

cmake --preset=vcpkg -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release && copy /Y build\src\*.dll install\tools\bin

macOS

cmake --preset=vcpkg -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release

How to use

A C API is provided for use from C++ and other languages.

An example command-line tool is provided for performance tuning.

Acknowledgements

Models used in this project are from SparkAudio/Spark-TTS

Inspired by arghyasur1991/Spark-TTS-Unity

Third-party libraries used in this project:
