Windows:
- ONNX Runtime (DirectML backend) for BiCodec/Wav2Vec etc.
- llama.cpp (Vulkan backend) for Qwen2.5-0.5B
macOS:
- CoreML for BiCodec/Wav2Vec etc.
- llama.cpp (Metal backend) for Qwen2.5-0.5B
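The build steps below run inside `third_party/llama.cpp` and `third_party/onnxruntime`. Assuming those directories are git submodules of this repository (an inference from the build steps, not stated explicitly), make sure they are checked out before building:

```shell
# Assumes third_party/llama.cpp and third_party/onnxruntime are git
# submodules of this repo (layout inferred from the build commands below).
git submodule update --init --recursive
```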
With a Q4_K-quantized transformer, it achieves a Real-Time Factor (RTF) of approximately 0.15 and roughly 300 ms latency to the first audio sample on an NVIDIA RTX 4070 GPU.
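For context (using the figures from the claim above, not re-measured): RTF is synthesis time divided by the duration of audio produced, so an RTF of 0.15 means 10 seconds of speech takes about 1.5 seconds to generate, while the 300 ms number is the separate time-to-first-sample:

```shell
# RTF = time to synthesize / duration of audio produced.
# At RTF 0.15, 10 s of audio takes rtf * audio_s = 1.5 s of compute.
awk 'BEGIN { rtf = 0.15; audio_s = 10.0; printf "%.2f s\n", rtf * audio_s }'
```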
- Make sure you are using the x64 Native Tools Command Prompt for VS 2022.
- Set up the Vulkan dependencies (see the llama.cpp build doc).
Build and install with CMake.

Windows (llama.cpp, Vulkan backend):

```bat
cd third_party\llama.cpp
cmake -B build -G Ninja -DGGML_VULKAN=ON -DLLAMA_CURL=OFF -DCMAKE_INSTALL_PREFIX=..\..\lib\llama -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release
cd ..\..
```

macOS (llama.cpp, Metal backend is enabled by default):

```sh
pushd third_party/llama.cpp
cmake -B build -G Ninja -DLLAMA_CURL=OFF -DCMAKE_INSTALL_PREFIX=../../lib/llama -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release
popd
```

Windows (ONNX Runtime, DirectML backend):

```bat
cd third_party\onnxruntime
python .\tools\ci_build\build.py ^
    --update ^
    --build ^
    --config Release ^
    --build_shared_lib ^
    --parallel ^
    --build_dir ./build ^
    --cmake_extra_defines "CMAKE_POLICY_VERSION_MINIMUM=3.5" ^
    --skip_tests ^
    --enable_lto ^
    --use_dml
cmake --install build\Release --config Release --prefix ..\..\lib\onnxruntime
cd ..\..
```

Windows (main project):

```bat
cmake --preset=vcpkg -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release && copy /Y build\src\*.dll install\tools\bin
```

macOS (main project):

```sh
cmake --preset=vcpkg -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --config Release
```

A C API is provided for C++ and other languages.
An example command-line tool is provided for performance tuning.
Models used in this project are from SparkAudio/Spark-TTS
Inspired by arghyasur1991/Spark-TTS-Unity
Third-party libraries used in this project: