This project is a RESTful API server that provides endpoints for transcribing and translating audio files. The APIs are compatible with the OpenAI audio transcription and translation APIs.
The following audio formats and codecs are supported:

- Formats: `caf`, `isomp4`, `mkv`, `ogg`, `aiff`, `wav`
- Codecs: `aac`, `adpcm`, `alac`, `flac`, `mp1`, `mp2`, `mp3`, `pcm`, `vorbis`
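If your audio is in a format outside this list, one option is to convert it to `wav` first with an external tool such as `ffmpeg`. This is not part of the project; the command below is a generic sketch that produces 16 kHz mono PCM, and the file names are placeholders:

```bash
# Convert an arbitrary input file (here a hypothetical input.webm) to 16 kHz mono PCM WAV
ffmpeg -i input.webm -ar 16000 -ac 1 -c:a pcm_s16le input.wav
```
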
Note: The project is still under active development. Existing features still need to be improved, and more features will be added in the future.
- Install WasmEdge v0.14.1 with the `wasi_nn-whisper` plugin

  ```bash
  curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1
  ```
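  As a quick sanity check (assuming the installer placed `wasmedge` on your `PATH`, typically after sourcing `$HOME/.wasmedge/env` in your shell), you can print the installed version:

  ```bash
  # Should report WasmEdge version 0.14.1 if the install succeeded
  wasmedge --version
  ```
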
- Deploy the `wasi_nn-whisper` plugin

  ```bash
  # Download whisper plugin for Mac Apple Silicon
  curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz

  # Unzip the plugin to $HOME/.wasmedge/plugin
  tar -xzf WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz -C $HOME/.wasmedge/plugin
  ```
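  To confirm the plugin was deployed, you can list the plugin directory; the extracted plugin library should appear there (the exact file name varies by platform):

  ```bash
  # The wasi_nn-whisper plugin library should be listed here after extraction
  ls $HOME/.wasmedge/plugin
  ```
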
 
- Download the `whisper-api-server.wasm` binary

  ```bash
  curl -LO https://github.com/LlamaEdge/whisper-api-server/releases/download/0.3.0/whisper-api-server.wasm
  ```
- Download model

  `ggml` whisper models are available from https://huggingface.co/ggerganov/whisper.cpp/tree/main. In the following command, `ggml-medium.bin` is downloaded as an example. You can replace it with other models.

  ```bash
  curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
  ```
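  For a quicker download or a lighter memory footprint, the same repository hosts smaller variants; for example (file name taken from the whisper.cpp model listing, so treat it as an assumption if that listing changes):

  ```bash
  # Smaller model: faster to download and load, at some cost in accuracy
  curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
  ```
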
- Start server

  ```bash
  wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin
  ```

  To start the server on a different port, use `--socket-addr` to specify the address and port you want to use, for example:

  ```bash
  wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin --socket-addr 0.0.0.0:10086
  ```

  To start the server with an API key, set the environment variable `API_KEY`, for example:

  ```bash
  wasmedge --dir .:. --env API_KEY=your_api_key whisper-api-server.wasm -m ggml-medium.bin
  ```
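  When an API key is set, clients need to present it with each request. The sketch below assumes the usual OpenAI-style `Authorization: Bearer` header; adjust it if the server expects a different scheme:

  ```bash
  # Assumption: the API key is sent as a Bearer token, as with the OpenAI API
  curl --location 'http://localhost:8080/v1/audio/transcriptions' \
    --header 'Authorization: Bearer your_api_key' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@"test.wav"'
  ```
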
 
- Download audio file

  ```bash
  curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test.wav
  ```
- Send `curl` request to the transcriptions endpoint

  ```bash
  curl --location 'http://localhost:8080/v1/audio/transcriptions' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@"test.wav"'
  ```

  If everything is set up correctly, you should see the following generated transcriptions:

  ```json
  { "text": "[00:00:00.000 --> 00:00:03.540] This is a test record for Whisper.cpp" }
  ```
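  Since the response is plain JSON, you can extract just the `text` field with a tool like `jq` (assuming `jq` is installed; this is a convenience sketch, not part of the server):

  ```bash
  # Print only the transcribed text from the JSON response
  curl --silent --location 'http://localhost:8080/v1/audio/transcriptions' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@"test.wav"' | jq -r .text
  ```
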
- Download audio file

  ```bash
  curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test_cn.wav
  ```

  This audio contains a Chinese sentence, `这里是中文广播`, whose English meaning is "This is a Chinese broadcast".
- Send `curl` request to the translations endpoint

  ```bash
  curl --location 'http://localhost:8080/v1/audio/translations' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@"test_cn.wav"' \
    --form 'language="zh"'
  ```

  Note that the valid values for `language` are ISO-639-1 language codes. If everything is set up correctly, you should see the following generated translation:

  ```json
  { "text": "[00:00:00.000 --> 00:00:04.000] This is a Chinese broadcast." }
  ```
To build the `whisper-api-server.wasm` binary, you need to have the Rust toolchain installed. If you don't have it installed, you can install it by following the instructions on the Rust website.
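Since the binary targets `wasm32-wasip1`, you may also need to add that target to your toolchain. The command below is a standard `rustup` step, included here as a likely prerequisite rather than one stated by the project:

```bash
# Add the WASI target used by the build (skip if already installed)
rustup target add wasm32-wasip1
```
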
If you are working on macOS, you need to download the wasi-sdk from https://github.com/WebAssembly/wasi-sdk/releases, set the `WASI_SDK_PATH` environment variable to the path of the wasi-sdk directory, and set the `CC` environment variable to the clang shipped with the wasi-sdk, for example:
```bash
export WASI_SDK_PATH=/path/to/wasi-sdk-22.0
export CC="${WASI_SDK_PATH}/bin/clang --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot"
```

Now, you can build the `whisper-api-server.wasm` binary by following the steps below:
- Clone the repository

  ```bash
  git clone https://github.com/LlamaEdge/whisper-api-server.git
  ```
- Build the `whisper-api-server.wasm` binary

  ```bash
  cd whisper-api-server
  cargo build --target wasm32-wasip1 --release
  ```

  If the build is successful, you should see the `whisper-api-server.wasm` binary in the `target/wasm32-wasip1/release` directory.
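  As a quick check, you can run the freshly built binary the same way as the released one (assuming a model such as `ggml-medium.bin` has already been downloaded into the current directory, as described above):

  ```bash
  # Run the locally built binary against a previously downloaded model
  wasmedge --dir .:. target/wasm32-wasip1/release/whisper-api-server.wasm -m ggml-medium.bin
  ```
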
```console
$ wasmedge whisper-api-server.wasm -h

Whisper API Server

Usage: whisper-api-server.wasm [OPTIONS] --model <MODEL>

Options:
  -n, --model-name <MODEL_NAME>    Model name [default: default]
  -a, --model-alias <MODEL_ALIAS>  Model alias [default: default]
  -m, --model <MODEL>              Path to the whisper model file
      --threads <THREADS>          Number of threads to use during computation [default: 4]
      --processors <PROCESSORS>    Number of processors to use during computation [default: 1]
      --task <TASK>                Task type [default: full] [possible values: transcribe, translate, full]
      --no-audio-preprocessor      Do not pre-process input audio files
      --port <PORT>                Port number [default: 8080]
      --socket-addr <SOCKET_ADDR>  Socket address of LlamaEdge API Server instance. For example, `0.0.0.0:8080`
  -h, --help                       Print help (see more with '--help')
  -V, --version                    Print version
```
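The options above can be combined. The following sketch (values chosen purely for illustration) runs the server in transcription-only mode with 8 threads on port 9090:

```bash
# Illustrative combination of the documented CLI options
wasmedge --dir .:. whisper-api-server.wasm \
  -m ggml-medium.bin \
  --task transcribe \
  --threads 8 \
  --port 9090
```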