A voice-activated AI assistant modeled after HAL 9000 from 2001: A Space Odyssey. The assistant is built using a Raspberry Pi Zero 2 W, integrated with dual microphones, a speaker, and status LEDs - all housed within a 1:1 scale HAL 9000 model kit. The system activates on the wake phrase “Hey HAL” using Porcupine for local wake word detection and processes spoken input using fully self-hosted services for speech-to-text (Vosk), language generation (Ollama), and text-to-speech (Piper). The text-to-speech voice is custom-trained using samples from the film to closely match HAL’s original tone. A demo of the voice assistant can be viewed here.
This project combines the following hardware and software components:
- Raspberry Pi Zero 2 W - The voice assistant's main computer. Runs the Python assistant script and handles communication with STT/LLM/TTS services via local network.
- ReSpeaker 2-Mics Pi Hat - Provides two onboard microphones, a JST 2.0 speaker output, and three programmable LEDs.
- Adafruit Mono Enclosed Speaker (1W 8 Ohm) - Compatible speaker with JST 2.0 connector, used for audio output.
- Moebius Models HAL 9000 1:1 Scale Model Kit - Enclosure used to house hardware, modeled after HAL 9000 from 2001: A Space Odyssey.
- Homelab Server - Hosts all compute-heavy services (speech-to-text, language generation, and text-to-speech) over the LAN, providing fast, local, offline processing with no reliance on cloud services.
  - OS: Proxmox VE running an Ubuntu Server VM
  - CPU: Intel Core i5-8400
  - RAM: 16 GB DDR4
  - GPU: NVIDIA GeForce GTX 1060 6GB (used for LLM acceleration and Piper TTS training)
  - Storage: 50 GB SSD allocated to the VM
- Porcupine - Used for "Hey HAL" wake word detection, which runs locally on the Raspberry Pi Zero 2 W.
- Vosk - Speech-to-text server used to transcribe recorded voice input into text.
- Ollama - Runs the LLM used for generating responses.
  - It uses the latest llama3 model, featuring 8 billion parameters.
- Piper - Text-to-speech engine that converts text into audible speech in real time. Also used to train a HAL 9000 text-to-speech model using audio samples from the film.
  - The HAL 9000 .onnx text-to-speech model can be found on my HuggingFace, along with its corresponding dataset.
The voice assistant is driven by a sequence of self-hosted services, coordinated through a Python script (app.py) running on a Raspberry Pi Zero 2 W.
The assistant runs continuously in a listening state, waiting for the wake phrase “Hey HAL.” Wake word detection is handled locally on the Raspberry Pi using Porcupine. When the phrase is recognized, the system begins actively recording voice input until a silence threshold is met. The recorded audio is then sent over the local network to my homelab server, where it is first transcribed by a speech-to-text service (Vosk). The transcribed text is then passed to a large language model (Ollama), which generates a textual response. This response is then sent to a text-to-speech engine (Piper), which synthesizes speech audio. The audio is streamed back to the Raspberry Pi and played through the speaker, enabling a fully self-hosted, offline voice interaction.
The process is illustrated below:
```mermaid
sequenceDiagram
    autonumber
    participant User
    participant WakeWord as Wake Word Detection<br>(Porcupine)
    participant Assistant as Voice Assistant
    participant STT as Speech-to-Text<br>(Vosk)
    participant LLM as Large Language Model<br>(Ollama)
    participant TTS as Text-to-Speech<br>(Piper)
    User->>WakeWord: Speak wake word ("Hey HAL")
    WakeWord-->>Assistant: Detect wake word
    Assistant-->>User: Turn LED on
    Assistant->>Assistant: Start recording for user input
    User->>Assistant: Speak voice input
    Assistant->>Assistant: Detect silence
    Assistant->>STT: Send audio for transcription
    STT-->>Assistant: Return transcribed text
    Assistant->>LLM: Send transcribed text to LLM
    LLM-->>Assistant: Return LLM text response
    Assistant->>TTS: Send LLM text response for speech synthesis
    TTS-->>Assistant: Stream synthesized audio
    Assistant-->>User: Play response through speaker
    Assistant-->>User: Turn LED off
```
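To make the flow above concrete, here is a minimal sketch of the coordination loop - not the project's actual app.py. The server address and the record_until_silence and play_wav helpers are hypothetical, and the Vosk, Ollama, and Piper calls assume each service's default API on the ports used in this project (Piper is assumed to return WAV audio for a text query parameter).

```python
# Minimal sketch of the coordination loop (not the project's actual app.py).
# Assumptions: SERVER, record_until_silence(), and play_wav() are hypothetical;
# Vosk (2700), Ollama (11434), and Piper (5000) expose their default APIs.
import json
import struct

import pvporcupine
import pyaudio
import requests
from websockets.sync.client import connect  # websockets >= 12

SERVER = "192.168.1.50"         # hypothetical homelab address
OLLAMA_MODEL = "llama3:latest"
RESPEAKER_INDEX = 1             # ReSpeaker 2-Mics Pi Hat device index

porcupine = pvporcupine.create(
    access_key="your_picovoice_api_key",
    keyword_paths=["/absolute/path/to/your/model.ppn"],
)

pa = pyaudio.PyAudio()
stream = pa.open(rate=porcupine.sample_rate, channels=1, format=pyaudio.paInt16,
                 input=True, frames_per_buffer=porcupine.frame_length,
                 input_device_index=RESPEAKER_INDEX)

def transcribe(audio: bytes) -> str:
    """Stream raw 16 kHz mono PCM to the Vosk WebSocket server."""
    with connect(f"ws://{SERVER}:2700") as ws:
        ws.send(json.dumps({"config": {"sample_rate": 16000}}))
        for i in range(0, len(audio), 8000):
            ws.send(audio[i:i + 8000])
            ws.recv()                          # discard interim results
        ws.send('{"eof" : 1}')
        return json.loads(ws.recv())["text"]

def generate(prompt: str) -> str:
    """Request a single, non-streamed completion from Ollama."""
    r = requests.post(f"http://{SERVER}:11434/api/generate",
                      json={"model": OLLAMA_MODEL, "prompt": prompt, "stream": False})
    return r.json()["response"]

while True:
    # Porcupine consumes fixed-size frames of 16-bit samples
    pcm = struct.unpack_from("h" * porcupine.frame_length,
                             stream.read(porcupine.frame_length))
    if porcupine.process(pcm) >= 0:            # wake word detected
        audio = record_until_silence(stream)   # hypothetical helper (silence threshold)
        reply = generate(transcribe(audio))
        wav = requests.get(f"http://{SERVER}:5000", params={"text": reply}).content
        play_wav(wav)                          # hypothetical helper (speaker playback)
```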
Before setting up the assistant, ensure the following conditions are met:
- A Raspberry Pi Zero 2 W is set up and connected to the same local network as your server.
- A ReSpeaker 2-Mics Pi Hat is correctly installed and initialized on the Raspberry Pi. Follow the driver setup instructions for the hat on Seeed Studio's website.
- A separate computer or server must be available on the same network to host the required backend services:
The easiest way to run Vosk is to start its WebSocket server with Docker:
```
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
```
More information for running the server can be found in the official Vosk documentation. Ensure Vosk is running the WebSocket server on port 2700.
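To sanity-check the container, you can stream a short 16 kHz mono WAV file to it and print the transcript. This sketch assumes the websockets Python package (version 12 or later) and a test.wav recording; run it on the server itself, or replace localhost with the server's address.

```python
# Quick Vosk server check: stream test.wav (16 kHz mono) and print the transcript.
import json
import wave

from websockets.sync.client import connect  # websockets >= 12

with wave.open("test.wav", "rb") as wav, connect("ws://localhost:2700") as ws:
    ws.send(json.dumps({"config": {"sample_rate": wav.getframerate()}}))
    while data := wav.readframes(4000):
        ws.send(data)
        ws.recv()                         # interim results, ignored here
    ws.send('{"eof" : 1}')
    print(json.loads(ws.recv())["text"])  # final transcript
```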
Ollama can be installed using the following command:
```
curl -fsSL https://ollama.com/install.sh | sh
```
By default, Ollama runs on port 11434. Once Ollama is installed, you will need to run a large language model of your choosing. For this project, I used the latest llama3 model, featuring 8 billion parameters.
```
ollama run llama3:latest
```
If you decide to use a different model, you will need to change the OLLAMA_MODEL constant in app.py.
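To verify the model responds, you can hit Ollama's generate endpoint directly (replace localhost with your server's address):

```python
# Quick Ollama check: request one non-streamed completion from llama3.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3:latest",
                        "prompt": "Open the pod bay doors, HAL.",
                        "stream": False})
print(r.json()["response"])
```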
To set up the Piper Python HTTP server, I recommend following Thorsten-Voice's tutorial on YouTube. He provides excellent resources for setting up a Piper TTS environment, as well as training your own voices. More information on Piper can be found on their GitHub. Ensure Piper is running the HTTP server on port 5000.
If you'd like to use a custom text-to-speech model, you can run the HTTP server with my custom HAL 9000 model, available on my HuggingFace page. Alternatively, you can train your own model using the corresponding dataset.
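Once the server is up, a quick way to check it is to request synthesized audio for a short phrase. This assumes the HTTP server accepts a text query parameter and returns WAV audio, which is how Piper's bundled HTTP server behaves; your setup may differ if you followed a different tutorial.

```python
# Quick Piper check: request WAV audio for a phrase and save it to disk.
# Assumes the HTTP server accepts a "text" query parameter and returns WAV.
import requests

wav = requests.get("http://localhost:5000",
                   params={"text": "I am completely operational."}).content
with open("out.wav", "wb") as f:
    f.write(wav)
```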
With all the services running on their respective ports (Vosk, Ollama, and Piper), the Raspberry Pi client can be set up.
Clone the repository:
```
git clone https://github.com/campwill/hal-voice-assistant.git
cd hal-voice-assistant
```
Create the virtual environment and install all the required dependencies.
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
In app.py, set the correct device index for your ReSpeaker 2-Mics Pi Hat:
```python
RESPEAKER_INDEX = 1  # Change to your specific device index
```
To find the device index for your audio device:
```
arecord -l
```
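Note that the ALSA card numbers reported by arecord -l do not necessarily match PyAudio's device indices. Assuming app.py records through PyAudio, you can also list the input devices it actually sees:

```python
# List input devices as PyAudio sees them; use the matching index for RESPEAKER_INDEX.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:
        print(i, info["name"])
pa.terminate()
```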
Create a .env file in the root of the project:
```
nano .env
```
```
PICOVOICE_KEY=your_picovoice_api_key
PICOVOICE_MODEL_PATH=/absolute/path/to/your/model.ppn
```
Get your Picovoice API key and your Porcupine wake word model (.ppn) from the Picovoice Developer Console.
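For reference, a common pattern for consuming these values in Python is the python-dotenv package; whether app.py uses this exact package is an assumption here.

```python
# Load the .env values into the environment, then read them (assumes python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()
access_key = os.environ["PICOVOICE_KEY"]
keyword_path = os.environ["PICOVOICE_MODEL_PATH"]
```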
Finally, run the assistant:
```
python app.py
```
The Python script can be run as a service to start automatically once the Raspberry Pi turns on. More information about running scripts on startup can be found here.
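For example, a minimal systemd unit could look like the following, assuming the repository lives at /home/pi/hal-voice-assistant (adjust the paths to your setup):

```
[Unit]
Description=HAL 9000 voice assistant
After=network-online.target

[Service]
WorkingDirectory=/home/pi/hal-voice-assistant
ExecStart=/home/pi/hal-voice-assistant/.venv/bin/python app.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Save it as /etc/systemd/system/hal.service and enable it with sudo systemctl enable --now hal.service.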
For the physical enclosure of my voice assistant, I used the Moebius Models HAL 9000 1:1 Scale Model Kit. This kit arrives as a set of unassembled plastic components. I painted the faceplate Flat Black and the frame Metallic Aluminum. For the lens components, I used Elmer’s Glue to secure them without fogging or damaging the clear plastic.
The speaker grill included in the kit was a solid plastic piece with no perforations. To make it functional, I drilled out all of the holes to allow audio to pass through clearly.
Below are some pictures of the assembly process.
I was able to mount the Raspberry Pi in a position where the LED aligned with HAL 9000’s eye. For now, I used cardboard and tape as a temporary solution (I don't own a 3D printer yet). Below are some pictures of the components mounted inside the model kit.
- I hope to shorten the response time by exploring ways to optimize the Vosk STT pipeline, such as reducing silence detection lag or modifying WebSocket handling.
- Additionally, instead of sending three separate requests to my homelab server, I may create a unified API endpoint that handles the STT, LLM, and TTS stages in a single request to minimize network overhead (sketched after this list).
- The ReSpeaker 2-Mics Pi Hat's audio has also been giving me issues, especially after the Raspberry Pi reboots; I will keep tinkering with its audio functionality to see if I can fix the volume problems.
- I would also like to implement Home Assistant integration.
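As a sketch of that unified endpoint idea, the server could expose a single route that chains the three stages. Everything here (FastAPI, the route name, localhost ports) is an assumption, not part of the current project.

```python
# Hypothetical unified endpoint (future work): one POST chains STT -> LLM -> TTS
# server-side so the Pi makes a single request. Blocking calls are used to keep
# the sketch simple.
import json

import requests
from fastapi import FastAPI, Request, Response
from websockets.sync.client import connect

app = FastAPI()

@app.post("/assist")
async def assist(request: Request) -> Response:
    audio = await request.body()                 # raw 16 kHz mono PCM from the Pi

    # STT: stream the audio to the local Vosk WebSocket server
    with connect("ws://localhost:2700") as ws:
        ws.send(json.dumps({"config": {"sample_rate": 16000}}))
        for i in range(0, len(audio), 8000):
            ws.send(audio[i:i + 8000])
            ws.recv()                            # discard interim results
        ws.send('{"eof" : 1}')
        text = json.loads(ws.recv())["text"]

    # LLM: single non-streamed Ollama completion
    reply = requests.post("http://localhost:11434/api/generate",
                          json={"model": "llama3:latest", "prompt": text,
                                "stream": False}).json()["response"]

    # TTS: Piper's HTTP server returns WAV audio for the reply text
    wav = requests.get("http://localhost:5000", params={"text": reply}).content
    return Response(content=wav, media_type="audio/wav")
```

If this were saved as unified.py (a hypothetical filename), it could be served with uvicorn unified:app, and the Pi would then make one request per interaction instead of three.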