This repository contains a speech command system that processes audio input, transcribes it to text, and executes the corresponding command. The system uses Whisper for transcription together with either a distilled SBERT paraphrase model or BERT-large for command mapping, depending on which pipeline you choose.
To get started with the speech command system, clone the repository and install the required dependencies:
git clone https://github.com/EthanEpp/saiCommandExecution/
git clone https://huggingface.co/ethan3048/saiCommandProcessor
cd ./saiCommandExecution
pip install -r requirements.txt
Run the main script to start the system:
python main.py
This initializes the system, listens for a command for 5 seconds, then outputs the transcribed audio and the mapped command.
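The listen-transcribe flow can be sketched roughly as follows. This is a minimal illustration, not the repo's actual API: the function names, the 16 kHz sample rate, and the `base` Whisper model size are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
RECORD_SECONDS = 5

def num_samples(seconds: float, rate: int = SAMPLE_RATE) -> int:
    """Number of samples in a recording of the given length."""
    return int(seconds * rate)

def record_command(seconds: float = RECORD_SECONDS) -> np.ndarray:
    """Record from the default microphone and return a float32 waveform."""
    import sounddevice as sd  # lazy import so this file loads without audio hardware
    audio = sd.rec(num_samples(seconds), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    return audio.flatten()

def transcribe(audio: np.ndarray) -> str:
    """Transcribe a waveform with Whisper (downloads weights on first use)."""
    import whisper  # lazy import: pulls in torch
    model = whisper.load_model("base")
    return model.transcribe(audio)["text"].strip()
```

A call like `transcribe(record_command())` would then hand the text to the command-mapping stage.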
- numpy==1.26.4
- openai_whisper==20231117
- PyAudio==0.2.14
- pydub==0.25.1
- pytest==8.2.2
- sacremoses==0.1.1
- scipy==1.13.1
- scikit_learn==1.4.2
- sentence_transformers==3.0.1
- sounddevice==0.4.7
- spacy==3.7.5
- torch==2.3.1
- tqdm==4.66.4
- transformers==4.41.2
This notebook is designed to run in Colab and walks through model loading and inference, using either microphone input transcribed by Whisper or direct text input.
This module converts spoken language into text using the Whisper model from OpenAI.
This module contains the slot-filling and intent-detection architecture that maps transcribed speech to its corresponding command and the necessary tags.
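At its core, the mapping stage is a nearest-neighbor search over sentence embeddings. The sketch below uses hand-made 2-D vectors in place of real SBERT embeddings, and the function names and threshold are illustrative, not the repo's API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def map_to_command(utterance_emb, command_embs, commands, threshold=0.5):
    """Return the command whose embedding is most similar to the utterance,
    or None if no command clears the similarity threshold."""
    scores = [cosine_similarity(utterance_emb, e) for e in command_embs]
    best = int(np.argmax(scores))
    return commands[best] if scores[best] >= threshold else None
```

In the real pipeline, the utterance and command embeddings would come from the distilled SBERT paraphrase model (or BERT-large), with the threshold tuned to reject out-of-scope input.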
This module contains utility functions for handling audio files, such as converting them to WAV.
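To illustrate what the WAV output looks like, here is a standard-library sketch that writes 16-bit mono PCM samples to a WAV file. The repo's own utilities rely on pydub; this stdlib version is just for orientation, and the function name is hypothetical.

```python
import wave

def write_wav(path: str, samples, rate: int = 16_000) -> None:
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        frames = b"".join(int(s).to_bytes(2, "little", signed=True) for s in samples)
        wf.writeframes(frames)
```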
This module contains the function used to run inference on a model.
This script converts audio files to WAV.
Example usage from the repository root: python scripts/convert_audio.py input_dir output_dir
This script tests Whisper to verify that the model and its dependencies load correctly.
Example usage from the repository root: python scripts/whisper_test.py audio_files
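A smoke test along these lines would confirm that Whisper and its dependencies load; the helper, the extension list, and the `base` model size are assumptions for illustration, not the actual script:

```python
from pathlib import Path

# Common formats Whisper's ffmpeg frontend handles (illustrative, not exhaustive)
SUPPORTED = {".wav", ".mp3", ".m4a", ".flac"}

def audio_files_in(directory: str) -> list[Path]:
    """Collect supported audio files from a directory, sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.suffix.lower() in SUPPORTED)

def smoke_test(directory: str) -> None:
    """Transcribe each file once to confirm the model and dependencies work."""
    import whisper  # lazy import: pulls in torch and downloads weights on first use
    model = whisper.load_model("base")
    for path in audio_files_in(directory):
        print(path.name, "->", model.transcribe(str(path))["text"][:60])
```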
This script trains a new CNET model.
The repository includes unit and integration tests to ensure the functionality of the modules.
- tests/test_speech_to_text_unit.py: Unit tests for the speech-to-text module.
- tests/test_audio_utils.py: Tests for the audio utilities module.
- tests/test_speech_to_text_integration.py: Integration tests for the speech-to-text module.
The project uses GitHub Actions for continuous integration; the configuration file is located at .github/workflows/ci.yml.
For any questions or suggestions, please contact Ethan Epp at [[email protected]].