A graphical application that lets you quickly create your own AI VTuber for free.
https://www.youtube.com/watch?v=Hwss_p2Iroc
https://docs.google.com/document/d/16DU-DJKMaC-15K6iShLd9ioXc8VqjTLqgMPswsjPjF0/edit?usp=sharing
Requires Python >= 3.8. Install the main dependencies:
```
pip3 install -r requirements.txt
```
The PyTorch packages require special handling (a CUDA-enabled build); install them with the following command:
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- Latest GPU driver: https://www.nvidia.com.tw/Download/index.aspx?lang=tw
- CUDA Toolkit 12.1.1: https://developer.nvidia.com/cuda-12-1-1-download-archive
- cuDNN: https://developer.nvidia.com/cudnn-downloads
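After the driver, CUDA Toolkit, and the cu121 PyTorch wheels are installed, a quick sanity check can confirm that PyTorch actually sees the GPU (a minimal sketch; it assumes nothing beyond a standard PyTorch install):

```python
# Sanity check: confirm the CUDA-enabled PyTorch build detects the GPU.
import torch

print(torch.__version__)          # should report a +cu121 build
print(torch.version.cuda)         # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())  # True once the driver/toolkit are set up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected NVIDIA GPU
```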
Excerpt from https://github.com/openai/whisper
There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model. The relative speeds below were measured by transcribing English speech on an A100; real-world speed may vary significantly depending on many factors, including the language, the speaking speed, and the available hardware.
Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
---|---|---|---|---|---|
tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~10x |
base | 74 M | `base.en` | `base` | ~1 GB | ~7x |
small | 244 M | `small.en` | `small` | ~2 GB | ~4x |
medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
large | 1550 M | N/A | `large` | ~10 GB | 1x |
turbo | 809 M | N/A | `turbo` | ~6 GB | ~8x |
The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
Additionally, the `turbo` model is an optimized version of `large-v3` that offers faster transcription speed with a minimal degradation in accuracy.
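To make the tradeoffs above concrete, here is a minimal transcription sketch using the Python API from the Whisper repository linked above; the model name can be any entry from the table, and `audio.mp3` is a placeholder for your own recording:

```python
import whisper

# Any size from the table works here; "base.en" is a fast English-only
# choice that fits in roughly 1 GB of VRAM.
model = whisper.load_model("base.en")

# "audio.mp3" is a placeholder path; substitute your own audio file.
result = model.transcribe("audio.mp3")
print(result["text"])
```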
Whisper's performance varies widely depending on the language. The figure in the Whisper README shows a performance breakdown of the `large-v3` and `large-v2` models by language, using WERs (word error rates) or CERs (character error rates, shown in italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics for the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
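Because accuracy depends on the language, it can help to set the language explicitly instead of relying on auto-detection; `transcribe` passes decoding options such as `language` through to the decoder. A short sketch (the file name is a placeholder):

```python
import whisper

model = whisper.load_model("large-v3")

# Force decoding in Japanese rather than auto-detecting the language;
# "speech.mp3" is a placeholder for your own audio file.
result = model.transcribe("speech.mp3", language="ja")
print(result["text"])
```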