Request for advice on custom wake word model training and evaluating #227

sangheonEN · 2024-12-24T02:49:32Z


Hello. I am writing to explain my current situation and to get a better understanding of how openWakeWord training works.

Currently, I am using training_models.ipynb to train a model on my own dataset.

Dataset used:
- Positive samples: "hey thomas" / 48 clips
- Negative samples: Korean speech audio data / 319 clips
Training: see the attached training_models.ipynb.
Evaluation: I tested the model with the attached hey_thomas_female_test_sample.wav file. The prediction score did not rise while "hey thomas" was being spoken and never exceeded about 0.008, so it seems the model was not trained properly. The scores were plotted with the code below.

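import matplotlib.pyplot as plt

# oww is assumed to be the openWakeWord model object created earlier in the notebook
scores = oww.predict_clip("/home/openWakeWord/sample_data/hey_thomas_female_test_sample.wav")

# Plot the per-frame score of the custom model
plt.figure()
_ = plt.plot([i["hey_thomas2"] for i in scores])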
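
Editorial sketch: the output above can also be read without the plot by finding the peak score and roughly where it occurs. This is a minimal sketch, assuming `predict_clip` returns one dictionary of scores per frame (as the list comprehension in the post suggests) and that each frame covers 80 ms of audio. The frame length is an assumption and should be checked against the installed openWakeWord version. The helper name `summarize_scores` is hypothetical, and the example scores are made up.

```python
def summarize_scores(scores, model_name, threshold=0.5, frame_seconds=0.08):
    """Return the peak score, its approximate time, and whether it crosses the threshold.

    scores: list of dicts, one per frame, e.g. [{"hey_thomas2": 0.001}, ...]
    frame_seconds: assumed duration of one frame (80 ms is an assumption)
    """
    values = [frame[model_name] for frame in scores]
    if not values:
        raise ValueError("no frames were scored")
    # Index of the frame with the highest score
    peak_index = max(range(len(values)), key=values.__getitem__)
    return {
        "peak_score": values[peak_index],
        "peak_time_s": peak_index * frame_seconds,
        "detected": values[peak_index] >= threshold,
    }


# Illustrative, made-up scores shaped like the output in the post
example = [{"hey_thomas2": s} for s in (0.001, 0.004, 0.008, 0.002)]
summary = summarize_scores(example, "hey_thomas2")
print(summary)
```

A peak near 0.008, as reported in the post, is far below the 0.5 cutoff used in the notebook's recall computation, so a clip like this would not count as a detection.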

Questions

1) Am I doing the training correctly?
- The line mixed_clips, labels, background_clips = next(mixing_generator) raises an error. (See the attached training_models.ipynb.)
- The line openwakeword.data.trim_mmap(output_file) raises an error. (See the attached training_models.ipynb.)
- When I set batch_size = 8 and run training, the following error occurs:

"---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[40], line 33
31 tp = sum(predictions.flatten()[y.flatten() == 1] >= 0.5)
32 fn = sum(predictions.flatten()[y.flatten() == 1] < 0.5)
---> 33 history['recall'].append(float(tp/(tp+fn).detach().numpy()))

AttributeError: 'int' object has no attribute 'detach'"
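
Editorial sketch: one possible explanation for the traceback above, offered as a hypothesis rather than a confirmed diagnosis. Python's built-in `sum` returns the plain integer `0` when it is given an empty sequence. With a small batch such as `batch_size = 8`, a batch may contain no positive labels at all, so `tp` and `fn` would both be the int `0`, and `(tp+fn).detach()` would fail exactly as shown. The sketch below computes recall with an explicit guard for that case. The helper name `safe_recall` is hypothetical and is not part of openWakeWord. It works on plain Python lists, which a PyTorch tensor can provide via `predictions.detach().flatten().tolist()`.

```python
def safe_recall(predictions, labels, threshold=0.5):
    """Recall over the positive labels, or None if the batch has no positives.

    predictions: list of float scores in [0, 1]
    labels: list of 0/1 ground-truth labels, same length as predictions
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = 0  # positives scored at or above the threshold
    fn = 0  # positives scored below the threshold
    for score, label in zip(predictions, labels):
        if label == 1:
            if score >= threshold:
                tp += 1
            else:
                fn += 1
    if tp + fn == 0:
        # No positive examples in this batch: recall is undefined
        return None
    return tp / (tp + fn)


print(safe_recall([0.9, 0.2, 0.7], [1, 1, 0]))
print(safe_recall([0.1, 0.2], [0, 0]))
```

In a training loop, a batch that returns `None` could simply be skipped when appending to `history['recall']`.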

2) I would like advice on how to make up for the lack of positive samples.

I currently have about 2 million wav files for negative samples. However, since I have only about 50 positive samples, I expect a class imbalance problem. The information below is what I have found on my own so far, but I could not find any code for data augmentation.

The document at https://github.com/dscripka/openWakeWord/blob/main/docs/synthetic_data_generation.md describes a synthetic data generation technique. I understand the content, but do I have to write the code myself? Is there existing code that I can refer to? I don't think I can develop it on my own, so I would appreciate it if you could tell me how to implement data augmentation for positive samples.

openwakeword_advice_data.zip
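
Editorial sketch: as a rough illustration of one common augmentation step (mixing a clean positive clip with background noise at a chosen signal-to-noise ratio), here is a minimal standard-library example. It is not the code from the openWakeWord repository. The function names `rms` and `mix_at_snr` are hypothetical, audio is represented as plain lists of floats, and the sine tone and random noise stand in for real recordings.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise ratio equals snr_db, then add it to speech."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        # Silent noise: nothing to mix in
        return list(speech)
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Stand-in signals: 1 second at 16 kHz (placeholders for real wav data)
rng = random.Random(0)
sample_rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

# One clean clip becomes several augmented variants at different noise levels
augmented = [mix_at_snr(speech, noise, snr_db) for snr_db in (20, 10, 5)]
```

With real data, `speech` and `noise` would be decoded from wav files, and each positive clip could be paired with many different noise clips to multiply the number of positive examples.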

JocelynVelarde · 2024-12-27T21:33:37Z


Hello! I've also been trying to generate my own synthetic audio files from that repo, but I'm running into some different issues because I'm not sure where to use them.

17 replies

sangheonEN Jan 8, 2025
Author

Are you saying that it worked well when you tested with a microphone?

In my case, it did not work well when I tested with a microphone. But you are saying that the model trained by following this Colab notebook https://github.com/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb worked well? Could you share the code you used for testing with a microphone?

Also, do you know whether it is possible to generate the embedding_model.onnx and melspectrogram.onnx models from my own custom data, as I asked in #230?

JocelynVelarde Jan 8, 2025

Yes, it worked fine. This is the model I trained; you can also test it with that script in the repo. As for the model notebook, let me share it.

I only made two changes to this script. I modified the openwakeword/examples/custom_model.yml file with the wake word I want (in this case the word is "no"), and then I set config["target_phrase"] = ["no"].

Have you tried this before?

JocelynVelarde Jan 8, 2025

As for melspectrogram.onnx, I'm pretty sure you can create your own and just replace it in the downloads at the top of the script. I haven't tried that, though.

sangheonEN Jan 8, 2025
Author

automatic_model_training_simple.zip
Yes, I also followed this process https://github.com/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb and set config["target_phrase"] = ["hey thomas"] to create a "hey thomas" model. However, when I fed the model "hey thomas" audio that I spoke into the microphone, the prediction values stayed in the range of 0.001~0.002, so the model did not train properly. What I tried after that is exactly what I explained above.

I generated 55,000 "hey thomas" samples with VITS and Piper TTS and used them as positive samples. For the negative samples, I used 23,000 clips from a Korean speech dataset.

The code I used is https://github.com/dscripka/openWakeWord/blob/main/notebooks/training_models.ipynb

Training appears to go well, but when I say "hey thomas" into the microphone, the model does not work properly, and the prediction value stayed at the level of 0.00002~9.

Should I write the code myself to combine the process in https://github.com/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb with the process in https://github.com/dscripka/openWakeWord/blob/main/notebooks/training_models.ipynb? I don't know why the performance is so poor.

sangheonEN Jan 9, 2025
Author

When training the model, the "hey thomas" positive samples (class0) were generated by a TTS model with English pronunciation, and the negative samples (class1) are arbitrary speech clips that do not contain "hey thomas". The negatives are about 22,000 clips from the dataset provided here (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=109). This means the positive and negative samples differ in language: English versus Korean. Will this have an effect?
