Godot Whisper
Features
- Real-time audio transcription.
- Transcription of recorded audio.
- Runs on a separate thread.
- Metal on Apple devices.
- OpenCL on other platforms.
How to install
Go to a GitHub release and copy the addons folder into the demo folder. Then restart the Godot editor.
Alternatively, install "Godot Whisper - Speech to Text" from the Godot Asset Library.
Requirements
- A language model, which can be downloaded from within the Godot editor.
AudioStreamToText
The AudioStreamToText node can be used in the editor to test transcription. Add a WAV audio source and click the start_transcribe button.
Typical transcription time with the tiny.en model is about 0.3 s. This node only performs transcription.
CaptureStreamToText
This node also resamples the audio: if the mix rate is not exactly 16000, the audio is converted to 16000. It then runs the transcribe function every transcribe_interval.
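To illustrate what resampling to 16000 involves, here is a minimal Python sketch using linear interpolation. It is not the addon's implementation: the function name resample_linear and its parameters are hypothetical, and a production resampler would normally apply a low-pass filter before downsampling to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample mono float samples from src_rate to dst_rate.

    Hypothetical helper for illustration only; uses plain linear
    interpolation with no anti-aliasing filter.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    # Nothing to do when the rates already match (the 16000 mix-rate case).
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Blend the two neighbouring source samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, one second of audio captured at 48000 yields 16000 output samples, while input that is already at 16000 is returned unchanged, which is why matching the mix rate lets the resampling step be skipped.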
Initial Prompt
For Chinese, to choose between Traditional and Simplified, provide an initial prompt written in the variant you want; the model should then keep using that variant. See Whisper Discussion #277.
Also, if you have problems with punctuation, you can give it an initial prompt with punctuation. See Whisper Discussion #194.
Language Model
Go to any StreamToText node, select a Language Model to Download and click Download. You might have to alt-tab away from the editor and back, or restart it, for the asset to appear. Then set the language_model property.
Global settings
Go to Project → Project Settings → General → Audio → Input (check Advanced Settings).
You will see a bunch of settings there.
Also, because microphone transcription requires audio at a 16000 Hz sampling rate, you can set the audio driver mix rate to 16000 with audio/driver/mix_rate. This way the resampling step has no work to do, saving roughly 50-100 ms for larger audio, at the cost of audio quality.