Multi Algorithm Recognition¶
Built on a third party library. The official documentation for vosk can be found here. The library performs well in multi thread environments
officially supported algorithms
CMU Sphinx (works offline)
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Azure Speech
Microsoft Bing Voice Recognition (Deprecated)
Houndify API
IBM Speech to Text
Snowboy Hotword Detection (works offline)
Tensorflow
Vosk API (works offline)
OpenAI whisper (works offline)
Whisper API
Installation¶
To install Google Integration run the following snippet, which will install the required dependencies
pip install dronebuddylib[SPEECH_RECOGNITION_MULTI]
Usage¶
The Google integration module requires the following configurations to function
Required Configurations¶
SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME - The maximum number of seconds the microphone listens before timing out.
Optional Configurations¶
SPEECH_RECOGNITION_MULTI_ALGO_ALGO_MIC_TIMEOUT - The maximum number of seconds the microphone listens before timing out.
SPEECH_RECOGNITION_MULTI_ALGO_ALGO_PHRASE_TIME_LIMIT - The maximum duration for a single phrase before cutting off.
SPEECH_RECOGNITION_MULTI_ALGO_IBM_KEY - The IBM API key for using IBM speech recognition
engine_configs = EngineConfigurations({})
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME,
SpeechRecognitionMultiAlgoAlgorithmSupportedAlgorithms.GOOGLE.name)
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)
result = engine.recognize_speech(audio_steam=data)
How to use with the mic¶
engine_configs = EngineConfigurations({})
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME,
SpeechRecognitionMultiAlgoAlgorithmSupportedAlgorithms.GOOGLE.name)
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)
while True:
with speech_microphone as source:
try:
result = engine.recognize_speech(source)
if result.recognized_speech is not None:
intent = recognize_intent_gpt(intent_engine, result.recognized_speech)
execute_drone_functions(intent, drone_instance, face_recognition_engine, object_recognition_engine,
text_recognition_engine, voice_engine)
else:
logger.log_warning("TEST", "Not Recognized: voice ")
except speech_recognition.WaitTimeoutError:
engine.recognize_speech(source)
time.sleep(1) # Sleep to simulate work and prevent a tight loop
Output¶
The output will be given in the following json format
{
'recognized_speech': "",
'total_billed_time': ""
}
- Where
recognized_speech - Text with the recognized speech
total_billed_time - if a paid service the billed time