Multi Algorithm Recognition

This module is built on a third-party speech recognition library and dispatches recognition to one of several supported back ends. The library performs well in multi-threaded environments.

Officially Supported Algorithms

  • CMU Sphinx (works offline)

  • Google Speech Recognition

  • Google Cloud Speech API

  • Wit.ai

  • Microsoft Azure Speech

  • Microsoft Bing Voice Recognition (Deprecated)

  • Houndify API

  • IBM Speech to Text

  • Snowboy Hotword Detection (works offline)

  • Tensorflow

  • Vosk API (works offline)

  • OpenAI whisper (works offline)

  • Whisper API

Installation

To install the multi-algorithm speech recognition integration, run the following snippet, which installs the required dependencies:

pip install dronebuddylib[SPEECH_RECOGNITION_MULTI]

Usage

The multi-algorithm speech recognition module requires the following configurations to function.

Required Configurations

  1. SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME - The name of the algorithm to use for recognition (e.g. GOOGLE).

Optional Configurations

  1. SPEECH_RECOGNITION_MULTI_ALGO_ALGO_MIC_TIMEOUT - The maximum number of seconds the microphone listens before timing out.

  2. SPEECH_RECOGNITION_MULTI_ALGO_ALGO_PHRASE_TIME_LIMIT - The maximum duration for a single phrase before cutting off.

  3. SPEECH_RECOGNITION_MULTI_ALGO_IBM_KEY - The IBM API key, required when using the IBM speech recognition back end.
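A sketch of supplying the optional settings alongside the required algorithm name, following the configuration pattern used in the snippets below; the timeout and phrase-limit values here are illustrative, not defaults:

```python
engine_configs = EngineConfigurations({})
# Required: which back end the multi-algorithm engine should use.
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME,
                                 SpeechRecognitionMultiAlgoAlgorithmSupportedAlgorithms.GOOGLE.name)
# Optional: illustrative values, adjust to your setup.
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGO_MIC_TIMEOUT, 5)
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGO_PHRASE_TIME_LIMIT, 10)
```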

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME,
                                 SpeechRecognitionMultiAlgoAlgorithmSupportedAlgorithms.GOOGLE.name)
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)

result = engine.recognize_speech(audio_steam=data)

How to use with the mic

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(AtomicEngineConfigurations.SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME,
                                 SpeechRecognitionMultiAlgoAlgorithmSupportedAlgorithms.GOOGLE.name)
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)

while True:

    with speech_microphone as source:

        try:
            result = engine.recognize_speech(source)
            if result.recognized_speech is not None:
                intent = recognize_intent_gpt(intent_engine, result.recognized_speech)
                execute_drone_functions(intent, drone_instance, face_recognition_engine, object_recognition_engine,
                                        text_recognition_engine, voice_engine)
            else:
                logger.log_warning("TEST", "Not Recognized: voice ")

        except speech_recognition.WaitTimeoutError:
            continue  # Microphone timed out; retry on the next iteration

    time.sleep(1)  # Sleep to prevent a tight loop
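The timeout handling above boils down to a retry-with-pause pattern. It can be sketched in plain Python, where `listen_once` is a hypothetical stand-in for `engine.recognize_speech` (here it simply fails on the first attempt):

```python
import time

def listen_once(attempt):
    # Stand-in for engine.recognize_speech(); times out on the first try.
    if attempt == 0:
        raise TimeoutError("microphone timed out")
    return "land"

def listen_with_retry(max_attempts=3, pause=0.1):
    """Retry recognition on timeout, pausing briefly between attempts."""
    for attempt in range(max_attempts):
        try:
            return listen_once(attempt)
        except TimeoutError:
            time.sleep(pause)
    return None  # Gave up after max_attempts timeouts

print(listen_with_retry())  # succeeds on the second attempt: "land"
```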

Output

The output is returned in the following JSON format:

{
    "recognized_speech": "",
    "total_billed_time": ""
}
Where
  • recognized_speech - text containing the recognized speech

  • total_billed_time - the billed time, when a paid service was used
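A minimal sketch of consuming this output, assuming the result is available as a plain dictionary with the fields above (the sample utterance is made up):

```python
# Hypothetical recognition result in the documented output format.
result = {
    "recognized_speech": "take off",
    "total_billed_time": "",  # empty for free or offline back ends
}

if result["recognized_speech"]:
    # Normalize the text before handing it to downstream intent handling.
    command = result["recognized_speech"].strip().lower()
    print(f"Recognized: {command}")
else:
    print("Nothing recognized; prompt the user to repeat.")
```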