Google Voice Recognition

The official documentation for Google Cloud Speech-to-Text can be found here. Follow its steps to set up a project in the Google Cloud console.

  1. Installation: To use Google Speech Recognition, you first need to set up the Google Cloud environment and install necessary SDKs or libraries in your development environment.

  2. API Key and Setup: Obtain an API key from Google Cloud and configure it in your application. This key is essential for authenticating and accessing Google’s speech recognition services.

  3. Audio Input and Processing: Your application should be capable of capturing audio input, which can be sent to Google’s speech recognition service. The audio data needs to be in a format compatible with Google’s system.

  4. Handling the Output: Once Google processes the audio, it returns a text transcription. This output can be used in various ways, such as command interpretation, text analysis, or as input for other systems.

  5. Customization: Google Speech Recognition allows customization for specific vocabulary or industry terms, enhancing recognition accuracy for specialized applications.
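
As an illustration of step 3, the sketch below checks that a WAV file holds 16-bit PCM audio (the layout that the "LINEAR16" encoding refers to) and reads its sample rate and raw bytes. It uses only Python's standard wave module. The helper name read_linear16_wav is hypothetical and is not part of dronebuddylib or of any Google SDK.

```python
import wave


def read_linear16_wav(path):
    """Return (sample_rate_hz, channels, pcm_bytes) for a 16-bit PCM WAV file.

    Hypothetical helper for illustration only.
    Raises ValueError if the samples are not 16 bits wide.
    """
    with wave.open(path, "rb") as wav:
        # LINEAR16 means uncompressed 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError(
                f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit"
            )
        sample_rate_hz = wav.getframerate()
        channels = wav.getnchannels()
        pcm_bytes = wav.readframes(wav.getnframes())
    return sample_rate_hz, channels, pcm_bytes
```

The returned sample rate is the value that would need to match SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ in the configuration described under Usage. Whether the engine accepts the raw bytes directly is an assumption to verify against the library.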

Installation

To install the Google integration, run the following command, which installs the required dependencies:

pip install dronebuddylib[SPEECH_RECOGNITION_GOOGLE]

Usage

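The Google integration module requires the following configurations to function:

  1. SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ - sample rate of the audio, in hertz (for example, 44100)

  2. SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE - language code of the speech to recognize (for example, "en-US")

  3. SPEECH_RECOGNITION_GOOGLE_ENCODING - encoding of the audio data (for example, "LINEAR16")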

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ, 44100)
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE, "en-US")
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_ENCODING, "LINEAR16")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.GOOGLE_SPEECH_RECOGNITION, engine_configs)
result = engine.recognize_speech(audio_steam=data)

How to use with the mic

# Assumption: sr refers to the SpeechRecognition package.
import speech_recognition as sr

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ, 44100)
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE, "en-US")
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_ENCODING, "LINEAR16")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.GOOGLE_SPEECH_RECOGNITION, engine_configs)

# Recognizer used to capture audio from the microphone
recognizer = sr.Recognizer()

with sr.Microphone() as source:
    print("Listening for commands...")
    audio = recognizer.listen(source)

    try:
        # Recognize speech using Google Speech Recognition
        command = engine.recognize_speech(audio)
        print(f"Recognized command: {command}")

        # Process and execute the command
        # (control_function is your own command handler)
        control_function(command)
    except Exception as e:
        print(e)

Output

The output is returned in the following JSON-like format:

{
        'recognized_speech': "",
        'total_billed_time': ""
}
where:

  • recognized_speech - the text of the recognized speech

  • total_billed_time - the billed time, if the request used a paid service
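
A minimal sketch of reading this output, assuming the result behaves like a Python dict with the two keys shown above. The helper name parse_recognition_result is hypothetical and not part of dronebuddylib. If the library returns an object with attributes instead, the lookups would need adjusting.

```python
def parse_recognition_result(result):
    """Return (text, billed_time) from a result dict.

    Hypothetical helper for illustration only. Empty or missing
    values are normalized to None so callers can test them directly.
    """
    text = (result.get("recognized_speech") or "").strip()
    billed_time = result.get("total_billed_time") or None
    return (text or None), billed_time


# Example with a hand-written result; no recognition service is called.
example = {"recognized_speech": " Take off ", "total_billed_time": ""}
text, billed_time = parse_recognition_result(example)
print(text)  # Take off
```

Normalizing an empty transcription to None lets the calling code skip command handling when nothing was recognized.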