Google Cloud Speech-to-Text

Describe the problem/error/question

How do I set up an HTTP Request node to use the Google Cloud Speech-to-Text API?
Can I use the Google Cloud Natural Language OAuth2 credential, or must I use a Google Service Account?
I have run the following code on Google Cloud successfully. How can I implement it in n8n?

```python
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

MAX_AUDIO_LENGTH_SECS = 8 * 60 * 60


def run_batch_recognize():
    # Instantiates a client.
    client = SpeechClient()

    # The output path of the transcription result.
    gcs_output_folder = "gs://bucket"

    # The name of the audio file to transcribe.
    audio_gcs_uri = "gs://bucket/AUDIO_FILE.wav"

    config = cloud_speech.RecognitionConfig(
        explicit_decoding_config=cloud_speech.ExplicitDecodingConfig(
            encoding=cloud_speech.ExplicitDecodingConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=24000,
            audio_channel_count=1,
        ),
        features=cloud_speech.RecognitionFeatures(
            enable_word_confidence=True,
            enable_word_time_offsets=True,
            multi_channel_mode=cloud_speech.RecognitionFeatures.MultiChannelMode.SEPARATE_RECOGNITION_PER_CHANNEL,
        ),
        model="long",
        language_codes=["en-US"],
    )

    output_config = cloud_speech.RecognitionOutputConfig(
        gcs_output_config=cloud_speech.GcsOutputConfig(uri=gcs_output_folder),
    )

    files = [cloud_speech.BatchRecognizeFileMetadata(uri=audio_gcs_uri)]

    request = cloud_speech.BatchRecognizeRequest(
        recognizer="projects/PROJECT_ID/locations/global/recognizers/_",
        config=config,
        files=files,
        recognition_output_config=output_config,
    )
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=3 * MAX_AUDIO_LENGTH_SECS)
    print(response)
```
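For the n8n side, the v2 batch call above maps to a plain HTTPS POST, so it can be reproduced with an HTTP Request node. Below is a minimal sketch of building the equivalent endpoint and JSON body (standard library only; the endpoint shape and camelCase field names are my reading of the v2 REST mapping of this config, so treat them as assumptions to verify against the `batchRecognize` REST reference):

```python
import json

# Placeholders -- substitute your own project, bucket, and audio locations.
PROJECT_ID = "PROJECT_ID"
AUDIO_URI = "gs://bucket/AUDIO_FILE.wav"
OUTPUT_URI = "gs://bucket"

# The v2 REST method the HTTP Request node would POST to.
endpoint = (
    "https://speech.googleapis.com/v2/projects/"
    f"{PROJECT_ID}/locations/global/recognizers/_:batchRecognize"
)

# JSON equivalent of the Python client config above.
body = {
    "config": {
        "explicitDecodingConfig": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 24000,
            "audioChannelCount": 1,
        },
        "features": {
            "enableWordConfidence": True,
            "enableWordTimeOffsets": True,
            "multiChannelMode": "SEPARATE_RECOGNITION_PER_CHANNEL",
        },
        "model": "long",
        "languageCodes": ["en-US"],
    },
    "files": [{"uri": AUDIO_URI}],
    "recognitionOutputConfig": {"gcsOutputConfig": {"uri": OUTPUT_URI}},
}

# Paste this JSON into the HTTP Request node body (with JSON parameters
# enabled) and authenticate with a Google service account or OAuth2 credential.
print(endpoint)
print(json.dumps(body, indent=2))
```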

What is the error message (if any)?

JSON parameter need to be an valid JSON

Please share your workflow

Share the output returned by the last node

Information on your n8n setup

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (n8n cloud):
  • Operating system: Windows

It looks like your topic is missing some important information. Could you provide the following if applicable.

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Welcome to the community @Johocen_ai !

Tip for sharing information

Pasting your n8n workflow


Ensure to copy your n8n workflow and paste it in the code block, that is in between the pairs of triple backticks, which also could be achieved by clicking </> (preformatted text) in the editor and pasting in your workflow.

```
<your workflow>
```

That applies to any JSON output you would like to share with us.


I believe the following Google doc should guide you on how to use the HTTP Request node: Speech-to-Text request construction | Cloud Speech-to-Text Documentation | Google Cloud.

In its simplest, most basic form the request body would have to look something like this:

```json
{
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US"
    },
    "audio": {
        "uri": "gs://bucket-name/path_to_audio_file"
    }
}
```
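Worth noting in light of the "JSON parameter need to be an valid JSON" error above: unlike JavaScript object literals, JSON forbids a trailing comma after the last property. A quick standard-library Python check illustrates what the HTTP Request node will and won't accept (bucket path is a placeholder):

```python
import json

# Valid: no trailing comma after the last key in each object.
valid = '''{
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US"
    },
    "audio": {"uri": "gs://bucket-name/path_to_audio_file"}
}'''
body = json.loads(valid)
print(body["config"]["languageCode"])  # en-US

# Invalid: the trailing comma after "en-US" makes any strict JSON
# parser (including n8n's JSON parameter handling) reject the body.
invalid = '{"config": {"languageCode": "en-US",}}'
try:
    json.loads(invalid)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```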

Hi @ihortom
Thanks for the quick help.
Believe me, I had tried very hard but got confused and frustrated…

  1. Using Google Cloud Speech-to-Text
    Tested with this request body:

```json
{
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 24000,
        "languageCode": "zh-TW"
    },
    "audio": {
        "uri": "gs://audio4stt/audio-files/Sam702737ff5a3945c5a44d97b2841cf04c.mp3"
    }
}
```

It runs successfully, but I cannot find the expected transcription in the output:

```json
[
  {
    "totalBilledTime": "6s",
    "requestId": "7181677674020407336",
    "usingLegacyModels": true
  }
]
```

Besides, I was planning to use Google Drive as the trigger: when a new audio file is uploaded, it starts Speech-to-Text. But if the audio URI has to be in Google Cloud Storage, how can I move the file there?

  2. Using OpenAI Transcribe a Recording
    I cannot find the binary file ‘data’ for the Input Data Field Name in the preceding Google Drive Trigger node. Which field holds the binary file, or should I add a converter?

Thanks for your help.

Hey @Johocen_ai

It looks like GCS is the only option available as the source of the files for Google Cloud Speech-to-Text. This shouldn’t be a problem though as you can use Google Cloud Storage node to upload the file from Google Drive.

The doc section explaining how the output is handled is Speech-to-Text request construction | Cloud Speech-to-Text Documentation | Google Cloud.

For a start, it depends on which mode you use: synchronous or asynchronous. In synchronous mode the output is expected to be in the form

```json
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.98267895,
          "transcript": "YOUR_TRANSCRIPT_GOES_HERE"
        }
      ]
    }
  ]
}
```

They also point out

If no speech from the supplied audio could be recognized, then the returned results list will contain no items. Unrecognized speech is commonly the result of very poor-quality audio, or from language code, encoding, or sample rate values that do not match the supplied audio.

Each synchronous Speech-to-Text API response returns a list of results, rather than a single result containing all recognized audio. The list of recognized audio (within the transcript elements) will appear in contiguous order.

The asynchronous output has a different form:

```json
{
  "name": "operation_name",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 34,
    "startTime": "2016-08-30T23:26:29.579144Z",
    "lastUpdateTime": "2016-08-30T23:26:29.826903Z"
  }
}
```
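To eventually get the transcript in asynchronous mode you have to poll the returned operation by name until it reports `done`. A sketch of the plumbing (the v1 operations URL is my assumption from the REST docs; `operation_name` and auth are placeholders):

```python
def operation_url(name: str) -> str:
    """Build the v1 polling URL for a long-running operation."""
    return f"https://speech.googleapis.com/v1/operations/{name}"


def is_done(operation: dict) -> bool:
    # The operation JSON gains a "done": true field (and a "response"
    # containing the results) once recognition finishes.
    return operation.get("done", False)


pending = {
    "name": "operation_name",
    "metadata": {"progressPercent": 34},
}
print(operation_url(pending["name"]))
print(is_done(pending))  # False -- poll again later
```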

The HTTP Request node won’t be able to return the actual transcript output in that case.

The actual binary file has to be available in the node immediately preceding the node where that binary is required; otherwise it will fail. If the Google Drive node with the downloaded binary does not immediately precede OpenAI, you need to bring it into the node that does. See the solutions demoed in [onedrive via Graph HTTP] why binary don't upload if not exactly previous node? - #2 by ihortom.

Hi @ihortom

Thank you very much for your detailed instructions.

Finally, at least one workflow is working: Using OpenAI Transcribe a Recording.
When Google Drive is used as the trigger, it does not include the binary file, so the workflow needs Google Drive in two steps.

But I still cannot figure out how to set the audio URI in the Google Cloud Speech-to-Text API as a variable instead of a fixed file path.

It would be nice to use the Google Cloud Speech-to-Text API, since it provides enableSeparateRecognitionPerChannel, which is not available in OpenAI Transcribe a Recording.

Thanks

Well, for a start you need to upload the file to GCS. Once uploaded, you should get back the location of the uploaded audio, which you can pass over to the Speech-to-Text API for transcription.
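To avoid hardcoding the path, the HTTP Request node's JSON body can then reference the previous node's output with n8n expressions. A sketch (the `bucket` and `name` fields here are assumptions about what the Google Cloud Storage node returns; check your actual node output and adjust the expressions):

```json
{
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 24000,
        "languageCode": "zh-TW"
    },
    "audio": {
        "uri": "gs://{{ $json.bucket }}/{{ $json.name }}"
    }
}
```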