Quick tutorial: how to convert speech (.wav file) to text using the Google Cloud Speech API
1. Sign Up for a Free Tier Account
Google Cloud offers a Free Tier plan, which will be used in this tutorial. An account is required to get an API key.
2. Generate an API Key
Follow these steps to generate an API key:
- Sign in to the Google Cloud Console
- Click “API Manager”
- Click “Credentials”
- Click “Create Credentials”
- Select “Service Account Key”
- Under “Service Account” select “New service account”
- Name the service account (whatever you’d like)
- Select Role: “Project” -> “Owner”
- Leave “JSON” option selected
- Click “Create”
- Save the generated API key file
- Rename the file to api-key.json. If you plan to test this code, make sure to move the key into the cloned speech-to-text repo.
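Before moving on, it can help to confirm that the downloaded file really is a service account key. The sketch below is a minimal check, assuming the file is named api-key.json as above; the function name check_service_account_key is made up for this example and is not part of any Google library.

```python
import json
from pathlib import Path

# Fields that a Google service account key file contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path="api-key.json"):
    """Load the key file and return its contents, or raise ValueError if it looks wrong."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} has type {data['type']!r}, expected 'service_account'")
    return data
```

Calling check_service_account_key() from the repo folder returns the parsed key when the file looks right. Treat the key like a password and keep it out of version control.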
3. Convert Audio File to WAV Format
Many tools can convert audio files to WAV. One option is ffmpeg, the same command-line tool used for splitting in step 4.
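As one possible approach, the conversion can be driven from Python by calling ffmpeg. This is a sketch under some assumptions: ffmpeg is installed and on your PATH, and mono audio at 16000 Hz is my own choice of a common setting for speech recognition, not something this tutorial requires. The function names and file names are hypothetical.

```python
import subprocess


def build_convert_command(source, target, sample_rate=16000):
    """Return the ffmpeg argument list that converts source to a mono WAV file."""
    return [
        "ffmpeg",
        "-i", str(source),        # input file in any format ffmpeg can read
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the given rate in Hz
        str(target),              # the .wav extension selects the WAV container
    ]


def convert_to_wav(source, target, sample_rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_convert_command(source, target, sample_rate), check=True)
```

For example, convert_to_wav("source/interview.mp3", "source/interview.wav") would write the WAV file next to the original, where interview.mp3 stands in for your own recording.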
4. Break Up Audio File into Smaller Parts
The Google Cloud Speech API only accepts audio no longer than 60 seconds in a single synchronous request. To be on the safe side, I broke my files into 30-second chunks. To do that I used ffmpeg, an open source command-line tool. I ran it on Windows; you can install ffmpeg using the instructions from this site (https://www.wikihow.com/Install-FFmpeg-on-Windows) and then run the commands below in your command line (cmd.exe).
Clean out old parts first, if needed, via rm -rf parts/* (cmd.exe has no rm command; there you can use del /q parts\* instead). The parts folder must already exist, because ffmpeg does not create it.
ffmpeg -i source/genevieve.wav -f segment -segment_time 30 -c copy parts/out%09d.wav
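To confirm that no chunk ended up over the limit, the durations can be read with Python's standard wave module. This is a minimal sketch, assuming the chunks are uncompressed PCM WAV files in the parts folder; the helper names are made up for this example.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (frames divided by frame rate)."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def chunks_over_limit(parts_dir="parts", limit_seconds=60.0):
    """Return (file name, duration) pairs for chunks longer than the limit."""
    too_long = []
    for path in sorted(Path(parts_dir).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration > limit_seconds:
            too_long.append((path.name, duration))
    return too_long
```

An empty list from chunks_over_limit() means every chunk fits within the 60-second limit.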
5. Install Required Python Modules
Install the following packages, for example with pip:
- google-api-python-client
- httplib2
- oauth2client
- pyasn1
- pyasn1-modules
- rsa
- six
- SpeechRecognition
- tqdm
- uritemplate
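Several of these packages are imported under a different name than the one you install, so a quick check can save some confusion. The sketch below maps each package in the list to its import name and reports which ones Python cannot find; the function name missing_packages is made up for this example.

```python
import importlib.util

# pip package name -> name used in an import statement
REQUIRED_MODULES = {
    "google-api-python-client": "googleapiclient",
    "httplib2": "httplib2",
    "oauth2client": "oauth2client",
    "pyasn1": "pyasn1",
    "pyasn1-modules": "pyasn1_modules",
    "rsa": "rsa",
    "six": "six",
    "SpeechRecognition": "speech_recognition",
    "tqdm": "tqdm",
    "uritemplate": "uritemplate",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]
```

Once everything is installed, missing_packages() returns an empty list; otherwise it names the packages that still need installing.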
6. Run the Code
The script does the following:
- Loads the API key from step 2 into memory
- Gets a list of files (chunks)
- For every file, calls the speech-to-text API endpoint
- Adds the results to a list
- Saves the results
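The steps above can be sketched roughly as follows. This is not the repo's actual script, only an illustration under stated assumptions: it uses the SpeechRecognition package from step 5 and assumes a 3.8-era release, in which recognize_google_cloud() receives the contents of the key file through its credentials_json argument. Newer releases changed how credentials are passed, so check the version you have installed. The helper names and the transcript.txt output file are hypothetical.

```python
from pathlib import Path


def list_chunks(parts_dir="parts"):
    """Return the chunk files sorted by name, which matches their order in the recording."""
    return sorted(Path(parts_dir).glob("*.wav"))


def make_google_recognizer(key_path="api-key.json"):
    """Build a function that sends one WAV file to the API and returns its transcript.

    Assumption: SpeechRecognition 3.8.x, where credentials_json is the
    key file's contents as a string.
    """
    import speech_recognition as sr  # imported here so the other helpers work without it

    credentials = Path(key_path).read_text(encoding="utf-8")
    recognizer = sr.Recognizer()

    def recognize(wav_path):
        with sr.AudioFile(str(wav_path)) as source:
            audio = recognizer.record(source)  # read the whole chunk
        return recognizer.recognize_google_cloud(audio, credentials_json=credentials)

    return recognize


def transcribe_chunks(files, recognize):
    """Call recognize() on every file and collect the results in order."""
    results = []
    for wav_path in files:
        try:
            results.append(recognize(wav_path))
        except Exception as error:  # keep going if a single chunk fails
            results.append(f"[{Path(wav_path).name} failed: {error}]")
    return results


def save_transcript(results, out_path="transcript.txt"):
    """Write one line per chunk to a text file."""
    Path(out_path).write_text("\n".join(results), encoding="utf-8")


def main():
    chunks = list_chunks("parts")
    results = transcribe_chunks(chunks, make_google_recognizer("api-key.json"))
    save_transcript(results)
```

Calling main() from the folder that holds api-key.json and parts would write the combined transcript to transcript.txt. Passing the recognizer in as a function keeps the loop easy to try out with a stand-in before spending any API quota.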