This repository contains the Speech Analysis Framework, a collection of components and code from Google Cloud that you can use to transcribe audio, and create a data pipeline workflow to display analytics of the transcribed audio files.
It can and:
- Process uploaded audio files to Cloud Storage.
- Enrich the processed audio files with Cloud Speech-to-Text and Cloud Natural Language APIs.
- Write the enriched data to BigQuery.
- Redact sensitive information with Cloud Data Loss Prevention.
You can:
- Gain insights into quality metrics to track such as call silence, call duration, agent speaking time, user speaking time, and sentence heat maps.
- Build visualizations reports
- Examples of what the visualizations you can build:
Speech Analysis Framework Limitations:
- The framework can only identify two callers on a stereo or mono audio file. This is a limitation within the Framework code not Cloud Speech-to-Text API.
- The framework can only process .wav or .flac files. This is a limitation within the Framework code not Cloud Speech-to-Text API.
The process follows:
- An audio file is uploaded to Cloud Storage
- The Cloud Function is triggered on object.create
- The Cloud Function sends a long running job request to Cloud Speech-to-Text
- The Cloud Function then sends the job ID from Cloud Speech-to-Text with additional metadata to Cloud Pub/Sub
- The Cloud Dataflow job enriches the data, optionally redacts sensitive information and writes to BigQuery
To Learn More visit Visualize speech data with Speech Analysis Framework
-
Create a storage bucket for Dataflow Staging Files
gsutil mb gs://[BUCKET_NAME]/
-
Through the Google Cloud Console create a folder named tmp in the newly created bucket for the DataFlow staging files
-
Create a storage bucket for Uploaded Audio Files
gsutil mb gs://[BUCKET_NAME]/
- Create a BigQuery Dataset
bq mk [YOUR_BIG_QUERY_DATABASE_NAME]
- Create Cloud Pub/Sub Topic
gcloud pubsub topics create [YOUR_TOPIC_NAME]
- Enable Cloud Dataflow API
gcloud services enable dataflow
- Enable Cloud Speech-to-Text API
gcloud services enable speech
- Enable Cloud Natural Language API
gcloud services enable language.googleapis.com
- Enable DLP Optional
gcloud services enable dlp.googleapis.com
- Deploy the Google Cloud Function
- In the cloned repo, go to the “saf-longrun-job-func” directory and deploy the following Cloud Function.
gcloud functions deploy safLongRunJobFunc --region=us-central1 --stage-bucket=[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME] --runtime=nodejs8 --trigger-event=google.storage.object.finalize --trigger-resource=[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]
- Deploy the Cloud Dataflow Pipeline
- python3 --version Python 3.7.8
- In the cloned repo, go to “saf-longrun-job-dataflow” directory and deploy the Cloud Dataflow Pipeline. Run the commands below to deploy the dataflow job.
# Apple/Linux
python3 -m venv env
source env/bin/activate
pip3 install apache-beam[gcp]
pip3 install dateparser
or
# Windows
python3 -m venv env
env\Scripts\activate
pip3 install apache-beam[gcp]
pip3 install dateparser
- The Dataflow job will create the BigQuery Table you listed in the parameters.
- Please wait as it might take a few minutes to complete.
python3 saflongrunjobdataflow.py --project=[YOUR_PROJECT_ID] --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp --output_bigquery=[DATASET NAME].[TABLE] --requirements_file="requirements.txt"
- In the cloned repo, go to “sample-audio-files” to locate sample audio files to process by Speech Analysis Framework
- For the [TOPIC_NAME], do not include the full path, just the name of the TOPIC
- Choose true or false to run DLP. DLP will use all info types to scan the data.
# stereo wav audio sample
gsutil -h x-goog-meta-dlp:[true or false] -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:true -h x-goog-meta-pubsubtopicname:[TOPIC_NAME] -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp [YOUR_FILE_NAME.wav] gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]
# mono flac audio sample
gsutil -h x-goog-meta-dlp:[true or false] -h x-goog-meta-callid:1234567 -h x-goog-meta-stereo:false -h x-goog-meta-pubsubtopicname:[TOPIC_NAME] -h x-goog-meta-year:2019 -h x-goog-meta-month:11 -h x-goog-meta-day:06 -h x-goog-meta-starttime:1116 cp [YOUR_FILE_NAME.flac] gs://[YOUR_UPLOADED_AUDIO_FILES_BUCKET_NAME]
- After a few minutes you will be able to see the data in BigQuery.
- Sample select statements that can be executed in the BigQuery console.
-- Order Natural Language Entities for all records
SELECT
*
FROM (
SELECT
entities.name,
entities.type,
COUNT(entities.name) AS count
FROM
`[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`,
UNNEST(entities) entities
GROUP BY
entities.name,
entities.type
ORDER BY
count DESC )
-- List word, start time, end time, speaker tag and confidence for all records
SELECT
ARRAY(
SELECT
AS STRUCT word,
startSecs,
endSecs,
speakertag,
confidence
FROM
UNNEST(words)) transcript
FROM
`[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`
-- Search Transcript with a regular expression
SELECT
transcript,
fileid,
callid,
year,
month,
day,
sentimentscore,
magnitude,
date,
silencesecs
FROM
`[YOUR_PROJECT_ID].[YOUR_DATASET].[YOUR_TABLE]`
WHERE
(REGEXP_CONTAINS(transcript, '(?i) [YOUR_WORD]' ))
This is not an officially supported Google product