Introduction and Expectations

In this project we will use whisper¹, then new speech to text A.I. model, for transcribing unscripted dialog from The Late Late Show TV program. We will document and log the entire process for future visit. The outcome will be a SPA for data visualization using Plotly, React, Bootstrap and a Flask back-end with a PostreSQL database.

The data analysis will be performed in Python using a mix of scripts and Jupyter notebooks. The analysis may require addition of other A.I. submodules like resemblyze² in order to distinguish speakers in a conversation or any other solutions that arise to any given problems.

The process will be iterative over a few collections of videos:

Extract audio from the collection (a TV season or a shorter period of time).
Source separation on music and speech.
Speaker separation to identify at least Craig Ferguson and non-Craig Ferguson speakers.
Audio data cleaning, stats and features.
Processing with whisper.
Text data cleaning and loading into a SQL database.
Data Analysis, find answers for word frequency, changes over time, or any other analytics questions.
Documentation and outcome presentation and thoughts for the iteration.

Once the process is done a few times, the whole collection of collections may be processed under the most successful parameters or they may be kept as separate processes. Then a single page web application will be created for presenting the results, which will include the analysed data in a production database.

Once the SPA is completed, it will be uploaded to a PVM or container in AWS or any other cloud provider. Then the page will be shared to the public.

Disclaimer: None of the data or results from the analysis will be used for monetary gains as I do not own the copyright to the original material. It is purely for educational purposes and I will absorb all cloud services expenses without asking for donations for this specific project.

Resources

Submodules

We have 3 projects that will happen under different repositories.

Extract audio data from video and convert it to text data. Features and visualization of such features.
Process and analyse the text data. Visualize the statistics from the text data, zero audio info.
Create a web app for sharing the results and deploy it to the cloud.

We will use the same html template for each submodule.

pip install org-template-builder

Then we’ll create our directories and move into them to start the template.

python -m orgtemp etl --author "Alberto Valdez" && git rm resources/theme -f

git add . && git commit -m "initial commit"

We will repeat this process with the rest of the submodules once the first one is completed.

ETL

See the details of the ETL process in its own repository here.

Data Viz

See the details of the Visualization and Analytics process in its own repository here.

SPA

See the details of the Single Page Application in its own repository here.

Footnotes

¹ https://github.com/openai/whisper

² https://github.com/resemble-ai/Resemblyzer

albertov5 / stt-craig Goto Github PK

stt-craig's Introduction

Introduction and Expectations

Resources

Submodules

ETL

Data Viz

SPA

Footnotes

stt-craig's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent