KABooks - KABooks Audiobooks dataset creator

KABooks is a recursive acronym for "KABooks AudioBooks dataset creator". It is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models, and it is based on the work of Pansori [https://arxiv.org/abs/1812.09798].

Given an audio file and its corresponding text as input, KABooks cleans the text, divides it into sentences, transcribes each audio segment, and finds the matching ground-truth sentence in the complete book text.
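As a rough illustration of the text-preparation part of this pipeline, the sketch below collapses whitespace and splits the text into sentences with a naive regular expression. The function names clean_text and split_sentences are hypothetical and the splitting rule is an assumption; the cleaning that KABooks itself performs may differ.

```python
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", clean_text(text))
    return [p for p in parts if p]


if __name__ == "__main__":
    book = "It was late.\nShe closed   the book! Was it over?"
    for sentence in split_sentences(book):
        print(sentence)
```

A rule this simple will split incorrectly on abbreviations such as "Dr." or "e.g.", so real book text usually needs a more careful splitter.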

Use at your own risk.

Installation

Make sure ffmpeg is installed, then create and activate a conda environment for KABooks:

$ apt-get update
$ apt install ffmpeg
$ conda create -n kabooks python=3.9 pip
$ conda activate kabooks

Requirements Installation

Install PyTorch:

$ pip3 install torch torchvision torchaudio

Install the KABooks requirements:

$ pip install -r requirements.txt

Audio Segmentation

This step receives the JSON file from the previous step and segments the audio file. The script is based on one written by Keith Ito, who kindly provided it via email. First, a logical list of segments is created, storing the filename and the start and end times of each segment. The script then iterates over this list, splitting the original audio and saving each segment to disk.

This functionality is provided by the script named "audio_segmentation.py" and can be used separately. Run the script with the path of the audio file (mp3) to be segmented as its input argument.

$ python segment_tools.py 

The input must be an mp3 file located inside the input folder. After the script runs, the audio segments are written to the wavs folder, named after the original file.
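The two-phase approach described in this section (first plan a logical list of segments, then cut the audio) can be sketched as follows. This is a minimal illustration, not the KABooks script: it assumes the mp3 has already been converted to WAV (for example with ffmpeg), that the cut points in seconds are already known, and that segments are named with the original base name plus an index. Segment, plan_segments and write_segments are hypothetical names.

```python
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Segment:
    filename: str   # output file name for this segment
    start: float    # start time in seconds
    end: float      # end time in seconds


def plan_segments(basename: str, boundaries: list[float]) -> list[Segment]:
    """Build the logical list: one Segment per pair of consecutive boundaries."""
    return [
        Segment(f"{basename}-{i:04d}.wav", start, end)  # assumed naming scheme
        for i, (start, end) in enumerate(zip(boundaries, boundaries[1:]))
    ]


def write_segments(src_wav: str, segments: list[Segment], out_dir: str) -> None:
    """Walk the logical list and save each slice of the source audio to disk."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with wave.open(src_wav, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for seg in segments:
            src.setpos(int(seg.start * rate))
            frames = src.readframes(int((seg.end - seg.start) * rate))
            with wave.open(str(Path(out_dir) / seg.filename), "wb") as dst:
                dst.setparams(params)    # frame count is corrected on close
                dst.writeframes(frames)
```

Keeping the plan separate from the cutting makes it possible to inspect or filter the segment list (for example, dropping very short segments) before any file is written.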

Transcribe

This step transcribes the audio segments with Wav2Vec2. The functionality is provided by the script named "transcribe_audios.py" and can be used separately. Run the script, passing the input directory of wav files and the transcription output file as arguments. For example:

$ python transcription_tools.py

By default, the script reads the contents of the wavs folder. The result is a CSV file (transcription.csv) containing the transcript of each audio file in the wavs folder.
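The overall shape of this step, one CSV row per wav file, can be sketched as below. The sketch deliberately leaves the speech model out: transcribe_fn is a hypothetical callable standing in for the Wav2Vec2 inference, and the column names "filename" and "transcript" are assumptions rather than the actual layout of transcription.csv.

```python
import csv
from pathlib import Path
from typing import Callable


def transcribe_folder(wav_dir: str, out_csv: str,
                      transcribe_fn: Callable[[Path], str]) -> int:
    """Transcribe every .wav in wav_dir and write one CSV row per file."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "transcript"])  # assumed column names
        for path in wav_paths:
            writer.writerow([path.name, transcribe_fn(path)])
    return len(wav_paths)  # number of files transcribed
```

Passing the model in as a function keeps the file handling testable without loading any speech model.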

Search Text

In this step, each transcript from the previous step is compared with the full text of the input audiobook. For each transcript, the script returns the sentence from the full text that has the greatest similarity to it.

The result is a CSV file (result.csv) containing, for each audio segment in the wavs folder, the transcript, the matched original sentence, and a similarity value.

$ python search_substring.py

A multithreaded version of this script is also available:

$ python search_substring_with_threads.py --number_threads=16
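The idea behind the search step can be sketched with the standard library. This is an illustration under assumptions, not the KABooks implementation: it uses difflib.SequenceMatcher as the similarity measure (the metric the scripts actually use is not stated here), and best_match and match_all are hypothetical names. The number_threads parameter mirrors the command-line flag shown above.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def best_match(transcript: str, sentences: list[str]) -> tuple[str, float]:
    """Return the book sentence most similar to the transcript and its score."""
    def score(sentence: str) -> float:
        return SequenceMatcher(None, transcript.lower(), sentence.lower()).ratio()
    best = max(sentences, key=score)
    return best, score(best)


def match_all(transcripts: list[str], sentences: list[str],
              number_threads: int = 4) -> list[tuple[str, float]]:
    """Match every transcript, spreading the work over a thread pool."""
    with ThreadPoolExecutor(max_workers=number_threads) as pool:
        # pool.map keeps results in the same order as the transcripts
        return list(pool.map(lambda t: best_match(t, sentences), transcripts))
```

Each transcript is matched independently, which is what makes the work easy to split across workers. In CPython, threads give limited speed-up for pure-Python string comparison like this, so a process pool may be worth trying for large books.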
