Comments (3)
I'm starting some tests on segmenting long audio based on the speaker's voice activity,
using this fork: https://github.com/eziolotta/rVADfast
But I have some problems with the quality of the audio output...
from deepspeech-italian-model.
First experiment on segmenting short audio, using rVADfast plus an algorithm that analyzes the segments found by rVAD to generate a new sequence of speech segments.
rVAD (and some others) tends to cut off the last bit of the signal in a speech segment.
Code and other tests are yet to be published.
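The post-processing idea described above could look roughly like the following minimal sketch. It is not the unpublished code mentioned here. It assumes the VAD output has already been converted to (start, end) pairs in seconds; the function name `pad_and_merge`, the `tail_pad` and `max_gap` parameters and their default values are hypothetical choices for illustration. It extends the end of each segment to compensate for the clipped tail, then merges segments that end up close together.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def pad_and_merge(
    segments: List[Segment],
    audio_duration: float,
    tail_pad: float = 0.2,  # hypothetical value: seconds added to each segment end
    max_gap: float = 0.3,   # hypothetical value: gaps up to this size are merged
) -> List[Segment]:
    """Extend each segment's end, then merge segments separated by small gaps."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # Pad the tail, but never past the end of the audio.
        end = min(end + tail_pad, audio_duration)
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous segment: join them.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Calling `pad_and_merge([(0.0, 1.0), (1.1, 2.0), (5.0, 6.0)], audio_duration=6.1)` joins the first two segments into one and keeps the third separate, with its padded end clamped to the clip length. The padding and gap values would need tuning by listening to the resulting clips.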
Input clip: 644_2532_000000.wav - 15 seconds - (MLS dataset)
Output: 5 speech segments (WAV files)
test_segmentation_short_audio.zip
I'll try to extend the algorithm to long audio (possibly an hour long, e.g. a public podcast).
Continuing the experiments with rVADfast, I was able to segment one randomly chosen podcast from the Emilia-Romagna Region,
obtaining 143 segments with durations ranging from a minimum of 2 seconds to a maximum of 2 minutes.
Execution time for this process was approximately 1.5 hours.
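Figures like the segment count and the minimum and maximum duration can be checked with a short script. The following is only a sketch using Python's standard `wave` module, assuming the segments are PCM WAV files in a single folder; the function names are hypothetical and not part of rVADfast.

```python
import wave
from pathlib import Path
from typing import Dict, List


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize_durations(durations: List[float]) -> Dict[str, float]:
    """Return count, min, max and total for a non-empty list of durations."""
    if not durations:
        raise ValueError("no segments to summarize")
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "total": sum(durations),
    }


def summarize_folder(folder: str) -> Dict[str, float]:
    """Summarize every *.wav file directly inside `folder`."""
    paths = sorted(Path(folder).glob("*.wav"))
    return summarize_durations([wav_duration_seconds(p) for p in paths])
```

Running `summarize_folder` on the output directory of a segmentation run gives a quick view of how many clips were produced and how long the shortest and longest ones are.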
The audio has no transcriptions, so in this case automatic transcription followed by human validation must be applied.
Unfortunately, other speakers also take part in the podcasts, and sometimes words are not clear, so a check is required during validation. There is no background noise in the podcasts and the audio is clean.
Other podcasts here
Licence: Creative Commons Attribution 4.0
The output dataset of my experiment can be downloaded here:
http://t.ly/xHHL