
audio-datasets's Introduction

DagsHub Client



What is DagsHub?

DagsHub is a platform where machine learning and data science teams can build, manage, and collaborate on their projects. With DagsHub you can:

  1. Version code, data, and models in one place. Use the free provided DagsHub storage or connect it to your cloud storage
  2. Track Experiments using Git, DVC or MLflow, to provide a fully reproducible environment
  3. Visualize pipelines, data, and notebooks in an interactive, diff-able, and dynamic way
  4. Label your data directly on the platform using Label Studio
  5. Share your work with your team members
  6. Stream and upload your data in an intuitive and easy way, while preserving versioning and structure.

DagsHub is built firmly around open, standard formats for your project, in particular Git, DVC, and MLflow.

Therefore, you can work with DagsHub regardless of your chosen programming language or frameworks.

DagsHub Client API & CLI

This client library is meant to help you get started quickly with DagsHub. It is made up of Experiment tracking and Direct Data Access (DDA), a component to let you stream and upload your data.

For more details on the different functions of the client, check out the docs segments:

  1. Installation & Setup
  2. Data Streaming
  3. Data Upload
  4. Experiment Tracking
    1. Autologging
  5. Data Engine

Some functionality is supported only in Python.

To read about some of the awesome use cases for Direct Data Access, check out the relevant doc page.

Installation

pip install dagshub

Direct Data Access (DDA) functionality requires authentication, which you can easily do by running the following command in your terminal:

dagshub login

Quickstart for Data Streaming

The easiest way to start using DagsHub is via the Python Hooks method. To do this:

  1. Open your DagsHub project,
  2. Copy the following 2 lines of code into your Python code which accesses your data:
    from dagshub.streaming import install_hooks
    install_hooks()
  3. That’s it! You now have streaming access to all your project files.

🤩 Check out this colab to see an end-to-end example of Data Streaming at work:

Open In Colab

Next Steps

You can dive into the expanded documentation to learn more about data streaming, data upload, and experiment tracking with DagsHub.


Analytics

To improve your experience, we collect analytics on client usage. If you want to disable analytics collection, set the DAGSHUB_DISABLE_ANALYTICS environment variable to any value.
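For example, opting out for the current shell session (any non-empty value works, per the note above):

```shell
# Any non-empty value disables client analytics collection.
export DAGSHUB_DISABLE_ANALYTICS=1
```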

Made with 🐶 by DagsHub.

audio-datasets's People

Contributors

arnavrneo, cyberflamego, deanp70, drecali, hazalkl, idivyanshbansal, kingabzpro, kinkusuma, l-theorist, megans925, mertbozkir, michizhou, nir-barazida, nirbarazida, rutam21


audio-datasets's Issues

EMOVO

Dataset claimed: EMOVO

About Dataset

6 actors performed 14 sentences each, in 6 emotions: disgust, fear, anger, joy, surprise, sadness.

ESC-50 Dataset

Source: Karopiczak-ESC
About: A labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification, organized into 5 main categories: Animals; Natural soundscapes & water sounds; Human, non-speech sounds; Interior/domestic sounds; Exterior/urban noises.

Bird Audio Detection challenge

Claim Dataset: Bird Audio Detection challenge

About Dataset

Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management. The current generation of software tools require manual work from the user: to choose the algorithm, to set the settings, and to post-process the results. This is holding bioacoustics back in embracing its “big data” era: let’s make this better!

Public domain sounds

Dataset Claim : Public domain sounds

About Dataset

Good for wake word detection; a wide array of sounds that can be used for object detection research (524 MB - 635 SOUNDS - Open for public use).

Toronto emotional speech set (TESS)

Claimed Dataset: TESS

About:
These stimuli were modeled on the Northwestern University Auditory Test No. 6 (NU-6; Tillman & Carhart, 1966). A set of 200 target words was spoken in the carrier phrase "Say the word _____" by two actresses (aged 26 and 64 years), and recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). There are 2800 stimuli in total.
Two actresses were recruited from the Toronto area. Both actresses speak English as their first language, are university educated, and have musical training. Audiometric testing indicated that both actresses have thresholds within the normal range.
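As a quick sanity check, the reported total follows directly from the design (a trivial arithmetic sketch):

```python
# 200 target words x 7 emotions x 2 actresses = 2800 stimuli
words, emotions, actresses = 200, 7, 2
print(words * emotions * actresses)  # 2800, matching the reported total
```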

AudioMNIST

Claim Dataset: AudioMNIST

Dataset Description: Deep neural networks have been successfully applied to problems in many domains. Understanding their inner workings with respect to feature selection and decision making, however, remains challenging and thus trained models are often regarded as black boxes. Layerwise Relevance Propagation (LRP) addresses this issue by finding those features that a model relies on, offering deeper understanding and interpretation of trained networks. This repository contains code and data used in Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals (https://arxiv.org/abs/1807.03418).

LEGO Spoken Dialogue Corpus

Claim Dataset: LEGO Spoken Dialogue Corpus

About Dataset: The LEGOv2 database is a parameterized and annotated version of the CMU Let’s Go database from 2006 and 2007.
This spoken dialogue corpus contains interactions captured from the CMU Let’s Go (LG) System by Carnegie Mellon University in 2006 and 2007. It is based on raw log-files from the LG system.
The corpus has been parameterized and annotated by the Dialogue Systems Group at Ulm University, Germany.

URDU-Dataset

Dataset Claim: URDU-Dataset

About Dataset

The URDU dataset contains emotional utterances of Urdu speech gathered from Urdu talk shows. It contains 400 utterances of four basic emotions: Angry, Happy, Neutral, and Sad. There are 38 speakers (27 male and 11 female).

This data was collected from YouTube; speakers were selected randomly. Anyone can use this data for research purposes only. The file naming scheme encodes the speaker, the gender, the file number for that speaker, and the overall file number within the particular emotion.

Coswara Dataset

Claim Dataset: Coswara

About Dataset: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is reverse transcription-polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for an alternate diagnosis tool that overcomes these limitations and is deployable at a large scale. The prominent symptoms of COVID-19 include cough and breathing difficulties. We foresee that respiratory sounds, when analyzed using machine learning techniques, can provide useful insights, enabling the design of a diagnostic tool.

Towards this, the paper presents an early effort in creating (and analyzing) a database, called Coswara, of respiratory sounds, namely, cough, breath, and voice. The sound samples are collected via worldwide crowdsourcing using a website application. The curated dataset is released as open access. As the pandemic is evolving, data collection and analysis are a work in progress. We believe that insight from the analysis of Coswara can be effective in enabling sound-based technology solutions for point-of-care diagnosis of respiratory infection, and in the near future, this can help to diagnose COVID-19.

basic-arabic-vocal-emotions-dataset

Dataset: BAVED
About: The Basic Arabic Vocal Emotions Dataset (BAVED) is a dataset of Arabic words spoken at different levels of emotion, recorded in audio/wav format.

The Variably Intense Vocalizations of Affect and Emotion Corpus (VIVAE)

Claim Dataset: VIVAE

About Dataset:
The Variably Intense Vocalizations of Affect and Emotion Corpus (VIVAE) consists of a set of human non-speech emotion vocalizations. The full set, comprising 1085 audio files, features eleven speakers expressing three positive (achievement/triumph, sexual pleasure, and surprise) and three negative (anger, fear, physical pain) affective states, each parametrically varied from low to peak emotion intensity. The smaller core set of 480 files represents a fully crossed subsample of the full set (6 emotions x 4 intensities x 10 speakers x 2 items) selected based on judged authenticity.
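The core set size is exactly the product of the fully crossed design factors (a trivial check):

```python
# Fully crossed core set: 6 emotions x 4 intensities x 10 speakers x 2 items
emotions, intensities, speakers, items = 6, 4, 10, 2
core_set_size = emotions * intensities * speakers * items
print(core_set_size)  # 480, the reported core-set file count
```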

Speech Commands Dataset

Dataset Claim: Speech Commands Dataset

About Dataset

The dataset (1.4 GB) has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website.

EmoSynth

Claim Dataset: EmoSynth

About Dataset

EmoSynth is a dataset of 144 audio files which have been labelled by 40 listeners for their perceived emotion, with regard to the dimensions of Valence and Arousal.

EmoV-DB

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

This dataset is built for the purpose of emotional speech synthesis. The transcripts were based on the CMU Arctic database: http://www.festvox.org/cmu_arctic/cmuarctic.data.

It includes recordings for four speakers: two male and two female.

The emotional styles are neutral, sleepiness, anger, disgust and amused.

Each audio file is recorded in 16-bit .wav format.
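Since the recordings are plain 16-bit PCM .wav files, Python's stdlib `wave` module can inspect them. A self-contained sketch (it writes a short silent clip first so it runs without the dataset; with EmoV-DB you would open one of its .wav files instead):

```python
import wave

# Write a 1-second, 16-bit, mono .wav so the example is self-contained.
# The path is illustrative only, not an actual EmoV-DB file name.
path = "example_16bit.wav"
with wave.open(path, "wb") as w:
    w.setnchannels(1)                    # mono
    w.setsampwidth(2)                    # 2 bytes per sample = 16 bits
    w.setframerate(16000)                # sample rate in Hz
    w.writeframes(b"\x00\x00" * 16000)   # one second of silence

# Read it back and report its format, as you would for a dataset file.
with wave.open(path, "rb") as f:
    print(f.getsampwidth() * 8, "bits,", f.getframerate(), "Hz,",
          f.getnframes() / f.getframerate(), "seconds")
```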

EmotionTTS / CSD dataset

Dataset: CSD
About: The Children's Song Dataset (CSD) is an open-source dataset for singing voice research. It contains 50 Korean and 50 English songs sung by one Korean female professional pop singer.

CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI)

Claim Dataset: CMU-MOSI

About Dataset: CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI) is a dataset of opinion-level sentiment intensity in online videos. The CMU-MOSI dataset opened the door to utterance-level sentiment analysis in English videos, and was the largest of its kind at release time. It contains 2199 opinion utterances with sentiment annotated from very negative to very positive in seven Likert steps.

CHiME-Home

Dataset Claim: CHiME-Home

About Dataset

The CHiME-Home dataset is a collection of annotated domestic environment audio recordings.

Arabic Speech Corpus

Dataset Claimed: Arabic Speech Corpus

About Dataset: This speech corpus was developed as part of PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded in south Levantine Arabic (Damascene accent) using a professional studio. Speech synthesized from this corpus has produced a high-quality, natural voice.

Emo-DB Dataset

Dataset: EmoDB
About: The EMODB database is a freely available German emotional speech database, created by the Institute of Communication Science, Technical University of Berlin, Germany. Ten professional speakers (five male and five female) participated in the recording. The database contains a total of 535 utterances covering seven emotions: 1) anger; 2) boredom; 3) anxiety; 4) happiness; 5) sadness; 6) disgust; and 7) neutral. The data was recorded at a 48-kHz sampling rate and then down-sampled to 16 kHz.
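The 48 kHz to 16 kHz down-sampling mentioned above is a rate reduction by a factor of 3. A naive illustrative sketch (real pipelines apply a proper anti-aliasing low-pass filter first, e.g. via scipy.signal.decimate; the 3-sample mean below is only a stand-in):

```python
def downsample_by_3(samples):
    """Naively reduce the sample rate by 3x (e.g. 48 kHz -> 16 kHz) by
    averaging non-overlapping groups of 3 samples. Illustrative only:
    a real converter needs an anti-aliasing filter before decimation."""
    n = len(samples) - len(samples) % 3      # drop any trailing partial group
    return [sum(samples[i:i + 3]) / 3 for i in range(0, n, 3)]

one_ms_at_48k = [0.0] * 48                   # 1 ms of audio at 48 kHz
print(len(downsample_by_3(one_ms_at_48k)))   # 16 samples, i.e. 16 kHz
```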

RSC

About:
Extract RuneScape Classic sounds from cache to wav (and vice versa). Jagex used Sun's original .au sound format, which is headerless, 8-bit, u-law encoded, 8000 Hz PCM samples. This module can decompress original sounds from sound archives as headered WAVs, and recompress (+ resample) new WAVs into archives.
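Decoding those headerless u-law bytes to 16-bit linear PCM follows the standard G.711 expansion; a sketch of the general algorithm (not this module's actual code):

```python
def ulaw_to_linear(byte):
    """Decode one 8-bit mu-law byte to a 16-bit linear PCM sample (G.711)."""
    byte = ~byte & 0xFF                  # mu-law bytes are stored complemented
    sign = byte & 0x80                   # top bit carries the sign
    exponent = (byte >> 4) & 0x07        # 3-bit segment number
    mantissa = byte & 0x0F               # 4-bit step within the segment
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

print(ulaw_to_linear(0xFF))  # 0 (silence)
print(ulaw_to_linear(0x80))  # 32124 (maximum positive amplitude)
```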

data

DAPS Dataset

Dataset: DAPS
About: The DAPS (Device and Produced Speech) dataset is a collection of aligned versions of professionally produced studio speech recordings and recordings of the same speech on common consumer devices (tablet and smartphone) in real-world environments.

JL Corpus

Claimed set : JL Corpus

About:
To further understand the wide array of emotions embedded in human speech, we are introducing an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance facilitates emotion-related formant and glottal source feature comparison studies. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots. But there are very few existing speech resources to study these emotions, and this work adds a speech corpus containing some secondary emotions.

Acted Emotional Speech Dynamic Database

Claim dataset: Acted Emotional Speech Dynamic Database

About Dataset

Speech Emotion Recognition (SER) is the process of extracting emotional paralinguistic information from speech. It is a field with growing interest and potential applications in Human-Computer Interaction, content management, social interaction, and as an add-on module in Speech Recognition and Speech-To-Text systems. The performance of such applications depends a lot on a high-quality speech emotion recognition dataset.

CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)

Dataset Claimed: CREMA-D

About Dataset: CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified).

Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral and Sad) and four different emotion levels (Low, Medium, High and Unspecified).

Claim: UrbanSound8K- Labeled Urban Sound Excerpts Dataset

Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes.

Papers with Code- https://paperswithcode.com/dataset/urbansound8k-1
Dataset Homepage- https://urbansounddataset.weebly.com/urbansound8k.html

  • I will add the Dataset to DagsHub.

NOTE: The Issue #67 to claim this dataset was raised last year and has been abandoned since then. Hence, I am picking this up for this year's Hacktoberfest.

Arabic Natural Audio Dataset

Dataset Claimed: Arabic Natural Audio Dataset

About Dataset: Emotion expression is an essential part of human interaction. The same text can hold different meanings when expressed with different emotions. Thus understanding the text alone is not enough for getting the meaning of an utterance. Acted and natural corpora have been used to detect emotions from speech. Many speech databases for different languages including English, German, Chinese, Japanese, Russian, Italian, Swedish and Spanish exist for modeling emotion recognition. Since there is no reported reference of an available Arabic corpus, we decided to collect the first Arabic Natural Audio Dataset (ANAD) to recognize discrete emotions.

MUSDB18 & MUSDB18-HQ

Claimed dataset: MUSDB18
About: A dataset of 150 full-length music tracks (~10 h total duration) of different genres, along with their isolated drums, bass, vocals and other stems.

MS-SNSD

Microsoft Scalable Noisy Speech Dataset MS-SNSD

About:

  • This dataset contains a large collection of clean speech files and a variety of environmental noise files in .wav format, sampled at 16 kHz.
  • The main application of this dataset is to train Deep Neural Network (DNN) models to suppress background noise, but it can be used for other audio and speech applications.
  • We provide the recipe to mix clean speech and noise at various signal-to-noise ratio (SNR) conditions to generate a large noisy speech dataset.
  • The SNR conditions and the number of hours of data required can be configured depending on the application requirements.
  • This dataset will continue to grow in size, as we encourage researchers and practitioners to contribute by adding more clean speech and noise clips.
  • This dataset will immensely help researchers and practitioners in academia and industry to develop better models.
  • We also provide a test set, distinct from the training set, to evaluate the developed models.
  • We provide HTML code for building two Human Intelligence Task (HIT) crowdsourcing applications to allow users to rate the noisy audio clips. We implemented an absolute category rating (ACR) application according to ITU-T P.800. In addition, we implemented a subjective testing method according to ITU-T P.835, which allows raters to score the speech signal, background noise, and the overall quality.
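The mixing recipe above boils down to: scale the noise so the clean-to-noise power ratio matches the target SNR, then add. A simplified stand-in for the repository's own scripts (function and variable names are illustrative, not MS-SNSD's actual API):

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the target SNR (dB)."""
    rms = lambda x: math.sqrt(sum(s * s for s in x) / len(x))
    # SNR(dB) = 20 * log10(rms_clean / rms_scaled_noise), solved for the scale:
    scale = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + scale * n for c, n in zip(clean, noise)]

# 1 s of a 440 Hz tone at 16 kHz, mixed with alternating-sign noise at 5 dB SNR
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [0.1 * (-1) ** t for t in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```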

Flickr Audio Caption Corpus

Claim Dataset: Flickr Audio Caption

About Dataset: The Flickr 8k Audio Caption Corpus contains 40,000 spoken captions of 8,000 natural images. It was collected in 2015 to investigate multimodal learning schemes for unsupervised speech pattern discovery. This corpus only includes audio recordings, and not the original text captions or associated images.

LJ Speech Dataset

Dataset: LJ Speech
About: This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

Urban Sound Dataset

Dataset claimed: https://urbansounddataset.weebly.com/

About Dataset
Dataset on urban sounds (<16 GB)

Datasets compiled by Justin Salamon, Christopher Jacoby and Juan Pablo Bello. All files come from www.freesound.org.
Please see FREESOUNDCREDITS.txt (included in the dataset) for an attribution list.

The UrbanSound and UrbanSound8K datasets are offered free of charge for non-commercial use only under the terms of the Creative Commons Attribution Noncommercial License (by-nc), version 3.0: http://creativecommons.org/licenses/by-nc/3.0/

The datasets and their contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, NYU is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the UrbanSound or UrbanSound8K datasets or any part of them.
