
audio-datasets's Introduction

DagsHub Client



What is DagsHub?

DagsHub is a platform where machine learning and data science teams can build, manage, and collaborate on their projects. With DagsHub you can:

  1. Version code, data, and models in one place. Use the free provided DagsHub storage or connect it to your cloud storage
  2. Track Experiments using Git, DVC or MLflow, to provide a fully reproducible environment
  3. Visualize pipelines, data, and notebooks in an interactive, diff-able, and dynamic way
  4. Label your data directly on the platform using Label Studio
  5. Share your work with your team members
  6. Stream and upload your data in an intuitive and easy way, while preserving versioning and structure.

DagsHub is built firmly around open, standard formats for your project, in particular Git, DVC, and MLflow.

Therefore, you can work with DagsHub regardless of your chosen programming language or frameworks.

DagsHub Client API & CLI

This client library is meant to help you get started quickly with DagsHub. It is made up of Experiment tracking and Direct Data Access (DDA), a component to let you stream and upload your data.

For more details on the different functions of the client, check out the docs segments:

  1. Installation & Setup
  2. Data Streaming
  3. Data Upload
  4. Experiment Tracking
    1. Autologging
  5. Data Engine

Some functionality is supported only in Python.

To read about some of the awesome use cases for Direct Data Access, check out the relevant doc page.

Installation

pip install dagshub

Direct Data Access (DDA) functionality requires authentication, which you can easily do by running the following command in your terminal:

dagshub login

Quickstart for Data Streaming

The easiest way to start using DagsHub is via the Python Hooks method. To do this:

  1. Open your DagsHub project,
  2. Copy the following 2 lines of code into your Python code which accesses your data:
    from dagshub.streaming import install_hooks
    install_hooks()
  3. That’s it! You now have streaming access to all your project files.

🤩 Check out this colab to see an end-to-end example of Data Streaming at work:

Open In Colab

Next Steps

You can dive into the expanded documentation to learn more about data streaming, data upload, and experiment tracking with DagsHub.


Analytics

To improve your experience, we collect analytics on client usage. If you want to disable analytics collection, set the DAGSHUB_DISABLE_ANALYTICS environment variable to any value.
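For example, opting out for the current shell session (any non-empty value works, per the note above):

```shell
# Any non-empty value disables client analytics collection.
export DAGSHUB_DISABLE_ANALYTICS=1
```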

Made with 🐶 by DagsHub.

audio-datasets's People

Contributors

arnavrneo, cyberflamego, deanp70, drecali, hazalkl, idivyanshbansal, kingabzpro, kinkusuma, l-theorist, megans925, mertbozkir, michizhou, nir-barazida, nirbarazida, rutam21


audio-datasets's Issues

EMOVO

Dataset claimed: EMOVO

About Dataset

6 actors performed 14 sentences each, in 6 emotions: disgust, fear, anger, joy, surprise, sadness.

ESC-50 Dataset

Source: Karopiczak-ESC
About: A labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification, organized into 5 main categories: Animals; Natural soundscapes & water sounds; Human, non-speech sounds; Interior/domestic sounds; Exterior/urban noises.

Bird Audio Detection challenge

Claim Dataset: Bird Audio Detection challenge

About Dataset

Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management. The current generation of software tools require manual work from the user: to choose the algorithm, to set the settings, and to post-process the results. This is holding bioacoustics back in embracing its “big data” era: let’s make this better!

Public domain sounds

Dataset Claim : Public domain sounds

About Dataset

Good for wake word detection; a wide array of sounds that can be used for object detection research (524 MB - 635 SOUNDS - Open for public use).

Toronto emotional speech set (TESS)

Claimed Dataset: TESS

About:
These stimuli were modeled on the Northwestern University Auditory Test No. 6 (NU-6; Tillman & Carhart, 1966). A set of 200 target words was spoken in the carrier phrase "Say the word _____" by two actresses (aged 26 and 64 years), and recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). There are 2800 stimuli in total.
Two actresses were recruited from the Toronto area. Both actresses speak English as their first language, are university educated, and have musical training. Audiometric testing indicated that both actresses have thresholds within the normal range.
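As a quick sanity check, the reported total follows directly from the design (a trivial arithmetic sketch):

```python
# 200 target words x 7 emotions x 2 actresses = 2800 stimuli
words, emotions, actresses = 200, 7, 2
print(words * emotions * actresses)  # 2800, matching the reported total
```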

AudioMNIST

Claim Dataset: AudioMNIST

Dataset Description: Deep neural networks have been successfully applied to problems in many domains. Understanding their inner workings with respect to feature selection and decision making, however, remains challenging and thus trained models are often regarded as black boxes. Layerwise Relevance Propagation (LRP) addresses this issue by finding those features that a model relies on, offering deeper understanding and interpretation of trained networks. This repository contains code and data used in Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals (https://arxiv.org/abs/1807.03418).

LEGO Spoken Dialogue Corpus

Claim Dataset: LEGO Spoken Dialogue Corpus

About Dataset: The LEGOv2 database is a parameterized and annotated version of the CMU Let’s Go database from 2006 and 2007.
This spoken dialogue corpus contains interactions captured from the CMU Let’s Go (LG) System by Carnegie Mellon University in 2006 and 2007. It is based on raw log-files from the LG system.
The corpus has been parameterized and annotated by the Dialogue Systems Group at Ulm University, Germany.

URDU-Dataset

Dataset Claim: URDU-Dataset

About Dataset

The URDU dataset contains emotional utterances of Urdu speech gathered from Urdu talk shows. It contains 400 utterances of four basic emotions: Angry, Happy, Neutral, and Sad. There are 38 speakers (27 male and 11 female).

This data was collected from YouTube; speakers were selected randomly. Anyone can use this data for research purposes only. The file naming scheme encodes the speaker, the gender, the file number for that speaker, and the overall file number within the particular emotion.

Coswara Dataset

Claim Dataset: Coswara

About Dataset: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is reverse transcription-polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for an alternate diagnosis tool that overcomes these limitations and is deployable at a large scale. The prominent symptoms of COVID-19 include cough and breathing difficulties. We foresee that respiratory sounds, when analyzed using machine learning techniques, can provide useful insights, enabling the design of a diagnostic tool.

Towards this, the paper presents an early effort in creating (and analyzing) a database, called Coswara, of respiratory sounds, namely, cough, breath, and voice. The sound samples are collected via worldwide crowdsourcing using a website application. The curated dataset is released as open access. As the pandemic is evolving, data collection and analysis are a work in progress. We believe that insight from the analysis of Coswara can be effective in enabling sound-based technology solutions for point-of-care diagnosis of respiratory infection, and in the near future, this can help to diagnose COVID-19.

basic-arabic-vocal-emotions-dataset

Dataset: BAVED
About: The Basic Arabic Vocal Emotions Dataset (BAVED) is a dataset of Arabic words spoken at different levels of emotion, recorded in audio/wav format.

The Variably Intense Vocalizations of Affect and Emotion Corpus (VIVAE)

Claim Dataset: VIVAE

About Dataset:
The Variably Intense Vocalizations of Affect and Emotion Corpus (VIVAE) consists of a set of human non-speech emotion vocalizations. The full set, comprising 1085 audio files, features eleven speakers expressing three positive (achievement/triumph, sexual pleasure, and surprise) and three negative (anger, fear, physical pain) affective states, each parametrically varied from low to peak emotion intensity. The smaller core set of 480 files represents a fully crossed subsample of the full set (6 emotions x 4 intensities x 10 speakers x 2 items) selected based on judged authenticity.
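The core set size is exactly the product of the fully crossed design factors (a trivial check):

```python
# Fully crossed core set: 6 emotions x 4 intensities x 10 speakers x 2 items
emotions, intensities, speakers, items = 6, 4, 10, 2
core_set_size = emotions * intensities * speakers * items
print(core_set_size)  # 480, the reported core-set file count
```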

Speech Commands Dataset

Dataset Claim: Speech Commands Dataset

About Dataset

The dataset (1.4 GB) has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website.

EmoSynth

Claim Dataset: EmoSynth

About Dataset

EmoSynth is a dataset of 144 audio files which have been labelled by 40 listeners for their perceived emotion, with regard to the dimensions of Valence and Arousal.

EmoV-DB

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

This dataset is built for the purpose of emotional speech synthesis. The transcripts were based on the CMU Arctic database: http://www.festvox.org/cmu_arctic/cmuarctic.data.

It includes recordings for four speakers: two male and two female.

The emotional styles are neutral, sleepiness, anger, disgust and amused.

Each audio file is recorded in 16-bit .wav format.
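Since the recordings are plain 16-bit PCM .wav files, Python's stdlib `wave` module can inspect them. A self-contained sketch (it writes a short silent clip first so it runs without the dataset; with EmoV-DB you would open one of its .wav files instead):

```python
import wave

# Write a 1-second, 16-bit, mono .wav so the example is self-contained.
# The path is illustrative only, not an actual EmoV-DB file name.
path = "example_16bit.wav"
with wave.open(path, "wb") as w:
    w.setnchannels(1)                    # mono
    w.setsampwidth(2)                    # 2 bytes per sample = 16 bits
    w.setframerate(16000)                # sample rate in Hz
    w.writeframes(b"\x00\x00" * 16000)   # one second of silence

# Read it back and report its format, as you would for a dataset file.
with wave.open(path, "rb") as f:
    print(f.getsampwidth() * 8, "bits,", f.getframerate(), "Hz,",
          f.getnframes() / f.getframerate(), "seconds")
```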

EmotionTTS / CSD dataset

Dataset: CSD
About: The Children's Song Dataset (CSD) is an open-source dataset for singing voice research. It contains 50 Korean and 50 English songs sung by one Korean female professional pop singer.

CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI)

Claim Dataset: CMU-MOSI

About Dataset: CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI) is a dataset of opinion-level sentiment intensity in online videos. The CMU-MOSI dataset opened the door to utterance-level sentiment analysis in English videos, and was the largest of its kind at release time. It contains 2199 opinion utterances with sentiment annotated from very negative to very positive in seven Likert steps.

CHiME-Home

Dataset Claim: CHiME-Home

About Dataset

The CHiME-Home dataset is a collection of annotated domestic environment audio recordings.

Arabic Speech Corpus

Dataset Claimed: Arabic Speech Corpus

About Dataset: This speech corpus was developed as part of PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded in south Levantine Arabic (Damascene accent) using a professional studio. Speech synthesized from this corpus has produced a high-quality, natural voice.

Emo-DB Dataset

Dataset: EmoDB
About: The EMODB database is a freely available German emotional speech database, created by the Institute of Communication Science, Technical University of Berlin, Germany. Ten professional speakers (five male and five female) participated in the recording. The database contains a total of 535 utterances covering seven emotions: 1) anger; 2) boredom; 3) anxiety; 4) happiness; 5) sadness; 6) disgust; and 7) neutral. The data was recorded at a 48-kHz sampling rate and then down-sampled to 16 kHz.
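The 48 kHz to 16 kHz down-sampling mentioned above is a rate reduction by a factor of 3. A naive illustrative sketch (real pipelines apply a proper anti-aliasing low-pass filter first, e.g. via scipy.signal.decimate; the 3-sample mean below is only a stand-in):

```python
def downsample_by_3(samples):
    """Naively reduce the sample rate by 3x (e.g. 48 kHz -> 16 kHz) by
    averaging non-overlapping groups of 3 samples. Illustrative only:
    a real converter needs an anti-aliasing filter before decimation."""
    n = len(samples) - len(samples) % 3      # drop any trailing partial group
    return [sum(samples[i:i + 3]) / 3 for i in range(0, n, 3)]

one_ms_at_48k = [0.0] * 48                   # 1 ms of audio at 48 kHz
print(len(downsample_by_3(one_ms_at_48k)))   # 16 samples, i.e. 16 kHz
```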

RSC

About:
Extract RuneScape Classic sounds from cache to wav (and vice versa). Jagex used Sun's original .au sound format, which is headerless, 8-bit, u-law encoded, 8000 Hz PCM samples. This module can decompress original sounds from sound archives as headered WAVs, and recompress (+ resample) new WAVs into archives.
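Decoding those headerless u-law bytes to 16-bit linear PCM follows the standard G.711 expansion; a sketch of the general algorithm (not this module's actual code):

```python
def ulaw_to_linear(byte):
    """Decode one 8-bit mu-law byte to a 16-bit linear PCM sample (G.711)."""
    byte = ~byte & 0xFF                  # mu-law bytes are stored complemented
    sign = byte & 0x80                   # top bit carries the sign
    exponent = (byte >> 4) & 0x07        # 3-bit segment number
    mantissa = byte & 0x0F               # 4-bit step within the segment
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

print(ulaw_to_linear(0xFF))  # 0 (silence)
print(ulaw_to_linear(0x80))  # 32124 (maximum positive amplitude)
```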

data

DAPS Dataset

Dataset: DAPS
About: The DAPS (Device and Produced Speech) dataset is a collection of aligned versions of professionally produced studio speech recordings and recordings of the same speech on common consumer devices (tablet and smartphone) in real-world environments.

JL Corpus

Claimed set : JL Corpus

About:
To further understand the wide array of emotions embedded in human speech, we are introducing an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance facilitates emotion-related formant and glottal source feature comparison studies. The corpus also has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots. But there are very few existing speech resources to study these emotions, and this work adds a speech corpus containing some secondary emotions.

Acted Emotional Speech Dynamic Database

Claim dataset: Acted Emotional Speech Dynamic Database

About Dataset

Speech Emotion Recognition (SER) is the process of extracting emotional paralinguistic information from speech. It is a field with growing interest and potential applications in Human-Computer Interaction, content management, social interaction, and as an add-on module in Speech Recognition and Speech-To-Text systems. The performance of such applications depends a lot on a high-quality speech emotion recognition dataset.

CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)

Dataset Claimed: CREMA-D

About Dataset: CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified).

Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral and Sad) and four different emotion levels (Low, Medium, High and Unspecified).

Claim: UrbanSound8K- Labeled Urban Sound Excerpts Dataset

Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes.

Papers with Code- https://paperswithcode.com/dataset/urbansound8k-1
Dataset Homepage- https://urbansounddataset.weebly.com/urbansound8k.html

  • I will add the Dataset to DagsHub.

NOTE: The Issue #67 to claim this dataset was raised last year and has been abandoned since then. Hence, I am picking this up for this year's Hacktoberfest.

Arabic Natural Audio Dataset

Dataset Claimed: Arabic Natural Audio Dataset

About Dataset: Emotion expression is an essential part of human interaction. The same text can hold different meanings when expressed with different emotions. Thus understanding the text alone is not enough for getting the meaning of an utterance. Acted and natural corpora have been used to detect emotions from speech. Many speech databases for different languages including English, German, Chinese, Japanese, Russian, Italian, Swedish and Spanish exist for modeling emotion recognition. Since there is no reported reference of an available Arabic corpus, we decided to collect the first Arabic Natural Audio Dataset (ANAD) to recognize discrete emotions.

MUSDB18 & MUSDB18-HQ

Claimed dataset: MUSDB18
About: A dataset of 150 full-length music tracks (~10 h total duration) of different genres, along with their isolated drums, bass, vocals and other stems.

MS-SNSD

Microsoft Scalable Noisy Speech Dataset MS-SNSD

About:

  • This dataset contains a large collection of clean speech files and a variety of environmental noise files in .wav format, sampled at 16 kHz.
  • The main application of this dataset is to train Deep Neural Network (DNN) models to suppress background noise, but it can be used for other audio and speech applications.
  • We provide the recipe to mix clean speech and noise at various signal-to-noise ratio (SNR) conditions to generate a large noisy speech dataset.
  • The SNR conditions and the number of hours of data required can be configured depending on the application requirements.
  • This dataset will continue to grow in size, as we encourage researchers and practitioners to contribute by adding more clean speech and noise clips.
  • This dataset will immensely help researchers and practitioners in academia and industry to develop better models.
  • We also provide a test set, distinct from the training set, to evaluate the developed models.
  • We provide HTML code for building two Human Intelligence Task (HIT) crowdsourcing applications to allow users to rate the noisy audio clips. We implemented an absolute category rating (ACR) application according to ITU-T P.800. In addition, we implemented a subjective testing method according to ITU-T P.835, which allows raters to score the speech signal, background noise, and the overall quality.
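The mixing recipe above boils down to: scale the noise so the clean-to-noise power ratio matches the target SNR, then add. A simplified stand-in for the repository's own scripts (function and variable names are illustrative, not MS-SNSD's actual API):

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the target SNR (dB)."""
    rms = lambda x: math.sqrt(sum(s * s for s in x) / len(x))
    # SNR(dB) = 20 * log10(rms_clean / rms_scaled_noise), solved for the scale:
    scale = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + scale * n for c, n in zip(clean, noise)]

# 1 s of a 440 Hz tone at 16 kHz, mixed with alternating-sign noise at 5 dB SNR
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [0.1 * (-1) ** t for t in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```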

Flickr Audio Caption Corpus

Claim Dataset: Flickr Audio Caption

About Dataset: The Flickr 8k Audio Caption Corpus contains 40,000 spoken captions of 8,000 natural images. It was collected in 2015 to investigate multimodal learning schemes for unsupervised speech pattern discovery. This corpus only includes audio recordings, and not the original text captions or associated images.

LJ Speech Dataset

Dataset: LJ Speech
About: This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

Urban Sound Dataset

Dataset claimed: https://urbansounddataset.weebly.com/

About Dataset
Dataset on urban sounds (<16 GB)

Datasets compiled by Justin Salamon, Christopher Jacoby and Juan Pablo Bello. All files come from www.freesound.org.
Please see FREESOUNDCREDITS.txt (included in the dataset) for an attribution list.

The UrbanSound and UrbanSound8K datasets are offered free of charge for non-commercial use only under the terms of the Creative Commons Attribution Noncommercial License (by-nc), version 3.0: http://creativecommons.org/licenses/by-nc/3.0/

The datasets and their contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, NYU is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the UrbanSound or UrbanSound8K datasets or any part of them.
