Audio Commons Audio Extractor

The Audio Commons Audio Extractor is a tool for analyzing audio files and extracting both music properties (for music samples and music pieces) and high-level non-musical properties (timbre models). See this blog post for further details about the Audio Commons Audio Extractor.

To facilitate its usage, the tool has been dockerized and should run effortlessly on any platform with Docker installed. Below you'll find instructions for running the tool as well as the full list of included audio features.

Check out the web demonstrator, which shows the power of some of the music properties extracted with this tool.

License

The Audio Commons Audio Extractor is licensed under AGPLv3, except for the included timbral models code, which is licensed under Apache 2.0. Both license files are included in this source code repository.

Running the audio extractor

The Audio Commons Audio Extractor is meant to be used as a command line tool run from a terminal. Assuming you have Docker installed, you can analyze an audio file with the following command (the audio file must be located in the same folder from which you run the command; be aware that the first run will take a while, as Docker needs to download the Audio Commons Audio Extractor image first):

docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 -i /tmp/audio.wav -o /tmp/analysis.json -st

The example above mounts the current directory (pwd) in the /tmp directory inside the Docker container. The output file analysis.json is also written to /tmp, and therefore appears in the current directory. You can also mount different volumes and specify paths for the input audio and the analysis output using the following command (read the Docker volumes documentation for more information):

docker run -it --rm -v /local/path/to/your/audio/file.wav:/audio.wav -v /local/path/to/output_directory/:/outdir mtgupf/ac-audio-extractor:v3 -i /audio.wav -o /outdir/analysis.json  -st
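
If you want to analyze several files at once, you can wrap the same command in a small shell loop. This is only a minimal sketch, not part of the tool itself, and assumes your inputs are .wav files in the current directory, each producing a matching .json analysis file:

# Hypothetical batch loop: analyze every .wav file in the current directory,
# writing one <name>.json analysis file per input
for f in *.wav; do
    docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 \
        -i "/tmp/$f" -o "/tmp/${f%.wav}.json" -st
done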

You can use the --help flag with the Audio Commons Audio Extractor to see a complete list of all available options:

docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 --help

usage: analyze.py [-h] [-v] [-t] [-m] [-s] -i INPUT -o OUTPUT [-f FORMAT]
                  [-u URI]

Audio Commons Audio Extractor (v3). Analyzes a given audio file and writes
results to a JSON file.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         if set, prints detailed info on screen during the
                        analysis
  -t, --timbral-models  include descriptors computed from timbral models
  -m, --music-pieces    include descriptors designed for music pieces
  -s, --music-samples   include descriptors designed for music samples
  -i INPUT, --input INPUT
                        input audio file
  -o OUTPUT, --output OUTPUT
                        output analysis file
  -f FORMAT, --format FORMAT
                        format of the output analysis file ("json" or
                        "jsonld", defaults to "jsonld")
  -u URI, --uri URI     URI for the analyzed sound (only used if "jsonld"
                        format is chosen)

Note that you can use the -t, -m and -s flags to enable or disable the computation of specific groups of audio features.
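
For example, to compute the music sample descriptors and the timbral models but skip the music piece descriptors, pass -s and -t but not -m:

docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 -i /tmp/audio.wav -o /tmp/analysis.json -s -t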

Output formats

The Audio Commons Audio Extractor can write the analysis output to a JSON file with a flat hierarchy, or generate a structured output in JSON-LD (JSON for Linked Data). You can choose the format with the --format argument; by default it is set to jsonld. When using JSON-LD, you can optionally specify a URI for the analyzed sound resource so that the triples added to the graph are referenced to that URI. For that, use the --uri argument. Below are example outputs for the JSON and JSON-LD formats.
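
For example, the first command below requests the flat JSON format and the second requests JSON-LD; the URI in the second command is just a placeholder showing where your own resource URI would go:

docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 -i /tmp/audio.wav -o /tmp/analysis.json -st -f json

docker run -it --rm -v `pwd`:/tmp mtgupf/ac-audio-extractor:v3 -i /tmp/audio.wav -o /tmp/analysis.jsonld -st -f jsonld -u https://example.org/sounds/1234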

JSON output example

{
    "duration": 6.0,
    "lossless": 1.0,
    "codec": "pcm_s16le",
    "bitrate": 705600.0,
    "samplerate": 44100.0,
    "channels": 1.0,
    "audio_md5": "8da67c9c2acbd13998c9002aa0f60466",
    "loudness": -28.207069396972656,
    "dynamic_range": 0.6650657653808594,
    "temporal_centroid": 0.5078766345977783,
    "log_attack_time": 0.30115795135498047,
    "filesize": 529278,
    "single_event": false,
    "tonality": "G# minor",
    "tonality_confidence": 0.2868785858154297,
    "loop": true,
    "tempo": 120,
    "tempo_confidence": 1.0,
    "note_midi": 74,
    "note_name": "D5",
    "note_frequency": 592.681884765625,
    "note_confidence": 0.0,
    "brightness": 50.56954356039029,
    "depth": 13.000903137777897,
    "metallic": 0.4906048209174263,
    "roughness": 0.7237051954207928,
    "genre": "Genre B",
    "mood": "Mood B"
}
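
If you only need a few values from the flat JSON output, a command line JSON processor such as jq can pull them out directly. A minimal sketch, assuming the output above was saved as analysis.json:

# Extract a few selected descriptors from the flat JSON output (requires jq)
jq '{tempo, tonality, loudness, note_name}' analysis.json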

JSON-LD output example

{
    "@context": {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "ac": "https://w3id.org/ac-ontology/aco#",
        "afo": "https://w3id.org/afo/onto/1.1#",
        "afv": "https://w3id.org/afo/vocab/1.1#",
        "ebucore": "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#",
        "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
    },
    "@type": "ac:AudioFile",
    "ebucore:bitrate": 705600.0,
    "ebucore:filesize": 529278,
    "ebucore:hasCodec": {
        "@type": "ebucore:AudioCodec",
        "ebucore:codecId": "pcm_s16le"
    },
    "nfo:compressionType": "nfo:losslessCompressionType",
    "ac:audioMd5": "8da67c9c2acbd13998c9002aa0f60466",
    "ac:availableItemOf": {
        "@type": "ac:AudioClip"
    },
    "ac:signalAudioFeature": [
        {
            "@type": "afv:Loop",
            "afo:value": true
        },
        {
            "@type": "afv:Tempo",
            "afo:confidence": 1.0,
            "afo:value": 120
        },
        {
            "@type": "afv:Key",
            "afo:confidence": 0.2868785858154297,
            "afo:value": "G# minor"
        },
        {
            "@type": "afv:TemporalCentroid",
            "afo:value": 0.5078766345977783
        },
        {
            "@type": "afv:MIDINote",
            "afo:confidence": 0.0,
            "afo:value": 74
        },
        {
            "@type": "afv:Pitch",
            "afo:confidence": 0.0,
            "afo:value": 592.681884765625
        },
        {
            "@type": "afv:Loudness",
            "afo:value": -28.207069396972656
        },
        {
            "@type": "afv:Note",
            "afo:confidence": 0.0,
            "afo:value": "D5"
        },
        {
            "@type": "afv:LogAttackTime",
            "afo:value": 0.30115795135498047
        }
    ],
    "ac:signalChannels": 1,
    "ac:signalDuration": 6.0,
    "ac:singalSamplerate": 44100.0
}
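
In the JSON-LD output the signal features live inside the ac:signalAudioFeature array rather than as flat keys, so extracting them takes one extra step. A minimal jq sketch, assuming the output above was saved as analysis.jsonld:

# List each signal feature with its type, value and (optional) confidence
jq '."ac:signalAudioFeature"[] | {type: ."@type", value: ."afo:value", confidence: ."afo:confidence"}' analysis.jsonld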

Build the docker image locally

There is no need to build the Docker image locally because Docker will automatically retrieve it from the remote Docker Hub. However, if you need a custom version of the image, you can also build it locally using the Dockerfile in this repository. Use the following command:

docker build -t mtgupf/ac-audio-extractor:v3 .
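
If you do build a custom version, you can tag it with any name you like and run the resulting image exactly as described above; my-ac-extractor below is just a hypothetical local tag:

docker build -t my-ac-extractor .
docker run -it --rm -v `pwd`:/tmp my-ac-extractor -i /tmp/audio.wav -o /tmp/analysis.json -st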

Pushing the image to MTG's Docker Hub

The pre-built image for the Audio Commons annotation tools is hosted in MTG's Docker Hub account. To push a new version of the image, use the following command (changing the tag if needed):

docker push mtgupf/ac-audio-extractor:v3

This is only meant for the admins/maintainers of the image. You'll need a Docker account with write access to MTG's Docker Hub space.

Included audio features

Audio file properties

These audio features are always computed and include:

  • duration: Duration of the audio file in seconds.
  • lossless: Whether the audio file uses a lossless codec (true or false).
  • codec: Audio codec.
  • bitrate: Bit rate.
  • samplerate: Sample rate in Hz.
  • channels: Number of audio channels.
  • audio_md5: The MD5 checksum of the raw, undecoded audio payload. It can be used as a unique identifier of the audio content.
  • filesize: Size of the file in bytes.

Dynamics

These audio features are always computed and include:

  • loudness: The integrated (overall) loudness (LUFS) measured using the EBU R128 standard.
  • dynamic_range: Loudness range (dB, LU) measured using the EBU R128 standard.
  • temporal_centroid: Temporal centroid (sec.) of the audio signal. It is the point in time that serves as the balancing point of the sound event's energy.
  • log_attack_time: The log (base 10) of the attack time of a signal envelope. The attack time is defined as the time duration from when the sound becomes perceptually audible to when it reaches its maximum intensity.
  • single_event: Whether the audio file contains one single audio event or more than one (true or false). This computation is based on the loudness of the signal and does not do any frequency analysis.

Music samples and music pieces

These audio features are only computed when using the -m or -s flags and include:

  • tempo: BPM value estimated by a beat tracking algorithm.
  • tempo_confidence: Reliability of the tempo estimation above (in a range between 0 and 1).
  • loop: Whether the audio file is loopable (true or false).
  • tonality: Key value estimated by a key detection algorithm.
  • tonality_confidence: Reliability of the key estimation above (in a range between 0 and 1).

Music samples

These audio features are only computed when using the -s flag and include:

  • note_name: Pitch note name based on median of estimated fundamental frequency.
  • note_midi: MIDI value corresponding to the estimated note.
  • note_frequency: Frequency corresponding to the estimated note.
  • note_confidence: Reliability of the note name/midi/frequency estimation above (in a range between 0 and 1).
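
As a quick sanity check on these three values, the nominal equal-temperament frequency of a MIDI note can be computed as below, assuming the usual A4 = 440 Hz reference (an assumption on our part, not something the extractor documents). The reported note_frequency is derived from the measured fundamental, so it will normally differ slightly from this nominal value (592.68 Hz vs. a nominal 587.33 Hz for D5 in the example output above):

# Nominal equal-temperament frequency of a MIDI note (A4 = 440 Hz reference);
# for note_midi = 74 (D5) this prints 587.33 Hz
awk 'BEGIN { midi = 74; printf "%.2f Hz\n", 440 * 2 ^ ((midi - 69) / 12) }'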

Music pieces

These audio features are only computed when using the -m flag and include:

  • genre: Music genre of the analysed music track (not yet implemented).
  • mood: Mood estimation for the analysed music track (not yet implemented).

Timbre models

As described in deliverable D5.2, a number of timbre models have been developed and are included in this tool. Timbre models estimate perceptual qualities of sounds that tend to be quite subjective and ill-defined. These audio features are only computed when using the -t flag and include:

  • brightness: brightness of the analyzed audio on a scale from 0 to 100. A bright sound is one that is clear/vibrant and/or contains significant high-pitched elements.
  • hardness: hardness of the analyzed audio on a scale from 0 to 100. A hard sound is one that conveys the sense of having been made (i) by something solid, firm or rigid, or (ii) with a great deal of force.
  • depth: depth of the analyzed audio on a scale from 0 to 100. A deep sound is one that conveys the sense of having been made far down below the surface of its source.
  • roughness: roughness of the analyzed audio on a scale from 0 to 100. A rough sound is one that has an uneven or irregular sonic texture.
  • boominess: boominess of the analyzed sound on a scale from 0 to 100.
  • warmth: warmth of the analyzed sound on a scale from 0 to 100.
  • sharpness: sharpness of the analyzed sound on a scale from 0 to 100.
  • reverb: true if the signal has reverb, false otherwise.
