xception1d's Introduction

Xception-1-dimensional

An implementation of a neural network architecture that solves limited-vocabulary speech recognition tasks out of the box. It is based on the Xception architecture, presented by François Chollet in 2017.

  • It achieved state-of-the-art results on the Google TensorFlow Speech Commands dataset, surpassing human performance in the most complex tasks.
  • It always works in the time domain, without needing tedious and computationally expensive Fourier transforms.
  • It can be easily adapted to variable-size audio clips and to different tasks.
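
Xception is built around depthwise separable convolutions, and a one-dimensional variant can apply them directly to the raw waveform. The following is a minimal NumPy sketch of that general idea, not the code in this repository: the function names, the array shapes and the use of global average pooling to handle clips of different lengths are all assumptions made for illustration.

```python
import numpy as np


def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """Illustrative 1D depthwise separable convolution ("valid" padding).

    x:                 (time_steps, channels)
    depthwise_kernels: (kernel_size, channels), one filter per input channel
    pointwise_weights: (channels, out_channels), a 1x1 convolution
    """
    time_steps, channels = x.shape
    kernel_size = depthwise_kernels.shape[0]
    out_steps = time_steps - kernel_size + 1
    depthwise_out = np.empty((out_steps, channels))
    for c in range(channels):
        # Slide each channel's own kernel along the time axis (no kernel flip).
        depthwise_out[:, c] = np.correlate(
            x[:, c], depthwise_kernels[:, c], mode="valid"
        )
    # Mix the channels at every time step with a 1x1 (pointwise) convolution.
    return depthwise_out @ pointwise_weights


def global_average_pool(features):
    """Collapse the time axis so clips of any length give a fixed-size vector."""
    return features.mean(axis=0)


rng = np.random.default_rng(0)
kernels = rng.normal(size=(3, 2))   # hypothetical sizes, chosen for the demo
pointwise = rng.normal(size=(2, 4))

for clip_length in (10, 20):        # two clips of different duration
    clip = rng.normal(size=(clip_length, 2))
    features = depthwise_separable_conv1d(clip, kernels, pointwise)
    print(clip_length, features.shape, global_average_pool(features).shape)
```

Because the same kernels slide over the time axis regardless of its length, only the pooling step has to absorb the difference in clip duration.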

We suggest this architecture as the default choice for restricted-vocabulary voice command recognition tasks, provided that computing power is not a limiting factor.

Getting started

If you are interested in running the code, follow these steps.

  1. Clone the repository
  2. Navigate with your terminal inside the folder of the project and install the required libraries using the following command: pip install -r requirements.txt
  3. Use the settings_template.json file in the root of the project as a template for creating a settings.json file and fill it with your configuration.
  4. The directory config contains the settings for reproducing the results submitted with the paper. Choose one, select a seed and run it using the following command: python main.py [config filepath] [seed]

The following seeds were used to generate the current results: 655321, 655322, 655323, 655324, 655325. Feel free to create new settings and store them in the config directory to try new parameters.
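
To repeat a run for every listed seed, the command from step 4 can be generated in a loop. This is a small sketch under stated assumptions: the path config/example.json is a hypothetical placeholder for whichever file you pick from the config directory, and the helper names are not part of the project.

```python
import subprocess
import sys

# Seeds listed in this README.
SEEDS = [655321, 655322, 655323, 655324, 655325]


def build_commands(config_path, seeds=SEEDS):
    """Return one `python main.py [config filepath] [seed]` command per seed."""
    return [[sys.executable, "main.py", config_path, str(seed)] for seed in seeds]


def run_all(config_path, seeds=SEEDS):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_commands(config_path, seeds):
        subprocess.run(command, check=True)


# "config/example.json" is a hypothetical path; replace it with a real config file.
for command in build_commands("config/example.json"):
    print(" ".join(command))
```

Calling run_all from the root of the repository would then launch the five runs sequentially.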

Contribution

If you wish to contribute in any way, please submit a pull request.

xception1d's People

Contributors

ivallesp

xception1d's Issues

In the file data_tools.py, does the function add_noise_to_wavfile have an error?

def add_noise_to_wavfile(wav, amplitude_factor, clip_to_original_range=False):
    max_wav = max(wav)
    min_wav = min(wav)
    noise = np.random.rand(*wav.shape) * (max_wav - min_wav) - min_wav
    wav = wav + amplitude_factor * noise
    if clip_to_original_range:
        wav = np.clip(wav, min_wav, max_wav)
    return wav

Should the calculation of noise be
noise = np.random.rand(*wav.shape) * (max_wav - min_wav) + min_wav?

The range of np.random.rand(*wav.shape) is [0, 1), and the range of noise is expected to be [min_wav, max_wav), so should the third term be + min_wav?
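
The question can be checked numerically. The sketch below is an assumption-laden illustration, not project code: the helper uniform_noise, its offset_sign parameter and the toy signal are made up for the comparison. It draws noise with both versions of the offset and prints the observed range.

```python
import numpy as np


def uniform_noise(wav, rng, offset_sign=+1):
    """Uniform noise scaled to the span of `wav`.

    offset_sign=+1 uses `+ min_wav` (the proposed fix);
    offset_sign=-1 reproduces the `- min_wav` of the quoted function.
    """
    max_wav = wav.max()
    min_wav = wav.min()
    return rng.random(wav.shape) * (max_wav - min_wav) + offset_sign * min_wav


rng = np.random.default_rng(0)
# Toy signal with min_wav = -0.5 and max_wav = 1.0, repeated to get many samples.
wav = np.tile(np.array([-0.5, 0.0, 0.25, 1.0]), 1000)

proposed = uniform_noise(wav, rng, offset_sign=+1)
quoted = uniform_noise(wav, rng, offset_sign=-1)

# With `+ min_wav` the noise stays inside [min_wav, max_wav).
print("proposed:", proposed.min(), proposed.max())
# With `- min_wav` the noise lies in [-min_wav, max_wav - 2 * min_wav).
print("quoted:  ", quoted.min(), quoted.max())
```

For this toy signal the proposed version stays within [-0.5, 1.0), while the quoted version is shifted to [0.5, 2.0), which supports the reading that the offset should be + min_wav when min_wav is negative.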
