Eyra

Eyra provides data-gathering tools for building speech corpora for under-resourced languages.

This project is now being maintained at the cadia-lvl/Eyra fork.

The team at Reykjavik University published an article on this software for the SLTU 2016 conference, which can be found online and in this repository at Docs/Petursson_et_al_2016.pdf.

Installation

Currently, the Eyra backend has to be run on Linux. We have mainly used Debian 8 and, to a lesser degree, Ubuntu 14.04 and 16.04.

The recording devices themselves (phones, laptops, anything with a compatible browser) can use Chrome or Firefox. On phones, however, we recommend our Android app (located in AndroidApp). It bypasses a nasty bug we discovered: audio recorded through a phone's browser is 48 kHz, but the data in it appears to be limited to 16 kHz.

If you want Quality Control (QC) to work, you need to install Kaldi and more; see DEVELOPER.md for how to set it up.

Laptop installation (local wifi)

Set up a laptop that the phones (recording devices) can connect to in an offline setting.

  • Backend (on the laptop)
    You need an internet connection for the initial setup. Run
    ./Setup/setup.sh --all and then
    sudo service apache2 restart

    Warning: Running ./Setup/setup.sh --all or ./Setup/setup.sh --ap disables your wifi while setting up the access point. If this is not what you want, you can re-enable wifi as follows:

    • sudo nano /etc/NetworkManager/NetworkManager.conf -> change managed=false to managed=true
    • sudo service network-manager restart

    Wifi should now work again.

  • Client-side (on the devices)

    Optionally, you can go to Settings->Set instructor if you want to link this device to a certain field worker/instructor.

Internet installation (e.g. for crowdsourcing)

Set up a server (we use Apache).

  • Backend
    Run ./Setup/setup.sh --all --no-ap and then
    sudo service apache2 restart

    You might want to look at Setup/src/apache/tmpl/etc_apache2_sites-available_datatool.conf (src) or /etc/apache2/sites-enabled/datool.conf (generated) and, for example, adjust the parameters for the mpm_worker_module.

    The laptop setup uses a self-signed certificate (which has to be copied to and installed on the phones manually), but the internet setup should use a real certificate (the procedure depends on which certificate you use). We used Let's Encrypt for a free certificate. This has to be done manually.

    This should work on both Debian 8 Jessie and Ubuntu Server 14.04.

  • Client-side
    Same as the laptop installation, except that there is no need to install the certificates manually. The link to your server depends on where you host it; you might need to change it in the Android app code (see DEVELOPER.md for details).

Usage

Eyra is not perfect software; see the issues on GitHub (e.g. those labeled bugs) for known problems. If you fix something, please contribute!

Basic usage:

In the GUI

  1. Hit begin
  2. Type your username (anything)
  3. Enter your info (gender, etc.)
  4. Hit Rec to start recording and display a prompt
  5. [optional] Hit Skip to skip this prompt and immediately start the next one
  6. Hit Stop when you have read the prompt

See Docs/UserGuideInstructions.pdf. Example instructions on recording offline can be found at Docs/DataUploadingInstructions.pdf.

If you require your users to give consent for their recordings to be used, you can look at an example participant agreement used at RU at Docs/EXAMPLE_PARTICIPANTAGREEMENT.pdf. This is only an example, and you should have your lawyers look over your own agreement.

More details about the software and its usage can be found in DEVELOPER.md.

Contributing

See CONTRIBUTING.md. A list of contributors with contact info can be found in the CONTRIBUTORS file.

Credits

This project wouldn't have been possible without the cooperation of

  • Google

Project head:

  • Jón Guðnason

Original developers at Reykjavik University:

  • Matthías Pétursson
  • Róbert Kjaran
  • Simon Klüpfel
  • Sveinn Ernstsson

Many thanks to the people at Google:

  • Oddur Kjartansson
  • Linne Ha
  • Martin Jansche
  • and more

Additional developers

  • Judy Fong
  • Stefán Gunnlaugur Jónsson

Technical Writer

  • Judy Fong

License

This software is licensed under the Apache License, Version 2.0, as stated in the LICENCE document. Some parts of the software are licensed under the MIT licence or other open licences. A non-exhaustive list can be found in the NOTICE document.

eyra's Issues

Fix loading for sites

$rootScope.isLoaded = true
currently stays true after the first view you visit. Originally it was $scope.isLoaded and therefore worked as intended: it showed the loading screen on pages with long Angular processing.
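
One possible fix keeps the flag on $rootScope but resets it whenever navigation starts, so that each view has to set it to true again. The following is a minimal sketch, not the project's actual code: it assumes the frontend routes with ngRoute (which broadcasts '$routeChangeStart' on $rootScope), and the helper name resetLoadedOnRouteChange is hypothetical.

```javascript
// Hypothetical helper: reset the shared loading flag at the start of every
// route change, so the loading screen is shown until the new view's
// controller sets isLoaded back to true.
// Assumption: routing is done with ngRoute, which broadcasts
// '$routeChangeStart' on $rootScope.
function resetLoadedOnRouteChange($rootScope) {
  // $on returns a deregistration function; pass it back to the caller.
  return $rootScope.$on('$routeChangeStart', function () {
    $rootScope.isLoaded = false;
  });
}

// In an AngularJS app this could be wired up in a run block, e.g.:
//   angular.module('app').run(['$rootScope', resetLoadedOnRouteChange]);
// ('app' is a placeholder module name.)
```

With this in place, each controller would still set $rootScope.isLoaded = true once its own processing is done.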

Couple of minor frontend improvements

  • In a less developer-oriented version, logger.error should not log anything
    • Alongside this, change some errors to logs.
  • Create a js/ folder for JavaScript
    • Volume meter, recorderjs, services, controllers, etc.
  • Remove alerts
  • Rename minFreeTokenIdx to highest used token idx
    • Refactor. Very minor, a single variable.
  • Use jshint on all JavaScript
  • Simplify the message shown to users who are already in the database but did not click 'Done this before?'
  • Combine the start and speaker-info pages if possible
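
The logging item at the top of this list could look roughly like the following. It is a sketch under assumptions: the factory name createLogger, the devMode flag, and the optional sink argument are all hypothetical, and the real logger service may have a different shape.

```javascript
// Hypothetical logger factory. In developer mode, messages are forwarded to
// the sink (the console by default); otherwise log() and error() do nothing.
function createLogger(devMode, sink) {
  var out = sink || console;
  function noop() {}
  return {
    log: devMode ? function (msg) { out.log(msg); } : noop,
    error: devMode ? function (msg) { out.error(msg); } : noop
  };
}
```

Call sites stay unchanged; only the place where the logger is created decides whether anything is written.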

Instant QC

Look into what can be done about Quality Control on the frontend (e.g. detecting low volume or clipping).
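
As a starting point, both checks can be computed directly from the raw samples. The sketch below assumes mono samples as floats in [-1, 1] (the format the Web Audio API uses); the function name analyzeSamples and the default thresholds are illustrative assumptions, not values taken from Eyra.

```javascript
// Compute simple quality indicators for one recording.
// samples:   array-like of floats in [-1, 1] (e.g. a Float32Array).
// clipLevel: absolute amplitude at or above which a sample counts as clipped.
// minRms:    RMS level below which the recording is flagged as too quiet.
function analyzeSamples(samples, clipLevel, minRms) {
  clipLevel = clipLevel === undefined ? 0.99 : clipLevel; // assumed default
  minRms = minRms === undefined ? 0.01 : minRms;          // assumed default
  var sumSquares = 0;
  var peak = 0;
  var clipped = 0;
  for (var i = 0; i < samples.length; i++) {
    var a = Math.abs(samples[i]);
    sumSquares += a * a;
    if (a > peak) peak = a;
    if (a >= clipLevel) clipped++;
  }
  var n = samples.length;
  var rms = n > 0 ? Math.sqrt(sumSquares / n) : 0;
  return {
    rms: rms,                                   // overall loudness
    peak: peak,                                 // largest absolute sample
    clippedFraction: n > 0 ? clipped / n : 0,   // share of clipped samples
    tooQuiet: rms < minRms
  };
}
```

In a browser, the samples could come from AudioBuffer.getChannelData(0), and the result could be used to ask the speaker to repeat a prompt before the recording is uploaded.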

Fix volume meter on back click.

In a browser, the volume meter doesn't work if you click back and enter again. Can't remember exactly why or if that is the exact description, but something in that direction.

Fix audio playback issue. On some phones volume is extremely low through the android app

  • Here is an idea: Some phones simply have poor playback unrelated to any app. Letting the app fix things is a bit heavy-handed. Might it not be better to attempt to fix these issues by other means first, such as: https://www.androidpit.com/improve-the-sound-and-volume-quality-on-android - Sveinn
  • Which phones? Has it been established that this is not a phone issue? How was this issue established? We need a phone that works well and a phone that performs below standard. Does this issue affect recording, thus impairing data quality? Is the user experience sufficiently impaired, or affected at all? - Sveinn

Bug with 1 prompt missing per .ark file [QC]

It appears that the last prompt of the files used to generate the decoding graphs is dropped from the .ark and .scp files. If you split them with GNU split (as in genGraphs.sh), it happens with each file.

Remove Flask-MySQLdb

Remove Flask-MySQLdb and use MySQLdb directly; I think there is no need for the Flask extension (it has low usage on GitHub).

Trim down Kaldi -> Koldi

  • We don't need the world, just a few binaries, and the few scripts needed to train a simple system. Could just replace the Makefile...
  • Create a patch that changes the Makefile. Simon already did this for the tools/ directory (with sed though). We also don't really want to compile with debugging symbols (-g) and we want to compile optimized binaries (-O2 or -O3). OpenBLAS should also be an APT dependency, not compiled straight from Github. - Robert
