Code Monkey home page Code Monkey logo

milijan / docent-learner Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ericwhyne/docent-learner

0.0 2.0 0.0 1.02 MB

A web interface which statelessly and randomly traverses digital collections of images or text and asks users to answer questions or tag the content. This was built and designed for quickly creating domain specific training sets for machine learning tasks.

License: Apache License 2.0

Shell 5.20% HTML 2.97% Python 87.95% JavaScript 3.49% CSS 0.40%

docent-learner's Introduction

docent-learner

schoolmaster

Docent-learner is a web interface which statelessly and randomly traverses digital collections of images, text, or other content and asks users to answer questions or tag the content. This software was designed for quickly creating domain specific training sets for machine learning tasks. It was written in a few hours during a DARPA hackathon and has been iteratively improved as it's applied to different problem sets. The goal of this project is to make deploying functionality for creating training data sets simple, scalable, and provide a clutter-free interface that allows for fast tagging of big data by many users.

  • Docent - A person who acts as a guide, typically on a voluntary basis, in a museum, art gallery, or zoo.
  • Learner - A person who is learning a subject or skill.

Capabilities (mostly integrated):

  • Supports arbitrarily large numbers of tagging users simultaneously
  • Cookies to track tagger's user agent (aka browser) and session
  • Mechanical Turk integration
  • Image tagging
  • Tweet tagging
  • Text selection

Roadmap:

  • The ability to refine and train machine learning models to make suggesions that might help taggers (allow the docent to learn as it goes). This is partially working and uses Vowpal Wabbit.
  • Gamification of tagging by setting up points system or rewards to emotionally and competitively involve taggers.

The easiest way to get started with Docent Learner is to use Vagrant. For more information on Vagrant go to https://www.vagrantup.com/

Entering these commands into a machine with Vagrant will result in a fully functional Docent-Learner install:

git clone https://github.com/ericwhyne/docent-learner.git
cd docent-learner
vagrant up

Then point your browser here: http://127.0.0.1:8081/docent-learner/

To add your own images and tags, ssh into the running Vagrant machine:

vagrant ssh

Then delete all files in /var/www/html/docent-learner/images and replace them with your own images.

Update your questions via the administration interface. Upon provisioning the password to the admin interface is randomized and available in the following file. Each time you provision, the new password will be appended to the file. This password file is in the .gitignore and will never be committed to the github repository through normal pushes or pull requests.

In the vagrant VM:
vagrant@docent-learner:~$ cat /vagrant/admin-password.txt
or on the host machine:
eric@glamdring:~/workspace/docent-learner$ cat admin-password.txt 

As your images become tagged, a json file will be written to disk for each image. To aggregate these results into a single json file, use the aggregate tool in the utils directory.

vagrant@docent-learner:/vagrant/utils$ ./aggregate-images-json.sh 
Filename: 2014-09-27-image-tags.json
vagrant@docent-learner:/vagrant/utils$ cat 2014-09-27-image-tags.json 
[
/var/www/html/docent-learner/images/{ "has_labrador":"true","tagged_image":"dog-with-stick.jpg"},
/var/www/html/docent-learner/images/{ "is_dog":"true","tagged_image":"doxen-race.jpeg"},
/var/www/html/docent-learner/images/{ "has_labrador":"true","tagged_image":"puppies-in-grass.jpeg"}
]
vagrant@docent-learner:/vagrant/utils$ 

As currently configured, the docent Learner interface will only allow an image to be tagged once however if a user clicks the back button on their browser to re-tag an image it will result in two records in the file's json file. How these redundant tags are deconflicted is up to the developer using them during the creation of their machine learning model.

To deploy this software directly on an Ubuntu system:

sudo apt-get -y install apache2
sudo apt-get -y install libapache2-mod-wsgi
./deploy-Ubuntu.sh

After deploying to the ubuntu system, the root interface will be at http://host-ip/docent-learner/ on the standard http port 80.

Deploy with Docker

Enter the docker container

docker run -t -p 80:80 mbartoli/docent-learner /bin/bash

Restart apache2

sudo service apache2 restart

docent-learner will now be running on 127.0.0.1/docent-learner on your host machine

TODO restart apache2 without having to interact with the docker container

docent-learner's People

Contributors

ericwhyne avatar mbartoli avatar yyeric avatar ericwlange avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.