

ybar: Raising the Bar on Data Collection

The repository contains code for a web-based mobile phone application for crowdsourcing the collection of geographically distributed data.


Good data is vital for insight, but it is hard to get, and current tools are often severely limited. Traditional surveys, for instance, collect data only on the people filling out the surveys, and even then only very limited data: self-reports about what people do.

ybar changes that. It is a platform for collecting statistically sound metrics for geographic regions (cities, forests, villages, states, etc.): noise pollution, air pollution, uncollected garbage, bird sightings, trees, average number of potholes per square kilometer, average proportion of women on the street (a measure of their participation in public life), etc.

There are three key innovations:

  1. Statistically sound estimates: Many people propose passive data collection, but it is a flawed idea given imbalances in who downloads the application and where and when they switch it on. Instead, use the software to randomly sample locations (and times) and pay people to go to specific locations at specific times.

  2. Rich and verifiable data: Cheap sensors now let us collect a rich variety of data. We can also verify that the person assigned to collect the data was at a particular location at a particular time, both passively (logged location and time) and actively (for example, a selfie taken at the location).

  3. ML-based backend: Say we ask people to take photos of streets to estimate potholes. Wouldn't it be neat to estimate the average number and average size of potholes automatically? With an ML-based backend trained on crowdsourced data, we can. Over time, we can build many such pipelines.
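
To make the first point concrete, here is a minimal sketch of drawing random (location, time) assignments inside a bounding box. It is an illustration only, not how ybar or geo_sampling does it: the names `Assignment` and `sample_assignments`, the bounding box, and the time window are all hypothetical. Latitude is drawn through its sine so that points are uniform by area rather than bunched toward one edge of the box.

```python
import math
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Assignment:
    """One place and time a worker is asked to visit (hypothetical type)."""
    lat: float
    lon: float
    when: datetime


def sample_assignments(bbox, start, end, n, seed=None):
    """Draw n (location, time) pairs uniformly at random.

    bbox is (lat_min, lon_min, lat_max, lon_max) in degrees;
    start and end bound the time window.
    """
    rng = random.Random(seed)
    lat_min, lon_min, lat_max, lon_max = bbox
    # Sampling sin(latitude) uniformly keeps points uniform by area on a sphere.
    s_min = math.sin(math.radians(lat_min))
    s_max = math.sin(math.radians(lat_max))
    window = (end - start).total_seconds()

    assignments = []
    for _ in range(n):
        lat = math.degrees(math.asin(rng.uniform(s_min, s_max)))
        lon = rng.uniform(lon_min, lon_max)
        when = start + timedelta(seconds=rng.uniform(0, window))
        assignments.append(Assignment(lat, lon, when))
    return assignments


if __name__ == "__main__":
    # Rough, illustrative box around Delhi; not an official boundary.
    delhi_box = (28.40, 76.84, 28.88, 77.35)
    picks = sample_assignments(
        delhi_box, datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 18), n=5, seed=7
    )
    for p in picks:
        print(f"{p.lat:.5f}, {p.lon:.5f} at {p.when:%H:%M}")
```

Because the analyst chooses where and when data is collected, the sample does not depend on who happens to install the application. A real deployment would likely restrict draws to reachable places such as road segments rather than a plain rectangle.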

How Does ybar Work?

The initial spec sheet for the software is a good place to learn how ybar is implemented. While implementing the software, we stumbled upon a few insights, so the final version differs a bit from the spec.

We illustrate a potential workflow supported by the application with a concrete example:

A 'research company' (you) takes a request from a researcher to estimate the proportion of women on the streets in Delhi. You use geo_sampling to draw a sample of locations, and allocator to build daily itineraries for the people who work for you. You then create tasks and post them to the application. A worker accepts a task, takes pictures in the manner the task specifies (the instructions have to be very precise: what angle, what height, etc.), submits the task for approval to the administrator (you), and gets paid once the submission is approved. Once all the tasks are done, you link the collected images either to an ML pipeline or to M-Turk to code the images.
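
The task lifecycle in this workflow (post, accept, submit, approve) can be sketched as a small state machine. This is a hedged illustration under assumptions, not ybar's actual data model: the `Task` and `Status` names, the 50 metre and 15 minute tolerances, and the idea of checking a photo's recorded position against the assigned one are all hypothetical. Payment is left out.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    POSTED = "posted"
    ACCEPTED = "accepted"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


@dataclass
class Task:
    lat: float       # assigned latitude in degrees
    lon: float       # assigned longitude in degrees
    when: datetime   # assigned time
    status: Status = Status.POSTED
    photo_meta: tuple = None  # (lat, lon, time) recorded with the submission

    def accept(self):
        if self.status is not Status.POSTED:
            raise ValueError("only a posted task can be accepted")
        self.status = Status.ACCEPTED

    def submit(self, photo_lat, photo_lon, photo_time):
        if self.status is not Status.ACCEPTED:
            raise ValueError("only an accepted task can be submitted")
        self.photo_meta = (photo_lat, photo_lon, photo_time)
        self.status = Status.SUBMITTED

    def review(self, max_m=50, max_delay=timedelta(minutes=15)):
        """Approve only if the photo was taken near the assigned place and time."""
        if self.status is not Status.SUBMITTED:
            raise ValueError("only a submitted task can be reviewed")
        lat, lon, taken = self.photo_meta
        near = haversine_m(self.lat, self.lon, lat, lon) <= max_m
        on_time = abs(taken - self.when) <= max_delay
        self.status = Status.APPROVED if near and on_time else Status.REJECTED
        return self.status
```

A task would be created from a sampled location and time, accepted by a worker, submitted with the photo's recorded position and timestamp, and then reviewed by the administrator. An automatic check like `review` could only ever be a first filter; the source describes approval as the administrator's decision.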

Other Potential Applications

License

The code relies heavily on other open source software, and the licensing restrictions of that software apply. Our own amendments are released under the MIT License.

ybar's People

Contributors

dependabot[bot], soodoku
