
data-digging's Introduction

Data-digging

This repository contains scripts and documentation related to analyzing classification data from Zooniverse projects. Most content is tailored to Panoptes-based Project Builder projects, but there is also some legacy Ouroboros-based code.

Where do I go to get started?

You have a few choices when getting started analyzing Zooniverse data.

  • Try the Panoptes Aggregation Code (external): Try out the Panoptes Aggregation Code, a software package designed to take project data exports and produce aggregated results.
  • Try Data-digging's general python scripts: The scripts_GeneralPython directory contains multiple Python scripts and Jupyter notebooks to get you started. See the README in that directory for more details.
  • Adapt Existing Scripts: The Data-digging repo contains many examples of data reduction scripts from multiple projects. Depending on your project type and details, find a similar project and adapt its existing scripts to fit your specific use case. Check out the scripts in the scripts_ProjectExamples directory, or browse the library of External Links for analysis scripts hosted in external locations and repos.

Contents

ExternalLinks.md: File with links to external code.

docs: Column descriptions for Panoptes export CSV files.

notebooks_ProcessExports: This directory holds Jupyter notebooks for performing basic parsing of data export CSVs.

scripts_GeneralPython: This directory holds top-level example scripts that are generally applicable to any project. These scripts convert a classification data export CSV into more useful formats and data products. In most cases, these scripts extract information from the compact JSON-formatted “annotations” column data into an easier flat CSV file.
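As a rough illustration of that transformation (not one of the repo's actual scripts), the sketch below pulls a single question-task answer out of the JSON-encoded annotations column with pandas; the filename, the task key `T0`, and the output columns are hypothetical placeholders.

```python
import json
import pandas as pd

# Read the raw Panoptes classifications export (filename is a placeholder).
classifications = pd.read_csv("my-project-classifications.csv")

def first_question_answer(annotations_json, task="T0"):
    """Return the recorded answer for one question task (task key is an assumption)."""
    for task_entry in json.loads(annotations_json):
        if task_entry.get("task") == task:
            return task_entry.get("value")
    return None

# Extract one column of answers from the JSON-encoded annotations column.
classifications["T0_answer"] = classifications["annotations"].apply(first_question_answer)

# Write a flat, one-row-per-classification CSV with the extracted answer.
flat = classifications[["classification_id", "user_name", "subject_ids", "T0_answer"]]
flat.to_csv("my-project-flattened.csv", index=False)
```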

scripts_ProjectExamples: This directory holds project-specific subdirectories, each with scripts and data files.

scripts_Utility: This directory holds Python scripts for one-off tasks.

other_code: This directory holds other contributed code.

data-digging's People

Contributors

aliburchard, bamford, camallen, chelseatroy, ckrawczyk, eatyourgreens, fiona-jones, hughdickinson, jebyrnes, lcjohnso, mcbouslog, mkosmala, philrosenfield, pmasonff, shaunanoordin, trouille, vrooje, wgranger, willettk, zambonee


data-digging's Issues

Generalized Panoptes Reduction Script

This repo contains multiple examples of individually customized, project-specific scripts that all perform a similar transformation:
INPUT = the Panoptes classifications export CSV (with JSON-encoded annotations)
OUTPUT = a flat CSV file containing the extracted marking/classification data in a non-hierarchical format

Using the information provided in the workflow export CSV, one should be able to write a generalized reduction script that could replace many project-specific scripts.
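A minimal sketch of the shape such a generalized script might take, assuming the workflow export's tasks column holds JSON task definitions; the filenames, workflow id, and the restriction to question-style task types are placeholders, not a final design.

```python
import json
import pandas as pd

# Filenames and the workflow id below are hypothetical placeholders.
workflows = pd.read_csv("my-project-workflows.csv")
classifications = pd.read_csv("my-project-classifications.csv")

# Use the workflow export to discover the question tasks (an assumption
# about which task types the project uses).
workflow_row = workflows[workflows["workflow_id"] == 1234].iloc[0]
task_defs = json.loads(workflow_row["tasks"])
question_tasks = [key for key, t in task_defs.items()
                  if t.get("type") in ("single", "multiple")]

def flatten(annotations_json):
    """Map each question task key to its recorded value for one classification."""
    answers = {t.get("task"): t.get("value") for t in json.loads(annotations_json)}
    return pd.Series({key: answers.get(key) for key in question_tasks})

# One column per question task, joined back onto the classification rows.
flat = classifications.join(classifications["annotations"].apply(flatten))
flat.to_csv("my-project-flat.csv", index=False)
```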

Bot finder

It would be useful for projects to be able to identify single-answer prolific classifiers (which are typically bots).

To do: add this to basic_project_stats.py.
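One possible starting point (not the existing basic_project_stats.py logic) is to flag prolific users whose recorded answers never vary; the filename and the classification-count threshold below are assumptions.

```python
import json
import pandas as pd

# Hypothetical filename for a Panoptes classifications export.
classifications = pd.read_csv("my-project-classifications.csv")

def first_answer(annotations_json):
    """Serialize the first task's answer so it can be compared and counted."""
    annotations = json.loads(annotations_json)
    return json.dumps(annotations[0].get("value")) if annotations else None

classifications["answer"] = classifications["annotations"].apply(first_answer)

# Count classifications and distinct answers per user name.
per_user = classifications.groupby("user_name")["answer"].agg(["count", "nunique"])

# Flag prolific single-answer classifiers; the threshold of 500 is arbitrary.
suspected_bots = per_user[(per_user["count"] >= 500) & (per_user["nunique"] == 1)]
print(suspected_bots)
```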

NoPackagesFoundError: Package missing in current Win64 channels: - freetype 2.5.5 1

I am trying to install this Python environment using the yml file. However, I run into the following error:

NoPackagesFoundError: Package missing in current Win64 channels: 
                  - freetype 2.5.5 1

I am using the following command to create the environment with Anaconda 2 (after placing the .yml file in the current working directory):

conda env create -f basic_project_stats.yml

I was not sure what the last line of the yml file, 'prefix', does, but I changed it as follows, pointing to the directory where the Python environments reside on my computer:

 prefix: /Users/Public/Anaconda2/envs/python279_volcrowe

I also tried adding a Win-64 channel, as suggested in a Stack Overflow post (https://stackoverflow.com/questions/38739694/install-python-package-package-missing-in-current-win-64-channels?rq=1), using the following command:

conda config --add channels bioninja

I am able to add the bioninja channel, but when I then run the following command:
conda env create -f environment.yml

It gives me the same Win64 channel error as earlier.

Can anyone please help me solve this issue?

`example_scripts` folder should be deleted after 31 May 2020

Required Cleanup

In April 2020, the Zooniverse sent out an email to project owners informing them that they should visit https://github.com/zooniverse/Data-digging/blob/master/example_scripts/check_for_duplicate_marks.py to get a duplicate-checking script.

Simultaneously in April 2020, we had a big repo cleanup (#51) which fundamentally changed where the files are located and organised.

As a result, we have temporarily reinstated (#53) the /example_scripts folder with a copy of the check_for_duplicate_marks.py script for the sake of URL integrity.

❗ The /example_scripts folder should be deleted after 31 May 2020 as the "old script URL" would have had plenty of time to serve its purpose by then.

cc @mrniaboc @lcjohnso

Repo Reorganization

Adapt the repo for better use and visibility of its content.

  • Reorganize folders and content to sort project-specific examples, general data reduction scripts, and other utility scripts.
  • Explicitly create a list of links to external repos and code, replacing the current stub text files.

Next Notebook(s): extracting a flattened CSV of annotations

  • Read in a pre-processed workflow-id + version classification file
  • Read in the workflow and workflow_contents exports and extract the needed info
  • Produce one flat CSV for question-task annotations and multiple CSVs for drawing-task annotations
  • Use the WWK scripts to do the same for survey tasks
  • Allow separators other than `,`, e.g. `\t` (a rough sketch of the question-task flattening follows below)
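A rough sketch of the question-task flattening step with a configurable separator, assuming a classification export already filtered to one workflow id and version; the filename and task keys are placeholders.

```python
import json
import pandas as pd

# Placeholder: a classification export already filtered to one workflow id + version.
classifications = pd.read_csv("workflow-1234-classifications.csv")

def question_answers(annotations_json, question_keys=("T0", "T1")):
    """Pull out the values of the question tasks (the task keys are assumptions)."""
    answers = {t.get("task"): t.get("value") for t in json.loads(annotations_json)}
    return pd.Series({key: answers.get(key) for key in question_keys})

# One flat table for question-task annotations only.
flat = classifications[["classification_id", "subject_ids", "user_name"]].join(
    classifications["annotations"].apply(question_answers)
)

# Write with a separator other than ",", e.g. a tab.
flat.to_csv("question-annotations.tsv", sep="\t", index=False)
```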
