
sukrutrao / fast-dawid-skene


Code for the algorithms in the paper: Vaibhav B Sinha, Sukrut Rao, Vineeth N Balasubramanian. Fast Dawid-Skene: A Fast Vote Aggregation Scheme for Sentiment Classification. KDD WISDOM 2018

Home Page: https://sites.google.com/view/fast-dawid-skene

License: MIT License

Python 100.00%
crowdsourced-aggregation crowdsourcing expectation-maximization python sentiment-classification

fast-dawid-skene's People

Contributors

sukrutrao


fast-dawid-skene's Issues

Variable number of labelers?

Hi, I tried to test the algorithm on some real-life data. I found the limitation of having a fixed number of labelers a bit impractical. Is this a limitation of this particular implementation, or of the aggregation model itself? What if the user provides a reasonable upper limit on the number of labelers per task (3, 5, or 7)? Thank you!
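
On the model side, the classic Dawid-Skene formulation does not itself require every item to receive the same number of labels. The following is a minimal sketch of classic Dawid-Skene EM (not the Fast variant, and not this repository's code) that works on (item, worker, label) triples, so each item can have any number of labels. The function name, the smoothing constant and the triple layout are assumptions made for illustration.

```python
import numpy as np

def dawid_skene(triples, n_classes, n_iter=20):
    """Classic Dawid-Skene EM over (item, worker, label) triples.

    Items may carry any number of labels; a worker who did not label an
    item simply contributes nothing to that item's posterior.
    """
    items = sorted({i for i, _, _ in triples})
    workers = sorted({w for _, w, _ in triples})
    item_idx = {i: n for n, i in enumerate(items)}
    worker_idx = {w: n for n, w in enumerate(workers)}
    it = np.array([item_idx[i] for i, _, _ in triples])
    wk = np.array([worker_idx[w] for _, w, _ in triples])
    lb = np.array([l for _, _, l in triples])

    # Initialise posteriors T[item, class] from per-item vote fractions.
    T = np.zeros((len(items), n_classes))
    np.add.at(T, (it, lb), 1.0)
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and smoothed per-worker confusion matrices,
        # indexed as conf[worker, true_class, observed_label].
        prior = T.mean(axis=0)
        conf = np.full((len(workers), n_classes, n_classes), 1e-2)
        for c in range(n_classes):
            np.add.at(conf[:, c, :], (wk, lb), T[it, c])
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: sum log-likelihoods over only the labels each item received.
        log_T = np.tile(np.log(prior + 1e-12), (len(items), 1))
        for c in range(n_classes):
            np.add.at(log_T[:, c], it, np.log(conf[wk, c, lb]))
        log_T -= log_T.max(axis=1, keepdims=True)
        T = np.exp(log_T)
        T /= T.sum(axis=1, keepdims=True)
    return items, T

# Hypothetical ragged data: q1 has 3 labels, q2 has 2, q3 has 5, q4 has 1.
example_triples = [
    ("q1", "a", 0), ("q1", "b", 0), ("q1", "c", 0),
    ("q2", "a", 1), ("q2", "b", 1),
    ("q3", "a", 1), ("q3", "b", 1), ("q3", "c", 0), ("q3", "d", 1), ("q3", "e", 1),
    ("q4", "c", 0),
]
example_items, example_posteriors = dawid_skene(example_triples, n_classes=2)
```

Storing the data as triples rather than as a dense items-by-labelers matrix is what removes the need for a fixed labeler count in this sketch.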

pandas Typeerror

Hi,
I have cloned the repo and activated a conda environment based on the requirements file you provide. I use Python 2.7, and when I run

    python scripts/fast_dawid_skene.py --dataset toy --k 2 --mode aggregate --algorithm FDS --print_result

I get this error: TypeError: argument of type 'int' is not iterable. This is the stack trace:

Traceback (most recent call last):
  File "scripts/fast_dawid_skene.py", line 64, in <module>
    main()
  File "scripts/fast_dawid_skene.py", line 58, in main
    run(args)
  File "/home/konstantina/projects/Fast-Dawid-Skene/scripts/../fast_dawid_skene/main.py", line 41, in run
    result_annotations.reset_index(level=0, inplace=True)
  File "/home/konstantina/anaconda3/envs/dawid-skene/lib/python2.7/site-packages/pandas/core/frame.py", line 3055, in reset_index
    if level is None or i in level:
TypeError: argument of type 'int' is not iterable

I changed level=0 to level={0} in main.py and there is no error anymore. Is this an appropriate solution?
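
A set works here because the failing line only performs a membership test (`i in level`). A possibly more conventional variant is to pass the level as a list, which pandas documents as an accepted form. The sketch below uses a made-up frame; the index name and column name are assumptions, not the ones used in main.py.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated result; the real index and
# column names in main.py may differ.
result_annotations = pd.DataFrame(
    {"label": [0, 1, 1]},
    index=pd.Index(["q1", "q2", "q3"], name="question"),
)

# A list supports the `i in level` membership test shown in the traceback
# and is also accepted by current pandas releases.
result_annotations.reset_index(level=[0], inplace=True)
```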

How to print one final label per data instance

Hey,
I am currently running the command

    python scripts/fast_dawid_skene.py --dataset toy --mode aggregate --algorithm FDS --print_result

successfully on my dataset, but when I check the results (either in the output file or in the shell output), I see one final annotation per annotator. I expected one label per document. I looked at the printing code at the end of main.py and at utils.to_csv, but I am unsure what to change. Could you please explain how to get the final labels? Thanks!
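
One possible post-processing workaround, sketched under a strong assumption: that the output really contains one row per (item, annotator) pair and that all rows of an item repeat the same aggregated label. The column names `question` and `label` and the helper name are hypothetical and would need to be adapted to the actual output file.

```python
import pandas as pd

def one_label_per_item(result, item_col="question", label_col="label"):
    """Collapse repeated per-annotator rows to a single row per item."""
    # Assumption: every row of an item already carries the same aggregated
    # label, so keeping the first row per item loses nothing.
    return (
        result.drop_duplicates(subset=item_col)[[item_col, label_col]]
        .reset_index(drop=True)
    )

# Hypothetical output with the aggregated label repeated per annotator.
example_result = pd.DataFrame(
    {
        "question": ["q1", "q1", "q1", "q2", "q2"],
        "annotator": ["a", "b", "c", "a", "b"],
        "label": [0, 0, 0, 1, 1],
    }
)
example_final = one_label_per_item(example_result)
```

If the rows of an item turn out to differ, this assumption does not hold and the collapse above would silently discard information.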

Incremental mode

Hi, I was actually able to use the aggregator, thank you very much!

It squeezed 1.8M responses into 500K labels in 4.5 hours on 1 thread on a server, using 35 GB of memory ;) I think we can adopt the solution, but I need to implement some enhancements to make it more useful in a production scenario. I will share my thoughts here to let you know what would be useful in our real situation:

  • For the MLtoRank scenario we need to constantly get new labels and aggregate new judgements. A 5-hour delay for adding an extra 1000 labels may be too long and burn too much electricity, so we need to learn how to perform an incremental step. That may include the possibility to back up all distributions and other state, prefill new cells with defaults, and perform 1-2 extra iterations.

  • The previous idea may be enhanced by implementing "partial" steps that update only the rows empirically close to the changed ones. After one partial step we can then perform one full step to settle down, if needed.

  • We have a method to order 3 extra marks if the first 3 do not give a confident label, so we will need a confidence level for the chosen label as output in order to decide whether to place an extra order.
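
The confidence-based ordering decision from the last point could look roughly like the sketch below. It assumes the aggregator can expose a per-item posterior distribution over classes; the function name, the array layout and the 0.9 threshold are illustrative assumptions, not part of this repository.

```python
import numpy as np

def label_with_confidence(posteriors, threshold=0.9):
    """Return the chosen label, its confidence, and a re-order flag per item.

    posteriors: array of shape (n_items, n_classes), each row summing to 1.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    labels = posteriors.argmax(axis=1)      # most probable class per item
    confidence = posteriors.max(axis=1)     # its posterior probability
    needs_more = confidence < threshold     # order extra marks for these
    return labels, confidence, needs_more

# Hypothetical posteriors for two items.
example_labels, example_conf, example_flags = label_with_confidence(
    [[0.95, 0.05], [0.60, 0.40]]
)
```

Using the maximum posterior as the confidence is the simplest choice; the margin between the top two classes or the entropy of the row would be alternative criteria.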

And one extra, off-topic point:

  • We actually have binary labels with an extra "gray" option. This is not truly ordinal, because the "gray" option is rare (a few percent): it is allowed in complex situations, and it can also be produced when there are many confident answers for both true and false. I think we can write some sort of heuristic based on the label probability distribution, e.g. calculate P(white) * 0.3 + P(gray) + P(black) * 0.3 and compare it with P(white) and P(black).
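
One reading of this heuristic, as a small sketch: compute the weighted "gray" score and pick whichever of the three scores is largest. The function name, the label strings and the tie-breaking behaviour are assumptions; only the 0.3 weight comes from the description above.

```python
def pick_label(p_white, p_gray, p_black, side_weight=0.3):
    """Choose white, gray or black from a three-way probability distribution."""
    # Gray borrows a fraction of the mass from both confident options.
    gray_score = side_weight * p_white + p_gray + side_weight * p_black
    scores = {"white": p_white, "gray": gray_score, "black": p_black}
    # On ties, max() keeps the first key in insertion order.
    return max(scores, key=scores.get)
```

Since the probabilities sum to 1, the gray score equals 0.3 + 0.7 * P(gray) with this weight, so gray wins only when neither P(white) nor P(black) exceeds that value.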

I would be happy to hear any thoughts regarding this, thank you!
