BioNLP-2016

This repository contains the scripts, code, and word vectors for the ACL BioNLP 2016 workshop paper:

Chiu et al. How to Train Good Word Embeddings for Biomedical NLP

API Package

word2vec: the original word2vec implementation by Mikolov: https://code.google.com/archive/p/word2vec/
wvlib: library for reading word2vec files: https://github.com/spyysalo/wvlib
geniass: library for splitting biomedical text into sentences: http://www.nactem.ac.uk/y-matsu/geniass/
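
For orientation, the layout of a word2vec binary file can be sketched in plain Python. This is not wvlib's API; it is a minimal reader written for illustration, assuming the usual layout written by the original word2vec tool (an ASCII header with vocabulary size and dimension, then each word followed by a space and its vector as little-endian 32-bit floats). The function name `read_word2vec_bin` is hypothetical.

```python
import struct


def read_word2vec_bin(path):
    """Read a word2vec binary file into a {word: [floats]} dict.

    Assumes little-endian float32 values, which is what the original
    tool writes on common x86 machines.
    """
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dimension>\n" in ASCII.
        vocab_size, dim = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            # Each word is a byte string terminated by a single space.
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # skip the newline that ends the previous entry
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="replace")
            # The vector follows as `dim` 32-bit floats.
            values = struct.unpack("<%df" % dim, f.read(4 * dim))
            vectors[word] = list(values)
    return vectors
```

For real work, wvlib or another maintained reader is the better choice; the sketch only shows what such a library has to parse.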

Scripts

pre-process.sh: segment and tokenize input text (e.g. raw PubMed or PMC text)
create_shf_low_text.sh: create lowercased, sentence-shuffled text (input: tokenized text)
createModel.sh: create word2vec .bin files with different parameters
intrinsicEva.sh: run intrinsic evaluation on the UMNSRS and Mayo datasets (input: directory of vectors to test)
ExtrinsicEva.sh: run extrinsic evaluation
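
As a rough illustration of what a model-creation step involves, the sketch below builds the argument list for one run of the original word2vec command-line tool. The flags are those accepted by the word2vec C program; the function name `build_word2vec_command`, the executable path, and all default values are assumptions chosen for the example and are not the settings used in createModel.sh or in the paper.

```python
def build_word2vec_command(train_path, output_path, size=200, window=5,
                           negative=10, sample=1e-4, min_count=5,
                           alpha=0.05, iterations=1, threads=4,
                           skipgram=True, executable="./word2vec"):
    """Return the argument list for one word2vec training run.

    All defaults are illustrative placeholders, not recommended values.
    """
    return [
        executable,
        "-train", train_path,          # tokenized training text
        "-output", output_path,        # where the vectors are written
        "-cbow", "0" if skipgram else "1",
        "-size", str(size),            # vector dimension
        "-window", str(window),        # context window size
        "-negative", str(negative),    # number of negative samples
        "-hs", "0",                    # no hierarchical softmax
        "-sample", str(sample),        # sub-sampling threshold
        "-min-count", str(min_count),  # drop words rarer than this
        "-alpha", str(alpha),          # initial learning rate
        "-iter", str(iterations),
        "-threads", str(threads),
        "-binary", "1",                # write a binary .bin file
    ]


# Example use (requires a compiled word2vec binary at the given path):
#   import subprocess
#   subprocess.run(build_word2vec_command("text.txt", "vectors.bin"), check=True)
```

Building the command as a list makes it easy to loop over one parameter at a time while holding the others fixed.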

Code

Pre-processing:
tokenize_text.py: tokenize text (requires NLTK)
geniass: split text into sentences
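
The lowercasing and sentence shuffling performed by create_shf_low_text.sh (listed under Scripts) can be sketched in a few lines of Python. This is a minimal sketch, assuming the tokenized text has one sentence per line; the function name `lowercase_and_shuffle` and the fixed seed are assumptions made for reproducibility of the example, not details taken from the script.

```python
import random


def lowercase_and_shuffle(sentences, seed=0):
    """Return the sentences lowercased and in a reproducible random order.

    The input list is left unchanged.
    """
    lowered = [s.lower() for s in sentences]
    rng = random.Random(seed)  # private generator, so global state is untouched
    rng.shuffle(lowered)
    return lowered
```

Shuffling at the sentence level keeps each sentence intact as a training context while removing any ordering effect of the source documents.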

Intrinsic evaluation:
evaluate.py: perform intrinsic evaluation
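
Word-pair benchmarks of this kind are typically scored by comparing the cosine similarity of each pair's vectors against the human ratings using Spearman's rank correlation. The sketch below shows that computation in plain Python. It is not the code in evaluate.py; the function names, the handling of ties (average ranks), and the choice to skip pairs with out-of-vocabulary words are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate_similarity(vectors, scored_pairs):
    """Score vectors against (word1, word2, human_score) triples.

    Pairs containing a word missing from `vectors` are skipped.
    Returns (spearman_rho, number_of_pairs_used).
    """
    predicted, gold = [], []
    for w1, w2, score in scored_pairs:
        if w1 in vectors and w2 in vectors:
            predicted.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    return spearman(predicted, gold), len(gold)
```

Reporting the number of pairs actually used alongside the correlation matters, because vocabularies of different sizes cover different fractions of a benchmark.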

Extrinsic evaluation (Keras folder; requires either TensorFlow or Theano):
mlp.py: simple feed-forward neural network
setting.py: parameters for the neural network

Word vectors

https://drive.google.com/open?id=0BzMCqpcgEJgiUWs0ZnU0NlFTam8

License

All data on this page is made available under the Creative Commons Attribution (CC BY) license.

bionlp-2016's People

Contributors

billy322, spyysalo

bionlp-2016's Issues

Input format and command line for BioNLP-2016/keras/data/ner/JNLPBA/tools/evalIOB2.pl?

Dear author,

I have some questions about the input data format and the command line for the Perl program BioNLP-2016/keras/data/ner/JNLPBA/tools/evalIOB2.pl.

The program needs two input files, one with the gold answers and one with the predictions, right?

Should both input files have two columns?
Text \t Bio_tag ?

Is the correct command
perl evalIOB2.pl gold.txt predict.txt ?

Thank you

How do you evaluate your work on the BioNLP dataset?

I'm working on the BioNLP GENIA 2011 dataset, and the problem is that I can only evaluate my predictions on the official website, where I think registration is no longer possible.
Another website also appears to be down, since the manager's last sign-in dates back to 2016.
Furthermore, the Google site for BioNLP 2011 points to a site that is down and may have been relocated.
Please tell me if there is any other way to access the evaluation tool for GE 2011. I would really appreciate it.

How to execute evalIOB2.pl correctly on the command line?

Dear author,

How do I execute evalIOB2.pl correctly on the command line?

Line 25 of the script says:
"evalIOB2.pl answer_file reference_file\n"

But which one is the gold standard and which is the prediction (answer_file or reference_file)?

Thank you
