Reach

What is it?

Reach stands for Reading and Assembling Contextual and Holistic Mechanisms from Text. In plain English, Reach is an information extraction system for the biomedical domain, which aims to read scientific literature and extract cancer signaling pathways. Reach implements a fairly complete extraction pipeline, including: recognition of biochemical entities (proteins, chemicals, etc.); grounding of these entities to known knowledge bases such as UniProt; extraction of BioPAX-like interactions (e.g., phosphorylation, complex assembly, positive/negative regulations); and coreference resolution for both entities and interactions.

Reach is developed using Odin, our open-domain information extraction framework, which is released within our processors repository.

Please scroll down to the bottom of this page for additional resources, including a Reach output visualizer, REST API, and datasets created with Reach.

Licensing

This project is, and will always be, free for research purposes. However, starting with version 1.2, we are using a license that restricts its use for commercial purposes. Please contact us for details.

Changes

Authors

Reach was created by the following members of the CLU lab at the University of Arizona:

Citations

If you use Reach, please cite this paper:

@inproceedings{Valenzuela+:2015aa,
  author    = {Valenzuela-Esc\'{a}rcega, Marco A. and Gustave Hahn-Powell and Thomas Hicks and Mihai Surdeanu},
  title     = {A Domain-independent Rule-based Framework for Event Extraction},
  organization = {ACL-IJCNLP 2015},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP)},
  url = {http://www.aclweb.org/anthology/P/P15/P15-4022.pdf},
  year      = {2015},
  pages = {127--132},
  Note = {Paper available at \url{http://www.aclweb.org/anthology/P/P15/P15-4022.pdf}},
}

More publications from the Reach project are available here.

Installation

This software requires Java 1.8, Scala 2.11, and CoreNLP 3.x or higher.

The jar is available on Maven Central. To use, simply add the following dependency to your pom.xml:

<dependency>
   <groupId>org.clulab</groupId>
   <artifactId>reach_2.11</artifactId>
   <version>1.3.2</version>
</dependency>

The equivalent SBT dependency is:

libraryDependencies ++= Seq(
    "org.clulab" %% "reach" % "1.3.2"
)

How to compile the source code

This is a standard sbt project, so use the usual commands (e.g., sbt compile, sbt assembly) to compile. Add the generated jar files under target/ to your $CLASSPATH, along with the other necessary dependency jars. Take a look at build.sbt to see which dependencies are necessary at runtime.

Running things

Processing a directory of .nxml papers

The most common use of Reach is to process a directory containing one or more papers in .nxml or .csv/.tsv format. To run the system on such a directory, you must create a .conf file; see src/main/resources/application.conf for an example configuration file. Specify the directory containing the files to be processed using the papersDir variable, then run:

sbt "runMain org.clulab.reach.RunReachCLI /path/to/yourapplication.conf"

If the configuration file is omitted, Reach uses the default .conf. That is, the command:

sbt "runMain org.clulab.reach.RunReachCLI"

will run the system using the .conf file under src/main/resources/application.conf.

The interactive shell for rule debugging

sbt "runMain org.clulab.reach.ReachShell"

Enter :help to get a list of available commands.

The sieve-based assembly system

Reach now provides a sieve-based system for assembly of event mentions. While still under development, the system currently supports (1) exact deduplication for both entity and event mentions, (2) unification of mentions through coreference resolution, (3) the reporting of intra- and inter-sentence causal precedence relations (e.g., A causally precedes B) using linguistic features, and (4) a feature-based classifier for causal precedence. Future versions will include additional sieves for causal precedence and improved approximate deduplication.

For more details on the sieve-based assembly system, please refer to the following paper:

@inproceedings{GHP+:2016aa,
  author       = {Gus Hahn-Powell and Dane Bell and Marco A. Valenzuela-Esc\'{a}rcega and Mihai Surdeanu},
  title        = {This before That: Causal Precedence in the Biomedical Domain},
  booktitle    = {Proceedings of the 2016 Workshop on Biomedical Natural Language Processing},
  organization = {Association for Computational Linguistics},
  year         = {2016},
  Note         = {Paper available at \url{https://arxiv.org/abs/1606.08089}}
}

The sieve-based assembly system can be run over a directory of .nxml and/or .csv files:

sbt "runMain org.clulab.reach.RunReachCLI"

In src/main/resources/application.conf, you will need to...

  1. set outputTypes to ["assembly-tsv"]
  2. set your input directory of papers via papersDir
  3. set your output directory via outDir
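As a rough illustration of the three settings above, a configuration fragment might look like the following. This is only a sketch: the two paths are placeholders, and it assumes the three keys sit at the top level of the file. Check the bundled src/main/resources/application.conf for the exact layout and for any other settings your version expects.

```
# Sketch only: both paths are hypothetical placeholders, replace with your own.
# Assumes papersDir, outDir, and outputTypes are top-level keys.

# directory containing the .nxml and/or .csv files to process
papersDir = "/path/to/papers"

# directory where the assembly .tsv files are written
outDir = "/path/to/output"

# request the sieve-based assembly output
outputTypes = ["assembly-tsv"]
```

If you save these settings in a separate file rather than editing the default one, pass its path as the argument to RunReachCLI, as described under "Processing a directory of .nxml papers".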

Currently, two .tsv files are produced for assembly results within each paper:

  1. results meeting MITRE's (March 2016) requirements
  2. results without MITRE's constraints

Two additional output files are produced for assembly results across all papers:

  1. results meeting MITRE's (March 2016) requirements
  2. results without MITRE's constraints

The interactive Assembly shell

You can interactively explore assembly output for various snippets of text using the assembly shell:

sbt "runMain org.clulab.assembly.AssemblyShell"

Modifying the code

Reach builds upon our Odin event extraction framework. If you want to modify event and entity grammars, please refer to Odin's Wiki page for details. Please read the included Odin manual for details on the rule language and the Odin API.

Reach web services

We have developed a series of web services on top of the Reach library. All are freely available here.

Reach datasets

We have generated multiple datasets by reading publications from the open-access PubMed subset using Reach. All datasets are freely available here.

Funding

The development of Reach was funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.
