ebm-nlp's Introduction

This corpus release contains 4,993 documents annotated with (P)articipants, (I)nterventions, and (O)utcomes.
The files included in this release are as follows:

documents/
  Documents are labeled by their PubMed identification number (PMID).
  Each document has two files:
    1) PMID.text - the raw text of the abstract
    2) PMID.tokens - the space-separated tokens (from nltk's punkt tokenizer) that labels are assigned to
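
As a minimal sketch of how a document pair might be read, the helper below loads the raw text and the token list for one PMID. The function name `load_document` and the UTF-8 encoding are assumptions made for illustration; only the `PMID.text` and `PMID.tokens` file names come from the description above.

```python
from pathlib import Path


def load_document(documents_dir, pmid):
    """Return (raw_text, tokens) for one abstract.

    Assumes the files are UTF-8 encoded (not stated in the release notes).
    """
    doc_dir = Path(documents_dir)
    text = (doc_dir / f"{pmid}.text").read_text(encoding="utf-8")
    # Tokens are space-separated; split() also tolerates a trailing newline.
    tokens = (doc_dir / f"{pmid}.tokens").read_text(encoding="utf-8").split()
    return text, tokens
```

The token list, not the raw text, is what the annotation labels line up with, so downstream code would normally work from `tokens`.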

annotations/
  All annotation files are presented as a space-separated list of labels for the corresponding document tokens.
  The first division of annotations is into either:
  1) individual/
      This folder contains every annotation provided by each worker, with multiple annotations per document.
      Annotators are assigned unique worker ID (WID) numbers. The annotation files are labeled as:
        PMID_WID.ann
  2) aggregated/
      This folder contains the aggregated (cleaner, less noisy) annotations with only one file per document
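
Because individual annotation files follow the `PMID_WID.ann` pattern, the workers who annotated each document can be recovered from the file names alone. The sketch below is one way to do that; `group_workers_by_pmid` is a hypothetical helper, and it assumes the PMID itself never contains an underscore.

```python
from collections import defaultdict
from pathlib import Path


def group_workers_by_pmid(filenames):
    """Map each PMID to the list of worker IDs that annotated it."""
    groups = defaultdict(list)
    for name in filenames:
        stem = Path(name).stem  # drop the ".ann" extension
        # Split on the first underscore: PMID on the left, WID on the right.
        pmid, _, wid = stem.partition("_")
        groups[pmid].append(wid)
    return dict(groups)
```

For example, `group_workers_by_pmid(p.name for p in Path(folder).glob("*.ann"))` would cover one folder of individual annotations, where `folder` is whichever directory is being inspected.
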
  Within these folders, the annotations are separated by the two annotation phases:
  1) starting_spans/
      These are the first-phase annotations where workers highlighted spans containing target information.
  2) hierarchical_labels/
      These are the second-phase annotations where workers received the previous PMID_AGGREGATED.ann annotations
      for each document and assigned more specific labels to whichever already-labeled tokens they deemed relevant.
  After this, the files are separated by PICO element and then the train/test partitions. The test folder contains two versions:
  1) test/gold/
      These annotations were collected from medical professionals and are the true target testing set
  2) test/crowd/
      These annotations were collected on Amazon Mechanical Turk (AMT) and used to validate the quality of the crowd-sourced labels
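
The folder nesting described above can be sketched as a path builder plus a reader that pairs tokens with labels. This is an illustration under stated assumptions, not a reference loader: the nesting order (source, phase, element, partition) is a reading of the description above, the helper names are hypothetical, and labels are assumed to be integers.

```python
from pathlib import Path

SOURCES = {"individual", "aggregated"}
PHASES = {"starting_spans", "hierarchical_labels"}
ELEMENTS = {"participants", "interventions", "outcomes"}
SPLITS = {"train", "test/gold", "test/crowd"}


def annotation_dir(release_root, source, phase, element, split):
    """Build the folder path for one slice of the annotations."""
    checks = ((source, SOURCES), (phase, PHASES),
              (element, ELEMENTS), (split, SPLITS))
    for value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"unexpected folder name: {value!r}")
    return Path(release_root) / "annotations" / source / phase / element / split


def read_aligned(tokens, ann_path):
    """Pair each token with its label from a .ann file.

    Assumes labels are whitespace-separated integers, one per token.
    """
    raw = Path(ann_path).read_text(encoding="utf-8").split()
    labels = [int(x) for x in raw]
    if len(labels) != len(tokens):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
    return list(zip(tokens, labels))
```

The length check is there because a label file only makes sense against the token file of the same document; a mismatch usually means the wrong PMID was paired.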

      The label mappings for each PIO element are:

      participants/
        0: No label
        1: Age
        2: Sample size
        3: Condition

      interventions/
        0: No label
        1: Surgical
        2: Physical
        3: Pharmacological
        4: Educational
        5: Psychological
        6: Other
        7: Control

      outcomes/
        0: No label
        1: Physical
        2: Pain
        3: Mortality
        4: Adverse effects
        5: Mental
        6: Other

        Here, two sections of the hierarchy have been collapsed due to frequent cross-labeling:
        
        Mental =
          Mental health
          Mental behavioral impact
          Participant behavior

        Other =
          Satisfaction with care
          Non-health outcomes
          Quality of intervention
          Resource use
          Withdrawal from study

        The full labels are available upon request ([email protected])
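
For convenience, the label mappings listed above can be written as a lookup table and used to turn numeric labels into names. The sketch below copies the IDs exactly as listed in this file; the names `LABELS` and `decode_labels` are hypothetical, and unknown IDs are reported rather than raising, which is a design choice rather than anything specified by the release.

```python
LABELS = {
    "participants": {
        0: "No label", 1: "Age", 2: "Sample size", 3: "Condition",
    },
    "interventions": {
        0: "No label", 1: "Surgical", 2: "Physical", 3: "Pharmacological",
        4: "Educational", 5: "Psychological", 6: "Other", 7: "Control",
    },
    "outcomes": {
        0: "No label", 1: "Physical", 2: "Pain", 3: "Mortality",
        4: "Adverse effects", 5: "Mental", 6: "Other",
    },
}


def decode_labels(element, label_ids):
    """Translate numeric labels for one element into readable names."""
    names = LABELS[element]
    # IDs missing from the table are kept visible instead of failing.
    return [names.get(i, f"Unknown ({i})") for i in label_ids]
```

Note that the same ID means different things per element (for example, 1 is "Age" for participants but "Surgical" for interventions), so the element name always has to travel with the label sequence.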
