pharmaceuticalnlp's Introduction

Information extraction on drug dosage

An information extraction system that labels the following named entities in a natural-language description of the dosage of a pharmaceutical drug:

  • quantity (DOS)
  • measuring unit (UNIT)
  • to whom the treatment is directed (WHO)
  • frequency (FREQ)
  • period (PER)

The model is a conditional random field (CRF) built with crfsuite through its Python bindings. Part-of-speech tags were obtained with nltk. For more details, read the PDF in the Report folder.
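As a rough illustration of how a CRF tagger of this kind usually receives its input, the sketch below turns a POS-tagged sentence into one feature dictionary per token. The feature names, the context window, and the hand-written POS tags are assumptions made for this example; the features the project actually uses are described in the report.

```python
def token_features(tagged, i):
    """Build a feature dict for the token at position i.

    `tagged` is a list of (word, pos_tag) pairs, such as a POS tagger produces.
    The feature set here is illustrative, not the project's own.
    """
    word, pos = tagged[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "pos": pos,
    }
    if i > 0:
        # Context from the previous token
        prev_word, prev_pos = tagged[i - 1]
        features["-1:word.lower"] = prev_word.lower()
        features["-1:pos"] = prev_pos
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tagged) - 1:
        # Context from the next token
        next_word, next_pos = tagged[i + 1]
        features["+1:word.lower"] = next_word.lower()
        features["+1:pos"] = next_pos
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tagged):
    """Return one feature dict per token in the sentence."""
    return [token_features(tagged, i) for i in range(len(tagged))]


# POS tags written by hand for illustration (not produced by nltk here)
tagged_sentence = [("300", "CD"), ("mg", "NN"), ("once", "RB"), ("daily", "RB")]
X = sentence_features(tagged_sentence)
print(X[0])
```

A list of such per-sentence feature lists, paired with lists of labels of the same length, is the input shape that sklearn-crfsuite's CRF estimator is generally trained on.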

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need python3 to run this application. The following modules need to be installed:

nltk
scikit-learn
sklearn-crfsuite

Installing

Assuming you have python3 and pip3 already installed, run the following commands to obtain the needed dependencies:

pip3 install nltk
pip3 install scikit-learn
pip3 install sklearn-crfsuite
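To confirm that the installation worked, a short check like the one below can report which modules are still missing. It is a convenience sketch and not part of the repository; the helper name `missing_modules` is made up for this example, and the list uses the import names of the three packages.

```python
import importlib.util

# Import names of the required packages (the helper below is hypothetical)
REQUIRED_MODULES = ["nltk", "sklearn", "sklearn_crfsuite"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```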

To run the application, open a terminal session and issue the following commands:

git clone https://github.com/costimasca/farmaceuticalNLP.git
cd farmaceuticalNLP
chmod +x dosage.py
./model.py "The recommended oral dosage for adults is 300 mg once daily at bedtime."

You can specify any sentence using different quantities (1, two, 3 to 5, 0.5, etc.) and different measuring units (drops, ml, milligrams, tablespoons, etc.). Feel free to choose one from the corpus.

For the example above, the output is the following label list, with one label per token: (['O', 'O', 'O', 'O', 'O', 'WHO', 'O', 'DOS', 'UNIT', 'FREQ', 'FREQ', 'O', 'O', 'O'])
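To read the label list more easily, it can be aligned with the sentence tokens and adjacent tokens with the same label merged into one entity. The sketch below assumes a tokenization in which the final period is its own token, which yields 14 tokens for the 14 labels; the tokenizer the project actually uses may differ, and `group_entities` is a name introduced only for this example.

```python
from itertools import groupby

# Assumed tokenization of the example sentence (14 tokens for 14 labels)
tokens = ["The", "recommended", "oral", "dosage", "for", "adults", "is",
          "300", "mg", "once", "daily", "at", "bedtime", "."]
labels = ["O", "O", "O", "O", "O", "WHO", "O",
          "DOS", "UNIT", "FREQ", "FREQ", "O", "O", "O"]


def group_entities(tokens, labels):
    """Merge runs of adjacent tokens that share the same non-'O' label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    entities = []
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label != "O":
            entities.append((label, " ".join(tok for tok, _ in run)))
    return entities


entities = group_entities(tokens, labels)
print(entities)
# [('WHO', 'adults'), ('DOS', '300'), ('UNIT', 'mg'), ('FREQ', 'once daily')]
```

Because the labels carry no begin/inside prefixes, two distinct entities of the same type that happen to be adjacent would be merged into one by this grouping.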

Performance

Using 10-fold cross validation:

        precision  recall  f1-measure
PER     0.950      0.918   0.933
WHO     0.944      0.934   0.938
UNIT    0.983      0.956   0.969
DOS     0.981      0.962   0.971
FREQ    0.987      0.982   0.985
avg     0.969      0.942   0.955
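For readers unfamiliar with these metrics, the sketch below computes per-label precision, recall, and F1 from a gold and a predicted label sequence. Token-level scoring with the 'O' label ignored is an assumption made for illustration; the exact evaluation and averaging scheme behind the table is described in the report, and the toy data here does not come from the project's corpus. In k-fold cross validation, scores like these would be computed on each held-out fold and then averaged.

```python
def per_label_scores(gold, predicted, ignore="O"):
    """Return {label: (precision, recall, f1)} for token-level predictions."""
    labels = sorted((set(gold) | set(predicted)) - {ignore})
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Toy data for illustration only
gold = ["O", "DOS", "UNIT", "FREQ", "FREQ", "O"]
predicted = ["O", "DOS", "UNIT", "FREQ", "O", "O"]

scores = per_label_scores(gold, predicted)
for label, (p, r, f) in scores.items():
    print(f"{label:5s} {p:.3f} {r:.3f} {f:.3f}")
```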

Contributors

costimasca
