date-annotator's People

Contributors

clemessien

date-annotator's Issues

ner_deid_subentity_augmented model for location and id sub-entity identification

Added ner_deid_subentity_augmented model for location and id sub-entity identification

I tried several models for annotating ID and location sub-entities. The closest match I came across was the ner_deid_subentity_augmented model. While it detects states and countries, it does a poor job of ID annotation.

For the sample text:

alternate no is 5731111111. Tom's email is [email protected]. My favorite website is
http://www.goal.com. My social security is 886-12-1234 or simpley written as 886121234. I was born on 1st of January 1900. 
Today is 07/09/2021. I saw the doctor at Boone Medical Center.  
A. Record date : 2093-01-13, David Hale, M.D., Name : Hendrickson, Ora MR. # 7194334 
Date : 01/13/93 PCP : Oliveira, 25-year-old, Record date : 1-11-2000. Cocke County Baptist Hospital. 0295 Keats Street. 
Phone +1 (302) 786-5227. Patient's complaints first surfaced when he started working for Brothers Coal-Mine.

The output is:

Entity               begin  end  label           confidence
Wolfgang             11     18   PATIENT         0.9881
Columbia             34     41   COUNTRY         0.9804
Missouri             43     50   STATE           0.6373
886037234            192    200  MEDICAL RECORD  0.8157
1st of January 1900  217    235  DATE            0.64545
07/09/2021           247    256  DATE            1.0
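
To make the output above easier to inspect, here is a minimal sketch in plain Python that parses the rows and flags low-confidence annotations. It does not call Spark NLP. The `Entity` type, the `parse_rows` helper and the 0.8 threshold are hypothetical choices made for illustration, and the sketch assumes that the end offset is inclusive, which the rows above appear to follow.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str
    begin: int
    end: int
    label: str
    confidence: float


# One row: entity text, begin offset, end offset, upper-case label
# (which may contain a space, e.g. "MEDICAL RECORD"), and confidence.
ROW = re.compile(
    r"^(?P<text>.+?)\s+(?P<begin>\d+)\s+(?P<end>\d+)\s+"
    r"(?P<label>[A-Z][A-Z ]*)\s+(?P<conf>\d+(?:\.\d+)?)$"
)

# Rows copied from the output reported above.
OUTPUT = """\
Wolfgang 11 18 PATIENT 0.9881
Columbia 34 41 COUNTRY 0.9804
Missouri 43 50 STATE 0.6373
886037234 192 200 MEDICAL RECORD 0.8157
1st of January 1900 217 235 DATE 0.64545
07/09/2021 247 256 DATE 1.0
"""


def parse_rows(raw: str) -> List[Entity]:
    """Parse whitespace-separated output rows, skipping lines that do not match."""
    entities = []
    for line in raw.splitlines():
        m = ROW.match(line.strip())
        if m:
            entities.append(
                Entity(m["text"], int(m["begin"]), int(m["end"]),
                       m["label"], float(m["conf"]))
            )
    return entities


def span_matches_text(e: Entity) -> bool:
    """True when the inclusive [begin, end] span is as long as the entity text."""
    return e.end - e.begin + 1 == len(e.text)


entities = parse_rows(OUTPUT)
# Hypothetical threshold: anything below 0.8 is worth a manual look.
low_confidence = [e for e in entities if e.confidence < 0.8]

for e in low_confidence:
    print(f"{e.label:<15} {e.text!r} ({e.confidence})")
```

Sorting or grouping the parsed entities by label is a quick way to see which sub-entity types (for example IDs) are missing from the output altogether.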

Progress Update

Steps to install the Spark NLP library

Requirements & Setup

  1. Java 8
  2. ssh server
  3. Apache Spark 3.1.x (or 3.0.x, or 2.4.x, or 2.3.x)
  4. spark-nlp

Run the following commands to install Java

  1. sudo apt-get update
  2. sudo apt-get install openjdk-8-jdk
  3. export JAVA_HOME=path_to_java_home
  4. java -version
    This should return something like this:
    openjdk version "1.8.0_242"
    OpenJDK Runtime Environment (build 1.8.0_242-b09)
    OpenJDK 64-Bit Server VM (build 25.242-b09, mixed mode)
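
The version check in step 4 can also be scripted. The following is a small sketch, not part of the original setup: `parse_java_major` and `installed_java_major` are hypothetical helper names, and the parsing assumes the usual `version "..."` line shown above. Note that `java -version` writes its report to stderr rather than stdout.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Legacy version strings such as "1.8.0_242" map to 8; newer ones
    such as "11.0.2" map to 11. Returns None if no version is found.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if m is None:
        return None
    first, second = int(m.group(1)), m.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` and parse its stderr; None if java is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java major version:", major)
    if major != 8:
        print("Warning: this setup lists Java 8 as a requirement.")
```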

To install ssh server

If SSH is already installed and enabled, skip this step; otherwise, run the following commands:

  • sudo apt-get install openssh-server
  • sudo systemctl enable ssh
  • sudo systemctl start ssh

To install Apache Spark

  1. wget https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
  2. tar xvf spark-*
  3. sudo mv spark-3.0.1-bin-hadoop2.7 /opt/spark
  4. nano ~/.bashrc (add the following lines at the end of the file)
  5. export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    export PYSPARK_PYTHON=/usr/bin/python3
    Then run source ~/.bashrc to apply the changes to the current shell.
  6. To verify that Spark is installed correctly, run the following:
    • start-master.sh
    • start-slave.sh spark://ubuntu1:7077
    • open the following link in your browser: http://127.0.0.1:8080/
    • Then you can kill the process
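
As an alternative to opening the browser, the verification step can be approximated with a short script. This is only a sketch: `is_port_open` is a hypothetical helper, and it assumes the master web UI listens on port 8080 of 127.0.0.1, as in the link above. A successful TCP connection shows that something is listening on that port, not that the Spark master itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the Spark master web UI is served at 127.0.0.1:8080.
    if is_port_open("127.0.0.1", 8080):
        print("Something is listening on port 8080 (likely the Spark master UI).")
    else:
        print("Nothing is listening on port 8080; has start-master.sh been run?")
```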

To install Spark NLP

  1. Run conda install -c johnsnowlabs spark-nlp

  2. Register for a Spark NLP JSL license at https://nlp.johnsnowlabs.com/docs/en/licensed_install

  3. Run the following command: pip install -q spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade

  4. Run pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 to load Spark NLP with pyspark

  5. Run the test script, spark_nlp_test.py, located in the server folder

Challenges encountered

  • I have installed the Spark NLP library along with its dependencies and configured it to work on WSL.
  • I have also added a sample test script that breaks sample text into tokens.
  • However, this script downloads the specified NLP model at runtime, which can take a significant amount of time.

Possible solution

One solution I tried was to download the ML model ahead of time, specifically the "clinical ner" model, which can handle date annotation and de-identification. This is currently a major blocker: this model and similar models do not appear to be accessible, and the download fails with a Permission Denied error.

Alternative solution

As an alternative, I will look into the source code to find where the models that download successfully at runtime are stored, and then try to download them for offline use.
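
As a starting point for locating those models, the sketch below lists whatever is already in the local pretrained-model cache. It assumes Spark NLP's default cache folder, ~/cache_pretrained, which should be confirmed against the installed version and any custom cache setting. `list_cached_models` is a hypothetical helper, and loading a cached model from disk is not shown.

```python
from pathlib import Path
from typing import List, Optional


def list_cached_models(cache_dir: Optional[str] = None) -> List[str]:
    """Return the names of model directories found in the cache folder.

    Assumption: models downloaded at runtime are unpacked as
    sub-directories of ~/cache_pretrained unless a different cache
    folder has been configured.
    """
    root = Path(cache_dir) if cache_dir else Path.home() / "cache_pretrained"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    models = list_cached_models()
    if models:
        print("Cached models:")
        for name in models:
            print("  ", name)
    else:
        print("No cached models found; run the test script once while online.")
```

Running the test script once while online and then listing the cache shows which models were actually fetched and where their files are stored.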
