MedNorm

This repository contains the code for a neural network model that links phrases to their concepts in a dictionary. The model was used to link mentions of drug side effects in online reviews to the corresponding terms in the PT (preferred terms) part of the MedDRA dictionary. The repository also contains a corpus of Russian internet reviews with normalization markup: each phrase from a review is paired with its concept. Weights for the Russian model are available in the Hugging Face repo, and Demo.ipynb demonstrates the trained model: data loading and evaluation on the test set.

Usage

First, install Anaconda, then create and activate the virtual environment:

conda env create -f env.yml
conda activate normalization

Train and test

To train the model, run train.py with the following arguments:

  • -tr - Path to the training data in a simple json format. The file is a list of reviews (python dicts); each review has the nested fields ['objects']['MedEntity']. MedEntity holds a list of entities, each with a text field and a MedDRA field, which contain the text of the phrase and the PT term from the MedDRA dictionary, respectively.
  • -res - Result path, where the model is saved together with the concept vectors and the ConceptVectorizer object, which helps compute embeddings for the dictionary.
  • -args - Path to the configuration file, which holds lr, epochs, batch size and the use_cuda flag. Default: train_args.txt
  • -model - Path to the initial transformer model to be fine-tuned, as listed on https://huggingface.co/. Default: DeepPavlov/rubert-base-cased
  • -dict - MedDRA (or another dictionary) in .asc format, where each line holds a code and a term separated by '$'.
  • -val - Path to the validation data, in the same json format as -tr. Validation data is used for early stopping and for computing metrics. Without validation data, no early stopping is used.
  • -load_pretrained - Path to an already trained model from this repo, with its ConceptVectorizer and concept embeddings, for further fine-tuning (training). When you set this parameter, you don't need to provide "transformer_model_path".
  • -ts - Path to the test data file, in the same json format as -tr, used for evaluation after training.
  • -use_concept_less - If this flag is specified, then during testing all phrases with an empty MedDRA field are given the conceptless label, and the model uses a threshold to decide whether a phrase has a concept in the dictionary or not.
  • --use_cuda - If this flag is specified, the GPU is used.
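
The data formats described for -tr and -dict can be sketched in Python as follows. This is only an illustration under stated assumptions, not code from the repository: the helper names iter_pairs and parse_asc_line are hypothetical, the MedDRA field is assumed to be a plain string, the review text, PT term and dictionary code are invented placeholders, and the dictionary line is assumed to start with the code followed by the term.

```python
import json
import os
import tempfile

# Hypothetical toy review; the phrases and the PT term are invented for illustration.
reviews = [
    {
        "objects": {
            "MedEntity": [
                {"text": "my head was pounding", "MedDRA": "Headache"},
                # An empty MedDRA field marks a phrase without a dictionary concept.
                {"text": "felt strange", "MedDRA": ""},
            ]
        }
    }
]


def iter_pairs(reviews):
    """Yield (phrase, PT term) pairs from the nested review structure."""
    for review in reviews:
        for entity in review["objects"]["MedEntity"]:
            yield entity["text"], entity["MedDRA"]


def parse_asc_line(line):
    """Split one dictionary line into (code, term), assuming a code$term layout."""
    fields = line.rstrip("\n").split("$")
    return fields[0], fields[1]


# Write the reviews to disk as they might be passed to -tr, then read them back.
train_path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(train_path, "w", encoding="utf-8") as f:
    json.dump(reviews, f, ensure_ascii=False)
with open(train_path, encoding="utf-8") as f:
    pairs = list(iter_pairs(json.load(f)))

# Hypothetical dictionary line; the code is a placeholder, not a real MedDRA code.
code, term = parse_asc_line("00000001$Headache$")

print(pairs)
print(code, term)
```

A file written this way could then be passed as the -tr, -val or -ts argument, since all three are described as sharing the same json layout.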

Demonstration and RDRS corpus

A demo on the RDRS drug review dataset is included in Demo.ipynb. The demo data is provided in a simple .csv format with the fields mention, tag, pt code, fold id. Tag is the type of mention: adverse drug reaction or indication of the disease. Fold id assigns each row to one of the 5 parts of the dataset split.
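
A fold-based split over such a file might look like the sketch below. It rests on assumptions rather than on the repository's code: the exact header names, the tag labels ADR and Indication, and the sample rows are all invented for illustration, and split_by_fold is a hypothetical helper.

```python
import csv
import io

# Invented toy rows; header names and tag labels are assumptions based on the field list.
SAMPLE = """mention,tag,pt code,fold id
headache after the first dose,ADR,00000001,0
took it for high blood pressure,Indication,00000002,1
nausea in the morning,ADR,00000003,2
"""


def split_by_fold(rows, test_fold):
    """Hold out one fold for testing and keep the remaining folds for training."""
    train = [row for row in rows if row["fold id"] != test_fold]
    test = [row for row in rows if row["fold id"] == test_fold]
    return train, test


# csv.DictReader returns every value as a string, so fold ids are compared as strings.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train_rows, test_rows = split_by_fold(rows, test_fold="0")

print(len(train_rows), len(test_rows))
```

Repeating the call with each of the five fold ids in turn would give a 5-fold cross-validation loop.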
