Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

Python versions: 3.6.13, 3.8.13, 3.8.16, 3.10.12

We introduce Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle gives biomedical researchers and healthcare professionals an easy-to-use, all-in-one solution that requires minimal programming expertise.

This work, Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation, has recently been accepted by JMIR!

Framework of Ascle

Ascle consists of three modules:

🌟 Generative Functions: For the first time, Ascle includes four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation;

Basic NLP Functions: Ascle consists of 12 essential NLP functions such as word tokenization and sentence segmentation;

Query and Search Capabilities: Ascle provides user-friendly query and search functions on clinical databases.

⚙️ indicates that we provide our own fine-tuned models for this task.
⭐️ indicates that we conducted evaluations for this task.

Updates

29_07_2024 - We uploaded a new folder, Ascle-JPBench, containing open-sourced EN-JP medical task data examples. Ascle-JPBench will support comprehensive tasks such as QA, NLI, and multiple choice.
17_05_2024 - We are currently updating Ascle. In the next version, Ascle will include the question-answering task based on the RAG framework and will support multiple languages for all tasks.
07_11_2023 - New Release v2.2: we renamed the toolkit from EHRKit to Ascle and made it easier to use!
10_07_2023 - New Release v2.0: a large re-organization and improvement from v1.0.
24_05_2023 - New Release Pretrained Models for Machine Translation.
15_03_2022 - Merged the ehrkit folder to support off-the-shelf medical text processing.
10_03_2022 - Made all tests available in an ipynb file and updated the most recent version.
17_12_2021 - Added a new folder, collated_tasks, containing the Fall 2021 functionalities.
11_05_2021 - Cleaned up the notebooks and fixed up the README using depth=1.
04_05_2021 - Tests run-through added in tests.
22_04_2021 - Freezing development.
22_04_2021 - Completed the tutorials and readme.
20_04_2021 - Spring functionality finished -- mimic classification, summarization, and query extraction.

Setup

Download Repository

You can download Ascle as a git repository: clone it into a directory of your choice. A shallow clone (for example, adding --depth 1 to the command below) leaves out old versions and reduces the download size.

git clone https://github.com/Yale-LILY/Ascle.git

Environment

cd Ascle
python3 -m venv asclevir/
source asclevir/bin/activate
pip install -r requirements.txt

NOTE: your Python version may not be compatible with scispacy. If the installation above fails for that reason, you can install scispacy separately with the following commands:

pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_sm-0.5.0.tar.gz

Then you are good to go!
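
Before running the demo, it can help to confirm that the interpreter and packages match the setup above. The following is a minimal sketch that uses only the Python standard library. The helper name check_environment is hypothetical and not part of Ascle, and the version set simply mirrors the Python versions listed at the top of this README.

```python
import importlib.util
import sys

# (major, minor) pairs taken from the Python versions listed in this README.
LISTED_VERSIONS = {(3, 6), (3, 8), (3, 10)}

# Import names to look for: scispacy and the en_core_sci_sm model installed above.
REQUIRED_MODULES = ("scispacy", "en_core_sci_sm")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict describing the current interpreter and installed modules."""
    version = sys.version_info[:2]
    return {
        "python": "{}.{}".format(*version),
        "listed_version": version in LISTED_VERSIONS,
        # True when running inside a virtual environment such as asclevir/.
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # find_spec returns None when a top-level module is not installed.
        "missing": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

An empty "missing" list suggests that the scispacy installation step succeeded. It does not verify the remaining packages in requirements.txt.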

Ascle Demo

We provide various generative functions and basic NLP functions. The quickest way to get started is to run demo.py:

cd Ascle
python demo.py

Note: this may take some time, as some packages will be downloaded.

Load Ascle

from Ascle import Ascle

# create an Ascle instance
med = Ascle()

Text Simplification

# Text Simplification
main_record = """
              The patient presents with symptoms of acute bronchitis,
              including cough, chest congestion, and mild fever.
              Auscultation reveals coarse breath sounds and occasional 
              wheezing. Based on the clinical examination, a diagnosis
              of acute bronchitis is made, and the patient is prescribed 
              a short course of bronchodilators and advised to rest and
              stay hydrated.
              """

# choose the model
layman_model = "ireneli1024/bart-large-elife-finetuned"

med.update_and_delete_main_record(main_record)

# call the text simplification function and print the output
print(med.get_layman_text(layman_model, min_length=20, max_length=70))

>> """
   The patient presents with symptoms of acute bronchitis including
   cough, chest congestion and mild fever. Auscultation reveals coarse 
   breath sounds and occasional wheezing. Based on these symptoms and 
   the patient's history of previous infections with the same condition, 
   the doctor decides that the patient is likely to have a cold or bronch.
   """

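The record above is a triple-quoted string, so it carries the indentation and line breaks of the source file. The sketch below shows one way to collapse that whitespace and simplify several records in a row. It assumes only the two Ascle methods used above (update_and_delete_main_record and get_layman_text), called with the same arguments. The helper names normalize_record and simplify_records are hypothetical, and this README does not state whether the models benefit from whitespace normalization.

```python
def normalize_record(text):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


def simplify_records(med, records, model, min_length=20, max_length=70):
    """Run text simplification on each record and return the outputs in order.

    `med` is assumed to be an Ascle instance as created in "Load Ascle".
    """
    outputs = []
    for record in records:
        # Replace the current main record, as in the example above.
        med.update_and_delete_main_record(normalize_record(record))
        outputs.append(
            med.get_layman_text(model, min_length=min_length, max_length=max_length)
        )
    return outputs
```

With the objects defined earlier, this would be called as simplify_records(med, [main_record], layman_model).
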
Machine Translation

main_record = """
              Myeloid derived suppressor cells (MDSC) are immature myeloid 
              cells with immunosuppressive activity. They accumulate in 
              tumor-bearing mice and humans with different types of cancer, 
              including hepatocellular carcinoma (HCC).
              """
              
med.update_and_delete_main_record(main_record)

# call the machine translation function and print the output
print(med.get_translation_mt5("French"))

>> """
   Les cellules suppressives dérivées de myéloïdes (MDSC) sont des
   cellules myéloïdes immatures ayant une activité immunosuppressive, 
   accumulées chez des souris et des humains ayant différents types de 
   cancer, y compris le carcinome hépatocellulaire (HCC).
   """

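To translate the same record into several languages, the call above can be repeated once per target language. This is a minimal sketch that assumes get_translation_mt5 accepts a language name, as in the example. Only "French" appears in this README, so any other language name passed in is an assumption. The helper translate_record is hypothetical and not part of Ascle.

```python
def translate_record(med, record, languages):
    """Translate one record into each target language.

    Returns a dict mapping each language name to its translated text.
    `med` is assumed to be an Ascle instance as created in "Load Ascle".
    """
    translations = {}
    for language in languages:
        # Reset the main record before every call, in case a previous
        # call changed it (an assumption; the README does not say).
        med.update_and_delete_main_record(record)
        translations[language] = med.get_translation_mt5(language)
    return translations
```

With the objects defined earlier, this would be called as translate_record(med, main_record, ["French"]).
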
Fine-tuned Models

In Ascle, users can access any publicly available language model. Additionally, we provide users with 32 of our fine-tuned models which are suitable for multiple-choice QA, text simplification, and machine translation tasks.

Please feel free to download our fine-tuned models:

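| Tasks | Base Model | Fine-Tuned Data | Huggingface Link |
| --- | --- | --- | --- |
| Multi-choice QA | BioBERT | HEADQA | Download |
| Multi-choice QA | ClinicalBERT | HEADQA | Download |
| Multi-choice QA | SapBERT | HEADQA | Download |
| Multi-choice QA | PubMedBERT | HEADQA | Download |
| Multi-choice QA | GatorTron | HEADQA | Download |
| Multi-choice QA | BioBERT | MedMCQA-w-context | Download |
| Multi-choice QA | ClinicalBERT | MedMCQA-w-context | Download |
| Multi-choice QA | SapBERT | MedMCQA-w-context | Download |
| Multi-choice QA | PubMedBERT | MedMCQA-w-context | Download |
| Multi-choice QA | GatorTron | MedMCQA-w-context | Download |
| Multi-choice QA | BioBERT | MedMCQA-wo-context | Download |
| Multi-choice QA | ClinicalBERT | MedMCQA-wo-context | Download |
| Multi-choice QA | SapBERT | MedMCQA-wo-context | Download |
| Multi-choice QA | PubMedBERT | MedMCQA-wo-context | Download |
| Multi-choice QA | GatorTron | MedMCQA-wo-context | Download |
| Text Simplification | BART | eLife | Download |
| Text Simplification | BioBART | eLife | Download |
| Text Simplification | BigBirdPegasus | eLife | Download |
| Text Simplification | BART | PLOS | Download |
| Text Simplification | BioBART | PLOS | Download |
| Text Simplification | BigBirdPegasus | PLOS | Download |
| Machine Translation | mT5 | UFAL (en_es) | Download |
| Machine Translation | mT5 | UFAL (en_fr) | Download |
| Machine Translation | mT5 | UFAL (en_ro) | Download |
| Machine Translation | mT5 | UFAL (en_cs) | Download |
| Machine Translation | mT5 | UFAL (en_de) | Download |
| Machine Translation | mT5 | UFAL (en_hu) | Download |
| Machine Translation | mT5 | UFAL (en_pl) | Download |
| Machine Translation | mT5 | UFAL (en_sv) | Download |
| Machine Translation | MarianMT | UFAL (en_es) | Download |
| Machine Translation | MarianMT | UFAL (en_fr) | Download |
| Machine Translation | MarianMT | UFAL (en_ro) | Download |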

Get involved

Please create a GitHub issue if you have any questions, suggestions, requests, or bug reports. We welcome PRs!

Acknowledgement

This project started in 2018. Many people have participated and contributed:

Rui Yang*, Qingcheng Zeng*, Keen You*, Yujie Qiao*, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li

Our sincere gratitude also goes to Dr. Edison Marrese-Taylor and Prof. Yutaka Matsuo from the University of Tokyo for their invaluable guidance and support throughout this project.

🕯️ In special memory of Prof. Dragomir Radev, who dedicated so much to this project.

Paper

Please find our paper at https://arxiv.org/abs/2311.16588.

Citation

@misc{yang2023ascle,
      title={Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation}, 
      author={Rui Yang and Qingcheng Zeng and Keen You and Yujie Qiao and Lucas Huang and Chia-Chun Hsieh and Benjamin Rosand and Jeremy Goldwasser and Amisha D Dave and Tiarnan D. L. Keenan and Emily Y Chew and Dragomir Radev and Zhiyong Lu and Hua Xu and Qingyu Chen and Irene Li},
      year={2023},
      doi={10.2196/60601},
      eprint={2311.16588},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

We will continue to maintain and update this repository. If you have any questions, feel free to contact us.
Rui Yang: [email protected]
Dr. Irene Li: [email protected]
