The argos_nlp from esilgard

Give PathTest SVMs confidence levels

use sklearn's confidence outputs (e.g. decision_function, or predict_proba on an SVM fitted with probability=True) so the SVM classifier outputs a varying confidence level per instance

add MM classification

create an algorithm to classify multiple myeloma risk according to the
International Myeloma Working Group (IMWG) Molecular Classification of Multiple Myeloma

http://myeloma.org/ArticlePage.action?articleId=3069

Standard (all others), including:
    any trisomy
    t(11;14)(q13;q32)
Intermediate:
    t(4;14)(p16;q32)
    del 13
    any monosomy
High:
    del 17p13
    t(14;16)(q32;q23)
    t(14;20)
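
A minimal sketch of how the three tiers above could be checked against an ISCN-style karyotype string. The regular expressions and the function name `classify_mm_risk` are illustrative assumptions, not code from argos_nlp, and real reports would need more robust parsing.

```python
import re

# Illustrative patterns over ISCN-style karyotype text (assumptions, not argos_nlp code).
HIGH_RISK = [
    re.compile(r"del\(17\)\(p13"),  # del 17p13
    re.compile(r"t\(14;16\)"),
    re.compile(r"t\(14;20\)"),
]
INTERMEDIATE_RISK = [
    re.compile(r"t\(4;14\)"),
    re.compile(r"del\(13\)"),
    # any monosomy: a "-<chromosome>" item such as ",-13"
    re.compile(r",-(?:\d{1,2}|[XY])(?=[,\[/]|$)"),
]


def classify_mm_risk(karyotype: str) -> str:
    """Return 'High', 'Intermediate' or 'Standard' for a karyotype string."""
    text = karyotype.replace(" ", "")
    # Check the worst tier first so a high-risk finding is never downgraded.
    if any(p.search(text) for p in HIGH_RISK):
        return "High"
    if any(p.search(text) for p in INTERMEDIATE_RISK):
        return "Intermediate"
    return "Standard"


print(classify_mm_risk("46,XX,t(4;14)(p16;q32),del(17)(p13)[20]"))
```

Checking the tiers from highest to lowest means "Standard" falls out as the default for everything else, which matches the "all others" wording.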

oncoplex tests

create algorithm to extract oncoplex and other mutational analysis test results from pathology reports

specimen size

algorithm for extraction of specimen size(s) from pathology reports (oneFieldPerSpecimen)
** this may break out into disease specific algorithms **

prognostic risk classification based on cytogenetics

write in new risk classification algorithms for other disease groups (like MDS) based on karyotypes parsed from genetics reports

NOTE
this also may mean breaking out the AML SWOG classification from classify_heme_category.py (depending on how different the salient variations are from other risk stratifications)

fix multiple line offset bug

fix offsets when karyotype ends up on multiple lines
example
ISCN Diagnosis: 44,XX,t(4;16)(q21;q22),del(5)(q13q33),del(7)(q11.2),-12,add(17)(p11.2), -20[18]/46,XX[2]

currently the engine highlights the next line down:
Summary: POSITIVE for translocation 4;16, deletion 5q, 7q, and 17p, and loss of 12 and 20
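
One possible approach, sketched under assumptions: match the karyotype against the whole report text rather than line by line, so the offsets stay valid when the string wraps. The `ISCN Diagnosis:` label comes from the example above; the stop condition (a blank line or a new `Label:` line) and the name `find_karyotype` are hypothetical.

```python
import re

# Capture everything after the label up to a blank line, a new "Label:" line,
# or the end of the text. DOTALL lets the capture run across line breaks.
KARYOTYPE = re.compile(
    r"ISCN Diagnosis:\s*(.+?)(?=\n\s*\n|\n[A-Za-z ]+:|\Z)", re.DOTALL
)


def find_karyotype(report: str):
    """Return (text, start, end) in report coordinates, or None if absent."""
    m = KARYOTYPE.search(report)
    if m is None:
        return None
    start, end = m.span(1)
    return m.group(1), start, end


report = (
    "ISCN Diagnosis: 44,XX,t(4;16)(q21;q22),del(5)(q13q33),\n"
    "del(7)(q11.2),-12,add(17)(p11.2), -20[18]/46,XX[2]\n"
    "Summary: POSITIVE for translocation 4;16"
)
text, start, end = find_karyotype(report)
print(report[start:end] == text)
```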

fix 'normal' cytogenetics offset bug

fix the character offsets in the 'normal' field (the count of regular XX or XY clones)
(seems to be off by -2)

example:
46,XY,der(21)t(11;21)(q13;q22)[2]/46,XY[18]
currently is highlighting the '6,' before 'XY'
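
A small sketch of taking the offsets straight from the regex match span instead of computing them by hand, which avoids this kind of off-by-two drift. The pattern and the name `find_normal_clone` are assumptions for illustration only.

```python
import re

# A normal clone is "46,XX" or "46,XY" followed directly by its cell count,
# at the start of the karyotype or right after a "/" clone separator.
NORMAL_CLONE = re.compile(r"(?:^|/)(46,X[XY]\[(\d+)\])")


def find_normal_clone(karyotype: str, base_offset: int = 0):
    """Return the cell count and character offsets of the normal clone."""
    m = NORMAL_CLONE.search(karyotype)
    if m is None:
        return None
    start, end = m.span(1)  # span of the clone itself, excluding the "/"
    return {
        "count": int(m.group(2)),
        "start": base_offset + start,
        "end": base_offset + end,
    }


karyotype = "46,XY,der(21)t(11;21)(q13;q22)[2]/46,XY[18]"
hit = find_normal_clone(karyotype)
print(karyotype[hit["start"]:hit["end"]])
```

`base_offset` is the position of the karyotype string within the full report, so the returned offsets can be used for highlighting directly.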

re.search to re.match move

change OneFieldPerSpecimen to use re.match instead of re.search -> driving off text files with regexes instead of strings, for a smaller set of slightly more complex patterns
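
For reference, the practical difference between the two calls: `re.match` only succeeds at the start of the string, so patterns moved to it must account for any leading text themselves. The size pattern and the sample line below are made up for illustration.

```python
import re

# Hypothetical pattern and input line, not taken from argos_nlp.
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*cm")
line = "Tumor size: 2.5 cm"

anchored = SIZE.match(line)   # None: the line does not start with a number
found = SIZE.search(line)     # scans forward and finds "2.5 cm"

# With re.match, the pattern has to consume the leading text explicitly.
prefixed = re.compile(r".*?" + SIZE.pattern).match(line)

print(anchored, found.group(1), prefixed.group(1))
```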

cytogenetics for "all" disease group

write in the ability for the cytogenetics branch to process reports for the "all" (or unknown) disease group parameter
this means either:

writing in a disease group classifier for cytology (could be sparse input)
and/or writing in some kind of general output for karyotype processing (some small subset of salient mutations or types of abnormalities)

automated output value validation

create hook script to automatically sync/check dropdown field values from metadata.json against data_dictionary options for each algorithm/field

add MDS classification

add algorithm to classify prognosis risk based on the "5-group cytogenetic classification of MDS"
(used in IPSS-R http://www.mds-foundation.org/ipss-r-calculator/)

Prognosis      Single                                 Double                  Complex
Very Good      -Y; del(11q)
Good           normal; del(5q); del(20q); del(12p)    including del(5q)
Intermediate   del(7q); +8; i(17q); +19; any other    any
Poor           -7; inv(3)/t(3q)/del(3q)               including -7/del(7q)    any 3
Very Poor                                                                     more than 3
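
The table could be encoded roughly as follows. This is a sketch that assumes the abnormalities have already been parsed and normalized to the shorthand labels used in the table (for example "del(5q)"); the function name `mds_cytogenetic_group` is hypothetical. For a double abnormality containing both del(5q) and -7/del(7q) the table is ambiguous, and the sketch assumes the worse group wins.

```python
VERY_GOOD_SINGLE = {"-Y", "del(11q)"}
GOOD_SINGLE = {"del(5q)", "del(20q)", "del(12p)"}
POOR_SINGLE = {"-7", "inv(3)", "t(3q)", "del(3q)"}


def mds_cytogenetic_group(abnormalities):
    """Map a list of normalized abnormality labels to a prognosis group."""
    found = set(abnormalities)
    n = len(found)
    if n == 0:
        return "Good"  # normal karyotype
    if n == 1:
        (single,) = found
        if single in VERY_GOOD_SINGLE:
            return "Very Good"
        if single in GOOD_SINGLE:
            return "Good"
        if single in POOR_SINGLE:
            return "Poor"
        return "Intermediate"  # del(7q), +8, i(17q), +19, any other
    if n == 2:
        # Assumption: -7/del(7q) takes precedence over del(5q) in a double.
        if found & {"-7", "del(7q)"}:
            return "Poor"
        if "del(5q)" in found:
            return "Good"
        return "Intermediate"
    return "Poor" if n == 3 else "Very Poor"


print(mds_cytogenetic_group(["del(5q)", "+8"]))
```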

cytogenetics duplicate cell count bug

fix bug in cytogenetics parsing that counts similar types of mutations twice
example:
49,XY,der(3)t(3;16)(q12;q22),+8,der(16)inv(16)(p13.1q22)t(3;16)(q12;q22),+21,+22[20]

currently the field '3q' shows a cell count of 40
(adding both the derivative chromosome and the translocation of (3;16))
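
One way to avoid the double count, sketched under assumptions: collect the affected chromosome arms of each clone into a set first, then add the clone's cell count once per arm. The regexes and the name `arm_cell_counts` are illustrative and cover only a few abnormality types.

```python
import re
from collections import Counter

# One clone: its abnormality list followed by the bracketed cell count.
CLONE = re.compile(r"([^/\[\]]+)\[(\d+)\]")
# Rearrangement with breakpoints, e.g. t(3;16)(q12;q22) or inv(16)(p13.1q22).
REARRANGEMENT = re.compile(r"(?:t|del|inv|add|dup|ins)\(([^()]+)\)\(([^()]+)\)")


def arm_cell_counts(karyotype: str) -> Counter:
    """Count cells per affected chromosome arm, once per clone."""
    counts = Counter()
    for abnormalities, cells in CLONE.findall(karyotype.replace(" ", "")):
        arms = set()  # a set, so a repeated t(3;16) contributes only once
        for chroms, bands in REARRANGEMENT.findall(abnormalities):
            for chrom, band in zip(chroms.split(";"), bands.split(";")):
                for arm in set(re.findall(r"[pq]", band)):
                    arms.add(chrom + arm)
        for arm in arms:
            counts[arm] += int(cells)
    return counts


example = ("49,XY,der(3)t(3;16)(q12;q22),+8,"
           "der(16)inv(16)(p13.1q22)t(3;16)(q12;q22),+21,+22[20]")
print(arm_cell_counts(example)["3q"])
```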

tubule formation

pull in algorithm for tubule formation in breast cancer pathology reports (based off oneFieldPerReportML)

clinical branch

bring clinical pipeline from dev into master repo (currently houses prognostic staging algorithm)

fix string cleaning offset bug

fix offset bug produced by stripping off the leading '//' from the karyotype string
(off by -2)
example:
//46,XY[7]

currently highlights '//46,XY[' as the karyotype string
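
A sketch of carrying the offset shift along when the prefix is stripped, so the highlighted span follows the cleaned string. The function name `clean_karyotype` and the sample report line are assumptions for illustration.

```python
def clean_karyotype(raw: str, raw_start: int):
    """Strip leading slashes; return (cleaned, start, end) in report coordinates."""
    cleaned = raw.lstrip("/")
    shift = len(raw) - len(cleaned)  # number of characters removed from the front
    start = raw_start + shift
    return cleaned, start, start + len(cleaned)


report = "Karyotype: //46,XY[7]"
raw = "//46,XY[7]"
cleaned, start, end = clean_karyotype(raw, report.index(raw))
print(report[start:end])
```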

mitotic count

pull in algorithm for mitotic count in breast cancer pathology reports (based off oneFieldPerReportML)

gleason grade

create algorithm to extract Gleason grade from prostate cancer pathology reports (oneFieldPerReport)

nottingham grade

pull in algorithm to classify nottingham grade in breast cancer reports (based off oneFieldPerReportML)

lymph involvement

algorithm for extraction of lymph node involvement from pathology reports (oneFieldPerSpecimen)

ML disease group classification

swap out keyword voting algorithm for a machine learning based classification of pathology reports by disease group (based off the CSS dataset)

change cytogenetics data model so that references to former cell lines (sl, sdl2, etc.) will also collect the character offsets of the previous abnormalities. This means a change in data structure so that each abnormality is assigned character offsets (as opposed to only each clone/cell line).

fix datetime creation

move the datetime creation out of OneFieldPerReport into separate module or class for use by other modules (and other report branches hopefully) and extend to include other potential date string input formats and datetime output formats
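
A minimal sketch of what such a shared helper might look like, using only the standard library. The module layout, the function names, and the list of accepted input formats are assumptions; the formats the reports actually use are not stated here.

```python
from datetime import datetime

# Hypothetical set of accepted input formats, tried in order.
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def parse_date(text, formats=INPUT_FORMATS):
    """Return a datetime for the first format that fits, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
    return None


def format_date(value, fmt="%Y-%m-%d"):
    """Render a datetime in the requested output format."""
    return value.strftime(fmt)


print(format_date(parse_date("03/14/2015")))
```

Keeping the format lists as parameters lets other report branches pass their own formats without touching the helper.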

PSA algorithm

create algorithm to extract PSA from prostate cancer pathology reports (oneFieldPerReport)

smoking status

algorithm for smoking status classification from clinic notes (oneFieldPerPatient)

make cytogenetics pipeline FISH "aware"

add in a field for FISH results (prefix NF) vs regular cytogenetics (prefix NE) to identify testing method

fix insufficient, trisomies, monosomies offset bug

currently there are no character offsets captured for the following fields:
insufficient
trisomies
monosomies
mutations

Using the NLP Engine for Cytogenetics

Similar to pathology, are there example disease groups for Cytogenetics? Like how pathology has 'brain' and 'head_neck'.

On inspecting process.py there is a comment saying "disease specific cytogenetics data dictionary (no general algorithms currently)", but when I look at data_dictionary.txt I can't see any disease specific data.

Apologies in advance, I'm new to the area of health and don't understand most medical terms so I may be missing something basic. Could you please provide a sample usage with sample disease groups for Cytogenetics?

eTOH

algorithm for drinking status classification from clinic notes (oneFieldPerPatient)

automated json schema validation

write a pre-commit hook script to automatically validate output against the json schema
