esilgard / argos_nlp
main repository for the argos/hidra nlp engine
License: Apache License 2.0
use sklearn's fit_transform to turn SVM classifier into estimator and output varying confidence levels per instance
create algorithm to assign a multiple myeloma risk classification according to the
International Myeloma Working Group (IMWG) Molecular Classification of Multiple Myeloma
http://myeloma.org/ArticlePage.action?articleId=3069
create algorithm to extract oncoplex and other mutational analysis test results from pathology reports
algorithm for extraction of specimen size(s) from pathology reports (oneFieldPerSpecimen)
** this may break out into disease specific algorithms **
write in new risk classification algorithms for other disease groups (like MDS) based on karyotypes parsed from genetics reports
NOTE
this also may mean breaking out the AML SWOG classification from classify_heme_category.py (depending on how different the salient variations are from other risk stratifications)
fix offsets when karyotype ends up on multiple lines
example
ISCN Diagnosis: 44,XX,t(4;16)(q21;q22),del(5)(q13q33),del(7)(q11.2),-12,add(17)(p11.2), -20[18]/46,XX[2]
currently the engine's highlighting the next line down
Summary: POSITIVE for translocation 4;16, deletion 5q, 7q, and 17p, and loss of 12 and 20
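One way to keep offsets correct for a wrapped karyotype is to take them from a single regex match over the whole report text rather than from per-line positions. The following is a minimal sketch under assumptions, not the engine's actual code: the `ISCN Diagnosis:` label, the rule that the karyotype ends at the next `Label:` line, and the function name `find_karyotype` are all hypothetical.

```python
import re

# Assumption: the karyotype follows an "ISCN Diagnosis:" label and runs until
# the next line that looks like "Label:" or until the end of the text.
KARYOTYPE = re.compile(
    r"ISCN Diagnosis:[ \t]*(?P<karyo>.+?)(?=\n[ \t]*[A-Za-z ]+:|\Z)",
    re.DOTALL,
)


def find_karyotype(text):
    """Return (start, stop, normalized) for a possibly multi-line karyotype.

    start/stop index into the original text, so a highlighter can use them
    directly; normalized is the same string with all whitespace removed.
    """
    m = KARYOTYPE.search(text)
    if m is None:
        return None
    raw = m.group("karyo")
    if not raw.strip():
        return None
    # Trim surrounding whitespace by moving the offsets, not by re-searching.
    start = m.start("karyo") + (len(raw) - len(raw.lstrip()))
    stop = m.end("karyo") - (len(raw) - len(raw.rstrip()))
    normalized = re.sub(r"\s+", "", text[start:stop])
    return start, stop, normalized
```

Because the offsets come from the match object, line breaks inside the karyotype do not shift them; only the normalized copy has whitespace removed.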
fix the character offsets in the 'normal' field (the count of regular XX or XY clones)
(seems to be off by -2)
example:
46,XY,der(21)t(11;21)(q13;q22)[2]/46,XY[18]
currently is highlighting the '6,' before 'XY'
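A sketch of how the offsets for the normal clone could be read straight from a regex match instead of being computed by hand. It assumes the intended highlight is the sex chromosome pair (`XY` in the example) and that a normal clone is exactly `46,XX` or `46,XY` followed by a bracketed cell count; `find_normal_clone` and `base_offset` are hypothetical names.

```python
import re

# Assumption: a normal clone is "46,XX[n]" or "46,XY[n]", delimited by "/"
# or by the ends of the karyotype string.
NORMAL_CLONE = re.compile(
    r"(?:^|(?<=/))\s*(?P<clone>46,(?P<sex>X[XY])\[(?P<count>\d+)\])\s*(?=/|$)"
)


def find_normal_clone(karyotype, base_offset=0):
    """Return the normal cell count and the offsets of the sex chromosomes.

    base_offset is where the karyotype string starts in the full report.
    """
    m = NORMAL_CLONE.search(karyotype)
    if m is None:
        return None
    start, stop = m.span("sex")  # use m.span("clone") to highlight the clone
    return {
        "count": int(m.group("count")),
        "start": base_offset + start,
        "stop": base_offset + stop,
    }
```

For the example above this yields a count of 18 and a span covering the second `XY`, not the `6,` before it.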
change OneFieldPerSpecimen to use re.match instead of re.search -> driving off text files of regexes instead of strings, for a smaller set of slightly more complex patterns
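To illustrate the difference this change would make, here is a minimal sketch. The one-regex-per-line file format, the `#` comment convention, and the function names are assumptions made for the example, not the repository's actual layout.

```python
import re


def load_patterns(lines):
    """Compile one regex per non-empty line; lines starting with '#' are skipped.

    `lines` can be an open text file or any iterable of strings.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def match_at_start(patterns, text):
    """Return the first match anchored at the start of text, or None.

    re.match only succeeds at position 0, whereas re.search would scan the
    whole string, so each pattern must describe the text from its beginning.
    """
    for pattern in patterns:
        m = pattern.match(text)
        if m:
            return m
    return None
```

With a hypothetical size pattern such as `\s*(\d+(?:\.\d+)?)\s*(cm|mm)\b`, the text `2.5 cm in greatest dimension` matches, while `measures 2.5 cm` does not unless the pattern itself accounts for the leading words.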
write in the ability for the cytogenetics branch to process reports for the "all" (or unknown) disease group parameter
this means either:
create hook script to automatically sync/check dropdown field values from metadata.json against data_dictionary options for each algorithm/field
add algorithm to classify prognosis risk based on the "5-group cytogenetic classification of MDS"
(used in IPSS-R http://www.mds-foundation.org/ipss-r-calculator/)
| Prognosis | Single | Double | Complex |
|---|---|---|---|
| Very Good | -Y; del(11q) | | |
| Good | normal; del(5q); del(20q); del(12p) | including del(5q) | |
| Intermediate | del(7q); +8; i(17q); +19; any other | any | |
| Poor | -7; inv(3)/t(3q)/del(3q) | including -7/del(7q) | any 3 |
| Very Poor | | | more than 3 |
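The table can be read as a lookup on the number of abnormalities plus a few named exceptions. The sketch below assumes the abnormalities have already been parsed into normalized tokens such as `"del(5q)"` or `"-7"` (a hypothetical representation), and it resolves one case the table leaves open: a double that contains both del(5q) and -7/del(7q) is assigned the worse group here.

```python
# Token sets taken from the table; the token spelling is an assumption.
VERY_GOOD_SINGLE = {"-Y", "del(11q)"}
GOOD_SINGLE = {"del(5q)", "del(20q)", "del(12p)"}
POOR_SINGLE = {"-7", "inv(3)", "t(3q)", "del(3q)"}
POOR_DOUBLE = {"-7", "del(7q)"}


def classify_mds_cytogenetics(abnormalities):
    """Map a list of normalized abnormality tokens to a prognosis group."""
    found = set(abnormalities)
    n = len(found)
    if n == 0:
        return "Good"  # normal karyotype
    if n == 1:
        (only,) = found
        if only in VERY_GOOD_SINGLE:
            return "Very Good"
        if only in GOOD_SINGLE:
            return "Good"
        if only in POOR_SINGLE:
            return "Poor"
        return "Intermediate"  # del(7q), +8, i(17q), +19, any other
    if n == 2:
        if found & POOR_DOUBLE:
            return "Poor"  # assumption: the worse group wins on overlap
        if "del(5q)" in found:
            return "Good"
        return "Intermediate"
    return "Poor" if n == 3 else "Very Poor"
```

For example, `classify_mds_cytogenetics(["del(5q)", "+8"])` falls in the Good group, and any four distinct abnormalities fall in Very Poor.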
fix bug in cytogenetics parsing that counts similar types of mutations twice
example:
49,XY,der(3)t(3;16)(q12;q22),+8,der(16)inv(16)(p13.1q22)t(3;16)(q12;q22),+21,+22[20]
currently the field '3q' shows a cell count of 40
(adding both the derivative and the translocation of (3;16))
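One possible fix is to add a clone's cell count to each field at most once, however many abnormalities in that clone touch the field. This sketch assumes a hypothetical intermediate structure, a list of `(fields, cell_count)` pairs per clone, which may not match the engine's data model.

```python
from collections import Counter


def tally_cells(clones):
    """Sum cell counts per field, counting each clone once per field.

    clones: iterable of (fields, cell_count), where fields lists every field
    touched by the clone's abnormalities and may contain repeats.
    """
    totals = Counter()
    for fields, cell_count in clones:
        for field in set(fields):  # set() collapses repeats within a clone
            totals[field] += cell_count
    return totals
```

If the 20-cell clone in the example lists `3q` twice (once for each derivative chromosome involving t(3;16)), the tally for `3q` stays at 20 instead of 40.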
pull in algorithm for tubule formation in breast cancer pathology reports (based off oneFieldPerReportML)
bring clinical pipeline from dev into master repo (currently houses prognostic staging algorithm)
fix offset bug produced by stripping off the leading '//' from the karyotype string
(off by -2)
example:
//46,XY[7]
currently highlights '//46,XY[' as the karyotype string
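A small sketch of stripping the leading slashes while shifting the start offset by the number of characters removed, so the highlighted span still lines up with the report text. The function name and the `start` parameter (the position of the raw string in the report) are hypothetical.

```python
def strip_leading_slashes(karyotype, start):
    """Remove leading '/' characters and return the adjusted offsets.

    Returns (stripped, new_start, new_stop), where new_start and new_stop
    index into the original report text.
    """
    stripped = karyotype.lstrip("/")
    removed = len(karyotype) - len(stripped)
    new_start = start + removed
    return stripped, new_start, new_start + len(stripped)
```

For `//46,XY[7]` two characters are removed, so the start moves forward by two and the span covers `46,XY[7]` only.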
pull in algorithm for mitotic count in breast cancer pathology reports (based off oneFieldPerReportML)
create algorithm to extract gleason grade from prostate cancer pathology reports (oneFieldPerReport)
pull in algorithm to classify nottingham grade in breast cancer reports (based off oneFieldPerReportML)
algorithm for extraction of lymph node involvement from pathology reports (oneFieldPerSpecimen)
swap out keyword voting algorithm for a machine learning based classification of pathology reports by disease group (based off the CSS dataset)
change cytogenetics data model so that references to former cell lines (sl, sdl2, etc.) will also collect the character offsets of the previous abnormalities. This means a change in data structure so that each abnormality is assigned character offsets (as opposed to only each clone/cell line).
move the datetime creation out of OneFieldPerReport into separate module or class for use by other modules (and other report branches hopefully) and extend to include other potential date string input formats and datetime output formats
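A minimal sketch of what such a shared date module might look like, using only the standard library. The list of input formats, the default output format, and the function names are assumptions chosen for illustration; the formats actually present in the reports would need to be confirmed.

```python
from datetime import datetime

# Assumption: candidate input formats, tried in order. "%Y" is listed before
# "%y" so four-digit years are not misread as two-digit years.
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%d-%b-%Y", "%B %d, %Y")


def parse_date(text, formats=INPUT_FORMATS):
    """Return a datetime for the first format that fits, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def format_date(text, out_format="%Y-%m-%d", formats=INPUT_FORMATS):
    """Parse text with any known input format and re-emit it as out_format."""
    parsed = parse_date(text, formats)
    return parsed.strftime(out_format) if parsed else None
```

Keeping the format lists as parameters lets other report branches pass their own inputs and outputs without touching the parsing loop.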
create algorithm to extract PSA from prostate cancer pathology reports (oneFieldPerReport)
algorithm for smoking status classification from clinic notes (oneFieldPerPatient)
add in a field for FISH results (prefix NF) vs regular cytogenetics (prefix NE) to identify testing method
currently there are no character offsets captured for the following fields:
insufficient
trisomies
monosomies
mutations
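For two of the fields listed above, offsets could be captured from regex match spans. The sketch below is a simplification under assumptions: it treats every standalone `+N` item as a trisomy and every standalone `-N` item as a monosomy (a gain is not always a trisomy), and the function name and return structure are hypothetical.

```python
import re

# Assumption: a whole-chromosome gain or loss is its own comma-separated
# item, e.g. "+8" or "-12", followed by a comma or the bracketed cell count.
NUMERIC_ABNORMALITY = re.compile(
    r",\s*(?P<abn>(?P<sign>[+-])(?:\d{1,2}|[XY]))(?=[,\[])"
)


def find_gains_and_losses(karyotype, base_offset=0):
    """Return {'trisomies': [...], 'monosomies': [...]} with offsets.

    Each entry is (token, start, stop); base_offset is where the karyotype
    string begins in the full report text.
    """
    found = {"trisomies": [], "monosomies": []}
    for m in NUMERIC_ABNORMALITY.finditer(karyotype):
        start, stop = m.span("abn")
        key = "trisomies" if m.group("sign") == "+" else "monosomies"
        found[key].append((m.group("abn"), base_offset + start, base_offset + stop))
    return found
```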
Similar to pathology, are there example disease groups for Cytogenetics? Like how pathology has 'brain' and 'head_neck'.
On inspecting process.py
there is a comment saying "disease specific cytogenetics data dictionary (no general algorithms currently)", but when I look at data_dictionary.txt
I can't see any disease specific data.
Apologies in advance, I'm new to the area of health and don't understand most medical terms so I may be missing something basic. Could you please provide a sample usage with sample disease groups for Cytogenetics?
algorithm for drinking status classification from clinic notes (oneFieldPerPatient)
write a pre-commit hook script to automatically validate output against the json schema