embeddings

This repository contains code accompanying publication of the paper:

Y. Choi, Y. Chiu, D. Sontag. Learning Low-Dimensional Representations of Medical Concepts. To appear in Proceedings of the AMIA Summit on Clinical Research Informatics (CRI), 2016.

The base directory contains three files: the two best 300-dimensional embeddings learned in the paper, and the embeddings from previous work that we compared against:

  • claims_codes_hs_300.txt.gz: Embeddings of ICD-9 diagnosis and procedure codes, NDC medication codes, and LOINC laboratory codes, derived from a large claims dataset from 2005 to 2013 for roughly 4 million people.
  • stanford_cuis_svd_300.txt.gz: Embeddings of UMLS concept unique identifiers (CUIs), derived from 20 million clinical notes spanning 19 years of data from Stanford Hospital and Clinics, using a data set released in a paper by Finlayson, LePendu & Shah.
  • DeVine_etal_200.txt.gz: Embeddings of UMLS CUIs learned by De Vine et al. (CIKM '14), derived from 348,566 medical journal abstracts (courtesy of the authors).
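
As a minimal sketch of reading one of these files in Python: the loader below assumes a word2vec-style text layout (an optional header line holding the vocabulary size and dimension, then one token per line followed by its space-separated vector components). That layout is an assumption and is not stated in this README, so check the first few lines of the file before relying on it. The helper name load_embeddings is hypothetical.

```python
import gzip


def load_embeddings(path):
    """Read a gzipped text embedding file into a dict of token -> list of floats.

    Assumed layout (not confirmed by the README): one token per line followed
    by space-separated floats, with an optional "<count> <dim>" header line.
    """
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            # Skip a word2vec-style header made of exactly two integers.
            if line_number == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors
```

Under those assumptions, load_embeddings("claims_codes_hs_300.txt.gz") would return a dictionary keyed by code, with no need to run gunzip first.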

In the eval directory there are three files of interest:

  • eval/Embedding_Evaluation.ipynb, an IPython notebook that reproduces the main results of the paper. If you come up with your own embeddings, you can use this benchmark to quantitatively compare them to our embeddings.
  • eval/visualize_claims_embeddings.py, a Python program that lets you look at nearest neighbors for the claims_codes_hs_300.txt embeddings (decompress the file with gunzip first).
  • eval/visualize_stanford_embeddings.py, same as above but for the stanford_cuis_svd_300.txt embeddings.
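
To illustrate what a nearest-neighbor lookup over embeddings involves, here is a minimal sketch. It is not the code in the visualize scripts: the choice of cosine similarity, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, query, k=5):
    """Return the k tokens most similar to `query` as (token, similarity) pairs."""
    query_vector = vectors[query]
    scored = [
        (cosine_similarity(query_vector, vector), token)
        for token, vector in vectors.items()
        if token != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(token, score) for score, token in scored[:k]]


# Toy two-dimensional vectors with made-up token names, for illustration only.
toy_vectors = {"code_a": [1.0, 0.0], "code_b": [0.9, 0.1], "code_c": [0.0, 1.0]}
neighbors = nearest_neighbors(toy_vectors, "code_a", k=2)
```

This brute-force scan compares the query against every other vector, which is simple but slow for large vocabularies; a vectorized or indexed search would be the usual next step.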

Note that you may need to decompress files in the eval directory with gunzip before you can run some of the programs. To run the IPython notebook, you must also place the file MRCONSO.RRF from the UMLS Metathesaurus in the eval directory (we do not distribute this file).
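
For readers unfamiliar with MRCONSO.RRF, the sketch below shows one way to turn it into a CUI-to-name lookup. It relies on the standard pipe-delimited MRCONSO layout, in which the CUI is the first field, the language is the second, and the concept string is the fifteenth. How the notebook itself uses the file is not described here, so the helper name load_cui_names and the "first English string wins" rule are assumptions.

```python
def load_cui_names(path):
    """Map each CUI to the first English string listed for it in MRCONSO.RRF.

    Assumes the standard pipe-delimited layout: field 0 is the CUI,
    field 1 the language code, and field 14 the concept string.
    """
    names = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("|")
            if len(fields) < 15:
                continue  # skip malformed or truncated rows
            cui, language, text = fields[0], fields[1], fields[14]
            if language == "ENG" and cui not in names:
                names[cui] = text
    return names
```

Such a lookup makes CUI-keyed embeddings, like those in stanford_cuis_svd_300.txt, easier to inspect by name.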
