Code Monkey home page Code Monkey logo

superheat-examples's Introduction

Examples accompanying the paper Superheat: An R package for creating beautiful heatmaps for visualizing complex data by Rebecca L Barter and Bin Yu

This paper contains three open source examples, each of which is based on a publicly available dataset. These examples are

  • Case study I: combining data sources to explore global organ transplantation trends

  • Case study II: uncovering correlations in language using word2vec

  • Case study III: predicting the brain's response to visual stimuli using fMRI

Each case study is described breifly below, and is accompanied by a reproducibility rating that describes how easy it would be for you to reproduce the results yourself. It is basically a reflection of how publicly available the data is.

Case Study I: combining data sources to explore global organ transplantation trends

The data for this example comes from two sources. The first is from the Global Observatory on Donation and Transplantation database curated by WHO which can be downloaded from here. The second dataset comes from the United Nations Development Program's Human Development Reports which can be downloaded from here.

Reproducibility: easy (simply download the data into the same directory)

Case study II: uncovering correlations in language using word2vec

Word2Vec is an extremely popular group of models for embedding words into high dimensional spaces such that their relative distances to one another convey semantic meaning. These models are quite remarkable and are an exciting step towards teaching machines to understand language.

In 2013, Google published pre-trained vectors trained on part of the Google News dataset which consists of around 100 billion words. The model contains 300-dimensional vectors for 3 million words and phrases and these vectors can be downloaded from here.

The original file is GoogleNews-vectors-negative300.bin.gz, and the convert_word2vec.py file (adapted from suggestions from this stack overflow post) in the /word2vec folder of this repository contains code to convert the original file to a .csv format. An R version has been generated and is called GoogleNews.RData.

Reproducibility: medium (need to download the large Google News dataset and then convert it to csv using a separate convert_word2vec.py)

Case study III: predicting the brain's response to visual stimuli using fMRI

This case study aims to evaluate the performance of a number of models of the brain's response to visual stimuli. The study is based on data collected from a functional Magnetic Resonance Imaging (fMRI) experiment performed on a single individual by the Gallant neuroscience lab at UC Berkeley~\citep{vu_nonparametric_2009, vu_encoding_2011}. fMRI measures oxygenated blood flow in the brain which can be considered as an indirect measure of neural activity (the two processes are highly correlated). The measurements obtained from an fMRI experiment correspond to the aggregated response of hundreds of thousands of neurons within cube-like voxels of the brain, where the segmentation of the brain into 3D voxels is analogous to the segmentation of an image into 2D pixels.

The data itself is hosted on the Collaborative Research in Computational Neuroscience (CRCNS) Data Sharing repository and can be found here. However, this repository only contains the voxel responses and the raw images. It does not contain the Gabor wavelet transformations.

Reproducibility: hard (the data repository unfortunately does not contain the Gabor wavelet features, and some of the code chunks were run using 24 cores of a computer cluster)

superheat-examples's People

Contributors

rlbarter avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.