mcmzxx / mfmap.py

This repository contains code for MFmap (model fidelity map), a semi-supervised generative model integrating gene expression, copy number and mutation data, matching cell lines to cancer subtypes. MFmap compresses high dimensional omics data of cell lines and bulk tumours into subtype informative low dimensional latent representations and predicts the subtype of cell lines with high accuracy and good generative performance. http://h2926513.stratoserver.net:3838/MFmap_shiny/

R 1.87% Python 96.46% Shell 1.68%
generative-model vae semi-supervised-learning cancer genomics cancer-stratification cell-lines tcga ccle high-dimensional-data

mfmap.py's Introduction

Model fidelity map (MFmap)

MFmap is a semi-supervised generative model that integrates omics data to match cell lines to cancer subtypes. Publication based on MFmap:

MFmap web app

The MFmap Shiny app is publicly available at http://h2926513.stratoserver.net:3838/MFmap_shiny/

MFmap methodology overview

Prepare input data for MFmap.

The original mutation and CNV data are represented as a binary matrix indicating the presence or absence of a DNA alteration in each cell line or bulk tumour sample. This sparse binary matrix is projected onto a cancer reference network (Huang et al., Bioinformatics 2018), and a network diffusion algorithm propagates the signal of each somatic event to its network neighbours, resulting in a continuous matrix with the same dimensions as the original binary matrix.
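The smoothing step can be illustrated with a small sketch. The README does not say which diffusion algorithm MFmap uses, so the random walk with restart below, the restart weight `alpha`, the three-gene toy network and the samples-as-rows layout are all assumptions made for illustration only.

```python
def row_normalise(adj):
    """Divide each row of the adjacency matrix by its sum (isolated nodes stay all-zero)."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([a / total if total else 0.0 for a in row])
    return out


def propagate(f0, adj, alpha=0.7, n_iter=50):
    """Random walk with restart (assumed algorithm, not confirmed by the README).

    f0  : samples x genes binary matrix (1 = alteration present)
    adj : genes x genes adjacency matrix of the reference network
    Iterates F <- alpha * F * W + (1 - alpha) * F0 and returns a continuous
    matrix with the same dimensions as f0.
    """
    w = row_normalise(adj)
    n = len(adj)
    f = [row[:] for row in f0]
    for _ in range(n_iter):
        f = [
            [
                alpha * sum(f_row[k] * w[k][j] for k in range(n))
                + (1 - alpha) * f0_row[j]
                for j in range(n)
            ]
            for f_row, f0_row in zip(f, f0)
        ]
    return f


# Hypothetical toy network: genes A - B - C connected in a chain.
ADJ = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# One sample with an alteration in gene A, one sample with no alterations.
BINARY = [
    [1, 0, 0],
    [0, 0, 0],
]

smoothed = propagate(BINARY, ADJ)
```

In this toy case the altered gene keeps the highest score, its direct neighbour receives a smaller one and the distant gene the smallest, while the sample without alterations stays at zero.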

The architecture of MFmap.

MFmap has three components: an encoder, a classifier and a decoder, colour-coded in the figure above. The input layer of the encoder contains two views: the DNA view takes the concatenated network-smoothed mutation and CNV data, and the RNA view takes gene expression data. The encoder maps the input data of each sample into a latent representation vector $\vec{z}$. The decoder uses $\vec{z}$ to reconstruct the DNA and RNA views in its output layer. Subtypes of bulk tumours are used to train the classifier. During training, the classifier predicts subtypes of cell lines.
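The data flow through the three components can be sketched in plain Python. This is not the MFmap implementation: the layer sizes, the random untrained weights, the Gaussian reparameterisation step and the choice to feed the classifier from the latent vector are assumptions chosen to keep the example small.

```python
import math
import random

random.seed(0)


def linear(n_in, n_out):
    """Create a toy dense layer: random weights (n_out x n_in) and zero biases."""
    weights = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy dimensions.
N_GENES, HIDDEN, LATENT, N_SUBTYPES = 5, 4, 2, 3

enc_dna = linear(2 * N_GENES, HIDDEN)    # DNA view: smoothed mutation + CNV
enc_rna = linear(N_GENES, HIDDEN)        # RNA view: gene expression
enc_mu = linear(2 * HIDDEN, LATENT)
enc_logvar = linear(2 * HIDDEN, LATENT)
classifier = linear(LATENT, N_SUBTYPES)
dec_hidden = linear(LATENT, HIDDEN)
dec_dna = linear(HIDDEN, 2 * N_GENES)
dec_rna = linear(HIDDEN, N_GENES)


def mfmap_like_forward(mutation, cnv, expression):
    """One forward pass: two input views -> z -> subtype probabilities and reconstructions."""
    dna_view = mutation + cnv  # list concatenation of the two DNA data types
    hidden = relu(forward(enc_dna, dna_view)) + relu(forward(enc_rna, expression))
    mu = forward(enc_mu, hidden)
    logvar = forward(enc_logvar, hidden)
    # Reparameterisation: z = mu + sigma * eps (assumed Gaussian latent).
    z = [m + math.exp(0.5 * lv) * random.gauss(0, 1) for m, lv in zip(mu, logvar)]
    subtype_probs = softmax(forward(classifier, z))
    dec = relu(forward(dec_hidden, z))
    return z, subtype_probs, forward(dec_dna, dec), forward(dec_rna, dec)


z, probs, dna_hat, rna_hat = mfmap_like_forward([0.1] * 5, [0.2] * 5, [0.3] * 5)
```

The reconstructed DNA view has twice the gene count because it mirrors the concatenated mutation and CNV input, while the reconstructed RNA view matches the expression input.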

Visualising results output by MFmap.

To organise and summarise sample associations, we use the visualisation concept of OncoGPS (Kim et al., Cell Syst. 2017). The biological meanings of the latent representations are annotated and an MFmap reference map is generated. Both steps are based on the bulk tumour latent representations learnt by MFmap. Samples can then be projected onto the MFmap reference map to visualise sample properties such as drug sensitivity and subtype, complemented with information on the relationships between samples.

Use MFmap

Clone this repository to use MFmap

git clone https://github.com/mcmzxx/mfMap.py.git
cd mfMap.py

# run the example using simulated data
bash run_example.sh

# run the example using realistic colon cancer data
mkdir data_bak
cd data_bak
# download the data from data repository (https://cloud.hs-koblenz.de/s/WFWjMq9pJ8i29WD)
cd ..
bash run_example_real_data.sh

Input Data

MFmap takes as input datasets containing both cell lines and bulk tumours: (1) a gene expression profile, which is a gene by sample matrix, (2) a DNA profile, which is a gene by sample matrix, and (3) cancer subtype labels of bulk tumours.

In gene expression profile or DNA profile, values should be separated by tabs.

barcode TCGA.A6.2678.01 TCGA.AA.3950.01 TCGA.DM.A1HB.01 CL40_LARGE_INTESTINE SW403_LARGE_INTESTINE SNUC4_LARGE_INTESTINE
HSPA2 0.178085 0.573315 0.503795 0.547310 0.243164 0.495841
HSPA6 0.337385 0.556164 0.487331 0.531813 0.550296 0.784094
HIST1H4I 0.108904 0.511286 0.450405 0.488413 0.400680 0.446989
HSPA8 0.331599 0.600488 0.503966 0.562086 0.411304 0.495323
B2M 0.399159 0.561251 0.518686 0.608198 0.240596 0.479772
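A minimal sketch of reading such a tab-separated profile with Python's standard library. The function name `read_profile` is an illustrative assumption and not part of the MFmap code base; the in-memory example reuses two genes and three samples from the table above.

```python
import csv
import io

# Excerpt of the example profile, with explicit tab separators.
EXAMPLE_PROFILE = (
    "barcode\tTCGA.A6.2678.01\tTCGA.AA.3950.01\tCL40_LARGE_INTESTINE\n"
    "HSPA2\t0.178085\t0.573315\t0.547310\n"
    "HSPA6\t0.337385\t0.556164\t0.531813\n"
)


def read_profile(handle):
    """Parse a gene-by-sample matrix into (sample names, {gene: list of values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds the gene names
    profile = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(f"row for {row[0]} has {len(values)} values, expected {len(samples)}")
        profile[row[0]] = values
    return samples, profile


samples, profile = read_profile(io.StringIO(EXAMPLE_PROFILE))
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.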

The cancer subtype label list has the following three columns, separated by tabs.

barcode subtype type
TCGA.A6.2678.01 NOLBL tumor
TCGA.AA.3950.01 CMS1 tumor
TCGA.DM.A1HB.01 CMS1 tumor
SNUC4_LARGE_INTESTINE NOLBL cell
SW403_LARGE_INTESTINE NOLBL cell
CL40_LARGE_INTESTINE NOLBL cell
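As a sketch of how this label list might be consumed, the snippet below separates samples with a known subtype from those marked `NOLBL`. Reading `NOLBL` as "no label available" is an assumption based on the example, and `split_labels` is a hypothetical helper, not a function from the repository.

```python
import csv
import io

# The example label list from above, with explicit tab separators.
EXAMPLE_LABELS = (
    "barcode\tsubtype\ttype\n"
    "TCGA.A6.2678.01\tNOLBL\ttumor\n"
    "TCGA.AA.3950.01\tCMS1\ttumor\n"
    "TCGA.DM.A1HB.01\tCMS1\ttumor\n"
    "SNUC4_LARGE_INTESTINE\tNOLBL\tcell\n"
    "SW403_LARGE_INTESTINE\tNOLBL\tcell\n"
    "CL40_LARGE_INTESTINE\tNOLBL\tcell\n"
)


def split_labels(handle, unlabelled_marker="NOLBL"):
    """Return ({barcode: subtype} for labelled samples, [barcodes without a label])."""
    labelled = {}
    unlabelled = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["subtype"] == unlabelled_marker:
            unlabelled.append(row["barcode"])
        else:
            labelled[row["barcode"]] = row["subtype"]
    return labelled, unlabelled


labelled, unlabelled = split_labels(io.StringIO(EXAMPLE_LABELS))
```

In the example, only two bulk tumours carry a subtype (CMS1); the remaining tumour and all three cell lines are unlabelled.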


mfmap.py's Issues

`run_example.sh` does not run

Hi -- I was hoping to test out this method but I'm having some trouble getting it working unfortunately.

After creating a fresh conda environment (as below), I tried running the example script, but it throws an error; I've pasted the output below. (I changed the path /storage/zhang/mfMap.py in the source code as needed to the equivalent on my machine.)

I'm not super familiar with python code so do you have any tips on how I can get this to run?

Thanks,
Sam


conda create --name mfmap -c pytorch -c conda-forge \
  pytorch \
  numpy \
  pandas \
  scikit-learn \
  tensorflow

conda activate mfmap
(mfmap) ###@### mfMap.py % bash run_example.sh
Number of parameters: 4705052

SUPERVISED

(1, 1)
0.001
best train acc:0
best train er:inf
best validation acc:0
Number of parameters: 4705052

SUPERVISED

(1, 1)
0.001
best train acc:0
best train er:inf
best validation acc:0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/###/git/mfMap.py/mfMAP_cpu.py", line 653, in <module>
    run_train()
  File "/Users/###/git/mfMap.py/mfMAP_cpu.py", line 502, in run_train
    train_accuracy,train_total_loss_ave,train_classifier_er_ave=train(e_index=epoch_index, e_num=FLAGS.p1_epoch_num+FLAGS.p2_epoch_num,k_cnv_recon=c1,k_expr_recon=c1, k_kl=c1,k_hbc=c2)
  File "/Users/###/git/mfMap.py/mfMAP_cpu.py", line 332, in train
    for batch_index, sample in enumerate(train_loader):
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/###/miniconda3/envs/mfmap/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
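The traceback passes through popen_spawn_posix.py, so the worker processes are being started with the spawn method, which has been the default on macOS since Python 3.8. With spawn, each child re-imports the main module, so any training code at module level runs again in the child; that would also explain why the start-up output above appears twice. Below is a minimal sketch of the idiom the error message refers to. It is independent of the MFmap code: `square` and this `run_train` are toy stand-ins, not the functions in mfMAP_cpu.py.

```python
import multiprocessing as mp


def square(x):
    """Toy stand-in for the work done by one worker process."""
    return x * x


def run_train(num_workers=2):
    """Toy stand-in for a training entry point that starts worker processes."""
    with mp.Pool(processes=num_workers) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the parent process runs this branch. Spawned children import the
    # module under a different __name__, so they define the functions above
    # but do not start training (and further workers) again.
    print(run_train())
```

Applied to a script like the one in the traceback, this would mean moving the module-level `run_train()` call under an `if __name__ == "__main__":` guard. Separately, PyTorch's `DataLoader` loads data in the main process when `num_workers=0`, which avoids starting worker processes at all.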
