MultiModalMuSig

A Julia package implementing several topic models used for mutation signature estimation.

The multi-modal correlated topic model (MMCTM) takes an array of arrays of matrices as input. In each matrix, the first column holds mutation type indices and the second column holds the corresponding mutation counts for a particular sample.
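
As a rough illustration of that two-column layout, the sketch below builds one such matrix from a plain vector of per-type counts using only base Julia. The helper name `counts_to_matrix` is hypothetical and not part of MultiModalMuSig; in practice `format_counts_mmctm` prepares the input, and whether it keeps zero-count rows is an assumption here.

```julia
# Hypothetical helper (not part of MultiModalMuSig): pair each mutation type
# index with its count for a single sample and a single modality.
function counts_to_matrix(counts::AbstractVector{<:Integer})
    # Column 1: mutation type index; column 2: count for that type.
    return hcat(collect(1:length(counts)), counts)
end

# Toy counts for one sample over four mutation types.
sample_counts = [12, 0, 5, 3]
M = counts_to_matrix(sample_counts)
```

Here `M` has one row per mutation type, so `M[:, 1]` lists the type indices and `M[:, 2]` lists the counts.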

The following example shows how to perform inference using the MMCTM on SNV and SV counts:

using MultiModalMuSig
using CSV
using DataFrames
using VegaLite
using Random

Random.seed!(42)

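# Recent CSV.jl versions require a sink type (here DataFrame) as the second argument
snv_counts = CSV.read("data/brca-eu_snv_counts.tsv", DataFrame; delim='\t')
sv_counts = CSV.read("data/brca-eu_sv_counts.tsv", DataFrame; delim='\t')

X = format_counts_mmctm(snv_counts, sv_counts)
model = MMCTM([7, 7], [0.1, 0.1], X)
fit!(model, tol=1e-5)

# One column per signature; :auto generates the column names
snv_signatures = DataFrame(hcat(model.ϕ[1]...), :auto)
sv_signatures = DataFrame(hcat(model.ϕ[2]...), :auto)

# Reshape to long format for plotting (stack replaces the deprecated melt)
snv_signatures[!, :term] = snv_counts[!, :term]
snv_signatures = stack(
    snv_signatures, Not(:term), :term, variable_name=:signature, value_name=:probability
)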
snv_signatures |> @vlplot(
    :bar, x={:term, sort=:null}, y=:probability, row=:signature,
    resolve={scale={y=:independent}}
)

This code runs the MMCTM for 7 SNV and 7 SV signatures, with signature hyperparameters set to 0.1. Since these types of models can get stuck in poor local optima, it's a good idea to fit many models and pick the best one.
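
One way to organise such restarts is sketched below. The helper `best_of_restarts` and both of its callbacks are hypothetical and not part of MultiModalMuSig; the toy `toy_fit` function only stands in for model fitting so that the sketch runs on its own.

```julia
using Random

# Hypothetical helper: run `fit_once(seed)` for several seeds and keep the
# result with the highest `score`. Neither callback is a MultiModalMuSig API.
function best_of_restarts(fit_once, score, seeds)
    best_model = nothing
    best_score = -Inf
    for seed in seeds
        candidate = fit_once(seed)
        s = score(candidate)
        if s > best_score
            best_model, best_score = candidate, s
        end
    end
    return best_model, best_score
end

# Toy stand-in for model fitting: each "model" is just a random number,
# and the score is the number itself.
toy_fit(seed) = rand(MersenneTwister(seed))
best, best_score = best_of_restarts(toy_fit, identity, 1:10)
```

With the MMCTM, `fit_once` would seed the random number generator, construct the model, and call `fit!`, while `score` would return whichever fit criterion is used to compare runs. The choice of criterion is left open here.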

Sample-signature probabilities can be extracted like so:

# sample 3, SNV signature probabilities (modality 1)
model.props[3][1]
# SV signature probabilities (modality 2)
model.props[3][2]

# SNV probabilities for all samples
snv_props = hcat(
    [model.props[i][1] for i in 1:length(model.props)]...
)
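
To make the resulting shape concrete, here is a small self-contained sketch with a toy nested array standing in for `model.props`. The layout (sample first, then modality) is assumed from the indexing above, and the numbers are made up for illustration.

```julia
# Toy stand-in for `model.props`: props[i][m] is the vector of signature
# probabilities for sample i in modality m (layout assumed from the indexing above).
props = [
    [[0.7, 0.2, 0.1], [0.5, 0.5]],   # sample 1: 3 SNV and 2 SV signatures
    [[0.1, 0.1, 0.8], [0.9, 0.1]],   # sample 2
]

# Concatenate modality 1 across samples: one column per sample.
snv_props = hcat([props[i][1] for i in eachindex(props)]...)

# Index of the highest-probability SNV signature for each sample.
dominant = [argmax(col) for col in eachcol(snv_props)]
```

The result has one row per signature and one column per sample, so each column sums to one.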

The MMCTM can be run on more than two modalities, e.g.

X = format_counts_mmctm(snv_counts, sv_counts, indel_counts)
model = MMCTM([7, 7, 5], [0.1, 0.1, 0.1], X)

The DataFrame inputs to format_counts_mmctm have an optional term column, and further columns for each sample.

To run the CTM instead, just run the MMCTM with a single modality:

X = format_counts_ctm(snv_counts)
model = MMCTM([7], [0.1], X)
fit!(model, tol=1e-5)

The LDA implementation can be run like so:

X = format_counts_lda(snv_counts)
model = LDA(7, 0.1, 0.1, X)
fit!(model, tol=1e-5)

In the above code, the sample-signature and signature-term hyperparameters have both been set to 0.1. After fitting LDA, signatures can be found in model.β, and signature probabilities can be found in model.θ.
