Code Monkey home page Code Monkey logo

deploid-dev / deploid Goto Github PK

View Code? Open in Web Editor NEW
20.0 6.0 9.0 159.72 MB

dEploid is designed for deconvoluting mixed genomes with unknown proportions. Traditional ‘phasing’ programs are limited to diploid organisms. Our method modifies Li and Stephen’s algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haloptype searches in a multiple infection setting.

Home Page: http://deploid.readthedocs.io/en/latest/index.html

License: GNU General Public License v3.0

Shell 2.74% C++ 91.23% R 0.12% Makefile 0.61% M4 3.55% HTML 0.85% C 0.84% Dockerfile 0.06% Ruby 0.01%
mcmc deconvoluting-mixed-genomes unknown-proportions malaria parasites phasing hmm-model

deploid's People

Contributors

bredelings avatar shajoezhu avatar stephengroat avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

deploid's Issues

lbeta function

Currently, likelihood is computed by
https://github.com/mcveanlab/PfDeconv/blob/updateSingleHap3/src/utility.cpp#L72
which uses lgamma function.

However, in the R version, I am using the lbeta function, see
https://github.com/mcveanlab/pf3k_mixed_infection/blob/master/src/sim_and_plot_src.r#L230

Hence I would like to use the lbeta in the C++ version too, just to make things consistent:
https://github.com/mcveanlab/PfDeconv/blob/updateSingleHap3/src/utility.cpp#L71

@jalmagro, I was wondering have you had experience in lbeta function implementation before?

cluster error

 ./pfDeconv -ref /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/PG0389.C_ref.txt -alt /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/PG0389.C_alt.txt -plaf /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/labStrains_samples_PLAF.txt -panel /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/clonalPanel.csv -exclude /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/PG0389.C.exclude.csv 
Error: Number of sites was wrong (compared to ref count) in: /well/mcvean/joezhu/pf3k/pf3k_5_1/labStrains/labStrains_samples_PLAF.txt

this was due to quote was in labStrains_samples_PLAF.txt, which cause the excluded chrom info was not found

gprof output

./pfDeconv_prof -ref tests/PG0390_ref.txt -alt tests/PG0390_alt.txt -plaf tests/labStrains_samples_PLAF.txt -panel tests/clonalPanel.csv -o tmp1
gprof ./pfDeconv_prof gmon.out | less
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 19.51     26.00    26.00 316068221     0.00     0.00  Maths::Special::Gamma::log_gamma(double)
  9.88     39.17    13.17 316068221     0.00     0.00  Maths::Special::Gamma::logGammaFrac(double, double)
  8.15     50.03    10.86     3380     0.00     0.01  UpdatePairHap::calcFwdProbs()
  7.38     59.87     9.84 486419899     0.00     0.00  Maths::Algebra::Series::bcorr(double, double)
  4.41     65.75     5.88 809837860     0.00     0.00  sumOfMat(std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >
&)
  3.59     70.53     4.78     3380     0.00     0.00  UpdatePairHap::buildEmission()
  3.29     74.92     4.39 802488120     0.00     0.00  Maths::Special::Gamma::logBeta(double, double)
  3.11     79.07     4.15     3380     0.00     0.00  UpdatePairHap::samplePaths()
  3.00     83.06     4.00 802488120     0.00     0.00  Maths::Arithmetic::ln_add1(double)
  2.95     87.00     3.94 57848700     0.00     0.00  normalizeBySumMat(std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<doubl
e> > > >&)
  2.87     90.82     3.83 803480970     0.00     0.00  void std::vector<unsigned long, std::allocator<unsigned long> >::_M_emplace_back_aux<unsigned long const&>(unsigned long const&)
  2.81     94.58     3.75 173705892     0.00     0.00  void std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M
_emplace_back_aux<std::vector<double, std::allocator<double> > const&>(std::vector<double, std::allocator<double> > const&)
  2.59     98.03     3.45 229097319     0.00     0.00  UpdateHap::sampleNoReplace(std::vector<double, std::allocator<double> >, unsigned long)
  2.55    101.42     3.40    77428     0.00     0.00  std::vector<double, std::allocator<double> > vecSum<double>(std::vector<double, std::allocator<double> >&, std::vector<double, std:
:allocator<double> >&)
  2.47    104.72     3.30 514338525     0.00     0.00  normalizeBySum(std::vector<double, std::allocator<double> >&)
  2.30    107.78     3.06 574958028     0.00     0.00  void std::vector<double, std::allocator<double> >::_M_emplace_back_aux<double const&>(double const&)
  1.52    109.81     2.03 916596940     0.00     0.00  sumOfVec(std::vector<double, std::allocator<double> >&)
  1.47    111.76     1.96    10002     0.00     0.00  McmcMachinery::calcExpectedWsaf(std::vector<double, std::allocator<double> >&)
  1.46    113.70     1.94     3380     0.00     0.00  UpdatePairHap::addMissCopying()
  1.46    115.64     1.94 229100699     0.00     0.00  computeCdf(std::vector<double, std::allocator<double> >&)
  1.33    117.41     1.77     3302     0.00     0.00  UpdateSingleHap::buildEmission()
  1.26    119.09     1.68 57845320     0.00     0.00  UpdatePairHap::computeColMarginalDist(std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, 
std::allocator<double> > > >&)
  1.10    120.56     1.47 229093939     0.00     0.00  UpdateHap::sampleIndexGivenProp(std::vector<double, std::allocator<double> >)
  0.98    121.87     1.31     3302     0.00     0.00  UpdateSingleHap::addMissCopying()
  0.98    123.17     1.30     3302     0.00     0.00  UpdateSingleHap::samplePaths()
  0.95    124.44     1.27     3302     0.00     0.00  UpdateSingleHap::calcFwdProbs()
  0.95    125.70     1.26 229199595     0.00     0.00  std::mersenne_twister_engine<unsigned long, 64ul, 312ul, 156ul, 31ul, 13043109905998158313ul, 29ul, 6148914691236517205ul, 17ul, 8
202884508482404352ul, 37ul, 18444473444759240704ul, 43ul, 6364136223846793005ul>::operator()()
  0.94    126.95     1.25 229199595     0.00     0.00  MersenneTwister::sample()
  0.83    128.06     1.11 401244060     0.00     0.00  calcLLK(double, double, double, double, double)
  0.76    129.07     1.01    23444     0.00     0.00  calcLLKs(std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >&, std::vector<double, std::all
ocator<double> >&)
  0.62    129.89     0.82    54080     0.00     0.00  void std::vector<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double
> > > >, std::allocator<std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > > >::_M_emplace_back_aux<std::vector<st
d::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > const&>(std::vector<std::vector<double, std::allocator<double> >, std::alloca
tor<std::vector<double, std::allocator<double> > > > const&)

Input data format

In the current version, input takes PLAF, REF count, ALT count, reference panel
REF and ALT currently only supports one sample, i.e. one column (plus chrom and position columns)
maybe merge REF and ALT into one file? if it is only one sample?

seg fault found for PH0917

../pfDeconv_dbg -ref PH0917.Cx_ref.txt -alt PH0917.Cx_alt.txt -plaf asiaGroup1_samples_PLAF.txt -panel asiaGroup1Panel0.05.csv -seed 1
   ###########################################
   #            Initialization               #
   ###########################################
     Update proportion to:    0.0717476    0.00396933    0.805869    0.00411025    0.114304    
    Current log likelihood = -874800
   ###########################################
   #        Initialization finished          #
   ###########################################

   MCMC iteration: 0
    Attempt of update proportion   (successed) 
     Update proportion to:    0.0707023    0.0038571    0.80269    0.00406732    0.118684    
    Current log likelihood = -873790

   MCMC iteration: 1
    Update Single Hap 
     Updating hap: 2
      Update Chrom with index 0, starts at 0, with 1318 sites
      Update Chrom with index 1, starts at 1318, with 1476 sites
      Update Chrom with index 2, starts at 2794, with 1594 sites
      Update Chrom with index 3, starts at 4388, with 2939 sites
Segmentation fault (core dumped)

vcf format output

  • read in vcf,
    • extract ref and alt from vcf
  • write exceptions
  • write unittests
  • write out vcf
  • file names in log file

optimize panel

the following code should be optimized, include the in panel, and parse into updateHap

see void UpdateSingleHap::samplePaths() and void UpdateSingleHap::calcFwdProbs(), for

        double pRec = this->panel_->recombProbs_[j-1];
        double pRecEachHap = pRec / nPanel_;
        double pNoRec = 1.0 - pRec;

see void UpdatePairHap::samplePaths() and void UpdatePairHap:: calcFwdProbs(), for

        double pRec = this->panel_->recombProbs_[j-1];
        double pRecEachHap = pRec / nPanel_;
        double pNoRec = 1.0 - pRec;

        double recRec = pRecEachHap * pRecEachHap;
        double recNorec = pRecEachHap * pNoRec;
        double norecNorec = pNoRec * pNoRec;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.