
marginphase's People

Contributors

benedictpaten, mhaukness, tpesout

marginphase's Issues

Illumina parameters

Hi there,

I noticed that your paper also used Illumina sequencing reads. Could you provide the default parameter file for them? Thanks.

Best,
Yudi

Is this for phasing variants or for phasing a diploid assembly?

Hi,
I thought marginPhase was for phasing variants in resequencing projects, rather than for phasing a diploid de novo assembly into haploid genomes, like HapCHAT?
I am not sure about this. Maybe it is a slightly stupid question, sorry.

Thanks,

Best regards,
Si

reference genome

Hi @benedictpaten

thanks for delivering this tool.

We would like to see how it performs at haplotyping a genome that we assembled with Canu. The organism is diploid but highly heterozygous, with a haploid size of approx. 500 Mbp. We also have the raw files (PacBio RSII).

For your pipeline to be most effective, should I use the diploid (Canu) assembly or the haploid sequence as the reference? For now I have tested HaploMerger2 and Purge Haplotigs after Canu. So I am just wondering which FASTA reference to feed to your program.

Thanks,
Amina

Unable to checkout repository

Hi!

Somebody recommended I use marginPhase to get a VCF for our data. I'm trying to check out the repository, but I'm unable to. Any thoughts?

git clone git@github.com:benedictpaten/marginPhase.git
Cloning into 'marginPhase'...
Warning: Permanently added the RSA host key for IP address '192.30.253.112' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
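
"Permission denied (publickey)" on an SSH remote generally means no SSH key registered with the host was offered, and a public repository can normally be cloned over HTTPS without one. A minimal sketch of rewriting an scp-style remote into its HTTPS form follows; the helper name `ssh_to_https` is hypothetical, and the sketch assumes the remote in the log above had the usual `git@github.com:owner/repo.git` shape.

```python
import re


def ssh_to_https(remote: str) -> str:
    """Convert an scp-style SSH remote (user@host:owner/repo.git) to HTTPS."""
    match = re.fullmatch(r"[^@\s]+@([^:\s]+):(.+)", remote)
    if match is None:
        # Already HTTPS, or a form this sketch does not handle.
        return remote
    host, path = match.groups()
    return f"https://{host}/{path}"


# Assumed SSH form of the remote used in the failing command.
print(ssh_to_https("git@github.com:benedictpaten/marginPhase.git"))
```

The printed URL can then be passed to `git clone` in place of the SSH remote.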

this repo is enormous

-> % git clone https://github.com/benedictpaten/marginPhase.git
Cloning into 'marginPhase'...
remote: Enumerating objects: 89, done.
remote: Counting objects: 100% (89/89), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 6855 (delta 46), reused 51 (delta 19), pack-reused 6766
Receiving objects: 100% (6855/6855), 342.42 MiB | 1.09 MiB/s, done.
Resolving deltas: 100% (3736/3736), done.

What in this repository requires 342 MB? Can this be brought down if it's not essential?
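
One way to see where the space goes is to list the largest blobs in the history. The sketch below only parses text; it assumes the input was produced by `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`, and the helper name and the sample lines are hypothetical, not taken from this repository.

```python
def largest_objects(batch_check_output: str, top: int = 5):
    """Return the `top` largest blobs as (size_in_bytes, path) tuples."""
    blobs = []
    for line in batch_check_output.splitlines():
        # Expected layout: "<type> <sha> <size> <path>"
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags, and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)[:top]


# Made-up sample output, for illustration only.
sample = (
    "commit 1111111 250 \n"
    "tree 2222222 120 tests\n"
    "blob 3333333 1048576 tests/example.bam\n"
    "blob 4444444 2048 README.md\n"
)
for size, path in largest_objects(sample):
    print(f"{size:>12} {path}")
```

If only the current source is needed, `git clone --depth 1` skips the history and usually downloads far less.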

marginPhase error: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed.

Hi @benedictpaten

Do you know what would cause the error Assertion `seqLen >= rProbs->length + rProbs->refStart' failed?

here is the execution log:

aechchik@dee-serv07:/scratch/beegfs/monthly/aechchik/amphioxus/mp$ ../../build/marginPhase /scratch/beegfs/monthly/aechchik/amphioxus/mp/reads_to_haploADref.bam  /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa marginPhase/params/params.pacbio.json
Set log level to INFO
> Parsing model parameters from file: marginPhase/params/params.pacbio.json
> Parsing input reads from file: /scratch/beegfs/monthly/aechchik/amphioxus/mp/reads_to_haploADref.bam
        Created 9724438 profile sequences
> Parsing prior probabilities on positions from reference sequences: /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa
marginPhase: /scratch/beegfs/monthly/aechchik/amphioxus/mp/marginPhase/impl/referencePriorProbs.c:171: createReferencePriorProbabilities: Assertion `seqLen >= rProbs->length + rProbs->refStart' failed.
Aborted

thanks
Amina
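
The cause is not confirmed in this thread. Read literally, the assertion says that a reference sequence is shorter than the span the read data claims for it, so one thing worth ruling out (an assumption, not a diagnosis) is that the BAM was aligned against a different FASTA than the one passed on the command line. A minimal sketch that compares contig lengths from a BAM header (text from `samtools view -H`) with a FASTA index (the `.fai` written by `samtools faidx`) follows; all function names and the sample data are hypothetical.

```python
def sq_lengths(sam_header: str) -> dict:
    """Map contig name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def fai_lengths(fai_text: str) -> dict:
    """Map contig name -> length from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def mismatched_contigs(sam_header: str, fai_text: str) -> dict:
    """Contigs whose BAM length differs from, or is absent in, the FASTA index."""
    bam, ref = sq_lengths(sam_header), fai_lengths(fai_text)
    return {name: (ln, ref.get(name)) for name, ln in bam.items() if ref.get(name) != ln}


# Made-up example: contig2 is shorter in the FASTA than in the BAM header.
header = "@HD\tVN:1.6\n@SQ\tSN:contig1\tLN:1000\n@SQ\tSN:contig2\tLN:500\n"
fai = "contig1\t1000\t9\t60\t61\ncontig2\t400\t1035\t60\t61\n"
print(mismatched_contigs(header, fai))
```

An empty result means the two files agree on contig names and lengths, in which case the problem lies elsewhere.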

Test failing due to illegal instruction (Debian packaging)

Hello,

I am currently packaging marginPhase for the Debian-Med packaging team. Unfortunately, the test fails and I am not sure why. Right after "Creating read partitioning HMMs", I get an "Illegal instruction". The full log of my build, in which the test is also run and its output is visible, is available here: https://paste.debian.net/1154344/

Any suggestions or solution would be much appreciated.

Kind regards,
Shayan Doust
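
An "Illegal instruction" abort generally means the binary executed a CPU instruction that the host does not support, for example because it was built with instruction-set extensions that the build machine lacks. Which extensions the marginPhase build enables is not stated here, so the set checked below is purely an assumption. A minimal sketch, assuming an x86 Linux host where `/proc/cpuinfo` has a `flags` line (other architectures use different keys):

```python
def missing_cpu_flags(cpuinfo_text: str, required: set) -> set:
    """Return the required flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(required) - set(value.split())
    # No 'flags' line found: nothing can be confirmed as present.
    return set(required)


# Made-up cpuinfo excerpt; on a real host, read the text from /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse2 sse4_2 avx\n"
# Hypothetical requirement set, for illustration only.
print(missing_cpu_flags(sample, {"sse4_2", "avx2"}))
```

Any flag reported as missing would point at compiler options that need to be relaxed for a portable package build.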

Compilation error

Hello,

I have followed the installation instructions and got this error message when compiling:

[  1%] Building C object CMakeFiles/hts.dir/externalTools/htslib/hfile_s3.c.o
/home/OXFORDNANOLABS/bsipos/soft/marginPhase/externalTools/htslib/hfile_s3.c:70:2: error: #error No HMAC() routine found by configure
 #error No HMAC() routine found by configure
  ^
/home/OXFORDNANOLABS/bsipos/soft/marginPhase/externalTools/htslib/hfile_s3.c: In function ‘s3_rewrite’:
/home/OXFORDNANOLABS/bsipos/soft/marginPhase/externalTools/htslib/hfile_s3.c:335:30: error: ‘DIGEST_BUFSIZ’ undeclared (first use in this function)
         unsigned char digest[DIGEST_BUFSIZ];
                              ^
/home/OXFORDNANOLABS/bsipos/soft/marginPhase/externalTools/htslib/hfile_s3.c:335:30: note: each undeclared identifier is reported only once for each function it appears in
/home/OXFORDNANOLABS/bsipos/soft/marginPhase/externalTools/htslib/hfile_s3.c:336:9: warning: implicit declaration of function ‘s3_sign’ [-Wimplicit-function-declaration]
         size_t digest_len = s3_sign(digest, &secret, &message);
         ^
make[2]: *** [CMakeFiles/hts.dir/externalTools/htslib/hfile_s3.c.o] Error 1
make[1]: *** [CMakeFiles/hts.dir/all] Error 2
make: *** [all] Error 2

Are there any unstated dependencies which can cause this?

Regards,
Botond
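
The `#error` comes from a preprocessor branch in htslib's `hfile_s3.c` that is reached when no HMAC implementation was detected at configure time; on Linux that implementation normally comes from the OpenSSL development headers. Whether that is the missing piece on this machine is an assumption. A minimal sketch that looks for `openssl/hmac.h` in a list of include directories follows; the helper name and the directory list are hypothetical and should be adjusted to the compiler's real search path.

```python
from pathlib import Path


def find_header(relative_name: str, include_dirs):
    """Return the first existing path to `relative_name`, or None if absent."""
    for directory in include_dirs:
        candidate = Path(directory) / relative_name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Common default locations; these are assumptions, not an exhaustive list.
    print(find_header("openssl/hmac.h", ["/usr/include", "/usr/local/include"]))
```

A result of `None` suggests installing the distribution's OpenSSL development package and re-running CMake from a clean build directory.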

Getting generic error while running

Hi Benedict,

I have been trying to run the software. Apparently everything works well (compiling, etc.), but after the tool has been running for a while I get this error:

Set log level to INFO

Parsing model parameters from file: /rugpfs/fs0/vgl/store/gformenti/bin/marginPhase/params/params.pacbio.json
Parsing input reads from file: ../pb_aligned/all_sorted.bam
/var/spool/slurmd/job9041593/slurm_script: line 3: 62563 Killed marginPhase ../pb_aligned/all_sorted.bam ../GCA_003692655.1_Chelidonia_genomic.fna /rugpfs/fs0/vgl/store/gformenti/bin/marginPhase/params/params.pacbio.json

And no further detail.

Thank you in advance for your help.

Best,

Giulio
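
A bare "Killed" from a Slurm job script usually means the process received SIGKILL, and a common reason is exceeding the job's memory limit; that is an assumption here, since the log gives no detail. A minimal sketch for measuring the peak memory of a command on a smaller input, using only the Python standard library on a Unix host, follows. The helper name is hypothetical, and the demo command is a trivial placeholder rather than a real marginPhase invocation.

```python
import resource
import subprocess
import sys


def run_and_report(argv):
    """Run argv and return (returncode, peak_rss) for waited-for children.

    peak_rss is ru_maxrss: kilobytes on Linux, bytes on macOS.
    A negative returncode -N means the child was ended by signal N
    (for example -9 for SIGKILL).
    """
    completed = subprocess.run(argv)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    # Placeholder command; substitute the real command on a subset of reads.
    code, peak = run_and_report([sys.executable, "-c", "pass"])
    print("return code:", code, "peak RSS:", peak)
```

Comparing the measured peak on a subset against the memory requested from Slurm gives a rough idea of how much to ask for on the full BAM.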

general enhancement

Not really an issue, but a possible documentation enhancement, whenever you have time :)

  1. Maybe mention that the params file can be found in the params folder. Also, you are probably working on that, but it would be good to know a bit more about how these params were generated (what is the reasoning behind them?). And what is the difference between the gap file and the other one (for PacBio and Nanopore)?

  2. I got an error when feeding in a gzipped reference FASTA. Maybe worth mentioning that in the docs? The preceding BAM step had already taken a while for me.

> Parsing prior probabilities on positions from reference sequences: /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz.  Maybe you should run 'samtools faidx /scratch/beegfs/monthly/aechchik/amphioxus/hm2/amphio_A_ref_D.fa.gz'
  3. And yes, as discussed in another issue, state explicitly that the reference file should be the haploid version of the genome.

  4. Is it feasible (or are you planning for the next release) to multi-thread the operations? From what I see, it is memory intensive (about 4x the size of the genome I am feeding in: ~500 Mbp genome -> ~2 GB RAM) but not (yet) parallelizable.

I'll add more points here if I get more ideas.
Keep up the good work!

Best,
Amina
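
For the gzip-versus-bgzip error in point 2, a quick check before starting a long run is to look at the first bytes of the reference. BGZF (the format written by `bgzip`) is a gzip variant whose header sets the extra-field flag and carries a `BC` subfield of length 2. A minimal sketch of that check follows; the function name is hypothetical.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """True if `header` (the first bytes of a file) starts a BGZF block."""
    # gzip magic (1f 8b), deflate (08), FLG with only FEXTRA set (04).
    if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
        return False
    xlen = struct.unpack("<H", header[10:12])[0]
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        slen = struct.unpack("<H", extra[pos + 2:pos + 4])[0]
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False


# The standard 28-byte BGZF end-of-file block, used here as a known BGZF sample.
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")
print(is_bgzf(BGZF_EOF))
print(is_bgzf(gzip.compress(b"ACGT")))
```

For a real file, pass in something like the first kilobyte read in binary mode. A plain-gzip reference can be decompressed and then either used uncompressed or recompressed with `bgzip` before indexing.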

Margin could not phase some HiFi variants with UL reads

For polishing HPRC diploid assemblies we need to phase the variants called by running DeepVariant on the alignments of all HiFi reads to each haplotype. We are using WhatsHap and Margin to phase the HiFi variants with ultra-long (UL) reads (>100 kb). Because Margin is multi-threaded, it speeds up our pipeline compared with WhatsHap, which is not, so we are inclined to use Margin. However, we noticed that some variants are phased by WhatsHap but not by Margin. I made a small set of three variants (per haplotype) to make this issue easy to reproduce. I explain the issue in more detail below and attach IGV snapshots of the three variants left unphased in the paternal haplotype (hap1). The equivalent variants on the other haplotype (maternal, or hap2) have the same issue, so I do not describe them here, but they can be explored using the files and IGV sessions provided at the end of this post.

Examples:

  1. Location: HG002#1#JAHKSE010000019.1:39117238-39117397
    This variant occurs in a TG dimer repeat. There is a deletion of length two that has to be phased here (genotype = TTG/T). As mentioned above, this variant is called from the HiFi alignments, which are not shown in the screenshot. The UL reads are shown in two groups. Purely for a more informative visualization, these two groups are phased naively in IGV by the nearest SNP that is phased by both WhatsHap and Margin. As expected, the UL reads are not very clean here; however, there is some signal for phasing, and WhatsHap could use it to phase the variant but Margin could not. The top window shows the variant record in the WhatsHap output and the bottom one shows it in the Margin output.

Example_1_Hap1

  2. Location: HG002#1#JAHKSE010000052.1:29343390-29343498
    This variant, like the first one, occurs in a TG dimer repeat, and the deletion is of length 2 (genotype = AGT/A). Again, the UL alignments are phased naively by the nearest phased SNP. As in the previous case, the reads are not clean but there is some signal.

Example_2_Hap1

  3. Location: HG002#1#JAHKSE010000064.1:12689434-12689590
    In this case we have two variants next to each other. The first is a 5-bp deletion with genotype ACAAAG/A and the other is a 1-bp insertion with genotype G/GA. Again, the UL reads are phased by the nearest SNP. WhatsHap could phase both variants correctly; Margin, however, leaves the insertion unphased.

Example_3_Hap1

The related files such as the raw and phased vcf files, assembly fasta files, UL alignments and HiFi alignments are all uploaded here:
https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/04ab5ae4-0170-11ee-904c-0a13c5208311--HPRC_Polishing/HPRC_Y1/HG002/margin_issues/

"igv_session.xml" can be downloaded locally and loaded in IGV to explore the variants explained above.

Below are the WDL files used for running Margin and WhatsHap:
https://github.com/miramastoras/hpp_production_workflows/blob/79d7f972700ec7f87f0a920c1eb2cf34cc03c692/QC/wdl/tasks/marginPhase.wdl
https://github.com/miramastoras/hpp_production_workflows/blob/79d7f972700ec7f87f0a920c1eb2cf34cc03c692/QC/wdl/tasks/whatsHapPhase.wdl
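
To enumerate such differences across a whole callset, the two phased VCFs can be compared site by site. A minimal sketch for single-sample VCF text follows; it treats a record as phased when its GT value contains `|`. The function names and the sample records are hypothetical and are not taken from the files linked above.

```python
def phased_sites(vcf_text: str) -> set:
    """Return (chrom, pos, ref, alt) for records whose GT is phased ('|')."""
    sites = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns to inspect
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
        if "|" in sample.get("GT", ""):
            sites.add((cols[0], int(cols[1]), cols[3], cols[4]))
    return sites


def phased_only_in_first(first_vcf: str, second_vcf: str) -> list:
    """Sites phased in the first VCF but not phased (or absent) in the second."""
    return sorted(phased_sites(first_vcf) - phased_sites(second_vcf))


# Made-up records: the deletion is phased by tool A only.
tool_a = (
    "##fileformat=VCFv4.2\n"
    "contigA\t100\t.\tTTG\tT\t30\tPASS\t.\tGT:PS\t0|1:90\n"
    "contigA\t150\t.\tG\tA\t30\tPASS\t.\tGT:PS\t1|0:90\n"
)
tool_b = (
    "##fileformat=VCFv4.2\n"
    "contigA\t100\t.\tTTG\tT\t30\tPASS\t.\tGT\t0/1\n"
    "contigA\t150\t.\tG\tA\t30\tPASS\t.\tGT:PS\t1|0:90\n"
)
print(phased_only_in_first(tool_a, tool_b))
```

Running it with the WhatsHap output first and the Margin output second would list every site in the same category as the three examples.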
