
malva's People

Contributors

bunco3, ldenti, mpre, yp

malva's Issues

Long reads?

Hello,

Can the test reads be long reads? In theory I don't see why not, but do you think the results would be as robust? I am just worried that the k-mer counting does something wild.

Thanks

Alex

Installation failure, can't find htslib

I'm trying to get MALVA installed on a Linux server (Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)) following the instructions on the README in the root directory and get the following error:

./malva-geno: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory

No changes were made to the build instructions, and there were no reported errors during compilation of sdsl-lite, kmc, and htslib, or MALVA itself.
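One way to narrow down an error like this is to ask the dynamic loader which libraries it cannot resolve for the binary. The sketch below wraps `ldd` in a small helper; the function name `missing_libs` and the library path in the usage comment are hypothetical and depend on where htslib was actually built.

```shell
#!/bin/sh
# List shared libraries that the dynamic loader cannot resolve for a binary.
# Prints one library name per line; prints nothing if everything resolves.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Hypothetical usage from the directory containing the binary:
#   missing_libs ./malva-geno
# If libhts.so.3 is listed, adding the directory that holds it to the
# loader search path may help (the path below is a placeholder, not a
# location taken from the MALVA build instructions):
#   export LD_LIBRARY_PATH="/path/to/htslib:${LD_LIBRARY_PATH:-}"
```

If the helper prints nothing, the loader can already resolve every library, and the problem is more likely in the environment of the shell that launches the binary.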

Empty VCF in output

Nice work with malva. I tried this on several R1 FASTQs of paired-end 150x2 whole-genome sequencing datasets, and end up with empty VCFs in all cases. Can you help debug? Here are the commands used:

mkdir sample_name
kmc -t16 -m64 -k43 -f /path/to/fastqs/fastq_name_S4_R1_001.fastq.gz sample_name/sample_name.res sample_name
malva-geno -k35 -r43 -b16 /path/to/reference/b37.fasta fingerprint_snps.vcf sample_name/sample_name.res > sample_name/sample_name.fingerprint.vcf

Logs from kmc reported:

Stats:
   No. of k-mers below min. threshold :    956248069
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :   1888206968
   No. of unique counted k-mers       :    931958899
   Total no. of k-mers                :  12689712777
   Total no. of reads                 :    116433462
   Total no. of super-k-mers          :    792031281
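As a quick sanity check on the KMC statistics above, the number of unique counted k-mers should equal the number of unique k-mers minus those excluded by the minimum and maximum count thresholds. The arithmetic below uses only the figures from the log; the variable names are illustrative.

```shell
#!/bin/sh
# Figures copied from the KMC log above.
below_min=956248069   # k-mers below min. threshold
above_max=0           # k-mers above max. threshold
unique=1888206968     # unique k-mers

# Unique k-mers that survive both thresholds.
counted=$((unique - below_min - above_max))
echo "$counted"       # prints 931958899, matching "No. of unique counted k-mers"
```

The numbers are consistent, so roughly half of the distinct 43-mers were dropped for falling below the minimum count threshold, while the remaining 931958899 were written to the KMC database.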

Logs from malva-geno reported:

[malva-geno/Reference parsing] Time elapsed 0s
[malva-geno/Reference processed] Time elapsed 6s
[malva-geno/VCF parsing (Bloom Filter construction)] Time elapsed 6s
[malva-geno/Processed 5000 variants] Time elapsed 18s
[malva-geno/Processed 6246 variants] Time elapsed 18s
[malva-geno/BF creation complete] Time elapsed 27s
[malva-geno/Reference BF construction] Time elapsed 27s
[malva-geno/Reference BF creation complete] Time elapsed 80s
[malva-geno/KMC output processing] Time elapsed 94s

VCFs can have "chr" in the chromosome name

malva-geno crashes when the VCF has "chr" in a chromosome name. Commenting out the section of main.cpp that strips "chr" from the id in the reference map lets malva run fine. I'd recommend placing the burden on the users to match VCF and reference naming conventions.
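For users who want to check the naming conventions themselves before running malva-geno, a minimal sketch along these lines lists the contig names used in a VCF that do not occur in the reference FASTA. It assumes uncompressed input files and compares names verbatim; the function name `contigs_missing_from_ref` is hypothetical and not part of MALVA.

```shell
#!/bin/sh
# Print contig names that appear in the VCF but not in the reference FASTA.
# Assumes uncompressed files; names are compared exactly as written.
contigs_missing_from_ref() {
    ref_fasta=$1
    vcf=$2
    awk '
        # First file (FASTA): record the first word of each header line.
        FNR == NR { if (/^>/) { sub(/^>/, "", $1); ref[$1] = 1 } ; next }
        # Second file (VCF): skip header lines.
        /^#/ { next }
        # Report each unmatched contig name once.
        !($1 in ref) && !seen[$1]++ { print $1 }
    ' "$ref_fasta" "$vcf"
}

# Hypothetical usage:
#   contigs_missing_from_ref reference.fa variants.vcf
```

An empty result means every contig in the VCF has a matching reference sequence name; any name printed (for example `chr1` against a reference that uses `1`) points to a convention mismatch.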

libkmc.so not found

Hello, I have installed MALVA following the conda approach but it returns

error while loading shared libraries: libkmc.so: cannot open shared object file: No such file or directory

Is there an issue with the bioconda recipe? Thanks.

Speeding up malva using more threads

First, thanks for a very nice tool!

I have a couple of questions:

  1. Is it possible to speed up Malva using more threads? I know that KMC can easily use more threads, meaning that the first step of Malva (getting k-mers from the reads) can be parallelized, but can malva-geno (computing the signatures and performing the genotyping) also be parallelized in any way?

  2. Assume I want to genotype many samples. I guess that one could potentially save a lot of time by not computing the variant signatures and reference k-mers for each sample (since these should be the same, given the input variants). It seems now that malva-geno does this every time a new sample is genotyped. Is it possible to re-use this data so that genotyping of new samples would become faster?

Thanks!

MALVA (malva-geno?) gets killed

I installed Malva using conda into a Docker container and am running it with Nextflow.

RUN conda update conda \
    && conda install -c bioconda malva=1.1.1

The command to run MALVA is:

#!/bin/bash -ue
MALVA \
  GRCh38_full_analysis_set_plus_decoy_hla.fa \
  ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz \
  GRC194242437_trimmed.fq.gz \
  > GRC194242437.vcf

The known variants are the 1000 Genomes 2019 high-depth GRCh38 variants.

I am using the 1000 Genomes version of GRCh38.

The FASTQ file is a concatenation of paired-end reads (i.e. all read1, then all read2; not interleaved). It contains only 1x worth of reads.

The MALVA script outputs:

[MALVA] Running KMC
[MALVA] Running malva-geno
[malva-geno/Reference parsing] Time elapsed 0s
[malva-geno/Reference processed] Time elapsed 10s
[malva-geno/VCF parsing (Bloom Filter construction)] Time elapsed 10s
/home/maxh/conda/bin/MALVA: line 93: 13284 Killed                  ${EXECUTABLE} -k ${k} -r ${refk} -e ${erate} -f ${freq} -s ${samples} -c ${maxcov} -b ${bfsize} ${reference} ${vcf_file} ${kmc_out_prefix}
[MALVA] Cleaning up

Thus KMC and malva-geno run, but the run times of malva-geno are suspiciously short. I don't quite understand why bash prints the line from MALVA that it does: the command shown is not on line 93 but on line 97, and it is the one that runs malva-geno. In fact, it should never get to line 93, as this is only reached if kmc_pre and kmc_suf are not present.

What could be the cause? Resources? I limit MALVA to 8GB currently, is that too low?
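The "Killed" message in the log means the process received SIGKILL. On Linux this often comes from the kernel's out-of-memory handling or from a container memory limit, although that is an assumption here and not something the log itself confirms. A minimal sketch for telling a SIGKILL apart from an ordinary failure is shown below; the wrapper name `run_and_classify` is hypothetical.

```shell
#!/bin/sh
# Run a command and classify how it ended.
# Common shells (bash, dash) report death by signal as 128 + signal number,
# so SIGKILL (signal 9) shows up as exit status 137.
run_and_classify() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 137 ]; then
        echo "killed"
    elif [ "$status" -ne 0 ]; then
        echo "failed:$status"
    else
        echo "ok"
    fi
}

# Hypothetical usage (arguments elided, as in the log above):
#   run_and_classify malva-geno ... > sample.vcf
```

If the wrapper reports `killed`, the kernel log (for example via `dmesg`) or the container runtime's memory statistics are the next places to look.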

MALVA with low coverage data

Hi there,

We'd like to try using MALVA on our own low-coverage WGS data (~1x). We've noticed that the MALVA release we're using (version 1.3.1; build h3889886_0) only genotypes sites where a sample has >=2 coverage. Is there a way to change this default behaviour so that sites with lower coverage are genotyped as well? There's nothing obvious in the provided flags, but maybe it's possible to modify the original code.
