algolab / malva

genotyping by Mapping-free ALternate-allele detection of known VAriants

Home Page: https://algolab.github.io/malva/

License: GNU General Public License v3.0

Shell 1.14% Makefile 0.09% C++ 25.28% C 73.13% CMake 0.36%

bioinformatics alignment-free genotyping ngs vcf

malva's Issues

Long reads?

Hello,

Can the test reads be long reads? In theory I don't see why not, but do you think the results would be as robust? I am just worried that the k-mer counting might do something wild.

Thanks

Alex

Installation failure, can't find htslib

I'm trying to install MALVA on a Linux server (Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-112-generic x86_64)). I followed the instructions in the README in the root directory and got the following error:

./malva-geno: error while loading shared libraries: libhts.so.3: cannot open shared object file: No such file or directory

No changes were made to the build instructions, and there were no reported errors during compilation of sdsl-lite, kmc, and htslib, or MALVA itself.
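
The loader error above means the binary was linked against libhts.so.3 but the dynamic linker cannot locate that file at run time. A minimal diagnostic sketch follows: list the unresolved libraries with ldd, then extend LD_LIBRARY_PATH. The helper name and the htslib path in the usage comment are placeholders, not taken from the README.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it can be tried without the binary.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage (paths are hypothetical; adjust to where htslib was built):
#   ldd ./malva-geno | missing_libs
#   export LD_LIBRARY_PATH="/path/to/htslib:${LD_LIBRARY_PATH:-}"
```

If libhts.so.3 is listed, pointing LD_LIBRARY_PATH at the directory that contains it is usually enough for the loader to resolve it.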

Empty VCF in output

Nice work with malva. I tried this on several R1 FASTQs of paired-end 150x2 whole-genome sequencing datasets, and ended up with empty VCFs in all cases. Can you help debug? Here are the commands used:

mkdir sample_name
kmc -t16 -m64 -k43 -f /path/to/fastqs/fastq_name_S4_R1_001.fastq.gz sample_name/sample_name.res sample_name
malva-geno -k35 -r43 -b16 /path/to/reference/b37.fasta fingerprint_snps.vcf sample_name/sample_name.res > sample_name/sample_name.fingerprint.vcf

Logs from kmc reported:

Stats:
   No. of k-mers below min. threshold :    956248069
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :   1888206968
   No. of unique counted k-mers       :    931958899
   Total no. of k-mers                :  12689712777
   Total no. of reads                 :    116433462
   Total no. of super-k-mers          :    792031281

Logs from malva-geno reported:

[malva-geno/Reference parsing] Time elapsed 0s
[malva-geno/Reference processed] Time elapsed 6s
[malva-geno/VCF parsing (Bloom Filter construction)] Time elapsed 6s
[malva-geno/Processed 5000 variants] Time elapsed 18s
[malva-geno/Processed 6246 variants] Time elapsed 18s
[malva-geno/BF creation complete] Time elapsed 27s
[malva-geno/Reference BF construction] Time elapsed 27s
[malva-geno/Reference BF creation complete] Time elapsed 80s
[malva-geno/KMC output processing] Time elapsed 94s
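
When the output contains only header lines, two quick checks may help narrow things down: how many records the input and output VCFs contain, and how many input records carry a given INFO key. The wrapper invocation quoted in another issue on this page passes a `-f ${freq}` option to malva-geno, which suggests the tool reads a frequency key from the input VCF. That the key is AF, and that a missing key would produce an empty output, are assumptions here rather than confirmed behaviour. The helper names are illustrative, and both helpers expect an uncompressed VCF.

```shell
# Count data records (non-header, non-empty lines) in a VCF.
count_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Count records whose INFO column (field 8) has an entry starting with KEY=.
# usage: count_with_info_key file.vcf KEY
count_with_info_key() {
    awk -F '\t' -v key="$2" '
        /^#/ { next }
        {
            n_entries = split($8, entries, ";")
            for (i = 1; i <= n_entries; i++) {
                if (index(entries[i], key "=") == 1) { hits++; break }
            }
        }
        END { print hits + 0 }
    ' "$1"
}

# Usage with the file names from the commands above (AF is an assumed key):
#   count_records sample_name/sample_name.fingerprint.vcf
#   count_with_info_key fingerprint_snps.vcf AF
```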

vcfs can have "chr" in the chromosome name

malva-geno crashes when the VCF has "chr" in a chromosome name. Commenting out the section of main.cpp that strips "chr" from the ID in the reference map lets malva run fine. I'd recommend placing the burden on users to match VCF and reference naming conventions.
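
A quick way to see whether a VCF and a reference disagree on chromosome naming is to list the VCF contigs that have no matching FASTA header. This is a minimal sketch, assuming an uncompressed VCF and a FASTA in which the sequence name is the first whitespace-delimited token of each header line. The function name is illustrative.

```shell
# Print each CHROM value from the VCF that is absent from the FASTA headers.
# usage: contigs_missing_from_ref variants.vcf reference.fa
contigs_missing_from_ref() {
    awk '
        # First file (the FASTA): remember every sequence name.
        FNR == NR {
            if (/^>/) {
                name = substr($1, 2)
                ref[name] = 1
            }
            next
        }
        # Second file (the VCF): skip headers, report unknown contigs once.
        /^#/ { next }
        !($1 in ref) && !seen[$1]++ { print $1 }
    ' "$2" "$1"
}
```

An empty result means every contig named in the VCF also appears in the reference. Output such as `chr1` against a reference that names the sequence `1` points to a naming mismatch.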

libkmc.so not found

Hello, I have installed MALVA following the conda approach but it returns

error while loading shared libraries: libkmc.so: cannot open shared object file: No such file or directory

Is there an issue with the bioconda recipe? thanks

Speeding up malva using more threads

First, thanks for a very nice tool!

I have a couple of questions:

1. Is it possible to speed up Malva using more threads? I know that KMC can easily use more threads, meaning that the first step of Malva (getting k-mers from the reads) can be parallelized, but can malva-geno (computing the signatures and performing the genotyping) also be parallelized in any way?
2. Assume I want to genotype many samples. I guess that one could potentially save a lot of time by not computing the variant signatures and reference k-mers for each sample (since these should be the same, given the input variants). It seems that malva-geno currently does this every time a new sample is genotyped. Is it possible to reuse this data so that genotyping new samples becomes faster?

Thanks!

MALVA (malva-geno?) gets killed

I installed MALVA using conda into a Docker container and am running it with Nextflow.

RUN conda update conda \
    && conda install -c bioconda malva=1.1.1

The command to run MALVA is:

#!/bin/bash -ue
MALVA \
  GRCh38_full_analysis_set_plus_decoy_hla.fa \
  ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz \
  GRC194242437_trimmed.fq.gz \
  > GRC194242437.vcf

The known variants are the 1000 Genomes 2019 high-depth GRCh38 variants.

I am using the 1000 Genomes version of GRCh38.

The FASTQ file is a concatenation of paired-end reads (i.e. all read 1, then all read 2; not interleaved). It contains only 1x coverage worth of reads.

The MALVA script outputs:

[MALVA] Running KMC
[MALVA] Running malva-geno
[malva-geno/Reference parsing] Time elapsed 0s
[malva-geno/Reference processed] Time elapsed 10s
[malva-geno/VCF parsing (Bloom Filter construction)] Time elapsed 10s
/home/maxh/conda/bin/MALVA: line 93: 13284 Killed                  ${EXECUTABLE} -k ${k} -r ${refk} -e ${erate} -f ${freq} -s ${samples} -c ${maxcov} -b ${bfsize} ${reference} ${vcf_file} ${kmc_out_prefix}
[MALVA] Cleaning up

Thus KMC and malva-geno both run, but the run time of malva-geno is suspiciously short. I don't quite understand why bash reports line 93 of the MALVA script: the command it prints is the one that runs malva-geno, which is on line 97, not line 93. In fact, execution should never reach line 93, as it is only reached if kmc_pre and kmc_suf are not present.

What could be the cause? Resources? I limit MALVA to 8GB currently, is that too low?
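
bash prints "Killed" when a child process is terminated by SIGKILL, which is the signal sent by the kernel out-of-memory killer or by a container memory limit. That is consistent with the 8GB limit being too low, but not proof of it. A small sketch for decoding a command's exit status into a signal name, using a hypothetical helper:

```shell
# Print the signal name if STATUS indicates death by signal (128 + N).
# Prints nothing for an ordinary exit status.
signal_from_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        kill -l "$((code - 128))"
    fi
}

# Usage: run the command, then pass its exit status to the helper.
#   some_command; signal_from_status "$?"
```

A status of 137 decodes to KILL. On the host, the kernel log (for example via dmesg) typically records out-of-memory kills, which would confirm or rule out memory as the cause.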

MALVA with low coverage data

Hi there,

We'd like to try using MALVA on our own low-coverage WGS data (~1x). We've noticed that the MALVA release we're using (version 1.3.1; build h3889886_0) only genotypes sites where a sample has coverage >=2. Is there a way to change this default behaviour so that sites with lower coverage are genotyped as well? There's nothing obvious in the provided flags, but maybe it's possible to modify the original code.
