iqbal-lab-org / minos
Variant call adjudication
License: MIT License
minos adjudicate --reads /data1/kdyoung/KNTA/Batch01_20211129/1_S1_L001_R1_001_filtered.fastq.gz --reads /data1/kdyoung/KNTA/Batch01_20211129/1_S1_L001_R2_001_filtered.fastq.gz /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/04.minos_vcf/ /home/kdyoung/bacteria_sample/ref/06.Mycobacterium/H37Rv/Mycobacterium.fasta /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/01.samtools/1_H37Rv.vcf /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/02.delly/DELLY_1.vcf --force --sample_name sample_1
The output directory contains only a log file. Why is there no final.vcf?
Here's an error log that appeared on multiple samples while running minos adjudicate
on 50K CRyPTIC Mtb samples:
Command error:
Traceback (most recent call last):
File "/usr/local/bin/minos", line 210, in <module>
args.func(args)
File "/usr/local/lib/python3.6/dist-packages/minos/tasks/adjudicate.py", line 22, in run
adj.run()
File "/usr/local/lib/python3.6/dist-packages/minos/adjudicator.py", line 152, in run
self._run_gramtools_with_split_vcf()
File "/usr/local/lib/python3.6/dist-packages/minos/adjudicator.py", line 332, in _run_gramtools_with_split_vcf
filtered_outfile=split_vcf_out,
File "/usr/local/lib/python3.6/dist-packages/minos/gramtools.py", line 245, in write_vcf_annotated_using_coverage_from_gramtools
filtered_record = update_vcf_record_using_gramtools_allele_depths(vcf_records[i], all_allele_coverage[i][0], all_allele_coverage[i][1], allele_groups, mean_depth, read_error_rate, kmer_size)
gtyper.run()
File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 162, in run
self._calculate_log_likelihoods()
File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 127, in _calculate_log_likelihoods
non_zeros,
File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 76, in _log_likelihood_homozygous
allele_depth * math.log(mean_depth),
ValueError: math domain error
.command.stub: line 99: 24118 Terminated nxf_trace "$pid" .command.trace
Example of command executed:
Command executed:
sample_name=$(grep "^#[^#]" /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/01/23/46/12346/Pipelines/1/variant_call/0.7.3.ref.3/minos/final.remasked.vcf | awk '{print $NF}') # Extract the sample name from the original vcf
sample_name_in_file=${sample_name//\//_} # Replace "/" with something reasonable
minos_outdir=small_vars.minos.11656
minos adjudicate --sample_name $sample_name --gramtools_build_dir /hps/nobackup2/zi/Cryptic_releases/nextflow.work.100chunks/82/882e1e9eafc2c6119fb917b4bf2fcd/gramtools.build_dir --reads /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/01/23/46/12346/Pipelines/1/variant_call/0.7.3.ref.3/samtools/rmdup.bam $minos_outdir /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_refs/3/ref.fa "/hps/nobackup2/zi/Cryptic_releases/nextflow.work.100chunks/b1/9dcef20eef89a7e02e2bbb6eadc6d2/small_vars_clustered.vcf"
I have reproduced this locally and am looking into it
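The traceback above ends inside a math.log(...) expression, and math.log raises ValueError for zero or negative input, so a zero mean depth is one plausible trigger. A minimal sketch of a guard follows. The names safe_log and log_raises_on_zero, and the floor value, are hypothetical and are not taken from the minos code base.

```python
import math


def safe_log(x, floor=1e-9):
    """Return log(x), substituting a small positive floor when x <= 0.

    The floor value is an arbitrary assumption chosen for illustration.
    """
    return math.log(max(x, floor))


def log_raises_on_zero():
    """Show that math.log(0) raises ValueError."""
    try:
        math.log(0)
    except ValueError:
        return True
    return False
```

Whether clamping is appropriate, or whether such records should be skipped entirely, depends on how the likelihood is used downstream.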
Keep the complete VCF for debugging. Filter the final VCF by removing all the alleles that had zero coverage across their length, because that means they weren't seen in any of the reads.
... now that gramtools can do it.
See iqbal-lab-org/gramtools#83
I have not been able to use minos successfully because of errors associated with its dependencies. In particular, the options passed to gramtools when running "minos adjudicate" seem to be incorrect and often return an error. For example, the run_gramtools function in the minos/gramtools.py script specifies the "quasimap" command. This is not recognized by gramtools and so stops the run.
I have already spent several hours trying to troubleshoot this but still couldn't resolve the issues. Can anyone please help?
Filter with default DP>2 and GT_CONF_PERCENTILE >2.5.
https://samtools.github.io/hts-specs/VCFv4.2.pdf
FORMAT headers can only have the following keys: ID, Number, Type, and Description.
I have a depth header from minos (v0.5.1 according to the header), which is the following:
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="total kmer depth from gramtools",Source="minos">
Source keys can only appear in INFO header lines. So I guess remove it from FORMAT and create an INFO field for DP as well? Either way, it needs to be removed from the DP FORMAT header.
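As a rough illustration of removing the extra key, the sketch below keeps only ID, Number, Type and Description in a ##FORMAT header line. The function name clean_format_header is hypothetical, and the parsing assumes simple key=value pairs with optionally quoted values. The returned line has no trailing newline.

```python
import re

ALLOWED_FORMAT_KEYS = ("ID", "Number", "Type", "Description")


def clean_format_header(line):
    """Drop keys other than ID, Number, Type, Description from a ##FORMAT line.

    Lines that are not ##FORMAT headers are returned unchanged.
    """
    prefix = "##FORMAT=<"
    stripped = line.rstrip()
    if not (stripped.startswith(prefix) and stripped.endswith(">")):
        return line
    body = stripped[len(prefix):-1]
    # Each value is either a double-quoted string or a run without commas
    pairs = re.findall(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)', body)
    kept = [f"{key}={value}" for key, value in pairs if key in ALLOWED_FORMAT_KEYS]
    return prefix + ",".join(kept) + ">"
```

Applied to the DP header above, this drops Source="minos" and leaves the other four keys untouched.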
Try to fix issues in current VCF output by minos, according to the latest spec, which is linked here: https://samtools.github.io/hts-specs/
Known issues:
Use the --max_alleles_per_cluster option in the nextflow process cluster_small_vars_vcf.
When --norun is used, should sys.exit() after printing the command to run.
@bricoletc Why not just run the usual python3 setup.py test?
I have a multisample VCF from minos v0.5.1 where some samples are missing the GT_CONF_PERCENTILE field. It seems to happen only on samples with a null genotype (./.). An example of the FORMAT string
GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:STATUS
and an example incorrect sample entry
./.:0:0,0:0.0:FAIL
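A quick way to spot such entries is to compare the number of colon-separated values in each sample column with the number of FORMAT keys. This is only a sketch: mismatched_samples is a hypothetical helper, and it flags any length mismatch, including trailing fields that the VCF spec allows to be dropped.

```python
def mismatched_samples(format_field, sample_fields):
    """Return indexes of sample columns whose value count differs from FORMAT."""
    n_keys = len(format_field.split(":"))
    return [
        index
        for index, sample in enumerate(sample_fields)
        if len(sample.split(":")) != n_keys
    ]


fmt = "GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:STATUS"
# First entry is the one reported above; the second is made up for contrast
samples = ["./.:0:0,0:0.0:FAIL", "0/0:12:12,0:123.74:15.31:PASS"]
bad = mismatched_samples(fmt, samples)
```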
In reference to this issue:
#10
The list of ALT alleles found in final VCFs outputted by multi sample pipeline differs between files.
REFGEN POS . AG AT . . KMER=7 DP:GT:COV:GT_CONF:GT_CONF_PERCENTILE 47:0/1:35,12:89.47:2.44
REFGEN POS . AG AT,GG,GT . . KMER=7 DP:GT:COV:GT_CONF:GT_CONF_PERCENTILE 12:0/0:12,0,0,0:123.74:15.31
See #116 (comment)
From VCF spec 4.3:
"ALT field must be a symbolic allele, or a breakend replacement string, or match the regular expression ^([ACGTNacgtn]+|\*|\.)$."
We only want to consider ACGT alleles, and so discard any alleles that do not match this regex: ^[ACGTacgt]+$.
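A minimal sketch of that filter, using the regex quoted above. The function name keep_acgt_alleles is hypothetical. re.fullmatch is used so that a trailing newline cannot slip past the $ anchor.

```python
import re

ACGT_ONLY = re.compile(r"[ACGTacgt]+")


def keep_acgt_alleles(alleles):
    """Keep only alleles made up entirely of A, C, G, T (either case)."""
    return [allele for allele in alleles if ACGT_ONLY.fullmatch(allele)]


# Symbolic alleles, '*', '.', and alleles containing N are all discarded
filtered = keep_acgt_alleles(["AT", "GG", "*", ".", "ANT", "<DEL>", "acgt"])
```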
At the end of the regenotyping pipeline, it would be very easy to dump the following
Then at the end we can just "cat" all the bitarrays for ./. or het, and cat all the intvectors, and then the distance measurement is trivial:
dist=0
for i = 0 to number of records-1
    for j = i to number of records-1
        if bitfield[i]==bitfield[j]==0 (meaning it is neither missing nor het)
            if int vector[i] != int vector[j]
                dist++
Ought to be v fast
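A small Python sketch of the idea, under one reading of the pseudocode above: the distance is taken between two samples, compared record by record. That reading, the function name genotype_distance, and the plain-list representation of the bit arrays and int vectors are all assumptions.

```python
def genotype_distance(mask_a, calls_a, mask_b, calls_b):
    """Count records where both samples have a usable call and the calls differ.

    mask_x[i] is 1 when record i is missing (./.) or het in sample x, else 0.
    calls_x[i] is the called allele index for record i in sample x.
    """
    if not (len(mask_a) == len(calls_a) == len(mask_b) == len(calls_b)):
        raise ValueError("all inputs must cover the same records")
    dist = 0
    for m_a, c_a, m_b, c_b in zip(mask_a, calls_a, mask_b, calls_b):
        if m_a == 0 and m_b == 0 and c_a != c_b:
            dist += 1
    return dist


# Record 1 differs; records 2 and 3 are masked in one of the samples
example = genotype_distance([0, 0, 1, 0], [0, 1, 0, 2], [0, 0, 0, 1], [0, 0, 1, 2])
```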
Split the reference approximately every N kilobases.
Don't split inside a variant (ie inside REF string of VCF).
Need to include at least one read length of reference genome flanking each variant, so that quasimap works and can call the variant. This means chunks can overlap.
Run variant calling on each chunk.
When putting back together, for variants that are in >1 chunk, only use the call with the biggest genotype confidence.
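The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the minos implementation: make_chunks and best_calls are hypothetical names, variant intervals are 0-based half-open and sorted by start, and the flank is added to each chunk as a whole instead of to each variant.

```python
def make_chunks(ref_length, variants, chunk_size, flank):
    """Split [0, ref_length) roughly every chunk_size bases.

    A breakpoint that would fall inside a variant's REF interval is pushed
    to the end of that variant. Each chunk is then padded by flank bases on
    both sides, so neighbouring chunks overlap.
    """
    breaks = [0]
    target = chunk_size
    while target < ref_length:
        pos = target
        for start, end in variants:
            if start < pos < end:
                pos = end  # never split inside a REF string
        if pos >= ref_length:
            break
        breaks.append(pos)
        target = pos + chunk_size
    breaks.append(ref_length)
    return [
        (max(0, start - flank), min(ref_length, end + flank))
        for start, end in zip(breaks, breaks[1:])
    ]


def best_calls(calls):
    """Keep, for each variant key, the record with the highest confidence.

    calls is an iterable of (variant_key, genotype_confidence, record).
    """
    best = {}
    for key, conf, record in calls:
        if key not in best or conf > best[key][0]:
            best[key] = (conf, record)
    return {key: record for key, (conf, record) in best.items()}


chunks = make_chunks(1000, [(295, 305)], 300, 50)
merged = best_calls([("v1", 10.0, "a"), ("v1", 25.5, "b"), ("v2", 5.0, "c")])
```

Here the first breakpoint moves from 300 to 305 because position 300 lies inside the variant at 295-305.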
Although not required according to the VCF spec, some tools break if it's not there. Example header line:
##contig=<ID=foo,length=12345>
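For illustration, header lines of this form could be generated from the reference FASTA along these lines. fasta_lengths and contig_header_lines are hypothetical helpers, and the sketch assumes the contig ID is the first whitespace-delimited token of each FASTA header.

```python
def fasta_lengths(lines):
    """Map each sequence name to its length, from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the ID is the text up to the first whitespace
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def contig_header_lines(lengths):
    """Build one ##contig header line per reference sequence."""
    return [f"##contig=<ID={name},length={length}>" for name, length in lengths.items()]


headers = contig_header_lines(fasta_lengths([">foo some description", "ACGT", "AC"]))
```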
At the moment the log file just has the version of the programs.
Throws this nextflow error:
[13/ec0354] NOTE: Missing output file(s) gramtools.build_dir/split*.vcf
expected by process split_vcfs
-- Execution is retried (1)
ERROR ~ Error executing process > 'minos_all_small_vars (12)'
Caused by:
Process minos_all_small_vars (12)
terminated with an error exit status (1)
Command executed:
sample_name=$(cat sample_name.37)
minos_outdir=small_vars.minos.37
minos adjudicate --sample_name $sample_name --gramtools_build_dir /hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/f2/855ff40b18dd238c529b231e1f7d95/gmtools_build_dir --reads /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/00/05/27/527/Pipelines/1/variant_call/0.7.3.ref.3/samtools/rmdup.bam $minos_outdir /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_refs/3/ref.fa "small_vars_clustered.vcf"
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pyfastaq/utils.py", line 21, in open_file_read
f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/77/f9b9651dc28b949dfcf91257f014f0/gramtools.build_dir/split.0.gramtools_build/perl_generated_vcf'
Work dir:
/hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/27/fc27423b35372a9638fbed7c3b4976
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
Check what happens if there is whitespace in fasta headers of input file. It might break the pipeline because names then might not match what is in the VCF file.
Could ignore the unmapped reads BAM file by changing this line:
https://github.com/iqbal-lab-org/minos/blob/master/minos/multi_sample_pipeline.py#L411
so that the --reads option is only used once, to give the mapped reads bam for that chunk.
ie put "./." as the genotype when not enough info to call it.
Use --output-directory instead. gramtools is reporting this warning:
2018-05-22 13:55:37,777 gramtools WARNING Depreciated argument: --run-directory; instead use: --output-directory
Having successfully got all the way to adjudication at the end, I reran with -resume and saw this:
[7a/87355b] Cached process > process_input_vcf_file (52667)
[07/7479e1] Cached process > process_input_vcf_file (52670)
[e7/e96cb4] Cached process > process_input_vcf_file (52671)
[fa/0df118] Cached process > process_input_vcf_file (52669)
[e0/8a8c50] Cached process > process_input_vcf_file (52668)
[8a/011b05] Cached process > process_input_vcf_file (52673)
[5c/925d24] Cached process > process_input_vcf_file (52672)
[7c/6d669a] Cached process > process_input_vcf_file (52674)
[f5/ee5638] Submitted process > pre_cluster_small_vars_merge
Currently gramtools build of chunks happens in a python loop, possibly with threads if told to do so.
Line 205 in c9d856c
This is a massive and unnecessary bottleneck when running on tens of thousands of samples.
These are the build times for the first 600-odd chunks (out of 5000) when running on 50,000 TB genomes. As you can see, half of the chunks take less than 140 seconds to build. 3 of them take 16 mins (1000sec). 5 of them take over 30 mins.
minos multisample outputs a nextflow script. There are command-line args for most of the RAM limits for the sub-stages, but not this:
params.pre_cluster_small_vars_merge_ram = 8
Suggest adding a command-line arg for it.
Comma or tab separated? How about paired-end sequencing?
Can I use compressed files, like gz or tar.gz?
Here is an example of my manifest.tsv. Is this OK?
#########################################################################
name vcf reads reads
201550 NGS_Mapping/SNP/gvcf/201550.gvcf.gz 01.Cleandata/201550/201550.1.fq.clean.gz 01.Cleandata/201550/201550.2.fq.clean.gz
183207 NGS_Mapping/SNP/gvcf/183207.gvcf.gz 01.Cleandata/183207/183207.1.fq.clean.gz 01.Cleandata/183207/183207.2.fq.clean.gz
#########################################################################
Thanks.
For chunking, need the bam to be indexed because we're pulling out regions of the bam file. Add check that:
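One way such a check could look, assuming the index sits next to the BAM file under one of the usual names (file.bam.bai, file.bai or file.bam.csi). The helper name bam_index_exists is hypothetical.

```python
import os


def bam_index_exists(bam_path):
    """Return True if an index file is found alongside the BAM file."""
    candidates = [bam_path + ".bai", bam_path + ".csi"]
    if bam_path.endswith(".bam"):
        # Some tools write sample.bai instead of sample.bam.bai
        candidates.append(bam_path[:-4] + ".bai")
    return any(os.path.exists(path) for path in candidates)
```

This only tests that an index file is present, not that it is up to date with the BAM.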
Gramtools can fail and minos carries on. Dies later when it doesn't find expected output file from gramtools quasimap.
In setup.py use entry_points=... and use a __main__.py file instead of the current script scripts/minos.
If total REF length is more than (by default) 5% of the reference genome length, then stop.
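A minimal sketch of that check. The function name total_ref_length_ok and the max_fraction parameter are hypothetical; the 0.05 default mirrors the 5% mentioned above.

```python
def total_ref_length_ok(ref_alleles, genome_length, max_fraction=0.05):
    """Return False when the summed REF allele length exceeds the allowed fraction."""
    total = sum(len(ref) for ref in ref_alleles)
    return total <= max_fraction * genome_length


# 4 bases out of 100 is within the 5% limit; 7 bases is not
within_limit = total_ref_length_ok(["ACGT"], 100)
over_limit = total_ref_length_ok(["ACGTACG"], 100)
```

A caller could then stop the run with an error message when the function returns False.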
Hello,
I am reaching out to you because I am encountering an error when running the minos program using the "minos.simg" image, and I am unable to resolve it.
The command I am executing is:
nextflow run -with-singularity minos.simg nextflow/regenotype.nf -profile medium --ref_fasta reference.fasta --manifest manifest.tsv --make_distance_matrix
It appears that all the steps are executed successfully until it reaches the minos part. The error displayed on the screen is as follows:
N E X T F L O W ~ version 22.04.0
Launching `nextflow/regenotype.nf` [big_chandrasekhar] DSL2 - revision: bfe1b8e0b5
executor > local (30)
executor > local (30)
[1c/b2812b] process > parse_manifest [100%] 1 of 1 ✔
[38/ad6887] process > make_vcf_for_gramtools:vcf_merge [100%] 1 of 1 ✔
[0c/cc6992] process > make_vcf_for_gramtools:vcf_cluster [100%] 1 of 1 ✔
[da/135eb4] process > gramtools_build [100%] 1 of 1 ✔
[15/3b24ae] process > minos (4) [100%] 19 of 19, failed: 18, retries: 12 ✔
[3a/e2191a] process > make_per_sample_vcfs_dir (1) [100%] 2 of 2, failed: 2, retries: 2
[80/232c20] process > ivcf_merge_chunks (1) [100%] 1 of 1 ✔
[62/f3d520] process > ivcf_final_merge [100%] 3 of 3, failed: 3, retries: 2 ✘
[- ] process > distance_matrix -
[8b/ca4db5] NOTE: Process `make_per_sample_vcfs_dir (1)` terminated with an error exit status (255) -- Execution is retried (1)
[4b/212478] NOTE: Process `make_per_sample_vcfs_dir (1)` terminated with an error exit status (255) -- Execution is retried (2)
[2a/9eeff2] NOTE: Process `ivcf_final_merge` terminated with an error exit status (255) -- Execution is retried (1)
[f7/49c909] NOTE: Process `ivcf_final_merge` terminated with an error exit status (255) -- Execution is retried (2)
Error executing process > 'ivcf_final_merge'
Caused by:
Process `ivcf_final_merge` terminated with an error exit status (255)
Command executed:
cat <<"EOF" > files.txt
work/80/232c20b8928f00a42a91876f6825c3/merged.vcf
EOF
ivcfmerge files.txt Compare_prueba/merged.vcf
touch done_file
Command exit status:
255
Command output:
(empty)
Command error:
FATAL: container creation failed: mount /proc/self/fd/3->/usr/local/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available
Could you please assist me in resolving this issue? Alternatively, do you have any suggestions for possible alternatives?
I have already modified the maximum number of loops available in my computer:
$cat /etc/modules
options loop max_loop=256
8192eu
loop
Thank you in advance.
Do not remove alleles with no coverage on each per-sample run of minos during multi sample pipeline. Otherwise bcftools gets confused and outputs multiple lines where the ALT list is different between samples.
When building the singularity image, following the provided instructions, I got the following error.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-qw7in_h3/pillow/setup.py", line 1037, in <module>
raise RequiredDependencyException(msg)
__main__.RequiredDependencyException:
The headers or library files could not be found for jpeg,
a required dependency when compiling Pillow from source.
Please see the install instructions at:
https://pillow.readthedocs.io/en/latest/installation.html
I could solve it by adding the package libjpeg-dev to the apt-get install command in .ci/install_dependencies.sh.
In rare cases, cortex has the filter column as "MISMAPPED_UNPLACEABLE". These are getting through into minos results. Filter these VCF records out right at the start of a minos run because they should not be used.
Install for minos fails due to broken install of gramtools@9313eceb606a6fc159e4a14c168b7a6f888c5ed2
E.g. when building from the Singularity.def in this repo
SUCCESS: sdsl was installed successfully!
The sdsl include files are located in '/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/include'.
The library files are located in '/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/lib'.
Sample programs can be found in the examples-directory.
A program 'example.cpp' can be compiled with the command:
g++ -std=c++11 -DNDEBUG -O3 [-msse4.2] \
-I/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/include -L/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/lib \
example.cpp -lsdsl -ldivsufsort -ldivsufsort64
Tests in the test-directory
A cheat sheet in the extras/cheatsheet-directory.
Have fun!
[ 28%] No install step for 'sdsl'
[ 29%] No test step for 'sdsl'
[ 30%] Completed 'sdsl'
[ 30%] Built target sdsl
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ERROR: gramtools backend compilation returned: 2
----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-of5i099r-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-7vv5y6wr-record/install-record.txt --single-version-externally-managed --compile" failed with error code 255 in /tmp/pip-of5i099r-build/
FATAL: failed to execute %post proc: exit status 1
FATAL: While performing build: while running engine: while running /usr/local/libexec/singularity/bin/starter: exit status 255
This issue also causes the clockwork container build to fail. See issue [86]
Gramtools v1.8.0 installs correctly. Potentially minos needs to be updated to work with gramtools v1.8.0