minos's People

Contributors

bricoletc, iqbal-lab, jordivea-odap, martinghunt

minos's Issues

i don't know error

minos adjudicate --reads /data1/kdyoung/KNTA/Batch01_20211129/1_S1_L001_R1_001_filtered.fastq.gz --reads /data1/kdyoung/KNTA/Batch01_20211129/1_S1_L001_R2_001_filtered.fastq.gz /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/04.minos_vcf/ /home/kdyoung/bacteria_sample/ref/06.Mycobacterium/H37Rv/Mycobacterium.fasta /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/01.samtools/1_H37Rv.vcf /home/kdyoung/01.Variant/10.KNTA/Batch01/1/01.snp/02.delly/DELLY_1.vcf --force --sample_name sample_1

The output directory contains only a log file. Why is there no final.vcf?

Logging issues

  1. Put full command line in log file
  2. Typos in line reporting estimating read length and error rate
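
For the first item, a minimal sketch of how the full command line could be written to the log. The helper name log_command_line is hypothetical and not part of minos; it only uses the standard logging, shlex and sys modules.

```python
import logging
import shlex
import sys


def log_command_line(argv=None):
    """Log the full command line as one shell-quoted string and return it.

    Hypothetical helper for illustration; not taken from minos.
    """
    argv = sys.argv if argv is None else argv
    # Quote each argument so that paths containing spaces survive copy-paste.
    command = " ".join(shlex.quote(arg) for arg in argv)
    logging.info("Command line: %s", command)
    return command
```

Calling it once at start-up, before any work is done, would make each log file self-describing.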

Math error during genotyping

Here's an error log that appeared on multiple samples while running minos adjudicate on 50K CRyPTIC Mtb samples:

Command error:
  Traceback (most recent call last):
    File "/usr/local/bin/minos", line 210, in <module>
      args.func(args)
    File "/usr/local/lib/python3.6/dist-packages/minos/tasks/adjudicate.py", line 22, in run
      adj.run()
    File "/usr/local/lib/python3.6/dist-packages/minos/adjudicator.py", line 152, in run
      self._run_gramtools_with_split_vcf()
    File "/usr/local/lib/python3.6/dist-packages/minos/adjudicator.py", line 332, in _run_gramtools_with_split_vcf
      filtered_outfile=split_vcf_out,
    File "/usr/local/lib/python3.6/dist-packages/minos/gramtools.py", line 245, in write_vcf_annotated_using_coverage_from_gramtools
      filtered_record = update_vcf_record_using_gramtools_allele_depths(vcf_records[i], all_allele_coverage[i][0], all_allele_coverage[i][1], allele_groups, mean_depth, read_error_rate, kmer_size)
      gtyper.run()
    File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 162, in run
      self._calculate_log_likelihoods()
    File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 127, in _calculate_log_likelihoods
      non_zeros,
    File "/usr/local/lib/python3.6/dist-packages/minos/genotyper.py", line 76, in _log_likelihood_homozygous
      allele_depth * math.log(mean_depth),
  ValueError: math domain error
  .command.stub: line 99: 24118 Terminated              nxf_trace "$pid" .command.trace

Example of command executed:

Command executed:

  sample_name=$(grep "^#[^#]" /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/01/23/46/12346/Pipelines/1/variant_call/0.7.3.ref.3/minos/final.remasked.vcf | awk '{print $NF}') # Extract the sample name from the original vcf
  sample_name_in_file=${sample_name//\//_} # Replace "/" with something reasonable
  minos_outdir=small_vars.minos.11656
  minos adjudicate  --sample_name $sample_name --gramtools_build_dir /hps/nobackup2/zi/Cryptic_releases/nextflow.work.100chunks/82/882e1e9eafc2c6119fb917b4bf2fcd/gramtools.build_dir --reads /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/01/23/46/12346/Pipelines/1/variant_call/0.7.3.ref.3/samtools/rmdup.bam $minos_outdir /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_refs/3/ref.fa "/hps/nobackup2/zi/Cryptic_releases/nextflow.work.100chunks/b1/9dcef20eef89a7e02e2bbb6eadc6d2/small_vars_clustered.vcf"

I have reproduced this locally and am looking into it
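
For context, the traceback ends in math.log(mean_depth), and Python's math.log raises "ValueError: math domain error" whenever its argument is zero or negative. The sketch below shows one possible guard. The helper safe_log is hypothetical and is not the fix used in minos.

```python
import math


def safe_log(x):
    """Natural log that returns -inf for zero instead of raising.

    Hypothetical helper for illustration; not part of minos.
    """
    if x < 0:
        raise ValueError(f"cannot take the log of a negative value: {x}")
    if x == 0:
        # math.log(0) would raise "math domain error"; return the limit instead.
        return float("-inf")
    return math.log(x)
```

Note that multiplying -inf by an allele depth of zero gives nan, so a caller using this guard would still need to decide how to treat a site with zero mean depth.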

Compatibility issues between minos' gramtools.py script and gramtools' command

I have not been able to run minos successfully because of dependency-related errors. In particular, the options passed to gramtools when running "minos adjudicate" seem to be incorrect and often return an error. For example, the run_gramtools function in minos/gramtools.py calls the "quasimap" command, which gramtools does not recognize, so the run stops.

I have already spent several hours troubleshooting but could not resolve the issue. Can anyone please help?

Invalid DP FORMAT header

https://samtools.github.io/hts-specs/VCFv4.2.pdf
FORMAT headers can only have the following keys: ID, Number, Type, and Description.

I have a depth header from minos (v0.5.1 according to the header), which is the following:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="total kmer depth from gramtools",Source="minos">

Source keys can only be in INFO fields. So I guess remove it from FORMAT and create an INFO field also for DP? Either way it needs to be removed from the DP FORMAT header
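
As a stop-gap on existing files, the offending key could be stripped from the header line. This is a minimal sketch: the function name is hypothetical, and it assumes the Source value is a double-quoted string with no embedded quotes, as in the line above.

```python
import re

# Matches a ,Source="..." key anywhere inside a header line.
_SOURCE_KEY = re.compile(r',Source="[^"]*"')


def strip_source_from_format_header(line):
    """Remove the Source key from a ##FORMAT header line.

    Lines that are not FORMAT headers are returned unchanged.
    Hypothetical helper for illustration; not part of minos.
    """
    if not line.startswith("##FORMAT=<"):
        return line
    return _SOURCE_KEY.sub("", line)
```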

Null genotypes missing GT_CONF_PERCENTILE value

I have a multisample VCF from minos v0.5.1 where some samples are missing the GT_CONF_PERCENTILE field. It seems to happen only on samples with a null genotype (./.). An example of the FORMAT string:

GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:STATUS

and an example incorrect sample entry

./.:0:0,0:0.0:FAIL
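
A quick way to spot such entries is to compare the number of colon-separated values with the number of FORMAT keys. The sketch below uses a hypothetical helper name and is a diagnostic only. A parser that reads values by position would assign FAIL to GT_CONF_PERCENTILE here and leave STATUS empty.

```python
def has_all_format_fields(format_string, sample_string):
    """Return True if the sample column has one value per FORMAT key.

    Hypothetical diagnostic helper; not part of minos.
    """
    keys = format_string.split(":")
    values = sample_string.split(":")
    return len(keys) == len(values)


# The example from this issue: six keys but only five values.
print(has_all_format_fields(
    "GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:STATUS",
    "./.:0:0,0:0.0:FAIL",
))
```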

Ignore all non-ACGT alleles from input VCF files

See #116 (comment)

From VCF spec 4.3:
"ALT field must be a symbolic allele, or a breakend replacement string, or match the regular expression ^([ACGTNacgtn]+|\*|\.)$."

We only want to consider ACGT alleles, and so discard any alleles that do not match this regex: ^[ACGTacgt]+$.
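
A minimal sketch of that filter, applied to a list of allele strings. The function name is hypothetical; re.fullmatch is used so that the whole allele must match, which is equivalent to anchoring with ^ and $.

```python
import re

# Same character class as the regex quoted above.
ACGT_ONLY = re.compile(r"[ACGTacgt]+")


def keep_acgt_alleles(alleles):
    """Return only the alleles made up entirely of A, C, G and T (any case).

    Hypothetical helper for illustration; not part of minos.
    """
    return [allele for allele in alleles if ACGT_ONLY.fullmatch(allele)]
```

With this filter, alleles such as "ACGN", "*", "." and symbolic alleles like "<DEL>" are all discarded.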

Dump binary encoded genotypes after regenotyping

At the end of the regenotyping pipeline, it would be very easy to dump the following

  1. Some kind of summary/signature of all the snps/indels in the VCF (might just be md5)
  2. for each sample, a JSON with two entries. One is a bitfield and one an integer array, each as long as the VCF has records (ie one bit/integer per record). In these we put:
  • for each record, set bit to 1 if genotype is either ./. or het
  • for each record, set integer to the (haploid) genotype.
    Once stored at the end of regenotyping, this will make distance measuring trivial

Then at the end we can just "cat" all the bitarrays for ./. or het, and cat all the intvectors, and then the distance measurement is trivial:

dist=0
for i= 0 to number of records-1
for j= i to number of records-1

if the bitfield[i]==bitfield[j]==0 (meaning it is neither missing nor het)
if the int vector[i] != int vector [j]
dist++

Ought to be v fast
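
A minimal sketch of the per-pair distance in Python. It assumes the comparison is between two samples, each with a mask (1 where the genotype is ./. or het) and a haploid genotype array of the same length as the VCF. The function and parameter names are hypothetical.

```python
def genotype_distance(mask_a, genotypes_a, mask_b, genotypes_b):
    """Count records where both samples are callable and genotypes differ.

    mask_*: sequence of 0/1, 1 meaning null (./.) or het at that record.
    genotypes_*: sequence of haploid genotype integers, one per record.
    Hypothetical helper for illustration; not part of minos.
    """
    lengths = {len(mask_a), len(genotypes_a), len(mask_b), len(genotypes_b)}
    if len(lengths) != 1:
        raise ValueError("all four arrays must have one entry per VCF record")
    dist = 0
    for m_a, g_a, m_b, g_b in zip(mask_a, genotypes_a, mask_b, genotypes_b):
        # Skip records where either sample is missing or het.
        if m_a == 0 and m_b == 0 and g_a != g_b:
            dist += 1
    return dist
```

A full distance matrix would then call this once for every pair of samples.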

Implement chunking the reference genome

Split the reference approximately every N kilobases.
Don't split inside a variant (ie inside REF string of VCF).
Need to include at least one read length of reference genome flanking each variant, so that quasimap works and can call the variant. This means chunks can overlap.
Run variant calling on each chunk.
When putting back together, for variants that are in >1 chunk, only use the call with the biggest genotype confidence.
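
The splitting rule can be sketched as follows. This is an illustration under stated assumptions, not the minos implementation: variants are given as sorted 0-based half-open (start, end) intervals covering each REF string, the flank is one read length, and all names are hypothetical.

```python
def chunk_boundaries(ref_length, chunk_size, variants, flank):
    """Return (start, end) chunk intervals, padded by `flank` on each side.

    variants: list of (start, end) REF intervals, 0-based half-open,
    sorted by start. A split point is never placed inside a REF interval.
    Hypothetical helper for illustration; not part of minos.
    """
    split_points = []
    pos = chunk_size
    while pos < ref_length:
        # Push the split past any variant whose REF string spans it.
        for start, end in variants:
            if start < pos < end:
                pos = end
        if pos < ref_length:
            split_points.append(pos)
        pos += chunk_size
    starts = [0] + split_points
    ends = split_points + [ref_length]
    # Pad each chunk so reads overlapping a variant near the edge still map.
    return [
        (max(0, start - flank), min(ref_length, end + flank))
        for start, end in zip(starts, ends)
    ]
```

Because of the padding, neighbouring chunks overlap, so a variant near a boundary can be called in two chunks; the rule above of keeping the call with the highest genotype confidence resolves those cases.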

removing perl_generated_vcf caused a bug

ERROR ~ Error executing process > 'minos_all_small_vars (12)'

Caused by:
Process minos_all_small_vars (12) terminated with an error exit status (1)

Command executed:

sample_name=$(cat sample_name.37)
minos_outdir=small_vars.minos.37
minos adjudicate --sample_name $sample_name --gramtools_build_dir /hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/f2/855ff40b18dd238c529b231e1f7d95/gmtools_build_dir --reads /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_root/00/00/05/27/527/Pipelines/1/variant_call/0.7.3.ref.3/samtools/rmdup.bam $minos_outdir /nfs/leia/research/iqbal/mhunt/Cryptic_production/Pipeline_refs/3/ref.fa "small_vars_clustered.vcf"

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pyfastaq/utils.py", line 21, in open_file_read
f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/77/f9b9651dc28b949dfcf91257f014f0/gramtools.build_dir/split.0.gramtools_build/perl_generated_vcf'

Work dir:
/hps/nobackup/iqbal/zi/Cryptic_releases/nextflow.work/27/fc27423b35372a9638fbed7c3b4976

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

nextflow script for multisample does not respect cached precluster_small_vars_merge

Having successfully got all the way to adjudication at the end, I reran with -resume and saw this:

[7a/87355b] Cached process > process_input_vcf_file (52667)
[07/7479e1] Cached process > process_input_vcf_file (52670)
[e7/e96cb4] Cached process > process_input_vcf_file (52671)
[fa/0df118] Cached process > process_input_vcf_file (52669)
[e0/8a8c50] Cached process > process_input_vcf_file (52668)
[8a/011b05] Cached process > process_input_vcf_file (52673)
[5c/925d24] Cached process > process_input_vcf_file (52672)
[7c/6d669a] Cached process > process_input_vcf_file (52674)
[f5/ee5638] Submitted process > pre_cluster_small_vars_merge

multi-sample pipeline: parallelise (nextflow) chunk building

Currently gramtools build of chunks happens in a python loop, possibly with threads if told to do so.

def run_gramtools_build_on_each_split(self):

This is a massive and unnecessary bottleneck when running on tens of thousands of samples.

These are the build times for the first 600-odd chunks (out of 5000) when running on 50,000 TB genomes. As you can see, half of the chunks take less than 140 seconds to build. 3 of them take about 16 mins (roughly 1000 sec). 5 of them take over 30 mins.
[Screenshot of chunk build times, 2019-06-02 at 22 56 01]

How to build a manifest.tsv file. Can you provide a sample file?

Should it be comma- or tab-separated? How should paired-end reads be specified?
Can I use compressed files, such as .gz or .tar.gz?

Here is an example of my manifest.tsv. Is this OK?

#########################################################################
name vcf reads reads
201550 NGS_Mapping/SNP/gvcf/201550.gvcf.gz 01.Cleandata/201550/201550.1.fq.clean.gz 01.Cleandata/201550/201550.2.fq.clean.gz
183207 NGS_Mapping/SNP/gvcf/183207.gvcf.gz 01.Cleandata/183207/183207.1.fq.clean.gz 01.Cleandata/183207/183207.2.fq.clean.gz
#########################################################################

thanks.

if chunking VCF, check bam is indexed

For chunking, need the bam to be indexed because we're pulling out regions of the bam file. Add check that:

  1. reads file is a BAM file
  2. reads file is indexed (.bai file exists). If not, then make it.
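
Both checks can be sketched with the standard library alone. The helper names are hypothetical. The BAM test relies on BAM files being BGZF (gzip-compatible) streams whose decompressed content starts with the magic bytes BAM\1, and the index test looks for the two usual .bai naming patterns.

```python
import gzip
import os


def looks_like_bam(path):
    """Return True if the file decompresses as gzip and starts with BAM\\1."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip/BGZF compressed, truncated, or unreadable.
        return False


def find_bam_index(path):
    """Return the path of an existing .bai index for `path`, or None."""
    candidates = [path + ".bai", os.path.splitext(path)[0] + ".bai"]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None
```

If find_bam_index returns None, one option would be to run samtools index on the file before chunking.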

Catch when gramtools fails

Gramtools can fail and minos carries on. It then dies later when it cannot find the expected output file from gramtools quasimap.
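
One way to fail fast is to check the exit status of the external command as soon as it returns. The sketch below is generic and the wrapper name run_or_die is hypothetical; it does not reproduce how minos actually launches gramtools.

```python
import subprocess


def run_or_die(command):
    """Run an external command and raise immediately on a non-zero exit.

    command: list of program and arguments, as accepted by subprocess.run.
    Hypothetical helper for illustration; not part of minos.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit status {result.returncode}: "
            f"{' '.join(command)}\n{result.stderr}"
        )
    return result.stdout
```

Raising at this point puts the failing command and its stderr in the error message, rather than a later "file not found" for a missing output.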

Use entry points

In setup.py use entry_points=... and use a __main__.py file instead of the current script scripts/minos.

no loop devices available

Hello,

I am reaching out to you because I am encountering an error when running the minos program using the "minos.simg" image, and I am unable to resolve it.

The command I am executing is:

nextflow run -with-singularity minos.simg  nextflow/regenotype.nf  -profile medium --ref_fasta reference.fasta --manifest manifest.tsv --make_distance_matrix 

It appears that all the steps are executed successfully until it reaches the minos part. The error displayed on the screen is as follows:

N E X T F L O W  ~  version 22.04.0
Launching `nextflow/regenotype.nf` [big_chandrasekhar] DSL2 - revision: bfe1b8e0b5
executor >  local (30)
executor >  local (30)
[1c/b2812b] process > parse_manifest                     [100%] 1 of 1 ✔
[38/ad6887] process > make_vcf_for_gramtools:vcf_merge   [100%] 1 of 1 ✔
[0c/cc6992] process > make_vcf_for_gramtools:vcf_cluster [100%] 1 of 1 ✔
[da/135eb4] process > gramtools_build                    [100%] 1 of 1 ✔
[15/3b24ae] process > minos (4)                          [100%] 19 of 19, failed: 18, retries: 12 ✔
[3a/e2191a] process > make_per_sample_vcfs_dir (1)       [100%] 2 of 2, failed: 2, retries: 2
[80/232c20] process > ivcf_merge_chunks (1)              [100%] 1 of 1 ✔
[62/f3d520] process > ivcf_final_merge                   [100%] 3 of 3, failed: 3, retries: 2 ✘
[-        ] process > distance_matrix                    -
[8b/ca4db5] NOTE: Process `make_per_sample_vcfs_dir (1)` terminated with an error exit status (255) -- Execution is retried (1)
[4b/212478] NOTE: Process `make_per_sample_vcfs_dir (1)` terminated with an error exit status (255) -- Execution is retried (2)
[2a/9eeff2] NOTE: Process `ivcf_final_merge` terminated with an error exit status (255) -- Execution is retried (1)
[f7/49c909] NOTE: Process `ivcf_final_merge` terminated with an error exit status (255) -- Execution is retried (2)
Error executing process > 'ivcf_final_merge'

Caused by:
  Process `ivcf_final_merge` terminated with an error exit status (255)

Command executed:

  cat <<"EOF" > files.txt
  work/80/232c20b8928f00a42a91876f6825c3/merged.vcf
  EOF
      ivcfmerge files.txt Compare_prueba/merged.vcf
      touch done_file

Command exit status:
  255

Command output:
  (empty)

Command error:
  FATAL:   container creation failed: mount /proc/self/fd/3->/usr/local/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available

Could you please assist me in resolving this issue? Alternatively, do you have any suggestions for possible alternatives?

I have already increased the maximum number of loop devices available on my computer:

$cat /etc/modules
options loop max_loop=256
8192eu

loop

Thank you in advance.

multi sample pipeline keep zero cov alleles

Do not remove alleles with no coverage on each per-sample run of minos during multi sample pipeline. Otherwise bcftools gets confused and outputs multiple lines where the ALT list is different between samples.

Missing jpeg library when building the Singularity image

When building the singularity image, following the provided instructions, I got the following error.

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-qw7in_h3/pillow/setup.py", line 1037, in <module>
        raise RequiredDependencyException(msg)
    __main__.RequiredDependencyException:

    The headers or library files could not be found for jpeg,
    a required dependency when compiling Pillow from source.

    Please see the install instructions at:
       https://pillow.readthedocs.io/en/latest/installation.html

I solved it by adding the package libjpeg-dev to the apt-get install command in .ci/install_dependencies.sh

remove cortex MISMAPPED_UNPLACEABLE VCF records

In rare cases, cortex has the filter column as "MISMAPPED_UNPLACEABLE". These are getting through into minos results. Filter these VCF records out right at the start of a minos run because they should not be used.
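
A minimal sketch of such a filter over the lines of a VCF file. The function name is hypothetical; it assumes tab-separated records with FILTER in the seventh column, with multiple filters separated by semicolons as in the VCF specification.

```python
def drop_mismapped_unplaceable(vcf_lines):
    """Yield VCF lines, skipping records flagged MISMAPPED_UNPLACEABLE.

    Header lines (starting with '#') are always passed through.
    Hypothetical helper for illustration; not part of minos.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th column; it may hold several ';'-separated values.
        filters = fields[6].split(";") if len(fields) > 6 else []
        if "MISMAPPED_UNPLACEABLE" not in filters:
            yield line
```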

Install for minos is broken gramtools dependency issue

Install for minos fails due to broken install of gramtools@9313eceb606a6fc159e4a14c168b7a6f888c5ed2

E.g. when building from the Singularity.def in this repo

SUCCESS: sdsl was installed successfully!
    The sdsl include files are located in '/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/include'.
    The library files are located in '/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/lib'.
    
    Sample programs can be found in the examples-directory.
    A program 'example.cpp' can be compiled with the command:
    g++ -std=c++11 -DNDEBUG -O3 [-msse4.2] \
       -I/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/include -L/tmp/pip-of5i099r-build/cmake-build-release/libgramtools/lib \
       example.cpp -lsdsl -ldivsufsort -ldivsufsort64
    
    Tests in the test-directory
    A cheat sheet in the extras/cheatsheet-directory.
    Have fun!
    [ 28%] No install step for 'sdsl'
    [ 29%] No test step for 'sdsl'
    [ 30%] Completed 'sdsl'
    [ 30%] Built target sdsl
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2
    ERROR: gramtools backend compilation returned:  2
    
    ----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-of5i099r-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-7vv5y6wr-record/install-record.txt --single-version-externally-managed --compile" failed with error code 255 in /tmp/pip-of5i099r-build/
FATAL:   failed to execute %post proc: exit status 1
FATAL:   While performing build: while running engine: while running /usr/local/libexec/singularity/bin/starter: exit status 255

This issue also causes the clockwork container build to fail. See issue [86]

Gramtools v1.8.0 installs correctly. Potentially minos needs to be updated to work with gramtools v1.8.0
