mantis's People

Contributors

kautto, lgretton, lozybean, nanakiksc, rbonneville, tips48

mantis's Issues

Default thread count

The default thread count is not being loaded properly. It needs to default to 1 thread without raising errors.
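
One way such a default could be enforced is sketched below. This is only an illustration built on the standard argparse module; the helper name parse_threads is hypothetical, and the actual option parsing in mantis.py is not shown in this issue.

```python
import argparse


def parse_threads(argv):
    """Parse a --threads option, falling back to 1 when it is omitted."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--threads", type=int, default=1)
    # parse_known_args ignores any other options present on the command line.
    args, _unknown = parser.parse_known_args(argv)
    # Also guard against zero, which would leave no worker to do the job.
    return max(1, args.threads)


default_threads = parse_threads([])
explicit_threads = parse_threads(["--threads", "4"])
```

Clamping with max() means a nonsensical value such as 0 still yields a single worker instead of an error.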

Handle 0/1-base length reads

The script currently crashes with a divide-by-zero error when it encounters a read with no quality/sequence data (0 bases) or with data for only 1 base. The SAMRead structure needs to handle such reads and report them as erroneous, so that they can be filtered out.
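
A minimal sketch of the kind of guard described above. The function names and the min_length parameter are hypothetical; the internals of SAMRead are not shown in this issue, so this only illustrates flagging short reads before any averaging takes place.

```python
def mean_quality(qualities):
    """Return the mean base quality, or None when there are no bases."""
    if not qualities:
        return None  # avoids dividing by len(qualities) == 0
    return sum(qualities) / len(qualities)


def is_erroneous_read(sequence, qualities, min_length=2):
    """Flag reads that are too short (0 or 1 bases) to be analysed."""
    return len(sequence) < min_length or len(qualities) < min_length


reads = [("", []), ("A", [30]), ("ACGT", [30, 32, 28, 30])]
usable = [r for r in reads if not is_erroneous_read(r[0], r[1])]
```

Returning None (rather than raising) lets the caller filter such reads without wrapping every call in a try block.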

Issue in runni

Hi,
The BED file generated by RepeatFinder is 3109671 lines long. I tried to use this BED file with MANTIS, but after running for a couple of hours I often receive a "Broken pipe" message. In every case I used 3 threads, as per your advice. Could you please help me resolve this?

score range

Hello, I read about this script in a paper named "Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS", which states: "Once the scores for each locus are assigned, the average of all the locus instability scores is calculated, to provide a single numerical value representing the average aggregate instability present in the sample. Scores reported range from 0.0 (entirely stable) to 2.0 (entirely unstable)." However, the default values I find in mantis.py do not match this. Which of the two applies?
Parameter(key='dif_threshold', default=0.4),
Parameter(key='euc_threshold', default=0.187),
Parameter(key='cos_threshold', default=0.07),
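
The parameter names in the snippet above suggest that these defaults are decision thresholds rather than bounds of the score range, although the issue itself does not confirm this. Under that assumption, a minimal sketch of how an averaged score might be compared with a threshold (the function name call_status and the strict ">" comparison are hypothetical, not taken from mantis.py):

```python
def call_status(locus_scores, threshold):
    """Average per-locus scores and compare the mean against a threshold."""
    if not locus_scores:
        return None, None  # nothing to average
    mean_score = sum(locus_scores) / len(locus_scores)
    # Assumption: a mean strictly above the threshold is called unstable.
    return mean_score, mean_score > threshold


DIF_THRESHOLD = 0.4  # default quoted in the issue for 'dif_threshold'
mean_score, unstable = call_status([0.25, 0.75], DIF_THRESHOLD)
```

Read this way, a score range of 0.0 to 2.0 and a default of 0.4 do not conflict: one describes the possible values, the other the cut-off applied to them.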

Cosine dissimilarity score reported incorrectly

MANTIS reports identical values for cosine dissimilarity and Euclidean distance metrics. Although step-wise difference is recommended above either of those metrics, this should still be fixed.
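
For reference, the two metrics are different quantities and should generally not coincide. The sketch below uses the textbook definitions on two plain lists of numbers; it is not MANTIS's own code, and any normalisation MANTIS applies to the repeat-count distributions beforehand is not reproduced here.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dissimilarity(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None  # undefined for an all-zero vector
    return 1.0 - dot / (norm_a * norm_b)


normal = [1.0, 2.0]
tumor = [2.0, 4.0]  # same shape as normal, twice the magnitude
euc = euclidean_distance(normal, tumor)
cos = cosine_dissimilarity(normal, tumor)
```

In this example the vectors point in the same direction, so the cosine dissimilarity is essentially zero while the Euclidean distance is not, which makes identical reported values a sign of a bug.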

In README: make RepeatFinder

In the README, it would be nice to say that the RepeatFinder source is in the tools directory and that the user needs to make it.
It's quite simple to realize what's going on as a user, but I think it would be nice to make it explicit anyway.
Apart from that, RepeatFinder compiles and runs perfectly!

Output Files - status and *.txt

Hi!

I have tested your tool on four different BAM files. All the runs finished with empty *.txt files and no status output file. What does that mean?

Thank you

typo in module checking

Dear Mantis developer,

Thanks for developing this tool!
I notice something which might be a typo here:

for module in modules:

I think you should iterate over missing instead of modules.

I had an environment where pysam was present but numpy was absent. In this case, pysam is still reported to be absent.

Thanks for your attention,
Frédéric
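
The suggested fix can be sketched as follows. This is an illustration using the standard importlib module, not the actual checking code from MANTIS; the function names are hypothetical. The point is that the report loop walks the list of missing modules rather than the full list of required ones.

```python
import importlib


def find_missing(modules):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def report_missing(modules):
    """Build one error line per module that is actually missing."""
    messages = []
    for name in find_missing(modules):  # iterate over missing, not modules
        messages.append("Error: required module '{0}' not found.".format(name))
    return messages


messages = report_missing(["os", "surely_not_installed_module_xyz"])
```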

help: Error: Specified locus does not appear to be the starting point for kmer

Hi,
I was hoping you could help me to resolve some errors I'm having with running mantis. I'm getting thousands of these messages (almost one per bed entry):

Error: Specified locus does not appear to be the starting point for kmer.

Following the workflow below, the intervals in the BED file started 1bp before the repeat, so I bumped them up by 1, but got the same errors.
The program still seems to run fairly happily though, and with slightly different scores from the 2 BED files (both being unstable in a moderate/high TMB tumour with a suspicious germline MSH6 variant).

Is it my BED (https://gist.github.com/drmjc/d62d9705b4ad7d6909cfb7b622c9d4d6), or something else?

Thanks for looking into this,
Mark

The mantis bedfile was created as per the following:

  1. A 3 column bed file targeting the coding region's microsatellites was downloaded from the mSINGS app (https://bitbucket.org/uwlabmed/msings/src/b8c10cf58cecddb1356f7e9ee1ccbfdc29759314/doc/mSINGS_TCGA.bed?at=master&fileviewer=file-view-default).
  2. Using the RepeatFinder app, a bed file was produced covering the entire genome's content of microsatellites by feeding the app the hs37d5 genome fasta file.
  3. the RepeatFinder bed was run through the included fix_RF_bed_output.py script.
  4. bedtools intersect with the 3 column bed file from step 1, was then used to narrow down the whole genome bed to include ~2700 sites in the
    coding region containing microsatellites. This new file remained in the required format for MANTIS.
  5. The intervals appeared to be start-1.

Code:

./RepeatFinder -i genome.fa -o genome_RepeatFinder.bed
python fix_RF_bed_output.py -i genome_RepeatFinder.bed -o genome_RepeatFinder_fixed.bed
bedtools intersect -a genome_RepeatFinder_fixed.bed -b mSINGS_TCGA.bed > hs37d5_microsatellites.bed
# fix start-1 error
awk -F $"\t" 'BEGIN {OFS=FS} {$2=$2+1; print}' hs37d5_microsatellites.bed > a
mv a hs37d5_microsatellites.bed

Failed to fetch sequence; Error with k-mer repeat count calculations; terminating program.

Hi,
Using MANTIS v1.0.4, I got the following error:

Getting repeat counts for repeat units (k-mers) ...
b'[W::fai_get_val] Reference chr1:10485-10499 not found in file, returning empty sequence\n[11/21/19 10:32:54] MANTIS K-Mer Repeat Counter\n[11/21/19 10:32:54] Loading target MSI loci from BED file ...\nTraceback (most recent call last):\n  File "MANTIS/v1.0.4/kmer_repeat_counter.py", line 859, in <module>\n    msi_loci = mll.load_loci(config[\'bedfile\'])\n  File "MANTIS/v1.0.4/kmer_repeat_counter.py", line 56, in load_loci\n    self.correct_off_by_one_errors(locus)                    \n  File "MANTIS/v1.0.4/kmer_repeat_counter.py", line 75, in correct_off_by_one_errors\n    raw_sequence = self.get_sequence(position)\n  File "MANTIS/v1.0.4/kmer_repeat_counter.py", line 106, in get_sequence\n    for subseq in pysam.faidx(self.genome_path, locus)[1:]:\n  File "MANTIS/v1.0.4/venv/lib/python3.6/site-packages/pysam/utils.py", line 75, in __call__\n    stderr))\npysam.utils.SamtoolsError: \'samtools returned with error 1: stdout=>chr1:10485-10499\\n, stderr=[faidx] Failed to fetch sequence in chr1:10485-10499\\n\'\n'
Error with k-mer repeat count calculations; terminating program.

The bed file that is used as input was created with RepeatFinder with reference genome hs37d5.fa.

Could you help to resolve this issue?

Thanks,

Jessica

Microsatellite Loci BED File

Hello. I have a human tumor bam and normal bam files. I would like to perform MSI test.
I am new to this and I am not sure how to go about making Microsatellite Loci BED File.
Could anyone kindly let me know how I can make one or share with me?
Thanks.

installing MANTIS

Hi. I am new to github. Can you let me know the best simple way to install MANTIS after git clone?
Thanks a lot.

output files

Hello, I might be missing this but I wonder if there is a description/ explanation on the output of Mantis files and the columns? thanks a lot!

Limit coordinate range for requests

Currently, when requesting reads from the input BAM files, a 5 base pair padding is added to the range of coordinates. This can result in a negative coordinate value (e.g. with a start coordinate of 1, the request would start at -4). This will cause an error with Pysam that results in the program crashing. The range of the request coordinates needs to be limited so that only valid (> 0) values are used in requests.
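
A minimal sketch of the clamping described above. The function name and its parameters are hypothetical. The default lower bound of 0 assumes 0-based start coordinates, as used by pysam's fetch; pass minimum=1 if the coordinates are treated as 1-based.

```python
def padded_range(start, end, padding=5, minimum=0, chrom_length=None):
    """Pad a coordinate range without leaving the valid coordinate space."""
    padded_start = max(minimum, start - padding)
    padded_end = end + padding
    if chrom_length is not None:
        # Optionally keep the end inside the chromosome as well.
        padded_end = min(padded_end, chrom_length)
    return padded_start, padded_end


near_start = padded_range(1, 20)      # would otherwise begin at -4
mid_chrom = padded_range(100, 110)
```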

Error: Specified locus does not appear to be the starting point for kmer ...

This seems to be related to #17 and #22.

When I opened #17, MANTIS would crash for our GRCh37 BAMs (the reference used and the MS BED file were also generated for this release), where chromosome names are not prefixed with chr. Apparently it does not do that anymore (although I don't know which change is responsible for this).

However, we have been noticing the following in the log files:

[05/16/18 14:55:23] Loading target MSI loci from BED file ...
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer GCCC.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer CCTC.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer CTG.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer GCT.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer GGT.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer GGA.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer GT.
[05/16/18 14:55:26] Error: Specified locus does not appear  to be the starting point for kmer T.

Maybe not surprisingly, we get as many warnings/errors in the log file as there are lines in the bed file.

We have compared the results for 50+ samples with an earlier pipeline run that did not print the warnings - all DIF scores match, so there seems to be no problem here and the bed file should be fine.

I suppose there are still some rough edges in the handling of chromosome names.

Support GATK .bai naming

For a BAM file named "file.bam", the GATK pipeline typically generates an index file named "file.bai" (instead of "file.bam.bai"). Could the tool be updated to support that naming convention as well?
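
A minimal sketch of how both naming conventions could be probed. The helper name find_bam_index is hypothetical, and how MANTIS would then hand the located index to its BAM reader is left open here.

```python
import os


def find_bam_index(bam_path):
    """Return the first existing index file for bam_path, or None."""
    candidates = [
        bam_path + ".bai",                       # file.bam.bai
        os.path.splitext(bam_path)[0] + ".bai",  # file.bai
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

The "file.bam.bai" form is tried first, so existing setups keep their current behaviour and the "file.bai" form only acts as a fallback.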

Running Mantis on WGS Sample

Input is WGS normal and downsampled tumor paired samples with 24 coverage each.
Can anyone guide me on how to select loci.bed for WGS? My loci.bed has around 2 million lines.
And also I need help to understand the output files.

Thanks,
Olivia

RepeatFinder: recommended minimum k-mer length

The documentation states

-l | Minimum k-mer length (bp). Default: 1

when generating a bed file to use with MANTIS.

My questions are:

  1. How informative are the 1-mer regions? Will performance degrade when using MANTIS just on the 2- to 5-mer bed?
  2. Did anybody benchmark if this improves running time?
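
For anyone wanting to run that comparison, an existing BED file could be filtered by repeat-unit length roughly as sketched below. It assumes the fourth column has the form "(GAG)3", as in BED excerpts posted in other issues on this page; the function name and the regular expression are hypothetical.

```python
import re

# Matches a repeat unit and its count, e.g. "(GAG)3".
KMER_FIELD = re.compile(r"^\(([ACGTN]+)\)(\d+)$")


def filter_by_kmer_length(bed_lines, min_len=2, max_len=5):
    """Keep BED lines whose repeat unit length lies in [min_len, max_len]."""
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a locus line in the expected format
        match = KMER_FIELD.match(fields[3])
        if match and min_len <= len(match.group(1)) <= max_len:
            kept.append(line)
    return kept


bed = [
    "chr1\t100\t110\t(A)10\t0\t+",
    "chr1\t200\t210\t(AC)5\t0\t+",
    "chr1\t300\t309\t(GAG)3\t0\t+",
]
without_homopolymers = filter_by_kmer_length(bed)
```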

Issue in

Hi
I would like to know how long MANTIS takes to process a file, and what is the maximum number of threads I can use for the program?

MANTIS on WGS samples

Hi,

I am testing MANTIS on a few samples and was wondering if you have tested it on WGS samples?
In the Readme you have recommended thresholds for Whole-Exome usage, but do you have some recommendations for whole-genome usage?

Any help and support is appreciated.

Thank you.

Issue

Hi,
I have tried running MANTIS many times, but it spends a lot of time in the first step (Getting repeat counts for repeat units (K-mer)... ), and after two or three hours I receive a "broken pipe" message. I generated the BED file in the specified format using RepeatFinder. Could you help me resolve the issue?

Is it possible to use MANTIS to analyze RNAseq data?

Hello,

I understand that MANTIS was designed to analyze DNA-based sequencing data, but I'm wondering whether RNAseq BAM files could be used as input data. Or is there any recommended software for estimating MSI status from RNAseq?

Thanks!

Loci in MANTIS output are sorted alphabetically

If sorting the loci, at least the positions within each chromosome block should be sorted (alpha)numerically instead of alphabetically. One could alternatively just take the order of loci from the input bed file.

Thanks and best regards
Malte
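
The ordering requested here could be sketched as below. Loci are assumed to be (chromosome, start, end) tuples and the function names are hypothetical; how MANTIS stores loci internally is not shown in this issue.

```python
def chromosome_key(name):
    """Sort numbered chromosomes numerically, all others alphabetically."""
    core = name[3:] if name.lower().startswith("chr") else name
    if core.isdigit():
        return (0, int(core), "")
    return (1, 0, core)  # X, Y, M, ... come after the numbered ones


def sort_loci(loci):
    """Order by chromosome first, then by numeric start and end position."""
    return sorted(
        loci,
        key=lambda locus: (chromosome_key(locus[0]), int(locus[1]), int(locus[2])),
    )


loci = [("chr10", 5, 9), ("chr2", 100, 110), ("chr2", 20, 30), ("chrX", 1, 5)]
ordered = sort_loci(loci)
```

A plain alphabetical sort would put chr10 before chr2 and position 100 before position 20; the key above avoids both.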

Missing RepeatFinder tool

Hi Russell,

I cloned the master repository for MANTIS today but could not locate the RepeatFinder tool. Could you please help me with that?

Thanks!
Gunjan

Issue running MANTIS

Hello,

I have an issue while running MANTIS on exome datasets.
I ran the following command:

python2.7 mantis.py -b ../../Test_MANTIS/genome_scan.bed --genome ../../Test_MSISensor/IndexedGenome/hg19.fa -n ../../Test_MANTIS/constit.cleaned.bam -t ../../Test_MANTIS/somatic.cleaned.bam -o ../../Test_MANTIS/resultats_mantis_hg19.txt -mrq 20.0 -mlq 25.0 -mlc 20 -mrr 1

And I got these printed on my terminal:

Microsatellite Analysis for Normal-Tumor InStability (v1.0.4)
python /Users/chloe/work/MANTIS/kmer_repeat_counter.py
-b /Users/chloe/Test_MANTIS/genome_scan.bed
-n /Users/chloe/Test_MANTIS/albert_constit.cleaned.bam
-t /Users/chloe/Test_MANTIS/albert_somatic.cleaned.bam
-o /Users/chloe/Test_MANTIS/resultats_mantis_albert_hg19.kmer_counts.txt
--min-read-quality 20.0
--min-locus-quality 25.0
--min-read-length 35
--genome /Users/chloe/Test_MSISensor/IndexedGenome/hg19.fa
--threads 1
Getting repeat counts for repeat units (k-mers) ...

Then, after running for 48 hours, I got this error, which I don't understand:

Traceback (most recent call last):
  File "/Users/chloe/work/MANTIS/kmer_repeat_counter.py", line 866, in <module>
    normal = krc.process(config['normal_filepath'], msi_loci, config)
  File "/Users/chloe/work/MANTIS/kmer_repeat_counter.py", line 614, in process
    self.status_check(queue_out.qsize())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 143, in qsize
    return self._maxsize - self._sem._semlock._get_value()
NotImplementedError

I don't get where this error comes from. Could you help me please?

Thanks,
Chloé
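
The paths in the traceback point to macOS, where the Python documentation notes that multiprocessing.Queue.qsize() may raise NotImplementedError because sem_getvalue() is not implemented. A minimal sketch of a defensive wrapper follows; the name safe_qsize is hypothetical, and whether MANTIS can proceed without a queue size for its status check is an assumption.

```python
def safe_qsize(queue, fallback=None):
    """Return queue.qsize(), or fallback where the platform lacks support."""
    try:
        return queue.qsize()
    except NotImplementedError:
        # Raised on platforms such as macOS by multiprocessing queues.
        return fallback
```

A caller would then skip the status update whenever the fallback value comes back, instead of letting the exception end the run.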

How to select MSI loci most predictive of a sample's status?

Hi,
I am new to MSI and MANTIS.

I have a BED file listing MS loci, but I want to reduce the number of loci to improve the accuracy of MANTIS, because MANTIS results are affected by the number of loci.

I have read the MANTIS paper, which says: "To assess the effect of considering different numbers and selective microsatellite loci on MSI analysis, we identified the 10, 20, 30, 40, 50, 100, 250, 500 and 1000 loci most predictive of a sample’s status across COAD/READ, UCEC and STAD cohorts, for mSINGS, MSISensor and MANTIS".
Could you describe in detail how the most predictive MSI loci were selected?
What algorithms did you use?

I have heard that some people use just 22 MSI loci to evaluate their tumor samples, but I don't know how these 22 loci were selected. Do you know?

Best

Error with k-mer repeat count calculations; terminating program, which seems to be triggered by "_multiprocessing.SemLock Permission Denied"

Hello!

Sorry to bother you, and thanks for any response.

I am trying the latest version, MANTIS v1.0.5 (which, incidentally, reports v1.0.4 in its log), and came across the following error:

Mon Oct 18 12:21:13 CST 2021
Microsatellite Analysis for Normal-Tumor InStability (v1.0.4)
/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/bin/python /home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py
-n /data01/huangrc/out/02_realign/NP01-B-Blood-wes.recalibrated.bam
-t /data01/huangrc/out/02_realign/NP01-B-wes.recalibrated.bam
-b /data01/huangrc/out_mantis2/01_reference/reference.bed
-o /data01/huangrc/out_mantis2/02_msi/NP01-B-wes.mantis.kmer_counts.txt
--min-read-quality 20.0
--min-locus-quality 25.0
--min-read-length 35
--genome /home/dna/data/gatk_resource_bundle/hg19/ucsc.hg19.fasta
--threads 8
Getting repeat counts for repeat units (k-mers) ...
b'/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py:149: SyntaxWarning: "is" with a literal. Did you mean "=="?\n if (offset is 0) or read.seq[0:offset] == locus.kmer[offset:]:\n/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py:474: SyntaxWarning: "is" with a literal. Did you mean "=="?\n if self.debug_output and (n % 10000 is 0):\n/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py:530: SyntaxWarning: "is" with a literal. Did you mean "=="?\n if qsize is 0:\n/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py:612: SyntaxWarning: "is" with a literal. Did you mean "=="?\n if loop_counter % proc_check_interval is 0:\n[10/18/21 12:21:18] MANTIS K-Mer Repeat Counter\n[10/18/21 12:21:18] Loading target MSI loci from BED file ...\n[10/18/21 13:29:55] Loaded 2993651 loci.\n[10/18/21 13:29:55] Processing normal input file with 8 thread(s) ...\nTraceback (most recent call last):\n File "/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py", line 868, in \n normal = krc.process(config['normal_filepath'], msi_loci, config)\n File "/home/dna/pipe_MSIsensor-pro/environments/sbin/MANTIS-1.0.5/kmer_repeat_counter.py", line 560, in process\n queue_out = Queue()\n File "/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/lib/python3.8/multiprocessing/context.py", line 103, in Queue\n return Queue(maxsize, ctx=self.get_context())\n File "/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/lib/python3.8/multiprocessing/queues.py", line 42, in init\n self._rlock = ctx.Lock()\n File "/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/lib/python3.8/multiprocessing/context.py", line 68, in Lock\n return Lock(ctx=self.get_context())\n File "/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/lib/python3.8/multiprocessing/synchronize.py", line 162, in init\n SemLock.init(self, SEMAPHORE, 1, 1, 
ctx=ctx)\n File "/home/dna/pipe_MSIsensor-pro/environments/envs/msisensor_v1/lib/python3.8/multiprocessing/synchronize.py", line 57, in init\n sl = self._semlock = _multiprocessing.SemLock(\nPermissionError: [Errno 13] Permission denied\n'
Error with k-mer repeat count calculations; terminating program.
Finishing mantis msi
Mon Oct 18 13:30:13 CST 2021

after running for 1 hour or so.

I found a similar thread in #21, but it seems my error is a little different from it, and triggered by "_multiprocessing.SemLock Permission denied"

my python version is 3.8.0, pysam==v0.17.0, numpy==v1.21.2

My server has /home/shm set to 777, and several other BAMs have successfully produced results on the same server.

ps, I am trying --threads 1 too

Looking forward to any response, and thanks very much!!

Error with k-mer repeat count calculations; terminating program.

Hi,
I just installed the latest version and ran on Python 3.5. I had the following error message:
Microsatellite Analysis for Normal-Tumor InStability (v1.0.4)
python /home/marissa/MANTIS-master/kmer_repeat_counter.py
-b /home/marissa/MANTIS-master/msi.test.bed
-n /home/marissa/MANTIS-master/Sample-N1-EX-KD.bam
-t /home/marissa/MANTIS-master/Sample_T1-EX-KD.bam
-o /home/marissa/MANTIS-master/msi.kmer_counts.txt
--min-read-quality 25.0
--min-locus-quality 30.0
--min-read-length 35
--genome /Volumes/Data/References/HiSeqAnalysisSoftware_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
--threads 1
Getting repeat counts for repeat units (k-mers) ...
'import site' failed; use -v for traceback
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/encodings/__init__.py", line 31, in <module>
    import codecs
  File "/usr/local/lib/python3.5/codecs.py", line 95
    *, _is_text_encoding=None):
    ^
SyntaxError: invalid syntax
Error with k-mer repeat count calculations; terminating program.

Here is what the msi.test.bed file looks like:

chr1 9780710 9780720 (GAG)3 0 +
chr1 9780921 9780931 (GCT)3 0 +
chr1 9786990 9787000 (GAG)3 0 +
chr1 11182070 11182082 (TCT)4 0 +
chr1 11190676 11190686 (GTG)3 0 +
chr1 11190751 11190761 (TTC)3 0 +
chr1 11206852 11206863 (AC)5 0 +

Could you let me know how to solve the issue?
Thanks.

Marissa

Do not prefix chromosome names with "chr" per default

Hi,

I am trying to run MANTIS and I think forcing chromosome names to start with "chr" is not a good idea.

My reference fasta, my bam files and the bed file I generated with RepeatFinder do not have them, so adding them crashes MANTIS.

Quick workaround in case anybody has the same problem:

diff --git a/kmer_repeat_counter.py b/kmer_repeat_counter.py
index fc22d02..753b1b3 100755
--- a/kmer_repeat_counter.py
+++ b/kmer_repeat_counter.py
@@ -46,7 +46,8 @@ class MSILocusLoader:
                         locus = MSILocus(line)
                         if locus.chromosome[0:3] != 'chr':
                             # Force-prepend the chr prefix
-                            locus.chromosome = 'chr{0}'.format(locus.chromosome)
+                            #locus.chromosome = 'chr{0}'.format(locus.chromosome)                                                
+                            pass                                                                                                                                                          
                                                                                                                                                                                             
                         # Correct any off-by-one errors that may occur because of                                                                                                           
                         # unstandardized open- and closed-endedness of bed file coordinates.
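
As an alternative to forcing or dropping the prefix, the locus name could be matched against whatever names the reference actually contains. The sketch below is not part of the patch above; the function name is hypothetical, and the set of available names is assumed to be obtained elsewhere (for example from the FASTA index).

```python
def resolve_chromosome(name, available):
    """Return the spelling of name that exists in available, or None."""
    if name in available:
        return name
    # Try the other convention: strip or add the "chr" prefix.
    alternative = name[3:] if name.startswith("chr") else "chr" + name
    if alternative in available:
        return alternative
    return None


grch37_names = {"1", "2", "X"}
ucsc_names = {"chr1", "chr2", "chrX"}
```

With this approach a BED file without prefixes works against either style of reference, and a name that matches neither convention is reported as None rather than silently rewritten.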

MANTIS on Mouse samples

Hi,

I was wondering if you have tested MANTIS on mouse samples?
Also I was thinking since Mantis uses the MS regions generated by RepeatFinder and gets the distribution of the repeats at those locations in the input tumor and normal samples, it should behave similarly for Mouse also, right?

Please let me know if I am assuming something incorrectly.
Your support and suggestions are much appreciated.

Thank you!

Add the support of TUMOR-only sample

Hi.

Could an option be added to run MANTIS with only a tumor sample?

A normal-tumor pair will give better results, but a tumor-only mode would be simpler to use.

Best regards

interpreter mismatch

This causes a bug in environments with multiple interpreters.

For example, if there are two interpreters, python3 and python, but numpy is installed only for python3, calculate_instability.py will crash with the exception: no module named numpy.
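
A common way to avoid this kind of mismatch is to launch helper scripts with the interpreter that is already running, via sys.executable, instead of a bare "python" looked up on the PATH. The sketch below is generic; the function name is hypothetical and MANTIS's own launching code is not shown in this issue.

```python
import subprocess
import sys


def run_with_same_interpreter(script_args):
    """Run a script or -c snippet using the currently running interpreter."""
    command = [sys.executable] + list(script_args)
    return subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


result = run_with_same_interpreter(["-c", "print('ok')"])
```

Because the child process uses the same interpreter, it also sees the same installed modules as the parent.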

About MS loci BED file used in MANTIS

Hi,
I’m not familiar with MANTIS. When running MANTIS on WGS data, I don’t know how to generate a proper MS loci BED file. I used RepeatFinder to generate a BED file covering MS loci across the entire genome. According to the answer in:
#40 (comment)
I shouldn’t use an MS loci BED file spanning the entire genome. But I don’t have capture regions (my data is WGS) or a region of interest. How can I filter my MS loci BED file?
I want to know if these two approaches would work:
1) select protein-coding gene regions
2) select the MS loci that the MANTIS paper uses for TCGA data
Are there any other approaches for WGS data?
Thanks!

Remove the off-by-one error correction step in MANTIS?

Hi,

The code below is from MANTIS and is used to correct off-by-one errors.

I think this step is unnecessary: if an off-by-one error occurs, it means something went wrong while preparing the input BED file, and we cannot know what caused the off-by-one error, or whether the mismatch stems from some other, unknown problem.

So I think that if MANTIS finds that the sequence extracted from the reference genome does not match the k-mer, it should warn the user to check how the input BED file was prepared, instead of simply treating the situation as an off-by-one error.

Perhaps MANTIS has other considerations here that I am not aware of.
I hope this is a useful suggestion.

def correct_off_by_one_errors(self, locus):
    # Generate the locus position by adding a 1 basepair padding around the locus
    # coordinates, which accounts for the off-by-one errors.

    position = '{0}:{1}-{2}'.format(locus.chromosome, locus.start - 1, locus.end + 1)
    raw_sequence = self.get_sequence(position)
    # accommodate newer versions of PySam, which causes this to return chromosome and position
    raw_sequence = raw_sequence.split(":")[-1] # '109345678-109345682ATTTT'
    #strip_coord_re = re.compile(r'^[\d\-]+')
    sequence = MSILocusLoader.strip_coord_re.sub("", raw_sequence) # 'ATTTT'
    # kmer_length = len(self.kmer)  # 1
    if sequence[1:1+locus.kmer_length] != locus.kmer:
        # Sequence doesn't start where expected; shift accordingly.
        if sequence[0:locus.kmer_length] == locus.kmer:
            # Shift back by one
            locus.start -= 1
            locus.end -= 1
        elif sequence[2:2+locus.kmer_length] == locus.kmer:
            # Shift forward by one
            locus.start += 1
            locus.end += 1
        else:
            tprint('Error: Specified locus does not appear '
                + ' to be the starting point for kmer {kmer}.'.format(
                    kmer=locus.kmer))
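
The suggestion can be sketched as a side-effect-free check that reports which case applies and leaves it to the caller to warn, shift, or abort. The function name classify_offset is hypothetical; the slicing mirrors the method quoted above, where the fetched sequence carries one base of padding in front of the locus start.

```python
def classify_offset(padded_sequence, kmer):
    """Return 0, -1 or +1 for the shift needed to find kmer, else None."""
    k = len(kmer)
    if padded_sequence[1:1 + k] == kmer:
        return 0    # locus starts where the BED file says it does
    if padded_sequence[0:k] == kmer:
        return -1   # repeat starts one base earlier
    if padded_sequence[2:2 + k] == kmer:
        return 1    # repeat starts one base later
    return None     # no match within one base: BED file needs checking


as_specified = classify_offset("GATATAT", "AT")
one_early = classify_offset("ATATATG", "AT")
one_late = classify_offset("GGATATA", "AT")
no_match = classify_offset("CCCCCCC", "AT")
```

Any result other than 0 could then be surfaced as a warning that names the locus, so that a systematically shifted BED file is noticed rather than silently corrected.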

Check for Pysam availability

The main script needs to check for Pysam being available as a module in the environment and exit gracefully if it isn't found.

Handle missing chromosomes

K-mer repeat counting currently fails if a chromosome that is not present in the BAM file is requested, due to Pysam returning an exception. The script should check for available chromosomes before making a request to pysam.AlignmentFile.fetch()
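
A minimal sketch of such a pre-check. Loci are assumed to be (chromosome, start, end) tuples and the function name is hypothetical; in practice the list of available names would presumably come from the opened BAM file, for example its references attribute in pysam.

```python
def partition_loci(loci, references):
    """Split loci into those whose chromosome is available and the rest."""
    available = set(references)
    present, skipped = [], []
    for locus in loci:
        if locus[0] in available:
            present.append(locus)
        else:
            skipped.append(locus)
    return present, skipped


bam_references = ("chr1", "chr2")
loci = [("chr1", 100, 110), ("chrUn_gl000220", 5, 15), ("chr2", 40, 52)]
present, skipped = partition_loci(loci, bam_references)
```

Only the loci in present would then be requested from the BAM file, while those in skipped can be logged.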

can't finish the process

Hi,

I have some problems with MANTIS: the process never finishes. I used WGS data (70 Gb normal and 150 Gb tumor) and a loci BED file with 3 million rows based on the hg38 genome.

I also tried using only chr16 from the BAM files (~2-3 Gb) with a loci.bed for chr16 (100K lines), but the process still does not finish.

Thoughts?

Microsatellite Analysis for Normal-Tumor InStability (v1.0.4)
/usr/bin/python3 /home/ubuntu/1/MANTIS/kmer_repeat_counter.py
-b /home/ubuntu/1/MANTIS/loci.bed2
-n /home/ubuntu/1/MANTIS/DNASeq_blood.sorted.chr16.bam
-t /home/ubuntu/1/MANTIS/DNASeq_cancer.sorted.chr16.bam
-o /home/ubuntu/1/MANTIS/1.kmer_counts.txt
--min-read-quality 25.0
--min-locus-quality 30.0
--min-read-length 35
--genome /home/ubuntu/1/MANTIS/hg38.fa
--threads 64
Getting repeat counts for repeat units (k-mers) ...

Regards, Maxim

chrM

Add in support for other chromosomes (e.g. chrM), instead of only having support for chromosomes 1-22 and X/Y.

Repeatfinder missing?

Hi,

Can you point me towards the RepeatFinder program? I can't see it in the repository.

Thanks

Dan

k-mer repeat count Error

Hi,
I got following error:
Microsatellite Analysis for Normal-Tumor InStability (v1.0.3)
python /gpfs/gsfs6/users/MoCha/patidarr/MANTIS/kmer_repeat_counter.py \
-b /data/MoCha/patidarr/MANTIS/loci.bed \
-n /gpfs/gsfs6/users/MoCha/processedDATA/714841/20170910/714841_germline~WES/714841_germline~WES.bwa.final.bam \
-t /gpfs/gsfs6/users/MoCha/processedDATA/714841/20170910/714841~288-R~KV6MF7NX7~WES/714841~288-R~KV6MF7NX7~WES.bwa.final.bam \
-o /gpfs/gsfs6/users/MoCha/processedDATA/714841/20170910/714841~288-R~KV6MF7NX7~WES/MANTIS/714841~288-R~KV6MF7NX7~WES.kmer_counts.txt \
--min-read-quality 25.0 \
--min-locus-quality 30.0 \
--min-read-length 35 \
--genome /data/MoCha/patidarr/ref/ucsc.hg19.fasta \
--threads 20
Getting repeat counts for repeat units (k-mers) ...
Error with k-mer repeat count calculations; terminating program.

Could you please let me know how to resolve it?

Thanks,
Rajesh

RAM usage on targeted reads

I noticed my processes would start failing after it finished running kmer_repeat_counter.py for several hours because mantis.py would consume all of the available system memory plus the entire swapfile. I ran this on a server with 88 processing threads and 192GB of RAM. I ran with the --threads 88 option.

I am analyzing targeted sequencing reads covering about 0.9 Mb, all CDS. I am also using a pool of normals as my normal control since we don't have the normal tissue for these samples.

Also I generated my loci.bed file from the GRCh37.fa reference genome.

How much RAM does mantis.py need per thread?

`structures.py`: TabError: inconsistent use of tabs and spaces in indentation

I suppose that this is the same bug as #10 on python3.

Microsatellite Analysis for Normal-Tumor InStability (v1.0.4)
python /fast/users/messersc_c/dev/MANTIS/kmer_repeat_counter.py \
-b /fast/groups/cubi/projects/biotools/Mantis/appData/hg19/loci.bed \
-n /fast/projects/DKTK_Master/ngs_mapping/output/bwa.MTK01-germline-DNA1-WES1/out/bwa.MTK01-germline-DNA1-WES1.bam \
-t /fast/projects/DKTK_Master/ngs_mapping/output/bwa.MTK01-tumor-DNA1-WES1/out/bwa.MTK01-tumor-DNA1-WES1.bam \
-o /fast/projects/DKTK_Master/mtk_but_not_master_pipeline/MANTIS_test/results.kmer_counts.txt \
--min-read-quality 20.0 \
--min-locus-quality 25.0 \
--min-read-length 35 \
--genome /fast/projects/cubit/current/static_data/reference/GRCh37/g1k_phase1/human_g1k_v37.fasta \
--threads 4 
Getting repeat counts for repeat units (k-mers) ...
Traceback (most recent call last):
  File "/fast/users/messersc_c/dev/MANTIS/kmer_repeat_counter.py", line 13, in <module>
    from structures import SAMRead, CIGAR, Locus, MSILocus
  File "/fast/users/messersc_c/dev/MANTIS/structures.py", line 120
    def to_array(string):
                        ^
TabError: inconsistent use of tabs and spaces in indentation
Error with k-mer repeat count calculations; terminating program.

What worked for me is opening the file with vim and running :retab. I was able to run MANTIS successfully afterwards. See #19
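
For environments without vim, a roughly similar clean-up can be sketched in Python. This assumes that tabs occur only in leading indentation and that one tab stands for 8 columns; both are assumptions about structures.py, and the function name is hypothetical.

```python
def expand_leading_tabs(text, tabsize=8):
    """Replace tabs in leading indentation with spaces, line by line."""
    fixed = []
    for line in text.split("\n"):
        stripped = line.lstrip(" \t")
        indent = line[:len(line) - len(stripped)]
        # Only the indentation is touched; tabs later in the line are kept.
        fixed.append(indent.expandtabs(tabsize) + stripped)
    return "\n".join(fixed)


mixed = "if True:\n\tx = 1\n        y = 2\n"
cleaned = expand_leading_tabs(mixed)
```

After conversion every indented line uses spaces only, which is what Python 3 needs in order to compare indentation levels consistently.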
