compmetagen / strainest Goto Github PK

StrainEst - abundance estimation of strains

License: GNU General Public License v3.0

Python 2.55% Makefile 0.96% TeX 1.07% HTML 18.56% Gnuplot 0.04% Perl 13.40% Shell 0.26% Awk 0.06% C 22.62% C++ 39.54% Perl 6 0.43% Dockerfile 0.08% Common Workflow Language 0.44%

strainest's Issues

count_coverage() takes at least 1 positional argument

Hi,

Thanks for your reply on how to build a custom database. I am going to try it, but first I wanted to run the test provided with strainest. When I ran:

strainest est P_acnes/snp_clust.dgrp reads.sorted.bam outputdir

I got the following error:

Traceback (most recent call last):
  File "/home/people/name/.conda/envs/strainest-env/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2', 'console_scripts', 'strainest')()
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/scripts/strainest_cmd.py", line 180, in est
    threads)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/api/_est.py", line 138, in est
    counts = get_counts(align, positions, quality_thr)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/api/_est.py", line 47, in get_counts
    quality_threshold=quality_threshold)
  File "pysam/libcalignmentfile.pyx", line 1438, in pysam.libcalignmentfile.AlignmentFile.count_coverage
TypeError: count_coverage() takes at least 1 positional argument (0 given)

I have the following files in the folder:

90318631 Apr 10 15:15 reads.sam
30239759 Apr 10 15:15 reads.bam
28391376 Apr 10 15:15 reads.sorted.bam
21760 Apr 10 15:15 reads.sorted.bam.bai

I also have:
head P_acnes/snp_clust.dgrp
NC_006085.1,Ref,GCF_000735065.1_PacnesHL201PA1v01_genomic.fna,GCF_000376705.1_ASM37670v1_genomic.fna,GCF_000144795.1_ASM14479v1_genomic.fna,GCF_000144105.1_ASM14410v1_genomic.fna,GCF_000194905.1_ASM19490v1_genomic.fna,GCF_001469615.1_ASM146961v1_genomic.fna,GCF_000145435.1_ASM14543v1_genomic.fna,GCF_001297145.1_ASM129714v1_genomic.fna,GCF_000144465.1_ASM14446v1_genomic.fna,GCF_000144325.1_ASM14432v1_genomic.fna,GCF_000145555.1_ASM14555v1_genomic.fna,GCF_000145535.1_ASM14553v1_genomic.fna,GCF_000735055.1_PacnesHL202PA1v1.0_genomic.fna,GCF_000240035.1_ASM24003v1_genomic.fna,GCF_000178075.1_ASM17807v1_genomic.fna,GCF_001469595.1_ASM146959v1_genomic.fna,GCF_000144005.1_ASM14400v1_genomic.fna,GCF_001660955.1_ASM166095v1_genomic.fna,GCF_000302515.1_ASM30251v1_genomic.fna,GCF_000144425.1_ASM14442v1_genomic.fna
22,A,C,A,A,C,A,A,A,C,C,A,A,A,C,A,A,A,C,A,A,A
50,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,T,A
60,G,A,G,G,A,G,G,G,A,A,G,G,G,A,G,G,G,A,G,G,G
68,C,T,C,C,T,C,C,C,T,T,C,C,C,T,C,C,C,T,C,C,C
80,T,C,T,T,T,T,T,T,C,T,T,T,T,T,T,T,T,T,T,T,T
83,G,A,G,G,A,G,G,G,A,A,G,G,G,A,G,G,G,A,G,G,G
92,T,C,T,T,T,T,T,T,C,T,T,T,T,T,T,T,T,T,T,T,T
100,G,G,A,G,G,G,G,A,G,G,G,G,G,G,G,G,G,G,G,G,A
103,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T

Do you know what is going on?

Thank you very much in advance.

Help with reference preparation

Hi, I am having difficulty preparing a custom database. Is this tool still being maintained?

I downloaded the reference data from ftp://ftp.fmach.it/metagenomics/strainest/ref2/ but I don't see the bowtie database. I also could not find any instructions on how to get from these files to ones similar to the reference used in the tutorial: ftp://ftp.fmach.it/metagenomics/strainest/ref/pacnes.tar.gz

Thanks.
sp

Draft genome as reference genome

Hi,

I was gonna run:

strainest mapgenome genome1.fna genome2.fna reference.fna mapped.fna

including 20 draft genomes and my reference genome, which is also a draft genome (it has 45 contigs). The --help output says:

"Align one or more genomes to a reference genome. Only the first sequence
in the reference genome is considered. Input and output files must be in
FASTA format."

As I understand it, only the first contig of my reference genome will be used. Is that right?
If so, is there any way to use all the contigs, or do I have to join my contigs manually with a run of Ns in between?

What would you recommend?

Thank you very much in advance.
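
The workaround raised in the question (joining contigs with a run of Ns) can be sketched as follows. This is only an illustration of that idea, not a method confirmed by the StrainEst authors. The function names, the record name `joined_reference`, and the spacer length are hypothetical choices, and whether `mapgenomes` behaves well on a reference containing long N runs is an assumption that would need to be checked.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def join_contigs(handle, name="joined_reference", spacer_len=100):
    """Concatenate all contigs into one record, separated by runs of N."""
    spacer = "N" * spacer_len  # spacer length is an arbitrary illustrative value
    sequences = [seq for _, seq in read_fasta(handle)]
    return name, spacer.join(sequences)


def format_fasta(name, sequence, width=60):
    """Return a FASTA record with fixed-width sequence lines."""
    lines = [">" + name]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Toy two-contig draft genome standing in for a real multi-FASTA file.
draft = io.StringIO(">contig_1\nACGT\nAC\n>contig_2\nGGTT\n")
name, joined = join_contigs(draft, spacer_len=5)
print(format_fasta(name, joined))
```

Note that positions in the joined sequence no longer correspond to the original contig coordinates, so any downstream SNV positions would refer to the concatenated record.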

error in strainest mapgenomes when preparing the reference

I hit ERROR1 when running "strainest mapgenomes R2.fna R3.fna R1.fna map.fna" (strainest was installed on CentOS Linux 7 (Core), Linux 3.10.0-957.27.2.el7.x86_64).
I then hit ERROR2 when trying to solve the problem by reinstalling strainest with Docker, as davidealbanese suggested in "#1".

ERROR1 report:
(base) [zhurj@mnhead repfna]$ strainest mapgenomes R2.fna R3.fna R1.fna map.fna

Traceback (most recent call last):
  File "/home/zhurj/share/miniconda3/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2.4', 'console_scripts', 'strainest')()
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/strainest/scripts/strainest_cmd.py", line 56, in mapgenomes
    record = strainest.api.mapgenome(genome, reference, mapped)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/strainest/api/_mapgenome.py", line 80, in mapgenome
    mar = strainest.mummer.MummerAlignmentReader(align_handler)
  File "/home/zhurj/share/miniconda3/lib/python3.7/site-packages/strainest/mummer.py", line 37, in __init__
    if line.startswith('-- Alignments between'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

ERROR2 report:
Using default tag: latest
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Are there any other ways to fix the problem? Thanks very much.

B. longum database, error "invalid value encountered in double_scalars"

Hello,

thanks for providing this tool! I used the provided B. longum database to test the presence of B. longum strains in metagenomic samples. After running the strainest est command on a single sample against the database, results were output as expected but the script terminated with the following error:

/usr/lib/python2.7/dist-packages/scipy/stats/stats.py:3029: RuntimeWarning: invalid value encountered in double_scalars
  r = r_num / r_den

The info.txt looked like this

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
probioticMappedBifido.sorted.bam	8.0	22.0	63996	1	0.009529665531520763	0.0006187400807969682	8.779356743394051e-05	2	nan	1.0

In the results directory, the abund.txt, max_ident.txt, counts.txt, info.txt, and mse.pdf files were produced as expected.

How should I interpret this error message?
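
The warning quotes the line `r = r_num / r_den`, which is the final division in a Pearson correlation. A minimal sketch of that arithmetic shows one way a `nan` can arise: if either input has zero variance, the denominator is zero. This is not scipy's implementation, and whether this is the reason the `R` column in info.txt is `nan` here is an assumption. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    r_num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    r_den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    if r_den == 0.0:
        # NumPy would evaluate 0.0 / 0.0 to nan and emit a RuntimeWarning;
        # plain Python raises ZeroDivisionError, so nan is returned explicitly.
        return float("nan")
    return r_num / r_den


print(pearson_r([1, 2, 3], [2, 4, 6]))        # perfectly correlated inputs
print(pearson_r([0.5, 0.5, 0.5], [1, 2, 3]))  # constant input, zero variance
```

Under this reading the message is a warning about an undefined correlation rather than a failure of the run, which would be consistent with the output files being produced as expected.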

Reference database preparation

I would like to test Strep. salivarius, for which you have provided a reference database, however the description in the paper is quite high level and would seem to require more experience/expertise in this area to implement.

Regarding #2, is there any progress on a tool to assist with reference database preparation? Or alternatively, a worked example?

Many thanks,

Andrew

ftp link to download reference databases is broken

Hi,

Is the FTP link to download the reference databases broken? I am not able to connect to the FTP server (ftp.fmach.it) with either of these links:

ftp://ftp.fmach.it/metagenomics/strainest/ref2/
ftp://ftp.fmach.it/metagenomics/strainest/ref/pacnes.tar.gz

Thanks,

DeprecationWarning error

Hi, I got the following error message when running the test code strainest est P_acnes/snp_clust.dgrp reads.sorted.bam outputdir

/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "/usr/local/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2', 'console_scripts', 'strainest')()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/strainest/scripts/strainest_cmd.py", line 180, in est
    threads)
  File "/usr/local/lib/python2.7/dist-packages/strainest/api/_est.py", line 138, in est
    counts = get_counts(align, positions, quality_thr)
  File "/usr/local/lib/python2.7/dist-packages/strainest/api/_est.py", line 47, in get_counts
    quality_threshold=quality_threshold)
  File "pysam/libcalignmentfile.pyx", line 1438, in pysam.libcalignmentfile.AlignmentFile.count_coverage
TypeError: count_coverage() takes at least 1 positional argument (0 given)

How can I fix it? Thank you.

No compatible genomes available

Hi,

I have been trying to run strainest but I am having some issues. Here is what I have done so far:

I have 24 samples. I recovered more than 100 draft genomes from those samples, and I'm interested in 10 of them. For each of these representative draft genomes I also have around 5-15 related draft genomes.
I created a database following the steps in the paper and the advice given in the other question I asked on GitHub about how to create a reference database.

Then I followed the rest of the steps (mapgenomes, mapping, sorting, and so on). Finally, I ran strainest est ... for each clust.dgrp and each sort.bam file.

The problem is that in every abund.txt file, each "strain" has a value of 0.0 for every sample.

In the error file it says:

No compatible genomes available. No compatible genomes available. No compatible genomes available...

In one case, it also says:

/home/people/name/.local/lib/python2.7/site-packages/scipy/stats/stats.py:3003: RuntimeWarning: invalid value encountered in double_scalars
  r = r_num / r_den

The other files look like:

counts.txt:

genomeA_sample_1_counts.txt

Pos	A	C	G	T
11539	0	11	0	2
11790	0	0	7	0
11916	0	0	4	0
12458	0	5	0	0

genomeC_sample_4_counts.txt

Pos	A	C	G	T
842	25	0	0	0
860	13	8	0	0
928	0	0	8	0
929	0	0	0	8

max_ident.txt:

OTU   sample2_genomeE_sort.bam
genomeE1.fa 0.661305
genomeE2.fa 0.488562
genomeE3.fa 0.515173
genomeE4.fa 0.492763
genomeE5.fa 0.628793
genomeE6.fa 0.479861
genomeE7.fa 0.533466
genomeE8.fa 0.492381
genomeE9.fa 0.479224
genomeE10.fa 0.473707
genomeE11.fa 0.534655

OTU   sample12_genomeG_sort.bam
genomeG1.fa 0.494239
genomeG2.fa 0.486111
genomeG3.fa 0.485082
genomeG4.fa 0.505453
genomeG5.fa 0.628189
genomeG6.fa 0.489712
genomeG7.fa 0.466872
genomeG8.fa 0.454938
genomeG9.fa 0.484465
genomeG10.fa 0.488066
genomeG11.fa 0.507716
genomeG12.fa 0.476132

Here are several info.txt files:

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample15_genomeC_sort.bam	2.0	52.0	39443	1	0.024818219956205072	0.14995312845727451	0.0006160129318127253	2	nan	1.0

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample2_genomeF_sort.bam	1.0	13.0	26640	0	nan	nan	nan	nan	nan	nan

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample23_genomeAsort.bam	1.0	19.0	27382	0	nan	nan	nan	nan	nan	nan

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample4_genomeB_sort.bam	1.0	32.0	24019	0	nan	nan	nan	nan	nan	nan

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample13_genomeD_sort.bam	1.0	30.0	3911	0	nan	nan	nan	nan	nan	nan

Filename	MinDepth	MaxDepth	NPos	NGen	Alpha	MSEAve	MSEStd	NLassoIter	R	PVal
sample6_genomeH_sort.bam	1.0	42.0	19963	0	nan	nan	nan	nan	nan	nan

Do you have any idea about why I'm not getting any value in the abund.txt files?

Thank you very much in advance.
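
When many runs are involved, the info.txt records shown above can be screened programmatically. The sketch below parses the tab-separated layout quoted in this issue and lists the runs whose `NGen` column is 0. Reading `NGen` as the number of genomes retained by the run is an assumption based on the column name, and the function names are hypothetical.

```python
import csv
import io

# Two records copied from the info.txt examples quoted in this issue.
INFO_TEXT = (
    "Filename\tMinDepth\tMaxDepth\tNPos\tNGen\tAlpha\tMSEAve\tMSEStd\tNLassoIter\tR\tPVal\n"
    "sample15_genomeC_sort.bam\t2.0\t52.0\t39443\t1\t0.024818219956205072\t"
    "0.14995312845727451\t0.0006160129318127253\t2\tnan\t1.0\n"
    "sample2_genomeF_sort.bam\t1.0\t13.0\t26640\t0\tnan\tnan\tnan\tnan\tnan\tnan\n"
)


def summarize_info(handle):
    """Parse info.txt rows, keeping only the always-numeric columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "filename": row["Filename"],
            "min_depth": float(row["MinDepth"]),
            "max_depth": float(row["MaxDepth"]),
            "npos": int(row["NPos"]),
            "ngen": int(row["NGen"]),
        })
    return rows


def without_genomes(rows):
    """Return the file names of runs that report NGen == 0."""
    return [r["filename"] for r in rows if r["ngen"] == 0]


rows = summarize_info(io.StringIO(INFO_TEXT))
print(without_genomes(rows))
```

In the records quoted above, every run with `NGen` equal to 0 also has `nan` in all the fitted columns, so a filter like this separates runs where no fit was attempted from runs where a fit produced results.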

Pan Genome?

Is there a way to use StrainEST for pan genomes?

How to interpret the warning message "WARNING: the maximum identity threshold is <0.99"

Hi there,

Many thanks for your work on strainest. Could you give a short explanation on how to interpret the following warning message?

WARNING: the maximum identity threshold is <0.99 and StrainEst has inferred a mixture of strains. The mixture of strains could be a single strain with no available reference genome. Please check the file counts.txt.

In our case, we get this when we use strainest on a low-complexity community of a certain bacterial species (2-4 strains) where the strains are quite similar to each other in terms of their genome sequences. We do have the genome sequences available, and we use them in the strainest analysis.

Thanks,
Ali

operands could not be broadcast

I have tried to run it again with the new version 1.1.3 and now I get a different error:

strainest est P_acnes/snp_clust.dgrp reads.sorted.bam outputdir

Traceback (most recent call last):
  File "/home/people/name/.conda/envs/strainest-env/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2.2', 'console_scripts', 'strainest')()
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/scripts/strainest_cmd.py", line 180, in est
    threads)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/api/_est.py", line 138, in est
    counts = get_counts(align, positions, quality_thr)
  File "/home/people/name/.conda/envs/strainest-env/lib/python2.7/site-packages/strainest/api/_est.py", line 48, in get_counts
    counts[:, i] += np.asarray(ref_count, dtype=np.int64).flatten()
ValueError: operands could not be broadcast together with shapes (4,) (10240976,) (4,)

S aureus reference

Hi,
In an earlier version of the tool, there was a precompiled reference for S. aureus, but I can't seem to find it anymore.

Is there any reason why you decided to remove it? Or is it just missing by mistake?

Insufficient number of positions covered (<2)

Hi,
I've been trying to run strainest using the pre-built database for E. coli, but I keep getting the error message in the title.

I generated the bowtie2 index of the map_align.fasta provided, then ran the code as in the readme, using the snv.txt file as the snp matrix.

bowtie2 --very-fast --no-unal -p $THREADS --mm -x $BOWTIE_DB -1 $READ1 -2 $READ2 -S ${SAMPLE}.sam

samtools view -b ${SAMPLE}.sam | samtools sort -o ${SAMPLE}.sorted.bam

samtools index ${SAMPLE}.sorted.bam

strainest est $SNPMATRIX ${SAMPLE}.sorted.bam $OUTDIR -t $THREADS

The only output I get from strainest est is this:
Insufficient number of positions covered (<2)

Any idea as to what it might be?

Thank you in advance.

Reference Database preparation

Hi Davide,

I'm trying to build a custom database for Bifidobacterium longum, following the steps described in the 'Methods' section of your paper. Although I know you offer a processed database for this bacterium, I want to build it myself because I will need to study bacteria for which you don't provide a processed database.

I think the problem is my lack of understanding of the file formats and their use. I'm able to run all the steps up to the one where you map the 'representative genomes for the metagenome alignment' (A1, A2...) against the species representative sequence (SR, chosen according to NCBI).

strainest mapgenomes A1.fasta A2.fasta SR.fasta MA.fasta

The problem is that I don't know which genomes are these 'representative genomes for the metagenome alignment'.

From your Methods section: "iii) for each cluster, the genome with the lowest average distance from the members of its group was chosen as a representative (A1,…, A10, see Fig. 1d)." First of all, I am not able to find a file that says which assemblies make up each cluster, so I can't select a representative assembly for each cluster. My 'clusters.txt' file doesn't seem to be the file I'm looking for, because it always has three columns: the first two contain an assembly code and the third contains a value, which I guess is the distance between the two assemblies. Are these assemblies in fact the cluster representatives?

I'm sure you can help me to understand this. Also, I can provide any file you want to check.

Many thanks in advance! 😃

KeyError: "['Ref'] not found in axis" when using strainest snpdist

Hello,

I have been attempting to create a custom database for E. coli using approx. 1250 genomes from NCBI (complete genomes, < 0.05 mash distance from SR, overly fragmented genomes excluded) for SNV profiling. I have a separate set of 65 representative genomes spanning the phylogroups, curated internally by my group, that I plan to use for alignment.

I managed to run the mapgenomes and map2snp steps below without much trouble. In the snpdist step, however, I get a KeyError.

Any advice on handling this error? Thank you!

strainest mapgenomes ${SEQDIR}/*.fasta ${SEQDIR}/GCA_000005845.2_ASM584v2_genomic.fasta ${SEQDIR}/MR.fasta

strainest map2snp ${SEQDIR}/GCA_000005845.2_ASM584v2_genomic.fasta ${SEQDIR}/MR.fasta ${SEQDIR}/snp.dgrp

strainest snpdist ${SEQDIR}/MR.fasta ${SEQDIR}/snp.dgrp ${SEQDIR}/snp_dist.txt

Traceback (most recent call last):
  File "/mnt/transient_nfs/conda/envs/strainest/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2.4', 'console_scripts', 'strainest')()
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/strainest/scripts/strainest_cmd.py", line 109, in snpdist
    strainest.api.snpdist(snp, dist, hist)
  File "/mnt/transient_nfs/conda/envs/strainest/lib/python2.7/site-packages/strainest/api/_snpdist.py", line 30, in snpdist
    snp.drop('Ref', axis=1, inplace=True)
  File "/home.roaming/s4187725/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 3940, in drop
    errors=errors)
  File "/home.roaming/s4187725/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3780, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home.roaming/s4187725/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 3812, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home.roaming/s4187725/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 4965, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['Ref'] not found in axis"

I am running strainest on a conda environment with python 2.7 on Linux.
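
The traceback shows the call `strainest.api.snpdist(snp, dist, hist)` failing at `snp.drop('Ref', axis=1, inplace=True)`, which suggests that the first file passed to snpdist is read as the SNP table and that it has no `Ref` column. In the command above, the first argument is MR.fasta rather than snp.dgrp. Whether that is the actual cause is an assumption. The sketch below is a quick check of whether a file starts with a header like the one in the `head P_acnes/snp_clust.dgrp` output quoted in an earlier issue (reference name, then `Ref`, then genome names). The function name is hypothetical.

```python
import csv
import io


def has_ref_column(handle):
    """Return True if the first comma-separated row has 'Ref' as its second field."""
    header = next(csv.reader(handle), [])
    return len(header) > 1 and header[1] == "Ref"


# Toy inputs: a dgrp-style table and a FASTA file (genome names are made up).
dgrp = io.StringIO("NC_006085.1,Ref,genomeA.fna,genomeB.fna\n22,A,C,A\n")
fasta = io.StringIO(">NC_006085.1 some description\nACGTACGT\n")

print(has_ref_column(dgrp))   # True
print(has_ref_column(fasta))  # False
```

Running such a check on each input before snpdist would show which file lacks the expected header.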

Python 3 error in mapgenomes

I have encountered the following error when using the mapgenomes function

Traceback (most recent call last):
  File "/shared/software/anaconda3/bin/strainest", line 11, in <module>
    load_entry_point('strainest==1.2', 'console_scripts', 'strainest')()
  File "/shared/software/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/shared/software/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/strainest/scripts/strainest_cmd.py", line 56, in mapgenomes
    record = strainest.api.mapgenome(genome, reference, mapped)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/strainest/api/_mapgenome.py", line 80, in mapgenome
    mar = strainest.mummer.MummerAlignmentReader(align_handler)
  File "/shared/software/anaconda3/lib/python3.6/site-packages/strainest/mummer.py", line 37, in __init__
    if line.startswith('-- Alignments between'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

I have tried running the command using both python 2.7 and 3.6 to no avail.

Do you have any suggestions on how to solve this?

Thanks,

Calum
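
The final `TypeError` is what Python 3 raises when `startswith` is called on a `bytes` line with a `str` argument, which happens when lines come from a handle opened in binary mode. The sketch below reproduces the error and shows one common remedy, decoding each line before the comparison. Only the marker string `-- Alignments between` is taken from the traceback. The function and variable names are hypothetical, and this is not the actual code of `strainest.mummer`.

```python
import io

MARKER = "-- Alignments between"  # string quoted in the traceback


def find_alignment_headers(handle, encoding="utf-8"):
    """Collect marker lines from a handle that may yield bytes or str."""
    headers = []
    for line in handle:
        if isinstance(line, bytes):
            # Binary handles yield bytes under Python 3; decode before comparing.
            line = line.decode(encoding)
        if line.startswith(MARKER):
            headers.append(line.rstrip("\n"))
    return headers


binary_handle = io.BytesIO(b"-- Alignments between ref and query\n   1 acgt\n")

# Comparing a bytes line against a str prefix reproduces the reported error.
try:
    binary_handle.readline().startswith(MARKER)
except TypeError as exc:
    print("TypeError:", exc)

binary_handle.seek(0)
print(find_alignment_headers(binary_handle))
```

The same function also accepts text-mode handles, since lines that are already `str` skip the decoding step.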

Empty SNV matrix, DB creation

Hello,
thanks for providing the tool. So far I am testing it on Campylobacter strains.
I was able to run the tool with the Docker container on a reduced database that I built (selecting just 10 reference strains). I then looked into building an actual database, and this is where I am stuck right now.
I clustered the mash distances of ~3500 reference strains, reducing them to around 650 species-specific representatives (for clusters that do not diverge too much within MLST profiles).
The mapgenomes step works (it takes a while and creates a 1.1G MR.fasta file), but when I then run strainest map2snp SR.fa MR.fasta snp.dgrp, the step finishes in 2 minutes without an error and the snp.dgrp file contains only the header with the genome names.

How can I trace what is going wrong here?
Is this amount of genomes simply too many?

I also asked a related question on biostars:
https://www.biostars.org/p/430412/

Thanks for your feedback.
