
binsanity's Introduction

BinSanity v.0.5.4

Quick install via pip (note: for pip installations you will need to separately ensure that HMMER and the subread package are installed):

$ pip3 install Binsanity

Please see the Wiki for full usage and installation requirements:

https://github.com/edgraham/BinSanity/wiki

If an issue arises while using BinSanity, please open an issue and we will address it as soon as possible. To expedite a response, please include any associated error messages. As this project is actively being improved, comments and suggestions are welcome.

Citation

Graham ED, Heidelberg JF, Tully BJ. (2017) BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation. PeerJ 5:e3035 https://doi.org/10.7717/peerj.3035

binsanity's People

Contributors

edgraham, housw, ozcan, ralic, theonehyer


binsanity's Issues

Binsanity-profile unable to write .cov file

Hello!

When running

Binsanity-profile -i assembly.fasta -s ./ --ids anviocontigs_binsanity.ids -c binsanity -o ./ -T 11 --transform scale

Everything works fine (bam.saf files, readcounts and readcounts.summary files are all being produced), except for one thing: the binsanity.cov file that gets written is empty. The error message I receive:

Traceback (most recent call last):
  File "/usr/local/bin/Binsanity-profile", line 199, in <module>
    make_coverage(args.inputIds, args.outCov, args.outDirectory)
  File "/usr/local/bin/Binsanity-profile", line 70, in make_coverage
    contig_length = cov_data[k].pop(0)
IndexError: pop from empty list

Any idea what is going on and how I can fix this?
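
This error typically means make_coverage found a contig id with no coverage values attached. One common cause is a mismatch between the ids in the --ids file and the contig names in the featureCounts output, which a quick check can reveal (a diagnostic sketch only, not BinSanity code; the .readcounts file name is a placeholder):

    # Check whether every id passed via --ids appears in the .readcounts table.
    readcount_ids = set()
    with open("assembly.bam.readcounts") as counts:   # placeholder file name
        for line in counts:
            if line.startswith("#") or line.startswith("Geneid"):
                continue  # skip featureCounts comment and header lines
            readcount_ids.add(line.split("\t")[0])

    with open("anviocontigs_binsanity.ids") as ids:
        missing = [i.strip() for i in ids
                   if i.strip() and i.strip() not in readcount_ids]

    print("%d ids from --ids have no counts" % len(missing))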

Release source code in PyPI

Hi!

I want to update BinSanity's recipe in Bioconda because the current one doesn't support Python 3. However, I've noticed that the source code was not uploaded to PyPI, only the wheel.

Would it be possible to upload BinSanity's source to PyPI?

Thank you!

Error during BinSanity execution prevents correct outputs from being created

I'm testing BinSanity with the provided test data. However, my results directory seems to be different from what would be expected. Those are the files/directories in my results folder:

BINSANITY-INITIAL
BinSanityWf_binsanity_checkm
BinSanityWf_checkm_lineagewf-results.txt
BinSanityWf.checkm.logfile
BinSanityWf.log

I also inspected the BinSanityWf.log file and noticed that an error had occurred during the execution:

(...)
Finished parsing hits for 33 of 33 (100.00%) bins.
Traceback (most recent call last):
  File "/home/antoniop.camargo/anaconda3/envs/binning/bin/Binsanity-wf", line 460, in <module>
    args.prefix)+"_checkm_lineagewf-results.txt"), ".fna", str(out_1), args.prefix)
  File "/home/antoniop.camargo/anaconda3/envs/binning/bin/Binsanity-wf", line 274, in checkm_analysis
    highRedundancy = np.setdiff1d(all, (highCompletion+lowCompletion++strainRedundancy))
TypeError: bad operand type for unary +: 'list'
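
Worth noting: the doubled plus in the traceback (lowCompletion++strainRedundancy) makes Python parse the expression as a + (+b), and unary + is undefined for lists, which is exactly the TypeError reported. A minimal, self-contained reproduction of the bug (a reading of the traceback, not a confirmed upstream patch):

    # "a ++ b" parses as "a + (+b)"; unary "+" fails on a list at runtime.
    lowCompletion, strainRedundancy = ["bin_1"], ["bin_2"]
    try:
        combined = lowCompletion ++ strainRedundancy
    except TypeError as err:
        print(err)  # bad operand type for unary +: 'list'
    combined = lowCompletion + strainRedundancy  # the intended concatenation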

        ******************************************************
        **********************BinSanity***********************
        |____________________________________________________|
        |                                                    |
        |             Computing Coverage Array               |
        |____________________________________________________|
        
          Preference: -3
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 1000
          Damping Factor: 0.95
          Coverage File: Infant_gut_assembly.cov.x100.lognorm
          Fasta File: igm.fa
          Output Directory: IGM-BinsanityWF
          (4189, 11)

         ______________________________________________________
        |                                                      |
        |                 Clustering Contigs                   |
        |______________________________________________________|

        
          Cluster 0: 4
          Cluster 18: 15
          Cluster 33: 22
          Cluster 2: 25
          Cluster 1: 11
          Cluster 3: 3
          Cluster 4: 7
          Cluster 5: 24
          Cluster 39: 25
          Cluster 29: 20
          Cluster 31: 23
          Cluster 7: 19
          Cluster 6: 17
          Cluster 8: 332
          Cluster 9: 294
          Cluster 22: 7
          Cluster 10: 18
          Cluster 19: 208
          Cluster 11: 256
          Cluster 12: 570
          Cluster 21: 71
          Cluster 15: 258
          Cluster 13: 3
          Cluster 14: 4
          Cluster 16: 413
          Cluster 17: 504
          Cluster 20: 158
          Cluster 23: 494
          Cluster 25: 71
          Cluster 24: 4
          Cluster 26: 63
          Cluster 30: 16
          Cluster 27: 34
          Cluster 28: 1
          Cluster 32: 20
          Cluster 34: 19
          Cluster 35: 43
          Cluster 36: 24
          Cluster 37: 3
          Cluster 38: 86
          Total Number of Bins: 33

         *|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*
         _____________________________________________________
         _____________________________________________________
        |                                                     |
        |                   Creating Bins                     |
        |_____________________________________________________|
        

         _____________________________________________________

                       Putative Bins Computed
                       in 1276.9271821975708 seconds
         _____________________________________________________

         _____________________________________________________
        |                                                     |
        |       Evaluating Genome With CheckM Lineage_wf      |
        |_____________________________________________________|

Error in CheckM stage

Binsanity-lc v0.2.7
Python 2.7.13 (Intel Distribution for Python)

Commandline:

 Binsanity-lc -f. -l ${DATADIR}/contig.fa -c ${DATADIR}/COVERAGE.cov -o ${OUTDIR} --threads 20

Two errors occurred during the run (full output attached as a txt file):

          Cluster 50:Traceback (most recent call last):
  File "/home/juser/.local/bin/checkm", line 36, in <module>
    from checkm import main
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/checkmData.py", line 140, in setRoot
    path = self.confirmPath(path=path)
  File "/home/juser/.local/lib/python2.7/site-packages/checkm/checkmData.py", line 162, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line
 17

and

  File "/home/juser/.local/bin/Binsanity-lc", line 498, in <module>
    checkm_analysis(str(args.prefix)+"-checkm_lineagewf-binsanity.out",".fna",str(out_1),args.prefix)
  File "/home/juser/.local/bin/Binsanity-lc", line 302, in checkm_analysis
    del new_2[0]
IndexError: list assignment index out of range

In the file BinSanityLC-checkm_lineagewf-binsanity.out there is the following error message:


    It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
    Where should CheckM store it's data?
    Please specify a location or type 'abort' to stop trying: 
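
The EOFError is CheckM trying to prompt interactively for its data directory from inside a non-interactive pipeline. Setting the data root once up front, as the message itself suggests, avoids the prompt (the path is a placeholder):

$ checkm data setRoot /path/to/checkm_data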

binsanitylc_output.txt

Empty List from Binsanity-profile

Hello,

When generating the cov profile using Binsanity-profile I get the error listed below; is this due to a dependency? I am using the most recent Anaconda version.

Many Thanks,
Brian

Load annotation file TwentTo50Megahit.bam.saf ... ||
|| Features : 62038931 ||
|| Meta-features : 62038931 ||
|| Chromosomes/contigs : 62038931 ||
|| ||
|| Process BAM file TwentTo50Megahit.bam... ||
|| Paired-end reads are included. ||
|| Assign alignments to features... ||

    ******************************************************
                Contigs formated to generate counts
    ******************************************************

Traceback (most recent call last):
  File "/home/couger/.conda/envs/purge_haplotigs_env/bin/Binsanity-profile", line 176, in <module>
    make_coverage(args.inputIds, args.outCov, args.outDirectory)
  File "/home/couger/.conda/envs/purge_haplotigs_env/bin/Binsanity-profile", line 60, in make_coverage
    contig_length = cov_data[k].pop(0)
IndexError: pop from empty list

[Question] Why does BinSanity-profile require fasta file and contig ids?

I noticed there was a tutorial that says you can use Binsanity-profile, but I'm confused about why it needs a fasta file and contig ids when the bam file already contains all of this information.

What format does the resulting coverage file look like?

Can I just modify the output of MetaBAT's jgi_summarize_bam_contig_depths (https://bitbucket.org/berkeleylab/metabat/issues/48/jgi_summarize_bam_contig_depths-coverage)? (See the conversion sketch after the help text below.)

(metagenomics_env) -bash-4.1$ Binsanity-profile -h
usage: Binsanity-profile -i fasta_file -s {sam,bam}_file --id contig_ids.txt -c output_file

    ***********************************************************************
    ******************************BinSanity********************************
    **                                                                   **
    **  Binsanity-profile is used to generate coverage files for         **
    **  input to BinSanity. This uses Featurecounts to generate a        **
    **  a coverage profile and transforms data for input into Binsanity, **
    **  Binsanity-refine, and Binsanity-wf                               **
    **                                                                   **
    ***********************************************************************
    ***********************************************************************

optional arguments:
  -h, --help            show this help message and exit
  -i INPUTFASTA         Specify fasta file being profiled
  -s INPUTMAPLOC
                            identify location of BAM files
                            BAM files should be indexed and sorted
  --ids INPUTIDS
                            Identify file containing contig ids
  -c OUTCOV
                            Identify name of output file for coverage information
  --transform TRANSFORM

                            Indicate what type of data transformation you want in the final file [Default:log]:
                            scale --> Scaled by multiplying by 100 and log transforming
                            log --> Log transform
                            None --> Raw Coverage Values
                            X5 --> Multiplication by 5
                            X10 --> Multiplication by 10
                            X100 --> Multiplication by 100
                            SQR --> Square root
                            We recommend using a scaled log transformation for initial testing.
                            Other transformations can be useful on a case by case basis
  -T THREADS            Specify Number of Threads For Feature Counts [Default: 1]
  -o OUTDIRECTORY       Specify directory for output files to be deposited [Default: Working Directory]
  --version             show program's version number and exit
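
On the conversion question above: the tutorial coverage files (e.g. Infant_gut_assembly.cov.x100.lognorm) look like a tab-separated table of contig name followed by one coverage value per sample, so a jgi_summarize_bam_contig_depths table can plausibly be reshaped into that layout by dropping the length and variance columns. A sketch under that assumption (the format is inferred from the examples, not a documented spec):

    import csv

    # jgi output columns: contigName, contigLen, totalAvgDepth,
    # then alternating <sample>.bam and <sample>.bam-var columns.
    with open("depth.txt") as src, open("converted.cov", "w") as dst:
        reader = csv.reader(src, delimiter="\t")
        header = next(reader)
        keep = [i for i, name in enumerate(header) if name.endswith(".bam")]
        for row in reader:
            dst.write("\t".join([row[0]] + [row[i] for i in keep]) + "\n")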

Problem with cov-combine

I've tried to run cov-combine, but it kept giving the same error message:

$ python cov-combine.py -c coverage -o Combined.coverage
File "cov-combine.py", line 74
elif
^
SyntaxError: invalid syntax

This is the line 74:
if args.inputoutput is None:
    parser.error('-o output fasta file needed')
if args.inputCoverage is None:
    parser.error('-c suffic linking coverage profiles needed')
elif

else:
    time = time.time()

It looked odd to have an empty "elif", but I'm not really familiar with Python. I was wondering whether it was my mistake or something in the code. Thanks so much!
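
For what it's worth, the dangling elif is indeed invalid Python (an elif needs both a condition and an indented body), so this is a bug in the script rather than a usage error. The argument checks were presumably meant to read something like this (a guess at the intent, not the author's code):

    if args.inputoutput is None:
        parser.error('-o output fasta file needed')
    if args.inputCoverage is None:
        parser.error('-c suffix linking coverage profiles needed')
    else:
        start_time = time.time()  # renamed so the time module is not shadowed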

Error in Binsanity-lc.cov_array() - 'list' object has no attribute 'shape'

Using:

  • Binsanity-lc v0.2.7
  • Python 2.7.13 (from Intel)

After some hours of running Binsanity-lc:

              Reclustering redundant bin BinSanityLC-kmean-bin_35-bin_59.fna
            ____________________________________________________
          Preference: -25
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 1000
          Damping Factor: 0.95
          Coverage File: /lustre/Testing/fooGrp/CONDABINSAN/COVERAGE.cov
          Fasta File: BinSanityLC-kmean-bin_35-bin_59.fna
          Kmer: 4
Traceback (most recent call last):
  File "/home/juser/.local/bin/Binsanity-lc", line 564, in <module>
    val1, val2 = cov_array((get_cov_data(os.path.join(args.outputdir,args.prefix +'-kmerGC.txt'))), location, redundant_bin,args.ContigSize)                                 
  File "/home/juser/.local/bin/Binsanity-lc", line 51, in cov_array
    print "          %s" % (cov_array.shape,)
AttributeError: 'list' object has no attribute 'shape'
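
This AttributeError means cov_array reached the print call holding a plain Python list rather than a NumPy array; lists have no .shape attribute. A defensive coercion sidesteps the crash (an illustration, not the project's code):

    import numpy as np

    cov_array = [[1.0, 2.0], [3.0, 4.0]]        # stand-in for the parsed coverage data
    cov_array = np.asarray(cov_array)           # a list gains .shape once coerced
    print("          %s" % (cov_array.shape,))  # -> (2, 2)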

tests?

Hi,
Are there any tests available? I've managed to install it but am unsure about certain things, e.g. is bedtools 2.17 really required, or is 2.18 OK?
Thanks, ben

IndexError: list index out of range

I am trying to run the test data in binsanity, but I am getting the following error. How do I rectify this?
Traceback (most recent call last):
  File "/usr/local/bin/Binsanity", line 182, in <module>
    val1, val2 = cov_array((get_cov_data(args.inputCovFile)), args.inputContigFiles, args.fastafile, args.ContigSize)
  File "/usr/local/bin/Binsanity", line 21, in get_cov_data
    all_cov_data[cov_data[0]] = cov_data
IndexError: list index out of range
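
cov_data[0] raising "list index out of range" means a line of the coverage file split into nothing, which is what blank or whitespace-only lines produce. A sketch of the guard (illustrative; get_cov_data's real structure may differ):

    all_cov_data = {}
    with open("coverage.cov") as handle:  # placeholder file name
        for line in handle:
            cov_data = line.split()
            if not cov_data:              # blank line -> empty list -> would crash
                continue
            all_cov_data[cov_data[0]] = cov_data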

MemoryError: Unable to allocate 192. GiB for an array with shape (160646, 160646) and data type float64

Hello BinSanity developers and community,

I am trying to bin contigs with BinSanity using anvi'o v6.2 and got an error which suggests a memory issue, as shown below. I ran the command anvi-cluster-contigs -p PROFILE.db -c ../../GR_contigs.db -C binsanity --driver binsanity --just-do-it. I wonder if there is a way to avoid this issue. Thank you for your time and consideration.

Stay healthy and safe,
Joo-Young

(anvio-6.2) bash-4.2$ pwd && ls -lh
/mnt/gs18/scratch/users/leejooy5/anvio_2020Feb/GR_profile_DB/GR-merged/binsanity_tmp
total 1.8G
-rw-r----- 1 leejooy5 mmg 1.1K Apr 14 13:37 binsanity-logfile.txt
-rw-r----- 1 leejooy5 mmg  36M Apr 14 13:37 contig_coverages_log_norm.txt
-rw-r----- 1 leejooy5 mmg  31M Apr 14 13:37 contig_coverages.txt
-rw-r----- 1 leejooy5 mmg 2.6K Apr 14 13:37 logs.txt
-rw-r----- 1 leejooy5 mmg 836M Apr 14 13:37 sequence_contigs.fa
-rw-r----- 1 leejooy5 mmg 847M Apr 14 13:37 sequence_splits.fa
-rw-r----- 1 leejooy5 mmg  38M Apr 14 13:37 split_coverages_log_norm.txt
-rw-r----- 1 leejooy5 mmg  33M Apr 14 13:37 split_coverages.txt

(anvio-6.2) bash-4.2$ cat logs.txt
# DATE: 14 Apr 20 13:19:36
# CMD LINE: Binsanity -c /tmp/local/60268981/tmp1u3por0d/contig_coverages_log_norm.txt -f /tmp/local/60268981/tmp1u3por0d -l sequence_contigs.fa -o /tmp/local/60268981/tmp1u3por0d
Traceback (most recent call last):
  File "/mnt/home/leejooy5/miniconda3/bin/Binsanity", line 219, in <module>
    args.preference, args.inputContigFiles, args.outputdir, args.outname)
  File "/mnt/home/leejooy5/miniconda3/bin/Binsanity", line 63, in affinity_propagation
    convergence), copy=True, preference=int(preference), affinity='euclidean', verbose=False).fit_predict(array)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/cluster/_affinity_propagation.py", line 446, in fit_predict
    return super().fit_predict(X, y)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/base.py", line 462, in fit_predict
    self.fit(X)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/cluster/_affinity_propagation.py", line 381, in fit
    self.affinity_matrix_ = -euclidean_distances(X, squared=True)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 303, in euclidean_distances
    distances = - 2 * safe_sparse_dot(X, Y.T, dense_output=True)
  File "/mnt/home/leejooy5/miniconda3/lib/python3.7/site-packages/sklearn/utils/extmath.py", line 151, in safe_sparse_dot
    ret = a @ b
MemoryError: Unable to allocate 192. GiB for an array with shape (160646, 160646) and data type float64

        ******************************************************
        **********************BinSanity***********************
        |____________________________________________________|
        |                                                    |
        |             Computing Coverage Array               |
        |____________________________________________________|

          Preference: -3
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 1000
          Damping Factor: 0.95
          Coverage File: /tmp/local/60268981/tmp1u3por0d/contig_coverages_log_norm.txt
          Fasta File: sequence_contigs.fa
          Output directory: /tmp/local/60268981/tmp1u3por0d
          logfile: binsanity-logfile.txt
          (160646, 21)

         ______________________________________________________
        |                                                      |
        |                 Clustering Contigs                   |
        |______________________________________________________|
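
The 192 GiB figure is not arbitrary: affinity propagation stores a dense float64 similarity matrix with one entry per pair of contigs, and the arithmetic checks out exactly:

    n = 160646                   # contigs in this run (from the log above)
    matrix_bytes = n * n * 8     # one float64 per contig pair
    print(matrix_bytes / 2**30)  # ~192.3 GiB, matching the MemoryError

This is why Binsanity-lc exists: it pre-clusters contigs with k-means so that affinity propagation only ever sees smaller subsets.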

Binsanity-lc: TypeError: unhashable type: 'numpy.ndarray'

Hi,

I got a memory error when running Binsanity-wf, so I tried Binsanity-lc. I kept getting this error. I installed Binsanity through pip and it should be version 0.2.6.5. Any idea what might be causing it?

        ____________________________________________________

         Clustering Bin  BinSanityLC-kmean-bin_17.fna
         via Affinity Propagation
        ____________________________________________________

      Preference: -3
      Maximum Iterations: 4000
      Convergence Iterations: 400
      Contig Cut-Off: 1000
      Damping Factor: 0.95
      Coverage File: profile_output.cov
      Fasta File: BinSanityLC-kmean-bin_17.fna
      Output Directory: binsanity_bins
      (3666, 1)

Traceback (most recent call last):
  File "/slipstream/home/acacia/anaconda2/bin/Binsanity-lc", line 479, in <module>
    affinity_propagation(val3, val4, clust, args.damp, args.maxiter, args.conviter, args.preference, location_kmean, args.outputdir)
  File "/slipstream/home/acacia/anaconda2/bin/Binsanity-lc", line 64, in affinity_propagation
    outfile_data[apclust[i]] = [names[i]]
TypeError: unhashable type: 'numpy.ndarray'

Test works but binsanity-profile fails

Binsanity-profile -i contigs_v2.fa -s /Volumes/Red_4TB/scratch_BK/CS_BACKUP/p31_anvio_metagenomes/OC26_EXPERIMENT/binsanity_bins/bam_files/ -c binsanity.profile

/Users/tito_miniconda/opt/miniconda3/bin/Binsanity-profile:152: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if args.outDirectory is not ".":

        ******************************************************
                    Contigs formated to generate counts
        ******************************************************
        
Traceback (most recent call last):
  File "/Users/tito_miniconda/opt/miniconda3/bin/Binsanity-profile", line 164, in <module>
    feature_counts(args.inputMapLoc, args.outDirectory, args.Threads)
  File "/Users/tito_miniconda/opt/miniconda3/bin/Binsanity-profile", line 31, in feature_counts
    subprocess.call(["featureCounts", "-M", "-O", "-F", "SAF", "-T", str(threads), "-a", os.path.join(outputdir, str(
  File "/Users/tito_miniconda/opt/miniconda3/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/Users/tito_miniconda/opt/miniconda3/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/tito_miniconda/opt/miniconda3/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'featureCounts'
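
featureCounts is the tool from the subread package that Binsanity-profile shells out to (see the pip note at the top of this page), so this FileNotFoundError simply means it is not on the PATH. With conda, for example:

$ conda install -c bioconda subread
$ which featureCounts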

Using BinSanity for metagenomes with bacteria, archaea, and protists

I want to try BinSanity on my dataset; I know there are at least Cyanobacteria and diatoms, and I suspect there are also archaea in this ocean dataset. Can I use BinSanity-wf and skip the CheckM stage, or do I have to run BinSanity and BinSanity-refine separately?

Also, the default values in BinSanity-refine are different from the actual values: 0.9 vs. 0.95 for the damping factor, I believe. Which one would be the better option for a dataset of around 800k contigs from a single sample (not a co-assembly)?

problem with profile "OSError: [Errno 2] No such file or directory"

Hi,

I am trying to run Binsanity-profile, but I am getting an error. I have mapped reads using BBMap, and the BBmap folder in my command line contains the sorted file DG074megahit_sorted.bam and its index DG074megahit_sorted.bam.bai.

My command line is:
nohup /usr/local/bin/Binsanity-profile -i /groups/edwards/camilla/Kart_assemblies/megahit/DG074_simplified_final.contigs.fa -s /groups/edwards/camilla/Kart_assemblies/megahit/Binsanity/DG074/BBmap/ --ids /groups/edwards/camilla/Kart_assemblies/megahit/Binsanity/DG074/DG074.id.txt -c /groups/edwards/camilla/Kart_assemblies/megahit/Binsanity/DG074/DG074.cov &

I get the following error:


                Contigs formated to generate counts
    ******************************************************

Traceback (most recent call last):
  File "/usr/local/bin/Binsanity-profile", line 166, in <module>
    feature_counts(args.inputMapLoc, args.outDirectory, args.Threads)
  File "/usr/local/bin/Binsanity-profile", line 28, in feature_counts
    subprocess.call(["featureCounts", "-M", "-O", "-F", "SAF", "-T", str(threads), "-a", os.path.join(outputdir, str(outname)+".saf"), "-o", os.path.join(outputdir, str(outname)+".readcounts"), str(mapfile_file)], stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Could you help me spot my error?
Thanks,
Camilla

Binsanity doesn't produce final bins

Hello,
I have already used BinSanity 2 months ago and it succeeded, but now I am using it for binning metatranscriptomic data and it fails to produce the final bins. I tried Binsanity, Binsanity-lc, and Binsanity2, and all failed with "Killed" after the 4-mer step, without giving a reason.

When I tried to run the test data, it also failed to give the final bins, but this error was found:
[2022-05-06 14:49:34] INFO: Reading marker alignment files.
[2022-05-06 14:49:34] INFO: Concatenating alignments.
[2022-05-06 14:49:34] INFO: Placing 75 bins into the genome tree with pplacer (be patient).
[2022-05-06 14:53:45] INFO: { Current stage: 0:07:30.229 || Total: 0:07:30.229 }
[2022-05-06 14:53:45] INFO: [CheckM - lineage_set] Inferring lineage-specific marker sets.
[2022-05-06 14:53:45] INFO: Reading HMM info from file.
[2022-05-06 14:53:46] INFO: Parsing HMM hits to marker genes:
[2022-05-06 14:53:50] INFO: Determining marker sets for each genome bin.

Unexpected error: <class 'FileNotFoundError'>

Error in Testing the sample Data

I am trying to run Binsanity with the test data provided on GitHub, but when I run it I get the following warnings:
/usr/lib/python2.7/dist-packages/sklearn/utils/sparsetools/__init__.py:3: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._min_spanning_tree import minimum_spanning_tree
/usr/lib/python2.7/dist-packages/sklearn/utils/sparsetools/_graph_validation.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._graph_tools import csgraph_to_dense, csgraph_from_dense,
/usr/lib/python2.7/dist-packages/sklearn/utils/sparsetools/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._traversal import connected_components
/usr/lib/python2.7/dist-packages/sklearn/utils/extmath.py:20: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._logistic_sigmoid import _log_logistic_sigmoid
/usr/lib/python2.7/dist-packages/sklearn/utils/extmath.py:22: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .sparsefuncs_fast import csr_row_norms
/usr/lib/python2.7/dist-packages/scipy/spatial/__init__.py:90: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .ckdtree import *
/usr/lib/python2.7/dist-packages/scipy/spatial/__init__.py:91: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .qhull import *
/usr/lib/python2.7/dist-packages/scipy/stats/_continuous_distns.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import vonmises_cython
/usr/lib/python2.7/dist-packages/scipy/stats/stats.py:188: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._rank import rankdata, tiecorrect
/usr/lib/python2.7/dist-packages/sklearn/metrics/cluster/supervised.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .expected_mutual_info_fast import expected_mutual_information
/usr/lib/python2.7/dist-packages/sklearn/metrics/pairwise.py:56: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .pairwise_fast import _chi2_kernel_fast, _sparse_manhattan
/usr/lib/python2.7/dist-packages/sklearn/neighbors/__init__.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .ball_tree import BallTree
/usr/lib/python2.7/dist-packages/sklearn/neighbors/__init__.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .kd_tree import KDTree
/usr/lib/python2.7/dist-packages/sklearn/utils/graph.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .graph_shortest_path import graph_shortest_path
/usr/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py:28: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import _ppoly
/usr/lib/python2.7/dist-packages/sklearn/linear_model/least_angle.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ..utils import array2d, arrayfuncs, as_float_array, check_arrays
/usr/lib/python2.7/dist-packages/sklearn/linear_model/coordinate_descent.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import cd_fast
/usr/lib/python2.7/dist-packages/sklearn/linear_model/__init__.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from .sgd_fast import Hinge, Log, ModifiedHuber, SquaredLoss, Huber
/usr/lib/python2.7/dist-packages/sklearn/svm/base.py:8: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import libsvm, liblinear
/usr/lib/python2.7/dist-packages/sklearn/svm/base.py:9: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import libsvm_sparse
/usr/lib/python2.7/dist-packages/sklearn/utils/random.py:9: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._random import sample_without_replacement
/usr/lib/python2.7/dist-packages/sklearn/isotonic.py:11: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from ._isotonic import _isotonic_regression
/usr/lib/python2.7/dist-packages/sklearn/manifold/t_sne.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import utils
/usr/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py:34: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import _k_means
/usr/lib/python2.7/dist-packages/sklearn/cluster/hierarchical.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
  from . import _hierarchical
and the following error:
Traceback (most recent call last):
  File "/usr/local/bin/Binsanity", line 182, in <module>
    val1, val2 = cov_array((get_cov_data(args.inputCovFile)), args.inputContigFiles, args.fastafile, args.ContigSize)
  File "/usr/local/bin/Binsanity", line 21, in get_cov_data
    all_cov_data[cov_data[0]] = cov_data
IndexError: list index out of range

please provide release tag

Hello,

Would it be possible to tag a release? Our policy is to install tagged release versions.

thanks

Eric

Test produces no bins

Hello!

I tried installing BinSanity (on the cluster I have access to, where most of the dependencies are available as modules; I installed pandas and featureCounts myself) and used the provided igm.fa and Infant_gut_assembly.cov.x100.lognorm files to test the installation, following the guide in the instructions. Everything seemed to go as planned, but instead of the expected 22-bin output, I got 0 bins and 4188 clusters. I assume I'm missing a dependency, or a dependency did not act as expected, but I'm not sure which it would be. Any help would be appreciated.

To be clear: the "Computing Coverage Array" part was as expected, but the "Clustering Contigs" part was not.

Thanks!

Binsanity-wf fails

Hello,

I am trying to run the "Binsanity-wf" command as follows:
Binsanity-wf -f <my_assembly_dir> -l <my_assembly> -c binsanity.profile.cov.x100.lognorm -o output --threads 20

This successfully runs Binsanity and CheckM, but then fails immediately after CheckM:

*******************************************************************************
 [CheckM - qa] Tabulating genome statistics.
*******************************************************************************

  Calculating AAI between multi-copy marker genes.

  Reading HMM info from file.
  Parsing HMM hits to marker genes:
    Finished parsing hits for 10 of 10 (100.00%) bins.

  { Current stage: 0:00:06.542 || Total: 0:06:01.380 }
Traceback (most recent call last):
  File "/gnu/store/vzmhc8h9i20as615l0lmbnjqb3lww85s-binsanity-0.2.5.5/bin/..Binsanity-wf-real-real", line 429, in <module>
    shutil.copyfileobj(readfile,outfile)
  File "/srv/sw/python/2.7.4/lib/python2.7/shutil.py", line 52, in copyfileobj
    fdst.write(buf)
ValueError: I/O operation on closed file

Do you have any insights into what might be causing this issue?

Bio.Alphabet removed from Biopython

There is an issue with the conda installation.

Removing the import works. There are two solutions, I think: dropping the use of Alphabet, or pinning the version of Biopython used in the conda recipe.

For example, when calling Binsanity-wf:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/binsanity/bin/Binsanity-wf", line 13, in <module>
    from Bio.Alphabet import IUPAC
  File "/opt/miniconda3/envs/binsanity/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
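
Bio.Alphabet was removed in Biopython 1.78, so until a BinSanity release drops the import, the workaround is to pin an older Biopython, e.g.:

$ pip install "biopython<1.78"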

Issue with low_completion.fna and 4mer

Hi,
When running the "lc" part of your software, I got this error:

            ____________________________________________________
                    Calculating 4mer frequencies for
                    redundant bin low_completion.fna
            ____________________________________________________
          kmer frequency calculated in 15164.698575735092 seconds

            ____________________________________________________
                    Creating Profile for
                    redundant bin low_completion.fna
            ____________________________________________________
           Combined profile created in 167.75693249702454 seconds

            ____________________________________________________
                    Reclustering redundant bin low_completion.fna
            ____________________________________________________
          Preference: -25
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 1000
          Damping Factor: 0.95
          Coverage File: HC_HiSeq_BinSaniy_cov.cov.x100.lognorm
          Fasta File: low_completion.fna
          Kmer: 4
          (300263, 266)
BinSanity failed when refining you genomes :/. The Bin that it failed at was the following bin: low_completion.fna

Any ideas? Can I use the bins in the REFINED-BINS folder?

Binsanity-lc's TabError

Greetings. I installed Binsanity with pip, and when trying to run Binsanity for the first time, I get

File "/home/jsequeira/anaconda3/bin/Binsanity-lc", line 103
    contig_number = 0
                    ^
TabError: inconsistent use of tabs and spaces in indentation

Is the pip version up to date?

I/O Error with shutil.py

Command called:

Binsanity-wf -c S1_binsanity.cov.x100.lognorm -f ./ -l S1_simplifiedheaders.fasta -x 2000 --threads 16

When running, the program works fine through the first round of binning and CheckM results, then dies with this error:

Traceback (most recent call last):
  File "/home/mcallis/software/BinSanity/bin/Binsanity-wf", line 429, in <module>
    shutil.copyfileobj(readfile,outfile)
  File "/usr/lib64/python2.7/shutil.py", line 52, in copyfileobj
    fdst.write(buf)
ValueError: I/O operation on closed file

I'm not sure, but I think it might be because the "outfile" has been closed in the previous lines.

with open("low_completion.fna","wb") as outfile:
            for filename in glob.glob(str(location2)+"/*.fna"):
                if filename == "low_completion.fna":
                    continue
        with open(filename,"rb") as readfile:
            shutil.copyfileobj(readfile,outfile)
            shutil.move("low_completion.fna", str(location))
            shutil.rmtree(str(location2))

Please let me know if I'm just missing something. I'm using python 2.7.11.
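
The diagnosis looks right: as pasted, shutil.copyfileobj runs after the "with open(...) as outfile" block has exited, so outfile is already closed. Keeping the copy inside the outer context manager, and moving the file only once writing is done, would look roughly like this (a reconstruction of the apparent intent, not the upstream patch; location and location2 stand in for the script's own variables):

    import glob
    import os
    import shutil

    location, location2 = "final_bins", "redundant_bins"  # placeholder paths

    with open("low_completion.fna", "wb") as outfile:
        for filename in glob.glob(os.path.join(str(location2), "*.fna")):
            if os.path.basename(filename) == "low_completion.fna":
                continue
            with open(filename, "rb") as readfile:
                shutil.copyfileobj(readfile, outfile)  # outfile is still open here
    shutil.move("low_completion.fna", str(location))   # move after the file is closed
    shutil.rmtree(str(location2))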

Binsanity-profile truncates the name of the alignment file

I'm using Binsanity v0.2.8

Apologies for all of the activity on here lately. I've been trying to integrate Binsanity into some of the pipelines at my institute, and thought it would help to log some of the issues and questions instead of just forgetting about them.

(metagenomics_env) -bash-4.1$ Binsanity-profile -i scaffolds.fasta -s ./alignment_output/ -T 4 -o binsanity_output --ids ./alignment_output/contig_identifiers.list -c binsanity_output/output.cov

        ******************************************************
                    Contigs formated to generate counts
        ******************************************************


        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
	  v1.6.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           P alignment.bam                                  ||
||                                                                            ||
||             Output file : lignment.bam.readcounts                          ||
||                 Summary : lignment.bam.readcounts.summary                  ||
||              Annotation : lignment.bam.saf (SAF)                           ||
||      Dir for temp files : binsanity_output                                 ||
||                                                                            ||
||                 Threads : 4                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file lignment.bam.saf ...                                  ||
||    Features : 29710                                                        ||
||    Meta-features : 29710                                                   ||
||    Chromosomes/contigs : 29710                                             ||
||                                                                            ||
|| Process BAM file alignment.bam...                                          ||
||    Paired-end reads are included.                                          ||
||    Assign alignments to features...                                        ||
||    Total alignments : 166130                                               ||
||    Successfully assigned alignments : 84217 (50.7%)                        ||
||    Running time : 0.01 minutes                                             ||
||                                                                            ||
||                                                                            ||
|| Summary of counting results can be found in file "binsanity_output/lignme  ||
|| nt.bam.readcounts.summary"                                                 ||
||                                                                            ||
\\============================================================================//



        ********************************************************
                        Coverage profile produced
        ********************************************************


(metagenomics_env) -bash-4.1$ ls binsanity_output/
lignment.bam.readcounts  lignment.bam.readcounts.summary  lignment.bam.saf  output.cov.cov  output.cov.cov.x100.lognorm
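
Every derived name above is missing exactly the first character of alignment.bam (lignment.bam.saf, lignment.bam.readcounts, and so on), which is the signature of an off-by-one string slice when the directory prefix is stripped from the BAM path. Using os.path.basename avoids that class of bug entirely (an illustration, not the upstream code):

    import os

    bam_path = "./alignment_output/alignment.bam"
    print(os.path.basename(bam_path))  # -> "alignment.bam", first character intact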

BinSanity docker run-error in "_continuous_distns.py"

I am running BinSanity on docker, but I am getting the following error which seems to be a bug in the docker image.

I have already changed my ./local file which contains my python modules so that it doesn't interfere with docker.

My code-
$module load singularity/3.0.3
$singularity pull docker://shengwei/binsanity:latest #pulling the latest version on BinSanity
$singularity exec binsanity_latest.sif Binsanity-lc -h #checking if it runs

Error-
Traceback (most recent call last):
File "/opt/conda/bin/Binsanity-lc", line 7, in
from sklearn.cluster import AffinityPropagation
File "/opt/conda/lib/python2.7/site-packages/sklearn/cluster/init.py", line 6, in
from .spectral import spectral_clustering, SpectralClustering
File "/opt/conda/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 15, in
from ..metrics.pairwise import pairwise_kernels
File "/opt/conda/lib/python2.7/site-packages/sklearn/metrics/init.py", line 7, in
from .ranking import auc
File "/opt/conda/lib/python2.7/site-packages/sklearn/metrics/ranking.py", line 27, in
from scipy.stats import rankdata
File "/opt/conda/lib/python2.7/site-packages/scipy/stats/init.py", line 367, in
from .stats import *
File "/opt/conda/lib/python2.7/site-packages/scipy/stats/stats.py", line 173, in
from . import distributions
File "/opt/conda/lib/python2.7/site-packages/scipy/stats/distributions.py", line 13, in
from . import _continuous_distns
File "/opt/conda/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py", line 3345
SyntaxError: Non-ASCII character '\xe2' in file /opt/conda/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py on line 3346, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

binsanity-checkm created in current working directory

Hello,

I'm using BinSanity v0.2.5.9 and have encountered a small issue. When running the program it creates the directory 'binsanity-checkm' in the current working directory. As a consequence, if another instance of BinSanity is run in the same directory it will fail, as this directory will already exist. Would it be possible to give this directory a unique/temporary name in order to avoid this conflict? I often run binning methods in parallel over multiple independent samples, so this situation is problematic.

Thanks,
Donovan
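
For reference, a uniquely named working directory per run, which is what this request boils down to, is a one-liner with the standard library; a minimal sketch:

    import tempfile

    # Each call returns a fresh directory such as ./binsanity-checkm-k3j9x2,
    # so parallel runs in the same working directory cannot collide.
    checkm_dir = tempfile.mkdtemp(prefix="binsanity-checkm-", dir=".")
    print(checkm_dir)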

Binsanity-profile speed

Hi,

I have a quite large co-assembly that I'd like to run through Binsanity. However, the Binsanity-profile script seems to be incredibly slow: the multiBamCov step is taking baby steps on a node with 30 cores/750 GB RAM available, i.e., in 24 hours the first sample's .readcounts file is barely over 1 MB in size.

The (24) SAM files were created with BWA, converted to BAM, sorted, and indexed with samtools. For reference, the average BAM file is 8-10 GB, and its index file 800-900 MB. The co-assembly itself is 13 GB. I have previously calculated coverage profiles on this co-assembly with bedtools genomecov, and that went rather smoothly.

Do you have any recommendations or insights on why this may be happening?

I'm running samtools v1.3.1 and bedtools2 2.25.

Best,

Ruben

error with "concat"

Hello,

When I run concat on alignment files, I keep getting this kind of error:

IOError: [Errno 9] Bad file descriptor

Do you have any idea why this is occurring?

Thank you!
Lynn

No log outputted.

Hello,
Hello,
Since it is noted that "BinSanity becomes highly memory intensive at 100,000 contigs or above", and I have 2.5 million contigs, I was surprised that only 2 GB of memory was occupied when using the Binsanity command. Has this behavior been updated? By the way, no log was output. (The software was installed via Bioconda.)

Bio.Alphabet has been removed from Biopython error in Binsanity-refine

(binsanity_env) -bash-4.1$ Binsanity-refine
Traceback (most recent call last):
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/binsanity_env/bin/Binsanity-refine", line 14, in <module>
    from Bio.Alphabet import IUPAC
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/binsanity_env/lib/python3.6/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
(binsanity_env) -bash-4.1$ conda list | grep "biopython"
biopython                 1.78             py36h8f6f2f9_2    conda-forge
(binsanity_env) -bash-4.1$ Binsanity --version
Binsanity v0.4.4

Error during Affinity Propagation stage with Binsanity-lc

Hi, interested in using BinSanity, thanks for your hard work in developing/maintaining it.

I'm using Binsanity v0.4.1, installed with conda.

When running my metagenomes on cluster nodes with 400 GB of RAM, they failed, so I'm trying to run Binsanity-lc. I used these parameters, again with 20 processors and 400 GB of RAM:

Binsanity-lc -f ${indir} -l ${ID}_scaffolds.fasta -c ${pDIR}/profiles/${ID}.cov.cov.x100.lognorm -o ${pDIR}/${ID}-BinsanityWF -x 3000 --checkm_threads 20 --kmean_threads 20 --Prefix ${ID}

It looks like the kmeans step completes but in the Affinity Propagation stage it fails:

            ____________________________________________________

             Clustering Bin  SD01cat-kmean-bin_71.fna
             via Affinity Propagation
            ____________________________________________________
          Preference: -3
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 3000
          Damping Factor: 0.95
          Coverage File: /storage/home/hcoda1/0/mwoodworth8/scratch/PREMIX/21.01.01_all_metagenomes/06.h.binsanity_pt_cat/profiles/SD01cat.cov.cov.x100.lognorm
          Fasta File: SD01cat-kmean-bin_71.fna
          Output Directory: /storage/home/hcoda1/0/mwoodworth8/scratch/PREMIX/21.01.01_all_metagenomes/06.h.binsanity_pt_cat/SD01cat-BinsanityWF
          (47, 1)
Traceback (most recent call last):
  File "/storage/home/hcoda1/0/mwoodworth8/.conda/envs/binsanity/bin/Binsanity-lc", line 516, in <module>
    print("The program failed to complete clustering with affinity propagation when it reached %s. Check the number of contigs in the following bin: %s. If the number is >100,000 it is likely you ran into a memory error.") % (clust)
TypeError: not enough arguments for format string

I've attached the log file below. Are there thoughts on what could be going on here?

thanks!

SD01cat-BinsanityLC-log.txt
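
The crash-on-crash here is a formatting bug: the message contains two %s placeholders but the statement supplies only one value, so printing the error message itself raises the TypeError. A minimal reproduction:

    message = "failed when it reached %s. Check the following bin: %s."
    try:
        print(message % ("cluster_5",))  # two placeholders, one value
    except TypeError as err:
        print(err)                       # not enough arguments for format string

The underlying failure it was trying to report is the affinity propagation step on bin SD01cat-kmean-bin_71.fna, per the log above.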

low_completion.fna errors

Command line for Binsanity-wf v0.2.5.4:

Binsanity-wf -c S6_binsanity.cov.x100.lognorm -f ./ -l S6_simplifiedheaders.fasta -x 2000 --threads 20

After doing the initial binning and all of the refinement, the program exits when it attempts to process the low_completion.fna file. I think earlier you try to move it out of the Redundant Bins folder, but in my case, that file is still in the folder.

Here is the exit warning:

            -------------------------------------------------------
                       Reclustering redundant bin low_completion.fna
            -------------------------------------------------------
Preference: -25
Maximum Iterations: 4000
Convergence Iterations: 400
Contig Cut-Off: 2000
Damping Factor: 0.95
Coverage File: S6_binsanity.cov.x100.lognorm
Fasta File: low_completion.fna
Traceback (most recent call last):
  File "/home/mcallis/software/BinSanity/bin/Binsanity-wf", line 466, in <module>
    val1, val2 = cov_array((get_cov_data('tetra-GC-out')), location, redundant_bin,args.ContigSize)
  File "/home/mcallis/software/BinSanity/bin/Binsanity-wf", line 49, in cov_array
    print cov_array.shape
AttributeError: 'list' object has no attribute 'shape'

Looking at the script, it seems like there isn't anything left in the workflow except some moving of files, so I should be ok without a rerun, right? Thanks!

Does Binsanity_lc continue from checkpoints?

Since I have a huge metagenome assembly, I had to use Binsanity-lc or risk running out of memory.

The Binsanity-lc run seemed to go fine, but it took longer than the three-day maximum runtime allowed on our institute's servers, so it was killed before it could finish.

Is there a way to call Binsanity-lc so that it picks up at the latest checkpoint instead of restarting from the absolute beginning, or do I have to restart from scratch?

I tried with Carol's dataset and meren.cov but got errors.


                      Running Binsanity

                ---Computing Coverage Array ---
    -------------------------------------------------------

Preference: -10.0
Maximum Iterations: 4000
Convergence Iterations: 400
Contig Cut-Off: 1000
Damping Factor: 0.95
Coverage File: meren.cov
Fasta File: igm.fa
Output directory: BINSANITY-RESULTS
(4189, 11)

Total Number of Bins: 22
Traceback (most recent call last):
  File "/data/xianghui/cgat-python/bin/Binsanity", line 175, in <module>
    affinity_propagation(val1, val2, args.fastafile, args.damp, args.maxiter, args.conviter, args.preference, args.inputContigFiles, args.outputdir)
  File "/data/xianghui/cgat-python/bin/Binsanity", line 84, in affinity_propagation
    unbinned_file.close()
NameError: global name 'unbinned_file' is not defined
Mon Nov 28 11:04:22 SGT 2016

Index out of range error while running Binsanity-wf

Hi!

I'm trying to run Binsanity-wf, but I'm getting this error. I suspect something is going wrong after checkm analysis.

This is the command I used:
Binsanity-wf --threads 24 -f ./ -l ./final_assembly.fasta -c ./coverage.cov.x100.lognorm -o ./BinsanityWF

Log:


    **********************BinSanity***********************
    |____________________________________________________|
    |                                                    |
    |             Computing Coverage Array               |
    |____________________________________________________|

      Preference: -3
      Maximum Iterations: 4000
      Convergence Iterations: 400
      Contig Cut-Off: 1000
      Damping Factor: 0.95
      Coverage File: ./coverage.cov.x100.lognorm
      Fasta File: ./final_assembly.fasta
      Output Directory: ./BinsanityWF
      (18, 1)

     ______________________________________________________
    |                                                      |
    |                 Clustering Contigs                   |
    |______________________________________________________|


      Cluster 0: 18
      Total Number of Bins: 1

     *|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*|*
     _____________________________________________________
    |                                                     |
    |                   Creating Bins                     |
    |_____________________________________________________|


     _____________________________________________________

                   Putative Bins Computed
                   in 0.109831094742 seconds
     _____________________________________________________

     _____________________________________________________
    |                                                     |
    |       Evaluating Genome With CheckM Lineage_wf      |
    |_____________________________________________________|

Error: (error screenshot attached as an image; not reproduced here)

Alignment record is too long.

Hey!

I have been trying to run Binsanity v2.0.0. I am able to produce the featureCounts files, but when I get to running binning using the bam file, I get an error saying that I need to use long read mode. I cannot find anywhere in the --help or on the GitHub page how to run long read mode. The current commands I am running are:

Binsanity-profile -i contigs.fa -s sample.bam -c sample1
Binsanity-lc -f . -l contigs.fa -c sample1.cov.x100.lognorm -o sample_binsanity -C 5

I have also tried running the following, but I get the same result:

Binsanity-wf -f . -l -c sample1.cov.x100.lognorm -o sample_binsanity

Here are my results :

//========================== featureCounts setting ===========================\
|| ||
|| Input files : 1 BAM file ||
|| o D1060.bam ||
|| ||
|| Output file : D1060.bam.readcounts ||
|| Summary : D1060.bam.readcounts.summary ||
|| Annotation : D1060.bam.saf (SAF) ||
|| Dir for temp files : . ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
\============================================================================//

//================================= Running ==================================\
|| ||
|| Load annotation file D1060.bam.saf ... ||
|| Features : 373272 ||
|| Meta-features : 373272 ||
|| Chromosomes/contigs : 373272 ||
|| ||
|| Process BAM file D1060.bam... ||
|| ||
|| ERROR: Alignment record is too long. ||
|| Please use the long read mode. ||

FATAL Error: The program has to terminate and no counting file is generated.

How do I run long read mode? Thanks!

Hannah
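
featureCounts does have a long-read counting mode, enabled with its -L flag, but Binsanity-profile builds the featureCounts command internally and (as of the help text shown elsewhere on this page) exposes no option for it, so the call would have to be patched by hand. A sketch of what the patched call could look like, with placeholder values standing in for Binsanity-profile's internal variables (an assumption about a fix, not a documented option):

    import os
    import subprocess

    # Placeholders for Binsanity-profile's internal variables:
    threads, outputdir, outname, mapfile = 1, ".", "D1060.bam", "D1060.bam"

    subprocess.call(["featureCounts", "-L",  # -L = featureCounts long-read mode
                     "-M", "-O", "-F", "SAF",
                     "-T", str(threads),
                     "-a", os.path.join(outputdir, str(outname) + ".saf"),
                     "-o", os.path.join(outputdir, str(outname) + ".readcounts"),
                     str(mapfile)],
                    stdout=subprocess.PIPE)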

Output dir not used in Binsanity-profile

Hello,

as far as I can tell there is a bug in Binsanity-profile where the output folder is not used for the coverage profile:

    out1 = open(str(outfile)+".cov", "w")

Should be

    out1 = open(os.path.join(outputdir,str(outfile)+".cov"), "w")

If I understand the documentation correctly.

This is a minor bug

cov_array Error: 'list' object has no attribute 'shape'

Hi,

I'm running Binsanity-wf. Everything was going fine, but now it has thrown an error.

So far the folders and files created are:

BinSanityWf_binsanity_checkm
BinSanityWf_checkm_lineagewf-results.txt
BINSANITY-INITIAL
BinSanityWf_GC_count.txt
BinSanityWf_4mer_frequencies.txt
BinSanityWf_kmerGC.txt
BinSanityWf.log

But BinSanityWf_kmerGC.txt is empty.

Checking the log file, it stopped at this point:

Creating Profile for
redundant bin low_completion.fna
______________________________________________________
Combined profile created in 2.07199311256 seconds

         ______________________________________________________ 
                                                           
           Reclustering redundant bin low_completion.fna                       
         ______________________________________________________
      Preference: -25
      Maximum Iterations: 4000
      Convergence Iterations: 400
      Contig Cut-Off: 1000
      Damping Factor: 0.95
      Coverage File: depth_WW_binsanity_50000.cov.x100.lognorm_OK
      Fasta File: low_completion.fna
      Kmer: 4

The error file says:

/services/tools/anaconda-2.2.0/lib/python2.7/site-packages/Bio/Seq.py:155: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
"the new string hashing behaviour.", BiopythonWarning)
Traceback (most recent call last):
  File "/services/tools/anaconda-2.2.0/bin/Binsanity-wf", line 477, in <module>
    val1, val2 = cov_array((get_cov_data(os.path.join(args.outputdir, str(args.prefix)+'_kmerGC.txt'))), location, redundant_bin, args.ContigSize)
  File "/services/tools/anaconda-2.2.0/bin/Binsanity-wf", line 50, in cov_array
    print "          %s" % (cov_array.shape,)
AttributeError: 'list' object has no attribute 'shape'

Any idea how to solve this problem?

Thanks in advance.

Segmentation fault

Hi,

I am trying to run Binsanity on a large number of contigs (about 130,000) and am getting a segmentation fault. Could you please help me? The printed log is below:
~/binsanity$ Binsanity -f ./synth -l samples_splits_f.fasta -c profiles.in

    -------------------------------------------------------
                      Running Binsanity

                ---Computing Coverage Array ---
    -------------------------------------------------------

Preference: -3
Maximum Iterations: 4000
Convergence Iterations: 400
Contig Cut-Off: 1000
Damping Factor: 0.95
Coverage File: profiles.in
Fasta File: samples_splits_f.fasta
Output directory: BINSANITY-RESULTS
(133693, 8)

    -------------------------------------------------------
                  ---Clustering Contigs---
    -------------------------------------------------------

Segmentation fault

The same command works on a subsample (about 20,000 contigs).

Final Bins Output

Hi Elaina,

I'm wondering if you would please explain the final output to me. Particularly the BinSanity-Final-Bins directory.
I'm using Binsanity to try to recover MAGs from metagenomic samples and end up with the list of files below. Should I concatenate all the refined .fna files for a single Bin into one file or am I completely misreading the situation?

All the best,
Calum

low_completion-refined_0.fna
low_completion-refined_1.fna
low_completion-refined_2.fna
PB010_L_contigs_simplified_Bin-10-refined_0.fna
PB010_L_contigs_simplified_Bin-10-refined_10.fna
PB010_L_contigs_simplified_Bin-10-refined_11.fna
PB010_L_contigs_simplified_Bin-10-refined_12.fna
PB010_L_contigs_simplified_Bin-10-refined_1.fna
PB010_L_contigs_simplified_Bin-10-refined_2.fna
PB010_L_contigs_simplified_Bin-10-refined_3.fna
PB010_L_contigs_simplified_Bin-10-refined_4.fna
PB010_L_contigs_simplified_Bin-10-refined_5.fna
PB010_L_contigs_simplified_Bin-10-refined_6.fna
PB010_L_contigs_simplified_Bin-10-refined_7.fna
PB010_L_contigs_simplified_Bin-10-refined_8.fna
PB010_L_contigs_simplified_Bin-10-refined_9.fna
PB010_L_contigs_simplified_Bin-11-refined_0.fna
PB010_L_contigs_simplified_Bin-11-refined_10.fna
PB010_L_contigs_simplified_Bin-11-refined_11.fna
PB010_L_contigs_simplified_Bin-11-refined_12.fna
PB010_L_contigs_simplified_Bin-11-refined_13.fna
PB010_L_contigs_simplified_Bin-11-refined_14.fna
PB010_L_contigs_simplified_Bin-11-refined_15.fna
PB010_L_contigs_simplified_Bin-11-refined_16.fna
PB010_L_contigs_simplified_Bin-11-refined_17.fna
PB010_L_contigs_simplified_Bin-11-refined_18.fna
PB010_L_contigs_simplified_Bin-11-refined_19.fna
PB010_L_contigs_simplified_Bin-11-refined_1.fna
PB010_L_contigs_simplified_Bin-11-refined_2.fna
PB010_L_contigs_simplified_Bin-11-refined_3.fna
PB010_L_contigs_simplified_Bin-11-refined_4.fna
PB010_L_contigs_simplified_Bin-11-refined_5.fna
PB010_L_contigs_simplified_Bin-11-refined_6.fna
PB010_L_contigs_simplified_Bin-11-refined_7.fna
PB010_L_contigs_simplified_Bin-11-refined_8.fna
PB010_L_contigs_simplified_Bin-11-refined_9.fna
PB010_L_contigs_simplified_Bin-1-refined_0.fna
PB010_L_contigs_simplified_Bin-1-refined_1.fna
PB010_L_contigs_simplified_Bin-1-refined_2.fna
PB010_L_contigs_simplified_Bin-2-refined_0.fna
PB010_L_contigs_simplified_Bin-2-refined_1.fna
PB010_L_contigs_simplified_Bin-2-refined_2.fna
PB010_L_contigs_simplified_Bin-2-refined_3.fna
PB010_L_contigs_simplified_Bin-2-refined_4.fna
PB010_L_contigs_simplified_Bin-2-refined_5.fna
PB010_L_contigs_simplified_Bin-3-refined_0.fna
PB010_L_contigs_simplified_Bin-3-refined_1.fna
PB010_L_contigs_simplified_Bin-3-refined_2.fna
PB010_L_contigs_simplified_Bin-3-refined_3.fna
PB010_L_contigs_simplified_Bin-4-refined_0.fna
PB010_L_contigs_simplified_Bin-4-refined_10.fna
PB010_L_contigs_simplified_Bin-4-refined_11.fna
PB010_L_contigs_simplified_Bin-4-refined_12.fna
PB010_L_contigs_simplified_Bin-4-refined_13.fna
PB010_L_contigs_simplified_Bin-4-refined_14.fna
PB010_L_contigs_simplified_Bin-4-refined_15.fna
PB010_L_contigs_simplified_Bin-4-refined_16.fna
PB010_L_contigs_simplified_Bin-4-refined_1.fna
PB010_L_contigs_simplified_Bin-4-refined_2.fna
PB010_L_contigs_simplified_Bin-4-refined_3.fna
PB010_L_contigs_simplified_Bin-4-refined_4.fna
PB010_L_contigs_simplified_Bin-4-refined_5.fna
PB010_L_contigs_simplified_Bin-4-refined_6.fna
PB010_L_contigs_simplified_Bin-4-refined_7.fna
PB010_L_contigs_simplified_Bin-4-refined_8.fna
PB010_L_contigs_simplified_Bin-4-refined_9.fna
PB010_L_contigs_simplified_Bin-5-refined_0.fna
PB010_L_contigs_simplified_Bin-5-refined_1.fna
PB010_L_contigs_simplified_Bin-5-refined_2.fna
PB010_L_contigs_simplified_Bin-5-refined_3.fna
PB010_L_contigs_simplified_Bin-5-refined_4.fna
PB010_L_contigs_simplified_Bin-5-refined_5.fna
PB010_L_contigs_simplified_Bin-5-refined_6.fna
PB010_L_contigs_simplified_Bin-5-refined_7.fna
PB010_L_contigs_simplified_Bin-6-refined_0.fna
PB010_L_contigs_simplified_Bin-6-refined_10.fna
PB010_L_contigs_simplified_Bin-6-refined_1.fna
PB010_L_contigs_simplified_Bin-6-refined_2.fna
PB010_L_contigs_simplified_Bin-6-refined_3.fna
PB010_L_contigs_simplified_Bin-6-refined_4.fna
PB010_L_contigs_simplified_Bin-6-refined_5.fna
PB010_L_contigs_simplified_Bin-6-refined_6.fna
PB010_L_contigs_simplified_Bin-6-refined_7.fna
PB010_L_contigs_simplified_Bin-6-refined_8.fna
PB010_L_contigs_simplified_Bin-6-refined_9.fna
PB010_L_contigs_simplified_Bin-7-refined_0.fna
PB010_L_contigs_simplified_Bin-7-refined_10.fna
PB010_L_contigs_simplified_Bin-7-refined_11.fna
PB010_L_contigs_simplified_Bin-7-refined_12.fna
PB010_L_contigs_simplified_Bin-7-refined_13.fna
PB010_L_contigs_simplified_Bin-7-refined_14.fna
PB010_L_contigs_simplified_Bin-7-refined_15.fna
PB010_L_contigs_simplified_Bin-7-refined_1.fna
PB010_L_contigs_simplified_Bin-7-refined_2.fna
PB010_L_contigs_simplified_Bin-7-refined_3.fna
PB010_L_contigs_simplified_Bin-7-refined_4.fna
PB010_L_contigs_simplified_Bin-7-refined_5.fna
PB010_L_contigs_simplified_Bin-7-refined_6.fna
PB010_L_contigs_simplified_Bin-7-refined_7.fna
PB010_L_contigs_simplified_Bin-7-refined_8.fna
PB010_L_contigs_simplified_Bin-7-refined_9.fna
PB010_L_contigs_simplified_Bin-9-refined_0.fna
PB010_L_contigs_simplified_Bin-9-refined_10.fna
PB010_L_contigs_simplified_Bin-9-refined_11.fna
PB010_L_contigs_simplified_Bin-9-refined_12.fna
PB010_L_contigs_simplified_Bin-9-refined_13.fna
PB010_L_contigs_simplified_Bin-9-refined_14.fna
PB010_L_contigs_simplified_Bin-9-refined_15.fna
PB010_L_contigs_simplified_Bin-9-refined_16.fna
PB010_L_contigs_simplified_Bin-9-refined_1.fna
PB010_L_contigs_simplified_Bin-9-refined_2.fna
PB010_L_contigs_simplified_Bin-9-refined_3.fna
PB010_L_contigs_simplified_Bin-9-refined_4.fna
PB010_L_contigs_simplified_Bin-9-refined_5.fna
PB010_L_contigs_simplified_Bin-9-refined_6.fna
PB010_L_contigs_simplified_Bin-9-refined_7.fna
PB010_L_contigs_simplified_Bin-9-refined_8.fna
PB010_L_contigs_simplified_Bin-9-refined_9.fna

Paired-end reads stored in a single-end read library

I QC'ed the reads with illumina-utils (specifically iu-filter-quality-minoche and iu-merge-pairs, in two different attempts), assembled them with MEGAHIT, and attempted preliminary binning; the run ended with the error in the title. Example:

Binsanity-profile -i minoche_CONTIGS/contigs.fa -s minoche_MAPPING/ -c Binsanity_minoche/MEGAHIT_coverage --transform scale -T 1 -o Binsanity_minoche_2/

    ******************************************************
                Contigs formated to generate counts
    ******************************************************


    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
      v2.0.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           SRR492065.bam                                    ||
||                                                                            ||
||             Output file : SRR492065.bam.readcounts                         ||
||                 Summary : SRR492065.bam.readcounts.summary                 ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : SRR492065.bam.saf (SAF)                          ||
||      Dir for temp files : Binsanity_minoche_2                              ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file SRR492065.bam.saf ...                                 ||
||    Features : 1546                                                         ||
||    Meta-features : 1546                                                    ||
||    Chromosomes/contigs : 1546                                              ||
||                                                                            ||
|| Process BAM file SRR492065.bam...                                          ||
ERROR: Paired-end reads were detected in single-end read library : minoche_MAPPING/SRR492065.bam
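
One possible workaround (my suggestion, not a documented BinSanity fix): re-run featureCounts yourself in paired-end mode against the SAF file that Binsanity-profile already wrote, then resume from the counts. Note that the semantics of -p changed around subread v2.0.3, so check featureCounts -v first.

# Paired-end counting on the existing SAF annotation (paths assumed from
# the log above; adjust to your layout).
featureCounts -p -T 1 -F SAF \
    -a Binsanity_minoche_2/SRR492065.bam.saf \
    -o Binsanity_minoche_2/SRR492065.bam.readcounts \
    minoche_MAPPING/SRR492065.bam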

Is it possible to use coverages from other sources ?

Hi,
I wanted to know whether it is possible to use the coverage results of other tools, such as MetaBAT and its jgi_summarize_bam_contig_depths command, instead of the Binsanity-profile command.
Have you already tried this? Would you recommend it?

Best
Greg
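
(In case it helps others, a hedged conversion sketch, not an officially supported path. It assumes the jgi_summarize_bam_contig_depths table has the usual columns contigName, contigLen, totalAvgDepth, then alternating <sample>.bam and <sample>.bam-var columns, and that BinSanity's coverage input is a tab-delimited table of contig name followed by per-sample coverages.)

import csv

with open("jgi_depth.txt") as fin, open("converted.cov", "w") as fout:
    reader = csv.reader(fin, delimiter="\t")
    next(reader)                            # skip the jgi header line
    for row in reader:
        name, depths = row[0], row[3::2]    # every other column = mean depth
        fout.write("\t".join([name] + depths) + "\n")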

The program failed to read in your coverage file

Hi Elaina,

To keep it simple: I deleted Bio.Alphabet from all the scripts, as well as everywhere "IUPAC" was written.

But running Binsanity-wf again, I got the message:

The program failed to read in your coverage file :(. Please check it to make sure it is in the right format.

The coverage files (binsanity.txt.cov and binsanity.txt.cov.x100.lognorm) were generated by the earlier BinSanity profiling step.

I tried both files, but neither works.

Any tips?
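
(For context, a minimal sketch of the Biopython change involved here, as I understand it: Bio.Alphabet and the IUPAC alphabets were removed in Biopython 1.78, so Seq objects are now constructed from the string alone.)

from Bio.Seq import Seq

# Pre-1.78 code looked like:
#     from Bio.Alphabet import IUPAC
#     seq = Seq("ATGCGT", IUPAC.unambiguous_dna)
# From Biopython 1.78 onward the alphabet argument is gone:
seq = Seq("ATGCGT")
print(seq.reverse_complement())  # -> ACGCAT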

Error with coverage file

Hello!

For some reason, Binsanity-wf doesn't want to work with several, though not all, of the coverage files output by Binsanity-profile. The error message says The program failed to read in your coverage file :(. Please check it to make sure it is in the right format., which I find weird because I'm using the output of Binsanity-profile directly, without changing anything. At first I thought it was the file size, but I think I've used BinSanity on files bigger than this and it worked. I also, stupidly, thought it was the file names (it just so happens that sample03 and sample13 failed, haha), so I changed the file names, but it still doesn't work.

This was previously posted in the Google Groups, but the suggested solution sadly doesn't work for me. I've tried looking for differences between the files that worked and the files that didn't, and I found nothing.

# works normally
Binsanity-wf -f ./ -l sample04_contigs.fasta \
        -c sample04.cov.x100.lognorm \
        -o ./test/ \
        --threads 32 \
        -p -3 --refine-preference -1

# doesn't work
Binsanity-wf -f ./ -l sample03_contigs.fasta \
        -c sample03.cov.x100.lognorm \
        -o ./test/ \
        --threads 32 \
        -p -3 --refine-preference -1

One fasta file is too big to upload, please let me know where I can email the related files. Thank you so much!
cov.zip
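
(A hedged diagnostic sketch that might narrow this down -- my own check, not an official validator: confirm every row of the coverage file is tab-delimited, starts with a contig name that matches a fasta header, and is followed only by numeric fields. File names are taken from the failing example above.)

from Bio import SeqIO

fasta_ids = {rec.id for rec in SeqIO.parse("sample03_contigs.fasta", "fasta")}
with open("sample03.cov.x100.lognorm") as fh:
    for n, line in enumerate(fh, 1):
        fields = line.rstrip("\n").split("\t")
        if fields[0] not in fasta_ids:
            print("line %d: contig %r not in fasta" % (n, fields[0]))
        for value in fields[1:]:
            try:
                float(value)
            except ValueError:
                print("line %d: non-numeric field %r" % (n, value))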
