medema-group / big-scape

Similarity networks of biosynthetic gene clusters

License: GNU Affero General Public License v3.0

Python 6.79% Dockerfile 0.03% CSS 1.26% JavaScript 91.71% HTML 0.16% Shell 0.05%

bioinformatics biosynthetic-gene-clusters

big-scape's Issues

BGC_fasta_folder is not defined

I keep getting this error. The FASTA files are not generated in the output folder, which is why the error arises, but I haven't been able to figure out why the aligned files are not generated either. I have tried everything I could think of. Please advise. FYI, I'm using BiG-SCAPE 1.1.5.

Handling non-convergence in AffinityPropagation

Related to #2. In the latest version of scikit-learn:

When fit does not converge, cluster_centers_ becomes an empty array and all training samples will be labelled as -1. In addition, predict will then label every sample as -1.

Previously, the cluster_centers_ output was None. Therefore, bigscape is no longer handling convergence failures correctly.
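A minimal sketch of a check that covers both behaviours described above. The helper name `affinity_propagation_converged` is hypothetical and is not part of BiG-SCAPE or scikit-learn; it assumes the caller passes in the `labels_` and `cluster_centers_indices_` attributes of a fitted `AffinityPropagation` object.

```python
def affinity_propagation_converged(labels, exemplar_indices):
    """Return True only if the fit produced usable exemplars and labels.

    Hypothetical helper: `labels` and `exemplar_indices` stand for the
    `labels_` and `cluster_centers_indices_` attributes of a fitted
    sklearn.cluster.AffinityPropagation object.
    """
    # Older behaviour, as described in this issue: failure reported as None.
    if labels is None or exemplar_indices is None:
        return False
    # Newer behaviour: an empty exemplar array and every sample labelled -1.
    if len(exemplar_indices) == 0:
        return False
    return not all(int(label) == -1 for label in labels)
```

Calling a check like this before indexing into the exemplars would let the caller fall back to another strategy (for example, treating each BGC as its own family) instead of failing later.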

index.html network export

Thanks for your software, it is very useful.
Could you give me some suggestions on how to export the network picture after opening the index.html file? I found that only the FAM_xxxx tree picture can be exported in SVG format. When I try to display the network in Cytoscape, I find it hard to produce a picture as good as the one shown on the index.html web page. Alternatively, could you share the network style and colour parameters used in the index.html file, so that I can use similar settings when showing the network in Cytoscape?

Thanks

cosine distance among BGCs

Dear author,

Is there any way to calculate the cosine distance between two BGC sequences?

I have tried the textdistance Python library, but the results make no sense.

Issue with input using WSL

Greetings Developers

I am new to bioinformatics so please pardon my rookie issue.

I am trying to run BiG-SCAPE using the Windows Subsystem for Linux (because I'm comfortable with the Linux CLI). I installed BiG-SCAPE and all the requirements. I ran it with a smaller test dataset and it completed successfully. I then tried a different dataset and it is not detecting the .genbank files. My input files have the keyword "region" in them (example input file name: AFWF01000004.1.region001).

Any help would be appreciated.

Thanks

(base) kartikj112@Mach3:~/BiG-SCAPE-1.1.5$ python bigscape.py -i home/kartikj112/BiG-SCAPE-1.1.5/gbksinside -o output


   - - Processing input files - -
 Including files with one or more of the following strings in their filename: 'cluster', 'region'
 Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files

 Starting with 0 files
 Files that had its sequence extracted: 0

Creating output directories

Trying threading on 16 cores

Predicting domains using hmmscan
 All fasta files had already been processed
 Finished generating domtable files.

Parsing hmmscan domtable files
 All domtable files had already been processed
 Finished generating pfs and pfd files.

Processing domains sequence files
 Adding sequences to corresponding domains file
 Reading the ordered list of domains from the pfs files
 Creating arrower-like figures for each BGC
  Parsing hmm file for domain information
    Done
  All SVG from the input files seem to be in the SVG folder
 Finished creating figures


   - - Calculating distance matrix - -
Performing multiple alignment of domain sequences
 No domain fasta files found to align
 Trying to read domain alignments (*.algn files)
No aligned sequences found in the domain folder (run without the --skip_ma parameter or point to the correct output folder)

'ascii' codec can't decode byte 0xce & No sequence extracted

Hi,

First of all, thank you for developing the pipeline! I ran into several errors when trying to analyze 1418 antiSMASH files in .gbk format.
Here are the details:

- Processing input files - -
  Output folder already exists
  Logs folder already exists
  Cache folder already exists
  BGC fastas folder already exists
  Domtable folder already exists
  Domains folder already exists
  pfs folder already exists
  pfd folder already exists
  Including files with one or more of the following strings in their filename: 'cluster', 'region'
  Skipping files with one or more of the following strings in their filename: 'final'

Trying to read bundled MIBiG BGCs as reference
MIBiG BGCs seem to have been extracted already

Importing MIBiG files
Error with file /home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/Annotated_MIBiG_reference/MIBiG_2.1_final/BGC0000199.1.gbk:
''ascii' codec can't decode byte 0xce in position 5255: ordinal not in range(128)'
(This file will be excluded from the analysis)
Error with file /home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/Annotated_MIBiG_reference/MIBiG_2.1_final/BGC0001149.1.gbk:
''ascii' codec can't decode byte 0xe2 in position 5364: ordinal not in range(128)'
(This file will be excluded from the analysis)
Error with file /home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/Annotated_MIBiG_reference/MIBiG_2.1_final/BGC0001231.1.gbk:
''ascii' codec can't decode byte 0xce in position 5461: ordinal not in range(128)'
(This file will be excluded from the analysis)
Error with file /home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/Annotated_MIBiG_reference/MIBiG_2.1_final/BGC0001852.1.gbk:
''ascii' codec can't decode byte 0xce in position 7752: ordinal not in range(128)'
(This file will be excluded from the analysis)
Error with file /home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/Annotated_MIBiG_reference/MIBiG_2.1_final/BGC0001851.1.gbk:
''ascii' codec can't decode byte 0xce in position 7842: ordinal not in range(128)'
(This file will be excluded from the analysis)
Warning: Input set has files with no Biosynthetic Genes (affects alignment mode)
See no_biosynthetic_genes_list.txt

Starting with 1923 files
Files that had its sequence extracted: 0

Importing GenBank files

Starting with 1418 files
Files that had its sequence extracted: 0

Creating output directories
SVG folder already exists
Networks folder already exists

Trying threading on 72 cores

Predicting domains using hmmscan
Traceback (most recent call last):
File "/home/apps/software/BiG-SCAPE/1.1.1-IGB-gcc-4.9.4-Python-3.6.1/bigscape.py", line 2503, in <module>
for line in domtablefile.readlines():
File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1418: ordinal not in range(128)

Error 1: 'ascii' codec can't decode byte 0xce
Error 2: Files that had its sequence extracted: 0

What could be causing these problems? Hope to hear from you soon!
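Both tracebacks show a file being decoded with the `ascii` codec, which is what Python 3.6 typically uses as the default for `open()` when the locale is `C` or `POSIX`. A minimal sketch of the difference, assuming the offending bytes are UTF-8 (0xce is the lead byte of Greek letters such as α); the sample text is made up for illustration:

```python
# Made-up sample text; its first UTF-8 byte is 0xce.
sample = "α-lipomycin".encode("utf-8")

try:
    sample.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # Same failure mode as in the log above.
    ascii_ok = False

text = sample.decode("utf-8")
print(ascii_ok, text)
```

When reading files, the equivalent is to pass `encoding="utf-8"` to `open()`, or to set a UTF-8 locale in the environment before running, if one is available on the system.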

How can I install BiG-SCAPE?

The instructions for downloading BiG-SCAPE are not available anymore.

Don't know why the task was killed. Did it finish at the network step?

error:
/home/ec2-user/miniconda3/envs/bigscape_fix/lib/python3.7/site-packages/sklearn/linear_model/randomized_l1.py:575: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
eps=4 * np.finfo(np.float).eps, n_jobs=1,
/home/ec2-user/miniconda3/envs/bigscape_fix/lib/python3.7/site-packages/sklearn/decomposition/online_lda.py:31: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
EPS = np.finfo(np.float).eps

cat runtimes.txt
launch_hmmalign took 692.911 seconds
generate_network took 184.481 seconds
generate_network took 592.135 seconds
generate_network took 239.528 seconds
generate_network took 199.288 seconds
generate_network took 4.478 seconds
generate_network took 5908.956 seconds
generate_network took 563.710 seconds
generate_network took 927.190 seconds

Delete or fix the bioconda recipe

I encountered the same problem as #23 while trying to have a portable conda environment for snakemake.

See #23 (comment) for some options.

Random BGC families not showing in mixed class

Hi! I do not know if this is a bug, but whenever I use the mixed option (which, as far as I understand, should group all BGC families together), random BGC families that show up in other classes (NRPS, terpenes, etc.) do not appear in the mixed category. Is there a reason for this, or am I missing something?

I am using BiG-SCAPE 1.1.8 (2022-11-14)

How to install the latest Bigscape?

Hi there,
I noticed that the latest version of BiG-SCAPE was released, and the updates for prodigiosin are important to me. I installed my original version of BiG-SCAPE with Docker, using the command lines you gave in the tutorial, but I don't know how to install the latest version from a compressed package.
Is it possible to install the latest version with the same commands from the tutorial?

Thanks.

BiG-SCAPE recognises gbk file but does not extract sequences

I installed BiG-SCAPE using the Singularity recipe shared in #26 and ran it with the following command:

singularity exec ~/programmes/big-scape/big-scape.img python /usr/src/BiG-SCAPE/bigscape.py \
        -i ../scratch/functional_annotation/002_antismash/${strain_file} \
        --include_gbk_str ${strain_file}_antismash \
        -o ${out_dir} \
        -c ${SLURM_CPUS_PER_TASK}

It seems to recognise my antiSMASH input .gbk file but does not extract sequences from it

  - - Processing input files - -
 Output folder already exists
 Logs folder already exists
 Cache folder already exists
 BGC fastas folder already exists
 Domtable folder already exists
 Domains folder already exists
 pfs folder already exists
 pfd folder already exists
 Including files with one or more of the following strings in their filename: 'Gt-19d1_antismash'
 Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files

 Starting with 1 files
 Files that had its sequence extracted: 0

Which ultimately results in this error message:

   - - Calculating distance matrix - -
Performing multiple alignment of domain sequences
 No domain fasta files found to align
 Trying to read domain alignments (*.algn files)
No aligned sequences found in the domain folder (run without the --skip_ma parameter or point to the correct output folder)

Any ideas? Thanks very much in advance.

MIBiG singletons in network file

If the singletons option is selected, the MIBiG singletons are not pruned from the final network file.

MIBiG 3.1 files have no Biosynthetic Genes

I have a fresh install of BiGSCAPE v 1.1.5 (following: https://github.com/medema-group/BiG-SCAPE/wiki/installation#white_circle-installation-using-conda) in a fresh conda env. The problem also occurs in v 1.1.7. Running on linux

When I run
(bigscape115) chh@thoth:~/BiG-SCAPE115$ python bigscape.py --inputdir /home/chh/Ps_antismash7 --outputdir /home/chh/Ps_bigscape3.0 --mibig -c 36 --mix --mode glocal --include_singletons --pfam_dir /home/chh/BiG-SCAPEoldV/

I get the error
Trying to read bundled MIBiG BGCs as reference
Importing MIBiG files
Warning: Input set has files with no Biosynthetic Genes (affects alignment mode)
See no_biosynthetic_genes_list.txt

And 371 MIBiG reference BGCs are in that no_biosynthetic_genes_list. I assume this is going to be a problem later on, when BiGSCAPE tries to map my BGC clusters back to the MIBiG database?

Indent issue in setup.py

Error in conda build. I didn't have this issue while running python -m pip install... myself yet...

https://dev.azure.com/bioconda/bioconda-recipes/_build/results?buildId=14977&view=logs&j=e14e69ff-a0ae-55c4-b71d-229b239cfb2f&t=7df82132-b284-504b-53d6-7d3e63519572&l=867

It will work one day :-)

fix counterintuitive bigscape family number assignments?

I am running the latest bigscape conda package (v1.1.6)

Here is my command line:
python /home/a-m/alexp2/.conda/envs/bigscape_update/lib/python3.7/site-packages/bigscape/bigscape.py --mix --no_classify --include_singletons --clans-off --cutoffs 0.5 --inputdir /home/a-m/alexp2/antismash_results/antismash7/test_directory/ --outputdir /home/a-m/alexp2/bigscape_results/test_directory/ --pfam_dir /home/a-m/alexp2/multismash/pfam

My question is with how bigscape defines the family numbers. From the 'mix_clustering_c0.50.tsv' files in the network_files folder, one of the family numbers is zero. The family numbers also jump from 0 -> 1 -> 7 where I would expect it to be consecutive number increments like 1 -> 2 -> 3. The current family numbering scheme does not seem intuitive.

- Is there a way to have bigscape start the numbering from one?
- Is there a particular reason for the large family number jumps? Or is it possible to get bigscape to assign them consecutively?

mix_clustering_c0.50.tsv example file contents:

#BGC Name Family Number
CAHS01000016.1.region001 0
CP022725.1.region001 1
JANFMX010000007.1.region001 2
JANFMY010000003.1.region001 2
RHHM01000002.1.region001 7
RQRZ01000003.1.region001 7
RQSA01000004.1.region001 7
RQSB01000002.1.region001 7
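Until the numbering is changed upstream, the labels can be remapped after the run. The following is a post-processing sketch, not part of BiG-SCAPE; the function name `renumber_families` is hypothetical, and it simply assigns consecutive numbers in first-seen order, starting from one.

```python
def renumber_families(rows, start=1):
    """Map arbitrary family labels to consecutive integers in first-seen order."""
    mapping = {}
    renumbered = []
    for bgc, family in rows:
        if family not in mapping:
            mapping[family] = start + len(mapping)
        renumbered.append((bgc, mapping[family]))
    return renumbered

# A few rows taken from the example file above.
rows = [
    ("CAHS01000016.1.region001", 0),
    ("CP022725.1.region001", 1),
    ("JANFMX010000007.1.region001", 2),
    ("JANFMY010000003.1.region001", 2),
    ("RHHM01000002.1.region001", 7),
]
result = renumber_families(rows)
```

With these rows, the original labels 0, 1, 2 and 7 become 1, 2, 3 and 4, and BGCs that shared a family still share one.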

Docker

Make sure the docker image is built correctly (check Pfam database location)
Build image automatically

Off-by-1 in domain coordinates

Python uses 0-indexing with an exclusive end coordinate. The hmmer3 domain table (from/to) uses 1-indexing with an inclusive end coordinate.

However it looks like, when subsetting sequences for the domain fasta files, the start coordinates are not adjusted.

Therefore, it seems that the amino acid sequences associated with domains are missing the first residue in the fasta output.

This line should be

-            seq[int(row[3]):int(row[4])])) #only use the range of the pfam domain within the sequence
+            seq[(int(row[3])-1):int(row[4])])) #only use the range of the pfam domain within the sequence
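The conversion can be illustrated on a toy sequence. This is a standalone sketch; the function name `slice_domain` and the example sequence are illustrative and are not taken from bigscape.py.

```python
def slice_domain(seq, start_1based, end_1based):
    """Extract a region given 1-based, inclusive coordinates (hmmer3 style)."""
    # Subtract 1 from the start only: Python's exclusive end already
    # matches a 1-based inclusive end.
    return seq[start_1based - 1:end_1based]

seq = "MKKLRLSLLR"
correct = slice_domain(seq, 1, 3)   # residues 1 to 3
unadjusted = seq[1:3]               # what an unadjusted slice returns
```

Here `correct` is `"MKK"`, while the unadjusted slice gives `"KK"`, which drops the first residue as described above.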

NameError: name 'genbankDict' is not defined when no domains are detected in an input gbk file

Dear developer,
After updating BiG-SCAPE to v1.1.7, I encountered "NameError: name 'genbankDict' is not defined" when no domains are detected in an input gbk file. The error persists even when I add genbankDict = {} at line 1294 while debugging.
The command and log are shown below. The issue only occurs when at least one input gbk file has no detected domains.

(bigscape1.1.7) [yut@node06 TestData]$ time bigscape.py  -i bigscape_input/ -o bigscape_result
... 
Starting with 20 files
 Files that had its sequence extracted: 0

Creating output directories
 SVG folder already exists
 Networks folder already exists

Trying threading on 128 cores

Predicting domains using hmmscan
 All fasta files had already been processed
 Finished generating domtable files.

Parsing hmmscan domtable files
 Warning! The following domtable files had not been processed: MAG_1908QYC2_metabat2_bin.49~QYC2.1908_Scaff0430441.region001
  No domains where found in MAG_1908QYC2_metabat2_bin.49~QYC2.1908_Scaff0430441.region001.domtable. Removing it from further analysis
Traceback (most recent call last):
  File "/datanode02/yut/Software/BiG-SCAPE-1.1.7/bigscape.py", line 3333, in <module>
    main()
  File "/datanode02/yut/Software/BiG-SCAPE-1.1.7/bigscape.py", line 2522, in main
    parseHmmScan(domtableFile, pfd_folder, pfs_folder, options.domain_overlap_cutoff)
  File "/datanode02/yut/Software/BiG-SCAPE-1.1.7/bigscape.py", line 1294, in parseHmmScan
    info = genbankDict.get(outputbase)
NameError: name 'genbankDict' is not defined

The file MAG_1908QYC2_metabat2_bin.49~QYC2.1908_Scaff0430441.region001.gbk is shown below:

(bigscape1.1.7) [yut@node06 TestData]$ cat  bigscape_input/MAG_1908QYC2_metabat2_bin.49~QYC2.1908_Scaff0430441.region001.gbk
LOCUS       QYC2.1908_Scaff0430441  1719 bp    DNA     linear   UNK 01-JAN-1980
DEFINITION  QYC2.1908_Scaff0430441.
ACCESSION   QYC2.1908_Scaff0430441
VERSION     QYC2.1908_Scaff0430441
KEYWORDS    .
SOURCE
  ORGANISM
            .
COMMENT     ##antiSMASH-Data-START##
            Version      :: 7.1.0
            Run date     :: 2023-12-05 23:53:54
            NOTE: This is a single region extracted from a larger record!
            Orig. start  :: 0
            Orig. end    :: 1719
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
     protocluster    1..1719
                     /aStool="rule-based-clusters"
                     /category="other"
                     /contig_edge="True"
                     /core_location="[139:1519]"
                     /cutoff="30000"
                     /detection_rule="NasY"
                     /neighbourhood="30000"
                     /product="acyl_amino_acids"
                     /protocluster_number="1"
                     /tool="antismash"
     proto_core      140..1519
                     /aStool="rule-based-clusters"
                     /tool="antismash"
                     /cutoff="30000"
                     /detection_rule="NasY"
                     /neighbourhood="30000"
                     /product="acyl_amino_acids"
                     /protocluster_number="1"
     cand_cluster    1..1719
                     /candidate_cluster_number="1"
                     /contig_edge="True"
                     /detection_rules="NasY"
                     /kind="single"
                     /product="acyl_amino_acids"
                     /protoclusters="1"
                     /tool="antismash"
     region          1..1719
                     /candidate_cluster_numbers="1"
                     /contig_edge="True"
                     /product="acyl_amino_acids"
                     /region_number="1"
                     /rules="NasY"
                     /tool="antismash"
     CDS             140..1519
                     /gene_functions="biosynthetic (rule-based-clusters)
                     acyl_amino_acids: NasY"
                     /gene_kind="biosynthetic"
                     /locus_tag="ctg514_1"
                     /sec_met_domain="NasY (E-value: 4.9e-17, bitscore: 48.0,
                     seeds: 100, tool: rule-based-clusters)"
                     /transl_table=11
                     /translation="MKKLRLSLLRRLLRRWSRSLLAALPESVRFAFYRRMADLKPVSSG
                     RLELKIAHTQEELTACFGLLHDAYVSSGFMRPHPSGLRVTPYHALPTTTTLCAKVDGVV
                     VGTISIIREGVFGFPMQSAFDISGVRAKDGRIAEISALAIHPRWRKTGGSILFPLMKFM
                     YGYCTRYFDTRHLVIAVNPAHIEMYESLLFFRRLTANVVEHYDFVNGAPAVGATLDLHE
                     APELFKRAYEHKPGRRNLHRYFTETELPEITYPPRPWHTSNDPMLTPELLDHFFHEHTA
                     FNDLDDRRQSLLHSIYREPEWARVLPKLAPAAAAGVNLRRETRYSMSCPATLVASDQPA
                     NAQRVTIVEISEHGFLARTSRALPDGTQWSLVAELAEGVFSRGQVKLVRQARSDSGQSY
                     GFHIENPDDAWRRCVAWLDEAEQLPTSADIDADPAAPTALRTAAMSGSRELLSRRAHEC
                     V"
ORIGIN
        1 caagccgagg tcgctcagct catgagcagc ctgtaactct ttctttcaag tccagggttg
       61 cccgcggtca agcctaagct gccgcagtcg aaatcaagga cgagggcgga cagcccccgc
      121 accaaagagc cggcaaccca tgaagaaatt gaggctttcc ttgctgcggc gattgcttcg
      181 ccgctggtcc aggtcgctgc tggctgcttt gcccgagtcc gtacgatttg ctttctaccg
      241 gcgcatggcg gacctcaagc ccgtcagcag cggtcggctg gaactgaaga ttgcgcacac
      301 tcaggaagag ctgacggcct gcttcggcct gcttcatgac gcttacgtga gcagcggctt
      361 catgcgtccc catccctcag gcctgcgggt gacgccgtac cacgctttgc cgacgaccac
      421 gacactttgt gccaaggtcg atggcgtggt ggtcggcacg atctcgatca tccgcgaagg
      481 cgttttcgga ttcccgatgc agtcagcctt cgacatctca ggcgtgcgtg ccaaggacgg
      541 ccgcattgcc gagatatctg cactggccat ccacccgcgc tggcgcaaga cgggcgggtc
      601 gatcctgttc ccattgatga agttcatgta cgggtactgc acgcgatatt tcgacacccg
      661 ccatctggtg atcgccgtga accccgcgca catcgagatg tacgagtcgt tgctattttt
      721 ccgccgcttg acagccaacg tggtggaaca ctacgacttc gtcaacggcg cacctgctgt
      781 gggcgcgacg ctggacctcc acgaggcgcc tgaactcttc aaacgcgcct atgaacacaa
      841 gccaggccgg cgaaatctgc accgttattt cacagagaca gagttgccgg aaatcaccta
      901 cccgcccagg ccctggcaca ccagcaacga cccgatgttg acgccagagt tgctggatca
      961 cttcttccac gaacacacgg cgttcaacga cctcgatgac cggcgccaaa gcctgctgca
     1021 ctccatctac cgtgagcccg agtgggccag agtgctgccc aaactggcgc cggctgcagc
     1081 cgccggcgtc aacttgagac gcgaaacgcg ctattcgatg agttgccctg ccacgctggt
     1141 ggcctcggat cagcccgcga atgcgcagcg ggtcaccatc gtcgaaatct ccgaacatgg
     1201 ctttctggcc agaacgtccc gtgcgctgcc tgacggcacc caatggagcc tggtggccga
     1261 actcgccgag ggcgtgttca gccgcggcca ggtcaagctc gtgcggcagg ccagaagcga
     1321 cagcggacag agctacggtt tccatatcga gaaccccgac gacgcctggc gacgctgcgt
     1381 cgcttggttg gatgaggcgg aacaattgcc gaccagcgct gatatcgacg cagatccagc
     1441 cgcgccgact gcgctgcgaa ccgcggcaat gtcaggaagt cgcgagctat tgtcccggcg
     1501 ggcacacgag tgcgtttgat tggcaccgcc cctctgccac gggctgagga accctggccg
     1561 tgccgattga tcgaattgtt gcggtccaga actgagtgca ggccttgacc gccgaagcgc
     1621 cgggccgtcg atatgccttt tcccgcaccg cagcgcgaga ccgacaacgt cgggacgatc
     1681 tcccgacaac cggtctggca aggttttgaa tgttgtgtc
//

I look forward to your reply.

The error "did not find file MIBiG_1.4_final.zip"

Hi, I got the error "did not find file MIBiG_1.4_final.zip". Could you please help me figure out a possible solution?

IndexError when running --mix flag

Running BiG-SCAPE but getting an IndexError when adding the '--mix' flag.

Below is the submission script used, which works fine without the --mix flag but produces the error message below when it's added. Any ideas what might be happening?

Thanks,
Sam

#!/bin/bash

#SBATCH --job-name=BiG-SCAPE_fulltest_110522
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00:00
#SBATCH --mem=10000M

# Change to the directory you submitted the job from
cd "${SLURM_SUBMIT_DIR}"

# What host, time and directory is the jobID running from
echo Running on host "$(hostname)"
echo Time is "$(date)"
echo Directory is "$(pwd)"
echo Slurm job ID is "${SLURM_JOBID}"
echo This jobs runs on the following machines:
echo "${SLURM_JOB_NODELIST}"

# Add miniconda
module add languages/miniconda/3.9.7

# Activate the BiG-SCAPE environment
source activate bigscape

# Run BiG-SCAPE
python ./BiG-SCAPE/bigscape.py -i ALL_BGC -o output_BGC --pfam_dir Pfam-A --mibig --mix

Mix (2314 BGCs)
  Calculating all pairwise distances
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 6 times)
Ignored unknown character Z (seen 3 times)
/user/home/sw17073/.conda/envs/bigscape/lib/python3.9/site-packages/sklearn/cluster/_affinity_propagation.py:250: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn(
generate_network took 534.260 seconds
   Removing 1693 non-relevant MIBiG BGCs
  Writing output files
  Calling Gene Cluster Families
  Cutoff: 0.3
Traceback (most recent call last):
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 3065, in <module>
    family_data = clusterJsonBatch(mix_set, pathBase, "mix", reduced_network, pos_alignments,
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 1771, in clusterJsonBatch
    clanLabels = [familyIdx[exemplarsClans[labelsClans[i]]] for i in range(len(familyIdx))]
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 1771, in <listcomp>
    clanLabels = [familyIdx[exemplarsClans[labelsClans[i]]] for i in range(len(familyIdx))]
IndexError: list index out of range

index.html won't finish loading data

Hi! Thanks for this great software, and sorry if this has already been answered, but I can't find any similar issues open or closed.

I ran BiG-SCAPE against my antiSMASH outputs (around 6k BGCs), which apparently ran successfully. However, when opening the index.html in my browser, the home page shows loading data and it never finishes. I am able to open the network pages for each of the classes, so the problem is only on the overview page.

Thanks in advance for your help!

bump version number?

https://github.com/medema-group/BiG-SCAPE/blame/b72495759af402e3e07b3954269ca9bdf1ac52f8/setup.py#L20

should this be 1.1.7 per the version released?

error: TypeError: 'NoneType' object is not subscriptable

command line: python bigscape.py -i bgc_contigs_all_10k -o bgc_contigs_all_10k_bigscape_mix --mix

error msg:
generate_network took 57.462 seconds
Writing output files
Calling Gene Cluster Families
Cutoff: 0.3
Traceback (most recent call last):
File "bigscape.py", line 3042, in <module>
clanCutoff=options.clan_cutoff, htmlFolder=network_html_folder)
File "bigscape.py", line 1459, in clusterJsonBatch
labels[bgcExt2Int[bgcSub2Ext_[i]]] = bgcExt2Int[bgcSub2Ext_[exemplarsSub[labelsSub[i]]]]
TypeError: 'NoneType' object is not subscriptable

Stopped at Hmmalign step

Hi!

I'm trying to run a new database (22k regions), and every time I do, the job dies at the hmmalign step without giving any error (as you can see in the image).

I tried running other databases and they all work fine. I also tried running this database on two different computers, and hmmalign gets killed on both.

Do you have any idea what the issue may be? Could it be due to the size of the database?

Thanks!

Add official singularity image

We would like to use BiG-SCAPE with either Docker or, preferably, Singularity. The latest release Docker image is not working for us. I understand it is under development (see #25). There are images on Docker Hub, such as https://hub.docker.com/r/aflatoxing/bigscape, which are not up to date.

I have created a singularity image for the latest available version of bigscape, following the instructions given in this issue #22 but using as source code the latest release.

My final singularity image can be found here:
https://cloud.sylabs.io/library/currocam/default/bigscape

This image is working for us on our server. I would appreciate it if you could offer an official image (maybe using this one as a reference). Also, I don't understand why the Dockerfile refers to the latest version of the project instead of a release. Doesn't that make it less reproducible and more susceptible to bugs?

Thank you in advance.

How to reproduce sif file

This is our bigscape.def file. It is based on this file:

https://singularityhub.github.io/singularityhub-archive/containers/ISU-HPC-big-scape-latest/

Bootstrap: docker
from: continuumio/miniconda3
%labels
MAINTAINER [email protected]
%post
apt-get update -y
apt install -y wget unzip
export PATH=/opt/conda/bin:$PATH
echo 'export PATH=/usr/local/bin:/opt/conda/bin:$PATH' >>$SINGULARITY_ENVIRONMENT
conda install -y numpy scipy scikit-learn
conda install -c bioconda hmmer biopython fasttree anaconda networkx
cd /usr/src
wget https://github.com/medema-group/BiG-SCAPE/archive/refs/tags/v1.1.5.zip
unzip v1.1.5.zip
rm v1.1.5.zip
mv BiG-SCAPE-1.1.5 BiG-SCAPE
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz && hmmpress Pfam-A.hmm && mv Pfam-A.* /usr/src/BiG-SCAPE/.
chmod +x /usr/src/BiG-SCAPE/*.py
echo 'export PATH=/usr/src/BiG-SCAPE:$PATH' >>$SINGULARITY_ENVIRONMENT
mkdir -p /local/scratch
chmod 777 /local/scratch
mkdir -p /local/scratch/input /local/scratch/output

I build the image by running

singularity build --sandbox --fakeroot bigscape bigscape.def

Then, I edit the source code as indicated in #22 and build the sif file

singularity build --fakeroot bigscape_singularity.sif bigscape

We are able to run the program using

singularity run bigscape_singularity.sif python /usr/src/BiG-SCAPE/bigscape.py --help

Inconsistent results between "mix_clustering_c0.30" and "mix_c0.30.network"

Hi,
BiG-SCAPE is a very useful tool! However, I found inconsistent results in the output files ("mix_clustering_c0.30" and "mix_c0.30.network"). For example, BGC1, BGC2 and BGC3 belong to the same GCF in "mix_clustering_c0.30", but in "mix_c0.30.network" I cannot find a relationship between BGC1 and BGC3, or between BGC2 and BGC3. I can only find the relationship between BGC1 and BGC2. These relationships are necessary for visualizing the network.
Did an error occur when identifying the GCFs?

Thanks,
Rubing

bigscape v1.1.5 runtime error (missing 1 required positional argument: 'output_folder')

I downloaded and installed the package from https://github.com/medema-group/BiG-SCAPE/archive/refs/tags/v1.1.5.zip and I ran into the following error after the command

bigscape.py --inputdir ./bigscape --outputdir example_output --pfam_dir ./multismash/pfam

The inputs to BiG-SCAPE are twelve .gbk files output from an antismash6 run of three E. coli genomes that are contained in the "bigscape" folder. Not sure what I am doing wrong here...

Originally posted by @alpole23 in #90 (comment)

Update of the BiG-SCAPE bioconda recipe

Hello,

I am currently working on a wrapper for BiG-SCAPE so that the tool can be added to Galaxy (https://usegalaxy.org), a collection of bioinformatics tools. Before I create a PR, I wanted to ask whether these changes are fine with you (https://github.com/bioconda/bioconda-recipes/compare/master...SantaMcCloud:bioconda-recipes:conda_recipes_update?expand=1).

I only changed the version to the newest one. I tried working with the current recipe, but after installing it I hit a bug where the fasta dir could not be used in the functions. With the newest version, this bug doesn't appear and the tool works fine!

Incorrect number of genomes detected in overview of output

An incorrect number of genomes seems to be captured in the index HTML output. Thus, this affects one of the pie charts that is generated. In one case, when using one assembled genome with 11 BGCs, the overview in the index.html file incorrectly said that 11 genomes were used.
In another case, I used 15 assembled genomes but the overview page in the index html file said 118. The input is as indicated in the tutorial.

Conda build failing

https://dev.azure.com/bioconda/bioconda-recipes/_build/results?buildId=14990&view=logs&j=e14e69ff-a0ae-55c4-b71d-229b239cfb2f&t=7df82132-b284-504b-53d6-7d3e63519572&l=1129

Because of the python packaging

Filter, warn or quit if input GenBank is too large

Catch large loci that were used as input by mistake (e.g. using the whole genome).

Possible actions:

Warn user
Filter out
Stop
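The three actions could share one length check. Below is a minimal sketch, assuming record lengths are already known (for example, from parsing the GenBank files); the function name `split_by_length` and the 500,000 bp threshold are hypothetical and are not values taken from BiG-SCAPE.

```python
def split_by_length(record_lengths, max_bp=500_000):
    """Separate records into kept and oversized by sequence length.

    `record_lengths` maps a record name to its length in bp; `max_bp` is a
    hypothetical threshold, not a value taken from BiG-SCAPE.
    """
    kept, too_large = [], []
    for name, length in record_lengths.items():
        (too_large if length > max_bp else kept).append(name)
    return kept, too_large

kept, too_large = split_by_length({"region001": 45_000, "whole_genome": 6_200_000})
for name in too_large:
    # "Warn user" and "Filter out"; raising an error here would be "Stop".
    print(f"Warning: {name} exceeds the size threshold and was skipped")
```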

Error while running BiG-SCAPE

Hi,

I installed BiG-SCAPE (v1.1.5) using conda and am trying to use it on my antiSMASH output. However, the execution of the program halts with the following error:

launch_hmmalign took 13.121 seconds
Trying to read domain alignments (*.algn files)
Traceback (most recent call last):
File "/home/myuser/anaconda3/envs/bigscape/bin/bigscape.py", line 2835, in <module>
dir_util.copy_tree(os.path.join(os.path.dirname(os.path.realpath(__file__)), "html_template", "output"), output_folder)
File "/home/myuser/anaconda3/envs/bigscape/lib/python3.7/site-packages/setuptools/_distutils/dir_util.py", line 139, in copy_tree
raise DistutilsFileError("cannot copy tree '%s': not a directory" % src)
distutils.errors.DistutilsFileError: cannot copy tree '/home/myuser/anaconda3/envs/bigscape/bin/html_template/output': not a directory

I am using the following command to run the analysis: python3 bigscape.py -i gbk/ -o analysis/ --pfam_dir ./ -c 16

Can someone please help me with this?

Memory error

I recently tried to use BiG-SCAPE on a very large data set of about 280,000 BGCs. It reports a 'memory error' at the final distance calculation step, which looks like a problem of insufficient memory. I tried running it with 2 TB of memory, but the error persists. Alternatively, does BiG-SCAPE support running across multiple nodes? That would let me scale up to larger data sets.
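To give a sense of scale, the arithmetic below counts the unordered pairs among 280,000 BGCs. It is an upper bound that assumes every BGC is compared with every other one; the run logs in other issues here show distances being calculated per BGC class, which would lower the count. The bytes-per-pair figures are assumptions for illustration only, not measurements of BiG-SCAPE.

```python
def pairwise_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_bytes(n, bytes_per_pair):
    """Memory needed if every pair stored `bytes_per_pair` bytes (assumed)."""
    return pairwise_count(n) * bytes_per_pair


n_bgcs = 280_000
pairs = pairwise_count(n_bgcs)
print(f"{pairs:,} pairs")  # 39,199,860,000 pairs

# Assumed storage costs per pair: a bare 8-byte float, and a larger
# per-pair record of 64 bytes. Both are illustrative guesses.
for bytes_per_pair in (8, 64):
    gib = estimated_bytes(n_bgcs, bytes_per_pair) / 2**30
    print(f"{bytes_per_pair} bytes per pair: about {gib:,.0f} GiB")
```

Because the pair count grows quadratically, halving the number of BGCs in a single comparison set cuts the number of pairs to roughly a quarter.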

Singularity Read-only file system Error

Hi,

Thanks for a nice tool. I am using an HPC environment to execute the software. I am using Singularity instead of Docker to run the software. Here is the line of code used:

singularity run ~/SingularityImages/bigscape.sif -i 00-Data/AntiSMASHgbk/ -o 01-BigScape -c 16 --include_gbk_str '*'

However, I ran into an issue when running the software.

Error:

Traceback (most recent call last):
  File "/usr/src/BiG-SCAPE/bigscape.py", line 2755, in <module>
    SVG(False, os.path.join(svg_folder,bgc+".svg"), handle, bgc, os.path.join(pfd_folder,bgc+".pfd"), True, color_genes, color_domains, pfam_domain_categories, pfam_info, bgc_info[bgc].records, bgc_info[bgc].max_width)
  File "/usr/src/BiG-SCAPE/ArrowerSVG.py", line 663, in SVG
    with open(domains_color_file, "a") as color_domains_handle:
OSError: [Errno 30] Read-only file system: '/usr/src/BiG-SCAPE/domains_color_file.tsv'

This issue could not be avoided by using the argument --writable provided by singularity.

-w, --writable               by default all Singularity containers are
                             available as read only. This option makes
                             the file system accessible as read/write

Error:

singularity run --writable ~/SingularityImages/bigscape.sif -i 00-Data/AntiSMASHgbk/ -o 01-BigScape -c 16 --include_gbk_str '*'
FATAL:   no SIF writable overlay partition found in /faststorage/home/agomez/SingularityImages/bigscape.sif

If possible, could you help me to modify the code to run bigscape.py without suffering this issue? Thanks very much!!

Affinity propagation did not converge

I used BiG-SCAPE to process a data set of approximately 50,000 BGCs. The command line I am using is (nohup python3 /path/BiG-SCAPE-1.1.5/bigscape.py -i /path/output -o /path/res --mode auto --mibig &). The following error occurred, which I cannot understand or resolve:

Working for each BGC class
Sorting the input BGCs

PKSI (14928 BGCs)
Writing annotation files
Calculating all pairwise distances
Ignored unknown character X (seen 2 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 3 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 3 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
/home/XXX/miniconda3/envs/bigscape/lib/python3.11/site-packages/sklearn/cluster/_affinity_propagation.py:142: ConvergenceWarning: Affinity propagation did not converge, this model may return degenerate cluster centers and labels.
warnings.warn(

MIBiG t2pks BGCs not grouping into families

I ran BiG-SCAPE on the following MIBiG clusters: BGC0000187, BGC0000194 through BGC0000198, BGC0000213, BGC0000220, BGC0000247, BGC0000269, BGC0000275, BGC0001851, BGC0002045, which all appear as type II polyketide synthases when searching on MIBiG. I downloaded the cluster gbk files for each of these and then ran them through BiG-SCAPE. Despite the fact that these are all type II PKS BGCs, the output did not group any of them into shared families.

I wanted to know if there was a different input command that I could have used or if there was a different reason that these clusters would not be grouping together. I'm attaching the full run log in case that is helpful.

Thanks!

python3 bigscape.py -i mibig_gbks -o mibig_output

   - - Processing input files - -
 Output folder already exists
 Logs folder already exists
 Cache folder already exists
 BGC fastas folder already exists
 Domtable folder already exists
 Domains folder already exists
 pfs folder already exists
 pfd folder already exists
 Including files with one or more of the following strings in their filename: 'cluster', 'region'
 Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: Input set has files with no Biosynthetic Genes (affects alignment mode)
   See no_biosynthetic_genes_list.txt

 Starting with 13 files
 Files that had its sequence extracted: 13

Creating output directories
 SVG folder already exists
 Networks folder already exists

Trying threading on 16 cores

Predicting domains using hmmscan
 Predicting domains for 13 fasta files
 Finished generating domtable files.

Parsing hmmscan domtable files
 Processing 13 domtable files
 New domain sequences to be added; cleaning domains folder
 Finished generating pfs and pfd files.

Processing domains sequence files
 Adding sequences to corresponding domains file
 Reading the ordered list of domains from the pfs files
 Creating arrower-like figures for each BGC
  Parsing hmm file for domain information
    Done
  Found file with domains colors
  Reading BGC information and writing SVG
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/Bio/SeqFeature.py:230: BiopythonDeprecationWarning: Please use .location.strand rather than .strand
  warnings.warn(
 Finished creating figures


   - - Calculating distance matrix - -
Performing multiple alignment of domain sequences

 Using hmmalign
launch_hmmalign took 3.448 seconds
 Trying to read domain alignments (*.algn files)

Generating distance network files with ALL available input files
   Writing the complete Annotations file for the complete set
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'

 Working for each BGC class
  Sorting the input BGCs

  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'

  Others (13 BGCs)
   Writing annotation files
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
   Calculating all pairwise distances
generate_network took 0.064 seconds
   Writing output files
  Calling Gene Cluster Families
  Cutoff: 0.3
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/sklearn/cluster/_affinity_propagation.py:52: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
  warnings.warn(
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'

BGCs missing in network

My input data include 5 BGCs, but only 3 are present in the network. What happened to the other 2?

antiSMASH 7 support

Hi,
Can BiG-SCAPE be run using antismash version 7 output?
Thanks

dataclasses module not installed for python=3.6

Hi,

I am installing BiG-SCAPE version 1.1.4 from the releases tarball.

The conda environment created from environment.yml is missing a dependency: dataclasses.

I got an error when running bigscape.py:

Traceback (most recent call last):
  File "appBuilds/BiG-SCAPE-1.1.4/bigscape.py", line 75, in <module>
    import networkx as nx
  File "/opt/anaconda3/envs/bigscape/lib/python3.6/site-packages/networkx/__init__.py", line 81, in <module>
    from networkx import algorithms
  File "/opt/anaconda3/envs/bigscape/lib/python3.6/site-packages/networkx/algorithms/__init__.py", line 81, in <module>
    from networkx.algorithms import tree
  File "/opt/anaconda3/envs/bigscape/lib/python3.6/site-packages/networkx/algorithms/tree/__init__.py", line 1, in <module>
    from .branchings import *
  File "/opt/anaconda3/envs/bigscape/lib/python3.6/site-packages/networkx/algorithms/tree/branchings.py", line 30, in <module>
    from dataclasses import dataclass, field
ModuleNotFoundError: No module named 'dataclasses'

The dataclasses module is part of the standard library only from Python 3.7 onwards. My networkx version (as installed by conda using environment.yml on 20220623) is 2.7.1.

The problem arises when conda resolves the dependencies for networkx.

Possible solutions here are:

Use a fixed lower version of networkx.
Include dataclasses in environment.yml
Bump python version to 3.7.
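A small diagnostic sketch for checking an environment before running bigscape.py, using only the standard library. The function names are hypothetical; the check itself relies on the fact stated above that dataclasses ships with Python 3.7 and later.

```python
import importlib.util
import sys


def dataclasses_available():
    """True if the dataclasses module can be imported in this interpreter."""
    return importlib.util.find_spec("dataclasses") is not None


def diagnose():
    """Summarise the interpreter version and whether dataclasses is present."""
    version = "{}.{}".format(*sys.version_info[:2])
    if dataclasses_available():
        return f"Python {version}: dataclasses is importable"
    return (
        f"Python {version}: dataclasses is missing; pin an older networkx, "
        "add dataclasses to environment.yml, or use Python 3.7 or later"
    )


print(diagnose())
```

Running this inside the activated conda environment shows which of the three solutions above, if any, is needed.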

How to update Big-scape

About two years ago, I installed BiG-SCAPE 1.1.2 as a Docker image on my Ubuntu server. It uses MIBiG 2.1 as its default database. MIBiG 3.1 is now available. How can I update my BiG-SCAPE installation to use the newly released MIBiG database?

NameError: name 'bgc_fasta_folder' is not defined

Hi developers,

I am running into an issue with BiG-SCAPE v1.1.5. I installed it into a conda environment and tried to run using some test output from antismash. The inputs to BiG-SCAPE are twelve .gbk files output from an antismash6 run of three E. coli genomes that are contained in the "bigscape" folder. I uninstalled and reinstalled BiG-SCAPE using the conda package, but I keep running into the bgc_fasta_folder is not defined error when I run the following command line prompt.

bigscape --inputdir ./bigscape --outputdir example_output --pfam_dir ./multismash/pfam

Any help on fixing the error is appreciated. Thanks for your time.

version 1.1.0 takes antiSMASH5 result?

Hi, I am wondering if BiG-SCAPE 1.1.0 can take the BGC files from antiSMASH 5.1 instead of 6.0? I want to use the MIBiG 2.0 reference dataset from version 1.1.0 but do not want to rerun my data with antiSMASH 6.0. Thanks!

MIBiG subnetworks not pruned

If multiple cutoff values are chosen and MIBiG BGCs are connected to input regions at the higher cutoffs, those MIBiG BGCs are also kept at the lower cutoffs, even when they then form separate MIBiG-only subnetworks that should be pruned.
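The expected behaviour can be sketched as pruning per cutoff: build the network for one cutoff, then drop every connected component that contains only reference BGCs. This is an illustration of the idea, not BiG-SCAPE's implementation. The function name, the edge tuple layout, the identifiers in the example, and the use of `<=` for the cutoff comparison are all assumptions.

```python
from collections import defaultdict


def prune_reference_only(edges, cutoff, reference_ids):
    """Return nodes in components that contain at least one non-reference BGC.

    edges: iterable of (bgc_a, bgc_b, distance) tuples.
    Only edges with distance <= cutoff are used (assumed comparison).
    Nodes with no edge under the cutoff are not returned.
    """
    adjacency = defaultdict(set)
    for a, b, distance in edges:
        if distance <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    kept, seen = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first traversal to collect one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        # Keep the component only if it includes an input (non-reference) BGC.
        if any(node not in reference_ids for node in component):
            kept |= component
    return kept


# Hypothetical example: two input regions and two reference BGCs.
edges = [
    ("input_1", "mibig_A", 0.4),
    ("mibig_A", "mibig_B", 0.1),
    ("input_1", "input_2", 0.2),
]
references = {"mibig_A", "mibig_B"}
for cutoff in (0.5, 0.3):
    print(cutoff, sorted(prune_reference_only(edges, cutoff, references)))
```

At the higher cutoff in this example the reference BGCs are reachable from an input region and stay in the network. At the lower cutoff they are connected only to each other and are dropped, which is the behaviour the issue asks for.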

sklearn is deprecated

The requirements.txt file still lists sklearn instead of scikit-learn (see https://pypi.org/project/sklearn/).

distance calculation

Hi,
Thanks for developing such an amazing tool.
I am curious about how the distance between a pair of BGCs is calculated. Where can I find the formula?
Thanks.

Network_Annotations_Full.tsv result

Hello, I used the results from antiSMASH as input, placed in a specified folder, and obtained the Network_Annotations_Full.tsv file. Why does the BGC column in the table show entries like BGC0000001 and BGC0000002? Shouldn't it show the names of the gbk files I supplied, such as ABTEK_DN1_1.region001?

I have another question. I was able to open the webpage after switching to a different browser, but why does it only show one genome? I have 45 folders under the antismash_res directory, each representing a different genome. The input directory is the result folder produced by antiSMASH.

Archive tar.gz for release 1.1.3 ?

Thank you for adding the setup.py file !

Could you please create a tag for the new release 1.1.3 in the repo (https://github.com/medema-group/BiG-SCAPE/tags), so that there is a tar.gz archive? It is needed to build the Bioconda package.

Thanks!

Loraine

cannot view index.html and tutorial files won't unzip

Hello,
I was trying to run BiG-SCAPE for some antiSMASH runs but the index.html file is blank. This is my run output.

- - Processing input files - -
  Including files with one or more of the following strings in their filename: 'cluster', 'region'
  Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files
Warning: unknown product 'NI-siderophore'

Starting with 7 files
Files that had its sequence extracted: 7

Creating output directories

Trying threading on 16 cores

Predicting domains using hmmscan
Predicting domains for 7 fasta files
Finished generating domtable files.

Parsing hmmscan domtable files
Processing 7 domtable files
New domain sequences to be added; cleaning domains folder
Finished generating pfs and pfd files.

Processing domains sequence files
Adding sequences to corresponding domains file
Reading the ordered list of domains from the pfs files
Creating arrower-like figures for each BGC
Parsing hmm file for domain information
Done
Found file with domains colors
Reading BGC information and writing SVG
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/Bio/SeqFeature.py:230: BiopythonDeprecationWarning: Please use .location.strand rather than .strand
warnings.warn(
Finished creating figures

- - Calculating distance matrix - -
  Performing multiple alignment of domain sequences

Using hmmalign
launch_hmmalign took 1.029 seconds
Trying to read domain alignments (*.algn files)

Generating distance network files with ALL available input files
Writing the complete Annotations file for the complete set

Working for each BGC class
Sorting the input BGCs

Warning: unknown product 'NI-siderophore'

RiPPs (2 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.052 seconds
Writing output files
Calling Gene Cluster Families
Cutoff: 0.3
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/sklearn/cluster/_affinity_propagation.py:52: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
warnings.warn(

Others (3 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.054 seconds
Writing output files
Calling Gene Cluster Families
Cutoff: 0.3
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/sklearn/cluster/_affinity_propagation.py:52: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
warnings.warn(

NRPS (3 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.052 seconds
Writing output files
Calling Gene Cluster Families
Cutoff: 0.3
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/sklearn/cluster/_affinity_propagation.py:52: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
warnings.warn(

Terpene (1 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.042 seconds

PKSother (1 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.045 seconds

Saccharides (1 BGCs)
Writing annotation files
Calculating all pairwise distances
generate_network took 0.046 seconds

Main function took 26.904 s

I wasn't sure if this was an issue with my files so I tried following the tutorial but cannot unzip the genome files, as I get this error.
All samples have mutually equal similarities. Returning arbitrary cluster center(s).

Please let me know if any other information would be useful for troubleshooting either of these issues!

Python version in environment.yml outdated?

Hi there,

First up, thanks for maintaining this tool.
This is not strictly speaking an issue, but something you may want to be aware of. I ran into a small problem when following the install instructions on your Wiki. The conda environment was created without problems using conda 4.9.2, but when running the python bigscape.py --version command, I received the following error:

ModuleNotFoundError: No module named 'dataclasses'

Looking at possible solutions, I found this issue fastai/fastai#867, which suggests it is related to the python version (3.6) specified in the environment.yml file. After resolving some conflicts that conda wasn't happy with, I have arrived at the following environment.yml file. Ignore the altered name.

name: bigscape_update
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10.2
  - hmmer
  - biopython=1.79
  - fasttree
  - numpy
  - scipy
  - networkx
  - scikit-learn=1.0.2

Upon completing the build and running the test command, it seems to be functioning except for a deprecation warning.

(bigscape_update)$ python ./bigscape.py --version
/nfs/home/nowakvi/.conda/envs/bigscape_update/lib/python3.10/site-packages/Bio/SubsMat/__init__.py:126: BiopythonDeprecationWarning: Bio.SubsMat has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.substitution_matrices as a replacement, and contact the Biopython developers if you still need the Bio.SubsMat module.
  warnings.warn(
BiG-SCAPE 1.1.2 (2021-06-03)

To the best of my knowledge, this shouldn't be a problem but I have two questions:

Would you expect the version changes I made to affect functionality?
Do you have an alternative solution to solving this error?

Best regards,
Vincent
