Comments (7)
Hi Rubing
The difference could be related to the different weights used to combine the distance indices. The weights are currently:
bgc_class_weight["PKSI"] = (0.22, 0.76, 0.02, 1.0)
bgc_class_weight["PKSother"] = (0.0, 0.32, 0.68, 4.0)
bgc_class_weight["NRPS"] = (0.0, 1.0, 0.0, 4.0)
bgc_class_weight["RiPPs"] = (0.28, 0.71, 0.01, 1.0)
bgc_class_weight["Saccharides"] = (0.0, 0.0, 1.0, 1.0)
bgc_class_weight["Terpene"] = (0.2, 0.75, 0.05, 2.0)
bgc_class_weight["PKS-NRP_Hybrids"] = (0.0, 0.78, 0.22, 1.0)
bgc_class_weight["Others"] = (0.01, 0.97, 0.02, 4.0)
bgc_class_weight["mix"] = (0.2, 0.75, 0.05, 2.0)
These values correspond to the Jaccard index, the domain sequence similarity (DSS) index, the adjacency index and the anchor boost, in that order. You could try taking the individual index values from the 'mix' class and recombining them with the weights of the class the BGCs belong to. Perhaps the pairs don't make the cutoff value in that biosynthetic class?
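A minimal sketch of that recombination, assuming the raw distance is one minus the weighted sum of the three indices and that the anchor boost is already folded into the reported DSS index (both are assumptions here, not confirmed in this thread). The function name `recombine_distance` is hypothetical; the weights are copied from the list above.

```python
# Class weights as quoted above:
# (Jaccard, DSS, adjacency index, anchor boost).
BGC_CLASS_WEIGHT = {
    "PKSI": (0.22, 0.76, 0.02, 1.0),
    "PKSother": (0.0, 0.32, 0.68, 4.0),
    "NRPS": (0.0, 1.0, 0.0, 4.0),
    "RiPPs": (0.28, 0.71, 0.01, 1.0),
    "Saccharides": (0.0, 0.0, 1.0, 1.0),
    "Terpene": (0.2, 0.75, 0.05, 2.0),
    "PKS-NRP_Hybrids": (0.0, 0.78, 0.22, 1.0),
    "Others": (0.01, 0.97, 0.02, 4.0),
    "mix": (0.2, 0.75, 0.05, 2.0),
}


def recombine_distance(jaccard, dss, adjacency, bgc_class):
    """Recombine per-pair index values with the weights of one class.

    Assumption: distance = 1 - (Jw*J + DSSw*DSS + AIw*AI). The anchor
    boost is not applied here because the DSS index in the network file
    is assumed to include it already.
    """
    jw, dssw, aiw, _anchor_boost = BGC_CLASS_WEIGHT[bgc_class]
    similarity = jw * jaccard + dssw * dss + aiw * adjacency
    return 1.0 - similarity


# Made-up index values for one BGC pair, for illustration only.
pair = {"jaccard": 0.5, "dss": 0.8, "adjacency": 0.4}
for bgc_class in ("mix", "PKSI"):
    print(bgc_class, round(recombine_distance(bgc_class=bgc_class, **pair), 3))
```

Comparing the recombined value against the cutoff used for the run (0.30 here) shows whether a pair that is linked in 'mix' would also be linked in its own class.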
I think another factor could be the affinity propagation clustering algorithm, which might give different groupings.
Let me know if something else seems to have gone wrong, though!
from big-scape.
Thanks for your reply!
As described above, does this mean the GCFs in "mix_clustering_c0.30" may not be right?
Oh wait, I've just re-read your first comment. Do you mean that e.g. BGC2 and BGC3 don't have a connection in mix_c0.30.network and yet they are together in mix_clustering_c0.30? That is weird indeed... Would you be able to send me those two files? (jorge DOT navarromunoz AT wur.nl)
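One way to check for this situation could look like the sketch below. It assumes the family assignments and the network edges have already been loaded into plain Python structures; the function name `unlinked_family_pairs` and the input shapes are hypothetical, not part of BiG-SCAPE.

```python
from itertools import combinations


def unlinked_family_pairs(family_of, edges):
    """Return (family, bgc_a, bgc_b) for same-family pairs with no direct edge.

    family_of: dict mapping BGC name -> family ID (from the clustering file).
    edges: iterable of (bgc_a, bgc_b) pairs present in the network file.
    """
    linked = {frozenset(edge) for edge in edges}

    # Group BGC names by the family they were assigned to.
    members = {}
    for bgc, family in family_of.items():
        members.setdefault(family, []).append(bgc)

    missing = []
    for family, bgcs in sorted(members.items()):
        for a, b in combinations(sorted(bgcs), 2):
            if frozenset((a, b)) not in linked:
                missing.append((family, a, b))
    return missing


# Toy data mirroring the case described above.
family_of = {"BGC1": 1, "BGC2": 1, "BGC3": 1, "BGC4": 2}
edges = [("BGC1", "BGC2"), ("BGC1", "BGC3")]
print(unlinked_family_pairs(family_of, edges))
```

Note that a pair without a direct edge can still be joined through intermediate BGCs, so a reported pair is a prompt for a closer look rather than proof of an error.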
OK, I have sent them to your email.
Ahh, I see what you mean.
I think I saw something similar to this a while ago... This is an artifact of the affinity propagation clustering, I'm afraid.
I'm working (slowly) on making BiG-SCAPE use a newer version of scikit-learn (from which we use affinity propagation). Hopefully this is something that has been addressed recently. But not much else that I can do now, unfortunately.
The only recommendation I can make now would be to use the connected components of the network as GCFs.
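A minimal sketch of that workaround, assuming the pairwise raw distances have been read into `(bgc_a, bgc_b, raw_distance)` tuples and that a pair counts as linked when its raw distance is at or below the cutoff (the inclusive comparison is an assumption). The function name `connected_component_families` is hypothetical.

```python
from collections import defaultdict


def connected_component_families(edges, cutoff, nodes=()):
    """Group BGCs into families by connected components.

    edges: iterable of (bgc_a, bgc_b, raw_distance) tuples.
    cutoff: pairs with raw_distance <= cutoff are treated as linked.
    nodes: optional extra BGC names, so singletons are reported too.
    """
    adjacency = defaultdict(set)
    for node in nodes:
        adjacency[node]  # register the node even if it has no edges
    for a, b, distance in edges:
        adjacency[a]
        adjacency[b]
        if a != b and distance <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    families = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        families.append(sorted(component))
    return families


# Toy distances, for illustration only.
edges = [("A", "B", 0.10), ("B", "C", 0.25), ("C", "D", 0.60)]
print(connected_component_families(edges, cutoff=0.30))
```

Unlike affinity propagation, this grouping is deterministic, but it can merge BGCs that are only connected through a chain of intermediates, so families may come out larger.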
OK, thanks. There is a series of indices between two BGCs: for example, Raw distance, Squared similarity, Jaccard index, DSS index, Adjacency index, raw DSS non-anchor, raw DSS anchor, Non-anchor domains and Anchor domains. Which one can be used as the criterion for identifying GCFs?
Raw distance. The others are sub-components that lead to this value.