Comments (5)
Hi,
I was able to replicate the error and fixed it in #266. Thanks for pointing that out!
The error occurred because the pangenome has no spots. Your pangenome has no persistent families, which is why no spots were detected. This happened because some of the GCA genomes used in the pangenome (29 out of 75) have no CDS annotations. You can see this clearly in the tile plot, for example.
Working with GenBank annotations in ppanggolin can be tricky because the annotations are often inconsistent across genomes, which affects ppanggolin's predictions. One way to avoid this is to provide the genomes as FASTA sequences and let ppanggolin handle the annotation; alternatively, stick to RefSeq genomes when building the pangenome.
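If you want to check your input files before building the pangenome, here is a minimal sketch that counts CDS features in GenBank flat files. It is not part of ppanggolin; the function names `count_cds` and `genomes_without_cds` are made up for illustration, and it assumes the standard flat-file layout in which feature keys start at column 6.

```python
import gzip
import re
from pathlib import Path

# In the GenBank flat-file format, feature keys start at column 6,
# so a CDS feature line begins with five spaces followed by "CDS".
CDS_LINE = re.compile(r"^ {5}CDS\s")


def count_cds(path):
    """Return the number of CDS features in a (possibly gzipped) GenBank file."""
    path = Path(path)
    # Hypothetical convention: files ending in .gz are gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if CDS_LINE.match(line))


def genomes_without_cds(paths):
    """Return the input files that contain no CDS feature at all."""
    return [str(p) for p in paths if count_cds(p) == 0]
```

Calling `genomes_without_cds(Path("genomes").glob("*.gbff"))` on a folder of annotation files (the folder name is only an example) would list the genomes that are worth replacing or re-annotating from FASTA.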
from ppanggolin.
Hi,
The bug fix is now included in version 2.1.1.
Best
from ppanggolin.
Hi,
The problem is that no spots were found, either in your pangenome or in the genomes you are projecting onto. The log shows this:
2024-08-14 16:54:10 projection.py:l711 INFO 1 RGPs have been predicted in the input genomes.
2024-08-14 16:54:10 projection.py:l1299 INFO Predicting spot of insertion in input genomes.
2024-08-14 16:54:11 spot.py:l122 INFO 1471 RGPs were not used as they are on a contig border (or have less than 3 persistent gene families until the contig border)
2024-08-14 16:54:11 spot.py:l124 INFO 0 RGPs are being used to predict spots of insertion
2024-08-14 16:54:11 spot.py:l126 INFO 0 number of different pairs of flanking gene families
This should not happen; ppanggolin should skip this step when no spots are found.
We will fix this but need your data (genomes and pangenome) to reproduce the issue. I can send you a link to a secure repository where you can share everything.
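Until a fix is available, the situation can be spotted from the log itself. The following is only a sketch, not ppanggolin code: the function name `rgps_used_for_spots` is made up, and the pattern assumes the message wording stays exactly as in the log lines quoted above.

```python
import re

# Matches the spot.py message quoted in the log above; the wording is
# taken from that log and may differ in other ppanggolin versions.
USED_RGPS = re.compile(r"INFO (\d+) RGPs are being used to predict spots of insertion")


def rgps_used_for_spots(log_text):
    """Return the RGP counts reported in the log, one per matching line."""
    return [int(match.group(1)) for match in USED_RGPS.finditer(log_text)]
```

A `0` in the returned list means that no RGP was usable for spot prediction, which is the case that triggers the error here.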
Thanks
from ppanggolin.
Hello,
I think the dataset is mentioned at the top of the logs.
You created the pangenome with the 75 genomes listed in the log file and projected the sequence NZ_CP048437 from assembly GCF_010509575, right?
from ppanggolin.
Hi,
Sorry, I was out of touch for a few days. Yes, the genomes are listed at the top of the log file. I am glad you were able to replicate the issue. Thank you for the workaround and the fix :)
Best wishes,
from ppanggolin.