Comments (6)
Hi @haruosuz, that is a little odd. Typically, poorly formatted GFF files are removed at the initial stage. Were your samples annotated with Prokka? I would suggest you check the GFF file for the removed sample to ensure it is formatted correctly and contains CDS/genes (i.e. is not empty). If it looks normal, feel free to email me the file and I will check to see if there is anything odd going on (perhaps include a handful of the files that worked as contrasts).
from pirate.
Thank you for your reply. The 322 genomes were annotated with DFAST. None of the 322 GFF files is empty. The <PIRATE.gene_families.ordered.tsv> file has 1415 rows and 344 columns (i.e., 344 - 22 = 322 genomes). Is there any way to identify which of the 322 GFF files was excluded? The exclusion is suggested by the line "# 1415 gene families in 321 genomes." in the <PIRATE.pangenome_summary.txt> file.
from pirate.
You can check the headers in the PIRATE.gene_families.tsv file and compare them to your input sample list.
from pirate.
Thank you for your reply.
The following command produced no output, indicating that the genomes listed in the header of the PIRATE.gene_families.tsv file match the input sample list in the "genome_list.txt" file:
diff <(head -n 1 PIRATE.gene_families.tsv | tr "\t" "\n" | tail +21) <(cat genome_list.txt | sort)
The discrepancy in the numbers (322 vs. 321 genomes) remains unclear. Here are the commands and their outputs:
$ wc -l genome_list.txt
322 genome_list.txt
$ head -n 1 PIRATE.pangenome_summary.txt
# 1415 gene families in 321 genomes.
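One further check that might narrow this down is to look for a genome that appears in the header but has no entry in any gene family row. The sketch below is not part of PIRATE. The function name `empty_genome_columns` is hypothetical, and it assumes a tab-separated file whose genome columns start at column 21 (the offset used by the `tail +21` command above) and in which an empty cell means the genome has no locus in that family.

```shell
# Hypothetical helper, not a PIRATE script.
# Usage: empty_genome_columns PIRATE.gene_families.tsv [first_genome_column]
# Prints the header name of every genome column that is empty in all rows.
empty_genome_columns() {
    awk -F '\t' -v first="${2:-21}" '
        # Header row: remember the column names and the column count.
        NR == 1 { n = NF; for (i = first; i <= n; i++) name[i] = $i; next }
        # Data rows: mark every genome column holding a non-empty cell.
        { for (i = first; i <= NF; i++) if ($i != "") seen[i] = 1 }
        # Report the columns that were never marked.
        END { for (i = first; i <= n; i++) if (!(i in seen)) print name[i] }
    ' "$1"
}
```

If this prints a single genome name, that genome would be a reasonable candidate for the one missing from the summary count. No output would mean every genome contributes at least one locus, so the discrepancy has to come from somewhere else.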
from pirate.
So it found all your input genome files but is saying there is an additional one at one internal step? Are you sure you don't have a line including just whitespace in the genome_list.txt file?
from pirate.
I ran PIRATE with 322 genomes (gff files) as input. While the <PIRATE.gene_families.tsv> file contains 322 genomes, the <PIRATE.pangenome_summary.txt> file indicates only 321 genomes.
The <genome_list.txt> file, generated by PIRATE, contains no whitespace, as shown below:
$ cat genome_list.txt | wc -l
322
$ cat genome_list.txt | grep -v "^$" | wc -l
322
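Note that `grep -v "^$"` only filters out completely empty lines. A line holding only spaces or tabs, a name with trailing whitespace or a carriage return, or a duplicated name would all pass it. A minimal sketch of a stricter check follows; the function name `list_problems` is hypothetical and the only assumption is that the list holds one genome name per line.

```shell
# Hypothetical helper, not a PIRATE script.
# Usage: list_problems genome_list.txt
# Counts whitespace-only lines, names with leading/trailing whitespace
# (including a trailing carriage return), and distinct duplicated names.
list_problems() {
    awk '
        /^[ \t\r]*$/ { blank++; next }
        /^[ \t\r]|[ \t\r]$/ { padded++ }
        {
            name = $0
            gsub(/^[ \t\r]+|[ \t\r]+$/, "", name)   # compare trimmed names
            if (count[name]++ == 1) dup++            # count each duplicate once
        }
        END { printf "blank=%d padded=%d duplicated=%d\n", blank, padded, dup }
    ' "$1"
}
```

Three zeros would rule out the whitespace explanation more firmly than the empty-line check alone.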
from pirate.