Comments (3)
Hello @KTbiotech,
The analysis/listgenes_core.txt file is created by copying the loci list of one threshold from the analysis/Genes_95%.txt file, which is created by TestGenomeQuality.
The Genes_95%.txt file contains, for each threshold, the list of loci that are present in 95% of the strains.
In the tutorial, we chose the loci list of one of the thresholds from 60 to 195. Any threshold in that range can be used, since the number of loci is stable across it, as verified by the plot created with TestGenomeQuality.
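If you prefer to check the stability of the loci count without the plot, something like the following could work. It is only a sketch: it assumes Genes_95%.txt has a header line followed by one row per threshold, with a tab between the threshold and the loci list and spaces between the loci file names. The function names loci_per_threshold and count_loci are made up for this example and are not part of chewBBACA.

```python
def loci_per_threshold(path):
    """Map each threshold (int) to its list of loci file names.

    Assumes a header line, then rows of the form
    '<threshold><TAB><locus1> <locus2> ...'.
    """
    loci = {}
    with open(path, "r") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            threshold, genes = line.split("\t", 1)
            loci[int(threshold)] = genes.split()
    return loci


def count_loci(path):
    """Return the number of loci listed for each threshold."""
    return {t: len(genes) for t, genes in loci_per_threshold(path).items()}


# Example usage (the path is a placeholder):
# for threshold, n in sorted(count_loci("path/to/Genes_95%.txt").items()):
#     print(threshold, n)
```

Thresholds whose counts stop changing from one row to the next should correspond to the flat part of the plot.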
Let us know if this answers your question.
Cheers,
Pedro
from chewbbaca_tutorial.
Dear Pedro,
I have the same question as KTbiotech.
How can I copy the contents of the Genes_95%.txt file to the listgenes_core.txt file based on a threshold?
For this analysis, do I need an additional script, or can I just copy the file and change its name?
If possible, could you give some details about this procedure?
Thank you in advance.
Dear @KTbiotech and @bgka2009,
The Genes_95%.txt file contains two columns, Threshold and Present_genes:
Threshold Present_genes
0 GCA-000007265-protein1.fasta GCA-000007265-protein10.fasta GCA-000007265-protein101.fasta
5 GCA-000012705-protein4139.fasta GCA-000012705-protein4161.fasta GCA-000196055-protein1567.fasta
...
195 GCA-001592385-protein969.fasta GCA-001592385-protein970.fasta GCA-001592385-protein971.fasta
You need to copy the list of files (the Present_genes column) for the chosen threshold, paste it into a new file with one file name per line, and name that file listgenes_core.txt.
If you simply copy the Genes_95%.txt file and change its name, it will not work.
Below is a small Python 3 snippet that creates a file with the gene list for the desired threshold.

    import csv

    # File created by TestGenomeQuality
    genes_95 = "path/to/Genes_95%.txt"

    with open(genes_95, "r") as f:
        genes_95_data = csv.DictReader(f, delimiter="\t")
        for row in genes_95_data:
            # Replace [CHOSEN THRESHOLD] with the threshold value, e.g. "60"
            if row["Threshold"] == "[CHOSEN THRESHOLD]":
                # Genes are separated by spaces; put one per line
                list_genes = row["Present_genes"].replace(" ", "\n")

    # Output file that will hold the genes list
    output_file = "path/to/output_dir/listgenes_core.txt"
    with open(output_file, "w") as out:
        out.write(list_genes)

This snippet is just a suggestion; you may use whatever procedure you are most comfortable with.
We will work on revising the tutorial instructions to make this and other issues clearer.
Please let us know if you were able to solve the issue.
Pedro