Comments (1)
Hi,
I am sorry that you had a bad experience with our computational toolbox HGTphyloDetect, but some of your comments are too harsh.
First, consider the title of your comment: "Don’t waste your time, software not usable in current form". If you check the recently published paper "Long-read genome sequencing provides novel insights into the harmful algal bloom species Prymnesium parvum" in the journal Science of The Total Environment (impact factor about 10), you will see that scientists are already using HGTphyloDetect in their own research.
I also want to respond to some of the issues you listed (not one by one). For example:
Currently, the input protein accessions must come from the widely used GenBank database. In our pipeline the protein ID is important because we use it to retrieve the taxonomy information. GenBank, maintained by the NIH, contained 247,777,761 sequences as of release 258 (Oct 2023) and has been updated continuously since Dec 1982, so it is a great resource. I believe the current version can satisfy many users in the community.
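For readers unsure whether their IDs look like GenBank protein accessions, here is a minimal sketch of a format check. It assumes the common GenBank protein pattern (three letters followed by five or seven digits, with an optional version suffix) and is not taken from HGTphyloDetect itself; the function name is hypothetical, and a matching format does not guarantee that the record exists.

```python
import re

# Assumed pattern: 3 letters + 5 or 7 digits, optional ".version" suffix.
_GENBANK_PROTEIN = re.compile(r"^[A-Z]{3}\d{5}(\d{2})?(\.\d+)?$")


def looks_like_genbank_protein(accession: str) -> bool:
    """Return True if the string has the shape of a GenBank protein accession.

    This only checks the format; it does not query NCBI.
    """
    return bool(_GENBANK_PROTEIN.match(accession.strip()))
```

A locus tag or systematic gene name such as `YOL164W` would fail this check, which may explain some "accession not recognized" situations.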
Right now, Blastp is run through the NCBI remote blastp function. As you may know, there are two main ways to run this search. The first is NCBI blastp: with the remote option, users do not need to download the large NCBI database, which is convenient. The second is Diamond: it is very fast, but users need to download a very large database before using it. I tried this option before, and if I remember correctly it took more than 24 hours to download the database.
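To make the trade-off concrete, the two approaches roughly correspond to the command lines built below. This is a sketch under assumptions, not the exact invocation used inside HGTphyloDetect: the file names, the `nr` database, the e-value cutoff, and the tabular output format are illustrative choices.

```python
from typing import List


def remote_blastp_cmd(query: str, out: str, evalue: float = 1e-5) -> List[str]:
    """Build a BLAST+ blastp command that searches on NCBI servers (-remote).

    No local database download is needed, but the search runs remotely.
    """
    return [
        "blastp", "-query", query, "-db", "nr", "-remote",
        "-evalue", str(evalue), "-outfmt", "6", "-out", out,
    ]


def diamond_blastp_cmd(query: str, db: str, out: str, evalue: float = 1e-5) -> List[str]:
    """Build a DIAMOND blastp command against a local, pre-built database.

    `db` is a hypothetical path to a DIAMOND database the user has prepared.
    """
    return [
        "diamond", "blastp", "--query", query, "--db", db,
        "--evalue", str(evalue), "--outfmt", "6", "--out", out,
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)`, provided the corresponding program is installed.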
For parsing the taxonomy information, I used the ETE toolkit (http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html), which is very fast and convenient. I also added a try/except block in the script so that entries that are not found do not throw an error.
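The pattern described here can be sketched as follows. This is a hedged illustration, not the script from HGTphyloDetect: `safe_lineage` is a hypothetical helper, and it expects an object exposing ETE's `NCBITaxa` methods `get_lineage`, `get_taxid_translator`, and `get_rank`.

```python
from typing import List, Tuple


def safe_lineage(ncbi, taxid: int) -> List[Tuple[str, str]]:
    """Return [(rank, name), ...] from root to taxid, or [] if the lookup fails.

    `ncbi` is expected to behave like ete3.NCBITaxa.
    """
    try:
        lineage = ncbi.get_lineage(taxid)
        if not lineage:
            return []
        names = ncbi.get_taxid_translator(lineage)
        ranks = ncbi.get_rank(lineage)
    except (ValueError, KeyError):
        # Unknown taxid: skip the entry instead of stopping the whole run.
        return []
    return [(ranks.get(t, "no rank"), names.get(t, "")) for t in lineage]


# Typical use (requires ete3; the first call downloads the NCBI taxonomy dump):
#   from ete3 import NCBITaxa
#   print(safe_lineage(NCBITaxa(), 4932))
```

Returning an empty list keeps a batch run going, but the caller should still record which entries were skipped.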
Although HGT prediction with the software is not very quick, the accuracy and sensitivity are quite good in case studies on specific species. We describe this in the discussion section of our published paper.
As you noted, HGTphyloDetect needs a particular directory structure. However, we provide detailed documentation about it, including examples that show how to run the steps one by one and what output to expect.
Let's look forward to the future version of our software; hopefully you and many other people will like it. Thanks also for your feedback!