This pipeline was not ready for deployment to the general public and saying it is easy

Don’t waste your time, software not usable in current form (hgtphylodetect, CLOSED)

sysbiochalmers commented on June 8, 2024

Don’t waste your time, software not usable in current form


Comments (1)

le-yuan commented on June 8, 2024

Hi,

I am sorry that you had a bad experience with our computational toolbox HGTphyloDetect, but some of your comments are too harsh.

First, consider the title of your issue: "Don’t waste your time, software not usable in current form". If you check the recently published paper "Long-read genome sequencing provides novel insights into the harmful algal bloom species Prymnesium parvum" in the journal Science of The Total Environment (impact factor about 10), you will see that some scientists are already using HGTphyloDetect in their own research.

I would also like to respond to some of the issues you listed (not one by one). For example:

Currently, the input protein accessions must come from the widely used GenBank database. The protein ID matters in our pipeline because we use it to retrieve taxonomy information. GenBank, maintained by NIH and updated since December 1982, contains 247,777,761 sequences (version 258, October 2023). I believe the current version can serve many users in the community.

Right now, Blastp is run through the NCBI remote blastp function. As you may know, there are two main ways to run this kind of search. The first is NCBI Blastp: with the remote option, users do not need to download the large NCBI database, which is convenient. The second is Diamond: it is very fast, but users must download a very large database before using it. I tried this option before and, if I remember correctly, downloading the database took more than 24 hours.
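To make the trade-off concrete, here is a minimal sketch of the two invocations as command lines built in Python. It is not taken from the HGTphyloDetect scripts: the function names, the file names, and the e-value default are assumptions for illustration, and only standard BLAST+ and DIAMOND options are used.

```python
from typing import List


def remote_blastp_cmd(query_fasta: str, out_path: str,
                      evalue: float = 1e-5) -> List[str]:
    """Build a BLAST+ blastp command that searches on NCBI's servers.

    With -remote, no local copy of the nr database is needed.
    The e-value default is an illustrative assumption.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", "nr",
        "-remote",
        "-evalue", str(evalue),
        "-outfmt", "6",       # tabular output
        "-out", out_path,
    ]


def diamond_blastp_cmd(query_fasta: str, db_path: str, out_path: str,
                       evalue: float = 1e-5) -> List[str]:
    """Build a DIAMOND blastp command against a local, pre-built database.

    db_path must point to a database the user has already downloaded
    and formatted, which is the slow step mentioned above.
    """
    return [
        "diamond", "blastp",
        "--query", query_fasta,
        "--db", db_path,
        "--evalue", str(evalue),
        "--outfmt", "6",      # BLAST-style tabular output
        "--out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    print(" ".join(remote_blastp_cmd("input.fasta", "remote_hits.tsv")))
    print(" ".join(diamond_blastp_cmd("input.fasta", "nr.dmnd", "local_hits.tsv")))
```

Either list can be passed to `subprocess.run(cmd, check=True)` once the corresponding tool is installed. The remote form trades speed for zero setup, while the DIAMOND form needs the local database first.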

For parsing taxonomy information, I used the ETE toolkit (http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html), which is very fast and convenient. I also added a try/except block in the script so that entries that are not found do not raise an error.
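The try/except pattern described here can be sketched roughly as follows. This is an illustration, not the code from the HGTphyloDetect scripts: `safe_lineage_names` and the `StubTaxa` class are hypothetical names, and the stub holds a tiny, abbreviated lineage so the example runs without downloading the NCBI taxonomy database. The two methods it mimics, `get_lineage` and `get_taxid_translator`, are the ones ETE's `NCBITaxa` class provides.

```python
from typing import Dict, List, Optional


def safe_lineage_names(taxid: int, ncbi) -> Optional[List[str]]:
    """Return the lineage names for taxid, or None if the lookup fails.

    `ncbi` is any object offering get_lineage() and
    get_taxid_translator(), such as ete3's NCBITaxa.
    """
    try:
        lineage = ncbi.get_lineage(taxid)
        if not lineage:
            return None
        names = ncbi.get_taxid_translator(lineage)
        return [names[t] for t in lineage]
    except (ValueError, KeyError):
        # Unknown taxid: skip the entry instead of aborting the run.
        return None


class StubTaxa:
    """Hypothetical offline stand-in for ete3.NCBITaxa (abbreviated data)."""

    _lineages = {4932: [1, 131567, 2759, 4932]}
    _names = {
        1: "root",
        131567: "cellular organisms",
        2759: "Eukaryota",
        4932: "Saccharomyces cerevisiae",
    }

    def get_lineage(self, taxid: int) -> List[int]:
        if taxid not in self._lineages:
            raise ValueError("%s taxid not found" % taxid)
        return self._lineages[taxid]

    def get_taxid_translator(self, taxids: List[int]) -> Dict[int, str]:
        return {t: self._names[t] for t in taxids if t in self._names}


if __name__ == "__main__":
    taxa = StubTaxa()
    print(safe_lineage_names(4932, taxa))   # known taxid: list of names
    print(safe_lineage_names(-1, taxa))     # unknown taxid: None, no crash
```

With ETE installed, the stub can be swapped for `NCBITaxa()` from `ete3`, which downloads and caches the taxonomy database on first use.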

Although HGT prediction with the software is not very fast, the accuracy and sensitivity are quite good in case studies on specific species. We describe this in the discussion section of our published paper.

As you noted, HGTphyloDetect requires a particular directory structure. However, we provide detailed documentation for it, including examples that show how to run each step one by one and what output to expect.

Let's look forward to the future version of our software; hopefully you and many other people will like it. Thanks also for your feedback!
