Hello, I am trying to add some genomes from ncbi for which I have downloaded AA fasta

Addition of Taxa doesn't seem to be working with fdog.addTaxa (fdog issue, closed)

bionf commented on September 27, 2024

Addition of taxa doesn't seem to be working with fdog.addTaxa.

Comments (8)

swttalyan commented on September 27, 2024

Now, with the separate steps, it seems to be working:
Doing annotation for /workspace/fdog/searchTaxa_dir/LETRE@7753@240123/LETRE@[email protected]... 1%| | 244/35015 [00:55<1:05:03, 8.91it/s]
Let's see; I will continue this way. Yes, I am using a conda environment and have both FAS and fDOG installed. Thanks! Hopefully it runs smoothly from here on, so that I can get the phylogenetic profiles.

swttalyan commented on September 27, 2024

I would like to use all of these species from NCBI: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/
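
A minimal Python sketch for collecting such downloads into one input folder, assuming the protein sets were saved as gzip-compressed `*_protein.faa.gz` files in a single local folder (the folder names used here are hypothetical). It decompresses each file into a `Fasta_files` folder, dropping only the `.gz` suffix; it is a generic preparation step, not part of fDOG.

```python
import gzip
import shutil
from pathlib import Path


def decompress_proteomes(download_dir: str, fasta_dir: str) -> list[str]:
    """Decompress every *_protein.faa.gz in download_dir into fasta_dir.

    Returns the sorted names of the written .faa files.
    """
    src = Path(download_dir)
    dst = Path(fasta_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src.glob("*_protein.faa.gz")):
        # Keep the original name, only strip the trailing ".gz"
        out_path = dst / gz_path.name[: -len(".gz")]
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large proteomes are not loaded into memory
        written.append(out_path.name)
    return written
```

For example, `decompress_proteomes("refseq_downloads", "Fasta_files")` (hypothetical folder names) returns the list of files it wrote, which is also a convenient starting point for writing the mapping file.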

trvinh commented on September 27, 2024

Hi @swttalyan,
I need more information to reproduce the issue, because I have tested both functions with my data and they still work. I used the same command that you tried: fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c. Both -i Fasta_files and -i Fasta_files/ will work, but -i Fasta_files/* will not.
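
Why `-i Fasta_files/*` fails can be illustrated with a small Python sketch. The shell expands `Fasta_files/*` into one argument per file before the program starts, so an option that takes a single folder path receives only the first file and the remaining names are left over. The parser below is a hypothetical stand-in built with `argparse` to show the mechanism; it is not fDOG's actual argument definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in: -i takes exactly ONE value (a folder path).
    parser = argparse.ArgumentParser(prog="addTaxa_sketch")
    parser.add_argument("-i", "--input", required=True, help="folder with FASTA files")
    parser.add_argument("-m", "--mapping", required=True, help="mapping file")
    parser.add_argument("-c", "--coreTaxa", action="store_true")
    return parser


def parses_ok(argv: list[str]) -> bool:
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:  # argparse prints an error message and exits on bad input
        return False
    return True


# What the program receives for: -i Fasta_files -m Mapping_file.txt -c
folder_argv = ["-i", "Fasta_files", "-m", "Mapping_file.txt", "-c"]
# What the shell passes for: -i Fasta_files/* ... (folder holding two files)
glob_argv = ["-i", "Fasta_files/a.faa", "Fasta_files/b.faa",
             "-m", "Mapping_file.txt", "-c"]

print(parses_ok(folder_argv))
print(parses_ok(glob_argv))  # rejected: the second file name is an unrecognized argument
```

The design point is simply that globbing happens in the shell, so quoting or passing the folder itself is what a folder-valued option expects.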

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c
WARNING: 9607 not found in NCBI taxonomy database!
WARNING: rank of 9605 is not SPECIES (genus)
WARNING: 9605 not found in NCBI taxonomy database!
Parsing genome for 2 species...
  0%|                                                                                                                                  | 0/2 [00:00<?, ?it/s]WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected] file for details!
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected] file for details!
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 28.24it/s]

Creating Blast DB for 2 species...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.16it/s]
PID 2837391
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:07<00:00,  5.91it/s]
Finished in 8.059s
PID 2839315
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00,  5.43it/s]
Finished in 8.564s
==> Adding 2 taxa finished in 20.164s
==> Output for UNK9605@9605@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir

My Mapping_file.txt looks like this:

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ head Mapping_file.txt
#filename	tax_id
test.extended.fa	9607
tr_Q90XQ7_DANRE.extended.fa	9605
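
The mapping file is a tab-separated table with a `#filename` / `tax_id` header line. A minimal Python sketch for reading it into a dictionary, for example to double-check the entries before running fdog.addTaxa. It assumes that lines starting with `#` are comments and that every other non-empty line has exactly two tab-separated columns; it is an illustrative reader, not fDOG's own parser.

```python
def read_mapping(text: str) -> dict[str, int]:
    """Parse 'filename<TAB>tax_id' lines into {filename: tax_id}."""
    mapping: dict[str, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#filename<TAB>tax_id' header
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(parts)}"
            )
        filename, tax_id = parts
        if filename in mapping:
            raise ValueError(f"line {lineno}: duplicate entry for {filename}")
        mapping[filename] = int(tax_id)  # ValueError if the ID is not numeric
    return mapping


example = "#filename\ttax_id\ntest.extended.fa\t9607\ntr_Q90XQ7_DANRE.extended.fa\t9605\n"
print(read_mapping(example))
```

A space instead of a tab between the two columns is a common copy-paste mistake; the column check above reports it with the offending line number.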

And this is an example with fdog.addTaxon:

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxon -f Fasta_files/tr_Q90XQ7_DANRE.extended.fa -i 9999 -c
NCBI taxon info: 9999 Urocitellus parryii
Species name	UROPA@9999@240119
Parsing FASTA file...
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected] file for details!

Creating Blast DB...


Building a new DB, current time: 01/19/2024 14:56:35
New DB name:   /home/vinh/fdog_data_2023/coreTaxa_dir/UROPA@9999@240119/UROPA@9999@240119
New DB title:  /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 45 sequences in 0.0176709 seconds.


PID 2846204
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00,  5.49it/s]
Finished in 8.469s

==> Output for UROPA@9999@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir

Which fDOG version are you using? Please make sure to install the latest one (v0.1.26).
Did you put all the _protein.faa files into your Fasta_files folder? Does that folder contain only the FASTA files that you mention in your Mapping_file.txt? How do you know that it doesn't read all the FASTA files? Please show me the log file with the error message.

Best,
Vinh
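
The folder-versus-mapping questions above can be checked with a minimal Python sketch, assuming the FASTA files sit directly in one folder and the mapping file uses the tab-separated two-column format shown earlier in the thread (folder and file names are placeholders). It lists files in the folder that the mapping does not mention, and mapping entries whose file is missing; if the mapping file itself is stored inside the FASTA folder, it will show up in the first list.

```python
from pathlib import Path


def compare_folder_and_mapping(fasta_dir: str, mapping_file: str) -> dict[str, list[str]]:
    """Report mismatches between a FASTA folder and a 'filename<TAB>tax_id' mapping file."""
    listed = set()
    for line in Path(mapping_file).read_text().splitlines():
        if line.strip() and not line.startswith("#"):  # skip blanks and the header
            listed.add(line.split("\t")[0].strip())
    present = {p.name for p in Path(fasta_dir).iterdir() if p.is_file()}
    return {
        # files in the folder that the mapping file does not mention
        "not_in_mapping": sorted(present - listed),
        # mapping entries whose file is absent from the folder
        "missing_files": sorted(listed - present),
    }
```

Calling `compare_folder_and_mapping("Fasta_files", "Mapping_file.txt")` and getting two empty lists means the folder and the mapping file agree.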

swttalyan commented on September 27, 2024

Many thanks for your response. Yes, I am using the latest version of fDOG (v0.1.26).
I managed to get it running. There are a lot of conditions on the naming of the FASTA files: the name shouldn't contain more than one "_" or certain special characters. Therefore, I renamed all FASTA files to protein1.faa, protein2.faa, and so on. It ran until creating the BLAST DB, and now I am getting the following error:

/home/user/annotation_tools/COILS2/COILS2 -f < "/workspace/fdog/annotation_dir/tmp/2745623/LETRE@7753@240123_XP_061403495_1.fa" Error running
This happens for all sequences and species.
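
The renaming workaround described above can be scripted so that new names and tax IDs stay consistent. A minimal Python sketch, assuming you already know the NCBI taxonomy ID for each downloaded file (the file names and IDs below are hypothetical placeholders): it plans `protein1.faa`, `protein2.faa`, ... names in sorted order and builds the matching mapping-file text. It does not touch the file system; apply the plan with `os.rename` or `shutil.copy` once it looks right.

```python
def plan_renames(taxids: dict[str, int]) -> list[tuple[str, str, int]]:
    """Map each original file name to proteinN.faa, keeping its tax ID.

    Files are numbered in sorted order so the plan is reproducible.
    """
    return [
        (old, f"protein{n}.faa", taxid)
        for n, (old, taxid) in enumerate(sorted(taxids.items()), start=1)
    ]


def mapping_text(plan: list[tuple[str, str, int]]) -> str:
    """Build a tab-separated mapping file for the renamed files."""
    lines = ["#filename\ttax_id"]
    lines += [f"{new}\t{taxid}" for _old, new, taxid in plan]
    return "\n".join(lines) + "\n"


# Hypothetical input: original file names -> NCBI taxonomy IDs (placeholders)
taxids = {
    "species_b_protein.faa": 22222,
    "species_a_protein.faa": 11111,
}
plan = plan_renames(taxids)
for old, new, taxid in plan:
    print(f"{old} -> {new} ({taxid})")
print(mapping_text(plan), end="")
```

Keeping the old-to-new list also makes it easy to trace results for `proteinN` back to the original species later.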

from fdog.

swttalyan commented on September 27, 2024

Also, could you tell me how much time it will take to add approx. 100 vertebrate species? I am only interested in getting FAS scores for a human seed protein across all vertebrate species, so is this COILS2 step required for calculating the phylogenetic profile?

NOTE: I have created a fresh environment and installed all requirements.

from fdog.

trvinh commented on September 27, 2024

Hi @swttalyan ,

If you encounter a problem with COILS2, please check the FAQs of FAS, and make sure to test FAS before using fDOG.
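
One generic way to narrow down a COILS2 error is to rerun the command from the error message outside of fDOG and inspect its exit code and stderr. A minimal Python sketch, assuming the `COILS2 -f < file` invocation shown in the error message earlier in the thread (adjust the binary path to your installation); it only reports what happens, and it is a debugging aid, not part of FAS or fDOG.

```python
import subprocess
from pathlib import Path


def run_coils(coils_bin: str, fasta: str) -> tuple[bool, str]:
    """Run '<coils_bin> -f < fasta' and return (success, diagnostic text)."""
    if not Path(fasta).is_file():
        return False, f"input FASTA not found: {fasta}"
    try:
        with open(fasta, "rb") as stdin:
            result = subprocess.run(
                [coils_bin, "-f"], stdin=stdin, capture_output=True, timeout=60
            )
    except FileNotFoundError:
        return False, f"COILS2 binary not found: {coils_bin}"
    except PermissionError:
        return False, f"COILS2 binary is not executable: {coils_bin}"
    except subprocess.TimeoutExpired:
        return False, "COILS2 did not finish within 60 s"
    if result.returncode != 0:
        err = result.stderr.decode(errors="replace").strip()
        return False, f"exit code {result.returncode}: {err}"
    return True, result.stdout.decode(errors="replace")
```

For example, `run_coils("/home/user/annotation_tools/COILS2/COILS2", "one_protein.fa")` (the FASTA name is a placeholder) either returns the COILS2 output or a short reason why the call failed.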

The most time-consuming step is the feature annotation. Its runtime depends on the number of CPUs used for the annotation. In our benchmark, it took about 15 min for a gene set of 20,000 proteins using 64 CPUs. To calculate FAS scores, annotations are required for both the seed species (in your case, human) and the search/query species (the other vertebrate species). Therefore, you still need to run the annotation for all of these species.

Best,
Vinh
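
A rough back-of-the-envelope estimate for ~100 species can be derived from the benchmark quoted above (about 15 min for 20,000 proteins on 64 CPUs). The Python sketch below assumes that runtime grows linearly with the number of proteins and shrinks linearly with the number of CPUs; real scaling is unlikely to be perfectly linear, so treat the result as an order-of-magnitude guide only.

```python
# Benchmark quoted in the thread: ~15 min for 20,000 proteins with 64 CPUs.
BENCH_MINUTES = 15.0
BENCH_PROTEINS = 20_000
BENCH_CPUS = 64


def estimate_minutes(n_proteins: int, n_cpus: int) -> float:
    """Estimated annotation time, ASSUMING linear scaling in proteins and CPUs."""
    return BENCH_MINUTES * (n_proteins / BENCH_PROTEINS) * (BENCH_CPUS / n_cpus)


# Example: 100 species with ~35,000 proteins each (roughly the size of the
# LETRE protein set in the log above), annotated on 64 CPUs.
per_species = estimate_minutes(35_000, 64)
total_hours = 100 * per_species / 60
print(f"{per_species:.1f} min per species, ~{total_hours:.0f} h in total")
```

Under these assumptions, halving the CPU count doubles the estimate, which is why the number of CPUs available matters more than anything else for a 100-species run.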

swttalyan commented on September 27, 2024

Many thanks, Vinh. However, I am not encountering any problems running setupFAS, and all annotation tools are installed. The problem occurs only when adding taxa with fdog.addTaxa.

trvinh commented on September 27, 2024

Did you run the test command for fas.doAnno?

fas.doAnno -i test_annofas.fa -o test_output

Are you using a conda environment? If yes, please make sure that FAS and fDOG are installed in the same environment. If the test command above works, could you please try running fdog.addTaxon for only one species (you can even run it with only the first 10 sequences of that species) with the option -a to skip the annotation? Then manually run fas.doAnno on the protein FASTA file in the searchTaxa_dir folder that fdog.addTaxon has just created. If that works, you can use this approach for all species.
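
For the suggested single-species test, a FASTA file with only the first 10 sequences can be made with a minimal Python sketch (file names are placeholders). It copies whole records, including wrapped sequence lines, stops before the (n+1)-th header, and ignores any text before the first header. The resulting file can then be used as input for fdog.addTaxon with -a, followed by the manual fas.doAnno run.

```python
def head_fasta(in_path: str, out_path: str, n: int = 10) -> int:
    """Copy the first n records of a FASTA file; return how many were written."""
    written = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                if written == n:
                    break  # reached the (n+1)-th header: stop before copying it
                written += 1
            if written > 0:  # skip any text before the first header
                fout.write(line)
    return written
```

For example, `head_fasta("protein1.faa", "protein1.first10.faa")` writes the first 10 records and returns 10, or fewer if the file has fewer sequences.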
