Comments (8)

swttalyan commented on June 29, 2024

Now, with the separate steps, it seems to be working.
Doing annotation for /workspace/fdog/searchTaxa_dir/LETRE@7753@240123/LETRE@[email protected]... 1%| | 244/35015 [00:55<1:05:03, 8.91it/s]
Let's see. I will continue this way. Yes, I am using a conda env and have both FAS and fDOG installed. Thanks, hopefully it runs smoothly from here on to get the phyloprofiles.

from fdog.

swttalyan commented on June 29, 2024

I would like to use all the species listed at NCBI under "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/".

trvinh commented on June 29, 2024

Hi @swttalyan,
I need more information to reproduce the issue, since I tested both functions with my data and they worked. I used the same command that you tried: fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c. Both -i Fasta_files and -i Fasta_files/ will work, but -i Fasta_files/* will not.

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c
WARNING: 9607 not found in NCBI taxonomy database!
WARNING: rank of 9605 is not SPECIES (genus)
WARNING: 9605 not found in NCBI taxonomy database!
Parsing genome for 2 species...
  0%|                                                                                                                                  | 0/2 [00:00<?, ?it/s]WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected] file for details!
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected] file for details!
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 28.24it/s]

Creating Blast DB for 2 species...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.16it/s]
PID 2837391
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:07<00:00,  5.91it/s]
Finished in 8.059s
PID 2839315
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00,  5.43it/s]
Finished in 8.564s
==> Adding 2 taxa finished in 20.164s
==> Output for UNK9605@9605@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir

My Mapping_file.txt looks like this:

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ head Mapping_file.txt
#filename	tax_id
test.extended.fa	9607
tr_Q90XQ7_DANRE.extended.fa	9605

And this is an example with fdog.addTaxon:

vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxon -f Fasta_files/tr_Q90XQ7_DANRE.extended.fa -i 9999 -c
NCBI taxon info: 9999 Urocitellus parryii
Species name	UROPA@9999@240119
Parsing FASTA file...
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected] file for details!

Creating Blast DB...


Building a new DB, current time: 01/19/2024 14:56:35
New DB name:   /home/vinh/fdog_data_2023/coreTaxa_dir/UROPA@9999@240119/UROPA@9999@240119
New DB title:  /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 45 sequences in 0.0176709 seconds.


PID 2846204
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00,  5.49it/s]
Finished in 8.469s

==> Output for UROPA@9999@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir

Which fDOG version are you using? Please make sure to install the latest one (v0.1.26).
Did you put all the _protein.faa files into your Fasta_files folder? Does that folder contain only the FASTA files that you list in your Mapping_file.txt? How do you know that it doesn't read all the FASTA files? Please show me the log file with the error message.
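
To answer the folder question above quickly, here is a minimal Python sketch that compares the contents of the FASTA folder with the mapping file. It assumes the tab-separated layout shown in the head output above (a '#filename<TAB>tax_id' header followed by one file per line). The function names are hypothetical helpers and are not part of fDOG.

```python
from pathlib import Path


def read_mapping(mapping_path):
    """Return {filename: tax_id} from a tab-separated mapping file.

    Blank lines and lines starting with '#' (such as the header) are skipped.
    """
    mapping = {}
    for line in Path(mapping_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"Expected 'filename<TAB>tax_id', got: {line!r}")
        mapping[fields[0]] = fields[1]
    return mapping


def compare_folder_to_mapping(fasta_dir, mapping_path):
    """Return (files in the folder but not in the mapping,
    mapping entries with no matching file in the folder)."""
    listed = set(read_mapping(mapping_path))
    present = {p.name for p in Path(fasta_dir).iterdir() if p.is_file()}
    return sorted(present - listed), sorted(listed - present)
```

Calling compare_folder_to_mapping("Fasta_files", "Mapping_file.txt") returns two lists; if both are empty, the folder and the mapping file agree.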

Best,
Vinh

swttalyan commented on June 29, 2024

Many thanks for your response. Yes, I am using the latest version of fDOG (v0.1.26).
I managed to get it running. There are a lot of conditions on the naming of the FASTA files: the name shouldn't contain more than one "_" or any special characters, so I renamed all FASTA files to protein1.faa, protein2.faa, and so on. It ran until the BLAST DB was created, and now I am getting the following error:

/home/user/annotation_tools/COILS2/COILS2 -f < "/workspace/fdog/annotation_dir/tmp/2745623/LETRE@7753@240123_XP_061403495_1.fa" Error running
This happens for all the sequences and species.
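
For reference, the renaming described above could be scripted roughly as follows. This is only a sketch: the protein1.faa, protein2.faa naming comes from the comment, while the function names and the rename log are hypothetical additions so that the original file names (and therefore the species) can still be traced afterwards.

```python
from pathlib import Path


def plan_renames(fasta_dir, suffix=".faa", prefix="protein"):
    """Pair each FASTA file in fasta_dir with a simple sequential name.

    Files are sorted by name so the numbering is reproducible.
    """
    files = sorted(
        p for p in Path(fasta_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    return [
        (old, old.with_name(f"{prefix}{i}{suffix}"))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(plan, log_path):
    """Rename the files and record old and new names in a tab-separated log."""
    with open(log_path, "w") as log:
        log.write("#old_name\tnew_name\n")
        for old, new in plan:
            if new.exists():
                # Refuse to overwrite a file that already has the target name.
                raise FileExistsError(f"{new} already exists")
            old.rename(new)
            log.write(f"{old.name}\t{new.name}\n")
```

The log makes it possible to fill in the tax_id column of the mapping file for the new names without guessing which species each proteinN.faa came from.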

swttalyan commented on June 29, 2024

Also, can I know how much time it will take to add approximately 100 vertebrate species? I am only interested in getting the FAS score for a human seed protein across all vertebrate species, so is this COILS2 step required for calculating the phyloprofile?

NOTE: I have created a fresh environment and installed all requirements.

trvinh commented on June 29, 2024

Hi @swttalyan,

If you encounter a problem with COILS2, please check the FAQs of FAS. Please make sure to test FAS before using fDOG.

The most time-consuming step is the feature annotation. The runtime depends on the number of CPUs used for the annotation. In our benchmark, it took about 15 min for a gene set with 20,000 proteins using 64 CPUs. To calculate the FAS scores, annotations are required for both the seed species (in your case, human) and the search/query species (the other vertebrate species). Therefore, you still need to run the annotation for every species.
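
As a rough back-of-the-envelope estimate, the figures in this thread can be plugged into a small Python sketch. It assumes the annotation time scales linearly with the number of proteins and that species are annotated one after another; both are simplifying assumptions, and the helper names are hypothetical.

```python
def annotation_seconds(n_proteins, proteins_per_second):
    """Estimated annotation time, assuming a constant throughput."""
    return n_proteins / proteins_per_second


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Throughput implied by the benchmark: 20,000 proteins in 15 min on 64 CPUs.
benchmark_rate = 20_000 / (15 * 60)

# 100 species of about 20,000 proteins each at the benchmark rate.
print(format_hms(annotation_seconds(100 * 20_000, benchmark_rate)))

# One species at the rate seen in the progress bar quoted in this thread
# (35,015 proteins at 8.91 proteins per second).
print(format_hms(annotation_seconds(35_015, 8.91)))
```

Under these assumptions, 100 species of benchmark size come to about 25 hours on 64 CPUs, and a single 35,015-protein species at 8.91 it/s takes a little over an hour, which is consistent with the progress bar.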

Best,
Vinh

swttalyan commented on June 29, 2024

Many thanks, Vinh. Unfortunately, I encounter no problems running setupFAS, and all annotation tools are installed; the problem occurs only while adding taxa with fdog.addTaxa.

trvinh commented on June 29, 2024

Did you run the test command for fas.doAnno?

fas.doAnno -i test_annofas.fa -o test_output

Are you using a conda environment? If yes, please make sure that you have FAS and fDOG installed in the same environment. If the test command above worked, could you please try running fdog.addTaxon for only one species (you can even run it with only the first 10 sequences of a species) with the option -a to skip the annotation. Then manually run fas.doAnno on the protein FASTA file in the searchTaxa_dir folder, which has just been created by fdog.addTaxon. If it works, you can use this solution for all species.
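
The two-step workaround above could be wrapped in a small Python sketch like the one below. The flags are only the ones that appear in this thread (-f, -i, -c and -a for fdog.addTaxon; -i and -o for fas.doAnno). The layout <searchTaxa_dir>/<taxon name>/<taxon name>.fa and the use of the annotation folder as the fas.doAnno output are assumptions, and all function names are hypothetical.

```python
import subprocess
from pathlib import Path


def add_taxon_command(fasta_file, tax_id):
    """fdog.addTaxon call as used in this thread, plus -a to skip annotation."""
    return ["fdog.addTaxon", "-f", str(fasta_file), "-i", str(tax_id), "-c", "-a"]


def do_anno_command(fasta_file, out_dir):
    """fas.doAnno call with the -i/-o options shown in the test command."""
    return ["fas.doAnno", "-i", str(fasta_file), "-o", str(out_dir)]


def find_search_fasta(search_taxa_dir, tax_id):
    """Find the FASTA file that fdog.addTaxon wrote for this taxonomy ID.

    ASSUMPTION: the file is <NAME>@<tax_id>@<version>/<same name>.fa.
    """
    for taxon_dir in sorted(Path(search_taxa_dir).glob(f"*@{tax_id}@*")):
        candidate = taxon_dir / f"{taxon_dir.name}.fa"
        if candidate.is_file():
            return candidate
    return None


def add_then_annotate(fasta_file, tax_id, search_taxa_dir, annotation_dir):
    """Add one taxon without annotation, then annotate it in a separate step."""
    subprocess.run(add_taxon_command(fasta_file, tax_id), check=True)
    fasta = find_search_fasta(search_taxa_dir, tax_id)
    if fasta is None:
        raise FileNotFoundError(f"No FASTA found for taxon {tax_id}")
    subprocess.run(do_anno_command(fasta, annotation_dir), check=True)
```

Running the steps separately also makes it easier to see whether a failure comes from fdog.addTaxon itself or from the annotation tools, since check=True stops at the first command that exits with an error.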
