Comments (8)
Now, with the separate steps, it seems to be working.
Doing annotation for /workspace/fdog/searchTaxa_dir/LETRE@7753@240123/LETRE@[email protected]... 1%| | 244/35015 [00:55<1:05:03, 8.91it/s]
Let's see; I will continue this way. Yes, I am using a conda env and have both FAS and fDOG installed. Thanks, hopefully it runs smoothly after this for getting the phyloprofiles.
from fdog.
I would like to use all the species listed at NCBI under "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_other/"
Hi @swttalyan,
I need more information in order to reproduce the issue, since I have tested both functions with my data and they still worked. I used the same command that you tried: fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c. Both -i Fasta_files and -i Fasta_files/ will work, but not -i Fasta_files/*.
vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxa -i Fasta_files -m Mapping_file.txt -c
WARNING: 9607 not found in NCBI taxonomy database!
WARNING: rank of 9605 is not SPECIES (genus)
WARNING: 9605 not found in NCBI taxonomy database!
Parsing genome for 2 species...
0%| | 0/2 [00:00<?, ?it/s]WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected] file for details!
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected] file for details!
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 28.24it/s]
Creating Blast DB for 2 species...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 7.16it/s]
PID 2837391
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9607@9607@240119/UNK9607@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:07<00:00, 5.91it/s]
Finished in 8.059s
PID 2839315
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UNK9605@9605@240119/UNK9605@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00, 5.43it/s]
Finished in 8.564s
==> Adding 2 taxa finished in 20.164s
==> Output for UNK9605@9605@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir
My Mapping_file.txt looks like this:
vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ head Mapping_file.txt
#filename tax_id
test.extended.fa 9607
tr_Q90XQ7_DANRE.extended.fa 9605
And this is an example with fdog.addTaxon:
vinh@wks15:/share/project/vinh/test/fdog/addTaxa$ fdog.addTaxon -f Fasta_files/tr_Q90XQ7_DANRE.extended.fa -i 9999 -c
NCBI taxon info: 9999 Urocitellus parryii
Species name UROPA@9999@240119
Parsing FASTA file...
WARNING: Sequence IDs contain pipe(s). They will be replaced by "_"!
Please check the /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected] file for details!
Creating Blast DB...
Building a new DB, current time: 01/19/2024 14:56:35
New DB name: /home/vinh/fdog_data_2023/coreTaxa_dir/UROPA@9999@240119/UROPA@9999@240119
New DB title: /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 45 sequences in 0.0176709 seconds.
PID 2846204
Doing annotation for /home/vinh/fdog_data_2023/searchTaxa_dir/UROPA@9999@240119/UROPA@[email protected]...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:08<00:00, 5.49it/s]
Finished in 8.469s
==> Output for UROPA@9999@240119 can be found in /home/vinh/fdog_data_2023/searchTaxa_dir, /home/vinh/fdog_data_2023/coreTaxa_dir, /home/vinh/fdog_data_2023/annotation_dir
Which fDOG version are you using? Please make sure to install the latest one (v0.1.26).
Did you put all the _protein.faa files into your Fasta_files folder? Does that folder contain only the fasta files that you mention in your Mapping_file.txt? How do you know that it doesn't read all the fasta files? Please show me the log file with the error message.
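One way to answer the folder question is to compare the folder contents with the mapping file. The following is only a minimal sketch, not part of fDOG: the function names are hypothetical, and it assumes the mapping file has the layout shown above (a `#filename tax_id` header, then one filename and one taxonomy ID per line, separated by whitespace).

```python
from pathlib import Path


def read_mapping(mapping_path):
    """Return {filename: tax_id} from the mapping file.

    Assumption: header lines start with '#', and the tax ID is the last
    whitespace-separated field on each line.
    """
    mapping = {}
    for line in Path(mapping_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.rsplit(maxsplit=1)
        if len(fields) != 2:
            raise ValueError(f"Cannot parse mapping line: {line!r}")
        filename, tax_id = fields
        mapping[filename] = tax_id
    return mapping


def compare_folder_to_mapping(fasta_dir, mapping_path):
    """Return (files on disk but not listed, files listed but not on disk)."""
    listed = set(read_mapping(mapping_path))
    on_disk = {p.name for p in Path(fasta_dir).iterdir() if p.is_file()}
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

Calling `compare_folder_to_mapping("Fasta_files", "Mapping_file.txt")` returns two lists; if both are empty, the folder and the mapping file agree.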
Best,
Vinh
Many thanks for your response. Yes, I am using the latest version of fDOG (v0.1.26).
I managed to get it running. There are a lot of conditions on the naming of the fasta files: the name shouldn't contain more than one "_" or any special characters, so I renamed all fasta files to protein1.faa, protein2.faa, and so on. It ran until creating the BLAST DB, and now I am getting the following error:
/home/user/annotation_tools/COILS2/COILS2 -f < "/workspace/fdog/annotation_dir/tmp/2745623/LETRE@7753@240123_XP_061403495_1.fa" Error running
for all the sequences and species.
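The renaming workaround described above could be scripted roughly as follows. This is a sketch under assumptions: the function name, the `protein` prefix, and the choice to copy into a separate folder (rather than rename in place) are illustrative, and the exact naming rules that fDOG enforces are not spelled out in this thread.

```python
import shutil
from pathlib import Path


def copy_with_simple_names(src_dir, dst_dir, prefix="protein", suffix=".faa"):
    """Copy every *suffix file into dst_dir as protein1.faa, protein2.faa, ...

    Returns {new_name: original_name} so the original names can be recovered
    later, e.g. when writing the mapping file.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Sort so the numbering is reproducible between runs.
    sources = sorted(
        p for p in src_dir.iterdir() if p.is_file() and p.name.endswith(suffix)
    )
    renamed = {}
    for index, source in enumerate(sources, start=1):
        new_name = f"{prefix}{index}{suffix}"
        shutil.copy2(source, dst_dir / new_name)
        renamed[new_name] = source.name
    return renamed
```

Copying instead of renaming keeps the downloaded files untouched, and the returned dictionary records which simplified name belongs to which original file.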
Also, can I know how much time it will take to add approximately 100 vertebrate species? I am only interested in getting the FAS score for a human seed protein from all vertebrate species, so is this COILS2 step required when calculating the phyloprofile?
NOTE: I have created a fresh environment and installed all requirements.
Hi @swttalyan,
if you encountered a problem with COILS2, please check the FAQs of FAS. Please make sure to test FAS before using fDOG.
The most time-consuming step is the feature annotation. The runtime depends on the number of CPUs used for the annotation. In our benchmark, it took about 15 min for a gene set with 20,000 proteins using 64 CPUs. To calculate the FAS scores, annotations are required for both the seed species (in your case, human) and the search/query species (the other vertebrate species). Therefore, you still need to run the annotation for each species.
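As a rough back-of-the-envelope estimate based on the benchmark figure above, the arithmetic can be sketched like this. It assumes that runtime scales linearly with the number of proteins and that the same 64 CPUs are available; neither assumption was benchmarked here, and the function name is hypothetical.

```python
def estimate_annotation_minutes(
    n_species,
    proteins_per_species,
    benchmark_minutes=15.0,      # from the benchmark quoted above
    benchmark_proteins=20_000,   # gene set size in that benchmark
):
    """Linear extrapolation of annotation time; a rough guide only."""
    total_proteins = n_species * proteins_per_species
    return total_proteins / benchmark_proteins * benchmark_minutes


# 100 species with about 20,000 proteins each
minutes = estimate_annotation_minutes(100, 20_000)
print(f"{minutes:.0f} min (~{minutes / 60:.0f} h)")  # 1500 min (~25 h)
```

With fewer CPUs the annotation will take correspondingly longer, so this number should be read as a lower bound for a smaller machine.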
Best,
Vinh
Many thanks, Vinh. Unfortunately, I encounter no problems when running setupFAS, and all annotation tools are installed; the problem occurs only while adding taxa with fdog.addTaxa.
Did you run the test command for fas.doAnno?
fas.doAnno -i test_annofas.fa -o test_output
Are you using a conda environment? If yes, please make sure that you have FAS and fDOG installed in the same environment. If the test command above worked, could you please try running fdog.addTaxon for only one species (you can even run it with only the first 10 sequences of a species) with the option -a to skip the annotation. Then manually run fas.doAnno using the protein fasta file in the searchTaxa_dir folder, which has just been created by fdog.addTaxon. If it works, you can use this solution for all species.
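To prepare the small test input suggested above (only the first 10 sequences of a species), something like the following could be used. It is a minimal sketch with hypothetical function names, and it assumes a standard FASTA file in which every record starts with a `>` header line.

```python
def first_fasta_records(lines, n=10):
    """Return the lines belonging to the first n FASTA records."""
    kept = []
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        # Skip anything that appears before the first header line.
        if seen >= 1:
            kept.append(line)
    return kept


def write_fasta_subset(src_path, dst_path, n=10):
    """Write the first n records of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(first_fasta_records(src, n))
```

The resulting subset file can then be passed to fdog.addTaxon via -f as in the example earlier in this thread, together with -a, before running fas.doAnno on the file created in searchTaxa_dir.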