Dear Zechen Tang,
Thank you very much for pointing this out. Could you please specify from which branch you have downloaded fDOG? Please use the branch fdog_goes_assembly (https://github.com/BIONF/fDOG/tree/fdog_goes_assembly). If you did use this branch, please indicate so.
Kind regards,
Ingo Ebersberger
from fdog.
Dear Ingo Ebersberger,
Thank you very much for your response. I downloaded and configured fDOG from the branch you specified (fdog_goes_assembly: https://github.com/BIONF/fDOG/tree/fdog_goes_assembly). However, when I attempted to use it, I encountered an error indicating that the module does not exist. Further investigation suggests that it may have been removed.
Could you please provide further guidance or confirm if the module has indeed been removed?
Thank you for your assistance.
Dear Zechen Tang,
thank you for pointing this out.
The fDOG-Assembly module was removed from the fDOG master branch in an earlier version and now exists only in its own branch (fdog_goes_assembly). Please double-check that you are using the correct branch and the latest version. The following steps install fDOG-Assembly:
# clone the repository
git clone https://github.com/BIONF/fDOG.git
# change into the fDOG folder
cd fDOG
# check out the fDOG-Assembly branch
git checkout --track origin/fdog_goes_assembly
# install fDOG
python setup.py develop
# set up fDOG
fdog.setup -d <directory_for_fDOG_data>
# set up FAS
fas.setup -t </annotation_tools>
Please let me know if you still encounter the same error.
Kind regards,
Hannah
Dear Hannah,
Thank you for your help. I have now managed to include the module by obtaining the correct version from GitHub. However, I have some further questions I hope you can assist with:
I noticed that the --augustusRefSpec AUGUSTUSREFSPEC parameter is required. Does this mean that I need to pre-train Augustus in my environment, or do I need to provide specific commands to train Augustus during the setup? I did not find any specific instructions regarding configuring Augustus.
The --gene option requires alignment and HMM files. When running fdog.run on a protein dataset, the three specified folders (--searchpath /path/to/your/searchTaxa_dir, --corepath /path/to/your/coreTaxa_dir, and --annopath /path/to/your/annotation_dir) do not include HMM and alignment files. However, according to your 2009 publication, these files are necessary. I am unsure whether I missed a step in which the core gene HMM and alignment files should have been generated during fdog.run. Is the process as follows: align the sequences of the orthologous genes; build, train, and calibrate a pHMM using HMMER; and then combine all generated files into one file to be used as input for fdog.run?
Currently, I am running fDOG version 0.1.32 with fdog.assembly version 0.0.1.
Thank you very much for your patience and assistance, and I apologize for any inconvenience.
Kind regards,
Zechen Tang
Dear Zechen Tang,
the current version of fDOG-Assembly is 0.1.5.1. Please make sure to use the latest version, which includes many improvements and new features. Run git pull again, because I pushed some updates recently. You can check the version with:
fdog.assembly --version
Augustus already offers many pre-trained models. To list all of them, run:
augustus --species=help
Pass the identifier of your choice to the --augustusRefSpec parameter.
fDOG (fdog.run) produces, among other outputs, a folder called core_orthologs. This folder contains the alignments and HMM files that fDOG-Assembly needs, already in the correct format. Pass the path to the core_orthologs folder to --coregroupPath, then specify the gene you want to run fDOG-Assembly with using --gene. Make sure the gene name is identical to the name of the subfolder in core_orthologs that holds its data. The reference species you select with --refSpec must be part of the core ortholog group, and its name must match exactly. The data for this reference species must also be present in the fDOG folders searchTaxa_dir, coreTaxa_dir, and annotation_dir (see the GitHub wiki for more information about the fDOG data structure).
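The two naming requirements above (gene name equals subfolder name, reference species is part of the core group) can be checked before starting a run. The following is only a minimal sketch and not part of fDOG: the function name check_core_group is hypothetical, and it assumes that the reference species name appears verbatim somewhere in the files of the gene's subfolder, for example in a FASTA header.

```shell
#!/bin/sh
# Sketch: sanity-check a core_orthologs folder before calling fdog.assembly.
# check_core_group is a hypothetical helper, not an fDOG command.
# Assumption: the reference species name occurs verbatim in at least one
# file below the gene's subfolder (e.g. in a sequence header).

check_core_group() {
    local coregroup_path="$1" gene="$2" refspec="$3"
    local gene_dir="${coregroup_path%/}/${gene}"

    # The gene name must match a subfolder of core_orthologs.
    if [ ! -d "$gene_dir" ]; then
        echo "missing subfolder: $gene_dir" >&2
        return 1
    fi

    # The reference species must be part of the core ortholog group.
    # -F treats the name as a fixed string, so "@" and "." are literal.
    if ! grep -rqF -- "$refspec" "$gene_dir"; then
        echo "reference species $refspec not found in $gene_dir" >&2
        return 2
    fi

    echo "ok: $gene with reference $refspec"
}
```

Called as check_core_group /path/to/core_orthologs <gene> <refSpec>, it returns a non-zero status when either requirement is not met, so a mismatch in naming can be caught before a longer fDOG-Assembly run.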
The last thing you need is a folder containing all the assemblies you want to search in. The assemblies should have the same naming scheme as described in the fDOG wiki. fDOG-Assembly requires a subfolder for each species containing the assembly fasta file. For example, the data structure for the species Drosophila melanogaster would look like the following:
Name: DROME@7227@v1
Folder structure:
assembly_dir
|-DROME@7227@v1
| |-DROME@7227@v1.fa
| |-blast_dir
| | |-DROME@7227@v1.* (ten BLAST database files)
fDOG-Assembly creates the blast_dir automatically if it does not exist. You can also use the script fdog.addAssembly, which generates the required file structure for you.
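If you prefer to create the layout by hand instead of using fdog.addAssembly, something along the following lines might work. This is a rough sketch and not an fDOG command: the function name add_assembly_manually is hypothetical, and the .fa extension for the assembly file is an assumption, so please compare it with the naming scheme in the fDOG wiki.

```shell
#!/bin/sh
# Sketch: place one assembly into the folder structure described above.
# add_assembly_manually is a hypothetical helper, not part of fDOG.
# Assumption: the assembly FASTA is expected as <species>/<species>.fa.

add_assembly_manually() {
    local assembly_dir="$1" species="$2" fasta="$3"
    local target="${assembly_dir%/}/${species}"

    # Refuse to continue if the input FASTA does not exist.
    [ -f "$fasta" ] || { echo "no such file: $fasta" >&2; return 1; }

    # One subfolder per species, named like the species identifier.
    mkdir -p "$target"
    cp -- "$fasta" "$target/${species}.fa"

    # blast_dir is deliberately not created here; according to the text
    # above, fDOG-Assembly computes it when it is missing.
    echo "$target/${species}.fa"
}
```

For the example above, the call would be add_assembly_manually assembly_dir DROME@7227@v1 /path/to/your/assembly.fasta, which prints the path of the copied file.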
I hope things are clearer now. Please don't hesitate to ask if something is unclear or if you encounter any problems.
Kind regards,
Hannah
Dear Hannah,
Thank you very much for your assistance. Upon careful review, I realized that I had previously not selected the correct branch in the git repository. I have now corrected this and am using the following command:
fdog.assembly --gene NAEY1_g34.t1 --refSpec REFSPE ANTNE@642069@240617 --augustusRefSpec Anthocoris_zoui --metaeukDb /datapool/home/yangzc/soft/pacbio/fdog/Pfam/Pfam-hmms/targetDB --coregroupPath /datapool/home/yangzc/soft/OTHER/orth-datasets/other_group/OrthoFinder/Results_Jun04/ortho_braker/NAEY1/core_orthologs --dataPath /datapool/home/yangzc/soft/pacbio/gapfiller-main/Anthocoris/MEGAHIT_MJGY2/ --strict --force
Anthocoris_zoui is an Augustus training set generated previously using transcriptome data via the Braker3 pathway, and NAEY1 is the seed species used for fdog.run.
However, I encountered the following error during execution.
Additionally, I would like to ask if there is currently a way to perform batch gene searches. Specifically, if we use a verified protein dataset to conduct a reverse BLAST using Diamond on an assembled genome, and then extract transcripts using TransDecoder followed by running fdog.run, can we achieve the same goal of batch orthologous gene searches? Based on my understanding, it seems that fdog.assembly does not yet support batch processing of genes.
Thank you very much for your patience and support, and I apologize for any inconvenience.
Kind regards,
Zechen Tang
Dear Zechen Tang,
in fDOG-Assembly we have implemented two different gene prediction methods, namely Augustus and MetaEuk. MetaEuk is currently the default. If you want to use Augustus instead, remove the --metaEukDB parameter and add --augustus together with the --augustusRefSpec parameter.
There was indeed a bug when using --strict, thank you. It is now fixed, so please update fDOG-Assembly. Nevertheless, I want to mention that --strict can decrease the number of orthologs reported by fDOG-Assembly, or lead to no ortholog being reported at all. I recommend running fDOG-Assembly without --strict and choosing as reference (--refSpec) the species most closely related to your species under investigation.
Currently, a batch gene search is not implemented in fDOG-Assembly. Maybe we can deliver that in a future update.
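Until then, a plain shell loop that calls fdog.assembly once per gene may serve as a stopgap. The following is only a sketch under assumptions, not a feature of fDOG-Assembly: the function name run_assembly_batch and the DRY_RUN variable are hypothetical, every subfolder of core_orthologs is treated as one gene, and only options already mentioned in this thread are used (here the Augustus variant). By default it just prints the commands so you can inspect them first.

```shell
#!/bin/sh
# Sketch: run fdog.assembly once for every gene subfolder in core_orthologs.
# run_assembly_batch and DRY_RUN are hypothetical names, not part of fDOG.
# Assumption: each subfolder of core_orthologs corresponds to one gene.

run_assembly_batch() {
    local coregroup_path="$1" data_path="$2" refspec="$3" augustus_refspec="$4"
    local gene_dir gene

    for gene_dir in "${coregroup_path%/}"/*/; do
        # Skips the unexpanded pattern when the folder has no subfolders.
        [ -d "$gene_dir" ] || continue
        gene="$(basename "$gene_dir")"

        # Build the command as positional parameters to keep quoting intact.
        set -- fdog.assembly --gene "$gene" --refSpec "$refspec" \
            --augustus --augustusRefSpec "$augustus_refspec" \
            --coregroupPath "$coregroup_path" --dataPath "$data_path"

        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print what would be executed.
            echo "$*"
        else
            # Keep going with the next gene if one run fails.
            "$@" || echo "fdog.assembly failed for $gene" >&2
        fi
    done
}
```

Running run_assembly_batch /path/to/core_orthologs /path/to/data <refSpec> <augustusRefSpec> prints one command per gene; setting DRY_RUN=0 executes them one after another. The runs are sequential, so this does not save computation time compared with starting each gene by hand.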
Kind regards,
Hannah
Dear Hannah,
Thank you very much for your detailed response and the provided information. Based on your guidance, I will make the following adjustments to my command:
Remove the --metaEukDB parameter and replace it with --augustus.
Add the --augustus parameter alongside --augustusRefSpec.
I have also noted your recommendation about not using the --strict parameter, as it can decrease the number of reported orthologs or lead to no orthologs being reported at all. I will update fDOG-Assembly to the latest version to ensure the bug fix is applied.
Regarding batch gene search, thank you for clarifying that it is not currently implemented in fDOG-Assembly. I look forward to any future updates that might include this feature.
Thank you again for your patience and support.
Kind regards,
Zechen Tang