dyogenibens / agora

Algorithm For Gene Order Reconstruction in Ancestors

License: Other

Python 99.08% Shell 0.71% Dockerfile 0.21%

ancestral-chromosomal-karyotype ancestral-genome-reconstruction bioinformatics evolution

agora's Issues

Error Running agora-basic.py: "assert oldName not in seen"

Hello,

I am trying to run Agora using my own data (the example worked with no issues). This is the command I tried to run: ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list

(agora) [theillere@Escalante3 Single_Copy_Orthologue_Sequences]$ ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list

| Key | Values |

| speciesTree | species-tree.nwk |
| geneTrees|orthologyGroups | orthologyGroups/orthologyGroups.%s.list |
| genes | genes/genes.%s.list |
| target | |
| extantSpeciesFilter | |
| compress | bz2 |
| workingDir | . |
| nbThreads | 24 |
| forceRerun | False |
| sequential | True |

New task 0 ('ancgenes', 'all')
[]
Command(args=['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'], out='GeneTreeForest.withAncGenes.nhx.bz2', log='ancGenes/ancGenes.log')

New task 1 ('pairwise', 'ancgenes-all')
[('ancgenes', 'all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.pairwise-conservedPairs.py', 'species-tree.nwk', 'NAME_0', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-OUT.pairwise=pairwise/pairs-all/%s.list.bz2'], out=None, log='pairwise/pairs-all/log')

New task 2 ('integr', 'denovo-all')
[('pairwise', 'ancgenes-all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.integr-denovo.py', 'species-tree.nwk', 'NAME_0', '+searchLoops', '-OUT.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', 'pairwise/pairs-all/%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all/log')

New task 3 ('integr', 'denovo-all.scaffolds')
[('integr', 'denovo-all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.integr-scaffolds.py', 'species-tree.nwk', 'NAME_0', '-OUT.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-IN.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all.scaffolds/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all.scaffolds/log')

New task 4 ('conversion', 'basic-workflow')
[('integr', 'denovo-all.scaffolds')]
Command(args=['/home/theillere/Agora/src/convert.ancGenomes.blocks-to-genes.py', 'species-tree.nwk', 'NAME_0', '+orderBySize', '-IN.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.ancGenomes=ancGenomes/basic-workflow/ancGenome.%s.list.bz2'], out=None, log='ancGenomes/basic-workflow/log')

Status: 5 to do, 0 running, 0 done, 0 failed -- 5 total
Available tasks: [0]
Control file ancGenes/ancGenes.log.agora missing
Launching task 0 ['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'] > GeneTreeForest.withAncGenes.nhx.bz2 2> ancGenes/ancGenes.log
Status: 4 to do, 1 running, 0 done, 0 failed -- 5 total
Waiting ...
task 0 report: 0.106603 sec CPU time / 0.107803 sec elapsed = 98.8865% CPU usage, 17.625 MB RAM
task 0 is now finished (status 1)

Inspect ancGenes/ancGenes.log for more information
Status: 4 to do, 0 running, 0 done, 1 failed -- 5 total
Available tasks: []
Workflow stopped because of failures
Workflow report: 0.114315 sec CPU time / 0.115183 sec elapsed = 99.2463% CPU usage, 18.0391 MB RAM
(agora) [theillere@Escalante3 Single_Copy_Orthologue_Sequences]$

Here is the input data that I'm working with: https://www.dropbox.com/scl/fo/en4rlnwvvnspv9sj51d3u/h?dl=0&rlkey=ybt2vi7hi09xfgnp2uuw85oz7

Please let me know if you have any insight as to how I can solve this issue. I'm also attaching the log file.
Thanks!

Agora_Log.txt
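
The assertion named in the title ("assert oldName not in seen") suggests that some gene name was encountered twice. A minimal sketch for checking this before running the pipeline is given here. It assumes the gene lists are tab-separated with the gene name(s) in the fifth column; `find_duplicate_names` and the sample rows are hypothetical, not part of Agora.

```python
import collections


def find_duplicate_names(lines, name_column=4):
    """Return gene names that occur more than once in a tab-separated gene list.

    Assumption: names sit in column index 4 (the fifth column) and a cell
    may hold several space-separated names.
    """
    counts = collections.Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > name_column:
            counts.update(fields[name_column].split())
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up rows for illustration only.
example = [
    "chr1\t100\t200\t1\tgeneA\n",
    "chr1\t300\t400\t-1\tgeneB\n",
    "chr2\t100\t200\t1\tgeneA\n",
]
print(find_duplicate_names(example))
```

To check a real file, pass an open file handle instead of the example list, for instance `find_duplicate_names(open("genes/genes.SPECIES.list"))` with the species name filled in.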

Error when running agora-basic.py

Hello,
I got an error in the log file under the ancGenomes/basic-workflow folder after running agora-basic.py:

Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A8.list.bz2 ... (Ensembl) OK
Loading genome of ancGenes/all/ancGenes.A8.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A9.list.bz2 ... (Ensembl) OK
Loading genome of ancGenes/all/ancGenes.A9.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A5.list.bz2 ... (ancestral genome: diags) OK
Loading genome of ancGenes/all/ancGenes.A5.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A4.list.bz2 ... (ancestral genome: diags) OK
Loading genome of ancGenes/all/ancGenes.A4.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A2.list.bz2 ... (ancestral genome: diags) OK
Loading genome of ancGenes/all/ancGenes.A2.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A1.list.bz2 ... (ancestral genome: diags) OK
Loading genome of ancGenes/all/ancGenes.A1.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A7.list.bz2 ... (Ensembl) OK
Loading genome of ancGenes/all/ancGenes.A7.list.bz2 ... (ancestral genes) OK
Loading genome of ancBlocks/denovo-all.scaffolds/blocks.A3.list.bz2 ... (ancestral genome: diags) OK
Loading genome of ancGenes/all/ancGenes.A3.list.bz2 ... (ancestral genes) OK
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/public/renyifan/miniconda3/envs/agora/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/public/renyifan/miniconda3/envs/agora/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/public/renyifan/biosoft/Agora-master/src/convert.ancGenomes.blocks-to-genes.py", line 69, in do
    print(utils.myFile.myTSV.printLine([names[s], gene.beginning, gene.end, gene.strand, " ".join(ancGenes[gene.names[0]].names)]), file=ancGenomeFile)
TypeError: list indices must be integers or slices, not str
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/renyifan/biosoft/Agora-master/src/convert.ancGenomes.blocks-to-genes.py", line 74, in <module>
    multiprocessing.Pool(n_cpu).map(do, sorted(targets))
  File "/public/renyifan/miniconda3/envs/agora/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/public/renyifan/miniconda3/envs/agora/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: list indices must be integers or slices, not str

I find that the results for A1 to A6 are written successfully under ancGenomes, but the output files for A7 to A9 are empty. I don't think the cause is a mismatch of gene names between the gene lists and the orthogroup lists, because A6 was output successfully, so I wonder where the problem is.

Here's my phylogeny tree:

Thanks for your reply!
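
One detail in the log may be worth checking: the block files for A7, A8 and A9 are reported as "(Ensembl)", whereas A1 to A5 are reported as "(ancestral genome: diags)", and the `TypeError` says a list was indexed with a string. A minimal sketch for comparing the first lines of a working and a failing block file follows. The helper `head_bz2` is hypothetical, and whether the two files really differ in layout is an assumption to verify, not a confirmed diagnosis.

```python
import bz2
import os


def head_bz2(path, n=3):
    """Return the first n lines of a bz2-compressed text file."""
    lines = []
    with bz2.open(path, "rt") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= n:
                break
    return lines


# Paths follow the pattern shown in the log; adjust to your working directory.
for name in ("A5", "A7"):
    path = "ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2" % name
    if os.path.exists(path):
        print(name, head_bz2(path))
```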

Error using misc.compareGenomes.py

Hello,

I am having a similar but different error to #3 when running misc.compareGenomes.py:

agora_input/genes/genes.Pieris_brassicae.list.bz2 \
agora_input/genes/genes.Pieris_napi.list.bz2 \
output/ancGenes/all/ancGenes.anc_7.list.bz2 \
-mode=drawKaryotype -minChrSize=100 > karyotype_Pieris_brassicae_napi

--------------------------------------------------------------------------------
| Key                      | Values                                            |
--------------------------------------------------------------------------------
| studiedGenome            | agora_input/genes/genes.Pieris_brassicae.list.bz2 |
| referenceGenome          | agora_input/genes/genes.Pieris_napi.list.bz2      |
| orthologuesList          | output/ancGenes/all/ancGenes.anc_7.list.bz2       |
| includeGaps              | False                                             |
| includeScaffolds         | True                                              |
| includeRandoms           | False                                             |
| includeNones             | False                                             |
| reverse                  | False                                             |
| mode                     | drawKaryotype                                     |
| orthoslist:fullgenenames | False                                             |
| orthoschr:minHomology    | 90                                                |
| minChrSize               | 500                                               |
| matrix:scaleY            | False                                             |
| matrix:pointSize         | -1                                                |
| sortBySize               | False                                             |
| matrix:colorFile         |                                                   |
| matrix:defaultColor      | black                                             |
| matrix:penColor          | black                                             |
| karyo:landscape          | False                                             |
| ps:backgroundColor       |                                                   |
| karyo:roundedChr         | False                                             |
| karyo:resolution         | 1                                                 |
| karyo:showText           | True                                              |
| karyo:drawBorder         | False                                             |
| karyo:defaultColor       |                                                   |
| karyo:penColor           | black                                             |
--------------------------------------------------------------------------------
Loading genome of output/ancGenes/all/ancGenes.anc_7.list.bz2 ... (ancestral genes) OK
Loading genome of agora_input/genes/genes.Pieris_brassicae.list.bz2 ... (Ensembl) OK
Loading genome of agora_input/genes/genes.Pieris_napi.list.bz2 ... (Ensembl) OK
15 25
Affichage ... Traceback (most recent call last):
  File "/lustre/Agora/src/misc.compareGenomes.py", line 268, in <module>
    eval(str(arguments["mode"]))()
  File "/lustre/Agora/src/misc.compareGenomes.py", line 180, in drawKaryotype
    utils.myKaryoDrawer.drawKaryo(data, arguments, x0=1, y0=1, lx=lx-2, ly=ly-2, bysize=arguments["sortBySize"], color_chrom=color_chrom)
  File "/lustre/Agora/src/utils/myKaryoDrawer.py", line 35, in drawKaryo
    dy = ly / float(max([len(x[0]) for (_,x) in data]))
ZeroDivisionError: float division by zero

I also get the same error when trying to use 'drawMatrix' mode.

I'd appreciate any advice in getting this to run successfully!

Thanks!

Charlotte

ValueError when running misc.compareGenomes.py

Hello,

I got a ValueError when running misc.compareGenomes.py with my personal data:

./misc.compareGenomes.py results/ancGenomes/plants-workflow/ancGenome.N3.list results/ancGenomes/plants-workflow/ancGenome.N0.list results/ancGenes/all/ancGenes.N0.list -mode=drawKaryotype -minChrSize=100 +sortBySize > N3.ps

Loading genome of results/ancGenes/all/ancGenes.N0.list ... (ancestral genes) OK
Loading genome of results/ancGenomes/plants-workflow/ancGenome.N3.list ... (Ensembl) OK
Loading genome of results/ancGenomes/plants-workflow/ancGenome.N0.list ... (Ensembl) OK
36116 28048
Affichage ... Traceback (most recent call last):
  File "/home/chensy/lxw/lxw_new/Agora/src/./misc.compareGenomes.py", line 268, in <module>
    eval(str(arguments["mode"]))()
  File "/home/chensy/lxw/lxw_new/Agora/src/./misc.compareGenomes.py", line 180, in drawKaryotype
    utils.myKaryoDrawer.drawKaryo(data, arguments, x0=1, y0=1, lx=lx-2, ly=ly-2, bysize=arguments["sortBySize"], color_chrom=color_chrom)
  File "/home/chensy/lxw/lxw_new/Agora/src/utils/myKaryoDrawer.py", line 35, in drawKaryo
    dy = ly / float(max([len(x[0]) for (_,x) in data]))
ValueError: max() arg is an empty sequence

Part of my species_tree:

But it runs successfully on this command:

./misc.compareGenomes.py results/ancGenomes/plants-workflow/ancGenome.N1.list results/ancGenomes/plants-workflow/ancGenome.N0.list results/ancGenes/all/ancGenes.N0.list -mode=drawKaryotype -minChrSize=100 +sortBySize > N1.ps

Thanks for your reply!
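
Both this `ValueError` and the `ZeroDivisionError` in the previous report come from the same expression, `ly / float(max([len(x[0]) for (_,x) in data]))`. In plain Python, `max()` raises `ValueError` when `data` is empty, and the division fails when every `x[0]` is empty. The sketch below reproduces that expression with explicit checks. `column_height` is a hypothetical helper, not a patch to Agora, and the shape of `data` is inferred from the traceback only.

```python
def column_height(data, ly):
    """Same computation as the failing line, with clearer error messages.

    Assumption: data is a sequence of (name, x) pairs where x[0] is a
    sequence of genes, as the traceback expression implies.
    """
    lengths = [len(x[0]) for (_, x) in data]
    if not lengths:
        raise ValueError("no chromosome left to draw: data is empty")
    longest = max(lengths)
    if longest == 0:
        raise ValueError("all chromosomes left to draw contain zero genes")
    return ly / float(longest)


# Made-up input for illustration only.
data = [("chr1", (["g1", "g2", "g3"],)), ("chr2", (["g4"],))]
print(column_height(data, 30.0))
```

If either message appears with real input, the chromosomes were filtered out or left empty before drawing, so the inputs to that step are the place to look.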

Too many contiguous ancestral regions (CARs)

Hi!
I got 260 CARs. I have tried both agora-plants.py and agora-vertebrates.py, and I used orthogroups as input.
(The genus I studied has only eight species...)

Is it because the number of species is too low, and is there anything I can do to improve this result?

Plotting agora results plus a general question

Hello, very nice and useful pipeline. I finished a run using orthogroups, and I am very interested to know how you approach the plotting of the ancestral genome outputs.

Thank you again,
Orestis

getting error - AssertionError ALL.reformatGeneFamilies.py

Dear all,

I'm trying to run your algorithm using my own data but I got this error:

agora-basic.py ../synteny/syntree.nwk ../synteny/orthologyGroups/orthologyGroups.%s.list ../synteny/genes/genes.%s.list

Inspect ancGenes/ancGenes.log for more information
Status: 4 to do, 0 running, 0 done, 1 failed -- 5 total

Then, when I open that file for more information, this is the message:



------------------------------------------------------------------------
| Key               | Values                                           |
------------------------------------------------------------------------
| speciesTree       | ../synteny/syntree.nwk                           |
| orthologyGroups   | ../synteny/orthologyGroups/orthologyGroups.%s.list |
| IN.genesFiles     | ../synteny/genes/genes.%s.list                   |
| OUT.ancGenesFiles | ancGenes/all/ancGenes.%s.list.bz2                |
| OUT.genesFiles    | genes/genes.%s.list.bz2                          |
------------------------------------------------------------------------
Renaming the genes of M1 ... 4981 OK
Renaming the genes of M10 ... 5539 OK
Renaming the genes of M11 ... 5233 OK
Renaming the genes of M12 ... 5425 OK
Renaming the genes of M13 ... 4778 OK
Renaming the genes of M14 ... 5384 OK
Renaming the genes of M15 ... 5057 OK
Renaming the genes of M16 ... 4767 OK
Renaming the genes of M17 ... 5460 OK
Renaming the genes of M18 ... 6007 OK
Renaming the genes of M2 ... 4891 OK
Renaming the genes of M3 ... 4975 OK
Renaming the genes of M4 ... 4793 OK
Renaming the genes of M5 ... 5166 OK
Renaming the genes of M6 ... 5516 OK
Renaming the genes of M7 ... 5411 OK
Renaming the genes of M8 ... 5285 OK
Renaming the genes of M9 ... 4716 OK
Renaming the genes of A1 ... SKIPPING
Renaming the genes of A11 ... SKIPPING
Renaming the genes of A13 ... SKIPPING
Renaming the genes of A15 ... SKIPPING
Renaming the genes of A17 ... SKIPPING
Renaming the genes of A19 ... SKIPPING
Renaming the genes of A21 ... SKIPPING
Renaming the genes of A23 ... SKIPPING
Renaming the genes of A25 ... SKIPPING
Renaming the genes of A27 ... SKIPPING
Renaming the genes of A29 ... SKIPPING
Renaming the genes of A3 ... SKIPPING
Renaming the genes of A31 ... SKIPPING
Renaming the genes of A33 ... SKIPPING
Renaming the genes of A5 ... SKIPPING
Renaming the genes of A7 ... SKIPPING
Renaming the genes of A9 ... SKIPPING
Updating the ancestral families of A1 ... adding names ... 5969 OK
Updating the ancestral families of A11 ... adding names ... 5260 OK
Updating the ancestral families of A13 ... adding names ... Traceback (most recent call last):
  File "/home/asgiraldoc/Agora/src/ALL.reformatGeneFamilies.py", line 91, in <module>
    assert len(og) == len(set(og))
AssertionError

I'm not sure what's happening here... Can you help me? This is my data https://drive.google.com/drive/folders/1S6Ii0p2-nyc4EFWsc-3CjJZ_TohX3wek?usp=sharing

PS: when I ran the algorithm with your example data, everything worked.

Thank you,
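
The failing line in the traceback, `assert len(og) == len(set(og))`, holds only if no entry of `og` is repeated. Assuming `og` is the list of gene names read from one line of an orthologyGroups file (whitespace-separated, which is an assumption about the format), the following sketch lists the offending lines. `groups_with_duplicates` is a hypothetical helper.

```python
def groups_with_duplicates(lines):
    """Return (line_number, repeated_names) for every group that lists a gene twice."""
    found = []
    for number, line in enumerate(lines, start=1):
        og = line.split()
        if len(og) != len(set(og)):
            repeated = sorted({gene for gene in og if og.count(gene) > 1})
            found.append((number, repeated))
    return found


# Made-up groups for illustration only.
example = ["g1 g2 g3", "g4 g5 g4", "g6"]
print(groups_with_duplicates(example))
```

For the run above, the file to check would be the one matching `orthologyGroups.%s.list` for ancestor A13, since that is where the log stops.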

input file issue

Hello, and thank you for your software Agora. I want to run it, but I don't know how to obtain the two input files other than the species tree. Can you tell me how to get them? Thank you.

Install error

Hi there, thanks for your work on Agora.
I am trying to get it installed, but am getting the following error. Any help is appreciated.
Cheers,
John

$ bash ./checkAgoraIntegrity.sh

creates tmp directory for testing

mkdir tmp

check the preprocessing scripts

src/preprocessing/nhxGeneTrees2phylTreeGeneTrees.py example/data/GeneTreeForest.nhx.bz2 > tmp/geneTrees.protTree
Traceback (most recent call last):
  File "src/preprocessing/nhxGeneTrees2phylTreeGeneTrees.py", line 16, in <module>
    import utils.myFile as myFile
ImportError: No module named utils.myFile
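
The wording of this message, `ImportError: No module named utils.myFile` without quotes around the module name, is how Python 2 reports a failed import. Python 3 quotes the name. The other tracebacks on this page come from Python 3.7 environments, so the interpreter picked up by the script may be worth checking first. A minimal sketch follows. `describe_interpreter` is a hypothetical helper, and an old interpreter is only one possible cause.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return a short label for the running interpreter, flagging Python 2."""
    major, minor = version_info[0], version_info[1]
    label = "Python %d.%d" % (major, minor)
    if major < 3:
        return label + " (Python 2 interpreter)"
    return label


print(sys.executable)
print(describe_interpreter())
```

Run it with the same command that the script's first line would select (for example `python` versus `python3`) to see which interpreter is actually used.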

How to locate the support scores of the AGORA adjacency graph for adjacent genes in ancGenome?

Dear Agora Team,

I extend my gratitude to you for developing this exceptional tool for reconstructing ancestral genomes. Recently, I've encountered some challenges in accurately identifying breakpoints between successive genomes within a phylogeny.

I followed the provided guidelines for breakpoint identification and filtering, resulting in a list of candidate breakpoints. However, upon closer inspection, I noticed a number of false positive results. My suspicion is that I might not have properly applied the criteria involving "ends-of-blocks located in ancestral gene adjacencies that are not or poorly supported in the AGORA adjacency graph." Unfortunately, I'm unsure about the specific result file containing the AGORA adjacency graph support scores for adjacent genes in the ancGenome. Could it be the 'pairwise/pairs-all/N%s.list.bz2' files?

As an example, for the ancestral node 'N4', I identified the adjacent genes "N4.6978-N4.25970-N4.29043" in the ancGenome "ancGenome.N4.list.bz2". I attempted to find the AGORA adjacency graph support scores for both "N4.6978-N4.25970" and "N4.25970-N4.29043" using the following shell command:
bzcat N4.list.bz2 | grep -w "25970|6978|29043"
The output is shown below:

Does this output imply that both adjacent gene pairs, "N4.25970-N4.29043" and "N4.6978-N4.25970", lack support in the AGORA adjacency graph? Is the fifth column in the "N4.list.bz2" file indicative of the AGORA adjacency graph support score for a specific gene (family) pair?

I am grappling with this issue due to the detection of "N4.6978-N4.25970" as an interchromosomal rearrangement breakpoint in the ancestral genome N4, when compared to its descendant node. However, this breakpoint appears questionable, as indicated by checking the "ancGenome.N4.list.bz2" file using the command:
bzcat ancGenome.N4.list.bz2 | grep -A 1 "N4.25970|N4.6978"
The output is shown below:

It appears that the gene family of N4.25970 erroneously includes the gene 'Cmol.Cmol1g01662', resulting in the connection of two blocks, one with "Cmol.Cmol1g01661" and another with "Cmol.Cmol7g00753" during the fill-in/fusion/insertion steps. Could I be misunderstanding this situation?

I apologize for the length of this inquiry. Your response would be immensely appreciated. Any suggestions or clarifications you can provide will be of great assistance.

Thank you very much in advance.

With best regards,
Yu
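
A note on the two grep commands quoted above: in the basic regular expressions that grep uses by default, an unescaped `|` is a literal character, so alternation needs `grep -E` or `\|` (the backslashes may simply have been lost when the issue was rendered). The sketch below is a Python equivalent of a whole-word, multi-ID search. `match_whole_words` is a hypothetical helper and the sample lines are made up, because the real column layout of the pairwise files is not shown here.

```python
import re


def match_whole_words(lines, words):
    """Keep lines that contain any of the given words as a whole word."""
    alternation = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:%s)\b" % alternation)
    return [line for line in lines if pattern.search(line)]


# Made-up lines for illustration only.
lines = ["6978 25970 1", "125970 7 1", "29043 11 2"]
print(match_whole_words(lines, ["25970", "6978", "29043"]))
```

The second sample line is skipped because `125970` only contains `25970` as a substring, which is the behaviour `grep -w` is meant to give.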

Lack of chromosomal information

Hi, thanks for developing such a powerful tool. When I used it, I noticed that for a large number of species the assembly level is only scaffold, which means the chromosome information is not sufficient to run Agora. How should I deal with those species?
For example, the genome of Aciculoconidium aculeatum.
https://www.ncbi.nlm.nih.gov/data-hub/genome/GCA_003707435.1/

How to prepare input files for Regulus

Dear team,

Recently, I read your articles published in Nature Communications (2015) and Nucleic Acids Research (2020).

As you said in the paper: "Linking long distance regulatory regions to the genes they regulate is important to study and understand the function of enhancers. Three main categories of experimental methods have been developed to assign enhancers to target genes in a genome-wide manner. The use of methods based on evolutionary principles could solve these difficulties, because they do not depend on the specific biological contexts required by experimental assays and are more easily applicable to multiple species"

Regulus is a wonderful tool for predicting the target genes of regulatory elements.

I want to use your tool for my research. However, I don't know exactly how many input files need to be generated, or what each row and column of the output represents, so I can't make good use of it.

More detailed documentation would make the software easier to use and develop further.

Would you mind sharing the steps for preparing the input files?

Thank you very much.

@DyogenIBENS @JosephLucas @alouis72

Agora-generic pipeline seems to be “stuck” for several days after adding 1 more genome

Dear colleagues,
Thank you for your useful and important approach! Unfortunately, I have some issues with ‘agora-generic’ pipeline. Previously, I have gotten good results when analyzing data using this pipeline. To improve the results, I added one new genome and after 6 completed tasks (Status: 49 to do, 1 running, 6 done, 0 failed -- 56 total) the program seems to be “stuck” for several days. Restarting the pipeline and running the analysis on another computer did not help. It is worth noting that, firstly, previously the pipeline successfully completed the analysis after a few minutes, secondly, table of processes indicates that Python-related processes are still running, thirdly, when analyzing the same data set using ‘basic’ and ‘plants’ pipeline, all processes are completed successfully in a few minutes. Given that the ‘generic’ pipeline tries to find the best parameters for each ancestral node of the phylogenetic tree, is such a long data processing time expected or not? Have you noticed this before? And what can you recommend?
I would be grateful for any help,
Thank you very much!

contiguous ancestral regions (CARs) too small

Hi,
I tested three chromosome-level assemblies with the tree (A:0.5,(B:1,C:1)N1:0.5)N0 using Agora, without error. But I get many fragmented CARs, listed below. The ancestor of these species is expected to have only 12 chromosomes. Do you have any advice?
number CARs
236 CAR_23
237 CAR_21
237 CAR_22
240 CAR_20
241 CAR_19
243 CAR_18
245 CAR_17
250 CAR_16
270 CAR_15
282 CAR_14
284 CAR_13
305 CAR_12
308 CAR_11
315 CAR_10
316 CAR_9
319 CAR_8
326 CAR_7
440 CAR_6
489 CAR_5
512 CAR_4
518 CAR_3
756 CAR_2
802 CAR_1

It seems like CAR_39 should be added to CAR_2?
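
A table like the one above (number of genes per CAR) can be produced with a short script. This is a minimal sketch which assumes that each line of an ancGenome file describes one gene and that the first tab-separated column is the CAR name. `car_sizes` is a hypothetical helper and the sample rows are made up.

```python
import collections


def car_sizes(lines):
    """Count lines per first-column value and return (name, count), largest first."""
    sizes = collections.Counter()
    for line in lines:
        if line.strip():
            sizes[line.split("\t", 1)[0]] += 1
    return sizes.most_common()


# Made-up rows for illustration only.
example = [
    "CAR_1\t0\t1\t1\tg1",
    "CAR_1\t1\t2\t1\tg2",
    "CAR_2\t0\t1\t-1\tg3",
]
for name, count in car_sizes(example):
    print(count, name)
```

For a compressed file, the lines can be read with `bz2.open(path, "rt")` from the standard library.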

How to visualize Agora's results

Hello, the species I want to study is not in Genomicus, so I used Agora for the analysis. I would like to know how to use Agora's results to draw a figure like the 'Karyotype View with ancestors' in Genomicus. Have you developed a corresponding R package or other tools? Thanks

Error when running agora-plants.py

Hi! Thank you for developing such an interesting tool. I am trying to reconstruct the ancestral genomes of 5 species. When using the agora-basic.py pipeline I have no issues and obtain the 4 reconstructed genomes. However, when I try running the agora-plants.py pipeline using the same dataset, out of the 42 steps 2 are finished and 1 fails. When looking at the size-0.9-1.1.log file I can see the following error:

Traceback (most recent call last):
  File "/home/cvargas/Software/AGORA/Agora/src/ALL.filterGeneFamilies-size.py", line 95, in <module>
    mkStruct(target)
  File "/home/cvargas/Software/AGORA/Agora/src/ALL.filterGeneFamilies-size.py", line 89, in mkStruct
    notseen[newanc].remove(x)
KeyError: 8118

Do you have any suggestions? thanks a lot!
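
For readers of this traceback: in `notseen[newanc].remove(x)`, a `KeyError` can come from two places. Either `newanc` is not a key of `notseen`, or `notseen[newanc]` is a set and `x` is not in it (a list would raise `ValueError` instead). The sketch below separates the two cases. `remove_if_present` is a hypothetical helper for diagnosis, the container types are an assumption, and it is not a fix for the underlying data problem.

```python
def remove_if_present(groups, key, item):
    """Remove item from groups[key] if both exist; report whether anything was removed."""
    members = groups.get(key)
    if members is None:
        print("key %r is missing from the dictionary" % (key,))
        return False
    if item not in members:
        print("item %r is missing from the group stored under %r" % (item, key))
        return False
    members.remove(item)
    return True


# Made-up content for illustration only.
groups = {"A1": {8118, 5}}
print(remove_if_present(groups, "A1", 8118))
print(remove_if_present(groups, "A1", 8118))
```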

How to use misc.compareGenomes.py?

Hi team,
thanks for your work on Agora.

I think the script "misc.compareGenomes.py" looks great. However, I got an error while running it with the example data. Did I get the input files wrong, or is it something else?

Could you please give me some advice on using the script?

Thank you very much.

This is my code:

src/misc.compareGenomes.py \
example/data/genes/genes.M1.list.bz2 \
example/data/genes/genes.M2.list.bz2 \
example/results/ancGenes/all/ancGenes.A3.list.bz2 \
-mode=drawKaryotype \
-minChrSize=200 > M1M2_min200genes.ps

The Error:

--------------------------------------------------------------------------------
| Key                      | Values                                            |
--------------------------------------------------------------------------------
| studiedGenome            | example/data/genes/genes.M1.list.bz2              |
| referenceGenome          | example/data/genes/genes.M2.list.bz2              |
| orthologuesList          | example/results/ancGenes/all/ancGenes.A3.list.bz2 |
| includeGaps              | False                                             |
| includeScaffolds         | True                                              |
| includeRandoms           | False                                             |
| includeNones             | False                                             |
| reverse                  | False                                             |
| mode                     | drawKaryotype                                     |
| orthoslist:fullgenenames | False                                             |
| orthoschr:minHomology    | 90                                                |
| minChrSize               | 200                                               |
| matrix:scaleY            | False                                             |
| matrix:pointSize         | -1                                                |
| sortBySize               | False                                             |
| matrix:colorFile         |                                                   |
| matrix:defaultColor      | black                                             |
| matrix:penColor          | black                                             |
| karyo:landscape          | False                                             |
| ps:backgroundColor       |                                                   |
| karyo:roundedChr         | False                                             |
| karyo:resolution         | 1                                                 |
| karyo:showText           | True                                              |
| karyo:drawBorder         | False                                             |
| karyo:defaultColor       |                                                   |
| karyo:penColor           | black                                             |
--------------------------------------------------------------------------------
Loading genome of example/results/ancGenes/all/ancGenes.A3.list.bz2 ... (ancestral genes) OK
Loading genome of example/data/genes/genes.M1.list.bz2 ... (Ensembl) OK
Loading genome of example/data/genes/genes.M2.list.bz2 ... (Ensembl) OK
27 24
Traceback (most recent call last):
  File "src/misc.compareGenomes.py", line 264, in <module>
    eval(str(arguments["mode"]))()
  File "src/misc.compareGenomes.py", line 161, in drawKaryotype
    (lx,ly) = utils.myPsOutput.printPsHeader(arguments["karyo:landscape"])
  File "/home/nieshuai/bin/Agora-master/src/utils/myPsOutput.py", line 24, in printPsHeader
    initColor()
  File "/home/nieshuai/bin/Agora-master/src/utils/myPsOutput.py", line 54, in initColor
    location = [f for f in knownLocations if os.path.exists(f)][0]
IndexError: list index out of range
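
The last frame shows why this fails: `[f for f in knownLocations if os.path.exists(f)][0]` takes the first existing path, and indexing raises `IndexError` when none of the candidate paths exists on the machine. The contents of `knownLocations` are not visible in the traceback, so the sketch below only illustrates the pattern with a hypothetical helper, `first_existing`, and placeholder paths.

```python
import os


def first_existing(paths):
    """Return the first path that exists on disk, or None if none does."""
    return next((path for path in paths if os.path.exists(path)), None)


# Placeholder candidates for illustration only.
candidates = ["/nonexistent/location/one", "/nonexistent/location/two"]
location = first_existing(candidates)
if location is None:
    print("none of the candidate files exists: " + ", ".join(candidates))
else:
    print("using " + location)
```

Printing the real `knownLocations` list from `utils/myPsOutput.py` would show which file the script expects to find.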

how to use ALL.reformatGeneFamilies.py

Hi there,

I have successfully installed Agora, but I got an error while running the first step of "agora-basic.py" with the example data.
Could you please give me some advice on running it?

Best,

This is my code:

src/ALL.reformatGeneFamilies.py \
example/data/Species.nwk \
example/data/orthologyGroups/orthologyGroups.%s.list \
-IN.genesFiles=example/data/genes/genes.%s.list \
-OUT.ancGenesFiles=example/results/ancGenes/all/ancGenes.%s.list.bz2 \
-OUT.genesFiles=example/results/genes/genes.%s.list.bz2 \
  > example/results/GeneTreeForests.withAncGenes.nhx.bz2 \
  2> example/results/ancGenes/ancGenes.log

The ERROR:

----------------------------------------------------------------------------
| Key               | Values                                               |
----------------------------------------------------------------------------
| speciesTree       | example/data/Species.nwk                             |
| orthologyGroups   | example/data/orthologyGroups/orthologyGroups.%s.list |
| IN.genesFiles     | example/data/genes/genes.%s.list                     |
| OUT.ancGenesFiles | example/results/ancGenes/all/ancGenes.%s.list.bz2    |
| OUT.genesFiles    | example/results/genes/genes.%s.list.bz2              |
----------------------------------------------------------------------------
Renaming the genes of M1 ... 21160 OK
Renaming the genes of M2 ... 22697 OK
Renaming the genes of M3 ... 19466 OK
Renaming the genes of M4 ... 16736 OK
Renaming the genes of M5 ... 17805 OK
Renaming the genes of A0 ... Traceback (most recent call last):
  File "/home/nieshuai/bin/Agora-master/src/ALL.reformatGeneFamilies.py", line 50, in <module>
    fi = utils.myFile.openFile(inputPath, "r")
  File "/home/nieshuai/bin/Agora-master/src/utils/myFile.py", line 175, in openFile
    f = open(nom, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'example/data/genes/genes.A0.list'

Creating input for analysis

Hi Agora-team,

I have been trying for a long time to learn how to construct ancestral genome karyotypes and was never able to understand it properly. Then I saw your paper, and I was like wow: it is explained so nicely that I feel I can do it. While I understand from the manual how to run the program, I wonder what the recommended process is for generating the input files. For example, I have 20 plant genomes, and the genome assemblies are chromosome-scale with good contiguity. I took the annotated genomes, extracted the protein files, and ran OrthoFinder. The OrthoFinder analysis produced gene files for all the families, and also the species file. Is it OK to pass the output of this program directly to Agora? What preprocessing would be needed? I am not sure how to prepare the input files, so your suggestions and guidance would be highly appreciated.

thanks and regards
Amit
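
As a starting point for this kind of preprocessing, the sketch below flattens an OrthoFinder-style table into one group per line. It rests on two assumptions that should be checked against the Agora manual: that the table has a header row, an orthogroup ID in the first column and comma-separated genes in one column per species, and that Agora reads orthology groups as whitespace-separated gene names, one group per line. `orthogroups_to_lines` is a hypothetical helper and the sample table is made up. It does not produce the per-ancestor files that the `orthologyGroups.%s.list` pattern implies, nor the gene position files.

```python
import csv
import io


def orthogroups_to_lines(tsv_text):
    """Turn a tab-separated orthogroup table into space-separated gene lists."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    lines = []
    for row in reader:
        genes = []
        for cell in row[1:]:  # column 0 is the orthogroup ID
            genes.extend(g.strip() for g in cell.split(",") if g.strip())
        if genes:
            lines.append(" ".join(genes))
    return lines


# Made-up table for illustration only.
table = (
    "Orthogroup\tSpeciesA\tSpeciesB\n"
    "OG0000000\ta1, a2\tb1\n"
    "OG0000001\t\tb2\n"
)
for line in orthogroups_to_lines(table):
    print(line)
```

Gene names in the resulting groups would also have to match the names used in the gene position files exactly.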

Agora in living sps

Hi,

Thanks for this fantastic repo and these tools. I am just wondering whether Agora can be used with genomes of closely related living species to reconstruct the genome of their last common ancestor. I do not see why not, but I prefer to ask.

Thanks
Diego

Clarification on Identifying Successive Genomes and Syntenic Blocks between Ancestral Nodes and Extant Genomes in PhylDiag

Hi everyone,

I have another question regarding the concept of 'successive genomes' discussed in the paper. Taking the tree topology below as an example, can I consider 'N30' and 'Oreh' as successive genomes?

To examine rearrangements between 'N30' and 'Oreh', I gather that identifying syntenic blocks between these nodes is necessary. However, I'm uncertain how to obtain the gene family set shared by the Agora-reconstructed ancestral genome and the extant genome, which PhylDiag requires as its 'set of gene families' input file.

For detecting syntenic blocks between 'Oreh' and 'Cvim', directly using 'geneFamily.N30.list' from 'ancGenes.N30.list.bz2' seems valid due to 'N30' being their common ancestor. This command seems applicable:
phylDiag.py genes.Oreh.list.bz2 genes.Cvim.list.bz2 geneFamily.N30.list --no-imr -m 50 -t 5 -g 45 >CS_Oreh_Cvim.sbs
Concerning syntenic blocks between ancestral 'N30' and extant 'Oreh', I've used the geneFamily file of node 'N24', the closest common ancestor. This command appeared effective:
phylDiag.py genes.Oreh.list.bz2 ancGenome.N30.list geneFamily.N24.list --no-imr -m 50 -t 5 -g 45 > CS_N30_Oreh.sbs
However, I'm uncertain whether this approach is valid and logical. I've attached the relevant files. Additionally, does PhylDiag exclusively calculate syntenic blocks between ancestral nodes and extant genomes? Your insights would be invaluable.

Thank you very much in advance for your assistance.

Best regards,
Yu
test_PhylDiag_0806.zip

Request for Custom Scripts: Rearrangement Breakpoints filtering and Interchromosomal Rearrangement Identification

Dear AGORA team,

I'm currently working on genome structure evolution in the Fagales order. Your AGORA pipeline has been incredibly helpful in reconstructing ancestral genomes. Thanks for these fantastic tools.

Now I'm keen on studying rearrangement breakpoints and their correlation with phenotypic innovation. While I understand the process for computing rearrangement breakpoints from the AGORA paper's Supplementary Information ('Vertebrate genome evolutionary dynamics'), I'm struggling with the practical implementation.

I've looked extensively through AGORA's GitHub repository but haven't found the relevant custom Python/Perl scripts. Could you kindly share the scripts for rearrangement breakpoint filtering and interchromosomal rearrangement identification? They would greatly assist my ancestral genome reconstruction analysis.

Thank you very much in advance. I look forward to citing your paper and pipeline soon.

With best wishes,
Yu

The -target==A1 parameter does not seem to run

I tested this parameter with the example data:
agora-generic.py Species.nwk orthologyGroups/orthologyGroups.%s.list.bz2 genes/genes.%s.list.bz2 -workingDir=A1 -target==A1

Divergence times with Agora

Hi, this is a question about the paper "Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom". I looked for your e-mail addresses but could not find them, so I am asking here instead. Could you share the Python scripts (or pipelines) that you used to calculate divergence times from the synteny reconstructed with Agora? They are not available in the paper, and it is not clear to me how you did it. I would appreciate it very much.

Are there any tools to build an evolutionary tree in NHX format (from a multiple sequence alignment)?

Hi! Thank you for such a useful tool!
I've considered writing a script to convert the nwk format to NHX myself, but I'm confused by the information the NHX format carries. For example, what is a speciation node, and how do I determine which ancestor an ancestral node belongs to?

Also, as far as I know, most tools only output nwk or NEXUS format. treeIO can convert NHX to nwk, but not vice versa.
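
One common way to answer both questions (which ancestor a node belongs to, and whether it is a speciation or a duplication) is last-common-ancestor (LCA) reconciliation of the gene tree against the species tree. The sketch below illustrates that general technique only; it is not AGORA's own code. The species names, the nested-tuple gene tree, and the helper names (`path_to_root`, `lca`, `annotate`) are all hypothetical. It writes the `S=` (species) and `D=` (duplication) NHX tags; the `Y`/`N` values for `D` are an assumption, since some tools use `T`/`F` instead, so check the AGORA manual for the tags it actually reads.

```python
def path_to_root(node, parent):
    """List a species-tree node and all of its ancestors, bottom-up."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path


def lca(a, b, parent):
    """Last common ancestor of two species-tree nodes."""
    seen = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in seen:
            return node
    raise ValueError("no common ancestor for %s and %s" % (a, b))


def annotate(tree, gene2species, parent):
    """Return (NHX text, species node) for a binary nested-tuple gene tree.

    Leaves are gene names (str); internal nodes are 2-tuples of subtrees.
    """
    if isinstance(tree, str):
        species = gene2species[tree]
        return "%s[&&NHX:S=%s]" % (tree, species), species
    left_txt, left_sp = annotate(tree[0], gene2species, parent)
    right_txt, right_sp = annotate(tree[1], gene2species, parent)
    # The node belongs to the LCA of the species found below it.
    species = lca(left_sp, right_sp, parent)
    # LCA rule: a node mapping to the same ancestor as a child is a duplication.
    dup = "Y" if species in (left_sp, right_sp) else "N"
    text = "(%s,%s)[&&NHX:S=%s:D=%s]" % (left_txt, right_txt, species, dup)
    return text, species


if __name__ == "__main__":
    # Hypothetical species tree, stored as child -> parent.
    parent = {"Hsap": "Euarch", "Mmus": "Euarch", "Euarch": "Root",
              "Cfam": "Root", "Root": None}
    gene2species = {"h1": "Hsap", "m1": "Mmus", "h2": "Hsap", "c1": "Cfam"}
    text, _ = annotate((("h1", "m1"), ("h2", "c1")), gene2species, parent)
    print(text + ";")
```

In this toy tree the root is tagged as a duplication because human genes occur in both subtrees, while the two inner nodes are speciations. Branch lengths are omitted for brevity.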

Orthology groups correspond to HOGs?

Hello,

Thanks for developing such useful tools. I want to run AGORA by providing the orthology group files rather than the trees, and I have a question about these groups. I inferred them with OrthoFinder, which outputs hierarchical orthogroups (HOGs). For a deep node in the tree, a HOG might contain only a few species, or even just a couple. Basically, a HOG for a shallow node (e.g. consisting of 2 species) is also a HOG for a deeper node (as the shallow node is a child of the deeper node). Should all these groups be provided at every level to which they are relevant? Or how do you define the orthology groups for each node in the species tree, given that some HOGs do not contain all the child species of that node?

Thanks
Alex

Breakpoint regions mapping to reference chromosomes

Hello, thanks for this nice tool. Can Agora output breakpoint region information like that in Fig. 6a of the NEE paper? Can you also recommend related tools that produce the phyloview diagram shown in Fig. 6c?

Can a Nexus-format tree replace an NHX-format tree?

Thanks a lot for these convenient tools. I have a question about the input gene trees. Can a Nexus-format tree replace an NHX-format tree? If not, can you recommend a method for converting a Nexus-format tree to an NHX-format tree?
