Code Monkey home page Code Monkey logo

Comments (8)

GuilhemRoyer avatar GuilhemRoyer commented on September 23, 2024

I have found a way to do this but I am not sure I get all the spots: I search for my gene family in all the projection files and then get the column which, if I understand correctly, is the list of spots where the gene can be found.

Am I right to do it this way?

from ppanggolin.

axbazin avatar axbazin commented on September 23, 2024

Hi !

I'm pretty sure there is no spot-related information in the projection files.
However, there is a useful command line that you can use that will tell you everything that is related to a gene:

ppanggolin align -p pangenome.h5 -o MYOUTPUTDIR --sequences MY_SEQUENCES_OF_INTEREST.fasta --getinfo

This will give you the exact list of spots, and RGPs, where the genes are found, in a tsv file called 'info_input_seq.tsv'.
It should write one line for each gene in your .fasta file.

The format is described in the last table of the Align wiki page here

You can use any fasta sequence for this, and it will get assigned to its closest member gene family within the pangenome, then extract all info related to this gene family, including the list of spots (and RGPs) in which is it found !

Hopefully that helps !

Adelme

from ppanggolin.

GuilhemRoyer avatar GuilhemRoyer commented on September 23, 2024

Thank you Adelme! In fact there is spot-related information in the projection files (column 13). So I was able to get the information but it was more tricky than the dedicated solution you propose.

Unfortunately I have still a problem while drawing spots. It returns an error that seems to be related to the size of some spots ("RecursionError: maximum recursion depth exceeded while getting the str of an object"). Indeed it always freeze to the same spot that contains 2003 identical_rgp. Did you already experienced it?

Guilhem

from ppanggolin.

axbazin avatar axbazin commented on September 23, 2024

aha, that's a newish addition I was not aware (I probably forgot haha), and the doc is not up-to-date, I will fix this. As I see from the code, what is in which row can change depending on what you ask as output.

I have never met this error in that context, can you tell me which command line you used exactly, and show me the log ?
There is probably a bug to fix there.

from ppanggolin.

GuilhemRoyer avatar GuilhemRoyer commented on September 23, 2024

Thank you Adelme. Here is the command that I used:

ppanggolin align -c 12 -p pangenome.h5 -o iroN_related_spots --sequences iroN.fasta --draw_related

I can get a few spots, then it crashes with the following error:

2022-10-08 12:20:22 main.py:l214 INFO Command: /pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/bin/ppanggolin align -c 12 -p pangenome.h5 -o iroN_related_spots --sequences iroN.fasta --draw_related
2022-10-08 12:20:22 main.py:l215 INFO PPanGGOLiN version: 1.2.74
2022-10-08 12:20:22 readBinaries.py:l39 INFO Getting the current pangenome's status
2022-10-08 12:20:23 readBinaries.py:l389 INFO Reading pangenome annotations...
100%|██████████| 11004557/11004557 [00:53<00:00, 205145.75gene/s]
100%|████��2022-10-08 12:25:38 readBinaries.py:l403 INFO] Reading pangenome gene families...
100%|██████████| 2302/2302 [04:19<00:00, 8.88organism/s]
100%|██████████| 10746569/10746569 [01:17<00:00, 139480.95gene/s]
2022-10-08 12:26:56 readBinaries.py:l418 INFO Reading the RGP...
100%|██████████| 52908/52908 [00:00<00:00, 53671.87gene family/s]
94%|█████████▍| 2890752/3060047 [00:12<00:00,2022-10-08 12:27:10 readBinaries.py:l425 INFO Reading the spots...
100%|██████████| 3060047/3060047 [00:13<00:00, 226261.49gene/s]
99%|█████████▉| 163003/164541 [06:57<00:05, 2022-10-08 12:34:10 alignOnPang.py:l46 INFO Aligning sequences to cluster representatives...
2022-10-08 12:34:12 alignOnPang.py:l51 INFO Extracting alignments...
2022-10-08 12:34:13 alignOnPang.py:l145 INFO Writing RGP and spot information related to hits in the pangenome
2022-10-08 12:35:44 alignOnPang.py:l165 INFO Drawing the 9 spots with more than 1 organization related to hits of the input sequences...
2022-10-08 12:35:44 draw_spot.py:l448 INFO Ordering genes among regions, and drawing spots...
100%|██████████| 164541/164541 [06:57<00:00, 393.79region/s]
11%|█ | 1/9 [00:02<00:22, 2.84s/spot]
Traceback (most recent call last):
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/bin/ppanggolin", line 10, in
sys.exit(main())
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/main.py", line 241, in main
ppanggolin.align.launch(args)
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/align/alignOnPang.py", line 264, in launch
align(pangenome=pangenome, sequenceFile=args.sequences, output=args.output, tmpdir=args.tmpdir, cpu=args.cpu,
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/align/alignOnPang.py", line 247, in align
getSeqInfo(seq2pang, pangenome, output, draw_related, disable_bar=disable_bar)
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/align/alignOnPang.py", line 168, in getSeqInfo
drawSelectedSpots(drawn_spots, pangenome, output, pangenome.parameters["spots"]["overlapping_match"],
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/figures/draw_spot.py", line 492, in drawSelectedSpots
GeneLists = orderGeneLists(GeneLists, overlapping_match, exact_match, set_size)
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/figures/draw_spot.py", line 50, in orderGeneLists
return rowOrderGeneLists(geneLists)
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/ppanggolin/figures/draw_spot.py", line 72, in rowOrderGeneLists
dendro = dendrogram(hc, no_plot=True)
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3349, in dendrogram
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3652, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3652, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3619, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3652, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3652, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3652, in _dendrogram_calculate_info
_dendrogram_calculate_info(
[Previous line repeated 981 more times]
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3619, in _dendrogram_calculate_info
_dendrogram_calculate_info(
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3549, in _dendrogram_calculate_info
_append_singleton_leaf_node(Z, p, n, level, lvs, ivl,
File "/pasteur/appa/homes/groyer/anaconda3/envs/Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 3427, in _append_singleton_leaf_node
ivl.append(str(int(i)))
RecursionError: maximum recursion depth exceeded while getting the str of an object

Or is there another way to get the representative of each RGP without drawing spots (i.e. the file "spot_XX_identical_rgps.tsv") ?

Guilhem

from ppanggolin.

GuilhemRoyer avatar GuilhemRoyer commented on September 23, 2024

Hi,

I finally fix the problem by increasing the recursion limit of Python in the script "Ppanggolin/lib/python3.8/site-packages/scipy/cluster/hierarchy.py"

"""
import sys
print(sys.getrecursionlimit(2500))
"""

But, not sure if it is the best way to solve the problem...

Guilhem

from ppanggolin.

axbazin avatar axbazin commented on September 23, 2024

Hi !
This was indeed the best way to solve that problem, if it indeed got you to the files in the end.
Having to draw the spots to get the files is not super practical I admit, it could probably be better as a “standard” output file.
Adelme

from ppanggolin.

axbazin avatar axbazin commented on September 23, 2024

Closing as this is supposedly fixed in master and will be in the next release (fixed it exactly the way you did, but in ppanggolin's code)

I don't have the means to test that it actually works though :(

from ppanggolin.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.