Hi, I'm testing out your tool to see if it's suited to my needs e.g.

Big coordinates not handled, about edinburgh-genome-foundry/dnafeaturesviewer

Comments (14)

Zulko commented on July 21, 2024

I'm not entirely sure about your scenario but here is some alternative code which (I hope) does the same thing:

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt
full_sequence = 120_000_000 * "n"  # replace with the actual sequence.
x = 112173530  # index of the region to plot
primer1 = GraphicFeature(start=x, end=x+20, strand=+1, color="#ffd700", label="Primer1")
primer2 = GraphicFeature(start=x+50, end=x+70, strand=-1, color="#cffccc", label="Primer2")
record = GraphicRecord(sequence = full_sequence, features=[primer1, primer2])
region_record = record.crop((x-20, x+700))
detail_record = record.crop((x-5, x+75))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 3), gridspec_kw={"width_ratios": [1, 2]})
region_record.plot(ax=ax1)
detail_record.plot(ax=ax2)
detail_record.plot_sequence(ax2)

I am not sure why the x coordinates are truncated for you, but it could be due to the way Matplotlib writes the coordinates. Have you noticed the "+1.1217e8" notation under the axis? It indicates that you should add 112,170,000 to each x-coordinate.
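
To make the offset notation concrete, here is a minimal sketch of the arithmetic in plain Python. The helper name true_coordinate is hypothetical (it is not part of Matplotlib or dna_features_viewer); it only adds the offset printed under the axis back onto a tick label.

```python
def true_coordinate(tick_label: int, offset_text: str) -> int:
    """Recombine a tick label with the offset Matplotlib prints under the axis.

    offset_text is the string shown next to the axis, e.g. "+1.1217e8".
    """
    return tick_label + int(float(offset_text))


# A tick labelled 3530 on an axis annotated with "+1.1217e8":
print(true_coordinate(3530, "+1.1217e8"))  # 112173530
```

So a tick that reads 3530 still stands for position 112,173,530; the label is shortened, not the data.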

from dnafeaturesviewer.

Zulko commented on July 21, 2024

The issue here is that the sequence parameter you provide to GraphicRecord should be the whole sequence, not the sequence of the segment. In your case that is not practical because your sequence is huge, but there is another solution. Do not use crop. Instead, use first_index=11217xxxx... in GraphicRecord and set sequence_length to 40. Provide the short sequence as you do now. Sorry, I am on mobile. Let me know if that makes sense.
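
As a rough illustration of how first_index and sequence_length relate to genomic coordinates, here is a small sketch in plain Python. The helpers window and local_index are hypothetical names, and the sketch assumes that features keep their genomic coordinates while the short sequence string covers the positions from first_index up to first_index + sequence_length.

```python
def window(region_start: int, region_end: int, flank: int = 0):
    """Return (first_index, sequence_length) for a plot covering the region plus flanks."""
    first_index = region_start - flank
    sequence_length = (region_end + flank) - first_index
    return first_index, sequence_length


def local_index(genomic_position: int, first_index: int) -> int:
    """Index into the short sequence string for a genomic coordinate."""
    return genomic_position - first_index


# A 20-nt primer starting at 112173530, shown with 10 nt on each side.
first_index, sequence_length = window(112173530, 112173550, flank=10)
print(first_index, sequence_length)         # 112173520 40
print(local_index(112173530, first_index))  # 10
```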

Yu-jinKim commented on July 21, 2024

Thanks for the quick response, I managed to do that:

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

sequence = "#complete sequence"

record = GraphicRecord(sequence=sequence, first_index=112173530-20, sequence_length=700, features=[
    GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
                   label="Primer1"),
    GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
                   label="Primer2"),
])

sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTT"

record_detail = GraphicRecord(sequence=sequence, first_index=112173530-10, sequence_length=40, features=[
    GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
                   label="Primer1"),
    GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
                   label="Primer2"),
])

# PLOT THE WHOLE SEQUENCE

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 2))

record.plot(ax=ax1)
record_detail.plot(ax=ax2)
record_detail.plot_sequence(ax2)

fig.savefig('overview_and_detail.png')

And I kinda get what I wanted. However, is there a way to keep the original coordinates when using first_index?
Right now the x coordinates are truncated (112173530 becomes 3530, for example). I tried increasing the figsize to give the numbers enough space, but it doesn't work.

Yu-jinKim commented on July 21, 2024

Hi again,

First off, thanks for all your help.
Second, I think I have a snippet which does what I envisioned in the beginning.

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTTTGACAATATAGACAATTTAAGTCCCAAGGCATCTCATCGTAGTAAGCAGAGACACAAGCAAAGTCTCTATGGTGATTATGTTTTTGACACCAATCGACATGATGATAATAGGTCAGACAATTTTAATACTGGCAACATGACTGTCCTTTCACCATATTTGAATACTACAGTGTTACCCAGCTCCTCTTCATCAAGAGGAAGCTTAGATAGTTCTCGTTCTGAAAAAGATAGAAGTTTGGAGAGAGAACGCGGAATTGGTCTAGGCAACTACCATCCAGCAACAGAAAATCCAGGAACTTCTTCAAAGCGAGGTTTGCAGATCTCCACCACTGCAGCCCAGATTGCCAAAGTCATGGAAGAAGTGTCAGCCATTCATACCTCTCAGGAAGACAGAAGTTCTGGGTCTACCACTGAATTACATTGTGTGACAGATGAGAGAAATGCACTTAGAAGAAGCTCTGCTGCCCATACACATTCAAACACTTACAATTTCACTAAGTCGGAAAATTCAAATAGGACATGTTCTATGCCTTATGCCAAATTAGAATACAAGAGATCTTCAAATGATAGTTTAAATAGTGT"
seq_start = 112173530
name1 = "Primer_1"
start1 = 112173630
end1 = 112173650
strand1 = +1  # forward strand
name2 = "Primer_2"
start2 = 112174069
end2 = 112174091
strand2 = -1  # reverse strand

# Split the sequence into lines of 80 nucleotides.
chunks = [sequence[i:i+80] for i in range(0, len(sequence), 80)]

primers = [
    GraphicFeature(start=start1, end=end1, strand=strand1, color="#ffd700",
                   label=name1),
    GraphicFeature(start=start2, end=end2, strand=strand2, color="#cffccc",
                   label=name2)
]

records = []

for i, chunk in enumerate(chunks):
    record = GraphicRecord(sequence=chunk, first_index=seq_start+i*80, sequence_length=80, features=primers)
    records.append(record)

fig, axes = plt.subplots(nrows=len(chunks), figsize=(14, 10))

for i, ax in enumerate(axes):
    ax.ticklabel_format(useOffset=False, style='plain')
    record = records[i]
    # record.finalize_ax(ax= ax, features_levels = 15, annotations_max_level = 15, auto_figure_height = True, ideal_yspan=1)
    record.plot(ax=ax)
    record.plot_sequence(ax = ax, background = None)

fig.savefig('utils/primer_visualization/{}-{}.pdf'.format(name1, name2), format="pdf")

However, I can't figure out how to size the records so that the information is displayed nicely.

I tried using the finalize_ax method but couldn't find a good fit. I thought about using a multiplier depending on the size of the coverage, but it doesn't seem satisfactory to me.
Any advice?

Zulko commented on July 21, 2024

Hi, that's actually a difficult scenario, but there should be a (complicated) solution: first plot each line as an individual figure to determine how much vertical space it needs, then create subplots as you do now, with a figure height equal to the total and each ax given the right proportional height (using gridspec_kw={"height_ratios": [...]} in plt.subplots()). I don't have a computer at hand now; I'll try it tomorrow, as this could become a new feature in the library.
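
A minimal sketch of the proportional-height idea, in plain Python. Note that it swaps in a simple heuristic (a base height plus a fixed amount per overlapping feature) for the measurement step described above, and that the function names, the default heights, and the 720-nt sequence length are all assumptions made for illustration, not part of the library.

```python
def overlapping(features, line_start, line_end):
    """Features (start, end) that overlap the half-open line [line_start, line_end)."""
    return [f for f in features if f[0] < line_end and f[1] > line_start]


def line_heights(seq_start, seq_length, nucl_per_line, features,
                 base_height=1.0, height_per_feature=0.6):
    """Estimate a height (in inches) for each line of a multi-line plot."""
    seq_end = seq_start + seq_length
    heights = []
    for line_start in range(seq_start, seq_end, nucl_per_line):
        line_end = min(line_start + nucl_per_line, seq_end)
        n_features = len(overlapping(features, line_start, line_end))
        heights.append(base_height + height_per_feature * n_features)
    return heights


# Primer coordinates from the thread; the sequence length is illustrative.
features = [(112173630, 112173650), (112174069, 112174091)]
heights = line_heights(112173530, 720, 80, features)
print(len(heights), heights)
```

The resulting list could then be passed to Matplotlib as plt.subplots(nrows=len(heights), figsize=(14, sum(heights)), gridspec_kw={"height_ratios": heights}), so that lines carrying features get more room than empty ones.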

Zulko commented on July 21, 2024

I pushed v2.4 today. In this version, you can plot multiline plots as shown in this code example. This new feature will (hopefully) always make the figure high enough so that the lines don't get squeezed.

This should get you closer to what you want to achieve. Let me know if it helps or if you have any suggestion.

Yu-jinKim commented on July 21, 2024

Awesome! I had finally given up and used a multiplier to determine the height of the whole figure.
I have noticed 2 things:

If I select nucl_per_line = 80, for example, the tick increment is 8, which is not ideal.
It would be a great quality-of-life improvement if the library turned off scientific notation itself. Here is the snippet that removes it: ax.ticklabel_format(useOffset=False, style='plain')

Other than that, it looks great! Thanks for all your help

Zulko commented on July 21, 2024

Great suggestions, thanks. I pushed a new version, 2.5.0, which doesn't use offsets, adds "," to separate thousands and millions, and lets you define graphic_record.ticks_resolution=25 to have ticks only every 25bp.

Example:

from dna_features_viewer import GraphicRecord, GraphicFeature
import matplotlib.ticker as ticker
start = 123456
record = GraphicRecord(
    sequence_length=240,
    features=[
        GraphicFeature(start+10, start+70, 1, label="bla"),
        GraphicFeature(start+60, start+130, 1, label="ble"),
        GraphicFeature(start+140, start+210, -1, label="bli"),
    ],
    first_index=start,
    ticks_resolution=25
)
fig, axes = record.plot_on_multiple_lines(nucl_per_line=80)

Let me know if that works for you.
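
For readers who want to see what such tick labels look like, here is a small sketch in plain Python. It only illustrates the behaviour described above (comma-separated labels, one tick every 25bp) and is not the library's implementation; the function names are hypothetical, and aligning ticks to multiples of the resolution is an assumption.

```python
def tick_positions(line_start: int, line_end: int, resolution: int):
    """Multiples of `resolution` that fall inside [line_start, line_end]."""
    first = -(-line_start // resolution) * resolution  # round up to a multiple
    return list(range(first, line_end + 1, resolution))


def format_tick(position: int) -> str:
    """Plain integer label with comma separators, no offset or exponent."""
    return f"{position:,}"


# First 80-nt line of a record starting at 123456, ticks every 25bp.
ticks = tick_positions(123456, 123536, 25)
print([format_tick(t) for t in ticks])  # ['123,475', '123,500', '123,525']
```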

Yu-jinKim commented on July 21, 2024

Brilliant! Last question, hopefully: do you have a check for the size of the figure? I have some cases where the figure is bigger than 2^16 pixels, which Matplotlib doesn't accept, but I only find that out pretty late in the execution of the code.
If I could know that beforehand, I could try to crop the GraphicRecord to regions of interest.

Zulko commented on July 21, 2024

I believe this is a bit of an extreme case, and in the current code I only know the figure's final size about 60% of the way through the process, so it wouldn't save you much time. It would be better if you did these checks on your side. Would a multi-page PDF (instead of a single big figure) fit your needs?
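
A check on the caller's side could look roughly like the sketch below, in plain Python. Everything here is an assumption for illustration: the function names are hypothetical, the height per line (1.5 inches) is a guess that would need tuning, and the figure height is only estimated from the number of lines rather than read from the library.

```python
MAX_PIXELS = 2 ** 16  # limit mentioned in the thread, per direction


def figure_pixels(width_in: float, height_in: float, dpi: float):
    """Pixel dimensions of a figure of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def fits_pixel_limit(width_in: float, height_in: float, dpi: float = 100) -> bool:
    """True if both pixel dimensions stay below the 2^16 limit."""
    width_px, height_px = figure_pixels(width_in, height_in, dpi)
    return width_px < MAX_PIXELS and height_px < MAX_PIXELS


def estimated_height_in(sequence_length: int, nucl_per_line: int,
                        inches_per_line: float = 1.5) -> float:
    """Rough figure height: number of lines times an assumed height per line."""
    n_lines = -(-sequence_length // nucl_per_line)  # ceiling division
    return n_lines * inches_per_line


# 50,000 nt at 80 nt per line gives 625 lines, far too tall for one figure.
height = estimated_height_in(sequence_length=50_000, nucl_per_line=80)
print(height, fits_pixel_limit(14, height, dpi=100))  # 937.5 False
```

If the check fails, the record can be cropped to a region of interest or split over several pages before any plotting starts.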

Yu-jinKim commented on July 21, 2024

I guess it would, yes.
Otherwise, I can just do as I said and crop for regions of interest.

Zulko commented on July 21, 2024

I pushed a new major version online which adds a record.plot_on_multiple_pages method (see here for an example).

Closing this thread. Please open a new issue if you run into a problem.
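
To get a feel for how a long record breaks down into pages, here is a sketch of the arithmetic in plain Python. It is not the library's implementation: the pages helper is hypothetical, nucl_per_line is borrowed from the examples in this thread, and lines_per_page and the 1,500-nt length are assumed values.

```python
def pages(first_index, sequence_length, nucl_per_line, lines_per_page):
    """Yield (page_start, page_end) half-open coordinate ranges, one per page."""
    nucl_per_page = nucl_per_line * lines_per_page
    end = first_index + sequence_length
    for page_start in range(first_index, end, nucl_per_page):
        yield page_start, min(page_start + nucl_per_page, end)


# 1,500 nt at 80 nt per line and 7 lines per page gives three pages.
for page_start, page_end in pages(112173530, 1500,
                                  nucl_per_line=80, lines_per_page=7):
    print(f"{page_start:,} - {page_end:,}")
```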

Yu-jinKim commented on July 21, 2024

Great, I'll definitely be recommending your tool to people!

Zulko commented on July 21, 2024

Thanks!
