Comments (14)

Zulko commented on July 21, 2024

I'm not entirely sure about your scenario but here is some alternative code which (I hope) does the same thing:

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt
full_sequence = 120_000_000 * "n"  # replace with the actual sequence.
x = 112173530  # index of the region to plot
primer1 = GraphicFeature(start=x, end=x+20, strand=+1, color="#ffd700", label="Primer1")
primer2 = GraphicFeature(start=x+50, end=x+70, strand=-1, color="#cffccc", label="Primer2")
record = GraphicRecord(sequence = full_sequence, features=[primer1, primer2])
region_record = record.crop((x-20, x+700))
detail_record = record.crop((x-5, x+75))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 3), gridspec_kw={"width_ratios": [1, 2]})
region_record.plot(ax=ax1)
detail_record.plot(ax=ax2)
detail_record.plot_sequence(ax2)

I am not sure why the x coordinates are truncated for you, but it could be due to the way Matplotlib writes the coordinates. Have you noticed the "+1.1217e8" notation under the axis? It indicates that you should add 112,170,000 to each x-coordinate.
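
A minimal sketch of that Matplotlib behaviour, using only Matplotlib (no dna_features_viewer): when large coordinates span a small range, the default tick formatter moves the common part into an offset label, and ax.ticklabel_format(useOffset=False, style="plain") turns this off. The axis limits are illustrative, and the exact offset string may differ between Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = 112173530  # same region start as in the snippet above

fig, ax = plt.subplots(figsize=(8, 2))
ax.set_xlim(x - 20, x + 700)

# Tick locations and the offset text are computed at draw time
fig.canvas.draw()
offset_before = ax.xaxis.get_major_formatter().get_offset()

# Ask for full, plain tick labels instead of "short label + offset"
ax.ticklabel_format(useOffset=False, style="plain")
fig.canvas.draw()
offset_after = ax.xaxis.get_major_formatter().get_offset()

print(repr(offset_before))  # a non-empty offset string, e.g. '+1.1217e8'
print(repr(offset_after))   # empty: ticks now show the full coordinates

plt.close(fig)
```

With the offset disabled, a tick that previously read 3530 next to an offset label is written out as 112173530.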

from dnafeaturesviewer.

Zulko commented on July 21, 2024

The issue here is that the sequence parameter you provide to GraphicRecord should be the whole sequence, not the sequence of the segment. In your case that is not practical because your sequence is huge, but there is another solution: do not use crop. Instead, pass first_index=11217xxxx... to GraphicRecord, set sequence_length to 40, and provide the short sequence as you do now. Sorry, I am on mobile. Let me know if that makes sense.
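
To illustrate the coordinate logic behind this advice, here is a plain-Python sketch with no library calls. It assumes that first_index is the genomic coordinate of the first nucleotide of the short segment, so feature positions stay in genomic coordinates and a local index is simply the difference. The helper name and the example values are hypothetical.

```python
def genomic_to_local(position, first_index, sequence_length):
    """Map a genomic coordinate to an index into the short segment.

    Hypothetical helper for illustration; not part of dna_features_viewer.
    """
    local = position - first_index
    if not 0 <= local < sequence_length:
        raise ValueError(f"{position} is outside the displayed segment")
    return local


# A 40-nucleotide segment whose first base sits at genomic position 112173520
segment = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCC"
first_index = 112173520

primer_start = 112173530  # genomic coordinate, as passed to GraphicFeature
local_start = genomic_to_local(primer_start, first_index, len(segment))

print(local_start)                            # 10
print(segment[local_start:local_start + 20])  # CATCTCTTCATGTTAGGAAA
```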

Yu-jinKim commented on July 21, 2024

Thanks for the quick response, I managed to do that:

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

sequence = "#complete sequence"

record = GraphicRecord(sequence=sequence, first_index=112173530-20, sequence_length=700, features=[
    GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
                   label="Primer1"),
    GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
                   label="Primer2"),
])

sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTT"

record_detail = GraphicRecord(sequence=sequence, first_index=112173530-10, sequence_length=40, features=[
    GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
                   label="Primer1"),
    GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
                   label="Primer2"),
])

# PLOT THE WHOLE SEQUENCE

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 2))

record.plot(ax=ax1)
record_detail.plot(ax=ax2)
record_detail.plot_sequence(ax2)

fig.savefig('overview_and_detail.png')

And I kind of get what I wanted. However, is there a way to keep the original coordinates when using first_index?
Right now the x coordinates are truncated (112173530 becomes 3530, for example). I tried increasing the figsize to give the numbers enough space, but it doesn't work.

Yu-jinKim commented on July 21, 2024

Hi again,

First off, thanks for all your help.
Second, I think I have a snippet which does what I envisioned in the beginning.

from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt

sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTTTGACAATATAGACAATTTAAGTCCCAAGGCATCTCATCGTAGTAAGCAGAGACACAAGCAAAGTCTCTATGGTGATTATGTTTTTGACACCAATCGACATGATGATAATAGGTCAGACAATTTTAATACTGGCAACATGACTGTCCTTTCACCATATTTGAATACTACAGTGTTACCCAGCTCCTCTTCATCAAGAGGAAGCTTAGATAGTTCTCGTTCTGAAAAAGATAGAAGTTTGGAGAGAGAACGCGGAATTGGTCTAGGCAACTACCATCCAGCAACAGAAAATCCAGGAACTTCTTCAAAGCGAGGTTTGCAGATCTCCACCACTGCAGCCCAGATTGCCAAAGTCATGGAAGAAGTGTCAGCCATTCATACCTCTCAGGAAGACAGAAGTTCTGGGTCTACCACTGAATTACATTGTGTGACAGATGAGAGAAATGCACTTAGAAGAAGCTCTGCTGCCCATACACATTCAAACACTTACAATTTCACTAAGTCGGAAAATTCAAATAGGACATGTTCTATGCCTTATGCCAAATTAGAATACAAGAGATCTTCAAATGATAGTTTAAATAGTGT"
seq_start = 112173530
name1 = "Primer_1"
start1 = 112173630
end1 = 112173650
strand1 = +1  # forward strand
name2 = "Primer_2"
start2 = 112174069
end2 = 112174091
strand2 = -1  # reverse strand

# Split the sequence into 80-nucleotide lines
chunks = [sequence[i:i+80] for i in range(0, len(sequence), 80)]

primers = [
    GraphicFeature(start=start1, end=end1, strand=strand1, color="#ffd700",
                   label=name1),
    GraphicFeature(start=start2, end=end2, strand=strand2, color="#cffccc",
                   label=name2)
]

records = []

for i, chunk in enumerate(chunks):
    record = GraphicRecord(sequence=chunk, first_index=seq_start+i*80, sequence_length=80, features=primers)
    records.append(record)

fig, axes = plt.subplots(nrows=len(chunks), figsize=(14, 10))

for i, ax in enumerate(axes):
    ax.ticklabel_format(useOffset=False, style='plain')
    record = records[i]
    # record.finalize_ax(ax= ax, features_levels = 15, annotations_max_level = 15, auto_figure_height = True, ideal_yspan=1)
    record.plot(ax=ax)
    record.plot_sequence(ax = ax, background = None)

fig.savefig('utils/primer_visualization/{}-{}.pdf'.format(name1, name2), format="pdf")

However, I can't figure out how to size the records so that the information is displayed nicely.

I tried using the finalize_ax method but couldn't find a good fit. I also thought about using a multiplier depending on the size of the coverage, but it doesn't seem satisfactory to me.
Any advice?

Zulko commented on July 21, 2024

Hi, that's actually a difficult scenario, but there should be a (complicated) solution: first plot each line as an individual figure to determine how much vertical space it needs, then create subplots as you do now, with a figure height equal to the total and each ax given the right proportional height (using gridspec_kw={"height_ratios": [...]} in plt.subplots()). I don't have a computer at hand now; I'll try it tomorrow, as this could become a new feature in the library.
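
The layout arithmetic described here can be sketched in plain Python. This is a minimal sketch, assuming the height each line needs (in inches) has already been measured by plotting that line on its own; the function name and the example heights are hypothetical.

```python
def layout_from_line_heights(line_heights, width=14):
    """Return (figsize, height_ratios) for vertically stacked subplots.

    line_heights: height in inches that each line needs when plotted alone
    (assumed to be measured beforehand; the values below are made up).
    """
    total_height = sum(line_heights)
    ratios = [h / total_height for h in line_heights]
    return (width, total_height), ratios


# Example: the first and last lines carry a primer label and need more room
heights = [1.5, 1.0, 1.0, 1.5]
figsize, ratios = layout_from_line_heights(heights)

print(figsize)  # (14, 5.0)
print(ratios)   # [0.3, 0.2, 0.2, 0.3]

# These values could then be used as, for example:
# fig, axes = plt.subplots(nrows=len(heights), figsize=figsize,
#                          gridspec_kw={"height_ratios": ratios})
```

Matplotlib only uses the relative sizes of the height ratios, so passing the raw heights would give the same layout; the normalisation just makes the proportions easier to read.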

Zulko commented on July 21, 2024

I pushed v2.4 today. In this version, you can create multiline plots as shown in this code example. This new feature will (hopefully) always make the figure tall enough so that the lines don't get squeezed.

This should get you closer to what you want to achieve. Let me know if it helps or if you have any suggestion.

Yu-jinKim commented on July 21, 2024

Awesome! I had finally given up and put a multiplier to determine the height of the whole figure.
I have noticed 2 things:

  • If I select nucl_per_line = 80, for example, the tick increment is 8, which is not ideal
  • If you could disable the scientific notation in your code, that would be a great quality-of-life improvement. Here is the snippet that removes it: ax.ticklabel_format(useOffset=False, style='plain')

Other than that, it looks great! Thanks for all your help

Zulko commented on July 21, 2024

Great suggestions, thanks. I pushed a new version, 2.5.0, which doesn't use offsets, adds "," to separate thousands and millions, and lets you define graphic_record.ticks_resolution=25 to have ticks only every 25 bp.

Example:

from dna_features_viewer import GraphicRecord, GraphicFeature
import matplotlib.ticker as ticker
start = 123456
record = GraphicRecord(
    sequence_length=240,
    features=[
        GraphicFeature(start+10, start+70, 1, label="bla"),
        GraphicFeature(start+60, start+130, 1, label="ble"),
        GraphicFeature(start+140, start+210, -1, label="bli"),
    ],
    first_index=start,
    ticks_resolution=25
)
fig, axes = record.plot_on_multiple_lines(nucl_per_line=80)
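
As a rough illustration of the tick behaviour described above (ticks at a fixed resolution, labels with thousands separators and no offset), the plain-Python sketch below computes the tick positions for one line and formats them. It is not the library's implementation, and the function name is hypothetical.

```python
def ticks_for_line(first_index, nucl_per_line, resolution):
    """Tick positions at multiples of `resolution` inside one line's window."""
    last_index = first_index + nucl_per_line
    # First multiple of `resolution` at or after first_index (ceiling division)
    first_tick = -(-first_index // resolution) * resolution
    return list(range(first_tick, last_index + 1, resolution))


start = 123456
ticks = ticks_for_line(start, nucl_per_line=80, resolution=25)
labels = [f"{t:,}" for t in ticks]  # "," separates thousands and millions

print(ticks)   # [123475, 123500, 123525]
print(labels)  # ['123,475', '123,500', '123,525']
```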

Let me know if that works for you.

Yu-jinKim commented on July 21, 2024

Brilliant! Last question, hopefully: do you have a check for the size of the figure? I have some cases where the figure is bigger than 2^16 pixels, which Matplotlib doesn't accept, but I only find that out pretty late in the execution of the code.
If I could know that beforehand, I could try to crop the GraphicRecord to regions of interest.
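
A check of this kind can be done on the caller's side with plain arithmetic. The sketch below assumes the limit in question is 2^16 pixels per dimension, and that the figure size in inches and the DPI can be estimated before rendering; the function name, the per-line height, and the DPI are hypothetical.

```python
MAX_PIXELS = 2 ** 16  # per-dimension limit mentioned in this thread


def fits_in_figure(width_in, height_in, dpi=100):
    """Return True if both pixel dimensions stay below MAX_PIXELS."""
    return width_in * dpi < MAX_PIXELS and height_in * dpi < MAX_PIXELS


# 400 lines at an estimated 1.5 inches each, rendered at 100 dpi
print(fits_in_figure(14, 400 * 1.5))  # True  (60,000 px high)
# 500 lines at the same height exceed the limit
print(fits_in_figure(14, 500 * 1.5))  # False (75,000 px high)
```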

Zulko commented on July 21, 2024

I believe this is a bit of an extreme case, and in the current code I only know the figure's final size about 60% of the way through the process, so it wouldn't save you much time. It would be better if you did these checks on your side. Would a multi-page PDF (instead of a single big figure) fit your needs?
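
One way to prepare such a split on the caller's side is to cut the record's coordinate range into pages before plotting. The following plain-Python sketch covers only that bookkeeping; the function name and the page parameters are hypothetical, and each resulting range would still need to be cropped and plotted separately.

```python
def page_ranges(first_index, sequence_length, nucl_per_line, lines_per_page):
    """Split [first_index, first_index + sequence_length) into per-page ranges."""
    page_size = nucl_per_line * lines_per_page
    end = first_index + sequence_length
    return [
        (page_start, min(page_start + page_size, end))
        for page_start in range(first_index, end, page_size)
    ]


pages = page_ranges(first_index=112173530, sequence_length=700,
                    nucl_per_line=80, lines_per_page=4)
for number, (page_start, page_end) in enumerate(pages, start=1):
    print(number, page_start, page_end)
# 1 112173530 112173850
# 2 112173850 112174170
# 3 112174170 112174230
```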

Yu-jinKim commented on July 21, 2024

I guess it would, yes.
Otherwise, I can just do as I said and crop for regions of interest.
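
The cropping fallback mentioned here could be sketched as follows: pad each feature, clamp the result to the record's bounds, and merge windows that overlap. The function name and the padding values are hypothetical; the primer coordinates are the ones used earlier in this thread. Each resulting window is the kind of (start, end) pair that could be passed to record.crop.

```python
def regions_of_interest(features, padding, lower, upper):
    """Pad each (start, end) feature, clamp to [lower, upper], merge overlaps."""
    padded = sorted(
        (max(lower, start - padding), min(upper, end + padding))
        for start, end in features
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


primers = [(112173630, 112173650), (112174069, 112174091)]
record_start, record_end = 112173530, 112174230

print(regions_of_interest(primers, 20, record_start, record_end))
# [(112173610, 112173670), (112174049, 112174111)]
print(regions_of_interest(primers, 250, record_start, record_end))
# [(112173530, 112174230)]
```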

Zulko commented on July 21, 2024

I pushed a new major version online which adds a record.plot_on_multiple_pages method (see here for an example).

Closing this thread. Please open a new issue if you run into a problem.

Yu-jinKim commented on July 21, 2024

Great, I'll definitely be recommending your tool to people!

Zulko commented on July 21, 2024

Thanks!
