I'm not entirely sure about your scenario, but here is some alternative code that (I hope) does the same thing:
from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt
full_sequence = 120_000_000 * "n" # replace with the actual sequence.
x = 112173530 # index of the region to plot
primer1 = GraphicFeature(start=x, end=x+20, strand=+1, color="#ffd700", label="Primer1")
primer2 = GraphicFeature(start=x+50, end=x+70, strand=-1, color="#cffccc", label="Primer2")
record = GraphicRecord(sequence = full_sequence, features=[primer1, primer2])
region_record = record.crop((x-20, x+700))
detail_record = record.crop((x-5, x+75))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 3), gridspec_kw={"width_ratios": [1, 2]})
region_record.plot(ax=ax1)
detail_record.plot(ax=ax2)
detail_record.plot_sequence(ax2)
I am not sure why the x coordinates are truncated for you, but it could be due to the way Matplotlib writes the coordinates. Have you noticed the "+1.1217e8" notation under the axis? It indicates that you should add 112,170,000 to the x-coordinates.
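To make the offset arithmetic concrete: when tick labels share a long common part, Matplotlib prints that part once as an offset and shows only the remainder on each tick, so the true coordinate is the offset plus the tick label (3530 + 112,170,000 = 112,173,530). Below is a small pure-Python sketch of that reading; the helper names are hypothetical and not part of Matplotlib or DNA Features Viewer. On a real axis, ax.ticklabel_format(useOffset=False, style='plain') switches the offset off.

```python
def absolute_position(tick_label: float, offset_text: str) -> int:
    """Recover an absolute coordinate from a tick label plus Matplotlib-style offset text."""
    # offset_text looks like "+1.1217e8"; float() parses the sign and the exponent.
    offset = float(offset_text)
    return round(offset + tick_label)


def format_coordinate(position: int) -> str:
    """Format a genomic coordinate with "," thousands separators."""
    return f"{position:,}"


if __name__ == "__main__":
    pos = absolute_position(3530, "+1.1217e8")
    print(pos, format_coordinate(pos))
```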
from dnafeaturesviewer.
The issue here is that the sequence parameter you provide to GraphicRecord should be the whole sequence, not just the segment's sequence. In your case that is not practical because your sequence is huge, but there is another solution. Do not use crop. Instead, use first_index=11217xxxx... in GraphicRecord, and set sequence_length to 40. Provide the short sequence as you do now. Sorry, I am on mobile. Let me know if that makes sense.
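Under this approach, first_index is the coordinate of the first base of the short sequence you pass, so a genomic position p sits at offset p - first_index in that string. A pure-Python sketch of that bookkeeping, assuming this interpretation of first_index (the helper functions are hypothetical, not library calls); it is a quick way to check that a primer actually falls inside the displayed window before plotting. For instance, a 40 nt window starting at 112173530-10 ends before 112173580, so a feature at 112173580-112173600 would be outside it.

```python
def to_local(genomic_pos: int, first_index: int) -> int:
    """Offset of a genomic coordinate in a sequence whose first base is at first_index."""
    return genomic_pos - first_index


def bases_under_feature(window_seq: str, first_index: int, start: int, end: int) -> str:
    """Bases of window_seq covered by the genomic interval [start, end).

    Raises ValueError when the interval is not fully inside the window.
    """
    lo = to_local(start, first_index)
    hi = to_local(end, first_index)
    if lo < 0 or hi > len(window_seq) or lo > hi:
        raise ValueError(f"feature {start}-{end} lies outside the displayed window")
    return window_seq[lo:hi]


if __name__ == "__main__":
    window = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCC"  # 40 nt window
    first_index = 112173530 - 10
    print(bases_under_feature(window, first_index, 112173530, 112173550))
```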
Thanks for the quick response. I managed to do that:
from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt
sequence = "#complete sequence"
record = GraphicRecord(sequence=sequence, first_index=112173530-20, sequence_length=700, features=[
GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
label="Primer1"),
GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
label="Primer2"),
])
sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTT"
record_detail = GraphicRecord(sequence=sequence, first_index=112173530-10, sequence_length=40, features=[
GraphicFeature(start=112173530, end=112173530+20, strand=+1, color="#ffd700",
label="Primer1"),
GraphicFeature(start=112173530+50, end=112173530+70, strand=-1, color="#cffccc",
label="Primer2"),
])
# PLOT THE WHOLE SEQUENCE
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 2))
record.plot(ax=ax1)
record_detail.plot(ax=ax2)
record_detail.plot_sequence(ax2)
fig.savefig('overview_and_detail.png')
And I kind of get what I wanted. However, is there a way to keep the original coordinates when using first_index?
Right now the x coordinates are truncated (112173530 becomes 3530, for example). I tried increasing the figsize to give the numbers enough room, but it doesn't help.
Hi again,
First of all, thanks for all your help.
Second, I think I have a snippet that does what I envisioned in the beginning:
from dna_features_viewer import GraphicFeature, GraphicRecord
import matplotlib.pyplot as plt
sequence = "TCAAGCTTGCCATCTCTTCATGTTAGGAAACAAAAAGCCCTAGAAGCAGAATTAGATGCTCAGCACTTATCAGAAACTTTTGACAATATAGACAATTTAAGTCCCAAGGCATCTCATCGTAGTAAGCAGAGACACAAGCAAAGTCTCTATGGTGATTATGTTTTTGACACCAATCGACATGATGATAATAGGTCAGACAATTTTAATACTGGCAACATGACTGTCCTTTCACCATATTTGAATACTACAGTGTTACCCAGCTCCTCTTCATCAAGAGGAAGCTTAGATAGTTCTCGTTCTGAAAAAGATAGAAGTTTGGAGAGAGAACGCGGAATTGGTCTAGGCAACTACCATCCAGCAACAGAAAATCCAGGAACTTCTTCAAAGCGAGGTTTGCAGATCTCCACCACTGCAGCCCAGATTGCCAAAGTCATGGAAGAAGTGTCAGCCATTCATACCTCTCAGGAAGACAGAAGTTCTGGGTCTACCACTGAATTACATTGTGTGACAGATGAGAGAAATGCACTTAGAAGAAGCTCTGCTGCCCATACACATTCAAACACTTACAATTTCACTAAGTCGGAAAATTCAAATAGGACATGTTCTATGCCTTATGCCAAATTAGAATACAAGAGATCTTCAAATGATAGTTTAAATAGTGT"
seq_start = 112173530  # genomic coordinate of the first base of `sequence`
name1 = "Primer_1"
start1 = 112173630
end1 = 112173650
strand1 = +1  # strand must be a number (+1 or -1), not a bare "+" or "-"
name2 = "Primer_2"
start2 = 112174069
end2 = 112174091
strand2 = -1
# Split the sequence into lines of 80 nt each.
chunks = [sequence[i:i+80] for i in range(0, len(sequence), 80)]
primers = [
    GraphicFeature(start=start1, end=end1, strand=strand1, color="#ffd700",
                   label=name1),
    GraphicFeature(start=start2, end=end2, strand=strand2, color="#cffccc",
                   label=name2)
]
# One record per line; first_index keeps the genomic coordinates on the x axis.
records = []
for i, chunk in enumerate(chunks):
    record = GraphicRecord(sequence=chunk, first_index=seq_start+i*80, sequence_length=80, features=primers)
    records.append(record)
# squeeze=False keeps `axes` a 2D array even when there is only one chunk.
fig, axes = plt.subplots(nrows=len(chunks), figsize=(14, 10), squeeze=False)
for i, ax in enumerate(axes[:, 0]):
    ax.ticklabel_format(useOffset=False, style='plain')  # full coordinates, no offset
    record = records[i]
    # record.finalize_ax(ax= ax, features_levels = 15, annotations_max_level = 15, auto_figure_height = True, ideal_yspan=1)
    record.plot(ax=ax)
    record.plot_sequence(ax=ax, background=None)
fig.savefig('utils/primer_visualization/{}-{}.pdf'.format(name1, name2), format="pdf")
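The chunking step in the snippet above can be checked without plotting. A pure-Python sketch (the Primer, Line and split_into_lines names are hypothetical, not library classes) that cuts a sequence into fixed-width lines, records the genomic first_index of each line, and keeps only the primers overlapping that line. Filtering features per line, and noting that the last line is usually shorter than 80 nt, are assumptions about what you may want rather than something the library requires.

```python
from typing import List, NamedTuple


class Primer(NamedTuple):
    name: str
    start: int  # genomic start, inclusive
    end: int    # genomic end, exclusive


class Line(NamedTuple):
    first_index: int       # genomic coordinate of the line's first base
    sequence: str          # bases shown on this line
    primers: List[Primer]  # primers overlapping this line


def split_into_lines(sequence: str, seq_start: int, primers: List[Primer],
                     nucl_per_line: int = 80) -> List[Line]:
    """Cut `sequence` into lines of at most nucl_per_line bases.

    Each line records its genomic first_index and keeps only the primers
    whose [start, end) interval overlaps it.
    """
    lines = []
    for offset in range(0, len(sequence), nucl_per_line):
        chunk = sequence[offset:offset + nucl_per_line]
        line_start = seq_start + offset
        line_end = line_start + len(chunk)  # exclusive
        overlapping = [p for p in primers if p.start < line_end and p.end > line_start]
        lines.append(Line(line_start, chunk, overlapping))
    return lines


if __name__ == "__main__":
    demo = split_into_lines("A" * 200, 1000, [Primer("P1", 1010, 1030), Primer("P2", 1150, 1170)])
    for line in demo:
        print(line.first_index, len(line.sequence), [p.name for p in line.primers])
```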
However, I can't figure out how to size the records so that the information is displayed nicely.
I tried using the finalize_ax method, but I couldn't find a good fit. I thought about using a multiplier based on the length of the covered region, but that doesn't seem satisfactory to me.
Any advice?
Hi, that's actually a difficult scenario, but there should be a (complicated) solution: first plot each line as an individual figure to determine how much vertical space it needs, then create subplots as you do now, with a figure height equal to the total and each ax given its proportional height (using gridspec_kw={"height_ratios": [...]} in plt.subplots(), with one ratio per line). I don't have a computer at hand now; I'll try it tomorrow, as this could become a new feature in the library.
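The arithmetic of that plan is simple once each line's height is known. A pure-Python sketch, assuming the per-line heights (in inches) have already been measured by plotting each line on its own figure; the function name is hypothetical. The total becomes the figure height and the individual heights become the height_ratios for plt.subplots. Matplotlib's GridSpec treats height_ratios as relative values, so the raw heights can be passed directly.

```python
def stacked_layout(line_heights, width=14.0):
    """Return (figsize, gridspec_kw) for stacking lines of the given heights.

    line_heights: measured height in inches of each line plotted on its own.
    """
    if not line_heights or any(h <= 0 for h in line_heights):
        raise ValueError("line heights must be a non-empty list of positive numbers")
    total = sum(line_heights)
    return (width, total), {"height_ratios": list(line_heights)}


# Hypothetical usage with Matplotlib:
#   figsize, gs = stacked_layout([1.5, 2.0, 1.0])
#   fig, axes = plt.subplots(nrows=3, figsize=figsize, gridspec_kw=gs)
```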
I pushed v2.4 today. In this version, you can plot multiline plots as shown in this code example. This new feature will (hopefully) always make the figure tall enough that the lines don't get squeezed.
This should get you closer to what you want to achieve. Let me know if it helps or if you have any suggestion.
Awesome! I had finally given up and put a multiplier to determine the height of the whole figure.
I have noticed 2 things:
- If I select nucl_per_line=80, for example, the tick increment is 8, which is not ideal.
- If you could add non-scientific notation to your code, that would be a great quality-of-life improvement. Here is the snippet to remove the scientific notation:
ax.ticklabel_format(useOffset=False, style='plain')
Other than that, it looks great! Thanks for all your help.
Great suggestions, thanks. I pushed a new version, 2.5.0, which doesn't use offsets, adds "," to separate thousands and millions, and lets you set graphic_record.ticks_resolution=25 to have ticks only every 25 bp.
Example:
from dna_features_viewer import GraphicRecord, GraphicFeature
start = 123456
record = GraphicRecord(
    sequence_length=240,
    features=[
        GraphicFeature(start+10, start+70, 1, label="bla"),
        GraphicFeature(start+60, start+130, 1, label="ble"),
        GraphicFeature(start+140, start+210, -1, label="bli"),
    ],
    first_index=start,  # coordinates start at 123456 instead of 0
    ticks_resolution=25  # one tick every 25 bp
)
fig, axes = record.plot_on_multiple_lines(nucl_per_line=80)
Let me know if that works for you.
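For intuition about what a fixed tick resolution does, here is a pure-Python sketch that places ticks at every multiple of the resolution inside a window starting at first_index and labels them with "," separators. It assumes ticks fall on absolute multiples of the resolution; it is an illustration, not DNA Features Viewer's actual implementation, and the function names are hypothetical.

```python
def tick_positions(first_index, length, resolution):
    """Multiples of `resolution` within [first_index, first_index + length]."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # Smallest multiple of resolution that is >= first_index (ceiling division).
    first_tick = -(-first_index // resolution) * resolution
    return list(range(first_tick, first_index + length + 1, resolution))


def tick_labels(positions):
    """Format positions with "," thousands separators."""
    return [f"{p:,}" for p in positions]


if __name__ == "__main__":
    ticks = tick_positions(123456, 240, 25)
    print(tick_labels(ticks))
```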
Brilliant! Last question, hopefully: do you have a check for the size of the figure? In some cases the figure is larger than 2^16 pixels, which Matplotlib rejects, but I only find out pretty late in the execution of the code.
If I could know that beforehand, I could crop the GraphicRecord to regions of interest.
I believe this is a bit of an extreme case, and in the current code I only know the figure's final size about 60% of the way through the process, so checking it there wouldn't save you much time. It would be better if you did these checks on your side. Would a multi-page PDF (instead of a single big figure) fit your needs?
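A check on your side can run before anything is rendered: Matplotlib's Agg renderer (used for PNG output) refuses images of 2^16 pixels or more in either direction, and the pixel size is the figure size in inches times the DPI (Matplotlib's default figure DPI is 100). A minimal sketch, assuming you can estimate the figure height up front, for instance from a per-line height times the number of lines; the helper names and the per-line estimate are hypothetical. The last function gives how many lines fit in one figure, which is one way to decide where to crop or how to split the output.

```python
import math

AGG_MAX_PIXELS = 2 ** 16  # Agg rejects images this large or larger in either direction


def figure_pixels(width_in, height_in, dpi=100):
    """Pixel dimensions Matplotlib will try to allocate for a figure."""
    return int(width_in * dpi), int(height_in * dpi)


def figure_fits(width_in, height_in, dpi=100):
    """True if both pixel dimensions stay below the Agg limit."""
    w, h = figure_pixels(width_in, height_in, dpi)
    return w < AGG_MAX_PIXELS and h < AGG_MAX_PIXELS


def max_lines_per_figure(line_height_in, dpi=100):
    """How many lines of a given height (in inches) fit in one figure under the limit."""
    if line_height_in <= 0:
        raise ValueError("line height must be positive")
    max_height_in = (AGG_MAX_PIXELS - 1) / dpi
    return math.floor(max_height_in / line_height_in)
```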
I guess it would, yes.
Otherwise, I can just do as I said and crop to regions of interest.
I pushed a new major version which features a record.plot_on_multiple_pages method (see here for an example).
Closing this thread. Please open a new issue if you run into a problem.
Great, I'll definitely be recommending your tool to people!
Thanks!