jonhoo / inferno
A Rust port of FlameGraph
License: Other
flamegraph.pl has a lot of configuration arguments, and many of them are there mainly to change the style of the resulting flame graph. We'll leave color schemes to a different issue, but I'm talking about things like
imagewidth
frameheight
fonttype
fontsize
fontwidth
minwidth
nametype
countname
bgcolors (#32)
titletext
searchcolor
notestext
subtitletext
Individually, these should all be pretty easy to add support for, so we should do it! Even smaller PRs that just add one or two of these are warmly welcome! Try to stick to the flamegraph.pl options where possible.
I noticed this on rbspy specifically: rbspy/rbspy#239
See #78 (comment). Specifically, currently we use a SmallVec for temporarily holding inlined java method calls:
Lines 278 to 279 in 10a1fd7
However, in #78, @Licenser decided to instead re-use a Vec in the Folder, which seems like a better strategy since it generally won't allocate even if there are inlined method calls (unlike SmallVec, which will allocate every time that happens).
Once #78 lands, we'll have two different Collapse implementations. It'd be cool to have an inferno-collapse-guess, which attempts to use whichever Collapse implementation is appropriate for a given input! This may end up being a little tricky, but here's a sketch of how it might work:
Add is_applicable(&str) -> Option<bool> to Collapse. None means "not sure -- need more input", Some(true) means "yes, this implementation should work with that string", and Some(false) means "no, this implementation definitely won't work".
The guesser keeps a list of Collapse impls (maybe using something like inventory?). For each line it reads, it calls is_applicable on all impls that haven't returned Some(false) thus far with the input it has collected. If any of them return Some(true), it decides to use that impl, and then:
The chosen impl is given an implementation of BufRead for a struct that wraps a String plus a BufRead. The String is the string the guesser has read so far, and that's what it'll read from first. Once all of that string has been read, it proceeds to read from the BufRead.
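A minimal sketch of the guessing loop described above, under a few assumptions: the Guess trait and the guess function are hypothetical names (not part of inferno), and instead of a hand-written wrapper struct it leans on the standard library's Cursor plus Read::chain, which already yields a BufRead that first replays the collected String and then continues with the original reader.

```rust
use std::io::{BufRead, Cursor, Read};

/// Hypothetical trait carrying the `is_applicable` method from the sketch.
pub trait Guess {
    /// None: need more input. Some(true): will work. Some(false): won't work.
    fn is_applicable(&self, input: &str) -> Option<bool>;
}

/// Reads line by line until one candidate answers Some(true) or all have
/// answered Some(false). Returns the index of the chosen candidate (if any)
/// and a reader that replays everything consumed so far before continuing
/// with the rest of `reader`.
pub fn guess<R: BufRead>(
    candidates: &[&dyn Guess],
    mut reader: R,
) -> std::io::Result<(Option<usize>, impl BufRead)> {
    let mut seen = String::new();
    let mut ruled_out = vec![false; candidates.len()];
    let mut chosen = None;
    'outer: loop {
        if reader.read_line(&mut seen)? == 0 {
            break; // end of input without a decision
        }
        for (i, candidate) in candidates.iter().enumerate() {
            if ruled_out[i] {
                continue;
            }
            match candidate.is_applicable(&seen) {
                Some(true) => {
                    chosen = Some(i);
                    break 'outer;
                }
                Some(false) => ruled_out[i] = true,
                None => {}
            }
        }
        if ruled_out.iter().all(|r| *r) {
            break; // nobody can handle this input
        }
    }
    Ok((chosen, Cursor::new(seen).chain(reader)))
}
```

The returned reader can be handed straight to whichever collapser was chosen, so no input is lost to the guessing phase.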
Currently when used as a library to generate a flamegraph (without any other bells and whistles) inferno requires a bunch of extra dependencies which don't directly contribute to flamegraph generation, e.g. object, gimli, memmap, addr2line, structopt, env_logger. It'd be nice to make those optional.
I've just added support for direct flamegraph generation through inferno to our CPU profiler, however I can't enable it by default because of this. (E.g. we currently use a non-released version of addr2line from Git, which is overridden by the addr2line pulled by inferno.)
In brendangregg/FlameGraph@465ac0c, @brendangregg added support for differential flame graphs, which can be really neat for profiling performance regressions. They have a couple of effects on flamegraph.pl:
TimedFrame and Frame when we generate them in flow.
There should be an API which allows the user to enable the consistent_palette option and maintain the palette map themselves in memory without touching the filesystem.
Hey! As a very heavy flamegraph user I've long been thinking about a cargo flamegraph command, but have been wary of including the bundle of perl scripts in the popular implementation. Is something like cargo flamegraph a priority of this project? I can imagine that it could support that use case in a cleaner way.
Glad to see effort in the rust + flamegraph space :]
Hey Jon. Just watched your most recent stream. I wrote a crate recently that does itoa with thousands separators without allocation. It's not up on crates.io yet, but will be shortly. num-format.
In the original flamegraph.pl, there's an option (--cp) that enables "consistent palettes". When this option is on, the color used for every function in the current flame graph is stored in a file called palette.map. When the option is used while the file exists, the color palette is also read in from that file when the flame graph generator starts up, and functions get their colors from that file unless they were previously unknown.
Bonus points for compatibility with palette.map files generated by flamegraph.pl.
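As a rough illustration of what an in-memory palette could look like, here is a sketch. PaletteMap and its methods are hypothetical names, and the name->color line format (with colors such as rgb(r,g,b)) is an assumption about what flamegraph.pl writes to palette.map that should be checked against a real file before relying on it.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory palette: function name -> color string.
#[derive(Default, Debug, PartialEq)]
pub struct PaletteMap(HashMap<String, String>);

impl PaletteMap {
    /// Parse `name->color` lines (assumed palette.map format).
    pub fn parse(text: &str) -> Self {
        let mut map = HashMap::new();
        for line in text.lines() {
            // Split on the last "->" so names that contain "->" stay intact.
            if let Some((name, color)) = line.rsplit_once("->") {
                map.insert(name.to_string(), color.to_string());
            }
        }
        PaletteMap(map)
    }

    /// Return the stored color for `name`; if the function was previously
    /// unknown, remember and return the color produced by `pick`.
    pub fn color_for(&mut self, name: &str, pick: impl FnOnce() -> String) -> &str {
        self.0.entry(name.to_string()).or_insert_with(pick)
    }

    /// Serialize back to `name->color` lines, sorted by name.
    pub fn serialize(&self) -> String {
        let mut lines: Vec<String> =
            self.0.iter().map(|(k, v)| format!("{k}->{v}")).collect();
        lines.sort_unstable();
        lines.join("\n") + "\n"
    }
}
```

Keeping parsing and serialization separate from any file handling means the same type could serve both the --cp file workflow and callers who want to hold the palette in memory.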
Upstream has code for working from a vtune trace. (Yay windows support.)
https://github.com/brendangregg/FlameGraph/blob/master/stackcollapse-vtune.pl
Can we add functionality to work with vtune?
The original flame graph supports the --nameattr option, which lets you specify a file that contains "extra" attributes to use for the SVG frames of particular functions. This seems like a pretty neat feature that we should support. Be careful that those attributes are handled in accordance with group_start and group_end from flamegraph.pl though!
Modern systems have ASLR enabled, which means that the addresses output by perf script are randomized, and do not correspond to the addresses present in the binary. This is basically this issue. There are ways to produce performance profiles that do not have this issue, either by compiling with gcc -no-pie -static, or by running with setarch -R, but that's pretty burdensome. It'd be better if we could just fix up the addresses after the fact!
From what I can tell, the trick would be to use perf script's --show-mmap-events flag, which outputs lines like:
program 14351 12112.392588: PERF_RECORD_MMAP2 14351/14351: [0x7ffde10f9000(0x21000) @ 0x7ffffffde000 00:00 0 0]: rw-p [stack]
program 14351 12112.392591: PERF_RECORD_MMAP2 14351/14351: [0x563d17016000(0x585000) @ 0 00:17 345286 1007]: r--p /data/jon/cargo-target/release/a>
program 14351 12112.392594: PERF_RECORD_MMAP2 14351/14351: [0x563d17068000(0x417000) @ 0x52000 00:17 345286 1007]: r-xp /data/jon/cargo-target/rel>
program 14351 12112.392595: PERF_RECORD_MMAP2 14351/14351: [0x563d1747f000(0xd7000) @ 0x469000 00:17 345286 1007]: r--p /data/jon/cargo-target/rel>
program 14351 12112.392596: PERF_RECORD_MMAP2 14351/14351: [0x563d17557000(0x3d000) @ 0x540000 00:17 345286 1007]: rw-p /data/jon/cargo-target/rel>
program 14351 12112.392597: PERF_RECORD_MMAP2 14351/14351: [0x563d17594000(0x7000) @ 0x563d17594000 00:00 0 0]: rw-p //anon
program 14351 12112.392603: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559a000(0x2c000) @ 0 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392607: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559c000(0x1f000) @ 0x2000 00:17 3865 12]: r-xp /usr/lib/ld-2.28.so
program 14351 12112.392608: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55bb000(0x8000) @ 0x21000 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392610: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c3000(0x2000) @ 0x28000 00:17 3865 12]: rw-p /usr/lib/ld-2.28.so
program 14351 12112.392614: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c5000(0x1000) @ 0x7f01e55c5000 00:00 0 0]: rw-p //anon
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d5000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d2000(0x3000) @ 0 00:00 0 0]: r--p [vvar]
that indicate what offset is given to various program pointers. By tracking the various mmap'd regions over time, we should be able to undo the randomization of program counter values, and restore the original addresses, which can then be un-inlined correctly.
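To make the idea concrete, here is a sketch that parses PERF_RECORD_MMAP2 lines of the shape shown above and maps a runtime address back to an offset within the backing file. Mapping, parse_mmap2, and to_file_offset are hypothetical names, and the field layout is inferred only from the sample output. The result is a file offset, which would still need to be translated to a virtual address using the binary's program headers before symbol lookup.

```rust
/// One mapping parsed from a PERF_RECORD_MMAP2 line (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Mapping {
    pub start: u64,  // runtime (randomized) start address
    pub len: u64,    // length of the mapping in bytes
    pub offset: u64, // offset into the backing file
    pub path: String,
}

/// Parse a hexadecimal number with or without a leading "0x".
fn hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s.trim_start_matches("0x"), 16).ok()
}

/// Parse a line such as
/// `... PERF_RECORD_MMAP2 pid/tid: [0xSTART(0xLEN) @ 0xOFF maj:min ino gen]: r-xp /path`.
pub fn parse_mmap2(line: &str) -> Option<Mapping> {
    let rest = line.split_once("PERF_RECORD_MMAP2")?.1;
    let (_, after_open) = rest.split_once('[')?;
    let (inside, after_close) = after_open.split_once(']')?;
    let (addr_len, tail) = inside.split_once(" @ ")?;
    let (start, len) = addr_len.split_once('(')?;
    let len = len.strip_suffix(')')?;
    let offset = tail.split_whitespace().next()?;
    // after_close looks like ": r-xp /path"; the path is the second word.
    let path = after_close
        .trim_start_matches(':')
        .split_whitespace()
        .nth(1)?;
    Some(Mapping {
        start: hex(start)?,
        len: hex(len)?,
        offset: hex(offset)?,
        path: path.to_string(),
    })
}

/// Translate a runtime address into (file path, offset within that file).
pub fn to_file_offset(maps: &[Mapping], addr: u64) -> Option<(&str, u64)> {
    maps.iter()
        .find(|m| addr >= m.start && addr - m.start < m.len)
        .map(|m| (m.path.as_str(), addr - m.start + m.offset))
}
```

A real implementation would also have to track mappings over time, since regions can be unmapped and replaced while the program runs.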
For what it's worth nperf record actually keeps track of mmap regions and records the mapping used to de-randomize addresses so that it can later restore the original addresses. See also #14 (comment).
This if makes me think that empty lines within a stack (i.e., after a non-empty line and before a count) should not be skipped.
When implementing inferno-collapse-perf, we made the decision not to strip ' and ". I believe the Perl version did that to not have to deal with them in SVG output, but we should make sure that we actually produce sensible output even with those characters (Rust code with lifetimes for example).
The upstream FlameGraph repository has a number of test cases run by this script. It'd be good to integrate those as a test suite into this tool to make sure we retain compatibility with the original stackcollapse tool.
This issue is to port brendangregg/FlameGraph#189 to inferno. The original author has consented to us porting their work to this project: #26 (comment).
Given how cool bpftrace is, it seems like a great idea to also port stackcollapse-bpftrace into inferno! Looking at its code, this should be a pretty straightforward exercise, since the eBPF already does most of the collapsing for us, and ustack already emits lines that are nearly in the right format.
#52 introduces a lot of default values, and they are duplicated between the impl Default for Options and the CLI StructOpt definitions. We should find a way to declare these as global constants, and then use those in both places. There's been a bunch of discussion of that in #52 (comment), which should provide a good starting point.
The code coverage of our tests is good, but not as good as we'd like it to be. In particular, I see some obvious candidates for new tests.
In perf stack collapsing (see coverage report):
perf trace with an empty line.
perf trace with a "weird" event line.
perf trace with a "process name".
perf trace with an inline Java function.
perf trace with a "weird" stack line.
In DTrace stack collapsing (see coverage report):
::, but not a trailing argument list.
Stack collapsing in general:
collapse_file.
collapse_file on STDIN.
In flamegraph (see coverage report):
hash).
stackcollapse-perf supports calling out to addr2line to determine the symbol to use for each program counter address (as opposed to using the symbol names that perf script produces). We should probably take advantage of the addr2line crate, which enables us to do un-inlining directly without executing an external process, which should significantly speed up processing. In particular because we can parse the debug symbols only once!
In flamegraph.pl, you can pass the --flamechart flag to produce a "flame chart" in which stack frames are not merged. To quote Brendan:
Flame charts were first added by Google Chrome's WebKit Web Inspector (bug). While inspired by flame graphs, flame charts put the passage of time on the x-axis instead of the alphabet. This means that time-based patterns can studied. Flame graphs reorder the x-axis samples alphabetically, which maximizes frame merging, and better shows the big picture of the profile. Multi-threaded applications can't be shown sensibly by a single flame chart, whereas they can with a flame graphs (a problem flame charts didn't need to deal with, since it was initially used for single-threaded JavaScript analysis). Both visualizations are useful, and tools should make both available if possible (eg, TraceCompass does). Some analysis tools have implemented flame charts and mistakingly called them flame graphs.
I'm not entirely sure how time factors into this, since the collapsed stacks don't have any timing information from the original execution in them, but people seem to find them useful, so let's implement them! Also, if you do find them useful, please include docs and examples! If you just want to do the implementation work, that's fine too, just include the quoted explanation above somewhere in the docs.
In terms of code, --flamechart mainly avoids sorting the incoming stacks, and changes the plot title.
It might be nice to have a struct for stacks, and then split the code into reader, collapse, and 'renderer'.
That would allow not having to convert a stack to a string and then back from a string to data to render flame graphs, and would also generalize the rendering logic.
The Perl stackcollapse-perf has support for filtering by only particular events. The skeleton code for dealing with that kind of filtering is already in inferno, but needs to be fleshed out to actually support filtering.
Currently, the width of the flamegraph, and the individual sample boxes, is set in pixels. It'd be really cool if we could instead use percentages, as it would mean that users with a wider monitor would automatically be able to take advantage of their screen size.
Now that #60 has landed, it would be nice if inferno could generate the differential profile files that this feature expects. Currently you need to use difffolded.pl from upstream FlameGraph. We should port this to Rust.
See Differential Flame Graphs for more info.
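A minimal sketch of what such a port could start from, assuming the differential format is one line per stack with the "before" count followed by the "after" count (stack count1 count2), and that a stack missing from one profile gets a count of 0. The function names are hypothetical, and options of the upstream script such as normalization are not covered.

```rust
use std::collections::BTreeMap;

/// Add the counts from one folded profile into column `col` (0 or 1).
fn parse_folded(text: &str, col: usize, out: &mut BTreeMap<String, [u64; 2]>) {
    for line in text.lines() {
        if let Some((stack, count)) = line.rsplit_once(' ') {
            if let Ok(n) = count.trim().parse::<u64>() {
                let slot = out.entry(stack.trim_end().to_string()).or_insert([0, 0]);
                slot[col] += n;
            }
        }
    }
}

/// Merge two folded profiles into "stack before after" lines, sorted by stack.
pub fn diff_folded(before: &str, after: &str) -> String {
    let mut counts = BTreeMap::new();
    parse_folded(before, 0, &mut counts);
    parse_folded(after, 1, &mut counts);
    counts
        .iter()
        .map(|(stack, [a, b])| format!("{stack} {a} {b}\n"))
        .collect()
}
```

Using a BTreeMap keeps the output sorted, which is also what the frame-merging step in the flame graph generator expects of its input.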
The upstream FlameGraph project has a number of outstanding pull requests that fix bugs and add or improve features. We should take a look through them and see whether we may be able to incorporate some of them here! I propose we take the following approach for each one:
adoption label. Then, either start writing a PR if you want to implement it yourself, or just leave it there for others to adopt!
We should also probably take a look through known FlameGraph issues and see which ones also apply to inferno. For those that do, we should add tracking issues here, link to the original issue, and ideally also add a failing test case where possible!
In theory we should be able to slice the input to the collapse scripts into multiple pieces, collapse each piece independently (on different cores) and then merge the results. That would likely lead to significant speedups on multi-core machines, and would likely be worth investigating.
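Here is a rough sketch of the split, collapse, and merge idea using scoped threads from the standard library. The collapse_chunk function is only a stand-in for a real collapser: it assumes each blank-line-separated block is one sample whose lines are frames listed leaf first. All names here are hypothetical.

```rust
use std::collections::HashMap;
use std::thread;

/// Stand-in collapser: one blank-line-separated block = one sample,
/// lines are frames with the leaf first.
fn collapse_chunk(chunk: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for block in chunk.split("\n\n") {
        let frames: Vec<&str> = block
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty())
            .collect();
        if frames.is_empty() {
            continue;
        }
        let stack = frames.iter().rev().copied().collect::<Vec<_>>().join(";");
        *counts.entry(stack).or_insert(0) += 1;
    }
    counts
}

/// Split `input` into roughly `n` pieces, cutting only right after a blank line.
fn split_at_blank_lines(input: &str, n: usize) -> Vec<&str> {
    let target = input.len() / n.max(1) + 1;
    let mut pieces = Vec::new();
    let mut rest = input;
    while rest.len() > target {
        // Search bytes so we never slice in the middle of a UTF-8 character.
        let found = rest.as_bytes()[target..]
            .windows(2)
            .position(|w| w == b"\n\n");
        let Some(pos) = found else { break };
        let (head, tail) = rest.split_at(target + pos + 2);
        pieces.push(head);
        rest = tail;
    }
    pieces.push(rest);
    pieces
}

/// Collapse each piece on its own thread, then merge the per-piece counts.
pub fn collapse_parallel(input: &str, threads: usize) -> HashMap<String, usize> {
    let pieces = split_at_blank_lines(input, threads);
    let partials: Vec<HashMap<String, usize>> = thread::scope(|s| {
        let handles: Vec<_> = pieces
            .iter()
            .map(|&p| s.spawn(move || collapse_chunk(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    let mut merged = HashMap::new();
    for partial in partials {
        for (stack, n) in partial {
            *merged.entry(stack).or_insert(0) += n;
        }
    }
    merged
}
```

Whether this wins in practice depends on how much of the time goes to reading input versus parsing it, so it would need benchmarking on real traces.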
Since the project is fairly recent I assume it's benched on a recent version of Rust (or even nightly, does Criterion require nightly?) That means it's using the system allocator, and IIRC flamegraph munging is a lot of string munging and other allocation-heavy tasks, jemalloc could well have an edge.
flamegraph.pl has a --total option which sets the internal variable timemax. I think it basically "compresses" the flamegraph, but not sure exactly what effect that has. If someone cleverer than me can figure out what it is for, document it, and then implement it, that'd be awesome!
The same issue found here applies to inferno: brendangregg/FlameGraph#132
This issue affects me, and I would like to work on it if I have time. I'll try to get a sample profile from an arm device and post it here.
If there's too long of a line at the top of the graph, the subtitle is hidden, probably because it ends up being covered up
For example:
If the top line is very long: echo "hello 1" | inferno-flamegraph --subtitle "Hello! this is a subtitle" > regular.svg
If the top line is very long because it's an inverted graph: cat flamegraph/test/results/perf-java-stacks-01-collapsed-pid.txt | inferno-flamegraph --subtitle "Hello! this is a subtitle" --inverted > inverted.svg
The original stackcollapse-perf has special support for Java in a couple of places. Specifically, it has special rules for tidying up Java perf reports, as well as this loop for supporting Java inlining. See brendangregg/FlameGraph#89 in particular.
We should support the --reverse option, which, well, reverses the order of all stacks. Note that this does require that we also sort our input, because otherwise the trick we play in flow in flamegraph/merge.rs doesn't work!
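A small sketch of the reverse-then-sort step on folded lines. The reverse_stacks name is hypothetical, and the sketch assumes each line has the form "frame;frame;frame count".

```rust
/// Reverse the frame order of each folded line ("a;b;c 3" -> "c;b;a 3"),
/// then sort so that stacks sharing a prefix are adjacent again.
pub fn reverse_stacks(folded: &str) -> Vec<String> {
    let mut lines: Vec<String> = folded
        .lines()
        .filter_map(|line| {
            // The count is whatever follows the last space on the line.
            let (stack, count) = line.rsplit_once(' ')?;
            let reversed: Vec<&str> = stack.split(';').rev().collect();
            Some(format!("{} {}", reversed.join(";"), count))
        })
        .collect();
    lines.sort_unstable();
    lines
}
```

The sort is needed because reversing changes which frame comes first, so input that was sorted before reversal is generally not sorted afterwards.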
If we want other projects, like @koute's nperf, to start using inferno to draw their flame graphs (and perhaps even to do the collapsing), we need to provide an external interface that isn't just a purely line-based textual one like the ones used internally between perf script, stackcollapse-perf, and flamegraph.pl. Specifically, we'll want an interface that is typed, and where all (most?) of the parsing is done by the caller. I'm imagining something like:
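mod flamegraph {
    use std::io::{self, Write};

    // Placeholder for the rendering options (assumption; not specified here).
    pub struct Opts;

    pub struct StackCount<S> {
        pub stack: S,
        pub count: usize,
    }

    // The return type was left open in the sketch; io::Result<()> is an assumption.
    pub fn from_samples<'a, I, S, W>(o: Opts, samples: I, writer: W) -> io::Result<()>
    where
        I: IntoIterator,
        I::Item: Into<StackCount<S>>,
        S: Iterator<Item = &'a str>,
        W: Write,
    {
        // Merge the (sorted) stacks and write the flame graph to `writer`.
        unimplemented!()
    }
}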
from_samples would require that the input is given in sorted order (see #28), and then write a flame graph output to writer. The existing code that does parsing would call from_samples with an iterator created from the lines iterator, mapped something like this:
lines.map(|line| {
    let (stack, nsamples) = ...;
    StackCount {
        count: nsamples,
        stack: stack.split(';'),
    }
})
@koute: what do you think of an interface roughly like the above? You think that's an iterator that nperf could easily produce? What would be a good interface for collapse (if that's even something you might consider using)?
While #37 got us pretty far, there is still one source of allocation in the main flamegraph loop: the attributes on BytesStart. Specifically, when you add attributes to a BytesStart, it allocates a Vec to hold the underlying bytes if it doesn't already have a Vec. That's unfortunate since we create a new BytesStart for every g. If we instead created a single BytesStart that we re-used somehow (by resetting all the attributes), that'd be way more efficient. Sadly, that's not yet supported, but I've filed tafia/quick-xml#148 which would enable us to do that!
Let's say your function is too wide to fit on the screen. Right now, the right side will be truncated:
for (var x = txt.length - 2; x > 0; x--) {
    if (t.getSubStringLength(0, x + 2) <= w) {
        t.textContent = txt.substring(0, x) + "..";
        return;
    }
}
t.textContent = "";
So e.g. in my case I have /home/itamarst/Devel/memory-profiler/venv/lib64/python3.8/site-packages/numpy/core/numeric.py:ones truncated to /home/itamarst/Devel/memory-profiler/venv/lib64/...
But, you'll notice the most significant information is on the right side of that string: the function name, the main file, etc. It would be better to truncate it in the opposite direction: ../python3.8/site-packages/numpy/core/numeric.py:ones, or at least have that as a rendering option.
The --factor option in flamegraph.pl lets you arbitrarily multiply all sample counts by a constant, probably to show "true counts" if you've undersampled. We should document and adopt that option.
#99 identified that performance drops significantly when there are many frames in the output (see #99 (comment)). We should figure out why that is, and fix it! Maybe it's time to flamegraph inferno ;)
The --inverted flag, apart from changing the title of the plot, changes the flame graph such that the entire plot is upside down (note that this also affects the JavaScript). We should probably support that flag too.
$ cargo tree --no-indent | sed 's/ (\*)//' | sort -u | wc -l
126
My guess is that some of these are unnecessary. Let's see if we can't do something to help improve build times.
Add some instructions for using perf to record a profile and then using inferno to produce an SVG from it.
I'm still watching part 2 but given the current state of the source code, I believe that you may have misinterpreted the part of the regex in upstream FlameGraph about removing the fractional part of the samples.
my($stack, $samples) = (/^(.*)\s+?(\d+(?:\.\d*)?)$/);
inferno/src/flamegraph/merge.rs
Lines 94 to 98 in ce1c41a
My reading is that the non-capturing group is there to allow input in either integer or fractional format, but not in any way remove it from the surrounding capture. The non-capturing group is there to group up the separator and the fractions for the following ?.
That is, it accepts sample values like 12, 34., and 56.78.
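To illustrate that reading, here is a sketch of a parser that accepts the same three count shapes without pulling in a regex crate. parse_line is a hypothetical name, not the code in merge.rs, and unlike the Perl regex it trims all trailing whitespace from the stack.

```rust
/// Split a folded line into (stack, samples), accepting counts written as
/// "12", "34." or "56.78" (digits, optionally a dot, optionally more digits).
pub fn parse_line(line: &str) -> Option<(&str, f64)> {
    let line = line.trim_end();
    // The sample count is whatever follows the last whitespace character.
    let (stack, samples) = line.rsplit_once(char::is_whitespace)?;
    let (int_part, frac_part) = match samples.split_once('.') {
        Some((int_part, frac_part)) => (int_part, frac_part),
        None => (samples, ""),
    };
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    // At least one digit must come before the optional fractional part.
    if int_part.is_empty() || !all_digits(int_part) || !all_digits(frac_part) {
        return None;
    }
    Some((stack.trim_end(), samples.parse().ok()?))
}
```

The explicit digit check matters because f64's parser on its own would also accept inputs such as "1e5" or "inf", which the regex rejects.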
Currently, inferno's flamegraph implementation just hardcodes the color of all the frames in the flamegraph. That makes the flamegraph significantly less useful, and also much less pretty. Instead, we should support flamegraph.pl's --colors option. It affects colors in a number of ways:
--hash flag, the namehash coloring function is used, which attempts to color similar "parts" of the stack similarly, and consistently across runs.
We should support all that! It's probably mostly a matter of porting the giant color function.
There's a lot of useful info at the top of flamegraph.pl that we should incorporate into the (currently nonexistent) docs for the flamegraph module. FlameGraph also has a history that it'd be worthwhile to refer to.
Prior to #74, we had support for demangling function names (although only with --inline). While perf is supposed to be doing it already (see --demangle) there are reports that it doesn't demangle everything super well. Having our own demangling step using one of the many demangling crates seems like it might be a good idea!
The trick to merge frames we use in flow only works if the input lines are sorted. This is generally the case when we're operating over a single file, but is not true if the user provides multiple input files (which we should support), and we should cater to that case. Sorting inputs will also be required for #21. Note that it is fine for the sort to be unstable.
Currently, we use format! in a number of places in order to pass appropriate values to BytesStart::with_attributes. That's really sad, because it means we're allocating tons of little String items. It'd be far better if we instead just had a single buffer that we use write!() to write into, and then pulled slices out of to get the attribute substrings. I don't know if such a crate exists, but if it doesn't, that seems like a useful thing to write!
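One possible shape for such a buffer, as a sketch: values are appended to a single String and callers keep byte ranges, which they turn back into &str slices when building attributes. AttrBuf is a hypothetical type, not an existing crate.

```rust
use std::fmt::{self, Write};
use std::ops::Range;

/// One reusable buffer that formatted attribute values are appended to.
#[derive(Default)]
pub struct AttrBuf {
    buf: String,
}

impl AttrBuf {
    /// Append a formatted value and return the byte range it occupies.
    pub fn push(&mut self, args: fmt::Arguments<'_>) -> Range<usize> {
        let start = self.buf.len();
        self.buf
            .write_fmt(args)
            .expect("writing to a String cannot fail");
        start..self.buf.len()
    }

    /// Look up a previously written value by its range.
    pub fn get(&self, range: Range<usize>) -> &str {
        &self.buf[range]
    }

    /// Forget the contents but keep the allocation for the next frame.
    pub fn clear(&mut self) {
        self.buf.clear();
    }
}
```

A caller would write something like buf.push(format_args!("{}", x)) for each attribute, read the slices back with get, and call clear before the next frame, so after the first few frames no further allocation should be needed.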