jonhoo / inferno

A Rust port of FlameGraph

License: Other

Rust 95.68% Shell 0.22% JavaScript 4.02% CSS 0.08%

inferno's Issues

Make style settings for generated flamegraph configurable

flamegraph.pl has a lot of configuration arguments, and many of them are there mainly to change the style of the resulting flame graph. We'll leave color schemes to a different issue, but I'm talking about things like

Individually, these should all be pretty easy to add in support for, so we should do it! Even smaller PRs that just add one or two of these are warmly welcome! Try to stick to the flamegraph.pl options where possible.

0.7 -> 0.9 upgrade caused reduced svg fidelity

I noticed this on rbspy specifically: rbspy/rbspy#239

Remove `SmallVec` in favor of using a long-term `Vec`

See #78 (comment). Specifically, currently we use a SmallVec for temporarily holding inlined java method calls:

inferno/src/collapse/perf.rs

Lines 278 to 279 in 10a1fd7

    
    let mut java_inline =
        SmallVec::<[String; 1]>::with_capacity(rawfunc.split("->").count() + 1);

However, in #78, @Licenser decided to instead re-use a Vec in the Folder, which seems like a better strategy since it generally won't allocate even if there are inlined method calls (unlike SmallVec, which will allocate every time that happens).
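
A minimal sketch of the long-lived `Vec` strategy, assuming a stripped-down `Folder` (the struct and the `split_inlined` method here are hypothetical stand-ins, not inferno's actual code). The point is that `clear()` keeps the `Vec`'s capacity, so the backing store is reused across calls; the individual `String`s are still allocated per call in this sketch.

```rust
/// Hypothetical, stripped-down stand-in for the perf `Folder`.
#[derive(Default)]
struct Folder {
    /// Scratch buffer kept across calls so its capacity is reused.
    java_inline: Vec<String>,
}

impl Folder {
    /// Splits `rawfunc` on "->" into the long-lived buffer and returns the
    /// parts. `clear()` drops the old strings but keeps the capacity.
    fn split_inlined(&mut self, rawfunc: &str) -> &[String] {
        self.java_inline.clear();
        self.java_inline
            .extend(rawfunc.split("->").map(str::to_owned));
        &self.java_inline
    }
}
```

After the first call that sees inlined methods, later calls with the same or fewer parts should not need to grow the `Vec` again.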

Provide magic collapse-dispatcher

Once #78 lands, we'll have two different Collapse implementations. It'd be cool to have an inferno-collapse-guess, which attempts to use whichever Collapse implementation is appropriate for a given input! This may end up being a little tricky, but here's a sketch of how it might work:

Add a method is_applicable(&str) -> Option<bool> to Collapse. None means "not sure -- need more input", Some(true) means "yes, this implementation should work with that string", and Some(false) means "no, this implementation definitely won't work".
Add a new collapse implementation, "guess", which internally buffers the string it reads from the underlying streams, and knows about all the other Collapse impls (maybe using something like inventory?). For each line it reads, it calls is_applicable on all impls that haven't returned Some(false) thus far with the input it has collected. If any of them return Some(true), it decides to use that impl, and then:
Implements BufRead for a struct that wraps a String plus a BufRead. The String is the string the guesser has read so far, and that's what it'll read from first. Once all of that string has been read, it proceeds to read from the BufRead.
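
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not inferno's API: `Applicable` and `guess` are hypothetical names, candidates are passed in as a slice rather than discovered through something like inventory, and the "String plus BufRead" wrapper is built from the standard library's `Cursor` and `Read::chain` instead of a hand-written struct.

```rust
use std::io::{self, BufRead, Chain, Cursor, Read};

/// The proposed method, as a standalone trait (hypothetical name).
pub trait Applicable {
    /// `None`: need more input; `Some(true)`: usable; `Some(false)`: not usable.
    fn is_applicable(&self, input: &str) -> Option<bool>;
}

/// Reads lines from `reader` until one candidate accepts the buffered input.
/// Returns the index of that candidate plus a reader that replays the
/// buffered text before continuing with the rest of `reader`.
pub fn guess<R: BufRead>(
    mut reader: R,
    candidates: &[&dyn Applicable],
) -> io::Result<Option<(usize, Chain<Cursor<String>, R>)>> {
    let mut seen = String::new();
    // Candidates that have not yet answered `Some(false)`.
    let mut alive: Vec<usize> = (0..candidates.len()).collect();
    loop {
        // `read_line` appends, so `seen` accumulates everything read so far.
        if reader.read_line(&mut seen)? == 0 {
            return Ok(None); // input ended before any candidate said yes
        }
        let mut chosen = None;
        alive.retain(|&i| match candidates[i].is_applicable(&seen) {
            Some(true) => {
                chosen.get_or_insert(i);
                true
            }
            Some(false) => false,
            None => true,
        });
        if let Some(i) = chosen {
            // Replay the buffered text first, then the untouched remainder.
            return Ok(Some((i, Cursor::new(seen).chain(reader))));
        }
        if alive.is_empty() {
            return Ok(None); // every candidate ruled itself out
        }
    }
}
```

The caller would then hand the returned reader to the chosen implementation, which sees the input from the beginning as if nothing had been consumed.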

Make the dependencies which are not directly used for flamegraph generation optional

Currently when used as a library to generate a flamegraph (without any other bells and whistles) inferno requires a bunch of extra dependencies which don't directly contribute to flamegraph generation, e.g. object, gimli, memmap, addr2line, structopt, env_logger. It'd be nice to make those optional.

I've just added support for direct flamegraph generation through inferno to our CPU profiler, however I can't enable it by default because of this. (E.g. we currently use a non-released version of addr2line from Git, which is overridden by the addr2line pulled by inferno.)

Flame graph differentials

In brendangregg/FlameGraph@465ac0c, @brendangregg added support for differential flame graphs, which can be really neat for profiling performance regressions. They have a couple of effects on flamegraph.pl:

They require tracking "delta" values in both TimedFrame and Frame when we generate them in flow.
They mean that there may be an additional sample column in each stack line.
They change the status line shown on hover for each frame.
And they change the color palette.

When generating a flamegraph programmatically the `flamegraph::Options::consistent_palette` option shouldn't require filesystem access

There should be an API which allows the user to enable the consistent_palette option and maintain the palette map themselves in memory without touching the filesystem.

Logging infrastructure that better supports multi-threaded code

cargo flamegraph

Hey! As a very heavy flamegraph user I've long been thinking about a cargo flamegraph command, but have been wary of including the bundle of perl scripts in the popular implementation. Is something like cargo flamegraph a priority of this project? I can imagine that it could support that use case in a more clean way.

Glad to see effort in the rust + flamegraph space :]

itoa with separators that avoids allocation

Hey Jon. Just watched your most recent stream. I wrote a crate recently that does itoa with thousands separators without allocation. It's not up on crates.io yet, but will be shortly. num-format.

Persistent color palettes

In the original flamegraph.pl, there's an option (--cp) that enables "consistent palettes". When this option is on, the color used for every function in the current flame graph is stored in a file called palette.map. When the option is used while the file exists, the color palette is also read in from that file when the flame graph generator starts up, and functions get their colors from that file unless they were previously unknown.

Bonus points for compatibility with palette.map files generated by flamegraph.pl.
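
A rough sketch of a palette map that lives purely in memory and can be serialized on demand. It assumes palette.map holds one `name->color` entry per line (worth double-checking against flamegraph.pl before relying on it); `PaletteMap` and its methods are hypothetical names, not an existing inferno API.

```rust
use std::collections::BTreeMap;

/// In-memory palette: function name -> color string such as "rgb(1,2,3)".
#[derive(Default, Debug, PartialEq)]
pub struct PaletteMap(BTreeMap<String, String>);

impl PaletteMap {
    /// Parses `name->color` lines; lines without "->" are skipped.
    /// Splitting at the last "->" tolerates names that contain "->".
    pub fn parse(text: &str) -> Self {
        let map = text
            .lines()
            .filter_map(|l| l.rsplit_once("->"))
            .map(|(k, v)| (k.to_owned(), v.to_owned()))
            .collect();
        PaletteMap(map)
    }

    /// Returns the stored color, or stores and returns `fresh()` for a
    /// function that was previously unknown.
    pub fn color_for(&mut self, func: &str, fresh: impl FnOnce() -> String) -> &str {
        self.0
            .entry(func.to_owned())
            .or_insert_with(fresh)
            .as_str()
    }

    /// Serializes back to `name->color` lines, sorted by name.
    pub fn serialize(&self) -> String {
        self.0.iter().map(|(k, v)| format!("{k}->{v}\n")).collect()
    }
}
```

With something like this, the file-based option becomes a thin wrapper: read the file into `parse`, render, and write `serialize()` back out, while library users keep the map themselves.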

Support vtune

Upstream has code for working from a vtune trace. (Yay windows support.)
https://github.com/brendangregg/FlameGraph/blob/master/stackcollapse-vtune.pl

Can we add functionality to work with vtune?

Custom per-function name attributes

The original flame graph supports the --nameattr option, which lets you specify a file that contains "extra" attributes to use for the SVG frames of particular functions. This seems like a pretty neat feature that we should support. Be careful that those attributes are handled in accordance with group_start and group_end from flamegraph.pl though!

Figure out how to make `--inline` work with ASLR

Modern systems have ASLR enabled, which means that the addresses output by perf script are randomized, and do not correspond to the addresses present in the binary. This is basically this issue. There are ways to produce performance profiles that do not have this issue, either by compiling with gcc -no-pie -static, or by running with setarch -R, but that's pretty burdensome. It'd be better if we could just fix up the addresses after the fact!

From what I can tell, the trick would be to use perf script's --show-mmap-events flag, which outputs lines like:

program 14351 12112.392588: PERF_RECORD_MMAP2 14351/14351: [0x7ffde10f9000(0x21000) @ 0x7ffffffde000 00:00 0 0]: rw-p [stack]
program 14351 12112.392591: PERF_RECORD_MMAP2 14351/14351: [0x563d17016000(0x585000) @ 0 00:17 345286 1007]: r--p /data/jon/cargo-target/release/a>
program 14351 12112.392594: PERF_RECORD_MMAP2 14351/14351: [0x563d17068000(0x417000) @ 0x52000 00:17 345286 1007]: r-xp /data/jon/cargo-target/rel>
program 14351 12112.392595: PERF_RECORD_MMAP2 14351/14351: [0x563d1747f000(0xd7000) @ 0x469000 00:17 345286 1007]: r--p /data/jon/cargo-target/rel>
program 14351 12112.392596: PERF_RECORD_MMAP2 14351/14351: [0x563d17557000(0x3d000) @ 0x540000 00:17 345286 1007]: rw-p /data/jon/cargo-target/rel>
program 14351 12112.392597: PERF_RECORD_MMAP2 14351/14351: [0x563d17594000(0x7000) @ 0x563d17594000 00:00 0 0]: rw-p //anon
program 14351 12112.392603: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559a000(0x2c000) @ 0 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392607: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559c000(0x1f000) @ 0x2000 00:17 3865 12]: r-xp /usr/lib/ld-2.28.so
program 14351 12112.392608: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55bb000(0x8000) @ 0x21000 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392610: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c3000(0x2000) @ 0x28000 00:17 3865 12]: rw-p /usr/lib/ld-2.28.so
program 14351 12112.392614: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c5000(0x1000) @ 0x7f01e55c5000 00:00 0 0]: rw-p //anon
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d5000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d2000(0x3000) @ 0 00:00 0 0]: r--p [vvar]

that indicate what offset is given to various program pointers. By tracking the various mmap'd regions over time, we should be able to undo the randomization of program counter values, and restore the original addresses, which can then be un-inlined correctly.

For what it's worth, nperf record actually keeps track of mmap regions and records the mapping used to de-randomize addresses so that it can later restore the original addresses. See also #14 (comment).
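
As a starting point, here is a sketch that parses lines shaped like the PERF_RECORD_MMAP2 output above and translates a runtime address into an offset within the mapped file (`addr - start + pgoff`). The `Mapping` type and both functions are hypothetical. The sketch ignores pids and timestamps, and turning a file offset into the address that a symbolizer expects would additionally need the binary's segment layout, which is not handled here.

```rust
/// One mapped region from a `PERF_RECORD_MMAP2` line.
#[derive(Debug, PartialEq)]
pub struct Mapping {
    pub start: u64,
    pub len: u64,
    pub pgoff: u64,
    pub path: String,
}

/// Parses a hex number with or without a leading "0x".
fn hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s.trim_start_matches("0x"), 16).ok()
}

/// Parses lines shaped like the sample output; returns `None` otherwise.
pub fn parse_mmap2(line: &str) -> Option<Mapping> {
    let (_, rest) = line.split_once("PERF_RECORD_MMAP2")?;
    let (_, rest) = rest.split_once('[')?;
    let (inner, tail) = rest.split_once("]:")?;
    // `inner` looks like "0xSTART(0xLEN) @ 0xPGOFF maj:min ino gen".
    let (start, inner) = inner.split_once('(')?;
    let (len, inner) = inner.split_once(')')?;
    let pgoff = inner
        .trim_start()
        .strip_prefix('@')?
        .split_whitespace()
        .next()?;
    // `tail` looks like " r-xp /path/to/file".
    let path = tail.split_whitespace().nth(1)?.to_owned();
    Some(Mapping {
        start: hex(start)?,
        len: hex(len)?,
        pgoff: hex(pgoff)?,
        path,
    })
}

/// Finds the region containing `addr` and returns the mapped file's path
/// together with the offset of `addr` inside that file.
pub fn file_offset(maps: &[Mapping], addr: u64) -> Option<(&str, u64)> {
    maps.iter()
        .find(|m| addr >= m.start && addr - m.start < m.len)
        .map(|m| (m.path.as_str(), addr - m.start + m.pgoff))
}
```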

Don't skip empty lines inside DTrace stacks

This if makes me think that empty lines within a stack (i.e., after a non-empty line and before a count) should not be skipped.

Test our decision to include ' and " in collapsed output

When implementing inferno-collapse-perf, we made the decision not to strip ' and ". I believe the Perl version did that to not have to deal with them in SVG output, but we should make sure that we actually produce sensible output even with those characters (Rust code with lifetimes for example).

Add test cases from FlameGraph

The upstream FlameGraph repository has a number of test cases run by this script. It'd be good to integrate those as a test suite into this tool to make sure we retain compatibility with the original stackcollapse tool.

Port Optimize FlameGraphs by utilizing CSS and modern JS from FlameGraph

This issue is to port brendangregg/FlameGraph#189 to inferno. The original author has consented to us porting their work to this project: #26 (comment).

Port stackcollapse-bpftrace

Given how cool bpftrace is, it seems like a great idea to also port stackcollapse-bpftrace into inferno! Looking at its code, this should be a pretty straightforward exercise, since the eBPF already does most of the collapsing for us, and ustack already emits lines that are nearly in the right format.

Unify defaults across CLI and impl Default

#52 introduces a lot of default values, and they are duplicated between the impl Default for Options and the CLI StructOpt definitions. We should find a way to declare these as global constants, and then use those in both places. There's been a bunch of discussion of that in #52 (comment), which should provide a good starting point.

Improve test coverage for codebase

The code coverage of our tests is good, but not as good as we'd like it to be. In particular, I see some obvious candidates for new tests.

In perf stack collapsing (see coverage report):

Collapsing a perf trace with an empty line.
Collapsing a perf trace with a "weird" event line.
Collapsing a perf trace with a "process name".
Collapsing a perf trace with an inline Java function.
Collapsing a perf trace with a "weird" stack line.
Collapse a profile with Go names.

In DTrace stack collapsing (see coverage report):

Collapse a file with only header lines.
Collapse a file with a line that has ::, but not a trailing argument list.

Stack collapsing in general:

Collapse using collapse_file.
Collapse using collapse_file on STDIN.

In flamegraph (see coverage report):

Add support for un-inlining.

stackcollapse-perf supports calling out to addr2line to determine the symbol to use for each program counter address (as opposed to using the symbol names that perf script produces). We should probably take advantage of the addr2line crate, which enables us to do un-inlining directly without executing an external process, which should significantly speed up processing. In particular because we can parse the debug symbols only once!

Flame chart mode

In flamegraph.pl, you can pass the --flamechart flag to produce a "flame chart" in which stack frames are not merged. To quote Brendan:

Flame charts were first added by Google Chrome's WebKit Web Inspector (bug). While inspired by flame graphs, flame charts put the passage of time on the x-axis instead of the alphabet. This means that time-based patterns can studied. Flame graphs reorder the x-axis samples alphabetically, which maximizes frame merging, and better shows the big picture of the profile. Multi-threaded applications can't be shown sensibly by a single flame chart, whereas they can with a flame graphs (a problem flame charts didn't need to deal with, since it was initially used for single-threaded JavaScript analysis). Both visualizations are useful, and tools should make both available if possible (eg, TraceCompass does). Some analysis tools have implemented flame charts and mistakingly called them flame graphs.

I'm not entirely sure how time factors into this, since the collapsed stacks don't have any timing information from the original execution in them, but people seem to find them useful, so let's implement them! Also, if you do find them useful, please include docs and examples! If you just want to do the implementation work, that's fine too, just include the quoted explanation above somewhere in the docs.

In terms of code, --flamechart mainly avoids sorting the incoming stacks, and changes the plot title.

Add iterator interface that returns structs for each event

It might be nice to have a struct for stacks, and then split the pipeline into reader, collapse, and 'renderer'.

That would allow not having to convert a stack to a string and then back from a string to data to render flame graphs and also generalize the rendering logic.

Add support for event filtering

The Perl stackcollapse-perf has support for filtering by only particular events. The skeleton code for dealing with that kind of filtering is already in inferno, but needs to be fleshed out to actually support filtering.

Add support for "fluid" drawing

Currently, the width of the flamegraph, and the individual sample boxes, is set in pixels. It'd be really cool if we could instead use percentages, as it would mean that users with a wider monitor would automatically be able to take advantage of their screen size.

Port difffolded.pl

Now that #60 has landed, it would be nice if inferno could generate the differential profile files that this feature expects. Currently you need to use difffolded.pl from upstream FlameGraph. We should port this to Rust.

See Differential Flame Graphs for more info.

Adopt changes from upstream pull requests

The upstream FlameGraph project has a number of outstanding pull requests that fix bugs and add or improve features. We should take a look through them and see whether we may be able to incorporate some of them here! I propose we take the following approach for each one:

First, see whether the change is still relevant in inferno
Then, comment on the issue asking the original author whether they mind having their change adopted into inferno (we can't just take without permission -- copyright and licensing is no joke!)
If they agree, create a tracking issue on inferno describing the original PR, link to it + the author's agreement to port, and add the adoption label. Then, either start writing a PR if you want to implement it yourself, or just leave it there for others to adopt!
If they don't respond, and you believe it to be an important feature, feel free to create a corresponding issue on inferno and write a PR, but note explicitly in both that you don't have the author's approval to port the change. We won't merge those until we do, but we can at least do the work in advance!
If they explicitly say they're unwilling to allow the change to be made in inferno, well, then we're out of luck. The exception would be if you can implement the feature without consulting their changes at all!

We should also probably take a look through known FlameGraph issues and see which ones also apply to inferno. For those that do, we should add tracking issues here, link to the original issue, and ideally also add a failing test case where possible!

Multicore collapsing

In theory we should be able to slice the input to the collapse scripts into multiple pieces, collapse each piece independently (on different cores) and then merge the results. That would likely lead to significant speedups on multi-core machines, and would likely be worth investigating.
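
A minimal sketch of the slice, collapse, and merge idea using scoped threads from the standard library. It assumes stacks are separated by blank lines so that cutting there is safe, and it takes the per-chunk collapse step as a caller-supplied closure; `chunks` and `collapse_parallel` are hypothetical names. Any state a real collapser carries across stacks would need extra care.

```rust
use std::collections::HashMap;
use std::thread;

/// Splits `input` into at most `n` pieces, cutting only right after a
/// blank line (assumed here to separate stacks).
pub fn chunks(input: &str, n: usize) -> Vec<&str> {
    let target = input.len() / n.max(1) + 1;
    let mut out = Vec::new();
    let mut rest = input;
    while rest.len() > target {
        // First blank line at or after the target size.
        match rest.as_bytes()[target..].windows(2).position(|w| w == b"\n\n") {
            Some(pos) => {
                let (head, tail) = rest.split_at(target + pos + 2);
                out.push(head);
                rest = tail;
            }
            None => break,
        }
    }
    if !rest.is_empty() {
        out.push(rest);
    }
    out
}

/// Collapses each chunk on its own thread and sums the per-stack counts.
pub fn collapse_parallel<F>(input: &str, n: usize, collapse: F) -> HashMap<String, usize>
where
    F: Fn(&str) -> HashMap<String, usize> + Sync,
{
    let parts = chunks(input, n);
    let collapse = &collapse;
    let partials: Vec<HashMap<String, usize>> = thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = parts
            .iter()
            .map(|&p| s.spawn(move || collapse(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    let mut total = HashMap::new();
    for part in partials {
        for (stack, count) in part {
            *total.entry(stack).or_insert(0) += count;
        }
    }
    total
}
```

Whether this pays off depends on how the merge cost compares with the parsing cost, so it would be worth benchmarking before committing to it.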

Investigate and improve allocation behavior of collapse-perf

Since the project is fairly recent, I assume it's benched on a recent version of Rust (or even nightly; does Criterion require nightly?). That means it's using the system allocator. IIRC flamegraph munging is a lot of string munging and other allocation-heavy work, so jemalloc could well have an edge.

Add support for timemax option

flamegraph.pl has a --total option which sets the internal variable timemax. I think it basically "compresses" the flamegraph, but not sure exactly what effect that has. If someone cleverer than me can figure out what it is for, document it, and then implement it, that'd be awesome!

Support for non host reports

The same issue found here applies to inferno: brendangregg/FlameGraph#132

This issue affects me, and I would like to work on it if I have time. I'll try to get a sample profile from an arm device and post it here.

Top-heavy flamegraphs hide the subtitle

If the line at the top of the graph is too long, the subtitle is hidden, probably because it ends up being covered up.

For example:

If the top line is very long: echo "hello 1" | inferno-flamegraph --subtitle "Hello! this is a subtitle" > regular.svg
If the top line is very long because it's an inverted graph: cat flamegraph/test/results/perf-java-stacks-01-collapsed-pid.txt | inferno-flamegraph --subtitle "Hello! this is a subtitle" --inverted > inverted.svg

Integrate Java support from original stackcollapse-perf

The original stackcollapse-perf has special support for Java in a couple of places. Specifically, it has special rules for tidying up Java perf reports, as well as this loop for supporting Java inlining. See brendangregg/FlameGraph#89 in particular.

Reversed stack ordering

We should support the --reverse option, which, well, reverses the order of all stacks. Note that this does require that we also sort our input, because otherwise the trick we play in flow in flamegraph/merge.rs doesn't work!
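
To illustrate why the re-sort is needed, here is a small sketch that reverses the frames of already-folded lines and then sorts the result. It assumes the usual `frame;frame;frame count` line shape; `reverse_stacks` is a hypothetical helper, not part of inferno.

```rust
/// Reverses the frame order of each folded line ("a;b;c 3" -> "c;b;a 3")
/// and re-sorts, since reversing breaks the order that merging relies on.
/// Lines without a space-separated count are dropped.
pub fn reverse_stacks(lines: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = lines
        .iter()
        .filter_map(|line| {
            let (stack, count) = line.rsplit_once(' ')?;
            let reversed: Vec<&str> = stack.split(';').rev().collect();
            Some(format!("{} {}", reversed.join(";"), count))
        })
        .collect();
    // An unstable sort is enough here; equal lines are interchangeable.
    out.sort_unstable();
    out
}
```

For instance, the sorted input `a;z 1`, `b;c 2` becomes `z;a 1`, `c;b 2` after reversal, which is no longer sorted until the final sort puts `c;b 2` first.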

Tidy up library interface for flamegraph (and collapse?)

If we want other projects, like @koute's nperf, to start using inferno to draw their flame graphs (and perhaps even to do the collapsing), we need to provide an external interface that isn't just a purely line-based textual one like the ones used internally between perf script, stackcollapse-perf, and flamegraph.pl. Specifically, we'll want an interface that is typed, and where all (most?) of the parsing is done by the caller. I'm imagining something like:

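mod flamegraph {
  use std::io::{self, Write};

  // Placeholder for the real options type.
  pub struct Opts;

  pub struct StackCount<S> {
    pub stack: S,
    pub count: usize,
  }

  // The return type is an assumption; the original sketch left it blank.
  pub fn from_samples<'a, I, S, W>(o: Opts, samples: I, writer: W) -> io::Result<()>
    where
      I: IntoIterator,
      I::Item: Into<StackCount<S>>,
      S: Iterator<Item = &'a str>,
      W: Write
  {
    unimplemented!()
  }
}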

from_samples would require that the input is given in sorted order (see #28), and then write a flame graph output to writer. The existing code that does parsing would call from_samples with an iterator created from the lines iterator, mapped something like this:

lines.map(|line| {
  let (stack, nsamples) = ...;
  StackCount {
    count: nsamples,
    stack: stack.split(';'),
  }
})

@koute: what do you think of an interface roughly like the above? You think that's an iterator that nperf could easily produce? What would be a good interface for collapse (if that's even something you might consider using)?

Share `BytesStart`s across frames in flamegraph

While #37 got us pretty far, there is still one source of allocation in the main flamegraph loop: the attributes on BytesStart. Specifically, when you add attributes to a BytesStart, it allocates a Vec to hold the underlying bytes if it doesn't already have a Vec. That's unfortunate since we create a new BytesStart for every g. If we instead created a single BytesStart that we re-used somehow (by resetting all the attributes), that'd be way more efficient. Sadly, that's not yet supported, but I've filed tafia/quick-xml#148 which would enable us to do that!

Cut off left part of text, rather than right part of text when SVG is too small

Let's say your function is too wide to fit on the screen. Right now, the right side will be truncated:

for (var x = txt.length - 2; x > 0; x--) {
        if (t.getSubStringLength(0, x + 2) <= w) {
            t.textContent = txt.substring(0, x) + "..";
            return;
        }
    }
    t.textContent = "";

So e.g. in my case I have /home/itamarst/Devel/memory-profiler/venv/lib64/python3.8/site-packages/numpy/core/numeric.py:ones truncated to /home/itamarst/Devel/memory-profiler/venv/lib64/...

But, you'll notice the most significant information is on the right side of that string: the function name, the main file, etc. It would be better to truncate it in the opposite direction: ../python3.8/site-packages/numpy/core/numeric.py:ones, or at least have that as a rendering option.

Support arbitrary scaling factor

The --factor option in flamegraph.pl lets you arbitrarily multiply all sample counts by a constant, probably to show "true counts" if you've undersampled. We should document and adopt that option.

Optimize implementation for when there are many frames

#99 identified that performance drops significantly when there are many frames in the output (see #99 (comment)). We should figure out why that is, and fix it! Maybe it's time to flamegraph inferno ;)

Icicle graphs

The --inverted flag, apart from changing the title of the plot, changes the flame graph such that the entire plot is upside down (note that this also affects the JavaScript). We should probably support that flag too.

Reduce the number of dependencies

$ cargo tree --no-indent | sed 's/ (\*)//' | sort -u | wc -l
126

My guess is that some of these are unnecessary. Let's see if we can't do something to help improve build times.

Add documentation in README about how to use inferno

Add some instructions on using perf to record a profile and using inferno to produce an SVG from it.

Support --countname

Upstream flamegraph doesn't discard fractional samples

I'm still watching part 2 but given the current state of the source code, I believe that you may have misinterpreted the part of the regex in upstream FlameGraph about removing the fractional part of the samples.

my($stack, $samples) = (/^(.*)\s+?(\d+(?:\.\d*)?)$/);

inferno/src/flamegraph/merge.rs

Lines 94 to 98 in ce1c41a

    
    // strip fractional part (if any);
    // foobar 1.klwdjlakdj
    if let Some(doti) = samples.find('.') {
        samples = &samples[..doti];
    }

My reading is that the non-capturing group is there to allow input in either integer or fractional format, but not in any way remove it from the surrounding capture. The non-capturing group is there to group up the separator and the fractions for the following ?.

That is, it accepts sample values like 12, 34., and 56.78.
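
A sketch of what parsing under that reading could look like: the sample value is digits, optionally followed by a dot and more digits, and the fractional part is kept rather than stripped. `split_samples` is a hypothetical helper written for illustration; it is not the code in merge.rs.

```rust
/// Splits a folded line into its stack and sample value, accepting values
/// shaped like `12`, `34.`, or `56.78` and keeping any fractional part.
pub fn split_samples(line: &str) -> Option<(&str, f64)> {
    let (stack, samples) = line.trim_end().rsplit_once(char::is_whitespace)?;
    let (int_part, frac_part) = samples.split_once('.').unwrap_or((samples, ""));
    let digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    // At least one digit before the optional '.', only digits after it.
    if int_part.is_empty() || !digits(int_part) || !digits(frac_part) {
        return None;
    }
    Some((stack.trim_end(), samples.parse().ok()?))
}
```

With this shape, a line such as `foobar 1.klwdjlakdj` is rejected outright instead of being silently read as `1`.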

Add full color support

Currently, inferno's flamegraph implementation just hardcodes the color of all the frames in the flamegraph. That makes the flamegraph significantly less useful, and also much less pretty. Instead, we should support flamegraph.pl's --colors option. It affects colors in a number of ways:

The background color changes
It colors individual frames through this serious function (which should live in its own module)
Some of the color modes do extra funky color things like "chain"
When combined with the --hash flag, the namehash coloring function is used, which attempts to color similar "parts" of the stack similarly, and consistently across runs.

We should support all that! It's probably mostly a matter of porting the giant color function.

Add module level docs to flamegraph

There's a lot of useful info at the top of flamegraph.pl that we should incorporate into the (currently non-existent) docs for the flamegraph module. FlameGraph also has a history that it'd be worthwhile to refer to.

Add back demangling support

Prior to #74, we had support for demangling function names (although only with --inline). While perf is supposed to be doing it already (see --demangle) there are reports that it doesn't demangle everything super well. Having our own demangling step using one of the many demangling crates seems like it might be a good idea!

Support working with multiple collapsed files

The trick to merge frames we use in flow only works if the input lines are sorted. This is generally the case when we're operating over a single file, but is not true if the user provides multiple input files (which we should support), and we should cater to that case. Sorting inputs will also be required for #21. Note that it is fine for the sort to be unstable.

Avoid `format!` in the inner loop.

Currently, we use format! in a number of places in order to pass appropriate values to BytesStart::with_attributes. That's really sad, because it means we're allocating tons of little String items. It'd be far better if we instead just had a single buffer that we use write!() to write into, and then pull slices out of to get the attribute substrings. I don't know if such a crate exists, but if it doesn't, that seems like a useful thing to write!
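
One possible shape for such a buffer, as a sketch only: values are formatted into a single `String`, and callers hold on to byte ranges instead of owning many small `String`s. `AttrBuf` and its methods are hypothetical names, and how the slices would be fed to quick-xml is left open.

```rust
use std::fmt::{self, Write};
use std::ops::Range;

/// Reusable scratch buffer for formatted attribute values.
#[derive(Default)]
pub struct AttrBuf {
    buf: String,
}

impl AttrBuf {
    /// Forgets previous contents but keeps the allocation.
    pub fn clear(&mut self) {
        self.buf.clear();
    }

    /// Appends the formatted value and returns the byte range it occupies.
    pub fn push(&mut self, args: fmt::Arguments<'_>) -> Range<usize> {
        let start = self.buf.len();
        // Writing into a `String` does not fail.
        self.buf.write_fmt(args).expect("writing to String");
        start..self.buf.len()
    }

    /// Borrows a previously pushed value.
    pub fn get(&self, range: Range<usize>) -> &str {
        &self.buf[range]
    }
}
```

Per frame, the loop would call `clear()`, push each attribute with `format_args!`, and then read the slices back with `get`. Once the buffer has grown to its working size, that avoids a fresh allocation per attribute.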
