
inferno's Issues

Make style settings for generated flamegraph configurable

flamegraph.pl has a lot of configuration arguments, and many of them exist mainly to change the style of the resulting flame graph. We'll leave color schemes to a different issue, but I'm talking about things like

  • imagewidth
  • frameheight
  • fonttype
  • fontsize
  • fontwidth
  • minwidth
  • nametype
  • countname
  • bgcolors (#32)
  • titletext
  • searchcolor
  • notestext
  • subtitletext

Individually, these should all be pretty easy to add support for, so we should do it! Even smaller PRs that just add one or two of these are warmly welcomed! Try to stick to the flamegraph.pl options where possible.
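
As a starting point, here is a hypothetical sketch of what a style-options struct might look like. The field names and defaults follow flamegraph.pl's variables as I read them ($imagewidth = 1200, $frameheight = 16, $fonttype = "Verdana", and so on); none of this is inferno's actual API, just an illustration of the shape.

```rust
// Hypothetical sketch of a style-options struct mirroring flamegraph.pl's
// arguments. Defaults follow the Perl script's variables; not inferno's API.
#[derive(Debug, Clone)]
pub struct StyleOptions {
    pub image_width: usize,
    pub frame_height: usize,
    pub font_type: String,
    pub font_size: usize,
    pub font_width: f64,
    pub min_width: f64,
    pub name_type: String,
    pub count_name: String,
    pub title_text: String,
    pub search_color: String,
}

impl Default for StyleOptions {
    fn default() -> Self {
        StyleOptions {
            image_width: 1200,
            frame_height: 16,
            font_type: "Verdana".to_string(),
            font_size: 12,
            font_width: 0.59,
            min_width: 0.1,
            name_type: "Function:".to_string(),
            count_name: "samples".to_string(),
            title_text: "Flame Graph".to_string(),
            search_color: "rgb(230,0,230)".to_string(),
        }
    }
}

fn main() {
    let opts = StyleOptions::default();
    println!("{} px wide, {} px frames", opts.image_width, opts.frame_height);
}
```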

Remove `SmallVec` in favor of using a long-term `Vec`

See #78 (comment). Specifically, currently we use a SmallVec for temporarily holding inlined java method calls:

let mut java_inline =
    SmallVec::<[String; 1]>::with_capacity(rawfunc.split("->").count() + 1);

However, in #78, @Licenser decided to instead re-use a Vec in the Folder, which seems like a better strategy since it generally won't allocate even if there are inlined method calls (unlike SmallVec, which will allocate every time that happens).
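
A minimal sketch of that reuse-a-Vec strategy (the `Folder` type here is a stand-in, not inferno's actual struct): keep one Vec on the folder and clear it per line, so steady-state processing does not allocate.

```rust
// Sketch of the strategy from #78: `clear()` resets the length but keeps
// the allocation, so later lines reuse the capacity from earlier ones.
struct Folder {
    java_inline: Vec<String>,
}

impl Folder {
    fn new() -> Self {
        Folder { java_inline: Vec::new() }
    }

    fn collect_inlined(&mut self, rawfunc: &str) -> &[String] {
        self.java_inline.clear(); // keeps capacity, no free/realloc
        for part in rawfunc.split("->") {
            self.java_inline.push(part.to_string());
        }
        &self.java_inline
    }
}

fn main() {
    let mut folder = Folder::new();
    println!("{:?}", folder.collect_inlined("a->b->c"));
    let cap = folder.java_inline.capacity();
    folder.collect_inlined("d->e");
    // the second call reused the first call's allocation
    assert_eq!(folder.java_inline.capacity(), cap);
}
```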

Provide magic collapse-dispatcher

Once #78 lands, we'll have two different Collapse implementations. It'd be cool to have an inferno-collapse-guess, which attempts to use whichever Collapse implementation is appropriate for a given input! This may end up being a little tricky, but here's a sketch of how it might work:

  • Add a method is_applicable(&str) -> Option<bool> to Collapse. None means "not sure -- need more input", Some(true) means "yes, this implementation should work with that string", and Some(false) means "no, this implementation definitely won't work".
  • Add a new collapse implementation, "guess", which internally buffers the string it reads from the underlying streams, and knows about all the other Collapse impls (maybe using something like inventory?). For each line it reads, it calls is_applicable on all impls that haven't returned Some(false) thus far with the input it has collected. If any of them return Some(true), it decides to use that impl, and then:
  • Implements BufRead for a struct that wraps a String plus a BufRead. The String is the string the guesser has read so far, and that's what it'll read from first. Once all of that string has been read, it proceeds to read from the BufRead.
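
For the last bullet, the standard library can express that wrapper directly: Cursor<Vec<u8>> implements BufRead, and Read::chain preserves BufRead when both halves implement it. A sketch (function name is illustrative):

```rust
use std::io::{BufRead, Cursor, Read};

// Once the guesser has picked an implementation, hand the chosen Collapse
// impl a reader that replays the bytes the guesser already consumed, then
// continues with the underlying stream.
fn replay_then_continue<R: BufRead>(buffered: Vec<u8>, rest: R) -> impl BufRead {
    Cursor::new(buffered).chain(rest)
}

fn main() {
    // pretend the guesser consumed the first line while deciding
    let consumed = b"first line\n".to_vec();
    let remaining = Cursor::new(&b"second line\n"[..]);
    let reader = replay_then_continue(consumed, remaining);
    let lines: Vec<String> = reader.lines().map(|l| l.unwrap()).collect();
    assert_eq!(lines, vec!["first line", "second line"]);
    println!("{:?}", lines);
}
```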

Make the dependencies which are not directly used for flamegraph generation optional

Currently when used as a library to generate a flamegraph (without any other bells and whistles) inferno requires a bunch of extra dependencies which don't directly contribute to flamegraph generation, e.g. object, gimli, memmap, addr2line, structopt, env_logger. It'd be nice to make those optional.

I've just added support for direct flamegraph generation through inferno to our CPU profiler, however I can't enable it by default because of this. (E.g. we currently use a non-released version of addr2line from Git, which is overridden by the addr2line pulled by inferno.)

Flame graph differentials

In brendangregg/FlameGraph@465ac0c, @brendangregg added support for differential flame graphs, which can be really neat for profiling performance regressions. They have a couple of effects on flamegraph.pl:

cargo flamegraph

Hey! As a very heavy flamegraph user I've long been thinking about a cargo flamegraph command, but have been wary of including the bundle of perl scripts in the popular implementation. Is something like cargo flamegraph a priority of this project? I can imagine that it could support that use case in a more clean way.

Glad to see effort in the rust + flamegraph space :]

Persistent color palettes

In the original flamegraph.pl, there's an option (--cp) that enables "consistent palettes". When this option is on, the color used for every function in the current flame graph is stored in a file called palette.map. When the option is used while the file exists, the color palette is also read in from that file when the flame graph generator starts up, and functions get their colors from that file unless they were previously unknown.

Bonus points for compatibility with palette.map files generated by flamegraph.pl.
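
For the file format: flamegraph.pl's palette.map is, to my reading, one `function->rgb(r,g,b)` entry per line. A minimal sketch of reading and writing it, under that assumption (splitting at the last "->" since function names can themselves contain "->"):

```rust
use std::collections::HashMap;

// Parse palette.map contents: one `function->rgb(r,g,b)` entry per line
// (format assumed from flamegraph.pl's --cp behavior).
fn parse_palette(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .filter_map(|line| {
            // split at the LAST "->": names may contain "->", colors do not
            let (name, color) = line.rsplit_once("->")?;
            Some((name.to_string(), color.to_string()))
        })
        .collect()
}

fn serialize_palette(palette: &HashMap<String, String>) -> String {
    // sort for deterministic output across runs
    let mut entries: Vec<_> = palette.iter().collect();
    entries.sort();
    entries
        .into_iter()
        .map(|(name, color)| format!("{}->{}\n", name, color))
        .collect()
}

fn main() {
    let map = parse_palette("main->rgb(255,0,0)\nfoo->rgb(0,255,0)\n");
    assert_eq!(map["main"], "rgb(255,0,0)");
    println!("{}", serialize_palette(&map));
}
```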

Figure out how to make `--inline` work with ASLR

Modern systems have ASLR enabled, which means that the addresses output by perf script are randomized, and do not correspond to the addresses present in the binary. This is basically this issue. There are ways to produce performance profiles that do not have this issue, either by compiling with gcc -no-pie -static, or by running with setarch -R, but that's pretty burdensome. It'd be better if we could just fix up the addresses after the fact!

From what I can tell, the trick would be to use perf script's --show-mmap-events flag, which outputs lines like:

program 14351 12112.392588: PERF_RECORD_MMAP2 14351/14351: [0x7ffde10f9000(0x21000) @ 0x7ffffffde000 00:00 0 0]: rw-p [stack]
program 14351 12112.392591: PERF_RECORD_MMAP2 14351/14351: [0x563d17016000(0x585000) @ 0 00:17 345286 1007]: r--p /data/jon/cargo-target/release/a>
program 14351 12112.392594: PERF_RECORD_MMAP2 14351/14351: [0x563d17068000(0x417000) @ 0x52000 00:17 345286 1007]: r-xp /data/jon/cargo-target/rel>
program 14351 12112.392595: PERF_RECORD_MMAP2 14351/14351: [0x563d1747f000(0xd7000) @ 0x469000 00:17 345286 1007]: r--p /data/jon/cargo-target/rel>
program 14351 12112.392596: PERF_RECORD_MMAP2 14351/14351: [0x563d17557000(0x3d000) @ 0x540000 00:17 345286 1007]: rw-p /data/jon/cargo-target/rel>
program 14351 12112.392597: PERF_RECORD_MMAP2 14351/14351: [0x563d17594000(0x7000) @ 0x563d17594000 00:00 0 0]: rw-p //anon
program 14351 12112.392603: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559a000(0x2c000) @ 0 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392607: PERF_RECORD_MMAP2 14351/14351: [0x7f01e559c000(0x1f000) @ 0x2000 00:17 3865 12]: r-xp /usr/lib/ld-2.28.so
program 14351 12112.392608: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55bb000(0x8000) @ 0x21000 00:17 3865 12]: r--p /usr/lib/ld-2.28.so
program 14351 12112.392610: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c3000(0x2000) @ 0x28000 00:17 3865 12]: rw-p /usr/lib/ld-2.28.so
program 14351 12112.392614: PERF_RECORD_MMAP2 14351/14351: [0x7f01e55c5000(0x1000) @ 0x7f01e55c5000 00:00 0 0]: rw-p //anon
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d5000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
program 14351 12112.392616: PERF_RECORD_MMAP2 14351/14351: [0x7ffde11d2000(0x3000) @ 0 00:00 0 0]: r--p [vvar]

that indicate what offset is given to various program pointers. By tracking the various mmap'd regions over time, we should be able to undo the randomization of program counter values, and restore the original addresses, which can then be un-inlined correctly.
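
The fixup itself reduces to per-region arithmetic: for an address inside a mapped region, `file_offset = addr - region_start + region_file_offset`. A sketch using the r-xp mapping from the example output above (the struct and function names are illustrative):

```rust
// Track the mmap'd regions from PERF_RECORD_MMAP2 events, then translate a
// randomized runtime address back to its offset in the binary.
struct MmapRegion {
    start: u64,       // runtime base address
    len: u64,         // mapping length
    file_offset: u64, // offset of the mapping within the file
}

fn to_file_offset(regions: &[MmapRegion], addr: u64) -> Option<u64> {
    regions
        .iter()
        .find(|r| addr >= r.start && addr < r.start + r.len)
        .map(|r| addr - r.start + r.file_offset)
}

fn main() {
    // the executable (r-xp) mapping from the sample output:
    // [0x563d17068000(0x417000) @ 0x52000 ...]
    let regions = vec![MmapRegion {
        start: 0x563d17068000,
        len: 0x417000,
        file_offset: 0x52000,
    }];
    let off = to_file_offset(&regions, 0x563d17070000).unwrap();
    assert_eq!(off, 0x5a000);
    println!("file offset = {:#x}", off);
}
```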

For what it's worth, nperf record actually keeps track of mmap regions and records the mapping used to de-randomize addresses, so that it can later restore the original addresses. See also #14 (comment).

Test our decision to include ' and " in collapsed output

When implementing inferno-collapse-perf, we made the decision not to strip ' and ". I believe the Perl version stripped them to avoid having to deal with them in SVG output, but we should make sure that we actually produce sensible output even with those characters present (Rust code with lifetimes, for example).

Unify defaults across CLI and impl Default

#52 introduces a lot of default values, and they are duplicated between the impl Default for Options and the CLI StructOpt definitions. We should find a way to declare these as global constants, and then use those in both places. There's been a bunch of discussion of that in #52 (comment), which should provide a good starting point.
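
A sketch of the shared-constants approach (the names here are illustrative): declare each default once, reference it from `impl Default for Options`, and have the StructOpt definition point at the same constant (the exact attribute mechanism depends on what StructOpt supports for default_value).

```rust
// Declare each default exactly once, so the CLI and impl Default can't drift.
const DEFAULT_IMAGE_WIDTH: usize = 1200;
const DEFAULT_FRAME_HEIGHT: usize = 16;

struct Options {
    image_width: usize,
    frame_height: usize,
}

impl Default for Options {
    fn default() -> Self {
        Options {
            image_width: DEFAULT_IMAGE_WIDTH,
            frame_height: DEFAULT_FRAME_HEIGHT,
        }
    }
}

fn main() {
    let opts = Options::default();
    // the StructOpt definition would reference the same constants
    println!("{} x {}", opts.image_width, opts.frame_height);
}
```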

Improve test coverage for codebase

The code coverage of our tests is good, but not as good as we'd like it to be. In particular, I see some obvious candidates for new tests.

In perf stack collapsing (see coverage report):

In DTrace stack collapsing (see coverage report):

Stack collapsing in general:

In flamegraph (see coverage report):

Add support for un-inlining.

stackcollapse-perf supports calling out to addr2line to determine the symbol to use for each program counter address (as opposed to using the symbol names that perf script produces). We should probably take advantage of the addr2line crate, which enables us to do un-inlining directly without executing an external process, and should significantly speed up processing, in particular because we'd only need to parse the debug symbols once!

Flame chart mode

In flamegraph.pl, you can pass the --flamechart flag to produce a "flame chart" in which stack frames are not merged. To quote Brendan:

Flame charts were first added by Google Chrome's WebKit Web Inspector (bug). While inspired by flame graphs, flame charts put the passage of time on the x-axis instead of the alphabet. This means that time-based patterns can be studied. Flame graphs reorder the x-axis samples alphabetically, which maximizes frame merging, and better shows the big picture of the profile. Multi-threaded applications can't be shown sensibly by a single flame chart, whereas they can with flame graphs (a problem flame charts didn't need to deal with, since they were initially used for single-threaded JavaScript analysis). Both visualizations are useful, and tools should make both available if possible (eg, TraceCompass does). Some analysis tools have implemented flame charts and mistakenly called them flame graphs.

I'm not entirely sure how time factors into this, since the collapsed stacks don't have any timing information from the original execution in them, but people seem to find them useful, so let's implement them! Also, if you do find them useful, please include docs and examples! If you just want to do the implementation work, that's fine too, just include the quoted explanation above somewhere in the docs.

In terms of code, --flamechart mainly avoids sorting the incoming stacks, and changes the plot title.
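
So the behavioral core is tiny; a sketch with illustrative names:

```rust
// Flame graphs sort stacks to maximize frame merging; flame charts keep
// the stacks in input (i.e. time) order.
fn prepare_stacks(mut stacks: Vec<String>, flame_chart: bool) -> Vec<String> {
    if !flame_chart {
        // flame graph: alphabetical order maximizes merging
        stacks.sort_unstable();
    }
    // flame chart: leave order untouched
    stacks
}

fn main() {
    let input = vec!["b;x 1".to_string(), "a;y 1".to_string()];
    assert_eq!(prepare_stacks(input.clone(), true), input); // order preserved
    assert_eq!(prepare_stacks(input, false)[0], "a;y 1");   // sorted
    println!("ok");
}
```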

Add iterator interface that returns structs for each event.

It might be nice to have a struct for stacks, and then split things into a reader, a collapser, and a 'renderer'.

That would avoid having to convert a stack to a string and then parse it back from a string in order to render flame graphs, and would also generalize the rendering logic.

Add support for "fluid" drawing

Currently, the width of the flamegraph, and the individual sample boxes, is set in pixels. It'd be really cool if we could instead use percentages, as it would mean that users with a wider monitor would automatically be able to take advantage of their screen size.

Adopt changes from upstream pull requests

The upstream FlameGraph project has a number of outstanding pull requests that fix bugs and add or improve features. We should take a look through them and see whether we may be able to incorporate some of them here! I propose we take the following approach for each one:

  • First, see whether the change is still relevant in inferno
  • Then, comment on the issue asking the original author whether they mind having their change adopted into inferno (we can't just take without permission -- copyright and licensing is no joke!)
  • If they agree, create a tracking issue on inferno describing the original PR, link to it + the author's agreement to port, and add the adoption label. Then, either start writing a PR if you want to implement it yourself, or just leave it there for others to adopt!
  • If they don't respond, and you believe it to be an important feature, feel free to create a corresponding issue on inferno and write a PR, but note explicitly in both that you don't have the author's approval to port the change. We won't merge those until we do, but we can at least do the work in advance!
  • If they explicitly say they're unwilling to allow the change to be made in inferno, well, then we're out of luck. The exception would be if you can implement the feature without consulting their changes at all!

We should also probably take a look through known FlameGraph issues and see which ones also apply to inferno. For those that do, we should add tracking issues here, link to the original issue, and ideally also add a failing test case where possible!

Multicore collapsing

In theory we should be able to slice the input to the collapse scripts into multiple pieces, collapse each piece independently (on different cores) and then merge the results. That would likely lead to significant speedups on multi-core machines, and would likely be worth investigating.
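
A minimal sketch of the slice-collapse-merge idea using std threads. A real implementation would collapse raw perf output per chunk (and would have to split only at event boundaries); here each "chunk" is already a list of folded stacks for brevity.

```rust
use std::collections::HashMap;
use std::thread;

// Count folded stacks within one chunk.
fn count_chunk(lines: Vec<String>) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for line in lines {
        *counts.entry(line).or_insert(0) += 1;
    }
    counts
}

// Collapse each chunk on its own thread, then merge the per-chunk maps.
fn collapse_parallel(chunks: Vec<Vec<String>>) -> HashMap<String, u64> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || count_chunk(chunk)))
        .collect();
    let mut merged = HashMap::new();
    for handle in handles {
        for (stack, n) in handle.join().unwrap() {
            *merged.entry(stack).or_insert(0) += n;
        }
    }
    merged
}

fn main() {
    let chunks = vec![
        vec!["main;foo".to_string(), "main;bar".to_string()],
        vec!["main;foo".to_string()],
    ];
    let merged = collapse_parallel(chunks);
    assert_eq!(merged["main;foo"], 2);
    println!("{:?}", merged);
}
```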

Investigate and improve allocation behavior of collapse-perf

Since the project is fairly recent, I assume it's benchmarked on a recent version of Rust (or even nightly; does Criterion require nightly?). That means it's using the system allocator, and IIRC flamegraph generation involves a lot of string munging and other allocation-heavy work, so jemalloc could well have an edge.

Add support for timemax option

flamegraph.pl has a --total option which sets the internal variable timemax. I think it basically "compresses" the flamegraph, but I'm not sure exactly what effect that has. If someone cleverer than me can figure out what it is for, document it, and then implement it, that'd be awesome!

Top-heavy flamegraphs hide the subtitle

If there's too long a line at the top of the graph, the subtitle is hidden, probably because it ends up being covered up.

For example:

  1. If the top line is very long: echo "hello 1" | inferno-flamegraph --subtitle "Hello! this is a subtitle" > regular.svg

  2. If the top line is very long because it's an inverted graph: cat flamegraph/test/results/perf-java-stacks-01-collapsed-pid.txt | inferno-flamegraph --subtitle "Hello! this is a subtitle" --inverted > inverted.svg

Tidy up library interface for flamegraph (and collapse?)

If we want other projects, like @koute's nperf, to start using inferno to draw their flame graphs (and perhaps even to do the collapsing), we need to provide an external interface that isn't just a purely line-based textual one like the ones used internally between perf script, stackcollapse-perf, and flamegraph.pl. Specifically, we'll want an interface that is typed, and where all (most?) of the parsing is done by the caller. I'm imagining something like:

mod flamegraph {
  struct StackCount<S> {
    stack: S,
    count: usize,
  }
    
  fn from_samples<I, S, W>(o: Opts, samples: I, writer: W) -> io::Result<()>
    where
      I: IntoIterator,
      I::Item: Into<StackCount<S>>,
      S: Iterator<Item = &str>,
      W: Write
  {
  }
}

from_samples would require that the input is given in sorted order (see #28), and then write a flame graph output to writer. The existing code that does parsing would call from_samples with an iterator created from the lines iterator, mapped something like this:

lines.map(|line| {
  let (stack, nsamples) = ...;
  StackCount {
    count: nsamples,
    stack: stack.split(';'),
  }
})

@koute: what do you think of an interface roughly like the above? You think that's an iterator that nperf could easily produce? What would be a good interface for collapse (if that's even something you might consider using)?

Share `BytesStart`s across frames in flamegraph

While #37 got us pretty far, there is still one source of allocation in the main flamegraph loop: the attributes on BytesStart. Specifically, when you add attributes to a BytesStart, it allocates a Vec to hold the underlying bytes if it doesn't already have a Vec. That's unfortunate since we create a new BytesStart for every g. If we instead created a single BytesStart that we re-used somehow (by resetting all the attributes), that'd be way more efficient. Sadly, that's not yet supported, but I've filed tafia/quick-xml#148 which would enable us to do that!

Cut off left part of text, rather than right part of text when SVG is too small

Let's say your function is too wide to fit on the screen. Right now, the right side will be truncated:

for (var x = txt.length - 2; x > 0; x--) {
    if (t.getSubStringLength(0, x + 2) <= w) {
        t.textContent = txt.substring(0, x) + "..";
        return;
    }
}
t.textContent = "";

So e.g. in my case I have /home/itamarst/Devel/memory-profiler/venv/lib64/python3.8/site-packages/numpy/core/numeric.py:ones truncated to /home/itamarst/Devel/memory-profiler/venv/lib64/...

But, you'll notice the most significant information is on the right side of that string: the function name, the main file, etc. It would be better to truncate it in the opposite direction: ../python3.8/site-packages/numpy/core/numeric.py:ones, or at least have that as a rendering option.
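
The suffix-preserving variant is straightforward; here is a sketch in Rust of the string logic (the real change would live in the SVG JavaScript, and `max_chars` stands in for the pixel-width check `getSubStringLength` performs):

```rust
// Keep as much of the *end* of the name as fits, and prepend ".." instead
// of appending it.
fn truncate_left(name: &str, max_chars: usize) -> String {
    let chars: Vec<char> = name.chars().collect();
    if chars.len() <= max_chars {
        return name.to_string();
    }
    let keep = max_chars.saturating_sub(2); // leave room for ".."
    let tail: String = chars[chars.len() - keep..].iter().collect();
    format!("..{}", tail)
}

fn main() {
    let name = "/venv/lib64/python3.8/site-packages/numpy/core/numeric.py:ones";
    let shown = truncate_left(name, 30);
    assert!(shown.starts_with(".."));
    assert!(shown.ends_with("numeric.py:ones"));
    println!("{}", shown);
}
```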

Reduce the number of dependencies

$ cargo tree --no-indent | sed 's/ (\*)//' | sort -u | wc -l
126

My guess is that some of these are unnecessary. Let's see if we can't do something to help improve build times.

Upstream flamegraph doesn't discard fractional samples

I'm still watching part 2 but given the current state of the source code, I believe that you may have misinterpreted the part of the regex in upstream FlameGraph about removing the fractional part of the samples.

my($stack, $samples) = (/^(.*)\s+?(\d+(?:\.\d*)?)$/);

// strip fractional part (if any);
// foobar 1.klwdjlakdj
if let Some(doti) = samples.find('.') {
    samples = &samples[..doti];
}

My reading is that the non-capturing group is there to allow input in either integer or fractional format, not to remove anything from the surrounding capture. It merely groups the decimal point and the fraction digits so that the trailing ? can make them optional together.

That is, it accepts sample values like 12, 34., and 56.78.
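
Under that reading of the regex, a sketch of the parsing would accept digits with an optional all-digit fraction, then discard the fractional part before accumulating (names are illustrative):

```rust
// Accept "12", "34.", and "56.78"; reject anything else (e.g. the
// "1.klwdjlakdj" from the comment above would not match the regex at all).
fn parse_samples(field: &str) -> Option<u64> {
    let integer_part = match field.split_once('.') {
        // fraction present: it must be all digits (possibly empty, as in "34.")
        Some((int, frac)) if frac.chars().all(|c| c.is_ascii_digit()) => int,
        Some(_) => return None,
        None => field,
    };
    integer_part.parse().ok()
}

fn main() {
    assert_eq!(parse_samples("12"), Some(12));
    assert_eq!(parse_samples("34."), Some(34));
    assert_eq!(parse_samples("56.78"), Some(56));
    assert_eq!(parse_samples("1.klwdjlakdj"), None);
    println!("ok");
}
```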

Add full color support

Currently, inferno's flamegraph implementation just hardcodes the color of all the frames in the flamegraph. That makes the flamegraph significantly less useful, and also much less pretty. Instead, we should support flamegraph.pl's --colors option. It affects colors in a number of ways:

  • The background color changes
  • It colors individual frames through this serious function (which should live in its own module)
  • Some of the color modes do extra funky color things like "chain"
  • When combined with the --hash flag, the namehash coloring function is used, which attempts to color similar "parts" of the stack similarly, and consistently across runs.

We should support all that! It's probably mostly a matter of porting the giant color function.
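
To illustrate the idea behind the hash-based coloring (this is NOT flamegraph.pl's actual namehash algorithm, just a sketch of the property it provides): hash the function name into a fraction deterministically, then pick a color inside a palette range, so the same name gets the same color on every run.

```rust
// Illustrative only: deterministic name -> color within a warm palette.
fn name_to_warm_color(name: &str) -> (u8, u8, u8) {
    // simple FNV-1a style fold into a fraction in [0, 1)
    let mut h: u64 = 0xcbf29ce484222325;
    for b in name.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    let t = (h % 1000) as f64 / 1000.0;
    // red fixed high, green/blue varying: always some shade of orange/red
    (205 + (50.0 * t) as u8, (230.0 * t) as u8, (55.0 * t) as u8)
}

fn main() {
    // same input, same color, run after run -- the property --hash relies on
    assert_eq!(name_to_warm_color("main"), name_to_warm_color("main"));
    println!("{:?}", name_to_warm_color("main"));
}
```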

Add back demangling support

Prior to #74, we had support for demangling function names (although only with --inline). While perf is supposed to be doing it already (see --demangle) there are reports that it doesn't demangle everything super well. Having our own demangling step using one of the many demangling crates seems like it might be a good idea!

Support working with multiple collapsed files

The trick to merge frames we use in flow only works if the input lines are sorted. This is generally the case when we're operating over a single file, but is not true if the user provides multiple input files (which we should support), and we should cater to that case. Sorting inputs will also be required for #21. Note that it is fine for the sort to be unstable.
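
A minimal sketch of the multi-input path: concatenate the collapsed lines from all inputs, then sort before handing them to the merge pass. Since stability doesn't matter here, sort_unstable is the natural choice (it avoids the allocation a stable sort needs).

```rust
// Concatenate per-file collapsed output and sort so flow's merging works.
fn merge_inputs(inputs: Vec<Vec<String>>) -> Vec<String> {
    let mut all: Vec<String> = inputs.into_iter().flatten().collect();
    all.sort_unstable(); // unstable is fine, per the note above
    all
}

fn main() {
    let merged = merge_inputs(vec![
        vec!["main;b 1".to_string()],
        vec!["main;a 2".to_string()],
    ]);
    assert_eq!(merged, vec!["main;a 2", "main;b 1"]);
    println!("{:?}", merged);
}
```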

Avoid `format!` in the inner loop.

Currently, we use format! in a number of places in order to pass appropriate values to BytesStart::with_attributes. That's really sad, because it means we're allocating tons of little String items. It'd be far better if we instead had a single buffer that we write!() into, and then pulled slices out of to get the attribute substrings. I don't know if such a crate exists, but if it doesn't, that seems like a useful thing to write!
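
A sketch of that single-buffer idea with std only (the `AttrBuf` type is illustrative): write! every value into one String, remember the byte range it landed in, and hand out slices instead of fresh Strings.

```rust
use std::fmt::Write;

// One reusable buffer for all attribute values of a frame, instead of a
// fresh format! allocation per attribute.
struct AttrBuf {
    buf: String,
}

impl AttrBuf {
    fn new() -> Self {
        AttrBuf { buf: String::new() }
    }

    /// Clear for the next frame without freeing the allocation.
    fn reset(&mut self) {
        self.buf.clear();
    }

    /// Append a formatted value and return the range it occupies.
    fn push(&mut self, args: std::fmt::Arguments<'_>) -> std::ops::Range<usize> {
        let start = self.buf.len();
        self.buf.write_fmt(args).unwrap(); // writing to a String can't fail
        start..self.buf.len()
    }

    fn get(&self, range: std::ops::Range<usize>) -> &str {
        &self.buf[range]
    }
}

fn main() {
    let mut attrs = AttrBuf::new();
    let x = attrs.push(format_args!("{:.2}", 12.3456));
    let y = attrs.push(format_args!("{}", 42));
    assert_eq!(attrs.get(x), "12.35");
    assert_eq!(attrs.get(y), "42");
    attrs.reset(); // reuse the same allocation for the next <g> element
    println!("ok");
}
```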
