Code Monkey home page Code Monkey logo

Comments (7)

hodgesds avatar hodgesds commented on July 21, 2024

This is a really interesting idea, I think I have it mostly working in master now:

go test  -bench=.
goos: linux
goarch: amd64
pkg: github.com/hodgesds/iouring-go/go/src/github.com/hodgesds/perf-utils
BenchmarkProfiler-8              2050809               588 ns/op              32 B/op          1 allocs/op
BenchmarkCPUCycles-8               30423             39914 ns/op              32 B/op          1 allocs/op
BenchmarkThreadLocking-8        254143965                4.69 ns/op            0 B/op          0 allocs/op
BenchmarkRunBenchmarks-8         3119304               388 ns/op              1336 hardware_cpu_cycles/op             3314 hardware_instructions/op            0 B/op          0 allocs/op
PASS
ok      github.com/hodgesds/iouring-go/go/src/github.com/hodgesds/perf-utils    97.906s

from perf-utils.

clausecker avatar clausecker commented on July 21, 2024

Very nice! Some issues I found: you seem to run f for b.N times. That was not really my intent; f should contain the logic to run the benchmark b.N times itself as otherwise, the entire point of b.N is negated (i.e. compensating for overhead in the support code). The RunBenchmark function should only call f once for each performance counter.

I think it might also be useful to shorten the names of the counters. Right now, they are quite bulky and make the out put hard to read. Here are some suggestions:

  • hardware_instructionsinstructions / insns
  • hardware_cpu_cyclescycles / cy
  • hardware_cache_refcache_refs / refs
  • hardware_cache_misscache_misses / miss
  • hardware_bus_cyclesbus_cycles / buscy
  • hardware_stalled_cycles_frontedstalls_front / frstall
  • hardware_stalled_cycles_backendstalls_back / bestall
  • hardware_ref_cpu_cyclesref_cycles / refcy
  • software_cpu_clockcpu_clocks
  • software_task_clockstask_clocks
  • software_page_faultspage_faults / faults
  • software_context_switchescontext_switches / switches
  • software_cpu_migrationsmigrations
  • software_minor_page_faultsminor_faults
  • software_major_page_faultsmajor_faults
  • software_alignment_faultsalign_faults
  • software_emulation_faultsemu_faults
  • for the cache part, just leave off the hw_cache_ prefix

Another thing I noticed in my testing is that the strict flag is not ideal as is. If I run a benchmark with this feature turned on on a system that doesn't support performance counters or doesn't allow me to use them, the benchmark output is nearly unreadable among all the log messages (made even worse by the log message being printed b.N times len(eventAttr) times!). So perhaps a better idea would be:

  • if strict is not set, silently ignored unsupported counters. Do not log anything (testing.Verbose() could be used here, but the guidelines say not to make log output dependent on it as that can influence benchmark timings)
  • if strict is set, log the reason for the failure and then b.Skip() the benchmark (allowing people without performance counters to still run the benchmark suite to completion)

As for a minor note, I saw that you passed the last argument as

eventAttrs ...*unix.PerfEventAttr

I don't quite see why a pointer is used here. eventAttrs will be passed as a slice, so there shouldn't be any inefficiency with using members of type unix.PerfEventAttr directly. But perhaps there's a good reason for this choice.

If you like, I can make a patch set addressing these issues and send you a PR.

from perf-utils.

hodgesds avatar hodgesds commented on July 21, 2024

I'm not opposed to any of those things, I had a branch where I tried to get rid of the strict parameter and just have a preB function that could get passed in. That way the caller could determine if they wanted to lock the executing goroutine. It's a little tricky because It kind of goes back to the counts vs samples debate and in theory it should be able to support both (just change a few args to perf_event_open and update the PerfEventAttr struct).

Using the non pointer is fine since most of the helper functions return PerfEventAttr structs.

The naming function is not something i'm proud of, so any improvements would be welcome 😄 I would like to keep the hw_ and sw_ prefixes because it gives insight into the type of PerfEventAttr.

from perf-utils.

clausecker avatar clausecker commented on July 21, 2024

It kind of goes back to the counts vs samples debate

I'm unfortunately not aware of this debate. What do you mean?

I tried to get rid of the strict parameter and just have a preB function that could get passed in. That way the caller could determine if they wanted to lock the executing goroutine.

As of now, the strict parameter doesn't influence whether the executing go routine is locked to a thread. It merely influences the behaviour on failure to obtain a performance counter. Is there anything I missed?

from perf-utils.

hodgesds avatar hodgesds commented on July 21, 2024

I added another parameter to optionally lock to a goroutine. It added a bunch of other logic in the code however, with the basic idea being that you need to create a perf group profiler with all the current threads in the process (ie. where every goroutine can run). The problem is this is racey with the runtime so new threads could be created while the child profilers are being created.

Other than that I cleaned up some of the naming so that should be far more legible on the output.

from perf-utils.

clausecker avatar clausecker commented on July 21, 2024

Instead of multiple boolean arguments, it would be much easier to have an int bit mask. This scales better when future options will be added. Also consider the usability of the current design. A call looks like this:

RunBenchmarks(b, f, true, false, attrs...)

Nobody is going to be sure what those true and false operands really mean before looking at the manual for every call.

from perf-utils.

hodgesds avatar hodgesds commented on July 21, 2024

I've cleaned up these helper functions a bit (they still need some refactoring), but I've tested with both locked and non-locked versions and I get roughly the same results. The non locked version is tricky because the go runtime will create threads which makes the setup more difficult.

from perf-utils.

Related Issues (1)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.