Comments (7)
This is a really interesting idea; I think I have it mostly working in master now:

```
go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/hodgesds/iouring-go/go/src/github.com/hodgesds/perf-utils
BenchmarkProfiler-8          2050809      588 ns/op    32 B/op    1 allocs/op
BenchmarkCPUCycles-8           30423    39914 ns/op    32 B/op    1 allocs/op
BenchmarkThreadLocking-8   254143965     4.69 ns/op     0 B/op    0 allocs/op
BenchmarkRunBenchmarks-8     3119304      388 ns/op    1336 hardware_cpu_cycles/op    3314 hardware_instructions/op    0 B/op    0 allocs/op
PASS
ok  github.com/hodgesds/iouring-go/go/src/github.com/hodgesds/perf-utils 97.906s
```
from perf-utils.
Very nice! Some issues I found: you seem to run `f` for `b.N` times. That was not really my intent; `f` should contain the logic to run the benchmark `b.N` times itself, as otherwise the entire point of `b.N` is negated (i.e. compensating for overhead in the support code). The `RunBenchmark` function should only call `f` once for each performance counter.
I think it might also be useful to shorten the names of the counters. Right now they are quite bulky and make the output hard to read. Here are some suggestions:

- `hardware_instructions` → `instructions`/`insns`
- `hardware_cpu_cycles` → `cycles`/`cy`
- `hardware_cache_ref` → `cache_refs`/`refs`
- `hardware_cache_miss` → `cache_misses`/`miss`
- `hardware_bus_cycles` → `bus_cycles`/`buscy`
- `hardware_stalled_cycles_frontend` → `stalls_front`/`frstall`
- `hardware_stalled_cycles_backend` → `stalls_back`/`bestall`
- `hardware_ref_cpu_cycles` → `ref_cycles`/`refcy`
- `software_cpu_clock` → `cpu_clocks`
- `software_task_clocks` → `task_clocks`
- `software_page_faults` → `page_faults`/`faults`
- `software_context_switches` → `context_switches`/`switches`
- `software_cpu_migrations` → `migrations`
- `software_minor_page_faults` → `minor_faults`
- `software_major_page_faults` → `major_faults`
- `software_alignment_faults` → `align_faults`
- `software_emulation_faults` → `emu_faults`
- for the cache part, just leave off the `hw_cache_` prefix
Another thing I noticed in my testing is that the `strict` flag is not ideal as is. If I run a benchmark with this feature turned on on a system that doesn't support performance counters or doesn't allow me to use them, the benchmark output is nearly unreadable among all the log messages (made even worse by the log message being printed `b.N` times `len(eventAttr)` times!). So perhaps a better idea would be:

- if `strict` is not set, silently ignore unsupported counters. Do not log anything (`testing.Verbose()` could be used here, but the guidelines say not to make log output dependent on it, as that can influence benchmark timings)
- if `strict` is set, log the reason for the failure and then `b.Skip()` the benchmark (allowing people without performance counters to still run the benchmark suite to completion)
As a minor note, I saw that you passed the last argument as `eventAttrs ...*unix.PerfEventAttr`. I don't quite see why a pointer is used here. `eventAttrs` will be passed as a slice, so there shouldn't be any inefficiency with using members of type `unix.PerfEventAttr` directly. But perhaps there's a good reason for this choice.
If you like, I can make a patch set addressing these issues and send you a PR.
---
I'm not opposed to any of those things. I had a branch where I tried to get rid of the `strict` parameter and just have a `preB` function that could get passed in. That way the caller could determine if they wanted to lock the executing goroutine. It's a little tricky because it kind of goes back to the counts vs samples debate, and in theory it should be able to support both (just change a few args to `perf_event_open` and update the `PerfEventAttr` struct).

Using the non-pointer is fine since most of the helper functions return `PerfEventAttr` structs.

The naming function is not something I'm proud of, so any improvements would be welcome 😄 I would like to keep the `hw_` and `sw_` prefixes because they give insight into the type of `PerfEventAttr`.
---
> It kind of goes back to the counts vs samples debate

I'm unfortunately not aware of this debate. What do you mean?

> I tried to get rid of the strict parameter and just have a preB function that could get passed in. That way the caller could determine if they wanted to lock the executing goroutine.

As of now, the `strict` parameter doesn't influence whether the executing goroutine is locked to a thread. It merely influences the behaviour on failure to obtain a performance counter. Is there anything I missed?
---
I added another parameter to optionally lock to a goroutine. It added a bunch of other logic in the code, however, with the basic idea being that you need to create a perf group profiler with all the current threads in the process (i.e. where every goroutine can run). The problem is this is racy with the runtime, so new threads could be created while the child profilers are being created.

Other than that I cleaned up some of the naming, so the output should be far more legible.
---
Instead of multiple boolean arguments, it would be much easier to have an `int` bit mask. This scales better when future options are added. Also consider the usability of the current design. A call looks like this:

```go
RunBenchmarks(b, f, true, false, attrs...)
```

Nobody is going to be sure what those `true` and `false` operands really mean before looking at the manual for every call.
---
I've cleaned up these helper functions a bit (they still need some refactoring), but I've tested with both locked and non-locked versions and I get roughly the same results. The non-locked version is tricky because the Go runtime will create threads, which makes the setup more difficult.
---