Comments (4)
It seems the problem is that the halomon static library is quite large; we should turn it into a shared library and see what happens. We see the same overhead even when the Monitor system is skipped entirely or the halo-prepare pass is disabled.
I think we're seeing page faults because there is so much code in the text section. Notice the almost proportional increase in minor page faults when linking in the halomon static lib versus not linking it. Weird memory effects like this would explain why O2 and O3 don't show this type of issue, even though they would presumably be more obviously affected by XRay nop-sleds, for example.
➤ ls -liah | grep halo
15743662 -rwxr-xr-x 1 kavon kavon 21K Sep 17 14:56 nohalo
15743663 -rwxr-xr-x 1 kavon kavon 9.6M Sep 17 14:41 withhalo
15738726 -rwxr-xr-x 1 kavon kavon 9.6M Sep 17 14:54 withhalo_nomon
15743701 -rwxr-xr-x 1 kavon kavon 9.6M Sep 17 14:57 withhalo_nomon_noprepare
15743700 -rwxr-xr-x 1 kavon kavon 9.6M Sep 17 14:40 withhalo_noprepare
kavon@zeus:~/p/h/test|master⚡*?
➤ ../build/bin/clang++ -DSMALL_PROBLEM_SIZE -O1 bench/cpp/oopack_v1p8.cpp -o nohalo
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./nohalo
Seconds Mflops
Test Iterations C OOP C OOP Ratio
---- ---------- ----------- ----------- -----
Max 15000
Matrix 200
Complex 2000
Iterator 20000
DONE!
36.24user 0.00system 0:36.24elapsed 99%CPU (0avgtext+0avgdata 14884maxresident)k
0inputs+0outputs (0major+3060minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ ../build/bin/clang++ -DSMALL_PROBLEM_SIZE -O1 -fhalo bench/cpp/oopack_v1p8.cpp -o withhalo_nomon
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./withhalo_nomon
halo info: Empty Halomon Running!
Seconds Mflops
Test Iterations C OOP C OOP Ratio
---- ---------- ----------- ----------- -----
Max 15000
Matrix 200
Complex 2000
Iterator 20000
DONE!
49.13user 0.02system 0:49.18elapsed 99%CPU (0avgtext+0avgdata 40060maxresident)k
32inputs+0outputs (1major+4286minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ ../build/bin/clang++ -DSMALL_PROBLEM_SIZE -O1 -fhalo bench/cpp/oopack_v1p8.cpp -o withhalo_nomon_noprepare
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./withhalo_nomon_noprepare
halo info: Empty Halomon Running!
Seconds Mflops
Test Iterations C OOP C OOP Ratio
---- ---------- ----------- ----------- -----
Max 15000
Matrix 200
Complex 2000
Iterator 20000
DONE!
47.81user 0.00system 0:47.84elapsed 99%CPU (0avgtext+0avgdata 40188maxresident)k
0inputs+0outputs (0major+4291minor)pagefaults 0swaps
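To put rough numbers on the "almost proportional" observation, here is a small sketch that computes the relative increase in user time and minor page faults from the `time` output above. The helper name `pct_increase` and the `runs` dictionary are just for illustration; the values are copied from the transcripts.

```python
def pct_increase(baseline: float, measured: float) -> float:
    """Relative increase of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# User time (seconds) and minor page faults, copied from the `time` output above.
runs = {
    "nohalo": {"user_s": 36.24, "minor_faults": 3060},
    "withhalo_nomon": {"user_s": 49.13, "minor_faults": 4286},
    "withhalo_nomon_noprepare": {"user_s": 47.81, "minor_faults": 4291},
}

base = runs["nohalo"]
for name, r in runs.items():
    if name == "nohalo":
        continue  # the baseline is not compared against itself
    time_up = pct_increase(base["user_s"], r["user_s"])
    faults_up = pct_increase(base["minor_faults"], r["minor_faults"])
    print(f"{name}: user time +{time_up:.1f}%, minor faults +{faults_up:.1f}%")
```

For `withhalo_nomon` this works out to roughly 36% more user time and 40% more minor page faults, which is why the two look correlated at this point in the investigation.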
from halo.
We're seeing this bad behavior due to the naive insertion of XRay sleds into all functions. Converting to a shared library didn't change the running time at all.
15739084 -rwxr-xr-x 1 kavon kavon 66K Sep 17 15:41 withhalo_nolib
15743662 -rwxr-xr-x 1 kavon kavon 62K Sep 17 15:44 withhalo_nolib_noxraysleds
15738726 -rwxr-xr-x 1 kavon kavon 74K Sep 17 15:38 withhalo_sharedlib
➤ time ./withhalo_sharedlib > /dev/null
49.20user 0.01system 0:49.26elapsed 99%CPU (0avgtext+0avgdata 25488maxresident)k
0inputs+0outputs (0major+3557minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./withhalo_nolib > /dev/null
49.25user 0.00system 0:49.31elapsed 99%CPU (0avgtext+0avgdata 14768maxresident)k
0inputs+0outputs (0major+3058minor)pagefaults 0swaps
kavon@zeus:~/p/h/test|master⚡*?
➤ time ./withhalo_nolib_noxraysleds > /dev/null
36.30user 0.00system 0:36.32elapsed 99%CPU (0avgtext+0avgdata 14936maxresident)k
0inputs+0outputs (0major+3060minor)pagefaults 0swaps
from halo.
Currently for oopack, with no server running:
Performance counter stats for './nohalo' (5 runs): -O1 -fno-halo
38541.593084 task-clock (msec) # 1.000 CPUs utilized ( +- 0.17% )
53 context-switches # 0.001 K/sec ( +- 25.44% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
3,049 page-faults # 0.079 K/sec ( +- 0.02% )
149,639,664,048 cycles # 3.883 GHz ( +- 0.08% )
228,833,644,966 instructions # 1.53 insn per cycle ( +- 0.00% )
65,261,138,636 branches # 1693.265 M/sec ( +- 0.00% )
826,234 branch-misses # 0.00% of all branches ( +- 0.29% )
38.546724097 seconds time elapsed ( +- 0.18% )
Performance counter stats for './withhalo' (5 runs): -O1 -fhalo
49922.855019 task-clock (msec) # 0.999 CPUs utilized ( +- 0.12% )
60 context-switches # 0.001 K/sec ( +- 28.58% )
2 cpu-migrations # 0.000 K/sec ( +- 26.50% )
4,311 page-faults # 0.086 K/sec ( +- 0.02% )
193,832,699,275 cycles # 3.883 GHz ( +- 0.06% )
272,002,846,416 instructions # 1.40 insn per cycle ( +- 0.00% )
92,335,372,229 branches # 1849.561 M/sec ( +- 0.00% )
969,310 branch-misses # 0.00% of all branches ( +- 0.97% )
49.948874022 seconds time elapsed ( +- 0.13% )
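As a quick sanity check on the two `perf stat` runs above, this sketch derives the relative overhead in instructions, branches, and cycles, plus the instructions-per-cycle figure. The dictionary layout and function names are only illustrative; the counter values are copied from the output above.

```python
def pct_increase(baseline: int, measured: int) -> float:
    """Relative increase of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def ipc(counters: dict) -> float:
    """Instructions retired per cycle."""
    return counters["instructions"] / counters["cycles"]


# Counter values copied from the perf stat output above (means over 5 runs).
nohalo = {
    "cycles": 149_639_664_048,
    "instructions": 228_833_644_966,
    "branches": 65_261_138_636,
}
withhalo = {
    "cycles": 193_832_699_275,
    "instructions": 272_002_846_416,
    "branches": 92_335_372_229,
}

for key in ("instructions", "branches", "cycles"):
    print(f"{key}: +{pct_increase(nohalo[key], withhalo[key]):.1f}%")
print(f"IPC: {ipc(nohalo):.2f} -> {ipc(withhalo):.2f}")
```

The extra work shows up as roughly 19% more instructions and about 41% more branches, so the slowdown is in executed code, not just memory behavior.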
from halo.
The heuristic in 07e24e8 is quite simple: leaf functions that contain no loops (though they may contain cycles) and have fewer than 50 LLVM IR instructions are not made patchable. All other functions are (exceptions include non-reentrant functions, naked, etc). This already solves the performance issue tracked here for oopack:
Performance counter stats for './withhalo' (5 runs):
37850.876872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.37% )
46 context-switches # 0.001 K/sec ( +- 21.66% )
0 cpu-migrations # 0.000 K/sec ( +- 61.24% )
4,307 page-faults # 0.114 K/sec ( +- 0.01% )
147,589,115,753 cycles # 3.899 GHz ( +- 0.09% )
230,856,291,463 instructions # 1.56 insn per cycle ( +- 0.00% )
66,265,028,684 branches # 1750.687 M/sec ( +- 0.00% )
909,259 branch-misses # 0.00% of all branches ( +- 0.22% )
37.873064170 seconds time elapsed
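The heuristic described above can be sketched roughly as follows. This is not the code from 07e24e8: `FunctionSummary`, its fields, and `should_be_patchable` are hypothetical names, and treating the listed exceptions as "never patchable" is my reading of the comment. Only the threshold of 50 IR instructions comes from the description.

```python
from dataclasses import dataclass

# Threshold quoted in the description of the heuristic.
SMALL_FUNCTION_IR_LIMIT = 50


@dataclass
class FunctionSummary:
    """Hypothetical per-function facts; not halo's or LLVM's real data structure."""
    name: str
    is_leaf: bool              # makes no calls to other functions
    has_loop: bool             # contains at least one loop
    ir_instruction_count: int  # number of LLVM IR instructions
    is_excluded: bool = False  # assumed flag for non-reentrant, naked, etc.


def should_be_patchable(fn: FunctionSummary) -> bool:
    """Return True if the function should receive XRay sleds."""
    if fn.is_excluded:
        return False
    small_loopless_leaf = (
        fn.is_leaf
        and not fn.has_loop
        and fn.ir_instruction_count < SMALL_FUNCTION_IR_LIMIT
    )
    # Small leaf functions without loops are skipped; everything else is patched.
    return not small_loopless_leaf


if __name__ == "__main__":
    examples = [
        FunctionSummary("tiny_getter", is_leaf=True, has_loop=False, ir_instruction_count=6),
        FunctionSummary("hot_kernel", is_leaf=True, has_loop=True, ir_instruction_count=30),
        FunctionSummary("driver", is_leaf=False, has_loop=False, ir_instruction_count=12),
    ]
    for fn in examples:
        print(fn.name, should_be_patchable(fn))
```

The point of skipping small loop-free leaves is that the sled's fixed cost is large relative to such a function's body, and those are exactly the functions that get called in tight loops in benchmarks like oopack.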
from halo.