
Comments (5)

Mic92 commented on July 1, 2024

$ grep timeout results.csv | wc -l
79
$ grep failed results.csv | grep -v EstimatedTimeTooLong | wc -l
14 # 4 of these should be fixed soon
$ grep succeed results.csv | wc -l
31 # we might bring this number up to 40

from hase.

Mic92 commented on July 1, 2024

$ ls -la recordings | wc -l
165
$ wc -l results.csv
124
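As a quick cross-check (a trivial sketch, using only the counts quoted in the two comments above), the three per-category grep counts add up exactly to the line count of results.csv:

```python
# Counts copied from the grep commands in the comments above.
timeouts = 79   # grep timeout
failed = 14     # grep failed, excluding EstimatedTimeTooLong
succeeded = 31  # grep succeed

total_lines = 124  # wc -l results.csv

# The three categories account for every line, so apparently there is no
# header row and no leftover EstimatedTimeTooLong entries in the file.
assert timeouts + failed + succeeded == total_lines
print(timeouts + failed + succeeded)  # -> 124
```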


Mic92 commented on July 1, 2024

Good case/bad case time estimation for some long traces:

https://gist.github.com/Mic92/c8ef5f064206b810b0c3b5ddc83e38bf


Mic92 commented on July 1, 2024

Problem:

Possible idea:

  • get the callstack with return pointer (and maybe registers) on every context switch with perf_events

  • we already have switch events, but we have to re-assemble the instruction stream

  • the kernel deschedules a process at least every 20 ms by default (50 Hz), but at most every 4 ms (250 Hz)

  • cpu: 3 GHz, ~3,000,000,000 instructions per second per core (more of an upper bound):

    • snapshots with 50 Hz every 60e6 instructions
    • snapshots with 250 Hz every 12e6 instructions
    • what are realistic numbers here?
    • with bad luck this would not produce close-to-exit samples
      -> ignore those traces; easy to detect without decoding the whole trace
  • every system call also already introduces a context switch

  • might affect some I/O-heavy applications -> instruction-counter-based rate limiting via BPF?

  • Example output: sudo perf script
    .perf-wrapped 8653 27352.980404: 1 cycles:ppp:
    ffffffffa1b884a3 perf_ctx_unlock+0x3 ([kernel.kallsyms])
    ffffffffa1b95dd4 perf_event_exec+0x184 ([kernel.kallsyms])
    ffffffffa1c3c848 setup_new_exec+0xc8 ([kernel.kallsyms])
    ffffffffa1c91076 load_elf_binary+0x2e6 ([kernel.kallsyms])
    ffffffffa1c3a7e0 search_binary_handler+0x90 ([kernel.kallsyms])
    ffffffffa1c3c318 __do_execve_file.isra.37+0x6b8 ([kernel.kallsyms])
    ffffffffa1c3c654 __x64_sys_execve+0x34 ([kernel.kallsyms])
    ffffffffa1a041de do_syscall_64+0x4e ([kernel.kallsyms])
    ffffffffa2200088 entry_SYSCALL_64_after_hwframe+0x44 ([kernel.kallsyms])
    7facebd2c6a7 [unknown] ([unknown])
    ...

  • basically just a flag to set in our existing perf code

  • use angr's call_state and callstack plugin -> supports pushing initial frames

  • avoid tricky compiler optimization:

    • compile with -fno-omit-frame-pointer, which some companies also use in production (Netflix)
    • could in theory also work with -fomit-frame-pointer and some DWARF-based post-processing
  • Work plan:

    • Joerg: implement perf part
    • Liran: implement angr part
    • initial prototype should be in the order of days
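The snapshot-spacing numbers in the list above follow from simple division; a minimal sketch, assuming the 3 GHz / one-instruction-per-cycle upper bound stated there:

```python
CLOCK_HZ = 3_000_000_000  # 3 GHz core; ~1 instruction per cycle as a rough upper bound

def instructions_per_snapshot(tick_hz: int) -> float:
    """Instructions retired between two scheduler ticks at the given tick rate."""
    return CLOCK_HZ / tick_hz

print(instructions_per_snapshot(50))   # -> 60000000.0 (60e6 instructions at 50 Hz)
print(instructions_per_snapshot(250))  # -> 12000000.0 (12e6 instructions at 250 Hz)
```

Real per-core IPC varies with the workload, so the actual gap between snapshots is usually smaller than this upper bound.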


haollhao commented on July 1, 2024

Real Application Benchmarking Results

Redis

  • There is a redis server process running all the time, which is recorded.
  • Benchmarking is done by redis-benchmark, with the default configuration (100000 3-byte requests)
  • Results of 5 runs (requests per second):

| Benchmark | Hase / Original Ratio (should be < 1) |
|---|---|
| PING_INLINE | 1 |
| PING_BULK | 1.01 |
| SET | 1.01 |
| GET | 1.01 |
| INCR | 1 |
| LPUSH | 1.01 |
| RPUSH | 1 |
| LPOP | 1 |
| RPOP | 0.99 |
| SADD | 0.99 |
| HSET | 1.02 |
| SPOP | 1 |
| LPUSH (needed to benchmark LRANGE) | 1 |
| LRANGE_100 (first 100 elements) | 1.03 |
| LRANGE_300 (first 300 elements) | 0.99 |
| LRANGE_500 (first 450 elements) | 1 |
| LRANGE_600 (first 600 elements) | 0.99 |
| MSET (10 keys) | 0.91 |
  • There is another metric (user time + system time) that can be fairly accurate, because the redis server probably only runs while it is serving requests:

| Hase (10 ms) | Original (10 ms) | Ratio |
|---|---|---|
| 1911 | 1754.6 | 1.089137125 |
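The CPU-time overhead ratio can be reproduced directly from the two measurements in the table (values copied from above, in units of 10 ms ticks):

```python
hase_cpu = 1911        # user + system time under hase, in 10 ms ticks
original_cpu = 1754.6  # user + system time without recording

ratio = hase_cpu / original_cpu
print(round(ratio, 4))  # -> 1.0891, i.e. roughly 8.9% CPU-time overhead
```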

nginx

  • nginx is configured to have only one worker process, which is recorded.
  • Benchmark is done by wrk -t 1 -d 10s -c 10 http://localhost/, using 1 thread with 10 connections for 10 seconds.
  • Results of 10 runs:

| Benchmark | Hase / Original Ratio (should be < 1) |
|---|---|
| Latency | 1 (should be > 1) |
| Req/Sec | 0.97 |
| requests | 0.97 |
| Requests/sec | 0.97 |
| Transfer/sec | 0.97 |

logcabin

  • logcabind is recorded; it is restarted for each run.
  • Built-in benchmark logcabin-benchmark --writes 10000
  • Results (time):

| #Run | Hase (ms) | Original (ms) | Ratio |
|---|---|---|---|
| 0 | 24211 | 23742.9 | |
| 1 | 24264.4 | 23479.3 | |
| 2 | 24403.5 | 23017.3 | |
| 3 | 23958.5 | 22874.9 | |
| 4 | 25075.3 | 23328 | |
| average | 24382.54 | 23288.48 | 1.046978592 |
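The average row can be reproduced from the five runs (a trivial sketch; numbers copied from the table):

```python
hase = [24211, 24264.4, 24403.5, 23958.5, 25075.3]      # ms per run under hase
original = [23742.9, 23479.3, 23017.3, 22874.9, 23328]  # ms per run without recording

hase_avg = sum(hase) / len(hase)
original_avg = sum(original) / len(original)

print(round(hase_avg, 2))                 # -> 24382.54
print(round(original_avg, 2))             # -> 23288.48
print(round(hase_avg / original_avg, 4))  # -> 1.047, roughly 4.7% overhead
```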

leveldb

  • There is no server running, so the benchmark script itself is recorded.
  • Built-in benchmark command db_bench
  • Results of 5 runs (micros/op)

| Benchmark | Hase / Original Ratio (should be > 1) |
|---|---|
| fillseq | 0.99 |
| fillsync | 1 |
| fillrandom | 0.97 |
| overwrite | 1.06 |
| readrandom | 0.99 |
| readrandom | 1 |
| readseq | 0.92 |
| readreverse | 0.95 |
| compact | 1.13 |
| readrandom | 1 |
| readseq | 1 |
| readreverse | 1.02 |
| fill100K | 1.03 |
| crc32c | 1.02 |
| snappycomp | 1 |
| snappyuncomp | 1.03 |
| acquireload | 1.3 |

sqlite

  • There is no server running, so the benchmark program (Java, forked by Python) is recorded.
  • YCSB benchmark with workloada
  • The results (10 runs) are weird.

| Benchmark | Hase / Original Ratio |
|---|---|
| RunTime(ms) | 0.98 (should be > 1) |
| Throughput(ops/sec) | 1.02 (should be < 1) |

Apache

  • By configuring the mpm_worker module, I was apparently able to run a single worker process with one thread (alongside one master process). However, recording the worker process produces no CPU trace.
  • Running ab while recording the worker process even increases the throughput by 50%! wrk shows no such effect.

from hase.

