
ocolos-public's Introduction

Ocolos: Online COde Layout OptimizationS

Ocolos is the first online code layout optimization system for unmodified applications written in unmanaged languages. Ocolos allows profile-guided optimization to be performed on a running process, instead of being performed offline and requiring the application to be re-launched. A description of how we implemented Ocolos, along with experimental results on MySQL-sysbench workloads, is in our MICRO'22 paper.

For demonstration purposes, we integrate MySQL and sysbench with Ocolos, so this version of Ocolos works ONLY with MySQL.

Prerequisites

Please follow the linked instructions, or run the commands listed below directly, to install the prerequisites:

Download ocolos for mysql

> git clone git@github.com:upenn-acg/ocolos-public.git

Install BOLT

To use the llvm-bolt and perf2bolt utilities, BOLT needs to be installed.
Please run the commands below to install BOLT:

> mkdir BOLT && cd BOLT
> git clone git@github.com:upenn-acg/BOLT.git llvm-bolt
> cd llvm-bolt
> git checkout ocolos/cont-opt
> cd ..
> mkdir build && cd build
> cmake -G "Unix Makefiles" ../llvm-bolt/llvm -DLLVM_TARGETS_TO_BUILD="X86;AArch64" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS="clang;lld;bolt"
> make -j

Build MySQL from source

> git clone https://github.com/mysql/mysql-server.git 
> cd mysql-server 
> git checkout 6846e6b2f72931991cc9fd589dc9946ea2ab58c9 

In CMakeLists.txt, at line 580, please add the following lines (see footnote 2):

STRING_APPEND(CMAKE_C_FLAGS  " -fno-jump-tables")
STRING_APPEND(CMAKE_CXX_FLAGS " -fno-jump-tables")
STRING_APPEND(CMAKE_C_FLAGS " -no-pie")
STRING_APPEND(CMAKE_CXX_FLAGS " -no-pie")

Also, in CMakeLists.txt, turn off the ld.gold linker:
change OPTION(USE_LD_GOLD "Use GNU gold linker" ON) to OPTION(USE_LD_GOLD "Use GNU gold linker" OFF)
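
The linker switch above can also be applied from the command line. The following is a minimal sketch, not part of the Ocolos instructions: disable_ld_gold is a hypothetical helper name, and it assumes the option appears on a single line exactly as quoted above. The four STRING_APPEND lines still have to be added by hand.

```shell
# Hypothetical helper: switch the USE_LD_GOLD option from ON to OFF in the
# given CMake file. sed keeps a backup copy with a .bak suffix.
disable_ld_gold() {
  sed -i.bak 's/^\( *OPTION(USE_LD_GOLD .*\) ON)/\1 OFF)/' "$1"
}
```

For example, from the mysql-server directory: disable_ld_gold CMakeLists.txt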

Then build mysqld from source:

> export CC=gcc 
> export CXX=g++
> mkdir build && cd build 
> cmake .. -DWITH_BOOST={path of the boost_1_73_0 directory} -DCMAKE_CXX_LINK_FLAGS=-Wl,--emit-relocs -DCMAKE_C_LINK_FLAGS=-Wl,--emit-relocs -DBUILD_CONFIG=mysql_release 
> make -j
> make install

To initialize MySQL, run:

> chown -R {user} {path to MySQL directory}
> {path to MySQL directory}/bin/mysqld --initialize-insecure --user=root --datadir={your data dir path of MySQL} 
> {path to MySQL directory}/bin/mysqld --user=root --port=3306 --datadir={your data dir path of MySQL}

In another terminal, run:

> mysql -u root
> CREATE USER 'ocolos'@'localhost';
> GRANT ALL PRIVILEGES ON *.* TO 'ocolos'@'localhost' WITH GRANT OPTION;
> CREATE DATABASE ocolos_db;
> QUIT;
> mysqladmin -u root shutdown

Note:

  1. {path to MySQL directory} is normally /usr/local/mysql unless otherwise specified during MySQL server's installation.
  2. {user} should be your linux user name.

Build Sysbench

> sudo apt-get update 
> sudo apt-get install sysbench

Or, if you prefer to build sysbench from source, please refer to the instructions on the following page:
https://github.com/akopytov/sysbench

Build & run Ocolos

  • Navigate to ocolos-public directory.
  • In the file config, specify the absolute paths of nm, perf, objdump, llvm-bolt, and perf2bolt (see footnote 3).
  • In config, please also specify the commands to run MySQL server and sysbench. The example commands are given in the config file.
    • Note: the first argument of the command (a.k.a. the binary being invoked in the command) should be written in its full path.
  • Please also remember to export the path of ocolos-public's directory.
> export OCOLOS_PATH=/your/path/to/ocolos-public
  • Then run the following commands:
> make
> ./extract_call_sites
> ./tracer
  • make will produce 2 executables (tracer and extract_call_sites) and 1 shared library (replace_function.so).
    • If the libunwind library is not installed in /usr/local/lib, you also need to edit the Makefile and update the library path accordingly.
    • If libunwind's header files are not installed in /usr/local/include, you also need to edit the Makefile and update the include path accordingly.
  • ./extract_call_sites will produce 2 files that store all call site information extracted from the target binary (a.k.a. mysqld) in the tmp_data_dir you specified in the config file.
  • ./tracer will launch both the MySQL server process and the sysbench workload oltp_read_only, and then perform code layout optimization at runtime.
    • The sysbench throughput output can be found in sysbench_output.txt. At about the 130th second, you will see a significant throughput improvement, since by then Ocolos has replaced the code layout with the optimized one.
    • After one run (~3 minutes), if you want to start another run, please first run the mysqladmin -u root shutdown command to shut down the current MySQL server process.
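
To compare throughput before and after the code replacement, the interval lines in sysbench_output.txt can be averaged on either side of a cut-off. The following is a minimal sketch, not part of Ocolos: tps_summary is a hypothetical helper name, and it assumes interval lines of the form "[ 4s ] thds: 16 tps: 2787.88 qps: ...", which is how sysbench prints periodic reports.

```shell
# Hypothetical helper: average the tps value of sysbench interval lines
# before and after a cut-off time.
# Usage: tps_summary <sysbench output file> <cut-off in seconds>
tps_summary() {
  awk -v cut="$2" '
    # Interval lines look like: [ 4s ] thds: 16 tps: 2787.88 qps: ...
    $1 == "[" && $2 ~ /^[0-9]+s$/ {
      t = $2; sub(/s$/, "", t)              # elapsed seconds
      v = ""
      for (i = 3; i < NF; i++)              # the value follows the tps: label
        if ($i == "tps:") { v = $(i + 1); break }
      if (v == "") next
      if (t + 0 < cut + 0) { sum_b += v; n_b++ } else { sum_a += v; n_a++ }
    }
    END {
      if (n_b) printf "before %ds: %.2f tps\n", cut, sum_b / n_b
      if (n_a) printf "after %ds: %.2f tps\n", cut, sum_a / n_a
    }' "$1"
}
```

For example: tps_summary sysbench_output.txt 130. The 130-second cut-off follows the note above; adjust it if code replacement happens at a different time in your run.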

UPDATES: Continuous Optimization - use profile from C1 to build new BOLTed binary

  • We've modified BOLT to make it support converting perf.data collected from C1 to be the perf.fdata that llvm-bolt can use.
    • Here, we have the new terms C0 & C1
      • C0 : The duration before Ocolos's code replacement
      • C1 : The duration after Ocolos's code replacement
    • The BOLT code that supports continuous optimization can be found here.
  • In C0, the perf2bolt and llvm-bolt commands are changed to:
> perf2bolt -p perf_c0.data -o perf_c0.fdata mysqld
> llvm-bolt mysqld -o mysqld_c0.bolt --enable-bat --enable-func-map-table -data=perf_c0.fdata -reorder-blocks=cache+ -reorder-functions=hfsort
  • In C1, to make the profile collected in C1 work with perf2bolt, and to produce C1's mysqld.bolt for the next round of code replacement, the perf2bolt and llvm-bolt commands are changed to:
    • In the perf2bolt command, callstack_func.bin & BOLTed_bin_info.txt are produced by Ocolos during C0's code replacement, since we need to pass some essential information from Ocolos to BOLT
> perf2bolt --ignore-build-id --cont-opt --call-stack-func=callstack_func.bin --bin-path-info=BOLTed_bin_info.txt -p perf_c1.data -o perf_c1.fdata mysqld_c0.bolt
> llvm-bolt mysqld -o mysqld_c1.bolt --enable-bat --enable-func-map-table -data=perf_c1.fdata -reorder-blocks=cache+ -reorder-functions=hfsort
  • We also have a script to show how continuous optimization works
    • The script does the following things:
      • shows how to use the profile collected from Ocolos' C1 + the mysqld.bolt produced from Ocolos' C0 to build a newly BOLTed binary
      • runs the newly BOLTed binary with oltp_read_only to show the throughput
    • The script can be found here.
      • Before running the script, please change the paths in the script.
      • Also, please add -DCONT_OPT to CXXFLAGS in Makefile, and compile Ocolos again.
      • Run the script: sh scripts/C1_BOLTed_performance_test.sh
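
The two rounds differ only in the perf2bolt invocation; the llvm-bolt command is the same apart from the round name in the file names. The following sketch makes that explicit by printing (not running) the commands for a given round. cont_opt_cmds is a hypothetical helper name, and the file names are the ones used in the examples above.

```shell
# Hypothetical helper: print the perf2bolt and llvm-bolt commands for round
# c0 or c1, using only the flags listed above. Nothing is executed.
cont_opt_cmds() {
  case "$1" in
    c0) echo "perf2bolt -p perf_c0.data -o perf_c0.fdata mysqld" ;;
    c1) echo "perf2bolt --ignore-build-id --cont-opt --call-stack-func=callstack_func.bin --bin-path-info=BOLTed_bin_info.txt -p perf_c1.data -o perf_c1.fdata mysqld_c0.bolt" ;;
    *)  echo "usage: cont_opt_cmds c0|c1" >&2; return 1 ;;
  esac
  # The llvm-bolt step always starts from the original mysqld binary.
  echo "llvm-bolt mysqld -o mysqld_$1.bolt --enable-bat --enable-func-map-table -data=perf_$1.fdata -reorder-blocks=cache+ -reorder-functions=hfsort"
}
```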

UPDATES: Support for the AArch64 platform

  • Thanks to Wenlong who contributed to Ocolos, making it also work for the AArch64 platform.
  • The link to the version that supports AArch64 platform can be found here.

Miscellaneous (notes about how to debug Ocolos)

In Makefile's CXXFLAGS,

  • if -DTIME_MEASUREMENT flag is added, Ocolos will print the execution time of code replacement;
  • if -DMEASUREMENT flag is added, Ocolos will print metrics such as:
    • the number of functions on the call stack when target process is paused,
    • the number of functions that are moved by BOLT,
    • the number of functions that appear in both the BOLTed binary and the original binary.
  • if -DDEBUG_INFO flag is added, Ocolos will print debug information such as:
    • the information about detailed behavior of tracer
    • the content in the call stack when the target process is paused
    • -DDEBUG_INFO can also be defined in src/replace_function.hpp. In that case, the LD_PRELOAD library stores the machine code of every function it inserts into the target process, as a uint8_t array, in a file. The file can be found in the tmp_data_path you defined in the config file.
  • if -DDEBUG flag is added, after code replacement, Ocolos will first send SIGSTOP signal to target process and then resume the target process by PTRACE_DETACH. In this way, it allows debugging tools such as GDB to attach to the target process and observe what goes wrong after code replacement.

If code replacement fails, you may want to take the following steps to fix the problem:

  • first add the -DDEBUG_INFO to CXXFLAGS in Makefile and compile again.
  • then run ./tracer
  • after tracer runs into an error, you can check the last few lines of {tmp_data_dir}/machine_code.txt
  • if the last few lines show that the error is "mmap failed. file exists.":
    • it indicates that the BOLTed text section overlaps with the heap of the running mysqld process.
    • to solve this problem, please comment out this line in replace_function.cpp, and then compile Ocolos again.
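
To confirm such an overlap, the address range of the BOLTed text section can be compared against the [heap] line in /proc/<pid>/maps of the running mysqld process. The following is a minimal sketch, not part of Ocolos: overlaps_heap is a hypothetical helper name, and the start and end addresses are values you supply yourself (for example from the section headers of the BOLTed binary).

```shell
# Hypothetical helper: report whether the hex address range [start, end)
# overlaps the [heap] mapping in a /proc/<pid>/maps-style file.
# Usage: overlaps_heap <maps file> <start hex> <end hex>
# Returns 0 on overlap, 1 on no overlap, 2 if no [heap] line is present.
overlaps_heap() {
  local maps=$1 start end range lo hi
  start=$((0x${2#0x}))
  end=$((0x${3#0x}))
  # A heap line looks like: 04a00000-06000000 rw-p 00000000 00:00 0 [heap]
  range=$(awk '$NF == "[heap]" { print $1; exit }' "$maps")
  [ -n "$range" ] || { echo "no heap mapping"; return 2; }
  lo=$((0x${range%-*}))
  hi=$((0x${range#*-}))
  if [ "$start" -lt "$hi" ] && [ "$lo" -lt "$end" ]; then
    echo "overlap with heap $range"
  else
    echo "no overlap"
    return 1
  fi
}
```

For example: overlaps_heap /proc/<pid>/maps <start> <end>, where <pid> is the process id of mysqld.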

Footnotes

  1. If your objdump version is older than 2.27, please download the latest version of binutils.

  2. If the mysqld binary compiled by gcc generates callq instructions rather than call instructions, please refer to the solution discussed in this page.

  3. If nm, objdump, and perf are already on your shell's PATH, their paths do not need to be specified in config. You can check this with which nm, which objdump, and which perf.
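
The check in footnote 3 can be scripted for all tools at once. The following is a minimal sketch: missing_tools is a hypothetical helper name, and it uses the shell builtin command -v in place of which.

```shell
# Hypothetical helper: print every tool from the argument list that is not
# found on PATH. Returns non-zero if at least one tool is missing.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not on PATH: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: missing_tools nm objdump perf llvm-bolt perf2bolt. Any tool it reports needs its absolute path in config.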


ocolos-public's Issues

[tracer error] thread XXX delivers a non-SIGSTOP signal

I modified the MySQL part of this project into MongoDB, including files such as config, src/utils.cpp, src/infrastructure.cpp and src/tracer.cpp, and used YCSB to drive MongoDB for testing. The following error occurs when executing the ./tracer command:
[tracer][error] thread 2129 delivers a non-SIGSTOP signal
[tracer][error] tracee exits normally, exit num = 1

The full console output is as follows:
error.txt

Hope to get your help, thank you!

The method of debugging replace_function.so with gdb

I am trying to understand the detailed process of Ocolos's code replacement, but I am unable to reach the insert_machine_code() function through GDB. Could you guide me through the general steps of debugging replace_function.so? Here's what I've tried so far:

  1. I added -DDEBUG to the CPPFLAGS and ran the tracer program. The program ran successfully, and I received the following output:

[tracer] thread id = 1236504, rip = 7f8dd628b99f
[tracer] before SINGLESTEP, set RIP = 7f8dd6b0be1c (lib addr)
[tracer] receive SIGSTOP from tracee (lib code), tracee finished a SINGLESTEP!
[tracer] after SINGLESTEP, RIP = 7f8dd6b0be1b

[tracer] thread id = 1236508, rip = 7f8dd629173d
[tracer] before SINGLESTEP, set RIP = 7f8dd6b0be1c (lib addr)
[tracer] receive SIGSTOP from tracee (lib code), tracee finished a SINGLESTEP!
[tracer] after SINGLESTEP, RIP = 7f8dd6b0be20
[tracer] after a PTRACE_SINGLESTEP, do a PTRACE_CONT
[tracer] connection from 127.0.0.1
[tracer] after PTRACE_CONT, tracee delivers a signal Stopped (signal)
[tracer] RIP = 7f8dd66cd289
[tracer] machine code insertion finishes!
[tracer][time] machine code insertion took 2.071340 seconds to execute
[tracer][OK] code replacement done!

  2. Next, I tried to attach to the thread using "gdb attach 1236508" and set a breakpoint using "b insert_machine_code". The output I received was "Program received signal SIGSTOP, Stopped (signal)."

(gdb) b insert_machine_code
Breakpoint 1 at 0x7f8dd6b0be1c
(gdb) continue
Continuing.

Program received signal SIGSTOP, Stopped (signal).
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 in ../sysdeps/unix/sysv/linux/x86_64/syscall.S

  3. Lastly, I tried the following methods to resolve the issue, but GDB did not output anything for me:

(gdb) b insert_machine_code
Breakpoint 1 at 0x7f8dd6b0be1c
(gdb) handle SIGSTOP nopass
Signal Stop Print Pass to program Description
SIGSTOP Yes Yes No Stopped (signal)
(gdb) handle SIGSTOP nostop
Signal Stop Print Pass to program Description
SIGSTOP No Yes No Stopped (signal)
(gdb) continue
Continuing.

(gdb) b insert_machine_code
Breakpoint 1 at 0x7f8dd6b0be1c
(gdb) shell kill -CONT 1236504
(gdb) continue
Continuing.

May I ask you how you usually debug and understand the code inside “replace_function.so”?

Performance doesn't seem to be improving

After running the program and comparing it, I found that the performance has hardly improved. Could you please help me find out the reason?
When I looked at the data from the runtime console, I noticed several suspicious things:

  1. BOLT-WARNING: split function detected on input : _ZL28delete_dictionary_tablespacev.cold/1. The support is limited in relocation mode.
  2. PERF2BOLT: wrote 16614 objects and 0 memory objects to /home/lyf/Desktop/ocolos-data/perf.fdata
    I looked at the sysbench_output.txt file and found that the performance dropped after 20 seconds and improved again after 130 seconds, but only back to the level it started at:
    [ 4s ] thds: 16 tps: 2787.88 qps: 33449.53 (r/w/o: 27874.77/0.00/5574.75) lat (ms,95%): 8.90 err/s: 0.00 reconn/s: 0.00
    [ 26s ] thds: 16 tps: 1918.98 qps: 23029.77 (r/w/o: 19191.81/0.00/3837.96) lat (ms,95%): 12.08 err/s: 0.00 reconn/s: 0.00
    [ 167s ] thds: 16 tps: 2628.97 qps: 31538.64 (r/w/o: 26280.70/0.00/5257.94) lat (ms,95%): 8.74 err/s: 0.00 reconn/s: 0.00
    I have added these two files to the attachment.
    Hope to get your help.
    sysbench_output.txt
    runtime_console_infomation.txt

extract_call_sites does not work

I have built the target binary, but when I use it to generate the call_sites_all.bin file, the file always ends up empty. The call site information appears on standard output, but it is not written into this file, which confuses me. The standard output looks like this:

....
00000000042996b0 0000000000000030 B _ZN53protobuf_replication_asynchronous_connection_failover33_VariableStatus_default_instance_E

00000000042996e0 0000000000000040 B _ZN53protobuf_replication_asynchronous_connection_failover37_VariableStatusList_default_instance_E

0000000004299720 0000000000000080 B _ZN53protobuf_replication_asynchronous_connection_failover48_SourceAndManagedAndStatusList_default_instance_E

[extract_call_sites] 61186 functions in the original binary
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
the size of call_sites = 0
@@@@@@@@@@@@ the size of call_sites (final) = 0
########### the size of call_sites_list = 0

Can anyone tell me why? Thanks.

[tracer] error in run_perf_record

Sorry to interrupt!
When I run the './tracer' command, I get an error:

Error:
cycles:u: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
The subsequent commands (including perf2bolt, llvm-bolt, etc.) then fail to execute.

Do I need to change the perf command, and would that affect the performance of Ocolos? Or do I need to do something else? Maybe my computer doesn't support it?
