compile-time-perf's Introduction

compile-time-perf

I work with a lot of larger projects and do a lot of template meta-programming so while I absolutely love the flamegraphs from the -ftime-trace compiler flag, I've had a hard time detecting when changes affect the total compilation time and which files saw the most dramatic increase/decrease in compile time.

Thus, I created compile-time-perf (CTP), which is designed to be a high-level "profiler" for compiling large projects. It is designed to be simple to install, compiler and language agnostic, and included as part of CI. It is not intended to replace compiler flags like -ftime-trace but supplement them. The main problem with -ftime-trace is that it provides no high-level data w.r.t. which files to focus on, so in large projects, it can be difficult to locate which file should be focused on first. Another problem is that it only provides timing data -- if the system has limited memory resources but has 4+ cores, your compile times can shoot up drastically if each core is compiling something requiring 2+ GB of memory because you'll start using swap.

Down the road, I could see the "analyzer" script actually including support for detecting -ftime-trace in the compile command logs, searching the folder structure for the JSON, and then using/providing something like ClangBuildAnalyzer to provide more in-depth details. But I think the tool is useful enough by itself right now.

CTP essentially uses a UNIX time-like command-line tool, called timem (docs), to launch the compile commands. I built timem using the timemory toolkit -- a modular C++ template library for building profiling tools which, recursively, is one of the two primary places where I need this functionality (the other is Kokkos).

Timem doesn't do anything particularly fancy: it just forks and does a mix of deterministic phase measurements and (on Linux) some statistical sampling of a few /proc/<pid> files while the command executes. Then that data, along with the command executed, is put into a JSON file whose name is generated from an md5sum of the command executed (for uniqueness and reproducibility). Then the Python "analyzer" script just globs the files and directories it is passed, combines that data, and does some sorting and filtering to make the commands more easily readable, and that's it.
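The glob, combine, and sort steps described above can be sketched in a few lines of Python. This is only an illustration, not the analyzer's actual code: the top-level "command" and "metrics" keys are a hypothetical layout chosen for the sketch (the real timem JSON layout is not documented here), and the function names collect_json and rank are made up. Only the per-metric "value" and "unit_repr" fields are taken from the JSON excerpt quoted in the issue further down.

```python
import glob
import json
import os


def collect_json(inputs):
    """Expand a mix of files and directories into a sorted list of JSON paths."""
    paths = []
    for item in inputs:
        if os.path.isdir(item):
            paths.extend(glob.glob(os.path.join(item, "*.json")))
        else:
            paths.append(item)
    return sorted(paths)


def rank(inputs, metric):
    """Return (value, unit, command) tuples for one metric, largest first."""
    rows = []
    for path in collect_json(inputs):
        with open(path) as handle:
            record = json.load(handle)
        # "metrics" and "command" are hypothetical keys (see the lead-in).
        entry = record["metrics"].get(metric)
        if entry is None:
            continue  # this record did not report the requested metric
        command = " ".join(record["command"])
        rows.append((entry["value"], entry["unit_repr"], command))
    return sorted(rows, key=lambda row: row[0], reverse=True)


if __name__ == "__main__":
    import sys

    for value, unit, command in rank(sys.argv[1:], "wall_clock"):
        print(f"wall_clock {value:10.3f} {unit}  {command}")
```

Sorting by value in descending order is what puts the most expensive compile commands at the top of each metric's listing.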

Building CTP

Standard cmake build system without any project-specific options. CTP uses the timem executable from the timemory toolkit to do the measurements on the compile commands. Timemory is included as a git submodule and cmake will automatically run git submodule update --init if you don't.

# clone
git clone https://github.com/jrmadsen/compile-time-perf.git
# configure (it's not hanging, timemory can take a while here)
cmake -B compile-time-perf-build -D CMAKE_INSTALL_PREFIX=/usr/local compile-time-perf
# build
cmake --build compile-time-perf-build --target all
# install
cmake --build compile-time-perf-build --target install

When configuring your project, just append the CMAKE_INSTALL_PREFIX value to the CMAKE_PREFIX_PATH environment variable.

Minimum requirements:

  • CMake (minimum: v3.13)
  • C++ compiler supporting C++14
  • Python interpreter

CTP is not supported on Windows currently. It can be made available easily if someone with more access to a Windows machine than me wants to write the _spawn implementation for the timem executable. It won't be hard to get a working prototype:

  • the timem source code is only about 1,200 lines of code
  • timemory already has Windows support for the timers, peak RSS, and page RSS -- the most relevant values
  • it is 100% trivial to locally disable any non-working measurement type for Windows builds
    • it's literally 1 LOC: TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, <component>, false_type) and it will disappear from every template instantiation

So if you would like to use this on Windows and have an afternoon to spare, feel free to file an issue in timemory and I can help you get started.

Quick Start

compile-time-perf (CTP) is designed primarily for CMake but if you want to use it manually, see the Manual Usage section.

Setup

Add this to your main CMakeLists.txt somewhere after project(...). I'd recommend replacing foo below with ${PROJECT_NAME}.

find_package(compile-time-perf REQUIRED)
enable_compile_time_perf(foo-ctp)

or, making it optional:

find_package(compile-time-perf)
if(compile-time-perf_FOUND)
    enable_compile_time_perf(foo-ctp)
endif()

The argument foo-ctp is just a "NAME" used to generate two "helper" targets: one for running the analysis (analyze-${NAME}) and another for cleaning up the generated files (clean-${NAME}).

Building and Analyzing

Once you've added the cmake setup, configure and build your code normally, and then build the analyze-${NAME} target, e.g. analyze-foo-ctp.

$ cmake --build . --target all

[1/81] Building CXX object source/timemory/CMakeFiles/timemory-core-shared.dir/utility/popen.cpp.o
[/opt/local/bin/clang++]> Outputting '.compile-time-perf-timem-output/foo-ctp-33ce7f2719c9a3f0a9147cf3f1dfc242.json'...
...

$ cmake --build . --target analyze-foo-ctp

wall_clock          74.947 sec  clang++ tools/timemory-avail/timemory-avail.cpp
wall_clock          42.742 sec  clang++ tools/timemory-timem/timem.cpp
wall_clock           4.092 sec  clang++ tools/timemory-pid/timemory-pid.cpp

cpu_clock          74.150 sec  clang++ tools/timemory-avail/timemory-avail.cpp
cpu_clock          41.970 sec  clang++ tools/timemory-timem/timem.cpp
cpu_clock           3.680 sec  clang++ tools/timemory-pid/timemory-pid.cpp

peak_rss        2040.410 MB   clang++ tools/timemory-avail/timemory-avail.cpp
peak_rss        1040.617 MB   clang++ tools/timemory-timem/timem.cpp
peak_rss         194.691 MB   clang++ tools/timemory-pid/timemory-pid.cpp

NOTE: The JSON filename (e.g. 33ce...c242.json above) is just the md5sum of the compile command without spaces, e.g. for the command g++ foo.cpp -o foo, the md5sum is computed from g++foo.cpp-ofoo. This is used to ensure uniqueness and reproducibility.
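As a rough illustration of the naming rule in the note above, here is a minimal Python sketch. It assumes that all whitespace is removed before hashing and that the text is hashed as UTF-8 bytes; timem's exact normalization may differ, and the helper name ctp_json_name is hypothetical and not part of CTP.

```python
import hashlib


def ctp_json_name(command: str) -> str:
    """Return '<md5>.json' for a compile command (hypothetical helper)."""
    # Assumption: every whitespace character is dropped before hashing,
    # so "g++ foo.cpp -o foo" is hashed as "g++foo.cpp-ofoo".
    squashed = "".join(command.split())
    digest = hashlib.md5(squashed.encode("utf-8")).hexdigest()
    return f"{digest}.json"


if __name__ == "__main__":
    print(ctp_json_name("g++ foo.cpp -o foo"))
```

Because the name depends only on the command text, rebuilding the same file with the same flags overwrites the same JSON file, while any change to the flags produces a new one.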

Advanced Tutorial

In CMake, CTP uses the RULE_LAUNCH_COMPILE property to prefix every compile command, and enable_compile_time_perf(...) is a CMake macro that sets it. This macro has quite a few features for controlling exactly what information is emitted during CI. For example, you may want to create a CTest which depends on the all target being built, and you want the compile command to print an abbreviated path and include the optimization/arch flags so that you can set FAIL_REGULAR_EXPRESSION to fail the test when the build time falls in the range defined in the expression. In other words, if the failure case is the wall-clock compile time exceeding 30 seconds with the clang compiler and -O3 -march=native, then rendering clang++ source/tools/timemory-timem/timem.cpp as clang++ -O3 -march=native timem.cpp, so that it can be matched by "^wall_clock [3-9][0-9].([0-9]+) sec clang.. -O3 -march=native timem.cpp", is supported.

  • It can be applied globally, per-project, per-directory, and/or per-target
    • GLOBAL (zero args)
    • PROJECT (zero args)
    • DIRECTORY (1+ args)
    • TARGET (1+ args)
  • You can pass options to timem
    • TIMEM_OPTIONS (1+ args)
    • See timem --help
  • You can pass options to the python analysis script
    • ANALYZER_OPTIONS (1+ args)
    • See compile-time-perf-analyzer --help
  • You can prefix link commands as well as compile commands
    • LINK (zero args)
  • You can customize the output directory
    • TIMEM_OUTPUT_DIRECTORY (one arg)

add_library(foo ...)

enable_compile_time_perf(foo-ctp
    LINK                                    # include link command
    TARGET
        foo                                 # only apply to foo target
    TIMEM_OPTIONS
        --disable-sampling                  # disable timem sampling /proc/pid on Linux
    ANALYZER_OPTIONS                        #
        -m wall_clock peak_rss cpu_clock    # only report these metrics
        -s "${PROJECT_SOURCE_DIR}/"         # remove this string from prefix/suffix
        -i "^(-D).*"                        # include the compile definitions in the labels
        -n 5                                # only show first 5 entries
    )

# enable unity builds
set(CMAKE_UNITY_BUILD ON)

add_library(bar ...)

enable_compile_time_perf(bar-ctp
    TARGET
        bar
    ANALYZER_OPTIONS
        -r ".dir/Unity/unity_[0-9]_(cxx|cpp).(cxx|cpp)" # remove unity build generated path
        -i "^(-).*"             # include everything starting with hyphen in label
        -e "^(-D).*(_EXPORTS)$" # except for definitions ending with _EXPORTS
    )
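To see what the failure expression quoted at the start of this section accepts, the pattern can be tried with Python's re module. This is only a sketch: CMake uses its own regex engine, although this pattern relies only on constructs the two have in common. The sample report lines and the is_failure helper are hypothetical, and collapsing the report's column padding to single spaces is an assumption made so that the single-space pattern can match.

```python
import re

# Pattern copied verbatim from the failure case described in this section.
SLOW_TIMEM = re.compile(
    r"^wall_clock [3-9][0-9].([0-9]+) sec clang.. -O3 -march=native timem.cpp"
)


def is_failure(line: str) -> bool:
    """Return True if a report line matches the failure expression."""
    # Assumption: collapse runs of whitespace so the padded columns of the
    # report line up with the single spaces in the pattern.
    normalized = " ".join(line.split())
    return SLOW_TIMEM.search(normalized) is not None


if __name__ == "__main__":
    # Hypothetical report lines, shaped like the analyzer output shown earlier.
    samples = [
        "wall_clock          42.742 sec  clang++ -O3 -march=native timem.cpp",
        "wall_clock           4.092 sec  clang++ -O3 -march=native timem.cpp",
        "wall_clock          29.500 sec  clang++ -O3 -march=native timem.cpp",
    ]
    for sample in samples:
        print(is_failure(sample), sample)
```

Note that [3-9][0-9] only covers 30 to 99 seconds; a compile time of 100 seconds or more would need an additional alternative in the expression.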

Manual Usage

CTP installs a simple Python script called compile-time-perf-analyzer. Usage is fairly straightforward.

$ compile-time-perf-analyzer --help
usage: compile-time-perf-analyzer [-h] [-v] [-n MAX_ENTRIES] [-i [INCLUDE_REGEX [INCLUDE_REGEX ...]]]
                                  [-e [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]] [-f [FILE_EXTENSIONS [FILE_EXTENSIONS ...]]]
                                  [-s [STRIP [STRIP ...]]] [-r [REGEX_STRIP [REGEX_STRIP ...]]] [-m METRIC [METRIC ...]]
                                  [-l]
                                  [inputs [inputs ...]]

Measures high-level timing and memory usage metrics during compilation

positional arguments:
    inputs                                      List of JSON files or directory containing JSON files from timem

optional arguments:
    -h, --help                                  show this help message and exit
    -v, --verbose, --debug                      Print out verbosity messages (i.e. debug messages)
    -n MAX_ENTRIES                              Max number of entries to display
    -i [INCLUDE_REGEX [INCLUDE_REGEX ...]]      List of regex expressions for including command-line arguments
    -e [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]      List of regex expressions for removing command-line arguments
    -f [FILE_EXTENSIONS [FILE_EXTENSIONS ...]]  List of file extensions (w/o period) to include in label. Use 'lang-c',
                                                'lang-cxx', 'lang-fortran' keywords to include common extensions. Use
                                                'lang-all' to include all defaults. Use 'none' to disable all extension
                                                filtering
    -s [STRIP [STRIP ...]]                      List of string to strip from the start/end of the labels
    -r [REGEX_STRIP [REGEX_STRIP ...]]          List of regular expressions to strip from the start/end of the labels
    -m METRIC [METRIC ...]                      List of metrics to display
    -l, --list-metrics                          List the metrics which can be potentially reported and exit

All arguments after standalone '--' are treated as input files/directories, e.g. ./foo <options> -- bar.txt baz/

In order to generate the JSON files for the Python script, just prefix every compile command with timem -o <DIR>/%m -q -- where <DIR> is the common directory for the output files, %m instructs timem to generate an md5sum of everything after the --, and -q instructs timem not to output to the console.

# original command
g++ foo.cpp -o foo

# manual command
timem -o foo-ctp/%m -q -- g++ foo.cpp -o foo
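If the compile commands are generated by a script rather than typed by hand, the same prefixing can be done programmatically. The sketch below assumes timem is on the PATH and uses only the -o, %m, -q, and -- pieces described above; the helper name timem_prefixed is hypothetical.

```python
import shlex


def timem_prefixed(command: str, output_dir: str) -> list:
    """Return the argv list for running `command` under timem."""
    # "%m" is expanded by timem into the md5sum of everything after "--";
    # "-q" keeps timem from printing to the console.
    prefix = ["timem", "-o", f"{output_dir}/%m", "-q", "--"]
    return prefix + shlex.split(command)


if __name__ == "__main__":
    argv = timem_prefixed("g++ foo.cpp -o foo", "foo-ctp")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning a list instead of a single string means the result can be passed directly to subprocess.run without going through a shell.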

Example Usage

In timemory, the usage looks like this in the main CMakeLists.txt:

option(TIMEMORY_USE_CTP "Enable compile-time-perf" OFF)
mark_as_advanced(TIMEMORY_USE_CTP)

if(TIMEMORY_BUILD_DEVELOPER OR TIMEMORY_USE_CTP)
    find_package(compile-time-perf)
    if(compile-time-perf_FOUND)
        enable_compile_time_perf(timemory-compile-time
            LINK
            ANALYZER_OPTIONS
                -s "${PROJECT_BINARY_DIR}/" "${PROJECT_SOURCE_DIR}/"
                -f "lang-all" "so" "a" "dylib" "dll"
                -i ".*(_tests)$" "^(ex_).*"
                -e "^(@rpath).*" "^(/usr)" "^(/opt)")
        set(TIMEMORY_USE_CTP ON)
    else()
        set(TIMEMORY_USE_CTP OFF)
    endif()
endif()

Later, in source/tests/CMakeLists.txt, a ctest is created which executes during CI alongside all the tests (at which point, all the code has been built):

if(TIMEMORY_USE_CTP AND compile-time-perf_ANALYZER_EXECUTABLE)
    add_test(
        NAME                compile-time-perf
        COMMAND             ${CMAKE_COMMAND}
                            --build ${PROJECT_BINARY_DIR}
                            --target analyze-timemory-compile-time
        WORKING_DIRECTORY   ${PROJECT_BINARY_DIR})
endif()

compile-time-perf's Issues

Documentation about the metrics.

Hello jrmadsen,

first I want to say: really really great project! I found amazing use for it in optimizing our CI builds. Sadly, I cannot find any documentation about what the metrics represent exactly. What's the difference between "virtual memory" and "peak_rss"?

    "peak_rss": {
        "value": 412.892,
        "repr": 412.892,
        "laps": 29,
        "unit_value": 1000000,
        "unit_repr": "MB"
       },
       "page_rss": {
        "value": 5.091328,
        "repr": 5.091328,
        "laps": 0,
        "unit_value": 1000000,
        "unit_repr": "MB"
       },
       "virtual_memory": {
        "value": 18.75968,
        "repr": 18.75968,
        "laps": 0,
        "unit_value": 1000000,
        "unit_repr": "MB"
       },

This is from one of my logs created by the current master branch. I would have expected virtual_memory to be higher than peak_rss if this were the VSZ, so it's especially confusing to me.

much greetings!
Jonas
