pbench's Introduction

Pbench

A Benchmarking and Performance Analysis Framework

The code base includes three sub-systems. The first is the collection agent, Pbench Agent, responsible for collecting configuration data for test systems, managing the collection of performance tool data from those systems (sar, vmstat, perf, etc.), and executing and postprocessing standardized or arbitrary benchmark workloads (uperf, fio, linpack, as well as real system activity).

The second sub-system is the Pbench Server, which is responsible for archiving result tar balls and providing a secure RESTful API to client applications, such as the Pbench Dashboard. The API supports curation of results data, the ability to annotate results with arbitrary metadata, and to explore the results and collected data.

The third sub-system is the Pbench Dashboard, which provides a web-based GUI for the Pbench Server allowing users to list and view public results. After logging in, users can view their own results, publish results for others to view, and delete results which are no longer of use. On the User Profile page, a logged-in user can generate API keys for use with the Pbench Server API or with the Agent pbench-results-move command. The Pbench Dashboard also serves as a platform for exploring and visualizing result data.

How is it installed?

Instructions for installing pbench-agent can be found in the Pbench Agent Getting Started Guide.

For Fedora, CentOS, and RHEL users, we have made available COPR RPM builds for the pbench-agent and some benchmark and tool packages.

You might want to consider browsing through the rest of the documentation.

You can also use podman or docker to pull Pbench Agent containers from Quay.io.
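
For example, a minimal pull might look like the following (the repository path and tag here are illustrative only; check Quay.io for the images that are actually published):

$ podman pull quay.io/pbench/pbench-agent-all-centos-8:latest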

How do I use pbench?

Refer to the Pbench Agent Getting Started Guide.

TL;DR? See "TL;DR - How to set up the pbench-agent and run a benchmark" in the main documentation for a super quick set of introductory steps.

Where is the source kept?

The latest source code is at https://github.com/distributed-system-analysis/pbench.

Is there a mailing list for discussions?

Yes, we use Google Groups.

How do I report an issue?

Please use GitHub's issues.

Is there a place to track current and future work items?

Yes, we are using GitHub Projects. You will find projects covering the Agent, Server, and Dashboard, as well as a project named after the current milestone.

How can I contribute?

Below are some simple steps for setting up a development environment for working with the Pbench code base. For more detailed instructions on the workflow and process of contributing code to Pbench, refer to the Guidelines for Contributing.

Getting the Code

$ git clone https://github.com/distributed-system-analysis/pbench
$ cd pbench

Running the Unit Tests

Install tox properly in your environment (Fedora/CentOS/RHEL):

$ sudo dnf install -y perl-JSON python3-pip python3-tox

Once tox is installed you can run the unit tests against different versions of Python using the Python environment short-hands:

  • tox -e py36 -- run all tests in a Python 3.6 environment (our default)
  • tox -e py39 -- run all tests in a Python 3.9 environment
  • tox -e py310 -- run all tests in a Python 3.10 environment
  • tox -e pypy3 -- run all tests in a PyPy 3 environment
  • tox -e pypy3.8 -- run all tests in a PyPy 3.8 environment

See https://tox.wiki/en/latest/example/basic.html#a-simple-tox-ini-default-environments.

You can provide arguments to the tox invocation to request sub-sets of the available tests be run.

For example, if you want to just run the agent or server tests, you'd invoke tox as follows:

  • tox -- agent -- runs only the agent tests
  • tox -- server -- runs only the server tests

Each of the "agent" and "server" tests can be further subsetted as follows:

  • agent

    • python -- runs the python tests (via pytest)
    • legacy -- runs all the legacy tests
    • datalog -- runs only the legacy tool data-log tests, agent/tool-scripts/datalog/unittests
    • postprocess -- runs only the legacy tool/bench-scripts post-processing tests, agent/tool-scripts/postprocess/unittests
    • tool-scripts -- runs only the legacy tool-scripts tests, agent/tool-scripts/unittests
    • util-scripts -- runs only the legacy util-scripts tests, agent/util-scripts/unittests
    • bench-scripts -- runs only the legacy bench-scripts tests, agent/bench-scripts/unittests
  • server

    • python -- runs the python tests (via pytest)

For example:

  • tox -- agent legacy -- run agent legacy tests
  • tox -- server python -- run server python tests (via pytest)

For any of the test sub-sets on either the agent or server sides of the tree, one can pass additional arguments to the specific sub-system test runner. This allows one to request a specific test or set of tests, or to pass command-line parameters that modify the test behavior:

  • tox -- agent bench-scripts test-CL -- run bench-scripts' test-CL
  • tox -- server python -v -- run server python tests verbosely

For the agent/bench-scripts tests, one can run entire sub-sets of tests using a sub-directory name found in agent/bench-scripts/tests. For example:

  • tox -- agent bench-scripts pbench-fio
  • tox -- agent bench-scripts pbench-uperf pbench-linpack

The first runs all the pbench-fio tests, while the second runs all the pbench-uperf and pbench-linpack tests.

You can run the build.sh script to execute the linters, to run the unit tests for the Agent, Server, and Dashboard code, and to build installations for the Agent, Server, and Dashboard.

Finally, see the jenkins/Pipeline.gy file for how the unit tests are run in our CI jobs.

Python formatting

This project uses the flake8 method of code style enforcement, linting, and checking.

All python code contributed to pbench must match the style requirements. These requirements are enforced by the pre-commit hook. In addition to flake8, pbench uses the black Python code formatter and the isort Python import sorter.
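
If you want to run the same checks by hand before committing, the corresponding tools can be invoked directly (shown here as a sketch; the exact options come from the repository's configuration files):

$ black .
$ isort .
$ flake8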

Use pre-commit to set automatic commit requirements

This project makes use of pre-commit to do automatic lint and style checking on every commit containing Python files.

To install the pre-commit hook, run the executable from your Python 3 framework while in your current pbench git checkout:

$ cd ~/pbench
$ pip3 install pre-commit
$ pre-commit install --install-hooks

Once installed, all commits will run the test hooks. If your changes fail any of the tests, the commit will be rejected.
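
You can also run the hooks on demand against the whole tree, which is handy before pushing a large change:

$ pre-commit run --all-files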

Pbench Release Tag Scheme (GitHub)

We employ a simple major, minor, release, build (optional) scheme for tagging starting with the v0.70.0 release (v<Major>.<Minor>.<Release>[-<Build>]). Prior to the v0.70.0 release, the scheme used was mostly v<Major>.<Minor>, where we only had minor releases (Major = 0).

Container Image Tags

This same GitHub "tag" scheme is used with tags applied to container images we build, with the following exceptions for tag names:

  • latest - always points to the latest released container image pushed to a repository

  • v<Major>-latest - always points to the "latest" Major released image

  • v<Major>.<Minor>-latest - always points to the "latest" release for Major.Minor released images

  • <SHA1 git hash> (9 characters) - commit hash of the checked out code
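
For illustration, a single released image might carry a set of tags along these lines (the version numbers and hash below are hypothetical):

v0.72.0        # exact release
v0.72-latest   # newest v0.72.x image
v0-latest      # newest v0.x image
latest         # newest released image overall
1a2b3c4d5      # 9-character commit hash of the checked out code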

References to Container Image Repositories

The operation of our functional tests, the Pbench Server "in-a-can" used in the functional tests, and other verification and testing environments use container images from remote image registries. The CI jobs obtain references to those repositories using Jenkins credentials. When running those same jobs locally, you can provide the registry via ${HOME}/.config/pbench/ci_registry.name.

If this file is not provided, local execution will report an error.
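
A minimal setup might look like this (the registry value shown is only an example; substitute whichever registry hosts your images):

$ mkdir -p ${HOME}/.config/pbench
$ echo "quay.io/pbench" > ${HOME}/.config/pbench/ci_registry.name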

pbench's People

Contributors

anishaswain, aquibbaig, arzoo14, atheurer, bengland2, chaitanyaenr, cronburg, dbutenhof, ekuric, gurbirkalsi, hifzakh, jeremyeder, jmencak, k-rister, ldoktor, maxusmusti, mffiedler, mvarshini, ndokos, npalaska, orpiske, portante, riya-17, robertkrawitz, shubham-html-css-js, siddardh-ra, sourabhtk37, tenstormavi, vishalvvr, webbnh

pbench's Issues

clear-results doesn't clean results from clients

kill-tools and clear-tools clear tools from both the client and the server, but clear-results clears only the server; it does not clear results from the clients, so that has to be done manually every time.

For example, in my experiment I need to run fio from the host on 16 VMs, and every time I need to clear the results on the VMs using scripts. It would be good to have pbench clean results on the clients too.

pprof-datalog only supports OSE...should also support origin.

Only the service names and /etc/sysconfig locations differ. They should be conditionalized.
/cc @ekuric

diff -pruN /opt/pbench-agent/tool-scripts/datalog/pprof-datalog.orig /opt/pbench-agent/tool-scripts/datalog/pprof-datalog

--- /opt/pbench-agent/tool-scripts/datalog/pprof-datalog.orig 2016-02-09 12:29:38.079425999 -0500
+++ /opt/pbench-agent/tool-scripts/datalog/pprof-datalog 2016-02-09 12:30:17.701923330 -0500
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 
-openshift_master="/etc/sysconfig/atomic-openshift-master"
-openshift_node="/etc/sysconfig/atomic-openshift-node"
+openshift_master="/etc/sysconfig/origin-master"
+openshift_node="/etc/sysconfig/origin-node"
 
 profile="$1"
 osecomponent="$2"
@@ -12,18 +12,18 @@ ose_pprof() {
     case "$profile" in
         cpu)
             if grep -q "^OPENSHIFT_PROFILE=cpu" $openshift_master; then
-                systemctl restart atomic-openshift-master.service
+                systemctl restart origin-master.service
             else
                 echo "OPENSHIFT_PROFILE=cpu" >> $openshift_master
-                systemctl restart atomic-openshift-master.service
+                systemctl restart origin-master.service
             fi
         ;;
         mem)
             if grep -q "^OPENSHIFT_PROFILE=mem" $openshift_master; then
-                systemctl restart atomic-openshift-master.service
+                systemctl restart origin-master.service
             else
                 echo "OPENSHIFT_PROFILE=mem" >> $openshift_master
-                systemctl restart atomic-openshift-master.service
+                systemctl restart origin-master.service
             fi
         ;;
     esac
@@ -32,18 +32,18 @@ ose_pprof() {
     case "$profile" in
         cpu)
             if grep -q "^OPENSHIFT_PROFILE=cpu" $openshift_node; then
-                systemctl restart atomic-openshift-node.service
+                systemctl restart origin-node.service
             else
                 echo "OPENSHIFT_PROFILE=cpu" >> $openshift_node
-                systemctl restart atomic-openshift-node.service
+                systemctl restart origin-node.service
             fi
         ;;
         mem)
             if grep -q "^OPENSHIFT_PROFILE=mem" $openshift_master; then
-                systemctl restart atomic-openshift-node.service
+                systemctl restart origin-node.service
             else
                 echo "OPENSHIFT_PROFILE=mem" >> $openshift_node
-                systemctl restart atomic-openshift-node.service
+                systemctl restart origin-node.service
             fi
         ;;
     esac

Tool RPMs availability and naming

We build RPMs internally for tools that pbench uses. In most cases, they are built from upstream bits without any changes. We could make those available externally through COPR.

In some cases, we might need to patch the upstream.

In all cases, we want to prepend "pbench-" to the RPM name to avoid conflicts with any system provided ones.

Develop an "all-in-one" environment for development and testing

It is pretty clear that, both for development and for simple example use / kick-the-tires testing, we need an all-in-one environment for deploying the agent, background server tasks, web server, etc.

With such an environment we could automate builds and unit tests for integration with TravisCI, and other tools.

sar fails while running pbench with multiple VMs

running fio job: /var/lib/pbench-agent/fio_multi-vm-lvm-cache:none-io:native-disk:hdd-fs:lvm-prealloc-iodepth-1-jobs-32-ioeng:sync-profile:latency-perf-full-run-with-stefan-rhel72-patch-without-perf-ag-vcpu-2-run:1_2015-10-29_01:31:51/1-randread-4KiB/sample1/fio.job
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (28146) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11246) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11298) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11237) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11327) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11378) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11266) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11354) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11270) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11377) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11417) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11427) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11340) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11464) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11402) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11376) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (11493) - No such process
Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-85]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-85]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-87]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-87]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-98]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-98]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-86]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-86]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-84]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-84]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-95]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-95]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-90]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-90]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-88]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-88]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-91]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-91]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-89]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-89]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-96]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-96]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-94]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-94]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-184]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-184]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess

Include method to derive efficiency metrics for any pbench benchmark

This issue tracks computing efficiency metrics, to be used on any of the pbench benchmark scripts. Currently we have efficiency metrics for pbench_uperf only, and these are Gb-sec/CPU and transactions-sec/CPU. We want to modularize the process, in order to create any "work/resource" metric for any of the benchmarks.

Tool names are too generic

The executable names in pbench are too generic, especially if you have an RPM install. I'd suggest the tools have a pbench- prefix, such as pbench-register-tool-set. There's also the possibility of having a 'shell' command with subcommands, so the command would turn into pbench register-tool-set.

sar doesn't start while running pbench_fio

sar doesn't start in some cases while running pbench_fio. While these tests were in progress, I checked for sar on the machine and it was not started, so the kill fails once the test is done. It's inconsistent; in some cases it works fine.

Failed tests:

running fio job: /var/lib/pbench-agent/fio_single-vm-ext4-cache:none-io:native-disk:hdd-img:qcow2-preallocate:falloc-fs:ext4-iodepth-1-jobs-32-ioeng:sync-profile:throughput-without-perf-full-run-with-stefan-rhel72-patch-ag-vcpu-2-run:1_2015-10-30_04:16:18/7-write-16384KiB/sample5/fio.job
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (36157) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (30111) - No such process
Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
[virbr0-122-84]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-84]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.
fio job complete

running fio job: /var/lib/pbench-agent/fio_single-vm-ext4-cache:none-io:native-disk:hdd-img:qcow2-preallocate:falloc-fs:ext4-iodepth-1-jobs-32-ioeng:sync-profile:throughput-without-perf-full-run-with-stefan-rhel72-patch-ag-vcpu-2-run:1_2015-10-30_04:16:18/12-read-1024KiB/sample1/fio.job

/opt/pbench-agent/tool-scripts/sar: line 168: kill: (3450) - No such process

/opt/pbench-agent/tool-scripts/sar: line 168: kill: (12532) - No such process

Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

[virbr0-122-84]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-84]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

fio job complete

running fio job: /var/lib/pbench-agent/fio_single-vm-ext4-cache:none-io:native-disk:hdd-img:qcow2-preallocate:falloc-fs:ext4-iodepth-1-jobs-32-ioeng:sync-profile:throughput-without-perf-full-run-with-stefan-rhel72-patch-ag-vcpu-2-run:1_2015-10-30_04:16:18/12-read-1024KiB/sample2/fio.job

/opt/pbench-agent/tool-scripts/sar: line 168: kill: (6283) - No such process

/opt/pbench-agent/tool-scripts/sar: line 168: kill: (13982) - No such process

Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

[virbr0-122-84]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-84]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

fio job complete
The following jobfile was created: /var/lib/pbench-agent/fio_single-vm-ext4-cache:none-io:native-disk:hdd-img:qcow2-preallocate:falloc-fs:ext4-iodepth-1-jobs-32-ioeng:sync-profile:throughput-without-perf-full-run-with-stefan-rhel72-patch-ag-vcpu-2-run:1_2015-10-30_04:16:18/12-read-1024KiB/sample3/fio.job

running fio job: /var/lib/pbench-agent/fio_single-vm-ext4-cache:none-io:native-disk:hdd-img:qcow2-preallocate:falloc-fs:ext4-iodepth-1-jobs-32-ioeng:sync-profile:throughput-without-perf-full-run-with-stefan-rhel72-patch-ag-vcpu-2-run:1_2015-10-30_04:16:18/12-read-1024KiB/sample3/fio.job

/opt/pbench-agent/tool-scripts/sar: line 168: kill: (9006) - No such process
/opt/pbench-agent/tool-scripts/sar: line 168: kill: (15615) - No such process

Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

[virbr0-122-84]Use of uninitialized value $line in scalar chomp at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 97.
[virbr0-122-84]Use of uninitialized value $line in pattern match (m//) at /opt/pbench-agent/tool-scripts/postprocess/sar-postprocess line 98.

fio job complete

Number of open files should be bumped up

In some cases, a user can run up against the open file limit (the soft limit seems to be 1024, and the hard limit is 4096 on my F21 box).

A root user can bump it up though, so we might want to add that to pbench-base.sh:

ulimit -n 1048576

N.B. "unlimited" or any number > 2^20 does not seem to work on F21, but YMMV.

Index fio disk stats JSON data found in fio-result.txt

Let's consider indexing the JSON data generated by fio as stored in the fio-result.txt file for a given sample.

This should be fairly straightforward since the data is already in JSON form, but we need to add the right metadata so that we can find the document algorithmically.
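
As a quick illustration of how accessible the data already is, the per-job statistics can be pulled out with jq, assuming fio-result.txt holds the plain fio JSON document:

$ jq '.jobs[] | {jobname: .jobname, read_iops: .read.iops, write_iops: .write.iops}' fio-result.txt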

pbench_cyclictest does not work on F22

It's looking for a specific version of the package rt-tests - that fails on F22.

I installed the currently available version of rt-tests (4.2.12-1.fc22)
but there is no command named "cyclictest" in the package.

Is the benchmark obsolete?

Restore indexing unit tests 7.6 and 7.7

I had to comment these two out of the server/pbench/bin/unittests script after a small cascade of problems: the travis-ci build broke, and I fixed that for now, but the fix caused the above tests to fail because the pretty-printing of the structures being compared is slightly different in the travis-ci environment, causing spurious failures. After a failed effort to fix that, I commented the two failing tests out and will revisit them next week, after we get the pbench-agent release out (which is not affected at all by these two tests).

percentage metrics (*_pct) being plotted along with normally ranged metrics

screenshot_1

screenshot_2

In screenshot_1, notice that vmeff_pct (which ranges 0-100) is included with other metrics like page_swaps_in/out_sec, pgscand/k_sec, and so on (which range up to roughly 2,000,000).

Similarly, in screenshot_2, notice the range of the parameter memused_pct (which varies 0-100).

Ideally, these should be plotted separately, perhaps behind a toggle button which, when clicked, shows all the percentage stats. The ranges aren't meant to be mixed, since mixing them creates confusion for the user.

Review and obsolete or create issues for TODO items

There used to be a ./doc/TODO tracking a few items that need to be addressed in the pbench code base. The contents of that document are provided in this issue below. We should review them and create individual issues where appropriate, dropping any items that don't make sense anymore.

TODO

Included here are various items which at some point should be done for pbench.

General

job processor

Currently pbench is usually run in a terminal. This is fine for single system use, but does not work well for multi-system tests. We need a way to process job files, so we can (1) not maintain a terminal and (2) submit the same job file to many systems. We most likely need a daemon which waits for new job files and processes a job file once one appears. We could scan a local directory for new files (inotify) and/or periodically check a http/ftp/nfs location for new jobs. Job files could simply be bash scripts, or we could process them ourselves, exec'ing each line. Having pbench process the file might have some advantages in that some state could be saved (variable defs) if the job file should issue a reboot command (and the pbench daemon would resume on boot, processing the remainder of the job file). Running bash scripts, on the other hand, won't survive a reboot, but it would be far easier to implement. With either of these, they would probably be run within a screen session so one could attach and watch.

Utils

restrict

If we plan to write a single job file or bash script which gets distributed to many systems, we need the ability to allow different systems to run different things, even with the same job file. This is not really difficult with an if statement and hostname checking, but something more convenient would be nice. "restrict" would be a utility which takes a list of hostnames or IPs, followed by a command. In this situation, every system with the same job file [that has a restrict command] runs the restrict script, and if its hostname matches the list, then it gets to run the command. For example:

#!/bin/bash
restrict client1 uperf --mode=client --server=server1
restrict server1 uperf --mode=server
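
A minimal sketch of what such a restrict helper could look like (illustrative only; the comma-separated host list and the use of the short hostname are assumptions):

#!/bin/bash
# restrict: run the given command only if this host appears in the host list.
# Usage: restrict host1[,host2,...] command [args...]
hosts="$1"; shift
me="$(hostname -s)"
case ",${hosts}," in
    *",${me},"*) exec "$@" ;;
    *) exit 0 ;;
esac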

sync

When running a test with many different hosts, VMs, or containers executing, we often need to synchronize certain things, so we can get repeatable results and have high confidence that what we want to happen is actually happening at the right time. This is also required when one system needs to set up a service (web, etc) before a client system tests that service. A sync is used to wait until the server is done setting up the service.

To do this, we need a "sync" command. All systems using the same job file would call sync with (1) the sync label, "this_sync", and (2) a list of systems who have to participate in the sync. The sync utility will wait until all members in the list are running the sync command for the particular sync name. Once all members have executed the command, then and only then do they get to exit the sync command and resume. An implementation will certainly require networking to make this happen.

An example of job file might be

#!/bin/bash
restrict server1 start-web-service
sync web-ready server1 client1
restrict client1 benchmark-web-server
sync web-test-complete server1 client1
restrict server1 stop-web-service

A sync command may also be within a benchmark script. This may be needed when several systems need to execute the same test, and each iteration in the test must start at the same time for all systems. A benchmark script (in /opt/pbench-agent/benchmark-scripts) may have something like:

for iteration in `seq 1 10`; do
    sync $benchmark-$iteration $systems_running_benchmark
    start-tools
    benchmark-command $benchmark-options
    stop-tools
done

Note that the $systems_running_benchmark above may be the same list that was used in a restrict command in the user's job file to call the benchmark. The benchmark would also have to process an option which provided this list.
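
As a rough sketch of the idea, a barrier could be built over a shared directory; a real implementation would need proper networking as noted above, and the path and hostname handling here are assumptions:

#!/bin/bash
# sync: crude barrier sketch -- every participant drops a marker file into a
# shared directory and waits until all listed hosts have done the same.
# Usage: sync <label> <host1> <host2> ...
label="$1"; shift
barrier="/var/lib/pbench-agent/sync/${label}"   # assumed to be shared storage
mkdir -p "$barrier"
touch "${barrier}/$(hostname -s)"
for host in "$@"; do
    until [ -e "${barrier}/${host}" ]; do
        sleep 1
    done
done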

Tools

mpstat

For post-processing, group cpu graphs according to the system topology, with an average graph for system, then a section for each node containing an average graph for nodeX, then individual cpuid graphs with sibling hyperthread cpus paired together:

  [system average graph]

  [node0 average graph]
  [coreid0-threadid0][coredid0-threadid1]
  [coreid1-threadid0][coredid1-threadid1]
  [coreid2-threadid0][coredid2-threadid1]
  [coreid3-threadid0][coredid3-threadid1]

  [node1 average graph]
  [coreid0-threadid0][coredid0-threadid1]
  [coreid1-threadid0][coredid1-threadid1]
  [coreid2-threadid0][coredid2-threadid1]
  [coreid3-threadid0][coredid3-threadid1]

Benchmarks

uperf

Before running any tests, run a very quick test (any type really) to make sure the client can contact the server. If this fails, abort the whole thing.

perf-report-consolidated.txt not generated

When the perf tool does post-processing, it creates a consolidated report in which all of the samples for a unique function, but from different PIDs, are summed. This is currently broken.

Vastly simplify the process to add a new benchmark to pbench

This documents development of a new feature for pbench. We need a way to integrate new benchmarks that is far easier than the current process.

Today we write a new benchmark script for each benchmark. A pbench benchmark script has to do the following:

  1. Process cmd line options for benchmark like runtime, test-types, etc. Some of these are common across benchmarks and some are benchmark specific.
  2. Installation of benchmark binary, on local and/or remote systems.
  3. Create a list of benchmark iterations to run. Each iteration describes the execution of the benchmark with specific options. This is typically based on $test_types, and other benchmark specific options like $message_sizes (uperf) $block_sizes (fio), $threads (dbench). The list of iterations to run is usually a matrix of these options.
  4. Collect system information before benchmark execution.
  5. Start any server process the benchmark may use (can be more than one if --clients has more than one system)
  6. Run the N benchmark iterations sequentially. A benchmark may run just one instance of the benchmark iteration, or many instances, depending on the options for --clients and --servers (some benchmarks use both, others may use only --clients). The benchmark script runs these instances sequentially:
    6A) Start any server process the benchmark may use. Some benchmarks do not use this. Those that do may involve many servers.
    6B) Create a benchmark config/job file if needed (xml for uperf, job file to fio)
    6C) Copy config/job file to client systems (may be only local system or many systems)
    6D) Call start-tools
    6E) Synchronized start of all clients, and wait until complete
    6F) Call stop-tools
    6G) Stop the server process(es)
    6H) Collect benchmark result data from all clients and servers
    6I) Post-process result data (may involve generating new metrics, like throughput/resource). This is usually done by calling a benchmark-specific post-processing script.
    6J) Post-process tool data
  7. Generate a summary of all benchmark iterations in txt, csv, and html formats
  8. Collect system information after benchmark execution

Writing a new benchmark script can require up to 1000 lines, and much of this is just duplicated from previous benchmark scripts. Having many copies of similar code ends up being inefficient to maintain. When adding a benchmark to pbench, we need to find a way to provide only the data needed specific to the new benchmark. Pbench needs to process this data and run the benchmark. There should only be one script needed in pbench that can use this benchmark specific data and run the benchmark.

Perhaps not every single benchmark can be done this way, but we should try to get the vast majority of them.

fio: server bad crc on payload

While running pbench_fio on KVM VMs concurrently, I'm noticing this especially with:
jobs=32
iodepth=1

fio: server bad crc on payload (got 0, wanted 6b2a)
fio: server bad crc on payload (got 0, wanted e4e2)
fio: fragment opcode mismatch (6 != 9)
fio: fragment opcode mismatch (6 != 9)
fio: fragment opcode mismatch (6 != 9)

fio job complete

Generating benchmark summary JSON data for indexing

From Andrew Theurer:

Guys, if you have a chance, take a look at the attached files[1]:

These are generated with a new benchmark-summary script, which all benchmarks will eventually use. Currently I have uperf using this in a git branch of mine.

What's new here are the summary-result.* files, including the JSON format. The HTML format also now uses an HTML table.

I think the JSON format should work for Elasticsearch. It was based on our conversation way back, with some minor tweaks.

[1]archive.zip

Include turbostat in our build

Add turbostat in our build process, so we can control exactly which turbostat is used. This will work in the same manner that we use sysstat utils (sar, mpstat, iostat, pidstat).

Also always include --debug in the invocation of turbostat

@jeremyeder please let me know if this works for you. We'll try to get this in ASAP

iostat and perf prematurely die when running inside a pod on OpenShift v3

Running pbench_fio inside a pod on OSE v3, once the job ends it does not collect the iostat/perf data.
The error messages are shown below:

The following jobfile was created: /var/lib/pbench-agent/fio_12E_2015-11-05_09:52:44/2-read-64KiB/sample2/fio.job
[global]
bs=64k
ioengine=libaio
iodepth=32
direct=1
sync=0
time_based=1
runtime=30
clocksource=gettimeofday
ramp_time=5
[job1]
rw=read
filename=/var/lib/docker/fiotest/fiotest
size=4096M
write_bw_log=fio
write_iops_log=fio
write_lat_log=fio
log_avg_msec=1000
running fio job: /var/lib/pbench-agent/fio_12E_2015-11-05_09:52:44/2-read-64KiB/sample2/fio.job
/opt/pbench-agent/tool-scripts/iostat: line 168: kill: (1456) - No such process
/opt/pbench-agent/tool-scripts/perf: line 139: kill: (1705) - No such process
fio job complete
The following jobfile was created: /var/lib/pbench-agent/fio_12E_2015-11-05_09:52:44/2-read-64KiB/sample3/fio.job

Name pbench_fio summary files based on the environment where they ran

It is necessary to have the summary files (summary-results.txt/csv) and the operations written inside them named differently based on the environment where they ran. For example:

Now:
summary-results.txt
and inside there are
1-read-4KiB
....
18-randrw-1024KiB

Proposed:
summary-results_baremetal.txt
1-read-4KiB_baremetal
18-randrw-1024KiB_baremetal

This will help to search/grep results based on where they were run, for the case when there are multiple files/runs.

Currently, summary files get the same name for every test case.

We need some efficiency metrics for fio tests

Often using a result metric based on throughput is not enough to compare one test to another, because the bottleneck may come from a hard limit like network link speed or drive speed. Having an alternative metric that shows efficiency helps us better understand the result. For example, IOPS/cpu, or Mbps/CPU. We need to add some metrics like this for fio.
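
For example, a throughput-per-CPU number could be derived from the fio result and the CPU utilization collected by the tools; the values below are purely hypothetical:

# Illustrative only: 50000 IOPS at an average of 4 CPUs consumed
$ awk 'BEGIN { printf "%.1f IOPS/CPU\n", 50000 / 4 }'
12500.0 IOPS/CPU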

Run pbench as other than the root user

While working on getting pbench to run as a user which doesn't exist on other remote nodes, I was able to get a tool to register; however, pbench now displays an additional host named "root".

Ideally, I could set what user to run the tool under.

[stack@manager ~]$ register-tool --name=mpstat [email protected]
[[email protected]]Package pbench-sysstat-11.1.2-32.el7.centos.x86_64 already installed and latest version
[[email protected]]mpstat tool is now registered in group default
[stack@manager ~]$ list-tools
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
ssh: Could not resolve hostname root: Name or service not known
default: root[],192.0.2.12[],192.0.2.11[]

Yum failure prevented pbench from registering a tool

This didn't cause any casualties until one of my machines added a repo while automation was running, and some of my pbench output only has the vmstat tool rather than all of the tools I was expecting.

This is what prevented the tools (in this case mpstat) from registering:

# register-tool --name=mpstat -- --interval=1
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/sjis/os/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.


 One of the configured repositories failed (Red Hat Enterprise Linux for S-JIS (RHEL 7 Server) (RPMs)),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable rhel-sjis-for-rhel-7-server-rpms

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=rhel-sjis-for-rhel-7-server-rpms.skip_if_unavailable=true

failure: repodata/repomd.xml from rhel-sjis-for-rhel-7-server-rpms: [Errno 256] No more mirrors to try.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/sjis/os/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
For some reason this tool could not be installed

To fix this I had to remove the broken repo (subscription-manager repos --disable=$bad_repo). Unfortunately it is unclear why that repo decided to show up on one of my machines over the weekend. All I could find related to that is this BZ comment: https://bugzilla.redhat.com/show_bug.cgi?id=1194899#c7

Prior to this failure, I had successful runs of pbench on the same machine with the other tools (mpstat and others). I am certain the tool was already installed, and the yum failure should not have prevented pbench from registering that particular tool.

Documentation feedback from first external reviewer

From Doug Williams ([email protected]):

Here's some initial comments and suggested edits to pbench-agent.html.

General Comments:

The documentation flow seems to mix basic, intermediate and advanced topics. It may make sense to segregate these topics by complexity. An example flow:

  • Running a pre-packaged benchmark on a single node
  • What data is collected, and selection of tools (default vs optional)
  • Running benchmark on multiple nodes
  • Running user-specified benchmark
  • Adding new benchmark to pbench
  • Adding new tool to pbench

Some of the text is in an informal conversational style using personal pronouns, 'You', 'I'. I've included some proposed edits if you decide to go with a more formal 3rd person style. I view this as a secondary issue.

Section 1: WARNING

DDW COMMENT: Use of 'some' may be unnecessary, may want to consider the following ...

This document may describe future capabilities of pbench.
Currently both code and documentation are undergoing
active development, and while we strive for consistency,
if you find something not working as described here, please
let us know.  It may be a bug in the documentation, a bug in
the code or a feature not yet implemented.

Section 2: What is pbench?

DDW COMMENT: Text seemed a bit confusing, may want to consider the following ...

... Pbench includes built-in scripts supporting many common
benchmarks such as cyclictest, dbench, fio, linpack, migrate,
iozone, netperf, specjbb2005 and uperf.  Options for use of
Pbench with other (non-built-in) benchmarks include:
  o Running pbench in collector-only mode, separately running
     the benchmark
  o Extending pbench through development of a benchmark-specific
     pbench script.
Such contributions are more than welcome!

DDW QUESTION: How does one extend Pbench for additional data collectors?

Section 3: Quick links

DDW COMMENT: Both URLS are currently invalid

- Results Directory - http://pbench.example.com/results/
- pbench RPM repo - http://pbench.example.com/repo

Section 4: TL;DR version

DDW COMMENT: Need to update URL to reflect repo location. Will you be using GitHub as a repo?

wget -O /etc/yum.repos.d/pbench.repo http://repohost.example.com/repo/yum.repos.d/pbench.repo

Section 5: How to install

DDW COMMENT: Similar issue concerning repo URL

http://repohost.example.com/repo/yum.repos.d

and

wget -O /etc/yum.repos.d/pbench.repo http://repohost.example.com/repo/yum.repos.d/pbench.repo

Section 5.1: Updating pbench

DDW COMMENT: Nit, section is written with a conversational style, such as 'I' and 'you'.

Consider:

Since the pbench package and associated benchmark and tools RPMS are
updated frequently, it may be necessary to clean the yum cache in order for
yum to see any new versions.  If during update yum reports no packages to
update, try again after cleaning the cache:

<<Command Sequence>>

It may be necessary to log out and re-login after changes. If the above update
encounters problems, try the following workaround:

<<Command Sequence>>

.....

The workaround should not be necessary if currently installed release is
0.31-95 or later.

.....

When upgrading to a release later than -102, due to changes in label
handling it is necessary to clear out and re-register tools post upgrade.
For example:

<<Command Sequence>>

Section 6: First Steps

DDW COMMENT: Another first-person to 3rd person change ...

...
Built-in benchmarks can be run by invoking the associated pbench_XXX
script
  - pbench will install the benchmark if necessary:
...

Section 6.1: First Steps with user-benchmark

DDW COMMENT: should Section 2 make reference to 6.1 (user-benchmark) in context of 'but the data collection can be run separately as well with a benchmark that is not built-in to pbench ....'

DDW COMMENT: Stylistic change to remove first person

Consider:

A user-benchmark script can be used to run other benchmarks in addition
to the benchmarks pre-packaged with pbench.  user-benchmark takes a
command as argument ...

Section 6.2: First Steps with Remote Hosts and user-benchmark

DDW COMMENT: This section appears to be the first treatment of multi-host benchmarks. My recommendation is that you show an example of a multi-host packaged benchmark, then show an example of a multi-host user-benchmark.

Section 8: Available tools

DDW COMMENT: Stylistic nit, consider the following wording

register-tool-set configures the following tools by default:

DDW COMMENT: Error in last command sequence

Current:

unregister --name=perf
register-tool --name=perf -- --record-opts="record -a --freq=200"

Should read:

unregister-tool --name=perf
register-tool --name=perf -- --record-opts="record -a --freq=200"

Section 9: Available Benchmark Scripts

DDW COMMENT: Which of these benchmarks support multi-host operation?

DDW COMMENT: Stylistic nit

Consider:

Note that in many of these scripts the default tool group is hard-wired:

edits to the appropriate script may be required when using a different tool group

Section 10: Utility Scripts

DDW COMMENT: Stylistic nit

Consider:

This section provides background for the Second steps section below.

Pbench uses utility scripts to do common operations.  Many of the
utility scripts support the following options:
  --name to specify a tool
  --group to specify a tool group
  --with-options to list or pass options to a tool
  --remote to operate on a remote host

See entries in the FAQ section below for more details on these options.

DDW COMMENT: Consider headings 'Tool Registration related Utility Scripts', 'Tool Control related Utility Scripts', 'Results and Post Processing related Utility Scripts', and 'Miscellaneous Utility Scripts'

Section 11: Second Steps

DDW COMMENT: The warning is a bit confusing. If you're recommending against user-benchmarks, then move this content later in the docs. If it's something else, such as ad-hoc scripts, then I would advise moving the treatment of user-created benchmark scripts into an advanced section.

Section 12: Running Pbench Collection Tools with an Arbitrary Benchmark

DDW COMMENT: Should the warning in Section 11 be moved here?
