spirals-team / benchmark-containers
A repository of benchmarks packaged as Docker containers
Home Page: https://hub.docker.com/r/spirals
License: Apache License 2.0
Instead of dealing with benchmark-specific command-line interfaces (CLIs), it would be useful to define a standard CLI for running benchmark containers.
Such a CLI should include:
This CLI should be exposed by all the available containers.
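For instance, a minimal sketch of what such a shared entry point could look like, written here as a shell function (the sub-commands and benchmark names are illustrative assumptions, not an existing interface of these containers):

```shell
# Hypothetical standard CLI for benchmark containers; sub-commands,
# flags, and benchmark names are placeholders for illustration.
bench_cli() {
  case "$1" in
    list)                       # enumerate the benchmarks the image ships
      echo "canneal dedup ferret" ;;
    run)                        # run one benchmark with a chosen input size
      echo "running $2 with input ${3:-native}" ;;
    *)
      echo "usage: bench_cli {list|run <benchmark> [input]}" ;;
  esac
}
```

Exposing the same function (or script) in every image would let users script against one interface regardless of the underlying suite.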
The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks are derived from computational fluid dynamics (CFD) applications and consist of five kernels and three pseudo-applications in the original "pencil-and-paper" specification (NPB 1). The benchmark suite has been extended to include new benchmarks for unstructured adaptive mesh, parallel I/O, multi-zone applications, and computational grids. Problem sizes in NPB are predefined and indicated as different classes. Reference implementations of NPB are available in commonly-used programming models like MPI and OpenMP (NPB 2 and NPB 3).
The original eight benchmarks specified in NPB 1 mimic the computation and data movement in CFD applications:
Multi-zone versions of NPB (NPB-MZ) are designed to exploit multiple levels of parallelism in applications and to test the effectiveness of multi-level and hybrid parallelization paradigms and tools. There are three types of benchmark problems derived from single-zone pseudo applications of NPB:
Benchmarks for unstructured computation, parallel I/O, and data movement:
GridNPB is designed specifically to rate the performance of computational grids. Each of the four benchmarks in the set consists of a collection of communicating tasks derived from the NPB. They symbolize distributed applications typically run on grids.
Vendors and others implement the detailed specifications in the NPB 1 report, using algorithms and programming models appropriate to their different machines. NPB 1 implementations are generally proprietary and are not distributed by NAS.
A set of reference implementations of the NPB specifications has been written and distributed by NAS as NPB 2 and NPB 3. These source-code implementations are intended to be run with little or no tuning, and approximate the performance a typical user can expect to obtain for a portable parallel program. NPB 2 contains MPI-based source code implementations of the original eight benchmarks, and NPB 3 has included new benchmarks and problem classes as well as implementations using other programming models. The latest release is NPB 3.3.1.
Several optimisations can be applied to reduce the size of the built images.
Docker commits each RUN command as a new layer (which can consume a non-negligible amount of disk space). It is thus important to clean up the image (where possible) before the RUN command completes.
One example is to delete all intermediate compilation files right after compilation (and not in a separate RUN, as done currently).
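As a minimal sketch of that idea, the chain below shows the shape of a command that a single RUN instruction could execute; the paths and the "compilation" step are illustrative stand-ins, not the actual build commands of any benchmark here. Because the intermediate files are removed before the command finishes, they never end up in the committed layer.

```shell
# Placeholder build-and-clean chain for a single Dockerfile RUN instruction.
mkdir -p /tmp/bench/obj &&
touch /tmp/bench/obj/kernel.o &&                         # stand-in for compiling the sources
cp /tmp/bench/obj/kernel.o /tmp/bench/benchmark.bin &&   # keep only the final artifact
rm -rf /tmp/bench/obj                                    # delete intermediates in the SAME command
```

Splitting the cleanup into a separate RUN would not help: the earlier layer containing the intermediate files would still be part of the image.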
You can inspect each layer of an image with the command docker history [tag/id].
A guide to the best practices for building Docker images is available here.
The current virtual size of PARSEC-3.0 is 25 GB and should be considerably reduced.
When running the raytrace benchmark with the command ./run -a run -p raytrace -i native -n 4, the output below is produced. It clearly shows that the native input and the file thai_statue.obj cannot be found, even though the file is present in the filesystem. How do I run it correctly?
[PARSEC] Benchmarks to run: parsec.raytrace
[PARSEC] [========== Running benchmark parsec.raytrace [1] ==========]
[PARSEC] Deleting old run directory.
[PARSEC] Setting up run directory.
[PARSEC] No archive for input 'native' available, skipping input setup.
[PARSEC] Running 'time /home/parsec-3.0/pkgs/apps/raytrace/inst/amd64-linux.gcc/bin/rtview thai_statue.obj -automove -nthreads 4 -frames 200 -res 1920 1080':
[PARSEC] [---------- Beginning of output ----------]
this =0xb419c0
PARSEC Benchmark Suite Version 3.0-beta-20150206
initializing LRT ray tracer ...
File thai_statue.obj does not exist.
default BVH builder : binnedalldimssavespace
File thai_statue.obj does not exist.
Options:
Using memory framebuffer...
num nodes in scene graph 1
adding 8 vertices
adding 12 triangles
finalizing geometry
vertices 8 (0.09375 KB)
triangles 12 (0.140625 KB)
texture coordinates 0 (0 KB)
No materials -> create dummy material
building index
using BVH builder default
build time 8e-06
done
sceneAABB =[[[-1,-1,-1,0] ],[[1,1,1,0] ]]
Rendering 1 frames...
Done
real 0m0.068s
user 0m0.055s
sys 0m0.004s
[PARSEC] [---------- End of output ----------]
[PARSEC]
[PARSEC] BIBLIOGRAPHY
[PARSEC]
[PARSEC] [1] Bienia. Benchmarking Modern Multiprocessors. Ph.D. Thesis, 2011.
[PARSEC]
[PARSEC] Done.
NU-MineBench is a data mining benchmark suite containing a mix of several representative data mining applications from different application domains. This benchmark is intended for use in computer architecture research, systems research, performance evaluation, and high-performance computing. The well-known applications assembled in this benchmark suite have been collected from research groups in industry and academia. The applications contain highly optimized versions of the data mining algorithms. Scalable versions of the applications are also provided; these extensions were designed and implemented by developers at Northwestern University. Currently, the benchmark has applications with algorithms based on clustering, association rules, classification, Bayesian networks, pattern recognition, support vector machines, and several other well-known data mining methodologies. These applications are used in diverse fields such as bioinformatics, network intrusion detection, customer relationship management, and marketing.
If you would like to contribute any well-known and stable application to our benchmark suite, please do not hesitate to contact us.
The benchmarks parsec.netdedup, parsec.netferret, and parsec.netstreamcluster are not correctly built.
The same error is described and fixed here.
You could consider fixing it automatically by applying a patch.
Media applications are important for general-purpose processors, but are becoming increasingly complex with high performance demands. Future processors can potentially meet these demands by exploiting various levels of parallelism in these applications.
ALPBench consists of a set of parallelized complex media applications gathered from various sources, and modified to expose thread-level and data-level parallelism. The applications are:
Multithreaded versions can be used on any system that supports POSIX threads. SSE2 instructions can be used on Intel processors that support them.
The STREAM benchmark is a simple, synthetic benchmark designed to measure sustainable memory bandwidth (in MB/s) and a corresponding computation rate for four simple vector kernels (Copy, Scale, Add, and Triad).
IOzone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. IOzone has been ported to many machines and runs under many operating systems.
IOzone is useful for performing a broad filesystem analysis of a vendor's computer platform. The benchmark tests file I/O performance for the following operations: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, aio_write.
Currently, the Docker image for the PARSEC benchmarks is minified and the sources are removed before the final commit. It is thus impossible to manually re-compile some benchmarks.
For this reason, it would be interesting to split this image in two. The first image would install all necessary packages, download PARSEC 3.0, apply all patches, create the missing links for inputs, and compress all available inputs.
The second image would be based on the first one; it would build all benchmarks (builds cleaned + sources removed) and make it easy to launch a benchmark with the extended run.sh script.
Bonnie++ is a program to test hard drives and file systems for performance, or the lack thereof. There are many different types of file system operations, which different applications use to different degrees. Bonnie++ tests some of them, and for each test gives a result of the amount of work done per second and the percentage of CPU time this took. For performance results, higher numbers are better; for CPU usage, lower is better (NB: a configuration scoring a performance result of 2000 and a CPU result of 90% is better in terms of CPU use than a configuration delivering performance of 1000 and CPU usage of 60%).
There are two sections to the program's operations. The first is to test the IO throughput in a fashion that is designed to simulate some types of database applications. The second is to test creation, reading, and deleting many small files in a fashion similar to the usage patterns of programs such as Squid or INN.
You can save further disk space by compressing the input workloads that are shipped with PARSEC.
This requires the following updates:
find . -name \*.tar -exec xz -9 \{\} \;
to compress all the input workloads (this should save about 4 GB). run.sh should be extended to include a use primitive that unzips a given workload, whose value can only be benchmark|all:test|native|simdev|simlarge|simmedium|simsmall. If the workload is not unzipped a priori, the command run.sh will do it implicitly.

I am trying to run the spirals/parsec-3.0 image as a pod on a Kubernetes cluster, referring to https://gist.github.com/balajismaniam/fac7923f6ee44f1f36969c29354e3902
In my case, I am trying with a manifest like this:
apiVersion: v1
kind: Pod
metadata:
  name: parsec-pod
spec:
  containers:
  - image: spirals/parsec-3.0
    command: ["/bin/bash"]
    args: ["parsecmgmt -r run -p canneal -i simsmall"]
    name: parsec-ctn
  restartPolicy: "Never"
However, when I run get pod, I get:
NAME READY STATUS RESTARTS AGE
parsec-pod 0/1 ContainerCreating 0 6s
parsec-pod 0/1 Error 0 8s
parsec-pod 0/1 Error 0 10s
Here are the event logs,
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m28s default-scheduler Successfully assigned default/parsec-pod to ocp410-24nts-worker-rbv46
Normal AddedInterface 5m29s multus Add eth0 [10.128.2.251/23] from openshift-sdn
Normal Pulling 5m29s kubelet Pulling image "spirals/parsec-3.0:latest"
Normal Pulled 5m23s kubelet Successfully pulled image "spirals/parsec-3.0:latest" in 5.930295127s
Normal Created 5m23s kubelet Created container parsec-ctn
Normal Started 5m23s kubelet Started container parsec-ctn
@mcolmant @gfieni @rouvoy, could you please point out what I am doing wrong here? Any suggestion would be greatly appreciated.
Mosbench is a set of application benchmarks designed to measure scalability of operating systems. It consists of applications that previous work has shown not to scale well on Linux and applications that are designed for parallel execution and are kernel intensive. The applications and workloads are chosen to stress important parts of many kernel components.
Mosbench includes Exim, a mail server; Memcached, an object cache; Apache, a web server; PostgreSQL, a SQL database; gmake, a parallel build system; psearchy, a parallel text indexer; and Metis, a multicore MapReduce library.
The Phoronix Test Suite is the most comprehensive testing and benchmarking platform available that provides an extensible framework for which new tests can be easily added. The software is designed to effectively carry out both qualitative and quantitative benchmarks in a clean, reproducible, and easy-to-use manner. The Phoronix Test Suite can be used for simply comparing your computer's performance with your friends and colleagues or can be used within your organization for internal quality assurance purposes, hardware validation, and continuous integration / performance management.
Filebench is a file system and storage benchmark that can generate a large variety of workloads. Unlike typical benchmarks, it is very flexible and allows specifying an application's behaviour in minute detail using its extensive Workload Model Language (WML). Filebench uses loadable workload personalities to allow easy emulation of complex applications (e.g., mail, web, file, and database servers). Filebench is quick to set up and easy to use compared to deploying real applications. It is also a handy tool for micro-benchmarking.
Filebench includes many features that facilitate file system benchmarking.
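As an illustration of how a WML workload describes behaviour, a minimal read workload might look roughly like the sketch below. The fileset name, sizes, and attribute spellings are from memory and may differ between Filebench versions, so treat this as an approximation rather than a copy-paste recipe:

```
define fileset name=testfiles,path=/tmp,entries=100,filesize=4k,prealloc
define process name=reader,instances=1
{
  thread name=readerthread,instances=1
  {
    flowop openfile name=open1,filesetname=testfiles,fd=1
    flowop readwholefile name=read1,fd=1
    flowop closefile name=close1,fd=1
  }
}
run 10
```

The declarative style is the point: filesets describe the on-disk state, processes and threads describe concurrency, and flowops describe the per-thread I/O sequence.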