voltrondata-labs / benchmarks
Language-independent Continuous Benchmarking (CB) for Apache Arrow
License: MIT License
As {arrowbench} is now capable of putting case_version in tags, {benchmarks} needs to be able to pass it through to Conbench, but currently the only thing we're reading from its result JSON is the real time. This story can include using as much of the JSON as practically makes sense without breaking histories.
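A minimal sketch of what the pass-through might look like; the field names here ("tags", "case_version") are assumptions about the {arrowbench} result JSON, not a confirmed schema:

import json

def extract_case_version(result_path):
    # Hypothetical: read the {arrowbench} result JSON and pull out the
    # case_version tag (if present) so it can be forwarded to Conbench.
    with open(result_path) as f:
        result = json.load(f)
    return result.get("tags", {}).get("case_version")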
Right now, we're catching catastrophic benchmark failures, but not posting them to Conbench: #143 (comment)
After voltrondata-labs/arrow-benchmarks-ci#146 is resolved (likely by eddelbuettel/digest#189 resulting in a patch), we should turn on posting here: https://github.com/voltrondata-labs/benchmarks/blob/main/benchmarks/_benchmark.py#L252-L282 (Doing so before will result in a lot of messages on PRs that there are errored benchmarks, but the cause of the error is out of the committer's scope.)
{arrowbench} has a read_csv benchmark that would be nice to have here.
These are the arguments (the defaults from {arrowbench} are ~what we want to run, though I'm happy to adjust them if we decide that only a subset of sources should be default):
- the source argument
- uncompressed and gzip compressed files as the compression argument
- arrow_table and data_frame as the output argument
- the reader argument should be arrow (the other readers it knows how to test are not important for us and should not be run on conbench)

Right now the CSV reading benchmark is reading a gzip file, which is actually something of a worst-case scenario (for hot-in-cache data) since the decompression becomes a bottleneck.
Also, the benchmark only tests the CSV file reader and not the streaming CSV reader which is used by the datasets API.
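For reference, the distinction between the two readers in pyarrow (the path is a placeholder):

import pyarrow.csv as csv

# File reader: parses the whole file into a Table in one call.
table = csv.read_csv("data.csv")

# Streaming reader: yields record batches incrementally; this is the code
# path the datasets API uses and is not covered by the current benchmark.
reader = csv.open_csv("data.csv")
for batch in reader:
    pass  # consume batches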
Add capability to add a case_version tag depending on parameter values, as in voltrondata-labs/arrowbench#105. This only includes versioning Python benchmarks, not R ones, whose versions will be read and passed through from their JSON in a different story.
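A hypothetical sketch of one way this could look (the mapping and function are illustrative, not how #105 implements it): keep a per-benchmark map from case parameters to a version and merge it into tags.

# Illustrative only: bump the version for a case when its workload changes,
# so histories for that case are not compared across the change.
CASE_VERSIONS = {
    ("parquet", "table"): 2,
}

def case_version_tag(params):
    # params is the tuple of case parameter values for one permutation.
    version = CASE_VERSIONS.get(tuple(params))
    return {"case_version": version} if version is not None else {}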
TPC-H query 21 at scale factor 10 has been regularly failing on machines with less than 64 GB of memory, e.g. https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-t4g-linux-compute/builds/2829#0188b088-42b8-4218-ae4a-93dd0058ce8c

As a solution, remove the scale_factor=10 permutations in get_valid_cases() when there is insufficient memory:
benchmarks/benchmarks/tpch_benchmark.py
Lines 6 to 12 in 606f4fc
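A sketch of the filtering idea (psutil and the exact threshold are assumptions here; the real get_valid_cases() builds the case list itself, this only shows the filter):

import psutil

MIN_BYTES_FOR_SF10 = 64 * 1024**3  # ~64 GB

def drop_large_cases(cases):
    # Drop scale_factor=10 permutations on machines without enough memory.
    if psutil.virtual_memory().total >= MIN_BYTES_FOR_SF10:
        return cases
    return [case for case in cases if case.get("scale_factor") != 10]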
Currently arrowbench is passing R errors through in its result, but they're getting ignored by the old error handling here. We should instead pass them through properly so they show up in Conbench.
Now that voltrondata-labs/arrowbench#33 is merged, could we add the TPC-H benchmarks?
The benchmark name is tpc_h, and it accepts the following parameters; the default values in {arrowbench} are what we would like to call:
{arrowbench} automagically generates the TPC-H test data if it doesn't exist (and reuses it if it does), so we should not need to do anything to put the data anywhere.
When I run all of these permutations on my computer (with maximum cores available, 3 iterations), they take a total of 2.9 minutes, so they shouldn't add a huge amount of time to our benchmark runs, though if we want to cut that down, we could remove one of the formats (either parquet or feather; they are pretty similar).
We will be adding queries as we have the ability to run them, which I think will need PRs to benchmarks, but once we have the structure up it should be easy to add them in (they will each be added to the defaults in {arrowbench} as well).
I have seen something in conbench.ursa.dev that I would love to use as an example scenario: do we have a performance regression, or do we maybe have a methodological weakness?
https://conbench.ursa.dev/benchmarks/4fe411bf67a94bc6aa9787fc0394bd03/
That is, around 2023-01-05 07:49 it was measured with apache/arrow@e5ec942 that the benchmark dataset-selectivity with case permutation 10%, nyctaxi_multi_parquet_s3 took almost two seconds in each of three iterations: [1.951955, 1.846497, 1.891674].

In the previous 1-2 weeks it took ~1.2 seconds.
We should test compression in CSV writing in R once we easily can. Specifically, once this {arrowbench} issue (which in turn depends on this Arrow ticket) is addressed, the compression parameter can be added to CsvWriterBenchmark.
Traceback (most recent call last):
File "/var/lib/buildkite-agent/miniconda3/envs/arrow-commit/bin/conbench", line 5, in <module>
from conbenchlegacy.cli import conbench
File "/var/lib/buildkite-agent/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/conbenchlegacy/cli.py", line 87, in <module>
instance = benchmark()
File "/var/lib/buildkite-agent/builds/ip-172-31-43-254-1/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/benchmarks/benchmarks/cpp_micro_benchmarks.py", line 107, in __init__
os.environ["CONBENCH_PROJECT_PR_NUMBER"] = self.github_info["pr_number"]
File "/var/lib/buildkite-agent/miniconda3/envs/arrow-commit/lib/python3.8/os.py", line 680, in __setitem__
value = self.encodevalue(value)
File "/var/lib/buildkite-agent/miniconda3/envs/arrow-commit/lib/python3.8/os.py", line 750, in encode
raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not int
Offending code:
benchmarks/benchmarks/cpp_micro_benchmarks.py
Line 107 in 079a920
This only happens during PR benchmarks (i.e. the "ursabot please benchmark" workflow).
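A minimal sketch of a fix for the assignment at that line (assuming github_info is a plain dict and pr_number arrives as an int during PR runs): cast the value to str before putting it in the environment, since os.environ only accepts strings.

# Sketch only; how a missing pr_number should be handled is left open.
pr_number = self.github_info.get("pr_number")
if pr_number is not None:
    os.environ["CONBENCH_PROJECT_PR_NUMBER"] = str(pr_number)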
Per #110 (comment), ideally we want to test core functionality (mostly in _benchmark.py) directly, instead of just via implemented and example benchmarks. Some non-exclusive options:
I wrote a doc about what is required to run the R benchmarks via BenchmarkR with arrowbench::run_benchmark() (which runs for each case) instead of arrowbench::run_one() (which runs for a single case). A major part of this is making sure the right data and metadata from each run flows around correctly such that eventually it can be POSTed to conbench, so the doc devotes a lot of time to benchmark result (at the case level) schemas.
The implication is probably moving to a more standardized and unified form of benchmark result schema across the different levels and languages, so please add comments with opinions on what that might look like.
In #81 we added tests for the Arrow TPC-H benchmarks. As part of that process {arrowbench} will (re)build duckdb ensuring that it has the tpch extension.
If we do the two following things, that testing time should come down to much closer to what it was before:
If we provided data files like data/customer_1.parquet (we could make a very, very small dataset and put it in this place, so long as the column names are the same), the data generation process will be short-circuited.
The tpch extension is also used at the verification stage. I can provide an option to turn off verification in {arrowbench} for testing purposes so that that does not trigger a duckdb re-build.
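For the first of these, a sketch of generating a tiny stand-in file with pyarrow (the column names follow the standard TPC-H customer schema; the exact names and path {arrowbench} expects should be confirmed first):

import pyarrow as pa
import pyarrow.parquet as pq

# One-row placeholder so tests skip TPC-H data generation.
table = pa.table({
    "c_custkey": [1],
    "c_name": ["Customer#000000001"],
    "c_address": ["placeholder"],
    "c_nationkey": [0],
    "c_phone": ["000-000-0000"],
    "c_acctbal": [0.0],
    "c_mktsegment": ["BUILDING"],
    "c_comment": ["placeholder row"],
})
pq.write_table(table, "data/customer_1.parquet")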
It seems the wide-dataframe benchmark expects a benchmarks/data/temp directory that doesn't exist.
$ conbench wide-dataframe
[...]
Traceback (most recent call last):
[...]
File "/home/antoine/arrow/benchmarks/benchmarks/wide_dataframe_benchmark.py", line 29, in run
self._create_if_not_exists(path)
File "/home/antoine/arrow/benchmarks/benchmarks/wide_dataframe_benchmark.py", line 46, in _create_if_not_exists
parquet.write_table(table, path)
File "/home/antoine/arrow/dev/python/pyarrow/parquet/core.py", line 3103, in write_table
with ParquetWriter(
File "/home/antoine/arrow/dev/python/pyarrow/parquet/core.py", line 1010, in __init__
sink = self.file_handle = filesystem.open_output_stream(
File "pyarrow/_fs.pyx", line 868, in pyarrow._fs.FileSystem.open_output_stream
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 113, in pyarrow.lib.check_status
FileNotFoundError: [Errno 2] Failed to open local file '/home/antoine/arrow/benchmarks/benchmarks/data/temp/wide.parquet'. Detail: [errno 2] No such file or directory
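A likely fix (sketch, assuming the benchmark keeps writing to benchmarks/data/temp/): create the directory before writing the file.

import pathlib

# Ensure the temp data directory exists before parquet.write_table() is called.
path = pathlib.Path("benchmarks/data/temp/wide.parquet")
path.parent.mkdir(parents=True, exist_ok=True)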
This will demonstrate the benefit of the async feature. For the moment this probably only makes sense to run on EC2/S3.
Parallel to #62, which is to implement a CSV writer benchmark in Python. An R CSV writer benchmark has already been written in {arrowbench}, so this is just to add it here so it gets run, similar to #37 for reading. Settings should probably be parallel to #37.
Now that there is a CSV writer we should benchmark it:
# munge_compression() is a repo-local helper that normalizes the compression
# parameter for the given format before it is handed to pyarrow.
compression = munge_compression(compression, "csv")
out_stream = pyarrow.output_stream(path, compression=compression)
pyarrow.csv.write_csv(table, out_stream)
We would love to run continuous benchmarks for the Arrow JS library. We already have a benchmark setup with benchmark.js at https://github.com/apache/arrow/blob/master/js/perf/index.js. It would be awesome if there was a native integration into Conbench for results from JavaScript benchmarks.
Could you help us set up the benchmarks for JavaScript?
Arrow ticket for Conbench integration: https://issues.apache.org/jira/browse/ARROW-12690
Example build: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/builds/2347#018890c5-5406-469b-a411-3495092d1fe5
[230606-13:36:49.995] [22163] [benchclients.conbench] INFO: try to perform login
[230606-13:36:49.995] [22163] [benchclients.http] INFO: try: POST to https://conbench.ursa.dev/api/login/
[230606-13:36:50.132] [22163] [benchclients.http] INFO: POST request to https://conbench.ursa.dev/api/login/: took 0.1362 s, response status code: 204
[230606-13:36:50.132] [22163] [benchclients.conbench] INFO: ConbenchClient: initialized
[230606-13:36:50.132] [22163] [benchclients.http] DEBUG: POST request JSON body:
{
"run_id": "fc27335fd7364cd0816346a148bee7f4",
"batch_id": "fc27335fd7364cd0816346a148bee7f4-1n",
"timestamp": "2023-06-06T13:36:49.725925+00:00",
"context": {
"arrow_compiler_flags": "-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /var/lib/buildkite-agent/.conda/envs/arrow-commit/include -fdiagnostics-color=always",
"benchmark_language": "R"
},
"info": {
"arrow_version": "13.0.0-SNAPSHOT",
"arrow_compiler_id": "GNU",
"arrow_compiler_version": "11.3.0",
"benchmark_language_version": "R version 4.2.3 (2023-03-15)",
"arrow_version_r": "12.0.0.9000"
},
"tags": {
"cpu_count": null,
"engine": "arrow",
"memory_map": false,
"query_id": "TPCH-01",
"scale_factor": 1,
"format": "native",
"language": "R",
"name": "tpch"
},
"optional_benchmark_info": {},
"github": {
"repository": "https://github.com/apache/arrow",
"pr_number": null,
"commit": "3d0172d40dfcf934308e6e1f4249a854004fe824"
},
"stats": {
"data": [
"0.396095",
"0.458458",
"0.453228"
],
"times": [],
"unit": "s",
"time_unit": "s",
"iterations": 3,
"mean": "0.435927",
"median": "0.453228",
"min": "0.396095",
"max": "0.458458",
"stdev": "0.034595",
"q1": "0.424661",
"q3": "0.455843",
"iqr": "0.031182"
},
"machine_info": {
"name": "ec2-m5-4xlarge-us-east-2",
"os_name": "Linux",
"os_version": "4.14.248-189.473.amzn2.x86_64-x86_64-with-glibc2.10",
"architecture_name": "x86_64",
"kernel_name": "4.14.248-189.473.amzn2.x86_64",
"memory_bytes": "65498251264",
"cpu_model_name": "Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz",
"cpu_core_count": "8",
"cpu_thread_count": "16",
"cpu_l1d_cache_bytes": "32768",
"cpu_l1i_cache_bytes": "32768",
"cpu_l2_cache_bytes": "1048576",
"cpu_l3_cache_bytes": "37486592",
"cpu_frequency_max_hz": "0",
"gpu_count": "0",
"gpu_product_names": []
},
"run_name": "commit: 3d0172d40dfcf934308e6e1f4249a854004fe824",
"run_reason": "commit"
}
[230606-13:36:50.132] [22163] [benchclients.http] INFO: try: POST to https://conbench.ursa.dev/api/benchmark-results
[230606-13:36:50.371] [22163] [benchclients.http] INFO: POST request to https://conbench.ursa.dev/api/benchmark-results: took 0.2394 s, response status code: 200
[230606-13:36:50.372] [22163] [benchclients.http] INFO: unexpected response. code: 200, body bytes: <[
{
"id": "0647f25ae7197cbf8000d16e33d1a4bf",
"run_id": "d46b8964796e4429b39faf0dc15301ea",
"batch_id": "1cdc3a9dc9d04d7782901ce831f34e85",
"timestamp": "2023-06-06T12:21:00Z",
"tags": {
"name": "ReplaceWithMaskLowSelectivityBench",
"suite": "arrow-compute-vector-replace-benchmark",
"params": "16384/99",
"source": "cpp-micro"
},
"optional_bench ...>
Traceback (most recent call last):
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/bin/conbench", line 8, in <module>
sys.exit(conbench())
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/conbench/cli.py", line 149, in _benchmark
for result, output in benchmark().run(**kwargs):
File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-09bd8650b4486e15b-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/tpch_benchmark.py", line 30, in run
yield self.r_benchmark(command, tags, kwargs, case)
File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-09bd8650b4486e15b-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/_benchmark.py", line 304, in r_benchmark
return self.record(
File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-09bd8650b4486e15b-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/_benchmark.py", line 150, in record
benchmark, output = self.conbench.record(
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/conbench/runner.py", line 333, in record
self.publish(benchmark_result)
File "/var/lib/buildkite-agent/builds/aws-ec2-m5-4xlarge-us-east-2-i-09bd8650b4486e15b-1/apache-arrow/arrow-bci-benchmark-on-ec2-m5-4xlarge-us-east-2/benchmarks/benchmarks/_benchmark.py", line 83, in publish
self.conbench_client.post("/benchmark-results", benchmark)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/benchclients/http.py", line 164, in post
resp = self._make_request("POST", self._abs_url_from_path(path), 201, json=json)
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/benchclients/http.py", line 205, in _make_request
result = self._make_request_retry_until_deadline(
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/benchclients/http.py", line 266, in _make_request_retry_until_deadline
result = self._make_request_retry_guts(
File "/var/lib/buildkite-agent/.conda/envs/arrow-commit/lib/python3.8/site-packages/benchclients/http.py", line 393, in _make_request_retry_guts
raise RetryingHTTPClientNonRetryableResponse(message=msg, error_response=resp)
benchclients.http.RetryingHTTPClientNonRetryableResponse: POST request to https://conbench.ursa.dev/api/benchmark-results: unexpected HTTP response. Expected code 201, got 200. Leading bytes of body: <[
{
"id": "0647f25ae7197cbf8000d16e33d1a4bf",
"run_id": "d46b8964796e4429b39faf0dc15301ea",
"batch_id": "1cdc3a9dc9d04d7782901ce831f34e8 ...>
stdout:
We started using nightly because:
FAILED benchmarks/tests/test_file_benchmark.py::test_read_r[parquet, snappy, table]
FAILED benchmarks/tests/test_file_benchmark.py::test_read_r[parquet, snappy, dataframe]
FAILED benchmarks/tests/test_file_benchmark.py::test_read_r[feather, lz4, table]
FAILED benchmarks/tests/test_file_benchmark.py::test_read_r[feather, lz4, dataframe]
FAILED benchmarks/tests/test_file_benchmark.py::test_write_r[parquet, snappy, table]
FAILED benchmarks/tests/test_file_benchmark.py::test_write_r[parquet, snappy, dataframe]
FAILED benchmarks/tests/test_file_benchmark.py::test_write_r[feather, lz4, table]
FAILED benchmarks/tests/test_file_benchmark.py::test_write_r[feather, lz4, dataframe]
Exception: Error: NotImplemented: Support for codec 'snappy' not built
Exception: Error: NotImplemented: Support for codec 'lz4' not built
> install.packages("arrow")
Installing package into '/home/jkeane/R/x86_64-pc-linux-gnu-library/4.1'
(as 'lib' is unspecified)
trying URL 'https://packagemanager.rstudio.com/all/__linux__/bionic/latest/src/contrib/arrow_6.0.0.2.tar.gz'
Content type 'binary/octet-stream' length 20338508 bytes (19.4 MB)
==================================================
downloaded 19.4 MB
* installing *binary* package 'arrow' ...
* DONE (arrow)
The downloaded source packages are in
'/tmp/RtmpJLf3lB/downloaded_packages'
> arrow_info()
Error in arrow_info() : could not find function "arrow_info"
> arrow::arrow_info()
Arrow package version: 6.0.0.2
Capabilities:
dataset TRUE
parquet TRUE
json TRUE
s3 FALSE
utf8proc TRUE
re2 TRUE
snappy FALSE
gzip FALSE
brotli FALSE
zstd FALSE
lz4 FALSE
lz4_frame FALSE
lzo FALSE
bz2 FALSE
jemalloc FALSE
mimalloc FALSE
To reinstall with more optional capabilities enabled, see
https://arrow.apache.org/docs/r/articles/install.html
As eventually we want to move the posting of results out of this package, this package needs to be able to save results to JSON in the same fashion {arrowbench} does. There are a few steps to this project, of which this is the first:
conbench.record() (once we've got a separate tool to do so) will likely take a little more work, but allow us to simplify the codebase a bit. This may not happen for a bit, but should be kept in mind during (2) especially so we can end up with a tidy codebase. Again, this task is only (1); the rest above is just for context.
Currently for R benchmarks, this repo passes cpu_count = NULL to run_one() (code), which then does not set the number of CPUs or threads anywhere (it omits that part of the script it creates). When run through higher-level arrowbench interfaces, cpu_count = NULL gets translated by get_default_parameters() to c(1L, parallel::detectCores()), which would create two cases for run_one() and be a problem.
In practice, not calling arrow:::SetCpuThreadPoolCapacity() means we're running with the default, which is the number of cores on the machine (pyarrow.cpu_count()). We should move to specifying this and recording it in tags. Right now the cpu_count key is in tags, but the value is empty. Changing this will break histories, but we should be able to adjust old records based on machine_info.cpu_core_count or machine_info.cpu_thread_count (I'm not exactly sure which we want, but they may not differ for any of the machines we're running on anyway).
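A sketch of making the default explicit and recording it (the pyarrow calls are real API; where exactly the tag gets set in this repo is left open):

import pyarrow

# Make the implicit default explicit and record it, instead of leaving the
# cpu_count tag empty. `tags` stands in for the benchmark's tag dict.
tags = {}
cpu_count = pyarrow.cpu_count()
pyarrow.set_cpu_count(cpu_count)
tags["cpu_count"] = cpu_count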
Because of the shift to running arrowbench directly from arrow-benchmarks-ci, it may be more pragmatic to break things as we switch over and then do the cleanup, but I'm opening this issue here because the problem is presently here, even if the fix ends up being some tweaks in arrowbench defaults and some database cleanup.
14:29:59 ± pytest -v -s -k serialize benchmarks/tests
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /home/jp/.pyenv/versions/3108-vd-benchmarks/bin/python
cachedir: .pytest_cache
rootdir: /home/jp/dev/voltrondata-labs-benchmarks
collecting ... [221214-14:33:26.893] [361266] [benchmarks._sources] INFO: path does not exist: /home/jp/dev/voltrondata-labs-benchmarks/benchmarks/data/fanniemae_sample.csv
[221214-14:33:26.893] [361266] [benchmarks._sources] INFO: _get_object_url for idx 0
[221214-14:33:26.893] [361266] [benchmarks._sources] INFO: HTTP GET None
Narrowed this down to
benchmarks/benchmarks/_sources.py
Line 437 in 5ea34d7
def _get_object_url(self, idx=0):
if self.paths:
s3_url = pathlib.Path(self.paths[idx])
return (
"https://"
+ s3_url.parts[0]
+ ".s3."
+ self.region
+ ".amazonaws.com/"
+ os.path.join(*s3_url.parts[1:])
)
return self.store.get("source")
where, if self.paths evaluates to False, self.store.get("source") returns None.
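One possible direction (a sketch only, not necessarily the right fix): fail with a clear message when neither local paths nor a remote "source" URL are configured, instead of letting the download code attempt HTTP GET None. The S3 branch is elided below; _s3_url is a hypothetical placeholder for the existing URL construction.

def _get_object_url(self, idx=0):
    if self.paths:
        return self._s3_url(idx)  # existing S3 URL construction (elided)
    url = self.store.get("source")
    if url is None:
        # Surface a clear error instead of a downstream "HTTP GET None".
        raise ValueError("no local paths and no remote 'source' URL configured")
    return url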
One can (re)write a dataset (partitioned or not) without reading the full thing into memory with pyarrow. We currently have a benchmark that runs a filter on datasets.
We should create a new benchmark that is similar to the filtering one, but on top of filtering also writes the results out to a new dataset (instead of pulling them into a table like we do at
We might parameterize this over:
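For reference, a minimal pyarrow sketch of filtering a dataset and writing the result back out without materializing it as a table (the paths, column name, and filter value are illustrative):

import pyarrow.dataset as ds

# Scan the source dataset with a filter; the scanner streams record batches,
# so the filtered data is never collected into a single in-memory Table.
source = ds.dataset("path/to/source", format="parquet")
scanner = source.scanner(filter=ds.field("total_amount") > 10)

# Write the filtered stream to a new (optionally partitioned) dataset.
ds.write_dataset(scanner, "path/to/dest", format="parquet")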