Comments (5)
I am observing similar behavior: when I add a second benchmark, the first one takes longer. The two pieces of code I am benchmarking are completely unrelated, and I have tried this with many combinations, consistently seeing the same result.
The measurement should cover only the function under test and should be independent of the surrounding code. I do not think the generated code of the two benchmarked functions per se differs between the two cases; they are completely unrelated.
It's hard to trust the benchmarking results because of this.
Here is a simplified test code:
import qualified Data.Text as T
import qualified Data.Text.ICU as TI
import Control.DeepSeq (deepseq)
import Criterion.Main

main :: IO ()
main = do
  str <- readFile "data/English.txt"
  let str' = take 1000000 (cycle str)
      txt  = T.pack str'
  str' `deepseq` txt `deepseq` defaultMain
    [ bgroup "text-icu"   [bench "1" (nf (TI.normalize TI.NFD) txt)]
    , bgroup "just-count" [bench "1" (nf (show . length) str')]
    ]
The first benchmark measures text-icu's normalize function. When I run it with only the first benchmark, it reports:
benchmarking text-icu/1
time 2.830 ms (2.777 ms .. 2.913 ms)
When I add the second one it becomes:
cueball:/vol/hosts/cueball/workspace/play/criterion$ ./Benchmark1
benchmarking text-icu/1
time 3.709 ms (3.570 ms .. 3.846 ms)
benchmarking just-count/1
time 2.677 ms (2.516 ms .. 2.872 ms)
A 30% degradation just from adding a line. The difference is even more marked in several other cases.
This problem is forcing me to run criterion with only one benchmark at a time. Also note that the benchmark result is wrong even when only one benchmark is selected from many on the command line; the mere presence of another benchmark is enough, irrespective of the runtime selection.
I am running criterion-1.1.1.0 and ghc-7.10.3.
from criterion.
It seems my problem was due to sharing input data across benchmarks, which caused undue memory pressure for the later benchmarks. The problem was resolved by using env. I rewrote the above code like this:
setup = fmap (take 1000000 . cycle) (readFile "data/English1.txt")

main :: IO ()
main = defaultMain
  [ bgroup "text-icu"
      [ env (fmap T.pack setup) (\txt -> bench "1" (nf (TI.normalize TI.NFD) txt)) ]
  , bgroup "just-count"
      [ env setup (\str -> bench "1" (nf (show . length) str)) ]
  ]
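The reason env helps is that it builds the input inside IO, forces it to normal form, and hands it to the benchmark body, so the data is scoped to that benchmark instead of being retained as a shared value across all of them. A minimal stand-alone sketch of that pattern (withEnv here is a hypothetical stand-in for illustration, not criterion's actual env, whose real type involves Benchmark):

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- Hypothetical stand-in for criterion's env: construct the input in IO,
-- force it to normal form so no lazy thunks are measured, pass it to the
-- consumer, and let it become garbage afterwards instead of living as a
-- long-lived shared value.
withEnv :: NFData a => IO a -> (a -> IO b) -> IO b
withEnv setup run = do
  x <- setup >>= evaluate . force
  run x

main :: IO ()
main = do
  n <- withEnv (pure (take 1000000 (cycle "abc"))) (pure . length)
  print n
```

Because each withEnv call owns its input, two benchmarks wrapped this way no longer keep each other's data alive.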
One possible enhancement would be to strongly recommend env in the documentation when multiple benchmarks are used, or, better still, to detect when env is not being used and issue a warning at runtime.
Yes, being cognizant of the working set is hard with Haskell's lazy semantics and GHC's optimizations. I don't know what would be detected here for a warning, though -- what would the check for a linter be?
The original issue is a tricky one of compilation units. You can always put benchmarks in separate modules, but that's a pain, and I'm not sure what a good solution to alleviate it would be. TH doesn't seem sufficient.
It kind of seems like you'd want something like fragnix to create minimal compilation units that isolate your benchmarks.
FWIW, the hspec test suite has a tool called hspec-discover that automates discovering modules in a directory that contain tests (modules whose names end with the suffix Spec). If isolating benchmarks into separate modules is the recommended approach to this particular issue, we could consider implementing hspec-discover-style functionality to automate discovery of benchmarks in other modules.
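As a sketch of what such discovery could look like (the bench/ directory and the Bench.hs naming convention are assumptions for illustration, not an existing criterion feature), a generator could scan a directory for benchmark modules and emit the import lines a generated Main would need:

```haskell
-- Hypothetical hspec-discover-style benchmark discovery: list modules
-- ending in "Bench.hs" under a directory and produce the import lines
-- for a generated driver Main.
import Data.List (isSuffixOf, sort)
import System.Directory (listDirectory)

discoverImports :: FilePath -> IO [String]
discoverImports dir = do
  files <- listDirectory dir
  -- Keep files matching the naming convention; strip the extension to
  -- recover the module name.
  let mods = [ takeWhile (/= '.') f
             | f <- sort files
             , "Bench.hs" `isSuffixOf` f ]
  pure [ "import qualified " ++ m | m <- mods ]

main :: IO ()
main = discoverImports "bench" >>= mapM_ putStrLn
```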
See #166 for another example.
Related Issues (20)
- criterion lower bound on aeson seems to be too loose
- Presence of double quotes in benchmark name produces broken HTML report
- 0.1.2.0 build failure on Apple Silicon
- Log scale discoverability
- Criterion/Main/Options.hs:38:48: error: Module ‘Options.Applicative.Help.Pretty’ does not export ‘Ann’ | 38 | import Options.Applicative.Help.Pretty ((.$.), Ann)
- Regressions in Config Parameter are Ignored
- Support aeson-2
- Is there any way to generate a static output?
- Add a variant of runBenchmark which also takes iteration range.
- [FR] support benchmarking max memory usage
- use performMajorGC
- Fails to compile with optparse-applicative-0.17.0.0
- Update tutorial to suggest `v2-`style `cabal` commands
- Avoid only a subset of benchmarks being specialized by SpecConstr.
- please document that (and how) criterion runs GC
- first benchmark is expensive
- please document that `nf` is (contains) an IO action
- Add or migrate to `peakAllocated`
- Website is down
- Examples directory link is wrong on Hackage Readme