Code Monkey home page Code Monkey logo

Comments (10)

bos avatar bos commented on June 11, 2024

Well that's entertaining. Could you tell me more about the environment you're running in? OS, GHC version, etc?

from criterion.

jmillikin avatar jmillikin commented on June 11, 2024

OS: Ubuntu 11.10
Arch: x86_64
GHC: 7.0.3 (standard version in the Ubuntu repository)

I'm not able to reproduce it on GHC 7.4.1, but it reliably reproduces on GHC 7.0.3.

What library is responsible for measuring clock resolution (statistics, criteiron, etc)? I'd be happy to put in some traces or something to help debug.

from criterion.

bos avatar bos commented on June 11, 2024

It's criterion itself that measures the clock resolution.https://github.com/bos/criterion/blob/master/Criterion/Environment.hs#L37

from criterion.

jmillikin avatar jmillikin commented on June 11, 2024

It looks like the outlier calculation assumes there will be outliers, and produces bad results if all the samples are nearly identical.

In this case, the measurement of clock resolution on my system is coming up with the same value almost every time. I added some traces to Criterion.Analysis.classifyOutliers so it would use only the first ten values, and print out its internal state. Here's the output:

outliers = Outliers {samplesSeen = 10, lowSevere = 10, lowMild = 0, highMild = 0, highSevere = 10}
ssa = fromList [3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6,3.814697265625e-6]
loS = 3.814697265625e-6
loM = 3.814697265625e-6
hiM = 3.814697265625e-6
hiS = 3.814697265625e-6
q1 = 3.814697265625e-6
q3 = 3.814697265625e-6
iqr = 0.0

from criterion.

bos avatar bos commented on June 11, 2024

Cool! Thanks for looking into this. That should make it easier to figure out a fix.

from criterion.

dreixel avatar dreixel commented on June 11, 2024

Any progress on this? I'm having the same stuff on Windows 7 64-bit, with criterion-0.6.0.1, ghc-7.4.1, in this very simple example:

module Main where

import Control.DeepSeq
import Criterion.Main
import Data.List ( sort )


main = defaultMain [ bgroup "rands" . benchSorts $ [1..1000::Int] ]

benchSorts xs = seq (sum xs) [ bench "sort" (nf (sum . sort) xs) ]

Typical output is:

warming up
estimating clock resolution...
mean is 1.637714 us (320001 iterations)
found 638950 outliers among 319999 samples (199.7%)
  318951 (99.7%) low severe
  319999 (100.0%) high severe
estimating cost of a clock call...
mean is 123.0625 ns (16 iterations)

benchmarking rands/sort
Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
mean: 15.09599 us, lb 14.81007 us, ub 15.52451 us, ci 0.950
std dev: 1.798179 us, lb 1.497669 us, ub 2.072980 us, ci 0.950
found 17 outliers among 100 samples (17.0%)
  17 (17.0%) high severe
variance introduced by outliers: 84.204%
variance is severely inflated by outliers

It doesn't seem to be affected by optimization level. Same happens on a virtual machine running linux:

warming up
estimating clock resolution...
mean is 5.303620 us (160001 iterations)
found 287114 outliers among 159999 samples (179.4%)
  145205 (90.8%) low severe
  141909 (88.7%) high severe
estimating cost of a clock call...
mean is 2.490415 us (21 iterations)
found 3 outliers among 21 samples (14.3%)
  3 (14.3%) high mild

benchmarking rands/insertionSort
mean: 281.0402 us, lb 277.6374 us, ub 284.8547 us, ci 0.950
std dev: 18.47338 us, lb 16.33518 us, ub 20.99275 us, ci 0.950
variance introduced by outliers: 61.565%
variance is severely inflated by outliers

from criterion.

letmaik avatar letmaik commented on June 11, 2024

I also got the same with Windows 7 64-bit, criterion-0.6.1.1, ghc-7.4.1:

Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
warming up
estimating clock resolution...
mean is 2.190758 us (320001 iterations)
found 639297 outliers among 319999 samples (199.8%)
  319298 (99.8%) low severe
  319999 (100.0%) high severe
estimating cost of a clock call...
mean is 179.1769 ns (15 iterations)

benchmarking tiny/fib 10
mean: 4.652202 us, lb 4.358087 us, ub 4.973060 us, ci 0.950
std dev: 1.549386 us, lb 1.293269 us, ub 2.013035 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
  2 (2.0%) high mild
variance introduced by outliers: 97.865%
variance is severely inflated by outliers

benchmarking tiny/fib 15
mean: 49.72007 us, lb 47.22011 us, ub 51.66477 us, ci 0.950
std dev: 11.37166 us, lb 9.687343 us, ub 12.79400 us, ci 0.950
found 173 outliers among 100 samples (173.0%)
  94 (94.0%) low severe
  79 (79.0%) high severe
variance introduced by outliers: 95.725%
variance is severely inflated by outliers

With these tests:

{-# LANGUAGE ScopedTypeVariables #-}

import Criterion.Main

main = defaultMain [
        bgroup "tiny" [ bench "fib 10" $ whnf fib 10
                      , bench "fib 15" $ whnf fib 15
                      ]
       ]

fib :: Int -> Int
fib n | n < 0 = error "negative!"
      | otherwise = go (fromIntegral n)
  where
    go 0 = 0
    go 1 = 1
    go n = go (n-1) + go (n-2)

from criterion.

Gabriella439 avatar Gabriella439 commented on June 11, 2024

classifyOutliers triggers this bug whenever the input contains many identical values:

>>> classifyOutliers (U.replicate 10 0)
Outliers {samplesSeen = 10, lowSevere = 10, lowMild = 0, highMild = 0, highSevere = 10}

This results in duplicate counts in the lowSevere and highSevere range, just like in the other errors reported here, too. This also triggers the bug if even if not all the data is identical:

>>> classifyOutliers (U.fromList [0, 0, 0, 0, 0, 0, 1])
Outliers {samplesSeen = 7, lowSevere = 6, lowMild = 0, highMild = 0, highSevere = 7}

However,it was not clear what would cause it to receive large stretches of identical data, especially when you have 100 samples. You would think that this would be very improbable. So I added a trace to see what the offending data was and this is what it was getting:

benchmarking parse
collecting 100 samples, 1 iterations each, in estimated 10.58121 s
mean: 123.6324 ns, lb 47.33842 ns, ub 238.0733 ns, ci 0.950
std dev: 471.0889 ns, lb 368.2355 ns, ub 707.7345 ns, ci 0.950
fromList [2.7629260312427172e-6,8.555773984302174e-7,-9.80969179760326e-8,-9.809
69179760326e-8,8.555773984302174e-7,-9.80969179760326e-8,-9.80969179760326e-8,8.
555773984302174e-7,-9.80969179760326e-8,8.555773984302174e-7,8.555773984302174e-
7,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.809691797603
26e-8,-9.80969179760326e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179
760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.8096
9179760326e-8,8.555773984302174e-7,-9.80969179760326e-8,8.555773984302174e-7,-9.
80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8
,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.8096917976032
6e-8,8.555773984302174e-7,-9.80969179760326e-8,-9.80969179760326e-8,-9.809691797
60326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969
179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.8
0969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,8.555773984302174e-7,
-9.80969179760326e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179760326
e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179760326e-8,-9.8096917976
0326e-8,-9.80969179760326e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.809691
79760326e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179760326e-8,-9.80
969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,1.09399597753178e-6,-9
.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-
8,8.555773984302174e-7,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179760$
26e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.80969179760326e-8,-9.80969179
760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.8096
9179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,8.555773984302174e-7,-9.
80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8
,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.8096917976032
6e-8,-9.80969179760326e-8,-9.80969179760326e-8,-9.80969179760326e-8,8.5557739843
02174e-7,-9.80969179760326e-8]
found 179 outliers among 100 samples (179.0%)
  79 (79.0%) low severe
  100 (100.0%) high severe

The data has several repeating values, and the value -9.80969179760326e-8 sufficiently dominates the rest of the data set that it triggers the bug. I still don't have an answer for why that exact timing repeats so often. My initial guess was because it was getting near the limit of the timer's minimum step size, but that wasn't the case. Benchmarks on trivial code like nf id () give smaller values in the e-9 range that are much more diverse. Here's a sample of some of them:

-9,5.519538864781468e-9,5.444293899712435e-9,5.59069442957501e-9,5.468012421310282e-9,5.529353425442646e-9,5.553071947040493e-9,5.443476019657337e-9,5.621773871668741e-9,5.605416270566778e-9,5.51872098472637e-9,5.646310273321687e-9,5.549800426820101e-9,5.643038753101294e-9,5.516267344561075e-9,5.60459839051168e-9,5.443476019657337e-9,5.571065308252654e-9,5.783714122578182e-9,6.146852867041778e-9,5.763267121200728e-9,5.49173094290813e-9,5.488459422687737e-9,5.557161347315985e-9,5.440204499436944e-9,5.60459839051168e-9,5.5121779442855845e-9,5.502363383624406e-9,5.6185023514483485e-9,5.443476019657337e-9,5.632406312385018e-9,5.5391679861038246e-9,5.6086877907871706e-9,5.519538864781468e-9,5.605416270566778e-9,5.6185023514483485e-9,5.446747539877729e-9,5.632406312385018e-9,5.611959311007563e-9,5.512995824340683e-9,5.574336828473046e-9,5.45083694015322e-9,5.6185023514483485e-9,5.508906424065192e-9,5.45083694015322e-9,5.616048711283054e-9,5.45083694015322e-9,5.56370438775677e-9,5.621773871668741e-9,5.502363383624406e-9,5.584969269189323e-9,5.6185023514483485e-9,5.6078699107320724e-9,5.643038753101294e-9,6.047071500319799e-9,5.567793788032261e-9,5.638949352825803e-9,5.601326870291287e-9,5.471283941530675e-9,5.587422909354618e-9,5.526081905222253e-9,5.454108460373613e-9,5.638949352825803e-9,5.6561248339828655e-9,5.69456519657248e-9,5.532624945663039e-9,5.546528906599708e-9]

So the missing thing to solve is what causes the timings to abnormally snap to certain values.

from criterion.

bos avatar bos commented on June 11, 2024

Oops, closed this by accident.

from criterion.

bos avatar bos commented on June 11, 2024

Fix validated in f028c02e5ec26c5e24f626c2f53da936bbe926bf.

from criterion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.