What is Tolerance?
Tolerance is the maximum error a user allows in an aggregation, at a given confidence level. For example, with a confidence level of 95%, about 95 out of 100 runs of the same query would return an answer whose relative error lies within [0.0, tolerance].
How do we calculate Tolerance?
The idea is that, given a user-provided tolerance value, we can estimate the sample size required for a query that computes the mean to meet that tolerance with a predefined level of certainty.
Given a confidence level of, say, 95%, we want to determine a confidence interval such that 95% of all our mean estimates fall within the target range. That is, given a sample of size $n$ drawn from a population (with $\mu$ and $\sigma^2$ as population mean and variance respectively), determine the confidence interval of the sample mean $\bar{x}$ so that it has a 95% chance of containing ![](https://render.githubusercontent.com/render/math?math=\mu)
![](https://render.githubusercontent.com/render/math?math=Confidence\%20Interval%20=%20[\bar{x}-Z\frac{\sigma}{\sqrt{n}},\bar{x}+Z\frac{\sigma}{\sqrt{n}}])
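In other words, with $Z$ being the standard normal critical value for the chosen confidence level (about 1.96 for 95%):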
![](https://render.githubusercontent.com/render/math?math=Pr(\bar{x}-Z\frac{\sigma}{\sqrt{n}}<=\mu<=\bar{x}+Z\frac{\sigma}{\sqrt{n}}) = 0.95)
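As a rough illustration of the interval above, here is a minimal Python sketch. The function names `z_score` and `confidence_interval` and the input numbers are made up for this example, and the population standard deviation is assumed to be known.

```python
import math
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value Z of the standard normal for a level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def confidence_interval(sample_mean: float, sigma: float, n: int,
                        confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) = sample_mean -/+ Z * sigma / sqrt(n)."""
    margin = z_score(confidence) * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: sample mean 100, sigma 20, n = 400.
low, high = confidence_interval(100.0, 20.0, 400)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [98.04, 101.96]
```

Note that quadrupling `n` halves the width of the interval, since the margin shrinks with the square root of the sample size.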
Here, the Central Limit Theorem is taken into account:
Regardless of the distribution of the population (as long as $\mu$ and $\sigma^2$ are finite), the distribution of the sample means is approximately normal for a sufficiently large sample size.
The notion of the Standard Error of the Mean (SEM) is also used:
Given a single sample of size $n$, how can we determine how far its mean $\bar{x}$ is from the population mean $\mu$? The answer, $\frac{\sigma}{\sqrt{n}}$, reflects the standard deviation of the sample means and can be estimated as $\frac{s}{\sqrt{n}}$, with $s$ being the standard deviation of the sample.
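A small Python sketch of that estimate, assuming the sample is just a list of numbers. The helper name `standard_error_of_mean` and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample: list[float]) -> float:
    """Estimate the SEM as s / sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Made-up sample, only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sem = standard_error_of_mean(sample)
print(f"mean={statistics.fmean(sample):.2f} SEM={sem:.4f}")
```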
Tolerance is the Relative Standard Error (RSE) of the distribution of the sample means. The RSE can be estimated from the Standard Error ($\frac{\sigma}{\sqrt{n}}$) of the sample mean and the estimated mean ($\bar{x}$) with the formula $\frac{SE}{\bar{x}}$, where the Standard Error is scaled by $Z$ for the chosen confidence level and the result is bounded by the tolerance $t$:
![](https://render.githubusercontent.com/render/math?math=RSE=\frac{Z\frac{\sigma}{\sqrt{n}}}{\bar{x}} \le t)
Another way to put it is: "we want the error of the mean, $Z\frac{\sigma}{\sqrt{n}}$, to be at most the tolerance applied to the estimated mean ($\bar{x}*t$)":
![](https://render.githubusercontent.com/render/math?math=\bar{x}*t=Z\frac{\sigma}{\sqrt{n}})
![](https://render.githubusercontent.com/render/math?math=t = \frac{Z*\frac{\sigma}{\sqrt{n}}}{\bar{x}})
Both ways lead to the same equation, which allows us to determine the sample size as follows:
![](https://render.githubusercontent.com/render/math?math=\sqrt{n} = \frac{Z*\sigma}{t*\bar{x}})
![](https://render.githubusercontent.com/render/math?math=n=(\frac{{Z*\sigma}}{t*\bar{x}})^2)
![](https://render.githubusercontent.com/render/math?math=n=\frac{1}{t^2}(\frac{{Z*\sigma}}{\bar{x}})^2)
The Standard Error of the Mean, $\frac{\sigma}{\sqrt{n}}$, can be estimated as $\frac{s}{\sqrt{n}}$, with $s$ being the standard deviation of the sample. This substitution relies on the assumption of normality, and gives:
![](https://render.githubusercontent.com/render/math?math=n=\frac{1}{t^2}(\frac{{Z*s}}{\bar{x}})^2)
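The sample-size formula can be sketched in Python as follows. This is only an illustration under the assumptions of this issue: `required_sample_size` is a hypothetical helper, not an existing API, and the estimates `s` and `mean` are taken as given (for example from a pilot sample).

```python
import math
from statistics import NormalDist


def required_sample_size(s: float, mean: float, tolerance: float,
                         confidence: float = 0.95) -> int:
    """Smallest integer n with n >= (1 / t^2) * (Z * s / mean)^2."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    n = (z * s / (tolerance * mean)) ** 2
    return math.ceil(n)


# Hypothetical estimates: s = 20, mean = 100.
print(required_sample_size(20.0, 100.0, tolerance=0.05))  # 62
print(required_sample_size(20.0, 100.0, tolerance=0.01))  # 1537
```

Because of the `1 / t^2` factor, dividing the tolerance by 5 multiplies the required sample size by roughly 25.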
The deviation of the sample mean from the population mean is measured by the SEM. We want this error as a percentage of the mean, and that ratio (the SEM, scaled by $Z$ for the chosen confidence level, over the estimated mean $\bar{x}$) should have the tolerance as its upper bound. This gives us:
![](https://render.githubusercontent.com/render/math?math=\frac{Z*SEM}{\bar{x}} \le tolerance)
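One way to sanity-check the bound is a small simulation. The Python sketch below is an assumption-laden illustration, not part of any implementation: it draws from a synthetic normal population whose mean and standard deviation are known, sizes the sample with the formula from this issue, and counts how often the relative error of the sample mean stays within the tolerance. The name `coverage` and all numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu: float, sigma: float, tolerance: float,
             confidence: float = 0.95, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated runs whose relative error is within the tolerance."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Sample size from n = (Z * sigma / (t * mean))^2, rounded up.
    n = math.ceil((z * sigma / (tolerance * mu)) ** 2)
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(sample_mean - mu) / abs(mu) <= tolerance:
            hits += 1
    return hits / trials


# Synthetic population: mean 100, standard deviation 20, tolerance 5%.
print(f"runs within tolerance: {coverage(100.0, 20.0, 0.05):.1%}")
```

With these settings the printed fraction should land close to the 95% confidence level, which is what the definition of tolerance at the top of this issue asks for.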
The scope of this issue is to collect all the information about table tolerance and to help guide future development.
Missing steps.