When computing d(h), we need the relative frequencies of groups
in the population.
When we take s random samples from a set to determine the true
probability p of a group, the number x of samples that fall in the
group is a binomial variable with success probability p. Its distribution is:
$P(x) = \binom{s}{x} \, p^{x} (1-p)^{s-x}$
with mean $\mu = sp$ and standard deviation $\sigma = \sqrt{sp(1-p)}$.
The estimate of the probability is p' := x/s.
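As a rough illustration of the quantities above, the following Python sketch evaluates the binomial distribution, its mean and standard deviation, and the estimate p' = x/s from simulated samples. The function names, the sample size s = 1000, the probability p = 0.3, and the fixed random seed are assumptions chosen only for this example.

```python
import math
import random


def binomial_pmf(x, s, p):
    """P(x) = C(s, x) * p^x * (1-p)^(s-x) for x successes in s samples."""
    return math.comb(s, x) * p**x * (1 - p) ** (s - x)


def estimate_probability(s, p, rng):
    """Draw s samples, count the x that hit the group, return p' = x / s."""
    x = sum(1 for _ in range(s) if rng.random() < p)
    return x / s


# Illustrative values (assumed, not taken from the notes).
s, p = 1000, 0.3
mean = s * p                         # mu = s * p
sigma = math.sqrt(s * p * (1 - p))   # sigma = sqrt(s * p * (1 - p))

rng = random.Random(0)               # fixed seed so the run is repeatable
p_hat = estimate_probability(s, p, rng)

print(f"mean = {mean:.1f}, sigma = {sigma:.2f}, estimate p' = {p_hat:.3f}")
```

With s this large the estimate should typically land within a few multiples of sigma/s of the true p.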
Chernoff bound: For any a > 0,
$P(x > sp + a) < e^{-2a^{2}/s}$
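A small numerical check can make the bound more concrete. The Python sketch below compares the exact binomial upper tail P(x > sp + a) with the bound written in the form exp(-2a^2/s), which is the standard Chernoff-Hoeffding statement for s independent samples. The helper names and the values s = 100, p = 0.5, and a in {5, 10, 15} are assumptions made for this example.

```python
import math


def upper_tail(s, p, a):
    """Exact P(x > s*p + a) for x ~ Binomial(s, p), by summing the pmf."""
    threshold = s * p + a
    return sum(
        math.comb(s, k) * p**k * (1 - p) ** (s - k)
        for k in range(s + 1)
        if k > threshold
    )


def chernoff_bound(s, a):
    """Upper bound exp(-2 a^2 / s) on the tail probability."""
    return math.exp(-2 * a * a / s)


# Illustrative values (assumed, not taken from the notes).
s, p = 100, 0.5
for a in (5, 10, 15):
    exact = upper_tail(s, p, a)
    bound = chernoff_bound(s, a)
    print(f"a = {a:2d}: exact tail = {exact:.5f}, bound = {bound:.5f}")
```

The bound is loose for small deviations but falls off exponentially in a^2/s, which is the property that matters when choosing the number of samples.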
The probability of misestimating an underlying probability
by more than $\varepsilon$: