Calculate the variance of a sample from the variances of its subsamples
Mar 12, 2016
Calculating the mean of a sample from the means of its subsamples is pretty straightforward[1].
Calculating the variance of a sample from the variances of its subsamples, however, didn't seem as straightforward to me. It turns out there is a formula that allows us to do that. Assuming we have \(g\) subsamples, the \(j\)-th containing \(k_j\) elements (\(j=1,\dots,g\)), for a total of \(n=\sum_{j=1}^{g} k_j\) values, then:
\[ Var(X_1,...,X_n) = \frac{1}{n-1}(\sum_{j=1}^{g} (k_j-1)Var_j + \sum_{j=1}^{g}k_j(\bar{X}_j-\bar{X})^2) \]
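Here \(Var_j\) is the unbiased variance of subsample \(j\) (computed with the \(k_j-1\) denominator), \(\bar{X}_{j}\) is the mean of subsample \(j\), and \(\bar{X}\) is the mean of the whole sample, i.e. \(\bar{X}=\frac{1}{n}\sum_{j=1}^{g}k_j\bar{X}_j\).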
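To make the formula concrete, here is a minimal Python sketch that combines per-subsample sizes, means and variances, then checks the result against the variance computed directly on the pooled data. The function name `combine_variances` and the example numbers are made up for illustration; the per-subsample variances are assumed to be the unbiased ones, which is what `statistics.variance` returns.

```python
from statistics import mean, variance


def combine_variances(sizes, means, variances):
    """Overall unbiased variance from per-subsample statistics.

    sizes[j] is k_j, means[j] is the mean of subsample j, and variances[j]
    is its unbiased variance (k_j - 1 denominator). Hypothetical helper.
    """
    n = sum(sizes)
    # Overall mean as the size-weighted average of the subsample means.
    overall_mean = sum(k * m for k, m in zip(sizes, means)) / n
    # First sum of the formula: spread within each subsample.
    within = sum((k - 1) * v for k, v in zip(sizes, variances))
    # Second sum of the formula: spread of the subsample means around the overall mean.
    between = sum(k * (m - overall_mean) ** 2 for k, m in zip(sizes, means))
    return (within + between) / (n - 1)


# Illustrative data only.
subsamples = [[2.0, 4.0, 4.0], [5.0, 7.0], [1.0, 3.0, 8.0, 9.0]]
sizes = [len(s) for s in subsamples]
means = [mean(s) for s in subsamples]
variances = [variance(s) for s in subsamples]

combined = combine_variances(sizes, means, variances)
direct = variance([x for s in subsamples for x in s])
print(combined, direct)
```

Note that a subsample with a single element has no defined unbiased variance, but its term in the first sum is multiplied by \(k_j-1=0\), so passing `0.0` as its variance works in this sketch.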
This Wikipedia article describes various algorithms to calculate variance in a number of scenarios (e.g. online, in parallel, etc.), whereas this paper presents some computational considerations when updating mean and variance estimates.
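As an illustration of the parallel scenario, the sketch below merges subsample summaries two at a time, using the pairwise update usually attributed to Chan et al. Each summary is a tuple of count, mean and `M2` (the sum of squared deviations from the mean), so the unbiased variance is `M2 / (count - 1)`. The names `summarize` and `merge` and the data are assumptions made for this example, not taken from the linked sources.

```python
from functools import reduce
from statistics import mean, variance


def summarize(xs):
    """Return (count, mean, M2) for one subsample."""
    m = mean(xs)
    return len(xs), m, sum((x - m) ** 2 for x in xs)


def merge(a, b):
    """Merge two (count, mean, M2) summaries into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    merged_mean = mean_a + delta * n_b / n
    # The correction term accounts for the distance between the two means.
    merged_m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, merged_mean, merged_m2


# Illustrative data only.
parts = [[2.0, 4.0, 4.0], [5.0, 7.0], [1.0, 3.0, 8.0, 9.0]]
total_n, total_mean, total_m2 = reduce(merge, (summarize(p) for p in parts))
merged_variance = total_m2 / (total_n - 1)
pooled = [x for p in parts for x in p]
print(merged_variance, variance(pooled))
```

Carrying `M2` rather than the variance itself avoids repeatedly multiplying and dividing by `count - 1`, and it lets subsamples of size one be merged without special handling.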
[1] Provided, of course, that we have the sizes of all subsamples.