Binning analysis

The binning analysis is a tool to estimate statistical errors of means computed from sequential data \(X_i\). If the data of the sequence were uncorrelated, the standard error of the mean would be a reliable error quantification. However, it is often the case that subsequent data are correlated, i.e. the data series exhibits a non-vanishing autocorrelation time. In this case, the standard error would underestimate the actual statistical error. The binning analysis introduced in Ref.  [1] accounts for the presence of autocorrelation to give reliable error estimates also in that case; here, we give a compressed summary and recommend reading Ref.  [1] for a very clear and detailed explanation of the matter.

The idea of the binning analysis for a sequence of data \(\{X_i\}_{i=1}^M\) is to construct a series of coarse-grained sequences \[\begin{aligned} X_l^{(k)} = \frac{1}{k}\sum_{i=(l-1)k+1}^{lk}X_i\ , \end{aligned}\] with \(M_k=\lfloor M/k\rfloor\) elements, i.e. the \(l\)-th element of the coarse-grained sequence is the average over the \(l\)-th block of \(k\) consecutive original data points. The mean of each of these sequences remains the same, but with increasing bin size \(k\), consecutive elements of the sequences become less correlated. Therefore, the corresponding standard error estimate of the mean \[\begin{aligned} \Delta_X^{(k)}=\sqrt{\frac{1}{M_k(M_k-1)}\sum_{l=1}^{M_k}\Big(X_l^{(k)}-\langle\langle X^{(k)}\rangle\rangle_{M_k}\Big)^2} \end{aligned}\] increases and eventually converges to an accurate error estimate \[\begin{aligned} \Delta_X=\lim_{k\to\infty}\Delta_X^{(k)}\ . \end{aligned}\] In the expression for \(\Delta_X^{(k)}\) we use the notation \(\langle\langle X\rangle\rangle_{M}=\frac{1}{M}\sum_{i=1}^MX_i\) for the empirical mean.
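For example, for bin size \(k=2\) the coarse-grained sequence is obtained by averaging pairs of consecutive samples, \[\begin{aligned} X_1^{(2)}=\tfrac12(X_1+X_2),\quad X_2^{(2)}=\tfrac12(X_3+X_4),\quad\ldots,\quad X_{M_2}^{(2)}=\tfrac12\big(X_{2M_2-1}+X_{2M_2}\big)\ , \end{aligned}\] where at most one leftover sample at the end of the original sequence is discarded if \(M\) is odd.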

From this sequence of error estimates one can extract the autocorrelation time of the original sequence as \[\begin{aligned} \tau = \frac12\Bigg[\bigg(\frac{\Delta_X}{\Delta_X^{(1)}}\bigg)^2-1\Bigg]\ . \end{aligned}\]
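In practice one takes the plateau value of the binning curve at large \(k\) as \(\Delta_X\). As a minimal sketch (not part of the original procedure description), assuming the error estimates \(\Delta_X^{(k)}\) are stored in an array error_est, for instance as returned by the function sketched in the Implementation section below, and that the last entry has already converged to the plateau, the autocorrelation time can be computed as follows:

\begin{verbatim}
# Minimal sketch: estimate the autocorrelation time tau from binning output.
# Assumes error_est[k-1] holds Delta_X^(k) for k = 1, ..., k_max and that the
# last entries have converged to the plateau value Delta_X (check this by eye!).
def autocorrelation_time(error_est):
    delta_x = error_est[-1]    # plateau value, taken as Delta_X
    delta_x_1 = error_est[0]   # naive standard error, bin size k = 1
    return 0.5 * ((delta_x / delta_x_1) ** 2 - 1.0)
\end{verbatim}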

You will deploy such a binning analysis in this lab course for the numerical data created in the Markov chain Monte Carlo or Molecular Dynamics simulations. Note, however, that such a binning analysis can be performed on any sequence (or time series) of data samples to detect its intrinsic autocorrelation, independently of how the data were produced – be it in numerical experiments or in real-life situations (think of any time series such as traffic measurements on the Zülpicher Strasse, orders of cappuccinos at your favorite coffee place, or the daily stock market data of some company).

Implementation

Write a function that performs a binning analysis for a given sequence of observables \(\mathbf X=(X_i)_{i=1,\ldots,M}\). The following pseudocode sketches the procedure, assuming that statistical and reshaping functions are available (as is the case in Julia and Python/NumPy); a runnable NumPy sketch is given below the pseudocode.

\begin{algorithm}
\caption{Pseudocode for binning analysis}
\begin{algorithmic}
\Function{binninganalysis}{$\vec X$, $k_{\text{max}}$}
\State $M$ = size($\vec X$)
\For{$k$ in 1:$k_{\text{max}}$}
\State $M_k$ = $\lfloor M/k\rfloor$
\State $\vec X^{(k)}\leftarrow$ mean(reshape($\vec X$[1:$k M_k$], ($k$, :)), axis=0)
\Comment{Coarse-grained sequence: average of consecutive blocks of length $k$}
\State error\_est[$k$] $\leftarrow$ std($\vec X^{(k)}$)/$\sqrt{M_k}$
\Comment{Standard error of the mean of the coarse-grained sequence}
\EndFor
\State \Return error\_est
\EndFunction
\end{algorithmic}
\end{algorithm}
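The following is a minimal, runnable NumPy sketch corresponding to this pseudocode; the function name binning_analysis, the truncation of leftover samples, and the use of ddof=1 (sample standard deviation, matching the \(1/(M_k-1)\) factor above) are implementation choices not fixed by the pseudocode:

\begin{verbatim}
import numpy as np

def binning_analysis(x, k_max):
    """Standard-error estimates of the mean of x for bin sizes k = 1, ..., k_max."""
    x = np.asarray(x, dtype=float)
    M = x.size
    error_est = np.empty(k_max)
    for k in range(1, k_max + 1):
        M_k = M // k                          # number of complete bins
        # Coarse-grained sequence: average consecutive blocks of length k,
        # discarding the at most k - 1 leftover samples at the end.
        x_k = x[:k * M_k].reshape(M_k, k).mean(axis=1)
        # Standard error of the mean of the coarse-grained sequence.
        error_est[k - 1] = x_k.std(ddof=1) / np.sqrt(M_k)
    return error_est
\end{verbatim}

Plotting error_est against the bin size \(k\) should reveal the plateau discussed above; \(k_{\text{max}}\) must be chosen small enough that each coarse-grained sequence still contains well more than one element, i.e. \(k_{\text{max}}\ll M\).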

References

[1] V. Ambegaokar and M. Troyer, Estimating errors reliably in Monte Carlo simulations of the Ehrenfest model, Am. J. Phys. 78, 150 (2010).

Materials