Basic statistics

| categories: statistics | tags:

Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.

This is a recipe for computing the confidence interval. The strategy is:

  1. compute the average
  2. Compute the standard deviation of your data
  3. Define the confidence interval, e.g. 95% = 0.95
  4. compute the student-t multiplier. This is a function of the confidence

interval you specify, and the number of data points you have minus 1. You subtract 1 because one degree of freedom is lost from calculating the average. The confidence interval is defined as ybar +- T_multiplier*std/sqrt(n).

import numpy as np
from scipy.stats.distributions import  t

y = [8.1, 8.0, 8.1]

ybar = np.mean(y)
s = np.std(y)

ci = 0.95
alpha = 1.0 - ci

n = len(y)
T_multiplier = t.ppf(1-alpha/2.0, n-1)

ci95 = T_multiplier * s / np.sqrt(n-1)

print [ybar - ci95, ybar + ci95]
[7.9232449090029595, 8.210088424330376]

We are 95% certain the next measurement will fall in the interval above.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter