Python calculate standard deviation of list
This article explains four different ways to calculate the standard deviation of a list of numbers in Python, with examples and explanations.
Standard deviation measures the variation or dispersion of the values of a data set from its mean or average. For a list, it measures the variation of its elements from the mean value of the elements.
Python provides different ways to calculate standard deviation, and we can also calculate it by applying its formula directly. All of this is covered in this article.
Method 1: Using stdev()
Python statistics module has a stdev() function which takes a data set as an argument and returns the sample standard deviation, that is, the square root of the sample variance. Example,
import statistics
# create list
l=[1,2,3,4,5]
d=statistics.stdev(l)
print('Standard deviation =',d)
This prints
Standard deviation = 1.5811388300841898
This method is available since Python 3.4
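As a quick check of the statement that stdev() returns the square root of variance, the sketch below compares it with the square root of statistics.variance(), the sample variance function from the same module. The variable names are illustrative only.

```python
import math
import statistics

l = [1, 2, 3, 4, 5]

# sample variance (divides by n - 1)
sample_variance = statistics.variance(l)

# square root of the sample variance
root_of_variance = math.sqrt(sample_variance)

print('Variance =', sample_variance)
print('Square root of variance =', root_of_variance)
print('stdev() =', statistics.stdev(l))
```

The last two printed values should agree, up to floating point rounding.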
Method 2: Using pstdev()
Python statistics module has a pstdev() function which calculates standard deviation over the entire population or data set. Example,
import statistics
l=[1,2,3,4,5]
print('Standard deviation =',statistics.pstdev(l))
Output is
Standard deviation = 1.4142135623730951
stdev() treats the list as a sample and divides the sum of squared differences by (n - 1) to calculate variance, while pstdev() treats the list as the whole population and divides by n. Hence, there is a difference between the results of stdev() and pstdev().
Since stdev() divides by a smaller number, its value is higher as compared to pstdev().
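To see where the two numbers come from, here is a minimal sketch that applies both divisors by hand. It uses only the standard library; the names squared_diffs, sample_sd and population_sd are made up for this illustration.

```python
import statistics

l = [1, 2, 3, 4, 5]
n = len(l)
mean = sum(l) / n

# sum of squared differences from the mean, shared by both formulas
squared_diffs = sum((x - mean) ** 2 for x in l)

# sample standard deviation: divide by n - 1, as stdev() does
sample_sd = (squared_diffs / (n - 1)) ** 0.5

# population standard deviation: divide by n, as pstdev() does
population_sd = (squared_diffs / n) ** 0.5

print('Sample =', sample_sd, 'stdev() =', statistics.stdev(l))
print('Population =', population_sd, 'pstdev() =', statistics.pstdev(l))
```

The only difference between the two calculations is the divisor, which is why the sample value is always at least as large as the population value.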
Method 3: Using numpy
Python’s numpy (shorthand for Numerical Python) library contains mathematical functions to work on large data sets. It has a function std() which takes a data set as an argument and returns its standard deviation. Example,
import numpy as n
l=[1,2,3,4,5]
print('Standard deviation =',n.std(l))
Output is
Standard deviation = 1.4142135623730951
There is a difference between the values returned by stdev() of the statistics module and std() of numpy because stdev() divides by (n - 1) while numpy, by default, divides by n.
Notice that the results of numpy and pstdev() are identical, since they both divide by the total number of list elements.
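If the sample standard deviation is needed from numpy, std() accepts a ddof (delta degrees of freedom) argument; with ddof=1 it divides by (n - 1) instead of n. A short sketch, assuming numpy is installed:

```python
import statistics

import numpy as np

l = [1, 2, 3, 4, 5]

# default ddof=0: divide by n, same as pstdev()
population_sd = np.std(l)

# ddof=1: divide by n - 1, same as stdev()
sample_sd = np.std(l, ddof=1)

print('numpy, ddof=0 =', population_sd)
print('numpy, ddof=1 =', sample_sd)
print('statistics.stdev() =', statistics.stdev(l))
```

With ddof=1, the numpy result should match statistics.stdev() up to floating point rounding.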
Method 4: Using formula
Mathematically, standard deviation is the square root of variance. Variance is calculated using the formula below
S² = Σ (xi − x̄)² / n
where,
xi is the value of an observation or a single list element,
x̄ is the average or mean of list elements,
n is the total number of list elements,
S² is the variance, and
Σ is the summation.
So, if you look carefully at the formula, it subtracts the list average from each list element, squares the result, adds the squares up and divides by the list element count.
This value is the variance. Finally, take the square root of the variance to get the standard deviation.
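As a worked example, applying these steps to the list [1, 2, 3, 4, 5] used throughout this article gives:

```latex
\begin{align*}
\bar{x} &= \frac{1 + 2 + 3 + 4 + 5}{5} = 3 \\
\sum (x_i - \bar{x})^2 &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10 \\
S^2 &= \frac{10}{5} = 2 \\
S &= \sqrt{2} \approx 1.4142
\end{align*}
```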
We can apply this formula in a Python program to calculate the standard deviation of list elements. Example,
l=[1,2,3,4,5]
# calculate average of list
mean = sum(l) / len(l)
# apply formula
variance = sum((x - mean)**2 for x in l) / len(l)
# square root of variance
std_dev = variance ** 0.5
print('Standard deviation =',std_dev)
To calculate the mean or average, Python’s built-in sum() and len() functions are used.
To calculate the variance, we iterate over the list, subtract the mean from each element and square the result. All these operations are performed in the line below
sum((x - mean)**2 for x in l)
This syntax is called a Python generator expression; it looks like a list comprehension without the square brackets.
If you are not familiar with this syntax, then replace it with a Python for loop as shown below.
element_sum = 0
for x in l:
    element_sum += (x-mean)**2
variance = element_sum / len(l)
Divide the sum of squared differences by the length of the list to get the variance. Finally, calculate the standard deviation by taking the square root of the variance, that is, by raising it to the power of 0.5.
Output of this code is
Standard deviation = 1.4142135623730951
Note that this method does not require any external library or module; it is a pure mathematical solution.
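The same steps can be wrapped in a small reusable function. This is only a sketch: the function name std_dev and its sample flag are hypothetical, not part of any library. Passing sample=True switches the divisor from n to (n - 1), which should reproduce the stdev() result from Method 1.

```python
def std_dev(values, sample=False):
    """Return the standard deviation of a list of numbers.

    Hypothetical helper for illustration: divides by n by default,
    or by n - 1 when sample is True.
    """
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError('not enough values to calculate standard deviation')
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / divisor
    return variance ** 0.5


l = [1, 2, 3, 4, 5]
print('Population standard deviation =', std_dev(l))
print('Sample standard deviation =', std_dev(l, sample=True))
```

The explicit check on the divisor avoids a division by zero for an empty list, or for a single-element list when sample=True.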