Web Analytics and Standard Deviations

As a web analyst it can sometimes be difficult to show an effect on even the simplest site metrics. How do you know when a change in visits is significant and not down to simple chance? Take a look at this simple example: is the increase in sessions observed on the 7th September due to the fantastic email marketing campaign you ran that day?

Web analytics data in a spreadsheet: is the change significant?

How do you know when you have actually made a difference? Is the trend you are seeing significant or just random? There are a number of statistical techniques that you can use. One of the simplest is frequently used in manufacturing quality management and is sometimes referred to as statistical process control.

This technique was pioneered by Walter Shewhart in the 1920s and later championed by W. Edwards Deming, and it was widely used in the motor industry to reduce undesirable variation in component manufacturing. Statistical process control uses standard deviations to identify changes in a measurement that lie outside the range expected from random chance.

A table of session data in Microsoft Excel

Standard Deviations

In statistics, the standard deviation is a measure of the dispersion of a collection of values and is defined as the root-mean-square (RMS) deviation of the values from their mean, or as the square root of the variance.

Standard deviation remains the most common measure of statistical dispersion, measuring how widely spread the values in a data set are. If many data points are close to the mean, then the standard deviation is small; if many data points are far from the mean, then the standard deviation is large. If all data values are equal, then the standard deviation is zero.
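To make the definition concrete, here is a minimal Python sketch that computes the standard deviation as the root-mean-square deviation from the mean. The function name population_std and the three small data sets are invented for illustration; they are not the session data from this article.

```python
import math

def population_std(values):
    """Root-mean-square deviation of the values from their mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

# Hypothetical daily session counts, made up for this example.
tight = [100, 101, 99, 100, 100]      # values close to the mean
spread = [60, 140, 80, 120, 100]      # values far from the mean
constant = [100, 100, 100, 100, 100]  # all values equal

for name, data in [("tight", tight), ("spread", spread), ("constant", constant)]:
    print(name, round(population_std(data), 2))
```

This version divides by the number of values (the population form). Python's standard library offers the same calculation as statistics.pstdev, and statistics.stdev gives the sample form, which divides by one less than the number of values.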

If a data distribution is approximately normal then about 68% of the values are within 1 standard deviation of the mean, about 95% of the values are within two standard deviations and about 99.7% lie within 3 standard deviations.
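The 68%, 95% and 99.7% figures can be checked directly. For a normal distribution, the share of values within k standard deviations of the mean is erf(k / sqrt(2)). The sketch below uses only Python's math module; the helper name normal_coverage is an assumption made for this example.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_coverage(k):.4f}")
```

These percentages only hold if the data are roughly normal, so treat them as a guide rather than an exact confidence level for web traffic.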

This theory is easy to apply in Excel. Using the following data as an example, plot sessions against time as a line graph. Once you have created the chart, right-click the line, open the Format Data Series dialogue box and select the Y Error Bars tab. Select "Both" under Display and "Standard deviation" under Error Amount. You can opt for 1, 2 or 3 standard deviations, corresponding to confidence levels of roughly 68%, 95% and 99.7% respectively.

Did Sessions Change Significantly?

The error bars in the chart below display 2 standard deviations, so any data point that lies outside the error bars is significant at roughly the 95% confidence level, assuming the data are approximately normal. Conclusion: the email campaign did significantly affect sessions. It's that easy.
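If you would rather run the same check outside Excel, the rough Python sketch below flags any day whose sessions fall outside the mean plus or minus 2 standard deviations. The session counts are hypothetical numbers invented for this example (not the data in the screenshots), and flag_outliers is an illustrative helper, not a library function.

```python
import statistics

def flag_outliers(sessions, k=2.0):
    """Return (day_index, value) pairs lying outside mean +/- k sample SDs."""
    mean = statistics.mean(sessions)
    sd = statistics.stdev(sessions)  # sample standard deviation (n - 1)
    lower, upper = mean - k * sd, mean + k * sd
    return [(i, v) for i, v in enumerate(sessions) if v < lower or v > upper]

# Hypothetical sessions for 1st to 14th September; index 6 is the 7th.
sessions = [1020, 980, 1005, 990, 1010, 995, 1400,
            1000, 985, 1015, 990, 1005, 1010, 995]

for day_index, value in flag_outliers(sessions):
    print(f"{day_index + 1} September: {value} sessions is outside 2 SD")
```

Note that a large spike also inflates the standard deviation itself, so with only a handful of days a genuine change can hide inside the band. A longer baseline period gives a more reliable result.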

Web analytics data standard deviations indicate a significant increase in sessions