Many statistical tools explained in this guide need a normal distribution for work.
Box-cox helps to transform data from any distribution to a normal distribution. However, remember that Box-Cox only does the transformation and doesn’t test the final result, so after the transformation is essential to do a normality test on the transformed data.

The formula for the Box-Cox transformation is in the image1. This formula works only on a positive value.

In image2, you can find a successive formula adaptation that works even with a negative value. image2- Box cox formula for positive and negative value

Normal probability plot

A way to use the data of a box-cox after their transformation is to plot the transformed data in a regression graph against a therical normal distribution to see if they are normal. It can be possible that they aren’t normal after the first transformation, and maybe more application of the box-cox is needed.

To run a Normal probability plot in excel you need to:

1. Have a column with all the data;
2. Have another column where you assign an index to each data;
3. You use the sort function, with the data and the index function, to get the sort data;
4. You make another column to apply the box-cox transformation on your sort data;
5. Add a column with a function of the normal order statistic median of the observed data (*)
6. Plot the regression graph between the box-cox change and the average order statistic median and look at the shape and the R-squared.

So in excel you have something like the image3:

And if you plot the graph of X and Z, and transformed data and Z you have the result in image4: image4 – Box-Cox regression graph pre and post the transformation

(*) What’s the normal order statistics median?

In formula you have:

• Ni = G(Ui), where U is the uniform order statistics medians is approximated as:
• Ui = 1 – Un    for i = 1
• Ui = (i – 0.3175)/(n + 0.365)    for i = 2, 3, …, n-1
• Ui = 0.5(1/n)    for i = n
• G is the normal percent point function that is the inverse of the normal cumulative distribution. You can easly calculate it with the respective excel formula
Share on: