Many statistical tools explained in this guide need a normal distribution for work.
Box-cox helps to transform data from any distribution to a normal distribution. However, remember that Box-Cox only does the transformation and doesn’t test the final result, so after the transformation is essential to do a normality test on the transformed data.


The formula for the Box-Cox transformation is in the image1. This formula works only on a positive value.

image1 - Box cox formula for positive value
image1 – Box cox formula for positive value

In image2, you can find a successive formula adaptation that works even with a negative value.

image2- Box cox formula for positive and negative value
image2- Box cox formula for positive and negative value

Normal probability plot

A way to use the data of a box-cox after their transformation is to plot the transformed data in a regression graph against a therical normal distribution to see if they are normal. It can be possible that they aren’t normal after the first transformation, and maybe more application of the box-cox is needed.

To run a Normal probability plot in excel you need to:

  1. Have a column with all the data;
  2. Have another column where you assign an index to each data;
  3. You use the sort function, with the data and the index function, to get the sort data;
  4. You make another column to apply the box-cox transformation on your sort data;
  5. Add a column with a function of the normal order statistic median of the observed data (*)
  6. Plot the regression graph between the box-cox change and the average order statistic median and look at the shape and the R-squared.

So in excel you have something like the image3:

image3 - Excel data and transformation
image3 – Excel data and transformation

And if you plot the graph of X and Z, and transformed data and Z you have the result in image4:

image4 - Box-Cox regression graph pre and post the transformation
image4 – Box-Cox regression graph pre and post the transformation

(*) What’s the normal order statistics median?

In formula you have:

  • Ni = G(Ui), where U is the uniform order statistics medians is approximated as:
    • Ui = 1 – Un    for i = 1
    • Ui = (i – 0.3175)/(n + 0.365)    for i = 2, 3, …, n-1
    • Ui = 0.5(1/n)    for i = n
  • G is the normal percent point function that is the inverse of the normal cumulative distribution. You can easly calculate it with the respective excel formula
Share on: