Before defining what a probability distribution and a normal distribution are, we need to define a few other terms.

In probability, we have:

  • Experiment: what we want to test;
  • Sample space: the set of all possible outcomes;
  • Event: a subset of the sample space;
  • Probability function: the function that gives you the probability of a certain event.

When all outcomes are equally likely, the basic probability of an event is the number of outcomes in the event divided by the total number of outcomes in the sample space.

Experiment 1: rolling a six with one six-sided die;
Sample space 1: {1, 2, 3, 4, 5, 6};
So P("roll 6") = 1/6, that is about 0.17.

Experiment 2: rolling a total of six with two six-sided dice;
Sample space 2: {(1,1),(1,2)...(1,6),(2,1),(2,2)...(2,6),...(6,6)}, that is 36 outcomes;

So P("roll 6") = 5/36, that is about 0.14, because the combinations that add up to six with two six-sided dice are (1,5), (2,4), (3,3), (4,2) and (5,1).
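Both experiments can be checked by brute force: list every outcome of the sample space, keep the ones that belong to the event and divide. The sketch below assumes equally likely outcomes, as the classical definition above does; the helper name `probability` is hypothetical, chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """Classical probability: favourable outcomes / total outcomes (equally likely outcomes assumed)."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Experiment 1: one six-sided die
one_die = list(range(1, 7))
p1 = probability(lambda face: face == 6, one_die)

# Experiment 2: two six-sided dice, as ordered pairs (first die, second die)
two_dice = list(product(range(1, 7), repeat=2))
p2 = probability(lambda pair: sum(pair) == 6, two_dice)

print(len(two_dice))             # 36
print(p1, round(float(p1), 3))   # 1/6 0.167
print(p2, round(float(p2), 3))   # 5/36 0.139
```

Ordered pairs matter here: (1,5) and (5,1) are different outcomes, which is why the sample space has 36 elements and the event has 5.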

But how do we use probability in Lean Six Sigma?

First, you need to know that in most cases you can't analyze all the output of a process; instead, you work on a representative sample of it. With this sample you can:

  • Compute descriptive statistics to study and better understand the nature of the data;
  • Use the correct probability distribution to determine the probability of an event;
  • Use the probability of an event to answer questions (like the number of non-conforming units in a process) without analyzing the entire population;

We will see that probability distributions are useful in other chapters too, for example in hypothesis testing.

This chapter is about the normal distribution (also called the bell curve for its shape, or the Gaussian distribution) and about checking whether a distribution is normal. For other distributions, you can read chapter 3.1.2 Classes of Distributions.

Image1 shows how a normal distribution appears when we plot it as a histogram.

Image1: Normal distribution histogram

You can check the normality of this distribution by looking at a few features of the graph:

  • The maximum value is in the middle of the distribution, and this value is also the mean.
  • Moving away from this maximum on either side, the values decrease.
  • In a perfect normal distribution, the two sides of the mean are symmetric, but in reality it's fine even if they are not perfectly symmetric (like in image1).
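The three visual checks can also be turned into numbers: where the tallest bar sits, how close the mean is to the median, and how close the skewness is to zero (zero means perfectly symmetric). The sketch below uses hypothetical data drawn with Python's `random` module (mean 10, standard deviation 2, fixed seed) and an assumed bin width of 1; the helper name `histogram_shape_checks` is also hypothetical. The text gives no thresholds, so deciding what is "close enough" remains a judgment call, not a formal test.

```python
import random
import statistics

BIN_WIDTH = 1.0  # assumed bin width for the text histogram

def histogram_shape_checks(data, bin_width=BIN_WIDTH):
    """Numeric versions of the visual checks: tallest bar, mean vs median, symmetry."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Moment-based skewness: close to 0 when the two sides of the mean are symmetric
    sd = statistics.pstdev(data)
    skewness = sum((x - mean) ** 3 for x in data) / (len(data) * sd ** 3)
    # Count how many values fall in each bin (bin index = floor(x / bin_width))
    counts = {}
    for x in data:
        b = int(x // bin_width)
        counts[b] = counts.get(b, 0) + 1
    tallest = max(counts, key=counts.get)
    return {
        "mean": mean,
        "median": median,
        "skewness": skewness,
        "peak_center": (tallest + 0.5) * bin_width,  # centre of the tallest bar
        "counts": counts,
    }


rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(10, 2) for _ in range(1000)]  # hypothetical data: mean 10, sd 2
result = histogram_shape_checks(sample)

# Text histogram: one '#' for every 5 values in a bin
for b in sorted(result["counts"]):
    print(f"{b * BIN_WIDTH:6.1f} | {'#' * (result['counts'][b] // 5)}")
print(f"mean={result['mean']:.2f}  median={result['median']:.2f}  "
      f"skewness={result['skewness']:.3f}  tallest bar at ~{result['peak_center']:.1f}")
```

For data like these, the tallest bar lands near 10, the mean and median nearly coincide and the skewness is close to 0; a long tail on one side would pull the mean away from the median and push the skewness away from 0.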

You can also test the normality of a distribution with:

  • Chi-square goodness-of-fit hypothesis test, explained in chapter 3.5.8, which applies only to categorical variables;
  • Normal probability plot: a plot of your ordered data against the corresponding quantiles of a normal distribution, with a fitted line; if the data are normal, the points lie close to a straight line;
  • Anderson-Darling goodness-of-fit hypothesis test, based on the statistic shown in image1.1. To learn how to use it, I suggest studying chapters 3.5.x.
image1.1 - Anderson Darling goodness of fit statistic
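Image1.1 is not reproduced in this text, so the following is a minimal sketch of the standard Anderson-Darling statistic for normality, assuming that is the statistic the image shows. With the data sorted as Y_1 ≤ ... ≤ Y_n and F the normal CDF whose mean and standard deviation are estimated from the sample, A² = -n - (1/n) Σ (2i - 1)[ln F(Y_i) + ln(1 - F(Y_(n+1-i)))]. The function name `anderson_darling_normal`, the seeded example samples and the small-sample adjustment factor (1 + 0.75/n + 2.25/n²), commonly used when both parameters are estimated, are illustrative additions, not part of the original text.

```python
import math
import random
import statistics

def anderson_darling_normal(data):
    """Return (A2, A2_adjusted) for the hypothesis that data come from a normal distribution.

    The mean and standard deviation of the normal are estimated from the data itself.
    """
    y = sorted(data)
    n = len(y)
    fitted = statistics.NormalDist(statistics.fmean(y), statistics.stdev(y))
    # F(Y_i): fitted normal CDF at each ordered value, clamped to avoid log(0) at extreme values
    f = [min(max(fitted.cdf(v), 1e-12), 1 - 1e-12) for v in y]
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value Y_i with the i-th largest value Y_(n+1-i)
        total += (2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
    a2 = -n - total / n
    # Small-sample adjustment commonly applied when mean and sd are estimated
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_adjusted


rng = random.Random(1)  # fixed seed so the example is repeatable
normal_sample = [rng.gauss(10, 2) for _ in range(200)]      # hypothetical normal data
skewed_sample = [rng.expovariate(0.5) for _ in range(200)]  # hypothetical right-skewed data

a2_normal, a2_normal_adj = anderson_darling_normal(normal_sample)
a2_skewed, a2_skewed_adj = anderson_darling_normal(skewed_sample)
print(f"normal-looking sample: A2 = {a2_normal:.3f} (adjusted {a2_normal_adj:.3f})")
print(f"skewed sample:         A2 = {a2_skewed:.3f} (adjusted {a2_skewed_adj:.3f})")
```

A larger A² means a worse fit to the normal distribution, so the skewed sample scores much higher. Turning the statistic into a p-value or comparing it with a critical value is a hypothesis-testing step, covered in chapters 3.5.x.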

Other essential properties of the normal distribution are:

  • It can take various bell shapes, depending on the mean (which sets where the bell is centered) and the standard deviation (which sets how wide it is).
  • In a perfect normal distribution, about 68 percent of the data falls within plus or minus one standard deviation of the mean (in our example the mean is 10 and the standard deviation is 2, so this area goes from 8 to 12). About 95 percent of the data falls within plus or minus two standard deviations of the mean, and about 99.7 percent within plus or minus three standard deviations.

You can see these ranges from the second point directly in image2.

Image2: Normal distribution histogram with standard deviation
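The 68/95/99.7 percentages can be reproduced from the normal CDF: the share of a perfect normal distribution between two values is the CDF at the upper value minus the CDF at the lower one. This sketch uses `statistics.NormalDist` from the Python standard library with the example's mean of 10 and standard deviation of 2.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=2)  # the example from the text: mean 10, standard deviation 2

coverage = {}
for k in (1, 2, 3):
    low, high = dist.mean - k * dist.stdev, dist.mean + k * dist.stdev
    # Share of a perfect normal distribution between low and high
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"mean ± {k} sd: {low:g} to {high:g} -> {coverage[k]:.2%}")

# mean ± 1 sd: 8 to 12 -> 68.27%
# mean ± 2 sd: 6 to 14 -> 95.45%
# mean ± 3 sd: 4 to 16 -> 99.73%
```

These are the theoretical values; a real sample only approximates them, which is why the text says "about" 68, 95 and 99.7 percent.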

But what if we want to know what percentage of observations is lower than a specific value, for example 6? We can use the Z-score formula in image3.

Image3: Z-Score formula
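Image3 is not reproduced in this text. Assuming it shows the standard Z-score formula, with x the value of interest, μ the mean and σ the standard deviation, it reads as follows, together with the case worked through in the next paragraph:

```latex
z = \frac{x - \mu}{\sigma},
\qquad
z = \frac{6 - 10}{2} = -2
```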

So in our case we have (6 - 10)/2 = -2. With this value we look it up in a Z-score table (many are available online), which gives 0.02275: an estimated 2.275% of the values are lower than 6. These values fall on the left side of the graph because the Z-score is negative (a positive Z-score falls on the right side of the mean).
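The table lookup can also be done in code. A cumulative Z-score table lists the area to the left of z under the standard normal curve, which is exactly what `NormalDist().cdf(z)` from the Python standard library returns. The helper `z_score` is a hypothetical name used for illustration.

```python
from statistics import NormalDist

def z_score(x, mean, stdev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / stdev

z = z_score(6, mean=10, stdev=2)
share_below = NormalDist().cdf(z)  # standard normal CDF = left-tail value from a cumulative Z-table

print(z)                       # -2.0
print(f"{share_below:.5f}")    # 0.02275
print(f"{share_below:.3%}")    # 2.275%
```

`NormalDist(10, 2).cdf(6)` gives the same result directly, without the standardization step; the Z-score route is shown because it mirrors the table method in the text.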

It is important to know that this kind of distribution works only with parametric variables.

For the exam, you need to remember that:

  • Probability tells you how likely an event is to occur, given a sample space;
  • The normal distribution is a probability distribution that helps you find the probability of an event, but only if the data actually come from this kind of distribution;
  • To assess normality with a histogram, the highest value must be in the middle, and the values must decrease moving away from the middle;
  • The Z-score, looked up in a Z-score table, tells you what percentage of values falls below a given value.