Applied statistics matters in Lean Six Sigma because we can use it directly in our projects. There are two kinds of applied statistics:

• Descriptive: Used to organize, summarize, and present data;
• Inferential: Used to draw conclusions about a population based on data from a sample.

A useful rule of thumb: when you have all the data (perhaps because there are only a few items), you can work on the entire population. When you don't know the whole population, or it is too big or takes too much effort to measure, you use a representative sample of the population instead.

In this chapter, we will focus on the formulas of descriptive statistics. Other chapters can look at the graphical side of descriptive statistics and at inferential statistics (such as hypothesis tests).

Means

The simple arithmetic mean is the sum of all the values divided by the number of values. It is the simplest measure of central tendency. The math formula is in image1.
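As a small illustration of the definition above, here is a minimal Python sketch. The `cycle_times` values are hypothetical data invented for the example; `statistics.mean` is the standard-library function that performs the same calculation.

```python
from statistics import mean

# Hypothetical cycle times in minutes (illustrative data, not from the text)
cycle_times = [12, 15, 11, 14, 13]

# Arithmetic mean: sum of all the values divided by the number of values
manual_mean = sum(cycle_times) / len(cycle_times)

print(manual_mean)        # computed by hand
print(mean(cycle_times))  # same result from the standard library
```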

You can also use a weighted mean if the values don't all have the same importance. In image2, you can see the math formula.
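A weighted mean can be sketched in a few lines of Python. The `scores` and `weights` below are hypothetical; the calculation assumes the usual definition, where each value is multiplied by its weight and the total is divided by the sum of the weights.

```python
# Hypothetical supplier scores and their importance weights (illustrative only)
scores = [80, 90, 70]
weights = [5, 3, 2]

# Weighted mean: sum of value * weight, divided by the sum of the weights
weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Plain arithmetic mean of the same scores, for comparison
simple_mean = sum(scores) / len(scores)

print(weighted_mean)
print(simple_mean)
```

With equal weights, the weighted mean reduces to the simple arithmetic mean.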

Another kind of mean is the geometric mean, which you can calculate by:

• Taking the product of all the numbers;
• Raising that product to the power 1/n, where n is the number of values.

In image3 you can see the math formula.

This kind of mean becomes useful when your numbers are percentage changes.

```Example: in year0 you have an income of 100€; then in year1 you have +100%, in year2 +200%, and in year3 -90%. If you take the arithmetic mean of the percentages you get (100% + 200% - 90%) / 3 = 70%, which suggests you gained 70% every year.
But look more closely at the actual values:
Year0 = 100€
Year1 = +100% = 200€
Year2 = +200% = 600€
Year3 = -90% = 60€

So you lost 40% of your starting income instead of gaining 70% per year.

If we apply the geometric mean, we first write the changes as decimals:
Year1 = +100% = 1
Year2 = +200% = 2
Year3 = -90% = -0.9

Because one of the numbers is negative, and the formula needs all values to be positive, we add one to each number to get the growth factors:
Year1 = 2
Year2 = 3
Year3 = 0.1

So we have 2 * 3 * 0.1 = 0.6.
Raising this to the power 1/3 gives about 0.843.
To get the average change per year, we compute (0.843 - 1) * 100, which gives about -15.7%: the equivalent yearly loss for each of the three years.```
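The worked example can be checked with a short Python sketch. The variable names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
from math import prod

start_income = 100.0
# Yearly changes from the example, as decimals: +100%, +200%, -90%
yearly_changes = [1.0, 2.0, -0.9]

# Add one to each change to get positive growth factors (2, 3, 0.1)
factors = [1 + change for change in yearly_changes]
n = len(factors)

# Geometric mean: product of the factors raised to the power 1/n
geo_mean = prod(factors) ** (1 / n)

# Equivalent constant change per year, in percent
avg_yearly_change = (geo_mean - 1) * 100

# Applying the geometric mean n times reproduces the real final income
final_income = start_income * geo_mean ** n

print(round(geo_mean, 3))
print(round(avg_yearly_change, 1))
print(round(final_income, 2))
```

Applying the same average factor for three years brings the income back to 60€, which the arithmetic mean of the percentages fails to do.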

The mean is a useful tool for summarizing parametric variables.

Variance

The variance measures the dispersion of a variable's values: it is the average of the squared distances of each value from the mean. In image4 you can see the math formula.

The variance is a useful tool for summarizing parametric variables.

Closely related to the variance is the standard deviation, which is the square root of the variance.
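A minimal Python sketch of both measures follows, using hypothetical data. It assumes the population form of the variance (dividing by n); the text does not say whether the formula in the image divides by n or by n - 1, so the sample form is shown as well for comparison.

```python
from statistics import pstdev, pvariance, variance

# Hypothetical measurements (illustrative data, not from the text)
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_mean = sum(data) / len(data)

# Population variance: average squared distance from the mean (divide by n)
pop_variance = sum((x - data_mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance
pop_std_dev = pop_variance ** 0.5

# Sample variance divides by n - 1 instead of n
sample_variance = variance(data)

print(pop_variance, pvariance(data))
print(pop_std_dev, pstdev(data))
print(sample_variance)
```

The standard deviation is often easier to interpret than the variance because it is expressed in the same unit as the data.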

Mode

The mode is the value with the highest number of observations. It's another measure of central tendency, and it helps summarize all kinds of variables (parametric, ordinal, nominal, and categorical).

`Example: if we observe the values 2, 3, 3, 3, 4, 5 for a variable, the mode is 3. If we observe yellow, blue, blue, blue, red, blue, the mode is blue.`
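The two cases in the example can be reproduced with Python's standard `statistics` module. The `tied` list is a hypothetical addition to show what happens when two values are equally frequent; `multimode` assumes Python 3.8 or later.

```python
from statistics import mode, multimode

numbers = [2, 3, 3, 3, 4, 5]
colors = ["yellow", "blue", "blue", "blue", "red", "blue"]

# The mode works on numeric and non-numeric (nominal) data alike
number_mode = mode(numbers)
color_mode = mode(colors)

# Hypothetical data with a tie: multimode returns every most-frequent value
tied = [1, 1, 2, 2, 3]
tied_modes = multimode(tied)

print(number_mode)
print(color_mode)
print(tied_modes)
```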

Median

The median is the middle value when all the observations are sorted from low to high. It's another measure of central tendency, and it helps summarize parametric and ordinal variables.

`Example: if we observe 4, 2, and 7 for a variable, we sort them to get 2, 4, and 7. The median is 4.`
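The same example can be sketched in Python. The even-sized list is a hypothetical addition: it assumes the usual convention, also followed by `statistics.median`, of averaging the two middle values when the number of observations is even.

```python
from statistics import median

observations = [4, 2, 7]

# Sort from low to high, then take the middle value
ordered = sorted(observations)
middle_value = ordered[len(ordered) // 2]

# The standard library sorts the data internally
odd_median = median(observations)

# Hypothetical even-sized data: the two middle values (4 and 7) are averaged
even_median = median([4, 2, 7, 10])

print(ordered, middle_value)
print(odd_median)
print(even_median)
```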

For the exam, you need to remember that:

• Mean, Mode, and Median are all measures of central tendency;
• You can apply the mode to all kinds of variables; the mean is only for parametric ones, and the median is for parametric and ordinal variables;
• Variance measures the dispersion from the mean;
• You need to remember how to apply the formulas.
