Understand Hypothesis testing with real-life examples


If you are learning or working in the field of data science, you should have a clear understanding of what hypothesis testing and p-values are in statistics. Once you understand those two things, basic machine learning algorithms like linear regression and logistic regression become much easier to follow. In this tutorial I will explain hypothesis testing with real-life examples.

Many people are confused about hypothesis testing and the p-value. In this tutorial, I will explain what hypothesis testing is in statistics with a very simple, real-life example.

What is a Hypothesis?

Hypothesis testing is a statistical method to evaluate or investigate the validity of a claim or assumption about something based on sample data.

In simple English, a hypothesis is a guess. A guess can be anything: with a ladder tall enough, I could touch the sky; if I had wings, I could fly to the moon; if I had a time machine, I could travel to any point in history; and so on.

Some guesses are pure imagination; you cannot prove or investigate them with anything. When we can investigate or test a guess with data, that test is called hypothesis testing in statistics.

Types of Hypotheses

At this point, we know what hypothesis testing is and what it is used for. There are two types of hypotheses:

  1. Null Hypothesis (H0)
  2. Alternative Hypothesis (H1)

Null Hypothesis

In most programming languages, null denotes empty or zero. In statistics, the null hypothesis is the default hypothesis: the hypothesis or guess that is already established for a given question.

Why is it called “null”? We call it the null hypothesis because the goal of the test is to try to nullify the existing theory with something else. That is why we perform the test in the first place: if we did not want to prove something new, why would we put any effort into testing at all?

In statistics, the null hypothesis is denoted as H0.

Alternative Hypothesis

The alternative hypothesis (H1) is the opposite of the null hypothesis. It is the hypothesis that we want to, or are trying to, prove.

For example, people believed that the Sun revolved around the Earth until Galileo, an Italian astronomer, showed through his observations that the Earth actually revolves around the Sun. This discovery challenged geocentrism, the widely accepted belief that had stood for centuries.

Here the previously believed theory (the Sun revolves around the Earth) is the null hypothesis, and the theory Galileo wanted to prove (the Earth revolves around the Sun) is the alternative hypothesis. The tests and observations Galileo performed can be thought of as hypothesis testing.

In statistics, the alternative hypothesis is denoted as H1.

Hypothesis testing example

Let me now give you a real-life example where you can apply hypothesis testing as a data science professional. Say a candy manufacturing company believes that its candy machine makes chocolate bars weighing 5 grams each on average. The plant has been making chocolate bars for 10 years, when suddenly a worker claims that the machine no longer makes 5-gram chocolate bars.


So is the worker wrong, or is the machine not functioning properly? To answer that question, the owner of the candy company should perform hypothesis testing.

Here the null hypothesis is the statement “The machine is functioning correctly and makes each chocolate bar 5 grams”, and the alternative hypothesis is the worker’s statement “The machine no longer makes 5-gram chocolate bars”.

If we want to write the above statements in math, we can write them like this:

H0: μ = 5 grams (the machine still makes 5-gram chocolate bars)
H1: μ ≠ 5 grams (the machine no longer makes 5-gram chocolate bars)

Possible outcomes of Hypothesis testing

There are only two possible outcomes of a hypothesis test in statistics:

  • Reject the null hypothesis (H0) – the evidence supports the alternative hypothesis (H1); in our example, the machine is no longer producing 5-gram chocolate bars
  • Fail to reject the null hypothesis – there is not enough evidence against H0; in our example, the machine is still producing chocolate bars of about 5 grams each

Application of Hypothesis Testing

As in the example above, there are many real-life problems where we can apply hypothesis testing. Let’s look at some other important areas:

  1. Medicine: Hypothesis testing is used to determine the effectiveness of new treatments or drugs.
  2. Marketing: You can evaluate the effectiveness of advertising campaigns or new product launches.
  3. Quality Control: Find out whether a manufacturing process is producing products within acceptable quality limits. This is the example I explained above.
  4. Environmental Science: You can test the effects of pollutants on ecosystems using hypothesis testing.

Terminologies to know

Before calculating a hypothesis test, you should know some statistical terminology. Let’s go through the key terms.

Statistical Significance

Statistical significance describes whether a result from a statistical test or analysis is likely to be real or just due to chance. It is the line we draw to make a decision, and that decision is whether or not we reject the null hypothesis.

Level of Confidence

The level of confidence (c) expresses how confident we are in the decision to reject, or fail to reject, the null hypothesis. It is measured as a probability. When you are trying to prove something new (rejecting the null hypothesis), you need high confidence in your test so that others can trust your new theory. A commonly used confidence level is 95% or higher.

Level of Significance

The level of significance is a probability value used in hypothesis testing to decide whether the results of a statistical test are statistically significant. It is the threshold against which the p-value is compared (I will explain the p-value in a later post). The equation for the level of significance is as follows:

Level of Significance = 1 – Level of confidence

Now if, for some test, the level of confidence = 95% (that is, 0.95),

Then the level of significance = 1 – 0.95 = 0.05.

Similarly, if LOC = 99% then LOS = 0.01.

You can see that the sum of the level of significance and the level of confidence is always 1 (LOS + LOC = 1).

A good level of significance is less than 0.05 (which corresponds to a level of confidence greater than 95%).

Level of Confidence vs Level of Significance

The level of confidence and the level of significance carry the same information, just expressed in complementary forms (LOS = 1 – LOC). Different problems or research papers may quote one or the other, but both tell you the same thing: how sure you need to be before making a statistical decision.

Degrees of Freedom

In statistics, degrees of freedom (df) is the number of values in a sample that are free to vary after imposing certain restrictions or constraints.

In simple words, Degrees of freedom represents the number of independent observations or data points that can be used to estimate statistical parameters or test hypotheses. The more degrees of freedom available, the more reliable and accurate the statistical analysis is likely to be.

The equation for degrees of freedom is as follows:

DF = sample size (n) – 1

Hypothesis Testing Formula

Before we calculate a hypothesis test by hand from scratch, we should know the test equations. There are mainly two ways to calculate the test statistic: the t-test and the z-test.

t-test equation

t-test equation is: t = ( x̅ – μ0 ) / (s / √n)

  • x̅ is the sample mean
  • μ0 is the already established value (you can say the null hypothesis value)
  • s is the sample standard deviation
  • n is the sample size
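To make the t-test formula concrete, here is a minimal Python sketch of the same calculation; the function name `t_statistic` is my own choice for illustration, not something from a particular library.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Compute t = (x̅ - μ0) / (s / √n) for a one-sample t-test."""
    n = len(sample)
    x_bar = mean(sample)   # sample mean x̅
    s = stdev(sample)      # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / sqrt(n))
```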

z-test equation

z-test equation is: Z = ( x̅ – μ0 ) / (σ / √n)

  • x̅ is the sample mean
  • μ0 is the already established value (you can say the null hypothesis value)
  • σ is the population standard deviation
  • n is the sample size
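The z-statistic has the same shape, except that the population standard deviation σ is assumed to be known and passed in directly. Again, this is only an illustrative sketch with names of my own choosing.

```python
from math import sqrt
from statistics import mean

def z_statistic(sample, mu0, sigma):
    """Compute Z = (x̅ - μ0) / (σ / √n), where σ is the known population standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (sigma / sqrt(n))
```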

Difference between sample SD and population SD

For t-test, I mentioned sample standard deviation and for z-test, I mentioned population standard deviation. Now, what is the difference between sample SD and population SD?

The main difference between sample standard deviation (SD) and population standard deviation (SD) is the data set they are calculated from.

Sample standard deviation (SD) is calculated from a subset of the population data called a sample. A sample is a smaller group of data selected from the entire population for statistical analysis.

On the other hand, population standard deviation (SD) is calculated from the entire population, i.e. the entire data set.
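This distinction shows up directly in Python’s standard library; here is a quick sketch with a small made-up data set.

```python
from statistics import stdev, pstdev

data = [5.2, 4.8, 5.0, 4.9, 5.2]   # a small made-up data set

print(stdev(data))    # sample SD: divides by n - 1, like Excel's STDEV.S
print(pstdev(data))   # population SD: divides by n, like Excel's STDEV.P
```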

Z-Test vs. T-Test: When to Use Which?

The z-test and t-test are both statistical tests used to carry out hypothesis testing. The z-test is used when the population standard deviation is known and the sample size is large (typically n > 30).

On the other hand, the t-test is used when the population standard deviation is unknown and the sample size is small (typically n < 30).


In general, if the sample size is large and the population standard deviation is known, the z-test is preferred. If the sample size is small and the population standard deviation is unknown, the t-test is more appropriate (you use the sample standard deviation instead).
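That rule of thumb can be written down as a tiny helper; this is only a sketch of the guideline above (the function name is mine), not a hard statistical rule.

```python
def choose_test(n, population_sd_known):
    """Rule of thumb: z-test for a large sample with known σ, otherwise t-test."""
    if population_sd_known and n > 30:
        return "z-test"
    return "t-test"

print(choose_test(n=10, population_sd_known=False))   # 't-test', as in the candy example
```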

How to calculate Hypothesis Testing in Excel

So we understand what a hypothesis is. Now, how do we actually calculate the test, and how do we decide whether or not to reject the null hypothesis? For that, you need to perform a statistical test using some sample data.

Data Collection

Coming back to our main example of the candy factory, where a worker suddenly claimed that “the candy machine no longer makes 5-gram chocolate bars”.

Now, this is a big and scary statement for the owner of that factory. So the owner assigned a quality inspector to inspect this.

The quality inspector checked the weight of ten chocolate bars at random. I say “at random” because random sampling keeps the sample representative and the observations independent (the t-test also assumes the underlying weights are roughly normally distributed). The inspector can introduce this randomness in various ways, for example by checking chocolate weights at different times of the day or week, or while different workers are operating the machine.

Now let’s say the quality inspector collected the following ten random samples:

| Candy Number | Candy Weight (in grams) |
|---|---|
| 1 | 5.2 |
| 2 | 4.8 |
| 3 | 5.0 |
| 4 | 4.9 |
| 5 | 5.2 |
| 6 | 4.7 |
| 7 | 5.0 |
| 8 | 4.8 |
| 9 | 4.9 |
| 10 | 5.1 |

Selecting Test Type

As I explained earlier, since the sample size is small and we do not know the population standard deviation, we will use the t-test to calculate the hypothesis test for this problem.

Note: For almost all real-life problems we use the t-test, because we usually lack complete information about the population (the population standard deviation is unknown).

Calculate required Parameters

So, to apply the t-test, we now need to calculate the sample standard deviation for our example. You can do this easily in Excel, where there are mainly two formulas for standard deviation:

  • STDEV.S => to calculate the sample standard deviation
  • STDEV.P => to calculate the population standard deviation

Using the STDEV.S formula in Excel, we find that our sample standard deviation is 0.171.

And our sample mean (x̅) = (5.2 + 4.8 + 5 + 4.9 + 5.2 + 4.7 + 5 + 4.8 + 4.9 + 5.1) / 10 = 4.96
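If you want to cross-check the Excel results in Python, a minimal sketch with the ten sampled weights gives the same numbers:

```python
from statistics import mean, stdev

# The ten chocolate bar weights collected by the quality inspector
weights = [5.2, 4.8, 5.0, 4.9, 5.2, 4.7, 5.0, 4.8, 4.9, 5.1]

x_bar = mean(weights)   # sample mean -> 4.96
s = stdev(weights)      # sample standard deviation (STDEV.S equivalent) -> ~0.171

print(round(x_bar, 2), round(s, 3))
```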

So let’s now list all the information required to calculate the hypothesis test for our problem (assessing the worker’s claim):

  • Already proven value (μ0) = 5 grams
  • Sample standard deviation (s) = 0.171
  • Number of samples collected (n) = 10
  • Degrees of Freedom (DF) = (n-1) = 9
  • Sample mean (x̅) = 4.96

Assuming that the company’s claim of each chocolate bar weighing 5 grams is true, we need to test which of the following holds:

  • Either H0: μ = μ0 (the null hypothesis holds – we fail to reject it – the worker’s claim is not supported)
  • Or H1: μ ≠ μ0 (we reject the null hypothesis – the worker’s claim is supported)

Draw distribution graph

So our sample mean is 4.96 and our sample standard deviation is 0.171. Now draw the distribution of the possible values of our sample mean, given that the population mean is 5 (the established weight of a chocolate bar, 5 grams each). Assuming the null hypothesis is true, the distribution would look like this:

[Figure: bell-shaped sampling distribution of the sample mean under the null hypothesis, centred at 5 grams]
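If you want to draw this curve yourself, here is a rough matplotlib sketch; it assumes we plot the sampling distribution of the mean as a normal curve centred at μ0 = 5 with standard error s/√n ≈ 0.054 (variable names are my own).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu0, s, n = 5.0, 0.171, 10
se = s / np.sqrt(n)   # standard error of the sample mean, ~0.054

x = np.linspace(mu0 - 4 * se, mu0 + 4 * se, 400)
plt.plot(x, norm.pdf(x, loc=mu0, scale=se), label="sampling distribution under H0")
plt.axvline(4.96, color="red", linestyle="--", label="observed sample mean = 4.96")
plt.legend()
plt.show()
```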

So this hypothesis test is essentially asking how extreme the value 4.96 (our sample mean x̅) is if the null hypothesis is true. Is it extreme enough for us to reject the null hypothesis, or close enough to the population mean (the established value 5) that we fail to reject it? To answer this question, we need to calculate a test statistic using all the known information.

Apply T test

Applying the T-test formula:

t = ( x̅ – μ0 ) / (s / √n)

t = (4.96 – 5) / (0.171 / √10) = -0.04 / 0.054 = -0.74074
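The same number falls out of scipy’s one-sample t-test, which is a handy cross-check (note that `ttest_1samp` reports a two-sided p-value):

```python
from scipy import stats

weights = [5.2, 4.8, 5.0, 4.9, 5.2, 4.7, 5.0, 4.8, 4.9, 5.1]

t_stat, p_two_sided = stats.ttest_1samp(weights, popmean=5.0)
print(round(t_stat, 3))   # about -0.74; the tiny difference from -0.74074 comes from rounding s to 0.171 by hand
```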

Decision Rule

So we have our t score, which is -0.74074. The next thing we need is a decision rule. Like the normal distribution, the t-distribution has a bell-shaped curve; the difference is that the t-distribution is defined by its number of degrees of freedom. In our example, the degrees of freedom (DF) is 9.

So, with 9 degrees of freedom and a 95% level of confidence (0.05 level of significance), we can reject the null hypothesis if our calculated t score lies in the green rejection region on the negative side of the curve below.

[Figure: t-distribution bell curve for hypothesis testing with a 0.05 level of significance, rejection region shaded on the negative side]

Output of Hypothesis Testing

Now, from the t critical values table, for degrees of freedom = 9 and a level of significance of 0.05 (one tail), the critical value is 1.833. Since our test is in the negative direction, we use -1.833: we can reject the null hypothesis only if our calculated t score is less than -1.833 (i.e. more extreme in the negative direction).


Note: This is a one-tailed test, since we are only looking in one direction from the expected value (the negative direction, because our sample mean is below 5). If we were looking at both tails of the bell curve, it would be a two-tailed test. (Strictly, H1: μ ≠ μ0 calls for a two-tailed test, but the calculation here follows the one-tailed direction of the observed difference.)
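If you prefer, the critical value can also be pulled from scipy instead of the printed table below; a minimal sketch, assuming a one-tailed test at the 0.05 level with 9 degrees of freedom:

```python
from scipy import stats

alpha, df = 0.05, 9
t_critical = stats.t.ppf(alpha, df)   # lower-tail critical value, ~ -1.833

t_score = -0.74074
print(round(t_critical, 3))
print("reject H0" if t_score < t_critical else "fail to reject H0")
```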

Below is the T Critical values table for your reference.

T distribution table

| Conf. Level | 50% | 80% | 90% | 95% | 98% | 99% |
|---|---|---|---|---|---|---|
| One Tail | 0.25 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
| Two Tail | 0.50 | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 |
| df = 1 | 1.000 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
| 2 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
| 3 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
| 4 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
| 5 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
| 6 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
| 7 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
| 8 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
| 9 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
| 10 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
| 11 | 0.697 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 |
| 12 | 0.695 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 |
| 13 | 0.694 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 |
| 14 | 0.692 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 |
| 15 | 0.691 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 |
| 16 | 0.690 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 |
| 17 | 0.689 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 |
| 18 | 0.688 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 |
| 19 | 0.688 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 |
| 20 | 0.687 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 |
| 21 | 0.686 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 |
| 22 | 0.686 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 |
| 23 | 0.685 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 |
| 24 | 0.685 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 |
| 25 | 0.684 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 |
| 26 | 0.684 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 |
| 27 | 0.684 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771 |
| 28 | 0.683 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763 |
| 29 | 0.683 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 |
| 30 | 0.683 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 |
| 40 | 0.681 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 |
| 50 | 0.679 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 |
| 60 | 0.679 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 |
| 70 | 0.678 | 1.294 | 1.667 | 1.994 | 2.381 | 2.648 |
| 80 | 0.678 | 1.292 | 1.664 | 1.990 | 2.374 | 2.639 |
| 90 | 0.677 | 1.291 | 1.662 | 1.987 | 2.368 | 2.632 |
| 100 | 0.677 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 |
| z | 0.674 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 |

The calculated t score for our example is -0.74074 > -1.833, therefore we fail to reject the null hypothesis. That means there is not enough evidence to support the worker’s claim: the candy machine is working properly and is still producing chocolate bars of around 5 grams each.
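Equivalently, you can reach the same conclusion with a p-value; here is a minimal sketch using scipy (the one-tailed p-value works out to roughly 0.24, well above 0.05):

```python
from scipy import stats

t_score, df = -0.74074, 9
p_one_tailed = stats.t.cdf(t_score, df)   # probability of a t value this low or lower, ~0.24

print(round(p_one_tailed, 2))
print("fail to reject H0" if p_one_tailed > 0.05 else "reject H0")
```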

Graph interpretation

We can see this interpretation in the t-distribution bell curve below:

[Figure: t-distribution curve showing the calculated t score falling outside the rejection region, so we fail to reject the null hypothesis]

End Note

In statistics, we should always test a hypothesis to make a proper judgment. Now there are two types of hypotheses. The first one is the Null Hypothesis, which is an already established concept. The second one is the Alternative Hypothesis, which is a new concept that we are trying to establish by proving the Null Hypothesis wrong.

We make these decisions (whether or not to reject the null hypothesis) based on the level of confidence or the level of significance. Good values for these parameters are a LOC above 95%, or equivalently a LOS below 0.05.

In this post, I tried to explain hypothesis testing with simple, real-life examples. Please let me know if you have any questions or suggestions regarding this tutorial.
