Writing a hypothesis is one of the essential elements of a scientific research paper. It needs to be to the point, clearly communicating what your research is trying to accomplish. A blurry, drawn-out, or complexly structured hypothesis can confuse your readers, or worse, the editor and peer reviewers.
A captivating hypothesis is not too intricate. This blog will take you through the process so that, by the end of it, you have a better idea of how to convey your research paper's intent in just one sentence.
The first step in your scientific endeavor, a hypothesis, is a strong, concise statement that forms the basis of your research. It is not the same as a thesis statement, which is a brief summary of your research paper.
The sole purpose of a hypothesis is to predict your paper's findings, data, and conclusion. It comes from a place of curiosity and intuition. When you write a hypothesis, you're essentially making an educated guess based on prior knowledge and evidence, which is then proven or disproven through the scientific method.
The reason for undertaking research is to observe a specific phenomenon. A hypothesis, therefore, lays out what the said phenomenon is. And it does so through two variables, an independent and dependent variable.
The independent variable is the cause behind the observation, while the dependent variable is the effect of the cause. A good example of this is “mixing red and blue forms purple.” In this hypothesis, mixing red and blue is the independent variable as you're combining the two colors at your own will. The formation of purple is the dependent variable as, in this case, it is conditional to the independent variable.
Types of hypotheses
Some would stand by the notion that there are only two types of hypotheses: a null hypothesis and an alternative hypothesis. While that has some truth to it, it is worth distinguishing the most common forms, since these terms come up often and can otherwise leave you without context.
Apart from null and alternative, there are complex, simple, directional, non-directional, statistical, and associative and causal hypotheses. They don't necessarily have to be exclusive, as one hypothesis can tick many boxes, but knowing the distinctions between them will make it easier for you to construct your own.
A null hypothesis proposes no relationship between two variables. Denoted by H0, it is a negative statement like “Attending physiotherapy sessions does not affect athletes' on-field performance.” Here, the author claims physiotherapy sessions have no effect on on-field performance; any effect observed is attributed to coincidence.
Considered to be the opposite of a null hypothesis, an alternative hypothesis is denoted as H1 or Ha. It explicitly states that the independent variable affects the dependent variable. A good alternative hypothesis example is “Attending physiotherapy sessions improves athletes' on-field performance” or “Water evaporates at 100 °C.” The alternative hypothesis further branches into directional and non-directional.
A simple hypothesis is a statement made to reflect the relationship between exactly two variables: one independent and one dependent. Consider the example, “Smoking is a prominent cause of lung cancer.” The dependent variable, lung cancer, is dependent on the independent variable, smoking.
In contrast to a simple hypothesis, a complex hypothesis implies the relationship between multiple independent and dependent variables. For instance, “Individuals who eat more fruits tend to have higher immunity, lesser cholesterol, and high metabolism.” The independent variable is eating more fruits, while the dependent variables are higher immunity, lesser cholesterol, and high metabolism.
Associative and causal hypotheses don't specify how many variables are involved; they define the relationship between the variables. In an associative hypothesis, changing any one variable, dependent or independent, affects the others. In a causal hypothesis, the independent variable directly affects the dependent one.
Also referred to as the working hypothesis, an empirical hypothesis claims a theory's validation via experiments and observation. This way, the statement appears justifiable and different from a wild guess.
Say, the hypothesis is “Women who take iron tablets face a lesser risk of anemia than those who take vitamin B12.” This is an example of an empirical hypothesis where the researcher makes the statement after assessing a group of women who take iron tablets and charting the findings.
The point of a statistical hypothesis is to test an already existing hypothesis by studying a population sample. A hypothesis like “44% of the Indian population belongs to the age group of 22-27” leverages evidence to prove or disprove a particular statement.
Writing a hypothesis is essential as it can make or break your research for you. That includes your chances of getting published in a journal. So when you're designing one, keep an eye out for these pointers:
Outside of academia, hypothesis and prediction are often used interchangeably. In research writing, this is not only confusing but also incorrect. And although a hypothesis and prediction are guesses at their core, there are many differences between them.
A hypothesis is an educated guess or even a testable prediction validated through research. It aims to analyze the gathered evidence and facts to define a relationship between variables and put forth a logical explanation behind the nature of events.
Predictions are assumptions or expected outcomes made without any backing evidence. They are more speculative, regardless of where they originate.
For this reason, a hypothesis holds much more weight than a prediction. It sticks to the scientific method rather than pure guesswork. "Planets revolve around the Sun." is an example of a hypothesis as it is based on previous knowledge and observed trends. Additionally, we can test it through the scientific method.
Whereas "COVID-19 will be eradicated by 2030." is a prediction. Even though it results from past trends, we can't prove or disprove it. So, the only way this gets validated is to wait and watch if COVID-19 cases end by 2030.
Quick tips on writing a hypothesis
A hypothesis should instantly address the research question or the problem statement. To do so, you need to ask a question. Understand the constraints of your undertaken research topic and then formulate a simple and topic-centric problem. Only after that can you develop a hypothesis and further test for evidence.
Once you have your research's foundation laid out, it would be best to conduct preliminary research. Go through previous theories, academic papers, data, and experiments before you start curating your research hypothesis. It will give you an idea of your hypothesis's viability or originality.
Making use of references from relevant research papers helps draft a good research hypothesis. SciSpace Discover offers a repository of over 270 million research papers to browse through and gain a deeper understanding of related studies on a particular topic. Additionally, you can use SciSpace Copilot, your AI research assistant, to read any lengthy research paper and get a more summarized context of it. A hypothesis can be formed after evaluating many such summarized papers. Copilot also offers explanations for theories and equations, explains papers in a simplified form, allows you to highlight any text in the paper or clip math equations and tables, and provides a deeper, clearer understanding of what is being said. This can improve your hypothesis by helping you identify potential research gaps.
Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a correlation between them. The ideal way to do this is to write the hypothetical assumption in the ‘if-then' form. If you use this form, make sure that you state the predefined relationship between the variables.
In another way, you can choose to present your hypothesis as a comparison between two variables. Here, you must specify the difference you expect to observe in the results.
Now that everything is in place, it's time to write your hypothesis. For starters, create the first draft. In this version, write what you expect to find from your research.
Clearly separate your independent and dependent variables and the link between them. Don't fixate on syntax at this stage. The goal is to ensure your hypothesis addresses the issue.
After preparing the first draft of your hypothesis, you need to inspect it thoroughly. It should tick all the boxes, like being concise, straightforward, relevant, and accurate. Your final hypothesis has to be well-structured as well.
Research projects are an exciting and crucial part of being a scholar. And once you have your research question, you need a great hypothesis to begin conducting research. Thus, knowing how to write a hypothesis is very important.
Now that you have a firmer grasp on what a good hypothesis constitutes, the different kinds there are, and what process to follow, you will find it much easier to write your hypothesis, which ultimately helps your research.
Now it's easier than ever to streamline your research workflow with SciSpace Discover . Its integrated, comprehensive end-to-end platform for research allows scholars to easily discover, write and publish their research and fosters collaboration.
It includes everything you need, including a repository of over 270 million research papers across disciplines, SEO-optimized summaries and public profiles to show your expertise and experience.
If you found these tips on writing a research hypothesis useful, head over to our blog on Statistical Hypothesis Testing to learn about the top researchers, papers, and institutions in this domain.
1. What is the definition of a hypothesis?
According to the Oxford dictionary, a hypothesis is defined as “An idea or explanation of something that is based on a few known facts, but that has not yet been proved to be true or correct”.
The hypothesis is a statement that proposes a relationship between two or more variables. An example: "If we increase the number of new users who join our platform by 25%, then we will see an increase in revenue."
A null hypothesis is a statement that there is no relationship between two variables. The null hypothesis is written as H0. The null hypothesis states that there is no effect. For example, if you're studying whether or not a particular type of exercise increases strength, your null hypothesis will be "there is no difference in strength between people who exercise and people who don't."
• Fundamental research
• Applied research
• Qualitative research
• Quantitative research
• Mixed research
• Exploratory research
• Longitudinal research
• Cross-sectional research
• Field research
• Laboratory research
• Fixed research
• Flexible research
• Action research
• Policy research
• Classification research
• Comparative research
• Causal research
• Inductive research
• Deductive research
• Your hypothesis should be able to predict the relationship and outcome.
• Avoid wordiness by keeping it simple and brief.
• Your hypothesis should contain observable and testable outcomes.
• Your hypothesis should be relevant to the research question.
• Null hypotheses are used to test the claim that "there is no difference between two groups of data".
• Alternative hypotheses test the claim that "there is a difference between two data groups".
A research question is a broad, open-ended question you will try to answer through your research. A hypothesis is a statement, based on prior research or theory, that you expect your study to support. Example - Research question: What are the factors that influence the adoption of the new technology? Research hypothesis: There is a positive relationship between age, education and income level with the adoption of the new technology.
The plural of hypothesis is hypotheses. Here's an example of how it would be used in a statement, "Numerous well-considered hypotheses are presented in this part, and they are supported by tables and figures that are well-illustrated."
The red queen hypothesis in evolutionary biology states that species must constantly evolve to avoid extinction because if they don't, they will be outcompeted by other species that are evolving. Leigh Van Valen first proposed it in 1973; since then, it has been tested and substantiated many times.
The father of the null hypothesis is Sir Ronald Fisher. He published a paper in 1925 that introduced the concept of null hypothesis testing, and he was also the first to use the term itself.
You need to find a significant difference between your two populations to reject the null hypothesis. You can determine that by running statistical tests such as an independent sample t-test or a dependent sample t-test. You should reject the null hypothesis if the p-value is less than 0.05.
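As a sketch of the t-tests mentioned above, here is how a comparison of two groups might look with SciPy; the strength scores below are invented for illustration.

```python
# Hypothetical sketch: comparing two sample means with SciPy's t-tests.
# The data are made-up strength scores for the exercise example.
from scipy import stats

exercisers = [31, 34, 29, 36, 33, 35, 30, 37]
non_exercisers = [27, 25, 30, 26, 28, 24, 29, 27]

# Independent-sample t-test: the two groups contain different people.
t_stat, p_value = stats.ttest_ind(exercisers, non_exercisers)

# A dependent (paired) t-test would instead use stats.ttest_rel(before, after)
# when the same subjects are measured twice.

if p_value < 0.05:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```

With a clear difference in group means like this one, the p-value falls below 0.05 and the null hypothesis of "no difference in strength" is rejected.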
The bottom line
Hypothesis testing, sometimes called significance testing, is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used and the reason for the analysis.
Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data. Such data may come from a larger population or a data-generating process. The word "population" will be used for both of these cases in the following descriptions.
In hypothesis testing, an analyst tests a statistical sample, intending to provide evidence on the plausibility of the null hypothesis. Statistical analysts measure and examine a random sample of the population being analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and the alternative hypothesis.
The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of a null hypothesis. Thus, they are mutually exclusive , and only one can be true. However, one of the two hypotheses will always be true.
The null hypothesis is a statement about a population parameter, such as the population mean, that is assumed to be true.
If an individual wants to test that a penny has exactly a 50% chance of landing on heads, the null hypothesis would be that 50% is correct, and the alternative hypothesis would be that 50% is not correct. Mathematically, the null hypothesis is represented as H0: P = 0.5. The alternative hypothesis is shown as "Ha" and is identical to the null hypothesis, except with the equal sign struck through (Ha: P ≠ 0.5), meaning that the proportion does not equal 50%.
A random sample of 100 coin flips is taken, and the null hypothesis is tested. If it is found that the 100 coin flips were distributed as 40 heads and 60 tails, the analyst would assume that a penny does not have a 50% chance of landing on heads and would reject the null hypothesis and accept the alternative hypothesis.
If there were 48 heads and 52 tails, then it is plausible that the coin could be fair and still produce such a result. In cases such as this where the null hypothesis is "accepted," the analyst states that the difference between the expected results (50 heads and 50 tails) and the observed results (48 heads and 52 tails) is "explainable by chance alone."
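The coin-flip example above can be sketched with SciPy's exact binomial test; the counts are the ones from the text, and this library call is one way (not the only way) to run the test.

```python
# Hypothetical sketch of the coin-flip example with an exact binomial test.
# H0: P(heads) = 0.5; the observed counts come from the worked example.
from scipy.stats import binomtest

# 40 heads in 100 flips: a large departure from a fair coin.
result_40 = binomtest(k=40, n=100, p=0.5)
print(f"40 heads: p = {result_40.pvalue:.4f}")

# 48 heads in 100 flips: easily explainable by chance alone.
result_48 = binomtest(k=48, n=100, p=0.5)
print(f"48 heads: p = {result_48.pvalue:.4f}")
```

The smaller p-value for 40 heads reflects stronger evidence against the fair-coin null; the large p-value for 48 heads matches the "explainable by chance alone" conclusion.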
Some statisticians attribute the first hypothesis tests to satirical writer John Arbuthnot in 1710, who studied male and female births in England after observing that in nearly every year, male births exceeded female births by a slight proportion. Arbuthnot calculated that the probability of this happening by chance was small, and therefore it was due to “divine providence.”
Hypothesis testing helps assess the accuracy of new ideas or theories by testing them against data. This allows researchers to determine whether the evidence supports their hypothesis, helping to avoid false claims and conclusions. Hypothesis testing also provides a framework for decision-making based on data rather than personal opinions or biases. By relying on statistical analysis, hypothesis testing helps to reduce the effects of chance and confounding variables, providing a robust framework for making informed conclusions.
Hypothesis testing relies exclusively on data and doesn’t provide a comprehensive understanding of the subject being studied. Additionally, the accuracy of the results depends on the quality of the available data and the statistical methods used. Inaccurate data or inappropriate hypothesis formulation may lead to incorrect conclusions or failed tests. Hypothesis testing can also lead to errors, such as analysts either accepting or rejecting a null hypothesis when they shouldn’t have. These errors may result in false conclusions or missed opportunities to identify significant patterns or relationships in the data.
Hypothesis testing refers to a statistical process that helps researchers determine the reliability of a study. By using a well-formulated hypothesis and set of statistical tests, individuals or businesses can make inferences about the population that they are studying and draw conclusions based on the data presented. All hypothesis testing methods have the same four-step process, which includes stating the hypotheses, formulating an analysis plan, analyzing the sample data, and analyzing the result.
Sage. " Introduction to Hypothesis Testing ," Page 4.
Elder Research. " Who Invented the Null Hypothesis? "
Formplus. " Hypothesis Testing: Definition, Uses, Limitations and Examples ."
Becoming a more data-driven decision-maker can bring several benefits to your organization, enabling you to identify new opportunities to pursue and threats to abate. Rather than allowing subjective thinking to guide your business strategy, backing your decisions with data can empower your company to become more innovative and, ultimately, profitable.
If you’re new to data-driven decision-making, you might be wondering how data translates into business strategy. The answer lies in generating a hypothesis and verifying or rejecting it based on what various forms of data tell you.
Below is a look at hypothesis testing and the role it plays in helping businesses become more data-driven.
To understand what hypothesis testing is, it’s important first to understand what a hypothesis is.
A hypothesis or hypothesis statement seeks to explain why something has happened, or what might happen, under certain conditions. It can also be used to understand how different variables relate to each other. Hypotheses are often written as if-then statements; for example, “If this happens, then this will happen.”
Hypothesis testing , then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.
When it comes to data-driven decision-making, there’s a certain amount of risk that can mislead a professional. This could be due to flawed thinking or observations, incomplete or inaccurate data , or the presence of unknown variables. The danger in this is that, if major strategic decisions are made based on flawed insights, it can lead to wasted resources, missed opportunities, and catastrophic outcomes.
The real value of hypothesis testing in business is that it allows professionals to test their theories and assumptions before putting them into action. This essentially allows an organization to verify its analysis is correct before committing resources to implement a broader strategy.
As one example, consider a company that wishes to launch a new marketing campaign to revitalize sales during a slow period. Doing so could be an incredibly expensive endeavor, depending on the campaign’s size and complexity. The company, therefore, may wish to test the campaign on a smaller scale to understand how it will perform.
In this example, the hypothesis that’s being tested would fall along the lines of: “If the company launches a new marketing campaign, then it will translate into an increase in sales.” It may even be possible to quantify how much of a lift in sales the company expects to see from the effort. Pending the results of the pilot campaign, the business would then know whether it makes sense to roll it out more broadly.
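One way to sketch this kind of pilot comparison is a chi-squared test on a 2x2 contingency table; the conversion counts below are hypothetical, invented only to illustrate the technique.

```python
# Hypothetical sketch: did a pilot marketing campaign lift conversions?
# The counts are invented for illustration.
from scipy.stats import chi2_contingency

# Rows: control group vs pilot-campaign group.
# Columns: [converted, did not convert].
table = [[120, 880],   # control: 12.0% conversion
         [165, 835]]   # pilot:   16.5% conversion

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

A small p-value here would suggest the lift in the pilot is unlikely to be chance alone, supporting a broader rollout of the campaign.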
1. Alternative hypothesis and null hypothesis
In hypothesis testing, the hypothesis that’s being tested is known as the alternative hypothesis . Often, it’s expressed as a correlation or statistical relationship between variables. The null hypothesis , on the other hand, is a statement that’s meant to show there’s no statistical relationship between the variables being tested. It’s typically the exact opposite of whatever is stated in the alternative hypothesis.
For example, consider a company’s leadership team that historically and reliably sees $12 million in monthly revenue. They want to understand if reducing the price of their services will attract more customers and, in turn, increase revenue.
In this case, the alternative hypothesis may take the form of a statement such as: “If we reduce the price of our flagship service by five percent, then we’ll see an increase in sales and realize revenues greater than $12 million in the next month.”
The null hypothesis, on the other hand, would indicate that revenues wouldn’t increase from the base of $12 million, or might even decrease.
Statistically speaking, if you were to run the same scenario 100 times, you’d likely receive somewhat different results each time. If you were to plot these results in a distribution plot, you’d see the most likely outcome is at the tallest point in the graph, with less likely outcomes falling to the right and left of that point.
With this in mind, imagine you’ve completed your hypothesis test and have your results, which indicate there may be a correlation between the variables you were testing. To understand your results' significance, you’ll need to identify a p-value for the test, which helps note how confident you are in the test results.
In statistics, the p-value depicts the probability that, assuming the null hypothesis is correct, you might still observe results that are at least as extreme as the results of your hypothesis test. The smaller the p-value, the stronger the evidence against the null hypothesis, and the greater the significance of your results.
When it’s time to test your hypothesis, it’s important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests , or one-tailed and two-tailed tests, respectively.
Typically, you’d leverage a one-sided test when you have a strong conviction about the direction of change you expect to see due to your hypothesis test. You’d leverage a two-sided test when you’re less confident in the direction of change.
To perform hypothesis testing in the first place, you need to collect a sample of data to be analyzed. Depending on the question you’re seeking to answer or investigate, you might collect samples through surveys, observational studies, or experiments.
A survey involves asking a series of questions to a random population sample and recording self-reported responses.
Observational studies involve a researcher observing a sample population and collecting data as it occurs naturally, without intervention.
Finally, an experiment involves dividing a sample into multiple groups, one of which acts as the control group. For each non-control group, the variable being studied is manipulated to determine how the data collected differs from that of the control group.
Hypothesis testing is a complex process involving different moving pieces that can allow an organization to effectively leverage its data and inform strategic decisions.
If you’re interested in better understanding hypothesis testing and the role it can play within your organization, one option is to complete a course that focuses on the process. Doing so can lay the statistical and analytical foundation you need to succeed.
Hypothesis testing is the act of testing a hypothesis or a supposition in relation to a statistical parameter. Analysts implement hypothesis testing in order to test if a hypothesis is plausible or not.
In data science and statistics , hypothesis testing is an important step as it involves the verification of an assumption that could help develop a statistical parameter. For instance, a researcher establishes a hypothesis assuming that the average of all odd numbers is an even number.
In order to find the plausibility of this hypothesis, the researcher will have to test the hypothesis using hypothesis testing methods. Unlike a hypothesis that is ‘supposed’ to stand true on the basis of little or no evidence, hypothesis testing is required to have plausible evidence in order to establish that a statistical hypothesis is true.
This is where statistics plays an important role. A number of components are involved in the process. But before understanding the process involved in hypothesis testing in research methodology, we shall first understand the types of hypotheses involved. Let us get started!
In data sampling, different types of hypotheses are involved in finding whether the tested samples support a hypothesis or not. In this segment, we shall look at the different types of hypotheses and understand the role they play in hypothesis testing.
Alternative Hypothesis (H1) or the research hypothesis states that there is a relationship between two variables (where one variable affects the other). The alternative hypothesis is the main driving force for hypothesis testing.
It implies that the two variables are related to each other and the relationship that exists between them is not due to chance or coincidence.
When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject of the testing process. The analyst intends to test the alternative hypothesis and verifies its plausibility.
The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there exists no relation between two variables in statistics. It states that the effect of one variable on the other is solely due to chance and no empirical cause lies behind it.
The null hypothesis is established alongside the alternative hypothesis and is recognized as being just as important. In hypothesis testing, the null hypothesis plays a major role, as it is the claim tested against the alternative hypothesis.
The Non-directional hypothesis states that the relation between two variables has no direction.
Simply put, it asserts that there exists a relation between two variables, but does not recognize the direction of effect, whether variable A affects variable B or vice versa.
The Directional hypothesis, on the other hand, asserts the direction of effect of the relationship that exists between two variables.
Herein, the hypothesis clearly states that variable A affects variable B, or vice versa.
A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of statistics.
By using data sampling and statistical knowledge, one can determine the plausibility of a statistical hypothesis and find out if it stands true or not.
Now that we have understood the types of hypotheses and the role they play in hypothesis testing, let us now move on to understand the process in a better manner.
In hypothesis testing, a researcher is first required to establish two hypotheses - alternative hypothesis and null hypothesis in order to begin with the procedure.
To establish these two hypotheses, one is required to study data samples, find a plausible pattern among the samples, and pen down a statistical hypothesis that they wish to test.
A random sample of the population can be drawn to begin hypothesis testing. Of the two hypotheses, alternative and null, only one can be supported, yet the presence of both is required for the process to work.
At the end of the hypothesis testing procedure, one of the hypotheses will be rejected and the other supported. Even then, no hypothesis can ever be verified 100%.
Therefore, a hypothesis can only be supported based on the statistical samples and verified data. Here is a step-by-step guide for hypothesis testing.
First things first, one is required to establish two hypotheses - alternative and null, that will set the foundation for hypothesis testing.
These hypotheses initiate the testing process that involves the researcher working on data samples in order to either support the alternative hypothesis or the null hypothesis.
Once the hypotheses have been formulated, it is now time to generate a testing plan. A testing plan or an analysis plan involves the accumulation of data samples, determining which statistic is to be considered and laying out the sample size.
All these factors are very important while one is working on hypothesis testing.
As soon as a testing plan is ready, it is time to move on to the analysis part. Analysis of data samples involves configuring statistical values of samples, drawing them together, and deriving a pattern out of these samples.
While analyzing the data samples, a researcher needs to determine a set of things -
Significance Level - The significance level is the threshold for calling a result statistically significant: the probability of rejecting the null hypothesis when it is actually true, commonly set at 0.05.
Testing Method - The testing method involves a type of sampling-distribution and a test statistic that leads to hypothesis testing. There are a number of testing methods that can assist in the analysis of data samples.
Test statistic - Test statistic is a numerical summary of a data set that can be used to perform hypothesis testing.
P-value - The p-value is the probability, assuming the null hypothesis is true, of obtaining a sample statistic at least as extreme as the observed test statistic; a small p-value casts doubt on the plausibility of the null hypothesis.
The analysis of data samples leads to the inference of results that establishes whether the alternative hypothesis stands true or not. When the P-value is less than the significance level, the null hypothesis is rejected and the alternative hypothesis turns out to be plausible.
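The steps above can be sketched end to end; the sample values and the hypothesized mean below are invented for illustration.

```python
# Hypothetical end-to-end sketch of the procedure described above.
from scipy import stats

# Step 1 - formulate hypotheses (illustrative):
#   H0: the population mean equals 50; H1: it does not.
# Step 2 - testing plan: one-sample t-test at significance level 0.05.
alpha = 0.05
sample = [52.1, 53.4, 49.8, 54.0, 51.7, 52.9, 50.6, 53.2]

# Step 3 - analyze the sample: compute the test statistic and p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Step 4 - infer the result: reject H0 when the p-value is below alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```

Here the sample mean sits clearly above 50, so the p-value falls below the significance level and the alternative hypothesis is supported.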
As we have already looked into the different aspects of hypothesis testing, we shall now look into its methods. All in all, there are two common types of hypothesis testing methods. They are as follows -
The frequentist, or traditional, approach to hypothesis testing draws its conclusions from the current data alone.
Two hypotheses, the null and the alternative, are formulated on the basis of the current data. A very popular subtype of the frequentist approach is null hypothesis significance testing (NHST).
The NHST approach, built around the null and alternative hypotheses, has been one of the most widely used hypothesis testing methods in statistics since its emergence in the mid-1950s.
Bayesian hypothesis testing, a more modern and less conventional method, evaluates a hypothesis by combining past data, known as the prior probability, with the current data.
The result obtained is the posterior probability of the hypothesis. In this method, the researcher relies on both the prior and the posterior probability to conduct the hypothesis test at hand.
On the basis of the prior probability and the observed data, the Bayesian approach judges whether a hypothesis is plausible. The Bayes factor, a major component of this method, is the likelihood ratio between the null hypothesis and the alternative hypothesis.
The Bayes factor indicates which of the two hypotheses established for testing is better supported by the data.
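To illustrate the Bayes factor, here is a small sketch (not from the original article) comparing two point hypotheses about a coin's bias; the counts are invented for demonstration:

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Probability of observing exactly k successes in n trials at success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Invented data: 60 heads in 100 flips.
k, n = 60, 100

# H0: the coin is fair (p = 0.5); H1: the coin is biased (p = 0.6).
# The Bayes factor BF10 is the likelihood ratio of the data under H1 vs. H0.
bf_10 = binomial_likelihood(k, n, 0.6) / binomial_likelihood(k, n, 0.5)
```

Here bf_10 is about 7.5, meaning the observed flips are roughly 7.5 times more likely under the biased-coin hypothesis than under the fair-coin one; a value near 1 would leave the two hypotheses about equally plausible. With composite hypotheses the likelihoods are averaged over a prior distribution, but the point-hypothesis case shows the idea.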
To conclude, hypothesis testing, a way to verify the plausibility of a supposed assumption, can be done through different methods - the Bayesian approach or the frequentist approach.
While the Bayesian approach incorporates the prior probability of the data samples, the frequentist approach relies on the current data alone, without prior probabilities. The key elements involved in hypothesis testing are the significance level, the p-value, the test statistic, and the method of hypothesis testing.
Ultimately, determining whether a hypothesis stands true comes down to analyzing the data samples and deciding which of the two, the null hypothesis or the alternative hypothesis, is the more plausible.
A hypothesis test is exactly what it sounds like: You make a hypothesis about the parameters of a population, and the test determines whether your hypothesis is consistent with your sample data.
The p-value of a hypothesis test is the probability that sample data at least as extreme as yours would have occurred if the null hypothesis were true. Traditionally, researchers have used a p-value of 0.05 as the threshold for declaring a result statistically significant. But there is a long history of debate and controversy over p-values and significance levels.
Many of the most commonly used hypothesis tests rely on assumptions about your sample data - for instance, that it is continuous and that it follows a Normal distribution. Nonparametric hypothesis tests make no assumptions about the distribution of the data, and many can be used on ordinal or categorical data.
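As an illustration of the nonparametric idea, here is a minimal sketch (an assumed example, not from the original text) of the classic sign test, which uses only the signs of paired differences and so needs no distributional assumption beyond a zero median under the null:

```python
from math import comb

def sign_test_p_value(diffs):
    """Two-sided sign test. Under H0 the median difference is zero, so each
    nonzero difference is equally likely to be positive or negative."""
    signs = [d for d in diffs if d != 0]      # ties carry no information
    n = len(signs)
    k = sum(1 for d in signs if d > 0)        # number of positive differences
    tail = min(k, n - k)
    # P(at most `tail` of the rarer sign) under Binomial(n, 0.5),
    # doubled for a two-sided test and capped at 1.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical paired before/after differences
p = sign_test_p_value([1.2, 0.8, -0.3, 2.1, 0.9, 1.5, 0.4, -0.2, 1.1, 0.7])
```

For these invented differences p is about 0.109, so at the 0.05 level the sign test would not reject the hypothesis of a zero median difference, even though 8 of the 10 differences are positive; the test trades statistical power for freedom from distributional assumptions.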
Edward Barroga
1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.
2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.
The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.
Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6
It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; if not overlooked, then framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4
There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought of, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.
A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5
On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4
Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8
Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12
Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13
There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) have evidence-based logical reasoning 10 ; and 6) can be predicted. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory to base the hypotheses on, inductive reasoning based on specific observations or findings forms more general hypotheses. 10
Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .
| Quantitative research questions | Quantitative research hypotheses |
| --- | --- |
| Descriptive research questions | Simple hypothesis |
| Comparative research questions | Complex hypothesis |
| Relationship research questions | Directional hypothesis |
| | Non-directional hypothesis |
| | Associative hypothesis |
| | Causal hypothesis |
| | Null hypothesis |
| | Alternative hypothesis |
| | Working hypothesis |
| | Statistical hypothesis |
| | Logical hypothesis |
| | Hypothesis-testing |
| Qualitative research questions | Qualitative research hypotheses |
| Contextual research questions | Hypothesis-generating |
| Descriptive research questions | |
| Evaluation research questions | |
| Explanatory research questions | |
| Exploratory research questions | |
| Generative research questions | |
| Ideological research questions | |
| Ethnographic research questions | |
| Phenomenological research questions | |
| Grounded theory questions | |
| Qualitative case study questions | |
In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .
Quantitative research questions:

- Descriptive research question
  - Measures responses of subjects to variables; presents variables to measure, analyze, or assess.
  - Example: What is the proportion of resident doctors in the hospital who have mastered ultrasonography (response of subjects to a variable) as a diagnostic technique in their clinical training?
- Comparative research question
  - Clarifies the difference between one group with the outcome variable and another group without the outcome variable.
  - Example: Is there a difference in the reduction of lung metastasis in osteosarcoma patients who received the vitamin D adjunctive therapy (group with outcome variable) compared with osteosarcoma patients who did not receive the vitamin D adjunctive therapy (group without outcome variable)?
  - Compares the effects of variables.
  - Example: How does the vitamin D analogue 22-Oxacalcitriol (variable 1) mimic the antiproliferative activity of 1,25-Dihydroxyvitamin D (variable 2) in osteosarcoma cells?
- Relationship research question
  - Defines trends, associations, relationships, or interactions between a dependent variable and an independent variable.
  - Example: Is there a relationship between the number of medical student suicides (dependent variable) and the level of medical student stress (independent variable) in Japan during the first wave of the COVID-19 pandemic?
In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ), 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ), 4 3) state a negative relationship between two variables ( null hypothesis ), 4 , 11 , 15 4) replace the working hypothesis if rejected ( alternative hypothesis ), 15 5) explain the relationship of phenomena to possibly generate a theory ( working hypothesis ), 11 6) involve quantifiable variables that can be tested statistically ( statistical hypothesis ), 11 or 7) express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research, in Table 3 .
Quantitative research hypotheses:

- Simple hypothesis
  - Predicts the relationship between a single dependent variable and a single independent variable.
  - Example: If the dose of the new medication (single independent variable) is high, blood pressure (single dependent variable) is lowered.
- Complex hypothesis
  - Foretells the relationship between two or more independent and dependent variables.
  - Example: The higher the use of anticancer drugs, radiation therapy, and adjunctive agents (3 independent variables), the higher would be the survival rate (1 dependent variable).
- Directional hypothesis
  - Identifies the study direction based on theory towards a particular outcome to clarify the relationship between variables.
  - Example: Privately funded research projects will have a larger international scope (study direction) than publicly funded research projects.
- Non-directional hypothesis
  - The nature of the relationship between two variables or the exact study direction is not identified; does not involve a theory.
  - Example: Women and men are different in terms of helpfulness. (Exact study direction is not identified.)
- Associative hypothesis
  - Describes variable interdependency; change in one variable causes change in another variable.
  - Example: A larger number of people vaccinated against COVID-19 in the region (change in independent variable) will reduce the region’s incidence of COVID-19 infection (change in dependent variable).
- Causal hypothesis
  - An effect on the dependent variable is predicted from manipulation of the independent variable.
  - Example: A change to a high-fiber diet (independent variable) will reduce the blood sugar level (dependent variable) of the patient.
- Null hypothesis
  - A negative statement indicating no relationship or difference between 2 variables.
  - Example: There is no significant difference in the severity of pulmonary metastases between the new drug (variable 1) and the current drug (variable 2).
- Alternative hypothesis
  - Following a null hypothesis, an alternative hypothesis predicts a relationship between 2 study variables.
  - Example: The new drug (variable 1) is better on average in reducing the level of pain from pulmonary metastasis than the current drug (variable 2).
- Working hypothesis
  - A hypothesis that is initially accepted for further research to produce a feasible theory.
  - Example: Dairy cows fed with concentrates of different formulations will produce different amounts of milk.
- Statistical hypothesis
  - An assumption about the value of a population parameter or a relationship among several population characteristics; validity is tested by a statistical experiment or analysis.
  - Example: The mean recovery rate from COVID-19 infection (value of population parameter) is not significantly different between population 1 and population 2.
  - Example: There is a positive correlation between the level of stress at the workplace and the number of suicides (population characteristics) among working people in Japan.
- Logical hypothesis
  - Offers or proposes an explanation with limited or no extensive evidence.
  - Example: If healthcare workers provide more educational programs about contraception methods, the number of adolescent pregnancies will be lower.
- Hypothesis-testing (quantitative hypothesis-testing research)
  - Quantitative research uses deductive reasoning.
  - This involves the formation of a hypothesis, collection of data in the investigation of the problem, analysis and use of the data from the investigation, and drawing of conclusions to validate or nullify the hypotheses.
Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15
There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research questions ); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .
Qualitative research questions:

- Contextual research question
  - Asks about the nature of what already exists; individuals or groups function to further clarify and understand the natural context of real-world problems.
  - Example: What are the experiences of nurses working night shifts in healthcare during the COVID-19 pandemic? (natural context of real-world problems)
- Descriptive research question
  - Aims to describe a phenomenon.
  - Example: What are the different forms of disrespect and abuse (phenomenon) experienced by Tanzanian women when giving birth in healthcare facilities?
- Evaluation research question
  - Examines the effectiveness of existing practice or accepted frameworks.
  - Example: How effective are decision aids (effectiveness of existing practice) in helping decide whether to give birth at home or in a healthcare facility?
- Explanatory research question
  - Clarifies a previously studied phenomenon and explains why it occurs.
  - Example: Why is there an increase in teenage pregnancy (phenomenon) in Tanzania?
- Exploratory research question
  - Explores areas that have not been fully investigated to gain a deeper understanding of the research problem.
  - Example: What factors affect the mental health of medical students (areas that have not yet been fully investigated) during the COVID-19 pandemic?
- Generative research question
  - Develops an in-depth understanding of people’s behavior by asking ‘how would’ or ‘what if’ to identify problems and find solutions.
  - Example: How would the extensive research experience of the behavior of new staff impact the success of the novel drug initiative?
- Ideological research question
  - Aims to advance specific ideas or ideologies of a position.
  - Example: Are Japanese nurses who volunteer in remote African hospitals able to promote humanized care of patients (specific ideas or ideologies) in the areas of safe patient environment, respect of patient privacy, and provision of accurate information related to health and care?
- Ethnographic research question
  - Clarifies peoples’ nature, activities, their interactions, and the outcomes of their actions in specific settings.
  - Example: What are the demographic characteristics, rehabilitative treatments, community interactions, and disease outcomes (nature, activities, their interactions, and the outcomes) of people in China who are suffering from pneumoconiosis?
- Phenomenological research question
  - Seeks to know more about the phenomena that have impacted an individual.
  - Example: What are the lived experiences of parents who have been living with and caring for children with a diagnosis of autism? (phenomena that have impacted an individual)
- Grounded theory question
  - Focuses on social processes, asking about what happens and how people interact, or uncovering social relationships and behaviors of groups.
  - Example: What are the problems that pregnant adolescents face in terms of social and cultural norms (social processes), and how can these be addressed?
- Qualitative case study question
  - Assesses a phenomenon using different sources of data to answer “why” and “how” questions; considers how the phenomenon is influenced by its contextual situation.
  - Example: How does quitting work and assuming the role of a full-time mother (phenomenon assessed) change the lives of women in Japan?
Qualitative research hypotheses:

- Hypothesis-generating (qualitative hypothesis-generating research)
  - Qualitative research uses inductive reasoning.
  - This involves data collection from study participants or the literature regarding a phenomenon of interest, using the collected data to develop a formal hypothesis, and using the formal hypothesis as a framework for testing the hypothesis.
  - Qualitative exploratory studies explore areas deeper, clarifying subjective experience and allowing formulation of a formal hypothesis potentially testable in a future quantitative approach.
Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15
Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1
Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14
The PICOT and PEO frameworks are also used when developing research questions. 1 The following elements are addressed in these frameworks, PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study; PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if these meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14
As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ) 17 , and how to transform these ambiguous research question(s) and hypothesis(es) into clear and good statements.
| Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
| --- | --- | --- | --- |
| Research question | Which is more effective between smoke moxibustion and smokeless moxibustion? | “Moreover, regarding smoke moxibustion versus smokeless moxibustion, it remains unclear which is more effective, safe, and acceptable to pregnant women, and whether there is any difference in the amount of heat generated.” | 1) Vague and unfocused questions; 2) closed questions simply answerable by yes or no; 3) questions requiring a simple choice |
| Hypothesis | The smoke moxibustion group will have higher cephalic presentation. | “Hypothesis 1. The smoke moxibustion stick group (SM group) and smokeless moxibustion stick group (SLM group) will have higher rates of cephalic presentation after treatment than the control group. Hypothesis 2. The SM group and SLM group will have higher rates of cephalic presentation at birth than the control group. Hypothesis 3. There will be no significant differences in the well-being of the mother and child among the three groups in terms of the following outcomes: premature birth, premature rupture of membranes (PROM) at < 37 weeks, Apgar score < 7 at 5 min, umbilical cord blood pH < 7.1, admission to neonatal intensive care unit (NICU), and intrauterine fetal death.” | 1) Unverifiable hypotheses; 2) incompletely stated groups of comparison; 3) insufficiently described variables or outcomes |
| Research objective | To determine which is more effective between smoke moxibustion and smokeless moxibustion. | “The specific aims of this pilot study were (a) to compare the effects of smoke moxibustion and smokeless moxibustion treatments with the control group as a possible supplement to ECV for converting breech presentation to cephalic presentation and increasing adherence to the newly obtained cephalic position, and (b) to assess the effects of these treatments on the well-being of the mother and child.” | 1) Poor understanding of the research question and hypotheses; 2) insufficient description of population, variables, or study outcomes |
a These statements were composed for comparison and illustrative purposes only.
b These statements are direct quotes from Higashihara and Horiuchi. 16
| Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
| --- | --- | --- | --- |
| Research question | Does disrespect and abuse (D&A) occur in childbirth in Tanzania? | How does disrespect and abuse (D&A) occur and what are the types of physical and psychological abuses observed in midwives’ actual care during facility-based childbirth in urban Tanzania? | 1) Ambiguous or oversimplistic questions; 2) questions unverifiable by data collection and analysis |
| Hypothesis | Disrespect and abuse (D&A) occur in childbirth in Tanzania. | Hypothesis 1: Several types of physical and psychological abuse by midwives in actual care occur during facility-based childbirth in urban Tanzania. Hypothesis 2: Weak nursing and midwifery management contribute to the D&A of women during facility-based childbirth in urban Tanzania. | 1) Statements simply expressing facts; 2) insufficiently described concepts or variables |
| Research objective | To describe disrespect and abuse (D&A) in childbirth in Tanzania. | “This study aimed to describe from actual observations the respectful and disrespectful care received by women from midwives during their labor period in two hospitals in urban Tanzania.” | 1) Statements unrelated to the research question and hypotheses; 2) unattainable or unexplorable objectives |
a This statement is a direct quote from Shimoda et al. 17
The other statements were composed for comparison and illustrative purposes only.
To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be accessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims . This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .
Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, whereas hypotheses are used more frequently in experiments, to compare variables and their relationships.
Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves a testable proposition to be deduced from theory, and independent and dependent variables to be separated and measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12
In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.
Research questions and hypotheses are crucial components to any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought of and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.
Disclosure: The authors have no potential conflicts of interest to disclose.
When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.
Hypothesis testing uses sample data to evaluate a hypothesis about a population . A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.
Effect size and statistical significance.
To carry out statistical hypothesis testing, a research hypothesis (H A ) and a null hypothesis (H o ) are employed:
H A: There is a relationship between intelligence and academic results.
H A: First year university students obtain higher grades after an intensive Statistics course.
H A : Males and females differ in their levels of stress.
H o : There is no relationship between intelligence and academic results.
H o: First year university students do not obtain higher grades after an intensive Statistics course.
H o : Males and females will not differ in their levels of stress.
The purpose of hypothesis testing is to test whether the null hypothesis (there is no difference, no effect) can be rejected. If the null hypothesis is rejected, then the research hypothesis can be accepted. If the null hypothesis is retained, then the research hypothesis is not supported.
In hypothesis testing, a value is set to assess whether the null hypothesis is accepted or rejected and whether the result is statistically significant:
The probability value, or p value , is the probability of obtaining the observed result, or one more extreme, if the null hypothesis is true. Usually, the probability value is set at 0.05: the null hypothesis will be rejected if the probability value of the statistical test is less than 0.05. There are two types of errors associated with hypothesis testing:
These situations are known as Type I and Type II errors:
These errors cannot be eliminated; they can be minimised, but minimising one type of error will increase the probability of committing the other type.
The probability of making a Type I error depends on the criterion that is used to accept or reject the null hypothesis: the p value or alpha level . The alpha is set by the researcher, usually at .05, and is the chance the researcher is willing to take and still claim the significance of the statistical test. Choosing a smaller alpha level will decrease the likelihood of committing a Type I error.
For example, p<0.05 indicates that there are 5 chances in 100 that the difference observed was really due to sampling error – that is, when the null hypothesis is true, a Type I error will occur 5% of the time.
With a p<0.01, there will be 1 chance in 100 that the difference observed was really due to sampling error – 1% of the time a Type I error will occur.
The p level is specified before analysing the data. If the data analysis results in a probability value below the α (alpha) level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected.
When the null hypothesis is rejected, the effect is said to be statistically significant. However, statistical significance does not mean that the effect is important.
A result can be statistically significant, but the effect size may be small. Finding that an effect is significant does not provide information about how large or important the effect is. In fact, a small effect can be statistically significant if the sample size is large enough.
Information about the effect size, or magnitude of the result, is given by the statistical test. For example, the strength of the correlation between two variables is given by the coefficient of correlation, whose magnitude varies from 0 to 1.
The hypothesis testing process can be divided into five steps:
This example illustrates how these five steps can be applied to test a hypothesis:
Step 1 : There are two populations of interest.
Population 1: People who go through the experimental procedure (drink coffee).
Population 2: People who do not go through the experimental procedure (drink water).
Step 2 : We know that the characteristics of the comparison distribution (student population) are:
Population M = 19, Population SD= 4, normally distributed. These are the mean and standard deviation of the distribution of scores on the memory test for the general student population.
Step 3 : For a two-tailed test (the direction of the effect is not specified) at the 5% level (2.5% at each tail), the cut-off sample scores are +1.96 and -1.96.
Step 4 : Your sample score of 27 needs to be converted into a Z value. To calculate: Z = (27-19)/4 = 2 (check the Converting into Z scores section if you need to review this process).
Step 5 : A ‘Z’ score of 2 is more extreme than the cut off Z of +1.96 (see figure above). The result is significant and, thus, the null hypothesis is rejected.
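The five steps above reduce to a few lines of code. This is a minimal standard-library sketch using the numbers from the memory-test example:

```python
# Step 2: comparison distribution (general student population)
pop_mean, pop_sd = 19, 4

# Step 4: convert the sample score of 27 into a Z value
z = (27 - pop_mean) / pop_sd      # (27 - 19) / 4 = 2.0

# Steps 3 and 5: two-tailed cut-offs at the 5% level are +/-1.96
significant = abs(z) > 1.96       # True -> reject the null hypothesis
print(z, significant)             # prints: 2.0 True
```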
You can find more examples here:
Correlation analysis, multiple regression.
Correlation analysis explores the association between variables . The purpose of correlational analysis is to discover whether there is a relationship between variables, which is unlikely to occur by sampling error. The null hypothesis is that there is no relationship between the two variables. Correlation analysis provides information about:
A positive correlation indicates that high scores on one variable are associated with high scores on the other variable; low scores on one variable are associated with low scores on the second variable . For instance, in the figure below, higher scores on negative affect are associated with higher scores on perceived stress.
A negative correlation indicates that high scores on one variable are associated with low scores on the other variable. The graph shows that a person who scores high on perceived stress will probably score low on mastery: the slope of the graph is downward as it moves to the right. In the figure below, higher scores on mastery are associated with lower scores on perceived stress.
Fig 2. Negative correlation between two variables. Adapted from Pallant, J. (2013). SPSS survival manual: A step by step guide to data analysis using IBM SPSS (5th ed.). Sydney, Melbourne, Auckland, London: Allen & Unwin
2. The strength or magnitude of the relationship
The strength of a linear relationship between two variables is measured by a statistic known as the correlation coefficient , which varies from -1 to +1. There are several correlation coefficients; the most widely used are Pearson’s r and Spearman’s rho. The strength of the relationship is interpreted as follows:
It is important to note that correlation analysis does not imply causality. Correlation is used to explore the association between variables, however, it does not indicate that one variable causes the other. The correlation between two variables could be due to the fact that a third variable is affecting the two variables.
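As an illustration, Pearson's r can be computed directly from its definition using only the standard library. The paired scores below are hypothetical, invented only to mimic the negative stress–mastery relationship described above:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient, from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores: high stress tends to go with low mastery
stress  = [30, 25, 28, 20, 18, 15, 22, 27]
mastery = [10, 14, 11, 18, 20, 23, 16, 12]

r = pearson_r(stress, mastery)   # close to -1: strong negative correlation
```

A value of r near -1 here reflects the downward-sloping pattern in Fig. 2; remember that this says nothing about whether stress causes low mastery or vice versa.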
Multiple regression is an extension of correlation analysis. Multiple regression is used to explore the relationship between one dependent variable and a number of independent variables or predictors . The purpose of a multiple regression model is to predict values of a dependent variable based on the values of the independent variables or predictors. For example, a researcher may be interested in predicting students’ academic success (e.g. grades) based on a number of predictors, for example, hours spent studying, satisfaction with studies, relationships with peers and lecturers.
A multiple regression model can be conducted using statistical software (e.g. SPSS). The software will test the significance of the model (i.e. does the model significantly predicts scores on the dependent variable using the independent variables introduced in the model?), how much of the variance in the dependent variable is explained by the model, and the individual contribution of each independent variable.
Example of multiple regression model
From Dunn et al. (2014). Influence of academic self-regulation, critical thinking, and age on online graduate students' academic help-seeking.
In this model, help-seeking is the dependent variable; there are three independent variables or predictors. The coefficients show the direction (positive or negative) and magnitude of the relationship between each predictor and the dependent variable. The model was statistically significant and predicted 13.5% of the variance in help-seeking.
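A model of this kind can be fitted by ordinary least squares. The sketch below assumes NumPy is available and uses invented data (hours studied and satisfaction predicting grades), not the figures from Dunn et al.:

```python
import numpy as np  # assumed available

# Hypothetical data: predict grades from two predictors
hours        = np.array([ 5, 10, 15, 20,  8, 12, 18,  3])
satisfaction = np.array([ 4,  6,  8,  9,  5,  7,  8,  3])
grade        = np.array([55, 68, 80, 90, 60, 72, 85, 50])

# Design matrix with an intercept column; ordinary least-squares fit
X = np.column_stack([np.ones_like(hours), hours, satisfaction])
coef, *_ = np.linalg.lstsq(X, grade, rcond=None)

# Proportion of variance in the dependent variable explained by the model
predicted = X @ coef
r_squared = 1 - np.sum((grade - predicted) ** 2) / np.sum((grade - grade.mean()) ** 2)
```

Statistical packages such as SPSS report, in addition, significance tests for the model and for each coefficient, which this bare least-squares sketch does not compute.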
t-Tests are employed to compare the mean score on some continuous variable for two groups . The null hypothesis to be tested is there are no differences between the two groups (e.g. anxiety scores for males and females are not different).
If the significance value of the t-test is equal or less than .05, there is a significant difference in the mean scores on the variable of interest for each of the two groups. If the value is above .05, there is no significant difference between the groups.
t-Tests can be employed to compare the mean scores of two different groups (independent-samples t-test ) or to compare the same group of people on two different occasions ( paired-samples t-test) .
In addition to assessing whether the difference between the two groups is statistically significant, it is important to consider the effect size or magnitude of the difference between the groups. The effect size is given by partial eta squared (proportion of variance of the dependent variable that is explained by the independent variable) and Cohen’s d (difference between groups in terms of standard deviation units).
In this example, an independent samples t-test was conducted to assess whether males and females differ in their perceived anxiety levels. The significance of the test is .004. Since this value is less than .05, we can conclude that there is a statistically significant difference between males and females in their perceived anxiety levels.
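With SciPy installed (an assumption here), an independent-samples t-test takes a single call. The anxiety scores below are invented for illustration, not the data behind the .004 result above:

```python
from scipy import stats  # assumed available

# Hypothetical anxiety scores for two independent groups
males   = [12, 15, 11, 14, 10, 13, 12, 11]
females = [16, 18, 15, 19, 17, 16, 20, 18]

t_stat, p_value = stats.ttest_ind(males, females)
if p_value <= 0.05:
    print("Significant difference in mean anxiety between the groups")
```

For a paired-samples design (the same people measured twice), `stats.ttest_rel` would be used instead.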
Whilst t-tests compare the mean score on one variable for two groups, analysis of variance is used to test more than two groups . Following the previous example, analysis of variance would be employed to test whether there are differences in anxiety scores for students from different disciplines.
Analysis of variance compares the variance (variability in scores) between the different groups (believed to be due to the independent variable) with the variability within each group (believed to be due to chance). An F ratio is calculated; a large F ratio indicates that there is more variability between the groups (caused by the independent variable) than there is within each group (error term). A significant F test indicates that we can reject the null hypothesis that there is no difference between the groups.
Again, effect size statistics such as Cohen’s d and eta squared are employed to assess the magnitude of the differences between groups.
In this example, we examined differences in perceived anxiety between students from different disciplines. The results of the ANOVA test show that the significance level is .005. Since this value is below .05, we can conclude that there are statistically significant differences between students from different disciplines in their perceived anxiety levels.
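A one-way analysis of variance can be run the same way, again assuming SciPy is available and using invented scores for three disciplines:

```python
from scipy import stats  # assumed available

# Hypothetical anxiety scores for students from three disciplines
psychology = [14, 16, 15, 13, 17]
business   = [10, 11,  9, 12, 10]
education  = [18, 20, 19, 17, 21]

# A large F ratio means more variability between groups than within them
f_ratio, p_value = stats.f_oneway(psychology, business, education)
```

A p-value below .05 here would lead us to reject the null hypothesis that the three discipline means are equal; a follow-up (post-hoc) test would be needed to say which groups differ.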
Chi-square test for independence is used to explore the relationship between two categorical variables. Each variable can have two or more categories.
For example, a researcher can use a Chi-square test for independence to assess the relationship between study disciplines (e.g. Psychology, Business, Education,…) and help-seeking behaviour (Yes/No). The test compares the observed frequencies of cases with the values that would be expected if there was no association between the two variables of interest. A statistically significant Chi-square test indicates that the two variables are associated (e.g. Psychology students are more likely to seek help than Business students). The effect size is assessed using effect size statistics: Phi and Cramer’s V .
In this example, a Chi-square test was conducted to assess whether males and females differ in their help-seeking behaviour (Yes/No). The crosstabulation table shows the percentages of males and females who did or did not seek help. The table 'Chi square tests' shows the significance of the test (Pearson Chi square asymp sig: .482). Since this value is above .05, we conclude that there is no statistically significant difference between males and females in their help-seeking behaviour.
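The same kind of test can be sketched in code, assuming SciPy is available; the counts below are invented and chosen to mirror the non-significant outcome described above:

```python
from scipy.stats import chi2_contingency  # assumed available

# Hypothetical crosstab: rows = males/females, columns = sought help (yes, no)
observed = [[30, 20],
            [34, 16]]

# Compares observed frequencies with those expected under no association
chi2, p, dof, expected = chi2_contingency(observed)
```

Because p comes out well above .05 for these counts, we would conclude that gender and help-seeking are not significantly associated in this hypothetical sample.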
The previous two chapters introduced methods for organizing and summarizing sample data, and using sample statistics to estimate population parameters. This chapter introduces the next major topic of inferential statistics: hypothesis testing.
A hypothesis is a statement or claim about a property of a population.
When conducting scientific research, typically there is some known information, perhaps from some past work or from a long accepted idea. We want to test whether this claim is believable. This is the basic idea behind a hypothesis test:
For example, past research tells us that the average life span for a hummingbird is about four years. You have been studying the hummingbirds in the southeastern United States and find a sample mean lifespan of 4.8 years. Should you reject the known or accepted information in favor of your results? How confident are you in your estimate? At what point would you say that there is enough evidence to reject the known information and support your alternative claim? How far from the known mean of four years can the sample mean be before we reject the idea that the average lifespan of a hummingbird is four years?
Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of a population.
A hypothesis is a claim or statement about a characteristic of a population of interest to us. A hypothesis test is a way for us to use our sample statistics to test a specific claim.
The population mean weight is known to be 157 lb. We want to test the claim that the mean weight has increased.
Two years ago, the proportion of infected plants was 37%. We believe that a treatment has helped, and we want to test the claim that there has been a reduction in the proportion of infected plants.
The null hypothesis is a statement about the value of a population parameter, such as the population mean (µ) or the population proportion ( p ). It contains the condition of equality and is denoted as H 0 (H-naught).
H 0 : µ = 157 or H 0 : p = 0.37
The alternative hypothesis is the claim to be tested, the opposite of the null hypothesis. It contains the value of the parameter that we consider plausible and is denoted as H 1 .
H 1 : µ > 157 or H 1 : p ≠ 0.37
The test statistic is a value computed from the sample data that is used in making a decision about the rejection of the null hypothesis. The test statistic converts the sample mean ( x̄ ) or sample proportion ( p̂ ) to a Z- or t-score under the assumption that the null hypothesis is true . It is used to decide whether the difference between the sample statistic and the hypothesized claim is significant.
The p-value is the area under the curve to the left or right of the test statistic. It is compared to the level of significance ( α ).
The critical value is the value that defines the rejection zone (the test statistic values that would lead to rejection of the null hypothesis). It is defined by the level of significance.
The level of significance ( α ) is the probability that the test statistic will fall into the critical region when the null hypothesis is true. This level is set by the researcher.
The conclusion is the final decision of the hypothesis test. The conclusion must always be clearly stated, communicating the decision based on the components of the test. It is important to realize that we never prove or accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant the rejection of the null hypothesis. The conclusion is made up of two parts:
1) Reject or fail to reject the null hypothesis, and 2) there is or is not enough evidence to support the alternative claim.
Option 1) Reject the null hypothesis (H 0 ). This means that you have enough statistical evidence to support the alternative claim (H 1 ).
Option 2) Fail to reject the null hypothesis (H 0 ). This means that you do NOT have enough evidence to support the alternative claim (H 1 ).
Another way to think about hypothesis testing is to compare it to the US justice system. A defendant is innocent until proven guilty (Null hypothesis—innocent). The prosecuting attorney tries to prove that the defendant is guilty (Alternative hypothesis—guilty). There are two possible conclusions that the jury can reach. First, the defendant is guilty (Reject the null hypothesis). Second, the defendant is not guilty (Fail to reject the null hypothesis). This is NOT the same thing as saying the defendant is innocent! In the first case, the prosecutor had enough evidence to reject the null hypothesis (innocent) and support the alternative claim (guilty). In the second case, the prosecutor did NOT have enough evidence to reject the null hypothesis (innocent) and support the alternative claim of guilty.
There are three different pairs of null and alternative hypotheses:
where c is some known value.
This tests whether the population parameter is equal to, versus not equal to, some specific value.
H o : μ = 12 vs. H 1 : μ ≠ 12
The critical region is divided equally into the two tails and the critical values are ± values that define the rejection zones.
A forester studying diameter growth of red pine believes that the mean diameter growth will be different if a fertilization treatment is applied to the stand.
This is a two-sided question, as the forester doesn’t state whether population mean diameter growth will increase or decrease.
This tests whether the population parameter is equal to, versus greater than, some specific value.
H o : μ = 12 vs. H 1 : μ > 12
The critical region is in the right tail and the critical value is a positive value that defines the rejection zone.
A biologist believes that there has been an increase in the mean number of lakes infected with milfoil, an invasive species, since the last study five years ago.
This is a right-sided question, as the biologist believes that there has been an increase in population mean number of infected lakes.
This tests whether the population parameter is equal to, versus less than, some specific value.
H o : μ = 12 vs. H 1 : μ < 12
The critical region is in the left tail and the critical value is a negative value that defines the rejection zone.
A scientist’s research indicates that there has been a change in the proportion of people who support certain environmental policies. He wants to test the claim that there has been a reduction in the proportion of people who support these policies.
This is a left-sided question, as the scientist believes that there has been a reduction in the true population proportion.
When the observed results (the sample statistics) are unlikely (a low probability) under the assumption that the null hypothesis is true, we say that the result is statistically significant, and we reject the null hypothesis. This result depends on the level of significance, the sample statistic, sample size, and whether it is a one- or two-sided alternative hypothesis.
When testing, we arrive at a conclusion of rejecting the null hypothesis or failing to reject the null hypothesis. Such conclusions are sometimes correct and sometimes incorrect (even when we have followed all the correct procedures). We use incomplete sample data to reach a conclusion and there is always the possibility of reaching the wrong conclusion. There are four possible conclusions to reach from hypothesis testing. Of the four possible outcomes, two are correct and two are NOT correct.
A Type I error is when we reject the null hypothesis when it is true. The symbol α (alpha) is used to represent Type I errors. This is the same alpha we use as the level of significance. By setting alpha as low as reasonably possible, we try to control the Type I error through the level of significance.
A Type II error is when we fail to reject the null hypothesis when it is false. The symbol β (beta) is used to represent Type II errors.
In general, Type I errors are considered more serious. One step in the hypothesis test procedure involves selecting the significance level ( α ), which is the probability of rejecting the null hypothesis when it is correct. So the researcher can select the level of significance that minimizes Type I errors. However, there is a mathematical relationship between α, β , and n (sample size).
The natural inclination is to select the smallest possible value for α, thinking to minimize the possibility of causing a Type I error. Unfortunately, this forces an increase in Type II errors. By making the rejection zone too small, you may fail to reject the null hypothesis, when, in fact, it is false. Typically, we select the best sample size and level of significance, automatically setting β .
A Type II error ( β ) is the probability of failing to reject a false null hypothesis. It follows that 1- β is the probability of rejecting a false null hypothesis. This probability is identified as the power of the test, and is often used to gauge the test’s effectiveness in recognizing that a null hypothesis is false.
The probability that at a fixed level α significance test will reject H 0 , when a particular alternative value of the parameter is true is called the power of the test.
Power is also directly linked to sample size. For example, suppose the null hypothesis is that the mean fish weight is 8.7 lb. Given sample data, a level of significance of 5%, and an alternative weight of 9.2 lb., we can compute the power of the test to reject μ = 8.7 lb. If we have a small sample size, the power will be low. However, increasing the sample size will increase the power of the test. Increasing the level of significance will also increase power. A 5% test of significance will have a greater chance of rejecting the null hypothesis than a 1% test because the strength of evidence required for the rejection is less. Decreasing the standard deviation has the same effect as increasing the sample size: there is more information about μ .
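The fish-weight example can be made concrete with a quick calculation. The standard deviation and the two sample sizes below are assumptions chosen only to show that power rises with n (a right-tailed one-sample Z test at α = 0.05):

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power(mu0, mu_alt, sigma, n, alpha_z=1.645):
    """Power of a right-tailed one-sample Z test (alpha_z = cutoff for alpha = .05)."""
    se = sigma / math.sqrt(n)
    cutoff = mu0 + alpha_z * se          # smallest sample mean that rejects H0
    return 1 - norm_cdf((cutoff - mu_alt) / se)

# Null mean 8.7 lb, alternative 9.2 lb, assumed sigma of 1.5 lb
small = power(8.7, 9.2, 1.5, n=30)
large = power(8.7, 9.2, 1.5, n=100)      # larger sample -> higher power
```

With these assumed numbers, the power climbs from roughly 0.57 at n = 30 to above 0.95 at n = 100, illustrating the link between sample size and power.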
We are going to examine two equivalent ways to perform a hypothesis test: the classical approach and the p-value approach. The classical approach is based on standard deviations. This method compares the test statistic (Z-score) to a critical value (Z-score) from the standard normal table. If the test statistic falls in the rejection zone, you reject the null hypothesis. The p-value approach is based on area under the normal curve. This method compares the area associated with the test statistic to alpha ( α ), the level of significance (which is also area under the normal curve). If the p-value is less than alpha, you would reject the null hypothesis.
As a past student poetically said: If the p-value is a wee value, Reject Ho
Both methods must have:
There are four steps required for a hypothesis test:
A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 inches/year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?
Step 1) State the null and alternative hypotheses.
Step 2) State the level of significance and the critical value.
Step 3) Compute the test statistic.
Step 4) State a conclusion.
In this problem, the test statistic falls in the red rejection zone. The test statistic of 3.07 is greater than the critical value of 1.96. We will reject the null hypothesis. We have enough evidence to support the claim that the mean diameter growth is different from (not equal to) 1.35 in./year.
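The computation in Step 3 is a one-liner; here is the whole classical test for this example as a standard-library sketch:

```python
import math

# Forester example: n = 32 plots, sample mean 1.6, known sigma 0.46, mu0 = 1.35
n, xbar, sigma, mu0 = 32, 1.6, 0.46, 1.35

z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
reject = abs(z) > 1.96                      # two-sided critical value, alpha = .05
print(round(z, 2), reject)                  # prints: 3.07 True
```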
A researcher believes that there has been an increase in the average farm size in his state since the last study five years ago. The previous study reported a mean size of 450 acres with a population standard deviation ( σ ) of 167 acres. He samples 45 farms and gets a sample mean of 485.8 acres. Is there enough information to support his claim?
We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased from 450 acres.
A researcher believes that there has been a reduction in the mean number of hours that college students spend preparing for final exams. A national study stated that students at a 4-year college spend an average of 23 hours preparing for 5 final exams each semester with a population standard deviation of 7.3 hours. The researcher sampled 227 students and found a sample mean study time of 19.6 hours. Does this indicate that the average study time for final exams has decreased? Use a 1% level of significance to test this claim.
We reject the null hypothesis. We have sufficient evidence to support the claim that the mean final exam study time has decreased below 23 hours.
The p-value is the probability of observing our sample mean given that the null hypothesis is true. It is the area under the curve to the left or right of the test statistic. If the probability of observing such a sample mean is very small (less than the level of significance), we would reject the null hypothesis. Computations for the p-value depend on whether it is a one- or two-sided test.
Steps for a hypothesis test using p-values:
Instead of comparing Z-score test statistic to Z-score critical value, as in the classical method, we compare area of the test statistic to area of the level of significance.
The Decision Rule: If the p-value is less than alpha, we reject the null hypothesis
If it is a two-sided test (the alternative claim is ≠), the p-value is equal to two times the probability of the absolute value of the test statistic. If the test is a left-sided test (the alternative claim is “<”), then the p-value is equal to the area to the left of the test statistic. If the test is a right-sided test (the alternative claim is “>”), then the p-value is equal to the area to the right of the test statistic.
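These three rules translate directly into code. The sketch below uses the standard library's error function for the normal CDF:

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(z, tail):
    if tail == "two-sided":               # alternative is "not equal"
        return 2 * (1 - norm_cdf(abs(z)))
    if tail == "left":                    # alternative is "<"
        return norm_cdf(z)
    return 1 - norm_cdf(z)                # alternative is ">"

# Forester example (two-sided, z = 3.07): p is about 0.002, so reject at alpha = .05
p = p_value(3.07, "two-sided")
```

The same function reproduces the other worked examples: the right-sided farm-size test (z = 1.44) gives p ≈ 0.075, and the left-sided study-time test (z = -7.02) gives a p-value far below 0.0002.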
Let’s look at Example 6 again.
A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 in./year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?
Step 2) State the level of significance.
The p-value is two times the area of the absolute value of the test statistic (because the alternative claim is “not equal”).
Step 4) Compare the p-value to alpha and state a conclusion.
Let’s look at Example 7 again.
The p-value is the area to the right of the Z-score 1.44 (the hatched area).
We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased.
Let’s look at Example 8 again.
The p-value is the area to the left of the test statistic (the little black area to the left of -7.02). The Z-score of -7.02 is not on the standard normal table. The smallest probability on the table is 0.0002. We know that the area for the Z-score -7.02 is smaller than this area (probability). Therefore, the p-value is <0.0002.
We reject the null hypothesis. We have enough evidence to support the claim that the mean final exam study time has decreased below 23 hours.
Both the classical method and p-value method for testing a hypothesis will arrive at the same conclusion. In the classical method, the critical Z-score is the number on the z-axis that defines the level of significance ( α ). The test statistic converts the sample mean to units of standard deviation (a Z-score). If the test statistic falls in the rejection zone defined by the critical value, we will reject the null hypothesis. In this approach, two Z-scores, which are numbers on the z-axis, are compared. In the p-value approach, the p-value is the area associated with the test statistic. In this method, we compare α (which is also area under the curve) to the p-value. If the p-value is less than α , we reject the null hypothesis. The p-value is the probability of observing such a sample mean when the null hypothesis is true. If the probability is too small (less than the level of significance), then we believe we have enough statistical evidence to reject the null hypothesis and support the alternative claim.
(referring to Ex. 8)
Test of mu = 23 vs. < 23
The assumed standard deviation = 7.3

N | Mean | SE Mean | 99% Upper Bound | Z | P
227 | 19.600 | 0.485 | 20.727 | -7.02 | 0.000
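The numbers in this Minitab output can be reproduced from the summary statistics of Example 8. A minimal Python sketch using SciPy (my tooling choice, not part of the text; n = 227, x̄ = 19.6, σ = 7.3, and μ0 = 23 come from the example):

```python
import math
from scipy.stats import norm

# Example 8 summary statistics (population sigma known)
n, xbar, sigma, mu0 = 227, 19.6, 7.3, 23

se = sigma / math.sqrt(n)      # standard error of the mean
z = (xbar - mu0) / se          # test statistic
p_value = norm.cdf(z)          # left-tailed test ("< 23")

print(f"SE = {se:.3f}, Z = {z:.2f}, p = {p_value:.2e}")
# SE = 0.485, Z = -7.02; p is far below 0.0002, so we reject H0
```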
Excel does not offer 1-sample hypothesis testing.
Frequently, the population standard deviation (σ) is not known. We can estimate it with the sample standard deviation (s). However, the test statistic will then no longer follow the standard normal distribution; we must instead rely on the Student’s t-distribution with n-1 degrees of freedom. Because we use the sample standard deviation (s), the test statistic changes from a Z-score to a t-score.
The steps for a hypothesis test are the same as those covered in Section 2.
Just as with the hypothesis test from the previous section, the data for this test must be from a random sample and requires either that the population from which the sample was drawn be normal or that the sample size is sufficiently large (n≥30). A t-test is robust, so small departures from normality will not adversely affect the results of the test. That being said, if the sample size is smaller than 30, it is always good to verify the assumption of normality through a normal probability plot.
We will still have the same three pairs of null and alternative hypotheses and we can still use either the classical approach or the p-value approach.
Selecting the correct critical value from the Student’s t-distribution table depends on three factors: the type of test (one-sided or two-sided alternative hypothesis), the sample size, and the level of significance.
For a two-sided test (“not equal” alternative hypothesis), the critical value (tα/2) is determined by dividing the level of significance (α) by two, to account for the possibility that the result could be less than OR greater than the known value.

For a one-sided test (a “less than” or “greater than” alternative hypothesis), the critical value (tα) is determined by placing all of α in the one tail.
Find the critical value you would use to test the claim that μ ≠ 112 with a sample size of 18 and a 5% level of significance.
In this case, the critical value (t α /2 ) would be 2.110. This is a two-sided question (≠) so you would divide alpha by 2 (0.05/2 = 0.025) and go down the 0.025 column to 17 degrees of freedom.
What would the critical value be if you wanted to test that μ < 112 for the same data?
In this case, the critical value would be 1.740. This is a one-sided question (<) so alpha would be divided by 1 (0.05/1 = 0.05). You would go down the 0.05 column with 17 degrees of freedom to get the correct critical value.
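These table lookups can be reproduced in software. A minimal Python sketch using SciPy (my tooling choice; the textbook works from printed tables):

```python
from scipy.stats import t

df = 18 - 1  # degrees of freedom

# Two-sided test (mu != 112) at alpha = 0.05: put alpha/2 in each tail
t_two_sided = t.ppf(1 - 0.05 / 2, df)

# One-sided test (mu < 112) at alpha = 0.05: all of alpha in one tail
t_one_sided = t.ppf(1 - 0.05, df)

print(f"{t_two_sided:.3f} {t_one_sided:.3f}")  # 2.110 1.740
```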
In 2005, the mean pH level of rain in a county in northern New York was 5.41. A biologist believes that the rain acidity has changed. He takes a random sample of 11 rain dates in 2010 and obtains the following data. Use a 1% level of significance to test his claim.
4.70, 5.63, 5.02, 5.78, 4.99, 5.91, 5.76, 5.54, 5.25, 5.18, 5.01
The sample size is small and we don’t know anything about the distribution of the population, so we examine a normal probability plot. The distribution looks normal so we will continue with our test.
The sample mean is 5.343 with a sample standard deviation of 0.397.
We will fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean rain pH has changed.
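This conclusion can be checked directly from the raw data. A minimal Python sketch using SciPy's one-sample t-test (SciPy is my addition; the text works from tables):

```python
from scipy.stats import ttest_1samp

# The 11 rain pH measurements from 2010, tested against the 2005 mean of 5.41
ph = [4.70, 5.63, 5.02, 5.78, 4.99, 5.91, 5.76, 5.54, 5.25, 5.18, 5.01]

t_stat, p_value = ttest_1samp(ph, popmean=5.41)  # two-sided by default

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# t = -0.56; p is well above alpha = 0.01, so we fail to reject H0
```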
Cadmium, a heavy metal, is toxic to animals. Mushrooms, however, are able to absorb and accumulate cadmium at high concentrations. The government has set safety limits for cadmium in dry vegetables at 0.5 ppm. Biologists believe that the mean level of cadmium in mushrooms growing near strip mines is greater than the recommended limit of 0.5 ppm, negatively impacting the animals that live in this ecosystem. A random sample of 51 mushrooms gave a sample mean of 0.59 ppm with a sample standard deviation of 0.29 ppm. Use a 5% level of significance to test the claim that the mean cadmium level is greater than the acceptable limit of 0.5 ppm.
The sample size is greater than 30 so we are assured of a normal distribution of the means.
Step 4) State a Conclusion.
The test statistic falls in the rejection zone. We will reject the null hypothesis. We have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit.
BUT, what happens if the significance level changes to 1%?
The critical value is now found by going down the 0.01 column with 50 degrees of freedom. The critical value is 2.403. The test statistic is now LESS THAN the critical value. The test statistic does not fall in the rejection zone. The conclusion will change. We do NOT have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit of 0.5 ppm.
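The cadmium example, including the change in conclusion between the 5% and 1% levels, can be verified from the summary statistics alone. A minimal Python sketch using SciPy (my tooling choice, not the text's):

```python
import math
from scipy.stats import t

# Summary statistics for the cadmium-in-mushrooms example
n, xbar, s, mu0 = 51, 0.59, 0.29, 0.5
df = n - 1

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = t.sf(t_stat, df)        # right-tailed ("greater than")

crit_05 = t.ppf(1 - 0.05, df)     # critical value at alpha = 0.05
crit_01 = t.ppf(1 - 0.01, df)     # critical value at alpha = 0.01

print(f"{t_stat:.2f} {p_value:.3f} {crit_05:.3f} {crit_01:.3f}")
# 2.22 0.016 1.676 2.403 -> reject H0 at 5%, fail to reject at 1%
```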
The level of significance is the probability that you, as the researcher, set to decide if there is enough statistical evidence to support the alternative claim. It should be set before the experiment begins.
We can also use the p-value approach for a hypothesis test about the mean when the population standard deviation (σ) is unknown. However, when using a Student’s t-table, we can only estimate a range for the p-value, not a specific value as with the standard normal table. The Student’s t-table has areas (probabilities) across the top row, with t-scores in the body of the table.
Estimating P-value from a Student’s T-table
If your test statistic is 3.789 with 3 degrees of freedom, you would go across the 3 df row. The value 3.789 falls between the values 3.482 and 4.541 in that row. Therefore, the p-value is greater than 0.01 but less than 0.02 (0.01 < p < 0.02).
If your level of significance is 5%, you would reject the null hypothesis as the p-value (0.01-0.02) is less than alpha ( α ) of 0.05.
If your level of significance is 1%, you would fail to reject the null hypothesis as the p-value (0.01-0.02) is greater than alpha ( α ) of 0.01.
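The table bracketing can be checked against an exact calculation. A small Python sketch using SciPy (an assumption on my part; any statistics package gives the same value):

```python
from scipy.stats import t

# Exact one-tailed p-value for a test statistic of 3.789 with 3 df
p_value = t.sf(3.789, 3)

# Falls inside the t-table's bracket: 0.01 < p < 0.02
print(f"p = {p_value:.4f}")
```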
Software packages typically output p-values. It is easy to use the Decision Rule to answer your research question by the p-value method.
(referring to Ex. 12)
Test of mu = 0.5 vs. > 0.5
N | Mean | StDev | SE Mean | 95% Lower Bound | T | P
51 | 0.5900 | 0.2900 | 0.0406 | 0.5219 | 2.22 | 0.016
Additional example: www.youtube.com/watch?v=WwdSjO4VUsg .
Frequently, the parameter we are testing is the population proportion.
Recall that the best point estimate of p, the population proportion, is given by

p̂ = x/n

where x is the number of individuals in the sample with the characteristic of interest and n is the sample size. The sampling distribution of p̂ is approximately normal when np(1 – p) ≥ 10. We can use both the classical approach and the p-value approach for testing.
The steps for a hypothesis test are the same as those covered in Section 2.
The test statistic is

z = (p̂ – p) / √( p(1 – p)/n )

and it follows the standard normal distribution. Notice that the standard error (the denominator) uses p instead of p̂, which was used when constructing a confidence interval about the population proportion. In a hypothesis test, the null hypothesis is assumed to be true, so the known proportion is used.
A botanist has produced a new variety of hybrid soy plant that is better able to withstand drought than other varieties. The botanist knows the seed germination for the parent plants is 75%, but does not know the seed germination for the new hybrid. He tests the claim that it is different from the parent plants. To test this claim, 450 seeds from the hybrid plant are tested and 321 have germinated. Use a 5% level of significance to test this claim that the germination rate is different from 75%.
This is a two-sided question so alpha is divided by 2.
The test statistic does not fall in the rejection zone. We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.
Let’s answer this question using the p-value approach. Remember, for a two-sided alternative hypothesis (“not equal”), the p-value is two times the area of the test statistic. The test statistic is -1.81 and we want to find the area to the left of -1.81 from the standard normal table.
Now compare the p-value to alpha. The Decision Rule states that if the p-value is less than alpha, reject the H 0 . In this case, the p-value (0.0702) is greater than alpha (0.05) so we will fail to reject H 0 . We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.
You are a biologist studying the wildlife habitat in the Monongahela National Forest. Cavities in older trees provide excellent habitat for a variety of birds and small mammals. A study five years ago stated that 32% of the trees in this forest had suitable cavities for this type of wildlife. You believe that the proportion of cavity trees has increased. You sample 196 trees and find that 79 trees have cavities. Does this evidence support your claim that there has been an increase in the proportion of cavity trees?
Use a 10% level of significance to test this claim.
This is a one-sided question so alpha is divided by 1.
The test statistic is larger than the critical value (it falls in the rejection zone). We will reject the null hypothesis. We have enough evidence to support the claim that there has been an increase in the proportion of cavity trees.
Now use the p-value approach to answer the question. This is a right-sided question (“greater than”), so the p-value is equal to the area to the right of the test statistic. Go to the positive side of the standard normal table and find the area associated with the Z-score of 2.49. The area is 0.9936. Remember that this table is cumulative from the left. To find the area to the right of 2.49, we subtract from one.
p-value = (1 – 0.9936) = 0.0064
The p-value is less than the level of significance (0.10), so we reject the null hypothesis. We have enough evidence to support the claim that the proportion of cavity trees has increased.
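The cavity-tree calculation can be reproduced in a few lines. A minimal Python sketch using SciPy (my tooling choice; note the exact p-value is ≈ 0.0063, while the text's table lookup with z rounded to 2.49 gives 0.0064):

```python
import math
from scipy.stats import norm

# Cavity-tree example: 79 successes out of 196 trees, testing p > 0.32
x, n, p0 = 79, 196, 0.32

p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)   # H0 is assumed true, so use p0 here
z = (p_hat - p0) / se
p_value = norm.sf(z)                # right-tailed ("greater than")

print(f"{p_hat:.6f} {z:.2f} {p_value:.4f}")
# 0.403061 2.49 0.0063 -> reject H0 at the 10% level
```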
(referring to Ex. 15)
Test of p = 0.32 vs. p > 0.32
Sample | X | N | Sample p | 90% Lower Bound | Z-Value | P-Value
1 | 79 | 196 | 0.403061 | 0.358160 | 2.49 | 0.006

Using the normal approximation.
When people think of statistical inference, they usually think of inferences involving population means or proportions. However, the particular population parameter needed to answer an experimenter’s practical questions varies from one situation to another, and sometimes a population’s variability is more important than its mean. For example, product quality is often defined in terms of low variability.
Sample variance s² can be used for inferences concerning a population variance σ². For a random sample of n measurements drawn from a normal population with mean μ and variance σ², the value s² provides a point estimate for σ². In addition, the quantity (n – 1)s²/σ² follows a Chi-square (χ²) distribution with df = n – 1.
The properties of the Chi-square (χ²) distribution are:

- It is non-negative (χ² ≥ 0) and right-skewed.
- Its shape depends on the degrees of freedom; as df increases, the distribution becomes more symmetric.

The null hypothesis is H0: σ² = σ0², tested against one of three alternative hypotheses:

- H1: σ² > σ0² (right-tailed)
- H1: σ² < σ0² (left-tailed)
- H1: σ² ≠ σ0² (two-tailed)

where the χ² critical value in the rejection region is based on degrees of freedom df = n – 1 and a specified significance level α.
As with previous sections, if the test statistic falls in the rejection zone set by the critical value, you will reject the null hypothesis.
A forester wants to control a dense understory of striped maple that is interfering with desirable hardwood regeneration using a mist blower to apply an herbicide treatment. She wants to make sure that treatment has a consistent application rate, in other words, low variability not exceeding 0.25 gal./acre (0.06 gal. 2 ). She collects sample data (n = 11) on this type of mist blower and gets a sample variance of 0.064 gal. 2 Using a 5% level of significance, test the claim that the variance is significantly greater than 0.06 gal. 2
H 0 : σ 2 = 0.06
H 1 : σ 2 >0.06
The critical value is 18.307. Any test statistic greater than this value will cause you to reject the null hypothesis.
The test statistic is

χ² = (n – 1)s²/σ² = (10 × 0.064)/0.06 = 10.67
We fail to reject the null hypothesis. The forester does NOT have enough evidence to support the claim that the variance is greater than 0.06 gal.²

You can also estimate the p-value using the same method as for the Student’s t-table. Go across the row for the degrees of freedom until you find the two values that your test statistic falls between. In this case, going across row 10, the two table values are 4.865 and 15.987. Now go up those two columns to the top row to estimate the p-value: it is greater than 0.1 and less than 0.9 (0.1 < p < 0.9). Both bounds are greater than the level of significance (0.05), causing us to fail to reject the null hypothesis.
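The mist-blower calculation can be verified directly. A minimal Python sketch using SciPy (my tooling choice; the exact p-value replaces the table's 0.1 < p < 0.9 bracket):

```python
from scipy.stats import chi2

# Mist-blower example: n = 11, sample variance 0.064, H0 variance 0.06
n, s2, sigma2 = 11, 0.064, 0.06
df = n - 1

test_stat = df * s2 / sigma2      # (n-1)s^2 / sigma^2
crit = chi2.ppf(1 - 0.05, df)     # right-tail critical value at alpha = 0.05
p_value = chi2.sf(test_stat, df)  # exact p-value

print(f"{test_stat:.2f} {crit:.3f} {p_value:.3f}")
# 10.67 18.307 0.384 -> fail to reject H0
```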
(referring to Ex. 16)
Test and CI for One Variance
Null hypothesis: Sigma-squared = 0.06
Alternative hypothesis: Sigma-squared > 0.06
The chi-square method is only for the normal distribution.
Method | Test Statistic | DF | P-Value
Chi-Square | 10.67 | 10 | 0.384
Excel does not offer 1-sample χ 2 testing.
To test a claim about μ when σ is known.
Natural Resources Biometrics Copyright © 2014 by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Statistics By Jim: Making statistics intuitive
By Jim Frost
The Kruskal Wallis test is a nonparametric hypothesis test that compares three or more independent groups. Statisticians also refer to it as one-way ANOVA on ranks. This analysis extends the Mann Whitney U nonparametric test that can compare only two groups.
If you analyze data, chances are you’re familiar with one-way ANOVA that compares the means of at least three groups. The Kruskal Wallis test is the nonparametric version of it. Because it is nonparametric, the analysis makes fewer assumptions about your data than its parametric equivalent.
Many analysts use the Kruskal Wallis test to determine whether the medians of at least three groups are unequal. However, it’s important to note that it only assesses the medians in particular circumstances. Interpreting the analysis results can be thorny. More on this later!
If you need a nonparametric test for paired groups or a single sample , consider the Wilcoxon signed rank test .
Learn more about Parametric vs. Nonparametric Tests and Hypothesis Testing Overview .
At its core, the Kruskal Wallis test evaluates data ranks. The procedure ranks all the sample data from low to high. Then it averages the ranks for all groups. If the results are statistically significant, the average group ranks are not all equal. Consequently, the analysis indicates whether any groups have values that rank differently. For instance, one group might have values that tend to rank higher than the other groups.
The Kruskal Wallis test doesn’t involve medians or other distributional properties—just the ranks. In fact, by evaluating ranks, it rolls up both the location and shape parameters into a single evaluation of each group’s average rank.
When their average ranks are unequal, you know a group’s distribution tends to produce higher or lower values than the others. However, you don’t know enough to draw conclusions specifically about the distributions’ locations (e.g., the medians).
However, when you hold the distribution shapes constant, the Kruskal Wallis test does tell us about the median. That’s not a property of the procedure itself but logic. If several distributions have the same shape, but the average ranks are shifted higher and lower, their medians must differ. But we can only draw that conclusion about the medians when the distributions have the same shapes.
These three distributions have the same shape, but the red and green are shifted right to higher values. Wherever the median falls on the blue distribution, it’ll be in the corresponding position in the red and green distributions. In this case, the analysis can assess the medians.
But, if the shapes aren’t similar, we don’t know whether the location, shape, or a combination of the two produced the statistically significant Kruskal Wallis test.
Like all statistical analyses, the Kruskal Wallis test has assumptions. Ensuring that your data meet these assumptions is crucial.
Violating these assumptions can lead to incorrect conclusions.
Consider using the Kruskal Wallis test in the following cases:

- Your data are ordinal.
- You are specifically interested in comparing medians.
- Your sample sizes are small and the data are nonnormal, so one-way ANOVA is not appropriate.
Learn more about the Normal Distribution .
If you have 3 – 9 groups and more than 15 observations per group or 10 – 12 groups and more than 20 observations per group, you might want to use one-way ANOVA even when you have nonnormal data. The central limit theorem causes the sampling distributions to converge on normality, making ANOVA a suitable choice.
One-way ANOVA has several advantages over the Kruskal Wallis test, including the following:
In short, use this nonparametric method when you’re specifically interested in the medians, have ordinal data, or can’t use one-way ANOVA because you have a small, nonnormal sample.
Like one-way ANOVA, the Kruskal Wallis test is an “omnibus” test. Omnibus tests can tell you that not all your groups are equal, but they don’t specify which pairs of groups are different.
Specifically, the Kruskal Wallis test evaluates the following hypotheses:

- Null hypothesis (H0): All groups have equal average ranks.
- Alternative hypothesis (H1): At least one group has a different average rank.
Again, if the distributions have similar shapes, you can replace “average ranks” with “medians.”
Imagine you’re studying five different diets and their impact on weight loss. The Kruskal Wallis test can confirm that at least two diets have different results. However, it won’t tell you exactly which pairs of diets have statistically significant differences.
So, how do we solve this problem? Enter post hoc tests. Perform these analyses after (i.e., post) an omnibus analysis to identify specific pairs of groups with statistically significant differences. A standard option includes Dunn’s multiple comparisons procedure. Other options include performing a series of pairwise Mann-Whitney U tests with a Bonferroni correction or the lesser-known but potent Conover-Iman method.
Learn about Post Hoc Tests for ANOVA .
Imagine you’re a healthcare administrator analyzing the median number of unoccupied beds in three hospitals. Download the CSV dataset: KruskalWallisTest .
For this Kruskal Wallis test, the p-value is 0.029, which is less than the typical significance level of 0.05. Consequently, we can reject the null hypothesis that all groups have the same average rank. At least one group has a different average rank than the others.
Furthermore, if the three hospital distributions have the same shape, we can conclude that the medians differ.
At this point, we might decide to use a post hoc test to compare pairs of hospitals.
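As a sketch of how this analysis runs in software: the following Python example applies `scipy.stats.kruskal` to hypothetical unoccupied-bed counts (NOT the article's downloadable dataset, so the p-value differs from the 0.029 reported above). The values are chosen without ties so the H statistic is easy to verify by hand.

```python
from scipy.stats import kruskal

# Hypothetical unoccupied-bed counts for three hospitals
hospital_a = [1, 2, 3, 4, 5]
hospital_b = [6, 7, 8, 9, 10]
hospital_c = [11, 12, 13, 14, 15]

h_stat, p_value = kruskal(hospital_a, hospital_b, hospital_c)

print(f"H = {h_stat:.1f}, p = {p_value:.4f}")
# H = 12.5, p = 0.0019 -> at least one hospital's average rank differs
```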
Reader question (May 20, 2024): Is the Kruskal Wallis test two-tailed or one-tailed?

Jim’s reply: It’s a one-tailed test in the same sense that the F-test for one-way ANOVA is one-tailed.
The Z.TEST function in Excel is a powerful tool that allows users to perform statistical hypothesis tests with ease. This function calculates the probability of a sample mean being equal to a specified population mean, using the standard normal distribution. It is commonly used in research and data analysis to determine the significance of a sample mean compared to a known population mean. To use the Z.TEST function, the user must input the data range for the sample and the known population mean. The function will then return a p-value that can be compared to a chosen significance level to determine if the null hypothesis should be rejected. This allows for efficient and accurate hypothesis testing, making it a valuable tool for decision-making and drawing conclusions from data.
This article describes the formula syntax and usage of the Z.TEST function in Microsoft Excel.
Returns the one-tailed P-value of a z-test.
For a given hypothesized population mean, x, Z.TEST returns the probability that the sample mean would be greater than the average of observations in the data set (array) — that is, the observed sample mean.
To see how Z.TEST can be used in a formula to compute a two-tailed probability value, see the Remarks section below.
Z.TEST(array,x,[sigma])
The Z.TEST function syntax has the following arguments:
Array Required. The array or range of data against which to test x.
x Required. The value to test.
Sigma Optional. The population (known) standard deviation. If omitted, the sample standard deviation is used.
If array is empty, Z.TEST returns the #N/A error value.
Z.TEST is calculated as follows when sigma is not omitted:
Z.TEST(array, x, sigma) = 1 - NORM.S.DIST((AVERAGE(array) - x) / (sigma/√n), TRUE)
or when sigma is omitted:
Z.TEST(array, x) = 1 - NORM.S.DIST((AVERAGE(array) - x) / (STDEV(array)/√n), TRUE)
where AVERAGE(array) is the sample mean and n is COUNT(array).
Z.TEST represents the probability that the sample mean would be greater than the observed value AVERAGE(array), when the underlying population mean is μ0. From the symmetry of the Normal distribution, if AVERAGE(array) < x, Z.TEST will return a value greater than 0.5.
The following Excel formula can be used to calculate the two-tailed probability that the sample mean would be further from x (in either direction) than AVERAGE(array), when the underlying population mean is x:
=2 * MIN(Z.TEST(array,x,sigma), 1 - Z.TEST(array,x,sigma))
Copy the example data in the following table, and paste it in cell A1 of a new Excel worksheet. For formulas to show results, select them, press F2, and then press Enter. If you need to, you can adjust the column widths to see all the data.
Data
3
6
7
8
6
5
4
2
1
9

Formula | Description | Result
=Z.TEST(A2:A11,4) | One-tailed probability-value of a z-test for the data set above, at the hypothesized population mean of 4 | 0.090574
=2 * MIN(Z.TEST(A2:A11,4), 1 - Z.TEST(A2:A11,4)) | Two-tailed probability-value of a z-test for the data set above, at the hypothesized population mean of 4 | 0.181148
=Z.TEST(A2:A11,6) | One-tailed probability-value of a z-test for the data set above, at the hypothesized population mean of 6 | 0.863043
=2 * MIN(Z.TEST(A2:A11,6), 1 - Z.TEST(A2:A11,6)) | Two-tailed probability-value of a z-test for the data set above, at the hypothesized population mean of 6 | 0.273913
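The Z.TEST calculation can be mirrored outside Excel. A minimal Python sketch (an illustration following the formulas in the Remarks, not Microsoft's implementation) that reproduces the example results using only the standard library:

```python
import math
import statistics

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_test(array, x, sigma=None):
    """Mirror of Excel's Z.TEST: one-tailed upper probability for the
    observed sample mean, given a hypothesized population mean x.
    Uses the sample standard deviation when sigma is omitted."""
    n = len(array)
    s = sigma if sigma is not None else statistics.stdev(array)
    return 1.0 - norm_cdf((statistics.fmean(array) - x) / (s / math.sqrt(n)))

data = [3, 6, 7, 8, 6, 5, 4, 2, 1, 9]  # the worksheet data above

one_tailed = z_test(data, 4)
two_tailed = 2 * min(z_test(data, 4), 1 - z_test(data, 4))

print(one_tailed, two_tailed)
# compare with Excel's 0.090574 and 0.181148
```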
The blood microbiome is probably not real.
Up until recently, if bacteria were detected in your blood you would be in a world of trouble. Blood was long considered to be sterile, meaning free of viable microorganisms like bacteria. When disease-causing bacteria spread to the blood, they can cause a life-threatening septic shock.
But the use of DNA sequencing technology has allowed researchers to more easily detect something that had been reported as early as the late 1960s: bacteria can be found in the blood and not cause disease.
As we begin to map out and understand the complex microbial ecosystem that lives in our gut and elsewhere in the body, we contemplate an important question: is there such a thing as a blood microbiome?
Our large intestine is not sterile; it is teeming with bacteria. But there are parts of the body that were long thought to be devoid of microorganisms. The brain. Bones. A variety of internal fluids, like our synovial fluid and peritoneal fluid. And, importantly, the blood.
Blood is made up of a liquid called plasma filled with red blood cells, whose main function is to carry oxygen to our cells. It also transports white blood cells, important to monitor for and fight off infections, as well as platelets, involved in clotting.
In the 1960s, a team of Italian researchers published multiple papers describing “mycoplasma-like forms”—meaning shapes that look like a particular type of bacteria that often contaminate cells cultured in the lab—in the blood of healthy people. This finding was confirmed in 1977 by a different team, which reported that four out of the 60 blood samples they had drawn from healthy volunteers showed bacteria growing in them. These types of tests, however, were rudimentary compared to what we have access to now. In the 2000s, they were mostly supplanted by DNA testing.
While we can sequence the entire DNA of any bacteria found in the blood, the technique most often used is 16S rRNA gene sequencing. I have always admired physicists’ penchant for quirky names: gluons, neutrinos, and charm quarks. Molecular biologists, by comparison, tend to be more sober. Yes, we have genes like Sonic hedgehog and proteins called scramblases; usually, though, we have to contend with the dryness of “16S rRNA.” You see, RNA is a molecule with many uses. Messenger RNA (or mRNA) acts as a disposable copy of a gene, a template for the production of a specific protein. Transfer RNA (or tRNA) actually brings the building blocks of a protein to where they are being assembled. And ribosomal RNA (or rRNA) is the main component of the giant protein factories in our cells known as ribosomes. One of its subunits is made up of, among others, a particular string of RNA known as the 16S rRNA.
The cool thing about the gene that codes for this 16S rRNA molecule is that it is very old and it mutates at a slow rate. By reading its precise sequence, scientists can tell which species it belongs to. Most of the studies of the putative blood microbiome use this technique to tell which species of bacteria are present in the blood being tested. The limitation of this test, however, is that dead bacteria have DNA too. The fact that DNA from the 16S rRNA gene of a precise bacterial species was detected in someone’s blood does not mean these bacteria were alive. For there to be a microbiome in the blood, these microorganisms need to live.
Which brings us to another important point of discussion. In order for scientists to agree that a blood microbiome exists, they first need to decide on the definition of a microbiome, and this is still a point of contention. In 2020, while companies were more than happy to sell hyped-up services testing your gut microbiome and claiming to interpret what it meant for your health, actual experts in the field met to agree on just what the word meant. “We are lacking,” they wrote , “a clear commonly agreed definition of the term ‘microbiome’.” For example, do viruses qualify? A microbiome implies life but viruses live on the edge, pun intended: they have the genetic blueprint for life yet they cannot reproduce on their own.
These experts proposed that the word “microbiome” should refer to the sum of two things: the microbiota, meaning the living microorganisms themselves, and their theatre of activity. It’s like saying that the Earth is not simply the life forms it houses, but also all of their individual components, and the traces they leave behind, and the environmental conditions in which they thrive or die. The microbiome is made up of bacteria and other microorganisms, yes, but also their proteins, lipids, sugars, and DNA and RNA molecules, as well as the signalling molecules and toxins that get exchanged within their theatre. (This is where viruses were sorted, by the way: not as part of the living microbiota but as belonging to the theatre of activity of the microbiome.)
The microbiome is a community, and this community has a distinct habitat.
So, what does the evidence say? Is our blood truly host to a thriving community of microorganisms or is something else going on?
Initial studies of the alleged blood microbiome were small . The amounts of bacteria that were being reported based on DNA sequencing were tiny. If this microbiome existed, it seemed sparse, more “asteroid field in real life” than “asteroid field in the movies.”
An issue looming over this early research is that of contamination. If bacteria are detected in a blood sample, were they really in the blood… or did they contaminate supplies along the way? When blood is drawn, the skin, which has its own microbiome, is punctured. The area is usually swabbed with alcohol to kill bacteria, and the supplies used should be sterile, but suffice to say that from the blood draw to the DNA extraction to the DNA amplification to the sequencing of this DNA, bacteria can be introduced into the system. In fact, it is such common knowledge that certain bacteria are found inside of the laboratory kits used by scientists that this ecosystem has its own name: the kitome. One way to rule out these contaminants is to simultaneously run negative controls alongside samples every step of the way, to make sure that these negative controls are indeed free of bacteria. But early papers rarely reported when controls were used.
Last year, results from what purports to be the largest study ever into the question of whether the blood microbiome exists were published in Nature Microbiology . A total of 9,770 healthy individuals were tested. The conclusion? Yes, some bacteria could be found in their blood, but the evidence contradicted the claim of an ecosystem. In 84% of the samples tested, no bacteria were detected. In most of the other samples, only one species was found. In an ecosystem, you would expect to see species appearing together repeatedly, but this was not the case here. And the species they found most often in their samples were known to contaminate these types of laboratory experiments.
So, what were the few bacteria found in the blood and not recognized as contaminants doing there in the first place if they were not part of a healthy microbiome? The authors lean toward an alternative explanation that had been floated for many years: these bacteria are transient. They end up in the blood from other parts of the body, either because of some minor leak or through their active transportation into the blood by agents such as dendritic cells. Like pedestrians wandering off onto the highway, these bacteria do not normally live in the blood but they can be seen there when we look at the right moment.
This blood microbiome story could end here and simply be an interesting example of scientific research homing in on a curious finding, testing a hypothesis, and ultimately refuting it (or at the very least providing strong evidence against it). But given the incentives of modern research and the social-media spotlight cast on the academic literature, there are two slightly worrying angles here that merit discussion.
Scientists are more and more incentivized to find practical applications for their research. It’s not enough, for example, to study bacteria that survive at incredibly high temperatures; we must be assured that the DNA replication enzyme these bacteria possess will one day be used in laboratories all over the world to conduct research, identify criminals, and test samples for the presence of a pandemic-causing coronavirus.
In researching this topic, I came across many papers claiming the existence of “blood microbiome signatures” for certain diseases that are not known to be infectious. We are thus not talking about infections leaking in the blood and causing sepsis. I saw reports of signatures for cardiovascular disease , liver disease , heart attacks , even for gastrointestinal disease in dogs . The idea is that these signatures could soon be turned into (profitable) diagnostic tests. The problem, of course, is that these studies are based on the hypothesis that a blood microbiome is real; that its equilibrium can be affected by disease; and that these changes can be reliably detected and interpreted.
But if the blood microbiome is imaginary, we are just chasing ghosts. This is not unlike the time that scientists were publishing signatures of microRNAs in the blood for every possible cancer. When I looked at the published literature in grad school, I realized that the multiple signatures reported for a single cancer barely overlapped . They were just chance findings. Compare enough variables in a small sample set and you will find what appears to be an association.
My second concern is that the transitory leakage of bacteria into the blood, as evidenced by the recent Nature Microbiology paper, will be used as confirmation of a pseudoscientific entity: leaky gut syndrome. At the end of their paper, the researchers hypothesize that these bacteria end up in the blood because the integrity of certain barriers in the body are compromised during disease or during periods of stress. The “net” in our gut gets a bit porous, and some of our colon’s bacteria end up in circulation, though they are not causing disease as far as we can tell. A form of leaky gut is known to exist in certain intestinal diseases , likely to be a consequence and not a cause. But leaky gut syndrome, favoured by non-evidence-based practitioners, does not appear to be real, yet many websites portray it as the one true cause of all diseases, a real epidemic. Nuanced scientific findings have a history of being stolen, distorted, and toyed with by fake doctors to give credence to their pet theories. Though I have yet to see examples of it, I suspect work done on this hypothesized blood microbiome will similarly get weaponized.
You have been warned.
Take-home message:
- Our blood was long considered to be sterile, meaning free of viable microbes, unless a dangerous infection leaked into it, causing sepsis
- Studies have provided evidence for the presence of bacteria in the blood of some healthy humans, leading to the hypothesis that, much like in our gut, our blood is host to a microbiome
- The largest study ever done on the topic provided strong evidence against this hypothesis. It seems that when non-disease-causing bacteria find themselves in our blood, it is temporary and occasional
Office for Science and Society.
Scientific Reports, volume 14, Article number: 15201 (2024)
With the rapid advancement of educational technology, the flipped classroom approach has garnered considerable attention owing to its potential for enhancing students’ learning capabilities. This research delves into the flipped classroom teaching methodology, employing the Unified Theory of Acceptance and Use of Technology (UTAUT), learning engagement theory, and the 4C skills (comprising communication, collaboration, creativity, and critical thinking) to investigate its effects on learning capabilities. The research surveyed 413 students from three universities in Jiangxi Province, employing stratified random sampling. SPSS 24.0 and Amos were used for structural equation modeling and hypothesis testing analysis. The findings indicate that: (1) Performance expectancy, effort expectancy, and peer influence significantly enhance students’ learning engagement in the flipped classroom. (2) Students’ learning engagement in the flipped classroom notably promotes their learning capabilities. (3) Performance expectancy, effort expectancy, and peer influence can significantly boost learning capabilities by increasing learning engagement. (4) Personality traits significantly moderate the effect of peer influence on learning engagement, highlighting the crucial role of individual differences in learning. (5) The level of students’ learning engagement is differentially influenced by performance expectancy and peer influence across various academic disciplines. Ultimately, this research provides valuable insights for educational policymakers and guides improvements in teaching practices, collectively advancing educational quality and equity.
With the rapid development of technology and continuous innovation in educational philosophy, the flipped classroom, as a cutting-edge teaching method, has garnered widespread attention globally. This teaching approach was first proposed by two high school teachers, Jonathan Bergmann and Aaron Sams, in 2007, and has quickly gained widespread application at all levels of education 1 . The core concept of the flipped classroom lies in upending the traditional teaching model, extending students’ learning activities from the classroom to outside, thereby transforming the classroom into a hub for deep learning and practical activities. As such, exploring the impact of flipped classroom teaching on students’ learning abilities constitutes an important area of research. Currently, scholars both domestically and internationally focus their research on flipped classroom teaching primarily in the following two areas:
One aspect concerns studies on the implementation effects of the flipped classroom model. Numerous studies have confirmed its significant role in promoting students’ active participation, enhancing learning motivation, and improving academic performance. In biology instruction, Flores-González and Flores-González 2 found that the flipped classroom fosters self-regulated learning and active engagement among students. Within an online educational environment, research by Cuetos 3 indicates that the flipped classroom model elevates university students’ learning motivation and academic achievement. In the field of English teaching, a study by Dewi et al. 4 also revealed that students hold a positive attitude towards the new teaching mode of the flipped classroom. The flipped classroom provides students with more personalized and autonomous learning opportunities, allowing learning to align more closely with individual learning styles and paces 5 , 6 . Additionally, the impact of the flipped classroom on students’ academic success has garnered considerable attention. Research by Semab and Naureen 7 demonstrates the positive effect of the flipped classroom model in enhancing students’ academic achievements. Studies by Roehl et al. 8 and Tucker 9 emphasize that the flipped classroom promotes the cultivation of students’ practical application abilities by reallocating classroom time for in-depth discussions, problem-solving, and hands-on activities.
Another area of research pertains to effective flipped classroom teaching design and the challenges it faces. Effective teaching design serves as the cornerstone of a successful flipped classroom. Nicholas 10 underscores multiple teaching factors to consider when implementing a flipped classroom in graduate education. Meanwhile, Yang et al. 11 designed a flipped classroom teaching model based on blended learning and experimentally verified the model’s effectiveness in enhancing students’ academic performance and level of learning engagement. For STEM education, Shofiyah et al. 12 proposed a flipped classroom teaching template grounded in the 5E model, whose content and structure have been verified for validity and reliability.
The research findings from the two aforementioned aspects provide rich references and insights for this research, yet there are two gaps in existing research. On the one hand, although the flipped classroom teaching method has been widely studied, few studies have explored the impact of the flipped classroom on enhancing 4C skills (communication, collaboration, creativity, and critical thinking), using the 4C skills as a measure of learning capability. On the other hand, despite the general agreement in previous studies on the positive effects of the flipped classroom on learning outcomes, few have investigated the differing impacts of personality and subject differences in flipped classroom teaching.
Therefore, this research seeks to answer the following questions:
Question 1: What are the key factors that affect students’ learning engagement in the flipped classroom teaching method?
Question 2: Does learning engagement further influence students’ 4C learning capabilities?
Question 3: Do individual differences (such as personality) and subject differences have a moderating effect between the relevant variables in this research?
To answer these questions, this research combines the UTAUT model, learning engagement theory, and the 4C skills analysis framework to construct a comprehensive research model. By collecting data from 413 students from three universities in Jiangxi Province, this research utilizes structural equation modeling for data analysis to unveil the complex relationships between performance expectancy, effort expectancy, peer influence, learning engagement, and learning capability in the flipped classroom.
The significance of this research lies not only in illustrating the specific impact of the flipped classroom on students’ learning capabilities through empirical analysis, providing a scientific basis for the further promotion and application of this teaching model, but also in deepening the understanding of the essence of the flipped classroom teaching model by integrating multiple theoretical frameworks. Furthermore, the research findings will provide strong support for educational policy formulation and teaching practice improvement, collectively advancing educational quality and equity.
Compared to existing research, the innovation of this research is primarily reflected in model construction and variable setting. Firstly, while preserving the core variables of the UTAUT model, this research innovatively integrates learning engagement theory and the 4C skills framework, offering a new perspective for understanding the interactions of various factors in the flipped classroom and their impact on learning capabilities. Secondly, in terms of variable setting, this research includes personality traits and subject differences as moderating variables in the analysis and utilizes 4C skills as a measure of learning capabilities, which is a novel approach in the field of flipped classroom teaching research.
In the following chapters, this research will conduct a literature review, derive research hypotheses, describe the survey process, perform model inspection, and finally discuss and present research implications.
Theory and application of the UTAUT.
The UTAUT model, proposed by Venkatesh et al. 13 , integrates key variables from multiple theoretical models, including the Theory of Reasoned Action (TRA), Theory of Planned Behavior (TPB), Technology Acceptance Model (TAM), Motivational Model (MM), Combined TAM and TPB (C-TAM), and Innovation Diffusion Theory (IDT). Its aim is to provide a more comprehensive and accurate framework to explain and predict users’ acceptance behavior towards new technologies.
The dimensions of the UTAUT model have demonstrated strong explanatory power in technology adoption research across multiple domains, particularly in education. The model has been applied to studies on the acceptance behavior of technologies such as Massive Open Online Courses (MOOCs) and AI chatbots. Li and Zhao 14 combined the UTAUT model with Social Presence Theory to analyze factors influencing students’ continued use of MOOCs, finding that the UTAUT model positively affects students’ satisfaction and intention to continue using MOOCs. Meanwhile, Tian et al. 15 utilized both UTAUT and ECM models to explore Chinese graduate students’ acceptance and utilization of AI chatbot technology, discovering that “confirmation” and “satisfaction” from the ECM model have a greater impact on user behavior than the UTAUT model.
The flipped classroom, as an innovative teaching model in higher education, has garnered significant attention regarding its application effects and influencing factors. In this domain, Alyoussef 16 and Agyei and Razi 17 have both conducted in-depth explorations utilizing the UTAUT model. Alyoussef 16 revealed the central mediating role of perceived usefulness and perceived ease of use in students’ acceptance of the flipped classroom, using a sample of students from a university in Saudi Arabia. This finding not only indicates students’ positive attitude towards the flipped classroom but also further confirms the positive role of this teaching model in enhancing learning outcomes. Agyei and Razi 17 enriched the UTAUT model by introducing variables such as experience expectancy, parent-school involvement, perceived behavioral control, and self-efficacy to deeply analyze high school students’ acceptance of using online resources for flipped classroom learning. Their results showed that performance expectancy, effort expectancy, parent-school involvement, students’ self-efficacy, and experience expectancy all significantly positively impact students’ willingness to learn. However, perceived behavioral control did not show a significant effect in their research.
In this research, we focus on the college student population, whose social connections are primarily reflected in peer relationships. Therefore, we choose to represent the social influence element in the UTAUT model with peer influence, aiming to align more closely with the actual situation of this specific group. Meanwhile, considering the widespread availability of technology in modern higher education, we do not treat facilitating conditions as a core factor in our research, although this does not imply that this element can be ignored in all environments.
Learning engagement theory, initially proposed by Fredricks et al. 18 in the field of educational psychology, aims to delve into students’ cognitive, emotional, and behavioral engagement exhibited during the learning process. This theory focuses on assessing the enthusiasm and depth of students’ participation in learning activities, encompassing three key dimensions: cognitive engagement, emotional engagement, and behavioral engagement.
In the process of deeply exploring the theory of learning engagement, researchers have conducted extensive studies targeting different educational environments and learning formats. The issue of learning engagement is particularly prominent in the field of MOOCs and online learning. Although MOOCs have attracted a large number of learners due to their openness and flexibility, the low completion rate has always been one of the challenges they face 19 . Studies have shown that students’ intrinsic motivations (such as interest) and extrinsic motivations (such as perceived knowledge value) have a significant impact on their learning engagement in MOOCs. From the perspective of self-determination theory, Lan and Hew 20 adopted a mixed-method approach to investigate learning engagement in MOOCs, finding that perceived ability and emotional engagement have a significant impact on students who complete MOOCs, and that different dimensions of learning engagement can predict learners’ perceived learning effectiveness. Meanwhile, in the online learning environment, significant changes have occurred in the way students interact with teachers and peers, which requires educators to pay more attention to cultivating students’ autonomous learning abilities, computer and network skills, and online communication abilities to promote their learning engagement 21 .
Voogt et al. 22 emphasized the importance of 4C skills in 21st-century education in their study. These skills include Communication, Collaboration, Creativity, and Critical Thinking, which focus on cultivating students’ comprehensive literacy to adapt to the complex needs of modern society. Especially when measuring learning capability, the 4C skills provide a comprehensive and appropriate framework.
In the context of exploring the improvement of learning abilities, integrating these four skills—critical thinking, communication, collaboration, and creativity—into the learning process is particularly critical. Specifically, critical thinking skills enable learners to identify true and false information and adapt to environmental changes 23 . Communication skills, including effective speaking, listening, and writing, are key to improving interpersonal efficiency 24 . Collaboration emphasizes working together in a team environment, facilitating knowledge sharing and problem-solving 25 . Innovation ability is the key to gaining an advantage in modern social competition, requiring continuous learning, challenging traditional concepts, and maintaining sensitivity to new technologies 26 . By integrating these abilities, the learning process can be more comprehensive, improving flexibility and effectiveness in responding to various challenges.
Although previous studies have explored the role of the flipped classroom model in promoting student collaboration, criticism, and innovation abilities 9 , 27 , there is still a lack of in-depth exploration of the specific impact and mechanism of communication ability in this model. Therefore, this research aims to comprehensively integrate 4C abilities and deeply explore the overall impact of the flipped classroom teaching model on these abilities.
This research combines the UTAUT model, learning engagement theory, and the 4C theory, selecting Performance expectancy (PE), Effort expectancy (EEX), Peer influence (PI), Learning engagement (ENGA), and Learning capability (SKIL) as research constructs to explore the impact mechanism of flipped classroom teaching on college students’ learning capability. The specific definitions of the variables are shown in Table 1 :
The relationship between performance expectancy, effort expectancy, peer influence, and learning engagement.
Scholars including Singh 28 , Riddle and Gier 29 , and Clark 30 have found that flipped classrooms can improve students’ test scores and course engagement by replacing traditional lectures with micro-lectures and activity-based learning strategies, demonstrating that performance expectancy has a significant positive impact on students’ engagement in learning. The fully online flipped classroom model was more effective than the online flipped model in supporting student behavioral engagement, suggesting the importance of effort expectancy in an online environment 31 . A study by Ruiz 32 demonstrated that integrating interactive technology and peer instruction into a flipped classroom can positively affect student engagement, underscoring the importance of peer influence in enhancing student engagement and learning. Based on the above, the following hypotheses are proposed in this research:
H1: Performance expectancy has a significant positive impact on students’ learning engagement.
H2: Effort expectancy has a significant positive impact on students’ learning engagement.
H3: Peer influence has a significant positive impact on students’ learning engagement.
Learning Engagement Theory emphasizes the cognitive, emotional, and behavioral engagement of students in the learning process and these factors are believed to positively influence the enhancement of learning capabilities. Zimmerman 33 states that when students believe they can succeed in a learning task, they are more likely to invest more energy and effort, which promotes learning capabilities. Pekrun et al. 34 explored the relationship between students’ engagement in learning and learning competence from an emotional perspective, which further supports the positive link between engagement in learning and increased learning competence. Martin and Bolliger 35 noted that student engagement in learning directly impacts online learning outcomes, improves student performance in online programs and is considered an important factor in measuring teaching quality. Taken together, these studies suggest that learning engagement can have a significant positive impact on students’ learning capabilities by influencing their self-efficacy, motivation, learning strategies, and emotional engagement. Based on the above, this research proposes the following hypotheses:
H4: Students’ learning engagement has a significant positive impact on learning capability.
According to the aforementioned literature, it is evident that performance expectancy, effort expectancy, and peer influence have a positive effect on the enhancement of learning capabilities through learning engagement. This is supported by studies from Chen and Wu 36 , Buabeng-Andoh 37 , Kuo et al. 38 , Zimmerman 33 , Pekrun et al. 34 , and Martin and Bolliger 35 . Additionally, research by Jamaludin and Osman 39 , Wang 40 , and Nerantzi 41 also corroborate the notion that performance expectancy, effort expectancy, and peer influence impact learning capabilities through learning engagement. Based on this information, the current study proposes the following hypotheses:
H5: Performance expectancy has a significant positive impact on learning capability through students’ learning engagement.
H6: Effort expectancy has a significant positive impact on learning capability through students’ learning engagement.
H7: Peer influence has a significant positive impact on learning capability through students’ learning engagement.
Personality.
Personality refers to relatively stable individual differences in behavior, emotion, and cognition, encompassing traits, habits, attitudes, and values. Eysenck and Eysenck 42 proposed the introversion–extraversion theory to explain and describe individual personality differences, dividing human personality into two types: introverted and extroverted. Scholars such as Chuang et al. 43 and Kim et al. 44 explored the differences in classroom performance of students with different personality traits in a flipped classroom, while Wang et al. 45 noted that students with moderate openness performed best in flipped classrooms, whereas students with high openness performed best in online learning situations. However, relatively few studies have examined personality as a moderating variable in the UTAUT model, because UTAUT focuses primarily on technology acceptance and use behaviors, while individual differences, including personality, are usually more prominent in other models. Given the traits of the subjects under study, introverted students may be more inclined to show higher learning engagement in independent learning environments, while extroverted students may be more adept at collaborating with peers; therefore, personality is added as a moderating variable in this study. Based on the above, the following hypotheses are proposed in this research:
H8A: Personality has a moderating effect between performance expectancy and learning engagement.
H8B: Personality has a moderating effect between effort expectancy and learning engagement.
H8C: Personality has a moderating effect between peer influence and learning engagement.
H8D: Personality has a moderating effect between learning engagement and learning capability.
From the perspective of individual differences, subject differences may have different impacts on students’ engagement and learning capabilities in the flipped classroom environment. Liu et al. 46 explored the impact of a flipped classroom integrating subjects on student learning capabilities in different health professional fields. Meanwhile, Fan 47 explored the role of social influences, school motivation, and gender differences in educational psychology and found that disciplinary background may influence student motivation and engagement. Students may exhibit different learning preferences and strategies in humanities and social sciences versus science and engineering contexts. Subject differences have also been investigated within the UTAUT model, focusing on the similarities and differences in individuals’ acceptance of technology across subject areas. Therefore, this research proposes the following hypotheses about the moderating effects of subject differences:
H9A: Subject has a moderating effect between performance expectancy and learning engagement.
H9B: Subject has a moderating effect between effort expectancy and learning engagement.
H9C: Subject has a moderating effect between peer influence and learning engagement.
H9D: Subject has a moderating effect between learning engagement and learning capabilities.
Taking the above into account, a theoretical model of how flipped classroom teaching enhances students’ learning capabilities can be constructed, as shown in Fig. 1 .
Theoretical model of flipped classroom teaching to enhance students’ learning capability.
Collection method.
This research employed a questionnaire survey for data collection. The survey was conducted from November to December 2023, targeting students who had participated in flipped classroom learning in three universities in Jiangxi Province. To ensure the representativeness and breadth of the sample, the research team adopted a stratified random sampling method. A total of 450 eligible students were selected as survey respondents. This sampling strategy aimed to ensure that the sample comprehensively reflected the characteristics and opinions of students at different levels, thereby enhancing the reliability and validity of the research.
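As a rough illustration of the stratified random sampling described above (not the authors' actual procedure), a proportional draw within strata can be sketched with pandas. The roster, stratum labels, and sizes below are hypothetical, chosen only so the totals work out to 450:

```python
import pandas as pd

# Hypothetical roster of 900 eligible students across three universities
# (A, B, C) and two broad disciplines; labels are illustrative only.
roster = pd.DataFrame({
    "student_id": range(900),
    "stratum": ["A-sci", "A-hum", "B-sci", "B-hum", "C-sci", "C-hum"] * 150,
})

# Draw the same fraction from every stratum so the 450-student sample
# mirrors the stratum proportions of the full roster.
frac = 450 / len(roster)
sample = roster.groupby("stratum", group_keys=False).sample(
    frac=frac, random_state=42
)
print(len(sample))  # 450
```

Sampling the same fraction per stratum is what makes the sample "representative" in the sense the paragraph describes: each layer of the population contributes in proportion to its size.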
During the survey process, a detailed questionnaire was distributed to each participating student, and they were offered a cash incentive to encourage honest and thoughtful responses. Through this approach, we hoped to gather high-quality data that truly reflected students’ flipped classroom experiences, providing a solid foundation for subsequent analysis and research.
A total of 450 questionnaires were distributed, with 418 returned (93% response rate) and 413 valid questionnaires ultimately obtained (99% validity rate). Throughout the questionnaire distribution process, we strictly adhered to ethical principles in academic research, particularly regarding obtaining informed consent from participants. All students participating in this research were clearly informed about the purpose, methodology, potential risks, and their rights related to this research.
The questionnaire employed a 7-point Likert scale, with the numbers 1 through 7 representing “strongly disagree,” “disagree,” “somewhat disagree,” “neutral,” “somewhat agree,” “agree,” and “strongly agree,” respectively. “Strongly disagree” indicates that the situation described in the item completely contradicts reality, “strongly agree” indicates complete agreement with reality, and “neutral” indicates a middle ground. The research instrument consists of six parts:
1. A questionnaire on student background variables (including subject, personality, etc.).
2. A performance expectancy questionnaire (4 items), adopting content designed by Devisakti and Ramayah 48 and others, measuring university students’ performance expectancy of flipped classrooms.
3. An effort expectancy questionnaire (3 items), utilizing content designed by Zou et al. 49 and others, measuring university students’ effort expectancy in flipped classrooms.
4. A peer influence questionnaire (3 items), based on content designed by Zhonggen and Xiaozhi 50 , Khlaisang et al. 51 , and others, measuring peer influence in flipped classrooms.
5. A learning engagement questionnaire (12 items), using content designed by Qureshi et al. 52 , with behavioral, emotional, and cognitive engagement as sub-dimensions.
6. A personal capability enhancement questionnaire (16 items), employing content designed by Arshad and Akram 53 , C.-H. S. Liu 54 , Baruch and Lin 55 and others, with communication, collaboration, innovation, and critical thinking skills as sub-dimensions, measuring capability enhancement after learning in flipped classrooms.
In this research, data collection resulted in the recovery of 418 questionnaires. After discarding 5 invalid samples, 413 valid samples remained. The basic characteristics are shown in Table 2 . The primary background data collected include subject and personality traits. Among the 413 valid samples, 199 students were from science and engineering, accounting for 48.18% of the total, and 214 were from humanities and social sciences, accounting for 51.82%. In terms of personality traits, 201 students were introverted (48.67% of the total) and 212 were extroverted (51.33%).
In this research, SPSS 24.0 and AMOS software were used to perform structural equation modeling analysis on the data to explore the impact of flipped classroom teaching on students’ learning abilities. The analysis primarily consists of two parts: the measurement model (including reliability testing, convergent validity testing, and discriminant validity testing) and the structural model (including model fit analysis, path analysis, mediation effect analysis, and moderation effect analysis).
Questionnaire reliability test.
Table 3 presents the internal consistency of the questionnaire dimensions. The internal consistency (Cronbach’s α) of all dimensions is higher than 0.7, indicating good reliability of the questionnaire sample data. All items will be retained for subsequent analysis.
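For reference, Cronbach's α is computed directly from the respondents-by-items score matrix. The sketch below uses simulated 7-point responses (not the paper's data) to show the calculation and the 0.7 cutoff in action:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy data: four correlated 7-point items driven by one latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(4, 1, size=200)
items = np.clip(trait[:, None] + rng.normal(0, 0.7, size=(200, 4)), 1, 7)

alpha = cronbach_alpha(items)
print(alpha > 0.7)  # True: consistent items clear the conventional cutoff
```

With uncorrelated items the same function drops well below 0.7, which is the failure mode the reliability test guards against.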
Table 4 presents the standardized factor loadings of each measurement item, as well as the composite reliability and average variance extracted (AVE) for each dimension. The standardized factor loadings range from 0.648 to 0.851, the composite reliability falls between 0.806 and 0.913, and the AVE is between 0.58 and 0.683. These values meet the criteria established by Fornell and Larcker 56 , indicating good convergence validity of the research.
This research employed the Average Variance Extracted approach to evaluate discriminant validity. According to Fornell and Larcker 56 , sufficient discriminant validity requires that the square root of each construct’s AVE be greater than the correlation coefficients between that construct and the others. Table 5 shows that the square root of the AVE for each construct is indeed higher than the associated correlation coefficients, confirming the model’s good discriminant validity: each research construct is distinct and not merely a reflection of the other variables in the model.
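The composite reliability, AVE, and Fornell-Larcker comparison reported in Tables 4 and 5 follow standard formulas over standardized factor loadings. A minimal sketch, using hypothetical loadings chosen to sit inside the paper's reported range (0.648-0.851) and a hypothetical inter-construct correlation:

```python
import numpy as np

def composite_reliability(loadings) -> float:
    """CR from standardized loadings, assuming uncorrelated error terms."""
    loadings = np.asarray(loadings)
    errors = 1 - loadings**2                      # residual variance per item
    return loadings.sum()**2 / (loadings.sum()**2 + errors.sum())

def ave(loadings) -> float:
    """Average variance extracted: mean squared standardized loading."""
    return np.mean(np.asarray(loadings)**2)

# Illustrative loadings for one construct (NOT values from Table 4).
pe_loadings = [0.75, 0.80, 0.72, 0.78]
cr, a = composite_reliability(pe_loadings), ave(pe_loadings)

# Fornell-Larcker check: sqrt(AVE) must exceed the construct's
# correlations with every other construct (0.55 is a made-up example).
corr_with_other_construct = 0.55
print(np.sqrt(a) > corr_with_other_construct)  # True for these numbers
```

For these loadings CR lands around 0.85 and AVE around 0.58, consistent with the ranges the paper reports as acceptable.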
As Table 6 shows, the fit indices for the structural equation model all fall within acceptable ranges. A χ 2 /DF value below 3 (or 5, depending on the criterion employed) is generally considered indicative of good fit, so the observed value of 1.002 is desirable. The RMSEA value of 0.002 is well below the typical threshold of 0.08, and the SRMR value of 0.038 is under the suggested maximum of 0.08, both indicating a good fit. The CFI and TLI values of 0.999 and 0.999 are near 1, indicating a very good fit to the data. The GFI and AGFI values of 0.938 and 0.933, respectively, exceed the generally recognized criterion of 0.90, further confirming a good fit. Together, these fit indices imply that the structural equation model accurately captures the observed data.
According to the path coefficient analysis results presented in Table 7 and Fig. 2 , Performance Expectancy (PE) (b = 0.244, p < 0.001), Effort Expectancy (EEX) (b = 0.284, p < 0.001), and Peer Influence (PI) (b = 0.242, p < 0.001) all have a significant positive impact on Learning Engagement (ENGA). Furthermore, Learning Engagement (ENGA) also significantly and positively affects the Learning Capability (SKIL) (b = 0.838, p < 0.001). Therefore, hypotheses H1, H2, H3, and H4 are supported.
Path analysis results. Note: * p < 0.05; ** p < 0.01; *** p < 0.001. All values are standardized regression coefficients.
Furthermore, the findings show that performance expectancy, effort expectancy, and peer influence collectively account for 76.6% of the variance in learning engagement, while learning engagement accounts for 72.5% of the variance in learning capability. These results underline the significance of Performance Expectancy, Effort Expectancy, and Peer Influence in determining Learning Engagement and validate the research hypotheses. Additionally, they attest to the importance of learning engagement in improving learning capability, emphasizing how these factors work together in the flipped classroom environment to enhance student learning capabilities.
Based on the indirect effects analysis in the mediation model shown in Table 8 , it is noticed that the p -values are significant and the confidence intervals do not include 0 in all three mediated hypothesis paths (PI → ENGA → SKIL, EEX → ENGA → SKIL, and PE → ENGA → SKIL). This indicates that the mediation effects are valid in all cases. Specifically:
PI has a significant indirect effect on the enhancement of SKIL through ENGA, supporting Hypothesis H7. EEX has a significant indirect effect on the enhancement of SKIL through ENGA, supporting Hypothesis H6. PE has a significant indirect effect on the enhancement of SKIL through ENGA, supporting Hypothesis H5.
These findings demonstrate the critical role of ENGA as a mediator in the relationship between PE, EEX, PI, and the enhancement of SKIL. This supports the theoretical framework proposed in the research, highlighting the importance of these constructs in the context of flipped classroom learning environments.
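The indirect-effect logic behind Table 8 (an a × b product tested against a percentile-bootstrap confidence interval) can be sketched on synthetic data. This is an illustrative simulation rather than the study's data or code; the effect sizes, noise levels, and the simplified b-path regression (Y on M alone, ignoring X) are assumptions made for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a simple mediation chain X -> M -> Y (e.g. PE -> ENGA -> SKIL)
n = 413                                          # sample size matching the study
x = rng.normal(size=n)
m = 0.25 * x + rng.normal(scale=0.9, size=n)     # a-path
y = 0.80 * m + rng.normal(scale=0.5, size=n)     # b-path

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]   # slope of M on X
    b = np.polyfit(m, y, 1)[0]   # slope of Y on M (X omitted here for brevity)
    return a * b

# Percentile bootstrap: resample cases and recompute a*b each time
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(indirect(x[idx], m[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect = {indirect(x, m, y):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
# The mediation effect is supported when the interval excludes 0.
```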
When considering individual personality as a moderating variable, among the 413 respondents, there were 201 introverted students and 212 extroverted students. Table 9 presents the regression coefficient values for the two groups, showing the comparison of slope differences between them. Table 10 displays the moderation effect test of the model. Among the four cross-group comparisons of slopes, the path of PI → ENGA reaches a significant level, indicating that the moderation effect is partially established, thus confirming Hypothesis H8C. From the values in Table 9 , it can be observed that the regression coefficient of PI → ENGA for introverted students is significantly higher than that of extroverted students.
When considering the subject as a moderating variable, among the 413 respondents, there were 199 science and engineering students and 214 humanities and social science students. Table 9 shows the regression coefficient values for the two groups, representing the comparison of slope differences between them. Table 10 presents the moderation effect test of the model. Among the eight cross-group comparisons of slopes, the paths of PE → ENGA and PI → ENGA reach significant levels, indicating that the moderation effect is partially established, thus confirming Hypotheses H9A and H9C. From the values in Table 9 , it can be seen that the regression coefficient of PE → ENGA for science and engineering students is significantly higher than that of humanities and social science students, while the regression coefficient of PI → ENGA for humanities and social science students is significantly higher than that of science and engineering students.
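The cross-group slope comparisons in Tables 9 and 10 are typically carried out with a critical-ratio test on the difference between two group regression coefficients. A minimal sketch follows; the coefficients and standard errors are hypothetical placeholders, not values from Table 9:

```python
import math

def slope_difference_z(b1, se1, b2, se2):
    """Critical ratio (z) for the difference between two independent group slopes."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-tailed p-value from the normal CDF
    return z, p

# Hypothetical PI -> ENGA slopes for introverted vs. extroverted students
z, p = slope_difference_z(b1=0.35, se1=0.06, b2=0.15, se2=0.06)
print(f"z = {z:.2f}, p = {p:.4f}")         # p < 0.05: the two slopes differ significantly
```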
Main research findings and conclusions.
Based on Table 11 , the main findings of this research are as follows:
The four main effect hypotheses, H1, H2, H3, and H4, are validated. Analyses confirming H1, H2, and H3 demonstrate that Performance Expectancy, Effort Expectancy, and Peer Influence significantly positively influence Learning Engagement, aligning with research by Singh 28 , Riddle and Gier 29 , Clark 30 , Jia et al. 31 , and Ruiz 32 . The validation of H4 shows that Learning Engagement significantly positively impacts Learning Capability, consistent with Pekrun et al. 34 , Zimmerman 33 , and Martin and Bolliger 35 .
The three mediation effect hypotheses, H5, H6, and H7, are supported. Analyses for H5, H6, and H7 indicate that Performance Expectancy, Effort Expectancy, and Peer Influence significantly and positively influence Learning Capabilities through Learning Engagement, aligning with findings by Jamaludin and Osman 39 , Wang 40 , Kobayashi 57 , and Nerantzi 41 .
The three moderating effect hypotheses, H8C, H9A, and H9C, are validated. Analysis for H8C shows a significant moderating effect of personality between Peer Influence and Learning Engagement, consistent with Wang et al. 45 . Analyses for H9A and H9C reveal significant moderating effects of the subject field on the paths from Performance Expectancy to Learning Engagement and Peer Influence to Learning Engagement, echoing Fan 47 .
Performance expectancy, effort expectancy, and peer influence notably enhance students’ learning engagement in flipped classroom teaching. Performance expectancy encourages students to set specific learning goals, work towards achieving them, and believe in their potential to excel academically, thereby increasing their focus and dedication to their studies. At the same time, a clear understanding of the required effort makes students appreciate every moment in the flipped classroom, fully investing themselves in the learning experience. Moreover, peer influence is essential as students collaborate and interact, fostering a supportive learning environment that further fuels their motivation and boosts learning engagement.
Students’ learning engagement in flipped classroom teaching significantly enhances their learning capabilities. Highly engaged students exhibit more active participation in class discussions, thereby refining their oral expression and improving their communication skills for smoother and more efficient interactions. Additionally, the flipped classroom frequently necessitates group work, fostering a sense of teamwork and collaboration among students. Through rigorous reflection and problem-solving exercises, students cultivate critical thinking abilities, allowing them to objectively and comprehensively analyze issues. Moreover, the flipped classroom underscores independent learning and exploration, thus motivating students to tackle problems from diverse perspectives and stimulating their innovative thinking and creativity. As a result, students who demonstrate high engagement in the flipped classroom achieve not only academic excellence but also substantial improvements in their communication, collaboration, critical thinking, and creativity skills.
Performance expectancy, effort expectancy, and peer influence can significantly enhance learning capability by increasing learning engagement. Clear performance expectancy fuels students’ motivation, keeping them laser-focused on learning tasks. To achieve better outcomes, students become proactive in communicating with peers and teachers, thus honing their communication skills. Additionally, they collaborate more with classmates to solve problems, strengthening their teamwork abilities. Effort expectancy helps students understand that to reach their learning goals, consistent effort is key. This realization sharpens their critical thinking and drives them to continuously refine their learning methods. In this journey, students also experiment with novel learning strategies, fostering creativity. Moreover, peer influence plays a pivotal role in shaping the learning environment. Mutual encouragement and imitation among peers create a positive atmosphere, prompting deeper engagement and significantly boosting students’ communication, collaboration, critical thinking, and creativity skills.
Personality plays a significant moderating role in the impact of peer influence on learning engagement, emphasizing the role of individual differences in learning. Our personality traits guide us in choosing our peers, often leading similar-minded students to form study groups. This, in turn, shapes the way peers influence each other and how engaged they are in learning through unique interaction styles. Additionally, our personalities determine our preferred learning methods. Extroverted students, for example, might enjoy learning through group discussions and hands-on activities, while introverted students might prefer solo research and quiet reading. These personality-driven choices further shape our attitudes, motivation, and how we handle learning challenges.
In various subject areas, students’ learning engagement is influenced to differing extents by performance expectancy and peer influence. The level of difficulty of the subjects and the students’ personal interests directly shape their performance expectancy. For instance, the abstract logic in science and engineering subjects may pose a challenge to students, influencing their expectations, while the memorization and comprehension in humanities and social sciences may be relatively easier, leading to more optimistic Performance Expectancy. Additionally, subject characteristics shape Peer Influence, as problem-solving in science and engineering subjects and text interpretation in humanities and social sciences subjects guide different peer interaction patterns. These differences create distinct competitive and collaborative atmospheres among peers, ultimately impacting students’ Learning Engagement.
One of the significant contributions of this research is the substantial enhancement and expansion of the Unified Theory of Acceptance and Use of Technology (UTAUT) model’s application in the flipped classroom environment, providing crucial theoretical and practical insights to the field of education. By integrating variables such as performance expectancy, effort expectancy, and peer influence, and incorporating learning engagement and learning capability into the model, this research innovatively constructs an improved UTAUT model. This not only verifies the direct impact of these variables on learning capability but also reveals the mediating role of learning engagement. The research underscores the importance of performance expectancy, effort expectancy, and peer influence in promoting deep learning, active participation, and enhancing learning capabilities, especially in autonomous learning environments like flipped classrooms. These findings offer empirical evidence for understanding and improving flipped classroom design and provide educators with critical information for designing and implementing more effective strategies.
The second contribution of this research is demonstrating that learning engagement has a significant positive impact on learning capability, highlighting its central role in students’ learning processes. This finding has important implications for educational practice, providing a basis for improving flipped classroom design and emphasizing the necessity of deep learning engagement.
The third contribution lies in exploring how personality and disciplinary backgrounds moderate the effects of peer influence and learning engagement, as well as the impact of performance expectancy on learning engagement. This offers a unique perspective for understanding the influence of individual and disciplinary differences on learning. Furthermore, the model’s innovation lies in its focus on not only the impact of peer influence and performance expectancy on learning engagement but also the moderating effects of personality traits and disciplinary backgrounds. Thus, this research theoretically enhances the empirical foundation of educational psychology and behavioral science, opening new avenues for future educational practice and research. It stimulates research on optimizing learning environments by considering individual and disciplinary characteristics and provides guidance for educators on how to design and improve courses, helping students with different personalities and disciplinary backgrounds better adapt to flipped classrooms.
Strategies for implementing flipped classrooms in practice.
Successfully translating flipped classroom research findings into teaching practice is a gradual process. This process requires teachers to comprehensively and deeply understand the teaching philosophy and methods of the flipped classroom, fully recognizing its significant advantages in enhancing students’ learning initiative and deep engagement. Subsequently, based on specific course content and students’ actual needs, teachers should carefully plan and create preview videos aimed at stimulating students’ curiosity and effectively imparting core knowledge. In the classroom environment, teachers should create diversified interactive learning activities, such as group discussions, role-playing, or experimental operations, aimed at promoting students’ internalization and application of the learned knowledge. Simultaneously, establishing an efficient feedback system is crucial to enable teachers to grasp students’ learning progress and encountered problems in real-time, providing precise guidance and support. This series of processes smoothly transitions the research results of the flipped classroom into practical teaching strategies, thereby significantly improving teaching quality and optimizing students’ learning outcomes.
The research findings of flipped classrooms have profoundly impacted instructional design, emphasizing student-centeredness and focusing on students’ active learning and collaborative inquiry. In different educational backgrounds, the teaching strategies of flipped classrooms need to be adjusted accordingly to adapt to specific teaching environments and student needs. In the basic education stage, where students’ autonomous learning ability is relatively weak, teachers can design more guiding and interesting preview videos. Meanwhile, teacher-student interaction should be strengthened in the classroom to help students better understand and master knowledge. In contrast, in the higher education stage, where students possess stronger autonomous learning and inquiry abilities, teachers can set more challenging and research-oriented preview tasks and classroom activities to stimulate students’ innovative thinking and critical reflection. Through such adjustment strategies, the teaching model of the flipped classroom can be optimized to maximize its teaching effectiveness in different educational backgrounds.
The effective implementation of flipped classrooms relies on advanced educational technology platforms, which directly provides evidence for government investment in educational informatization. Based on this, the government can more targetedly increase support for educational technology innovation, thereby promoting the growth and progress of related industries. Additionally, the emphasis on students’ autonomous learning and collaborative inquiry in flipped classrooms aligns with the concept of quality education advocated in current educational reforms. Therefore, the government can formulate corresponding policies based on this, actively encouraging and guiding schools to explore and practice innovative teaching methods such as flipped classrooms. Moreover, educational equity is also a crucial aspect that cannot be ignored. The government should strive to ensure that all students have equal access to high-quality educational resources and technical support. Through this series of comprehensive and logical policy formulation and implementation, the application of educational technologies such as flipped classrooms in teaching will be more widely promoted and developed in-depth.
Research limitations.
Sample selection: this research randomly selected 450 students as samples from three universities in Jiangxi Province: Jiangxi University of Finance and Economics, Nanchang University, and Jiangxi Normal University. Although this sample size has a certain degree of representativeness, it is still relatively limited and may not fully reflect the actual situation of all college students. Additionally, the sample only comes from universities in Jiangxi Province, and geographical restrictions may affect the universality of the results.
Potential self-selection bias: students willing to participate in the questionnaire survey may have stronger concerns and interests in issues related to learning engagement and learning capabilities. This may cause the sample to deviate from the overall distribution to some extent.
Limitations of result interpretation: this research mainly draws conclusions based on questionnaire surveys and structural equation modeling analysis. However, questionnaire surveys inherently rely on respondents’ self-reports, which may be subject to subjective bias or memory reconstruction.
Expanding sample scope and diversity: future research can consider selecting more diverse universities nationwide as samples to increase the representativeness and universality of the research. Simultaneously, other student groups besides college students, such as middle school students or graduate students, can also be included.
Controlling self-selection bias: to more accurately assess the relationship between learning engagement and learning capabilities, future research can adopt more rigorous sampling methods, such as multi-stage sampling, to further reduce self-selection bias.
Deeply exploring influencing factors: this research initially explores the impact of personality and disciplinary differences on learning engagement. Future research can further delve into other potential influencing factors, such as family background, learning environment, teacher support, etc., to more comprehensively reveal the complex relationship between learning engagement and learning capabilities.
All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Academic Committee of the School of International Economics and Trade, Jiangxi University of Finance and Economics. Informed consent was obtained from all subjects and/or their legal guardian(s).
The data that support the findings of this research are available from the corresponding author upon reasonable request.
Sohrabi, B. & Iraj, H. Implementing flipped classroom using digital media: A comparison of two demographically different groups perceptions. Comput. Hum. Behav. 60 , 514–524 (2016).
Flores-González, E. & Flores-González, N. The flipped classroom as a tool for learning at High School. J. Crit. Pedagog./Rev. de Pedagog. Crít. 6 (16), 10 (2022).
Cuetos, M. J. Application of the flipped classroom model to stimulate University students’ learning with online education. J. Technol. Sci. Educ. 13 (1), 368–380 (2023).
Dewi, N. S. S., Padmadewi, N. & Santosa, M. H. The implementation of flipped classroom model in teaching English to Sapta Andika junior high school students in academic year 2019/2020. J. Educ. Res. Eval. 5 (1), 125–135 (2021).
Doung-In, S. Flip your classroom: Reach every student in every class every day. Walailak J. Learn. Innov. 3 (2), 71–78 (2017).
Strayer, J. F. How learning in an inverted classroom influences cooperation, innovation and task orientation. Learn. Environ. Res. 15 , 171–193 (2012).
Semab, T. & Naureen, S. Effects of flipped classroom model on academic achievement of students at elementary level. Pakistan J. Soc. Res. 4 (1), 393–401 (2022).
Roehl, A., Reddy, S. L. & Shannon, G. J. The flipped classroom: An opportunity to engage millennial students through active learning. J. Fam. Consum. Sci. 105 (2), 44 (2013).
Tucker, B. The flipped classroom. Educ. Next 12 (1), 82–83 (2012).
Nicholas, C. Teaching considerations for implementing a flipped classroom approach in postgraduate studies: The case of MBA. In Blended Learning in Practice . 67–80 (Spring, 2021).
Yang, H., Shao, Y., Bai, X., Ma, M., Liu, Y., & Liu, S. The design and application of flipped classroom teaching model based on blended learning: A case study of junior high school information technology course. In Proceedings of the 2023 14th International Conference on E-Education, E-Business, E-Management and E-Learning (2023).
Shofiyah, N., Wulandari, F. E., Mauliana, M. I. & Maghfiroh, L. 5E-based flipped classroom teaching model templates for STEM education. Indones. J. Cult. Community Dev. https://doi.org/10.21070/ijccd.v14i2.917 (2023).
Venkatesh, V., Morris, M. G., Davis, G. B. & Davis, F. D. User acceptance of information technology: Toward a unified view. MIS Q. 27 , 425–478 (2003).
Li, Y. & Zhao, M. The study on the influence factors of intention to continue using MOOCs: integrating UTAUT model and social presence. Interact. Learn. Environ. https://doi.org/10.1080/10494820.2024.2318562 (2024).
Tian, W., Ge, J., Zhao, Y. & Zheng, X. AI Chatbots in Chinese higher education: Adoption, perception, and influence among graduate students—an integrated analysis utilizing UTAUT and ECM models. Front. Psychol. 15 , 1268549 (2024).
Alyoussef, I. Y. Acceptance of a flipped classroom to improve university students’ learning: An empirical study on the TAM model and the unified theory of acceptance and use of technology (UTAUT). Heliyon 8 (12), e12529 (2022).
Agyei, C. & Razi, Ö. The effect of extended UTAUT model on EFLs’ adaptation to flipped classroom. Educ. Inf. Technol. 27 (2), 1865–1882. https://doi.org/10.1007/s10639-021-10657-2 (2022).
Fredricks, J. A., Blumenfeld, P. C. & Paris, A. H. School engagement: Potential of the concept, state of the evidence. Rev. Educ. Res. 74 (1), 59–109 (2004).
Liu, Y., Zhang, M., Qi, D. & Zhang, Y. Understanding the role of learner engagement in determining MOOCs satisfaction: A self-determination theory perspective. Interact. Learn. Environ. 31 (9), 6084–6098 (2023).
Lan, M. & Hew, K. F. Examining learning engagement in MOOCs: A self-determination theoretical perspective using mixed method. Int. J. Educ. Technol. High. Educ. 17 (1), 7 (2020).
Ojo, A. O., Ravichander, S., Tan, C.N.-L., Anthonysamy, L. & Arasanmi, C. N. Investigating student’s motivation and online learning engagement through the lens of self-determination theory. J. Appl. Res. High. Educ. https://doi.org/10.1108/JARHE-09-2023-0445 (2024).
Voogt, J., Fisser, P., Pareja Roblin, N., Tondeur, J. & van Braak, J. Technological pedagogical content knowledge—A review of the literature. J. Comput. Assist. Learn. 29 (2), 109–121 (2013).
Solichah, H., Jailani, J. & Handayani, M. T. PEMBELAJARAN TIPE NHT UNTUK MENDUKUNG KETRAMPILAN COMMUNICATION, COLLABORATION, CRITICAL THINKING, DAN PROBLEM SOLVING. AKSIOMA J. Program Studi Pendidik. Mat. 11 (2), 381–1390 (2022).
Sari, D. M. M. & Wardhani, A. K. Critical thinking as learning and innovation skill in the 21st century. J. English Lang. Pedagog. 3 (2), 27–34 (2020).
Pratama, S., Haenilah, E. Y. & Adha, M. M. Is there a need for an e-module focused on contextual teaching and learning to improve student critical thinking? A preliminary examination into needs assessment. Int. J. Educ. Stud. Soc. Sci. 2 , 108 (2022).
Nuraini, U., Restuningdiah, N., Sidharta, E. A., & Utami, H. Based learning to improve students’ critical thinking skills in studying business ethics. In International Conference on Strategic Issues of Economics, Business and, Education (ICoSIEBE 2020) (2021)
Zainuddin, Z. & Halili, S. H. Flipped classroom research and trends from different fields of study. Int. Rev. Res. Open Distrib. Learn. 17 (3), 313–340 (2016).
Singh, N. “A Little Flip Goes a Long Way”—The impact of a flipped classroom design on student performance and engagement in a first-year undergraduate economics classroom. Educ. Sci. 10 (11), 319 (2020).
Riddle, E. & Gier, E. Flipped classroom improves student engagement, student performance, and sense of community in a nutritional sciences course (P07-007-19). Curr. Dev. Nutr. 3 (1), nzz032.P07-007-19 (2019).
Clark, K. R. The effects of the flipped model of instruction on student engagement and performance in the secondary mathematics classroom. J. Educ. Online 12 (1), 91–115 (2015).
Jia, C., Hew, K. F., Jiahui, D. & Liuyufeng, L. Towards a fully online flipped classroom model to support student learning outcomes and engagement: A 2-year design-based study. Internet High. Educ. 56 , 100878 (2023).
Ruiz, C. G. The effect of integrating Kahoot! and peer instruction in the Spanish flipped classroom: The student perspective. J. Span. Lang. Teach. 8 (1), 63–78 (2021).
Zimmerman, B. J. Self-efficacy: An essential motive to learn. Contemp. Educ. Psychol. 25 (1), 82–91 (2000).
Pekrun, R., Goetz, T., Titz, W. & Perry, R. P. Academic emotions in students’ self-regulated learning and achievement: A program of qualitative and quantitative research. Educ. Psychol. 37 (2), 91–105 (2002).
Martin, F. & Bolliger, D. U. Engagement matters: Student perceptions on the importance of engagement strategies in the online learning environment. Online Learn. 22 (1), 205–222 (2018).
Chen, C.-M. & Wu, C.-H. Effects of different video lecture types on sustained attention, emotion, cognitive load, and learning performance. Comput. Educ. 80 , 108–121 (2015).
Buabeng-Andoh, C. Exploring University students’ intention to use mobile learning: A research model approach. Educ. Inf. Technol. 26 (1), 241–256 (2021).
Kuo, Y.-C., Walker, A. E., Belland, B. R., Schroder, K. E. & Kuo, Y.-T. A case study of integrating Interwise: Interaction, internet self-efficacy, and satisfaction in synchronous online learning environments. Int. Rev. Res. Open Distrib. Learn. 15 (1), 161–181 (2014).
Jamaludin, R. & Osman, S. Z. M. The use of a flipped classroom to enhance engagement and promote active learning. J. Educ. Pract. 5 (2), 124–131 (2014).
Wang, F. H. An exploration of online behaviour engagement and achievement in flipped classroom supported by learning management system. Comput. Educ. 114 , 79–91 (2017).
Nerantzi, C. The use of peer instruction and flipped learning to support flexible blended learning during and after the COVID-19 Pandemic. Int. J. Manag. Appl. Res. 7 (2), 184–195 (2020).
Eysenck, H. J. & Eysenck, S. B. The biological basis of personality. In Personality structure and measurement (Psychology Revivals) 49–62 (Routledge, 2013).
Chuang, H. H., Weng, C. Y. & Chen, C. H. Which students benefit most from a flipped classroom approach to language learning?. Br. J. Edu. Technol. 49 (1), 56–68 (2018).
Kim, M., Roh, S. & Ihm, J. The relationship between non-cognitive student attributes and academic achievements in a flipped learning classroom of a pre-dental science course. Korean J. Med. Educ. 30 (4), 339 (2018).
Wang, L., Tian, Y., Lei, Y., & Zhou, Z. The influence of different personality traits on learning achievement in three learning situations. In Blended Learning: New Challenges and Innovative Practices, 10th International Conference, ICBL 2017 (Hong Kong, China, June 27–29, 2017).
Liu, Q. et al. The effectiveness of blended learning in health professions: Systematic review and meta-analysis. J. Med. Internet Res. 18 (1), e2 (2016).
Fan, W. Social influences, school motivation and gender differences: An application of the expectancy-value theory. Educ. Psychol. 31 (2), 157–175 (2011).
Devisakti, A. & Ramayah, T. Sense of belonging and grit in e-learning portal usage in higher education. Interact. Learn. Environ. 31 , 1–15 (2021).
Zou, C., Li, P. & Jin, L. Integrating smartphones in EFL classrooms: Students’ satisfaction and perceived learning performance. Educ. Inf. Technol. 27 (9), 12667–12688 (2022).
Zhonggen, Y. & Xiaozhi, Y. An extended technology acceptance model of a mobile learning technology. Comput. Appl. Eng. Educ. 27 (3), 721–732 (2019).
Khlaisang, J., Songkram, N., Huang, F. & Teo, T. Teachers’ perception of the use of mobile technologies with smart applications to enhance students’ thinking skills: A study among primary school teachers in Thailand. Interact. Learn. Environ. 31 (8), 5037–5058 (2023).
Qureshi, A., Wall, H., Humphries, J. & Balani, A. B. Can personality traits modulate student engagement with learning and their attitude to employability?. Learn. Individ. Differ. 51 , 349–358 (2016).
Arshad, M. & Akram, M. S. Social media adoption by the academic community: Theoretical insights and empirical evidence from developing countries. Int. Rev. Res. Open Distrib. Learn. https://doi.org/10.19173/irrodl.v19i3.3500 (2018).
Liu, C.-H.S. Remodelling progress in tourism and hospitality students’ creativity through social capital and transformational leadership. J. Hosp. Leis. Sport Tour. Educ. 21 , 69–82 (2017).
Baruch, Y. & Lin, C.-P. All for one, one for all: Coopetition and virtual team performance. Technol. Forecast. Soc. Chang. 79 (6), 1155–1168 (2012).
Fornell, C. & Larcker, D. F. Evaluating structural equation models with unobservable variables and measurement error. J. Mark. Res. 18 (1), 39–50 (1981).
Kobayashi, K. D. Using flipped classroom and virtual field trips to engage students. HortTechnology 27 (4), 458–460 (2017).
Plageras, A., Xenakis, A., Kalovrektis, K. & Vavouyios, D. An application study of the UTAUT methodology for the flipped classroom model adoption by applied sciences and technology teachers. Int. J. Emerg. Technol. Learn. (Online) 18 (2), 190 (2023).
Kissi, P. S., Nat, M. & Armah, R. B. The effects of learning–family conflict, perceived control over time and task-fit technology factors on urban–rural high school students’ acceptance of video-based instruction in flipped learning approach. Educ. Technol. Res. Dev. 66 (6), 1547–1569 (2018).
Tang, Y. & Hew, K. F. Effects of using mobile instant messaging on student behavioral, emotional, and cognitive engagement: A quasi-experimental study. Int. J. Educ. Technol. High. Educ. 19 , 1–22 (2022).
This work was supported by the teaching reform projects “Innovation and practice of the project-driven teaching method in the brand planning course” (JXJG-17-4-12) and “Guiding college students to participate in the social practice of universities serving local economic and social development: the case of brand planning” (JG2022017).
Authors and affiliations.
School of International Trade and Economics, Jiangxi University of Finance and Economics, Nanchang, 330013, China
Yufan Pan & Wang He
Conceptualization: Y.P. and W.H.; Data curation: Y.P. and W.H.; Formal analysis: Y.P. and W.H.; Investigation: Y.P. and W.H.; Methodology: Y.P. and W.H.; Project administration: Y.P. and W.H.; Software: Y.P. and W.H.; Writing—original draft: Y.P. and W.H.; Writing—review & editing: Y.P. and W.H.
Correspondence to Wang He .
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Pan, Y., He, W. Research on the influencing factors of promoting flipped classroom teaching based on the integrated UTAUT model and learning engagement theory. Sci Rep 14 , 15201 (2024). https://doi.org/10.1038/s41598-024-66214-7
Received : 30 January 2024
Accepted : 28 June 2024
Published : 02 July 2024
DOI : https://doi.org/10.1038/s41598-024-66214-7
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.
Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.
IMAGES
VIDEO
COMMENTS
Step 5: Present your findings. The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.. In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value).
Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a research hypothesis is tested and the results are reported, typically with p-values, confidence intervals, or both. Additionally, statistical or research significance is estimated or determined by the investigators. Unfortunately, healthcare providers may have different comfort levels in interpreting these statistics.
HYPOTHESIS TESTING. A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the "alternate" hypothesis, while the opposite, the default position of no effect, is called the "null" hypothesis.
Formulate the hypotheses: write your research hypotheses as a null hypothesis (H0) and an alternative hypothesis (HA). Collect data: gather data specifically aimed at testing the hypothesis. Conduct a test: use a suitable statistical test to analyze your data. Make a decision: based on the statistical test results, decide whether to reject the null hypothesis or fail to reject it.
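The four steps above can be sketched end to end in plain Python. This is a hedged illustration, not part of the excerpt: the coin-flip data and the normal approximation to the one-sample proportion test are assumptions chosen for the example.

```python
import math

def proportion_z_test(successes, n, p0=0.5):
    """One-sample proportion test via the normal approximation.
    H0: true proportion = p0; HA: true proportion != p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Step 1 (formulate): H0: the coin is fair (p = 0.5); HA: it is biased.
# Step 2 (collect):   suppose we observe 60 heads in 100 flips.
# Step 3 (test):
z, p = proportion_z_test(60, 100)
# Step 4 (decide):    reject H0 if p < alpha.
decision = "reject H0" if p < 0.05 else "fail to reject H0"
```

With 60 heads in 100 flips, z = 2.0 and p ≈ 0.046, so the null hypothesis of a fair coin is rejected at α = 0.05.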
Hypothesis testing is a scientific method for making decisions and drawing conclusions using a statistical approach. It is used to vet new ideas by testing whether or not the sample data support a theory. A research hypothesis is a predictive statement, tested using scientific methods, that joins an independent variable to a dependent variable.
A hypothesis test is a procedure used in statistics to assess whether a particular viewpoint is likely to be true. It follows a strict protocol and generates a p-value, on the basis of which a decision is made about the truth of the hypothesis under investigation. All of the routine statistical tests used in research (t-tests, χ² tests, Mann-Whitney tests, etc.) are hypothesis tests of this kind.
Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. This post provides an overview of statistical hypothesis testing.
However, in order to use hypothesis testing, you need to restate your research hypothesis as a null and an alternative hypothesis. Before you can do this, it is best to understand the process and structure involved in hypothesis testing and what you are measuring.
Write a null hypothesis. If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.
Hypothesis testing involves various statistical tests, each suited to different types of data and research questions. Understanding these tests and knowing when to use them is crucial for accurate conclusions.
Hypothesis testing is as old as the scientific method and is at the heart of the research process. Research exists to validate or disprove assumptions about various phenomena. The process of validation involves testing and it is in this context that we will explore hypothesis testing.
The research methods you use depend on the type of data you need to answer your research question. If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods. If you want to analyze a large amount of readily-available data, use secondary data.
A falsifiable hypothesis is a statement, or hypothesis, that can be contradicted with evidence. In empirical (data-driven) research, this evidence will always be obtained through the data. In statistical hypothesis testing, the hypothesis that we formally test is called the null hypothesis.
Hypothesis testing is a statistical method used to determine if there is enough evidence in a sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence.
A research hypothesis is an assumption or a tentative explanation for a specific process observed during research. Unlike a guess, a research hypothesis is a calculated, educated guess proven or disproven through research methods. Ronald Fisher published a paper in 1925 that introduced the concept of null hypothesis testing and popularized the 0.05 significance level.
The first step in testing hypotheses is the transformation of the research question into a null hypothesis, H0, and an alternative hypothesis, HA. The null and alternative hypotheses are concise statements, usually in mathematical form, of two possible versions of "truth" about the relationship between the predictor of interest and the outcome in the population.
Hypothesis testing is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used.
One-sided vs. two-sided testing. When it's time to test your hypothesis, it's important to use the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests, or one-tailed and two-tailed tests, respectively. Typically, you'd use a one-sided test when you have a strong conviction about the direction of the effect.
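The difference between the two comes down to which tail(s) of the sampling distribution you count. As a hedged, stdlib-only sketch (the function names are assumed; the `alternative` keyword mirrors the convention of common statistics libraries):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_value(z, alternative="two-sided"):
    """p-value for a z statistic under the null hypothesis.
    'two-sided': probability of |Z| >= |z| (both tails);
    'greater': right tail only; 'less': left tail only."""
    if alternative == "two-sided":
        return 2 * (1 - normal_cdf(abs(z)))
    if alternative == "greater":
        return 1 - normal_cdf(z)
    if alternative == "less":
        return normal_cdf(z)
    raise ValueError(f"unknown alternative: {alternative}")
```

For the same z, the one-sided p-value in the predicted direction is half the two-sided one, which is why a one-sided test reaches significance more easily when the direction is called correctly in advance.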
Hypothesis testing is a statistical method for assessing the plausibility of a hypothesis using data samples drawn from a given population, by weighing two competing hypotheses. The alternative hypothesis (H1), or research hypothesis, states that there is a relationship between two variables (where one variable affects the other).
P-values. The p-value of a hypothesis test is the probability of obtaining data at least as extreme as your sample, assuming the null hypothesis is true. Traditionally, researchers have used 0.05 as the threshold: if p < 0.05, the null hypothesis is rejected in favor of the alternative. Note that a small p-value does not prove the alternative hypothesis true; it only indicates that the data would be unlikely under the null.
Hypothesis-testing (quantitative hypothesis-testing research): quantitative research uses deductive reasoning. This involves the formation of a hypothesis, collection of data to investigate the problem, analysis and use of the data from the investigation, and the drawing of conclusions to validate or nullify the hypothesis.
Probability value and types of errors. The probability value, or p-value, is the probability of an outcome or research result at least as extreme as the one observed, given that the null hypothesis is true. Usually, the threshold is set at 0.05: the null hypothesis will be rejected if the probability value of the statistical test is less than 0.05.
Components of a formal hypothesis test. The null hypothesis is a statement about the value of a population parameter, such as the population mean (µ) or the population proportion (p). It contains the condition of equality and is denoted H0 (H-naught), for example H0: µ = 157 or H0: p = 0.37. The alternative hypothesis is the claim to be tested, the opposite of the null hypothesis.
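As an illustrative sketch (stdlib-only Python; the sample data and the known population σ are assumptions for the example), a test of H0: µ = 157 against HA: µ ≠ 157 might look like:

```python
import math
import statistics

def one_sample_z_test(sample, mu0, sigma):
    """Test H0: mu = mu0 against HA: mu != mu0,
    with the population standard deviation sigma known."""
    z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```

If the sample mean sits far from 157 relative to σ/√n, the p-value is small and H0 is rejected; if it sits near 157, we fail to reject H0.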
Consequently, we can reject the null hypothesis that all groups have the same average rank. At least one group has a different average rank than the others. Furthermore, if the three hospital distributions have the same shape, we can conclude that the medians differ. At this point, we might decide to use a post hoc test to compare pairs of groups.
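The comparison above is a Kruskal-Wallis test. Here is a hedged, stdlib-only Python sketch of the H statistic for exactly three groups, assuming no tied values; with k = 3 groups the chi-square approximation has 2 degrees of freedom, whose survival function is simply exp(−H/2). The data are illustrative, not the hospital data from the excerpt.

```python
import math

def kruskal_wallis_3(g1, g2, g3):
    """Kruskal-Wallis H test for three groups with no tied values.
    Returns (H, p) using the chi-square approximation with 2 df."""
    groups = [g1, g2, g3]
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # ranks 1..N
    n = len(pooled)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p
```

A small p leads to rejecting the hypothesis that all three groups share the same average rank, after which post hoc pairwise comparisons (e.g. Mann-Whitney tests with a multiplicity correction) can identify which pairs differ.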
It is commonly used in research and data analysis to determine the significance of a sample mean compared to a known population mean. To use the Z.TEST function, the user inputs the data range for the sample and the known population mean. This allows for efficient and accurate hypothesis testing, making it a valuable tool for decision-making.
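As a hedged illustration of the same calculation outside a spreadsheet, this stdlib-only Python mirrors the behavior described above, returning the one-tailed probability that the sample mean would be at least this large under H0: µ = x. The function body is a reconstruction from the standard one-sample z statistic, not vendor source.

```python
import math
import statistics

def z_test(array, x, sigma=None):
    """One-tailed p-value P(sample mean >= observed) under H0: mu = x.
    If sigma is omitted, the sample standard deviation is used,
    mirroring Z.TEST(array, x, [sigma])."""
    s = sigma if sigma is not None else statistics.stdev(array)
    z = (statistics.fmean(array) - x) / (s / math.sqrt(len(array)))
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

When the sample mean equals the hypothesized mean, the statistic is zero and the one-tailed probability is exactly 0.5.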
Putting the diagnostic cart before the horse This blood microbiome story could end here and simply be an interesting example of scientific research homing in on a curious finding, testing a hypothesis, and ultimately refuting it (or at the very least providing strong evidence against it).
This research delves into the flipped classroom teaching methodology, employing the Unified Theory of Acceptance and Use of Technology (UTAUT), learning engagement theory, and the 4C skills framework.