Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations.
To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.
After collecting data from your sample, you can organise and summarise the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.
This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.
Table of contents
Step 1: Write your hypotheses and plan your research design
Step 2: Collect data from a sample
Step 3: Summarise your data with descriptive statistics
Step 4: Test hypotheses or make estimates with inferential statistics
Step 5: Interpret your results
Frequently asked questions about statistics
To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.
The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.
A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.
While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.
A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.
First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.
Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.
In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.
Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.
When planning a research design, you should operationalise your variables and decide exactly how you will measure them.
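For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain: categorical data represent groupings (measured at the nominal or ordinal level), while quantitative data represent amounts (measured at the interval or ratio level).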
Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.
Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.
Variable | Type of data |
---|---|
Age | Quantitative (ratio) |
Gender | Categorical (nominal) |
Race or ethnicity | Categorical (nominal) |
Baseline test scores | Quantitative (interval) |
Final test scores | Quantitative (interval) |
Parental income | Quantitative (ratio) |
GPA | Quantitative (interval) |
In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.
Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.
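There are two main approaches to selecting a sample: probability sampling, where participants are chosen through random selection, and non-probability sampling, where participants are chosen on non-random criteria such as convenience or voluntary self-selection.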
In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and makes it more likely that data from your sample are typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.
But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.
If you want to use parametric tests for non-probability samples, you have to make the case that:
Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.
If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section.
Based on the resources available for your research, decide on how you’ll recruit participants.
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.
Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.
Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.
There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is often recommended.
To use these calculators, you have to understand and input key components such as the significance level (alpha), the statistical power, and the expected effect size.
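As a rough illustration of how a calculator of this kind might combine its inputs, the sketch below uses the common normal-approximation formula for comparing two group means. The function name `sample_size_per_group`, the default values, and the choice of formula are assumptions made for illustration; real calculators may use different formulas depending on the design.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two-sided, two-group test).

    Normal-approximation formula: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the expected standardised effect size (Cohen's d).
    This is an illustrative assumption, not the formula of any specific tool.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_medium = sample_size_per_group(0.5)  # hypothetical medium expected effect
n_large = sample_size_per_group(0.8)   # hypothetical large expected effect
print(n_medium, n_large)
```

Smaller expected effects, stricter significance levels, and higher power all push the required sample size up.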
Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.
There are various ways to inspect your data, including the following:
By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
A normal distribution means that your data are symmetrically distributed around a centre where most values lie, with the values tapering off at the tail ends.
In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.
Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.
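One possible systematic approach, sketched here as an assumption rather than a recommendation from this guide, is the interquartile-range rule: flag any value that lies more than 1.5 times the interquartile range below the first quartile or above the third. The scores are made up for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the IQR rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical test scores with one extreme value
scores = [62, 65, 66, 68, 70, 71, 73, 75, 77, 120]
outliers = flag_outliers(scores)
print(outliers)
```

Flagging a value is only the first step. Whether to remove it, transform the data, or keep it is a judgement call that should be reported.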
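Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported: the mode (the most frequent value), the median (the middle value when the data are ordered), and the mean (the sum of all values divided by the number of values).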
However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
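A minimal sketch of these measures using Python's standard `statistics` module. All data values are hypothetical, and the variable names are chosen only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical quantitative data: test scores
scores = [70, 72, 72, 75, 78, 81, 95]
score_mean = mean(scores)
score_median = median(scores)
score_modes = multimode(scores)  # list of the most frequent value(s)

# Hypothetical categorical data: only the mode is meaningful here
majors = ["biology", "history", "biology", "physics"]
major_modes = multimode(majors)

# Hypothetical continuous data: every value is unique, so no single mode exists
reaction_times = [0.412, 0.398, 0.455, 0.431]
rt_modes = multimode(reaction_times)  # all values tie for "most frequent"

print(score_mean, score_median, score_modes, major_modes, len(rt_modes))
```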
Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported: the range, the interquartile range, the standard deviation, and the variance.
Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
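The following sketch computes the range, interquartile range, standard deviation, and variance with the standard library on made-up scores that include one extreme value. It is meant only to illustrate why the interquartile range is less sensitive to skew than the standard deviation.

```python
from statistics import quantiles, stdev, variance

# Hypothetical scores; 120 is an extreme value that skews the distribution
scores = [62, 65, 66, 68, 70, 71, 73, 75, 77, 120]

value_range = max(scores) - min(scores)  # highest minus lowest value
q1, _, q3 = quantiles(scores, n=4)       # first and third quartiles
iqr = q3 - q1                            # spread of the middle half of the data
sd = stdev(scores)                       # sample standard deviation
var = variance(scores)                   # sample variance (sd squared)

print(value_range, iqr, round(sd, 2), round(var, 2))
```

Here the single extreme score inflates the range and standard deviation, while the interquartile range still reflects the spread of the typical scores.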
Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.
Statistic | Pretest scores | Posttest scores |
---|---|---|
Mean | 68.44 | 75.25 |
Standard deviation | 9.43 | 9.88 |
Variance | 88.96 | 97.96 |
Range | 36.25 | 45.12 |
N | 30 | 30 |
From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.
Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.
It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.
Statistic | Parental income (USD) | GPA |
---|---|---|
Mean | 62,100 | 3.12 |
Standard deviation | 15,000 | 0.45 |
Variance | 225,000,000 | 0.16 |
Range | 8,000–378,000 | 2.64–4.00 |
N | 653 | 653 |
A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.
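Researchers often use two main methods (simultaneously) to make inferences in statistics: estimation, which calculates population parameters based on sample statistics, and hypothesis testing, which uses sample data to test research predictions.
You can make two types of estimates of population parameters from sample statistics: a point estimate, which is a single value that represents your best guess of the parameter, and an interval estimate, which is a range of values within which the parameter is expected to lie.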
If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.
You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).
There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.
A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
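A minimal sketch of that calculation, applied to the posttest mean, standard deviation, and sample size from the experimental example's descriptive statistics table. The function name and the 95% confidence level are assumptions for illustration. With a sample this small, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Interval estimate: mean plus or minus z * standard error."""
    standard_error = sample_sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Posttest scores from the table: mean 75.25, SD 9.88, 30 participants
low, high = z_confidence_interval(75.25, 9.88, 30)
print(f"{low:.2f} to {high:.2f}")
```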
Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.
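Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs: a test statistic, which tells you how much your data differ from the null hypothesis of the test, and a p value, which tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.
Statistical tests come in three main varieties: comparison tests, regression tests, and correlation tests.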
Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.
Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.
A regression models the extent to which changes in one or more predictor variables result in changes in an outcome variable.
Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.
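For the pretest and posttest case, the test statistic of a dependent-samples (paired) t test can be sketched as follows. The scores are hypothetical and the function name is an assumption. Obtaining a p value additionally requires the t distribution with the returned degrees of freedom, which statistical packages provide (for example, `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t = mean difference / (SD of differences / sqrt(n)), with n - 1 df."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores for five participants measured twice
pretest = [60, 72, 65, 80, 70]
posttest = [66, 75, 70, 82, 79]
t_value, df = paired_t_statistic(pretest, posttest)
print(round(t_value, 2), df)
```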
The z and t tests have subtypes based on the number and types of samples and the hypotheses:
The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.
However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
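The t statistic for a correlation coefficient is commonly computed as t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below applies that formula to hypothetical values of r and n. The p value would then be read from the t distribution using a statistics package.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


# Hypothetical sample: r = 0.30 observed in 102 participants
t_value, df = correlation_t_statistic(0.30, 102)
print(round(t_value, 2), df)
```

Because the sample size enters through sqrt(n - 2), the same correlation coefficient yields a larger t statistic in a larger sample.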
You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:
Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.
A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:
The final step of statistical analysis is interpreting your results.
In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.
Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.
This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.
Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.
Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.
A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.
In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.
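As a sketch of one common effect size, Cohen's d expresses a mean difference in standard deviation units. The example below uses the pretest and posttest means from the descriptive statistics table and divides by the pretest standard deviation. That choice of standardiser is an assumption: it reproduces the Cohen's d of 0.72 quoted in the experimental example, but other definitions (such as a pooled standard deviation or the standard deviation of the differences) give different values.

```python
def cohens_d(mean_before, mean_after, standardiser_sd):
    """Standardised mean difference: (mean_after - mean_before) / SD."""
    return (mean_after - mean_before) / standardiser_sd


# Table values: pretest mean 68.44, posttest mean 75.25, pretest SD 9.43
d = cohens_d(68.44, 75.25, 9.43)
print(round(d, 2))
```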
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.
Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.
You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power. However, there’s a trade-off between the two errors, so a fine balance is necessary.
Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.
However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.
A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
The research methods you use depend on the type of data you need to answer your research question.
Statistical analysis is the main method for analysing quantitative research data. It uses probabilities and models to test predictions about a population from sample data.