ANALYTICS FOR DECISIONS

11 Tips For Writing a Dissertation Data Analysis

Since the advent of the fourth industrial revolution – the digital world – vast amounts of data surround us. Terabytes of data sit around us and in data centers, waiting to be processed and used. That data must be analyzed appropriately, and dissertation data analysis forms the basis of this work. If the data analysis is valid and free from errors, the research outcomes will be reliable and lead to a successful dissertation.

Given the complexity of many data analysis projects, it is difficult to obtain precise results if analysts are not properly familiar with data analysis tools and tests. Analysis is a time-consuming process that starts with collecting valid and relevant data and ends with the demonstration of error-free results.

So, in today’s topic, we will cover the need to analyze data, dissertation data analysis, and, most importantly, tips for writing an outstanding data analysis dissertation. If you are a doctoral student planning to perform data analysis for your dissertation, make sure you give this article a thorough read for the best tips!

What is Data Analysis in Dissertation?

Dissertation data analysis is the process of understanding, gathering, compiling, and processing a large amount of data, then identifying common patterns in responses and critically examining facts and figures to find the rationale behind those outcomes.

Even if you have the data collected and compiled in the form of facts and figures, that alone is not enough to prove your research outcomes. You still need to apply data analysis to use it in the dissertation; it provides scientific support for the thesis and the conclusions of the research.

Data Analysis Tools

There are plenty of indicative tests used to analyze data and infer relevant results for the discussion section. Common examples include t-tests, ANOVA, chi-square tests, correlation, and regression analysis, each of which can lead toward a scientific conclusion.

11 Most Useful Tips for Dissertation Data Analysis

Doctoral students must perform data analysis and then write up the dissertation to receive their degree. Many Ph.D. students find dissertation data analysis hard because they have not been trained in it.

1. Dissertation Data Analysis Services

The first tip applies to students who can afford to seek help with their dissertation data analysis. It is a viable option that helps with time management and leaves more time to build out the other elements of the dissertation in detail.

Dissertation analysis services are professional services that help doctoral students with all the essentials of their dissertation work: planning, research and clarification, methodology, data analysis and review, literature review, and the final PowerPoint presentation.

One well-known provider of professional dissertation data analysis services is Statistics Solutions, which has been helping students succeed in their dissertation work for over 22 years.

For a proper dissertation data analysis, the student needs a clear understanding of statistics. With this knowledge and experience, a student can perform the analysis on their own.

Following are some helpful tips for writing a splendid dissertation data analysis:

2. Relevance of Collected Data

If the data is irrelevant or inappropriate, you might get distracted from the point of focus. To show the reader that you can critically solve the problem, write a theoretical proposition regarding the selection and analysis of data.

3. Data Analysis

For the analysis, it is crucial to use methods that best fit the types of data collected and the research objectives. Elaborate on these methods and justify your data collection methods thoroughly. Convince the reader that you did not choose your method at random; rather, you arrived at it after critical analysis and prolonged research.

Qualitative analysis interprets non-numerical material such as interview transcripts and observations. On the other hand, quantitative analysis refers to the analysis and interpretation of facts and figures – building the reasoning behind the primary findings. An assessment of the main results and the literature review plays a pivotal role in both qualitative and quantitative analysis.

The overall objective of data analysis is to detect patterns and trends in the data and then present the outcomes clearly. This provides a solid foundation for critical conclusions and assists the researcher in completing the dissertation proposal.

4. Qualitative Data Analysis

Qualitative data is data that does not involve numbers. You are required to analyze data collected through experiments, focus groups, and interviews. This can be a time-consuming process because it requires iterative examination and sometimes the application of hermeneutics. Note that using qualitative techniques is not only about generating good outcomes but about unveiling deeper knowledge that can be transferable.

Presenting qualitative data analysis in a dissertation can also be challenging because it contains longer and more detailed responses. Placing such comprehensive data coherently in one chapter of the dissertation is difficult for two reasons. Firstly, it is not always clear which data to include and which to exclude. Secondly, unlike quantitative data, it is problematic to present in figures and tables, so condensing the information into a visual representation is rarely possible. As a writer, it is essential to address both of these challenges.

Qualitative Data Analysis Methods

Following are the methods used to perform qualitative data analysis.

  •   Deductive Method

This method involves analyzing qualitative data based on an argument that the researcher has already defined. It is a comparatively easy approach to analyzing data and suits researchers who have a fair idea of the responses they are likely to receive from their questionnaires.

  •  Inductive Method

In this method, the researcher analyzes the data without any predefined rules. It is a time-consuming process, typically used by students who have very little prior knowledge of the research phenomenon.

5. Quantitative Data Analysis

Quantitative data consists of facts and figures obtained from scientific research and requires extensive statistical analysis. After collection and analysis, you will be able to draw conclusions. Outcomes can be generalized beyond the sample only by assuming the sample is representative of the larger group – one of the preliminary checkpoints to verify in your analysis. This approach is also referred to as the “scientific method”, having its roots in the natural sciences.

The presentation of quantitative data depends on the domain and audience it is being presented to, so it is beneficial to consider your readers while writing your findings. Quantitative findings in the hard sciences might require numeric inputs and statistics, whereas fields outside the hard sciences may not require such comprehensive numerical analysis.

Quantitative Analysis Methods

Following are some of the methods used to perform quantitative data analysis. 

  • Trend analysis:  A statistical approach for examining how quantitative data collected over a considerable period changes over time.
  • Cross-tabulation:  This method presents the relationship between two or more data sets in a tabular format.
  • Conjoint analysis:  A quantitative method for collecting and analyzing advanced measures that provide a thorough view of purchasing decisions and, most importantly, of the parameters that drive them.
  • TURF analysis:  This approach assesses the total market reach of a product, a service, or a mix of both.
  • Gap analysis:  It uses a side-by-side matrix to portray quantitative data, capturing the difference between actual and expected performance.
  • Text analysis:  In this method, tools convert open-ended responses into easily quantifiable data.
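As a small illustration of cross-tabulation, the standard-library Python sketch below counts how two categorical survey variables co-occur and prints a simple contingency table. The survey data is invented for illustration.

```python
# Stdlib-only sketch of cross-tabulation: counting how two
# categorical survey variables co-occur.

from collections import Counter

# (age_group, preferred_channel) pairs from a hypothetical survey
observations = [
    ("18-25", "online"), ("18-25", "online"), ("18-25", "in-store"),
    ("26-40", "online"), ("26-40", "in-store"), ("26-40", "in-store"),
]

# Count each (row, column) combination
table = Counter(observations)

rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

# Print a simple contingency table: rows are age groups,
# columns are preferred channels
print("".ljust(8) + "".join(c.ljust(10) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[(r, c)]).ljust(10) for c in cols))
```

With a library such as pandas, `pandas.crosstab` produces the same kind of table in one call; the manual version above just makes the counting explicit.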

6. Data Presentation Tools

Since large volumes of data need to be represented, presenting them coherently is a difficult task. To resolve this, consider all the choices available to you, such as tables, charts, diagrams, and graphs.

Tables help present both qualitative and quantitative data concisely. While presenting data, always keep your reader in mind: anything clear to you may not be apparent to them. Constantly ask whether your presentation method is understandable to someone less conversant with your research and findings. If the answer is “no”, you may need to rethink your presentation.
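As a sketch of this idea, the snippet below condenses invented raw scores into a small summary table (sample size, mean, median, spread) using only Python's standard library – the kind of condensed view a reader can absorb far faster than the raw numbers.

```python
# Sketch: condensing raw quantitative data into a small summary table
# before presenting it. The scores are invented for illustration.

import statistics

scores = {
    "Group A": [12, 15, 14, 10, 13],
    "Group B": [22, 19, 25, 21, 23],
}

# Print one summary row per group: n, mean, median, standard deviation
print(f"{'Group':<10}{'n':>4}{'mean':>8}{'median':>8}{'stdev':>8}")
for name, values in scores.items():
    print(f"{name:<10}{len(values):>4}"
          f"{statistics.mean(values):>8.1f}"
          f"{statistics.median(values):>8.1f}"
          f"{statistics.stdev(values):>8.2f}")
```

The same principle applies whatever tool you use: summarize first, and keep the full raw data for the appendix.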

7. Include Appendix or Addendum

After presenting a large amount of data, the analysis part of your dissertation might get messy and look disorganized, yet you will not want to cut the data you spent days and months collecting. To avoid this, include an appendix.

Data that is hard to arrange within the text belongs in the appendix of the dissertation: questionnaires, transcripts of focus groups and interviews, and data sheets. On the other hand, the statistical analysis and quotations from interviewees belong within the dissertation itself.

8. Thoroughness of Data

It is a common misconception that presented data is self-explanatory. Many students provide the data and quotes and assume this explains everything; it does not. Rather than quoting everything, analyze and identify which data you will use to support or refute your standpoints.

Thoroughly demonstrate the ideas and critically analyze each perspective, taking care at the points where errors can occur. Always discuss both the anomalies and the strengths of your data to add credibility to your research.

9. Discussing Data

Discussing data involves elaborating on the dimensions used to classify patterns, themes, and trends in the presented data. In addition, take theoretical interpretations into account for balance. Discuss the reliability of your data by assessing its effect and significance, and do not hide the anomalies. When drawing on interviews, use relevant quotes to develop a strong rationale.

It also involves answering what you are trying to do with the data and how you have structured your findings. Once you have presented the results, the reader will be looking for interpretation, so it is essential to deliver that understanding as soon as you have presented your data.

10. Findings and Results

Findings are the facts derived from the analysis of the collected data. These outcomes should be stated clearly; their statements should tightly support your objective and provide logical reasoning and scientific backing for your point. This part comprises the majority of the dissertation.

In the findings part, tell the reader what they are looking at. There should be no suspense, as it would divert their attention. State your findings clearly and concisely so the reader understands what is still to come in your dissertation.

11. Connection with Literature Review

At the end of your data analysis, compare your data with other published research. In this way, you can identify points of difference and agreement. Check whether your findings are consistent with your expectations, look for bottlenecks, and analyze and discuss the reasons behind them. Identify the key themes, the gaps, and the relation of your findings to the literature review. In short, link your data back to your research question, and let those questions form a basis for the literature.


Wrapping Up

Writing the data analysis in a dissertation requires dedication, and its implementation demands sound knowledge and proper planning. Choosing your topic, gathering relevant data, analyzing it, presenting your data and findings correctly, discussing the results, connecting with the literature, and drawing conclusions are its milestones. Among these checkpoints, the data analysis stage is the most important and requires the greatest attention to detail.

In this article, we looked thoroughly at tips that prove valuable for writing the data analysis in a dissertation. Make sure to give this article a thorough read before you write yours, for the successful future of your research.


Emidio Amadebai

I am an IT engineer who is passionate about learning and sharing. For almost five years, I have worked with and learned from data engineers, data analysts, business analysts, and key decision makers. I am interested in learning more about data science and how to leverage it for better decision-making in my business, and I hope to help you do the same in yours.


Data analysis techniques

In STAGE NINE: Data analysis, we discuss the data you will have collected during STAGE EIGHT: Data collection. However, before you collect your data, having followed the research strategy you set out in STAGE SIX, it is useful to think about the data analysis techniques you may apply to your data once it is collected.

The statistical tests that are appropriate for your dissertation will depend on (a) the research questions/hypotheses you have set, (b) the research design you are using, and (c) the nature of your data. You should already be clear about your research questions/hypotheses from STAGE THREE: Setting research questions and/or hypotheses, as well as knowing the goal of your research design from STEP TWO: Research design in this STAGE SIX: Setting your research strategy. These two pieces of information – your research questions/hypotheses and research design – will tell you, in principle, the statistical tests that may be appropriate to run on your data in order to answer your research questions.

We highlight the words in principle and may because the most appropriate statistical test to run on your data depends not only on your research questions/hypotheses and research design, but also on the nature of your data. As you should have identified in STEP THREE: Research methods, and in the article Types of variables in the Fundamentals part of Lærd Dissertation, (a) not all data is the same, and (b) not all variables are measured in the same way (i.e., variables can be dichotomous, ordinal or continuous). In addition, not all data is normal, nor do the groups being compared necessarily have equal variances, terms we explain in the Data Analysis section in the Fundamentals part of Lærd Dissertation. As a result, you might think that running a particular statistical test is correct at this point of setting your research strategy (e.g., a dependent t-test), based on the research questions/hypotheses you have set, but when you collect your data (i.e., during STAGE EIGHT: Data collection), the data may fail certain assumptions that are important to that test (i.e., normality and homogeneity of variance). You would then have to run another statistical test (e.g., a Wilcoxon signed-rank test instead of a dependent t-test).
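The fallback described above can be sketched in a few lines of Python with SciPy (assuming SciPy is available; SPSS users would follow the same logic through menus): test the paired differences for normality with Shapiro-Wilk, then choose between the dependent t-test and the Wilcoxon signed-rank test. The before/after scores below are invented for illustration.

```python
# Sketch: pick between a dependent t-test and a Wilcoxon signed-rank
# test based on whether the paired differences look normal.

from scipy import stats

# Hypothetical paired measurements (e.g., scores before/after an intervention)
before = [72, 65, 80, 75, 68, 77, 70, 74]
after  = [75, 70, 82, 78, 72, 80, 73, 79]

# The t-test's normality assumption applies to the paired differences
diffs = [a - b for a, b in zip(after, before)]
shapiro_stat, shapiro_p = stats.shapiro(diffs)

if shapiro_p > 0.05:           # no evidence against normality
    stat, p = stats.ttest_rel(after, before)
    test_name = "dependent t-test"
else:                          # assumption fails: use the rank-based test
    stat, p = stats.wilcoxon(after, before)
    test_name = "Wilcoxon signed-rank test"

print(f"{test_name}: statistic={stat:.3f}, p={p:.4f}")
```

This is only a decision sketch: in a real analysis you would also inspect the data visually and consider sample size, since normality tests on very small samples have little power.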

At this stage in the dissertation process, it is important, or at the very least, useful to think about the data analysis techniques you may apply to your data when it is collected. We suggest that you do this for two reasons:

REASON A Supervisors sometimes expect you to know what statistical analysis you will perform at this stage of the dissertation process

This is not always the case, but if you have had to write a Dissertation Proposal or Ethics Proposal , there is sometimes an expectation that you explain the type of data analysis that you plan to carry out. An understanding of the data analysis that you will carry out on your data can also be an expected component of the Research Strategy chapter of your dissertation write-up (i.e., usually Chapter Three: Research Strategy ). Therefore, it is a good time to think about the data analysis process if you plan to start writing up this chapter at this stage.

REASON B It takes time to get your head around data analysis

When you come to analyse your data in STAGE NINE: Data analysis, you will need to think about (a) selecting the correct statistical tests to perform on your data, (b) running these tests on your data using a statistics package such as SPSS, and (c) learning how to interpret the output from such statistical tests so that you can answer your research questions or hypotheses. Whilst we show you how to do this for a wide range of scenarios in the Data Analysis section in the Fundamentals part of Lærd Dissertation, it can be a time-consuming process. Unless you took an advanced statistics module/option as part of your degree (i.e., not just an introductory statistics course, which is often taught in undergraduate and master's degrees), it can take time to get your head around data analysis. Starting this process at this stage (i.e., STAGE SIX: Research strategy), rather than waiting until you finish collecting your data (i.e., STAGE EIGHT: Data collection), is a sensible approach.

Final thoughts...

Setting the research strategy for your dissertation required you to describe, explain and justify the research paradigm, quantitative research design, research method(s), sampling strategy, and approach towards research ethics and data analysis that you plan to follow, as well as determine how you will ensure the research quality of your findings so that you can effectively answer your research questions/hypotheses. However, from a practical perspective, just remember that the main goal of STAGE SIX: Research strategy is to have a clear research strategy that you can implement (i.e., operationalize ). After all, if you are unable to clearly follow your plan and carry out your research in the field, you will struggle to answer your research questions/hypotheses. Once you are sure that you have a clear plan, it is a good idea to take a step back, speak with your supervisor, and assess where you are before moving on to collect data. Therefore, when you are ready, proceed to STAGE SEVEN: Assessment point .

  • Open access
  • Published: 06 January 2022

The use of Big Data Analytics in healthcare

  • Kornelia Batko, ORCID: orcid.org/0000-0001-6561-3826
  • Andrzej Ślęzak

Journal of Big Data, volume 9, Article number: 3 (2022)


The introduction of Big Data Analytics (BDA) in healthcare will make it possible to use new technologies both in the treatment of patients and in health management. The paper aims to analyze the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities. The direct research was carried out using a research questionnaire on a sample of 217 medical facilities in Poland. Literature studies have shown that the use of Big Data Analytics can bring many benefits to medical facilities, while the direct research has shown that medical facilities in Poland are moving towards data-based healthcare: they use structured and unstructured data and reach for analytics in the administrative, business, and clinical areas. The research positively confirmed that medical facilities work on both structured and unstructured data. The following kinds and sources of data can be distinguished: data from databases, transaction data, unstructured content of emails and documents, and data from devices and sensors; the use of data from social media is lower. In their activity, facilities reach for analytics not only in the administrative and business areas but also in the clinical area, which clearly shows that the decisions made in medical facilities are highly data-driven. The results of the study confirm what has been analyzed in the literature: medical facilities are moving towards data-based healthcare, together with its benefits.

Introduction

The main contribution of this paper is to present an analytical overview of using structured and unstructured data (Big Data) analytics in medical facilities in Poland. Medical facilities use both structured and unstructured data in their practice. Structured data has a predetermined schema, while unstructured data is extensive, freeform, and comes in a variety of forms [27]. Unstructured data, referred to as Big Data (BD), does not fit into the typical data processing format. Big Data is a massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools. It remains stored but not analyzed. Due to the lack of a well-defined schema, it is difficult to search and analyze such data and, therefore, it requires specific technologies and methods to transform it into value [20, 68]. Integrating data stored in both structured and unstructured formats can add significant value to an organization [27], but organizations must approach unstructured data in a different way. This is where the potential of Big Data Analytics (BDA) lies. Big Data Analytics comprises techniques and tools used to analyze and extract information from Big Data. The results of Big Data analysis can be used to predict the future and to identify trends from the past. In healthcare, it makes it possible to analyze large datasets from thousands of patients, identifying clusters and correlations between datasets, as well as developing predictive models using data mining techniques [60].

This paper is the first study to consolidate and characterize the use of Big Data from different perspectives. The first part consists of a brief literature review of studies on Big Data (BD) and Big Data Analytics (BDA), while the second part presents results of direct research aimed at diagnosing the use of big data analyses in medical facilities in Poland.

Healthcare is a complex system with varied stakeholders: patients, doctors, hospitals, pharmaceutical companies and healthcare decision-makers. This sector is also limited by strict rules and regulations. However, worldwide one may observe a departure from the traditional doctor-patient approach. The doctor becomes a partner and the patient is involved in the therapeutic process [ 14 ]. Healthcare is no longer focused solely on the treatment of patients. The priority for decision-makers should be to promote proper health attitudes and prevent diseases that can be avoided [ 81 ]. This became visible and important especially during the Covid-19 pandemic [ 44 ].

The next challenges that healthcare will have to face are the growing number of elderly people and a decline in fertility. Fertility rates are below the reproductive minimum necessary to keep the population stable [10]. The reflection of both effects, namely increasing age and lower fertility rates, is the demographic load indicator, which is constantly growing. Forecasts show that providing healthcare in the form it is provided today will become impossible within the next 20 years [70]. This is especially visible now, during the Covid-19 pandemic, when healthcare has faced quite a challenge related to the analysis of huge amounts of data and the need to identify trends and predict the spread of the coronavirus. The pandemic showed even more clearly that patients should have access to information about their health condition, the possibility of digital analysis of this data, and access to reliable medical support online. Health monitoring and cooperation with doctors to prevent diseases could actually revolutionize the healthcare system. One of the most important aspects of the necessary change in healthcare is putting the patient at the center of the system.

Technology is not enough to achieve these goals. Therefore, changes should be made not only at the technological level but also in the management and design of complete healthcare processes; what is more, they should affect the business models of service providers. The use of Big Data Analytics is becoming more and more common in enterprises [17, 54]. However, medical enterprises still cannot keep up with the information needs of patients, clinicians, administrators, and policy makers. The adoption of a Big Data approach would allow the implementation of personalized and precise medicine based on personalized information, delivered in real time and tailored to individual patients.

To achieve this goal, it is necessary to implement systems that will be able to learn quickly about the data generated by people within clinical care and everyday life. This will enable data-driven decision making, receiving better personalized predictions about prognosis and responses to treatments; a deeper understanding of the complex factors and their interactions that influence health at the patient level, the health system and society, enhanced approaches to detecting safety problems with drugs and devices, as well as more effective methods of comparing prevention, diagnostic, and treatment options [ 40 ].

In the literature, there is a lot of research showing what opportunities big data analysis can offer companies and what data can be analyzed. However, there are few studies showing how data analysis is performed in healthcare, what data medical facilities use, and what analyses they carry out and in which areas. This paper aims to fill this gap by presenting the results of research carried out in medical facilities in Poland. The goal is to analyze the possibilities of using Big Data Analytics in healthcare, especially under Polish conditions. In particular, the paper aims to determine what data is processed by medical facilities in Poland, what analyses they perform and in what areas, and how they assess their analytical maturity. To achieve this goal, a critical analysis of the literature was performed, and the direct research was based on a questionnaire conducted on a sample of 217 medical facilities in Poland. It was hypothesized that medical facilities in Poland work on both structured and unstructured data and are moving towards data-based healthcare and its benefits. Examining the maturity of healthcare facilities in the use of Big Data and Big Data Analytics is crucial in determining the potential future benefits that the healthcare sector can gain from Big Data Analytics. There is also a pressing need to predict whether, in the coming years, healthcare will be able to cope with the threats and challenges it faces.

This paper is divided into eight parts. The first is the introduction which provides background and the general problem statement of this research. In the second part, this paper discusses considerations on use of Big Data and Big Data Analytics in Healthcare, and then, in the third part, it moves on to challenges and potential benefits of using Big Data Analytics in healthcare. The next part involves the explanation of the proposed method. The result of direct research and discussion are presented in the fifth part, while the following part of the paper is the conclusion. The seventh part of the paper presents practical implications. The final section of the paper provides limitations and directions for future research.

Considerations on the use of Big Data and Big Data Analytics in healthcare

In recent years one can observe a constantly increasing demand for solutions offering effective analytical tools. This trend is also noticeable in the analysis of large volumes of data (Big Data, BD). Organizations are looking for ways to use the power of Big Data to improve their decision making, competitive advantage or business performance [ 7 , 54 ]. Big Data is considered to offer potential solutions to public and private organizations, however, still not much is known about the outcome of the practical use of Big Data in different types of organizations [ 24 ].

As already mentioned, in recent years, healthcare management worldwide has been changing from a disease-centered model to a patient-centered model, and even to a value-based healthcare delivery model [68]. In order to meet the requirements of this model and provide effective patient-centered care, it is necessary to manage and analyze healthcare Big Data.

The issue often raised when it comes to the use of data in healthcare is the appropriate use of Big Data. Healthcare has always generated huge amounts of data and nowadays, the introduction of electronic medical records, as well as the huge amount of data sent by various types of sensors or generated by patients in social media causes data streams to constantly grow. Also, the medical industry generates significant amounts of data, including clinical records, medical images, genomic data and health behaviors. Proper use of the data will allow healthcare organizations to support clinical decision-making, disease surveillance, and public health management. The challenge posed by clinical data processing involves not only the quantity of data but also the difficulty in processing it.

In the literature one can find many different definitions of Big Data. This concept has evolved in recent years; however, it is still not clearly understood. Nevertheless, despite the range and differences in definitions, Big Data can be treated as: a large amount of digital data, large data sets, a tool, a technology, or a phenomenon (cultural or technological).

Big Data can be considered as massive and continually generated digital datasets that are produced via interactions with online technologies [ 53 ]. Big Data can be defined as datasets that are of such large sizes that they pose challenges in traditional storage and analysis techniques [ 28 ]. A similar opinion about Big Data was presented by Ohlhorst who sees Big Data as extremely large data sets, possible neither to manage nor to analyze with traditional data processing tools [ 57 ]. In his opinion, the bigger the data set, the more difficult it is to gain any value from it.

In turn, Knapp perceived Big Data as tools, processes and procedures that allow an organization to create, manipulate and manage very large data sets and storage facilities [ 38 ]. From this point of view, Big Data is identified as a tool to gather information from different databases and processes, allowing users to manage large amounts of data.

A similar perception of the term 'Big Data' is shown by Carter. According to him, Big Data technologies refer to a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis [ 13 ].

Jordan combines these two approaches by identifying Big Data as a complex system, as it needs data bases for data to be stored in, programs and tools to be managed, as well as expertise and personnel able to retrieve useful information and visualization to be understood [ 37 ].

Following Laney's definition of Big Data, it can be stated that it is a large amount of data generated at very high speed and containing a great deal of content [ 43 ]. Such data comes from unstructured sources, such as streams of clicks on the web, social networks (Twitter, blogs, Facebook), video recordings from shops, recordings of calls in call centers, real-time information from various kinds of sensors, RFID, GPS devices, mobile phones and other devices that identify and monitor something [ 8 ]. Big Data is a powerful digital data silo: raw, collected from all sorts of sources, unstructured and difficult, or even impossible, to analyze using the conventional techniques applied so far to relational databases.

While describing Big Data, it cannot be overlooked that the term refers more to a phenomenon than to a specific technology. Therefore, instead of defining this phenomenon, more and more authors describe Big Data by giving it characteristics, a collection of V's related to its nature [ 2 , 3 , 23 , 25 , 58 ]:

Volume (refers to the amount of data and is one of the biggest challenges in Big Data Analytics),

Velocity (speed with which new data is generated, the challenge is to be able to manage data effectively and in real time),

Variety (heterogeneity of data, many different types of healthcare data, the challenge is to derive insights by looking at all available heterogenous data in a holistic manner),

Variability (inconsistency of data, the challenge is to correct the interpretation of data that can vary significantly depending on the context),

Veracity (how trustworthy the data is, quality of the data),

Visualization (ability to interpret data and resulting insights, challenging for Big Data due to its other features as described above),

Value (the goal of Big Data Analytics is to discover the hidden knowledge from huge amounts of data).

Big Data is defined as an information asset with high volume, velocity and variety, which requires specific technologies and methods for its transformation into value [ 21 , 77 ]. Big Data is also a collection of information of high volume, high velocity or high variety, requiring new forms of processing in order to support decision-making, the discovery of new phenomena and process optimization [ 5 , 7 ]. Big Data is too large for traditional data-processing systems and software tools to capture, store, manage and analyze, and therefore requires new technologies [ 28 , 50 , 61 ] to manage (capture, aggregate, process) its volume, velocity and variety [ 9 ].

Undoubtedly, Big Data differs from the data sources used so far by organizations. Therefore, organizations must approach this type of unstructured data in a different way. First of all, organizations must start to see data as flows and not stocks, which entails the need to implement so-called streaming analytics [ 48 ]. These features make it necessary to use new IT tools that allow the fullest use of new data [ 58 ]. The Big Data idea, inseparable from the huge increase in data available to various organizations and individuals, creates opportunities for access to valuable analyses and conclusions, and enables more accurate decisions [ 6 , 11 , 59 ].

The Big Data concept is constantly evolving and currently it does not focus on huge amounts of data, but rather on the process of creating value from this data [ 52 ]. Big Data is collected from various sources that have different data properties and are processed by different organizational units, resulting in creation of a Big Data chain [ 36 ]. The aim of the organizations is to manage, process and analyze Big Data. In the healthcare sector, Big Data streams consist of various types of data, namely [ 8 , 51 ]:

clinical data, i.e. data obtained from electronic medical records, data from hospital information systems, image centers, laboratories, pharmacies and other organizations providing health services, patient generated health data, physician’s free-text notes, genomic data, physiological monitoring data [ 4 ],

biometric data provided from various types of devices that monitor weight, pressure, glucose level, etc.,

financial data, constituting a full record of economic operations reflecting the conducted activity,

data from scientific research activities, i.e. results of research, including drug research, design of medical devices and new methods of treatment,

data provided by patients, including description of preferences, level of satisfaction, information from systems for self-monitoring of their activity: exercises, sleep, meals consumed, etc.

data from social media.

These data are provided not only by patients but also by organizations and institutions, as well as by various types of monitoring devices, sensors or instruments [ 16 ]. The data generated so far in the healthcare sector is stored in both paper and digital form. Thus, the essence and specificity of the process of Big Data analysis mean that organizations need to face new technological and organizational challenges [ 67 ]. The healthcare sector has always generated huge amounts of data, which is connected, among others, with the need to store patients' medical records. However, the problem with Big Data in healthcare is not limited to an overwhelming volume but also includes an unprecedented diversity of types and formats of data and the speed with which it should be analyzed in order to provide the necessary information on an ongoing basis [ 3 ]. It is also difficult to apply traditional tools and methods to the management of unstructured data [ 67 ]. Due to the diversity and quantity of data sources, which are growing all the time, advanced analytical tools and technologies, as well as Big Data analysis methods which can meet and exceed the possibilities of managing healthcare data, are needed [ 3 , 68 ].

Therefore, potential is seen in Big Data analyses, especially in terms of improving the quality of medical care, saving lives or reducing costs [ 30 ]. Extracting association rules, patterns and trends from this tangle of data will allow health service providers and other stakeholders in the healthcare sector to offer more accurate and more insightful diagnoses, personalized treatment, patient monitoring, preventive medicine, support for medical research and population health, as well as better quality of medical services and patient care and, at the same time, the ability to reduce costs (Fig.  1 ).

Figure 1: Healthcare Big Data Analytics applications (Source: own elaboration)

The main challenge with Big Data is how to handle such a large amount of information and use it to make data-driven decisions in plenty of areas [ 64 ]. In the context of healthcare data, another major challenge is to adjust Big Data storage, analysis, presentation of analysis results and inference based on them to a clinical setting. Data analytics systems implemented in healthcare are designed to describe, integrate and present complex data in an appropriate way so that it can be understood better (Fig.  2 ). This would improve the efficiency of acquiring, storing, analyzing and visualizing Big Data from healthcare [ 71 ].

Figure 2: Process of Big Data Analytics

The result of data processing with the use of Big Data Analytics is appropriate data storytelling, which may contribute to making decisions with both lower risk and data support. This, in turn, can benefit healthcare stakeholders. To take advantage of the potential of massive amounts of data in healthcare and to ensure that the right intervention for the right patient is properly timed, personalized and potentially beneficial to all components of the healthcare system, such as the payer, patient and management, analytics of large datasets must connect the communities involved in data analytics and healthcare informatics [ 49 ]. Big Data Analytics can provide insight into clinical data and thus facilitate informed decision-making about the diagnosis and treatment of patients, the prevention of diseases and more. Big Data Analytics can also improve the efficiency of healthcare organizations by realizing the potential of data [ 3 , 62 ].

Big Data Analytics in medicine and healthcare refers to the integration and analysis of a large amount of complex heterogeneous data, such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenetics, diseasomics), biomedical data, telemedicine data (sensors, medical equipment data) and electronic health records data [ 46 , 65 ].

When analyzing the phenomenon of Big Data in the healthcare sector, it should be noted that it can be considered from the point of view of three areas: epidemiological, clinical and business.

From a clinical point of view, Big Data analysis aims to improve the health and condition of patients, enable long-term predictions about their health status and support the implementation of appropriate therapeutic procedures. Ultimately, the use of data analysis in medicine is to allow the adaptation of therapy to a specific patient, that is, personalized (precision) medicine.

From an epidemiological point of view, it is desirable to obtain an accurate prognosis of morbidity in order to implement preventive programs in advance.

In the business context, Big Data analysis may enable offering personalized packages of commercial services or determining the probability of individual disease and infection occurrence. It is worth noting that Big Data means not only the collection and processing of data but, most of all, the inference and visualization of data necessary to obtain specific business benefits.

In order to introduce new management methods and new solutions in terms of effectiveness and transparency, it becomes necessary to make data more accessible, digital, searchable, as well as analyzed and visualized.

Erickson and Rothberg state that information and data do not reveal their full value until insights are drawn from them. Data becomes useful when it enhances decision-making, and decision-making is enhanced only when analytical techniques are used and an element of human interaction is applied [ 22 ].

Thus, healthcare has experienced much progress in the usage and analysis of data. Large-scale digitalization and transparency in this sector are key elements of almost all governments' policies. For centuries, the treatment of patients was based on the judgment of doctors who made treatment decisions. In recent years, however, Evidence-Based Medicine has become more and more important, as it is related to the systematic analysis of clinical data and treatment decision-making based on the best available information [ 42 ]. In the healthcare sector, Big Data Analytics is expected to improve the quality of life and reduce operational costs [ 72 , 82 ]. Big Data Analytics enables organizations to improve and increase their understanding of the information contained in data. It also helps identify data that provides valuable insights for current as well as future decisions [ 28 ].

Big Data Analytics refers to technologies that are grounded mostly in data mining: text mining, web mining, process mining, audio and video analytics, statistical analysis, network analytics, social media analytics and web analytics [ 16 , 25 , 31 ]. Different data mining techniques can be applied on heterogeneous healthcare data sets, such as: anomaly detection, clustering, classification, association rules as well as summarization and visualization of those Big Data sets [ 65 ]. Modern data analytics techniques explore and leverage unique data characteristics even from high-speed data streams and sensor data [ 15 , 16 , 31 , 55 ]. Big Data can be used, for example, for better diagnosis in the context of comprehensive patient data, disease prevention and telemedicine (in particular when using real-time alerts for immediate care), monitoring patients at home, preventing unnecessary hospital visits, integrating medical imaging for a wider diagnosis, creating predictive analytics, reducing fraud and improving data security, better strategic planning and increasing patients’ involvement in their own health.
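As an illustration of the clustering technique mentioned above, the following is a minimal k-means sketch in plain Python; the toy patient records (systolic blood pressure, glucose level) and the cluster count are hypothetical and not taken from the study:

```python
# Minimal k-means sketch: grouping patients by two toy vitals
# (systolic blood pressure, glucose level). Illustrative only.

def kmeans(points, centroids, iters=10):
    """Assign each point to the nearest centroid, then recompute
    centroids as cluster means; repeat a fixed number of times."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # index of the nearest centroid (squared Euclidean distance)
            k = min(range(len(centroids)),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[k])))
            clusters[k].append(p)
        # recompute each centroid as the mean of its cluster
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Toy cohort: two visibly distinct groups of patients
patients = [(118, 90), (122, 95), (120, 88),     # roughly normal vitals
            (160, 180), (165, 190), (158, 175)]  # elevated vitals
centroids, clusters = kmeans(patients, centroids=[(118, 90), (160, 180)])
print(len(clusters[0]), len(clusters[1]))  # 3 3
```

A real analysis would of course operate on many more features and patients, typically with a dedicated library rather than hand-written code.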

Big Data Analytics in healthcare can be divided into [ 33 , 73 , 74 ]:

descriptive analytics in healthcare is used to understand past and current healthcare decisions, converting data into useful information for understanding and analyzing healthcare decisions, outcomes and quality, as well as making informed decisions [ 33 ]. It can be used to create reports (i.e. about patients’ hospitalizations, physicians’ performance, utilization management), visualization, customized reports, drill down tables, or running queries on the basis of historical data.

predictive analytics operates on past performance in an effort to predict the future by examining historical or summarized health data, detecting patterns of relationships in these data, and then extrapolating these relationships to forecast. It can be used, for example, to predict the response of different patient groups to different drugs (dosages) or reactions (clinical trials), anticipate risk, find relationships in health data and detect hidden patterns [ 62 ]. In this way, it is possible to predict the spread of epidemics, anticipate service contracts and plan healthcare resources. Predictive analytics is used in proper diagnosis and for appropriate treatments to be given to patients suffering from certain diseases [ 39 ].

prescriptive analytics—occurs when health problems involve too many choices or alternatives. It uses health and medical knowledge in addition to data or information. Prescriptive analytics is used in many areas of healthcare, including drug prescriptions and treatment alternatives. Personalized medicine and evidence-based medicine are both supported by prescriptive analytics.

discovery analytics—utilizes knowledge about knowledge to discover new “inventions” like drugs (drug discovery), previously unknown diseases and medical conditions, alternative treatments, etc.

Although the models and tools used in descriptive, predictive, prescriptive, and discovery analytics are different, many applications involve all four of them [ 62 ]. Big Data Analytics in healthcare can help enable personalized medicine by identifying optimal patient-specific treatments. This can influence the improvement of life standards, reduce waste of healthcare resources and save costs of healthcare [ 56 , 63 , 71 ]. The introduction of large data analysis gives new analytical possibilities in terms of scope, flexibility and visualization. Techniques such as data mining (computational pattern discovery process in large data sets) facilitate inductive reasoning and analysis of exploratory data, enabling scientists to identify data patterns that are independent of specific hypotheses. As a result, predictive analysis and real-time analysis becomes possible, making it easier for medical staff to start early treatments and reduce potential morbidity and mortality. In addition, document analysis, statistical modeling, discovering patterns and topics in document collections and data in the EHR, as well as an inductive approach can help identify and discover relationships between health phenomena.
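To make the predictive-analytics idea above concrete, the following is a minimal sketch of a logistic-regression model trained by stochastic gradient descent; the features (scaled age, number of prior admissions), the readmission labels and all numbers are hypothetical, invented purely for illustration:

```python
# Sketch of predictive analytics: a tiny logistic-regression model
# trained on toy, hypothetical patient records
# (scaled age, prior admissions) -> readmitted within 30 days (0/1).
import math

def train(xs, ys, lr=0.1, epochs=2000):
    """Stochastic gradient descent on the log-loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Predicted class: probability above 0.5 or not."""
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b))) > 0.5

# Toy, linearly separable data: (scaled age, prior admissions)
X = [(0.3, 0), (0.4, 1), (0.2, 0), (0.8, 3), (0.9, 4), (0.7, 3)]
y = [0, 0, 0, 1, 1, 1]
w, b = train(X, y)
print([predict(w, b, x) for x in X])
```

In practice such models are built with established statistical or machine-learning tooling and validated on held-out data; the sketch only shows the mechanism of learning a risk predictor from historical records.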

Advanced analytical techniques can be used on the large amount of existing (but not yet analyzed) data on patient health and related medical data to achieve a better understanding of the information and results obtained, as well as to design optimal clinical pathways [ 62 ]. Big Data Analytics in healthcare integrates the analysis of several scientific areas such as bioinformatics, medical imaging, sensor informatics, medical informatics and health informatics [ 65 ]. Big Data Analytics in healthcare makes it possible to analyze large datasets from thousands of patients, identifying clusters and correlations between datasets, as well as developing predictive models using data mining techniques [ 65 ]. Discussing all the techniques used for Big Data Analytics goes beyond the scope of a single article [ 25 ].

The success of Big Data analysis and its accuracy depend heavily on the tools and techniques used and on their ability to provide reliable, up-to-date and meaningful information to various stakeholders [ 12 ]. It is believed that the implementation of Big Data Analytics by healthcare organizations could bring many benefits in the upcoming years, including lowering healthcare costs, better diagnosis and prediction of diseases and their spread, improving patient care and developing protocols to prevent re-hospitalization, optimizing staff and equipment, forecasting the need for hospital beds, operating rooms and treatments, and improving the drug supply chain [ 71 ].

Challenges and potential benefits of using Big Data Analytics in healthcare

Modern analytics makes it possible not only to gain insight into historical data, but also to generate the information necessary for insight into what may happen in the future, and even to recommend evidence-based actions. The emphasis on reform has prompted payers and providers to pursue data analysis to reduce risk, detect fraud, improve efficiency and save lives. Everyone, payers, providers, even patients, is focusing on doing more with fewer resources. Thus, some areas in which enhanced data and analytics can yield the greatest results involve various healthcare stakeholders (Table 1 ).

Healthcare organizations see the opportunity to grow through investments in Big Data Analytics. In recent years, by collecting medical data of patients, converting it into Big Data and applying appropriate algorithms, reliable information has been generated that helps patients, physicians and stakeholders in the health sector to identify values and opportunities [ 31 ]. It is worth noting that there are many changes and challenges in the structure of the healthcare sector. Digitization and effective use of Big Data in healthcare can bring benefits to every stakeholder in this sector: a single doctor can benefit just as much as the entire healthcare system. Potential opportunities to achieve benefits and effects from Big Data in healthcare can be divided into four groups [ 8 ]:

Improving the quality of healthcare services:

assessment of diagnoses made by doctors and the manner of treatment of diseases indicated by them based on the decision support system working on Big Data collections,

detection of more effective, from a medical point of view, and more cost-effective ways to diagnose and treat patients,

analysis of large volumes of data to reach practical information useful for identifying needs, introducing new health services, preventing and overcoming crises,

prediction of the incidence of diseases,

detecting trends that lead to an improvement in health and lifestyle of the society,

analysis of the human genome for the introduction of personalized treatment.

Supporting the work of medical personnel

doctors’ comparison of current medical cases to cases from the past for better diagnosis and treatment adjustment,

detection of diseases at earlier stages when they can be more easily and quickly cured,

detecting epidemiological risks and improving control of pathogenic spots and reaction rates,

identification of patients predicted to be at the highest risk of specific, life-threatening diseases, by collating data on the history of the most common diseases in cured people with reports submitted to insurance companies,

health management of each patient individually (personalized medicine) and health management of the whole society,

capturing and analyzing large amounts of data from hospitals and homes in real time, life monitoring devices to monitor safety and predict adverse events,

analysis of patient profiles to identify people for whom prevention should be applied, lifestyle change or preventive care approach,

the ability to predict the occurrence of specific diseases or worsening of patients’ results,

predicting disease progression and its determinants, estimating the risk of complications,

detecting drug interactions and their side effects.

Supporting scientific and research activity

supporting work on new drugs and clinical trials thanks to the possibility of analyzing “all data” instead of selecting a test sample,

the ability to identify patients with specific, biological features that will take part in specialized clinical trials,

selecting a group of patients for which the tested drug is likely to have the desired effect and no side effects,

using modeling and predictive analysis to design better drugs and devices.

Business and management

reduction of costs and counteracting fraud and abusive practices,

faster and more effective identification of incorrect or unauthorized financial operations in order to prevent abuse and eliminate errors,

increase in profitability by detecting patients generating high costs or identifying doctors whose work, procedures and treatment methods cost the most and offering them solutions that reduce the amount of money spent,

identification of unnecessary medical activities and procedures, e.g. duplicate tests.

According to research conducted by Wang, Kung and Byrd, Big Data Analytics benefits can be classified into five categories [ 73 ]:

IT infrastructure benefits: reducing system redundancy, avoiding unnecessary IT costs, transferring data quickly among healthcare IT systems, better use of healthcare systems, processing standardization among various healthcare IT systems, reducing IT maintenance costs regarding data storage,

operational benefits: improving the quality and accuracy of clinical decisions, processing a large number of health records in seconds, reducing the time of patient travel, immediate access to clinical data for analysis, shortening the time of diagnostic tests, reductions in surgery-related hospitalizations, exploring inconceivable new research avenues,

organizational benefits: detecting interoperability problems much more quickly than traditional manual methods, improving cross-functional communication and collaboration among administrative staff, researchers, clinicians and IT staff, enabling data sharing with other institutions and adding new services, content sources and research partners,

managerial benefits: gaining quick insights about changing healthcare trends in the market, providing members of the board and heads of department with sound decision-support information on the daily clinical setting, optimizing business growth-related decisions,

strategic benefits: providing a big-picture view of treatment delivery for meeting future needs, creating highly competitive healthcare services.

The above specification does not constitute a full list of potential areas of application of Big Data Analytics in healthcare, because the possibilities of using such analyses are practically unlimited. In addition, advanced analytical tools make it possible to analyze data from all available sources and to conduct cross-analyses that provide better data insights [ 26 ]. For example, a cross-analysis can refer to a combination of patient characteristics with costs and care results, which can help identify the best, in medical terms, and most cost-effective treatment or treatments, and this may allow a better adjustment of the service provider's offer [ 62 ].

In turn, the analysis of patient profiles (e.g. segmentation and predictive modeling) allows the identification of people who should be subject to prophylaxis or prevention, or who should change their lifestyle [ 8 ]. A shortened list of benefits of Big Data Analytics in healthcare is presented in paper [ 3 ] and consists of: better performance, day-to-day guides, detection of diseases at early stages, predictive analytics, cost effectiveness, Evidence-Based Medicine and effectiveness of patient treatment.

To summarize, healthcare Big Data represents a huge potential for the transformation of healthcare: improvement of patients' outcomes, prediction of epidemic outbreaks, valuable insights, avoidance of preventable diseases, reduction of the cost of healthcare delivery and improvement of the quality of life in general [ 1 ]. However, Big Data also generates many challenges, such as difficulties in data capture, data storage, data analysis and data visualization [ 15 ]. The main challenges are connected with the issues of: data structure (Big Data should be user-friendly, transparent and menu-driven, but it is fragmented, dispersed, rarely standardized and difficult to aggregate and analyze), security (data security, privacy and the sensitivity of healthcare data, with significant concerns related to confidentiality), data standardization (data is stored in formats that are not compatible with all applications and technologies), storage and transfers (especially the costs associated with securing, storing and transferring unstructured data), managerial skills, such as data governance, a lack of appropriate analytical skills, and problems with real-time analytics (healthcare needs to be able to utilize Big Data in real time) [ 4 , 34 , 41 ].

The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities in Poland.

The presented research results are part of a larger questionnaire on Big Data Analytics. The direct research was based on an interview questionnaire which contained 100 questions with a 5-point Likert scale (1 = strongly disagree, 2 = rather disagree, 3 = neither agree nor disagree, 4 = rather agree, 5 = strongly agree) and 4 metric questions. The study was conducted in December 2018 on a sample of 217 medical facilities (110 private, 107 public). The research was conducted by a specialized market research agency: the Center for Research and Expertise of the University of Economics in Katowice.

When it comes to the direct research, the selected entities included entities financed from public sources, i.e. the National Health Fund (23.5%), and entities operating commercially (11.5%). In the surveyed group, more than half of the entities (64.9%) are financed in a hybrid way, both from public and commercial sources. The diversity of the research sample also applies to the size of the entities, defined by the number of employees. Taking into account the proportions of the surveyed entities, it should be noted that medium-sized (10–50 employees, 34% of the sample) and large (51–250 employees, 27%) entities dominate the sector structure. The research was of an all-Poland nature, and the entities included in the research sample come from all of the voivodeships. The largest groups were entities from the Łódzkie (32%), Śląskie (18%) and Mazowieckie (18%) voivodeships, as these voivodeships have the largest number of medical institutions. Other regions of the country were represented by single units. The selection of the research sample was random and stratified: within the database of medical facilities, groups of private and public medical facilities were identified, and the facilities to which the questionnaire was targeted were drawn from each of these groups. The analyses were performed using the GNU PSPP 0.10.2 software.

The aim of the study was to determine whether medical facilities in Poland use Big Data Analytics and if so, in which areas. Characteristics of the research sample is presented in Table 2 .

The research is non-exhaustive due to the incomplete and uneven regional distribution of the samples, overrepresented in three voivodeships (Łódzkie, Mazowieckie and Śląskie). The size of the research sample (217 entities) allows the authors of the paper to formulate specific conclusions on the use of Big Data in the process of its management.

For the purpose of this paper, the following research hypotheses were formulated: (1) medical facilities in Poland work on both structured and unstructured data; (2) medical facilities in Poland are moving towards data-based healthcare and its benefits.

The paper poses the following research questions and statements that coincide with the selected questions from the research questionnaire:

From what sources do medical facilities obtain data? What types of data are used by the particular organization, whether structured or unstructured, and to what extent?

In which areas (clinical or business) do organizations use data and analytical systems?

Is data analytics performed based on historical data or are predictive analyses also performed?

Determining whether administrative and medical staff receive complete, accurate and reliable data in a timely manner.

Determining whether real-time analyses are performed to support the particular organization’s activities.

Results and discussion

On the basis of the literature analysis and the research study, a set of questions and statements related to the researched area was formulated. The results from the surveys show that medical facilities use a variety of data sources in their operations. These sources include both structured and unstructured data (Table 3 ).

According to the data provided by the respondents, considering the first statement in the questionnaire, almost half of the medical institutions (47.58%) agreed that they rather collect and use structured data (e.g. databases and data warehouses, reports to external entities), and 10.57% entirely agree with this statement. As many as 23.35% of the representatives of medical institutions stated that they neither agree nor disagree. The remaining medical facilities rather do not collect and use structured data (7.93%) or strongly disagree with the first statement (6.17%). The median calculated on the basis of the obtained results (median: 4) also indicates that medical facilities in Poland collect and use structured data (Table 4 ).

In turn, 28.19% of the medical institutions agreed that they rather collect and use unstructured data, and 9.25% entirely agree with this statement. The share of representatives of medical institutions who neither agree nor disagree was 27.31%. The remaining medical facilities rather do not collect and use unstructured data (17.18%) or strongly disagree with this statement (13.66%). In the case of unstructured data the median is 3, which means that the collection and use of this type of data by medical facilities in Poland is lower.
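The reported medians can be reproduced directly from the percentage distributions given above. The following is a minimal sketch; missing answers are handled by treating the reported percentages (which do not sum exactly to 100%) as the whole of the valid responses:

```python
# Recompute the median Likert answer from a percentage distribution
# (1 = strongly disagree ... 5 = strongly agree), using the survey
# percentages reported in this section.

def likert_median(dist):
    """Median category of a weighted discrete distribution
    {category: percentage}; percentages need not sum to 100."""
    half = sum(dist.values()) / 2
    cum = 0.0
    for cat in sorted(dist):
        cum += dist[cat]
        if cum >= half:
            return cat

structured = {1: 6.17, 2: 7.93, 3: 23.35, 4: 47.58, 5: 10.57}
unstructured = {1: 13.66, 2: 17.18, 3: 27.31, 4: 28.19, 5: 9.25}
print(likert_median(structured), likert_median(unstructured))  # 4 3
```

This matches the medians of 4 (structured data) and 3 (unstructured data) reported above.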

In the further part of the analysis, it was checked whether the size of the medical facility and form of ownership have an impact on whether it analyzes unstructured data (Tables 4 and 5 ). In order to find this out, correlation coefficients were calculated.

Based on the calculations, it can be concluded that there is a weak but statistically significant monotonic correlation between the size of the medical facility and its collection and use of structured data (p < 0.001; τ = 0.16). This means that the use of structured data increases slightly with the size of the medical facility. The size of the medical facility matters more for the use of unstructured data (p < 0.001; τ = 0.23) (Table 4 ).
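The Kendall rank correlation used above can be illustrated with a short sketch. The version below is the simple tau-a without tie correction, and the facility-size and answer values are hypothetical, not the survey data; for heavily tied ordinal data the tie-corrected tau-b (e.g. `scipy.stats.kendalltau`, or the PSPP equivalent used in the study) is preferable:

```python
# Minimal Kendall rank correlation (tau-a, no tie correction):
# the share of concordant minus discordant pairs among all pairs.

def kendall_tau_a(x, y):
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1   # pair ordered the same way in x and y
            elif s < 0:
                discordant += 1   # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical example: facility size class vs. a Likert answer
size = [1, 1, 2, 2, 3, 3]
answer = [2, 3, 3, 4, 4, 5]
print(kendall_tau_a(size, answer))
```

A positive value indicates the same kind of monotonic association as the reported τ = 0.16 and τ = 0.23, only on invented data.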

To determine whether the form of medical facility ownership affects data collection, the Mann–Whitney U test was used. The calculations show that the form of ownership does not affect what data the organization collects and uses (Table 5).
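The ownership comparison can be sketched with scipy.stats.mannwhitneyu; the two samples below (Likert responses for public versus private facilities) are invented for illustration:

```python
from scipy.stats import mannwhitneyu

# Likert responses (1-5) to a data-use statement, split by form of
# ownership; the values are invented.
public_facilities = [4, 5, 3, 4, 4, 2, 5, 3, 4]
private_facilities = [3, 4, 2, 3, 5, 3, 2, 4]

# Two-sided test: does the response distribution differ by ownership?
u_stat, p_value = mannwhitneyu(
    public_facilities, private_facilities, alternative="two-sided"
)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

When p > 0.05, the null hypothesis of identical distributions is not rejected, which is the pattern behind the conclusion that ownership does not affect data collection.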

Detailed information on the sources from which medical facilities collect and use data is presented in Table 6.

The questionnaire results show that medical facilities primarily use information from databases, reports to external units and transaction data, but they also use unstructured data from e-mails, medical devices, sensors, phone calls, and audio and video recordings (Table 6). Data from social media, RFID and geolocation are used to a small extent. Similar findings are reported in the literature.

The analysis of the respondents’ answers shows that more than half of the medical facilities have an integrated hospital system (HIS) implemented: 43.61% use such a system and 16.30% use it extensively (Table 7), while 19.38% of the examined medical facilities do not use it at all. Moreover, most of the examined medical facilities (34.80% use it, 32.16% use it extensively) keep medical documentation in electronic form, which opens the door to data analytics; only 4.85% of medical facilities do not do so at all.

Further questions to be investigated were whether medical facilities in Poland use data analytics and, if so, in what form and in what areas (Table 8). The analysis of the respondents’ answers about the potential of data analytics in medical facilities shows that a similar number of medical facilities use data analytics in administration and business (31.72% agreed with statement no. 5 and 12.33% strongly agreed) as in the clinical area (33.04% agreed with statement no. 6 and 12.33% strongly agreed). On decision-making issues, 35.24% agree with the statement “the organization uses data and analytical systems to support business decisions” and 8.37% of respondents strongly agree. Further, 40.09% agree with the statement “the organization uses data and analytical systems to support clinical decisions (in the field of diagnostics and therapy)” and 15.42% of respondents strongly agree. The examined medical facilities use both analytics based on historical data (33.48% agree with statement 7 and 12.78% strongly agree) and predictive analytics (33.04% agree with statement 8 and 15.86% strongly agree). Detailed results are presented in Table 8.

Medical facilities focus on development in the field of data processing: they confirm that they conduct analytical planning processes systematically and analyze new opportunities for the strategic use of analytics in business and clinical activities (38.33% rather agree and 10.57% strongly agree with this statement). The situation with real-time data analysis is less optimistic: only 28.19% rather agree and 14.10% strongly agree with the statement that real-time analyses are performed to support the organization’s activities.

When considering whether a facility’s performance in the clinical area depends on the form of ownership, the means and the Mann–Whitney U test indicate that it does. A higher degree of use of analyses in the clinical area can be observed in public institutions.

Whether a medical facility performs descriptive or predictive analysis does not depend on the form of ownership (p > 0.05), although the means and medians are higher in public facilities than in private ones. At the same time, the Mann–Whitney U test shows that these variables are dependent on each other (p < 0.05) (Table 9).

When considering whether a facility’s performance in the clinical area depends on its size, Kendall’s tau indicates that it does (p < 0.001; τ = 0.22); the correlation is weak but statistically significant. This means that the use of data and analytical systems to support clinical decisions (in the field of diagnostics and therapy) increases with the size of the medical facility. A similar, though even weaker, relationship can be found in the use of descriptive and predictive analyses (Table 10).

Considering the results in the area of analytical maturity, 8.81% of medical facilities stated that they are at the first level of maturity, i.e. the organization has not developed analytical skills and does not perform analyses. As many as 13.66% of medical facilities confirmed that they have poor analytical skills, while 38.33% located themselves at level 3, meaning that “there is a lot to do in analytics”. On the other hand, 28.19% believe that their analytical capabilities are well developed, and 6.61% stated that analytics is at the highest level and the analytical capabilities are very well developed. Detailed data is presented in Table 11. The average is 3.11 and the median is 3.

The results of the research enabled the formulation of the following conclusions. Medical facilities in Poland work on both structured and unstructured data. This data comes from databases, transactions, unstructured content of e-mails and documents, and from devices and sensors. The use of data from social media, however, is more limited. Facilities apply analytics in the administrative and business areas as well as in the clinical area, and the decisions made are largely data-driven.

In summary, the analysis of the literature shows that the benefits medical facilities can gain from using Big Data Analytics in their activities relate primarily to patients, physicians and the facilities themselves. It can be confirmed that patients will be better informed, will receive treatments that work for them, will be prescribed medications that work for them and will not be given unnecessary medications [ 78 ]. Physicians’ roles will likely shift from decision maker to consultant: they will advise, warn and help individual patients, and will have more time to form positive and lasting relationships with their patients. Medical facilities will see changes as well, for example fewer unnecessary hospitalizations, resulting initially in less revenue but, after the market adjusts, in overall gains [ 78 ]. The use of Big Data Analytics could revolutionize the way healthcare is practiced, for better health and disease reduction.

The analysis of the latest data reveals that data analytics increases the accuracy of diagnoses: physicians can use predictive algorithms to help them diagnose more accurately [ 45 ]. It could also be helpful in preventive medicine and public health, because with early intervention many diseases can be prevented or ameliorated [ 29 ]. Predictive analytics also makes it possible to identify risk factors for a given patient; with this knowledge, patients can change their lifestyles, which in turn may dramatically change population disease patterns and result in savings in medical costs. Moreover, personalized medicine is the best solution for an individual patient seeking treatment, as it can help doctors decide on the exact treatments for those individuals. Better diagnoses and more targeted treatments will naturally lead to more good outcomes and fewer resources used, including doctors’ time.

The quantitative analysis of the research carried out and presented in this article made it possible to determine whether medical facilities in Poland use Big Data Analytics and, if so, in which areas. The results obtained allowed the following conclusions to be formulated. Medical facilities work on both structured and unstructured data, which comes from databases, transactions, unstructured content of e-mails and documents, and from devices and sensors. They apply analytics in the administrative and business areas as well as in the clinical area, and the results clearly show that the decisions made are largely data-driven. The results of the study confirm what has been analyzed in the literature: medical facilities are moving towards data-based healthcare and its benefits.

In conclusion, Big Data Analytics has the potential for positive impact and global implications in healthcare. Future research on the use of Big Data in medical facilities will concern the strategies adopted by medical facilities to promote and implement such solutions, the benefits they gain from Big Data analysis, and how the prospects in this area are perceived.

Practical implications

This work sought to narrow the gap that exists in analyzing the possibility of using Big Data Analytics in healthcare. Showing how medical facilities in Poland perform in this respect forms part of the global research carried out in this area, including [ 29 , 32 , 60 ].

Limitations and future directions

The research described in this article does not fully exhaust the questions related to the use of Big Data Analytics in Polish healthcare facilities. Only some of the dimensions characterizing the use of data by medical facilities in Poland have been examined. To get the full picture, it would be necessary to examine the results of using structured and unstructured data analytics in healthcare. Future research may examine the benefits that medical institutions achieve from the analysis of structured and unstructured data in the clinical and management areas, and the limitations they encounter there. For this purpose, in-depth interviews with chosen medical facilities in Poland are planned; these facilities could provide additional data for empirical analyses informed by their suggestions. Further research should also include medical institutions outside Poland, enabling international comparative analyses.

Future research in the healthcare field has virtually endless possibilities. It could use Big Data Analytics to diagnose specific conditions [ 47 , 66 , 69 , 76 ], propose approaches that can be reused in other healthcare applications, and create mechanisms to identify “patients like me” [ 75 , 80 ]. Big Data Analytics could also be used for studies related to the spread of pandemics and the efficacy of COVID-19 treatment [ 18 , 79 ], or for psychology and psychiatry studies, e.g. emotion recognition [ 35 ].

Availability of data and materials

The datasets for this study are available on request to the corresponding author.

Abouelmehdi K, Beni-Hessane A, Khaloufi H. Big healthcare data: preserving security and privacy. J Big Data. 2018. https://doi.org/10.1186/s40537-017-0110-7 .

Agrawal A, Choudhary A. Health services data: big data analytics for deriving predictive healthcare insights. Health Serv Eval. 2019. https://doi.org/10.1007/978-1-4899-7673-4_2-1 .

Al Mayahi S, Al-Badi A, Tarhini A. Exploring the potential benefits of big data analytics in providing smart healthcare. In: Miraz MH, Excell P, Ware A, Ali M, Soomro S, editors. Emerging technologies in computing—first international conference, iCETiC 2018, proceedings (Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST). Cham: Springer; 2018. p. 247–58. https://doi.org/10.1007/978-3-319-95450-9_21 .

Bainbridge M. Big data challenges for clinical and precision medicine. In: Househ M, Kushniruk A, Borycki E, editors. Big data, big challenges: a healthcare perspective: background, issues, solutions and research directions. Cham: Springer; 2019. p. 17–31.

Bartuś K, Batko K, Lorek P. Business intelligence systems: barriers during implementation. In: Jabłoński M, editor. Strategic performance management new concept and contemporary trends. New York: Nova Science Publishers; 2017. p. 299–327. ISBN: 978-1-53612-681-5.

Bartuś K, Batko K, Lorek P. Diagnoza wykorzystania big data w organizacjach-wybrane wyniki badań. Informatyka Ekonomiczna. 2017;3(45):9–20.

Bartuś K, Batko K, Lorek P. Wykorzystanie rozwiązań business intelligence, competitive intelligence i big data w przedsiębiorstwach województwa śląskiego. Przegląd Organizacji. 2018;2:33–9.

Batko K. Możliwości wykorzystania Big Data w ochronie zdrowia. Roczniki Kolegium Analiz Ekonomicznych. 2016;42:267–82.

Bi Z, Cochran D. Big data analytics with applications. J Manag Anal. 2014;1(4):249–65. https://doi.org/10.1080/23270012.2014.992985 .

Boerma T, Requejo J, Victora CG, Amouzou A, Asha G, Agyepong I, Borghi J. Countdown to 2030: tracking progress towards universal coverage for reproductive, maternal, newborn, and child health. Lancet. 2018;391(10129):1538–48.

Bollier D, Firestone CM. The promise and peril of big data. Washington, D.C: Aspen Institute, Communications and Society Program; 2010. p. 1–66.

Bose R. Competitive intelligence process and tools for intelligence analysis. Ind Manag Data Syst. 2008;108(4):510–28.

Carter P. Big data analytics: future architectures, skills and roadmaps for the CIO: in white paper, IDC sponsored by SAS. 2011. p. 1–16.

Castro EM, Van Regenmortel T, Vanhaecht K, Sermeus W, Van Hecke A. Patient empowerment, patient participation and patient-centeredness in hospital care: a concept analysis based on a literature review. Patient Educ Couns. 2016;99(12):1923–39.

Chen H, Chiang RH, Storey VC. Business intelligence and analytics: from big data to big impact. MIS Q. 2012;36(4):1165–88.

Chen CP, Zhang CY. Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci. 2014;275:314–47.

Chomiak-Orsa I, Mrozek B. Główne perspektywy wykorzystania big data w mediach społecznościowych. Informatyka Ekonomiczna. 2017;3(45):44–54.

Corsi A, de Souza FF, Pagani RN, et al. Big data analytics as a tool for fighting pandemics: a systematic review of literature. J Ambient Intell Hum Comput. 2021;12:9163–80. https://doi.org/10.1007/s12652-020-02617-4 .

Davenport TH, Harris JG. Competing on analytics, the new science of winning. Boston: Harvard Business School Publishing Corporation; 2007.

Davenport TH. Big data at work: dispelling the myths, uncovering the opportunities. Boston: Harvard Business School Publishing; 2014.

De Cnudde S, Martens D. Loyal to your city? A data mining analysis of a public service loyalty program. Decis Support Syst. 2015;73:74–84.

Erickson S, Rothberg H. Data, information, and intelligence. In: Rodriguez E, editor. The analytics process. Boca Raton: Auerbach Publications; 2017. p. 111–26.

Fang H, Zhang Z, Wang CJ, Daneshmand M, Wang C, Wang H. A survey of big data research. IEEE Netw. 2015;29(5):6–9.

Fredriksson C. Organizational knowledge creation with big data. A case study of the concept and practical use of big data in a local government context. 2016. https://www.abo.fi/fakultet/media/22103/fredriksson.pdf .

Gandomi A, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag. 2015;35(2):137–44.

Groves P, Kayyali B, Knott D, Van Kuiken S. The ‘big data’ revolution in healthcare. Accelerating value and innovation. 2015. http://www.pharmatalents.es/assets/files/Big_Data_Revolution.pdf (Accessed 10.04.2019).

Gupta V, Rathmore N. Deriving business intelligence from unstructured data. Int J Inf Comput Technol. 2013;3(9):971–6.

Gupta V, Singh VK, Ghose U, Mukhija P. A quantitative and text-based characterization of big data research. J Intell Fuzzy Syst. 2019;36:4659–75.

Hampel H, O’Bryant SE, Castrillo JI, Ritchie C, Rojkova K, Broich K, Escott-Price V. PRECISION MEDICINE-the golden gate for detection, treatment and prevention of Alzheimer’s disease. J Prev Alzheimer’s Dis. 2016;3(4):243.

Harerimana GB, Jang J, Kim W, Park HK. Health big data analytics: a technology survey. IEEE Access. 2018;6:65661–78. https://doi.org/10.1109/ACCESS.2018.2878254 .

Hu H, Wen Y, Chua TS, Li X. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access. 2014;2:652–87.

Hussain S, Hussain M, Afzal M, Hussain J, Bang J, Seung H, Lee S. Semantic preservation of standardized healthcare documents in big data. Int J Med Inform. 2019;129:133–45. https://doi.org/10.1016/j.ijmedinf.2019.05.024 .

Islam MS, Hasan MM, Wang X, Germack H. A systematic review on healthcare analytics: application and theoretical perspective of data mining. In: Healthcare. Basel: Multidisciplinary Digital Publishing Institute; 2018. p. 54.

Ismail A, Shehab A, El-Henawy IM. Healthcare analysis in smart big data analytics: reviews, challenges and recommendations. In: Security in smart cities: models, applications, and challenges. Cham: Springer; 2019. p. 27–45.

Jain N, Gupta V, Shubham S, et al. Understanding cartoon emotion using integrated deep neural network on large dataset. Neural Comput Appl. 2021. https://doi.org/10.1007/s00521-021-06003-9 .

Janssen M, van der Voort H, Wahyudi A. Factors influencing big data decision-making quality. J Bus Res. 2017;70:338–45.

Jordan SR. Beneficence and the expert bureaucracy. Public Integr. 2014;16(4):375–94. https://doi.org/10.2753/PIN1099-9922160404 .

Knapp MM. Big data. J Electron Resourc Med Libr. 2013;10(4):215–22.

Koti MS, Alamma BH. Predictive analytics techniques using big data for healthcare databases. In: Smart intelligent computing and applications. New York: Springer; 2019. p. 679–86.

Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff. 2014;33(7):1163–70.

Kruse CS, Goswamy R, Raval YJ, Marawi S. Challenges and opportunities of big data in healthcare: a systematic review. JMIR Med Inform. 2016;4(4):e38.

Kyoungyoung J, Gang HK. Potentiality of big data in the medical sector: focus on how to reshape the healthcare system. Healthc Inform Res. 2013;19(2):79–85.

Laney D. 3D data management: controlling data volume, velocity and variety. Application delivery strategies. 2011. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf .

Lee IK, Wang CC, Lin MC, Kung CT, Lan KC, Lee CT. Effective strategies to prevent coronavirus disease-2019 (COVID-19) outbreak in hospital. J Hosp Infect. 2020;105(1):102.

Lerner I, Veil R, Nguyen DP, Luu VP, Jantzen R. Revolution in health care: how will data science impact doctor-patient relationships? Front Public Health. 2018;6:99.

Lytras MD, Papadopoulou P, editors. Applying big data analytics in bioinformatics and medicine. IGI Global: Hershey; 2017.

Ma K, et al. Big data in multiple sclerosis: development of a web-based longitudinal study viewer in an imaging informatics-based eFolder system for complex data analysis and management. In: Proceedings volume 9418, medical imaging 2015: PACS and imaging informatics: next generation and innovations. 2015. p. 941809. https://doi.org/10.1117/12.2082650 .

Mach-Król M. Analiza i strategia big data w organizacjach. Studia i Materiały Polskiego Stowarzyszenia Zarządzania Wiedzą. 2015;74:43–55.

Madsen LB. Data-driven healthcare: how analytics and BI are transforming the industry. Hoboken: Wiley; 2014.

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Hung BA. Big data: the next frontier for innovation, competition, and productivity. Washington: McKinsey Global Institute; 2011.

Marconi K, Dobra M, Thompson C. The use of big data in healthcare. In: Liebowitz J, editor. Big data and business analytics. Boca Raton: CRC Press; 2012. p. 229–48.

Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57–65.

Michel M, Lupton D. Toward a manifesto for the ‘public understanding of big data.’ Public Underst Sci. 2016;25(1):104–16. https://doi.org/10.1177/0963662515609005 .

Mikalef P, Krogstie J. Big data analytics as an enabler of process innovation capabilities: a configurational approach. In: International conference on business process management. Cham: Springer; 2018. p. 426–41.

Mohammadi M, Al-Fuqaha A, Sorour S, Guizani M. Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun Surv Tutor. 2018;20(4):2923–60.

Nambiar R, Bhardwaj R, Sethi A, Vargheese R. A look at challenges and opportunities of big data analytics in healthcare. In: 2013 IEEE international conference on big data; 2013. p. 17–22.

Ohlhorst F. Big data analytics: turning big data into big money, vol. 65. Hoboken: Wiley; 2012.

Olszak C, Mach-Król M. A conceptual framework for assessing an organization’s readiness to adopt big data. Sustainability. 2018;10(10):3734.

Olszak CM. Toward better understanding and use of business intelligence in organizations. Inf Syst Manag. 2016;33(2):105–23.

Palanisamy V, Thirunavukarasu R. Implications of big data analytics in developing healthcare frameworks—a review. J King Saud Univ Comput Inf Sci. 2017;31(4):415–25.

Provost F, Fawcett T. Data science and its relationship to big data and data-driven decisionmaking. Big Data. 2013;1(1):51–9.

Raghupathi W, Raghupathi V. An overview of health analytics. J Health Med Inform. 2013;4:132. https://doi.org/10.4172/2157-7420.1000132 .

Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):3.

Ratia M, Myllärniemi J. Beyond IC 4.0: the future potential of BI-tool utilization in private healthcare. In: Proceedings IFKAD 2018, Delft, The Netherlands; 2018.

Ristevski B, Chen M. Big data analytics in medicine and healthcare. J Integr Bioinform. 2018. https://doi.org/10.1515/jib-2017-0030 .

Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13(6):350–9. https://doi.org/10.1038/nrcardio.2016.42 .

Schmarzo B. Big data: understanding how data powers big business. Indianapolis: Wiley; 2013.

Senthilkumar SA, Rai BK, Meshram AA, Gunasekaran A, Chandrakumarmangalam S. Big data in healthcare management: a review of literature. Am J Theor Appl Bus. 2018;4:57–69.

Shubham S, Jain N, Gupta V, et al. Identify glomeruli in human kidney tissue images using a deep learning approach. Soft Comput. 2021. https://doi.org/10.1007/s00500-021-06143-z .

Thuemmler C. The case for health 4.0. In: Thuemmler C, Bai C, editors. Health 4.0: how virtualization and big data are revolutionizing healthcare. New York: Springer; 2017.

Tsai CW, Lai CF, Chao HC, et al. Big data analytics: a survey. J Big Data. 2015;2:21. https://doi.org/10.1186/s40537-015-0030-3 .

Wamba SF, Gunasekaran A, Akter S, Ji-fan RS, Dubey R, Childe SJ. Big data analytics and firm performance: effects of dynamic capabilities. J Bus Res. 2017;70:356–65.

Wang Y, Byrd TA. Business analytics-enabled decision-making effectiveness through knowledge absorptive capacity in health care. J Knowl Manag. 2017;21(3):517–39.

Wang Y, Kung L, Wang W, Yu C, Cegielski CG. An integrated big data analytics-enabled transformation model: application to healthcare. Inf Manag. 2018;55(1):64–79.

Wicks P, et al. Scaling PatientsLikeMe via a “generalized platform” for members with chronic illness: web-based survey study of benefits arising. J Med Internet Res. 2018;20(5):e175.

Willems SM, et al. The potential use of big data in oncology. Oral Oncol. 2019;98:8–12. https://doi.org/10.1016/j.oraloncology.2019.09.003 .

Williams N, Ferdinand NP, Croft R. Project management maturity in the age of big data. Int J Manag Proj Bus. 2014;7(2):311–7.

Winters-Miner LA. Seven ways predictive analytics can improve healthcare. Medical predictive analytics have the potential to revolutionize healthcare around the world. 2014. https://www.elsevier.com/connect/seven-ways-predictive-analytics-can-improve-healthcare (Accessed 15.04.2019).

Wu J, et al. Application of big data technology for COVID-19 prevention and control in China: lessons and recommendations. J Med Internet Res. 2020;22(10): e21980.

Yan L, Peng J, Tan Y. Network dynamics: how can we find patients like us? Inf Syst Res. 2015;26(3):496–512.

Yang JJ, Li J, Mulder J, Wang Y, Chen S, Wu H, Pan H. Emerging information technologies for enhanced healthcare. Comput Ind. 2015;69:3–11.

Zhang Q, Yang LT, Chen Z, Li P. A survey on deep learning for big data. Inf Fusion. 2018;42:146–57.

Acknowledgements

We would like to thank those who have touched our science paths.

This research was fully funded as statutory activity—subsidy of Ministry of Science and Higher Education granted for Technical University of Czestochowa on maintaining research potential in 2018. Research Number: BS/PB–622/3020/2014/P. Publication fee for the paper was financed by the University of Economics in Katowice.

Author information

Authors and affiliations

Department of Business Informatics, University of Economics in Katowice, Katowice, Poland

Kornelia Batko

Department of Biomedical Processes and Systems, Institute of Health and Nutrition Sciences, Częstochowa University of Technology, Częstochowa, Poland

Andrzej Ślęzak

Contributions

KB proposed the concept of research and its design. The manuscript was prepared by KB with the consultation of AŚ. AŚ reviewed the manuscript for getting its fine shape. KB prepared the manuscript in the contexts such as definition of intellectual content, literature search, data acquisition, data analysis, and so on. AŚ obtained research funding. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Kornelia Batko .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Batko, K., Ślęzak, A. The use of Big Data Analytics in healthcare. J Big Data 9, 3 (2022). https://doi.org/10.1186/s40537-021-00553-4

Received: 28 August 2021

Accepted: 19 December 2021

Published: 06 January 2022

DOI: https://doi.org/10.1186/s40537-021-00553-4


Keywords: Big Data Analytics, Data-driven healthcare

Grad Coach

Research Topics & Ideas: Data Science

50 Topic Ideas To Kickstart Your Research Project

Research topics and ideas about data science and big data analytics

If you’re just starting out exploring data science-related topics for your dissertation, thesis or research project, you’ve come to the right place. In this post, we’ll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies.

PS – This is just the start…

We know it’s exciting to run through a list of research topics, but please keep in mind that this list is just a starting point. The topic ideas provided here are intentionally broad and generic, so you will need to develop them further. Nevertheless, they should inspire some ideas for your project.

To develop a suitable research topic, you’ll need to identify a clear and convincing research gap, and a viable plan to fill that gap. If this sounds foreign to you, check out our free research topic webinar that explores how to find and refine a high-quality research topic, from scratch. Alternatively, consider our 1-on-1 coaching service.

Data Science-Related Research Topics

  • Developing machine learning models for real-time fraud detection in online transactions.
  • The use of big data analytics in predicting and managing urban traffic flow.
  • Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.
  • The application of predictive analytics in personalizing cancer treatment plans.
  • Analyzing consumer behavior through big data to enhance retail marketing strategies.
  • The role of data science in optimizing renewable energy generation from wind farms.
  • Developing natural language processing algorithms for real-time news aggregation and summarization.
  • The application of big data in monitoring and predicting epidemic outbreaks.
  • Investigating the use of machine learning in automating credit scoring for microfinance.
  • The role of data analytics in improving patient care in telemedicine.
  • Developing AI-driven models for predictive maintenance in the manufacturing industry.
  • The use of big data analytics in enhancing cybersecurity threat intelligence.
  • Investigating the impact of sentiment analysis on brand reputation management.
  • The application of data science in optimizing logistics and supply chain operations.
  • Developing deep learning techniques for image recognition in medical diagnostics.
  • The role of big data in analyzing climate change impacts on agricultural productivity.
  • Investigating the use of data analytics in optimizing energy consumption in smart buildings.
  • The application of machine learning in detecting plagiarism in academic works.
  • Analyzing social media data for trends in political opinion and electoral predictions.
  • The role of big data in enhancing sports performance analytics.
  • Developing data-driven strategies for effective water resource management.
  • The use of big data in improving customer experience in the banking sector.
  • Investigating the application of data science in fraud detection in insurance claims.
  • The role of predictive analytics in financial market risk assessment.
  • Developing AI models for early detection of network vulnerabilities.

Data Science Research Ideas (Continued)

  • The application of big data in public transportation systems for route optimization.
  • Investigating the impact of big data analytics on e-commerce recommendation systems.
  • The use of data mining techniques in understanding consumer preferences in the entertainment industry.
  • Developing predictive models for real estate pricing and market trends.
  • The role of big data in tracking and managing environmental pollution.
  • Investigating the use of data analytics in improving airline operational efficiency.
  • The application of machine learning in optimizing pharmaceutical drug discovery.
  • Analyzing online customer reviews to inform product development in the tech industry.
  • The role of data science in crime prediction and prevention strategies.
  • Developing models for analyzing financial time series data for investment strategies.
  • The use of big data in assessing the impact of educational policies on student performance.
  • Investigating the effectiveness of data visualization techniques in business reporting.
  • The application of data analytics in human resource management and talent acquisition.
  • Developing algorithms for anomaly detection in network traffic data.
  • The role of machine learning in enhancing personalized online learning experiences.
  • Investigating the use of big data in urban planning and smart city development.
  • The application of predictive analytics in weather forecasting and disaster management.
  • Analyzing consumer data to drive innovations in the automotive industry.
  • The role of data science in optimizing content delivery networks for streaming services.
  • Developing machine learning models for automated text classification in legal documents.
  • The use of big data in tracking global supply chain disruptions.
  • Investigating the application of data analytics in personalized nutrition and fitness.
  • The role of big data in enhancing the accuracy of geological surveying for natural resource exploration.
  • Developing predictive models for customer churn in the telecommunications industry.
  • The application of data science in optimizing advertisement placement and reach.

Recent Data Science-Related Studies

While the ideas we’ve presented above are a decent starting point for finding a research topic, they are fairly generic and non-specific. So, it helps to look at actual studies in the data science and analytics space to see how this all comes together in practice.

Below, we’ve included a selection of recent studies to help refine your thinking. These are actual studies, so they can provide useful insight into what a research topic looks like in practice.

  • Data Science in Healthcare: COVID-19 and Beyond (Hulsen, 2022)
  • Auto-ML Web-application for Automated Machine Learning Algorithm Training and evaluation (Mukherjee & Rao, 2022)
  • Survey on Statistics and ML in Data Science and Effect in Businesses (Reddy et al., 2022)
  • Visualization in Data Science VDS @ KDD 2022 (Plant et al., 2022)
  • An Essay on How Data Science Can Strengthen Business (Santos, 2023)
  • A Deep study of Data science related problems, application and machine learning algorithms utilized in Data science (Ranjani et al., 2022)
  • You Teach WHAT in Your Data Science Course?!? (Posner & Kerby-Helm, 2022)
  • Statistical Analysis for the Traffic Police Activity: Nashville, Tennessee, USA (Tufail & Gul, 2022)
  • Data Management and Visual Information Processing in Financial Organization using Machine Learning (Balamurugan et al., 2022)
  • A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis (Pitroda, 2022)
  • Applications of Data Science in Respective Engineering Domains (Rasool & Chaudhary, 2022)
  • Jupyter Notebooks for Introducing Data Science to Novice Users (Fruchart et al., 2022)
  • Towards a Systematic Review of Data Science Programs: Themes, Courses, and Ethics (Nellore & Zimmer, 2022)
  • Application of data science and bioinformatics in healthcare technologies (Veeranki & Varshney, 2022)
  • TAPS Responsibility Matrix: A tool for responsible data science by design (Urovi et al., 2023)
  • Data Detectives: A Data Science Program for Middle Grade Learners (Thompson & Irgens, 2022)
  • MACHINE LEARNING FOR NON-MAJORS: A WHITE BOX APPROACH (Mike & Hazzan, 2022)
  • COMPONENTS OF DATA SCIENCE AND ITS APPLICATIONS (Paul et al., 2022)
  • Analysis on the Application of Data Science in Business Analytics (Wang, 2022)

As you can see, these research topics are far more focused than the generic ideas presented earlier. To develop a high-quality research topic, you’ll need to get laser-focused on a well-defined context with clearly specified variables of interest.

Get 1-On-1 Help

If you’re still unsure about how to find a quality research topic, check out our Research Topic Kickstarter service, which is the perfect starting point for developing a unique, well-justified research topic.



Sample Masters Big Data Full Dissertation

Here is a sample that showcases why we are one of the world’s leading academic writing firms. This assignment was created by one of our expert academic writers and demonstrates the highest academic quality. Place your order today to achieve academic greatness.


Investigating the Impact of Big Data on Automobile Industry Operations

The current study uses a quantitative research approach to analyze how Big Data initiatives impact the operations functions of automobile companies in the UK. A survey was used as the research instrument to gather data from 132 participants working in UK automobile companies. The survey examined the opinions executives held about Big Data and how it impacted their companies, and was distributed online via Survey Monkey to individuals working for automobile companies in the UK. The data obtained were then analyzed using descriptive statistics to identify factors that may influence the use of Big Data in automobile companies. Based on these results, it is concluded that more significant investments in Big Data bring about positive impacts: investing more than 1 billion GBP in Big Data initiatives provides greater tangible benefits for a business and positively impacts the company. The results also found that companies whose analytical abilities were in the adequate or above-adequate range saw measurable results. Overall, Big Data had a large, positive impact on the operations business function of automobile companies.
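As a rough illustration of the kind of analysis the abstract describes, the sketch below applies descriptive statistics to a small, entirely hypothetical set of Likert-style survey responses; the column names and values are invented for illustration and are not the study’s actual instrument or data.

```python
import pandas as pd

# Hypothetical Likert-style responses (1-5) from automobile-industry
# executives. Columns and values are invented, not the study's data.
responses = pd.DataFrame({
    "investment_level": [1, 3, 5, 4, 2, 5, 3, 4],   # 1 = low, 5 = very high
    "perceived_benefit": [2, 3, 5, 4, 2, 5, 3, 4],  # 1 = none, 5 = large
})

# Descriptive statistics of the kind the study reports: central
# tendency and spread for each survey item.
summary = responses.describe().loc[["mean", "std", "min", "max"]]
print(summary)

# A simple grouping relates investment level to mean perceived benefit,
# a common first step before any inferential testing.
print(responses.groupby("investment_level")["perceived_benefit"].mean())
```

On a full survey dataset, `describe()` and `groupby()` scale unchanged, which is why descriptive statistics are typically the first pass over questionnaire data of this kind.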

Chapter 1: Introduction to the Research Topic

Introduction

Big Data has recently been on the rise as an imperative source of information and tools to be incorporated into businesses and daily life. Pflugfelder (2013) defines Big Data as large in volume, high in velocity, and extensive in variety; unable to be handled using conventional systems such as relational databases; and in need of unique, advanced technology for storage, management, analysis, and visualization. However, the actual definition of Big Data varies from industry to industry and business to business. Schroeck et al. (2012) found in their research that 18 percent of businesses defined Big Data as a vast source of information, 15 percent named it real-time information, and seven percent considered it a source of information from social media. Combining these demarcations yields a definition that portrays Big Data as a source of structured, unstructured, and semi-structured information, emitted at high speed and in great variety, that requires new technology, tools, and techniques for its storage, processing, analysis, and visualization.

Significance of the Research Area

The automobile industry is becoming increasingly competitive in sustaining economies, especially given the fierce competition between Western and Eastern manufacturers (Wallner and Kriglstein 2013). The industry has had a significant impact on regional and world economies and societies (Lee et al., 2014). To capture a large share of the market and consumers’ interest in an increasingly competitive environment, it is crucial to make decisions based on real-time data. For this reason, many automobile companies around the world have begun to integrate Big Data into decision-making processes that range from manufacturing to marketing. Walker (2015) found that Big Data can be integrated into business-related tasks in the automobile industry through the following:

  • Recalculating entire risk assortments within minutes.
  • Quickly identifying fraudulent behaviour that might affect the automobile industry.
  • Determining the root causes of problems, issues, failures, and defects that could have longer-term and shorter-term effects.
  • Generating sales based on market research of consumer behaviour.

With automobiles being an intricate part of developed society, it is essential for companies to ensure they provide quality products for the masses. Big Data can play a significant role in the business activities of automobile companies; with changing consumer behaviour and more informed consumers, it has become essential to integrate real-time information into business decisions.

Problem Statement

Automobile manufacturing has become a vital part of the UK economy. According to the Society of Motor Manufacturers and Traders (2016), 2.63 million cars were registered in 2015, a six percent increase from 2014. Due to the rapid changes and developments in the automobile industry, Big Data analysis has become vital to ensuring new levels of success in this revolutionary period. For this reason, the current study looks to understand the significance that Big Data holds for the automobile industry. To compete in an already competitive environment, businesses must understand the value that Big Data can bring them. This makes it imperative for Big Data users to make decisions that can give the business a competitive edge; otherwise, integrating Big Data is of little use. Schroeck et al. (2012) find that a vast deal of the data available to companies is commonly unrelated, coming from various sources such as sensors, mobiles, transactions, social media, log files, audio, video, images, and emails. Processing such large amounts of data into meaningful decisions has become critical for businesses to thrive and succeed in markets where consumer trends can change rapidly (Shah et al., 2014). The UK automobile industry needs to improve its decision-making process to advance critical operations and compete in a highly competitive regional and international market. Monaghan (2016) notes that the British car industry has enjoyed prolonged periods of growth, as witnessed in June 2016, when car production rose by 10.4% to 159,000 cars, the highest figure since June 1998. According to the Society of Motor Manufacturers and Traders (SMMT), by 2017 the UK could build a record number of vehicles per year and overtake France and Spain to become Europe’s second-largest producer after Germany (Foy 2014).
However, Foy (2014) points out that such success may be hindered by the eroding supply chain and operations of the British car industry, primarily the smaller companies that provide the parts and electronic components that go into cars, making this the industry’s biggest concern. To overcome these concerns, many are looking to more efficient use of data to help industry leaders make better decisions for more prosperous businesses (Shooter 2013).

Research Aim and Objectives

Big Data is now widely used in the automobile industry to enable quick action, saving time and cost. Understanding how the automobile industry can integrate Big Data analysis into its daily operations has become imperative to improve that integration and to ensure Big Data is used correctly for maximum benefit. Therefore, the following research question has been formulated.

How has Big Data Impacted the UK Automobile Industry’s Operations?

Based on the research question, the research’s main aim is to investigate the impact of Big Data on the automobile industry, specifically on UK operations such as sales, customer retention, the manufacturing process, performance, marketing, logistics, and supply chain management. To achieve the research aim and answer the research question, the following objectives have been developed:

  • Assess the impact of Big Data on automobile sales and marketing.
  • Assess the impact Big Data has on ensuring customer retention.
  • Examine how Big Data has revolutionized the automobile industry in the UK and increased the potential use of business analytics.
  • Assess the impact that Big Data can have on improving the performance and efficiency of an automobile company.

Research Approach

The current study is conducted using a quantitative research approach. Based on the sections above, the study’s aim and objectives have been developed to pursue the research question using the proposed research approach. To build the research approach, a literature review was conducted to understand previous studies that have analyzed Big Data’s influence in various industries. The results of the literature review (i.e., chapter two) aided in building the research approach. Under this approach, primary research is conducted using a survey as the research instrument for data collection. The justification of this approach is discussed in detail in chapter three of the study.

Project Outline

The current study is divided into six chapters. Below is the outline of the study:

[Figure: outline of the study’s six chapters]

Big Data has been influential in the 21st century, providing industries and companies with detailed information for more intelligent business decisions. Very little research has been conducted on how Big Data impacts the automobile industry. Therefore, the current study aims to analyze and comprehend how Big Data impacts the UK automobile industry’s operations, sales, marketing, and other business aspects. For this purpose, the study developed a set of objectives that will be used to fulfil the study’s aim and answer the primary research question. The study is structured according to a quantitative research approach. Building the research approach designated the need for the literature review presented in the next chapter (i.e., Chapter two).

Chapter 2: Literature Review

The literature review chapter is constructed based on systematic research principles to provide an in-depth analysis of previously published literature on topics related to the current research. The literature review will provide critical insight into various definitions relevant to developing the current research and its primary focus throughout the dissertation. To conduct this literature review, relevant papers were searched for through various databases such as Wiley Online Library, Science Direct, IEEE Xplore Digital Library, and Google Scholar. For the current literature review, the chapter is divided into sections that answer the literature review’s research questions, which are as follows:

  • How have various other fields and domains, other than the automobile industry, used Big Data analytics and visual analytics?
  • What are the types of data sources that have been reported in the literature?
  • What are the types of Big Data visualization techniques and tools used for Big Data visual analytics?

Previous literature that can provide understanding based on these questions was included in the literature review. Based on the analysis of the literature included, the methodology of the current research will be constructed.

The 7V’s of Big Data

Big Data is defined using the 7V’s known as volume, velocity, variety, variability, veracity, visualization, and value.

7V’s of Big Data

Figure 2.2-1: 7V’s of Big Data

Defining-7Vs-of-Big-Data

Fields and Domains using Big Data and Visual Analytics

Based on the literature review, there are practically no publications that portray in detail the extent of Big Data analytics use in the automobile industry. However, many vital publications have noted that Big Data analytics is becoming a trend impacting businesses globally. Wozniak et al. (2015) examined the type of data available to Volvo and how the company extracts it. The study found that Volvo uses data from its production plants and service centres to obtain information about its vehicles, such as customer satisfaction, mileage coverage, and other vital factors that improve decision making. Wozniak et al. (2015) found that Volvo draws on logged production information: product specifications, client information, dealer information, product session information, telematics data, service history, repair history, warranties, and service contracts, which are then dispersed throughout the organization to specific departments, software teams, and engineers to use for production or operations improvements. Many other industries also use Big Data analytics for their services and products.

Big Data analytics can have a profound impact on the future of the banking industry. Collecting data at a massive scale allows banks to comprehend the needs and expectations of their customers. However, banks often lack the skills to execute and deploy Big Data initiatives, as they rely on more familiar technologies and software-development lifecycle (SDLC) methodologies. To develop analytic tools that experts in the banking industry can use, accurate data interpretation must be melded with a user-friendly interface. Commotion is an example of a Big Data analytics tool designed with the user in mind, allowing a comfortable and easy experience for bank data exploration (Laberge, Anderson, et al., 2012). The tool enables analysts to drag and drop data collections to produce variable chart visualizations. This process, formally known as the “think loop process,” allows analysts to dig into and separate larger data collections to explore particular hypotheses based on smaller groupings and to understand anomalies in banks’ networks (Laberge, Anderson, et al., 2012).

Implementing Big Data in the transportation industry has allowed it to become resilient in extreme scenarios. A large portion of the world’s population has shifted to urban areas, requiring cities to deliver sustainable, effective, and efficient services. Big Data analytics research projects are underway in the transportation industry to deal with the massive data coming from road and vehicle sensors, GPS devices, customer apps, and websites. Ben Ayed et al. (2015) report the use of Big Data analytics in Dublin to improve the city’s public bus transportation network and reduce problems caused by increased traffic congestion. Using advanced analytics on the collected data, specific traffic problems were identified, the optimal time to open bus lanes was determined, and recommendations were made to add bus lanes (Ayed et al., 2015).

Ferreira, Poco et al.’s (2013) study provides a means to visually query taxi trips, allowing taxi companies to make better decisions when scheduling driver shifts and to increase revenue. The use of Big Data analytics in transportation has also allowed policymakers to develop improved preparation and disaster management plans for high-risk events such as accidents, public gatherings, and natural disasters. Using smart card data and social media data, the resilience of transportation systems can be increased by analyzing changes in passenger behaviour, replaying historical events within a specific area to discover anomalous situations, and improving customer service (Itoh, Yokoyama, et al. 2014).

Types of Big Data Sources

Unlike typical data, Big Data contains videos, text, audio, images, and other forms of data collected from numerous datasets, making it difficult to process with traditional database management tools and giving rise to a new generation of tools specifically designed to analyze and visualize Big Data.

Santourian et al. (2014) observe that Big Data is often generated from transactions (i.e., invoices, payment orders, delivery records, and storage records) or unstructured data such as text extracts from websites, social media, or images.

However, Santourian et al. (2014) note that Big Data can also be collected in “real-time” from sensors, such as those found in smartphones, or from logs of online behaviour.

Because of the velocity at which it is received, raw Big Data is often unable to serve a statistical purpose, as it has typically been collected by third parties who did not gather it with statistical analysis in mind.

Big Data sources vary across industries, as data collection needs to fit the purpose for which the data will be used in the analysis. For example, Fiore et al. (2015) use data sources made available by project partners or through national and international agencies, developing a more static setup for Big Data analysis.

This included sources of data coming from satellite imagery, remote sensing data, hyperspectral imagery, and climate data used to formulate a use case infrastructure to analyze climate change trends in Manaus, Brazil (Fiore et al., 2015).

A study conducted by Baciu et al. (2015) reports the use of sources that vary across fields, such as extracting data from a website known as Brightkite, which collects location data, such as latitude and longitude, from 4.5 million mobile users over specific intervals of time.

Studies with less scientifically complex themes use other sources of data, such as text: words, phrases, and even entire documents extracted from social media platforms (e.g., Facebook) are used to analyze and predict events such as market trends, to analyze product defects, and to manage calamities (Fan and Gordon 2014; Mahmud et al. 2014).
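A minimal sketch of this kind of text-source analysis, using invented posts and simple keyword counting rather than any particular study’s method:

```python
from collections import Counter
import re

# Hypothetical social media posts; in practice these would be extracted
# from platforms such as Facebook or Twitter via their APIs.
posts = [
    "Battery range on the new model is disappointing",
    "Love the new model, battery life is great",
    "Dealer service was slow, battery replacement took weeks",
]

# Tokenise and count terms as a crude proxy for emerging product issues.
tokens = [w for p in posts for w in re.findall(r"[a-z]+", p.lower())]
counts = Counter(tokens)

# Frequent product-related terms can flag candidate defects for analysts.
print(counts.most_common(3))
```

Real studies replace this keyword counting with proper text mining (topic models, sentiment classifiers), but the pipeline shape — collect, tokenise, aggregate, interpret — is the same.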

Large companies also use various data sources to collect raw data and turn it into meaningful knowledge that can then be used to improve customer service, examine product defects, analyze organizational changes, and comprehend changing consumer trends (Heer and Kandel 2012; Kateja et al. 2014).

Volvo, an automobile manufacturer, uses data sources from logged product information — product specifications, client information, dealer information, product session information, telematics data, service history, repair history, warranties, and service contracts — which are then dispersed throughout the organization to specific departments/divisions, software teams, and engineers to use for improvements (Wozniak et al. 2015).

Big Data Visualization Techniques and Tools for Big Data Visual Analytics

Vatrapu et al. (2015) define data visualization as a method to communicate and transfer information clearly and effectively through graphical means. Given the rise of Big Data, analysts have begun to use data visualization methods to visualize, recognize, differentiate, interpret, and communicate configured data patterns based on the new visualization techniques specifically for massive datasets.

With new techniques, data scientists, analysts, and industry leaders benefit from comprehending massive amounts of data, recognizing emerging properties within the data, data quality control, feature detection on a small and large scale, and evidence for formulating hypotheses.

Generally, all visualization techniques and tools follow a similar pattern, comprising the processing steps of data acquisition, data transformation, mapping data onto visual models, and, lastly, rendering or viewing the data (Zhang et al. 2013; Goonetilleke et al.; Liu et al. 2015; Fu et al. 2014). Following is a brief discussion of visualization tools and techniques that have been applied to Big Data across diverse industries and studies.
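That four-step pattern (acquisition, transformation, mapping onto a visual model, rendering) can be sketched in miniature. In the toy version below, the “visual model” is just a text bar chart standing in for the graphics layer a real tool would use, and the data are invented sensor readings:

```python
def acquire():
    # Step 1: data acquisition (hypothetical daily sensor readings).
    return [("Mon", 120), ("Tue", 180), ("Wed", 90)]

def transform(records):
    # Step 2: transform the raw records, e.g. normalise to the peak value.
    peak = max(v for _, v in records)
    return [(k, v / peak) for k, v in records]

def to_visual_model(scaled, width=20):
    # Step 3: map data onto a visual model (here, bar lengths in characters).
    return [(k, "#" * round(r * width)) for k, r in scaled]

def render(bars):
    # Step 4: render/view the result.
    return "\n".join(f"{k} | {bar}" for k, bar in bars)

print(render(to_visual_model(transform(acquire()))))
```

Production tools swap each stage for something heavier (streaming ingestion, statistical aggregation, a charting library), but keep the same staged pipeline, which is what lets them scale each step independently.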

Popular domains with high demand for Big Data are healthcare, automobile, transport/urban infrastructure, banking, and retail. The chapter also identified the sources through which these domains retrieve vital data to use as meaningful knowledge. It is evident from the literature review that the sources for data retrieval diverge significantly from normal sources.

Firstly, Big Data sources can contain massive volumes of data from sensors, such as those on a phone that monitor health. With such massive data, it is necessary to follow the specific steps laid out for Big Data analytics, with analysis carried out to the most microscopic level a researcher can reach with such tremendous amounts of data.

Finally, data visualization becomes necessary for producing information that can be used to help in decision-making.

A systematic literature review has also revealed the numerous different sources from which Big Data is extracted. Sources vary depending on the domain from which specific kinds of data are extracted.

The literature reveals that typical Big Data contains videos, text, audio, and images in massive datasets. The datasets’ complexity challenges traditional database management tools to handle the volume of data being analyzed.

Familiar sources of Big Data generation are payment orders, delivery records, invoices, and storage records. However, sources can also be “real-time”, as when data is collected by sensors such as those present in smartphones.

Unstructured data is also commonly seen in Big Data, ranging from social media posts, images, and text extracts from websites to entire websites. Regardless of the type of data, the sources from which it is obtained vary from industry to industry.

Data can come from social media, such as Facebook wall posts, comments, and likes, and Twitter tweets, to name a few. At the same time, more experimental and scientific sources provide specific data, such as temperature, humidity, and wind speed in “real-time”, to analyze and make predictions about climate change.

Hire an Expert Dissertation Writer

Orders completed by our expert writers are

  • Formally drafted in the academic style
  • 100% Plagiarism-free & 100% Confidential
  • Never resold
  • Include unlimited free revisions
  • Completed to match exact client requirements

Hire an Expert Dissertation Writer

Chapter 3: Conceptual Framework

The chapter presents the conceptual framework for automobile company executives to adopt. This is achieved using the adopters’ category under the diffusion of innovations theory proposed by Rogers (2003).

Theoretical Development

The conceptual framework was developed by drawing heavily on the diffusion of innovations theory proposed by Rogers (2003). According to the theory, diffusion is the process by which an innovation is communicated over time among the participants in a social system. Rogers (2003) identifies four main elements that influence the spread of a new idea: the innovation, communication channels, time, and a social system. Currently, automobile companies are slowly moving into Big Data to handle operations, as is evident from the literature review. The process of diffusion relies heavily on human capital: an innovation needs to be widely adopted within a setting to self-sustain itself.

There are various strategies available to help an innovation reach critical mass. One is for the innovation to be adopted by a highly respected person in an organisation, creating an instinctive desire for that innovation. Rogers (2003) argues that one of the best strategies is to introduce the innovation to a group of individuals who will readily use it and provide positive reactions, resulting in benefits for early adopters. Using the adoption process under the diffusion of innovations theory, automobile companies can target respected high-level executives to shift their support towards Big Data initiatives.

The proposed conceptual framework thus provides automobile companies with strategies for adopting Big Data initiatives: the best way to promote an innovation and make it self-sustaining is to present it to highly respected executives in the company.

Chapter 4: Methodology

The current chapter presents the development of the research methods needed to complete the experimentation portion of the current study. The chapter discusses in detail the various stages of developing the study’s methodology, including the philosophical background of the chosen research method, the data collection strategy (including the selection of research instrumentation and sampling), and, finally, the analysis tools used to analyse the collected data.

Selecting an Appropriate Research Approach

Creswell (2013) stated that research approaches are plans and procedures that range from making broad assumptions to detailed methods of data collection, analysis, and interpretation.

The several decisions involved in this process determine which approach should be used in a specific study, informed by the philosophical assumptions brought to the study (Creswell 2013).

These are the procedures of inquiry (or research designs) and the specific research methods used for data collection, analysis, and, finally, interpretation. However, Guetterman (2015), Lewis (2015), and Creswell (2013) argue that the selection of a specific research approach is based on the nature of the research problem or issue being addressed, the researchers’ personal experiences, and even the audience for which the study is being developed.

The three main categories into which research approaches are organised are qualitative, quantitative, and mixed methods. Creswell (2013) comments that the three approaches are not considered discrete or distinct.

Creswell (2013) states, “qualitative and quantitative approaches should not be viewed as rigid, distinct categories, polar opposites, or dichotomies” (p.32).

Guetterman (2015) points out that a clearer way of viewing gradations of differences between the approaches is to examine the basic philosophical assumptions brought to the study, the kinds of research strategies used, and the particular methods implemented in conducting the strategy.

Underlying Philosophical Assumptions

An important component of defining the research approach involves the philosophical assumptions that contribute to the broad approach of planning or proposing to conduct research. It involves the intersection of philosophy, research designs, and specific methods, as illustrated in the figure below.


Figure 4.2-1- Research Onion (Source; Saunders and Tosey 2013)

Saunders et al. (2009) define research philosophy as a belief about how data about a phenomenon should be gathered, analysed, and used. Positivism reflects acceptance of the philosophical stance of the natural scientist (Saunders, 2003).

According to Remenyi et al. (1998), there is a greater preference for working with an “observable social reality”, and the outcome of such research can be “law-like” generalisations similar to those produced by physical and natural scientists.

Gill and Johnson (1997) add that positivism also emphasises a highly structured methodology that allows other studies to be replicated. Dumke (2002) agrees, explaining that a positivist philosophical assumption produces highly structured methods and allows for generalisation and for the quantification of objectives that statistical methods can evaluate.

Under this philosophical approach, the researcher is considered an objective observer who should neither affect nor be affected by the research subject.

The current study adopts positivist assumptions because the literature review highlights the importance of Big Data in industrial domains and the need to measure its success in business operations.

To identify a positive relationship between Big Data usage and beneficial business outcomes, theory is used to generate hypotheses about that relationship which can later be tested, allowing for explanations of laws that can thereafter be assessed (Bryman and Bell, 2015).

Selecting Interpretive Research Approach

Interpretive research approaches are derived from the research philosophy that is adopted. According to Dumke (2002), the two main research approaches are deductive and inductive.

In the inductive approach, theory is derived from observations: the research begins with specific observations and measures, and a hypothesis is developed from the patterns detected in them.

Dumke (2002) argues that researchers who use an inductive approach usually work with qualitative data and apply various methods to gather specific information that reflects different points of view.

From the philosophical assumptions discussed in the previous section, it is reasonable to use the deductive approach for the current study. It is also considered the most commonly used theory to establish a relationship between theory and research. The figure below illustrates the steps used for the process of deduction.


Figure 4.2-2 – The process of deduction (Source: Bryman and Bell, 2015)

Based on what is known about a specific domain and the theoretical considerations encompassing it, a hypothesis or hypotheses are deduced that will later be subjected to empirical inquiry (Daum, 2013). Through these hypotheses, concepts of the subject of interest are translated into rational entities for a study. Researchers are then able to deduce their hypotheses and convert them into operational terms.

Justifying the Use of Quantitative Research Method

Saunders (2003) notes that almost all research involves numerical data, or contains data that can be quantified, to help a researcher answer their research questions and meet the study’s objectives.

Quantitative data, broadly, refers to all such data and can be a product of any research strategy (Bryman and Bell, 2015; Guetterman, 2015; Lewis, 2015; Saunders, 2003).

Based on the philosophical assumptions and interpretive research approach, a quantitative research method is best suited for the current study. Muijs (2010) defends the use of quantitative research because, unlike qualitative research, which argues that there is no pre-existing reality, quantitative research assumes that there is a single reality about a social condition that researchers cannot influence in any way.

Selecting an Appropriate Research Strategy

There are many strategies available to implement in a study, as evidenced by Fig. 4.2-1. There are many mono-quantitative methods, such as telephone interviews, web-based surveys, postal surveys, and structured questionnaires (Haq, 2014).

Each instrument has its own pros and cons in terms of quality, time, and cost of data. Bryman (2006); Driscoll et al. (2007); Edwards et al. (2002); and Newby et al. (2003) note that most researchers use structured questionnaires for data collection because they are unable to control or influence respondents; this leads to lower response rates but more accurate data.

Saunders and Tosey (2015) have argued that quantitative data is simpler to obtain and more concise to present. Therefore, the current study uses a survey-based questionnaire (See Appendix A).

Justifying the use of Survey Based Questionnaire

Surveys are among the most traditional forms of research and are used in non-experimental descriptive designs that describe some reality. Survey-based questionnaires are often restricted to a representative sample of the potential group of the study’s interest.

In this case, it is the executives currently working for automobile companies in the UK. The survey instrument is then chosen for its effectiveness at being practical and inexpensive (Kelley et al., 2003).

Given the philosophical assumptions, interpretive approach, and methodological approach chosen, the current study’s survey design is considered the instrument best in line with these premises, as well as the most cost-effective.

Empirical Research Methodology

Research Design

This section describes the research design: the techniques used for data collection, the sampling strategy, and the data analysis for a quantitative method. Before going into the strategies of data collection and analysis, a set of hypotheses was developed.

Hypotheses Development

[Figure: hypotheses developed for the current study]

Data Collection

This section includes the sampling method used to recruit the respondents who provide the information analysed after collection.

Sampling Method

Collis (2009) explains that there are many kinds of sampling methods that can be used for creating a specific target sample from a population. The current study uses simple random sampling to acquire the respondents with whom the survey will be conducted.

Simple random sampling is considered the most basic form of probability sampling. Under this method, elements are taken from the population at random, with all elements having an equal chance of being selected.
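As an illustration of this sampling logic (a hypothetical sketch, not the study's actual procedure or data), simple random sampling can be expressed in a few lines of Python, where every element of the sampling frame has an equal chance of selection:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: employee IDs across the UK car manufacturers
population = [f"EMP-{i:04d}" for i in range(1, 5251)]   # 5,250 employees
sample = simple_random_sample(population, 300, seed=42)

print(len(sample))       # 300 — the targeted sample size
print(len(set(sample)))  # 300 — all distinct, since sampling is without replacement
```

Seeding the generator is only for reproducibility of the illustration; in practice the draw would be left unseeded.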

According to the Office of National Statistics (ONS), as of 2014, there are about thirty-five active British car manufacturers in the UK, each having an employee population of 150 or more.

On this basis, the total population of employees across these car manufacturers is estimated at 5,250. The sample was therefore developed using the following equation:

n = [z^2 × p(1 − p) / e^2] / [1 + z^2 × p(1 − p) / (e^2 × N)]

Where N is the population size, e is the margin of error (as a decimal), z is the confidence level (as a z-score), and p is the percentage value (as a decimal), taken here as 50% to assume a normal distribution. With the above equation, a population of 5,250, a 95% confidence level, and a 5% margin of error, the total sample size needed for the current study equals 300. Therefore, n = 300 is the sample size of the current study.
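The calculation can be sketched in Python. This is an illustrative sketch of the standard finite-population sample-size formula; the study reports a working target of 300, so the exact formula variant or rounding convention it used may differ:

```python
import math

def sample_size(N, e, z, p):
    """Finite-population sample size: Cochran's n0 with a population correction."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # infinite-population sample size
    return n0 / (1 + (n0 - 1) / N)           # finite-population correction

# Population of 5,250 employees, 95% confidence (z = 1.96),
# 5% margin of error, and p = 0.5 (the most conservative choice)
n = sample_size(N=5250, e=0.05, z=1.96, p=0.5)
print(math.ceil(n))
```

Choosing p = 0.5 maximises p(1 − p) and hence the required sample, which is why it is the default when the true proportion is unknown.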

The survey (see Appendix A) has three sections, A, B, and C, with a total of 39 questions. Each section has its own set of questions and its own objective.

The survey is a mix of closed-ended questions covering the respondent’s demographic make-up, the Big Data initiatives of the company, and the impact that Big Data was having on the company. The survey is designed to take no longer than twenty minutes. The survey was constructed on SurveyMonkey.com, an online survey provider.

The survey was left on the website for a duration of 3.5 weeks to ensure that a maximum number of respondents answered it. Respondents were only allowed to take the survey after passing a screening question asking whether they worked for an automobile company in the UK.

Gupta et al. (2004) believe that web surveys are visual stimuli, giving the respondent complete control over whether and how each question is read and understood. That is why Dillman (2000) argued that web questionnaires should closely resemble those administered through the mail/postal service.

Data Analysis

The collected data is then analysed using the Statistical Package for the Social Sciences (SPSS) version 24. The demographic section of the survey is analysed using descriptive statistics, and further analysis of the data also applies descriptive statistics.
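Outside SPSS, the same kind of frequency analysis can be reproduced in a few lines of Python. This is an illustrative sketch using invented response categories, not the study's data:

```python
from collections import Counter

# Hypothetical demographic responses (job titles) from a survey export
responses = (
    ["Operations manager"] * 18
    + ["Foreperson / supervisor"] * 17
    + ["Project manager"] * 16
)

counts = Counter(responses)
total = sum(counts.values())

# Frequency table: count and percentage per category, most common first
for title, count in counts.most_common():
    print(f"{title:<25} {count:>3}  {100 * count / total:.2f}%")
```

This mirrors the frequency tables SPSS produces for categorical demographic variables: each category with its count and share of the total.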

Conclusions

The chapter provides a descriptive and in-depth discussion of the methods involved in the current study’s research. The current study takes a quantitative approach that adopts positivism as its philosophical underpinning, deductive reasoning as its interpretive approach, and a mono-quantitative method that uses a survey instrument for data collection.

The methodology chapter also provided the data analysis technique, which is descriptive statistics through frequency analysis and regression analysis.

Chapter 5 Results and Analysis

The chapter provides the findings of the current study based on the survey results obtained. It provides a straightforward statement of the results using descriptive statistics, which would later be further analysed using SPSS v.24 software. The need for SPSS is to conduct a regression analysis to provide a detailed examination of the data.
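As a minimal sketch of the regression step, ordinary least squares with a single predictor can be written in plain Python; the numbers below are synthetic, not the study's data:

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic example: Big Data budget (predictor) vs. a performance score (outcome)
budget = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [2.1, 4.0, 6.1, 8.0, 9.9]   # roughly score ≈ 2 × budget
slope, intercept = ols_fit(budget, score)
print(round(slope, 2), round(intercept, 2))
```

The closed-form coefficients here are the same ones a statistics package reports for a one-predictor linear model; SPSS additionally reports standard errors and significance tests.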

Section A- Demographic Results

The study had called for 300 respondents to answer the survey using SurveyMonkey, left online for 3.5 weeks. However, the total number of completed surveys obtained was 132, making the survey’s response rate only forty-four percent (44%). It was not the best response rate, but it still provided a broad range of participants to analyse.
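The response-rate arithmetic behind the 44% figure is simply:

```python
# Response rate = completed surveys as a share of the invited sample
completed, invited = 132, 300
response_rate = 100 * completed / invited
print(f"{response_rate:.0f}%")  # 44%
```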

The first question of the survey’s section A asked respondents to identify their job title at the automobile company they were working for. Fig. 5.2-1 shows that operations managers and supervisors made up the greatest number of respondents in the study.

Operations managers composed 14 percent of the respondents, followed by forepersons, supervisors, and lead persons at 13 percent, and project managers, who made up 12 percent of the respondents.

Figure 5.2-1 – Job titles of respondents

Respondents were also asked to indicate the number of years they have been employed in a specific organisation. Allowing for such insight would provide a sense of experience that the participant may have had while working in the company.


Figure 5.2-2 – How long have you worked for the organization?

This is illustrated in Fig. 5.2-2, in which 42 respondents indicated that they have worked for the company for five to ten years. Of the respondents, 33 indicated that they have worked for their company for 10-15 years, while 30 indicated less than five years. The remaining 27 indicated employment of over fifteen years.

The survey also asked respondents to indicate the number of employees who worked for the firm they were employed in. Such insight allows the researcher to understand the extent of operations conducted in the automobile company, and in turn the scope of Big Data use being implemented in the company (see Fig. 5.2-3).


Figure 5.2-3- How many people are employed in the organization you work for?

A total of 46.97 percent of respondents indicated that they worked for companies that employed 50 to 250 employees. Also, 35.61 percent of employees indicated that they were employed by companies with more than 250 employees working for them.

Lastly, 17.42 percent of respondents indicated that the companies they worked for had 10-15 employees. Most respondents worked for companies with more than 50 employees, indicating that the companies included in the study are small-to-medium businesses and large enterprises.


Figure 5.2-4 – Does your company use Big Data analytics?

Of the respondents participating, 72.73 percent indicated that their company was using Big Data analytics. This was crucial as it provided insight into the number of automobile companies with Big Data analytic systems.

As seen in Fig. 5.2-5, only eighty-one (81) of the 132 respondents had direct exposure to Big Data, whether to analyse it, visualise it, or make business decisions based on it. The pool of respondents was considerably smaller than anticipated.

Still, these details provide greater insight into automobile companies’ workings regarding their use and integration of Big Data. Based on the demographic analysis, participants who completed the survey had some access to Big Data analysis, but there is still a large group of people in these companies who have no exposure or access to Big Data.


Figure 5.2-5 – Have you ever been exposed to using any form of Big Data in terms of analyzing it, visualizing it, or making decisions based on it?

Section B & C- Company Big Data Initiative and Impacts Results

The next section of the questionnaire, section B, aimed to analyse the respondent’s answers to identify the extent of integration or implementation of Big Data initiatives in the automobile company they worked for.

This section aims to understand the extent to which Big Data is present in automobile companies. This information can then be compared with the next theme, which looks to understand and examine the effects of Big Data initiatives in the company.

Fig. 5.3-1 illustrates the main issues that may have caused the automobile companies to implement Big Data initiatives. Based on the graph, analysing streaming data and analysing data sets greater than 1 terabyte (TB) were the greatest causes of initiating Big Data in the company, each indicated by 19.70 percent of respondents.


Figure 5.3-1- What were the organization’s primary data issues that led it to consider Big Data?

Another issue that instigated Big Data analytics in companies was analysing data sets from 1 TB to 100 TB, as indicated by 18.18 percent of respondents. Next in the rank was analysing new data types, which led to using Big Data analytics as indicated by 13.64 percent of respondents.

Fig. 5.3-2 illustrates the reaction of respondents to two questions.

  • Question 2 – How would you rate the analytical abilities of the company you are employed in?
  • Question 5 – How would you rate the access to relevant, accurate, and timely data in your company today?

analytical-abilities-of-the-company

Figure 5.3-2 – How would you rate the analytical abilities of the company you are employed in? How would you rate the access to relevant, accurate, and timely data in your company today?

There is a strong correlation between access to Big Data and the analytical abilities of the company. Based on the illustration, 55 people who had access to Big Data thought the access was adequate, with 42 of them believing that the firm’s analytical ability was adequate. Furthermore, 69 participants indicated that access to Big Data was more than adequate, with 57 participants believing that the firm’s analytical ability was more than adequate. It can be concluded that the greater the access to Big Data, the more adequate the analytical abilities of the firm.
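The cross-tabulation underlying this comparison can be sketched as follows; the paired responses are invented for illustration (chosen to echo the marginal counts reported above):

```python
from collections import Counter

# Hypothetical paired answers: (access to data, analytical ability) per respondent
pairs = (
    [("adequate", "adequate")] * 42
    + [("adequate", "inadequate")] * 13
    + [("more than adequate", "more than adequate")] * 57
    + [("more than adequate", "adequate")] * 12
)

# Cross-tabulation: count of respondents per (access, ability) combination
crosstab = Counter(pairs)
for (access, ability), count in sorted(crosstab.items()):
    print(f"access={access:<18} ability={ability:<18} n={count}")

# Marginal totals recover the access counts reported in the text
access_totals = Counter(access for access, _ in pairs)
print(access_totals["adequate"], access_totals["more than adequate"])  # 55 69
```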

The next graph indicates the amount of spending placed in a Big Data initiative’s budget. As seen in the literature review, funding Big Data analytics in a company often allowed for greater business gains. Therefore, it was essential to understand the budget amount that was invested in Big Data initiatives.

A majority of respondents, about 47 percent, indicated that their company had a Big Data initiative budget of £1 million to £10 million. Another 40 percent of respondents indicated that their company spent £100,000 to £1 million on their Big Data systems.

The number of staff dedicated to Big Data analytics is also thought to play a part in advancing the Big Data goals an automobile company may set. The figure below combines two questions:

  • Question 7 – Approximately how many staff in your company are dedicated to analytics, modelling, and data mining (not including routine reporting)?
  • Question 8 – Of these staff, are most working in or for your consumer-facing (B2C) businesses, your commercial or wholesale (B2B) businesses, or both?


Figure 5.3-4 Approximately how many staff in your company are dedicated to analytics, modeling, data mining (not including routine reporting)? Of these staff, are you mostly working in or for your consumer-facing (B2C) businesses, your commercial or wholesale (B2B) businesses, or both?

Based on the illustration, nineteen (19) respondents indicated that 501-1000 employees are dedicated to B2B and B2C analytics. Using Big Data analytics for both B2B and B2C drew the most agreement among respondents, with 72 of 132 indicating so.


Figure 5.3-5 How does the company plan to measure the success of your Big Data initiatives?

The figure above represents the respondent’s answers to their automobile company’s plan for measuring Big Data’s success. Of the 132 participants, 44.70 percent responded that the company is planning on using quantitative metrics associated with business performance to analyse if Big Data is actually successful.

Another 30.30 percent indicated that their company was planning to use qualitative metrics tied to business performance. Using business performance to assess Big Data’s success is consistent with the literature review, which indicated that previous studies have done the same.

As automobile companies, they need to know the results of using Big Data analytics, and that is only possible through business performance indicators, whether qualitative or quantitative.


Figure 5.3-6 Has the company achieved measurable results from its investments in Big Data?

Fig. 5.3-6 portrays participants’ responses regarding achieving measurable results from Big Data. According to 68.18 percent of the respondents, the company they worked for did indeed show measurable results from its Big Data investments.

However, 31.82 percent indicated that there was no measurable result from investing in Big Data. Based on these results and those presented in Fig. 5.3-2, the results support H5, which states that a company’s analytical abilities allow for measurable results.


Figure 5.3-7: Impact of Big Data on Company

Fig. 5.3-7 presents respondents’ answers on the impact of Big Data on automobile companies. An estimated 60% of participants indicated that Big Data initiatives had been started and that the company had benefited from a decrease in expenses.

This response, coupled with the responses seen in Fig. 5.3-3, supports hypothesis H2: that a greater company budget (more than £1 million) would decrease expenses.

Also, over 70% of respondents indicated that their companies had started and benefited from Big Data initiatives by monetising them. These results, coupled with those presented in Fig. 5.3-3, support H1, which suggested that larger investments (more than £1 million) would result in the company’s ability to monetise and generate new revenues.


Figure 5.3-8: Questions 11 & 12 – Since the Big Data initiatives were implemented, what tangible benefits have been achieved in the company? What tangible benefits is the company aiming to achieve using Big Data initiatives?

Fig. 5.3-8 presents the actual and projected benefits of Big Data initiatives. Over 60% of respondents indicated that their automobile company had witnessed actual benefits in increasing sales and product innovations since their Big Data initiatives.

Other actual benefits that exceeded projected benefits include improved customer experience, higher-quality products/services, efficient operations, and improved decision making. Coupled with the results from Fig. 5.3-3, the data support hypotheses H6 and H8.


Figure 5.3-9: What business functions in the company are fueling Big Data initiatives?

Fig. 5.3-9 presents the results of question 13 in section C of the questionnaire. Respondents were asked which business function may be fueling the drive for Big Data initiatives.

Of the sample, 34.09% indicated that operations was the main business function fueling Big Data in the company. After operations, the second-highest function was customer service, indicated by 18.94% of respondents.

The business function thought to be the least influential in driving Big Data in automobile companies was Information Technology, with 9.09% indicating it.

The results partially support H7: according to question 15 of section C, 25% of respondents indicated that in the next five years Big Data would impact and fundamentally change the way business is done in the organisation, as opposed to 15.91% of respondents who indicated it would change the way the business organises its operations.

Based on the study results, hypotheses H1, H2, H5, H6, H8, and part of H7 have been supported. This leads to the conclusion that Big Data initiatives in automobile companies have had a significant impact on the company’s operations.

The companies have significantly benefited from increased sales, greater product innovations, improved customer care, and more efficient decision-making. Greater investment, of more than £1 million, has led to better results from Big Data initiatives.


Chapter 6 Conclusion and Discussion

Research Overview

The current study aimed to analyze the impact that Big Data initiatives had on automobile companies in the UK, especially its operations. The current study was developed using a quantitative approach, which meant using philosophical assumptions from the positivist school of thought and producing a methodology that would follow deductive reasoning.

Under these assumptions, the quantitative approach was selected, using the survey instrument to gather data. The data was then analyzed using descriptive statistics to examine the results and link them to a set of proposed hypotheses.

The results presented conclude that investing more than £1 million in Big Data initiatives provides greater tangible benefits for a business and positively impacts the company.

The results also found that companies with greater analytical abilities, in the adequate and above-adequate range, could see measurable results. In the end, Big Data did have a large, positive impact on the operations business function of automobile companies.

Meeting the Aim and Objectives of the Project

The research’s main aim was to investigate the impact of Big Data on the automobile industry, specifically in the UK, on operations such as sales, customer retention, the manufacturing process, performance, marketing, logistics, and supply chain management.

The current study was able to accomplish this through its objectives, with the following hypotheses supported by the results and analysis in Chapter 5.

H1- The greater the company’s budget for Big Data initiatives (more than £1 million), the greater its ability to monetize and generate new revenues.

H2- The greater the company’s budget for Big Data initiatives (more than £1 million), the greater the decrease in expenses found.

H5- The analytical abilities of a company allow for achieved measurable results.

H6- Investing in Big Data will lead to highly successful business results.

H7- A business’s operations function is fueling Big Data initiatives.

H8- The implementation of Big Data in the company has positive impacts on business.

Statement of Contributions and Research Novelty

Based on the literature review conducted in chapter 2, there is little to no academic research on Big Data’s impact on automobile companies. Due to this significant gap in research, the current study can contribute to literature using the insight provided by this study’s results.

The study analyzed how executives in automobile companies in the UK perceive the contributions made by Big Data in their companies. This insight can then be used to attract other researchers to study the phenomenon. Big Data and its emergence in current markets is fairly new, making the idea behind the current study a novel one.

Research Limitations

The research was severely limited because the number of respondents was far smaller than proposed: 300 respondents were needed, but only 132 completed the survey.

This may be because the survey was distributed online, which makes it difficult to tell how many people saw the survey link but did not participate. The survey may also have been too long, making respondents weary of answering the questions due to the time it took to complete.

Due to the sample constraint, the results obtained from the current study cannot be generalized to the population sampled. It is recommended that other forms of distributing surveys be used to garner the maximum number of respondents.

Automobile companies were also unable to speak to the researcher by phone, which led to interviews being dropped from the study. Interviews could have brought a great deal of additional insight to the survey results; complementing the survey with them would have made the study’s results more accurate and reliable.

Recommendations for Future Research

It is recommended that future studies address the limitations of the current study. From the literature review, very little literature is available on the impact of Big Data on automobile companies.

Due to this lack, future researchers are encouraged to research this industry, where drastic changes may result from the increased use of Big Data. Future researchers are also recommended to use a mixed-methods approach to obtaining and analyzing data.

With a mixed-methods approach, qualitative and quantitative data can complement each other to make assumptions stronger and test hypotheses in a highly effective manner.

References

Abusharekh, A., Stewart, S. A., Hashemian, N., Abidi, S. S. R. 2015. H-Drive: A big health data analytics platform for evidence-informed decision making. IEEE International Congress on Big Data, p. 416-432.

Aihara, K., Imura, H., Takasu, A., Tanaka, Y., Adachi, J. 2014. Crowdsourced mobile sensing in smarter city life. IEEE 7th International Conference on Service-Oriented Computing and Applications, p. 334- 337.

Amelia, A., and Saptawati, G. A. P. 2014. Detection of potential traffic jams based on traffic characteristic data analysis. IEEE, p. 1- 5.

Bryman, A., Bell, E., 2015. Business Research Methods. Oxford University Press.

Cook, K., Grinstein, G., Whiting, M., Cooper, M., Having, P., Ligget, K., Nebesh, B., and Paul, C. L. 2012. VAST Challenge 2012: Visual Analytics for Big Data. IEEE Symposium on Visual Analytics Science and Technology 2012, p. 251- 257. Seattle, WA: Print.

Daum, P., 2013. International Synergy Management: A Strategic Approach for Raising Efficiencies in the Cross-border Interaction Process. Anchor Academic Publishing (aap_verlag).

Dümke, R., 2002. Corporate Reputation and its Importance for Business Success: A European Perspective and its Implication for Public Relations Consultancies. diplom.de.

Foy, H. 2014. UK’s resurgent car industry still faces challenges. Financial Times.

Guetterman, T.C., 2015. Descriptions of Sampling Practices Within Five Approaches to Qualitative Research in Education and the Health Sciences. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research 16.

Haq, M., 2014. A Comparative Analysis of Qualitative and Quantitative Research Methods and a Justification for Adopting Mixed Methods in Social Research. ResearchGate, 1–22. doi:10.13140/RG.2.1.1945.8640

Itoh, M., Yokoyama, D., Toyoda, M., and Tomita, Y. 2014. Visual fusion of Mega-City Big Data: An application to traffic and tweets data analysis of Metro passengers. IEEE International Conference on Big Data, p. 431-442.

Kerr, K., Hausman, B. L., Gad, S., and Javen, W. 2013. Visualization and rhetoric: key concerns for utilizing bid data in humanities research- A case study of vaccination discourse 1918-1919. IEEE International Conference on Big Data, p.25- 32.

Kelley, K., Clark, B., Brown, V., Sitzia, J., 2003. Good practice in the conduct and reporting of survey research. Int J Qual Health Care 15, 261–266. doi:10.1093/intqhc/mzg031.

Lee, J., Noh, G., Kim, C. K. 2014. Analysis & visualization of movie’s popularity and reviews. IEEE International Conference on Big Data, pp. 189-190.

Lewis, S., 2015. Qualitative Inquiry and Research Design: Choosing Among Five Approaches. Health Promotion Practice 16, 473–475. doi:10.1177/1524839915580941

Liu, D., Kitamura, Y., Zeng, X. 2015. Analysis and visualization of traffic conditions of the road network by route bus probe data. IEEE International Conference on Multimedia Big Data, p. 248-251.

Lorenzo, G. D., Sbodio, M. L., Calabrese, F., Berlongerio, M., Nair, R., and Pinelli, F. 2014. All aboard: Visual exploration of cellphone mobility data to optimize public transport. IUI Haifa, Israel, p. 335- 340. Print.

Monaghan, A. 2016. UK car manufacturing hits high but industry warns of Brexit effect. The Guardian. [online] <https://www.theguardian.com/business/2016/jul/28/uk-car-manufacturing-hits-high-industry-warns-brexit-effect> [Accessed 2 March 2017].

Pu, J., Liu, S., Qu, H., Ni, L. 2013. T-watcher: A new visual analytic system for effective traffic surveillance. IEEE 14th International Conference on Mobile Data Management, p. 127- 136.

Rysavy, S. J., Bromley, D., and Daggett, V. 2014. DIVE: A graph-based visual analytics framework for Big Data. Visual Analytics for Biological Data, IEEE Computer Graphics and Applications, p. 26-37.

Saunders, M., 2003. Research Methods for Business Students. Pearson Education India.

Saunders, M.N.K., Tosey, P., 2015. Handbook of Research Methods on Human Resource Development. Edward Elgar Publishing.

Shah, A. H., Gopalakrishnan, G., Rajendran, A., and Liebel, U. 2014. Data mining and sharing tool for high content screening large scale biological image data. IEEE International Conference on Big Data, p. 1068- 1076.

Steiger, E., Ellersiek, T., and Sipf, A. 2014. Explorative public transport flow analysis from uncertain social media data. SIGSPATAL, p. 1-7. Print.

Walker, R. 2015. From Big Data to Big Profits (1st ed.). Print.

Wallner, G., and Kriglstein, S. 2013. Visualization-based analysis of gameplay data- A review of the literature. Entertainment Computing, 4, pp. 143-155.

Wozniak, P., Valton, R., Fjeld, M. 2015. Volvo single view of vehicle: Building a Big Data service from scratch in the automotive industry. CHI: Crossings, Seoul, Korea, p. 671-678.

Xiao, S., Liu, C. X., Wang, Y. 2015. Data driven geospatial-enabled transportation platform for freeway performance. IEEE Intelligent Transportation Systems Magazine, p. 10-21.

Zhang, Z., Wang, S., Cao, G., Padmanabhan, A., and Wu, K. 2014. A scalable approach to extracting mobility patterns from social media data. National Science Foundation, p. 1-6. Print.

9 Appendix A- Survey


10 Appendix B- Raw Data

11 Appendix C- Responses Job Title x Organization Size


12 Appendix D- Responses Question 2, 5, & 6



Deepening big data sustainable value creation: insights using IPMA, NCA, and cIPMA

  • Original Article
  • Published: 25 May 2024

Cite this article

dissertation big data analysis

  • Randy Riggs 1 ,
  • Carmen M. Felipe 2 ,
  • José L. Roldán   ORCID: orcid.org/0000-0003-4053-7526 2 &
  • Juan C. Real 3  

3 Altmetric

The impact of big data analytics capabilities (BDACs) on firms’ sustainable performance (SP) is exerted through a set of underlying mechanisms that operate as a “black box.” Previous research, from the perspective of IT-enabled capabilities, demonstrated that a serial mediation of supply chain management capabilities (SCMCs) and circular economy practices (CEPs) is required to improve SP from BDACs. However, further insight regarding the role of BDACs in the processes of SP creation can be provided by deploying complementary analytics techniques, namely importance-performance map analysis (IPMA), necessary condition analysis (NCA), and combined importance-performance map analysis (cIPMA). This paper applies these techniques to a sample of 210 Spanish companies with the potential for circularity and environmental impact. The results show that BDACs exert a positive total effect toward achieving SP. However, companies still have the potential to improve and benefit from these capabilities. In addition, BDACs are a necessary condition (must-have factor) for all dependent variables in the model, including SP. In this case, high levels of BDACs are required to achieve excellence in SP, justifying organizational initiatives that prioritize the improvement of BDACs to achieve SP goals.



Acknowledgements

The authors acknowledge the financial support provided by “Fondo Europeo de Desarrollo Regional (FEDER)” and the “Consejería de Economía, Conocimiento, Empresas y Universidad de la Junta de Andalucía,” within the “Programa Operativo FEDER 2014-2020”, Research Project US-1264451.

Author information

Authors and Affiliations

Doctoral Program in Strategic Management and International Business, Universidad de Sevilla, Seville, Spain

Randy Riggs

Department of Business Administration and Marketing, Universidad de Sevilla, Seville, Spain

Carmen M. Felipe & José L. Roldán

Department of Business Management and Marketing, Universidad Pablo de Olavide, Seville, Spain

Juan C. Real


Corresponding author

Correspondence to José L. Roldán .

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abbreviations

  • HOC: higher-order construct

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Riggs, R., Felipe, C.M., Roldán, J.L. et al. Deepening big data sustainable value creation: insights using IPMA, NCA, and cIPMA. J Market Anal (2024). https://doi.org/10.1057/s41270-024-00321-2


Revised : 28 November 2023

Accepted : 15 April 2024

Published : 25 May 2024

DOI : https://doi.org/10.1057/s41270-024-00321-2


Keywords

  • Big data analytics capabilities
  • Circular economy practices
  • Supply chain management capabilities
  • Sustainable performance
  • IT-enabled capabilities perspective
  • Importance-performance map analysis (IPMA)
  • Necessary condition analysis (NCA)
  • Combined importance-performance map analysis (cIPMA)

214 Best Big Data Research Topics for Your Thesis Paper

Finding an ideal big data research topic can take a long time. Big data, IoT, and robotics have evolved rapidly, and future generations will be immersed in technologies that make work easier: tasks once done by ten people will be handled by one person or a machine. Although some jobs will be lost, even more will be created, so it is a win-win for everyone.

Big data is a major topic that is being embraced globally. Data science and analytics are helping institutions, governments, and the private sector. We will share with you the best big data research topics.

On top of that, we offer writing tips to help you prosper in your academics. As a university student, you need to do proper research to earn top grades, so consult us if you need research paper writing services.

Big Data Analytics Research Topics for your Research Project

Are you looking for an ideal big data analytics research topic? Once you choose a topic, consult your professor to evaluate whether it is a great topic. This will help you to get good grades.

  • Which are the best tools and software for big data processing?
  • Evaluate the security issues that face big data.
  • An analysis of large-scale data for social networks globally.
  • The influence of big data storage systems.
  • The best platforms for big data computing.
  • The relation between business intelligence and big data analytics.
  • The importance of semantics and visualization of big data.
  • Analysis of big data technologies for businesses.
  • The common methods used for machine learning in big data.
  • The difference between self-tuning and symmetrical spectral clustering.
  • The importance of information-based clustering.
  • Evaluate the hierarchical clustering and density-based clustering application.
  • How is data mining used to analyze transaction data?
  • The major importance of dependency modeling.
  • The influence of probabilistic classification in data mining.
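Several of the topics above concern machine-learning methods for big data, such as clustering. As a concrete illustration of one such method, here is a minimal one-dimensional k-means sketch in pure Python; the data and starting centers are made up for demonstration.

```python
# Illustrative 1-D k-means: alternate between assigning each point to
# its nearest center and moving each center to its cluster's mean.
# Data and parameters are toy values, not from any real dataset.

def kmeans_1d(data, centers, iters=20):
    for _ in range(iters):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Update step: move each center to its cluster mean
        # (keep the old center if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

Production work would use a vectorized library implementation (e.g. scikit-learn's KMeans) on multi-dimensional data; the sketch only shows the assign/update loop.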

Interesting Big Data Analytics Topics

Who said big data had to be boring? Here are some interesting big data analytics topics that you can try, examining how data-driven techniques can make the world a better place.

  • Discuss the privacy issues in big data.
  • Evaluate scalable storage systems in big data.
  • The best big data processing software and tools.
  • Popular data mining tools and techniques.
  • Evaluate the scalable architectures for parallel data processing.
  • The major natural language processing methods.
  • Which are the best big data tools and deployment platforms?
  • The best algorithms for data visualization.
  • Analyze anomaly detection in cloud servers.
  • The screening typically done when recruiting for big data job profiles.
  • Malicious user detection in big data collection.
  • Learning long-term dependencies via the Fourier recurrent units.
  • Nomadic computing for big data analytics.
  • The elementary estimators for graphical models.
  • The memory-efficient kernel approximation.

Big Data Latest Research Topics

Do you know the latest research topics at the moment? These 15 topics will help you to dive into interesting research. You may even build on research done by other scholars.

  • Evaluate the data mining process.
  • The influence of the various dimension reduction methods and techniques.
  • The best data classification methods.
  • The simple linear regression modeling methods.
  • Evaluate the logistic regression modeling.
  • What are the commonly used theorems?
  • The influence of cluster analysis methods in big data.
  • The importance of smoothing methods analysis in big data.
  • How is fraud detection done through AI?
  • Analyze the use of GIS and spatial data.
  • How important is artificial intelligence in the modern world?
  • What is agile data science?
  • Analyze the behavioral analytics process.
  • Semantic analytics distribution.
  • How is domain knowledge important in data analysis?

Big Data Debate Topics

If you want to prosper in the field of big data, you need to try even harder topics. These big data debate topics are interesting and will help you to get a better understanding.

  • The difference between big data analytics and traditional data analytics methods.
  • Why do you think organizations should think beyond the Hadoop hype?
  • Does the size of the data matter more than how recent the data is?
  • Is it true that bigger data are not always better?
  • The debate of privacy and personalization in maintaining ethics in big data.
  • The relation between data science and privacy.
  • Do you think data science is a rebranding of statistics?
  • Who delivers better results between data scientists and domain experts?
  • According to your view, is data science dead?
  • Do you think analytics teams need to be centralized or decentralized?
  • The best methods to resource an analytics team.
  • The best business case for investing in analytics.
  • The societal implications of the use of predictive analytics within Education.
  • Is there a need for greater control to prevent experimentation on social media users without their consent?
  • How is the government using big data: to improve public statistics or to control the population?

University Dissertation Topics on Big Data

Are you doing your Master's or Ph.D. and wondering which dissertation topic or thesis to pursue? Why not try one of these? They are interesting and based on various phenomena. While doing the research, ensure you relate the phenomenon to modern society.

  • The machine learning algorithms used for fall recognition.
  • The divergence and convergence of the internet of things.
  • The reliable data movements using bandwidth provision strategies.
  • How is big data analytics using artificial neural networks in cloud gaming?
  • How is Twitter account classification done using network-based features?
  • How is online anomaly detection done in the cloud collaborative environment?
  • Evaluate the public transportation insights provided by big data.
  • Evaluate the paradigm for cancer patients using the nursing EHR to predict the outcome.
  • Discuss the current data lossless compression in the smart grid.
  • How does online advertising traffic prediction help in boosting businesses?
  • How is the hyperspectral classification done using the multiple kernel learning paradigm?
  • The analysis of large data sets downloaded from websites.
  • How does social media data help advertising companies globally?
  • Which are the systems recognizing and enforcing ownership of data records?
  • The alternate possibilities emerging for edge computing.

The Best Big Data Analysis Research Topics and Essays

There are a lot of issues that are associated with big data. Here are some of the research topics that you can use in your essays. These topics are ideal whether in high school or college.

  • The various errors and uncertainty in making data decisions.
  • The application of big data on tourism.
  • The automation innovation with big data or related technology.
  • The business models of big data ecosystems.
  • Privacy awareness in the era of big data and machine learning.
  • The data privacy for big automotive data.
  • How is traffic managed in defined data center networks?
  • Big data analytics for fault detection.
  • The need for machine learning with big data.
  • The innovative big data processing used in health care institutions.
  • The money normalization and extraction from texts.
  • How is text categorization done in AI?
  • The opportunistic development of data-driven interactive applications.
  • The use of data science and big data towards personalized medicine.
  • The programming and optimization of big data applications.

The Latest Big Data Research Topics for your Research Proposal

Doing a research proposal can be hard at first unless you choose an ideal topic. If you are just diving into the big data field, you can use any of these topics to get a deeper understanding.

  • The data-centric network of things.
  • Big data management using artificial intelligence supply chain.
  • The big data analytics for maintenance.
  • The high confidence network predictions for big biological data.
  • The performance optimization techniques and tools for data-intensive computation platforms.
  • The predictive modeling in the legal context.
  • Analysis of large data sets in life sciences.
  • How to understand the mobility and transport modal disparities using emerging data sources?
  • How do you think data analytics can support asset management decisions?
  • An analysis of travel patterns for cellular network data.
  • The data-driven strategic planning for citywide building retrofitting.
  • How is money normalization done in data analytics?
  • Major techniques used in data mining.
  • The big data adaptation and analytics of cloud computing.
  • The predictive data maintenance for fault diagnosis.

Interesting Research Topics on A/B Testing In Big Data

A/B testing topics are different from the normal big data topics. However, you use an almost similar methodology to find the reasons behind the issues. These topics are interesting and will help you to get a deeper understanding.

  • How is ultra-targeted marketing done?
  • The transition of A/B testing from digital to offline.
  • How can big data and A/B testing be done to win an election?
  • Evaluate the use of A/B testing on big data.
  • Evaluate A/B testing as a randomized control experiment.
  • How does A/B testing work?
  • The mistakes to avoid while conducting the A/B testing.
  • The most ideal time to use A/B testing.
  • The best way to interpret results for an A/B test.
  • The major principles of A/B tests.
  • Evaluate the cluster randomization in big data.
  • The best way to analyze A/B test results and the statistical significance.
  • How is A/B testing used in boosting businesses?
  • The importance of data analysis in conversion research.
  • The importance of A/B testing in data science.

Amazing Research Topics on Big Data and Local Governments

Governments are now using big data to make citizens' lives better, both in central government and in various institutions. These topics are based on real-life experiences of using data to make the world better.

  • Assess the benefits and barriers of big data in the public sector.
  • The best approach to smart city data ecosystems.
  • The big analytics used for policymaking.
  • Evaluate smart technology and the emergence of algorithmic bureaucracy.
  • Evaluate the use of citizen scoring in public services.
  • An analysis of the government administrative data globally.
  • The public values found in the era of big data.
  • Public engagement on local government data use.
  • Data analytics use in policymaking.
  • How are algorithms used in public sector decision-making?
  • The democratic governance in the big data era.
  • The best business model innovation to be used in sustainable organizations.
  • How does the government use the collected data from various sources?
  • The role of big data for smart cities.
  • How does big data play a role in policymaking?

Easy Research Topics on Big Data

Who said big data topics had to be hard? Here are some of the easiest research topics. They are based on data management, research, and data retention. Pick one and try it!

  • Who uses big data analytics?
  • Evaluate structure machine learning.
  • Explain the whole deep learning process.
  • Which are the best ways to manage platforms for enterprise analytics?
  • Which are the new technologies used in data management?
  • What is the importance of data retention?
  • The best way to work with images when doing research.
  • The best way to promote research outreach through data management.
  • The best way to source and manage external data.
  • Does machine learning improve the quality of data?
  • Describe the security technologies that can be used in data protection.
  • Evaluate token-based authentication and its importance.
  • How can poor data security lead to the loss of information?
  • How to determine whether data is secure.
  • What is the importance of centralized key management?

Unique IoT and Big Data Research Topics

The Internet of Things has evolved, and many devices now use it. There are smart devices, smart cities, smart locks, and much more. Things can now be controlled at the touch of a button.

  • Evaluate the 5G networks and IoT.
  • Analyze the use of Artificial intelligence in the modern world.
  • How do ultra-power IoT technologies work?
  • Evaluate the adaptive systems and models at runtime.
  • How have smart cities and smart environments improved the living space?
  • The importance of the IoT-based supply chains.
  • How does smart agriculture influence water management?
  • Naming and identifiers for internet applications.
  • How does the smart grid influence energy management?
  • Which are the best design principles for IoT application development?
  • The best human-device interactions for the Internet of Things.
  • The relation between urban dynamics and crowdsourcing services.
  • The best wireless sensor network for IoT security.
  • The best intrusion detection in IoT.
  • The importance of big data on the Internet of Things.

Big Data Database Research Topics You Should Try

Big data is broad and interesting. These big data database research topics will put you in a better place in your research. You also get to evaluate the roles of various phenomena.

  • The best cloud computing platforms for big data analytics.
  • The parallel programming techniques for big data processing.
  • The importance of big data models and algorithms in research.
  • Evaluate the role of big data analytics for smart healthcare.
  • How is big data analytics used in business intelligence?
  • The best machine learning methods for big data.
  • Evaluate the Hadoop programming in big data analytics.
  • What is privacy-preserving to big data analytics?
  • The best tools for massive big data processing.
  • IoT deployment in Governments and Internet service providers.
  • How will IoT be used for future internet architectures?
  • How does big data close the gap between research and implementation?
  • What are the cross-layer attacks in IoT?
  • The influence of big data and smart city planning in society.
  • Why do you think user access control is important?

Big Data Scala Research Topics

Scala is a programming language that is used in data management. It is closely related to other data programming languages. Here are some of the best Scala questions that you can research.

  • Which are the most used languages in big data?
  • How is Scala used in big data research?
  • Is Scala better than Java in big data?
  • How is Scala a concise programming language?
  • How does the Scala language support stream processing in real time?
  • Which are the various libraries for data science and data analysis?
  • How does Scala allow imperative programming in data collection?
  • Evaluate how Scala includes a useful REPL for interaction.
  • Evaluate Scala’s IDE support.
  • The data catalog reference model.
  • Evaluate the basics of data management and its influence on research.
  • Discuss the behavioral analytics process.
  • What can you term as the experience economy?
  • The difference between agile data science and the Scala language.
  • Explain the graph analytics process.

Independent Research Topics for Big Data

These independent research topics for big data are based on various technologies and how they are related. Big data will be greatly important to modern society.

  • The biggest investments in big data analysis.
  • How are multi-cloud and hybrid settings taking deep root?
  • Why do you think machine learning will be in focus for a long while?
  • Discuss in-memory computing.
  • What is the difference between edge computing and in-memory computing?
  • The relation between the Internet of things and big data.
  • How will digital transformation make the world a better place?
  • How does data analysis help in social network optimization?
  • How will complex big data be essential for future enterprises?
  • Compare the various big data frameworks.
  • The best way to gather and monitor traffic information using CCTV images.
  • Evaluate the hierarchical structure of groups and clusters in the decision tree.
  • Which are the 3D mapping techniques for live streaming data?
  • How does machine learning help to improve data analysis?
  • Evaluate DataStream management in task allocation.
  • How is big data provisioned through edge computing?
  • The model-based clustering of texts.
  • The best ways to manage big data.
  • The use of machine learning in big data.

Is Your Big Data Thesis Giving You Problems?

These are some of the best topics that you can use to prosper in your studies. Not only are they easy to research, but they also reflect real-world issues. Whether in university or college, you need to put enough effort into your studies to prosper. However, if you have time constraints, we can provide professional writing help. Are you looking for expert online writers? Look no further; we will provide quality work at an affordable price.


A data analysis dissertation is a complex and challenging project requiring significant time, effort, and expertise. Fortunately, it is possible to successfully complete a data analysis dissertation with careful planning and execution.

As a student, you must know how important it is to have a strong and well-written dissertation, especially regarding data analysis. Proper data analysis is crucial to the success of your research and can often make or break your dissertation.

To get a better understanding, you may review the data analysis dissertation examples listed below:

  • Impact of Leadership Style on the Job Satisfaction of Nurses
  • Effect of Brand Love on Consumer Buying Behaviour in Dietary Supplement Sector
  • An Insight Into Alternative Dispute Resolution
  • An Investigation of Cyberbullying and its Impact on Adolescent Mental Health in UK


Types of Data Analysis for a Dissertation

The various types of data analysis in a dissertation are as follows:

1.   Qualitative Data Analysis

Qualitative data analysis is a type of data analysis that involves analyzing data that cannot be measured numerically. This data type includes interviews, focus groups, and open-ended surveys. Qualitative data analysis can be used to identify patterns and themes in the data.

2.   Quantitative Data Analysis

Quantitative data analysis is a type of data analysis that involves analyzing data that can be measured numerically. This data type includes test scores, income levels, and crime rates. Quantitative data analysis can be used to test hypotheses and to look for relationships between variables.
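As a minimal sketch of what looking for a relationship between variables can mean in practice, the snippet below computes a Pearson correlation coefficient using only Python's standard library. The `pearson_r` helper and the hours/scores figures are hypothetical, invented for illustration:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical data: hours studied vs. test scores
hours = [2, 4, 6, 8, 10]
scores = [55, 60, 70, 75, 90]

r = pearson_r(hours, scores)
print(round(r, 3))
```

A coefficient near 1 indicates a strong positive relationship; in real research you would also assess the coefficient's statistical significance before drawing conclusions.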

3.   Descriptive Data Analysis

Descriptive data analysis is a type of data analysis that involves describing the characteristics of a dataset. This type of data analysis summarizes the main features of a dataset.

4.   Inferential Data Analysis

Inferential data analysis is a type of data analysis that involves making predictions based on a dataset. This type of data analysis can be used to test hypotheses and make predictions about future events.
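As an illustrative sketch, the snippet below computes Welch's t statistic for two independent samples, a common first step when testing whether two group means differ. The groups and values are hypothetical, and a real analysis would compare the statistic against a t distribution (or use a library such as SciPy) to obtain a p-value:

```python
from statistics import mean, variance
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# Hypothetical data: test scores under two teaching methods
group_a = [78, 85, 82, 88, 90, 84]
group_b = [72, 75, 70, 77, 74, 73]

t = welch_t(group_a, group_b)
print(round(t, 2))
```

A large absolute t value suggests the difference between group means is unlikely to be due to chance alone, which is the kind of prediction-supporting evidence inferential analysis provides.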

5.   Exploratory Data Analysis

Exploratory data analysis is a type of data analysis that involves exploring a data set to understand it better. This type of data analysis can identify patterns and relationships in the data.
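A simple exploratory step is tabulating response frequencies before any formal testing. The sketch below uses Python's `collections.Counter` on hypothetical survey responses and prints a rough text histogram:

```python
from collections import Counter

# Hypothetical survey responses to explore before formal analysis
responses = ["agree", "agree", "neutral", "disagree", "agree",
             "neutral", "agree", "disagree", "agree", "neutral"]

counts = Counter(responses)
for answer, n in counts.most_common():
    # one '#' per occurrence gives a quick visual frequency check
    print(f"{answer:<10} {'#' * n}  ({n})")
```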

How Long Does It Take to Plan and Complete a Data Analysis Dissertation?

When planning your dissertation data analysis, it is important to consider the structure of your methodology, as it will give you an understanding of how long each stage will take. For example, if you use a qualitative research method, your data analysis will involve coding and categorizing your data.

This can be time-consuming, so allowing enough time in your schedule is important. Once you have coded and categorized your data, you will need to write up your findings. Again, this can take some time, so factor this into your schedule.

Finally, you will need to proofread and edit your dissertation before submitting it. All told, a data analysis dissertation can take anywhere from several weeks to several months to complete, depending on the project’s complexity. Therefore, it is important to start planning early and to allow enough time in your schedule to complete the task.

Essential Strategies for Data Analysis Dissertation

A.   Planning

The first step in any dissertation is planning. You must decide what you want to write about and how you want to structure your argument. This planning will involve deciding what data you want to analyze and what methods you will use for a data analysis dissertation.

B.   Prototyping

Once you have a plan for your dissertation, it’s time to start writing. However, creating a prototype is important before diving head-first into writing your dissertation. A prototype is a rough draft of your argument that allows you to get feedback from your advisor and committee members. This feedback will help you fine-tune your argument before you start writing the final version of your dissertation.

C.   Executing

After you have created a plan and prototype for your data analysis dissertation, it’s time to start writing the final version. This process will involve collecting and analyzing data and writing up your results. You will also need to create a conclusion section that ties everything together.

D.   Presenting

The final step in acing your data analysis dissertation is presenting it to your committee. This presentation should be well-organized and professionally presented. During the presentation, you’ll also need to be ready to respond to questions concerning your dissertation.

Data Analysis Tools

Numerous tools are employed to assess the data and deduce pertinent findings for the discussion section. The tools commonly used to analyze data and reach a scientific conclusion are as follows:

a.     Excel

Excel is a spreadsheet program that is part of the Microsoft Office productivity suite. Excel is a powerful tool that can be used for various data analysis tasks, such as creating charts and graphs, performing mathematical calculations, and sorting and filtering data.

b.     Google Sheets

Google Sheets is a free online spreadsheet application that is part of the Google Drive suite of productivity software. Google Sheets is similar to Excel in terms of functionality, but it also has some unique features, such as the ability to collaborate with other users in real-time.

c.     SPSS

SPSS is a statistical analysis software program commonly used in the social sciences. SPSS can be used for various data analysis tasks, such as hypothesis testing, factor analysis, and regression analysis.

d.     STATA

STATA is a statistical analysis software program commonly used in the sciences and economics. STATA can be used for data management, statistical modelling, descriptive statistics analysis, and data visualization tasks.

e.     SAS

SAS is a commercial statistical analysis software program used by businesses and organizations worldwide. SAS can be used for predictive modelling, market research, and fraud detection.

f.     R

R is a free, open-source statistical programming language popular among statisticians and data scientists. R can be used for tasks such as data wrangling, machine learning, and creating complex visualizations.

g.     Python

Python is a versatile programming language used for a variety of applications, including web development, scientific computing, and artificial intelligence. Python also has a number of modules and libraries that can be used for data analysis tasks, such as numerical computing, statistical modelling, and data visualization.
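As a minimal illustration of a typical analysis pipeline in pure Python (hypothetical data, standard library only): load tabular data, drop incomplete records, and summarize a numeric column. In a real project you would more likely read from a file and use a library such as pandas:

```python
import csv
import io
from statistics import mean

# Hypothetical CSV data, inlined for the example; a real project
# would read from a file such as open("data.csv").
raw = """respondent,age,score
1,25,7.5
2,31,8.0
3,28,
4,45,6.5
"""

rows = list(csv.DictReader(io.StringIO(raw)))
# Basic cleaning: drop records with a missing score
clean = [r for r in rows if r["score"]]
scores = [float(r["score"]) for r in clean]

print("kept", len(clean), "of", len(rows), "rows")
print("mean score:", round(mean(scores), 2))
```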


Tips to Compose a Successful Data Analysis Dissertation

a.   Choose a Topic You’re Passionate About

The first step to writing a successful data analysis dissertation is to choose a topic you’re passionate about. Not only will this make the research and writing process more enjoyable, but it will also ensure that you produce a high-quality paper.

Choose a topic that is specific enough to be covered within your paper’s scope, but not so narrow that it will be challenging to obtain enough evidence to substantiate your arguments.

b.   Do Your Research

Data analysis is an important part of academic research. Once you’ve selected a topic, it’s time to begin your research. Be sure to consult your advisor or supervisor frequently during this stage to ensure that you are on the right track. In addition to secondary sources such as books, journal articles, and reports, you should also consider conducting primary research through surveys or interviews. This will give you first-hand insights into your topic that can be invaluable when writing your paper.

c.   Develop a Strong Thesis Statement

After you’ve done your research, it’s time to start developing your thesis statement. It is arguably the most crucial part of your entire paper, so take care to craft a clear and concise statement that encapsulates the main argument of your paper.

Remember that your thesis statement should be arguable—that is, it should be capable of being disputed by someone who disagrees with your point of view. If your thesis statement is not arguable, it will be difficult to write a convincing paper.

d.   Write a Detailed Outline

Once you have developed a strong thesis statement, the next step is to write a detailed outline of your paper. This will offer you a direction to write in and guarantee that your paper makes sense from beginning to end.

Your outline should include an introduction, in which you state your thesis statement; several body paragraphs, each devoted to a different aspect of your argument; and a conclusion, in which you restate your thesis and summarize the main points of your paper.

e.   Write Your First Draft

With your outline in hand, it’s finally time to start writing your first draft. At this stage, don’t worry about perfecting your grammar or making sure every sentence is exactly right—focus on getting all of your ideas down on paper (or onto the screen). Once you have completed your first draft, you can revise it for style and clarity.

And there you have it! Following these simple tips can increase your chances of success when writing your data analysis dissertation. Just remember to start early, give yourself plenty of time to research and revise, and consult with your supervisor frequently throughout the process.


Studying the above examples gives you valuable insight into the structure and content that should be included in your own data analysis dissertation. You can also learn how to effectively analyze and present your data and make a lasting impact on your readers.

In addition to being a useful resource for completing your dissertation, these examples can also serve as a valuable reference for future academic writing projects. By following these examples and understanding their principles, you can improve your data analysis skills and increase your chances of success in your academic career.

You may also contact Premier Dissertations to develop your data analysis dissertation.

For further assistance, some other resources in the dissertation writing section are shared below:

How Do You Select the Right Data Analysis

How to Write Data Analysis For A Dissertation?

How to Develop a Conceptual Framework in Dissertation?

What is a Hypothesis in a Dissertation?




Apache Doris for Log and Time Series Data Analysis

NetEase has replaced Elasticsearch and InfluxDB with Apache Doris, achieving an 11x query performance improvement and saving 70% of resources.

Frank Z

For most people looking for a log management and analytics solution, Elasticsearch is the go-to choice, and the same applies to InfluxDB for time series data analysis. These were exactly the choices of NetEase, one of the world's highest-grossing game companies, among other businesses. As NetEase expanded its business horizons, the logs and time series data it received exploded, bringing problems such as surging storage costs and declining stability. Apache Doris, NetEase's pick among big data components for its platform upgrade, fits both scenarios and brings much faster query performance.

Below, we list the gains NetEase achieved after adopting Apache Doris in its monitoring platform and time series data platform, and share its best practices for users with similar needs.

Monitoring Platform: Elasticsearch -> Apache Doris

NetEase provides a collaborative workspace platform that combines email, calendar, cloud-based documents, instant messaging, customer management, and more. To oversee its performance and availability, NetEase built the Eagle monitoring platform, which collects logs for analysis. Eagle was supported by Elasticsearch and Logstash. The data pipeline was simple: Logstash gathered log data, cleaned and transformed it, and then output it to Elasticsearch, which handled real-time log retrieval and analysis requests from users.


Due to NetEase's increasingly sizable log dataset, Elasticsearch's index design, and limited hardware resources, the monitoring platform exhibited high latency in daily queries. Additionally, Elasticsearch maintains high data redundancy across forward indexes, inverted indexes, and columnar storage, which adds to cost pressure.

After migration to Apache Doris, NetEase achieves a 70% reduction in storage costs and an 11-fold increase in query speed.


  • 70% reduction in storage costs: This means a dataset that takes up 100TB in Elasticsearch only requires 30TB in Apache Doris. Moreover, thanks to the much-reduced storage footprint, NetEase can replace its HDDs with more expensive SSDs for hot data storage, achieving higher query performance while staying within the same budget.
  • 11-fold increase in query speed: Apache Doris delivers faster queries while consuming less CPU than Elasticsearch. In NetEase's tests, Doris showed reliably low latency across queries of various sizes, while Elasticsearch showed longer latency and greater fluctuations; the smallest speed difference was 11-fold.


Time Series Data Platform: InfluxDB -> Apache Doris

NetEase is also an instant messaging (IM) PaaS provider. To support this, it built a data platform to analyze time series data from its IM services. The platform was built on InfluxDB, a time series database. Data flowed into a Kafka message queue; after the fields were parsed and cleaned, the data arrived in InfluxDB, ready to be queried. InfluxDB responded to both online and offline queries: the former generated metric monitoring reports and bills in real time, and the latter batch-analyzed data from the previous day.


This platform was also challenged by the increasing data size and diversifying data sources.

  • OOM errors: Offline analysis across multiple data sources put InfluxDB under huge pressure and caused out-of-memory (OOM) errors.
  • High storage costs: Cold data took up a large portion of storage but was stored the same way as hot data, which added up to huge expenditures.


Replacing InfluxDB with Apache Doris has brought higher cost efficiency to the data platform:

  • Higher throughput: Apache Doris maintains a write throughput of 500MB/s and reaches a peak of 1GB/s. With InfluxDB, NetEase used to require 22 servers at a CPU utilization rate of 50%; with Doris, the same workload takes only 11 servers at the same utilization rate. That means Doris helps cut resource consumption by half.
  • 67% less storage usage: The same dataset used 150TB of storage space with InfluxDB but only 50TB with Doris. Thus, Doris helps reduce storage costs by 67%.
  • Faster and more stable query performance: The performance test selected a random online query SQL and ran it 99 consecutive times. Doris delivered generally faster response times and maintained stability throughout the 99 queries.

[Chart: response time comparison over 99 query runs, Doris vs. InfluxDB]

Best Practice

Adopting a new product and putting it into a production environment is, after all, a big project. The NetEase engineers came across a few hiccups along the way, and they have kindly shared how they solved these problems to save other users some detours.

Table Creation

Table schema design has a significant impact on database performance, and this holds for log and time series data processing as well. Apache Doris provides optimization options for these scenarios. Here are some recommendations from NetEase:

  • Retrieval of the latest N logs: Using a DATETIME-type time field as the primary key can significantly speed up such queries.
  • Partitioning strategy: Use PARTITION BY RANGE based on a time field and enable dynamic partitioning, which allows for auto-management of data partitions.
  • Bucketing strategy: Adopt random bucketing and set the number of buckets to roughly three times the total number of disks in the cluster. (Apache Doris also provides an auto-bucket feature to avoid the performance loss caused by improper data sharding.)
  • Indexing: Create indexes for frequently searched fields to improve query efficiency. Pay attention to the parser chosen for fields that require full-text search, because it determines query accuracy.
  • Compaction: Tune the compaction strategy based on your own business needs.
  • Data compression: Enable ZSTD for a higher compression ratio.
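
Combining these recommendations, a log table definition might look like the following sketch. This is illustrative only: the table name, columns, bucket count, and property values are assumptions, not NetEase's actual schema.

```sql
-- Illustrative log table applying the recommendations above.
CREATE TABLE app_log (
    ts      DATETIME NOT NULL,   -- time field as the key speeds up "latest N" queries
    source  VARCHAR(64),
    message STRING,
    INDEX idx_message (message) USING INVERTED PROPERTIES ("parser" = "unicode")
)
DUPLICATE KEY (ts)
PARTITION BY RANGE (ts) ()
DISTRIBUTED BY RANDOM BUCKETS 36   -- roughly 3x the number of disks in the cluster
PROPERTIES (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-30",
    "dynamic_partition.end"       = "3",
    "dynamic_partition.prefix"    = "p",
    "compression" = "zstd"            -- ZSTD for a higher compression ratio
);
```

With dynamic partitioning enabled, Doris creates and expires daily partitions automatically, so no manual partition maintenance is needed.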

Cluster Configuration

This part covers backend (BE) configuration and Stream Load optimization.

During peak times, the data platform handles up to 1 million TPS and a writing throughput of 1GB/s, which is demanding for the system. Meanwhile, a large number of concurrent write operations are loading data into many tables, but each individual write involves only a small amount of data. Thus, it takes a long time to accumulate a batch, which conflicts with the data freshness requirement from the query side.
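
The batching tension just described can be sketched as a simple accumulator that flushes when a batch reaches either a size threshold or an age limit. With many tables and small per-table writes, the age limit usually fires first, hurting freshness. This is an illustrative Python sketch, not NetEase's actual code; class and parameter names are assumptions.

```python
import time

class Batcher:
    """Accumulate records and flush when the batch is big enough or old enough."""

    def __init__(self, max_bytes=64 * 1024 * 1024, max_age_s=5.0):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.buf, self.size, self.started = [], 0, None

    def add(self, record: bytes):
        # Record the time the current batch was opened.
        if self.started is None:
            self.started = time.monotonic()
        self.buf.append(record)
        self.size += len(record)
        # Flush on size OR age; small writes mean age usually wins.
        if (self.size >= self.max_bytes
                or time.monotonic() - self.started >= self.max_age_s):
            return self.flush()
        return None

    def flush(self):
        batch, self.buf, self.size, self.started = self.buf, [], 0, None
        return batch
```

A high `max_age_s` improves batching efficiency at the cost of data freshness; the optimizations below attack the other side of the trade-off by making each load cheaper.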

As a result, the data platform was bottlenecked by data backlogs in Apache Kafka. NetEase uses the Stream Load method to ingest data from Kafka into Doris, so the key was to accelerate Stream Load. After talking to the Apache Doris developers, NetEase adopted two optimizations for their log and time series data analysis:

  • Single-replica data loading: Load one data replica and pull data from it to generate the other replicas. This avoids the overhead of ranking and creating indexes for multiple replicas.
  • Single-tablet data loading (load_to_single_tablet=true): Compared to writing data to multiple tablets, this reduces the I/O overhead and the number of small files generated during data loading.
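
A Stream Load request with the single-tablet option might look like the sketch below. The host, credentials, database, table, and file names are placeholders; single-replica loading is enabled separately through the cluster configuration rather than a per-request header.

```shell
# Illustrative Stream Load request; names are placeholders.
curl --location-trusted -u user:passwd \
     -H "Expect: 100-continue" \
     -H "format: json" \
     -H "load_to_single_tablet: true" \
     -T batch.json \
     http://fe_host:8030/api/log_db/app_log/_stream_load
```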

The above measures are effective in improving data loading performance:

  • 2X data consumption speed from Kafka
  • 75% lower data latency
  • 70% faster response of Stream Load

Before putting the upgraded data platform into their production environment, NetEase conducted extensive stress testing and grayscale testing. This is their experience in tackling errors along the way.

1. Stream Load Timeout

In the early stage of stress testing, data imports frequently reported timeout errors. Additionally, although the processes and cluster status were normal, the monitoring system could not collect the correct BE metrics. The engineers obtained the Doris BE stack using Pstack and analyzed it with PT-PMT. They discovered that the root cause was the lack of HTTP chunked encoding or Content-Length settings when initiating requests. This led Doris to mistakenly consider the data transfer incomplete and remain in a waiting state. The solution was simply to add a chunked encoding setting on the client side.
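
Assuming a curl-based client, the fix might look like this (endpoint, credentials, and file names are placeholders). Either header tells Doris where the request body ends, so it no longer waits for more data:

```shell
# Declare chunked transfer encoding (or an exact Content-Length);
# without one of them, Doris treats the transfer as incomplete.
curl --location-trusted -u user:passwd \
     -H "Expect: 100-continue" \
     -H "Transfer-Encoding: chunked" \
     -T data.json \
     http://fe_host:8030/api/log_db/app_log/_stream_load
```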

2. Data Size in a Single Stream Load Exceeding Threshold

The default limit is 100 MB. The solution was to increase streaming_load_json_max_mb to 250 MB.
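
The change is a one-line edit to the BE configuration; the file location is an assumption based on a typical deployment:

```
# be.conf -- raise the Stream Load JSON size cap from the default 100 MB
streaming_load_json_max_mb = 250
```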

3. Error: alive replica num 0 < quorum replica num 1

Running the SHOW BACKENDS command revealed that one BE node was in OFFLINE state. A lookup in the be_custom configuration file revealed a broken_storage_path. Further inspection of the BE logs located the error message "too many open files," meaning the number of file handles opened by the BE process had exceeded the system's limit, causing I/O operations to fail. When Doris detected this abnormality, it marked the disk as unavailable. Because the table was configured with a single replica, data writing failed as soon as the disk holding that only replica became unavailable.

The solution was to increase the maximum open file descriptor limit for the process to 1 million, delete the be_custom.conf file, and restart the BE node.
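
On a typical Linux deployment, the steps might look like the following sketch. The install path and the service user name ("doris") are assumptions:

```shell
# Check the current open-file limit for the shell/process:
ulimit -n

# Raise the limit persistently for the doris user (requires root):
cat >> /etc/security/limits.conf <<'EOF'
doris  soft  nofile  1000000
doris  hard  nofile  1000000
EOF

# Remove the stale override and restart the BE node:
rm /opt/doris/be/conf/be_custom.conf
/opt/doris/be/bin/stop_be.sh
/opt/doris/be/bin/start_be.sh --daemon
```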

4. FE Memory Jitter

During grayscale testing, the FE could not be connected. Monitoring data showed that the JVM's 32 GB of memory was exhausted, and the bdb directory under the FE's meta-directory had ballooned to 50 GB. Memory jitter occurred every hour, with peak memory usage reaching 80%.

The root cause was improper parameter configuration. During high-concurrency Stream Load operations, the FE records the related load information; each import adds about 200 KB to memory. The cleanup of this information is controlled by the streaming_label_keep_max_second parameter, which defaults to 12 hours. Reducing it to 5 minutes can prevent FE memory from being exhausted. However, the engineers initially left the label_clean_interval_second parameter, which controls the interval of the label cleanup thread, at its default of 1 hour, and this explains the hourly memory jitter.

The solution was to dial down label_clean_interval_second to 5 minutes.
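
Both parameters come from the text above; the file location is an assumption based on a typical deployment. Both values are in seconds:

```
# fe.conf
streaming_label_keep_max_second = 300   # drop load labels after 5 minutes
label_clean_interval_second     = 300   # run the cleanup thread every 5 minutes
```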

5. Unexpected Query Results

The engineers found results that did not match the filtering conditions in a query on the Eagle monitoring platform.

This was due to a misconception about match_all in Apache Doris. match_all matches data records that include all the specified tokens, where tokenization splits on spaces and punctuation marks. In the unqualified result, although the timestamp did not match, the message contained "29", which made up for the unmatched part of the timestamp. That is why this data record was included as a query result.

For Doris to return what the engineers wanted in this query, MATCH_PHRASE should be used instead, because it also takes the sequence of the tokens into account.
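
The difference can be illustrated with a pair of hypothetical queries; the table name, column name, and search terms are assumptions:

```sql
-- MATCH_ALL: every token must appear somewhere in the field, in any order,
-- so a stray "29" elsewhere in the message can satisfy the condition.
SELECT * FROM app_log
WHERE message MATCH_ALL '2024-01-29 error';

-- MATCH_PHRASE: the tokens must appear as a contiguous sequence,
-- so only messages containing the exact phrase qualify.
SELECT * FROM app_log
WHERE message MATCH_PHRASE '2024-01-29 error';
```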

If you want to enable support_phrase for existing tables that have already been populated with data, you can execute DROP INDEX and then ADD INDEX to replace the old index with a new one. This process is incremental and does not require rewriting the entire table.
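
Assuming an existing inverted index on a message column, the swap might look like the following sketch; the index, table, and column names are hypothetical:

```sql
-- Drop the old index and add one with phrase support enabled.
DROP INDEX idx_message ON app_log;

ALTER TABLE app_log ADD INDEX idx_message (message)
USING INVERTED PROPERTIES ("parser" = "unicode", "support_phrase" = "true");
```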

This is another advantage of Doris compared to Elasticsearch: It supports more flexible index management and allows easy addition and removal of indexes.

Apache Doris supports the log and time series data analytics workloads of NetEase with higher query performance and lower storage consumption. Beyond these, Apache Doris offers other capabilities, such as data lake analysis, since it is designed as an all-in-one big data analytics platform. If you want a quick evaluation of whether Doris is right for your use case, talk to the Doris makers on Slack.

Published at DZone with permission of Frank Z . See the original article here.


CS&E Announces 2024-25 Doctoral Dissertation Fellowship (DDF) Award Winners

Seven Ph.D. students working with CS&E professors have been named Doctoral Dissertation Fellows for the 2024-25 school year. The Doctoral Dissertation Fellowship is a highly competitive fellowship that gives the University’s most accomplished Ph.D. candidates an opportunity to devote full-time effort to an outstanding research project by providing time to finalize and write a dissertation during the fellowship year. The award includes a stipend of $25,000, tuition for up to 14 thesis credits each semester, and subsidized health insurance through the Graduate Assistant Health Plan.

CS&E congratulates the following students on this outstanding accomplishment:

  • Athanasios Bacharis (Advisor: Nikolaos Papanikolopoulos)
  • Karin de Langis (Advisor: Dongyeop Kang)
  • Arshia Zernab Hassan (Advisor: Chad Myers)
  • Xinyue Hu (Advisor: Zhi-Li Zhang)
  • Lucas Kramer (Advisor: Eric Van Wyk)
  • Yijun Lin (Advisor: Yao-Yi Chiang)
  • Mingzhou Yang (Advisor: Shashi Shekhar)

Athanasios Bacharis

Bacharis’ work centers around the robot-vision area, focusing on making autonomous robots act on visual information. His research includes active vision approaches, namely, view planning and next-best-view, to tackle the problem of 3D reconstruction via different optimization frameworks. The acquisition of 3D information is crucial for automating tasks, and active vision methods obtain it via optimal inference. Areas of impact include agriculture and healthcare, where 3D models can lead to reduced use of fertilizers via phenotype analysis of crops and effective management of cancer treatments. Bacharis has a strong publication record, with two peer-reviewed conference papers and one journal paper already published. He also has one conference paper under review and two journal papers in the submission process. His publications are featured in prestigious robotic and automation venues, further demonstrating his expertise and the relevance of his research in the field.

Karin de Langis

De Langis' thesis sits at the intersection of Natural Language Processing (NLP) and cognitive science. Her work uses eye-tracking and other cognitive signals to improve the performance and cognitive interpretability of NLP systems, and to create NLP systems that process language more like humans do. Her human-centric approach to NLP is motivated by the possibility of addressing the shortcomings of current statistics-based NLP systems, which often struggle with explainability and interpretability and can harbor biases. This work was most recently accepted and presented at the SIGNLL Conference on Computational Natural Language Learning (CoNLL), which has a special focus on theoretically, cognitively, and scientifically motivated approaches to computational linguistics.

Arshia Zernab Hassan

Hassan's thesis work develops computational methods for interpreting data from genome-wide CRISPR/Cas9 screens. CRISPR/Cas9 is a new approach to genome editing that enables precise, large-scale editing of genomes and construction of mutants in human cells. These screens yield powerful data for inferring functional relationships among genes essential for cancer growth. Moreover, chemical-genetic CRISPR screens, in which populations of mutant cells are grown in the presence of chemical compounds, help us understand the effects chemicals have on cancer cells and formulate precise drug solutions. Given the novelty of these experimental technologies, computational methods to process and interpret the resulting data and accurately quantify the various genetic interactions are still quite limited, and this gap is the focus of Hassan's dissertation. Her research extends to developing deep-learning-based methods that leverage CRISPR chemical-genetic and other genomic datasets to predict cancer sensitivity to candidate drugs. Her method for improving the information content of CRISPR screens was published in Molecular Systems Biology, a highly visible journal in the computational biology field.

Xinyue Hu

Hu's Ph.D. dissertation concentrates on how to effectively leverage the power of artificial intelligence and machine learning (AI/ML), especially deep learning, to tackle challenging and important problems in the design and development of reliable, effective, and secure (independent) physical infrastructure networks. More specifically, her research focuses on two critical infrastructures: power grids and communication networks, in particular emerging 5G networks, both of which not only play a critical role in our daily lives but are also vital to the nation's economic well-being and security. Due to the enormous complexity, diversity, and scale of these two infrastructures, traditional approaches based on (simplified) theoretical models and heuristics-based optimization are no longer sufficient to overcome many technical challenges in their design and operation; data-driven machine learning approaches have become increasingly essential. The key question is: how does one leverage the power of AI/ML without abandoning the rich theory and practical expertise accumulated over the years? Hu's research has pioneered a new paradigm, (domain) knowledge-guided machine learning (KGML), for tackling challenging and important problems in power grid and communication (e.g., 5G) network infrastructures.

Lucas Kramer

Kramer is now the driving force in designing tools and techniques for building extensible programming languages with the Minnesota Extensible Language Tools (MELT) group. These are languages that start with a host language such as C or Java, but can then be extended with new syntax (notations) and new semantics (e.g., error-checking analyses or optimizations) over that new syntax and the original host language syntax. One extension Kramer created embeds the domain-specific language Halide in MELT's extensible specification of C, called ableC. This extension allows programmers to specify how code working on multi-dimensional matrices is transformed and optimized to make efficient use of hardware. Another embeds the logic-programming language Prolog into ableC; yet another provides a form of nondeterministic parallelism useful in some algorithms that search for a solution in a structured, but very large, search space. The goal of his research is to make building language extensions such as these more practical for non-expert developers. To this end, he has made many significant contributions to the MELT group's Silver meta-language, making it easier for extension developers to correctly specify complex language features with minimal boilerplate. Kramer is the lead author of one journal paper and four conference papers on his work at the University of Minnesota, winning the distinguished paper award at the 2020 Software Language Engineering conference for "Strategic Tree Rewriting in Attribute Grammars".

Yijun Lin

Lin’s doctoral dissertation focuses on a timely, important topic of spatiotemporal prediction and forecasting using multimodal and multiscale data. Spatiotemporal prediction and forecasting are important scientific problems applicable to diverse phenomena, such as air quality, ambient noise, traffic conditions, and meteorology. Her work also couples the resulting prediction and forecasting with multimodal (e.g., satellite imagery, street-view photos, census records, and human mobility data) and multiscale geographic information (e.g., census records focusing on small tracts vs. neighborhood surveys) to characterize the natural and built environment, facilitating our understanding of the interactions between and within human social systems and the ecosystem. Her work has a wide-reaching impact across multiple domains such as smart cities, urban planning, policymaking, and public health.

Mingzhou Yang

Yang is developing a thesis in the broad area of spatial data mining for problems in transportation. His thesis has both societal and theoretical significance. Societally, climate change is a grand challenge due to the increasing severity and frequency of climate-related disasters such as wildfires, floods, and droughts. Thus, many nations are aiming for carbon neutrality (also called net zero) by mid-century to avert the worst impacts of global warming. Improving energy efficiency and reducing toxic emissions in transportation is important because transportation accounts for the vast majority of U.S. petroleum consumption and over a third of GHG emissions, and contributes to over a hundred thousand U.S. deaths annually via air pollution. To accurately quantify the expected environmental cost of vehicles during real-world driving, Yang's thesis explores ways to incorporate physics into the neural network architecture, complementing other methods of integration such as feature incorporation and regularization. This approach imposes stringent physical constraints on the neural network model, guaranteeing that its outputs are consistently in accordance with established physical laws for vehicles. Extensive experiments, including ablation studies, demonstrated the efficacy of incorporating physics into the model.
