Grad Coach

Research Topics & Ideas: Data Science

50 Topic Ideas To Kickstart Your Research Project

Research topics and ideas about data science and big data analytics

If you’re just starting out exploring data science-related topics for your dissertation, thesis or research project, you’ve come to the right place. In this post, we’ll help kickstart your research by providing a hearty list of data science and analytics-related research ideas, including examples from recent studies.

PS – This is just the start…

We know it’s exciting to run through a list of research topics, but please keep in mind that this list is just a starting point. The topic ideas provided here are intentionally broad and generic, so you will need to develop them further before they’re usable. Nevertheless, they should inspire some ideas for your project.

To develop a suitable research topic, you’ll need to identify a clear and convincing research gap, and a viable plan to fill that gap. If this sounds foreign to you, check out our free research topic webinar that explores how to find and refine a high-quality research topic, from scratch. Alternatively, consider our 1-on-1 coaching service.


Data Science-Related Research Topics

  • Developing machine learning models for real-time fraud detection in online transactions.
  • The use of big data analytics in predicting and managing urban traffic flow.
  • Investigating the effectiveness of data mining techniques in identifying early signs of mental health issues from social media usage.
  • The application of predictive analytics in personalizing cancer treatment plans.
  • Analyzing consumer behavior through big data to enhance retail marketing strategies.
  • The role of data science in optimizing renewable energy generation from wind farms.
  • Developing natural language processing algorithms for real-time news aggregation and summarization.
  • The application of big data in monitoring and predicting epidemic outbreaks.
  • Investigating the use of machine learning in automating credit scoring for microfinance.
  • The role of data analytics in improving patient care in telemedicine.
  • Developing AI-driven models for predictive maintenance in the manufacturing industry.
  • The use of big data analytics in enhancing cybersecurity threat intelligence.
  • Investigating the impact of sentiment analysis on brand reputation management.
  • The application of data science in optimizing logistics and supply chain operations.
  • Developing deep learning techniques for image recognition in medical diagnostics.
  • The role of big data in analyzing climate change impacts on agricultural productivity.
  • Investigating the use of data analytics in optimizing energy consumption in smart buildings.
  • The application of machine learning in detecting plagiarism in academic works.
  • Analyzing social media data for trends in political opinion and electoral predictions.
  • The role of big data in enhancing sports performance analytics.
  • Developing data-driven strategies for effective water resource management.
  • The use of big data in improving customer experience in the banking sector.
  • Investigating the application of data science in fraud detection in insurance claims.
  • The role of predictive analytics in financial market risk assessment.
  • Developing AI models for early detection of network vulnerabilities.


Data Science Research Ideas (Continued)

  • The application of big data in public transportation systems for route optimization.
  • Investigating the impact of big data analytics on e-commerce recommendation systems.
  • The use of data mining techniques in understanding consumer preferences in the entertainment industry.
  • Developing predictive models for real estate pricing and market trends.
  • The role of big data in tracking and managing environmental pollution.
  • Investigating the use of data analytics in improving airline operational efficiency.
  • The application of machine learning in optimizing pharmaceutical drug discovery.
  • Analyzing online customer reviews to inform product development in the tech industry.
  • The role of data science in crime prediction and prevention strategies.
  • Developing models for analyzing financial time series data for investment strategies.
  • The use of big data in assessing the impact of educational policies on student performance.
  • Investigating the effectiveness of data visualization techniques in business reporting.
  • The application of data analytics in human resource management and talent acquisition.
  • Developing algorithms for anomaly detection in network traffic data.
  • The role of machine learning in enhancing personalized online learning experiences.
  • Investigating the use of big data in urban planning and smart city development.
  • The application of predictive analytics in weather forecasting and disaster management.
  • Analyzing consumer data to drive innovations in the automotive industry.
  • The role of data science in optimizing content delivery networks for streaming services.
  • Developing machine learning models for automated text classification in legal documents.
  • The use of big data in tracking global supply chain disruptions.
  • Investigating the application of data analytics in personalized nutrition and fitness.
  • The role of big data in enhancing the accuracy of geological surveying for natural resource exploration.
  • Developing predictive models for customer churn in the telecommunications industry.
  • The application of data science in optimizing advertisement placement and reach.

Recent Data Science-Related Studies

While the ideas we’ve presented above are a decent starting point for finding a research topic, they are fairly generic. So, it helps to look at actual studies in the data science and analytics space to see how this all comes together in practice.

Below, we’ve included a selection of recent studies to help refine your thinking. These are actual studies, so they can provide some useful insight into what a research topic looks like in practice.

  • Data Science in Healthcare: COVID-19 and Beyond (Hulsen, 2022)
  • Auto-ML Web-application for Automated Machine Learning Algorithm Training and evaluation (Mukherjee & Rao, 2022)
  • Survey on Statistics and ML in Data Science and Effect in Businesses (Reddy et al., 2022)
  • Visualization in Data Science VDS @ KDD 2022 (Plant et al., 2022)
  • An Essay on How Data Science Can Strengthen Business (Santos, 2023)
  • A Deep study of Data science related problems, application and machine learning algorithms utilized in Data science (Ranjani et al., 2022)
  • You Teach WHAT in Your Data Science Course?!? (Posner & Kerby-Helm, 2022)
  • Statistical Analysis for the Traffic Police Activity: Nashville, Tennessee, USA (Tufail & Gul, 2022)
  • Data Management and Visual Information Processing in Financial Organization using Machine Learning (Balamurugan et al., 2022)
  • A Proposal of an Interactive Web Application Tool QuickViz: To Automate Exploratory Data Analysis (Pitroda, 2022)
  • Applications of Data Science in Respective Engineering Domains (Rasool & Chaudhary, 2022)
  • Jupyter Notebooks for Introducing Data Science to Novice Users (Fruchart et al., 2022)
  • Towards a Systematic Review of Data Science Programs: Themes, Courses, and Ethics (Nellore & Zimmer, 2022)
  • Application of data science and bioinformatics in healthcare technologies (Veeranki & Varshney, 2022)
  • TAPS Responsibility Matrix: A tool for responsible data science by design (Urovi et al., 2023)
  • Data Detectives: A Data Science Program for Middle Grade Learners (Thompson & Irgens, 2022)
  • Machine Learning for Non-Majors: A White Box Approach (Mike & Hazzan, 2022)
  • Components of Data Science and Its Applications (Paul et al., 2022)
  • Analysis on the Application of Data Science in Business Analytics (Wang, 2022)

As you can see, these research topics are a lot more focused than the generic topic ideas we presented earlier. So, to develop a high-quality research topic of your own, you’ll need to get laser-focused on a specific context with clearly defined variables of interest. In the video below, we explore some other important things you’ll need to consider when crafting your research topic.

Get 1-On-1 Help

If you’re still unsure about how to find a quality research topic, check out our Research Topic Kickstarter service, which is the perfect starting point for developing a unique, well-justified research topic.




Open Access | Peer-reviewed | Research Article

Research data management in academic institutions: A scoping review

Laure Perrier, Erik Blondal, A. Patricia Ayala, Dylanne Dearborn, Tim Kenny, David Lightfoot, Roger Reka, Mindy Thuna, Leanne Trimble, Heather MacDonald

Contributed equally to this work: Laure Perrier, Erik Blondal, Heather MacDonald. * E-mail: [email protected]

Affiliations: Gerstein Science Information Centre, University of Toronto, Toronto, Ontario, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada; Gibson D. Lewis Health Science Library, UNT Health Science Center, Fort Worth, Texas, United States of America; St. Michael’s Hospital Library, St. Michael’s Hospital, Toronto, Ontario, Canada; Faculty of Information, University of Toronto, Toronto, Ontario, Canada; Engineering & Computer Science Library, University of Toronto, Toronto, Ontario, Canada; Map and Data Library, University of Toronto, Toronto, Ontario, Canada; MacOdrum Library, Carleton University, Ottawa, Ontario, Canada

PLOS ONE | Published: May 23, 2017 | https://doi.org/10.1371/journal.pone.0178261

Abstract

The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions.

Materials and methods

We conducted a scoping review by searching forty literature databases encompassing a broad range of disciplines from inception to April 2016. We included all study types and extracted data on study design, discipline, data collection tools, and phase of the research data lifecycle.

Results

We included 301 articles plus 10 companion reports after screening 13,002 titles and abstracts and 654 full-text articles. Most articles (85%) were published from 2010 onwards and conducted within the sciences (86%). More than three-quarters of the articles (78%) reported methods that included interviews, cross-sectional, or case studies. Most articles (68%) included the Giving Access to Data phase of the UK Data Archive Research Data Lifecycle that examines activities such as sharing data. When studies were grouped into five dominant groupings (Stakeholder, Data, Library, Tool/Device, and Publication), data quality emerged as an integral element.

Conclusions

Most studies relied on self-reports (interviews, surveys) or accounts from an observer (case studies), and we found few studies that collected empirical evidence on activities amongst data producers, particularly those examining the impact of research data management interventions. As well, fewer studies examined research data management at the early phases of research projects. The quality of all research outputs needs attention, from the application of best practices in research data management studies, to data producers depositing data in repositories for long-term use.

Citation: Perrier L, Blondal E, Ayala AP, Dearborn D, Kenny T, Lightfoot D, et al. (2017) Research data management in academic institutions: A scoping review. PLoS ONE 12(5): e0178261. https://doi.org/10.1371/journal.pone.0178261

Editor: Sanjay B. Jadhao, International Nutrition Inc, UNITED STATES

Received: February 27, 2017; Accepted: April 26, 2017; Published: May 23, 2017

Copyright: © 2017 Perrier et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Dataset is available from the Zenodo Repository, DOI: 10.5281/zenodo.557043.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Increased connectivity has accelerated progress in global research, and estimates indicate scientific output is doubling approximately every ten years [1]. A rise in research activity results in an increase in research data output. However, data generated from research that is not prepared and stored for long-term access is at risk of being lost forever. Vines and colleagues report that the availability of data related to studies declines rapidly with the age of a study, and determined that the odds of a data set being reported as available decreased by 17% per year after publication [2]. At the same time, research funding agencies and scholarly journals are progressively moving towards directives that require data management plans and demand data sharing [3–6]. The current research ecosystem is complex and highlights the need for focused attention on the stewardship of research data [1,7].

Academic institutions are multifaceted organizations that exist within the research ecosystem. Researchers practicing within universities and higher education institutions must comply with funding agency requirements when they are the recipients of research grants. For some disciplines, such as genomics and astronomy, preserving and sharing data is the norm [8–9]; yet best practice stipulates that research be reproducible and transparent, which makes effective data management pertinent to all disciplines.

Interest in research data management in the global community is on the rise. Recent activity has included the Bill & Melinda Gates Foundation moving its open access/open data policy, considered to be exceptionally strong, into force at the beginning of 2017 [10]. Researchers working towards a solution to the Zika virus organized themselves to publish all epidemiological and clinical data as soon as it was gathered and analyzed [11]. Fecher and colleagues [12] conducted a systematic review focusing on data sharing to support the development of a conceptual framework; however, it lacked rigorous methods, such as the use of a comprehensive search strategy [13]. Another review on data sharing, conducted by Bull and colleagues [14], examined stakeholders’ perspectives on ethical best practices but focused specifically on low- and middle-income settings. In this scoping review, we aim to assess the research literature that examines research data management as it relates to academic institutions. It is a time of increasing activity in the area of research data management [15], and higher learning institutions need to be ready to address this change, as well as provide support for their faculty and researchers. Identifying the current state of the literature, so that there is a clear understanding of the evidence in the area, will provide guidance in planning strategies for services and support, as well as outlining essential areas for future research endeavors in research data management. The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions.

Materials and methods

We conducted a scoping review using guidance from Arksey and O’Malley [16] and the Joanna Briggs Manual for Scoping Reviews [17]. A scoping review protocol was prepared and revised based on input from the research team, which included methodologists and librarians specializing in data management; it is available upon request from the corresponding author. Although traditionally applied to systematic reviews, the PRISMA Statement was used for reporting [18].

Data sources and literature search

We searched 40 electronic literature databases from inception until April 3–4, 2016. Since research data management is relevant to all disciplines, we did not restrict our search to literature databases in the sciences. This was done to gain an understanding of the breadth of research available and to provide context for the science research literature on the topic of research data management. The search was peer-reviewed by an experienced librarian (HM) using the Peer Review of Electronic Search Strategies checklist and modified as necessary [19]. The full literature search for MEDLINE is available in the S1 File; additional database literature searches are available from the corresponding author. Searches were performed with no year or language restrictions. We also searched conference proceedings and gray literature. The gray literature discovery process involved identifying and searching the websites of relevant organizations (such as the Association of Research Libraries, the Joint Information Systems Committee, and the Digital Curation Centre). Finally, we scanned the references of included studies to identify other potentially relevant articles. The results were imported into Covidence (covidence.org) for the review team to screen the records.

Study selection

All study designs were considered, including qualitative and quantitative methods such as focus groups, interviews, cross-sectional studies, and randomized controlled trials. Eligible studies included academic institutions and reported on research data management involving areas such as infrastructure, services, and policy. We included studies from all disciplines within academic institutions, with no restrictions on geographical location. Studies that accepted participants from outside academic institutions were included if 50% or more of the total sample represented respondents from academic institutions. For studies that examined entities other than human subjects, the study was included if the outcomes were pertinent to the broader research community, including academia. For example, if a sample of journal articles was retrieved to examine their data sharing statements but each study was not explicitly linked to a research sector, it was accepted into our review, since the outcomes are significant to the entire research community and academia was not explicitly excluded. We excluded commentaries, editorials, and papers providing descriptions of processes that lacked a research component.

We define an academic institution as a higher education degree-granting organization dedicated to education and research. Research data management is defined as the storage, access, and preservation of data produced from a given investigation [20]. This includes issues such as creating data management plans, matters related to sharing data, delivery of services and tools, and infrastructure considerations, typically as they relate to researchers, planners, librarians, and administrators.

A two-stage process was used to assess articles. Two investigators independently reviewed the retrieved titles and abstracts to identify those that met the inclusion criteria. The study selection process was pilot tested on a sample of records from the literature search. In the second stage, full-text articles of all records identified as relevant were retrieved and independently assessed by two investigators to determine if they met the inclusion criteria. Discrepancies were resolved by a third reviewer.

Data abstraction and analysis

After a training exercise, two investigators independently read each article and extracted relevant data in duplicate. Extracted data included study design, study details (such as purpose and methodology), participant characteristics, discipline, and the data collection tools used to gather information for the study. In addition, articles were aligned with the research data lifecycle proposed by the United Kingdom Data Archive [21]. Although represented in a simple diagram, this framework incorporates a comprehensive set of activities (creating data, processing data, analyzing data, preserving data, giving access to data, re-using data) and actions associated with research data management, clarifying the longer lifespan that data has beyond the research project it was created within (see S2 File). Differences in abstraction were resolved by a third reviewer. Companion reports were identified by matching the authors, timeframe for the study, and intervention; those identified were used for supplementary material only. Risk of bias of individual studies was not assessed because our aim was to examine the extent, range, and nature of research activity, as is consistent with the proposed scoping review methodology [16–17].

We summarized the results descriptively with the use of validated guidelines for narrative synthesis [22–25]. Following guidance from Rodgers and colleagues [22], data extraction tables were examined to determine the presence of dominant groups or clusters of characteristics by which the subsequent analysis could be organized. Two team members independently evaluated the abstracted data from the included articles in order to identify key characteristics and themes. Disagreement was resolved through discussion. Due to the heterogeneity of the data, articles and themes were summarized as frequencies and proportions.
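
To make the tagging-and-tallying step concrete, the following is a minimal, purely illustrative Python sketch (not the authors’ code, which is not part of the article) of how articles tagged with one or more UK Data Archive lifecycle phases might be summarized as frequencies and proportions. The article identifiers and tags below are invented.

```python
from collections import Counter

# The six UK Data Archive Research Data Lifecycle phases named in the Methods.
PHASES = ["Creating", "Processing", "Analysing", "Preserving",
          "Giving Access", "Re-using"]

# Hypothetical abstraction output: each article may align with multiple phases.
article_tags = {
    "article_1": ["Giving Access", "Preserving"],
    "article_2": ["Giving Access"],
    "article_3": ["Creating", "Processing", "Preserving"],
}

# Count how many articles align with each phase (one count per phase per article).
counts = Counter(tag for tags in article_tags.values() for tag in tags)
n_articles = len(article_tags)

for phase in PHASES:
    c = counts.get(phase, 0)
    print(f"{phase}: {c} ({100 * c / n_articles:.1f}% of articles)")
```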

Results

Literature search

The literature search identified a total of 15,228 articles. After reviewing titles and abstracts, we retrieved 654 potentially relevant full-text articles. We identified 301 articles for inclusion in the study, along with 10 companion documents (Fig 1). The full list of citations for the included studies can be found in the S3 File. The five literature databases that identified the most included studies were MEDLINE (81 articles or 21.60%), Compendex (60 articles or 16.00%), INSPEC (55 articles or 14.67%), Library and Information Science Abstracts (52 articles or 13.87%), and BIOSIS Previews (47 articles or 12.53%). The full list of electronic databases is available in the S4 File, which also includes the number of included studies traced back to their original literature database.

[Fig 1: https://doi.org/10.1371/journal.pone.0178261.g001]

Characteristics of included articles

Most of the 301 articles were published from 2010 onwards (256 or 85.04%), with 15% published prior to that time (Table 1). Almost half (45.85%) identified North America (Canada, United States, or Mexico) as the region where studies were conducted; however, close to one fifth of articles (18.60%) did not report where the study was conducted. Most of the articles (78.51%) reported methods that included cross-sectional studies (129 or 35.54%), interviews (86 or 23.69%), or case studies (70 or 19.28%), with 42 articles (out of 301) describing two or more methods. Articles were almost evenly split between qualitative evidence (44.85%) and quantitative evidence (43.85%), with mixed methods representing a smaller proportion (11.29%).

We relied on the authors’ own reporting of study characteristics and made no interpretations about how attributes of the studies were reported. As a result, some information may overlap in the reporting of disciplines; for example, health science, medicine, and biomedicine are reported separately as disciplines/subject areas. Authors identified 35 distinct disciplines in the articles, with just under ten percent (8.64%) not reporting a discipline and the largest group (105 or 34.88%) being multidisciplinary. The two disciplines reported most often were medicine and information science/library science (31 or 10.30% each). Studies were reported in 116 journals, 43 conference papers, 26 gray literature documents (e.g., reports), two book chapters, and one PhD dissertation.

Almost one-third of the articles (99 or 32.89%) did not use a data collection tool (e.g., when a case study was reported), and a small number (22 or 7.31%) based their data collection tools on instruments previously reported in the literature. Most data collection tools were either developed by the authors (97 or 32.23%) or no description was provided about their development (83 or 27.57%). No validated data collection tools were reported. We identified articles that offered no information on sample size or participant characteristics [26–29], as well as articles that reported the number of participants who completed the study but failed to describe how many were recruited [30–31].

[Table 1: https://doi.org/10.1371/journal.pone.0178261.t001]

Research data lifecycle framework

Two hundred and seven (31.13%) articles aligned with the Giving Access to Data phase of the Research Data Lifecycle [21] (Table 2), which includes distributing data, sharing data, controlling access, establishing copyright, and promoting data. The Preserving Data phase contained the next largest set of articles, with 178 (26.77%). In contrast, Analysing Data and Processing Data were the two phases with the fewest articles, containing 28 (4.21%) and 49 (7.37%) respectively. The largest group of articles (87 or 28.90%) aligned with two phases of the Research Data Lifecycle, followed by an almost even split between 73 (24.25%) aligning with three phases and 70 (23.26%) with one phase. Twenty-nine (9.63%) did not align with any phase of the Research Data Lifecycle; these included articles that described education and training for librarians, or that identified skill sets needed to work in research data management.

[Table 2: https://doi.org/10.1371/journal.pone.0178261.t002]

Key characteristics of articles

Five dominant groupings were identified for the 301 articles (Table 3). Each of these dominant groups was further categorized into subgroupings of articles to provide more granularity, and the top three study types and top three disciplines/subject areas are reported for each dominant group. Half of the articles (151 or 50.17%) concentrated on stakeholders (Stakeholder Group), e.g., activities of researchers, publishers, participants/patients, and funding agencies; 57 (18.94%) were data-focused (Data Group), e.g., investigating the quality or integrity of data in repositories and the development or refinement of metadata; 42 (13.95%) centered on library-related activities (Library Group), e.g., identifying skills or training for librarians working in data management; 27 (8.97%) described specific tools/applications/repositories (Tool/Device Group), e.g., introducing an electronic notebook into a laboratory; and 24 (7.97%) focused on the activities of publishing (Publication Group), e.g., examining data policies. The Stakeholder Group contained the largest subgroup of articles, labelled ‘Researcher’ (119 or 39.53%).

[Table 3: https://doi.org/10.1371/journal.pone.0178261.t003]

Discussion

We identified 301 articles and 10 companion documents focusing on research data management in academic institutions, published between 1995 and 2016. Tracing articles back to their original literature database indicates that 86% of the studies accepted into our review were from the applied science or basic science literature, indicating high research activity in this area among the sciences. The number of published articles has risen dramatically since 2010, with 85% of articles published post-2009, signaling the increased importance of and interest in this area of research. However, the limited range of study designs, the deficiency in standardized or validated data collection tools, and the lack of transparency in reporting demonstrate the need for attention to rigor. As well, there are limited studies that examine the impact of research data management activities (e.g., the implementation of services, training, or tools).

Few of the study designs employed in the 301 articles collected empirical evidence on activities amongst data producers, such as examining changes in behavior (e.g., movement from data withholding to data sharing) or identifying changes in endeavors (e.g., strategies to increase data quality in repositories). Close to 80% of the articles rely on self-reports (e.g., participating in interviews, filling out surveys) or accounts from an observer (e.g., describing events in a case study). Case studies made up almost one-fifth of the articles examined. This group of articles ranged from question-and-answer journalistic style reports [32] to articles that offered structured descriptions of activities and processes [33]. Although study quality was not formally assessed, this range of offerings provided challenges with data abstraction, in particular with the journalistic style accounts. If papers provided clear reporting that included declaring a purpose and describing well-defined outcomes, these articles could supply valuable input to knowledge syntheses such as a realist review [34–35], despite being ranked lower in the hierarchy of evidence [36]. One exception was Hruby and colleagues [37], who included a retrospective analysis in their case report examining the impact of introducing a centralized research data repository for datasets within a urology department at Columbia University. This study offered readers a fuller understanding of the impact of a research data management intervention by providing evidence that detailed a change: results described a reduction in the time required to complete studies, and an increase in publication quantity and quality (i.e., an increase in the average journal impact factor of papers published). There is opportunity to conduct studies that provide empirical evidence for data producers and those interested in data reuse; for those wishing to conduct case studies, the development of reporting guidelines may be of benefit.

Using the Research Data Lifecycle framework provides the opportunity to understand where researchers are focusing their efforts in studying research data management. Most studies fell within the Giving Access to Data phase of the framework, which includes activities such as sharing data and controlling access to data, and the Preserving Data phase, which focuses on activities such as documenting and archiving data. This aligns with the global trend of funding agencies moving towards requirements for open access and open data [15], which includes activities such as creating metadata/documentation and sharing data in public repositories when possible. Fewer studies fell within phases that occur at the beginning of the Research Data Lifecycle, which includes activities such as writing data management plans and preparing data for preservation. Research in these early phases, which include planning and setting up processes for handling data as it is being created, may provide insight into how these activities impact later phases of the Research Data Lifecycle, in particular with regards to data quality.

Data quality was examined in several of the groups described in Table 3. Within the Data Group, ‘data quality and integrity’ comprised the biggest subgroup of articles. Two other subgroups in the Data Group, ‘classification systems’ and ‘repositories’, also provided articles that touched on issues related to data quality, including refining metadata and improving functionalities in repositories that enable scholarly use and reuse of materials. Willoughby and colleagues illustrated some of the challenges related to data quality when reporting on researchers in chemistry, biology, and physics [38]. They found that when filling out metadata for a repository, researchers used a ‘minimum required’ approach. The biggest inhibitor to adding useful metadata was the ‘blank canvas’ effect, where users may have been willing to add metadata but did not know how. The authors concluded that simply providing a mechanism to add metadata was not sufficient. Data quality, or the lack thereof, was also identified in the Publication Group, with the ‘data availability, accessibility, and reuse’ and ‘data policies’ subgroups listing articles that tracked the completeness of deposited data sets and offered assessments of the guidance provided by journals in their data sharing policies. Piwowar and Chapman analyzed whether data sharing frequency was associated with funder and publisher requirements [39]. They found that NIH (National Institutes of Health) funding had little impact on data sharing despite policies that required it. Data sharing was significantly associated with the impact factor of a journal (not a journal’s data sharing policy) and the experience of the first/last authors. Studies that investigate processes to improve the quality of data deposited in repositories, or strategies to increase compliance with journal or funder data sharing policies that support depositing high-quality and useable data, could provide tangible guidance to investigators interested in effective data reuse.
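
As an aside for readers who want to probe such relationships themselves, the sketch below shows the general shape of an association analysis of this kind. It is emphatically not Piwowar and Chapman’s method or data [39]: the numbers are fabricated toy values, and logistic regression is simply one conventional way to test whether a binary outcome (data shared or not) is associated with a covariate such as journal impact factor.

```python
import numpy as np
import statsmodels.api as sm

# Toy data only: simulate 200 papers with random impact factors and a
# sharing probability that happens to increase with impact factor.
rng = np.random.default_rng(0)
impact_factor = rng.uniform(1, 20, size=200)
shared = rng.binomial(1, 1 / (1 + np.exp(-(0.2 * impact_factor - 2))))

# Logistic regression of sharing (0/1) on impact factor.
X = sm.add_constant(impact_factor)  # adds an intercept column
model = sm.Logit(shared, X).fit(disp=0)
print(model.summary())  # the impact-factor coefficient captures the association
```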

We found a number of articles with important information not reported. This included the geographic region in which the study was conducted (56 or 18.60%) and the discipline or subject area being examined (26 or 8.64%). Data abstraction identified studies that provided no information on participant populations (such as sample size or characteristics of the participants), as well as studies that reported the number of participants who completed the study but failed to report the number recruited. Lack of transparency and poor documentation of research are highlighted in the recent Lancet series on ‘research waste’, which calls attention to avoiding the misuse of valuable resources and the inadequate emphasis on the reproducibility of research [40]. Those conducting research in data management must recognize the importance of research integrity being reflected in all research outputs, including both publications and data.

Conclusions

We identified a sizable body of literature that describes research data management related to academic institutions, with the majority of studies conducted in the applied or basic sciences. Our results should promote further research in several areas. One area includes shifting the focus of studies towards collecting empirical evidence that demonstrates the impact of interventions related to research data management. Another area that requires further attention is researching activities that demonstrate concrete improvements to the quality and usefulness of data in repositories for reuse, as well as examining the facilitators of and barriers to researchers participating in this activity. In particular, there is a gap in research examining activities in the early phases of research projects to determine the impact of interventions at this stage. Finally, researchers investigating research data management must follow best practices in research reporting and ensure the high quality of their own research outputs, including both publications and datasets.

Supporting information

S1 File. MEDLINE search strategy.

https://doi.org/10.1371/journal.pone.0178261.s001

S2 File. Research data lifecycle phases.

https://doi.org/10.1371/journal.pone.0178261.s002

S3 File. Included studies.

https://doi.org/10.1371/journal.pone.0178261.s003

S4 File. Literature databases searched.

https://doi.org/10.1371/journal.pone.0178261.s004

Acknowledgments

We thank Mikaela Gray for retrieving articles, tracking papers back to their original literature databases, and assisting with references. We also thank Lily Yuxi Ren for retrieving conference proceedings and searching the gray literature. We acknowledge Matt Gertler for screening abstracts.

Author Contributions

  • Conceptualization: LP.
  • Data curation: LP EB HM.
  • Formal analysis: LP EB.
  • Investigation: LP EB HM APA DD TK DL MT LT RR.
  • Methodology: LP.
  • Project administration: LP.
  • Supervision: LP.
  • Validation: LP HM.
  • Writing – original draft: LP.
  • Writing – review & editing: LP EB HM APA DD DL TK MT LT RR.
References
  • 3. Holdren JP. Increasing access to the results of federally funded scientific research. February 22, 2013. Office of Science and Technology Policy. Executive Office of the President. United States of America. Available at: https://obamawhitehouse.archives.gov/blog/2016/02/22/increasing-access-results-federally-funded-science . Accessed February 27, 2017.
  • 4. OECD (Organization for Economic Co-Operation and Development). Declaration on access to research data from public funding. 2004. Available at: http://acts.oecd.org/Instruments/ShowInstrumentView.aspx?InstrumentID=157 . Accessed February 27, 2017.
  • 5. Government of Canada. Research data. 2011. Available at: http://www.science.gc.ca/default.asp?lang=en&n=2BBD98C5-1 . Accessed February 27, 2017.
  • 6. DCC (Digital Curation Centre). Overview of funders’ data policies. Available at: http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies . Accessed February 27, 2017.
  • 8. Hayes J. The data-sharing policy of the World Meteorological Organization: The case for international sharing of scientific data. In: Mathae KB, Uhlir PF, editors. Committee on the Case of International Sharing of Scientific Data: A Focus on Developing Countries. National Academies Press; 2012. p. 29–31.
  • 9. Ivezic Z. Data sharing in astronomy. In: Mathae KB, Uhlir PF, editors. Committee on the Case of International Sharing of Scientific Data: A Focus on Developing Countries. National Academies Press; 2012. p. 41–45.
  • 10. van Noorden R. Gates Foundation announces world’s strongest policy on open access research. Nature Newsblog. Available at: http://blogs.nature.com/news/2014/11/gates-foundation-announces-worlds-strongest-policy-on-open-access-research.html . Accessed February 27, 2017.
  • 13. Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. Available from www.handbook.cochrane.org . Accessed February 27, 2017.
  • 15. Shearer K. Comprehensive Brief on Research Data Management Policies. April 2015. Available at: http://web.archive.org/web/20151001135755/http://science.gc.ca/default.asp?lang=En&n=1E116DB8-1 . Accessed February 27, 2017.
  • 17. The Joanna Briggs Institute. Joanna Briggs Institute Reviewers’ Manual: 2015 Edition. Methodology for JBI Scoping Reviews. Available at: http://joannabriggs.org/assets/docs/sumari/Reviewers-Manual_Methodology-for-JBI-Scoping-Reviews_2015_v2.pdf . Accessed February 27, 2017.
  • 19. PRESS–Peer Review of Electronic Search Strategies: 2015 Guideline Explanation and Elaboration (PRESS E&E). Ottawa: CADTH; 2016 Jan.
  • 20. Research Data Canada. Glossary–Research Data Management. Available at: http://www.rdc-drc.ca/glossary . Accessed February 27, 2017.
  • 21. UK Data Archive. Research Data Lifecycle. Available at: http://www.data-archive.ac.uk/create-manage/life-cycle . Accessed February 27, 2017.
  • 25. Hurwitz B, Greenhalgh T, Skultans V. Meta-narrative mapping: a new approach to the systematic review of complex evidence. In: Greenhalgh T, editor. Narrative Research in Health and Illness. Malden, MA: Blackwell Publishing Ltd; 2008. p. 349–381.
  • 29. Wynholds LA, Wallis JC, Borgman CL, Sands A. Data, data use, and scientific inquiry: two case studies of data practices. Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries. 2012;19–22.
  • 30. Averkamp S, Gu X. Report on the University libraries’ data management need survey. 2012. Available at: http://ir.uiowa.edu/lib_pubs/152 . Accessed February 27, 2017.
  • 32. Roos A. Case study: developing research data management training and support at Helsinki University Library. Association of European Research Libraries. LIBER Case Study. June 2014. Available at: http://libereurope.eu/wp-content/uploads/2014/06/LIBER-Case-Study-Helsinki.pdf . Accessed February 27, 2017.

  • Open access
  • Published: 17 June 2022

A focus groups study on data sharing and research data management

  • Devan Ray Donaldson (ORCID: orcid.org/0000-0003-2304-6303)
  • Joshua Wolfgang Koepke (ORCID: orcid.org/0000-0002-0870-2918)

Scientific Data, volume 9, Article number: 345 (2022)


Subjects: Research data, Research management

Data sharing can accelerate scientific discovery while increasing return on investment beyond the researcher or group that produced them. Data repositories enable data sharing and preservation over the long term, but little is known about scientists’ perceptions of them and their perspectives on data management and sharing practices. Using focus groups with scientists from five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience), we asked questions about data management to lead into a discussion of what features they think are necessary to include in data repository systems and services to help them implement the data sharing and preservation parts of their data management plans. Participants identified metadata quality control and training as problem areas in data management. Additionally, participants discussed several desired repository features, including: metadata control, data traceability, security, stable infrastructure, and data use restrictions. We present their desired repository features as a rubric for the research community to encourage repository utilization. Future directions for research are discussed.


Introduction

Sharing scientific research data has many benefits. Data sharing produces stronger initial publication data by allowing peer review and validation of datasets and methods prior to publication 1,2. Enabling such activities enhances the integrity of research data and promotes transparency 1,2, both of which are critical for increasing confidence in science 3,4. After publication, data sharing encourages further scientific inquiry and advancements by making data available for other scientists to explore and build upon 2,3,4,5. Open data allows further scientific inquiry without the costs associated with new data creation 4,6. Researchers in the developing world disproportionately experience the high costs of new data creation, as they may struggle to find funding for projects not associated with direct improvement in living conditions 6. Therefore, data sharing can enable lower-cost research opportunities within developing nations through reusing datasets, creating what Ukwoma and Dike 7 refer to as a “global network” of scientific data. The development of vaccines for the COVID-19 virus illustrates the impact of data sharing on society. Through open data sharing practices, including sharing the genome sequence for the virus, scientists were able to build on each other’s data to create vaccines in record time 8,9, saving millions of people’s lives.

Despite the benefits, research has shown that many scientists still do not share their research data 10,11. Most disciplines operate without established data sharing or data management guidelines, relying on individual or institutional solutions for data management and sharing 10. Exceptions include research associated with government funding, specific grant restrictions, or approval requirements from institutional review boards (IRBs).

Current scholarship on data management and data sharing within academic disciplines is fragmented. Few interdisciplinary studies exist, and of these, important topics, such as data librarianship and scientists’ perceptions of necessary repository features, are left out of analysis 12,13,14,15. Also, single-discipline studies on scientific data management and repository utilization contribute limited views and are dated 16,17,18.

Applying the conceptual framework of Knowledge Infrastructures (KI) provides a basis for understanding the creation, flow, and maintenance of knowledge 19. KI posits that seven interconnected entities account for the system: shared norms and values, artifacts, people, institutions, policies, routines and practices, and technology 19. Examination of some or all of these entities throws into relief inefficiencies and areas for growth within knowledge creation, sharing, and maintenance. In particular, prior research 11 points to repositories and human resources as key areas of investment to improve data management and increase sharing within the scientific community.

This study uses KI as a lens to focus on the individual scientist (i.e., people), their use of repositories (i.e., routines and practices/technology), their opinions of librarians and data sharing (i.e., norms and values), and their data management plans (i.e., policies/institutions). We explore the data management and sharing practices of scientists across five disciplines by answering the following research question: what features do scientists think are necessary to include in data repository systems and services to help them implement the data sharing and preservation parts of their data management plans (DMPs)? We found a consensus across disciplines on certain desired repository features and areas where scientists felt they needed help with data management. In contrast, we found that some discipline-specific issues require unique data management and repository usage. Also, there was little consensus among study participants on the perceived role of librarians in scientific data management.

This paper discusses the following. First, we provide an analysis of the results of our focus group research. Second, we discuss how our study advances research on understanding scientists’ perspectives on data sharing, data management, and repository needs and introduce a rubric for determining data repository appropriateness. Our rubric contributes to research on KI by providing an aid for scientists in selecting data repositories. Finally, we discuss our methods and research design.

Scientists’ data practices

Participants across all the focus groups indicated having a DMP for at least one of their recent or current projects. Regarding data storage, some participants across four focus groups (atmospheric and earth science, chemistry, computer science, and neuroscience) used institutional repositories (IRs) for their data at some point within the data lifecycle, with five participants explicitly indicating use of IRs in their DMPs. The other popular choice, discussed across four focus groups (atmospheric and earth science, computer science, ecology, and neuroscience), was proprietary cloud storage systems (e.g., Dropbox, GitHub, and Google Drive). These users were concerned about file size limitations, costs, long-term preservation, data mining by the service providers, and the number of storage solutions becoming burdensome.

Desired repository features

Data traceability

Participants across four focus groups (atmospheric and earth science, chemistry, ecology, and neuroscience) mentioned wanting different kinds of information about how their data were being used to be tracked after data deposit in repositories. They wanted to know how many researchers view, cite, and publish based on the data they deposit. Additionally, participants wanted repositories to track any changes to their data post-deposit; for example, they suggested the creation of a path for updates to items in repositories after initial submission. They also wanted repositories to allow explicit versioning of their materials to clearly inform users of changes to materials over time. Relatedly, participants wanted repositories to provide notification systems that let data depositors and users know when new versions or derivative works based on their data become available, as well as notifications for depositors about when their data have been viewed, cited, or included in a publication.
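
As a thought experiment, the traceability features described above could be captured in a repository record roughly like the following Python sketch. Everything here is hypothetical: the class and field names are invented for illustration and do not come from the study or from any real repository system.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetVersion:
    version: str      # e.g., "1.1.0" -- explicit versioning, as participants requested
    released: date
    change_note: str  # what changed relative to the previous version

@dataclass
class DatasetRecord:
    doi: str
    title: str
    versions: list[DatasetVersion] = field(default_factory=list)
    views: int = 0        # usage tracking: views of the deposited data
    citations: int = 0    # usage tracking: citations of the deposited data
    subscribers: list[str] = field(default_factory=list)  # who to notify of changes

    def publish_new_version(self, new_version: DatasetVersion) -> list[str]:
        """Record a new version and return the subscribers to notify."""
        self.versions.append(new_version)
        return self.subscribers
```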

Metadata

Participants across three focus groups (atmospheric and earth science, chemistry, and neuroscience) discussed wanting high-quality metadata within repositories. Some argued for automated metadata creation when uploading their data into repositories, to save time and provide at least some level of description of their data (e.g., P1, P4, Chemistry). Within their own projects and in utilizing repositories, participants wanted help with metadata quality control issues. Participants within atmospheric and earth science who frequently created or interacted with complex files wanted expanded types of metadata (e.g., greater spatial metadata for geographic information system (GIS) data). Atmospheric and earth scientists, chemists, and neuroscientists wanted greater searchability and machine readability of data and entities within datasets housed in repositories, specifically to find a variable by multiple search parameters.

Data use restrictions

Participants across all five focus groups agreed that repositories need to clearly explain what a researcher can and cannot do with a dataset. For example, participants thought repositories should clearly state on every dataset whether researchers can base new research on the data, publish based on the data, or use the data for business purposes. Participants stated that current data restrictions can be confusing to those not acquainted with legal principles; for example, one data professional (P2, Chemistry) explained that researchers often mislabeled their datasets with ill-suited licenses. Participants commonly reported using Open Access or Creative Commons licenses, but articulated the necessity of having the option for restrictive or proprietary licenses, although most had not used such licenses.
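
The “what can I do with this dataset” statement participants asked for could, in principle, be made machine-readable. The sketch below is a hypothetical illustration only; the keys are invented and do not correspond to any real licensing schema.

```python
# Hypothetical, human- and machine-readable use-terms record for one dataset.
USE_TERMS = {
    "license": "CC-BY-4.0",          # could also be a restrictive/proprietary license
    "may_base_new_research": True,   # can researchers build new studies on the data?
    "may_publish_results": True,     # can they publish based on the data?
    "may_use_commercially": False,   # can they use the data for business purposes?
}

def allowed(action: str, terms: dict = USE_TERMS) -> bool:
    """Answer a reuser's 'may I do X?' question from the terms record."""
    return bool(terms.get(f"may_{action}", False))

print(allowed("publish_results"))   # True
print(allowed("use_commercially"))  # False
```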

Some participants had used embargoes; others never had. Most viewed embargoes as “a necessary evil,” provided that they are limited to approximately a few years after repository submission or until the time of publication. Participants did not think it was fair to repository staff or potential data reusers to have any data embargoed in perpetuity.

Stable infrastructure

Participants across two focus groups (atmospheric and earth science, and chemistry) expressed concern about the long-term stability of their data in repositories. Some stated that their fear of a repository not being able to provide long-term preservation of their data led them to seek out and utilize alternative storage solutions. Others expected repositories to commit to the future of their data and have satisfactory funding structures to fulfill their stated missions. Participants described stable repository infrastructure in terms of updating data files (i.e., versioning) and formats over time and ensuring their usability.

Participants across four focus groups (atmospheric and earth science, chemistry, computer science, and neuroscience) discussed wanting their data to be secure. They feared lax security could compromise their data. Specific to embargoed data, they feared lax security could enable “scooping” of research before data depositors are able to make use of the data through publication. Those handling confidential, sensitive, or personally identifiable information expressed the most concern about potential security breaches: a breach would erode trust with their current and future study participants, make it harder for themselves and future researchers to recruit study participants in the long term, and constitute noncompliance with mandates from their IRBs.

Desired help with aspects of data management

Help with metadata standardization and quality control

Participants across four focus groups (atmospheric and earth science, chemistry, ecology, and neuroscience) discussed wanting help with metadata standardization, including quality control for metadata associated with their datasets, to help fulfill their DMPs while enhancing data searchability and discoverability.

Help with verification of deleted data when necessary

Our university-affiliated participants were particularly concerned about verifiable deletion of data when necessary to comply with their IRBs. Participants expressed concern about newer graduate students’ capacity and follow-through in deleting sensitive data collected by their predecessors (i.e., graduate students who graduated before study completion). In this scenario, failure to delete sensitive data is a serious breach of IRB policy, which can lead to the data being compromised and/or revocation of permission to conduct future research. Participants who worked with sensitive information frequently (e.g., passwords in computer science and medical information in neuroscience) cited this as a concern.

Need for data management training

Participants across four focus groups (atmospheric and earth science, chemistry, ecology, and neuroscience) mentioned the need for additional training in data management. Participants stated they were unaware of the number of discipline-specific repositories that were currently available until their peers or librarians shared this information. Consequently, several participants suggested training sessions to raise awareness about these repositories within their disciplines. Participants were also concerned about what they perceived as a limited amount of training available for graduate students and new researchers on data management tools. They described current efforts that they were aware of as either training new workers/students on simpler tools or conducting training “piecemeal” on advanced data management tools, both of which they perceived as limiting project productivity. No participants in the computer science focus group mentioned the need for additional technical or informational training in data management.

Knowledge of existing data management principles and practices

Results were mixed on participants’ knowledge about the FAIR Guiding Principles for Scientific Data Management and Stewardship [20]. Twelve participants across all five focus groups knew about the FAIR principles, while ten across four disciplines (chemistry, computer science, ecology, and neuroscience) did not. Those who were familiar with them mentioned challenges with applying the FAIR principles to large and multimodal datasets (e.g., P4, Neuroscience).

Role of librarians in data management

Participants across two focus groups (atmospheric and earth science and chemistry) did not think librarians should have a role in their data management for two reasons. First, they thought their data were too technical or specialized for librarians to meaningfully contribute to their management (e.g., P3, Chemistry). Second, they assumed librarians were too busy to help with data management. In their view, librarians were already stretched too thin with greater priorities related to addressing the “modern era of information overload” to be concerned with managing their data (e.g., P6, Chemistry).

In contrast, other participants across all five focus groups thought librarians could play a role in scientific data management and sharing by providing assistance with publication, literature searches, patents, copyright searches, management of data mandates, embargo enforcement, information literacy, and metadata standardization. The two largest areas of agreement for participants who indicated a role for librarians were the more traditional area of assistance with information research and search help (e.g., P5, Atmospheric and Earth Science) as well as data management (e.g., P4, Neuroscience).

This study contributes to the research literature on scientists’ perspectives on data management, repositories, and librarians. Additionally, our study presents a rubric based on the perceived importance of repository features by our participants as a decision-support tool to enable the selection of data repositories based on scientists’ data management and sharing needs and preferences.

Our results suggest several aspects of knowledge infrastructure (KI) focused on research data management and sharing that could be improved. For example, in terms of data management wants and needs, participants in the atmospheric and earth science, ecology, and neuroscience disciplines who stated a need for help with metadata also wanted greater searchability of metadata from repositories. However, poor metadata searchability might be the net result of poor metadata quality control by data producers during data creation and/or deposit; repositories can provide guidelines for this, but in many cases cannot compel data producers to follow them (or follow them well). This may be an example of KI entities (routines and practices impacting the technology allowances of the corresponding repository) affecting each other. These metadata regulation issues are consistent with prior research [12,18,21-25].

The data integrity challenges that researchers needed assistance with were developing data standards and building the technical skills of employees. Both issues connect to data integrity and trust development, which are vital for data sharing and reuse [1,2]. The need for training on data management topics is consistent with prior research [25-27]. Within metadata standards development, this study’s results point to a discipline-specific call by atmospheric and earth science participants for repository integration of GIS data for discoverability; sixty percent of participants within this subject area expressed a desire for such metadata additions. This study recognizes the need for standardized, more descriptive, and quality-controlled metadata for repositories, highlighting the metadata problem faced by open access initiatives. Open access repositories have significant metadata issues, especially between repositories, which limit their searchability [28,29]. Future research on creating standards and corresponding guides to encourage better metadata creation by dataset originators may advance description and open access efforts.

Our findings are consistent with predictions from prior research about the policies, routines, and practices of data storage during and after scientists’ studies. For example, our finding of utilization of cloud storage solutions was predicted in prior research to increase over time [15]. Additionally, our study confirms overall low IR usage within [14] and between disciplines [12-14], with a slight increase in IR utilization over time [13]. Future research on DMP use and repository integration within DMPs may contribute to the KI entity of policy, possibly influencing more well-formed data, more shared data, and increased data integrity.

Applying the people aspect of KI to our dataset, our participants did not reach consensus on the role of librarians in data management and sharing. To them, librarians’ roles appear largely ad hoc and dependent on individual institutions and librarians. Articulations of their roles varied broadly, including: teaching data management skills, implementing data standards, helping with legal aspects (e.g., rights management), and resolving technical preservation issues that concern scientists [30]. Interestingly, participants framed librarian roles in data management as a dichotomy between help with technical issues (e.g., programming skills) and traditional librarianship skills (e.g., literature search and journal access). Whether accurate or not, scientists’ assumptions about librarians’ roles likely affect their utilization of librarians and libraries for data management and sharing. Exploring scientists’ perceptions of librarians’ roles may provide the insight necessary to foreground collaboration between scientists and librarians on, for example, improving dataset integrity [31] and increasing dataset usage [32], helping to justify the costs associated with making data open.

Finally, examining the routines, practices, norms, and technologies used by researchers regarding repositories has surfaced a tension: many of the same scientists who appreciate open data as a concept do not themselves provide open data. This reluctance may stem from the perceived lack of the repository features discussed above, in the section Desired Repository Features, in whatever repositories scientists may have entertained using in the past. Consequently, in an effort to encourage greater and more effective repository utilization by scientists across disciplines, we present a repository evaluation rubric based on the desired repository features for which we found empirical support in our study (see Supplementary Table 1).

The intended users of this rubric are scientists, librarians, and repository managers. Scientists can use our rubric to compare the relative merits of repositories based on their needs and consider which features they deem important. Librarians providing data consultations may utilize the rubric when helping their patrons. Repository managers can use it to evaluate their repositories and services as a decision-support tool for identifying areas to improve, including which features to add. The purpose of our rubric is to aid in repository selection and critical analysis of available repositories while encouraging repositories to provide the features that we have found scientists want. As a corollary, we hope to encourage an increase in data deposit by scientists, thereby increasing research opportunities that advance the studied academic disciplines [2-5]. This is particularly important for scientists in developing countries, who may need to rely more on existing datasets for cost effectiveness [6]. Moreover, our rubric’s encouragement of data deposit may increase research integrity by making data available for experts to check [1,2].

Future studies can compare the desired repository features for which we gathered empirical support with features identified through comparable research involving scientists from similar and different disciplines, and from countries at differing levels of development than those studied here, to assess the generalizability and appropriateness of our instrument.

We produced a convenience sample for our study by browsing the participant lists of major conferences in each discipline: AGU Annual Meetings for atmospheric and earth science, American Chemical Society meetings for chemistry, SOUPS’19 and SOUPS’20 for computer science, Society for Freshwater Science Annual Meetings for ecology, and Neuroscience’19 and Neuroscience’20 for neuroscience. From these participant lists we randomly selected individuals to receive recruitment emails inviting their participation in our study. Snowball sampling yielded a few additional participants, and a few others were recruited from informal knowledge networks in online communities (e.g., subject-specific Discord groups). All professionals were vetted for credentials before inclusion (e.g., master’s and/or doctoral degrees in their disciplines).

Our study participants came from a variety of educational and workplace backgrounds. Table 1 lists our study participants by subject discipline, occupation, and affiliation. Study participant selection criteria required participants to self-identify as a scientist from one of the five subject domains under investigation and to possess, or be in the process of obtaining (e.g., through active enrollment in a graduate program), a graduate-level degree in a particular scientific discipline. For each scientific discipline we sought individuals from separate institutional affiliations, subdiscipline research backgrounds/interests, and career stages, both to encourage a diversity of opinions and to avoid artificially skewing our data (e.g., multiple chemistry participants who actively work together on projects or work at the same institution would likely answer our questions similarly). All participants currently work and reside in developed western countries. Most are from the United States, with two individuals (one from chemistry and another from neuroscience) living in developed countries in western Europe. Participants ranged from researchers primarily focused on experimentation, to professors combining research with teaching responsibilities, to professionals providing services or guidance to researchers (e.g., a data manager for a lab). The majority were mid-career, with a few early- and late-career. Most were affiliated with universities; however, some were government-affiliated or worked at private, for-profit enterprises.

We conducted focus groups via Zoom video-conferencing software between April and August of 2021. After introductions and collecting basic demographic information (e.g., education, work experience, research interests), we asked participants questions about their data, past and present research projects, their data management, DMPs, what aspects of data management they would like help with, whether they think libraries can help, and data sharing. We used these questions to lead into a discussion about the FAIR principles and what features participants thought were necessary in data repository systems and services to help them implement the data sharing and preservation parts of their DMPs. Specifically, we asked participants about their expectations for: file size acceptance, licensing, embargo periods, data discoverability, and reuse. Each focus group lasted approximately an hour and a half. We gave participants $50 electronic Amazon gift cards as incentives for their participation. The Indiana University Human Subjects Office approved this study (IRB Study #1907150522). Informed consent was obtained from all participants.

We transcribed the focus group recordings and analyzed the transcripts in MAXQDA, a qualitative data analysis software package. Afterwards, we applied thematic analysis to our dataset following the steps outlined in the literature on the analysis of focus group data [33]: (1) familiarization, (2) generating codes, (3) constructing themes, (4) revising themes, (5) defining themes, and (6) producing the report. The dataset is publicly available in figshare [34].

Limitations

Using focus groups as a data collection methodology had many benefits, including the distinct advantage of unscripted interactions between participants, the ability to ask follow-up questions in the moment, and the use of open-ended questions to elicit an in-depth understanding of the complex and individualized topic of scientists’ data management practices [35-37]. It also had some disadvantages. Our overall sample size was small, which may limit the generalizability and repeatability of our results [36]. However, the sizes of our focus groups are consistent with those of focus groups conducted in prior research on similar topics [13,38]. Despite the limitations of our method, we argue that the benefits of the knowledge gained from our inquiry outweigh the potential drawbacks.

Data availability

The dataset generated and analyzed during the current study is available in figshare, https://doi.org/10.6084/m9.figshare.19493060.v1 [34].

Code availability

No custom code was used to generate or process the data described in this manuscript.

References

1. Curty, R. G., Crowston, K., Specht, A., Grant, B. W. & Dalton, E. D. Attitudes and norms affecting scientists’ data reuse. PLOS ONE 12, e0189288 (2017).
2. Vuong, Q. H. Author’s corner: open data, open review and open dialogue in making social sciences plausible. Scientific Data Updates http://blogs.nature.com/scientificdata/2017/12/12/author’s-corner-open-data-open-review-and-open-dialogue-in-making-social-sciences-plausible/ (2017).
3. Duke, C. S. & Porter, J. H. The ethics of data sharing and reuse in biology. BioScience 63, 483–489 (2013).
4. Perrino, T. et al. Advancing science through collaborative data sharing and synthesis. Perspect Psychol Sci 8, 433–444 (2013).
5. Pisani, E. et al. Beyond open data: realising the health benefits of sharing data. BMJ 355, i5295 (2016).
6. Vuong, Q. H. The (ir)rational consideration of the cost of science in transition economies. Nat Hum Behav 2, 5 (2018).
7. Ukwoma, S. C. & Dike, V. W. Academics’ attitudes toward the utilization of institutional repositories in Nigerian universities. portal 17, 17–32 (2017).
8. Bagdasarian, N., Cross, G. B. & Fisher, D. Rapid publications risk the integrity of science in the era of COVID-19. BMC Med 18, 192 (2020).
9. Vuong, Q. H. et al. Covid-19 vaccines production and societal immunization under the serendipity-mindsponge-3D knowledge management theory and conceptual framework. Humanit Soc Sci Commun 9, 22 (2022).
10. Bezuidenhout, L. To share or not to share: incentivizing data sharing in life science communities. Developing World Bioeth 19, 18–24 (2019).
11. Borgman, C. L. Big Data, Little Data, No Data: Scholarship in the Networked World (The MIT Press, 2016).
12. Akers, K. G. & Doty, J. Disciplinary differences in faculty research data management practices and perspectives. IJDC 8, 5–26 (2013).
13. Cragin, M. H., Palmer, C. L., Carlson, J. R. & Witt, M. Data sharing, small science and institutional repositories. Phil. Trans. R. Soc. A 368, 4023–4038 (2010).
14. Pryor, G. Attitudes and aspirations in a diverse world: the Project StORe perspective on scientific repositories. IJDC 2, 135–144 (2008).
15. Weller, T. & Monroe-Gulick, A. Understanding methodological and disciplinary differences in the data practices of academic researchers. Library Hi Tech 32, 467–482 (2014).
16. Borgman, C. L., Wallis, J. C. & Enyedy, N. Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. Int J Digit Libr 7, 17–30 (2007).
17. Cragin, M. H. & Shankar, K. Scientific data collections and distributed collective practice. Comput Supported Coop Work 15, 185–204 (2006).
18. Polydoratou, P. Use and linkage of source and output repositories and the expectations of the chemistry research community about their use. In Digital Libraries: Achievements, Challenges and Opportunities (eds. Sugimoto, S., Hunter, J., Rauber, A. & Morishima, A.) 4312, 429–438 (Springer Berlin Heidelberg, 2006).
19. Edwards, P. N. A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming (The MIT Press, 2013).
20. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
21. Gil, Y. et al. Toward the geoscience paper of the future: best practices for documenting and sharing research from data to software to provenance. Earth and Space Science 3, 388–415 (2016).
22. Tenopir, C. et al. Data sharing by scientists: practices and perceptions. PLOS ONE 6, e21101 (2011).
23. Tenopir, C. et al. Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE 10, e0134826 (2015).
24. Waide, R. B., Brunt, J. W. & Servilla, M. S. Demystifying the landscape of ecological data repositories in the United States. BioScience 67, 1044–1051 (2017).
25. Whitmire, A. L., Boock, M. & Sutton, S. C. Variability in academic research data management practices: implications for data services development from a faculty survey. Program: Electronic Library and Information Systems 49, 382–407 (2015).
26. Hampton, S. E. et al. Big data and the future of ecology. Frontiers in Ecology and the Environment 11, 156–162 (2013).
27. Tenopir, C., Christian, L., Allard, S. & Borycz, J. Research data sharing: practices and attitudes of geophysicists. Earth and Space Science 5, 891–902 (2018).
28. De Biagi, L. D., Saccone, M., Trufelli, L. & Puccinelli, R. Research product repositories: strategies for data and metadata quality control. Grey Journal (TGJ) 8, 83–94 (2012).
29. Schriml, L. M. et al. COVID-19 pandemic reveals the peril of ignoring metadata standards. Sci Data 7, 188 (2020).
30. Gold, A. Cyberinfrastructure, data, and libraries, part 2: libraries and the data challenge: roles and actions for libraries. D-Lib Magazine 13 (2007).
31. MacMillan, D. Data sharing and discovery: what librarians need to know. The Journal of Academic Librarianship 40, 541–549 (2014).
32. Borgman, C. L. The conundrum of sharing research data. JASIST 63, 1059–1078 (2012).
33. Braun, V., Clarke, V., Hayfield, N. & Terry, G. Thematic analysis. In Handbook of Research Methods in Health Social Sciences, https://doi.org/10.1007/978-981-10-5251-4_103 (Springer Singapore, 2019).
34. Donaldson, D. R. Focus groups on data sharing and research data management with scientists from five disciplines. figshare https://doi.org/10.6084/m9.figshare.19493060.v1 (2022).
35. Krueger, R. A. & Casey, M. A. Focus Groups: A Practical Guide for Applied Research (SAGE Publications, 2015).
36. Then, K. L., Rankin, J. A. & Ali, E. Focus group research: what is it and how can it be used? Can J Cardiovasc Nurs 24, 16–22 (2014).
37. Wallace, R., Goodyear-Grant, E. & Bittner, A. Harnessing technologies in focus group research. Can J Pol Sci 54, 335–355 (2021).
38. Kim, Y. & Stanton, J. Institutional and individual influences on scientists’ data sharing practices. JOCSE 3, 47–56 (2012).


Acknowledgements

This material is based on work supported by the Institute of Museum and Library Services under grant number RE-37-19-0082-19.

Author information

Authors and Affiliations

Department of Information and Library Science, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, US

Devan Ray Donaldson & Joshua Wolfgang Koepke


Contributions

The authors confirm contribution to the paper as follows: study conception and design: D.R.D.; data collection: D.R.D. and J.K.; analysis and interpretation of results: D.R.D. and J.K.; draft manuscript preparation: D.R.D. and J.K. Both authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Devan Ray Donaldson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Donaldson, D.R., Koepke, J.W. A focus groups study on data sharing and research data management. Sci Data 9, 345 (2022). https://doi.org/10.1038/s41597-022-01428-w


Received: 16 December 2021

Accepted: 23 May 2022

Published: 17 June 2022

DOI: https://doi.org/10.1038/s41597-022-01428-w



Data Management

Google is deeply engaged in Data Management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface it creatively through Google products such as Search (e.g., structured snippets), Docs, and many others. The overarching goal is to create a wealth of structured data on the Web that maximally helps Google users consume, interact with, and explore information. Through these projects, we study various cutting-edge data management research issues, including information extraction and integration, large-scale data analysis, and effective data exploration, using a variety of techniques such as information retrieval, data mining, and machine learning.

A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this type of data in order to make it easier to access high-quality datasets. This type of data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges in its management.

Furthermore, Data Management research across Google allows us to build technologies that power Google's largest businesses through scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Some examples of such technologies include F1, the database serving our ads infrastructure; Mesa, a petabyte-scale analytic data warehousing system; and Dremel, for petabyte-scale data processing with interactive response times. Dremel is available for external customers to use as part of Google Cloud's BigQuery.



Digital Data Management, Curation, and Archiving


Types of Data

There is a wide array of data types. Data can be described as experimental or observational; it can be tabular or take the form of images, video, or sound; and it can be derived from other data or be the result of simulation. It's also important to think broadly about your definition of data: some projects may not create data in a traditional sense, but rather result in, for instance, design schematics, 3D models, or a series of images.

When describing your data, be as descriptive as possible to help others interpret it. You might also briefly describe the source or origins of your data, the amount of data in bytes, or even the number of files.

Directory and File Naming Conventions

How you name your directories, files, and variables can have a huge impact on making your data usable and understandable to others, and the structure of your directory system matters too. When planning your file structure and naming conventions, there are a number of things to consider; imagine someone reading your directory and file names as a sentence. In the examples below, file paths use a slash (/) to separate directory and file names, and a short scripted sketch follows the list.

  • Use human readable names: a23/seq is not good; Guppy/Sequences is better.
  • Use the project name for the top directory of each project, then organize subdirectories consistently, for example:
  • By experimental run
  • By type of data (i.e., images, analysis, sequences)
  • Use consistent, descriptive names for variables (e.g., variableOne) and for dates (e.g., September_2012).
  • Plan your directory structure so that it reflects the structure of your data.
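To make these conventions concrete, here is a minimal Python sketch (standard library only) that lays out such a project structure; every project, directory, and file name below is purely illustrative:

```python
from pathlib import Path

# Hypothetical project layout organized by type of data; the names are
# examples, not a standard. The top directory is named after the project.
project = Path("GuppyBehavior")
for sub in ["Images", "Sequences", "Analysis", "Documentation"]:
    (project / sub).mkdir(parents=True, exist_ok=True)

# A descriptive, human-readable path then "reads like a sentence":
#   GuppyBehavior/Sequences/September_2012_run01.fasta
(project / "Sequences" / "September_2012_run01.fasta").touch()
```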

Prior to embarking on a research project, our team of Research Data Librarians will be happy to work with you to identify the best data format solutions for your work. We can help you find the current standards for data in your field and create data management plans that both conform to these standards and ensure long-term access to your data.

If you already have data, we can work with you to determine how you want your data to be used and who you wish to have access to it, and to prepare your data for long-term storage and access.

Some general information is provided below, but please contact us for current information about established best practices regarding file types and recommended preservation formats. For the especially curious, the Library of Congress provides an excellent guide to the Sustainability of Digital Formats, to which your librarian may refer in the course of a consultation.

Text and Textual Data

While it's fine to use MS Word, MS Excel, or other proprietary software for the day-to-day creation and handling of your data, sustainable long-term sharing and archiving are often better supported through transformation to established, open formats, including plain text, comma-separated values, and PDF. As human-readable, structured formats, plain text (.txt) and comma-separated value (.csv) files are almost always readable by other software and are the easiest to archive and curate. Almost all word processing and spreadsheet software can import text files. If you have any questions about your particular case, please contact us; we'll be happy to help.
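As an illustration, converting a spreadsheet to CSV can be a one-off manual export, or scripted. The sketch below assumes the third-party pandas and openpyxl packages are installed, and the file name is hypothetical:

```python
import pandas as pd

# Read the proprietary spreadsheet and write an open, archival CSV copy
# alongside it. Keep the original file as well; only the format is duplicated.
df = pd.read_excel("survey.xlsx")
df.to_csv("survey.csv", index=False)
```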

Image, Video and Audio Data

Curation of image, video, and audio data poses unique challenges. While open formats exist, standard practices in many fields favor proprietary standards, or use image formats with embedded data that would be lost if converted to other formats. One recommended practice is to save data both in the source application format and in an open format if possible; it is particularly important to use lossless rather than lossy compression. It is also very important to document how the images or audio files were created, although the level of detail will vary greatly between capture devices. For instance, documentation for images of paintings might include lighting conditions, distance from the plane, exposure time, et cetera; for other devices these parameters might not make sense at all. There are often existing discipline-specific standards. If you are unsure of the standards in your field, please ask us; we'll research it for you.
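For example, converting an image to a lossless open format might look like the following sketch, which assumes the Pillow package is installed and uses a hypothetical file name; note that format-specific embedded metadata may not survive such a conversion, so record it separately:

```python
from PIL import Image

# PNG uses lossless compression, so pixel values survive the conversion
# intact; avoid lossy formats such as JPEG for archival copies.
with Image.open("scan.tif") as im:
    im.save("scan.png")
```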

Other Data Formats

It is impossible to predict the variety of data formats that are or will be used. If you are unsure how best to manage or archive your data, please don't hesitate to contact us. We will research it for you and develop a management and curation plan custom-made for your data.

Private and Sensitive Data

Sometimes data will contain private information about people, information deemed secret by the government, or references to ecologically, culturally, or otherwise sensitive places. You probably already know if your data does. If you have an IRB approval, include that in your data management plan.

If you do have any privacy or sensitivity issues, describe how you secure your data, for instance by using password-protected and encrypted hard drives. Also describe how you will anonymize, obscure, or remove this information when you make your data public.

Documentation

Computer programmers know the importance of well-documented code. Experienced programmers know that if they don't document as they write, their documentation will suffer. Similarly, document your data as it is created, and write the documentation so that someone in your field who is unfamiliar with your data would be able to understand it. Programmers who have experience with large projects also know how important it is to keep all the components of a program organized in a clear, well-defined file structure.

Tabular data can often be documented internally in a comments field. It is also good practice to place a text (.txt) or README file in the same directory as your data that serves as a "data dictionary" for each of the fields in your tabular data. The data dictionary should describe the data in each field, and also describe any transformations or external dependencies associated with the data.
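A data dictionary can be as simple as a plain-text file written by hand or by a short script. In the following sketch, the file name, field names, and descriptions are all hypothetical:

```python
from pathlib import Path

# Write a plain-text data dictionary next to the data it describes.
data_dictionary = """\
Data dictionary for guppy_counts.csv

site_id : short code for the sampling site (see sites.txt)
date    : observation date, ISO 8601 (YYYY-MM-DD)
count   : number of fish observed; -9 means missing
temp_c  : water temperature in degrees Celsius, one decimal place
"""
Path("README.txt").write_text(data_dictionary)
```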

If your data is complex, with many different files and file types, organize them in a well-structured, hierarchical directory system. Each directory should have a README.txt file that describes the files and data located inside. If your field has established metadata schemas for data, use them.

If you are unsure about the existence of established metadata standards in your field, just contact us and we'll find out for you. We can also assist you in creating your metadata document(s).

Data Acquisition, Integrity, and Quality

Acquiring your data and maintaining its quality and integrity is probably one of your main concerns, and how you do that will depend on the hardware and technology at your disposal. Most of us have learned the following lessons the hard way; for those lucky few who haven't, here is what can happen, and what to consider when deciding how to store and maintain your data.

Potential Problems

Your data can be corrupted in any number of ways. Probably the most common cause is user error; for instance, sorting an Excel file incorrectly and then saving it can result in useless data. Hardware failures can result in unreadable disks. If your data is accessible over a network, it can be maliciously corrupted. Sometimes the corruption is as simple as wishing you hadn't made some changes to a document, but realizing it too late, having saved over the version you preferred.

Many errors are introduced during the acquisition of data as well. Hand-entered data can be highly error-prone, as well as subject to "creative input" when there is an interest in the outcome. Wherever possible, use automated data entry methods. If you must hand-enter data, consider having a second or third person inspect it. Creating web-based data entry forms can also allow you to validate the data as it is entered. If you need help creating these forms, feel free to contact us.
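Validation does not have to be elaborate. The following sketch checks hand-entered rows in a CSV file against simple rules; the file name, field names, and plausible ranges are hypothetical and should come from your own protocol:

```python
import csv

def validate_row(row, line_no):
    """Return a list of human-readable problems found in one row."""
    errors = []
    if not row["site_id"].strip():
        errors.append(f"line {line_no}: site_id is empty")
    try:
        temp = float(row["temp_c"])
        if not -5.0 <= temp <= 45.0:
            errors.append(f"line {line_no}: temp_c {temp} is implausible")
    except ValueError:
        errors.append(f"line {line_no}: temp_c is not a number")
    return errors

with open("guppy_counts.csv", newline="") as f:
    problems = []
    for i, row in enumerate(csv.DictReader(f), start=2):  # header is line 1
        problems += validate_row(row, i)

print("\n".join(problems) or "No problems found")
```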

Incompatibility

If you are collaborating on a project, accidental corruption of data and other documents is very easy. When passing documents over email or storing multiple versions on different machines, it is all too easy to edit an obsolete document or to end up with documents that are difficult, if not impossible, to merge.

Sometimes when working with others, you introduce incompatibility through your choice of file types: for instance, using the most recent version of Excel when others only have an older version, making your files unreadable to them. Using rare and obscure file types can likewise limit accessibility. Try to use only widely supported, and ideally open, formats when sharing data.

Data loss, like corruption, can happen in any number of ways and, like corruption, is probably most often due to user error. Files are accidentally deleted, or data from a database is accidentally removed. Data can also be lost due to faulty hardware or malice.

Many studies gather or use data that contains private information about people, references to ecologically or culturally sensitive places, or material with economic value such as patents. Just as you wouldn't want someone accessing your personal computer to obtain information such as credit card or social security numbers, you may not want your data to be exposed to others.

To avoid the problems described above, you should consider the strategies below: redundancy, backup, access control, and version control. In fact, these strategies are so basic that you should use them for all your digital assets, not just those associated with a particular project.

Redundancy is the first layer of protection. For instance, using RAID (Redundant Array of Independent Disks) on your computer can help protect you from hard drive failures. RAID mirrors your data on one or more additional disks, so that if one disk fails, you still have the data on at least one other; you simply replace the failed disk and the software rebuilds your array. However, RAID does not prevent corruption or loss of data by user error, as such changes are immediately mirrored on all the disks. RAID is one example of redundancy; depending on your project and infrastructure, there may be other places to implement redundant hardware.

Most people are familiar with backing up their data. This can take many forms, often writing changes to a second hard drive, USB drive, or optical disk. The gold standard, however, is backing up your data automatically, at regular intervals, to a different location than your primary storage, with incremental backup.

Using a different location minimizes the probability of both storage locations being destroyed by a single disaster, such as a flood. Automation ensures the backup happens even if you forget. Incremental backup keeps snapshots of your drive at different points in time, so that you can roll back to the last known good state; otherwise, it's possible to discover problems only after your drive was backed up and the errors were written to the backup storage.
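One common way to get automated, incremental, off-site backups is to script a tool like rsync and schedule the script (e.g., with cron). The sketch below assumes a Unix-like system with rsync installed; all paths are hypothetical:

```python
import datetime
import subprocess

source = "/home/me/projects/guppy/"          # data to protect
backup_root = "/mnt/offsite-backup/guppy"    # ideally a different location
today = datetime.date.today().isoformat()

# --link-dest hard-links files that are unchanged since the previous
# snapshot, so each dated directory is a full snapshot but only changed
# files consume new space. After a successful run you would repoint a
# "latest" symlink at today's snapshot.
subprocess.run(
    ["rsync", "-a", f"--link-dest={backup_root}/latest",
     source, f"{backup_root}/{today}"],
    check=True,
)
```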

Access Control

If your computer is not password-protected, it should be. Ideally, all the files for your project should be stored on a central computer with controlled access over a network. Even if your data can be viewable by the public, the ability to add, modify, or delete data should be limited.

Passwords should be complex and safe. If you have many passwords to remember, consider using a password manager such as KeePass to store and create passwords; then you only need to remember one password, the one for your password manager. Remember, most security breaches are the result of people using easy passwords or giving away their passwords, not brute-force hacking.

If keeping your data private is truly a concern, also consider encrypting your data to add another layer of protection.
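As a sketch of what file-level encryption can look like in practice, the example below uses the Fernet recipe from the third-party cryptography package; the file names are hypothetical. Keep the key out of the data directory and back it up separately, because losing the key means losing the data:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # store securely, e.g. in your password manager
fernet = Fernet(key)

# Encrypt a sensitive file.
plaintext = open("participants.csv", "rb").read()
open("participants.csv.enc", "wb").write(fernet.encrypt(plaintext))

# Later, with the same key, recover the original bytes.
decrypted = fernet.decrypt(open("participants.csv.enc", "rb").read())
```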

Version Control

We have probably all wrestled with keeping track of the most recent version of a document while still keeping a record of previous versions. On a collaborative team this problem becomes even more acute: multiple copies of a document are passed around and edited, until it becomes impossible to know which document is the "correct" one. There are a number of strategies to alleviate this problem.

You could enforce naming conventions, for instance naming documents with the date, or the date and time if needed. This works reasonably well with one or two people working on a document, but it requires that everyone remembers to follow the convention. You can also end up with a large number of files, and it doesn't prevent two people from working on a file at the same time.

Software developers are probably familiar with version control software such as Git and Subversion (SVN). These tools keep track of versions of files. They also allow you to create branches of a project and then merge them, documenting who made what changes and when, and they function as an additional backup. Although they were developed for computer code, they work with any kind of document. If you wish to learn more, GitHub.com is an excellent service that offers great documentation and is either free or inexpensive depending on your level of need.
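Getting started takes only a few commands. The sketch below drives Git from Python for illustration; the same four commands can simply be typed in a terminal inside your project directory:

```python
import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

git("init")                      # start tracking the current directory
git("add", "-A")                 # stage all current files
git("commit", "-m", "Initial import of analysis scripts and docs")
git("log", "--oneline")          # each commit is a recoverable version
```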

Again, if you have any questions or concerns about this, please feel free to contact us.

Intellectual Property Rights Management

It is increasingly recognized that data is most powerful when it is made available for others to use, and funding agencies are starting to require that researchers make their data available. That doesn't mean you don't have rights regarding your data: as long as you are within these requirements, you are free to decide how people use it. Some important considerations are:

  • Can people make commercial use of your data?
  • Can people use your data with or without citing you?
  • Would you like an embargo period on your data?

The legal aspects of copyrighting data are not clear cut. Most people consider numerical or tabular data not to be under copyright; however, charts, figures, images, videos, and other forms of data can be. In either case, you may attach a license to your data. Licensing your data not only may protect you legally but, perhaps more importantly, instructs others how you wish your data to be used. A number of online resources for licensing are listed below. Additionally, many products of research are patentable; if you intend to seek a patent, you should describe how this will affect use of your data.

Creative Commons provides a licensing tool designed to help you select the correct license for your needs. It is well known (it is used by Flickr, for instance) and is the most flexible.

The Open Data Commons provides tools for making data open, and may be more applicable to databases and tabular data.

There are a number of licensing schemes for software. This would also include scripts for statistical packages.

The GNU General Public License (GPL) is perhaps the most well known. It is somewhat restrictive in that it is a "copyleft" license, requiring anyone using your software to apply the same GPL terms to derivative works.

The Apache License Version 2.0 is sometimes considered a better, more modern option, giving others more latitude in how your software is used.

The BSD Open Source License is another popular alternative, favored for its simplicity.

The important issue here is not only to protect your legal rights with your data, but also to provide guidance to others who wish to use your data.


Research data management

It’s an increasingly common condition of funding that the data associated with your work should be made available, accessible, discoverable, and usable.

Our series of data management modules contains all the information you need to help you comply with these requirements. You will also discover how sharing research data can help with reproducibility issues and provide your peers with existing work to build on.

The modules explain the solutions already in place to support you in sharing, and explore how you can cite data and why that can benefit your career. You will also be introduced to data management plans and receive advice on building an effective one.

What you will learn

  • Why data sharing is so important
  • An introduction to data management plans
  • Information about the sharing solutions on offer

Modules in Research data management

How researchers store, share and use data

Data Repositories to store your data

Make your data findable - It's Not FAIR! Improving Data Publishing Practices in Research

Make your data accessible - It's Not FAIR! Improving Data Publishing Practices in Research

Make your data interoperable - It's Not FAIR! Improving Data Publishing Practices in Research

Make your data reusable - It's Not FAIR! Improving Data Publishing Practices in Research


How to conduct evidence-based research

What is data?

How to manage and publish your research data


Creating a good research data management plan

How researchers benefit from citing data

Teaching Research Data Management with DataLad: A Multi-year, Multi-domain Effort

  • Open access
  • Published: 07 May 2024


Michał Szczepanik, Adina S. Wagner, Stephan Heunis, Laura K. Waite, Simon B. Eickhoff & Michael Hanke


Research data management has become an indispensable skill in modern neuroscience. Researchers can benefit from following good practices as well as from proficiency with particular software solutions. But as these domain-agnostic skills are commonly not included in domain-specific graduate education, community efforts increasingly provide early career scientists with opportunities for organised training and materials for self-study. Investing effort in user documentation and interacting with the user base can, in turn, help developers improve the quality of their software. In this work, we detail and evaluate our multi-modal teaching approach to research data management in the DataLad ecosystem, both in general and with concrete software use. Spanning an online and printed handbook, a modular course suitable for in-person and virtual teaching, and a flexible collection of research data management tips in a knowledge base, our free and open source collection of training material has made research data management and software training available to a variety of stakeholders over the past five years.


Introduction

While experts in their respective domains and methodologies, scientists may not have the domain-agnostic technical skills that are useful for efficient research data management (RDM). Managing the life cycle of the digital objects that constitute research data requires a broad set of technical skills; however, research curricula seldom teach computing ecosystem literacy (Grisham et al., 2016). In fact, even computer science curricula often miss critical topics about the computing ecosystem; at the Massachusetts Institute of Technology (MIT), USA, this gap famously resulted in the internationally popular, self-organized class, “The missing semester of your CS education”. In addition, the high usability of modern computers’ and applications’ front ends spares users the need to develop the same level of familiarity with their computers that previous generations of computer users had (Mehlenbacher, 2003). Yet making research data and results findable, accessible, interoperable, and reusable (FAIR; Wilkinson et al., 2016) can benefit, among other things, from efficient use of various research software tools (Wiener et al., 2016). This makes general technical skill and RDM training a crucial element in preparing the next generation of neuroscientists.

In an ongoing multi-modal, multi-year effort, we combined various interconnected activities into comprehensive RDM training centered around the software tool DataLad (datalad.org; Halchenko et al., 2021). These activities spanned a community-led online RDM handbook with a printed paperback option and a knowledge base, a matching online RDM course, and various workshops. In this reflective piece, we evaluate this teaching ecosystem, review its advantages and shortcomings, and share lessons learned over its five-year history.

DataLad is a Python-based, MIT-licensed software tool for the joint management of code, data, and their relationship. It builds on git-annex, a versatile system for data logistics (Hess, 2010), and Git, the industry standard for distributed version control. To address the technical challenges of data management, data sharing, and digital provenance collection, it adapts principles of open-source software development and distribution to scientific workflows. Like Git and git-annex, DataLad is primarily operated from the command line, which makes familiarity with the computer terminal, common file system operations, and general knowledge about one's operating system beneficial for its users.
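For orientation, DataLad's commands are also exposed through a Python API. The following is a minimal sketch, assuming DataLad is installed; the dataset path, file name, and commit message are hypothetical, and details may vary between DataLad versions:

```python
import datalad.api as dl

# Create a new dataset: a Git/git-annex repository managed by DataLad.
dl.create(path="my-dataset")

# Add some content...
with open("my-dataset/notes.txt", "w") as f:
    f.write("pilot measurements, 2021-04\n")

# ...and record it as a versioned snapshot with a descriptive message.
dl.save(dataset="my-dataset", message="Add pilot notes")
```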

Overarching Goals for Training Materials

Our training material aims to give even technical novices the opportunity to use the software quickly and productively, and to integrate it easily with other tools and services in real-world research. In part, it was motivated by concrete user needs, such as those of early career researchers in a research consortium. Beyond this, we aimed for the training material to be fully open source, accessible (both regarding language and technical requirements), flexible, multi-modal (everyone should find something that fits their learning needs), directly applicable to various research contexts, and maintainable.

A DataLad Research Data Management Handbook

Since its first release (0.0.1, March 2015), DataLad has had technical documentation with a design overview and reference documentation. Although any amount of documentation is better than none, existing documentation can still be insufficient if it does not meet the needs of the target audience. Solely technical or reference documentation, for example, can be suboptimal for novices: it may be incomplete, narrowly focused on individual commands, or assume existing knowledge readers lack (Segal, 2007; Pawlik et al., 2015), and can thereby discourage potential users or inhibit the adoption of a tool. Even though technical documentation is useful for developers, a central target audience for documentation of the DataLad ecosystem is scientists. A considerable part of this target audience can thus be considered technical novices for whom technical documentation is not ideal. Research also suggests that scientists need documentation that goes beyond reference manuals. In an analysis of user questions in support forums of scientific software packages, Swarts (2019) found that 80% of inquiries focused on operations and tasks, such as the sequence of operations required to achieve a specific goal, instead of reference lists. Breaking down user questions by purpose, Swarts (2019) further found that users were most interested in descriptions of operations or tasks, followed by insights into the reasons behind an action. Separating documentation types into feature-based (more closely related to the concept of reference documentation) and task-based, Swarts (2019) reports twice as many questions seeking explanations for software with feature-based compared to task-based documentation. This hints at a disconnect between knowing how something should be done and why it should be done this way. Overall, this highlights that users of scientific software have a clear need beyond the documentation of individual commands: they seek to understand general usage principles and to master complex combinations of features to achieve specified goals. This type of empowerment is what the DataLad Handbook project aimed to achieve by complementing DataLad's existing technical documentation.

Design Considerations

We identified three types of stakeholders with different needs: researchers, planners, and trainers. Researchers need accessible educational content to understand and use the tool; planners, such as principal investigators or funders, need high-level, non-technical information in order to make informed yet efficient decisions on whether the tool fulfills their needs; and trainers need reliable, open access teaching material. Based on this assessment, the following goals for the Handbook's contents were set:

Applicability for a broad audience: The Handbook should showcase domain-agnostic, real-world RDM applications.

Practical experience: The Handbook should enable code-along style usage, with examples presented as code that users can copy, paste, and run on their own computers. To also allow read-only usage, the Handbook should reveal what a given code execution's output looks like. For an optimal code-along or read-only experience, the code output should match current software behavior.

Suitable for technical novices: The Handbook's language should be accessible. Gradually, by explaining technical jargon and relevant tools or concepts in passing, it should provide readers with a broad set of relevant RDM skills rather than requiring prior knowledge.

Low barrier to entry: The Handbook's contents should be organized in short, topical units to make it possible to re-read or mix and match.

Integrative workflows: The Handbook's contents should build on each other and link back to content already introduced, to teach how different software features interact.

Empowering independent users: Instead of showcasing only successful code, the Handbook should also explicitly demonstrate common errors to enable users to troubleshoot problems in their own use cases independently.

The following structure arose from this specification analysis (Wagner et al., 2020):

The first part of the Handbook, covering high-level descriptions of the software and its features and detailed installation instructions for all operating systems.

The second part of the Handbook, written in the form of a continuous, code-along tutorial, set in a domain-agnostic fictional storyline about an RDM application, and covering all stable software features in chapters that build up on one another.

The third part of the Handbook, covering features beyond the basics in stand-alone chapters, added prior to the second release.

The last part of the Handbook, containing short, standalone start-to-end descriptions of real-world use cases, with concise step-by-step instructions, and references to further reading in the Basics part.

Finally, the design and content requirements were accompanied by technical goals: from using expandable details to keep the visible "core" text short, and making the Handbook available in multiple formats, to developing the Handbook alongside the versioned software and using integration tests to ensure that included code examples function. The resulting implementation of the Handbook fulfilled these requirements as follows.

The Technical Backbone

The development environment of the Handbook was chosen with the intent to support the declared goals, and to maximize configurability, autonomy, and reusability of the project. It builds entirely on flexible and extensible open source infrastructure: at the highest level, it uses Sphinx as a documentation generator ( sphinx-doc.org ). Sphinx transforms documents written in reStructuredText, a lightweight markup language, into a variety of output formats, among them HTML, PDF, LaTeX, and EPUB. Initially a by-product of the Python documentation, it has been adopted by the open source community at large; GitHub’s dependency graph reported more than 300,000 dependent projects in January 2024 Footnote 2 .

Sphinx supports an extension mechanism through which additional functionality can be integrated. Leveraging this mechanism, the Handbook project extends standard Sphinx features with custom admonitions and designs, for example toggle-able boxes for optional details. This is implemented as a Python package alongside the Handbook source code, making the Handbook project a reusable and installable Sphinx extension. Figure 1 provides an overview of the custom-developed design features. A major functional enhancement is provided by a separate Python package, autorunrecord , an additional custom-made Sphinx extension that sequentially executes code in a specified environment and embeds a record of the code and its output as code snippets in the documentation Footnote 3 . Instructors can further use it to automatically create scripts from selected code blocks, which can then be demonstrated in a remote-controlled terminal during live-coding tutorials.
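For illustration, the following is a minimal sketch of how a custom admonition can be added through Sphinx's extension mechanism. The directive name findoutmore and all implementation details here are illustrative assumptions, not the Handbook's actual code, which is more elaborate and adds styling and toggle behaviour.

```python
# Minimal sketch of a custom Sphinx admonition directive (illustrative
# only, not the Handbook's actual implementation).
from docutils import nodes
from docutils.parsers.rst import Directive


class FindOutMore(Directive):
    """A 'find-out-more' style box for optional details."""

    has_content = True

    def run(self):
        box = nodes.admonition()
        box += nodes.title(text="Find out more")
        # Parse the directive body as regular reStructuredText into the box.
        self.state.nested_parse(self.content, self.content_offset, box)
        return [box]


def setup(app):
    # Registering the directive makes ".. findoutmore::" usable in documents.
    app.add_directive("findoutmore", FindOutMore)
    return {"version": "0.1", "parallel_read_safe": True}
```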

Figure 1

Custom admonitions and code blocks used in the Handbook. In each pair of admonitions, the top image corresponds to the web version, and the bottom image corresponds to its PDF rendering. Windows-wits (green), toggle-able in the HTML version, contain information that is relevant only for the Windows operating system (DataLad supports GNU/Linux, macOS, and Windows, but the latter differs fundamentally from the other two, sometimes leading to different behaviour or necessitating workarounds when using DataLad). Find-out-more admonitions (orange), also toggle-able in the HTML version, contain miscellaneous extra information for curious readers. Git user notes (blue) are colored boxes with references to the underlying tools of DataLad, intended for advanced Git users as a comparison or technical explanation. Code blocks show one or more commands and the resulting output, provided using the autorunrecord Sphinx extension. In the web version, a copy button (top right corner) copies the relevant commands to the clipboard. Internal annotations allow generating custom scripts from any sequence of code blocks for live-coding demonstrations

Hosting for the project is provided by Read the Docs ( readthedocs.org ), a full-featured software documentation deployment platform that integrates with Sphinx. Notably, it supports semantic versioning of documentation, which helps to ensure that users of a past software version can find the corresponding version of the conjointly developed Handbook. Illustrations in the Handbook are based on the undraw project by Katerina Limpitsouni ( undraw.co ).

The ability of the documentation to sequentially execute code and record its outcomes allows the Handbook to serve as an integration test for the DataLad software, in addition to being a user guide. If new developments in the DataLad core packages break documented workflows, a continuous integration test suite fails, alerting developers to the fact that their changes break user workflows.
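The underlying idea can be pictured as a regression test over recorded output. The sketch below is not the actual autorunrecord implementation; the record path and the compared command are placeholders.

```python
# Illustrative sketch of the integration-test idea: re-run a documented
# command and compare its output against the snippet recorded in the
# documentation. A mismatch means the software drifted away from the docs.
import subprocess
from pathlib import Path


def check_recorded_output(command: str, record: Path) -> None:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    expected = record.read_text()
    assert result.stdout == expected, f"stale documentation for {command!r}"


check_recorded_output("datalad --version", Path("records/version.txt"))
```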

To ensure reusability, such as the adaptation by Brooks et al. (2021), the project is released under a CC-BY-SA 4.0 license. Under its terms, all elements can be reused in original or derived form for any purpose, under the condition that the original project is attributed and that derivative work is shared under an identical (“not more restrictive”) license Footnote 4.

As of January 2024, the web and PDF versions of the Handbook were organized into four parts – “Introduction”, “Basics”, “Advanced”, and “Use cases” – which comprised a total of 21 chapters. The “Introduction” part has two different target audiences: first, it provides researchers with detailed installation instructions, a basic general command line tutorial, and an overview of the Handbook. Beyond this, it gives a high-level overview of the software and its capabilities to planners.

The “Basics” part is organized into nine chapters. Following a narrative about a fictional college course on RDM, it teaches different aspects of DataLad functionality and general RDM to researchers in each topical chapter. Broadly, those topics can be summarized as follows: 1) Local version control, 2) Capturing and re-executing process provenance, 3) Data integrity, 4) Collaboration and distributed version control, 5) Configuration, 6) Reproducible data analysis, 7) Computationally reproducible data analysis, 8) Data publication, and 9) Error management.

The “Advanced” part includes independent chapters on advanced DataLad features and workflows, big data projects, DataLad use on computational clusters, DataLad’s internals, and selected DataLad extensions. The latter two parts are accompanied by code demonstrations, slides, executable notebooks, and/or video tutorials that trainers can reuse freely to teach tool use and improve scientific practice. The last part, “Use cases”, targets planners and researchers with short step-by-step instructions which show planners what is possible, and help researchers connect their knowledge into larger workflows.

Project and Community Management

Ensuring the longevity of software projects beyond the duration of individual researchers’ contracts requires community building (Koehler Leman et al., 2020). A user-driven alternative to documentation by software developers, “Documentation Crowdsourcing”, has been successfully employed by the NumPy project (Pawlik et al., 2015). The Handbook project extends this concept beyond reference documentation. To achieve this, it is set up to encourage and welcome improvements by external contributors. The project is openly hosted on GitHub. Mirroring processes in larger crowd-sourced documentation projects such as “The Turing Way handbook for reproducible, ethical and collaborative research” (The Turing Way Community, 2022), credit is given for both code-based and non-code-based contributions. Contributors are recognized in the source repository, on the DataLad website, and as co-authors in both the printed version of the Handbook and its Zenodo releases. As of January 2024, a total of 60 contributors had provided input in the form of content, bug fixes, or infrastructure improvements.

Paperback Version

A digest of the Handbook was published via the Kindle Direct Publishing (KDP) print-on-demand service to make the Handbook available as a printed paperback. This fulfilled user demand for physical copies of the documentation, and was possible with minimal additional technical work, building on the automatically generated LaTeX sources of the Handbook. The printed book’s contents were sub-selected for longevity, graphics and graphical in-text elements were optimized for black-and-white printing, and a dedicated hyperlink index was created.

RDM Online Course

While documentation is the primary way of disseminating information about software, workshops are another widely practiced form of software education. As maintainers and contributors of DataLad, we receive invitations to teach such workshops for different audiences, most commonly involving early career researchers. Some such events arise from obligations related to consortium participation (such as the CRC 1451 Footnote 5, Collaborative Research Center, investigating mechanisms of motor control, where RDM training was organized with course credit for the involved doctoral students); others stem from more informal collaborations. To be better prepared for organizing training events, we decided to create a curriculum for a short RDM course centered around DataLad Footnote 6. Our design approach aligns with the “Ten simple rules for collaborative lesson development” (Devenyi et al., 2018), and the course content and format were inspired by the Software Carpentry courses (Wilson, 2016).

While the Handbook is meant to be a comprehensive set of documentation covering multiple aspects, the course materials were intended as a more focused overview of the key features of the DataLad software; self-contained, but linking to the Handbook for detail or context when needed. They introduce DataLad via interdependent examples that present both the usage and the purpose of its basic commands. While the course focuses on DataLad, software-independent information about good practices in research data management is also included. The structure was tuned for presentation during a hands-on workshop (online or in-person) as well as for self-study. The intended workshop duration was two half-days. Making it easy for tutors (including those not involved in course preparation) to reuse the materials on different occasions was an important goal; as time constraints and target audiences can differ, contents were divided into four core blocks and two optional additions. Finally, the aim was to create an open resource: not just by publishing the materials as a public website, but also by hosting the collaboratively edited sources in a public repository, licensing the content under a Creative Commons Attribution license, and reusing other permissively licensed materials.
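To give a flavor of the basic-command sequences the course introduces, the sketch below uses DataLad's Python API; the course itself teaches the command-line interface, and the dataset and file names here are illustrative.

```python
# Sketch of a first DataLad lesson, via the Python API (datalad.api).
# Dataset and file names are illustrative.
import datalad.api as dl

ds = dl.create(path="my-dataset")      # a new version-controlled dataset
(ds.pathobj / "notes.txt").write_text("first note\n")
ds.save(message="Add a first note")    # record the change in the history
ds.run(                                # run a command with provenance capture
    "wc -l notes.txt > linecount.txt",
    message="Count lines in notes",
)
```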

During the planning phase, we identified a set of data management tasks that should be covered, from dataset creation and local version control, through data publishing in external repositories and collaboration, to reusing datasets published by others and creating an analysis with modular datasets. The major theme for the software-agnostic part about good RDM practices, which emerged from an informal poll among our colleagues, was file naming and organization. Although it may sound trivial at first glance, this includes topics such as rationales for naming schemes, interoperability considerations related to file names (lengths, character sets), avoiding leakage of identifying information through file names, using sidecar files for metadata, clear semantics for separating inputs and outputs, and standard file organization structures, e.g., BIDS (Gorgolewski et al., 2016) or the research compendium (Gentleman & Temple Lang, 2007). In addition, we decided to discuss the distinction between text and binary files, and to show examples of how the former can be used to store different kinds of data and metadata in an interoperable fashion (tabular files, serialization formats, lightweight markup), as the sketch below illustrates.
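A minimal sketch of that text-plus-sidecar idea follows; the file names and metadata fields are illustrative, and standards such as BIDS prescribe their own naming and metadata schemes.

```python
# Tabular data stored as CSV, with a JSON sidecar file carrying
# metadata about the columns. File names and fields are illustrative.
import csv
import json

with open("reaction_times.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject", "rt_ms"])
    writer.writerow(["sub-01", 532])

with open("reaction_times.json", "w") as f:
    json.dump(
        {"rt_ms": {"Description": "Response latency", "Units": "ms"}},
        f,
        indent=2,
    )
```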

The course website with the full course material is built from The Carpentries Footnote 7 lesson template Footnote 8. Website content is written in Markdown, and the website is built with the Ruby-based static site generator Jekyll (note, however, that The Carpentries have since redesigned their tooling around R’s publishing ecosystem Footnote 9). Course material is split into sections, each starting with an overview (questions, objectives, time estimate) and ending with a summary of key points. The content is presented using a combination of text paragraphs and template-defined boxes with code samples, expected output, call-outs, challenges, and more.

During courses, we use JupyterHub to provide a unified, pre-configured software environment for participants, accessible through a web browser. While JupyterHub is mainly associated with notebooks, we mostly use its terminal feature, effectively providing participants with browser-based access to a terminal running on a remote machine. To simplify deployment, we use The Littlest JupyterHub Footnote 10 (TLJH) distribution to set up the hub for all users on a single machine. We have used Amazon Web Services to provision virtual machines, but other cloud computing providers or local infrastructure can be used to the same effect. Setup instructions, expanded from TLJH’s documentation, are included in the “For instructors” section of the course website.
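For orientation, TLJH's documented installation is a single bootstrap command; the sketch below mirrors that step from Python. The admin user name is a placeholder, and the authoritative instructions live at tljh.jupyter.org.

```python
# TLJH's documented install is a shell one-liner:
#   curl -L https://tljh.jupyter.org/bootstrap.py | sudo python3 - --admin <admin-user>
# The same step expressed in Python; "workshop-admin" is a placeholder.
import subprocess
import sys
import urllib.request

script = urllib.request.urlopen("https://tljh.jupyter.org/bootstrap.py").read()
subprocess.run(
    ["sudo", sys.executable, "-", "--admin", "workshop-admin"],
    input=script,
    check=True,
)
```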

Following the design considerations, we organized the course in the following modules:

Content tracking with DataLad: learning the basics of version control, working locally to create a dataset, and practicing basic DataLad commands.

Structuring data: listing good practices in data organization, distinguishing between text and binary data, and exploring lightweight text files and how they can be useful.

Remote collaboration: exercising data publication and consumption, and demonstrating the dissociation between file content and its availability record.

Dataset management: demonstrating dataset nesting (subdatasets), investigating the structure and content of a published dataset, and creating a simple model of a nested dataset.

(optional) The basics of branching: understanding Git’s concept of a branch, creating new branches in a local dataset and switching between them, and mastering the basics of a contribution workflow.

(optional) Removing datasets and files: learning how to remove dataset content, and removing unwanted datasets.

Additionally, the course website contains a short glossary, setup instructions for users (if using their own computers), slides, and instructor notes about the technical setup.

Knowledge Base and Online Office Hours

The educational resources were designed to be broadly applicable and domain-agnostic, but they cannot cover arbitrary use cases. Practical application of DataLad in RDM scenarios involves developing solutions to complex problems. No individual solution is one-size-fits-all, and no documentation can be comprehensive for everyone without becoming overwhelming for some. Likewise, while useful for discovering and learning usage patterns, most resources are of limited utility for troubleshooting software issues. To address this, we offer a weekly online office hour to provide flexible assistance, and we invested resources into creating a knowledge base Footnote 11. Office hours are a one-hour open video call that (prospective) users can join flexibly and without prior notice to ask questions or discuss use cases, often with relevant information live-demoed via screen-sharing. The knowledge base, in contrast, is a collection of documents, each focusing on a particular topic (application, problem, solution) and standing alone with respect to other documents. These documents resemble technical reports, whose creation has a long-standing tradition in science and engineering (Pinelli et al., 1982; Brearley, 1973). When explorations in an office hour uncover technical limitations requiring workarounds, or interesting use cases too peculiar to be prominently documented in the Handbook or RDM course, they typically inspire an entry in the knowledge base.

A knowledge base provides resources to anyone seeking particular solutions. It can also be used to accumulate the outcomes of investigating technical issues as they occur during user support, thereby yielding persistent resources that streamline future support efforts and increase the efficacy of the resources invested in support (turning incoming feedback into knowledge). Analyses of the content written on demand, or of its access frequency, can also inform the prioritization of development efforts to improve technical implementations and/or documentation elsewhere.

Suitable topics for a knowledge base item (KBI) include: an answer to a frequently asked question (be that from office hours, issue trackers, or community forums); tips and strategies for a particular use case; a description of a technical limitation and possible workaround. Each KBI needs to have: a descriptive title; metadata, such as keywords, to aid discovery; and a persistent URL to share it.

The technical framework of the knowledge base is a simplified version of that used for the Handbook. In summary, KBIs are plain-text documents with reStructuredText markup. All KBI files are kept in a Git repository, and a rendered HTML version of the knowledge base is created with Sphinx. The knowledge base Git repository is managed with the aid of a Git hosting solution, such as GitHub or GitLab, whose continuous integration and website publishing tools are used to publish the knowledge base. Coordination of the office hour is done in a public Matrix Footnote 12 chatroom, in which questions can also be asked asynchronously.
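Because the knowledge base is an ordinary Sphinx project, a continuous integration job can rebuild the HTML with Sphinx's Python entry point; the sketch below assumes a conventional source/build layout, which may differ from the actual repository.

```python
# Rebuild the knowledge base HTML from a CI job. "-W" promotes warnings
# to errors so that broken markup fails the build. Paths are assumptions.
from sphinx.cmd.build import build_main

exit_code = build_main(["-b", "html", "-W", "source", "build/html"])
raise SystemExit(exit_code)
```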

As of January 2024, the knowledge base contains 29 KBIs of varying length, describing various use cases. For example, the first KBI we created describes a situation in which DataLad (or Git) users working on shared infrastructure can trigger a Git safety mechanism, added to Git versions released after March 2022, which causes certain operations to fail with an error message. The knowledge base format allowed us to explain in detail not only the configuration options that need to be set in order to perform the operation, but also the broader rationale for the safety mechanism’s presence in Git in the first place (quoting, e.g., the informative commit messages that accompanied the changes made in Git).
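For context, the safety mechanism in question is Git's "dubious ownership" check: Git refuses to operate on a repository owned by another user unless the path is explicitly allow-listed. The sketch below applies the relevant configuration; the repository path is a placeholder, and the KBI itself should be consulted for the full rationale.

```python
# Allow-list a shared repository for Git's ownership safety check,
# equivalent to:
#   git config --global --add safe.directory /path/to/shared/dataset
# The path is a placeholder.
import subprocess

subprocess.run(
    ["git", "config", "--global", "--add",
     "safe.directory", "/path/to/shared/dataset"],
    check=True,
)
```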

Impact and Scope

Online Handbook

Work on the Handbook began in June 2019, and the first release followed in January 2020. It has been under continuous development for more than four years, averaging two releases per year, and complements the DataLad ecosystem with a comprehensive user guide. Its PDF version spans more than 600 pages. Releases of the DataLad core package are coordinated with matching releases of the Handbook project, and past release versions remain accessible online.

Confirming observations from the literature (van Loggem & van der Veer, 2014), the conjoint development of user documentation has positive effects on software quality. As the writing process involved manual software testing, initial development was accompanied by a higher discovery rate of software errors. This user-focused approach uncovered deficiencies in the technical documentation and API elements with suboptimal user experience. The workflow-based nature of the demonstrations highlights API inconsistencies, and the integration test that the Handbook constitutes catches incompatibilities between the software and common usage practice. These documentation features facilitate software development, and had a major impact on the conjoint 0.12.0 release of DataLad (January 2020), the first with a matching Handbook release. Popularity data show a marked increase in downloads of the DataLad Debian package from this date onward Footnote 13. In addition, differences in web traffic confirm that the user documentation is in higher demand than the technical documentation: an analysis of visits to the web version of the Handbook from December 2022 to July 2023 revealed that handbook.datalad.org averaged 22,000 total page views per 30 days, compared to 6,600 for the technical documentation at docs.datalad.org. In summary, the development of the DataLad Handbook had a measurable positive impact on the number of users, the popularity of the package, and the software quality.

We conducted a post-workshop survey among participants of the first two instances of the workshop for CRC 1451 early career researchers. Most participants were PhD students (who received course credit for participation), and the workshops were conducted online. The workshop received high overall ratings, with participants stating that they were likely to recommend it to their colleagues; ratings of the learning pace and of applicability to participants’ work were mixed (Fig. 2). Given that the CRC 1451 project combines clinical, preclinical, and computational neuroscience, we see these responses as indicative of the diversity of backgrounds among PhD students in neuroscience, as well as of the varying degree to which formal RDM is an established practice across research fields. One recurring suggestion for improvement was to include more examples of real-world applications. This highlights that although a course dedicated to software basics is a good start, transferring the knowledge to specific applications is the real challenge, which can be made easier with existing written documentation.

Figure 2

Responses of participants in the first two installments of the workshop, conducted online for early career researchers, to the following questions. Recommend: How likely are you to recommend this workshop to a friend or colleague? Overall: What is your overall assessment of this event (1 = insufficient, 5 = excellent)? Pace: What do you think about the learning pace of the workshop (i.e., material vs. time)? Applicability: Will the knowledge and information you gained be applicable in your work?

Knowledge Base

Over the span of ten months, we accumulated 29 knowledge base items of varying length. The knowledge base has been useful for answering recurring questions and communicating recommended workflows. Beyond this, it has also been a valuable means of keeping the official Handbook and course material lean, as it provides a home for more temporary or niche use cases.

Lessons Learned

In the previous sections we described the design considerations and their practical applications for the documentation and education aspects of the DataLad project. In our opinion, creating and maintaining this growing collection of materials was a worthwhile investment that helped users apply the software and helped developers improve it. The open source, flexible approach to creating educational content was particularly valuable for its maintainability, adaptability, and applicability to various research contexts. In this closing section we share our comments – lessons learned in the process – on various related aspects.

Customization vs Complexity

Our technical choices for the Handbook had to be weighed against non-technical considerations. Compared to other handbook projects such as The Turing Way (The Turing Way Community, 2022), sources based on reStructuredText and Sphinx, together with the many custom admonitions, constitute a higher barrier to entry for contributors. The Turing Way community, for example, explicitly chose Markdown-based Jupyter Book tooling to ease contributions by technical novices. Indeed, Handbook maintainers regularly have to assist new contributors with technical details, and complex technical contributions almost always come from the core contributor team. Nevertheless, in our case – especially given the requirements for multiple output formats, integration tests, and reuse in print editions – the customization opportunities of Sphinx made up for the somewhat higher complexity compared to alternative documentation frameworks.

Yes, There Can Be Too Much Documentation

Although a large amount of documentation appears universally positive, there are concrete downsides that grow with the amount of documentation. If more content leads to duplication, maintenance costs increase steeply, and so does the risk of showcasing outdated information. Thus, wherever possible, information is detailed in a single location only, and other places refer or link to this source rather than duplicating its content.
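One mechanism supporting this kind of single-sourcing across Sphinx projects is the sphinx.ext.intersphinx extension, which resolves references against another project's object inventory. The conf.py fragment below is a sketch; the mapping name and URL are assumptions for illustration.

```python
# conf.py fragment: link into another Sphinx project's documentation
# instead of duplicating its content. Mapping name and URL are examples.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    "datalad": ("https://docs.datalad.org/en/stable/", None),
}
```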

Additionally, a large amount of documentation can appear intimidating. In our experience, the news that educational resources exist is met positively, but the notion of a “600-page handbook” can diminish this enthusiasm. We have anecdotal evidence that a (surprisingly) large amount of available documentation can be perceived as a warning sign regarding software complexity, and can be interpreted as a requirement to process all available documentation before meaningful proficiency can be reached. Designing all resources to be as modular and pick-what-you-need as possible proved important for allowing selective consumption and lowering the perceived cost of entry.

Keeping Online Workshops Interactive

Keeping participants engaged during an online workshop is a particular challenge, as it is much harder to “read the room”. We have had positive experiences with interactive poll and Q&A platforms such as DirectPoll Footnote 14 or Slido Footnote 15. Additionally, we believe that having co-presenters who can monitor the text chat or field some of the questions is invaluable.

Avoiding Installfest

Software should be easy to install, and we believe this is the case for DataLad. However, the preferred method of installation differs between users. DataLad can currently be installed through several methods: conda, pip, apt and several other package managers (GNU/Linux), or homebrew (macOS). The choice between these methods depends on how each integrates (or clashes) with the methods used to manage the entire software environment(s), and, if made hastily, may lead to future issues. To this end, we provide an overview of installation methods in the Handbook, and a note on debugging issues related to multiple Python versions in the knowledge base. For these reasons, performing an installation as part of a workshop may turn out to be time consuming, and we tend to avoid it. If installing a given piece of software on participants’ computers is a goal (because it is required for the workshop or for future work), one approach we found useful (at least with certain audiences) is to provide a link to detailed installation instructions beforehand and ask participants to e-mail the instructor the output of a diagnostic command (or a description of encountered problems). This encourages engagement from the participants, and may also provide instructors (maintainers) with insight into how well the installation process works in practice.
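DataLad ships such a diagnostic command, datalad wtf, which reports details about the system, installation, and configuration. A sketch of capturing its output for such an e-mail, assuming DataLad is already installed and on the PATH:

```python
# Collect the output of DataLad's diagnostic command ("datalad wtf")
# so a participant can paste it into an e-mail to the instructor.
import subprocess

report = subprocess.run(["datalad", "wtf"], capture_output=True, text=True)
print(report.stdout or report.stderr)
```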

Software Environment Nuances

Exploring a command-line tool (particularly one for managing files) can hardly be separated from using basic command-line utilities (e.g., for changing the working directory or listing files). Although their usage can be woven into the narrative of a workshop, this introduces additional complexity for command-line interface (CLI) novices. Moreover, while core utilities are similar across systems, there are often subtle differences in how they must be used to produce the same effect (e.g., compare tree vs. tree /F, or which vs. Get-Command, in Bash and PowerShell, respectively). The impact of these differences can be mitigated by providing toggle-able OS-specific instructions in the published materials; however, they still present a major challenge during live workshops when participants use different operating systems with their default sets of tools. For this reason, we prefer to use a common JupyterHub deployment for hands-on sessions.
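One way to sidestep such shell differences in lesson material is to perform the check in a language that behaves identically everywhere; the snippet below is a sketch of that idea in Python.

```python
# Locate an executable portably, equivalent in spirit to
# `which datalad` (Bash) or `Get-Command datalad` (PowerShell).
import shutil

print(shutil.which("datalad") or "datalad not found on PATH")
```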

Cloud Computing

Both virtual and in-person workshops benefit greatly from prepared virtual computing environments. Costs per workshop amount to a few euros, and typically do not exceed 15 euros even for multi-day workshops. Setting up the respective Amazon EC2 cloud instance takes a few hours at most.

Data Production and Data Consumption

The RDM course was created from the data-producer perspective: it walks users through the process of building a dataset from scratch and covers data consumption only at a later stage. Aside from the fact that data analysis in computational neuroscience projects may just as often start with obtaining existing datasets, this narrative requires multiple steps before the benefits of version-controlling data become visible. It could be an interesting change of perspective to start with obtaining a copy of an already created dataset (something that currently gets introduced in a later part of the workshop) and inspecting its properties (such as content, history, and file availability information) in order to highlight the value added by RDM software. We tried this approach during shorter software demos, where communicating the target state upfront was particularly useful for streamlining the presentation.

Technical Writing Takes Time

Preparing a description of a discovered solution in the shape of a knowledge base item is time consuming, and a task of its own. However, it generates a resource that becomes more useful with time, as the solution is captured together with its context. Working on such solutions is also a valuable way to learn about the program – including for developers, who need not be familiar with all parts of the code base or all potential applications.

As the mere existence of software is insufficient to ensure its uptake and use according to best practices, maintaining user-oriented documentation became an important part of the DataLad project. Having users does not just validate the development effort; it can also help enhance the software: users diagnose problems, propose solutions, and suggest improvements (Raymond, 1999). A lack of documentation hinders knowledge transfer between users and developers, impedes maintenance, and creates a steep learning curve for new users and new developers alike (Theunissen et al., 2022). As described by Parnas (2011), “reduced [documentation] quality leads to reduced [software] usage, [r]educed usage leads to reductions in both resources and motivation, [r]educed resources and motivation degrade quality further”. Turning the argument around, improving documentation can improve software, which (for research software) can improve research.

Different kinds of documentation are needed for different audiences; in our case this led to the creation of the Handbook, course materials, and knowledge base, in addition to the technical reference. In our interactions with users, we observe positive effects of having these resources available. We hope that our experiences in creating them, in terms of both design and practical aspects, can be helpful for other projects in research software development and for education in research data management.

Information Sharing Statement

The sources of all projects described in this manuscript are available from GitHub and licensed under CC-BY, CC-BY-SA, or MIT licenses:

github.com/datalad-handbook/book ,

github.com/datalad-handbook/book-datalad-intro ,

github.com/psychoinformatics-de/rdm-course ,

github.com/psychoinformatics-de/knowledge-base

Data Availability

No datasets were generated or analysed during the current study.

Footnotes

Footnote 1: missing.csail.mit.edu
Footnote 2: github.com/sphinx-doc/sphinx/network/dependents
Footnote 3: github.com/mih/autorunrecord
Footnote 4: creativecommons.org/licenses/by-sa/4.0
Footnote 5: crc1451.uni-koeln.de
Footnote 6: psychoinformatics-de.github.io/rdm-course
Footnote 7: carpentries.org
Footnote 8: github.com/carpentries/styles
Footnote 9: carpentries.github.io/workbench/
Footnote 10: tljh.jupyter.org
Footnote 11: knowledge-base.psychoinformatics.de
Footnote 13: qa.debian.org/popcon.php?package=datalad
Footnote 14: directpoll.com

Brearley, N. (1973). The role of technical reports in scientific and technical communication. IEEE Transactions on Professional Communication, PC–16 , 117–119. https://doi.org/10.1109/tpc.1973.6592685

Brooks, P. P., McDevitt, E. A., Mennen, A. C., Testerman, M., Kim, N. Y., Visconti di Oleggio Castello, M., & Nastase, S. A. (2021). Princeton handbook for reproducible neuroimaging (Version v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.4317623

Devenyi, G. A., Emonet, R., Harris, R. M., Hertweck, K. L., Irving, D., Milligan, I., & Wilson, G. (2018). Ten simple rules for collaborative lesson development (S. Markel, Ed.). PLOS Computational Biology, 14 , e1005963. https://doi.org/10.1371/journal.pcbi.1005963

Gentleman, R., & Temple Lang, D. (2007). Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16 , 1–23. https://doi.org/10.1198/106186007x178663

Gorgolewski, K. J., Auer, T., Calhoun, V. D., Craddock, C. R., Das, S., Duff, E. P., Flandin, G., Ghosh, S. S., Glatard, T., Halchenko, Y. O., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3 , 1–9. https://doi.org/10.1038/sdata.2016.44

Grisham, W., Lom, B., Lanyon, L., & Ramos, R. (2016). Proposed training to meet challenges of large-scale data in neuroscience. Frontiers in Neuroinformatics, 10 , 28. https://doi.org/10.3389/fninf.2016.00028

Halchenko, Y. O., Meyer, K., Poldrack, B., Solanky, D. S., Wagner, A. S., Gors, J., MacFarlane, D., Pustina, D., Sochat, V., Ghosh, S. S., Mönch, C., Markiewicz, C. J., Waite, L., Shlyakhter, I., de la Vega, A., Hayashi, S., Häusler, C. O., Poline, J.-B., Kadelka, T., ... Hanke, M. (2021). Datalad: Distributed system for joint management of code, data, and their relationship. Journal of Open Source Software,  6 , 3262. https://doi.org/10.21105/joss.03262

Hess, J. (2010). git-annex . https://git-annex.branchable.com/

Koehler Leman, J., Weitzner, B. D., Renfrew, P. D., Lewis, S. M., Moretti, R., Watkins, A. M., Mulligan, V. K., Lyskov, S., Adolf-Bryfogle, J., Labonte, J. W., et al. (2020). Better together: Elements of successful scientific software development in a distributed collaborative community. PLoS Computational Biology, 16 , e1007507. https://doi.org/10.1371/journal.pcbi.1007507

Mehlenbacher, B. (2003). Documentation: Not yet implemented, but coming soon. The HCI handbook: Fundamentals, evolving technologies, and emerging applications , (pp. 527–543).

Parnas, D. L. (2011). Precise documentation: The key to better software. In S. Nanz (Ed.), The Future of Software Engineering (pp. 125–148). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-15187-3_8

Pawlik, A., Segal, J., Sharp, H., & Petre, M. (2015). Crowdsourcing scientific software documentation: A case study of the NumPy documentation project. Computing in Science & Engineering, 17 (1), 28–36. https://doi.org/10.1109/mcse.2014.93

Pinelli, T. E., Glassman, M., & Cordle, V. M. (1982). Survey of reader preferences concerning the format of NASA technical reports . Technical Report NASA-TM-84502, National Aeronautics and Space Administration.

Raymond, E. (1999). The cathedral and the bazaar. Knowledge, Technology & Policy, 12 , 23–49. https://doi.org/10.1007/s12130-999-1026-0

Segal, J. (2007). Some problems of professional end user developers. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007) . https://doi.org/10.1109/vlhcc.2007.17

Swarts, J. (2019). Open-source software in the sciences: The challenge of user support. Journal of Business and Technical Communication, 33 , 60–90. https://doi.org/10.1177/1050651918780202

The Turing Way Community. (2022). The Turing Way: A handbook for reproducible, ethical and collaborative research (Version 1.0.2). Zenodo. https://doi.org/10.5281/zenodo.7625728

Theunissen, T., Heesch, U., & Avgeriou, P. (2022). A mapping study on documentation in continuous software development. Information and Software Technology, 142 , 106733. https://doi.org/10.1016/j.infsof.2021.106733

van Loggem, B., & van der Veer, G. C. (2014). A documentation-centred approach to software design, development and deployment. In A. Ebert, G. C. van der Veer, G. Domik, N. D. Gershon, & I. Scheler (Eds.), Building Bridges: HCI, Visualization, and Non-formal Modeling (pp. 188–200). Berlin, Heidelberg: Springer.

Wagner, A. S., Waite, L. K., Waite, A. Q., Reuter, N., Poldrack, B., Poline, J. -B., Kadelka, T., Markiewicz, C. J., Vavra, P., Paas, L. K., Herholz, P., Mochalski, L. N., Kraljevic, N., Heckner, M. K., Halchenko, Y. O., & Hanke, M. (2020). The DataLad Handbook: A user-focused and workflow- based addition to standard software documentation. 25th annual meeting of the Organization for Human Brain Mapping (OHBM) . https://doi.org/10.5281/zenodo.7906718

Wiener, M., Sommer, F., Ives, Z., Poldrack, R., & Litt, B. (2016). Enabling an open data ecosystem for the neurosciences. Neuron, 92 , 617–621. https://doi.org/10.1016/j.neuron.2016.10.037

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. -W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., ... Mons, B. (2016). The fair guiding principles for scientific data management and stewardship. Scientific Data , 3 (1). https://doi.org/10.1038/sdata.2016.18

Wilson, G. (2016). Software carpentry: Lessons learned. F1000Research, 3 , 62. https://doi.org/10.12688/f1000research.3-62.v2

Acknowledgements

The authors wish to thank Tosca Heunis for her feedback on earlier versions of this manuscript. The DataLad software and its documentation are the joint work of more than 100 individuals. We are deeply grateful for these contributions to free and open source software (FOSS) and documentation. Likewise we are grateful to the many more people that produce and maintain the FOSS ecosystem that DataLad is built on. We are particularly indebted to Joey Hess, the author of the git-annex software, without which DataLad would not be what it is today.

Open Access funding enabled and organized by Projekt DEAL. The RDM course was developed with funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project). The DataLad project received support through the following grants: US-German collaboration in computational neuroscience (CRCNS) project “DataGit: converging catalogues, warehouses, and deployment logistics into a federated ‘data distribution”’, co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411); CRCNS US-German Data Sharing “DataLad - a decentralized system for integrated discovery, management, and publication of digital objects of science”, (NSF 1912266, BMBF 01GQ1905); Helmholtz Research Center Jülich, FDM challenge 2022; German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform; ReproNim project (NIH 1P41EB019936-01A1); Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project); European Union’s Horizon 2020 research and innovation programme under grant agreements Human Brain Project SGA3 (H2020-EU.3.1.5.3, grant no. 945539), VirtualBrainCloud (H2020-EU.3.1.5.3, grant no. 826421); EBRAINS 2.0 (HORIZON.1.3, grant no. 101147319).

Author information

Michał Szczepanik and Adina S. Wagner both contributed equally to this work.

Authors and Affiliations

Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Center Jülich, Jülich, Germany

Michał Szczepanik, Adina S. Wagner, Stephan Heunis, Laura K. Waite, Simon B. Eickhoff & Michael Hanke

Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

Simon B. Eickhoff & Michael Hanke

Contributions

ASW and MS contributed equally to the work. ASW and MS conceptualized and wrote the first draft of the manuscript. All authors reviewed and edited the manuscript. ASW, LKW, and MH conceptualized the RDM Handbook. ASW, LKW, MH, MS, SH maintained the RDM Handbook and curated the majority of its content. ASW and MH prepared the print version of the RDM Handbook. MS conceptualized the RDM course, created the technical backbone, and curated its content. ASW, MH, and SH contributed to the technical backbone and content of the RDM course. ASW, MH, MS, and SH presented multiple online and in-person workshops based on the content of both the Handbook and RDM course. MH and MS conceptualized the knowledge base. ASW, LKW, MH, MS, and SH contributed to the technical backbone and content of the knowledge base. MH and SBE acquired funding.

Corresponding author

Correspondence to Michał Szczepanik.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Szczepanik, M., Wagner, A.S., Heunis, S. et al. Teaching Research Data Management with DataLad: A Multi-year, Multi-domain Effort. Neuroinform (2024). https://doi.org/10.1007/s12021-024-09665-7

Accepted : 22 April 2024

Published : 07 May 2024

DOI : https://doi.org/10.1007/s12021-024-09665-7

Keywords

  • Research data management
  • Version control
  • Online course
  • Software documentation

Case Western Reserve University

  • Research Data Lifecycle Guide

Developing a Data Management Plan

This section breaks down the topics involved in planning and preparing data used in research at Case Western Reserve University. In this phase you should understand the research being conducted; the types of data and the methods used to collect them; the methods used to prepare and analyze the data; the budget and resources required; and how you will manage data activities during your research project.

Many federal sponsors of research at Case Western Reserve have required data-sharing plans in research proposals since 2003. As of Jan. 25, 2023, the National Institutes of Health has revised its data management and sharing requirements.

This website is designed to provide basic information and best practices to seasoned and new investigators as well as detailed guidance for adhering to the revised NIH policy.  

Basics of Research Data Management

What is research data management?

Research data management (RDM) comprises a set of best practices that include file organization, documentation, storage, backup, security, preservation, and sharing, which affords researchers the ability to more quickly, efficiently, and accurately find, access, and understand their own or others' research data.

Why should you care about research data management?

RDM practices, if applied consistently and as early in a project as possible, can save you considerable time and effort later, when specific data are needed, when others need to make sense of your data, or when you decide to share or otherwise upload your data to a digital repository. Adopting RDM practices will also help you more easily comply with the data management plan (DMP) required for obtaining grants from many funding agencies and institutions.

Does data need to be retained after a project is completed?

Research data must be retained in sufficient detail, and for an adequate period of time, to enable appropriate responses to questions about accuracy, authenticity, primacy, and compliance with laws and regulations governing the conduct of the research. External funding agencies each have different requirements regarding the storage, retention, and availability of research data. Please carefully review your award or agreement for data disposition requirements and data retention policies.

A good data management plan begins with understanding the requirements of the sponsor funding your research. As a principal investigator (PI), it is your responsibility to know your sponsors' requirements. The Data Management Plan Tool (DMPTool) has been designed to help PIs adhere to sponsor requirements efficiently and effectively, and it is strongly recommended that you take advantage of it.

CWRU has an institutional DMPTool account that enables users to access all of its resources via their Single Sign-On credentials. CWRU's DMPTool account is supported by members of the Digital Scholarship team at the Freedman Center for Digital Scholarship. Please use the RDM Intake Request form to schedule a consultation if you would like support or guidance in developing a Data Management Plan.

Some basic steps to get started:

  • Sign into the DMPTool site to start creating a DMP for managing and sharing your data.
  • On the DMPTool site, you can find the most up-to-date templates for creating a DMP for a long list of funders, including the NIH, NEH, NSF, and more.
  • Explore sample DMPs to see examples of successful plans.

Be sure that your DMP addresses any and all federal and/or funder requirements, and the associated DMP templates, that may apply to your project. It is strongly recommended that investigators submitting proposals to the NIH utilize this tool.

The NIH mandates Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing an NIH Data Management Plan has its own dedicated content, providing investigators detailed guidance on developing these plans for inclusion in proposals.

A Data Management Plan can help you create and maintain reliable data and promote project success. DMPs, when carefully constructed and reliably adhered to, help guide elements of your research and data organization.

A DMP can help you:

Document your process and data.

  • Maintain a file with information on researchers and collaborators and their roles; sponsors/funding sources; methods, techniques, protocols, and standards used; instrumentation; software (with versions); references used; and any applicable restrictions on the data's distribution or use.
  • Establish how you will document file changes, name changes, dates of changes, etc. Where will you record these changes? Try to keep this sort of information in a plain text file located in the same folder as the files to which it pertains.
  • How are derived data products created? A DMP encourages consistent description of the data processing performed, the software (including version number) used, and the analyses applied to the data.
  • Establish regular forms or templates for data collection. This helps reduce gaps in your data and promotes consistency throughout the project.

Explain your data

  • From the outset, consider why your data were collected, what the known and expected conditions may be for collection, and information such as time and place, resolution, and standards of data collected.
  • What attributes, fields, or parameters will be studied and included in your data files? Identify and describe these in each file that employs them.
  • For an overview of data dictionaries, see the USGS page here: https://www.usgs.gov/products/data-and-tools/data-management/data-dictionaries

DMP Requirements

Why are you being asked to include a data management plan (DMP) in your grant application? For grants awarded by US governmental agencies, two federal memos from the US Office of Science and Technology Policy (OSTP), issued in 2013 and 2015, respectively, prompted this requirement. These memos mandate public access to federally (and, thus, taxpayer-) funded research results, reflecting a commitment by the government to greater accountability and transparency. While "results" generally refers to the publications and reports produced by a research project, it increasingly refers to the resulting data as well.

Federal research-funding agencies  have responded to the OSTP memos by issuing their own guidelines and requirements for grant applicants (see below), specifying whether and how research data in particular are to be managed in order to be publicly and properly accessible.

  • NSF—National Science Foundation "Proposals submitted or due on or after January 18, 2011, must include a supplementary document of no more than two pages labeled 'Data Management Plan'. This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results." Note: Additional requirements may apply per Directorate, Office, Division, Program, or other NSF unit.
  • NIH—National Institutes of Health "To facilitate data sharing, investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible."
  • NASA—National Aeronautics and Space Administration "The purpose of a Data Management Plan (DMP) is to address the management of data from Earth science missions, from the time of their data collection/observation, to their entry into permanent archives."
  • DOD—Department of Defense "A Data Management Plan (DMP) describing the scientific data expected to be created or gathered in the course of a research project must be submitted to DTIC at the start of each research effort. It is important that DoD researchers document plans for preserving data at the outset, keeping in mind the potential utility of the data for future research or to support transition to operational or other environments. Otherwise, the data is lost as researchers move on to other efforts. The essential descriptive elements of the DMP are listed in section 3 of DoDI 3200.12, although the format of the plan may be adjusted to conform to standards established by the relevant scientific discipline or one that meets the requirements of the responsible Component"
  • Department of Education "The purpose of this document is to describe the implementation of this policy on public access to data and to provide guidance to applicants for preparing the Data Management Plan (DMP) that must outline data sharing and be submitted with the grant application. The DMP should describe a plan to provide discoverable and citable dataset(s) with sufficient documentation to support responsible use by other researchers, and should address four interrelated concerns—access, permissions, documentation, and resources—which must be considered in the earliest stages of planning for the grant."
  • " Office of Scientific and Technical Information (OSTI) Provides access to free, publicly-available research sponsored by the Department of Energy (DOE), including technical reports, bibliographic citations, journal articles, conference papers, books, multimedia, software, and data.

Data Management Best Practices

As you plan to collect data for research, keep in mind the following best practices. 

Keep Your Data Accessible to You

  • Store your temporary working files somewhere easily accessible, like on a local hard drive or shared server.
  • While cloud storage is a convenient solution for storage and sharing, there are often concerns about data privacy and preservation. Be sure to only put data in the cloud that you are comfortable with and that your funding and/or departmental requirements allow.
  • For long-term storage, data should be put into preservation systems that are well-managed. [U]Tech provides several long-term data storage options for cloud and campus. 
  • Don't keep your original data on a thumb drive or portable hard drive, as it can be easily lost or stolen.
  • Think about file formats that have a long life and that are readable by many programs. Formats like ASCII, .txt, .csv, and .pdf are good for long-term preservation.
  • A DMP is not a replacement for good data management practices, but it can set you on the right path if it is consistently followed. Consistently revisit your plan to ensure you are following it and adhering to funder requirements.

Preservation

  • Know the difference between storing and preserving your data. True preservation is the ongoing process of making sure your data are secure and accessible for future generations. Many sponsors have preferred or recommended data repositories. The DMP tool can help you identify these preferred repositories. 
  • Identify data with long-term value. Preserve the raw data and any intermediate/derived products that are expensive to reproduce or can be directly used for analysis. Preserve any scripted code that was used to clean and transform the data.
  • Whenever converting your data from one format to another, keep a copy of the original file and format to avoid loss or corruption of your important files.
  • Online platforms like OSF can help your group organize, version, share, and preserve your data if the sponsor hasn't specified a particular platform.
  • Adhere to federal sponsor requirements on utilizing accepted data repositories (NIH dbGaP, NIH SRA, NIH CRDC, etc.) for preservation. 

Backup, Backup, Backup

  • The general rule is to keep 3 copies of your data: 2 copies onsite, 1 offsite.
  • Back up your data regularly and frequently - automate the process if possible. This may mean weekly duplication of your working files to a separate drive, syncing your folders to a cloud service like Box, or dedicating a block of time every week to ensure you've copied everything to another location.

Organization

  • Establish a consistent, descriptive filing system that is intelligible to future researchers and does not rely on your own inside knowledge of your research.
  • A descriptive directory and file-naming structure should guide users through the contents to help them find whatever they are looking for.

Naming Conventions

  • Use consistent, descriptive filenames that reliably indicate the contents of the file.
  • If your discipline requires or recommends particular naming conventions, use them!
  • Do not use spaces between words. Use either camelCase or underscores to separate words.
  • Include LastnameFirstname descriptors where appropriate.
  • Avoid MM-DD-YYYY date formats; prefer ISO 8601 dates (YYYY-MM-DD), which sort chronologically.
  • Do not append vague descriptors like "latest" or "final" to your file versions. Instead, append the version's date or a consistently iterated version number (see the sketch after this list).
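A minimal sketch of a filename that follows these conventions; the name parts are examples only.

```python
# Build a filename with descriptive, underscore-separated parts, an
# ISO 8601 (YYYY-MM-DD) date, and an explicit version number.
from datetime import date

filename = f"smithJane_surveyResponses_{date.today():%Y-%m-%d}_v02.csv"
print(filename)  # e.g. smithJane_surveyResponses_2024-05-07_v02.csv
```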

Clean Your Data

  • Mistakes happen, and often researchers don't notice at first. If you are manually entering data, be sure to double-check the entries for consistency and duplication. Often having a fresh set of eyes will help to catch errors before they become problems.
  • Tabular data can often be error checked by sorting the fields alphanumerically to catch simple typos, extra spaces, or otherwise extreme outliers. Be sure to save your data before sorting it to ensure you do not disrupt the records!
  • Programs like OpenRefine are useful for checking the consistency of coding across records and variables, catching missing values, transforming data, and much more; a short scripted check (see the sketch after this list) can accomplish similar basics.
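A minimal sketch of such scripted checks; the file and column names are illustrative placeholders.

```python
# Basic consistency checks on tabular data with pandas.
import pandas as pd

df = pd.read_csv("observations.csv")
df["species"] = df["species"].str.strip()        # drop stray whitespace
print(df[df.duplicated()])                       # flag duplicated records
print(sorted(df["species"].dropna().unique()))   # eyeball coding consistency
```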

What should you do if you need assistance implementing RDM practices?

Whether it's because you need discipline-specific metadata standards for your data, help with securing sensitive data, or assistance writing a data management plan for a grant, help is available to you at CWRU. In addition to consulting the resources featured in this guide, you are encouraged to contact your department's liaison librarian.

If you are planning to submit a research proposal and need assistance with budgeting for data storage and/or for applications used to capture, manage, and process data, [U]Tech provides information and assistance, including resource boilerplates that list the centralized resources available.

More specific guidance on budgeting for data management and sharing is included in this document: Budgeting for Data Management and Sharing.

Custody of Research Data

The PI is the custodian of research data, unless otherwise agreed in writing and the agreement is on file with the University, and is responsible for the collection, management, and retention of research data. The PI should adopt an orderly system of data organization and should communicate the chosen system to all members of a research group and to the appropriate administrative personnel, where applicable. Particularly for long-term research projects, the PI should establish and maintain procedures for the protection and management of essential records.

CWRU Custody of Research Data Policy  

Data Sharing

Many funding agencies require data to be shared for the purposes of reproducibility and other important scientific goals. It is important to plan for the timely release and sharing of final research data for use by other researchers. The final release of data should be included as a key deliverable of the DMP. Knowledge of the discipline-specific database, data repository, data enclave, or archive used to disseminate the data should also be documented as needed.

The NIH mandates Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing an NIH Data Management and Sharing Plan has its own dedicated content to provide investigators detailed guidance on developing these plans for inclusion in proposals.

Keeping Up With… Research Data Management

This edition of Keeping Up With... was written by Cathryn F. Miller, Rebekah S. Miller, and Gesina A. Phillips.

Cathryn F. Miller is a Visiting Social Sciences Librarian at Duquesne University, email: [email protected] ; Rebekah S. Miller is a STEM Librarian at Duquesne University, email: [email protected] ; and Gesina A. Phillips is a Digital Scholarship Librarian at Duquesne University, email: [email protected]

What is Research Data Management?

Research Data Management (RDM) is a broad concept that includes processes undertaken to create organized, documented, accessible, and reusable quality research data. [1] The role of the librarian is to support researchers through the research data lifecycle.

Figure 1: USGS Data Lifecycle Model [2]

The processes involved in RDM are more complex than simply backing up data on a thumb drive and ensuring that sensitive data is kept secure. Managing data includes using file naming conventions, organizing files, creating metadata, controlling access to data, backing up data, citing data, and more. There are checklists online which point to the considerations and processes in RDM (see UK Data Services Checklist [3] and DCC checklist [4]).

RDM is a relevant topic to keep up with at a time when researchers are increasingly required to create data management plans, provide methodological transparency, and share data. [5]

Making the Case for RDM

While many researchers are interested in data management, there are some who may not see a need for it at all. If that’s the case, there are both carrots and sticks that can be used to encourage them.

  • Save Time: Properly managing data is in a researcher’s best interest; being able to locate past or current data and accompanying metadata saves time, frustration, and money.
  • Increase Citations: Well-managed data is easy to share, which can lead to the data itself being cited; making the data available may also lead to more citations for the original paper. [6]
  • Enhance Reproducibility: Data management enhances reproducibility by making the methodology more transparent.
  • Preserve Data: While data management encourages researchers to consider backup and security measures, it also ensures that data is preserved, not just stored. Preservation focuses on the long-term ability to access and use data, and considers interoperability and open file formats.
  • Required Sharing: Funders and journals have begun to make data sharing a condition of publication or award acceptance; Nature, PLoS One, and the American Journal of Political Science all have data sharing requirements.
  • Required Data Management Plans: A variety of government funders (e.g., NSF, NIH) and private funders (e.g., Bill & Melinda Gates Foundation [7]) require data management plans and/or data sharing.
  • Prevent Retraction: Accessible data protects against retraction; the New England Journal of Medicine retracted an article after the underlying data could not be located, [8] as did Cell Cycle. [9]

The Role of the Librarian

Librarians need to provide RDM services that take into account the “interests and needs” of a university community that includes graduate students, faculty, and research staff. [10] Understanding the current practices, knowledge, and desired support at a university is therefore key to developing and maintaining relevant services.

  • Services and Education: How are librarians serving researchers? Common RDM services include helping researchers to deposit data in institutional and disciplinary repositories, assisting with data management plans, and consulting with research teams. [11] Librarians also serve the research community by creating workshops, webinars, and tutorials. Providing RDM services does not require librarians to become experts in statistics, programming, or IRB proposals, but instead to develop a robust understanding of the tools and support mechanisms available on campus. This may prompt collaboration with campus computing or statistical support services.
  • Information: Providing background information as well as advanced information about RDM through LibGuides, newsletters, or webpages is also important. When communicating information, it is important to limit jargon and use terminology that researchers themselves are familiar with. More than 50% of the academic libraries that have an RDM presence online provide information related to creating data management plans, data documentation, metadata standards, storage, and preservation. [12]

Understanding the “needs and interests” of researchers can guide the development of services, information objects, and instruction. Communicating the role of the librarian, marketing services, and evaluating services are key to staying relevant.

Learning about RDM

Because RDM can be a complex process with many different considerations, learning about RDM through a series of modules is recommended. For those familiar with basic RDM concepts, reading research journals and engaging in online communities is key.

Training modules (see https://nnlm.gov/data/courses-and-workshops for a complete list):

  • ESIP Federation: 35 training videos about very specific topics, from “Tracking Data Usage” to “Handling Sensitive Data” ( http://commons.esipfed.org/datamanagementshortcourse )
  • DataONE Data Management Modules: 10 PowerPoint modules accompanied by handouts and hands-on exercises ( https://www.dataone.org/education )
  • NYU RDM Training for Information Professionals : 8 tutorials about RDM in a biomedical context ( https://compass.iime.cloud//mix/G3X5E )

Research journals:

  • Journal of eScience Librarianship
  • International Journal of Digital Curation

Online communities/websites (see Barbrow, Brush, and Goldman [13] for a complete list):

  • NNLM RD3 ( https://nnlm.gov/data )
  • Digital Curation Centre ( http://www.dcc.ac.uk )

What specific processes should researchers engage in throughout the data lifecycle? The answer to this question varies by discipline, by research project, by size of the data collected, and by researcher; the RDM practices involved in an ethnographic study will be very different from those involved in clinical research. RDM is complex, ambiguous, and imperfect because of the complexity of research itself. Supporting research throughout the data lifecycle by consulting with researchers and promoting best practices can be challenging, but will improve data quality, reproducibility, and shareability.

Additional Resources / Tools

Creating a Data Management Plan: DMPTool ( dmptool.org ), DMPonline ( dmponline.dcc.ac.uk )

Workflow and organization: REDCap ( project-redcap.org ), Open Science Framework ( https://osf.io )

Sharing/publishing: Registry of Research Data Repositories ( re3data.org ), figshare ( figshare.com )

Examples of library RDM presence:

  • University of Minnesota ( https://www.lib.umn.edu/datamanagement )
  • Duquesne University ( http://guides.library.duq.edu/datamanagement )
  • Columbia University ( https://scholcomm.columbia.edu/data-management )

ACRL Workshop: RDM ( https://acrl.libguides.com/scholcomm/toolkit/RDMWorkshop )

Evaluation tool for RDM tutorials or workshops: DataOne EEVA ( https://www.dataone.org/education-evaluation )

[1] Louise Corti, Veerle Van den Eynden, Libby Bishop, and Matthew Woollard, Managing and Sharing Research Data: A Guide to Good Practice (Los Angeles: Sage, 2014), 2.

[2] “The USGS Data Lifecycle,” US Geological Survey, accessed April 9, 2018, https://www2.usgs.gov/datamanagement/why.php .

[3] “Data Management Checklist,” UK Data Services, accessed April 8, 2018, https://www.ukdataservice.ac.uk/manage-data/plan/checklist .

[4] “Checklist for a Data Management Plan v.4.0,” Digital Curation Centre, last modified 2014, http://www.dcc.ac.uk/resources/data-management-plans/checklist .

[5] Lisa Federer, “Research Data Management in the Age of Big Data: Roles and Opportunities for Librarians,” Information Services & Use 36 (2016): 35.

[6] Heather A. Piwowar and Todd J. Vision, “Data Reuse and the Open Data Citation Advantage,” PeerJ 1 (October 2013): e175, https://dx.doi.org/10.7717/peerj.175 .

[7] “Bill & Melinda Gates Foundation Open Access Policy,” Bill & Melinda Gates Foundation, accessed April 9, 2018, https://www.gatesfoundation.org/how-we-work/general-information/open-access-policy .

[8] “Retraction: CPAP for the Metabolic Syndrome in Patients with Obstructive Sleep Apnea. N Engl J Med 2011;365:2277-86,” New England Journal of Medicine 369 (2013): 1770, http://www.nejm.org/doi/10.1056/NEJMc1313105 .

[9] “Editorial retraction,” Cell Cycle 16, no. 3 (2017): 296, https://doi.org/10.1080/15384101.2016.1205369 .

[10] Travis Weller and Amalia Monroe-Gulick, "Differences in the Data Practices, Challenges, and Future Needs of Graduate Students and Faculty Members," Journal of eScience Librarianship 4 (2015): e1070, http://dx.doi.org/10.7191/jeslib.2015.1070 .

[11] Ayoung Yoon and Teresa Schultz, "Research Data Management Services in Academic Libraries in the US: A Content Analysis of Libraries’ Websites," College & Research Libraries 78, no. 7 (2017): 925, https://doi.org/10.5860/crl.78.7.920 .

[12] Ibid., 926-927.

[13] Sarah Barbrow, Denise Brush, and Julie Goldman, “Research Data Management and Services: Resources for Novice Data Librarians,” C&RL News 78, no. 5 (May 2017), https://doi.org/10.5860/crln.78.5.274 .

37 Research Topics In Data Science To Stay On Top Of

Stewart Kaplan

February 22, 2024

As a data scientist, staying on top of the latest research in your field is essential.

The data science landscape changes rapidly, and new techniques and tools are constantly being developed.

To keep up with the competition, you need to be aware of the latest trends and topics in data science research.

In this article, we will provide an overview of 37 hot research topics in data science.

We will discuss each topic in detail, including its significance and potential applications.

These topics could be an idea for a thesis or simply topics you can research independently.

Stay tuned – this is one blog post you don’t want to miss!

37 Research Topics in Data Science

1.) Predictive Modeling

Predictive modeling is a significant portion of data science and a topic you must be aware of.

Simply put, it is the process of using historical data to build models that can predict future outcomes.

Predictive modeling has many applications, from marketing and sales to financial forecasting and risk management.

As businesses increasingly rely on data to make decisions, predictive modeling is becoming more and more important.

While it can be complex, predictive modeling is a powerful tool that gives businesses a competitive advantage.
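
To make the idea concrete, here is a minimal scikit-learn sketch that trains a classifier on synthetic "historical" data and evaluates how well it predicts outcomes on a holdout set; the synthetic dataset is a stand-in for real business data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for historical data: rows = past observations, y = outcome
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # predicted probability of the outcome
print("holdout AUC:", roc_auc_score(y_test, probs))
```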

2.) Big Data Analytics

These days, it seems like everyone is talking about big data.

And with good reason – organizations of all sizes are sitting on mountains of data, and they’re increasingly turning to data scientists to help them make sense of it all.

But what exactly is big data? And what does it mean for data science?

Simply put, big data is a term used to describe datasets that are too large and complex for traditional data processing techniques.

Big data typically refers to datasets of a few terabytes or more.

But size isn’t the only defining characteristic – big data is also characterized by the “three Vs”: Volume (the sheer amount of data), Velocity (the speed at which data is generated), and Variety (the different types of data).

Given the enormity of big data, it’s not surprising that organizations are struggling to make sense of it all.

That’s where data science comes in.

Data scientists use various methods to wrangle big data, including distributed computing and other decentralized technologies.

With the help of data science, organizations are beginning to unlock the hidden value in their big data.

By harnessing the power of big data analytics, they can improve their decision-making, better understand their customers, and develop new products and services.
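
As one illustration of the distributed-computing approach, here is a minimal PySpark sketch. The file name is a placeholder; in a real deployment the data would typically live as partitioned files in a distributed store such as HDFS or S3, far too large for one machine's memory:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bigDataSketch").getOrCreate()

# "transactions.csv" is a hypothetical dataset with date and amount columns
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

# The aggregation is expressed once and executed in parallel across the cluster
daily = (df.groupBy("date")
           .agg(F.count("*").alias("n_txns"), F.sum("amount").alias("total")))
daily.orderBy("date").show(10)

spark.stop()
```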

3.) Auto Machine Learning

Auto machine learning is a research topic in data science concerned with developing algorithms that can automatically build and tune models from data with minimal human intervention.

This area of research is vital because it automates much of the repetitive model-building work that would otherwise have to be redone for every new dataset.

This allows us to focus on other tasks, such as model selection and validation.

Auto machine learning algorithms can learn from data in a hands-off way for the data scientist – while still providing incredible insights.

This makes them a valuable tool for data scientists who don’t have the time or the specialized skills to hand-tune every model.
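
Full AutoML systems (e.g., auto-sklearn or TPOT) search over entire pipelines, but a cross-validated hyperparameter search captures the core idea in miniature. This scikit-learn sketch automatically picks a model configuration without hand-tuning; the search grid is an arbitrary example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Automatically search a small hyperparameter space with cross-validation
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3, 5],
                "learning_rate": [0.05, 0.1]},
    cv=5, scoring="accuracy", n_jobs=-1,
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```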

4.) Text Mining

Text mining is a research topic in data science that deals with extracting useful information from text data.

This area of research is important because it allows us to get as much information as possible from the vast amount of text data available today.

Text mining techniques can extract information from text data, such as keywords, sentiments, and relationships.

This information can be used for various purposes, such as model building and predictive analytics.
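
A minimal sketch of one such technique, keyword extraction via TF-IDF weights, using scikit-learn on a few toy documents:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The spacecraft entered orbit after a two-year journey.",
    "Orbit insertion burns require precise navigation data.",
    "The bakery's new sourdough recipe drew long lines downtown.",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)
terms = np.array(vec.get_feature_names_out())

# The top-weighted terms per document serve as crude extracted keywords
for i, doc in enumerate(docs):
    row = tfidf[i].toarray().ravel()
    print(doc[:40], "->", terms[row.argsort()[::-1][:3]])
```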

5.) Natural Language Processing

Natural language processing is a data science research topic that analyzes human language data.

This area of research is important because it allows us to understand and make sense of the vast amount of text data available today.

Natural language processing techniques can build predictive and interactive models from any language data.

Natural language processing is pretty broad, and recent advances like GPT-3 have pushed this topic to the forefront.
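
As a tiny illustration, the Hugging Face transformers library exposes pretrained models behind a one-line pipeline. This sketch assumes the library is installed and will download a default sentiment model on first use:

```python
# pip install transformers  (downloads a pretrained model on first run)
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("The new results are a huge step forward for the field."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```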

6.) Recommender Systems

Recommender systems are an exciting topic in data science because they allow us to make better recommendations for products, services, and content.

Businesses can better understand their customers and their needs by using recommender systems.

This, in turn, allows them to develop better products and services that meet the needs of their customers.

Recommender systems are also used to recommend content to users.

This can be done on an individual level or at a group level.

Think about Netflix, for example, always knowing what you want to watch!

Recommender systems are a valuable tool for businesses and users alike.
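
Here is a minimal item-based collaborative filtering sketch on a made-up ratings matrix. Real systems use far larger, sparser data and more sophisticated models, but the core "similar items get recommended" idea is the same:

```python
import numpy as np

# Toy user-item ratings matrix (rows = users, cols = items, 0 = unrated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Cosine similarity between item columns
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

def recommend(user: int) -> int:
    """Pick the unrated item with the highest similarity-weighted score."""
    scores = sim @ R[user]
    scores[R[user] > 0] = -np.inf  # exclude items the user already rated
    return int(np.argmax(scores))

print("recommended item for user 0:", recommend(0))
```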

7.) Deep Learning

Deep learning is a research topic in data science that deals with artificial neural networks.

These networks are composed of multiple layers, and each layer is formed from various nodes.

Deep learning networks can learn rich representations directly from raw data, loosely analogous to how humans learn.

This makes them a valuable tool for data scientists looking to build models that can learn from data independently.

Deep learning has become very popular in recent years because of its ability to achieve state-of-the-art results on a wide variety of tasks.

There seems to be a new SOTA deep learning algorithm research paper on  https://arxiv.org/  every single day!
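
As a minimal sketch, here is a tiny multi-layer network trained on synthetic data with PyTorch; each nn.Linear is one layer of nodes, and training adjusts the weights by gradient descent on the loss:

```python
import torch
from torch import nn

# A tiny multi-layer network: each nn.Linear is a layer of nodes
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

X = torch.randn(256, 10)                       # synthetic inputs
y = (X.sum(dim=1, keepdim=True) > 0).float()   # synthetic labels

loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):  # gradient descent on the loss
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print("final training loss:", loss.item())
```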

8.) Reinforcement Learning

Reinforcement learning is a research topic in data science that deals with algorithms that learn from interactions with their environment, guided by rewards and penalties.

This area of research is essential because it allows us to develop algorithms that can learn non-greedy approaches to decision-making, allowing businesses and companies to win in the long term rather than just the short term.
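
A minimal sketch of the exploration-versus-exploitation idea: an epsilon-greedy agent on a toy multi-armed bandit, which must occasionally act non-greedily to discover the best long-run option. The payout probabilities here are made up:

```python
import random

# Toy bandit: arm 2 pays best on average, but the agent must explore
# (act non-greedily) to discover that instead of settling on an early winner
true_means = [0.3, 0.5, 0.8]
q = [0.0] * 3   # estimated value of each arm
n = [0] * 3     # pull counts
epsilon = 0.1

for step in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)                 # explore
    else:
        arm = max(range(3), key=lambda a: q[a])   # exploit current estimate
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]          # incremental mean update

print("estimated arm values:", [round(v, 2) for v in q])
```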

9.) Data Visualization

Data visualization is an excellent research topic in data science because it allows us to see our data in a way that is easy to understand.

Data visualization techniques can be used to create charts, graphs, and other visual representations of data.

This allows us to see the patterns and trends hidden in our data.

Data visualization is also used to communicate results to others.

This allows us to share our findings with others in a way that is easy to understand.

There are many ways to contribute to and learn about data visualization.

Some ways include attending conferences, reading papers, and contributing to open-source projects.
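
As a small matplotlib sketch, the following plots a noisy synthetic series against its mean so the underlying trend becomes visible; the data is randomly generated for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(100)
y = np.cumsum(rng.normal(0.2, 1.0, size=100))  # synthetic trend + noise

fig, ax = plt.subplots(figsize=(7, 3))
ax.plot(x, y, label="metric")
ax.axhline(y.mean(), linestyle="--", color="gray", label="mean")
ax.set(xlabel="day", ylabel="value", title="A trend hidden in noisy data")
ax.legend()
plt.tight_layout()
plt.show()
```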

10.) Predictive Maintenance

Predictive maintenance is a hot topic in data science because it allows us to prevent failures before they happen.

This is done using data analytics to predict when a failure will occur.

This allows us to take corrective action before the failure actually happens.

While this sounds simple, avoiding false positives while keeping recall high is challenging, and the area is wide open for advancement.
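
The precision/recall tension is easy to see in code. This sketch trains a classifier on synthetic, imbalanced "failure" data and sweeps the alert threshold; the dataset and thresholds are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic sensor features; y=1 means the machine failed soon after
X, y = make_classification(n_samples=4000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Sweeping the alert threshold exposes the false-positive / recall trade-off
for threshold in (0.1, 0.3, 0.5):
    pred = (probs >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_te, pred, zero_division=0):.2f}, "
          f"recall={recall_score(y_te, pred):.2f}")
```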

11.) Financial Analysis

Financial analysis is an older topic that has been around for a while but is still a great field where contributions can be felt.

Current researchers are focused on analyzing macroeconomic data to make better financial decisions.

This is done by analyzing the data to identify trends and patterns.

Financial analysts can use this information to make informed decisions about where to invest their money.

Financial analysis is also used to predict future economic trends.

This allows businesses and individuals to prepare for potential financial hardships and enables companies to build cash reserves during good economic conditions.

Overall, financial analysis is a valuable tool for anyone looking to make better financial decisions.

12.) Image Recognition

Image recognition is one of the hottest topics in data science because it allows us to identify objects in images.

This is done using artificial intelligence algorithms that can learn from data and understand what objects you’re looking for.

This allows us to build models that can accurately recognize objects in images and video.

This is a valuable tool for businesses and individuals who want to be able to identify objects in images.

Think about security, identification, routing, traffic, etc.

Image Recognition has gained a ton of momentum recently – for a good reason.

13.) Fraud Detection

Fraud detection is a great topic in data science because it allows us to identify fraudulent activity before it happens.

This is done by analyzing data to look for patterns and trends that may be associated with fraud.

Once our machine learning model recognizes some of these patterns in real time, it immediately detects fraud.

This allows us to take corrective action before the fraud actually happens.

Fraud detection is a valuable tool for anyone who wants to protect themselves from potential fraudulent activity.
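
One common unsupervised approach is anomaly detection. Here is a minimal scikit-learn IsolationForest sketch on synthetic transactions; the amounts and the contamination rate are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(50, 10, size=(1000, 2))  # typical transaction features
fraud = rng.normal(120, 5, size=(10, 2))     # a few extreme ones
X = np.vstack([normal, fraud])

clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = clf.predict(X)  # -1 = anomaly, 1 = normal
print("flagged transactions:", np.where(flags == -1)[0])
```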

14.) Web Scraping

Web scraping is a controversial topic in data science because it allows us to collect data from the web, which is usually data you do not own.

This is done by extracting data from websites using scraping tools that are usually custom-programmed.

This allows us to collect data that would otherwise be inaccessible.

For obvious reasons, web scraping is a unique tool – giving you data your competitors would have no chance of getting.

I think there is an excellent opportunity to create new and innovative ways to make scraping accessible for everyone, not just those who understand Selenium and Beautiful Soup.
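
For completeness, here is a minimal requests + BeautifulSoup sketch. The URL is a placeholder, and you should always respect a site's robots.txt and terms of service before scraping:

```python
# pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

# example.com is a placeholder; check robots.txt and terms of service first
resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
print(soup.title.string)                    # page title
for link in soup.find_all("a", href=True):  # extract outbound links
    print(link["href"])
```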

15.) Social Media Analysis

Social media analysis is not new; many people have already created exciting and innovative algorithms to study this.

However, it is still a great data science research topic because it allows us to understand how people interact on social media.

This is done by analyzing data from social media platforms to look for insights, bots, and recent societal trends.

Once we understand these practices, we can use this information to improve our marketing efforts.

For example, if we know that a particular demographic prefers a specific type of content, we can create more content that appeals to them.

Social media analysis is also used to understand how people interact with brands on social media.

This allows businesses to understand better what their customers want and need.

Overall, social media analysis is valuable for anyone who wants to improve their marketing efforts or understand how customers interact with brands.

16.) GPU Computing

GPU computing is a fun new research topic in data science because it allows us to process data much faster than traditional CPUs.

Because of their massively parallel design, GPUs are incredibly proficient at intense matrix operations, outperforming traditional CPUs by very high margins.

While the computation is fast, the coding is still tricky.

There is an excellent research opportunity to bring these innovations to non-deep-learning workloads, allowing data science to take advantage of GPU computing outside of deep learning.
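
A minimal sketch with CuPy, a NumPy-like library that executes array operations on an NVIDIA GPU; this assumes a CUDA-capable GPU and a matching CuPy installation:

```python
# pip install cupy  (requires an NVIDIA GPU and a matching CUDA toolkit)
import cupy as cp

a = cp.random.rand(4000, 4000, dtype=cp.float32)
b = cp.random.rand(4000, 4000, dtype=cp.float32)

c = a @ b                          # matrix multiply runs on the GPU
cp.cuda.Stream.null.synchronize()  # wait for the kernel to finish
print(float(c.sum()))              # pull a scalar result back to the host
```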

17.) Quantum Computing

Quantum computing is a new research topic in data science and physics because, for certain classes of problems, it promises to process data far faster than classical computers.

It also opens the door to new types of data.

There are some problems that simply cannot be solved efficiently on a classical computer.

For example, simulating the detailed quantum-mechanical behavior of even a single atom quickly overwhelms classical machines.

You’ll need to utilize a quantum computer to handle such quantum mechanics problems.

This may be the “hottest” research topic on the planet right now, with some of the top researchers in computer science and physics worldwide working on it.

You could be too.

18.) Genomics

Genomics may be the only research topic that can compete with quantum computing regarding the “number of top researchers working on it.”

Genomics is a fantastic intersection of data science because it allows us to understand how genes work.

This is done by sequencing the DNA of different organisms to look for insights into our own and other species.

Once we understand these patterns, we can use this information to improve our understanding of diseases and create new and innovative treatments for them.

Genomics is also used to study the evolution of different species.

Genomics is the future and a field begging for new and exciting research professionals to take it to the next step.

19.) Location-based services

Location-based services are an old and time-tested research topic in data science.

Since GPS and 4G cell phone reception became widespread, we’ve been trying to stay informed about how humans interact with their environment.

This is done by analyzing data from GPS tracking devices, cell phone towers, and Wi-Fi routers to look for insights into how humans interact.

Once we understand these practices, we can use this information to improve our geotargeting efforts, improve maps, find faster routes, and improve cohesion throughout a community.

Location-based services are used to understand the user, something every business could always use a little bit more of.

While a seemingly “stale” field, location-based services have seen a revival period with self-driving cars.

20.) Smart City Applications

Smart city applications are all the rage in data science research right now.

By harnessing the power of data, cities can become more efficient and sustainable.

But what exactly are smart city applications?

In short, they are systems that use data to improve city infrastructure and services.

This can include anything from traffic management and energy use to waste management and public safety.

Data is collected from various sources, including sensors, cameras, and social media.

It is then analyzed to identify tendencies and habits.

This information can make predictions about future needs and optimize city resources.

As more and more cities strive to become “smart,” the demand for data scientists with expertise in smart city applications is only growing.

21.) Internet Of Things (IoT)

The Internet of Things, or IoT, is an exciting new data science and sustainability research topic.

IoT is a network of physical objects embedded with sensors and connected to the internet.

These objects can include everything from alarm clocks to refrigerators; they’re all connected to the internet.

That means that they can share data with computers.

And that’s where data science comes in.

Data scientists are using IoT data to learn everything from how people use energy to how traffic flows through a city.

They’re also using IoT data to predict when an appliance will break down or when a road will be congested.

Really, the possibilities are endless.

With such a wide-open field, it’s easy to see why IoT is being researched by some of the top professionals in the world.

22.) Cybersecurity

Cybersecurity is a relatively new research topic in data science and in general, but it’s already garnering a lot of attention from businesses and organizations.

After all, with the increasing number of cyber attacks in recent years, it’s clear that we need to find better ways to protect our data.

While most of cybersecurity focuses on infrastructure, data scientists can leverage historical events to find potential exploits and protect their companies.

Sometimes, looking at a problem from a different angle helps, and that’s what data science brings to cybersecurity.

Also, data science can help to develop new security technologies and protocols.

As a result, cybersecurity is a crucial data science research area and one that will only become more important in the years to come.

23.) Blockchain

Blockchain is an incredible new research topic in data science for several reasons.

First, it is a distributed database technology that enables secure, transparent, and tamper-proof transactions.

Did someone say transmitting data?

This makes it an ideal platform for tracking data and transactions in various industries.

Second, blockchain is powered by cryptography, which not only makes it highly secure but is also familiar territory for data scientists.

Finally, blockchain is still in its early stages of development, so there is much room for research and innovation.

As a result, blockchain is a great new research topic in data science that promises to revolutionize how we store, transmit, and manage data.
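
The tamper-evident property comes from hash chaining, which fits in a few lines of Python. This toy sketch is not a real blockchain (no consensus protocol, no network), but it shows why altering an old block invalidates everything that follows:

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Hash the block's contents (everything except its own stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(data: dict, prev_hash: str) -> dict:
    block = {"data": data, "timestamp": time.time(), "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

chain = [make_block({"genesis": True}, "0" * 64)]
chain.append(make_block({"from": "alice", "to": "bob", "amount": 5}, chain[0]["hash"]))

# Tampering with an earlier block breaks the chain: its stored hash no longer
# matches its contents, and the next block's prev_hash no longer matches either
chain[0]["data"]["genesis"] = False
print("block 0 still valid?", block_hash(chain[0]) == chain[0]["hash"])  # False
```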

24.) Sustainability

Sustainability is a relatively new research topic in data science, but it is gaining traction quickly.

To keep up with this demand, The Wharton School of the University of Pennsylvania has started to offer an MBA in Sustainability .

This demand isn’t shocking, and some of the reasons include the following:

Sustainability is an important issue that is relevant to everyone.

Datasets on sustainability are constantly growing and changing, making it an exciting challenge for data scientists.

There hasn’t been a “set way” to approach sustainability from a data perspective, making it an excellent opportunity for interdisciplinary research.

As data science grows, sustainability will likely become an increasingly important research topic.

25.) Educational Data

Education has always been a great topic for research, and with the advent of big data, educational data has become an even richer source of information.

By studying educational data, researchers can gain insights into how students learn, what motivates them, and what barriers these students may face.

In addition, data science can be used to develop educational interventions tailored to individual students’ needs.

Imagine being the researcher that helps that high schooler pass mathematics; what an incredible feeling.

With the increasing availability of educational data, data science has enormous potential to improve the quality of education.

26.) Politics

As data science continues to evolve, so does the scope of its applications.

Originally used primarily for business intelligence and marketing, data science is now applied to various fields, including politics.

By analyzing large data sets, political scientists (data scientists with a cooler name) can gain valuable insights into voting patterns, campaign strategies, and more.

Further, data science can be used to forecast election results and understand the effects of political events on public opinion.

With the wealth of data available, there is no shortage of research opportunities in this field.

As data science evolves, so does our understanding of politics and its role in our world.

27.) Cloud Technologies

Cloud technologies are a great research topic.

They allow for the outsourcing and sharing of computing resources and applications over the internet.

This lets organizations save money on hardware and maintenance costs while providing employees access to the latest and greatest software and applications.

I believe there is an argument that AWS could be the greatest and most technologically advanced business ever built (Yes, I know it’s only part of the company).

In addition, cloud technologies can help improve team members’ collaboration by allowing them to share files and work on projects together in real time.

As more businesses adopt cloud technologies, data scientists must stay up-to-date on the latest trends in this area.

By researching cloud technologies, data scientists can help organizations to make the most of this new and exciting technology.

28.) Robotics

Robotics has recently become a household name, and it’s for a good reason.

First, robotics deals with controlling and planning physical systems, an inherently complex problem.

Second, robotics requires various sensors and actuators to interact with the world, making it an ideal application for machine learning techniques.

Finally, robotics is an interdisciplinary field that draws on various disciplines, such as computer science, mechanical engineering, and electrical engineering.

As a result, robotics is a rich source of research problems for data scientists.

29.) HealthCare

Healthcare is an industry that is ripe for data-driven innovation.

Hospitals, clinics, and health insurance companies generate a tremendous amount of data daily.

This data can be used to improve the quality of care and outcomes for patients.

This is perfect timing, as the healthcare industry is undergoing a significant shift towards value-based care, which means there is a greater need than ever for data-driven decision-making.

As a result, healthcare is an exciting new research topic for data scientists.

There are many different ways in which data can be used to improve healthcare, and there is a ton of room for newcomers to make discoveries.

30.) Remote Work

There’s no doubt that remote work is on the rise.

In today’s global economy, more and more businesses are allowing their employees to work from home or anywhere else they can get a stable internet connection.

But what does this mean for data science? Well, for one thing, it opens up a whole new field of research.

For example, how does remote work impact employee productivity?

What are the best ways to manage and collaborate on data science projects when team members are spread across the globe?

And what are the cybersecurity risks associated with working remotely?

These are just a few of the questions that data scientists will be able to answer with further research.

So if you’re looking for a new topic to sink your teeth into, remote work in data science is a great option.

31.) Data-Driven Journalism

Data-driven journalism is an exciting new field of research that combines the best of both worlds: the rigor of data science with the creativity of journalism.

By applying data analytics to large datasets, journalists can uncover stories that would otherwise be hidden.

And telling these stories compellingly can help people better understand the world around them.

Data-driven journalism is still in its infancy, but it has already had a major impact on how news is reported.

In the future, it will only become more important as data becomes increasingly central to journalism.

It is an exciting new topic and research field for data scientists to explore.

32.) Data Engineering

Data engineering is a staple in data science, focusing on efficiently managing data.

Data engineers are responsible for developing and maintaining the systems that collect, process, and store data.

In recent years, there has been an increasing demand for data engineers as the volume of data generated by businesses and organizations has grown exponentially.

Data engineers must be able to design and implement efficient data-processing pipelines and have the skills to optimize and troubleshoot existing systems.

If you are looking for a challenging research topic that would have an immediate worldwide impact, then improving or innovating a new approach in data engineering would be a good start.

33.) Data Curation

Data curation has been a hot topic in the data science community for some time now.

Curating data involves organizing, managing, and preserving data so researchers can use it.

Data curation can help to ensure that data is accurate, reliable, and accessible.

It can also help to prevent research duplication and to facilitate the sharing of data between researchers.

Data curation is a vital part of data science. In recent years, there has been an increasing focus on data curation, as it has become clear that it is essential for ensuring data quality.

As a result, data curation is now a major research topic in data science.

There are numerous books and articles on the subject, and many universities offer courses on data curation.

Data curation is an integral part of data science and will only become more important in the future.

34.) Meta-Learning

Meta-learning is gaining a ton of steam in data science. It’s learning how to learn.

So, if you can learn how to learn, you can learn anything much faster.

Meta-learning is mainly used in deep learning, as applications outside of this are generally pretty hard.

In deep learning, many parameters need to be tuned for a good model, and there’s usually a lot of data.

You can save time and effort if you can automatically and quickly do this tuning.

In machine learning, meta-learning can improve models’ performance by sharing knowledge between different models.

For example, if you have a bunch of different models that all solve the same problem, you can use meta-learning to share knowledge between them and improve the group’s overall performance.

I don’t know how anyone looking for a research topic could stay away from this field; it’s what the  Terminator  warned us about!

35.) Data Warehousing

A data warehouse is a system used for data analysis and reporting.

It is a central data repository created by combining data from multiple sources.

Data warehouses are often used to store historical data, such as sales data, financial data, and customer data.

This data type can be used to create reports and perform statistical analysis.

Data warehouses also store data that the organization is not currently using.

This type of data can be used for future research projects.

Data warehousing is an incredible research topic in data science because it offers a variety of benefits.

Data warehouses help organizations to save time and money by reducing the need for manual data entry.

They also help to improve the accuracy of reports and provide a complete picture of the organization’s performance.

Data warehousing feels like one of the weakest parts of the Data Science Technology Stack; if you want a research topic that could have a monumental impact – data warehousing is an excellent place to look.

36.) Business Intelligence

Business intelligence aims to collect, process, and analyze data to help businesses make better decisions.

Business intelligence can improve marketing, sales, customer service, and operations.

It can also be used to identify new business opportunities and track competition.

BI is, at its core, another tool in your company’s toolbox for continuing to dominate your market.

Data science is the perfect tool for business intelligence because it combines statistics, computer science, and machine learning.

Data scientists can use business intelligence to answer questions like, “What are our customers buying?” or “What are our competitors doing?” or “How can we increase sales?”

Business intelligence is a great way to improve your business’s bottom line and an excellent opportunity to dive deep into a well-respected research topic.

37.) Crowdsourcing

One of the newest areas of research in data science is crowdsourcing.

Crowdsourcing is a process of sourcing tasks or projects to a large group of people, typically via the internet.

This can be done for various purposes, such as gathering data, developing new algorithms, or even just for fun (think: online quizzes and surveys).

But what makes crowdsourcing so powerful is that it allows businesses and organizations to tap into a vast pool of talent and resources they wouldn’t otherwise have access to.

And with the rise of social media, it’s easier than ever to connect with potential crowdsource workers worldwide.

Imagine if you could affect that by finding innovative ways to improve how people work together.

That would have a huge effect.

Final Thoughts, Are These Research Topics In Data Science For You?

Thirty-seven different research topics in data science are a lot to take in, but we hope you found a research topic that interests you.

If not, don’t worry – there are plenty of other great topics to explore.

The important thing is to get started with your research and find ways to apply what you learn to real-world problems.

We wish you the best of luck as you begin your data science journey!

Other Data Science Articles

We love talking about data science; here are a couple of our favorite articles:

  • Why Are You Interested In Data Science?

214 Best Big Data Research Topics for Your Thesis Paper

Finding an ideal big data research topic can take you a long time. Big data, IoT, and robotics have evolved rapidly, and future generations will be immersed in major technologies that make work easier. Work that once required 10 people can now be done by one person or a machine. This is amazing because, even though some jobs will be lost, more will be created. It is a win-win for everyone.

Big data is a major topic that is being embraced globally. Data science and analytics are helping institutions, governments, and the private sector. We will share with you the best big data research topics.

On top of that, we can offer you the best writing tips to ensure you prosper well in your academics. As students in the university, you need to do proper research to get top grades. Hence, you can consult us if in need of research paper writing services.

Big Data Analytics Research Topics for your Research Project

Are you looking for an ideal big data analytics research topic? Once you choose a topic, consult your professor to evaluate whether it is a great topic. This will help you to get good grades.

  • Which are the best tools and software for big data processing?
  • Evaluate the security issues that face big data.
  • An analysis of large-scale data for social networks globally.
  • The influence of big data storage systems.
  • The best platforms for big data computing.
  • The relation between business intelligence and big data analytics.
  • The importance of semantics and visualization of big data.
  • Analysis of big data technologies for businesses.
  • The common methods used for machine learning in big data.
  • The difference between self-tuning and symmetrical spectral clustering.
  • The importance of information-based clustering.
  • Evaluate the hierarchical clustering and density-based clustering application.
  • How is data mining used to analyze transaction data?
  • The major importance of dependency modeling.
  • The influence of probabilistic classification in data mining.

Interesting Big Data Analytics Topics

Who said big data had to be boring? Here are some interesting big data analytics topics that you can try. They are based on real-world phenomena that help make the world a better place.

  • Discuss the privacy issues in big data.
  • Evaluate scalable storage systems for big data.
  • The best big data processing software and tools.
  • Popularly used data mining tools and techniques.
  • Evaluate the scalable architectures for parallel data processing.
  • The major natural language processing methods.
  • Which are the best big data tools and deployment platforms?
  • The best algorithms for data visualization.
  • Analyze anomaly detection in cloud servers.
  • The screening normally done when recruiting for big data job profiles.
  • The malicious user detection in big data collection.
  • Learning long-term dependencies via the Fourier recurrent units.
  • Nomadic computing for big data analytics.
  • The elementary estimators for graphical models.
  • The memory-efficient kernel approximation.

Big Data Latest Research Topics

Do you know the latest research topics at the moment? These 15 topics will help you to dive into interesting research. You may even build on research done by other scholars.

  • Evaluate the data mining process.
  • The influence of the various dimension reduction methods and techniques.
  • The best data classification methods.
  • The simple linear regression modeling methods.
  • Evaluate the logistic regression modeling.
  • What are the commonly used theorems?
  • The influence of cluster analysis methods in big data.
  • The importance of smoothing methods analysis in big data.
  • How is fraud detection done through AI?
  • Analyze the use of GIS and spatial data.
  • How important is artificial intelligence in the modern world?
  • What is agile data science?
  • Analyze the behavioral analytics process.
  • Semantic analytics distribution.
  • How is domain knowledge important in data analysis?

Big Data Debate Topics

If you want to prosper in the field of big data, you need to try even hard topics. These big data debate topics are interesting and will help you to get a better understanding.

  • The difference between big data analytics and traditional data analytics methods.
  • Why do you think organizations should think beyond the Hadoop hype?
  • Does the size of the data matter more than how recent the data is?
  • Is it true that bigger data are not always better?
  • The debate of privacy and personalization in maintaining ethics in big data.
  • The relation between data science and privacy.
  • Do you think data science is a rebranding of statistics?
  • Who delivers better results between data scientists and domain experts?
  • According to your view, is data science dead?
  • Do you think analytics teams need to be centralized or decentralized?
  • The best methods to resource an analytics team.
  • The best business case for investing in analytics.
  • The societal implications of the use of predictive analytics within Education.
  • Is there a need for greater control to prevent experimentation on social media users without their consent?
  • How is the government using big data – to improve public statistics or to control the population?

University Dissertation Topics on Big Data

Are you doing your Masters or Ph.D. and wondering about the best dissertation topic or thesis to do? Why not try any of these? They are interesting and based on various phenomena. While doing the research, ensure you relate the phenomenon to modern society.

  • The machine learning algorithms are used for fall recognition.
  • The divergence and convergence of the internet of things.
  • The reliable data movements using bandwidth provision strategies.
  • How does big data analytics use artificial neural networks in cloud gaming?
  • How is Twitter account classification done using network-based features?
  • How is online anomaly detection done in the cloud collaborative environment?
  • Evaluate the public transportation insights provided by big data.
  • Evaluate the paradigm for cancer patients using the nursing EHR to predict the outcome.
  • Discuss the current data lossless compression in the smart grid.
  • How does online advertising traffic prediction help boost businesses?
  • How is the hyperspectral classification done using the multiple kernel learning paradigm?
  • The analysis of large data sets downloaded from websites.
  • How does social media data help advertising companies globally?
  • Which are the systems recognizing and enforcing ownership of data records?
  • The alternate possibilities emerging for edge computing.

The Best Big Data Analysis Research Topics and Essays

There are a lot of issues that are associated with big data. Here are some of the research topics that you can use in your essays. These topics are ideal whether in high school or college.

  • The various errors and uncertainty in making data decisions.
  • The application of big data on tourism.
  • The automation innovation with big data and related technologies.
  • The business models of big data ecosystems.
  • Privacy awareness in the era of big data and machine learning.
  • The data privacy for big automotive data.
  • How is traffic managed in defined data center networks?
  • Big data analytics for fault detection.
  • The need for machine learning with big data.
  • The innovative big data processing used in health care institutions.
  • The money normalization and extraction from texts.
  • How is text categorization done in AI?
  • The opportunistic development of data-driven interactive applications.
  • The use of data science and big data towards personalized medicine.
  • The programming and optimization of big data applications.

The Latest Big Data Research Topics for your Research Proposal

Doing a research proposal can be hard at first unless you choose an ideal topic. If you are just diving into the big data field, you can use any of these topics to get a deeper understanding.

  • The data-centric network of things.
  • Big data management using artificial intelligence supply chain.
  • The big data analytics for maintenance.
  • The high confidence network predictions for big biological data.
  • The performance optimization techniques and tools for data-intensive computation platforms.
  • The predictive modeling in the legal context.
  • Analysis of large data sets in life sciences.
  • How can we understand mobility and transport modal disparities using emerging data sources?
  • How do you think data analytics can support asset management decisions?
  • An analysis of travel patterns for cellular network data.
  • The data-driven strategic planning for citywide building retrofitting.
  • How is money normalization done in data analytics?
  • Major techniques used in data mining.
  • The big data adaptation and analytics of cloud computing.
  • The predictive data maintenance for fault diagnosis.

Interesting Research Topics on A/B Testing In Big Data

A/B testing topics are different from normal big data topics; however, you use a broadly similar methodology to find the reasons behind the issues. These topics are interesting and will help you to get a deeper understanding. A minimal example of analyzing an A/B test follows the list below.

  • How is ultra-targeted marketing done?
  • The transition of A/B testing from digital to offline.
  • How can big data and A/B testing be done to win an election?
  • Evaluate the use of A/B testing on big data.
  • Evaluate A/B testing as a randomized control experiment.
  • How does A/B testing work?
  • The mistakes to avoid while conducting the A/B testing.
  • The most ideal time to use A/B testing.
  • The best way to interpret results for an A/B test.
  • The major principles of A/B tests.
  • Evaluate cluster randomization in big data.
  • The best way to analyze A/B test results and the statistical significance.
  • How is A/B testing used in boosting businesses?
  • The importance of data analysis in conversion research
  • The importance of A/B testing in data science.
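
As promised above, here is a minimal example: a chi-square test on a hypothetical conversion table is one standard way to check whether two variants differ significantly. The counts below are made up for illustration:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: [converted, not converted] for each variant
control = [480, 9520]    # 4.8% conversion rate
treatment = [560, 9440]  # 5.6% conversion rate

chi2, p_value, dof, expected = chi2_contingency([control, treatment])
print(f"p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("Not enough evidence that the variants differ.")
```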

Amazing Research Topics on Big Data and Local Governments

Governments are now using big data to make citizens’ lives better, both within government itself and across various institutions. These topics are based on real-life experiences and on making the world better.

  • Assess the benefits and barriers of big data in the public sector.
  • The best approach to smart city data ecosystems.
  • The big analytics used for policymaking.
  • Evaluate the smart technology and emergence algorithm bureaucracy.
  • Evaluate the use of citizen scoring in public services.
  • An analysis of the government administrative data globally.
  • The public values found in the era of big data.
  • Public engagement on local government data use.
  • Data analytics use in policymaking.
  • How are algorithms used in public sector decision-making?
  • The democratic governance in the big data era.
  • The best business model innovation to be used in sustainable organizations.
  • How does the government use the collected data from various sources?
  • The role of big data for smart cities.
  • How does big data play a role in policymaking?

Easy Research Topics on Big Data

Who said big data topics had to be hard? Here are some of the easiest research topics. They are based on data management, research, and data retention. Pick one and try it!

  • Who uses big data analytics?
  • Evaluate structure machine learning.
  • Explain the whole deep learning process.
  • Which are the best ways to manage platforms for enterprise analytics?
  • Which are the new technologies used in data management?
  • What is the importance of data retention?
  • The best ways to work with images when doing research.
  • The best ways to promote research outreach through data management.
  • The best way to source and manage external data.
  • Does machine learning improve the quality of data?
  • Describe the security technologies that can be used in data protection.
  • Evaluate token-based authentication and its importance.
  • How can poor data security lead to the loss of information?
  • How to determine secure data.
  • What is the importance of centralized key management?

Unique IoT and Big Data Research Topics

Internet of Things has evolved and many devices are now using it. There are smart devices, smart cities, smart locks, and much more. Things can now be controlled by the touch of a button.

  • Evaluate the 5G networks and IoT.
  • Analyze the use of Artificial intelligence in the modern world.
  • How do ultra-power IoT technologies work?
  • Evaluate the adaptive systems and models at runtime.
  • How have smart cities and smart environments improved the living space?
  • The importance of the IoT-based supply chains.
  • How does smart agriculture influence water management?
  • Naming and identifiers for internet applications.
  • How does the smart grid influence energy management?
  • Which are the best design principles for IoT application development?
  • The best human-device interactions for the Internet of Things.
  • The relation between urban dynamics and crowdsourcing services.
  • The best wireless sensor network for IoT security.
  • The best intrusion detection in IoT.
  • The importance of big data on the Internet of Things.

Big Data Database Research Topics You Should Try

Big data is broad and interesting. These big data database research topics will put you in a better place in your research. You also get to evaluate the roles of various phenomena.

  • The best cloud computing platforms for big data analytics.
  • The parallel programming techniques for big data processing.
  • The importance of big data models and algorithms in research.
  • Evaluate the role of big data analytics for smart healthcare.
  • How is big data analytics used in business intelligence?
  • The best machine learning methods for big data.
  • Evaluate the Hadoop programming in big data analytics.
  • What is privacy-preserving to big data analytics?
  • The best tools for massive big data processing.
  • IoT deployment in Governments and Internet service providers.
  • How will IoT be used for future internet architectures?
  • How does big data close the gap between research and implementation?
  • What are the cross-layer attacks in IoT?
  • The influence of big data and smart city planning in society.
  • Why do you think user access control is important?

Big Data Scala Research Topics

Scala is a programming language that is used in data management. It is closely related to other data programming languages. Here are some of the best Scala questions that you can research.

  • Which are the most used languages in big data?
  • How is Scala used in big data research?
  • Is Scala better than Java for big data?
  • How is Scala a concise programming language?
  • How does the Scala language handle stream processing in real time?
  • Which are the various libraries for data science and data analysis?
  • How does Scala allow imperative programming in data collection?
  • Evaluate how Scala includes a useful REPL for interaction.
  • Evaluate Scala’s IDE support.
  • The data catalog reference model.
  • Evaluate the basics of data management and its influence on research.
  • Discuss the behavioral analytics process.
  • What can you term as the experience economy?
  • The difference between agile data science and the Scala language.
  • Explain the graph analytics process.

Independent Research Topics for Big Data

These independent research topics for big data are based on the various technologies and how they are related. Big data will greatly be important for modern society.

  • Why the biggest technology investments are in big data analysis.
  • How are multi-cloud and hybrid settings taking deep root?
  • Why do you think machine learning will stay in focus for a long while?
  • Discuss in-memory computing.
  • What is the difference between edge computing and in-memory computing?
  • The relationship between the Internet of Things and big data.
  • How will digital transformation make the world a better place?
  • How does data analysis help in social network optimization?
  • How will complex big data be essential for future enterprises?
  • Compare the various big data frameworks.
  • The best ways to gather and monitor traffic information using CCTV images.
  • Evaluate the hierarchical structure of groups and clusters in decision trees.
  • Which are the best 3D mapping techniques for live-streaming data?
  • How does machine learning help to improve data analysis?
  • Evaluate data stream management in task allocation.
  • How is big data provisioned through edge computing?
  • Model-based clustering of texts.
  • The best ways to manage big data.
  • The use of machine learning in big data.

Is Your Big Data Thesis Giving You Problems?

These are some of the best topics that you can use to prosper in your studies. Not only are they easy to research, but they also reflect real-world issues. Whether you are at university or college, you need to put enough effort into your studies to prosper. However, if you have time constraints, we can provide professional writing help. Are you looking for expert online writers? Look no further; we will provide quality work at an affordable price.


10 Current Database Research Topic Ideas in 2024


As we head into the second half of 2024, the world of technology continues to evolve at a rapid pace. With the rise of AI and blockchain, the demand for data, its management, and its security is increasing rapidly. A logical consequence of these changes is that fields such as database security research and DBMS research have become the need of the hour.

With new technologies and techniques emerging day by day, staying up to date with the latest trends in database research topics is crucial. Whether you are a student, researcher, or industry professional, we recommend taking our Database Certification courses to stay current with the latest research topics in DBMS.

In this blog post, we will introduce you to 10 current database research topic ideas that are likely to be at the forefront of the field in 2024. From blockchain-based database systems to real-time data processing with in-memory databases, these topics offer a glimpse into the exciting future of database research.

So, get ready to dive into the exciting world of databases and discover the latest developments in database research topics of 2024!

Blurring the Lines between Blockchains and Database Systems 

The intersection of blockchain technology and database systems offers fertile new grounds to anyone interested in database research.

As blockchain gains popularity, many thesis topics in DBMS[1] are exploring ways to integrate the two fields. This research is likely to yield innovative solutions for data management. Here are three ways in which these technologies are being combined to create powerful new solutions:

Immutable Databases: By leveraging blockchain technology, it is possible to create databases that are immutable. Once data has been added to such a database, it cannot be modified or deleted. This is particularly useful in situations where data integrity is critical, such as financial transactions or supply chain management.

Decentralized Databases: Blockchain technology enables the creation of decentralized databases. Here data is stored on a distributed network of computers rather than in a central location. This can help to improve data security and reduce the risk of data loss or corruption.

Smart Contracts: Smart contracts are self-executing contracts with the terms of the agreement between buyer and seller being directly written into lines of code. By leveraging blockchain technology, it is possible to create smart contracts that are stored and executed on a decentralized database, making it possible to automate a wide range of business processes.
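To illustrate the immutability idea, here is a minimal Scala sketch of a hash-chained record, the core mechanism behind blockchain-style databases. The record fields and values are purely illustrative, and a real system would add consensus, timestamps, and persistence.

    import java.security.MessageDigest

    // Each record stores the hash of its predecessor, so altering any
    // earlier record changes every hash after it and exposes tampering.
    final case class Record(data: String, prevHash: String) {
      val hash: String = Record.sha256(prevHash + data)
    }

    object Record {
      def sha256(s: String): String =
        MessageDigest.getInstance("SHA-256")
          .digest(s.getBytes("UTF-8"))
          .map("%02x".format(_))
          .mkString
    }

    object Ledger {
      def main(args: Array[String]): Unit = {
        val genesis = Record("genesis", prevHash = "0")
        val payment = Record("alice pays bob 10", genesis.hash)

        // Verify the chain by recomputing the hash from its inputs.
        val valid = payment.hash == Record.sha256(genesis.hash + payment.data)
        println(s"chain valid: $valid")
      }
    }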

Childhood Obesity: Data Management 

Childhood obesity is a growing public health concern, with rates of obesity among children and adolescents rising around the world. To address this issue, it’s crucial to have access to comprehensive data on childhood obesity. Analyzing information on prevalence, risk factors, and interventions is a popular research topic in DBMS these days.

Effective data management is essential for ensuring that this information is collected, stored, and analyzed in a way that is useful and actionable. This is one of the hottest DBMS research paper topics. In this section, we will explore the topic of childhood obesity data management.

A key challenge in childhood obesity data management is ensuring data consistency. This is difficult because organizations have varied methods for measuring and defining obesity. For example:

  • Some may use body mass index (BMI) as a measure of obesity.
  • Others may use waist circumference or skinfold thickness.

Another challenge is ensuring data security and preventing unauthorized access. To protect the privacy and confidentiality of individuals, it is important to ensure appropriate safeguards are in place. This calls for database security research and its appropriate application.

Application of Computer Database Technology in Marketing

Leveraging data and analytics allows businesses to gain a competitive advantage in today's digitized world. With the rising demand for data, the use of computer databases in marketing has gained prominence.

The application of database capabilities in marketing has come into its own as one of the most popular recent research topics in DBMS[2]. In this section, we will explore how computer database technology is being applied in marketing and the benefits this research can offer.

Customer Segmentation: Storing and analyzing customer data makes it possible to gain valuable insights, allowing businesses to identify trends in customer behavior, preferences, and demographics. This information can be used to create highly targeted customer segments, so that businesses can tailor their marketing efforts to specific groups of customers.

Personalization: Computer databases can be used to store and analyze customer data in real-time. In this way, businesses can personalize their marketing and offers based on individual customer preferences. This can help increase engagement and loyalty among customers, thereby driving greater revenue for businesses.

Predictive Analytics: Advanced analytics techniques such as machine learning and predictive modeling can throw light on patterns in customer behavior. This can even be used to predict their future actions. This information can be used to create more targeted marketing campaigns, and to identify opportunities for cross-selling and upselling.
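As a concrete, deliberately simplified illustration of segmentation, the following Scala sketch buckets hypothetical customers by order frequency and average spend. In practice these features would be computed with SQL or Spark over the customer database, and the thresholds and segment names here are assumptions.

    // Rule-based customer segmentation on two behavioral features.
    final case class Customer(id: String, ordersPerYear: Int, avgSpend: Double)

    object Segmentation {
      def segment(c: Customer): String =
        (c.ordersPerYear, c.avgSpend) match {
          case (o, s) if o >= 12 && s >= 100 => "loyal high-value"
          case (o, _) if o >= 12             => "loyal budget"
          case (_, s) if s >= 100            => "occasional big spender"
          case _                             => "low engagement"
        }

      def main(args: Array[String]): Unit = {
        val customers = Seq(
          Customer("c1", 20, 150.0),
          Customer("c2", 2, 30.0),
          Customer("c3", 15, 40.0)
        )
        customers.groupBy(segment).foreach { case (seg, cs) =>
          println(s"$seg: ${cs.map(_.id).mkString(", ")}")
        }
      }
    }

Predictive approaches replace these hand-written rules with models learned from historical data, but the input features are often the same.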

Database Technology in Sports Competition Information Management

Database technology has revolutionized the way in which sports competition information is managed and analyzed. With the increasing popularity of sports around the world, there is a growing need for effective data management systems that can collect, store, and analyze large volumes of relevant data. Thus, researching database technologies[3] is vital to streamlining operations, improving decision-making, and enhancing the overall quality of events.

Sports organizations can use database technology to collect and manage a wide range of competition-related data, such as:

  • athlete and team information,
  • competition schedules and results,
  • performance metrics, and
  • spectator feedback.

Collating this data in a distributed database lets sports organizations easily analyze and derive valuable insights. This is emerging as a key DBMS research paper topic.

Database Technology for the Analysis of Spatio-temporal Data

Spatio-temporal data refers to data that has both a geographic and a temporal component. Meteorological readings, GPS traces, and social media content are prime examples of this diverse field. Such data can provide valuable insights into patterns and trends across space and time. However, its multidimensional nature makes analysis particularly challenging. It is no surprise that this has become a hot topic for distributed database research[4].

In this section, we will explore how database technology is being used to analyze spatio-temporal data, and the benefits this research offers.

Data Storage and Retrieval: Spatio-temporal data tends to be very high-volume, so advances in database technology are needed to make its storage, retrieval, and consumption more efficient. Solving this problem will make such data more available, and thus easily retrievable and usable by a variety of data analytics tools.

Spatial Indexing: Database technology can create spatial indexes to enable faster queries on spatio-temporal data. This allows analysts to quickly retrieve data for specific geographic locations or areas of interest, and to analyze trends across these areas.

Temporal Querying: Distributed database research can also enable analysts to analyze data over specific time periods. This facilitates the identification of patterns over time. Ultimately, this enhances our understanding of how these patterns evolve over various seasons.
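The following Scala sketch shows the spirit of spatio-temporal indexing in its simplest form: bucketing points by a coarse (latitude, longitude, hour) grid cell so that a query only touches the relevant cells. This is a minimal sketch with illustrative coordinates; production databases use structures such as R-trees or geohashes instead.

    import java.time.Instant

    // Bucket spatio-temporal points into coarse grid cells for fast lookup.
    final case class Point(lat: Double, lon: Double, time: Instant)

    object GridIndex {
      // Cell key: roughly 0.1-degree spatial cells and one-hour time buckets.
      def key(p: Point): (Int, Int, Long) =
        ((p.lat * 10).toInt, (p.lon * 10).toInt, p.time.getEpochSecond / 3600)

      def build(points: Seq[Point]): Map[(Int, Int, Long), Seq[Point]] =
        points.groupBy(key)

      def main(args: Array[String]): Unit = {
        val noon = Instant.parse("2024-01-01T12:30:00Z")
        val index = build(Seq(
          Point(43.65, -79.38, noon),   // Toronto-ish
          Point(43.66, -79.39, noon),   // same cell as above
          Point(48.85, 2.35, noon)      // Paris: a far-away cell
        ))
        // Query: everything in the same cell (same area, same hour) as a probe.
        val probe = Point(43.651, -79.381, noon)
        println(index.getOrElse(key(probe), Nil))
      }
    }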

Artificial Intelligence and Database Technology

Artificial intelligence (AI) is another sphere of technology that’s just waiting to be explored. It hints at a wealth of breakthroughs which can change the entire world. It’s unsurprising that the combination of AI with database technology is such a hot topic for database research papers[5] in modern times. 

By using AI to analyze data, organizations can identify patterns and relationships that might not be apparent through traditional data analysis methods. In this section, we will explore some of the ways in which AI and database technology are being used together. We’ll also discuss the benefits that this amalgamation can offer.

Predictive Analytics: By analyzing large volumes of organizational and business data, AI can generate predictive models to forecast outcomes. For example, AI can go through customer data stored in a database and predict who is most likely to make a purchase in the near future.

Natural Language Processing: All businesses have huge, untapped wells of valuable information in the form of customer feedback and social media posts. These types of data sources are unstructured, meaning they don’t follow rigid parameters. By using natural language processing (NLP) techniques, AI can extract insights from this data. This helps organizations understand customer sentiment, preferences and needs.

Anomaly Detection: AI can be used to analyze large volumes of data to identify anomalies and outliers. Then, a second round of analysis can be done to pinpoint potential problems or opportunities. For example, AI can analyze sensor data from manufacturing equipment and detect when equipment is operating outside of normal parameters.
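Here is a minimal Scala sketch of the statistical idea behind anomaly detection: flag readings that sit far from the mean in standard-deviation terms (a z-score test). The sensor values are made up for the demo; real systems compute such statistics incrementally over streams and often use far richer models.

    // Flag values more than two standard deviations from the mean.
    object AnomalyDetection {
      def zScores(xs: Seq[Double]): Seq[Double] = {
        val mean = xs.sum / xs.size
        val std  = math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / xs.size)
        xs.map(x => if (std == 0) 0.0 else (x - mean) / std)
      }

      def main(args: Array[String]): Unit = {
        val readings = Seq(10.1, 10.0, 9.9, 10.2, 10.1, 42.0, 10.0)
        readings.zip(zScores(readings))
          .filter { case (_, z) => math.abs(z) > 2.0 }
          .foreach { case (r, z) => println(f"anomaly: $r (z = $z%.1f)") }
      }
    }

Running this flags only the 42.0 reading, mirroring how a monitoring system would surface equipment operating outside normal parameters.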

Data Collection and Management Techniques of a Qualitative Research Plan

Any qualitative research calls for the collection and management of empirical data. A crucial part of the research process, this step benefits from good database management techniques. Let’s explore some thesis topics in database management systems[6] to ensure the success of a qualitative research plan.

Interviews: This is one of the most common methods of data collection in qualitative research. Interviews can be conducted in person, over the phone, or through video conferencing. A standardized interview guide ensures the data collected is reliable and accurate. Relational databases, with their inherent structure, aid in this process. They are a way to enforce structure onto the interviews’ answers.

Focus Groups: Focus groups involve gathering a small group of people to discuss a particular topic. These generate rich data by allowing participants to share their views in a group setting. It is important to select participants who have knowledge or experience related to the research topic.

Observations: Observations involve observing and recording events in a given setting. These can be conducted openly or covertly, depending on the research objective and setting. To ensure that the data collected is accurate, it is important to develop a detailed observation protocol that outlines what behaviors or events to observe, how to record data, and how to handle ethical issues.

Database Technology in Video Surveillance System 

Video surveillance systems are used to monitor and secure public spaces, workplaces, and even homes. With the increasing demand for such systems, it is important to have an efficient and reliable way to store, manage, and analyze the data they generate. This is where database topics for research papers[7] come in.

By using database technology in video surveillance systems, it is possible to store and manage large amounts of video data efficiently. Database management systems (DBMS) can be used to organize video data in a way that is easily searchable and retrievable. This is particularly important in cases where video footage is needed as evidence in criminal investigations or court cases.

In addition to storage and management, database technology can also be used to analyze video data. For example, machine learning algorithms can be applied to video data to identify patterns and anomalies that may indicate suspicious activity. This can help law enforcement agencies and security personnel to identify and respond to potential threats more quickly and effectively.

Application of Java Technology in Dynamic Web Database Technology 

Java technology has proven its flexibility, scalability, and ease of use over the decades. This makes it widely used in the development of dynamic web database applications. In this section, we will explore research topics in DBMS[8] which seek to apply Java technology in databases.

Java Server Pages (JSP): JSP is a Java technology used to create dynamic web pages that can interact with databases. It allows developers to embed Java code within HTML, enabling dynamic pages that interact with databases in real time and aid in data collection and maintenance.

Java Servlets: Java Servlets are Java classes used to extend the functionality of web servers. They provide a way to handle incoming requests from web browsers and generate dynamic content that can interact with databases.

Java Database Connectivity (JDBC): JDBC is a Java API that provides a standard interface for accessing databases. It allows Java applications to connect to a database and execute SQL queries that read, modify, or control the backend data. This enables developers to create dynamic web applications.
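The following sketch shows the JDBC pattern just described. It is written in Scala, which runs on the JVM and uses the same java.sql API; the in-memory H2 connection URL is an assumption, so the H2 driver would need to be on the classpath for this to run as-is.

    import java.sql.DriverManager

    object JdbcSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical in-memory database; swap the URL for your RDBMS.
        val conn = DriverManager.getConnection("jdbc:h2:mem:demo")
        try {
          val st = conn.createStatement()
          st.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50))")

          // Parameterized statements guard against SQL injection.
          val ins = conn.prepareStatement("INSERT INTO users VALUES (?, ?)")
          ins.setInt(1, 1)
          ins.setString(2, "Ada")
          ins.executeUpdate()

          val rs = st.executeQuery("SELECT id, name FROM users")
          while (rs.next()) println(s"${rs.getInt("id")} -> ${rs.getString("name")}")
        } finally conn.close()
      }
    }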

Online Multi Module Educational Administration System Based on Time Difference Database Technology 

With the widespread adoption of remote learning post-COVID, online educational systems are gaining popularity at a rapid pace. A ubiquitous challenge these systems face is managing multiple modules across different time zones. This is one of the latest research topics in database management systems[9].

Time difference database technology is designed to handle time zone differences in online systems. By leveraging this, it’s possible to create a multi-module educational administration system that can handle users from different parts of the world, with different time zones.

This type of system can be especially useful for online universities or other educational institutions that have a global reach:

It makes it possible to schedule classes, assignments and other activities based on the user's time zone, ensuring that everyone can participate in real-time.

In addition to managing time zones, a time difference database system can also help manage student data, course materials, grades, and other important information.
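A minimal sketch of the underlying technique: store each session once as a canonical UTC instant and convert it to each user's zone at display time. The course timing and student names below are purely illustrative.

    import java.time.{ZoneId, ZonedDateTime}

    object TimeZoneSketch {
      def main(args: Array[String]): Unit = {
        // Canonical schedule entry, stored once in UTC.
        val lectureUtc = ZonedDateTime.parse("2024-09-01T14:00:00Z")

        val students = Map(
          "Aisha" -> ZoneId.of("Asia/Kolkata"),
          "Bruno" -> ZoneId.of("America/Sao_Paulo")
        )

        // Render the same instant in each student's local time zone.
        students.foreach { case (name, zone) =>
          println(s"$name sees the lecture at ${lectureUtc.withZoneSameInstant(zone)}")
        }
      }
    }

Storing UTC and converting at the edge avoids ever persisting ambiguous local times, which is the usual failure mode in multi-time-zone systems.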

Why is it Important to Study Databases?

Databases are the backbone of many modern technologies and applications, making it essential for professionals in various fields to understand how they work. Whether you are a software developer, data analyst, or business owner, understanding databases is critical to success in today's world. Here are a few reasons why it is important to study databases and why more database topics for research papers should be published:

Efficient Data Management

Databases enable the efficient storage, organization, and retrieval of data. By studying databases, you can learn how to design and implement effective data management systems that can help organizations store, analyze, and use data efficiently.

Improved Decision-Making

Data is essential for making informed decisions, and databases provide a reliable source of data for analysis. By understanding databases, you can learn how to retrieve and analyze data to inform business decisions, identify trends, and gain insights.

Career Opportunities

In today's digital age, many career paths require knowledge of databases. By studying databases, you can open up new career opportunities in software development, data analysis, database administration and related fields.

Needless to say, studying databases is essential for anyone who deals with data. Whether you're looking to start a new career or enhance your existing skills, studying databases is a critical step towards success in today's data-driven world.

Final Takeaways

In conclusion, as you are interested in database technology, we hope this blog has given you some insights into the latest research topics in the field. From blockchain to AI, from sports to marketing, there are a plethora of exciting database topics for research papers that will shape the future of database technology.

As technology continues to evolve, it is essential to stay up-to-date with the latest trends in the field of databases. Our curated KnowledgeHut Database Certification Courses will help you stay ahead of the curve and develop new skills.

We hope this blog has inspired you to explore the exciting world of database research in 2024. Stay curious and keep learning!

Frequently Asked Questions (FAQs)

What are the five most common examples of databases?

There are several examples of databases, with the five most common ones being:

MySQL: An open-source RDBMS commonly used in web applications.

Microsoft SQL Server: A popular RDBMS used in enterprise environments.

Oracle: A trusted commercial RDBMS famous for its high scalability and security.

MongoDB: A NoSQL document-oriented database optimized for storing large amounts of unstructured data.

PostgreSQL: An open-source RDBMS offering advanced features such as high concurrency and support for multiple data types.

Is SQL a database?

Structured Query Language (SQL) is a high-level language designed to communicate with relational databases. It is not a database in and of itself. Rather, it is a language used to create, modify, and retrieve data from relational databases such as MySQL and Oracle.

What is a primary key?

A primary key is a column (or a set of columns) that uniquely identifies each row in a table. In technical terms, the primary key is a unique identifier of records, and it is used as a reference to establish relationships between tables.


Data Management


What are data?

Topics include nominal “best practices”, responsibilities, data collection, recordkeeping, and ownership of data.

In practice, even though the University or institution has legal standing to make decisions about what can or will be done with research data, it does not typically do so. Absent an explicit agreement or ruling to the contrary, the principal investigator (PI) has primary responsibility for decisions about the collection, use, and sharing of data.

Retention of Data

This depends in part on the nature of the products of research. Some materials, such as thin sections for electron microscopy, cannot be kept indefinitely because of degradation. It is also impractical to store extraordinarily large volumes of primary data. At minimum, enough data should be retained to reconstruct what was done.

Original data are the responsibility of the principal investigator (PI) and should be kept in her or his lab or office. Although most researchers have the expectation that graduating students may take copies of their research records, student or postdoctoral researchers should assume, unless told otherwise, that their original data will stay with the PI. If regulations or other considerations preclude researchers taking copies, then the PI has a responsibility to make this clear to the research group before work begins.

Any stored data will be rendered useless if there are insufficient records to locate and identify the material in question. Ease of access must be balanced against security, for instance if the study involved human subjects with a reasonable expectation of confidentiality. Although the institution is the legal owner of the data, it is usually the responsibility of the principal investigator to ensure that records are stored in a secure, accessible fashion.



Research data management in academic institutions: A scoping review

Laure Perrier
1 Gerstein Science Information Centre, University of Toronto, Toronto, Ontario, Canada

Erik Blondal
2 Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada

A. Patricia Ayala

Dylanne Dearborn
3 Gibson D. Lewis Health Science Library, UNT Health Science Center, Fort Worth, Texas, United States of America

David Lightfoot
4 St. Michael’s Hospital Library, St. Michael’s Hospital, Toronto, Ontario, Canada
5 Faculty of Information, University of Toronto, Toronto, Ontario, Canada

Mindy Thuna
6 Engineering & Computer Science Library, University of Toronto, Toronto, Ontario, Canada

Leanne Trimble
7 Map and Data Library, University of Toronto, Toronto, Ontario, Canada

Heather MacDonald
8 MacOdrum Library, Carleton University, Ottawa, Ontario, Canada

  • Conceptualization: LP.
  • Data curation: LP EB HM.
  • Formal analysis: LP EB.
  • Investigation: LP EB HM APA DD TK DL MT LT RR.
  • Methodology: LP.
  • Project administration: LP.
  • Supervision: LP.
  • Validation: LP HM.
  • Writing – original draft: LP.
  • Writing – review & editing: LP EB HM APA DD DL TK MT LT RR.

Associated Data

Dataset is available from the Zenodo Repository, DOI: 10.5281/zenodo.557043 .

Objective

The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions.

Materials and methods

We conducted a scoping review by searching forty literature databases encompassing a broad range of disciplines from inception to April 2016. We included all study types and data extracted on study design, discipline, data collection tools, and phase of the research data lifecycle.

Results

We included 301 articles plus 10 companion reports after screening 13,002 titles and abstracts and 654 full-text articles. Most articles (85%) were published from 2010 onwards and conducted within the sciences (86%). More than three-quarters of the articles (78%) reported methods that included interviews, cross-sectional, or case studies. Most articles (68%) included the Giving Access to Data phase of the UK Data Archive Research Data Lifecycle that examines activities such as sharing data. When studies were grouped into five dominant groupings (Stakeholder, Data, Library, Tool/Device, and Publication), data quality emerged as an integral element.

Conclusions

Most studies relied on self-reports (interviews, surveys) or accounts from an observer (case studies) and we found few studies that collected empirical evidence on activities amongst data producers, particularly those examining the impact of research data management interventions. As well, fewer studies examined research data management at the early phases of research projects. The quality of all research outputs needs attention, from the application of best practices in research data management studies, to data producers depositing data in repositories for long-term use.

Introduction

Increased connectivity has accelerated progress in global research, and estimates indicate that scientific output is doubling approximately every ten years [ 1 ]. A rise in research activity results in an increase in research data output. However, data generated from research that is not prepared and stored for long-term access is at risk of being lost forever. Vines and colleagues report that the availability of data declines rapidly with the age of a study, and they determined that the odds of a data set being reported as available decreased by 17% per year after publication [ 2 ]. At the same time, research funding agencies and scholarly journals are progressively moving towards directives that require data management plans and demand data sharing [ 3 – 6 ]. The current research ecosystem is complex and highlights the need for focused attention on the stewardship of research data [ 1 , 7 ].

Academic institutions are multifaceted organizations that exist within the research ecosystem. Researchers practicing within universities and higher education institutions must comply with funding agency requirements when they are the recipients of research grants. For some disciplines, such as genomics and astronomy, preserving and sharing data is the norm [ 8 – 9 ]; yet best practices stipulate that research be reproducible and transparent, which means effective data management is pertinent to all disciplines.

Interest in research data management in the global community is on the rise. Recent activity has included the Bill & Melinda Gates Foundation moving their open access/open data policy, considered to be exceptionally strong, into force at the beginning of 2017 [ 10 ]. Researchers working towards a solution to the Zika virus organized themselves to publish all epidemiological and clinical data as soon as it was gathered and analyzed [ 11 ]. Fecher and colleagues [ 12 ] conducted a systematic review focusing on data sharing to support the development of a conceptual framework; however, it lacked rigorous methods, such as the use of a comprehensive search strategy [ 13 ]. Another review on data sharing, by Bull and colleagues [ 14 ], examined stakeholders’ perspectives on ethical best practices but focused specifically on low- and middle-income settings. In this scoping review, we aim to assess the research literature that examines research data management as it relates to academic institutions. It is a time of increasing activity in the area of research data management [ 15 ], and higher learning institutions need to be ready to address this change, as well as provide support for their faculty and researchers. Identifying the current state of the literature, so that there is a clear understanding of the evidence in the area, will provide guidance in planning strategies for services and support, as well as in outlining essential areas for future research endeavors in research data management. The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions.

Materials and methods

We conducted a scoping review using guidance from Arksey and O’Malley [ 16 ] and the Joanna Briggs Manual for Scoping Reviews [ 17 ]. A scoping review protocol was prepared and revised based on input from the research team, which included methodologists and librarians specializing in data management. It is available upon request from the corresponding author. Although traditionally applied to systematic reviews, the PRISMA Statement was used for reporting [ 18 ].

Data sources and literature search

We searched 40 electronic literature databases from inception until April 3–4, 2016. Since research data management is relevant to all disciplines, we did not restrict our search to literature databases in the sciences. This was done in order to gain an understanding of the breadth of research available and provide context for the science research literature on the topic of research data management. The search was peer-reviewed by an experienced librarian (HM) using the Peer Review of Electronic Search Strategies checklist and modified as necessary [ 19 ]. The full literature search for MEDLINE is available in the S1 File . Additional database literature searches are available from the corresponding author. Searches were performed with no year or language restrictions. We also searched conference proceedings and gray literature. The gray literature discovery process involved identifying and searching the websites of relevant organizations (such as the Association of Research Libraries, the Joint Information Systems Committee, and the Data Curation Centre). Finally, we scanned the references of included studies to identify other potentially relevant articles. The results were imported into Covidence (covidence.org) for the review team to screen the records.

Study selection

All study designs were considered, including qualitative and quantitative methods such as focus groups, interviews, cross-sectional studies, and randomized controlled trials. Eligible studies included academic institutions and reported on research data management involving areas such as infrastructure, services, and policy. We included studies from all disciplines within academic institutions with no restrictions on geographical location. Studies reporting results that accepted participants outside of academic institutions were included if 50% or more of the total sample represented respondents from academic institutions. For studies that examined entities other than human subjects, the study was included if the outcomes were pertinent to the broader research community, including academia. For example, if a sample of journal articles were retrieved to examine the data sharing statements but each study was not explicitly linked to a research sector, it was accepted into our review since the outcomes are significant to the entire research community and academia was not explicitly excluded. We excluded commentaries, editorials, or papers providing descriptions of processes that lacked a research component.

We define an academic institution as a higher education degree-granting organization dedicated to education and research. Research data management is defined as the storage, access, and preservation of data produced from a given investigation [ 20 ]. This includes issues such as creating data management plans, matters related to sharing data, delivery of services and tools, infrastructure considerations typically related to researchers, planners, librarians, and administrators.

A two-stage process was used to assess articles. Two investigators independently reviewed the retrieved titles and abstracts to identify those that met the inclusion criteria. The study selection process was pilot tested on a sample of records from the literature search. In the second stage, full-text articles of all records identified as relevant were retrieved and independently assessed by two investigators to determine if they met the inclusion criteria. Discrepancies were addressed by having a third reviewer resolve disagreements.

Data abstraction and analysis

After a training exercise, two investigators independently read each article and extracted relevant data in duplicate. Extracted data included study design, study details (such as purpose and methodology), participant characteristics, discipline, and the data collection tools used to gather information for the study. In addition, articles were aligned with the research data lifecycle proposed by the United Kingdom Data Archive [ 21 ]. Although represented in a simple diagram, this framework incorporates a comprehensive set of activities (creating data, processing data, analyzing data, preserving data, giving access to data, re-using data) and actions associated with research data management, clarifying the longer lifespan that data has outside of the research project it was created within (see S2 File). Differences in abstraction were resolved by a third reviewer. Companion reports were identified by matching the authors, the timeframe for the study, and the intervention; those identified were used for supplementary material only. Risk of bias of individual studies was not assessed because our aim was to examine the extent, range, and nature of research activity, consistent with the proposed scoping review methodology [ 16 – 17 ].

We summarized the results descriptively with the use of validated guidelines for narrative synthesis [ 22 – 25 ]. Following guidance from Rodgers and colleagues, [ 22 ] data extraction tables were examined to determine the presence of dominant groups or clusters of characteristics by which the subsequent analysis could be organized. Two team members independently evaluated the abstracted data from the included articles in order to identify key characteristics and themes. Disagreement was resolved through discussion. Due to the heterogeneity of the data, articles and themes were summarized as frequencies and proportions.

Results

Literature search

The literature search identified a total of 15,228 articles. After reviewing titles and abstracts, we retrieved 654 potentially relevant full-text articles. We identified 301 articles for inclusion in the study, along with 10 companion documents (Fig 1). The full list of citations for the included studies can be found in the S3 File. The five literature databases that identified the most included studies were MEDLINE (81 articles or 21.60%), Compendex (60 articles or 16%), INSPEC (55 articles or 14.67%), Library and Information Science Abstracts (52 articles or 13.87%), and BIOSIS Previews (47 articles or 12.53%). The full list of electronic databases is available in the S4 File, which also includes the number of included studies traced back to their original literature database.

[Fig 1: study selection flow diagram (pone.0178261.g001.jpg)]

Characteristics of included articles

Most of the 301 articles were published from 2010 onwards (256 or 85.04%), with 15% published prior to that time (Table 1). Almost half (45.85%) identified North America (Canada, United States, or Mexico) as the region where studies were conducted; however, close to one fifth of articles (18.60%) did not report where the study was conducted. Most of the articles (78.51%) reported methods that included cross-sectional studies (129 or 35.54%), interviews (86 or 23.69%), or case studies (70 or 19.28%), with 42 articles (out of 301) describing two or more methods. Articles were almost evenly split between qualitative evidence (44.85%) and quantitative evidence (43.85%), with mixed methods representing a smaller proportion (11.29%). We relied on authors’ own reporting of study characteristics and made no interpretations with regard to how attributes of the studies were reported. As a result, some information may appear to overlap in the reporting of disciplines; for example, health science, medicine, and biomedicine are reported separately as disciplines/subject areas. Authors identified 35 distinct disciplines in the articles, with just under ten percent (8.64%) not reporting a discipline and the largest group (105 or 34.88%) being multidisciplinary. The two disciplines reported most often were medicine and information science/library science (31 or 10.30% each). Studies were reported in 116 journals, 43 conference papers, 26 gray literature documents (e.g., reports), two book chapters, and one PhD dissertation. Almost one-third of the articles (99 or 32.89%) did not use a data collection tool (e.g., when a case study was reported), and a small number (22 or 7.31%) based their data collection tools on instruments previously reported in the literature. Most data collection tools were either developed by the authors (97 or 32.23%) or no description was provided of their development (83 or 27.57%). No validated data collection tools were reported. We identified articles that offered no information on sample size or participant characteristics [ 26 – 29 ], as well as articles that reported the number of participants who completed the study but failed to describe how many were recruited [ 30 – 31 ].

Table 1 notes:
  a Percentages may not total 100 because of rounding.
  b Geographic region refers to where data originated; e.g., if telephone interviews were conducted with participants in France, Mexico, and Chile, the region would be listed as Multi-continent.
  c Categories are not mutually exclusive; i.e., multiple study designs of two or more are reported in 42 articles.
  d No attempt was made to create groupings, e.g., to collapse Chemistry and Science into one group.

Research data lifecycle framework

Two hundred and seven (31.13%) articles aligned with the Giving Access to Data phase of the Research Data Lifecycle [ 20 ] (Table 2), which includes the components of distributing data, sharing data, controlling access, establishing copyright, and promoting data. The Preserving Data phase contained the next-largest set of articles, with 178 (26.77%). In contrast, Analysing Data and Processing Data were the two phases with the fewest articles, containing 28 (4.21%) and 49 (7.37%) respectively. Most articles (87 or 28.9%) aligned with two phases of the Research Data Lifecycle, followed by an almost even split of 73 (24.25%) aligning with three phases and 70 (23.26%) with one phase. Twenty-nine (9.63%) did not align with any phase of the Research Data Lifecycle; these included articles that described education and training for librarians, or that identified skill sets needed to work in research data management.

Table 2 notes:
  Source: UK Data Archive, Research data lifecycle. Available at: http://data-archive.ac.uk/create-manage/life-cycle
  b Articles can be listed in more than one phase of the Research Data Lifecycle.

Key characteristics of articles

Five dominant groupings were identified for the 301 articles (Table 3). Each of these dominant groups was further categorized into subgroupings of articles to provide more granularity. The top three study types and the top three disciplines/subject areas are reported for each of the dominant groups. Half of the articles (151 or 50.17%) concentrated on stakeholders (Stakeholder Group), e.g., activities of researchers, publishers, participants/patients, and funding agencies; 57 (18.94%) were data-focused (Data Group), e.g., investigating the quality or integrity of data in repositories, or the development or refinement of metadata; 42 (13.95%) centered on library-related activities (Library Group), e.g., identifying skills or training for librarians working in data management; 27 (8.97%) described specific tools/applications/repositories (Tool/Device Group), e.g., introducing an electronic notebook into a laboratory; and 24 (7.97%) focused on the activities of publishing (Publication Group), e.g., examining data policies. The Stakeholder Group contained the largest subgroup of articles, labelled ‘Researcher’ (119 or 39.53%).

Table 3 note: b Articles can be listed in more than one grouping.

Discussion

We identified 301 articles and 10 companion documents that focus on research data management in academic institutions, published between 1995 and 2016. Tracing articles back to their original literature database indicates that 86% of the studies accepted into our review were from the applied science or basic science literature, indicating high research activity in this area among the sciences. The number of published articles has risen dramatically since 2010, with 85% of articles published post-2009, signaling the increased importance of and interest in this area of research. However, the limited range of study designs, the lack of standardized or validated data collection tools, and the lack of transparency in reporting demonstrate the need for attention to rigor. As well, few studies examine the impact of research data management activities (e.g., the implementation of services, training, or tools).

Few of the study designs employed in the 301 articles collected empirical evidence on activities amongst data producers, such as examining changes in behavior (e.g., movement from data withholding to data sharing) or identifying changes in endeavors (e.g., strategies to increase data quality in repositories). Close to 80% of the articles rely on self-reports (e.g., participating in interviews, filling out surveys) or accounts from an observer (e.g., describing events in a case study). Case studies made up almost one-fifth of the articles examined. This group of articles ranged from question-and-answer journalistic-style reports [ 32 ] to articles that offered structured descriptions of activities and processes [ 33 ]. Although study quality was not formally assessed, this range of offerings presented challenges for data abstraction, in particular with the journalistic-style accounts. If papers provided clear reporting that included declaring a purpose and describing well-defined outcomes, these articles could supply valuable input to knowledge syntheses such as a realist review [ 34 – 35 ], despite being ranked lower in the hierarchy of evidence [ 36 ]. One exception was Hruby and colleagues [ 37 ], who included a retrospective analysis in their case report examining the impact of introducing a centralized research data repository for datasets within a urology department at Columbia University. This study offered readers a fuller understanding of the impact of a research data management intervention by providing evidence that detailed a change: results described a reduction in the time required to complete studies, and an increase in publication quantity and quality (i.e., an increase in the average journal impact factor of papers published). There is opportunity for those wishing to conduct studies that provide empirical evidence for data producers and those interested in data reuse; for those wishing to conduct case studies, the development of reporting guidelines may be of benefit.

Using the Research Data Lifecycle framework provides the opportunity to understand where researchers are focusing their efforts in studying research data management. Most studies fell within the Giving Access to Data phase of the framework, which includes activities such as sharing data and controlling access to data, and the Preserving Data phase, which focuses on activities such as documenting and archiving data. This aligns with the global trend of funding agencies moving towards requirements for open access and open data [ 15 ], which includes activities such as creating metadata/documentation and sharing data in public repositories when possible. Fewer studies fell within the phases that occur at the beginning of the Research Data Lifecycle, which include activities such as writing data management plans and preparing data for preservation. Research in these early phases, which involve planning and setting up processes for handling data as it is created, may provide insight into how these activities affect later phases of the Research Data Lifecycle, in particular with regard to data quality.

Data quality was examined in several of the groups described in Table 3. Within the Data Group, ‘data quality and integrity’ comprised the biggest subgroup of articles. Two other subgroups in the Data Group, ‘classification systems’ and ‘repositories’, also provided articles that touched on issues related to data quality. These issues included refining metadata and improving functionalities in repositories that enable scholarly use and reuse of materials. Willoughby and colleagues illustrated some of the challenges related to data quality when reporting on researchers in chemistry, biology, and physics [ 38 ]. They found that when filling out metadata for a repository, researchers used a ‘minimum required’ approach. The biggest inhibitor to adding useful metadata was the ‘blank canvas’ effect, where users may have been willing to add metadata but did not know how. The authors concluded that simply providing a mechanism to add metadata was not sufficient. Data quality, or the lack thereof, was also identified in the Publication Group, with the ‘data availability, accessibility, and reuse’ and ‘data policies’ subgroups listing articles that tracked the completeness of deposited data sets and offered assessments of the guidance provided by journals in their data sharing policies. Piwowar and Chapman analyzed whether data sharing frequency was associated with funder and publisher requirements [ 39 ]. They found that NIH (National Institutes of Health) funding had little impact on data sharing despite policies that required it. Data sharing was significantly associated with the impact factor of a journal (not the journal’s data sharing policy) and the experience of the first/last authors. Studies that investigate processes to improve the quality of data deposited in repositories, or strategies to increase compliance with journal or funder data sharing policies that support depositing high-quality and useable data, could provide tangible guidance to investigators interested in effective data reuse.

We found a number of articles with important information not reported. This included the geographic region in which the study was conducted (56 or 18.6%) and the discipline or subject area being examined (26 or 8.64%). Data abstraction identified studies that provided no information on participant populations (such as sample size or characteristics of the participants) as well as studies that reported the number of participants who completed the study, but failed to report the number recruited. Lack of transparency and poor documentation of research is highlighted in the recent Lancet series on ‘research waste’ that calls attention to avoiding the misuse of valuable resources and the inadequate emphasis on the reproducibility of research [ 40 ]. Those conducting research in data management must recognize the importance of research integrity being reflected in all research outputs that includes both publications and data.

We identified a sizable body of literature that describes research data management related to academic institutions, with the majority of studies conducted in the applied or basic sciences. Our results should promote further research in several areas. One is shifting the focus of studies towards collecting empirical evidence that demonstrates the impact of interventions related to research data management. Another area that requires further attention is researching activities that demonstrate concrete improvements to the quality and usefulness of data in repositories for reuse, as well as examining the facilitators of and barriers to researchers participating in this activity. In particular, there is a gap in research that examines activities in the early phases of research projects to determine the impact of interventions at this stage. Finally, researchers investigating research data management must follow best practices in research reporting and ensure the high quality of their own research outputs, including both publications and datasets.

Supporting information

S1 File: full MEDLINE literature search. S2 File: UK Data Archive Research Data Lifecycle. S3 File: citations of included studies. S4 File: list of electronic literature databases searched.

Acknowledgments

We thank Mikaela Gray for retrieving articles, tracking papers back to their original literature databases, and assisting with references. We also thank Lily Yuxi Ren for retrieving conference proceedings and searching the gray literature. We acknowledge Matt Gertler for screening abstracts.

Funding Statement

The authors received no specific funding for this work.

Data Availability

The dataset is available from the Zenodo Repository, DOI: 10.5281/zenodo.557043.

Economics for Disaster Prevention and Preparedness in Europe

Europe is facing overwhelming losses and destruction from climate-related disasters. From 1980 to 2022, weather and climate-related events across the EU caused total losses of about €650 billion, or around €15.5 billion per year. Recent disasters, such as floods in 2022 and wildfires in 2023, have highlighted the vulnerabilities of critical infrastructure, including emergency response buildings such as fire stations, but also roads and power lines.

To guide priority investments in disaster and climate resilience and strengthen financial resilience, the report series Economics for Disaster Prevention and Preparedness—developed by the World Bank and the European Commission—offers evidence and tools to help countries take a more strategic approach to boost their climate resilience. These approaches are also being promoted and operationalized through the ongoing Technical Assistance Financing Facility for Disaster Prevention and Preparedness (TAFF), funded by the European Commission and implemented by the World Bank and the Global Facility for Disaster Reduction and Recovery (GFDRR).

From Data to Decisions: Tools for making smart investments in prevention and preparedness in Europe

Half of EU Member States have fire stations located in areas with high levels of multiple hazards, including wildfires, landslides, floods, or earthquakes. Investing in disaster resilience makes economic sense, and there is an urgency to scale up investments in disaster and climate resilience in a cost-effective and smart manner. This report provides guidance and examples on how to make focused and smart investments to increase the disaster and climate resilience of critical sectors, including those that provide emergency-response services. Risk data, analytical tools, and examples can guide decision-making toward high-priority areas and enable a strategic approach that maximizes the benefits of investing in resilience.

Investing in Resilience: Climate adaptation costing in a changing world

The report provides new insights into the costs for a country to adapt to the impacts of climate change, new costing approaches, and best practices, with estimated ranges for various sectors and multiple risks. While the estimated cost of climate adaptation varies significantly, in the EU, climate change adaptation costs up to the 2030s are estimated (based on extrapolation from national studies) to be between €15 billion and €64 billion. As Europe grapples with the escalating risks of climate change, the urgency to develop 'adaptation pathways' is paramount. These decision-making approaches enable countries to prepare and act amidst uncertainty, informed by current and future climate risks.

Financially Prepared: The case for pre-positioned finance

Floods, earthquakes, landslides, storms, wildfires, droughts, and extreme heat create additional pressure on already constrained response and recovery budgets. The size of the potential funding gap from major earthquakes and floods varies between €13 billion and €50 billion. Should a drought or a wildfire occur in a year in which a major earthquake or flood has already occurred, there would be no funding available at the EU level to respond to the wildfire or drought event. Countries in Europe need to enhance their financial resilience through better data utilization and innovative financial instruments, including risk transfer to the private sector.



At a glance

View the updated Mild Traumatic Brain Injury Management Guideline for Adults and other educational tools including patient discharge instructions and a checklist on diagnosis and management of mTBI.


Mild traumatic brain injury (mTBI), commonly called concussion, affects millions of Americans each year. This injury can lead to short- or long-term problems affecting how a person thinks, acts, and feels. The American College of Emergency Physicians updated their clinical policy in 2023 to provide recommendations on the care of adult patients with mTBI seen in an emergency department .

CDC developed educational tools to help healthcare professionals use the updated policy in their practice. These educational tools include patient discharge instructions and a checklist on diagnosis and management of mTBI.

  • Updated Mild Traumatic Brain Injury Management Guideline for Adults
  • Key Recommendations for the Care of Adult Patients with Mild Traumatic Brain Injury
  • Checklist on Diagnosis and Management of mTBI

Discharge Instructions

  • Recovering from a Mild Traumatic Brain Injury or Concussion
  • Recuperación de una lesión cerebral traumática leve o conmoción cerebral

Recovery Tips

  • Tips to Feel Better After a Mild Traumatic Brain Injury or Concussion
  • Consejos para sentirse mejor después de una lesión cerebral traumática o conmoción cerebral leve

Return to Work Instructions

  • Instructions on Returning to Work
  • Instrucciones para regresar al trabajo

Traumatic Brain Injury & Concussion

A traumatic brain injury, or TBI, is an injury that affects how the brain works. TBI is a major cause of death and disability in the United States.



COMMENTS

  1. Data management

    A Strategy & Execution magazine article by Leandro DalleMule and Thomas H. Davenport. Although the ability to manage torrents of data has become crucial to companies' success, most organizations remain ...

  2. Research data management in academic institutions: A scoping review

    Objective: The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions. Materials and methods: We conducted a scoping review by searching forty literature databases encompassing a broad range of disciplines from inception to April 2016. We included all study types and data ...

  3. Essentials of data management: an overview (PDF)

    Outlining a data management strategy prior to initiation of a research study plays an essential role in ensuring that both scientific integrity (i.e., data generated can accurately test the ...

  4. Global research trends in research data management: A bibliometrics ...

    The bibliometric analysis of topic trends shows five major research topics, identified as the fundamentals and practices of library services and research data sharing. There was a clear distinction between the interests in RDM research up to 2018 and after 2019.

  5. Data management made simple

    A data-management plan explains how researchers will handle their data during and after a project, and encompasses creating, sharing and preserving research data of any type, including text ...

  6. A focus groups study on data sharing and research data management

    The need for training on data management topics is consistent with prior research [25, 26, 27]. Within metadata standards development, this study's results point to the discipline-specific call for ...

  7. Data Management (Google Research)

    Google is deeply engaged in data management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface them creatively through Google products such as Search (e.g., structured snippets, Docs, and many others).

  8. Understanding and Implementing Research Data Management

    The book Managing and Sharing Research Data by Corti et al. provides a detailed but simple introduction to research data management in the social sciences. It includes many illustrative examples as well as further materials to get started with the different topics and aspects of managing data in a research project.

  9. Research Data Management and Sharing

    This five-module course provides learners with an introduction to research data management and sharing. After completing the course, learners will understand the diversity of data and their management needs across the research data lifecycle, be able to identify the components of good data management plans, and be ...

  10. Support Your Data: A Research Data Management Guide for Researchers (PDF)

    Research data management (RDM), a term that encompasses activities related to the storage, organization, documentation, and dissemination of data, is central to efforts ... This prompted us to consider how to present research data management, a topic sufficiently complex as to be labelled a "wicked problem" (Awre et al. 2015), in a ...

  11. Digital Data Management, Curation, and Archiving

    Prior to embarking on a research project, our team of Research Data Librarians will be happy to work with you to identify the best data format solutions for your work. We can work with you to find the current standards for data in your field, and create data management plans that both conform to these standards and ensure long-term access to ...

  12. Challenges in research data management practices: a ...

    Introduction: Research Data Management (RDM) is a burgeoning field of research (Tenopir et al., 2011; Zhang and Eichmann-Kalwara, 2019), and RDM skills are increasingly required across all disciplines (Borghi et al., 2021) as researchers take on more responsibilities to meet the demand for open and reusable data. Higman et al. (2019, p. ...

  13. Data Management

    The future of computing lies in the hybrid cloud. We're creating a hybrid data fabric that provides secure, governed data access from anywhere, enables self-service discovery of the right data at the right time, and takes a holistic view of minimizing the total cost of ownership for AI and analytics.

  14. Introduction to research data management (book chapter)

    Aims: to introduce the topic of research data management (RDM) and what it means in practice, and to explain the thinking behind the book so you can use it effectively. A thought experiment: imagine going to a busy researcher's office. What would you expect to see? And if you asked them about their research ...

  15. Elsevier Researcher Academy

    Research data management. It's an increasingly common condition of funding that the data associated with your work should be made available, accessible, discoverable, and usable. Our series of data management modules contains all the information you need to help you comply with these requirements. You will also discover how sharing research ...

  16. Teaching Research Data Management with DataLad: A Multi-year ...

    Research data management has become an indispensable skill in modern neuroscience. Researchers can benefit from following good practices as well as from having proficiency in using particular software solutions. But as these domain-agnostic skills are commonly not included in domain-specific graduate education, community efforts increasingly provide early-career scientists with opportunities ...

  17. Developing a Data Management Plan

    This section breaks down the topics required for planning and preparing data used in research at Case Western Reserve University. In this phase you should understand the research being conducted, the types of data and the methods used to collect them, and the methods used to prepare and analyze the data ...

  18. Keeping Up With… Research Data Management

    Research Data Management (RDM) is a broad concept that includes processes undertaken to create organized, documented, accessible, and reusable quality research data. The role of the librarian is to support researchers through the research data lifecycle. ... ESIP Federation: 35 training videos about very specific topics, from "Tracking Data ...

  19. 37 Research Topics In Data Science To Stay On Top Of (EML)

    On data visualization: data visualization is an excellent research topic in data science because it allows us to see our data in a way that is easy to understand. Data visualization techniques can be used to create charts, graphs, and other visual representations of data (see the short plotting sketch after this list).

  20. Research Data Management Practices and Challenges in Academic ... (PDF)

    A comprehensive review by Subaveerapandiyan A., former chief librarian, Department of Library and Information Science, DMI-St. Eugene University, Lusaka, Zambia.

  21. 214 Big Data Research Topics: Interesting Ideas To Try

    These 15 topics will help you dive into interesting research; you may even build on research done by other scholars. Examples include evaluating the data mining process, the influence of various dimension-reduction methods and techniques, the best data classification methods, and simple linear regression modeling methods (a minimal regression sketch follows this list).

  22. 10 Current Database Research Topic Ideas in 2024

    As we head towards the second half of 2024, the world of technology evolves at a rapid pace. With the rise of AI and blockchain, the demand for data and its management, and the need for security, increase rapidly. A logical consequence of these changes is the way fields like database security ...

  23. Data Management

    Data are the empirical basis for scientific findings. The integrity of research depends on integrity in all aspects of data management, including the collection, use, storage, and sharing of data. Data are not just numbers in a lab notebook. Depending on the research, data might include images, audio or video recordings, genetically modified ...

  24. Data Management/Data Warehousing Topics

    Data warehousing captures data from a variety of sources so it can be accessed and analyzed by business analysts, data scientists and other end users. One goal is to enhance data quality and consistency for analytics uses while improving business intelligence. Read how data warehousing provides these and other unique benefits ... (a star-schema join sketch follows this list).
