CodeAvail

21 Interesting Data Science Capstone Project Ideas [2024]


Data science, encompassing the analysis and interpretation of data, stands as a cornerstone of modern innovation. 

Capstone projects in data science education play a pivotal role, offering students hands-on experience to apply theoretical concepts in practical settings. 

These projects serve as a culmination of their learning journey, providing invaluable opportunities for skill development and problem-solving. 

Our blog is dedicated to guiding prospective students through the selection process of data science capstone project ideas. It offers curated ideas and insights to help them embark on a fulfilling educational experience. 

Join us as we navigate the dynamic world of data science, empowering students to thrive in this exciting field.

Data Science Capstone Project: A Comprehensive Overview


Data science capstone projects are an essential component of data science education, providing students with the opportunity to apply their knowledge and skills to real-world problems. 

Capstone projects challenge students to acquire and analyze data to solve real-world problems. These projects are designed to test students’ skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning. 

In addition, capstone projects are often conducted with industry, government, and academic partners, and many are sponsored by an organization. 

The projects are drawn from real-world problems, and students work in teams consisting of two to four students and a faculty advisor. 

Ultimately, the goal of the capstone project is to create a usable, public data product that demonstrates students’ skills to potential employers. 

Best Data Science Capstone Project Ideas – According to Skill Level

Data science capstone projects are a great way to showcase your skills and apply what you’ve learned in a real-world context. Here are some project ideas categorized by skill level:


Beginner-Level Data Science Capstone Project Ideas


1. Exploratory Data Analysis (EDA) on a Dataset

Start by analyzing a dataset of your choice and exploring its characteristics, trends, and relationships. Practice using basic statistical techniques and visualization tools to gain insights and present your findings clearly and understandably.
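As a minimal sketch of what a first EDA pass can look like, the Python snippet below computes summary statistics and a Pearson correlation using only the standard library. The ages and incomes are made-up illustration values, and pearson_correlation is a helper defined here rather than a library function. In practice, a library such as pandas is commonly used for this step.

```python
from statistics import mean, median, stdev

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)

# Made-up sample: customer age and annual income in thousands.
ages = [23, 31, 35, 42, 51]
incomes = [28, 40, 45, 58, 70]

summary = {
    "mean": mean(incomes),
    "median": median(incomes),
    "stdev": stdev(incomes),  # sample standard deviation
}
correlation = pearson_correlation(ages, incomes)
print(summary, round(correlation, 3))
```

A correlation close to 1 here only says that the two columns move together in this toy sample; it does not by itself show that one causes the other.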

2. Predictive Modeling with Linear Regression

Build a simple linear regression model to predict a target variable based on one or more input features. Learn about model evaluation techniques such as mean squared error and R-squared, and interpret the results to make meaningful predictions.
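To make the evaluation metrics concrete, here is a small sketch of one-feature least-squares regression with mean squared error and R-squared written out by hand. The hours and scores are invented values, and the function names are illustrative rather than taken from a particular library.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

intercept, slope = fit_simple_linear_regression(hours, scores)
predictions = [intercept + slope * h for h in hours]
mse = mean_squared_error(scores, predictions)
r2 = r_squared(scores, predictions)
print(intercept, slope, mse, r2)
```

Both metrics are computed on the training data here for brevity; a capstone project would normally report them on a held-out test set.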

3. Classification with Decision Trees

Use decision tree algorithms to classify data into distinct categories. Learn how to preprocess data, train a decision tree model, and evaluate its performance using metrics like accuracy, precision, and recall. Apply your model to practical scenarios like predicting customer churn or classifying spam emails.
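The three metrics named above can be computed directly from true and predicted labels. The sketch below assumes binary labels where 1 is the positive class (for example, "churned"); the label lists are invented, and in a real project y_pred would come from a trained decision tree.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of everything predicted positive, how much really was positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything really positive, how much did the model find?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall differ here because the hypothetical model raises two false alarms but misses only one true positive, which is the kind of trade-off worth discussing in a project report.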

4. Clustering with K-Means

Explore unsupervised learning by applying the K-Means algorithm to group similar data points together. Practice feature scaling and model evaluation to identify meaningful clusters within your dataset. Apply your clustering model to segment customers or analyze patterns in market data.
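The following sketch shows feature scaling followed by a bare-bones K-Means loop in plain Python. The customer figures are made up, and the starting centroids are picked by hand (the first and last scaled points) purely to keep the example deterministic; real projects usually rely on a library implementation with a smarter initialization such as k-means++.

```python
import math

def min_max_scale(points):
    """Rescale every feature to the range [0, 1]."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    return [
        tuple(
            (p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        )
        for p in points
    ]

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Made-up customers: (annual spend, store visits per year).
customers = [(200, 2), (220, 3), (250, 2), (900, 20), (950, 22), (1000, 19)]
scaled = min_max_scale(customers)
labels, centroids = k_means(scaled, initial_centroids=[scaled[0], scaled[-1]])
print(labels)
```

Scaling matters because annual spend is numerically much larger than visit counts and would otherwise dominate the distance calculation.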

5. Sentiment Analysis on Text Data

Dive into natural language processing (NLP) by analyzing text data to determine sentiment polarity (positive, negative, or neutral). 

Learn about tokenization, text preprocessing, and sentiment analysis techniques using libraries like NLTK or spaCy. Apply your skills to analyze product reviews or social media comments.

6. Time Series Forecasting

Predict future trends or values based on historical time series data. Learn about time series decomposition, trend analysis, and seasonal patterns using methods like ARIMA or exponential smoothing. Apply your forecasting skills to predict stock prices, weather patterns, or sales trends.
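As an illustration of the simplest member of the exponential smoothing family, the sketch below produces a one-step-ahead forecast. The sales figures and the smoothing factor alpha are arbitrary illustration values; simple exponential smoothing ignores trend and seasonality, which is why methods like Holt-Winters or ARIMA are used for richer series.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series.

    alpha close to 1 reacts quickly to recent values; alpha close to 0
    gives a smoother, slower-moving estimate.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Made-up monthly sales figures.
monthly_sales = [100, 110, 105, 115]
forecast = simple_exponential_smoothing(monthly_sales, alpha=0.5)
print(forecast)  # 110.0
```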

7. Image Classification with Convolutional Neural Networks (CNNs)

Explore deep learning concepts by building a basic CNN model to classify images into different categories. 

Learn about convolutional layers, pooling, and fully connected layers, and experiment with different architectures to improve model performance. Apply your CNN model to tasks like recognizing handwritten digits or classifying images of animals.

Intermediate-Level Data Science Capstone Project Ideas


8. Customer Segmentation and Market Basket Analysis

Utilize advanced clustering techniques to segment customers based on their purchasing behavior. Conduct market basket analysis to identify frequent item associations and recommend personalized product suggestions. 

Implement techniques like the Apriori algorithm or association rules mining to uncover valuable insights for targeted marketing strategies.
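The core quantities behind association rules (support, confidence, and lift) can be sketched for item pairs as follows. This is a simplified, pair-only illustration rather than a full Apriori implementation, and the baskets and the 0.3 support threshold are made up.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3):
    """Support, confidence, and lift for rules between pairs of items."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in transactions
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # prune infrequent pairs
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            lift = confidence / (item_counts[consequent] / n)
            rules[(antecedent, consequent)] = {
                "support": support,
                "confidence": confidence,
                "lift": lift,
            }
    return rules

# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
rules = pair_rules(baskets)
for (antecedent, consequent), stats in sorted(rules.items()):
    print(antecedent, "->", consequent, stats)
```

A lift above 1 suggests the consequent is bought more often alongside the antecedent than it is overall, while a lift below 1 suggests the opposite.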

9. Time Series Anomaly Detection

Apply anomaly detection algorithms to identify unusual patterns or outliers in time series data. Utilize techniques such as moving average, Z-score, or autoencoders to detect anomalies in various domains, including finance, IoT sensors, or network traffic. 

Develop robust anomaly detection models to enhance data security and predictive maintenance.
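A rolling Z-score detector is one of the simplest ways to try the idea above. The sketch below compares each reading with the mean and standard deviation of the preceding window; the sensor readings, window size, and threshold of 3 are illustrative choices, not recommended defaults.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag points that lie far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # a flat window gives no scale to measure against
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, series[i], z))
    return anomalies

# Made-up temperature readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1, 19.7, 20.0]
anomalies = rolling_zscore_anomalies(readings)
anomaly_indices = [index for index, _value, _z in anomalies]
print(anomaly_indices)
```

Note that once the spike enters the window it inflates the standard deviation, so nearby anomalies could be masked; robust statistics such as the median are one common way to reduce that effect.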

10. Recommendation System Development

Build a recommendation engine to suggest personalized items or content to users based on their preferences and behavior. Implement collaborative filtering, content-based filtering, or hybrid recommendation approaches to improve user engagement and satisfaction. 

Evaluate the performance of your recommendation system using metrics like precision, recall, and mean average precision.
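The ranking metrics mentioned above can be written in a few lines. In this sketch the recommended list and the set of relevant items are invented; mean average precision (MAP) is simply the average of average_precision over all users.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def average_precision(recommended, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Made-up example: ranked recommendations and the items the user actually liked.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

p_at_3 = precision_at_k(recommended, relevant, k=3)
r_at_3 = recall_at_k(recommended, relevant, k=3)
ap = average_precision(recommended, relevant)
print(p_at_3, r_at_3, ap)
```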

11. Natural Language Processing for Topic Modeling

Dive deeper into NLP by exploring topic modeling techniques to extract meaningful topics from text data. 

Implement algorithms like Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF) to identify hidden themes or subjects within large text corpora. Apply topic modeling to analyze customer feedback, news articles, or academic papers.

12. Fraud Detection in Financial Transactions

Develop a fraud detection system using machine learning algorithms to identify suspicious activities in financial transactions. Utilize supervised learning techniques such as logistic regression, random forests, or gradient boosting to classify transactions as fraudulent or legitimate. 

Employ feature engineering and model evaluation to improve fraud detection accuracy and minimize false positives.

13. Predictive Maintenance for Industrial Equipment

Implement predictive maintenance techniques to anticipate equipment failures and prevent costly downtime. 

Analyze sensor data from machinery using machine learning algorithms like support vector machines or recurrent neural networks to predict when maintenance is required. Optimize maintenance schedules to minimize downtime and maximize operational efficiency.

14. Healthcare Data Analysis and Disease Prediction

Utilize healthcare datasets to analyze patient demographics, medical history, and diagnostic tests to predict the likelihood of disease occurrence or progression. 

Apply machine learning algorithms such as logistic regression, decision trees, or support vector machines to develop predictive models for diseases like diabetes, cancer, or heart disease. Evaluate model performance using metrics like sensitivity, specificity, and area under the ROC curve.
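To show how these evaluation metrics relate to model output, the sketch below derives sensitivity and specificity from thresholded risk scores and computes the area under the ROC curve through its pairwise-ranking interpretation. The labels, scores, and the 0.5 threshold are made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as half a win."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up data: 1 = disease present, scores = predicted risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sensitivity, specificity, auc)
```

Sensitivity and specificity change with the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why medical prediction projects often report both.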

Advanced Level Data Science Capstone Project Ideas


15. Deep Learning for Image Generation

Explore generative adversarial networks (GANs) or variational autoencoders (VAEs) to generate realistic images from scratch. Experiment with architectures like DCGAN or StyleGAN to create high-resolution images of faces, landscapes, or artwork. 

Evaluate image quality and diversity using perceptual metrics and human judgment.

16. Reinforcement Learning for Game Playing

Implement reinforcement learning algorithms like deep Q-learning or policy gradients to train agents to play complex games like Atari or board games. 

Experiment with exploration-exploitation strategies and reward-shaping techniques to improve agent performance and achieve superhuman levels of gameplay.
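Before moving to deep Q-learning, it can help to see the tabular version on a toy problem. The sketch below trains an epsilon-greedy Q-learning agent on a hypothetical five-cell corridor where the only reward is at the right end; the environment, hyperparameters, and random seed are all illustrative assumptions, and Atari-scale problems replace the table with a neural network.

```python
import random

def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on a corridor: start in cell 0, goal in the last cell.

    Actions: 0 = move left, 1 = move right. Reaching the goal gives reward 1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore with probability epsilon (or when the estimates tie),
            # otherwise exploit the action with the higher Q-value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[next_state])
            # Q-learning update toward reward plus discounted best future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

q_table = train_q_learning()
# Greedy policy for the non-terminal cells.
policy = [0 if q_table[s][0] > q_table[s][1] else 1 for s in range(4)]
print(policy)
```

Raising epsilon makes the agent explore more and learn more slowly, while setting it to zero risks the agent never discovering the rewarding path, which is the exploration-exploitation trade-off in miniature.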

17. Anomaly Detection in Streaming Data

Develop real-time anomaly detection systems to identify abnormal behavior in data streams such as network traffic, sensor readings, or financial transactions. 

Utilize algorithms suited to streaming settings, such as streaming k-means or an Isolation Forest refit over sliding windows, to detect anomalies and trigger timely alerts for intervention.

18. Multi-Modal Sentiment Analysis

Extend sentiment analysis to incorporate multiple modalities such as text, images, and audio to capture rich emotional expressions. 

Utilize deep learning architectures like multimodal transformers or fusion models to analyze sentiment across different modalities and improve understanding of complex human emotions.

19. Graph Neural Networks for Social Network Analysis

Apply graph neural networks (GNNs) to model and analyze complex relational data in social networks. Use techniques like graph convolutional networks (GCNs) or graph attention networks (GATs) to learn node embeddings and predict node properties such as community membership or user influence.

20. Time Series Forecasting with Deep Learning

Explore advanced deep learning architectures like long short-term memory (LSTM) networks or transformer-based models for time series forecasting. 

Utilize attention mechanisms and multi-horizon forecasting to capture long-term dependencies and improve prediction accuracy in dynamic and volatile environments.

21. Adversarial Robustness in Machine Learning

Investigate techniques to improve the robustness of machine learning models against adversarial attacks. 

Explore methods like adversarial training, defensive distillation, or certified robustness to mitigate vulnerabilities and keep models reliable under adversarial perturbations, particularly in critical applications like autonomous vehicles or healthcare.

These project ideas cater to various skill levels in data science, ranging from beginners to experts. Choose a project that aligns with your interests and skill level, and don’t hesitate to experiment and learn along the way!

Factors to Consider When Choosing a Data Science Capstone Project

Choosing the right data science capstone project is crucial for your learning experience and effectively showcasing your skills. Here are some factors to consider when selecting a data science capstone project:

Personal Interest

Select a project that aligns with your passions and career goals to stay motivated and engaged throughout the process.

Data Availability

Ensure access to relevant and sufficient data to complete the project and draw meaningful insights effectively.

Complexity Level

Consider your current skill level and choose a project that challenges you without overwhelming you, allowing for growth and learning.

Real-World Impact

Aim for projects with practical applications or societal relevance to showcase your ability to solve tangible problems.

Resource Requirements

Evaluate the availability of resources such as time, computing power, and software tools needed to execute the project successfully.

Mentorship and Support

Seek projects with opportunities for guidance and feedback from mentors or peers to enhance your learning experience.

Novelty and Innovation

Consider projects that push boundaries and try new techniques or approaches to demonstrate creativity and originality in your work.

Tips for Successfully Completing a Data Science Capstone Project

Successfully completing a data science capstone project requires careful planning, effective execution, and strong communication skills. Here are some tips to help you navigate through the process:

  • Plan and Prioritize: Break down the project into manageable tasks and create a timeline to stay organized and focused.
  • Understand the Problem: Clearly define the project objectives, requirements, and expected outcomes before starting the analysis.
  • Explore and Experiment: Experiment with different methodologies, algorithms, and techniques to find the most suitable approach.
  • Document and Iterate: Document your process, results, and insights thoroughly, and iterate on your analyses based on feedback and new findings.
  • Collaborate and Seek Feedback: Collaborate with peers, mentors, and stakeholders, actively seeking feedback to improve your work and decision-making.
  • Practice Communication: Communicate your findings effectively through clear visualizations, reports, and presentations tailored to your audience’s understanding.
  • Reflect and Learn: Reflect on your challenges, successes, and lessons learned throughout the project to inform your future endeavors and continuous improvement.

By following these tips, you can successfully navigate the data science capstone project and demonstrate your skills and expertise in the field.

Wrapping Up

Data science capstone projects are invaluable in bridging the gap between theory and practice, offering students a chance to apply their knowledge in real-world scenarios.

They are a cornerstone of data science education, fostering critical thinking, problem-solving, and practical skills development. 

As you embark on your journey, don’t hesitate to explore diverse and challenging project ideas. Embrace the opportunity to push boundaries, innovate, and make meaningful contributions to the field. 

Share your insights, challenges, and successes with others, and invite fellow enthusiasts to exchange ideas and experiences. 

1. What is the purpose of a data science capstone project?

A data science capstone project serves as a culmination of a student’s learning experience, allowing them to apply their knowledge and skills to solve real-world problems in the field of data science. It provides hands-on experience and showcases their ability to analyze data, derive insights, and communicate findings effectively.

2. What are some examples of data science capstone projects?

Data science capstone projects can cover a wide range of topics and domains, including predictive modeling, natural language processing, image classification, recommendation systems, and more. Examples may include analyzing customer behavior, predicting stock prices, sentiment analysis on social media data, or detecting anomalies in financial transactions.

3. How long does it typically take to complete a data science capstone project?

The duration of a data science capstone project can vary depending on factors such as project complexity, available resources, and individual pace. Generally, it may take several weeks to several months to complete a project, including tasks such as data collection, preprocessing, analysis, modeling, and presentation of findings.


10 Unique Data Science Capstone Project Ideas

A capstone project is a culminating assignment that allows students to demonstrate the skills and knowledge they’ve acquired throughout their degree program. For data science students, it’s a chance to tackle a substantial real-world data problem.

If you’re short on time, here’s a quick answer to your question: Some great data science capstone ideas include analyzing health trends, building a predictive movie recommendation system, optimizing traffic patterns, forecasting cryptocurrency prices, and more.

In this comprehensive guide, we will explore 10 unique capstone project ideas for data science students. We’ll overview potential data sources, analysis methods, and practical applications for each idea.

Whether you want to work with social media datasets, geospatial data, or anything in between, you’re sure to find an interesting capstone topic.

Project Idea #1: Analyzing Health Trends

When it comes to data science capstone projects, analyzing health trends is an intriguing idea that can have a significant impact on public health. By leveraging data from various sources, data scientists can uncover valuable insights that can help improve healthcare outcomes and inform policy decisions.

Data Sources

There are several data sources that can be used to analyze health trends. One of the most common sources is electronic health records (EHRs), which contain a wealth of information about patient demographics, medical history, and treatment outcomes.

Other sources include health surveys, wearable devices, social media, and even environmental data.

Analysis Approaches

When analyzing health trends, data scientists can employ a variety of analysis approaches. Descriptive analysis can provide a snapshot of current health trends, such as the prevalence of certain diseases or the distribution of risk factors.

Predictive analysis can be used to forecast future health outcomes, such as predicting disease outbreaks or identifying individuals at high risk for certain conditions. Machine learning algorithms can be trained to identify patterns and make accurate predictions based on large datasets.

Applications

The applications of analyzing health trends are vast and far-reaching. By understanding patterns and trends in health data, policymakers can make informed decisions about resource allocation and public health initiatives.

Healthcare providers can use these insights to develop personalized treatment plans and interventions. Researchers can uncover new insights into disease progression and identify potential targets for intervention.

Ultimately, analyzing health trends has the potential to improve overall population health and reduce healthcare costs.

Project Idea #2: Movie Recommendation System

When developing a movie recommendation system, there are several data sources that can be used to gather information about movies and user preferences. One popular data source is the MovieLens dataset, which contains a large collection of movie ratings provided by users.

Another source is IMDb, a widely used website that provides comprehensive information about movies, including user ratings and reviews. Additionally, streaming platforms like Netflix and Amazon Prime collect user ratings and viewing history, which are valuable for building an accurate recommendation system, although this data is generally not publicly available.

There are several analysis approaches that can be employed to build a movie recommendation system. One common approach is collaborative filtering, which uses user ratings and preferences to identify patterns and make recommendations based on similar users’ preferences.

Another approach is content-based filtering, which analyzes the characteristics of movies (such as genre, director, and actors) to recommend similar movies to users. Hybrid approaches that combine both collaborative and content-based filtering techniques are also popular, as they can provide more accurate and diverse recommendations.
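As a rough sketch of user-based collaborative filtering, the snippet below predicts a rating as a similarity-weighted average of other users' ratings, using cosine similarity over co-rated movies. The users and ratings are invented (not taken from MovieLens), and the function names are illustrative.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for the movie."""
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        numerator += similarity * other_ratings[movie]
        denominator += similarity
    return numerator / denominator if denominator else None

# Made-up ratings on a 1-5 scale.
ratings = {
    "alice": {"Inception": 5, "Titanic": 1, "The Matrix": 5},
    "bob": {"Inception": 5, "Titanic": 1, "The Matrix": 4, "Interstellar": 5},
    "carol": {"Inception": 1, "Titanic": 5, "The Matrix": 2, "Interstellar": 1},
}
predicted = predict_rating("alice", "Interstellar", ratings)
print(round(predicted, 2))
```

Because raw ratings are all positive, even a dissimilar user like carol still contributes some weight; subtracting each user's mean rating before computing similarity is a common refinement.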

A movie recommendation system has numerous applications in the entertainment industry. One application is to enhance the user experience on streaming platforms by providing personalized movie recommendations based on individual preferences.

This can help users discover new movies they might enjoy and improve overall satisfaction with the platform. Additionally, movie recommendation systems can be used by movie production companies to analyze user preferences and trends, aiding in the decision-making process for creating new movies.

Finally, movie recommendation systems can also be utilized by movie critics and reviewers to identify movies that are likely to be well-received by audiences.

For more information on movie recommendation systems, you can visit https://www.kaggle.com/rounakbanik/movie-recommender-systems or https://www.researchgate.net/publication/221364567_A_new_movie_recommendation_system_for_large-scale_data .

Project Idea #3: Optimizing Traffic Patterns

When it comes to optimizing traffic patterns, there are several data sources that can be utilized. One of the most prominent sources is real-time traffic data collected from various sources such as GPS devices, traffic cameras, and mobile applications.

This data provides valuable insights into the current traffic conditions, including congestion, accidents, and road closures. Additionally, historical traffic data can also be used to identify recurring patterns and trends in traffic flow.

Other data sources that can be used include weather data, which can help in understanding how weather conditions impact traffic patterns, and social media data, which can provide information about events or incidents that may affect traffic.

Optimizing traffic patterns requires the use of advanced data analysis techniques. One approach is to use machine learning algorithms to predict traffic patterns based on historical and real-time data.

These algorithms can analyze various factors such as time of day, day of the week, weather conditions, and events to predict traffic congestion and suggest alternative routes.

Another approach is to use network analysis to identify bottlenecks and areas of congestion in the road network. By analyzing the flow of traffic and identifying areas where traffic slows down or comes to a halt, transportation authorities can make informed decisions on how to optimize traffic flow.
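One way to picture alternative-route suggestions is a shortest-path search over a road graph whose edge weights are travel times. The sketch below uses Dijkstra's algorithm with Python's heapq module; the intersections, travel times, and the simulated congestion are all hypothetical.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total minutes, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, travel_time in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + travel_time, neighbor, path + [neighbor]))
    return None

# Made-up road network: travel times in minutes between intersections.
roads = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 4},
    "C": {"B": 1, "D": 8},
    "D": {},
}
normal = fastest_route(roads, "A", "D")

# Simulate congestion on the C -> B segment and search again.
congested_roads = {node: dict(edges) for node, edges in roads.items()}
congested_roads["C"]["B"] = 10
rerouted = fastest_route(congested_roads, "A", "D")
print(normal, rerouted)
```

In a real system the edge weights would be refreshed from live or predicted traffic data, so the same search would return different routes as conditions change.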

The optimization of traffic patterns has numerous applications and benefits. One of the main benefits is the reduction of traffic congestion, which can lead to significant time and fuel savings for commuters.

By optimizing traffic patterns, transportation authorities can also improve road safety by reducing the likelihood of accidents caused by congestion.

Additionally, optimizing traffic patterns can have positive environmental impacts by reducing greenhouse gas emissions. By minimizing the time spent idling in traffic, vehicles can operate more efficiently and emit fewer pollutants.

Furthermore, optimizing traffic patterns can have economic benefits by improving the flow of goods and services. Efficient traffic patterns can reduce delivery times and increase productivity for businesses.

Project Idea #4: Forecasting Cryptocurrency Prices

With the growing popularity of cryptocurrencies like Bitcoin and Ethereum, forecasting their prices has become an exciting and challenging task for data scientists. This project idea involves using historical data to predict future price movements and trends in the cryptocurrency market.

When working on this project, data scientists can gather cryptocurrency price data from various sources such as cryptocurrency exchanges, financial websites, or APIs. Websites like CoinMarketCap (https://coinmarketcap.com/) provide comprehensive data on various cryptocurrencies, including historical price data.

Additionally, platforms like CryptoCompare (https://www.cryptocompare.com/) offer real-time and historical data for different cryptocurrencies.

To forecast cryptocurrency prices, data scientists can employ various analysis approaches. Some common techniques include:

  • Time Series Analysis: This approach involves analyzing historical price data to identify patterns, trends, and seasonality in cryptocurrency prices. Techniques like moving averages, autoregressive integrated moving average (ARIMA), or exponential smoothing can be used to make predictions.
  • Machine Learning: Machine learning algorithms, such as random forests, support vector machines, or neural networks, can be trained on historical cryptocurrency data to predict future price movements. These algorithms can consider multiple variables, such as trading volume, market sentiment, or external factors, to make accurate predictions.
  • Sentiment Analysis: This approach involves analyzing social media sentiment and news articles related to cryptocurrencies to gauge market sentiment. By considering the collective sentiment, data scientists can predict how positive or negative sentiment can impact cryptocurrency prices.
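Whatever technique is chosen, it is worth comparing it against a naive baseline using walk-forward evaluation, where each forecast only sees past prices. The sketch below does this for a moving-average forecast; the prices are invented and the helper names are illustrative.

```python
def moving_average_forecast(prices, window):
    """Forecast the next price as the mean of the last `window` prices."""
    return sum(prices[-window:]) / window

def naive_forecast(prices, window):
    """Baseline: tomorrow's price equals today's price (window is unused)."""
    return prices[-1]

def walk_forward_mae(prices, window, forecaster):
    """Mean absolute error when forecasting each point from its own past only."""
    errors = []
    for t in range(window, len(prices)):
        prediction = forecaster(prices[:t], window)
        errors.append(abs(prices[t] - prediction))
    return sum(errors) / len(errors)

# Made-up daily closing prices.
prices = [100, 102, 101, 105, 107, 106, 110]

ma_error = walk_forward_mae(prices, window=3, forecaster=moving_average_forecast)
naive_error = walk_forward_mae(prices, window=3, forecaster=naive_forecast)
print(ma_error, naive_error)  # 3.5 2.75
```

On this toy series the naive baseline has the lower error, which illustrates why a forecasting project should always report results against a simple benchmark.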

Forecasting cryptocurrency prices can have several practical applications:

  • Investment Decision Making: Accurate price forecasts can help investors make informed decisions when buying or selling cryptocurrencies. By considering the predicted price movements, investors can optimize their investment strategies and potentially maximize their returns.
  • Trading Strategies: Traders can use price forecasts to develop trading strategies, such as trend following or mean reversion. By leveraging predicted price movements, traders can make profitable trades in the volatile cryptocurrency market.
  • Risk Management: Cryptocurrency price forecasts can help individuals and organizations manage their risk exposure. By understanding potential price fluctuations, risk management strategies can be implemented to mitigate losses.

Project Idea #5: Predicting Flight Delays

One interesting and practical data science capstone project idea is to create a model that can predict flight delays. Flight delays can cause a lot of inconvenience for passengers and can have a significant impact on travel plans.

By developing a predictive model, airlines and travelers can be better prepared for potential delays and take appropriate actions.

To create a flight delay prediction model, you would need to gather relevant data from various sources. Some potential data sources include:

  • Flight data from airlines or aviation organizations
  • Weather data from meteorological agencies
  • Historical flight delay data from airports

By combining these different data sources, you can build a comprehensive dataset that captures the factors contributing to flight delays.

Once you have collected the necessary data, you can employ different analysis approaches to predict flight delays. Some common approaches include:

  • Machine learning algorithms such as decision trees, random forests, or neural networks
  • Time series analysis to identify patterns and trends in flight delay data
  • Feature engineering to extract relevant features from the dataset

By applying these analysis techniques, you can develop a model that can accurately predict flight delays based on the available data.
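Feature engineering for this problem often starts by turning raw schedule and weather fields into model-ready signals. The sketch below is one hypothetical example: the field names, the sample flight, and the weather thresholds are assumptions made for illustration, not values from any real aviation dataset.

```python
from datetime import datetime

def engineer_features(flight):
    """Derive simple predictive features from a raw flight record."""
    departure = datetime.fromisoformat(flight["scheduled_departure"])
    return {
        "departure_hour": departure.hour,
        "day_of_week": departure.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": departure.weekday() >= 5,
        "is_evening": departure.hour >= 18,
        # Hypothetical thresholds for flagging bad weather.
        "bad_weather": flight["wind_speed_kmh"] > 50 or flight["precipitation_mm"] > 10,
    }

# Made-up flight record joined with weather data.
flight = {
    "scheduled_departure": "2024-03-16T19:30:00",
    "wind_speed_kmh": 60,
    "precipitation_mm": 0,
}
features = engineer_features(flight)
print(features)
```

Features like these can then be fed to any of the models listed above, and their usefulness checked through feature importance or ablation experiments.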

The applications of a flight delay prediction model are numerous. Airlines can use the model to optimize their operations, improve scheduling, and minimize disruptions caused by delays. Travelers can benefit from the model by being alerted in advance about potential delays and making necessary adjustments to their travel plans.

Additionally, airports can use the model to improve resource allocation and manage passenger flow during periods of high delay probability. Overall, a flight delay prediction model can significantly enhance efficiency and customer satisfaction in the aviation industry.

Project Idea #6: Fighting Fake News

With the rise of social media and easy access to information, the spread of fake news has become a significant concern. Data science can play a crucial role in combating this issue by developing innovative solutions.

Here are some aspects to consider when working on a project that aims to fight fake news.

When it comes to fighting fake news, having reliable data sources is essential. There are several trustworthy platforms that provide access to credible news articles and fact-checking databases. Websites like Snopes and FactCheck.org are good starting points for obtaining accurate information.

Additionally, social media platforms such as Twitter and Facebook can be valuable sources for analyzing the spread of misinformation.

One approach to analyzing fake news is by utilizing natural language processing (NLP) techniques. NLP can help identify patterns and linguistic cues that indicate the presence of misleading information.

Sentiment analysis can also be employed to determine the emotional tone of news articles or social media posts, which can be an indicator of potential bias or misinformation.

Another approach is network analysis, which focuses on understanding how information spreads through social networks. By analyzing the connections between users and the content they share, it becomes possible to identify patterns of misinformation dissemination.

Network analysis can also help in identifying influential sources and detecting coordinated efforts to spread fake news.
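A very simple starting point for finding influential sources is to count how often each account's posts are reshared. The sketch below does this with a Counter over (resharer, original poster) pairs; the account names and reshare events are invented, and real analyses typically use richer measures such as PageRank or betweenness centrality.

```python
from collections import Counter

def most_reshared_sources(reshares, top_n=3):
    """Rank accounts by how many times their posts were reshared."""
    counts = Counter(source for _resharer, source in reshares)
    return counts.most_common(top_n)

# Made-up reshare events: (account that reshared, account that originally posted).
reshares = [
    ("u2", "u1"),
    ("u3", "u1"),
    ("u4", "u1"),
    ("u4", "u5"),
    ("u6", "u5"),
    ("u1", "u7"),
]
top_sources = most_reshared_sources(reshares, top_n=2)
print(top_sources)
```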

The applications of a project aiming to fight fake news are numerous. One possible application is the development of a browser extension or a mobile application that provides users with real-time fact-checking information.

This tool could flag potentially misleading articles or social media posts and provide users with accurate information to help them make informed decisions.

Another application could be the creation of an algorithm that automatically identifies fake news articles and separates them from reliable sources. This algorithm could be integrated into news aggregation platforms to help users distinguish between credible and non-credible information.

Project Idea #7: Analyzing Social Media Sentiment

Social media platforms have become a treasure trove of valuable data for businesses and researchers alike. When analyzing social media sentiment, there are several data sources that can be tapped into. The most popular ones include:

  • Twitter: With its vast user base and real-time nature, Twitter is often the go-to platform for sentiment analysis. Researchers can gather tweets containing specific keywords or hashtags to analyze the sentiment of a particular topic.
  • Facebook: Facebook offers rich data for sentiment analysis, including posts, comments, and reactions. Analyzing the sentiment of Facebook posts can provide valuable insights into user opinions and preferences.
  • Instagram: Instagram’s visual nature makes it an interesting platform for sentiment analysis. By analyzing the comments and captions on Instagram posts, researchers can gain insights into the sentiment associated with different images or topics.
  • Reddit: Reddit is a popular platform for discussions on various topics. By analyzing the sentiment of comments and posts on specific subreddits, researchers can gain insights into the sentiment of different communities.

These are just a few examples of the data sources that can be used for analyzing social media sentiment. Depending on the research goals, other platforms such as LinkedIn, YouTube, and TikTok can also be explored.

When it comes to analyzing social media sentiment, there are various approaches that can be employed. Some commonly used analysis techniques include:

  • Lexicon-based analysis: This approach involves using predefined sentiment lexicons to assign sentiment scores to words or phrases in social media posts. By aggregating these scores, researchers can determine the overall sentiment of a post or a collection of posts.
  • Machine learning: Machine learning algorithms can be trained to classify social media posts into positive, negative, or neutral sentiment categories. These algorithms learn from labeled data and can make predictions on new, unlabeled data.
  • Deep learning: Deep learning techniques, such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs), can be used to capture the complex patterns and dependencies in social media data. These models can learn to extract sentiment information from textual or visual content.

The choice of analysis approach depends on the specific research objectives, the available resources, and the nature of the social media data being analyzed.
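To make the lexicon-based approach more concrete, here is a minimal Python sketch. The tiny word list, the negation handling, and the function names `score_post` and `label` are assumptions made for illustration; a real project would rely on a published sentiment lexicon and more careful text preprocessing.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (illustrative values only).
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def score_post(text):
    """Sum lexicon scores over the words of a post.

    A negation word flips the sign of the word that immediately follows it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        score += -value if negate else value
        negate = False  # negation only applies to the next word
    return score


def label(score):
    """Map an aggregated score to a sentiment category."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I love this phone, great battery", "not good", "Just landed in Paris"]:
        print(post, "->", label(score_post(post)))
```

Aggregating `score_post` over many posts about the same keyword or hashtag gives a rough picture of overall sentiment, although a sketch this simple will miss sarcasm, emoji, and slang.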

Analyzing social media sentiment has a wide range of applications across different industries. Here are a few examples:

  • Brand reputation management: By analyzing social media sentiment, businesses can monitor and manage their brand reputation. They can identify potential issues, respond to customer feedback, and take proactive measures to maintain a positive image.
  • Market research: Social media sentiment analysis can provide valuable insights into consumer opinions and preferences. Businesses can use this information to understand market trends, identify customer needs, and develop targeted marketing strategies.
  • Customer feedback analysis: Social media sentiment analysis can help businesses understand customer satisfaction levels and identify areas for improvement. By analyzing sentiment in customer feedback, companies can make data-driven decisions to enhance their products or services.
  • Public opinion analysis: Researchers can analyze social media sentiment to study public opinion on various topics, such as political events, social issues, or product launches. This information can be used to understand public sentiment, predict trends, and inform decision-making.

These are just a few examples of how analyzing social media sentiment can be applied in real-world scenarios. The insights gained from sentiment analysis can help businesses and researchers make informed decisions, improve customer experience, and drive innovation.

Project Idea #8: Improving Online Ad Targeting

Improving online ad targeting involves analyzing various data sources to gain insights into users’ preferences and behaviors. These data sources may include:

  • Website analytics: Gathering data from websites to understand user engagement, page views, and click-through rates.
  • Demographic data: Utilizing information such as age, gender, location, and income to create targeted ad campaigns.
  • Social media data: Extracting data from platforms like Facebook, Twitter, and Instagram to understand users’ interests and online behavior.
  • Search engine data: Analyzing search queries and user behavior on search engines to identify intent and preferences.

By combining and analyzing these diverse data sources, data scientists can gain a comprehensive understanding of users and their ad preferences.

To improve online ad targeting, data scientists can employ various analysis approaches:

  • Segmentation analysis: Dividing users into distinct groups based on shared characteristics and preferences.
  • Collaborative filtering: Recommending ads to a user based on the ads that users with similar preferences and behaviors have engaged with.
  • Predictive modeling: Developing algorithms to predict users’ likelihood of engaging with specific ads.
  • Machine learning: Utilizing algorithms that can continuously learn from user interactions to optimize ad targeting.

These analysis approaches help data scientists uncover patterns and insights that can enhance the effectiveness of online ad campaigns.
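As one illustration of these approaches, the sketch below shows user-based collaborative filtering in plain Python. The click history, the user and ad names, and the `recommend_ads` function are all hypothetical; the idea is simply to score ads a user has not seen by how similar that user is to the people who clicked them.

```python
import math

# Hypothetical click history: user -> {ad category: 1 if the user clicked it}.
clicks = {
    "alice": {"shoes": 1, "jackets": 1, "tents": 1},
    "bob": {"shoes": 1, "jackets": 1},
    "carol": {"laptops": 1, "phones": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_ads(user, history, top_n=3):
    """Rank ads the user has not clicked, weighted by similarity to other users."""
    scores = {}
    for other, ads in history.items():
        if other == user:
            continue
        similarity = cosine(history[user], ads)
        if similarity <= 0:
            continue  # ignore users with nothing in common
        for ad, weight in ads.items():
            if ad not in history[user]:
                scores[ad] = scores.get(ad, 0.0) + similarity * weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda ad: (-scores[ad], ad))[:top_n]


if __name__ == "__main__":
    print(recommend_ads("bob", clicks))
```

In this toy data, bob shares two clicks with alice and none with carol, so only alice's remaining ad category is suggested to him. A capstone project would replace the dictionary with real interaction logs and evaluate the recommendations against held-out clicks.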

Improved online ad targeting has numerous applications:

  • Increased ad revenue: By delivering more relevant ads to users, advertisers can expect higher click-through rates and conversions.
  • Better user experience: Users are more likely to engage with ads that align with their interests, leading to a more positive browsing experience.
  • Reduced ad fatigue: When ads are targeted more effectively, users are less likely to feel overwhelmed by irrelevant or repetitive advertisements.
  • Maximized ad budget: Advertisers can optimize their budget by focusing on the most promising target audiences.

Project Idea #9: Enhancing Customer Segmentation

Enhancing customer segmentation involves gathering relevant data from various sources to gain insights into customer behavior, preferences, and demographics. Some common data sources include:

  • Customer transaction data
  • Customer surveys and feedback
  • Social media data
  • Website analytics
  • Customer support interactions

By combining data from these sources, businesses can create a comprehensive profile of their customers and identify patterns and trends that will help in improving their segmentation strategies.

Several analysis approaches can be used to enhance customer segmentation:

  • Clustering: Using clustering algorithms to group customers based on similar characteristics or behaviors.
  • Classification: Building predictive models to assign customers to different segments based on their attributes.
  • Association Rule Mining: Identifying relationships and patterns in customer data to uncover hidden insights.
  • Sentiment Analysis: Analyzing customer feedback and social media data to understand customer sentiment and preferences.

These analysis approaches can be used individually or in combination to enhance customer segmentation and create more targeted marketing strategies.
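The clustering approach can be sketched with a small k-means implementation in plain Python. The customer features (annual spend and monthly visits), the sample values, and the choice to start from the first `k` points are assumptions made to keep the example short and repeatable; it is not a production-ready segmentation pipeline.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters with a basic k-means loop.

    Centroids start at the first k points, which keeps the result deterministic.
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers: (annual spend in dollars, store visits per month).
customers = [(200, 2), (5000, 20), (220, 3), (5200, 22), (180, 1), (4800, 18)]

if __name__ == "__main__":
    segments, centers = kmeans(customers, k=2)
    print(segments)
    print(centers)
```

Because spend is measured on a much larger scale than visits, it dominates the distance calculation here. In practice, features are usually standardized before clustering, and the number of clusters is chosen by comparing several values of `k`.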

Enhancing customer segmentation can have numerous applications across industries:

  • Personalized marketing campaigns: By understanding customer preferences and behaviors, businesses can tailor their marketing messages to individual customers, increasing the likelihood of engagement and conversion.
  • Product recommendations: By segmenting customers based on their purchase history and preferences, businesses can provide personalized product recommendations, leading to higher customer satisfaction and sales.
  • Customer retention: By identifying at-risk customers and understanding their needs, businesses can implement targeted retention strategies to reduce churn and improve customer loyalty.
  • Market segmentation: By identifying distinct customer segments, businesses can develop tailored product offerings and marketing strategies for each segment, maximizing the effectiveness of their marketing efforts.

Project Idea #10: Building a Chatbot

A chatbot is a computer program that uses artificial intelligence to simulate human conversation. It can interact with users in natural language through text or voice. Building a chatbot can be an exciting and challenging data science capstone project.

It requires a combination of natural language processing, machine learning, and programming skills.

Data sources play a crucial role in training a chatbot and improving its performance. Commonly used sources include:

  • Chat logs: Analyzing existing chat logs can help in understanding common user queries, responses, and patterns. This data can be used to train the chatbot on how to respond to different types of questions and scenarios.
  • Knowledge bases: Integrating a knowledge base can provide the chatbot with a wide range of information and facts. This can be useful in answering specific questions or providing detailed explanations on certain topics.
  • APIs: Utilizing APIs from different platforms can enhance the chatbot’s capabilities. For example, integrating a weather API can allow the chatbot to provide real-time weather information based on user queries.

There are several analysis approaches that can be used to build an efficient and effective chatbot:

  • Natural Language Processing (NLP): NLP techniques enable the chatbot to understand and interpret user queries. This involves tasks such as tokenization, part-of-speech tagging, named entity recognition, and sentiment analysis.
  • Intent recognition: Identifying the intent behind user queries is crucial for providing accurate responses. Machine learning algorithms can be trained to classify user intents based on the input text.
  • Contextual understanding: Chatbots need to understand the context of the conversation to provide relevant and meaningful responses. Techniques such as sequence-to-sequence models or attention mechanisms can be used to capture contextual information.
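Intent recognition, in its simplest form, can be sketched as matching a query to the most similar labeled example. The training phrases, the intent names, the 0.3 threshold, and the `predict_intent` function below are all hypothetical, and a bag-of-words comparison stands in for the trained machine learning classifier a real chatbot would use.

```python
import math
import re
from collections import Counter

# Hypothetical labeled examples: (user query, intent).
TRAINING = [
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("book an appointment with the doctor", "appointment"),
    ("schedule a visit for next week", "appointment"),
    ("where is my order", "order_status"),
    ("track my package", "order_status"),
]


def bag_of_words(text):
    """Tokenize into lowercase words and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_intent(query, threshold=0.3):
    """Return the intent of the most similar example, or "fallback" if none is close."""
    query_vector = bag_of_words(query)
    best_intent, best_score = "fallback", 0.0
    for example, intent in TRAINING:
        score = cosine(query_vector, bag_of_words(example))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"


if __name__ == "__main__":
    for query in ["is it going to rain today", "track my order", "tell me a joke"]:
        print(query, "->", predict_intent(query))
```

The fallback intent gives the chatbot a safe response when a query does not resemble anything it has seen, which is also a natural point to escalate to a human agent.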

Chatbots have a wide range of applications in various industries:

  • Customer support: Chatbots can be used to handle customer queries and provide instant support. They can assist with common troubleshooting issues, answer frequently asked questions, and escalate complex queries to human agents when necessary.
  • E-commerce: Chatbots can enhance the shopping experience by assisting users in finding products, providing recommendations, and answering product-related queries.
  • Healthcare: Chatbots can be deployed in healthcare settings to provide preliminary medical advice, answer general health-related questions, and assist with appointment scheduling.

Building a chatbot as a data science capstone project not only showcases your technical skills but also allows you to explore the exciting field of artificial intelligence and natural language processing.

It can be a great opportunity to create a practical and useful tool that can benefit users in various domains.

Completing an in-depth capstone project is the perfect way for data science students to demonstrate their technical skills and business acumen. This guide outlined 10 unique project ideas spanning industries like healthcare, transportation, finance, and more.

By identifying the ideal data sources, analysis techniques, and practical applications for their chosen project, students can produce an impressive capstone that solves real-world problems and showcases their abilities.

Data Science: Capstone

To become an expert you need practice and experience.

Show what you’ve learned from the Professional Certificate Program in Data Science.

What You'll Learn

To become an expert data scientist you need practice and experience. By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

Unlike the rest of our Professional Certificate Program in Data Science, in this course, you will receive much less guidance from the instructors. When you complete the project you will have a data product to show off to potential employers or educational programs, a strong indicator of your expertise in the field of data science.

The course will be delivered via edX and connect learners around the world. By the end of the course, participants will understand the following concepts:

  • How to apply the knowledge base and skills learned throughout the series to a real-world problem
  • How to independently work on a data analysis project

Your Instructors

Rafael Irizarry

Professor of Biostatistics at Harvard University

Ways to take this course

When you enroll in this course, you will have the option of pursuing a Verified Certificate or Auditing the Course.

A Verified Certificate costs $149 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate. 

Alternatively, learners can Audit the course for free and have access to select course material, activities, tests, and forums.  Please note that this track does not offer a certificate for learners who earn a passing grade.

Capstone Projects

Education is one of the pillars of the Data Science Institute.

Through educational activities, we strive to create a community in Data Science at Columbia. The capstone project is one of the most lauded elements of our MS in Data Science program. As a final step during their study at Columbia, our MS students work on a project sponsored by a DSI industry affiliate or a faculty member over the course of a semester.

Faculty-Sponsored Capstone Projects

A DSI faculty member proposes a research project and advises a team of students working on this project. This is a great way to run a research project with enthusiastic students, eager to try out their newly acquired data science skills in a research setting. This is especially a good opportunity for developing and accelerating interdisciplinary collaboration.

2023-2024 Academic Year: July 15, 2023 via this form

Project Archive

  • Spring 2022
  • Spring 2020
  • Spring 2019
  • Spring 2018
  • Spring 2016

 Wednesdays @ 12:45pm - 3:00pm SEC LL2.223 (Allston Campus)

Capstone Research Project Course, AC297r, Fall 2022

Weiwei Pan

Founded by the Institute for Applied Computational Science (IACS)'s Scientific Program Director, Pavlos Protopapas, the Capstone Research course is a group-based research experience where students work directly with a partner from industry, government, academia, or an NGO to solve a real-world data science/computation problem. Students will create a solution in the form of a software package, which will require varying levels of research. Upon completion of this challenging project, students will be better equipped to conduct research and enter the professional world. Every class session includes a guest lecture concerning various essential skills for one's career, from public speaking, reading and writing research papers, and how to work remotely on a team to everything about start-ups, and more.

Guide to the ALM Capstone Project

Data Science Capstone

This capstone course is the culmination of the Master of Liberal Arts, data science, where students execute their research proposal from CSCI S-597. It gives students the opportunity to collaborate on a complex research topic using their data science skills. At the completion of the capstone, students are able to demonstrate their ability to think critically about data, communicate with diverse audiences, and advance innovation in ways that benefit society.

Capstone Proposal Tutorial and Capstone Sequencing

The semester prior to capstone enrollment (no earlier), you register for the on-campus precapstone: CSCI E-597 Data Science Precapstone. Ordinarily the on-campus precapstone tutorial is offered during the three-week January session and one three-week summer session.

The Precapstone prepares students to explore interdisciplinary research topics from a variety of industries and areas. Through workshops and collaborating with experts from different disciplines, students identify research topics, apply the appropriate data science methods, and use data to advance innovative solutions. Students receive guidance and advising to work effectively in teams, refine project proposals, and build the domain knowledge necessary in their selected area. By the end of the course, each team submits a detailed research proposal, including project rationale, methods, and expected outcomes, which they intend to execute during CSCI E-599a.

The semester right after the precapstone, you enroll in the online capstone, CSCI E-599a Data Science Capstone, as your final one-and-only course, either in the fall or the spring. Due to the heavy demands of the capstone, it is considered a full-time course. All other degree requirements must be fulfilled so you can draw upon your entire ALM training to produce a final project worthy of a Harvard degree.

Sample Pathway

You need to complete 12 courses (48 credits) to earn the degree. 

  • You'll register for the precapstone in the summer as your 11th course. Then in the fall, you'll register for the capstone as your 12th and final course.
  • You'll register for the precapstone in the January term as your 11th course. Then in the spring, you'll register for the capstone as your 12th and final course.

Bruce Huang, EdD, PhD, Director of Master's Degree Program in Information Technology, Harvard Extension School

Capstone Projects

The culminating experience in the Master’s in Applied Data Science program is a Capstone Project where you’ll put your knowledge and skills into practice. You will immerse yourself in a real business problem and will gain valuable, data-driven insights using authentic data. Together with project sponsors, you will develop a data science solution to address organization problems, enhance analytics capabilities, and expand talent pools and employment opportunities. Leveraging the university’s rich research portfolio, you also have the option to join a research-focused team.

Selected Capstone Projects

  • COPD Readmission and Cost Reduction Assessment
  • An NFL Ticket Pricing Study: Optimizing Revenue Using Variable and Dynamic Pricing Methods
  • Using Image Recognition to Identify Yoga Poses
  • Using Image Recognition to Measure the Speed of a Pitch
  • Real-Time Credit Card Fraud Detection

Interested in Becoming a Capstone Sponsor?

The Master’s in Applied Data Science program accepts projects year-round for placement at the beginning of every quarter, with the Spring quarter being the largest cohort. All projects must be submitted no later than one month prior to the beginning of the preferred starting quarter based on the UChicago academic calendar.

Capstone Sponsor Incentives

Sponsors derive measurable benefits from this unique opportunity to support higher education. Partner organizations propose real-world problems, untested ideas or research queries. Students review them from the perspective of data scientists trained to generate actionable insights that provide long-term value. Through the project, Capstone partners gain access to a symbiotic pool of world-class students, highly accomplished instructors, and cited researchers, resulting in optimized utilization of modern data science-based methods, using your data. Further, for many sponsors, the project becomes a meaningful source of recruitment through the excellent pool of students who work on your project.

Capstone Sponsor Obligations

While there is no monetary cost or contract necessary to sponsor a project, we do consider this a partnership. Teams composed of four students and guided by an instructor and subject matter expert are provided with expectations from the capstone sponsor and learning objectives, assignments, and evaluation requirements from instructors. In turn, Capstone partners should be prepared to provide the following:

  • A detailed problem statement with a description of the data and expected results
  • Two or more points of contact
  • Access to data relevant to the project by the first week of the applicable quarter
  • Engagement through regular meetings (typically bi-weekly) while classes are in session
  • If requested, a non-disclosure agreement that may be completed by the student team

Interested in Becoming a Capstone or Industry Research Partner?

Get in touch with us to submit your idea for a collaboration or ask us questions about how the partnership process works.

NYU Center for Data Science

Harnessing Data’s Potential for the World

Master’s in Data Science

CDS master’s students have a unique opportunity to solve real-world problems through the capstone course in the final year of their program. The capstone course is designed to put knowledge into practice and to develop and improve critical skills such as problem-solving and collaboration.

Students are matched with research labs within the NYU community and with industry partners to investigate pressing issues, applying data science to the following areas:

  • Probability and statistical analyses
  • Natural language processing
  • Big Data analysis and modeling
  • Machine learning and computational statistics
  • Coding and software engineering
  • Visualization modeling
  • Neural networks
  • Signal processing
  • High dimensional statistics

Capstone projects present students with the opportunity to work in their field of interest and gain exposure to applicable solutions. Project sponsors, NYU labs, and external partners in turn benefit from having a new perspective applied to their projects.

“Capstone is a unique opportunity for students to solve real world problems through projects carried out in collaboration with industry partners or research labs within the NYU community,” says capstone advisor and CDS Research Fellow Anastasios Noulas. “It is a vital experience for students ahead of their graduation and prior to entering the market, as it helps them improve their skills, especially in problem solving contexts that are atypical compared to standard courses offered in the curriculum. Cooperation within teams is another crucial skill built through the Capstone experience as projects are typically run across groups of 2 to 4 people.”

The Capstone Project offers the opportunity for organizations to propose a project that our graduate students will work on as part of their curriculum for one semester. Information on the course, along with a questionnaire to propose a project, can be found on the Capstone Fall 2024 Project Submission Form. If you have any questions, please reach out to [email protected].

Best Fall 2023 Capstone Posters

  • Multimodal NLP for M&A Agreements

Student Authors: Harsh Asrani, Chaitali Joshi, Tayyibah Khanam, Ansh Riyal | Project Mentors: Vlad Kobzar, Kyunghyun Cho

  • Partisan Bias and the US Federal Court System

Student Authors: Annabelle Huether, Mary Nwangwu, Allison Redfern | Project Mentors: Aaron Kaufman, Jon Rogowski

Best Fall 2023 Student Voted Posters

  • User-Centric AI Models for Assisting the Blind

Student Authors: Gail Batutis, Aradhita Bhandari, Aryan Jain, Mallory Sico | Project Mentors: Giles Hamilton-Fletcher, Chen Feng, Kevin C. Chan

  • Multi-Modal Foundation Models for Medicine

Student Authors: Yunming Chen, Harry Huang, Jordan Tian, Ning Yang | Project Mentors: Narges Razavian

Best Fall 2023 Student Voted Runner-Up Posters

  • Representational geometry of learning rules in neural networks

Student Authors: Ghana Bandi, Shiyu Ling, Shreemayi Sonti, Zoe Xiao | Project Mentors: SueYeon Chung, Chi-Ning Chou

  • Medical Data Leakage with Multi-site Collaborative Training

Student Authors: Christine Gao, Ciel Wang, Yuqi Zhang | Project Mentors: Qi Lei

Fall 2023 Capstone Project List

  • Segmentation of Metastatic Brain Tumors Using Deep Learning
  • Discovering misinformation narratives from suspended tweets using embedding-based clustering algorithms
  • Network Intrusion Detection Systems using Machine Learning
  • Knowledge Extraction from Pathology Reports Using LLMs
  • Building an Interactive Browser for Epigenomic & Functional Maps from the Viewpoint of Disease Association
  • Prediction of Acute Pancreatitis Severity Using CT Imaging and Deep Learning
  • User-centric AI models for assisting the blind
  • A machine learning model to predict future kidney function in patients undergoing treatment for kidney masses
  • Fine-Tuning of MedSAM for the Automated Segmentation of Musculoskeletal MRI for Bone Topology Evaluation and Radiomic Analysis
  • Online News Content Neural Network Recommendation Engine
  • Explanatory Modeling for Website Traffic Movements
  • Egocentric video zero-shot object detection
  • Leverage OncoKB’s Curated Literature Database to Build an NLP Biomarker Identifier
  • Improving Out-of-Distribution Generalization in Neural Models for Astrophysics and Cosmology?
  • Preparing a Flood Risk Index for the State of Assam, India
  • Causal GANs
  • Bringing Structure to Emergent Taxonomies from Open-Ended CMS Tags
  • Social Network Analysis of Hospital Communication Networks
  • Multimodal Question Answering
  • Does resolution matter for transfer learning with satellite imagery?
  • Measuring Optimizer-Agnostic Hyperparameter Tuning Difficulty
  • Extracting causal political narratives from text.
  • Designing Principled Training Methods for Deep Neural Networks
  • Multimodal NLP for M&A Agreements
  • Using Deep Learning to Solve Forward-Backward Stochastic Differential Equations
  • OptiComm: Maximizing Medical Communication Success with Advanced Analytics
  • Automated assessment of epilepsy subtypes using patient-generated language data
  • Predicting cancer drug response of patients from their alteration and clinical data
  • Identify & Summarize top key events for a given company from News Data using ML and NLP Models
  • Developing predictive shooting accuracy metric(s) for First-Person-Shooter esports
  • Supporting Student Success through Pipeline Curricular Analysis
  • Transformers for Electronic Health Records
  • Build Models for Multilingual Medical Coding
  • Metadata Extraction from Spoken Interactions Between Mothers and Young Children
  • Uncertainty Radius Selection in Distributionally Robust Portfolio Optimization
  • Unveiling Insights into Employee Benefit Plans and Insurance Dynamics
  • Advanced Name Screening and Entity Linking Using large language models
  • What Keeps the Public Safe While Avoiding Excessive Use of Incarceration? Supporting Data-Centered Decisionmaking in a DA’s Office
  • Foundation Models for Brain Imaging
  • Housing Price Forecasting – Alternative Approaches
  • Evaluating the Capability of Large Language Models to Measure Psychiatric Functioning
  • Predicting year-end success using deep neural network (DNN) architecture

Best Fall 2022 Capstone Posters

  • Leveraging Computer Vision to Map Cell Tower Locations to Enhance School Connectivity

Student Authors: Lorena Piedras, Priya Dhond, and Alejandro Sáez | Mentors: Iyke Derek Maduako (UNICEF)

  • Neural Re-Ranking for Personalized Home Search

Student Authors: Giacomo Bugli, Luigi Noto, Guilherme Albertini | Mentors: Shourabh Rawat, Niranjan Krishna, and Andreas Rubin-Schwarz

  • Sequence Modeling for Query Understanding & Conversational Search

Student Authors: Lucas Tao, Evelyn Wang, Jun Wang, Cecilia Wu | Mentors: Amir Rahmani, Arun Balagopalan, Shourabh Rawat, and Najoung Kim

  • Solving challenging video games in human-like ways

Student Authors: Brian Pennisi, Jiawen Wu, Adeet Patel, and Sarvesh Patki | Mentors: Todd Gureckis (NYU)

Best Fall 2022 Student Voted Posters

  • Deep Learning Framework for Segmentation of Medical Images

Student Authors: Luoyao Chen, Mei Chen, Jinqian Pan | Mentors: Jacopo Cirrone (NYU)

  • Galaxy Dataset Distillation

Student Authors: Xu Han, Jason Wang, Chloe Zheng | Mentors: Julia Kempe (NYU)

Best Fall 2022 Runner-Up Posters

  • Dementia Detection from FLAIR MRI via Deep Learning

Student Authors: Jiawen Fan, Aiqing Li | Mentors: Narges Razavian (NYU Langone)

  • Ego4d NLQ: Egocentric Visual Learning of Representations and Episodic Memory

Student Authors: Dongdong Sun, Rui Chen, Ying Wang | Mentors: Mengye Ren (NYU)

  • Learning User Representations from Zillow Search Sessions using Transformer Architectures

Student Authors: Xu Han, Jason Wang, Chloe Zheng | Mentors: Shourabh Rawat (Zillow Group)

  • Methane Emission Quantification through Satellite Images

Student Authors: Alex Herron, Dhruv Saxena, Xiangyue Wang | Mentors: Robert Huppertz (orbio.earth)

Fall 2022 Capstone Project List

  • Data Science for Clinical Decision-making Support in Radiation Therapy
  • Using Voter File Data to Study Electoral Reform
  • Creating an Epigenomic Map of the Heart
  • Career Recommendation
  • Calibrating for Class Weights
  • Assigning Locations to Detected Stops using LSTM
  • Impact of YMCA Facilities on the Local Neighborhoods of Bronx
  • Powering SMS Product Recommendations with Deep Learning
  • Evaluation and Performance Comparison of Two Models in Classifying Cosmological Simulation Parameters
  • Crypto Anomaly Detection
  • Sequence Modeling for Query Understanding & Conversational Search
  • Multi-Modal Graph Inductive Learning with CLIP Embeddings
  • Multimodal Contract Segmentation
  • Extraction of Causal Narratives from News Articles
  • Detecting Erroneous Geospatial Data
  • Improving Speech Recognition Performance using Synthetic Data
  • Multi-document Summarization for News Events
  • Multi-task learning in orthogonal low dimensional parameter manifolds
  • Let’s Go Shopping: An Investigation Into a New Bimodal E-Commerce Dataset
  • Training AI to recognize objects of interest to the blind community
  • Classify Classroom Activities using Ambient Sound
  • Database and Dashboard for RII
  • Bitcoin Price Prediction Using Machine Learning Models
  • Context Driven Approach to Detecting Cross-Platform Coordinated Influence Campaigns
  • Invalid Traffic Detection Model Deployment
  • Recalled Experiences of Death: Using Transformers to Understand Experiences and Themes
  • Context-Based Content Extraction & Summarization from News Articles
  • Neural Learning to Rank for Personalized Home Search
  • Improve Speech Recognition Performance Using Unpaired Audio and Text
  • Data Normalization & Generalization to Population Metrics
  • Automated Judicial Case Briefing
  • Cyber Threat Detection for News Articles
  • MLS Fan Segmentation
  • Near Real-Time Estimation of Beef and Dairy Feedlot Greenhouse Gas Emissions
  • Do Better Batters Face Higher or Lower Quality Pitches?

Previous Capstone Projects

Best Fall 2021 Capstone Posters

  • Question Answering on Long Context

Student Authors: Xinli Gu, Di He, Congyun Jin | Project Mentor: Jocelyn Beauchesne (Hyperscience)

Multimodal Self-Supervised Deep Learning with Chest X-Rays and EHR Data

Student Authors: Adhham Zaatri, Emily Mui, Yechan Lew | Project Mentor: Sumit Chopra (NYU Langone)

Head and Neck CT Segmentation Using Deep Learning

Student Authors: Pengyun Ding, Tianyu Zhang | Project Mentor: Ye Yuan (NYU Langone)

  • 3D Astrophysical Simulation with Transformer

Student Authors: Elliot Dang, Tong Li, Zheyuan Hu | Project Mentor: Shirley Ho (Flatiron Institute)

Multimodal Representations for Document Understanding (Best Student Voted Poster)

Student Authors: Pavel Gladkevich, David Trakhtenberg, Ted Xie, Duey Xu | Project Mentor: Shourabh Rawat (Zillow Group)

2021 Capstone Project List

  • Accelerated Learning in the Context of Language Acquisition
  • Analysis of Cardiac Signals on Patients with Atrial Fibrillation
  • Applications of Neural Radiance Fields in Astronomy
  • Automatic Detection of Alzheimer’s Disease with Multi-Modal Fusion of Clinical MRI Scans
  • Automatic Transcription of Speech on SAYCam
  • Automatic Volumetric Segmentation of Brain Tumor Using Deep Learning for Radiation Oncology
  • Automatically Identify Applicants Who Require Physician’s Reports
  • Building a Question-Answer Generation Pipeline for The New York Times
  • Coupled Energy-Based Models and Normalizing Flows for Unsupervised Learning
  • Data Classification Processing for Clinical Decision-making Support in Radiation Therapy
  • Deep Active Learning for Protest Detection
  • Estimating Intracranial Pressure Using OCT Scans of the Eyeball
  • Graph Neural Networks for Electronic Health Record (EHR) Data
  • Head and Neck CT Image Segmentation
  • Head Movement Measurement During Structural MRI
  • Image Segmentation for Vestibular Schwannoma
  • Investigation into the Functionality of Key, Query, Value Sub-modules of a Transformer
  • Know Your Worth: An Analysis of Job Salaries
  • Machine learning-based computational phenotyping of electronic health records
  • Modeling the Speed Accuracy Tradeoff in Decision-Making
  • Multi-modal Breast Cancer Detection
  • Multi-Modal Deep Learning with Medical Images and EHR Data
  • Multimodal Representations for Document Understanding
  • Nematode Counting
  • News Clustering and Summarization
  • Post-surgical resection mapping in epilepsy using CNNs
  • Predicting Grandstanding in the Supreme Court through Speech
  • Predicting Probability of Post-Colectomy Hospital Readmission
  • Prediction of Total Knee Replacement Using Radiographs and Clinical Risk Factors
  • Reinforcement Learning for Option Hedging
  • Representation Learning Regarding RNA-RBP Binding
  • Self-Supervised Learning of Medical Image Representations Using Radiology Reports
  • The Study of American Public Policy with NLP
  • Topical Aggregation and Timeline Extraction on the NYT Corpus
  • Unsupervised Deep Denoiser for Electron-Microscope Data
  • Using Deep Learning and FBSDEs to Solve Option Pricing and Trading Problems
  • Vision Language Models for Real Estate Images and Descriptions

Featured 2020 Capstone Projects

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

By Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson MoraisJain

Accented Speech Recognition Inspired by Human Perception

By Xiangyun Chu, Elizabeth Combs, Amber Wang, Michael Picheny

Diarization of Legal Proceedings: Identifying and Transcribing Judicial Speech from Recorded Court Audio

By Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman

2020 Capstone Project List

  • 2D to 3D Video Generation for Surgery (Best Capstone Poster)
  • Action Primitive Recognition with Sequence to Sequence Models towards Stroke Rehabilitation
  • Applying Self-learning Methods on Histopathology Whole Slide Images
  • Applying Transformers Models to Scanned Documents: An Application in Industry
  • Beyond Bert-based Financial Sentimental Classification: Label Noise and Company Information
  • Bias and Stability in Hiring Algorithms (Best Capstone Poster)
  • Breast Cancer Detection using Self-supervised Learning Method
  • Catastrophic Forgetting: An Extension of Current Approaches (Best Capstone Poster)
  • ClinicalLongformer: Public Available Transformers Language Models for Long Clinical Sequences
  • Complication Prediction of Bariatric Surgery
  • Constraining Search Space for Hardware Configurations
  • D4J: Data for Justice to Advance Transparency and Fairness
  • Data-driven Diesel Insights
  • Deep Learning to Study Pathophysiology in Dermatomyositis
  • Detection Of Drug-Target Interactions Using BioNLP
  • Determining RNA Alternative Splicing Patterns
  • Developing a Data Ecosystem for Refugee Integration Insights
  • Diarizing Legal Proceedings
  • Estimating the Impact of the Home Health Value-Based Purchasing Model
  • Extracting economic sentiment from mainstream media articles
  • Food Trend Detection in Chinese Financial Market
  • Forecasting Biodiesel Auction Prices
  • Generative Adversarial Networks for Electron Microscope Image Denoising
  • Graph Embedding for Question Answering over Knowledge Graphs
  • Impact of NYU Wasserman Resources on Students’ Career Outcomes
  • Improving Accented Speech Recognition Through Multi-Accent Pre-Exposure
  • Improving Synthetic Image Generation for Better Object Detection
  • Learning-based Model for Super-resolution in Microscopy Imaging
  • Modeling Human Reading by a Grapheme-to-Phoneme Neural Network
  • Movement Classification of Macaque Neural Activity
  • New OXXO Store in Brazil and Revenue Prediction
  • Numerical Relativity Interpolations using Deep Learning
  • One Medical Passport: Predictive Obstructive Sleep Apnea Analysis
  • Online Student Pathways at New York University
  • Predicting YouTube Trending Video Project
  • Promotional Forecasting Model for Profit Optimization
  • Question Answering on Tabular Data with NLP
  • Raizen Fuel Demand Forecasting
  • Reach for the stars: detecting astronomical transients
  • Reverse Engineering the MOS 6502 Microprocessor
  • Selecting Optimal Training Sets
  • Synthesizing baseball data with event prediction pretraining
  • Train ETA Estimation for Rumo S.A.
  • Training a Generalizable End-to-End Speech-to-Intent Model
  • Utilizing Machine Learning for Career Advancement and Professional Growth

Best Fall 2019 Capstone Projects

  • Inferring the Topic(s) of Wikipedia Articles

By Marina Zavalina, Sarthak Agarwal, Chinmay Singhal, Peeyush Jain

Option Portfolio Replication and Hedging in Deep Reinforcement Learning

By Bofei Zhang, Jiayi Du, Yixuan Wang, Muyang Jin

Adversarial Attacks Against Linear and Deep-Learning Regressions in Astronomy

By Teresa Huang, Zacharie Martin, Greg Scanlon, Eva Wang | Mentors: Soledad Villar, David W. Hogg

2019 Capstone Project List

  • Adversarial Attacks Against Linear and Deep-learning Regressions in Astronomy
  • Automated Breast Cancer Screening
  • Automatic Legal Case Summaries
  • Cross-task Transfer Between Language Understanding Tasks in NLP
  • Dark Matter and Stellar Stream Detection using Deep Learned Clustering
  • Exploiting Google Street View to Generate Global-scale Data Sets for Training Next Generation Cyber-Physical Systems
  • Federated Incremental Learning
  • Fraud Detection in Monetary Transactions Between Bank Accounts
  • Guided Image Upsampling
  • Improving State of the Art Cross-Lingual Word-Embeddings
  • Latent Semantic Topics Distribution Over Web Content Corpus
  • Lease Renewal Probability Prediction
  • Machine Learning for Adaptive Fuzzy String Matching
  • Market Segmentation from Retailer Behavior
  • Modeling the Experienced Dental Curriculum from Student Data
  • Modelling NBA Games
  • Movie Preference Prediction
  • MRI Image Reconstruction
  • NLP Metalearning
  • Predict next sales office location
  • Predicting Stock Market Movements using Public Sentiment Data & Sequential Deep Learning Models
  • Predictive Maintenance Techniques
  • Reinforcement Learning for Replication and Hedging of Option
  • Self-supervised Machine Listening
  • Sentence Classification of TripAdvisor ‘Points-of-Interest’ Reviews
  • Simulating the Dark Matter Distribution of the Universe with Deep Learning
  • SMaPP2: Joint Embedding of User-content and Network Structure to Enable a Common Coordinate that Captures Ideology, Geography and User Topic Spectrum
  • Sparse Deconvolution Methods for Microscopy Imaging Data Analysis
  • Stereotype and Unconscious Bias in Large Datasets
  • Structuring Exploring and Exploiting NIH’s Clinical Trials Database
  • The Analysis, Visualization, and Understanding of Big Urban Noise Data
  • Unsupervised and Self-supervised Learning for Medical Notes
  • Unsupervised Generative Video Dubbing
  • Using Deep Generative Models to de-noise Noisy Astronomical Data

Featured Academic Capstone Projects

Deep Learning for Breast Cancer Detection

By Jason Phang, Jungkyu (JP) Park, Thibault Fevry, Zhe Huang, The B-Team

Brain Segmentation Using Deep Learning

By Team 22/7 | Chaitra V. Hegde | Advisor: Narges Razavian

Predict Total Knee Replacement Using MRI With Supervised and Semi-Supervised Networks

By Team Glosy: Hong Gao, Mingsi Long, Yulin Shen, and Jie Yang

Featured Industry Capstone Projects

Determining where New York Life Insurance should open its next sales office

NBA Shot Prediction with Spatio-Temporal Analysis

Other Past Capstone Projects

  • Active Physical Inference via Reinforcement Learning
  • Deep Multi-Modal Content-User Embeddings for Music Recommendation
  • Fluorescent Microscopy Image Restoration
  • Learning Visual Embeddings for Reinforcement Learning
  • Offensive Speech Detection on Twitter
  • Predicting Movement Primitives in Stroke Patients using IMU Sensors
  • Recurrent Policy Gradients For Smooth Continuous Control
  • The Quality-Quantity Tradeoff in Deep Learning
  • Trend Modeling in Childhood Obesity Prediction
  • Twitter Food/Activity Monitor

EveThan/IBM-Applied-Data-Science-Capstone-Project

IBM Applied Data Science Capstone Project

The PowerPoint slides for this project can be found at Capstone_Presentation.pptx or Capstone_Presentation.pdf .

Executive summary

In this capstone project, we will predict if the SpaceX Falcon 9 first stage will land successfully using several machine learning classification algorithms. The main steps in this project include:

  • Data collection, wrangling, and formatting
  • Exploratory data analysis
  • Interactive data visualization
  • Machine learning prediction

Our graphs show that some features of the rocket launches are correlated with the outcome of the launches, i.e., success or failure. We also conclude that a decision tree may be the best machine learning algorithm for predicting whether the Falcon 9 first stage will land successfully.

Introduction

In this capstone, we will predict if the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

Most unsuccessful landings are planned. Sometimes, SpaceX will perform a controlled landing in the ocean. The main question that we are trying to answer is: given a set of features of a Falcon 9 rocket launch, including its payload mass, orbit type, launch site, and so on, will the first stage of the rocket land successfully?

Methodology

The overall methodology includes:

  • Data collection, wrangling, and formatting, using:
      • Web scraping
  • Exploratory data analysis (EDA), using:
      • Pandas and NumPy
  • Data visualization, using:
      • Matplotlib and Seaborn
  • Machine learning prediction, using:
      • Logistic regression
      • Support vector machine (SVM)
      • Decision tree
      • K-nearest neighbors (KNN)

Data collection using SpaceX API

1_Data Collection API.ipynb

Libraries or modules used: requests, pandas, numpy, datetime

  • The API used is here.
  • The API provides data about many types of rocket launches done by SpaceX; the data is therefore filtered to include only Falcon 9 launches.
  • The API is accessed using requests.get().
  • The JSON result is converted to a dataframe using the json_normalize() function from pandas.
  • Every missing value in the data is replaced with the mean of the column that the missing value belongs to.
  • We end up with 90 rows or instances and 17 columns or features.
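The flattening, filtering, and mean-imputation steps above can be sketched roughly as follows. This is only an illustration, not the notebook's code: the sample records, the field names (`rocket.name`, `payload_mass`), and the helper `to_clean_frame` are hypothetical stand-ins for the JSON that a `requests.get()` call would return.

```python
import pandas as pd

# Hypothetical records standing in for the decoded JSON of an API response.
sample_launches = [
    {"flight": 1, "rocket": {"name": "Falcon 1"}, "payload_mass": 20.0},
    {"flight": 2, "rocket": {"name": "Falcon 9"}, "payload_mass": 500.0},
    {"flight": 3, "rocket": {"name": "Falcon 9"}, "payload_mass": None},
    {"flight": 4, "rocket": {"name": "Falcon 9"}, "payload_mass": 700.0},
]


def to_clean_frame(records):
    """Flatten nested JSON, keep Falcon 9 rows, fill gaps with column means."""
    df = pd.json_normalize(records)  # nested keys become "rocket.name"
    df = df[df["rocket.name"] == "Falcon 9"].copy()
    column_means = df.mean(numeric_only=True)  # one mean per numeric column
    return df.fillna(column_means)


falcon9 = to_clean_frame(sample_launches)
print(falcon9)
```

Note that the mean is computed after filtering, so launches of other rockets do not influence the imputed value; whether the notebook does the same is an assumption here.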

Data Collection with Web Scraping

2_Data Collection with Web Scraping.ipynb

Libraries or modules used: sys, requests, BeautifulSoup from bs4, re, unicodedata, pandas

  • The data is scraped from List of Falcon 9 and Falcon Heavy launches.
  • The website contains only the data about Falcon 9 launches.
  • First, the Falcon 9 launch Wiki page is requested from the URL and a BeautifulSoup object is created from the response of requests.get().
  • Next, all column/variable names are extracted from the HTML table header by using the find_all() function from BeautifulSoup.
  • A dataframe is then created with the extracted column names and entries filled with launch records extracted from table rows.
  • We end up with 121 rows or instances and 11 columns or features.
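As a rough sketch of the header and row extraction described above, the example below parses a small inline HTML table instead of the live Wiki page. The table contents and the helper `table_to_frame` are made up for illustration; the notebook's actual parsing logic may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the text of a requests.get() response.
sample_html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>Site A</td><td>Payload A</td></tr>
  <tr><td>2</td><td>Site B</td><td>Payload B</td></tr>
</table>
"""


def table_to_frame(html):
    """Build a dataframe from the first HTML table in the given markup."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Column names come from the <th> cells of the table header.
    columns = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)


launch_table = table_to_frame(sample_html)
print(launch_table)
```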

EDA with Pandas and Numpy

3_EDA.ipynb

Libraries or modules used: pandas, numpy

Functions from the Pandas and NumPy libraries such as value_counts() are used to derive basic information about the data collected, which includes:

  • The number of launches on each launch site
  • The number of occurrences of each orbit
  • The number of occurrences of each mission outcome
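A minimal sketch of these counts with value_counts() is shown below. The dataframe is toy data and the column names (`LaunchSite`, `Orbit`, `Outcome`) are assumptions, not necessarily the notebook's schema.

```python
import pandas as pd

# Toy data; the column names are assumptions for illustration only.
launches = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site B", "Site A"],
    "Orbit": ["LEO", "GTO", "LEO", "ISS"],
    "Outcome": ["Success", "Failure", "Success", "Success"],
})

# value_counts() returns one count per distinct value, sorted descending.
site_counts = launches["LaunchSite"].value_counts()
orbit_counts = launches["Orbit"].value_counts()
outcome_counts = launches["Outcome"].value_counts()

print(site_counts, orbit_counts, outcome_counts, sep="\n\n")
```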

EDA with SQL

4_EDA with SQL.ipynb

Framework used: IBM DB2

Libraries or modules used: ibm_db

The data is queried using SQL to answer several questions about the data such as:

  • The names of the unique launch sites in the space mission
  • The total payload mass carried by boosters launched by NASA (CRS)
  • The average payload mass carried by booster version F9 v1.1

The SQL statements or functions used include SELECT, DISTINCT, AS, FROM, WHERE, LIMIT, LIKE, SUM(), AVG(), MIN(), BETWEEN, COUNT(), and YEAR().
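The kinds of queries listed above can be sketched with Python's built-in sqlite3 module. This swaps SQLite in for IBM DB2 purely so the example is self-contained; the table name, column names, and rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the DB2 instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, customer TEXT, "
    "booster_version TEXT, payload_mass_kg REAL)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [  # made-up rows, not real launch records
        ("Site A", "NASA (CRS)", "F9 v1.0", 500.0),
        ("Site A", "NASA (CRS)", "F9 v1.1", 2000.0),
        ("Site B", "Other", "F9 v1.1", 3000.0),
    ],
)

# Unique launch sites.
sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site")]

# Total payload mass for one customer.
total_nasa = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = ?",
    ("NASA (CRS)",)).fetchone()[0]

# Average payload mass for one booster version.
avg_v11 = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches WHERE booster_version = ?",
    ("F9 v1.1",)).fetchone()[0]

conn.close()
print(sites, total_nasa, avg_v11)
```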

Data Visualization using Matplotlib and Seaborn

5_EDA Visualization.ipynb

Libraries or modules used: pandas, numpy, matplotlib.pyplot, seaborn

Functions from the Matplotlib and Seaborn libraries are used to visualize the data through scatterplots, bar charts, and line charts. The plots and charts are used to understand more about the relationships between several features, such as:

  • The relationship between flight number and launch site
  • The relationship between payload mass and launch site
  • The relationship between success rate and orbit type

Examples of functions from seaborn that are used here are scatterplot(), barplot(), catplot(), and lineplot().

Data Visualization using Folium

6_Interactive Visual Analytics with Folium lab.ipynb

Libraries or modules used: folium, wget, pandas, math

Functions from the Folium library are used to visualize the data through interactive maps. The Folium library is used to:

  • Mark all launch sites on a map
  • Mark the successful and failed launches for each site on the map
  • Mark the distances from a launch site to nearby features such as the nearest city, railway, or highway

These are done using functions from folium such as add_child() and folium plugins which include MarkerCluster, MousePosition, and DivIcon.
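The README lists the math module but does not say how the distances are computed. One common choice, assumed here, is the haversine great-circle formula; the function name `haversine_km` and the Earth radius of 6371 km are illustrative choices.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

The resulting value could then be shown on the map as a label next to a line drawn between the two points.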

Data Visualization using Dash

7_spacex_dash_app.py

Libraries or modules used: pandas, dash, dash_html_components, dash_core_components, Input and Output from dash.dependencies, plotly.express

Functions from Dash are used to generate an interactive site where we can toggle the input using a dropdown menu and a range slider. Using a pie chart and a scatterplot, the interactive site shows:

  • The total number of successful launches from each launch site
  • The correlation between payload mass and mission outcome (success or failure) for each launch site

The application is launched on a terminal on the IBM Skills Network website.

Machine Learning Prediction

8_Machine Learning Prediction.ipynb

Libraries or modules used: pandas, numpy, matplotlib.pyplot, seaborn, sklearn

Functions from the Scikit-learn library are used to create our machine learning models. The machine learning prediction phase includes the following steps:

  • Standardizing the data using the preprocessing.StandardScaler() function from sklearn
  • Splitting the data into training and test data using the train_test_split function from sklearn.model_selection
  • Creating machine learning models, which include:
      • Logistic regression using LogisticRegression from sklearn.linear_model
      • Support vector machine (SVM) using SVC from sklearn.svm
      • Decision tree using DecisionTreeClassifier from sklearn.tree
      • K nearest neighbors (KNN) using KNeighborsClassifier from sklearn.neighbors
  • Fitting the models on the training set
  • Finding the best combination of hyperparameters for each model using GridSearchCV from sklearn.model_selection
  • Evaluating the models based on their accuracy scores and confusion matrices using the score() function and confusion_matrix from sklearn.metrics
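The steps above can be sketched for one of the four models as follows. This uses synthetic data from make_classification rather than the launch dataset, and the hyperparameter grid, test size, and random seeds are assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features and landing outcome.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Standardize, then split, following the order listed above.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grid; the notebook's actual grid is not given in the README.
param_grid = {"max_depth": [2, 4, 6], "criterion": ["gini", "entropy"]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

best_cv_score = search.best_score_            # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy on the held-out set
cm = confusion_matrix(y_test, search.predict(X_test), labels=[0, 1])

print(search.best_params_, best_cv_score, test_accuracy)
print(cm)
```

With a test set this small, several models can easily end up with identical test accuracy, which is one reason to also compare the cross-validated `best_score_` values.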

Putting the results of all 4 models side by side, we can see that they all share the same accuracy score and confusion matrix when tested on the test set. Therefore, their GridSearchCV best scores are used to rank them instead. Based on the GridSearchCV best scores, the models are ranked in the following order with the first being the best and the last one being the worst:

  • Decision tree (GridSearchCV best score: 0.8892857142857142)
  • K nearest neighbors, KNN (GridSearchCV best score: 0.8482142857142858)
  • Support vector machine, SVM (GridSearchCV best score: 0.8482142857142856)
  • Logistic regression (GridSearchCV best score: 0.8464285714285713)

From the data visualization section, we can see that some features may be correlated with the mission outcome in several ways. For example, with heavy payloads, the successful (positive) landing rate is higher for the Polar, LEO, and ISS orbit types. For GTO, however, no such pattern can be distinguished, because both successful and unsuccessful landings are present.

Therefore, each feature may have a certain impact on the final mission outcome. Exactly how each of these features affects the mission outcome is difficult to decipher. However, we can use machine learning algorithms to learn the patterns in past data and predict whether a mission will be successful based on the given features.

In this project, we try to predict if the first stage of a given Falcon 9 launch will land in order to determine the cost of a launch. Each feature of a Falcon 9 launch, such as its payload mass or orbit type, may affect the mission outcome in a certain way.

Several machine learning algorithms are employed to learn the patterns of past Falcon 9 launch data and produce models that can predict the outcome of a Falcon 9 launch. The model produced by the decision tree algorithm performed the best among the 4 machine learning algorithms employed.

~ Project created in January 2022 ~

University of Washington Information School

Team presenting a Capstone project in the HUB Lyceum

Students show real-world impact of Capstone projects

Each year, Information School students in the Bachelor of Science in Informatics, Master of Library and Information Science (MLIS) and Master of Science in Information Management (MSIM) programs participate in Capstone, their culminating learning experience. Students draw from the skills they’ve acquired throughout their degree program to solve a real-world information problem, often working in teams and in partnership with a sponsoring organization.

Dean Anind K. Dey shared that this partnership component makes Capstone his favorite event of the year. “I love how much impact these projects can have by solving real-world problems,” he said.

Students presented their projects over the course of two evenings, one featuring more than 60 online presentations, followed by more than 100 presentations the next evening in rooms across the HUB.

At the close of the second evening, students and attendees gathered in the HUB and via livestream to recognize this year’s award winners. Most importantly, as Dean Dey highlighted, students’ projects will have long-term impact "on causes and organizations near and far."

Capstone award winners gather after the ceremony, showing their certificates.

Capstone Awards

Design Award

This award goes to the project that addresses an information problem by leveraging user-centered design to meaningfully improve upon ease of use, enjoyment, functionality, or integration with existing tools and systems.

Design Award winner - Informatics: Retirement Adequacy Project

Joseph Fran, Ken Huang, Minh Mai, Peter Corroom, Ty Okazaki

Design Award winner - MLIS: The Misinformation Play Pack: Play-Based Educational Resources for Community Information Literacy

Design Award winner - MSIM: Last Mile Innovations: Leveraging Smart Solutions for Efficient Amazon Warehouse Audits

Ashwin Jagdish, Sanyam Savla, Siddharth Purohit, Stushi Das

Design Award Finalists:

  • AMZL Digital Twin: Qi Duan, Denzil Dsouza, Shobhit Verma, Shirsha Datta
  • Guitars For Libraries: Community Partnerships and Lifelong Music Education: Apollo Battey, Nick Passabet
  • Kids Are Not Content: Emily Hale, Aina Engelbrekt, Rebecca Jessup, Shelly Zhao, Haley Michaelson
  • Mass Delivery Integration for DoorDash: Hongyiming Cui, Xinyuan You, Aryan Shah, Keaton Staggs, Quin Baebler
  • Seattle Girls' School Library Redesign: Julia Tawney, Kiran Mufty
  • Waza: Home-cooked Food Delivery: Adam Bi, Jessica Kuo, Sharique Khan

Inclusion, Diversity, Equity, Accessibility, and Sovereignty (IDEAS) Award

This award goes to the project that explores and engages themes of diversity and identity, power, privilege, equity and inclusion, or sovereignty.

IDEAS Award winner - Informatics: Blooming

Fardouse Marghani, Fardowsa Douled, Jamilah Saleh, Nawal Dhabar, Seblwerk Enyew

IDEAS Award winner - MLIS: In Their Own Words: Reporting & Organizing Youth Testimonials of Censorship from Books Unbanned

Danette Jasper, Jessica Roelling, Marissa Fischer, Meghan Foulk

IDEAS Award winner - MSIM: Workplace Equity: Optimizing Data-Smart, Equity-Centered Work Environments

Emma Grothaus, Gauri Nigam, Vanshika Srivastava

IDEAS Award Finalists:

  • A Web for All: Generative AI Powered Navigation for the Neurodiverse: Sheel Sanghvi, Nishit Bhasin, Ansh Shah
  • Contigo Chatbot: Keiver Bencomo Vasquez, Mason Green, Sean Guevarra, Eric Xue, Russell Liu
  • Here in Perpetuity: Uplifting Tribal Sovereignty in Public Libraries: Caitlin Mccabe, Devon Coultas
  • Psychiatric Care Allocation Disparity: King County, WA: Justin Phan, Meg Balfrey, Amy Chew, Nicole Han, Nooha Mohammed
  • Systemic Deconstruction: Addressing Information Equity Using Brian Deer’s Framework: Ash King

Innovation Award

The Innovation Award goes to the project that addresses an information problem through demonstrating original solutions that advance or improve upon existing processes, systems, or modalities, while producing the same or better results. Innovation may be characterized as a recommendation or prototype that is faster, cheaper, more effective, or more inclusive.

Innovation Award winner - Informatics: Vedette: Streamlining Bug Report Deduplication for Google’s Android Security

Eddy Peng, Harold Pham, Hitanshu Prajapati, Kyle Raychel, Sami Foell

Innovation Award winner - MLIS: SIFF: Building Your Archive from the Ground Up

Drew Burns, Victoria Rincon

Innovation Award winner - MSIM: A Web for All: Generative AI Powered Navigation for the Neurodiverse

Ansh Shah, Nishit Bhasin, Sheel Sanghvi

Innovation Award Finalists:

  • ClearView Assist: Aiding the Visually Impaired in Navigating Cluttered Websites with Generative AI: Ritika Vijay Rajpal, Dhruv Khanna, Menita Agarwal, Minal Naik
  • HuskySync: Where tests become collaborative quests: Anumita Ghosh, Varsha Palepu, Renusree Chittella, Thomas Kanenaga, Anay Deshpande
  • inStroketor: Simplifying your stroke recovery: Esha Bantwal, Lily Jeffs, Leah Jia, Aldijana Sabanovic, Manav Agarwal
  • Lambda News Digest: Jiashu Chen, Chesie Yu, Qianqian Liu, Juntong Wu
  • Rethinking Dewey in the School Library: Tia Heywood, Shira Gottfried

Research Award

The Research Award goes to the project that explores a significant research question related to people, information and technology. Projects may involve original data collection or analysis of an existing data set.

Research Award winner - Informatics: Psychiatric Care Allocation Disparity: King County, WA

Amy Chew, Justin Phan, Meg Balfrey, Nicole Han, Nooha Mohammed

Research Award winner - MLIS: Improving Analytics, Search, and Content Operations: Metadata for Intel.com

Giselle Shannon, Lily Woodard, Melinda Geist

Research Award winner - MSIM: Switching Your DEI-ET: a Project to Standardize Diversity Reports

Karan Pandya, Ketki Godse, Peter O’Meara, Tarang Pande

Research Award Finalists:

  • Seeing Change: Cultivating Trauma Informed Librarianship: Kimberlie Sullivan
  • Six degrees of Kevin Bacon: A Network-based Approach to Venture Capital: Divyansh Chouhan, Abhishek Kulkarni, Isha Doshi, Tanishqa Shetty
  • Vedette: Streamlining Bug Report Deduplication for Google’s Android Security: Eddy Peng, Harold Pham, Hitanshu Prajapati, Kyle Raychel, Sami Foell

Social Impact and Social Justice Award

The Social Impact and Social Justice Award goes to the project that addresses issues of exclusion, discrimination or other barriers, or that improves quality of life for marginalized or disenfranchised communities.

Social Impact and Social Justice Award winner - Informatics: Empowering decision-making for Legal Aid Nonprofit: Sound Legal Aid’s Data-Driven Dashboard to Serve Marginalized Communities

Kriti Vajjhula, Priya Hariharan, Saimanasvi Charugundla, Terra Shrestha, Vega Jethani

Social Impact and Social Justice Award winner - MLIS: Building a Trauma-Informed Workplace to Support Library Staff

Ruby Vail, Vivian Edwards

Social Impact and Social Justice Award winner - MSIM: AI Potential with Service Requests and Incidents

Dexter Xu, Haocheng Bao, Sophie Deng, Wenjing Li

Social Impact and Social Justice Award Finalists:

  • Archiving Materials on the University of Washington Gender Identity Clinic: Eli Wachter
  • Diversifying the Federal Workforce: A DEI Recruiting Tool for NOAA: Sarah Thomas, Justin Sukomol, Scott Nguyen, Peijie Zheng, Hung Nguyen
  • Fortnite Harvest: A Quest for Zero Hunger: Melanie Jiang, Kayla Ren, Shaun Xu, Tingyu He
  • In Their Own Words: Reporting & Organizing Youth Testimonials of Censorship from Books Unbanned: Danette Jasper, Jessica Roelling, Marissa Fischer, Meghan Foulk

The iSchool thanks all Capstone sponsors and iAffiliates partners; alumni, family and friends who attended the event in support of our students; and the event planning and support team.

Thank you to project sponsors: 

360 Social Impact Studios; Alliance of Angels; Amazon Last Mile; Anne Blecksmith, Avery Associate Director and Head of Reader Services at The Huntington Library; Ballard High School; Bigfoot Communications; Boeing; Brooklyn Public Library; Camano City Schoolhouse Foundation; City University of New York’s Office of Library Services; Country Bookshelf; DoorDash; Dr. Judith Henchy; Epic Games; Fishing Comets Farm; Free to Heal; GEN (Gender Equity Now); Grand Père Wholesale Bakery; HEAL-WA; High Point Public Library; Il Viale Cafe; Institute for Health Metrics and Evaluation; Jolayne Houtz and Hector Martinez; Ladies Musical Club of Seattle; Lauren Richey, Open Window School; Legal Information Institute: Wex Taxonomy; Library of Congress; Longview Public Library; Marc-André Argentino, PhD; Masters of Science in Information Management Graduate Student, Jichang Cheng; Movie Madness; Museum of Flight; National Oceanic and Atmospheric Administration; Open Books; Oregon Health & Science University; PitchBook; reDiscover Center; Sarah Adams, Mom.uncharted; Seattle Girls' School; Seattle International Film Festival (SIFF); Self-Help Graphics and Art; Smartsheet; Sound Legal Aid; Team Read; The Greater Seattle Chinese Chamber of Commerce; The Trevor Project, Advocacy & Government Affairs Team; U Student Media; University of Washington Art Library; University of Washington Libraries Special Collections; UW Farm; Virufy; Washington State Poet Laureate; Washington Trails Association; Woodburn Public Library (Woodburn, Ore.); iSchool faculty and staff: the Learning Technologies Team, Cindy Aden, Dr. Miranda Belarde-Lewis, Lorcan Dempsey, Marlina Hales, Dr. Trent Hill, Dr. Jin Ha Lee, Dr. Sandy Littletree, Nam-ho Park, Doug Parry, Dr. Chirag Shah, Dr. Melanie Walsh, Ph.D. Candidate Yekaterina Yefimova, and Dr. Jason Yip.

And a big thank you to this year’s panel of judges: 

Design Award judges: Dr. Jason C. Yip, Associate Professor, UW Information School; Dr. Martin Saveski, Assistant Professor, UW Information School; Dr. J. Elizabeth Mills, UW Information School.

Thanks to the Inclusion, Diversity, Equity, Accessibility, and Sovereignty (IDEAS) Award judges: Dr. Trent Hill, Associate Teaching Professor, UW Information School; Dr. Sam Otim, Assistant Teaching Professor, UW Information School; Stephanie Harris, Business Systems Analyst, UW Information School.

Thanks to the Innovation Award judges: Dr. Cris Fowler, Director of Academic Services, UW Information School; Dr. Bill Howe, Professor, UW Information School; Dr. Lindah Kotut, Assistant Professor, UW Information School; Dr. J. Elizabeth Mills, UW Information School; Louis Spinelli, Senior Product Manager, Microsoft; Lindsey Sullivan, Career Services Advisor, UW Information School.

Thank you to the Research Award judges: Dr. David Hendry, Associate Professor & MSIM Program Chair, UW Information School; Dr. Andrew Reifers, Associate Teaching Professor, UW Information School; Michael Grass, Assistant Director of Communications, Center for an Informed Public, UW Information School.

We extend our gratitude to the Social Impact and Social Justice Award judges: Dr. Adam Moore, Professor, UW Information School; Sunday Stanley, Director of Faculty Human Resources, UW Information School; Nicole Gustine, Assistant General Counsel, Washington State Bar 

USC Price student discovers family heirloom during capstone project

Ernesto Corona has long felt that his father – labor organizer Humberto “Bert” Corona – didn’t get the recognition he deserved. From the 1940s to 1960s, the Mexican-American activist and USC alum was an early organizer of undocumented workers and founded one of California’s oldest Latino political organizations, the Mexican American Political Association. 

“I don’t think he’s spoken about enough despite the impact he’s had on so many labor leaders, Chicano leaders, and Latino leaders,” said Ernesto Corona, who just earned his Master of Nonprofit Leadership & Management (MNLM) from the USC Sol Price School of Public Policy. “I think he really deserves a statue in the city of L.A.”

Imagine Corona’s surprise, then, when he stumbled upon a museum exhibit honoring his father while working on his capstone project – in which students tackle real-life policy challenges to complete their degrees.

Corona was part of a student team helping LA Plaza de Cultura y Artes – a Smithsonian-affiliate museum in L.A. – create a plan for collecting data to make informed decisions, demonstrate impact to stakeholders and secure funding from donors, among other goals. The students checked out the museum to learn more about it when Corona discovered his dad’s exhibit.

“When they were giving the tour, I was like, ‘Oh, this is my father. This is incredible – or fate,’” Corona said. “Being able to share that experience really encouraged me to support LA Plaza even more.”

The exhibit excited his teammates, too. “It was really special,” said student Devon McCann, who is earning an MNLM this summer. “It was really cool to stumble upon and I think inspired us even more to give this project our all for the client.”

Diving into data

The capstone project was more than just a memorable personal experience. The student team of Corona, McCann, Laura Hurtado (MPA ’24) and Luis Sanchez (MPA ’24) gained valuable professional experience by working with LA Plaza.

LA Plaza, a nonprofit that celebrates Latinx culture in L.A. with exhibits and events, tasked the student team with assessing its data collection practices. Students analyzed the nonprofit’s existing reports and raw data; interviewed staff, peer organizations, funders and experts; and reviewed literature and relevant case studies of how other nonprofits leveraged data. 

The group ultimately identified key performance indicators (KPIs) that LA Plaza can use for each of its strategic goals, drawing from data the nonprofit currently collects as well as new data it can start gathering. 

For example, students learned that LA Plaza wasn’t using valuable data it collected through its interactive exhibits. The nonprofit has a recording booth where visitors can record oral histories, including background about their lives in Los Angeles and their ancestors. LA Plaza also has written response opportunities in which visitors comment on the exhibits. Students showed how the nonprofit could utilize that data to communicate with donors and community members, as well as inform decisions about future exhibits. 

“It’s really important for both public constituents and potential supporters to be able to see and understand the impact that LA Plaza and nonprofits are having,” McCann said. “Without data to demonstrate impact, as well as inform decisions internally, you’re just working blindly and hoping that you’re having the impact you want.”

The capstone project was a great partnership between the students and the nonprofit, said Nicole Esparza, USC Price School associate professor and faculty advisor for the project.

“It was more than just an assignment,” Esparza said. “The students became deeply invested and very interested in that organization, and I know they will continue to partner with them.”

LA Plaza has benefitted from the partnership too. The students presented LA Plaza with thoughtful recommendations that organized the nonprofit’s current information and identified gaps in its collection process, said Alondra Virrey, LA Plaza engagement manager, and Veronica Diaz, the marketing and communication manager.

“It was great to see how to best leverage the information we currently collect to our new Strategic Framework and Mission, and how to approach our stakeholders for greater support,” the organization said in a joint statement. “We are excited to meet with our staff and share with them the findings; it was particularly insightful to learn how similar cultural institutions are measuring their impact.”

Capstone Projects

M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters. 

Most projects are sponsored by an organization—academic, commercial, non-profit, and government—seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.

Key takeaways:

  • Synthesizing the concepts you have learned throughout the program in various courses (this requires that the question posed by the project be complex enough to require the application of appropriate analytical approaches learned in the program and that the available data be of sufficient size to qualify as ‘big’)
  • Experience working with ‘raw’ data, exposing you to the data pipeline process you are likely to encounter in the ‘real world’
  • Demonstrating oral and written communication skills through a formal paper and presentation of project outcomes  
  • Acquisition of team building skills on a long-term, complex, data science project 
  • Addressing an actual client’s need by building a data product that can be shared with the client

Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more. 

Sponsor a Capstone Project  

View previous examples of capstone projects and check out answers to frequently asked questions.

What does the process look like?

  • The School of Data Science periodically puts out a Call for Proposals. Prospective project sponsors submit official proposals, vetted by the Associate Director for Research Development, Capstone Director, and faculty.
  • Sponsors present their projects to students at “Pitch Day” near the start of the Fall term, where students have the opportunity to ask questions.
  • Students individually rank their top project choices. An algorithm sorts students into capstone groups of approximately 3 to 4 students per group.
  • Adjustments are made by hand as necessary to finalize groups.
  • Each group is assigned a faculty mentor, who will meet groups each week in a seminar-style format.
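
The page does not say which matching algorithm is used. As a rough illustration only, the ranking step could be sketched in Python as a greedy pass that places each student in their highest-ranked project that still has room. The function name `assign_groups`, the `capacity` parameter, and the sample names are all hypothetical and are not taken from the School of Data Science.

```python
def assign_groups(rankings, capacity=4):
    """Greedy sketch (an assumption, not the School's actual method).

    rankings: dict mapping student name -> list of project names, best first.
    capacity: maximum number of students per project group.
    Returns (groups, unassigned).
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")

    groups = {}
    unassigned = []
    # Sorted order keeps the result deterministic; a real matcher might randomize.
    for student in sorted(rankings):
        for project in rankings[student]:
            members = groups.setdefault(project, [])
            if len(members) < capacity:
                members.append(student)
                break
        else:
            # Every ranked project was already full.
            unassigned.append(student)
    return groups, unassigned


# Hypothetical sample data: three students, two projects, two seats each.
rankings = {
    "ana": ["Museum", "Health"],
    "ben": ["Museum", "Health"],
    "cy": ["Museum", "Health"],
}
groups, unassigned = assign_groups(rankings, capacity=2)
print(groups)      # {'Museum': ['ana', 'ben'], 'Health': ['cy']}
print(unassigned)  # []
```

A greedy pass like this favors whoever is processed first and can leave a group with a single member, which is one reason a manual adjustment step, as described in the list above, would still be needed.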

What is the seminar approach to mentoring capstones?

We utilize a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. This approach involves one mentor supervising three to four loosely related projects and meeting with these groups on a regular basis. Project teams often encounter similar roadblocks and issues so meeting together to share information and report on progress toward key milestones is highly beneficial.

Do all capstone projects have corporate sponsors?

Not necessarily. Generally, each group works with a sponsor from outside the School of Data Science. Some sponsors are corporations, some are from nonprofit and governmental organizations, and some are from other departments at UVA.

One of the challenges we continue to encounter when curating capstone projects with external sponsors is appropriately scoping and defining a question that is of sufficient depth for our students, obtaining data of sufficient size, obtaining access to the data in sufficient time for adequate analysis to be performed, and navigating a myriad of legal issues (including conflicts of interest). While we continue to strive to use sponsored projects and work to solve these issues, we also look for ways to leverage openly available data to solve interesting societal problems, which allows students to apply the skills learned throughout the program. While not all capstones have sponsors, all capstones have clients. That is, the work is being done for someone who cares about and is invested in the outcome.

Why do we have to work in groups?

Because data science is a team sport!

All capstone projects are completed as group work. While this requires additional coordination, this collaborative component of the program reflects the way companies expect their employees to work. Building this skill is one of our core learning objectives for the program.

I didn’t get my first choice of capstone project from the algorithm matching. What can I do?

Remember that the point of the capstone projects isn’t the subject matter; it’s the data science. Professional data scientists may find themselves in positions in which they work on topics assigned to them, but they use methods they enjoy and still learn much through the process. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.

Your ability to influence which project you work on is in the ranking process after “pitch day” and in encouraging your company or department to submit a proposal during the Call for Proposal process. At a minimum it takes several months to work with a sponsor to adequately scope a project, confirm access to the data and put the appropriate legal agreements into place. Before you ever see a project presented on pitch day, a lot of work has taken place to get it to that point!

Can I work on a project for my current employer?

Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).

If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?

The capstone project will provide you with the opportunity to do relevant, high-quality work which can be included on a resume and discussed during job interviews. The project paper and your code on GitHub will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. Capstone projects are just one networking opportunity available to you in the program.

Capstone Project Reflections From Alumni  

Theo Braimoh, MSDS Online Graduate and Admissions Student Ambassador

"For my Capstone project, I used Python to train machine learning models for visual analysis – also known as computer vision. Computer vision helped my Capstone team analyze the ergonomic posture of workers at risk of developing musculoskeletal injuries. We automated the process, and hope our work further protects the health and safety of people working in the United States.” — Theophilus Braimoh, MSDS Online Program 2023, Admissions Student Ambassador

Haley Egan, MSDS Online 2023 and Admissions Student Ambassador

“My Capstone experience with the ALMA Observatory and NRAO was a pivotal chapter in my UVA Master’s in Data Science journey. It fostered profound growth in my data science expertise and instilled a confidence that I'm ready to make meaningful contributions in the professional realm.” — Haley Egan, MSDS Online Program 2023, Admissions Student Ambassador

Mina Kim, MSDS/PhD 2023

“Our Capstone projects gave us the opportunity to gain new domain knowledge and answer big data questions beyond the classroom setting.” — Mina Kim, MSDS Residential Program 2023, Ph.D. in Psychology Candidate

Capstone Project Reflections From Sponsors  

“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville

“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge

COMMENTS

  1. 21 Interesting Data Science Capstone Project Ideas [2024]

    Best Data Science Capstone Project Ideas - According to Skill Level. Data science capstone projects are a great way to showcase your skills and apply what you've learned in a real-world context. Here are some project ideas categorized by skill level: Beginner-Level Data Science Capstone Project Ideas. 1. Exploratory Data Analysis (EDA) on a ...

  2. Data Science Capstone Course by Johns Hopkins University

    There are 7 modules in this course. The capstone project class will allow students to create a usable/public data product that can be used to show your skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.

  3. 10 Unique Data Science Capstone Project Ideas

    Project Idea #10: Building a Chatbot. A chatbot is a computer program that uses artificial intelligence to simulate human conversation. It can interact with users in a natural language through text or voice. Building a chatbot can be an exciting and challenging data science capstone project.

  4. An Exemplary Data Science Capstone, Annotated

    Since there was a lot of content, I'll conclude with my top three tips for doing a great data science capstone project: Choose a good data set: a small, uninteresting, or otherwise hard-to-analyze data set will make it substantially harder to make a great project. Include all of the following: Data cleaning.

  5. Data Science: Capstone

    By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

  6. Capstone Projects

    Faculty-Sponsored Capstone Projects. A DSI faculty member proposes a research project and advises a team of students working on this project. This is a great way to run a research project with enthusiastic students, eager to try out their newly acquired data science skills in a research setting.

  7. Applied Data Science Capstone Course by IBM

    There are 5 modules in this course. This is the final course in the IBM Data Science Professional Certificate as well as the Applied Data Science with Python Specialization. This capstone project course will give you the chance to practice the work that data scientists do in real life when working with datasets.

  8. Data Science: Capstone

    To become an expert data scientist you need practice and experience. By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling ...

  9. Capstone Course

    Data science education for master's students at Harvard culminates in a semester-long capstone research project course where skills like machine learning, statistics, data management and visualization are used to solve real-world problems from partner companies and organizations.

  10. Data Science at Scale

    There are 6 modules in this course. In the capstone, students will engage on a real world project requiring them to apply skills from the entire data science pipeline: preparing, organizing, and transforming data, constructing a model, and evaluating results. Through a collaboration with Coursolve, each Capstone project is associated with ...

  11. Data Science Capstone

    This capstone course is the culmination of the Master of Liberal Arts, data science, where students execute their research proposal from CSCI S-597. It gives students the opportunity to collaborate on a complex research topic using their data science skills. At the completion of the capstone, students are able to demonstrate their ability to ...

  12. HarvardX: Data Science: Capstone

    Show what you've learned from the Professional Certificate Program in Data Science.

  13. Capstone Projects

    Capstone Projects. You will stay at the cutting edge of what real clients need from data scientists through the Capstone experience. You will enhance your collaboration skills, benefit from mentoring, and engage in networking opportunities. These faculty-mentored teams add value to top companies across multiple sectors—from finance to ...

  14. Capstone Projects

    Capstone Projects. Online M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project in term 4 and work on the project in term 5 ...

  15. A friendly walk-through of a Data Science Capstone Project

    Many websites and online courses focus on what beginners need to learn in order to become data scientists or on the importance of doing capstone projects to showcase one's skills.

  16. Master's in Data Science

    CDS master's students have a unique opportunity to solve real-world problems through the capstone course in the final year of their program. The capstone course is designed to apply knowledge into practice and to develop and improve critical skills such as problem-solving …

  17. Data Science Project Ideas To Try

    A data science project is a practical application of your skills. A typical data science project allows you to use skills in data collection, cleaning, exploratory data analysis, visualization, programming, machine learning, and so on. It helps you take your skills to solve real-world problems.

  18. Final Capstone Project for IBM Data Science Professional ...

    Final Capstone Project for IBM Data Science Professional Certification - GitHub - vikthak/IBM-AppliedDataScience-Capstone-FINAL: Final Capstone Project for IBM Data Science Professional Certification

  19. GitHub

    Executive summary. In this capstone project, we will predict if the SpaceX Falcon 9 first stage will land successfully using several machine learning classification algorithms. The main steps in this project include: Data collection, wrangling, and formatting. Exploratory data analysis. Interactive data visualization. Machine learning prediction.

  20. Data Science with R

    In this capstone course, you will apply various data science skills and techniques that you have learned as part of the previous courses in the IBM Data Science with R Specialization or IBM Data Analytics with Excel and R Professional Certificate. For this project, you will assume the role of a Data Scientist who has recently joined an ...

  21. Data Science Capstone Experience

    Capstone Experience - 1 course/final project. The Capstone Experience in Data Science (EN.553.806) is a research-oriented project which must be approved by the research supervisor, academic advisor and the Internal Oversight Committee. The Capstone Experience can be taken in multiple semesters, but the total number of credits required for ...

  22. Capstone Project

    Educational Objectives. Additional field-based credits from the capstone program accrue over the final 12 months of the 20-month MSHS program. The capstone project may be conducted on campus at Cedars-Sinai or in other approved healthcare organizations. Students work on their capstone project on a schedule agreed upon with their primary mentor.

  23. Students show real-world impact of Capstone projects

    Dean Anind K. Dey shared that this partnership component makes Capstone his favorite event of the year. "I love how much impact these projects can have by solving real-world problems," he said. Information problems exist in all fields and sectors - a fact made obvious by the range of topics addressed by the more than 160 projects.

  24. IBM DS0720EN Certificate

    Supported by the following organizations. This is to certify that BHARATHI KAVURU successfully completed and received a passing grade in DS0720EN: Data Science and Machine Learning Capstone Project, a course of study offered by IBM, an online learning initiative of IBM.

  25. USC Price student discovers family heirloom during capstone project

    "Without data to demonstrate impact, as well as inform decisions internally, you're just working blindly and hoping that you're having the impact you want." The capstone project was a great partnership between the students and the nonprofit, said Nicole Esparza, USC Price School associate professor and faculty advisor for the project.

  26. SQL for Data Science Capstone Project

    There are 4 modules in this course. Data science is a dynamic and growing career field that demands knowledge and skills-based in SQL to be successful. This course is designed to provide you with a solid foundation in applying SQL skills to analyze data and solve real business problems. Whether you have successfully completed the other courses ...

  27. How Science, Math, and Tech Can Propel Swimmers to New Heights

    After outlining the evolution of swimming over the past 100 years, the paper explains how an understanding of math and physics, combined with the use of technology to acquire individual-level data, can help maximize performances. Essential to understanding the scientific principles involved with the swimming stroke, the paper says, are Newton ...

  28. Capstone Projects

    Capstone Projects. M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project ...

  29. UNC Public Policy Capstone Program University of North Carolina at

    Project Proposal Deadlines Fall 2024 semester: July 29, 2024 Spring 2025 semester: December 1, 2024. The Public Policy Major. UNC Public Policy is an interdisciplinary social science major designed to provide students with the theoretical perspective, analytical skill, and substantive knowledge needed to respond to major domestic and global ...