unsupervised machine learning thesis

Network traffic analysis using machine learning: an unsupervised approach to understand and slice your network

Published: 04 November 2021
Volume 77 , pages 297–309, ( 2022 )

Ons Aouedi 1 ,
Kandaraj Piamrat ORCID: orcid.org/0000-0002-2343-0850 1 ,
Salima Hamma 1 &
J. K. Menuka Perera 1

Recent developments in smart devices have led to an explosion in data generation and heterogeneity, which requires new network solutions for better analyzing and understanding traffic. These solutions should be intelligent and scalable in order to handle the huge amount of data automatically. With the progress of high-performance computing (HPC), it has become feasible to deploy machine learning (ML) to solve complex problems, and its efficiency has been validated in several domains (e.g., healthcare or computer vision). At the same time, network slicing (NS) has drawn significant attention from both industry and academia, as it is essential for addressing the diversity of service requirements. Therefore, the adoption of ML within NS management is an interesting issue. In this paper, we focus on analyzing network data with the objective of defining network slices according to traffic flow behaviors. For dimensionality reduction, feature selection is applied to select the most relevant features (15 out of 87) from a real dataset of more than 3 million instances. Then, K-means clustering is applied to better understand and distinguish traffic behaviors. The results demonstrate a good correlation among instances in the same cluster generated by the unsupervised learning. This solution can be further integrated into a real environment using network function virtualization.
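
The clustering step described in the abstract can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the function names, the two toy features per flow, and the choice of k = 2 are assumptions made only for illustration, and the paper's feature selection step is not reproduced here.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iters=100, seed=0):
    """Cluster equal-length numeric tuples into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points
    labels = None
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical flows described by two already-scaled features.
flows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (10.0, 10.0), (10.5, 9.5), (9.5, 10.2)]
labels, centroids = kmeans(flows, k=2)
print(labels)
```

On real traffic data the features would normally be standardized first, since K-means relies on Euclidean distance and is sensitive to feature scale.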

https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps

Author information

Authors and affiliations.

Laboratoire des Sciences du Numerique de Nantes, Nantes, France

Ons Aouedi, Kandaraj Piamrat, Salima Hamma & J. K. Menuka Perera

Corresponding author

Correspondence to Kandaraj Piamrat .

About this article

Aouedi, O., Piamrat, K., Hamma, S. et al. Network traffic analysis using machine learning: an unsupervised approach to understand and slice your network. Ann. Telecommun. 77 , 297–309 (2022). https://doi.org/10.1007/s12243-021-00889-1

Received : 25 July 2020

Accepted : 14 September 2021

Published : 04 November 2021

Issue Date : June 2022

DOI : https://doi.org/10.1007/s12243-021-00889-1

Machine learning
Feature selection
Unsupervised learning
Network traffic
Traffic analysis
Network slicing

2024 Theses Doctoral

Unsupervised Machine-Learning Applications in Seismology

Sawi, Theresa

Catalogs of seismic source parameters (hypocenter locations, origin times, and magnitudes) are vital for studying various Earth processes, greatly enhancing our understanding of the nature of seismic events, the structure of the Earth, and the dynamics of fault systems. Modern seismic analyses utilize supervised machine learning (ML) to build enhanced catalogs based on millions of examples of analyst-picked phase arrivals in waveforms, yet the ability to characterize the time-varying spectral content of the waveforms underlying those catalogs remains lacking. Unsupervised machine learning (UML) methods provide powerful tools for inferring patterns from musical spectrograms with little a priori information, yet they have been relatively underutilized in the field of seismology. In this thesis, I leverage advanced tools from UML to analyze the temporal spectral content of large sets of spectrograms generated by different mechanisms in two distinct geologic settings: icequakes and tremors at Gornergletscher (a Swiss temperate glacier) and repeating earthquakes from a 10-km-long creeping segment of the San Andreas Fault. The core algorithm in this work, now known as Spectral Unsupervised Feature Extraction, or SpecUFEx, extracts time-varying frequency patterns from spectrograms and reduces them into low-dimensionality fingerprints via a combination of non-negative matrix factorization and hidden Markov modeling (Holtzman et al. 2018), optimized for large data sets via stochastic variational inference. This work describes the SpecUFEx algorithm and the suite of preprocessing, clustering, and visualization tools developed to create a UML workflow, SpecUFEx+, that is widely accessible and applicable to many seismic settings.
I apply the SpecUFEx+ workflow to single- and multi-station seismic data from Gornergletscher, and demonstrate how some fingerprint clusters track diurnal tremor related to subglacial water flow, while others correspond to the onset of the subglacial and englacial components of a glacial lake outburst flood. I also discover periods of harmonic tremor localized near the ice-bed interface that may be related to glacial stick-slip sliding. I additionally apply the SpecUFEx+ workflow to earthquakes on the San Andreas Fault to unveil far more repeating earthquake sequences than previously inferred, leading to enhanced slip-rate estimates at seismogenic depths and providing a more detailed image of seismic gaps along the fault interface. Unsupervised feature extraction is a novel tool in the field of seismology. This work demonstrates how scientific insight can be gained through the characterization of the spectral-temporal patterns of large seismic datasets within a UML framework.
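
To give a feel for the first ingredient named in the abstract, the sketch below factorizes a tiny non-negative "spectrogram" with plain non-negative matrix factorization, using the standard Lee and Seung multiplicative updates. It is not SpecUFEx: the hidden Markov modeling and stochastic variational inference stages are omitted, and the matrix values, the rank, and all function names are assumptions chosen for illustration.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Toy spectrogram: 4 frequency bins (rows) by 6 time frames (columns),
# built from two spectral patterns that switch on and off over time.
V = [
    [1, 1, 0, 0, 1, 0],
    [2, 2, 0, 0, 2, 0],
    [0, 0, 3, 3, 0, 3],
    [0, 0, 1, 1, 0, 1],
]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Here the columns of `W` play the role of spectral patterns and the rows of `H` their activations over time; a time series of such activations is the kind of input a hidden Markov model could then summarize.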

Geographic Areas

California--San Andreas Fault
Switzerland--Alps, Swiss
Machine learning--Industrial applications
Earthquakes
Markov processes

A Study on the Use of Unsupervised, Supervised, and Semi-supervised Modeling for Jamming Detection and Classification in Unmanned Aerial Vehicles

In this work, first, unsupervised machine learning is proposed as a study for detecting and classifying jamming attacks targeting unmanned aerial vehicles (UAV) operating in the 2.4 GHz band. Three scenarios are developed with a dataset of samples extracted from meticulous experimental routines using various unsupervised learning algorithms, namely K-means, density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering (AGG), and Gaussian mixture model (GMM). These routines characterize attack scenarios entailing barrage (BA), single-tone (ST), successive-pulse (SP), and protocol-aware (PA) jamming in three different settings. In the first setting, all features extracted from the original dataset are used (i.e., nine in total). In the second setting, Spearman correlation is used to reduce the number of these features. In the third setting, principal component analysis (PCA) is used to reduce the dimensionality of the dataset and minimize complexity. The metrics used to compare the algorithms are homogeneity, completeness, V-measure, adjusted mutual information (AMI), and adjusted Rand index (ARI). The optimum model scored 1.00, 0.949, 0.791, 0.722, and 0.791, respectively, allowing the detection and classification of these four jamming types with an acceptable degree of confidence.
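
The second setting can be sketched as follows. The abstract does not say how the Spearman coefficients are turned into a feature subset, so the rule below (keep a feature only if its absolute rank correlation with every already-kept feature is under a threshold) is an assumption, as are the 0.9 threshold, the feature names, and the function names.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, threshold=0.9):
    """Keep features whose |rho| with every kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns; "power_b" is a monotone copy of "power_a".
features = {
    "power_a": [1, 2, 3, 4, 5],
    "power_b": [2, 4, 6, 8, 10],
    "jitter": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))
```

Because the rule is greedy, the result depends on the order in which features are visited; a different tie-breaking rule would be equally defensible.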

Second, following a different study, supervised learning (i.e., random forest modeling) is developed to achieve a binary classification that separates samples into two distinct classes: clean and jamming. Following this supervised classification, two-class and three-class unsupervised learning is implemented considering three of the four jamming types: BA, ST, and SP. In this initial step, the four aforementioned algorithms are used. This newly developed study is intended to facilitate the visualization of the performance of each algorithm. For example, AGG achieves a homogeneity of 1.0, a completeness of 0.950, a V-measure of 0.713, an ARI of 0.557, and an AMI of 0.713, and GMM generates 1, 0.771, 0.645, 0.536, and 0.644, respectively. Lastly, to improve the classification in this study, semi-supervised learning is adopted instead of unsupervised learning, considering the same algorithms and dataset. In this case, GMM achieves results of 1, 0.688, 0.688, 0.786, and 0.688, whereas DBSCAN achieves 0, 0.036, 0.028, 0.018, and 0.028 for homogeneity, completeness, V-measure, ARI, and AMI, respectively. Overall, this unsupervised learning is approached as a method for jamming classification, addressing the challenge of identifying newly introduced samples.
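
Three of the metrics named above can be computed from label counts alone. The sketch below follows the usual entropy-based definitions of homogeneity, completeness, and V-measure (with equal weighting, as their harmonic mean); it is an illustrative implementation with assumed function names and toy labels, not the code used in the thesis, and AMI and ARI are left out.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a label sequence (natural log)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b) estimated from paired label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * log(c / b_counts[bv]) for (_, bv), c in joint.items())


def homogeneity_completeness_v(true_labels, cluster_labels):
    """Return (homogeneity, completeness, V-measure)."""
    h_true, h_cluster = entropy(true_labels), entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single class.
    hom = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    # Completeness: all members of a class end up in the same cluster.
    com = 1.0 if h_cluster == 0 else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


# Toy example: two jamming classes, each split across two clusters.
true_labels = ["BA", "BA", "ST", "ST"]
cluster_labels = [0, 1, 2, 3]
print(homogeneity_completeness_v(true_labels, cluster_labels))
```

The toy example shows why a homogeneity of 1 can coexist with a lower completeness: every cluster is pure, but each class is scattered over several clusters.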

Collaborative Research: SaTC: CORE: Small: UAV-NetSAFE.COM: UAV Network Security Assessment and Fidelity Enhancement through Cyber-Attack-Ready Optimized Machine-Learning Platforms

Directorate for Computer & Information Science & Engineering

Degree Type

Master of Science
Electrical and Computer Engineering

Other engineering not elsewhere classified
Machine learning not elsewhere classified

COMMENTS

PDF Unsupervised Learning by Program Synthesis
We formalize unsupervised program synthesis as Bayesian inference within the following generative model: Draw a program $f(\cdot)$ from a description length prior over programs, which depends upon the sketch. Draw $N$ inputs $\{I_i\}_{i=1}^{N}$ to the program $f(\cdot)$ from a domain-dependent description length prior $P_I(\cdot)$.
PDF Interactive Algorithms for Unsupervised Machine Learning
This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a
(PDF) What is unsupervised Learning
Abstract. Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. In ...
PDF An Unsupervised Machine Learning Approach to Diabetic Neuropathy Data
Unsupervised machine learning is especially useful when a data set is available and it is inferred that patterns in the data exist but there exists no current label to corroborate this inference. Figure 1: Machine learning applied to animals . 1.2 Data Clustering . A common method of unsupervised machine learning is clustering. Clustering is
PDF Unsupervised Learning Data Streams
In this thesis, we aim to propose unsupervised and incremental machine learning algorithms for data streams. We focus on algorithms able to update their classification model with little or no external feedback. We start by addressing the problem of concept drift in data streams with few labeled data.
A Systematic Review on Supervised and Unsupervised Machine Learning
Machine learning is growing as fast as concepts such as big data and the field of data science in general. The purpose of the systematic review was to analyze scholarly articles published between 2015 and 2018 that address or implement supervised and unsupervised machine learning techniques in different problem-solving paradigms.
Unsupervised machine learning in urban studies: A systematic review of
For example, there is an integrated machine learning package for various supervised and unsupervised algorithms — scikit-learn (Pedregosa et al., 2011), and gensim package, which support the review topic modeling example shown in Section 3. On top of that, thanks to the integration with a deep learning environment, Python is able to process ...
PDF Interactive Algorithms for Unsupervised Machine Learning
This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to mitigate the cost of data acquisition and consequently enable statistical analysis in otherwise intractable settings.
PDF Unsupervised Machine Learning for Networking: Techniques, Applications
Unsupervised learning is interesting since it can free us from the need for labeled data and manual handcrafted feature engineering, thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of ...
[2111.03598] Quantum Algorithms for Unsupervised Machine Learning and
Quantum Algorithms for Unsupervised Machine Learning and Neural Networks. In this thesis, we investigate whether quantum algorithms can be used in the field of machine learning for both long and near term quantum computers. We will first recall the fundamentals of machine learning and quantum computing and then describe more precisely how to ...
[2312.00101] Towards Unsupervised Representation Learning: Learning
Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may - and to some extent already does - result in advantages regarding the representation's structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods ...
Unsupervised Learning
Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental ...
Network traffic analysis using machine learning: an unsupervised
The existing research with the most similar context to this paper is presented in [], where the authors discuss software-defined networks (SDN), network function virtualization (NFV), machine learning, and big-data-driven network slicing for 5G. In that work, they propose an architecture to classify network traffic and use those decisions for network slicing.
PDF Evaluation Metrics for Unsupervised Learning Algorithms
Supervised machine learning starts from prior knowledge of the desired result in the form of labeled data sets, which allows to guide the training process, whereas unsupervised machine learning works directly on unlabeled data. In the absence of labels to orient the learning process, these labels must be "discovered" by the learning ...
ASI
Du et al. proposed an unsupervised machine learning-based detection model based on LSTM-AE and GANs, which can learn complex patterns in time series data to detect anomalies more accurately. In addition, Goh et al. [ 22 ] introduced an unsupervised learning approach using RNNs to learn the changes in data patterns over time and use them to ...
PDF Demystifying Unsupervised Feature Learning
Machine learning is a key component of state-of-the-art systems in many application domains. Applied to many kinds of raw data, however, most learning algorithms are unable to make good predictions. In order to succeed, most learning algorithms are applied instead to "features" that represent higher-level concepts extracted from the raw data.
Supervised and Unsupervised Machine Learning Algorithms
Summary. In this post you learned the difference between supervised, unsupervised and semi-supervised learning. You now know that: Supervised: All data is labeled and the algorithms learn to predict the output from the input data. Unsupervised: All data is unlabeled and the algorithms learn the inherent structure from the input data.
PDF Using Machine Learning to Predict Student Performance
This thesis examines the application of machine learning algorithms to predict whether a student will be successful or not. The specific focus of the thesis is the comparison of ... supervised and unsupervised learning. In supervised learning, input data comes with a known class structure (Mohri et al., 2012; Mitchell, 1997). This input data is ...
(PDF) Unsupervised Machine Learning via Feature Extraction and
Unsupervised Machine Learning via Feature Extraction and Clustering to Classify Tree Species from High-Resolution UAV-based RGB Image Data October 2023 Thesis for: Master of Science
Unsupervised Machine-Learning Applications in Seismology
2024 Theses Doctoral. Unsupervised Machine-Learning Applications in Seismology. Sawi, Theresa. Catalogs of seismic source parameters (hypocenter locations, origin times, and magnitudes) are vital for studying various Earth processes, greatly enhancing our understanding of the nature of seismic events, the structure of the Earth, and the dynamics of fault systems.
PDF Supervised and Unsupervised Machine Learning Techniques for Text
SUPERVISED AND UNSUPERVISED MACHINE LEARNING TECHNIQUES FOR TEXT DOCUMENT CATEGORIZATION by Arzucan Özgür, B.S. in Computer Engineering, Boğaziçi University, 2002. Submitted to the Institute for Graduate Studies in Science and Engineering in partial fulfillment of the requirements for the degree of Master of Science
PDF Unsupervised Anomaly Detection based on Machine Learning
ter thesis, with the objective of building an unsupervised anomaly detector based on machine learning. The system under study is a three-stage Centrifugal Air Compressor system, equipped
Unsupervised anomaly detection: methods and applications
M1 and M2. The two methods produce different anomaly scores and ranks, as illustrated by the color of each instance (darker colors indicate higher anomaly scores). By removing, for example, the N = 4 most anomalous instances and applying a quality metric on the remaining instances, M1 proves to be better than M2.
A Study on the Use of Unsupervised, Supervised, and Semi-supervised
In this work, first, unsupervised machine learning is proposed as a study for detecting and classifying jamming attacks targeting unmanned aerial vehicles (UAV) operating at a 2.4 GHz band. Three scenarios are developed with a dataset of samples extracted from meticulous experimental routines using various unsupervised learning algorithms, namely K-means, density-based spatial clustering of ...
Unsupervised Learning, Recommenders, Reinforcement Learning
In the third course of the Machine Learning Specialization, you will: • Use unsupervised learning techniques, including clustering and anomaly detection. • Build recommender systems with a collaborative filtering approach and a content-based deep learning method. • Build a deep reinforcement learning model.