Analyzing detection algorithms for cybersecurity in financial institutions

Fraud in financial services is an ever-increasing phenomenon, and cybercrime generates revenues worth millions. Even a small improvement in fraud detection rates can lead to significant savings. Traditional rule-based systems are limited in their ability to block potentially fraudulent transactions. This chapter explores how machine learning, specifically supervised and unsupervised learning, can address these limitations more effectively. We present a novel AI-based fraud detection system that combines supervised and unsupervised models. In the batch layer, transaction data undergoes pre-processing and model training, while the stream layer performs real-time fraud detection on newly arriving transaction data. The architecture automates fraud detection processes, making it a valuable tool for supporting fraud analysts. This research aims to enhance cybersecurity in financial institutions by leveraging AI and machine learning. The integration of supervised and unsupervised models provides a robust defense against cyber faults, helping to ensure the safety of financial transactions.


Introduction
In today's interconnected digital landscape, financial institutions face relentless cyber threats that jeopardize data security, financial stability, and customer trust. As cybercriminals evolve their tactics, the need for robust defense mechanisms becomes paramount. This chapter explores the application of supervised and unsupervised learning techniques to prevent and mitigate cyber faults within financial institutions.
Cybersecurity is a crucial issue in the modern world, as criminals exploit cyberspace to commit cybercrime and launch cyber threats. To cope with these challenges, the banking and financial industry has adopted artificial intelligence (AI), a promising technology that can perform functions associated with the human mind, such as reasoning, learning, interacting, creating, perceiving, and problem-solving. AI can also process large volumes of structured and unstructured data, extract useful patterns and insights, and support the modeling of human behavior, inference, and knowledge representation. However, AI also has limitations and risks, spanning ethical, legal, social, and technical aspects. This chapter explores the applications and implications of AI for cybersecurity and cybercrime prevention. It discusses the AI methods and techniques used to solve problems related to cybersecurity, analyzes the benefits and drawbacks of AI in the banking and financial sector, and suggests ways to improve the performance and reliability of AI systems.
Intrusion Detection Systems (IDS) play a critical role in identifying and preventing malicious activities within networks, including smart grids. However, these very systems are often prime targets for cyber-attacks. Researchers have proposed various approaches to classify and detect such attacks, with supervised machine learning being a common method. Nevertheless, supervised models rely on extensive labeled datasets for training and evaluation. In this study, we compare the performance of supervised and unsupervised learning models in detecting cyber-attacks. The supervised models include Gaussian Naïve Bayes, Classification and Regression Decision Tree, Logistic Regression, C-Support Vector Machine, Light Gradient Boosting, and Alex Neural Network. The unsupervised models consist of Principal Component Analysis, K-means, and Variational Autoencoder. Our evaluation considers accuracy, probability of detection, probability of misdetection, probability of false alarm, processing time, prediction time, training time per sample, and memory size. The results indicate that the Alex Neural Network model outperforms the other supervised models, while the Variational Autoencoder model performs best among the unsupervised models.

Methodology

Figure 1 illustrates the workflow for both supervised and unsupervised models. In Figure 1A, the supervised model workflow comprises several critical stages: data acquisition, dataset assessment, model training, and optimization. These supervised models rely on labeled data, which calls for several data assessment techniques, including data balancing, imputation, normalization, and encoding. Specifically, we employ Gaussian Naive Bayes, Classification and Regression Decision Trees, C-Support Vector Machines, Logistic Regression, Alex Neural Network, and Light Gradient Boosting to detect and classify network attacks. These models are fine-tuned using optimization techniques such as grid search and the ADAM optimizer.
In contrast, Figure 1B depicts the unsupervised workflow, which operates on an unlabeled dataset and therefore requires fewer data assessment techniques. Our unsupervised models (K-means, Principal Component Analysis, and Variational Autoencoder) are evaluated on the data patterns they uncover without labels, after applying optimization techniques. The following sections describe the materials and techniques in detail.

Dataset
We used a dataset from the Canadian Institute for Cybersecurity and the University of New Brunswick. This comprehensive dataset comprises normal network traffic samples together with instances of 10 distinct attack types. The sample counts for each attack type are summarized in Table 1.
Notably, the attack classes in the dataset are imbalanced, which can reduce the accuracy of detection algorithms. To mitigate this issue, we set a common threshold equal to the size of the smallest attack class, namely UDP-lag with 366,461 samples, and limited the sample count of every attack category to this value.
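The class-capping step above amounts to random undersampling of the attack classes to the size of the smallest one. A minimal standard-library sketch follows; the function name `undersample_to_minimum`, the toy labels, and the fixed seed are illustrative assumptions, not the authors' code. It is meant for the attack categories only, since the normal class was sampled separately.

```python
import random
from collections import defaultdict


def undersample_to_minimum(samples, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    `samples` and `labels` are parallel lists; returns two new parallel lists.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Common threshold: the row count of the least frequent class.
    threshold = min(len(rows) for rows in by_class.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept_samples, kept_labels = [], []
    for label, rows in by_class.items():
        for sample in rng.sample(rows, threshold):  # sampling without replacement
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


# Hypothetical attack rows: 5 "Syn", 3 "UDP-lag" and 7 "NTP" samples.
X = list(range(15))
y = ["Syn"] * 5 + ["UDP-lag"] * 3 + ["NTP"] * 7
X_bal, y_bal = undersample_to_minimum(X, y)
print({c: y_bal.count(c) for c in sorted(set(y_bal))})  # {'NTP': 3, 'Syn': 3, 'UDP-lag': 3}
```

On a full-size dataset, a DataFrame-based equivalent such as pandas' `DataFrame.groupby(...).sample(n=...)` would be more practical than Python lists.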
For the normal samples, we randomly selected 4,000,000 instances, resulting in a final dataset of 8,000,000 samples. The original dataset contained 88 features, but not all of them contribute significantly to attack detection. To address this, the authors [1] applied feature reduction techniques, specifically Pearson's correlation and tree-based feature selection, which reduced the dataset to 21 relevant features, as detailed in Table 1. This balanced, labeled dataset was used to train the supervised models; for the unsupervised models, the label column was deliberately removed.
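Pearson-correlation filtering can be sketched as a greedy pass that drops any feature strongly correlated with one already kept. The |r| > 0.9 cutoff, the keep-the-first-feature rule, and the feature names are assumptions, since the chapter does not report them; the tree-based selection step is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold against every kept feature.

    `columns` maps feature name -> values; when two features are highly
    correlated, the one that appears first in `columns` is kept.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical flow features (names and values are illustrative only).
features = {
    "fwd_packets": [1, 2, 3, 4, 5],
    "fwd_bytes": [10, 20, 31, 39, 50],  # nearly proportional to fwd_packets
    "flow_iat": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))  # ['fwd_packets', 'flow_iat']
```

A tree-based ranking (for example, the `feature_importances_` attribute of a fitted scikit-learn tree ensemble) could then be applied to the surviving features.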

Pre-Processing
Data preprocessing plays a crucial role in improving data quality. For supervised models, this step involves several techniques, including missing-data imputation, transformation, and encoding. For unsupervised models, the focus narrows to missing-data imputation and transformation.
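The chapter does not detail the encoding step. As one plausible illustration, assuming class names are mapped to integer codes for the supervised models, a minimal sketch follows; the function name and example labels are hypothetical. scikit-learn's `LabelEncoder` behaves the same way, assigning codes in sorted class order.

```python
def encode_labels(labels):
    """Map each distinct class name to an integer code (sorted for reproducibility)."""
    classes = sorted(set(labels))
    index = {name: code for code, name in enumerate(classes)}
    return [index[name] for name in labels], classes


codes, classes = encode_labels(["BENIGN", "UDP-lag", "Syn", "BENIGN"])
print(codes, classes)  # [0, 2, 1, 0] ['BENIGN', 'Syn', 'UDP-lag']
```

Keeping the returned `classes` list allows predicted codes to be mapped back to attack names after inference.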
To handle null or missing values in the dataset, we used mean imputation, which replaces each missing value with the mean of all available values of that feature in the dataset. The data were then normalized and standardized using a feature scaling technique.
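Mean imputation can be sketched in a few lines; `mean_impute` is a hypothetical helper, and treating both `None` and NaN as missing is an assumption about how gaps appear in the data. In scikit-learn the equivalent is `SimpleImputer(strategy="mean")` from `sklearn.impute`.

```python
import math


def mean_impute(column):
    """Replace missing entries (None or NaN) with the mean of the observed entries."""
    observed = [v for v in column if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in column]


print(mean_impute([2.0, None, 4.0, float("nan"), 6.0]))  # [2.0, 4.0, 4.0, 4.0, 6.0]
```

When cross-validation is used, the mean is normally computed on the training portion only and reused for validation and test data, so that no information leaks from held-out samples.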
Specifically, the features were rescaled using the Yeo-Johnson power transformer. This transformation not only makes the data more Gaussian-like but also handles zero, positive, and negative values.
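The Yeo-Johnson transform applies a piecewise power function whose exponent lambda is fitted per feature by maximum likelihood. The sketch below fits lambda by a coarse grid search over [-2, 2], a simplification assumed here for readability; scikit-learn's `PowerTransformer(method="yeo-johnson")` uses a numerical optimizer and standardizes its output by default. Function names and sample values are hypothetical.

```python
import math
import statistics


def yeo_johnson(y, lmbda):
    """Yeo-Johnson transform of a single value for a given lambda."""
    if y >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(y)
        return ((y + 1) ** lmbda - 1) / lmbda
    if abs(lmbda - 2) < 1e-12:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lmbda) - 1) / (2 - lmbda)


def log_likelihood(column, lmbda):
    """Profile log-likelihood of lambda under a Gaussian model of the transformed data."""
    transformed = [yeo_johnson(v, lmbda) for v in column]
    variance = statistics.pvariance(transformed)
    jacobian = sum(math.copysign(math.log1p(abs(v)), v) for v in column)
    return -len(column) / 2 * math.log(variance) + (lmbda - 1) * jacobian


def fit_lambda(column, lo=-2.0, hi=2.0, steps=400):
    """Choose lambda by coarse grid search (a simplification of numerical optimization)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lm: log_likelihood(column, lm))


def power_transform(column):
    """Yeo-Johnson transform, then standardize to zero mean and unit variance."""
    lmbda = fit_lambda(column)
    transformed = [yeo_johnson(v, lmbda) for v in column]
    mu = statistics.fmean(transformed)
    sigma = statistics.pstdev(transformed)
    return [(t - mu) / sigma for t in transformed], lmbda


# Hypothetical right-skewed feature containing a negative value.
skewed = [0.5, 1.0, 2.0, 4.0, 50.0, 400.0, -3.0]
scaled, lam = power_transform(skewed)
print(round(lam, 2), [round(v, 2) for v in scaled])
```

Because the transform is monotonic, the ordering of values within each feature is preserved; only their spread is reshaped. A pure-Python grid search is far too slow for millions of rows, so a vectorized library implementation would be used in practice.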

Models
In the realm of financial systems, machine learning models play a pivotal role in detecting and mitigating cyber faults and fraudulent activities. Let's explore some of the prominent machine learning approaches used for this purpose:

Supervised learning
Supervised learning is a fundamental machine learning paradigm in which the model learns from labeled data. In this approach, the algorithm is trained on input-output pairs, where each input (its features) is associated with a known output (the target). The goal is to learn a mapping function that predicts the correct output for new, unseen data. Common supervised learning algorithms include decision trees, support vector machines, neural networks, and regression models. These models are applied to tasks such as classification, regression, and anomaly detection, making them essential tools for solving real-world problems.
Our chosen supervised models span a diverse set of algorithm families, each suited to particular tasks; their characteristics are summarized alongside Figure 2 (Classification of ML models used in this study).

Unsupervised learning
Unsupervised learning plays a critical role in detecting cyber faults within financial systems. Unlike supervised learning, which relies on labeled data, unsupervised learning operates on unlabeled data. Its primary goal is to uncover hidden patterns, anomalies, or clusters within the data without explicit guidance. In cybersecurity, unsupervised models such as K-means, Principal Component Analysis (PCA), and Variational Autoencoders are well suited to identifying irregularities, network intrusions, and suspicious behavior. By analyzing transaction data, these models can reveal subtle deviations from expected norms, aiding the early detection and prevention of cyber threats. Their ability to adapt to evolving attack techniques and handle large-scale data makes them valuable tools for safeguarding financial systems against fraud and unauthorized access. Among the unsupervised models, as highlighted in Figure 2, we selected three key approaches: K-means clustering, Principal Component Analysis (PCA), and the Variational Autoencoder (VA-Encoder).
- K-means Clustering: This clustering model seeks centroids that minimize the within-cluster sum-of-squares criterion (inertia), grouping similar data points together.
- Principal Component Analysis (PCA): Widely used for dimensionality reduction, PCA improves model performance on highly correlated data. By transforming features into a new coordinate system, it captures essential information while reducing redundancy.
- Variational Autoencoder (VA-Encoder): A neural-network-based technique that compresses raw data into a compact representation. Comprising three components (an encoder, a decoder, and a loss function), it provides a probabilistic approach to explaining observations in latent space. Notably, it mitigates overfitting, helping the latent space capture meaningful features during the generative process.

These unsupervised methods contribute significantly to understanding data patterns and anomalies, which is critical for cyber fault detection and prevention in financial systems. Each model's effectiveness in cyber fault detection is then assessed with the evaluation metrics defined later in this chapter.
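The inertia criterion mentioned for K-means can be made concrete with a small implementation of Lloyd's algorithm, which alternates assignment and centroid-update steps. This is a sketch under stated assumptions: random initialization from the data with a fixed seed, and hypothetical 2-D points. scikit-learn's `KMeans` reports the same quantity as its `inertia_` attribute, although it uses k-means++ initialization by default.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = random.Random(seed).sample(points, k)  # k data points chosen at random
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        updated = []
        for c in range(k):  # update step: centroid = mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    # Inertia: within-cluster sum of squared distances to the assigned centroid.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Two hypothetical, well-separated groups of 2-D feature vectors.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment, inertia = kmeans(data, k=2)
print(assignment, round(inertia, 3))
```

Which cluster receives index 0 depends on the random initialization, so only the grouping, not the numbering, is meaningful.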

Results and discussions
To evaluate our models, we held out 20% of the data as a test set and used the remaining 80% for training with a 5-fold cross-validation strategy. The training data was divided into five equal subsets; in each iteration, the model was trained on four subsets and validated on the remaining one. This process was repeated five times, so that each subset served once as the validation set. Table 3 presents the optimal hyperparameters obtained through grid search, together with the settings of the ADAM optimizer.
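The evaluation protocol above can be sketched as index bookkeeping: shuffle, hold out 20% for testing, then rotate a validation fold through five subsets of the remaining 80%. The function name and seed are hypothetical, and the sketch splits at random rather than stratifying by attack class, an assumption since the chapter does not say.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.2, k=5, seed=0):
    """Hold out a test set, then split the remaining indices into k folds.

    Yields (train_idx, val_idx, test_idx) once per fold; the test set is the
    same in every iteration and is never used for training or validation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx, pool = indices[:n_test], indices[n_test:]
    folds = [pool[i::k] for i in range(k)]  # k near-equal, disjoint subsets
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i], test_idx


for train_idx, val_idx, test_idx in holdout_then_kfold(100):
    print(len(train_idx), len(val_idx), len(test_idx))  # 64 16 20 on each of the 5 lines
```

For imbalanced labels, scikit-learn's `StratifiedKFold` keeps class proportions equal across folds.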
Figures 3 and 4 showcase the performance results of our machine learning (ML) models across key metrics: accuracy, probability of detection (PD), probability of misdetection (PMD), and probability of false alarm (PFA).
Among the supervised models, the AlexNet model demonstrated superior performance on the selected metrics (as depicted in Figure 3). While LightGBM performed well, it exhibited slightly lower accuracy (ACC) and PD, along with higher PMD and PFA, compared with AlexNet. Other supervised models, such as CART and C-SVM, also delivered satisfactory results. However, the LR and GNB models lagged behind.
In contrast, the unsupervised models showed significantly lower performance on the same metrics. The VA-Encoder model outperformed the other unsupervised approaches, PCA performed notably worse than VA-Encoder, and K-means had the lowest ACC and PD, coupled with the highest PMD and PFA. Across both families, AlexNet led the pack, followed by LightGBM, VA-Encoder, CART, C-SVM, PCA, GNB, LR, and K-means.

Table 4 provides additional insights into model performance using four further metrics: processing time (PRT), prediction time (PT), training time per sample (TPS), and memory usage (M). AlexNet excelled in all these aspects among both supervised and unsupervised models, whereas GNB exhibited the poorest performance across these metrics. CART slightly outperformed AlexNet in terms of PRT, PT, TPS, and M. Among the unsupervised models, VA-Encoder stood out, while K-means had the lowest performance. These findings contribute to our understanding of model effectiveness in cyber fault detection and can guide future research in this domain.

To contextualize our findings, we compared our proposed techniques with existing studies in the literature (summarized in Table 6). These prior studies used different datasets, including NSL-KDD and KDDCup99. Notably, most of them focused primarily on supervised models, leaving a gap in understanding how unsupervised models perform for intrusion detection in smart grids. Our study bridges this gap by evaluating both supervised and unsupervised models. Overall, our results show that AlexNet and VA-Encoder outperform the other models in their respective categories in terms of accuracy, probability of detection, probability of misdetection, probability of false alarm, processing time, prediction time, training time per sample, and memory usage.
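The chapter does not define how PRT, PT, TPS, and M were measured. One plausible harness, assuming wall-clock timing with `time.perf_counter` and peak Python-level allocations via `tracemalloc`, is sketched below; the function name, the PRT definition (training plus prediction time), and the stand-in callables are all assumptions.

```python
import time
import tracemalloc


def profile_model(fit, predict, X_train, y_train, X_test):
    """Measure training/prediction time and peak traced memory for one model.

    `fit` and `predict` are any callables; the metric definitions are assumptions.
    """
    tracemalloc.start()
    start = time.perf_counter()
    fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()

    start = time.perf_counter()
    predict(X_test)
    predict_seconds = time.perf_counter() - start

    return {
        "PRT": train_seconds + predict_seconds,  # assumed: total processing time
        "PT": predict_seconds,                   # prediction time for the test set
        "TPS": train_seconds / len(X_train),     # training time per sample
        "M": peak_bytes,                         # peak memory traced during training
    }


# Stand-in callables for illustration; a real model's fit/predict would go here.
report = profile_model(lambda X, y: sorted(X), lambda X: [0] * len(X),
                       X_train=[3, 1, 2], y_train=[0, 1, 0], X_test=[4, 5])
print(sorted(report))  # ['M', 'PRT', 'PT', 'TPS']
```

Because `tracemalloc` slows the traced code, timing and memory would ideally be measured in separate runs; the sketch combines them for brevity.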

Conclusions
Intrusion Detection Systems (IDS) play a critical role in safeguarding networks by monitoring and detecting anomalies.
While existing research has predominantly focused on supervised machine learning models for attack detection, our study provides a comprehensive comparison between supervised and unsupervised approaches. We evaluated these models across various metrics, including accuracy, probability of detection, probability of misdetection, probability of false alarm, processing time, prediction time, training time per sample, and memory usage.
Our model selection spanned diverse categories: Bayesian, tree-based, instance-based, regularization-based, neural-network, and ensemble models. From these, we chose specific models for both supervised and unsupervised learning. Notably, the Alex Neural Network emerged as the top performer among supervised models, while the Variational Autoencoder (VA-Encoder) excelled among unsupervised models. The VA-Encoder's ability to prevent overfitting and generate meaningful features in the latent space contributed to its superior performance.
Furthermore, our findings demonstrate that cyber-attacks can be detected more effectively with the Variational Autoencoder than with the other unsupervised methods. As future work, we recommend exploring the performance of further deep learning models, both supervised and unsupervised, for detecting attacks in IDS. These insights contribute to enhancing network security and strengthening financial systems against evolving threats.

Figure 1
Figure 1 Supervised and Unsupervised Learning Working Flow. (A) Supervised Working Flow. (B) Unsupervised Working Flow.

1. Supervised Learning Models:
   - Decision Trees: These models create a tree-like structure to classify data based on features. Decision trees are interpretable and can handle both categorical and numerical data.
   - Support Vector Machines (SVM): SVMs are effective for binary classification tasks. They find a hyperplane that best separates different classes.
   - Artificial Neural Networks (ANN): ANNs mimic the human brain's neural network and are adept at handling complex relationships in data.
   - Random Forest: An ensemble of decision trees that improves accuracy and reduces overfitting.
2. Unsupervised Learning Models:
   - K-means: A clustering algorithm that groups similar data points together.
   - Principal Component Analysis (PCA): Used for dimensionality reduction by transforming features into a new coordinate system.
   - Variational Autoencoder: A type of neural network that learns efficient representations of data.
3. Hybrid Approaches:
   - Semi-supervised Learning: Combines labeled and unlabeled data to enhance model performance.
   - Reinforcement Learning: Although less common, reinforcement learning can adapt to dynamic environments and learn from feedback.

Figure 2
Figure 2 Classification of ML models used in this study
- Gaussian Naïve Bayes (GNB): A Bayesian model, GNB is well suited to features that follow a Gaussian (normal) distribution.
- Classification and Regression Tree (CART): This tree-based model uses the Gini index as its splitting criterion and cost-complexity pruning to improve accuracy while mitigating overfitting.
- C-Support Vector Machine (C-SVM): An instance-based model, C-SVM directly uses the training data without preprocessing the target function.
- Logistic Regression (LR): Falling under the regularization-based category, LR fits functions to the training data and prevents overfitting by incorporating additional information (regularization).
- Alex Neural Network (AlexNet): A neural-network-based model with 25 layers, including input, rectified linear unit (ReLU), convolutional, max pooling, normalization, dropout, SoftMax, and output layers. The ReLU activation function accelerates training while maintaining generalization ability at a lower computational cost.
- Light Gradient Boosting (LightGBM): An ensemble-based approach, LightGBM leverages three models to achieve higher efficiency, faster training, lower memory usage, and improved accuracy compared with other boosting models.
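To make the GNB description concrete, the sketch below fits one mean and one variance per feature and class, then predicts the class with the highest log-posterior under the naive independence assumption. It is a minimal illustration with hypothetical data, not the configuration used in the study; scikit-learn provides a production version as `sklearn.naive_bayes.GaussianNB`. The fixed `eps` variance floor is a simplification assumed here.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Per-class Gaussian likelihood for each feature, combined with class priors."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # small constant added to each variance to avoid division by zero

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(rows) for col in columns]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + self.eps
                         for col, m in zip(columns, means)]
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, means, variances)
        return self

    def log_posterior(self, row, label):
        """Unnormalized log P(label | row) under the naive independence assumption."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )

    def predict(self, X):
        return [max(self.params, key=lambda c: self.log_posterior(row, c)) for row in X]


# Hypothetical two-feature traffic samples labeled "normal" or "attack".
X = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (8.0, 9.0), (8.3, 8.7), (7.9, 9.2)]
y = ["normal"] * 3 + ["attack"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([(1.1, 2.0), (8.1, 9.0)]))  # ['normal', 'attack']
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.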

1. Accuracy (ACC): This metric quantifies the overall correctness of the model's predictions. It is the ratio of correctly predicted instances (both true positives and true negatives) to the total number of instances:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
   - TP represents true positives (correctly predicted positive instances).
   - TN represents true negatives (correctly predicted negative instances).
   - FP represents false positives (incorrectly predicted positive instances).
   - FN represents false negatives (incorrectly predicted negative instances).
2. Probability of Detection (PD): Also known as sensitivity or recall, PD measures the model's ability to correctly identify positive instances (i.e., to detect network attacks). It is defined as:
$$\mathrm{PD} = \frac{TP}{TP + FN}$$
A higher PD indicates better performance in capturing true positive cases.
3. Probability of Misdetection (PMD): This metric represents the likelihood of failing to detect positive instances (i.e., network attacks). It is complementary to PD and can be calculated as:
$$\mathrm{PMD} = 1 - \mathrm{PD} = \frac{FN}{TP + FN}$$
Lower PMD values indicate better performance in minimizing missed detections.
4. Probability of False Alarm (PFA): Also known as fall-out, PFA measures the rate at which the model predicts the positive class when the actual class is negative (i.e., false alarms). It is defined as:
$$\mathrm{PFA} = \frac{FP}{FP + TN}$$
A lower PFA signifies fewer false alarms.
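The four definitions above translate directly into code. The sketch assumes binary labels with the attack class as positive; `confusion_counts`, `detection_metrics`, and the example counts are hypothetical. PMD is computed as FN/(TP+FN), which equals 1 - PD but avoids a floating-point rounding artifact.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN, treating `positive` (an attack) as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def detection_metrics(tp, tn, fp, fn):
    """ACC, PD, PMD and PFA as defined above."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PD": tp / (tp + fn),    # probability of detection (recall)
        "PMD": fn / (tp + fn),   # probability of misdetection, equal to 1 - PD
        "PFA": fp / (fp + tn),   # probability of false alarm (fall-out)
    }


# Hypothetical counts for illustration only.
print(detection_metrics(tp=90, tn=95, fp=5, fn=10))
# {'ACC': 0.925, 'PD': 0.9, 'PMD': 0.1, 'PFA': 0.05}
```

For the multi-class attack setting, the same functions can be applied per attack type in a one-vs-rest fashion by choosing that attack as `positive`.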

Figure 3
Figure 3 Performance evaluation of the ML models in terms of ACC, PD, PMD, and PFA for Test Data

Figure 4
Figure 4 Performance evaluation of cyber-attacks based on best ML models in terms of ACC, PD, PMD, and PFA.

Table 1
List of Attacks

Table 4
The ML models' performance in terms of PRT, PT, TPS, and M for test data (best performances are in bold).
Among the supervised models, the AlexNet model achieved the most favorable results, while among the unsupervised models, the VA-Encoder stood out in terms of accuracy (ACC), probability of detection (PD), probability of misdetection (PMD), probability of false alarm (PFA), processing time (PRT), prediction time (PT), training time per sample (TPS), and memory usage (M).

Table 5
Performance of the ML Models in terms of PRT, PT, TPS, and M for test data
Encoder excelled in detecting and classifying UDP attacks. While AlexNet slightly lagged behind in detecting MSSQL attacks, it outperformed VA-Encoder overall. Notably, VA-Encoder struggled with SSDP, NTP, and TFTP attacks. In summary, AlexNet consistently demonstrated superior performance across most attack types, reaffirming its effectiveness in cyber fault detection.
higher PRT, PT, TPS, and memory usage for this specific attack type. Conversely, VA-Encoder demonstrated superior performance in detecting SSDP attacks, albeit with higher resource utilization.
Sample data and tables are taken from the research work "A Comparative Analysis of Supervised and Unsupervised Models for Detecting Attacks on the Intrusion Detection Systems".