Classifying Spams Using Apache Spark MLlib

Mayowa Timothy Adesina; Joshua Mayokun Adesina

doi:10.30574/ijsra.2024.12.2.1332

Mayowa Timothy Adesina ¹,* and Joshua Mayokun Adesina ²

¹College of Business Administration, Kansas State University, Manhattan, KS 66502

²Sociology Department, Federal University, Oye-Ekiti, Nigeria.

Research Article

International Journal of Science and Research Archive, 2024, 12(02), 2091-2112.

Article DOI: 10.30574/ijsra.2024.12.2.1332

DOI url: https://doi.org/10.30574/ijsra.2024.12.2.1332

Publication history:

Received on 26 June 2024; revised on 11 August 2024; accepted on 13 August 2024

Abstract:

This paper provides an overview of several machine learning algorithms, including Logistic Regression, Decision Trees, and Random Forests, with a focus on their application in predictive modeling. The discussion emphasizes the importance of feature selection, feature engineering, and model evaluation techniques such as cross-validation in building robust and generalizable models. Using the Spambase dataset from the UCI Machine Learning Repository, the performance of these algorithms is compared on key metrics: accuracy, precision, recall, and F1-score. The paper also highlights the significance of understanding dataset characteristics and feature importance in optimizing model performance. The findings indicate that while each algorithm has its strengths and limitations, Random Forests generally provide superior predictive performance, especially on complex and high-dimensional datasets. This work serves as a resource for data scientists and researchers seeking to understand the practical implications of different machine learning techniques and their impact on real-world data.

Keywords:

Machine Learning; Artificial Intelligence; Spam Messages; Logistic Regression; Decision Tree; Random Forest; Apache Spark
