Enhancing Preventive Healthcare: Developing a Robust ML-Based Model for Diabetes Prediction
DOI: https://doi.org/10.51485/ajss.v10i4.290
Keywords: Diabetes prediction, Machine Learning, Random Forest, Cross-Validation, Pima Indians Dataset, Healthcare Analytics, Preprocessing, Predictive Modeling
Abstract
Diabetes mellitus is a significant global health challenge, and early detection is crucial for preventing severe complications. This study presents a comparative analysis of machine learning models for diabetes prediction using the Pima Indians Diabetes Dataset. We applied a systematic preprocessing protocol to address the dataset's inherent challenges, including missing values recorded as zeros in key clinical features. We then optimized and evaluated four algorithms: Support Vector Machine (SVM), Random Forest, Decision Tree, and Naïve Bayes. Each model was assessed with stratified 10-fold cross-validation. This method preserves the class proportions in every fold and gives a more stable estimate of generalization performance than a single train/test split. The Random Forest classifier outperformed the other models, achieving the following results:

- mean cross-validation accuracy of 84.2%
- precision of 0.80
- recall of 0.82
- F1-score of 0.81
- AUC of 0.90

These results demonstrate the effectiveness of ensemble methods for medical diagnostics and provide a transparent, reproducible benchmark for future research. More broadly, the study underscores the potential of ML-based tools to augment traditional diagnostic methods and to support accessible prescreening in diverse clinical settings.
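The abstract names two protocol steps: treating zeros in key clinical features as missing, and evaluating models with stratified 10-fold cross-validation. The following is a minimal standard-library Python sketch of those steps. It is not the authors' code and rests on several assumptions:

- The column list `ZERO_AS_MISSING` follows the dataset's usual column names.
- Median imputation is an assumed strategy, since the abstract does not name one.
- Medians are fitted on each training fold only, to avoid leaking test-fold information.
- `fit_predict` is a hypothetical placeholder for any of the four classifiers.

AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
import random
import statistics
from collections import defaultdict

# Assumption: Pima columns where a recorded 0 is physiologically implausible
# and is treated as missing (Pregnancies is excluded because 0 is valid there).
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def fit_medians(rows, columns=ZERO_AS_MISSING):
    """Median of the non-zero values of each column; `rows` is a list of dicts.

    Raises statistics.StatisticsError if a column has no non-zero values.
    """
    return {c: statistics.median(r[c] for r in rows if r[c] != 0) for c in columns}


def apply_medians(rows, medians):
    """Return new dicts where zeros in the imputed columns become the fitted median."""
    return [{k: (medians[k] if k in medians and v == 0 else v) for k, v in r.items()}
            for r in rows]


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class round-robin across folds, continuing where the last class stopped.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one fold of hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(rows, labels, fit_predict, k=10, seed=0):
    """Mean per-fold metrics.

    `fit_predict` is a hypothetical callable (train_rows, train_labels, test_rows)
    -> predicted labels, e.g. a wrapper around a Random Forest.
    """
    per_fold = []
    for train_idx, test_idx in stratified_k_fold(labels, k, seed):
        train = [rows[i] for i in train_idx]
        test = [rows[i] for i in test_idx]
        medians = fit_medians(train)  # fitted on the training fold only
        preds = fit_predict(apply_medians(train, medians),
                            [labels[i] for i in train_idx],
                            apply_medians(test, medians))
        per_fold.append(binary_metrics([labels[i] for i in test_idx], preds))
    return {m: statistics.mean(f[m] for f in per_fold) for m in per_fold[0]}
```

Off-the-shelf tools cover the same ground in practice. In scikit-learn, `StratifiedKFold` handles the splitting. `SimpleImputer(missing_values=0, strategy="median")` handles imputation; restrict it to the affected columns with a `ColumnTransformer` inside a `Pipeline`. `RandomForestClassifier` is the model.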
License
Copyright (c) 2025 Kavya Markapuram, Veema Rao, Bobbepalli Meera Mohiddin Shaik, Chakrapani Sai Manikanta Badigunchala, Rajasekhar Boddu, Krishna Jyothi Nannapaneni

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

