Comparative Study of Machine Learning Approaches Based on Artificial Neural Network, Regression, and Clustering for Diabetes Prediction
Keywords:
Artificial Neural Network, Logistic Regression, K-Means Clustering, Diabetes Prediction, Machine Learning
Abstract
This study presents a comparative analysis of three machine learning algorithms, Artificial Neural Network (ANN), Logistic Regression, and K-Means Clustering, using the Pima Indians Diabetes dataset. The main objective is to evaluate how well supervised and unsupervised methods predict diabetes from physiological and clinical features. The ANN model was built as a feedforward network trained with backpropagation, Logistic Regression was based on the standard logit (log-odds) equation, and K-Means Clustering served as an unsupervised reference. Model performance was assessed using Accuracy, Precision, Recall, and F1-score for the supervised models, and the Adjusted Rand Index (ARI) for clustering. Experimental results indicate that Logistic Regression achieved the highest accuracy (0.7573), followed by ANN (0.7078), while K-Means obtained an ARI of 0.1614. The heatmap comparison shows that the supervised models outperform the unsupervised approach: Logistic Regression offers better interpretability and stability, and ANN demonstrates the ability to model nonlinear relationships. K-Means, though less effective as a predictor, provided valuable insight into the structure of the data and its natural groupings. Overall, the findings confirm that supervised learning models, particularly Logistic Regression and ANN, are more effective for medical prediction tasks. Future research may explore hybrid or ensemble models that combine the interpretability of Logistic Regression, the adaptability of ANN, and the exploratory capability of clustering to enhance medical diagnostic performance.
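The logit model and the evaluation measures named above can be illustrated in a few lines of plain Python. This is a minimal sketch, not the authors' code. The function names are hypothetical, the logistic model takes already-fitted coefficients (fitting is omitted), label 1 is assumed to mark a diabetic patient, and the metrics follow the standard definitions also implemented by scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `adjusted_rand_score`.

```python
import math
from collections import Counter


def logistic_probability(weights, bias, features):
    """P(y = 1 | x) under the logit model log(p / (1 - p)) = bias + w . x.

    weights and bias are assumed to come from an already-fitted model.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (assumed 1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between ground-truth labels and cluster assignments (e.g. K-Means output)."""

    def pairs(count):
        # Number of unordered pairs, i.e. "count choose 2".
        return count * (count - 1) / 2

    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(pairs(c) for c in contingency.values())
    sum_rows = sum(pairs(c) for c in Counter(labels_true).values())
    sum_cols = sum(pairs(c) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / pairs(n)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate partitions (e.g. everything in one group) count as a perfect match.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels only; these are not results from the study.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # Hypothetical coefficients, thresholded at 0.5 to obtain a class label.
    p = logistic_probability([0.8], -1.0, [2.0])
    print(p, int(p >= 0.5))

    # Cluster ids are arbitrary, so a relabelled perfect match still scores 1.0.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because ARI compares pairs of samples rather than label names, it does not depend on how K-Means happens to number its clusters, which makes it a reasonable score for an unsupervised baseline that never sees the diagnosis labels.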
License
Copyright (c) 2025 Nauval Alfarizi, Adi Putra, Prima Lydia Yosophin Batubara, Satria Sinurat

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
