THE COMPARISON OF NEURAL NETWORK AND RANDOM FOREST MODEL PERFORMANCE IN DIABETES DISEASE DETECTION

Anti Aisyah; Ferly Ardhy; Panji Bintoro; Tahta Herdian Andika

doi:10.33365/jtst.v7i1.1034

Anti Aisyah, Program Studi Teknik Informatika, Fakultas Teknologi dan Informatika, Universitas Aisyah Pringsewu, Jl. A Yani No. 1 A Tambak Rejo, Wonodadi, Kec. Pringsewu, Kabupaten Pringsewu, Lampung 35372
Ferly Ardhy, Program Studi Teknik Informatika, Fakultas Teknologi dan Informatika, Universitas Aisyah Pringsewu, Jl. A Yani No. 1 A Tambak Rejo, Wonodadi, Kec. Pringsewu, Kabupaten Pringsewu, Lampung 35372
Panji Bintoro, Program Studi Rekayasa Perangkat Lunak, Fakultas Teknologi dan Informatika, Universitas Aisyah Pringsewu, Jl. A Yani No. 1 A Tambak Rejo, Wonodadi, Kec. Pringsewu, Kabupaten Pringsewu, Lampung 35372
Tahta Herdian Andika, Program Studi Teknik Informatika, Fakultas Teknologi dan Informatika, Universitas Aisyah Pringsewu, Jl. A Yani No. 1 A Tambak Rejo, Wonodadi, Kec. Pringsewu, Kabupaten Pringsewu, Lampung 35372

Keywords: Diabetes, SMOTE, Early Detection, Neural Network, Random Forest

Abstract

This study compares the performance of Neural Network and Random Forest algorithms in detecting diabetes. It is motivated by the need for reliable prediction methods, especially in healthcare. The models are trained on the Pima Indian Diabetes dataset from Kaggle and evaluated using accuracy, precision, recall, F1-score, and AUC-ROC.
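As an illustration of the evaluation metrics named above, the following minimal sketch computes them from scratch in plain Python. The function names and the toy labels and scores are hypothetical and are not taken from the study; AUC-ROC is computed here with the pairwise ranking definition (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), which is one of several equivalent ways to obtain it.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Recall and F1-score are the metrics most sensitive to missed positive cases, which is why they are informative on an imbalanced dataset where accuracy alone can look acceptable even when many diabetic cases go undetected.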

The methodology covers data collection and preparation, normalization, and splitting the data into training and testing subsets. To correct the class imbalance in the data, the Synthetic Minority Oversampling Technique (SMOTE) was applied to equalize the number of positive and negative diabetes cases. The Neural Network model was built with multiple hidden layers, whereas the Random Forest model uses an ensemble of decision trees to produce predictions.
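The normalization and balancing steps can be sketched as follows. This is a simplified, plain-Python illustration of the idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the study. The function names, the min-max scaling choice, the default `k=5`, and the toy data are all assumptions made for the example.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample the smaller class until both classes have equal counts."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    return X + smote(minority, n_new, k, seed), y + [label] * n_new


if __name__ == "__main__":
    # Hypothetical two-feature training data: 2 positive, 6 negative cases.
    X = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8],
         [0.1, 0.3], [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance_with_smote(min_max_normalize(X), y)
    print(y_bal.count(1), y_bal.count(0))
```

Oversampling of this kind is usually applied to the training subset only, so that the test subset still reflects the original class distribution when the models are evaluated.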

The results show that applying SMOTE improved the performance of both models. After balancing, Random Forest's recall and F1-score reached 72%, compared with an F1-score of 61% for the Neural Network. These results indicate that data balancing substantially improves the models' ability to correctly identify both positive and negative cases.
