OPTIMIZATION OF SLEEP DISORDER CLASSIFICATION ON IMBALANCED DATASETS USING SMOTE AND MACHINE LEARNING ALGORITHMS

Titik Misriati; Riska Aryanti

doi:10.33365/teknoinfo.v19i2.295

Titik Misriati, Universitas Bina Sarana Informatika
Riska Aryanti, Universitas Bina Sarana Informatika

Keywords: sleep disorder, classification, machine learning, SMOTE, imbalanced data

Abstract

Sleep disorders are an increasingly prevalent health problem that significantly affects individuals’ quality of life. Timely detection and accurate classification of these disorders are essential for proper diagnosis and effective clinical intervention. A major challenge in classifying sleep disorders, however, is imbalanced class distribution, where majority classes contain substantially more samples than minority classes. This imbalance often produces predictive models that favor the dominant class, thereby reducing overall classification accuracy. This study aims to improve sleep disorder classification on imbalanced datasets by applying the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data, and it evaluates how effectively several machine learning algorithms identify sleep disorders. The algorithms analyzed are Random Forest (RF), Neural Network (NN), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Logistic Regression (LR), each tested before and after applying SMOTE. Model performance was assessed using accuracy, precision, recall, and F1-score. The findings indicate that SMOTE consistently improves the performance of all tested models. Among them, the Neural Network combined with SMOTE achieved the highest performance, with an accuracy of 92.00%, precision of 91.88%, recall of 92.00%, and an F1-score of 91.91%. Additionally, the Random Forest model with SMOTE produced the highest F1-score at 93.18%, demonstrating strong performance stability. These results highlight the effectiveness of combining oversampling techniques such as SMOTE with machine learning models to address class imbalance, leading to more accurate and reliable classification. The study offers useful insights for developing AI-based medical decision support systems for sleep disorder diagnosis.
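The oversampling step described in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority-class neighbors. This is not the authors' implementation. The function names `smote` and `balance`, the toy feature values, and the class labels are hypothetical and chosen only for illustration, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would typically be used instead.

```python
import math
import random
from collections import Counter


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for offset, (label, count) in enumerate(sorted(counts.items())):
        if count == target:
            continue  # already as large as the majority class
        minority = [x for x, lab in zip(X, y) if lab == label]
        new_points = smote(minority, target - count, k=k, seed=seed + offset)
        X_out.extend(new_points)
        y_out.extend([label] * len(new_points))
    return X_out, y_out


# Toy, made-up data: 6 majority samples and two smaller classes (hypothetical labels).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0],
     [9.0, 1.0], [9.5, 1.5]]
y = ["none"] * 6 + ["insomnia"] * 3 + ["apnea"] * 2

X_bal, y_bal = balance(X, y, k=2)
print(sorted(Counter(y_bal).items()))
# → [('apnea', 6), ('insomnia', 6), ('none', 6)]
```

The original samples are kept unchanged and only synthetic minority points are appended. A common practice, assumed here rather than stated in the abstract, is to apply the oversampling to the training split only, so that the accuracy, precision, recall, and F1-score are computed on untouched test data.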
