Synthetic Data Pattern Simulation of Patient Care Journey Using K-Means Clustering
Abstract
Heterogeneous synthetic data is artificially generated data that combines many types of features, such as demographics, examinations, and therapies. One observed pattern is complex patients (many procedures and medications) who nevertheless have a fast service process and few complications. All patients are divided into four clusters: Cluster 1 contains mild patients, Cluster 2 complex patients, Cluster 3 high-cost patients, and Cluster 4 patients at high risk of readmission. The highest silhouette score, 0.2187, is obtained when the number of clusters is k = 2. The Davies-Bouldin index for the current clustering solution is 2.33, and the Calinski-Harabasz index for the solution with k = 4 is 367.72. The clustering itself only produces unlabeled groups, so further analysis is needed to assign clinical meaning to each cluster.
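The evaluation summarised in the abstract (running K-means for several values of k and scoring each partition with the silhouette score, the Davies-Bouldin index and the Calinski-Harabasz index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the feature set (age, sex, procedures, medications, length of stay, cost), the random distributions used to synthesise patients, plain Lloyd's K-means with a single random initialisation, and all helper names are hypothetical, and the script will not reproduce the figures reported above. As a reading aid, a higher silhouette score (range -1 to 1) and a higher Calinski-Harabasz value indicate better-separated clusters, while a lower Davies-Bouldin value is better.

```python
import numpy as np

def make_synthetic_patients(n=400, seed=0):
    """Hypothetical synthetic patients. Columns: age, sex (0/1), number of
    procedures, number of medications, length of stay (days), cost."""
    rng = np.random.default_rng(seed)
    age = rng.normal(55, 15, n).clip(18, 95)
    sex = rng.integers(0, 2, n)
    procedures = rng.poisson(3, n)
    medications = rng.poisson(5, n)
    stay = rng.gamma(2.0, 2.0, n)
    # Illustrative cost model (an assumption): grows with procedures and stay, plus noise.
    cost = 500 * procedures + 80 * stay + rng.normal(0, 300, n)
    return np.column_stack([age, sex, procedures, medications, stay, cost]).astype(float)

def standardize(X):
    """Z-score each column so features on different scales weigh equally."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / std

def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means with one random initialisation (no k-means++)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers

def silhouette(X, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a singleton cluster contributes 0 (same convention as scikit-learn)
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())

def davies_bouldin(X, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio; lower is better."""
    clusters = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # s_i: average distance of cluster members to their own centroid.
    scatter = np.array([np.linalg.norm(X[labels == c] - cents[j], axis=1).mean()
                        for j, c in enumerate(clusters)])
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))

def calinski_harabasz(X, labels):
    """Between- over within-cluster dispersion, scaled by (n - k) / (k - 1); higher is better."""
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    overall = X.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        between += len(Xc) * ((mc - overall) ** 2).sum()
        within += ((Xc - mc) ** 2).sum()
    return float(between * (n - k) / (within * (k - 1)))

def evaluate_k_range(X, ks=range(2, 7)):
    """Cluster X for each k and return {k: (silhouette, Davies-Bouldin, Calinski-Harabasz)}."""
    results = {}
    for k in ks:
        labels, _ = kmeans(X, k)
        results[k] = (silhouette(X, labels), davies_bouldin(X, labels), calinski_harabasz(X, labels))
    return results

if __name__ == "__main__":
    X = standardize(make_synthetic_patients())
    for k, (sil, dbi, ch) in evaluate_k_range(X).items():
        print(f"k={k}: silhouette={sil:.4f}  DBI={dbi:.2f}  CH={ch:.2f}")
```

The three indices can disagree about the best k, which is why a silhouette optimum at one k and a reported solution at another can coexist. In practice, scikit-learn's `KMeans` together with `silhouette_score`, `davies_bouldin_score` and `calinski_harabasz_score` from `sklearn.metrics` would replace the hand-written helpers; they are spelled out here only so the definitions of the indices are visible.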
Copyright (c) 2025 Arjon Samuel Sitio, Richard Parlindungan, Anita Sindar Sinaga

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


