Synthetic Data Pattern Simulation of Patient Care Journey Using K-Means Clustering

Arjon Samuel Sitio; Richard Parlindungan; Anita Sindar Sinaga

doi:10.33365/jatika.v6i4.1498

Arjon Samuel Sitio, Tjut Nyak Dhien University
Richard Parlindungan, Tjut Nyak Dhien University
Anita Sindar Sinaga, STMIK Pelita Nusantara

Keywords: Synthetic Data, Clusters, Centroid, Evaluation Metric, K-Means Clustering

Abstract

Heterogeneous synthetic data is artificial data that can combine many types of features, such as demographics, examinations, and therapies. The simulated data include complex patients (many procedures and medications) who nonetheless have a fast service process and few complications. All patients are divided into four clusters: Cluster 1 contains mild patients, Cluster 2 complex patients, Cluster 3 high-cost patients, and Cluster 4 patients at high risk of readmission. The highest silhouette score, 0.2187, is obtained when the number of clusters (k) is 2. The Davies-Bouldin index for the current clustering solution is 2.33, and the Calinski-Harabasz index for the solution with k=4 is 367.72. K-means produces groups without labels, so further analysis is needed to assign clinical meaning to each cluster.
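The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature names, the four group centers, the helper names, and the random seeds are all assumptions made up for illustration, the features are assumed to share a comparable scale, and the numbers it prints will not match the values reported above. It uses only the Python standard library and a plain Lloyd's-algorithm K-means.

```python
import math
import random


def make_synthetic_patients(n_per_group=30, seed=0):
    """Draw hypothetical (procedures, length of stay, cost) features around 4 made-up centers."""
    rng = random.Random(seed)
    centers = [(2, 2, 1), (8, 3, 4), (5, 9, 9), (3, 6, 3)]  # illustrative values only
    return [[rng.gauss(mu, 1.0) for mu in c] for c in centers for _ in range(n_per_group)]


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm with random initial centroids; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better, range [-1, 1]."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) < 2:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for other, c in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Average worst-case ratio of within-cluster spread to centroid separation; lower is better."""
    clusters = list(group_by_label(points, labels).values())
    cents = [mean_point(c) for c in clusters]
    spread = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    n = len(clusters)
    worst = [max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                 for j in range(n) if j != i) for i in range(n)]
    return sum(worst) / n


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    clusters = list(group_by_label(points, labels).values())
    overall = mean_point(points)
    between = within = 0.0
    for c in clusters:
        m = mean_point(c)
        between += len(c) * math.dist(m, overall) ** 2
        within += sum(math.dist(p, m) ** 2 for p in c)
    n, k = len(points), len(clusters)
    return (between / (k - 1)) / (within / (n - k))


points = make_synthetic_patients()
for k in (2, 3, 4, 5):
    labels = kmeans(points, k)
    print(k, round(silhouette(points, labels), 4),
          round(davies_bouldin(points, labels), 4),
          round(calinski_harabasz(points, labels), 2))
```

Comparing the three columns across several values of k shows why the metrics can disagree about the best number of clusters: the silhouette score and the Calinski-Harabasz index should be maximized, while the Davies-Bouldin index should be minimized. In practice a library implementation such as scikit-learn would normally replace these hand-written helpers.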

