Comparison of Prediction Models: Decision Tree, Random Forest, and Support Vector Regression

  • Kurnia Ramadhan Putra, Institut Teknologi Nasional Bandung
Keywords: IT Salary Prediction; Machine Learning Models; Random Forest Regression; Predictive Analytics; Demographic and Professional Data

Abstract

The Information Technology (IT) industry continues to grow rapidly, making it difficult to determine fair and competitive salaries for professionals. Accurate salary predictions help companies attract and retain talent and give individuals a basis for career planning. This research compares the performance of three machine learning models, namely Decision Tree Regression, Random Forest Regression, and Support Vector Regression, in predicting IT sector salaries from demographic and professional data: age, gender, education level, job position, and work experience. The study uses a dataset of 6,704 entries from Kaggle, with relationships between variables analyzed using Pearson correlation and ANOVA. Model performance was evaluated using the R² score, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Among the models, Random Forest Regression performed best, achieving the highest R² of 91.49% and an RMSE of 0.058, which indicates high predictive accuracy with low error. Scatter plot visualizations confirm a strong correlation between actual and predicted salaries, and error analysis identifies only minimal cases of overestimation and underestimation. The research concludes that Random Forest Regression is the most effective of the three models for IT salary prediction. These findings provide practical insights for organizations and individuals and highlight the potential of data-driven approaches to salary determination. Future studies may focus on hyperparameter optimization and on incorporating additional features to further improve model performance and generalizability.
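As a minimal illustration of the three evaluation metrics named in the abstract, the sketch below computes MAE, RMSE, and R² with the Python standard library only. The function names and the `actual` and `predicted` values are hypothetical and are not taken from the study's dataset or code; the values are placed on a 0 to 1 scale only as an assumption for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical scaled salaries (assumption: not values from the paper).
actual = [0.20, 0.35, 0.50, 0.65, 0.80]
predicted = [0.25, 0.30, 0.55, 0.60, 0.85]

print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"R2:   {r2_score(actual, predicted):.3f}")
```

An R² reported as a percentage, such as 91.49%, corresponds to a value of 0.9149 on the scale returned by `r2_score` here.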

Published
2025-03-15
How to Cite
Putra, K. R. (2025). Comparison of Prediction Models: Decision Tree, Random Forest, and Support Vector Regression. Jurnal Informatika Dan Rekayasa Perangkat Lunak, 6(1), 39-49. https://doi.org/10.33365/jatika.v6i1.18