Building a new hybrid machine learning model for improvement insurance cross-sell prediction

Authors

  • Doan Gia Bao Ngoc
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN
  • Luu Minh Quan
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN
  • Truong Thi Thanh Ha
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN
  • Nguyen Duc Minh Tan
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN
  • Phan Thi Minh Huyen
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN
  • Duy Thanh Tran
    University of Economics and Law, Ho Chi Minh City; Vietnam National University, Ho Chi Minh City, VN

DOI:

10.46223/HCMCOUJS.econ.en.16.1.4306.2026

Keywords:

Borderline-SMOTE; cross-sell prediction; decision tree; hybrid model; logistic regression; random forest; ROC-AUC; XGBoost

JEL Classification:

C53; E27; E37

Abstract

Amid rising competition in the insurance sector, optimizing cross-selling strategies is crucial for sustainable growth and requires a deep understanding of customer behavior. This study proposes a machine learning framework for cross-sell prediction that aims to enhance personalization, increase conversion rates, and maximize return on investment. The study uses 381,109 customer records from an insurance company; preprocessing includes treating outliers in Annual Premium, encoding categorical variables such as Gender and Vehicle Age, and standardizing numerical features such as Age, Annual Premium, and Vintage. Because the Response variable is imbalanced, with only 12.26 percent of customers responding positively, the Borderline Synthetic Minority Over-sampling Technique (Borderline-SMOTE) is applied to generate synthetic minority-class samples and improve predictive performance. Four machine learning models, namely Logistic Regression, Decision Tree, Random Forest, and XGBoost, are trained and evaluated using Accuracy, Area Under the Receiver Operating Characteristic Curve (ROC-AUC), Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. Among these, XGBoost with Borderline-SMOTE achieves the best overall performance, with an accuracy of 0.84 and a ROC-AUC of 0.8436, a substantial improvement over the baseline XGBoost model (ROC-AUC of 0.7768). Logistic Regression also benefits, with its ROC-AUC rising from 0.8250 to 0.8451. Visual analysis reveals behavioral patterns, such as a 25 percent purchase rate among customers whose vehicles are older than two years and a 20 percent rate among male customers with prior vehicle damage. The study delivers a high-performing predictive model to support targeted marketing, potentially increasing cross-sell conversion rates by 5 to 10 percent. Future work will explore deep learning techniques and larger datasets to further enhance prediction capabilities.
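The abstract names Borderline-SMOTE but does not say how it differs from plain SMOTE: instead of oversampling every minority record, it oversamples only minority records that sit near the class boundary. The sketch below illustrates that idea (the "borderline-1" variant) in plain Python. It is not the authors' code, and the source does not state which implementation or settings they used. The function name `borderline_smote` and its parameters `n_new`, `m`, `k`, and `seed` are hypothetical; the sketch assumes features are already standardized and uses Euclidean distance with a brute-force neighbour search, so it is only practical for small illustrative datasets, not 381,109 records.

```python
import math
import random


def borderline_smote(X, y, n_new, m=10, k=5, seed=0):
    """Illustrative Borderline-SMOTE-1 sketch for binary labels (1 = minority class).

    X      -- list of numeric feature tuples, assumed already standardized
    y      -- list of 0/1 labels, same length as X
    n_new  -- number of synthetic minority samples to generate (hypothetical parameter)
    m      -- neighbours used to label each minority point safe/danger/noise
    k      -- minority neighbours used for interpolation
    Returns a list of synthetic feature tuples, all belonging to class 1.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    if len(minority) < 2:
        return []  # nothing to interpolate between

    def nearest(i, pool, count):
        # Brute-force Euclidean search: indices in `pool` closest to X[i], excluding i.
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # Step 1: keep only "danger" minority points, i.e. those near the class border.
    danger = []
    for i in minority:
        neighbours = nearest(i, range(len(X)), m)
        n_majority = sum(1 for j in neighbours if y[j] == 0)
        # All-majority neighbourhood -> noise; fewer than half majority -> safe.
        if len(neighbours) / 2 <= n_majority < len(neighbours):
            danger.append(i)
    if not danger:
        return []

    # Step 2: interpolate between a danger point and one of its k nearest minority neighbours.
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(nearest(i, minority, k))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic
```

The returned rows would be appended, with label 1, to the training split only, so that evaluation metrics such as ROC-AUC are still computed on the real class distribution of the test split. For data at the scale described here, a library implementation with efficient neighbour search, such as `BorderlineSMOTE(kind="borderline-1")` from the imbalanced-learn package, would be the more realistic choice.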




Received: 13-04-2025
Accepted: 02-06-2025
Published: 07-09-2025


How to Cite

Doan, N. G. B., Luu, Q. M., Truong, H. T. T., Nguyen, T. D. M., Phan, H. T. M., & Tran, D. T. (2025). Building a new hybrid machine learning model for improvement insurance cross-sell prediction. HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ECONOMICS AND BUSINESS ADMINISTRATION, 16(1), 92–110. https://doi.org/10.46223/HCMCOUJS.econ.en.16.1.4306.2026