Predicting motor insurance claim incidence using generalized and tree-based models: A comparative statistical approach

This work is licensed under a Creative Commons Attribution 4.0 International License

Type of the article: Research Article

Abstract
Accurate prediction of motor insurance claim frequency is essential for efficient risk management, underwriting, and policy pricing. This study compares the predictive performance of Poisson Generalized Linear Models (GLMs), Decision Trees, and Generalized Additive Models (GAMs) on 108,699 motor third-party liability (TPL) insurance contracts from the French Motor TPL dataset in the CASdatasets R package, which is widely used in actuarial research. The models are assessed for predictive accuracy, explainability, and flexibility on training and test sets using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Poisson Deviance. Although the GLM offers an interpretable and accurate baseline, the GAM slightly outperforms both the GLM and the Decision Tree on every metric, with the lowest MSE (0.0506), RMSE (0.2251), and Poisson Deviance (36.41% training, 37.76% test), compared with the GLM (MSE: 0.0509, RMSE: 0.2257, Poisson Deviance: 36.83% training, 38.08% test) and the Decision Tree (MSE: 0.0582, RMSE: 0.2413, Poisson Deviance: 37.12% training, 38.31% test). Based on MSE, the GAM reduces prediction error by approximately 0.6% relative to the GLM and 13.1% relative to the Decision Tree. These findings indicate that GAMs strike a favorable balance between model explainability and predictive flexibility, making them well suited to insurers who want to refine risk segmentation without compromising regulatory compliance or business transparency. The study adds to the research calling for interpretable, state-of-the-art statistical techniques in insurance analytics and offers practical observations for actuaries and data scientists seeking to refine motor insurance frequency modeling frameworks.
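
The three metrics named in the abstract, and the relative error reductions it quotes, can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the study, which works in R. The Poisson deviance shown is the standard mean unit deviance; how the study scales its deviance figures into percentages is not stated in the abstract, so that step is not reproduced here.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted claim counts."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: average of 2 * (y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken as 0 when y == 0.
    Predictions mu must be strictly positive.
    """
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(y_true)

def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` error is lower than `baseline` error."""
    return 100.0 * (baseline - candidate) / baseline

# Relative MSE reductions recomputed from the values quoted in the abstract.
gam_vs_glm = relative_reduction(0.0509, 0.0506)
gam_vs_tree = relative_reduction(0.0582, 0.0506)
print(round(gam_vs_glm, 1), round(gam_vs_tree, 1))
```

Applied to the MSE values in the abstract, `relative_reduction` gives roughly 0.6% for GAM against GLM and 13.1% for GAM against the Decision Tree, matching the reported figures.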

    • Figure 1. Claim frequency by vehicle age and Bonus-Malus level
    • Figure 2. Claim frequency by driver age and Bonus-Malus level
    • Figure 3. Decision Tree for claim frequency
    • Figure 4. GAM smooth functions for vehicle age and driver age
    • Figure 5. GAM smooth functions for Bonus-Malus by driver age group
    • Table 1. Summary of dataset variables and their measurement scales
    • Table 2. Summary of descriptive statistics for key variables
    • Table 3. Decision Tree splits for claim frequency, with node sample sizes, deviances, and mean claim frequencies
    • Table 4. Poisson GLM regression results: estimated coefficients for claim frequency
    • Table 5. Parametric coefficient estimates from the GAM for claim frequency
    • Table 6. Approximate significance of smooth terms in the GAM
    • Table 7. Model performance comparison for claim frequency prediction
    • Conceptualization
      Eslam Abdelhakim Seyam
    • Data curation
      Eslam Abdelhakim Seyam
    • Formal Analysis
      Eslam Abdelhakim Seyam
    • Funding acquisition
      Eslam Abdelhakim Seyam
    • Investigation
      Eslam Abdelhakim Seyam
    • Methodology
      Eslam Abdelhakim Seyam
    • Project administration
      Eslam Abdelhakim Seyam
    • Resources
      Eslam Abdelhakim Seyam
    • Software
      Eslam Abdelhakim Seyam
    • Supervision
      Eslam Abdelhakim Seyam
    • Validation
      Eslam Abdelhakim Seyam
    • Visualization
      Eslam Abdelhakim Seyam
    • Writing – original draft
      Eslam Abdelhakim Seyam
    • Writing – review & editing
      Eslam Abdelhakim Seyam