Feature selection methods and sampling techniques to financial distress prediction for Vietnamese listed companies

  • Received December 21, 2018;
    Accepted March 12, 2019;
    Published March 25, 2019
  • DOI
    http://dx.doi.org/10.21511/imfi.16(1).2019.22
  • Article Info
    Volume 16 2019, Issue #1, pp. 276-290
  • Cited by
    7 articles

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License

This study examines how variable selection approaches and sampling techniques jointly affect the performance of models that predict financial distress for companies whose stocks are traded on Vietnam’s securities exchanges. A firm is considered financially distressed when its stock is delisted at the requirement of the Vietnam Stock Exchange because the firm has made a loss in 3 consecutive years or has accumulated losses greater than its equity. Twelve models are constructed, differing in feature selection method, sampling technique, and classifier. The feature selection methods are factor analysis and F-score selection; 3 data samples are drawn by the choice-based method, each with a different percentage of financially distressed firms. The classifiers are logistic regression and the support vector machine (SVM). Data are collected from listed firms in Vietnam from 2009 to 2017, covering 1, 2, and 3 years before the announcement of the delisting requirement. The results show that the SVM model with F-score selection outperforms the other models on the data sample containing the highest percentage of non-financially distressed firms.
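To make the two preprocessing ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the F-score is the commonly used two-class form (squared distance of each class mean from the overall mean, divided by the sum of the two within-class sample variances); the paper may use a different variant. It also assumes that choice-based sampling keeps every distressed firm and draws non-distressed firms until a target share is reached. The function names, the ratio names, and all numbers are hypothetical and serve only as illustration.

```python
import random
from statistics import mean, variance


def choice_based_sample(distressed, healthy, distressed_share, seed=0):
    """Keep all distressed firms and draw healthy firms at random so that
    distressed firms make up about `distressed_share` of the sample.
    (Assumed sampling scheme, for illustration only.)"""
    if not 0 < distressed_share < 1:
        raise ValueError("distressed_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    n_healthy = round(len(distressed) * (1 - distressed_share) / distressed_share)
    n_healthy = min(n_healthy, len(healthy))  # cannot draw more than exist
    return list(distressed) + rng.sample(list(healthy), n_healthy)


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed variant).
    `pos` holds the feature values of distressed firms, `neg` those of
    non-distressed firms. Larger values mean better class separation."""
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within


def rank_features(features):
    """Sort feature names by descending F-score.
    `features` maps a name to a (pos_values, neg_values) pair."""
    scores = {name: f_score(pos, neg) for name, (pos, neg) in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up ratios: (distressed values, non-distressed values).
toy_features = {
    "hypothetical_ratio_a": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "hypothetical_ratio_b": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
}
ranking = rank_features(toy_features)

# 10 distressed firms at a 25% share imply 30 non-distressed firms.
sample = choice_based_sample(range(10), range(100, 200), distressed_share=0.25)
```

In a pipeline of this kind, the top-ranked features would then be passed to a classifier such as an SVM, whose C and gamma parameters are tuned separately.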

    • Table 1. Description of the models
    • Table 2. Stepwise selection results for logistic regression models – variable set 1
    • Table 3. Stepwise selection results for logistic regression models – variable set 2
    • Table 4. Features selected in SVM models – variable set 1
    • Table 5. Features selected in SVM models – variable set 2
    • Table 6. Logistic regression classification accuracy (%)
    • Table 7. Type I errors of logistic regression models (%)
    • Table 8. Summary of classification – SVM models (%)
    • Table 9. Type I error of SVM models (%)
    • Table A1. List of independent variables – variable set 1
    • Table A2. List of independent variables – variable set 2
    • Table A3. Results of factor analysis – variable set 1
    • Table A4. Results of factor analysis – variable set 2
    • Table A5. Model’s overall significance – variable set 1
    • Table A6. Model’s overall significance – variable set 2
    • Table A7. Summary of C and gamma