“Multiple-step value-at-risk forecasts based on volatility-filtered MIDAS quantile regression: Evidence from major investment assets”

Forecasting multiple-step value-at-risk (VaR) consistently across asset classes is hindered by the limited sample size of low-frequency returns and by potential model misspecification when identical return distributions are assumed over different holding periods. This paper therefore investigates the predictive power for multi-step VaR of a framework that models the volatility component and the error term of the return distribution separately. The proposed model is illustrated with ten asset return series, including global stock markets, commodity futures, and currency exchange rates. The estimation results confirm that the volatility-filtered residuals exhibit tail dynamics distinct from those of the return series: volatility-filtered residuals may have either negative or positive tail dependence, unlike the unanimous negative tail dependence in the return series. Compared with several alternative approaches, both formal and informal tests show that the proposed specification performs as well as, if not better than, its top competitors at the 2.5% and 5% risk levels in terms of accuracy and validity. The proposed model also generates more consistent VaR forecasts than the MIDAS-Q model under both the 5-step and 10-step setups.


INTRODUCTION
Value-at-risk (VaR) forecasting plays an important role in financial risk management. However, the extant literature on VaR forecasts mainly focuses on single-period risk prediction and lacks a sound way of extrapolating single-period forecasts to multiple-step-ahead forecasts. For example, while regulators often require banks to measure their risk with a 10-day VaR at the 2.5% risk level, industry practice is to convert the 1-day VaR directly into a 10-day VaR using a rule of thumb such as the square-root-of-time rule. Such a calculation ignores the fact that the distributional characteristics of 10-day returns might differ from those of daily returns. Two categories of methods are commonly employed to predict multi-step-ahead VaR. The first category converts daily return data directly into low-frequency data. This conversion not only causes a potential loss of high-frequency information but also greatly reduces the sample size, which is especially problematic for model validation. The second category involves two methods that transform the estimation of daily volatility into volatility estimation over multiple periods. The first method aggregates one-step volatility forecasts over multiple periods into the volatility of lower-frequency returns with an iterating method. This method ignores the fact that the prediction error increases with the forecasting horizon.
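The two shortcuts criticized above can be made concrete with a little arithmetic. The sketch below, which assumes i.i.d. zero-mean daily returns for the square-root-of-time rule, scales a hypothetical 1-day VaR of 2% to 10 days and counts how many non-overlapping 10-day returns remain when the 6,640 daily observations used in this paper are converted directly to low-frequency data. The function names are illustrative only.

```python
import math


def sqrt_time_var(var_1day: float, horizon: int) -> float:
    """Scale a 1-day VaR to an h-day VaR with the square-root-of-time rule.

    The rule is exact only if daily returns are i.i.d. with zero mean, which is
    the assumption questioned in the text for longer holding periods.
    """
    return var_1day * math.sqrt(horizon)


def non_overlapping_count(n_daily: int, horizon: int) -> int:
    """Number of non-overlapping h-day returns left for the direct method."""
    return n_daily // horizon


# Hypothetical 1-day VaR of 2% of portfolio value.
print(round(sqrt_time_var(0.02, 10), 4))  # 0.0632
print(non_overlapping_count(6640, 10))    # 664
```

Under the direct method, the sample available for estimation and backtesting thus shrinks tenfold, from 6,640 to 664 observations.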

LITERATURE REVIEW
This study is closely related to three main strands of the VaR forecasting literature, namely one-step VaR prediction, multi-step VaR measurement, and mixed data sampling (MIDAS). This paper attempts to integrate them into a unified, practical approach.
First, one-step VaR forecasting models fall into three categories: non-parametric, parametric, and semi-parametric. The non-parametric approach relies on historical data to simulate VaR forecasts without imposing a distributional hypothesis on the return series. Filtered historical simulation (Barone-Adesi et al., 1999, 2002) and Monte Carlo simulation are the most commonly used methods for obtaining VaR predictions. While the non-parametric approach is easy to understand, it has been criticized for several reasons. For example, its VaR prediction cannot exceed the maximum loss in the historical returns; the assumption that returns are independently distributed cannot address model misspecification; and its slow adjustment cannot capture the dynamic characteristics of the return series. Parametric approaches make assumptions about the return distribution and attempt to capture the fat tails, volatility clustering, and negative skewness of returns documented by Fama (1965) and Bollerslev et al. (1992). In industry practice, the RiskMetrics model uses the exponentially weighted moving average (EWMA) model to measure the dynamics of conditional volatility. Engle (1982) and Bollerslev (1986) propose autoregressive conditional heteroscedasticity (ARCH) and generalized ARCH (GARCH) models, and these models have been further extended by Braun et al. (1995), Baillie et al. (1996), Ding et al. (1993), Giot …
Second, multiple-step VaR is measured by extending a one-step-ahead model, either by naively assuming that the return distributions over different time horizons are identical or by adopting a more complicated multi-stage setup. Essentially, there are three common methods for multiple-step VaR prediction: the direct method, the iterating method, and the square-root rule for extrapolating VaR. The first two convert daily data into low-frequency data. Marcellino et al. (2006) compare the direct and iterating methods for multi-step forecasts of 170 monthly U.S. macroeconomic time series and find that the iterating method tends to produce more accurate predictions. So and Wong (2012) derive an exact analytical relationship between multi-period and single-period conditional variance and conclude that their method outperforms several benchmarks for a 10-day holding period on seven national market indexes. Lönnbark (2016) estimates multiple-step volatility via a Gram-Charlier expansion of the conditional density function and shows its superiority over traditional prediction methods for 10-day VaR. Degiannakis and Potamia (2017) test the accuracy of inter-day and intra-day volatility models in estimating 10- and 20-day VaR; their empirical results show that the best performer depends on the risk level.
Finally, more recent works highlight the importance of MIDAS (Ghysels et al., 2004) in computing VaR. Ghysels and Valkanov (2012) indicate that the lag-weighting function in MIDAS can improve on the down-conversion method by taking advantage of its highly parsimonious parameterization and flexible lag-order selection. Ghysels et al. (2006) were the first to apply MIDAS to high-frequency intra-day returns in estimating daily volatility. Subsequently, Engle et al. (2013) propose the GARCH-MIDAS model, which combines the GARCH and MIDAS approaches, and use it to decompose daily volatility into long-term and short-term components. Ghysels (2014) proposes the MIDAS quantile regression model to estimate the quantiles of monthly and quarterly return series using daily information; the estimated quantiles for the upper and lower tails are then used to calculate the conditional skewness of developed and developing countries. In sum, this paper proposes to capture the conditional tails of the volatility-filtered error term with the MIDAS quantile regression model, thereby separating second-moment risk modeling from third-moment risk modeling. The aim is to model the different characteristics of the conditional distribution more flexibly.
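The parsimony of MIDAS comes from a weighting function with only a couple of parameters. As an illustration, the sketch below assumes the normalized Beta lag polynomial, one common MIDAS choice; the exact weighting function and lag grid used in this paper are not shown in this section, and the function name is hypothetical. With the first shape parameter fixed at 1, a second parameter above 1 gives monotonically decreasing weights, and larger values make the weights decay faster.

```python
import numpy as np


def beta_lag_weights(n_lags: int, theta1: float, theta2: float) -> np.ndarray:
    """Normalized Beta lag polynomial, a common MIDAS weighting function.

    Lag j = 1..n_lags is mapped to x_j in (0, 1) and weighted by
    x_j**(theta1 - 1) * (1 - x_j)**(theta2 - 1); weights are rescaled to sum to 1.
    """
    j = np.arange(1, n_lags + 1)
    x = j / (n_lags + 1)  # interior grid avoids the endpoints 0 and 1
    raw = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return raw / raw.sum()


# Restricted case theta1 = 1: weights decay monotonically when theta2 > 1,
# and decay faster as theta2 grows.
slow = beta_lag_weights(20, 1.0, 2.0)
fast = beta_lag_weights(20, 1.0, 10.0)
print(slow[:3].round(3), fast[:3].round(3))
```

Setting both parameters to 1 recovers equal weights, so the flat-weight aggregation of the direct method is nested as a special case.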

METHODOLOGY
This section describes the proposed forecasting model for multi-step VaR, followed by the description of the formal and informal tests for model validity.

Volatility-filtered MIDAS quantile regression
The volatility-filtered MIDAS quantile regression model (denoted F-MIDAS-Q hereafter) is constructed in two steps. First, the GARCH(1,1)-t model in (1) is fitted to the return series.
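$$r_t = \mu + \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \omega + a\,(r_{t-1}-\mu)^2 + b\,\sigma_{t-1}^2, \qquad \varepsilon_t \sim t_v \qquad (1)$$
where $\sigma_t$ is the conditional volatility of the daily return $r_t$, $\mu$ is the mean of $r_t$, $\omega$, $a$, and $b$ are the GARCH(1,1) parameters, and $\varepsilon_t$ is an error term following a Student-t distribution with $v$ degrees of freedom. In this first step, the multi-step volatility is estimated with both the iterating method and the square-root-of-time rule, and the estimate is denoted $\hat{\sigma}^2_{t,h}$ for an $h$-step holding period. The estimate of $\varepsilon_t$, $e_t = (r_t - \hat{\mu})/\hat{\sigma}_t$, is the series of volatility-filtered residuals taken from the first-step estimation.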
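As an illustration of this first step, the sketch below runs the GARCH(1,1) recursion in (1) for given parameter values, returns the filtered residuals $e_t$, and compares the iterating and square-root-of-time multi-step variance estimates. Parameter estimation by Student-t maximum likelihood is omitted; the parameter values, simulated data, and function names are hypothetical.

```python
import numpy as np


def garch11_filter(r, mu, omega, a, b):
    """GARCH(1,1) variance recursion for given parameters.

    Returns (sigma2, e, next_var): sigma2[t] is the conditional variance of r[t],
    e[t] = (r[t] - mu) / sigma[t] are the volatility-filtered residuals, and
    next_var is the one-step-ahead variance forecast.
    """
    r = np.asarray(r, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = omega / (1.0 - a - b)  # initialize at the unconditional variance
    for t in range(1, len(r)):
        sigma2[t] = omega + a * (r[t - 1] - mu) ** 2 + b * sigma2[t - 1]
    e = (r - mu) / np.sqrt(sigma2)
    next_var = omega + a * (r[-1] - mu) ** 2 + b * sigma2[-1]
    return sigma2, e, next_var


def multistep_variance(next_var, omega, a, b, h, method="iterate"):
    """Conditional variance of the h-step (summed) return.

    "sqrt": square-root-of-time rule, h * sigma^2_{t+1}.
    "iterate": sum of the k-step-ahead GARCH variance forecasts, which revert
    to the long-run variance at rate (a + b); assumes serially uncorrelated returns.
    """
    if method == "sqrt":
        return h * next_var
    long_run = omega / (1.0 - a - b)
    k = np.arange(h)  # k = 0 is the one-step-ahead forecast itself
    return float(np.sum(long_run + (a + b) ** k * (next_var - long_run)))


# Hypothetical parameters; in practice they come from Student-t maximum likelihood.
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=5, size=1500)
sigma2, e, nv = garch11_filter(r, mu=0.0, omega=1e-6, a=0.08, b=0.90)
print(multistep_variance(nv, 1e-6, 0.08, 0.90, 10, "iterate"),
      multistep_variance(nv, 1e-6, 0.08, 0.90, 10, "sqrt"))
```

When current volatility is above its long-run level, the iterated forecast is smaller than the square-root-of-time forecast because it accounts for mean reversion in variance.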
Then, the multi-step MIDAS-Q framework of Ghysels (2014), as defined in (2), is fitted to $e_t$ to capture the tail dynamics of the error term, allocating weights among lagged daily information to predict error quantiles at a lower frequency, say weekly. The multi-step VaR forecast is then obtained by scaling the predicted error quantile by the multi-step volatility forecast obtained in the first step.
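Because equation (2) is not reproduced here, the following is only a sketch under explicit assumptions: the conditional α-quantile of the h-step volatility-filtered residual is modeled as an intercept plus a slope times a Beta-weighted sum of lagged daily residuals, the parameters are estimated by minimizing the quantile (pinball) loss, and the VaR forecast is the predicted quantile scaled by the h-step volatility forecast from the first step. The function names, the Nelder-Mead optimizer, and the simulated data are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize


def pinball_loss(u, alpha):
    """Mean check loss: alpha * u if u >= 0, (alpha - 1) * u if u < 0."""
    u = np.asarray(u, dtype=float)
    return float(np.mean(u * (alpha - (u < 0).astype(float))))


def beta_weights(n_lags, theta2):
    """Restricted Beta lag weights (first shape parameter fixed at 1)."""
    x = np.arange(1, n_lags + 1) / (n_lags + 1)
    w = (1.0 - x) ** (theta2 - 1.0)
    return w / w.sum()


def midas_quantile(lags, g0, g1, theta2):
    """q_t = g0 + g1 * sum_d w_d(theta2) * lags[t, d]."""
    return g0 + g1 * (lags @ beta_weights(lags.shape[1], theta2))


def fit_midas_quantile(y, lags, alpha):
    """Estimate (g0, g1, theta2) by minimizing the pinball loss.

    y    : (T,) low-frequency target, e.g. h-step volatility-filtered residuals
    lags : (T, D) daily regressor; column d holds the value lagged d + 1 days
    """
    def objective(p):
        # theta2 = exp(p[2]) keeps the shape parameter positive.
        return pinball_loss(y - midas_quantile(lags, p[0], p[1], np.exp(p[2])), alpha)

    start = np.array([np.quantile(y, alpha), 0.0, 0.0])  # flat weights at start
    res = minimize(objective, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
    g0, g1, log_theta2 = res.x
    return g0, g1, float(np.exp(log_theta2))


def var_forecast(sigma_h, q_alpha):
    """Scale the predicted error quantile by the h-step volatility forecast."""
    return sigma_h * q_alpha


# Simulated example with hypothetical lagged daily residuals.
rng = np.random.default_rng(1)
lags = rng.normal(size=(600, 10))
y = 3.0 * lags @ beta_weights(10, 3.0) + rng.normal(size=600)
g0, g1, theta2 = fit_midas_quantile(y, lags, alpha=0.05)
print(g0, g1, theta2, var_forecast(0.03, -1.8))
```

A derivative-free optimizer is used here only because the pinball loss is not differentiable everywhere; linear-programming quantile regression solvers do not apply directly since the Beta weights make the model nonlinear in its parameters.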

Backtesting methods and competing models
The proposed model is compared with a range of classical and popular VaR forecasting models, namely (1) the semiparametric MIDAS-Q; (2) the direct method with non-parametric historical simulation (DR-HS); (3) the square-root-of-time method with GARCH(1,1)-t (SR-GARCH-t); (4) the iterating method with GARCH(1,1)-t (IT-GARCH-t); and (5) the iterating method with RiskMetrics (IT-EWMA). Each model is used to forecast 5- and 10-step-ahead VaR at the 1%, 2.5%, and 5% risk levels. The 1% risk level is a popular choice for daily VaR measurement. However, Kinateder (2016) finds that capital requirements based on VaR at the 1% risk level might overestimate market risk, resulting in redundant capital requirements. The Basel III Accord also recommends that banks use the 2.5% or 5% risk level to predict the VaR over a longer holding period.
The out-of-sample performance of all these models is evaluated from both the accuracy and validity perspectives. The accuracy of prediction is measured by comparing the proportion of violations, i.e., cases in which the actual return falls below the estimated VaR, with the nominal risk level. Specifically, Equation (4) … the Kupiec (1995) proportion-of-failures likelihood-ratio test (POF). The independence test examines whether violations are independent of one another; this paper employs the Christoffersen (1998) interval forecast method (CC) and the time-between-failures test (TBF). The hybrid test combines the Kupiec likelihood ratio with the CC or TBF independence statistic to construct a test referred to as the CCI or TBFI test. In total, seven tests are applied to the forecasts. Moreover, model validity is reflected in the degree of excess losses measured by the loss function (Lopez, 1996; Caporin, 2008; Abad, 2014) defined by Equation (5).
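The coverage and independence tests named above can be sketched from a 0/1 violation series, where $I_t = 1$ when the realized h-step return falls below the VaR forecast. The sketch implements the textbook Kupiec POF statistic, the Christoffersen first-order Markov independence statistic, and their sum as a joint test; the TBF-type tests are not shown, and how these statistics map onto the paper's CC/CCI labels is an assumption.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def kupiec_pof(hits, alpha):
    """Kupiec (1995) proportion-of-failures LR test; returns (LR, p-value), chi2(1)."""
    hits = np.asarray(hits, dtype=int)
    n, x = hits.size, int(hits.sum())
    p_hat = x / n
    # xlogy(k, p) = k * log(p), defined as 0 when k == 0.
    ll_null = xlogy(n - x, 1.0 - alpha) + xlogy(x, alpha)
    ll_alt = xlogy(n - x, 1.0 - p_hat) + xlogy(x, p_hat)
    lr = -2.0 * (ll_null - ll_alt)
    return lr, chi2.sf(lr, df=1)


def christoffersen_ind(hits):
    """Christoffersen (1998) first-order Markov independence LR test, chi2(1)."""
    h = np.asarray(hits, dtype=int)
    prev, curr = h[:-1], h[1:]
    n00 = int(np.sum((prev == 0) & (curr == 0)))
    n01 = int(np.sum((prev == 0) & (curr == 1)))
    n10 = int(np.sum((prev == 1) & (curr == 0)))
    n11 = int(np.sum((prev == 1) & (curr == 1)))
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = xlogy(n00 + n10, 1.0 - pi) + xlogy(n01 + n11, pi)
    ll_alt = (xlogy(n00, 1.0 - pi01) + xlogy(n01, pi01)
              + xlogy(n10, 1.0 - pi11) + xlogy(n11, pi11))
    lr = -2.0 * (ll_null - ll_alt)
    return lr, chi2.sf(lr, df=1)


def conditional_coverage(hits, alpha):
    """Joint test LR_POF + LR_IND, compared with chi2(2)."""
    lr = kupiec_pof(hits, alpha)[0] + christoffersen_ind(hits)[0]
    return lr, chi2.sf(lr, df=2)
```

For example, 10 violations in a row at the end of 100 forecasts would pass a coverage test at the 10% level yet fail the independence test decisively, which is the clustering the independence tests are designed to detect.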

Data
The data are sourced from the Wind database and consist of five representative stock market indexes, namely the U.S. S&P500 and NASDAQ indexes, the Hong Kong Hang Seng Index (HSI), the U.K. FTSE100 index, and the Japanese Nikkei N225 index; two commodity futures, namely the ICE Brent crude oil contract (OIL) traded in London and the COMEX gold futures contract (GOLD) traded in New York; and three currency exchange rates, namely CAD/USD, EUR/USD, and USD/GBP. The period spans from February 1, 1991, to March 30, 2018, with 6,640 daily observations for each asset. Daily logarithmic returns based on closing prices are adopted. Consistent with the Basel III Accord, under which VaR as a capital reserve figure is calculated over a 10-day holding period, this paper considers 5- and 10-step-ahead VaR prediction. Multiple-step returns are obtained by summing the daily (single-period) returns. Table A1 in Appendix A summarizes the descriptive statistics of the daily, 5-step, and 10-step returns of the ten asset series. Most assets have close-to-zero average daily returns, while over a longer holding period some assets earn small positive average returns. Return fluctuations also increase with the holding period; currency exchange rates are the least volatile series, while the oil futures return is the most volatile. As for higher moments, a majority of the asset return series have slightly negative skewness and may become more negatively skewed as the holding period increases. For example, the S&P500 return series is less volatile than OIL but carries greater higher-moment risk: the S&P500 has a daily return skewness of -0.403 but a 5-day return skewness as low as -1.811. On the other hand, GOLD and CAD/USD have positive return skewness at all three frequencies.
Risk-averse investors may prefer positively skewed assets but demand extra risk premia for the higher-moment risk of negatively skewed assets (Chen et al., 2001; Patton, 2004). It is worth noting that, as the holding period increases, the deviation of kurtosis from that of the normal distribution weakens while skewness tends to become more negative. This observation suggests that modeling third-moment dynamics is important for capturing tail risk in the asset return distribution.
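The aggregation and moment comparison behind Table A1 can be sketched as follows, assuming non-overlapping blocks (whether Table A1 uses overlapping or non-overlapping h-step returns is not stated here). Because log returns are additive, summing h daily log returns gives the h-step log return. The function names are illustrative.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def aggregate_log_returns(daily, h):
    """Sum non-overlapping blocks of h daily log returns (a trailing remainder is dropped)."""
    daily = np.asarray(daily, dtype=float)
    n_blocks = len(daily) // h
    return daily[: n_blocks * h].reshape(n_blocks, h).sum(axis=1)


def moment_table(daily, horizons=(1, 5, 10)):
    """Mean, std, skewness and (non-excess) kurtosis of h-step returns for each h."""
    table = {}
    for h in horizons:
        x = aggregate_log_returns(daily, h)
        table[h] = {
            "mean": x.mean(),
            "std": x.std(ddof=1),
            "skew": skew(x),
            "kurt": kurtosis(x, fisher=False),  # equals 3 for a normal distribution
        }
    return table
```

Applied to a fat-tailed daily series, the kurtosis column typically moves toward 3 as h grows, which is the weakening of excess kurtosis noted above.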
The Jarque-Bera statistics are also computed to test normality. As the p-values of the Jarque-Bera test fall far below 0.001 for all series, it can be concluded that none of the asset returns is normally distributed. All return series are found to be stationary, since the p-values of the ADF unit root test are far below 0.001. To justify the GARCH volatility model, daily returns are subjected to ARCH tests. With a lag order of 5, the p-values of the ARCH tests are far below 0.001, so the null hypothesis that squared returns are not autocorrelated is rejected. This is also confirmed by examining the autocorrelation and partial autocorrelation functions of the squared residuals.
Note: For crude oil and gold futures, the settlement price (i.e., the weighted average of transaction prices in the last hour of a trading day) is used instead of the closing price to calculate returns, because their closing prices reflect only the price of the last transaction in a trading day and hence can lead to unexpected fluctuations.
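For reference, the ARCH test with five lags can be sketched as Engle's Lagrange multiplier test: regress squared demeaned returns on their own lags and compare $T \cdot R^2$ with a $\chi^2$ distribution. The software used in the paper is not stated; this numpy version and the simulated ARCH(1) series are illustrative.

```python
import numpy as np
from scipy.stats import chi2


def arch_lm_test(returns, lags=5):
    """Engle's ARCH LM test: regress squared demeaned returns on their own lags.

    LM = T * R^2 from the auxiliary regression, asymptotically chi2(lags)
    under the null of no ARCH effects (no autocorrelation in squared returns).
    """
    r = np.asarray(returns, dtype=float)
    e2 = (r - r.mean()) ** 2
    y = e2[lags:]
    # Column k holds e2 lagged k periods, aligned with y.
    X = np.column_stack([np.ones(len(y))] +
                        [e2[lags - k:-k] for k in range(1, lags + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    lm = len(y) * r2
    return lm, chi2.sf(lm, df=lags)


# Simulated ARCH(1) returns: volatility clustering the test should detect.
rng = np.random.default_rng(0)
n = 3000
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(0.2 + 0.5 * r[t - 1] ** 2) * rng.standard_normal()
print(arch_lm_test(r, lags=5))
```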

Estimation results of the F-MIDAS-Q model
The estimation results of the F-MIDAS-Q model using the whole sample are presented in two parts. … The estimates are greater than 1 for most assets, indicating monotonically decreasing weights on lagged information: the larger the value, the faster the weights decay to 0. For example, the error quantiles of GOLD at all three levels tend to depend on the most recent shocks, while the error quantiles of NASDAQ tend to be influenced by more distant information. … An examination of the results for the 10-day holding period in Table A4 shows that … is still estimated to be negative for all assets; however, the signs of the coefficients on lagged information tell a different story from those for the 5-day holding period. Now at least half of the ten asset return series demonstrate negative tail dependence at each quantile level. This suggests that a negative shock may be followed by a value in the opposite (right) tail, resulting in mean reversion, which is more likely to occur over a longer holding period. Again, the sign of $\gamma^{h,F}_{\alpha,1}$ changes across the three quantile levels for three assets: NASDAQ, HSI, and EUR/USD. Similar to the 5-day holding period setup, the weighting parameter $\gamma^{h,F}_{\alpha}$ is estimated to be negative and significant at the 1% level across assets for all three risk levels and for both the 5-step and 10-step holding periods, except for CAD/USD at the 2.5% risk level and N225 at the 5% risk level for the 10-step holding period. These results suggest that a left-tail loss brings a certain degree of momentum to a positive return in the next period. Given the positive tail dependence in the error term alone revealed by the F-MIDAS-Q results, the unanimous negative tail dependence in the return series arises from the combined effect of second- and third-moment dynamics. Besides, $\gamma^{h,F}_{\alpha,1}$ is estimated to be larger for the longer holding period, again implying a mean-reverting pattern.
Another difference is that the intercept $\gamma^{h,F}_{\alpha,0}$ is generally estimated to be smaller in MIDAS-Q than in F-MIDAS-Q; this might be due to decreasing trading volume as disagreement among investors gradually diminishes (Chen et al., 2001). This comparison between the F-MIDAS-Q and MIDAS-Q estimates shows that the F-MIDAS-Q framework may capture third-moment dynamics, as distinct from second-moment dynamics, more flexibly, and may therefore reveal more detail about investor behavior.
To conclude, in addition to the flexible use of lagged information afforded by the weighting function in the MIDAS-Q framework, the F-MIDAS-Q model has a further advantage in modeling tail risk. F-MIDAS-Q models the multi-step quantiles separately from the multi-step volatility, thus shedding light on the detailed dynamics of the tail distribution, which vary with assets, quantile levels, and holding periods.

Evaluation of forecasting performance
A rolling window approach is adopted to obtain a series of out-of-sample multi-step VaR forecasts. Specifically, the models are estimated with 1,500 daily observations and used to generate forecasts of the next 5-step or 10-step VaR. The window then moves 5 or 10 steps ahead, and the updated 1,500 daily observations are used to re-estimate the models and predict the next 5-step or 10-step VaR. This procedure is repeated until the last of the 6,640 observations is reached. In the end, 1,028 out-of-sample 5-step forecasts and 514 out-of-sample 10-step forecasts of VaR are obtained.
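The window arithmetic can be checked with a short sketch, assuming non-overlapping forecast targets and a fixed estimation window of 1,500 observations as described above; the generator name is illustrative.

```python
def rolling_windows(n_obs, window=1500, step=5):
    """Yield (estimation_slice, target_slice) pairs for non-overlapping h-step forecasts."""
    for start in range(0, n_obs - window - step + 1, step):
        est = slice(start, start + window)                    # data used to (re-)estimate
        target = slice(start + window, start + window + step)  # h days being forecast
        yield est, target


n5 = sum(1 for _ in rolling_windows(6640, 1500, 5))
n10 = sum(1 for _ in rolling_windows(6640, 1500, 10))
print(n5, n10)  # 1028 514
```

Both counts match the numbers of out-of-sample forecasts reported above, and in each case the last target block ends exactly at the 6,640th observation.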
To visually evaluate the out-of-sample forecasting performance, Figure 1 shows the FTSE100 index forecasts from selected models for both holding periods at the 2.5% risk level. As expected, DR-HS produces the least responsive forecasts, while the MIDAS-Q forecasts tend to be influenced by extremely large positive or negative returns and are usually more conservative than those of the other models during highly volatile downside markets. Because F-MIDAS-Q employs a GARCH framework for modeling conditional volatility, its risk forecasts are very close to those of IT-GARCH-t and IT-EWMA and are less influenced by extreme values than those of MIDAS-Q, although it still produces the second-most conservative risk forecasts. Detailed forecasting performance is assessed by the formal and informal tests, whose results are summarized in Table 1. First, the ratio between the VolRate and the nominal risk level, $\rho_\alpha$, defined in Section 2.2, is an informal test of forecasting accuracy and is calculated for each risk level and each asset. The average of $\rho_\alpha$ (Mean of $\rho_\alpha$) and the standard deviation of $\rho_\alpha$ from 1 (Std. of $\rho_\alpha$) across the ten assets are then calculated for each model and presented in Columns (1)-(2) and (3)-(4) of Table 1 for 5-step and 10-step VaR, respectively. At the 1% risk level, the IT-GARCH-t and SR-GARCH-t models clearly generate the most favorable risk forecasts, with average violation rates closest to the nominal risk level and the smallest deviations, for both 5-step and 10-step VaR. However, these two models still underestimate risk on average, resulting in 17% to 23% more violations than desired.
Figure 1. Out-of-sample forecasts of the 5-step and 10-step VaR for the FTSE100 index at the 2.5% risk level
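The two informal accuracy summaries can be computed from per-asset violation counts as sketched below. Reading "standard deviation of $\rho_\alpha$ from 1" as the root-mean-square deviation from 1 is an assumption, and the violation counts in the example are hypothetical, not Table 1 values.

```python
import numpy as np


def rho_summary(violations, n_forecasts, alpha):
    """Return per-asset rho_alpha = VolRate / alpha, its mean, and its deviation from 1."""
    rho = np.asarray(violations, dtype=float) / n_forecasts / alpha
    mean_rho = float(rho.mean())
    # Root-mean-square deviation from the ideal value 1 (assumed definition).
    dev_from_one = float(np.sqrt(np.mean((rho - 1.0) ** 2)))
    return rho, mean_rho, dev_from_one


# Hypothetical violation counts for ten assets, 1,028 five-step forecasts, alpha = 2.5%.
counts = [27, 31, 24, 29, 33, 26, 30, 28, 25, 32]
rho, mean_rho, dev = rho_summary(counts, 1028, 0.025)
print(round(mean_rho, 3), round(dev, 3))
```

A mean $\rho_\alpha$ of 1.2, for instance, corresponds to 20% more violations than the nominal level, which is how the 17% to 23% figures above should be read.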
At the risk levels recommended by Basel III, 2.5% and 5%, the MIDAS-type models begin to show an advantage in VaR forecasting and rank as the two most favored models for 5-step VaR forecasts; in particular, F-MIDAS-Q is the most favored for the 5-step 5% VaR, with an average violation rate only 1% above the desired level and the second-smallest deviation. The F-MIDAS-Q model maintains its advantage in 10-step VaR forecasting: for $\alpha = 2.5\%$ and $5\%$, it ranks second in the average $\rho_\alpha$ and is the best in terms of the deviation of $\rho_\alpha$ from 1.
Columns (5)-(6) and (7)-(8) summarize the counts of rejections by the seven formal likelihood-ratio tests defined in Section 2.2 across the ten assets for each model, for 5-step and 10-step VaR forecasts, respectively. The counts are reported both with and without the rejections from the TBF-type tests, for the following reason. Since the backtesting sample for 5-step and 10-step VaR is much smaller than that for 1-step forecasts, the times between failures shrink proportionally, and the TBF and TBFI tests are more likely to reject independence among violations. This sample-size effect may be exaggerated for the quantile regression models, since they are designed to capture dependence in the tails. Reporting the rejection counts without the TBF-related tests therefore helps to remove the potential influence of sample size. Consider first the total counts including the TBF-type tests. For 1% VaR, the IT-GARCH-t and SR-GARCH-t models are still the most favored by the likelihood-ratio tests, being rejected twice and five times for the 5-day and 10-day holding periods, respectively. At the 2.5% risk level, F-MIDAS-Q and MIDAS-Q are the top two forecasting models for 5-step VaR, while IT-GARCH-t and IT-EWMA are most preferred by the tests for 10-step VaR. When $\alpha = 5\%$, MIDAS-Q and F-MIDAS-Q rank as the top two models, while IT-EWMA has the same number of rejections as F-MIDAS-Q. F-MIDAS-Q and MIDAS-Q are clearly more affected by the dependence among violations. For example, after removing the results of the TBF-type tests, the number of rejections falls from 9 to 1 for the 5% 10-step VaR, whereas for IT-EWMA it falls only from 9 to 3. It is therefore reasonable to regard the results of the formal tests as consistent with those of the informal tests: the quantile regression models generate the best 2.5% and 5% multi-step VaR forecasts.
Columns (9) and (10) summarize, for each risk level and each holding period, the number of times across the ten assets that each model ranks among the top 3 of the 6 models with the smallest loss function defined in Section 2.2. The maximum value of this indicator is therefore 10. For the 1% VaR forecast, SR-GARCH-t and IT-GARCH-t are each among the top 3 models ten times. For the 2.5% and 5% 5-step VaR forecasts and the 5% 10-step VaR forecast, F-MIDAS-Q and IT-EWMA rank among the top 3 models most frequently. For the 2.5% 10-step VaR forecast, the two models using the iterating method are the most frequently top-ranked. To conclude, the validity testing results are almost the same as the accuracy testing results. In particular, comparing F-MIDAS-Q with MIDAS-Q, the former performs more consistently across holding periods than the latter. A distribution-free F-MIDAS-Q specification can reduce the model misspecification caused by incorrect assumptions about the residuals, thus providing VaR forecasts as adequate as, if not better than, those of the classical models.
Note: At each risk level and each holding period, the two top-performing models are selected according to each criterion.
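Equation (5) is not reproduced in this section, so the sketch below assumes a Lopez (1996)-type quadratic loss, which charges 1 plus the squared exceedance on each violation and nothing otherwise, and then counts how often each model is among the three with the smallest loss across assets. Both the loss form and the tie handling (by stable sort order) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def lopez_quadratic_loss(returns, var):
    """Lopez-type quadratic loss: 1 + (r - VaR)^2 on violations (r < VaR), 0 otherwise."""
    r = np.asarray(returns, dtype=float)
    v = np.asarray(var, dtype=float)
    hit = r < v
    return float(np.sum(np.where(hit, 1.0 + (r - v) ** 2, 0.0)))


def top_k_counts(losses, k=3):
    """losses: (n_assets, n_models); count how often each model is among the k smallest."""
    losses = np.asarray(losses, dtype=float)
    best = np.argsort(losses, axis=1, kind="stable")[:, :k]  # k best models per asset
    return np.bincount(best.ravel(), minlength=losses.shape[1])


# Two hypothetical assets, four hypothetical models.
print(top_k_counts([[1, 2, 3, 4], [4, 3, 2, 1]], k=3))
```

With ten assets, each model's count lies between 0 and 10, matching the maximum value of the indicator described above.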

CONCLUSION
Does multi-step risk forecasting gain from modeling higher moments separately from the second moment? This paper addresses the problem of information loss during data conversion and aggregation in the classical methods for multi-step VaR forecasting. It designs a framework that estimates the multi-step quantiles of the return distribution and the multi-step volatility separately. A volatility-filtered MIDAS quantile regression is proposed to capture the quantile dynamics of the error term in a classical GARCH framework. The proposed model is illustrated with real return series of ten assets and generates 5-step and 10-step VaR forecasts at the 1%, 2.5%, and 5% risk levels. The estimation results with real data provide evidence of distinct tail behavior in the error term.