Improving Forecasts

The taxonomy clarified the sources of predictive failure. The taxonomy also led to and formalised new techniques that robustify forecasts after structural breaks and that augment robust devices with information from economic models.

Robustification led to research on nowcasting and, from a completely different route, impulse indicator saturation. Hendry (2006) develops and systematises robustification methods, which include intercept correction, pooling, leading indicators, and differencing. These four tools and nowcasting serve as foci for discussing David's contributions to improving forecasts.

Intercept Correction. In addition to investigating the many aspects of forecasting discussed in Section 4.2, David and Mike Clements re-examined the ubiquitous forecasting tool known as “add factors”. Add factors are now interpretable as a form of intercept correction and hence as a potentially useful method for robustifying forecasts against the effects of structural breaks. This interpretation contrasts with David's earlier harsh views on add factors, as one example illustrates. In 1985, Peter Hooper was presenting forecast results from the Fed's Multi-country Model at a Fed workshop, and David was highly critical of Peter's adjustment of the forecasts with add factors. At the time, David remarked: ‘Why adjust forecasts if the model is good?’ David's views on add factors have evolved enormously since then.

Some history helps put that evolution in perspective. Klein (1971) discussed how add factors might improve economic forecasting, but he gave no theory explaining why they might do so. No such theory existed at the time.

Much later, David and Mike Clements realised that some types of add factors might mitigate forecast failure that was caused by location shifts at the start of the forecast period. Clements and Hendry (1996a) showed analytically and in practice how intercept correction could improve forecasts in the face of location shifts.

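In a common form, intercept correction adds the model's residual at the forecast origin to the forecast. The corrected forecast error is then the difference between the forecast error that would have occurred otherwise and that residual, so any systematic component common to both, such as a persistent location shift, is removed. Consequently, intercept correction is a valuable tool in the face of location shifts.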
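A minimal Python sketch of the idea follows, under illustrative assumptions that are not from the source: a hypothetical constant-mean model, a location shift of 2.0 three periods before the forecast origin, and the form of intercept correction that adds the origin residual to the forecast. It illustrates the mechanism only and is not Clements and Hendry's derivation.

```python
import random
import statistics

random.seed(1)

# Hypothetical DGP: constant mean 1.0 that shifts up by 2.0 at t = 97 (illustrative values).
mu, shift, sigma = 1.0, 2.0, 0.1
n, break_at = 100, 97

y = [mu + (shift if t >= break_at else 0.0) + random.gauss(0.0, sigma) for t in range(n + 1)]
in_sample, outcome = y[:n], y[n]          # y[n] is the out-of-sample value being forecast

# Forecasting model: a constant mean estimated over the whole in-sample period.
mu_hat = statistics.fmean(in_sample)
origin_residual = in_sample[-1] - mu_hat   # residual at the forecast origin

plain_forecast = mu_hat
ic_forecast = mu_hat + origin_residual     # intercept-corrected forecast

plain_error = outcome - plain_forecast
ic_error = outcome - ic_forecast           # equals plain_error - origin_residual

print(f"uncorrected error: {plain_error:.2f}, intercept-corrected error: {ic_error:.2f}")
```

With a constant-mean model, the corrected forecast collapses to the last observation, which is one way to see the link between intercept correction and differencing.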

Pooling. Combining or “pooling” forecasts provides another tool for robustifying forecasts. Bates and Granger (1969) proposed combining forecasts as a mechanism for improving forecast performance. Chong and Hendry (1986) later showed that pooling is unnecessary under the null of forecast encompassing but could improve forecasting when, for example, neither of two forecasts forecast-encompasses the other. Bates and Granger provided the intuition: in that situation, each forecasting model has information that the other model lacks, and pooling combines the information in the models' forecasts. Bates and Granger did not address whether pooling forecasts is better than utilising the information from both models in a nesting model and generating forecasts from that model. Hendry and Clements (2004) showed that there is no unique answer: it can pay to pool forecasts in some situations and not in others.
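For two unbiased forecasts, the combination weight that minimises the variance of the pooled forecast error depends on the two error variances and their covariance; this is one of the weighting schemes associated with Bates and Granger (1969). The Python sketch below estimates that weight from hypothetical past forecast errors (made up for illustration) and shows a case in which the pooled forecast beats both individual forecasts.

```python
def combination_weight(e1, e2):
    """Weight on forecast 1 that minimises the sample variance of w*e1 + (1 - w)*e2."""
    n = len(e1)
    m1, m2 = sum(e1) / n, sum(e2) / n
    v1 = sum((a - m1) ** 2 for a in e1) / n
    v2 = sum((b - m2) ** 2 for b in e2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / n
    return (v2 - c12) / (v1 + v2 - 2 * c12)


def variance(e):
    """Population-style sample variance of a list of errors."""
    m = sum(e) / len(e)
    return sum((x - m) ** 2 for x in e) / len(e)


# Hypothetical past errors of two forecasts; neither dominates the other.
e1 = [1.0, -1.0, 1.0, -1.0]
e2 = [2.0, -2.0, -2.0, 2.0]

w = combination_weight(e1, e2)                         # 0.8 for these errors
pooled = [w * a + (1 - w) * b for a, b in zip(e1, e2)]
print(w, variance(e1), variance(e2), variance(pooled))
```

In practice the weight is estimated from a finite sample, so it is itself uncertain and is not constrained to lie between 0 and 1.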

Pooling is often viewed as being benign at worst, serving as insurance against bad forecasts by averaging across a range of forecasts. It does carry an important caveat, though: a single poor forecast can ruin the average. Imagine having a set of good models, along with one poisonous model. Averaging the forecast of the poisonous model with those of the good models can poison the pooled forecast. If the poisonous models are eliminated—through model selection, say—then averaging over the forecasts from just the remaining models may reduce the risk a little; see Hendry and Doornik (2014: 286).
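A toy arithmetic example with made-up numbers shows how one poisonous forecast can dominate an equally weighted average, even when every other model forecasts well:

```python
def mean(xs):
    return sum(xs) / len(xs)


outcome = 2.0
good = [1.9, 2.1, 2.05, 1.95]   # hypothetical forecasts from four good models
poisonous = 12.0                # hypothetical forecast from one badly distorted model

pooled_all = mean(good + [poisonous])   # 4.0, an error of 2.0
pooled_selected = mean(good)            # 2.0 once the poisonous model is dropped

worst_good_error = max(abs(outcome - f) for f in good)
print(abs(outcome - pooled_all), abs(outcome - pooled_selected), worst_good_error)
```

The pooled forecast that includes the poisonous model is worse than every one of the good models taken individually.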

In the literature, model averaging is often over all possible models that arise by either including or excluding the variables from a given set of explanatory variables. Most of those models are “poisonous” because they are distorted by omitted variables, unmodelled nonlinearities, intermittent location shifts, etc.
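The scale of the omitted-variables problem is easy to count. In the Python sketch below, the six candidate regressors are hypothetical and only two are assumed to enter the DGP; enumerating every include/exclude combination shows how many of the 2^6 models omit at least one relevant variable. Nonlinearities and location shifts could poison some of the remaining models as well.

```python
from itertools import combinations

candidates = ["x1", "x2", "x3", "x4", "x5", "x6"]   # hypothetical explanatory variables
relevant = {"x1", "x2"}                              # assumed to be the only ones in the DGP

# Every model formed by including or excluding each candidate (2**6 = 64 models).
models = [set(c) for r in range(len(candidates) + 1) for c in combinations(candidates, r)]
omitting = [m for m in models if not relevant <= m]

print(len(models), len(omitting))   # 64 48
```

With r relevant variables, the share of models that omit at least one of them is 1 - 2^(-r), here three quarters, however many irrelevant candidates are added.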

One has to be careful which forecasts one averages across, and how that averaging is carried out. In their submission to the recent M4 forecast competition, Doornik, Castle and Hendry (2020a) designed pooled forecasts with computer-automated model selection, aiming to embody key features learned from the taxonomy.

Forecasts from different models may also be of value in themselves. Divergence of different models' forecasts can indicate breaks that are occurring and hence can serve as “canaries in a coal mine”. The Bank of England has used a suite of models in this manner, as Hatch (2001) discusses in Hendry and Ericsson (2001). When models are sufficiently different, they need not all be affected in the same way by a major unanticipated shift. Including robust forecasting devices in the suite of models can help, too. Robust devices are not affected systematically once the breakpoint is past, although they will still misforecast as the break hits.

Leading Indicators. Leading indicators are yet another tool aimed at improving forecasts. Emerson and Hendry (1996) found that the variables selected as leading indicators changed all too often, suggesting that they did not lead for very long. Also, picking leading indicators by maximum in-sample correlation was unreliable. Emerson and Hendry concluded that using only leading indicators for economic forecasting was not a fruitful route to pursue.
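One mechanism behind the unreliability of selecting by in-sample correlation can be shown by simulation. This is a generic illustration of selection from many candidates, not a reconstruction of Emerson and Hendry's study: the 200 candidate indicators are pure noise by construction, yet the best of them looks like a useful leading indicator in a short sample and then fails out of sample.

```python
import random

random.seed(0)


def corr(x, y):
    """Sample correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


n_in, n_out, n_candidates = 40, 200, 200
total = n_in + n_out
target = [random.gauss(0.0, 1.0) for _ in range(total)]
# Hypothetical candidate indicators, unrelated to the target by construction.
candidates = [[random.gauss(0.0, 1.0) for _ in range(total)] for _ in range(n_candidates)]


def lead_corr(x, lo, hi):
    """Correlation of x[t] with target[t + 1] for t in [lo, hi - 1)."""
    return corr(x[lo:hi - 1], target[lo + 1:hi])


in_sample = [abs(lead_corr(x, 0, n_in)) for x in candidates]
best = max(range(n_candidates), key=lambda i: in_sample[i])
out_of_sample = abs(lead_corr(candidates[best], n_in, total))

print(f"best in-sample |corr|: {in_sample[best]:.2f}, same indicator out of sample: {out_of_sample:.2f}")
```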

That said, leading indicators could have some role in forecasting. For instance, a cointegrated system can be written as a set of differenced variables that are explained by lagged cointegrating combinations and lagged differenced variables. That system is interpretable as a system of leading indicators because its endogenous variables depend on past outcomes. Also, higher frequency information may improve forecasting performance, with that information acting as a leading indicator. Moreover, leading indicators may help predict turning points and breaks, as in Birchenhall, Jessen, Osborn and Simpson (1999).

Differencing. Hendry (2006) shows that predictive failure is an inherent issue for econometric models and that differencing is a natural solution for robustifying those models' forecasts. To put differencing in context, Hendry notes that virtually all standard economic models are equilibrium correction models, including dynamic stochastic general equilibrium (DSGE) models, New Keynesian Phillips Curve models, structural vector autoregressions, and so-called error correction models. When the equilibrium mean alters, the model's equilibrium correction term pushes the model's forecasts back towards the old equilibrium—not the new one—inducing the sort of systematic predictive failure that is often seen in practice. Intercept correction—and hence differencing—can robustify the forecast of an equilibrium correction model because it serves as a good proxy for such shifts in the equilibrium. Hendry (2006) formalises this. Castle, Clements and Hendry (2013, 2015) illustrate it empirically with an assessment of robustified US GDP forecasts.
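A univariate simulation, with illustrative parameter values that are not from the source, makes the mechanism concrete. The equilibrium correction model (EqCM) Δy_t = (ρ − 1)(y_{t−1} − μ) + ε_t keeps forecasting with the pre-break equilibrium mean after μ shifts. Differencing the EqCM gives Δy_t = ρΔy_{t−1} + Δε_t, from which μ drops out. For simplicity, ρ is treated as known rather than estimated.

```python
import random

random.seed(2)

# Hypothetical EqCM: y_t - mu = rho * (y_{t-1} - mu) + eps_t, with mu shifting from 0 to 5 at t = 60.
rho, sigma = 0.5, 0.2
mu_old, mu_new = 0.0, 5.0
n, break_at = 120, 60

y = [mu_old]
for t in range(1, n):
    mu = mu_new if t >= break_at else mu_old
    y.append(mu + rho * (y[-1] - mu) + random.gauss(0.0, sigma))


def eqcm_forecast(t):
    """One-step forecast of y[t + 1] from the EqCM, still using the pre-break mean."""
    return mu_old + rho * (y[t] - mu_old)


def dveqcm_forecast(t):
    """One-step forecast from the differenced EqCM; the equilibrium mean drops out."""
    return y[t] + rho * (y[t] - y[t - 1])


origins = range(break_at + 5, n - 1)   # start a few periods after the break has hit
mae_eqcm = sum(abs(y[t + 1] - eqcm_forecast(t)) for t in origins) / len(origins)
mae_dveqcm = sum(abs(y[t + 1] - dveqcm_forecast(t)) for t in origins) / len(origins)

print(f"MAE, EqCM with stale mean: {mae_eqcm:.2f}; differenced EqCM: {mae_dveqcm:.2f}")
```

The EqCM's systematic error of about (1 − ρ)(μ_new − μ_old) = 2.5 persists for as long as the stale mean is used, whereas the differenced model's errors are just ε_{t+1} − ε_t once the break is past.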

The taxonomy of forecast errors also provides insights on why differencing a model robustifies the model's forecasts. From the taxonomy, few things can go wrong in forecasting a variable if the forecasting model for the second difference of that variable has no parameters and no deterministic terms, thereby eliminating the sources of forecast error in (3) and (a) above. If the data do not accelerate, the second difference of the variable being forecast has a mean of zero, implying that the first difference of the current-dated variable (or the current growth rate) is an unconditionally unbiased forecast for its future value. Because that current growth rate is the current value and not the future one, such a “forecast” device never really forecasts. However, the current growth rate will be close to the future growth rate in the absence of acceleration.
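Setting the second difference to zero implies the device x̂_{T+h|T} = x_T + hΔx_T. The Python sketch below applies it to a hypothetical noise-free series, a linear trend with a location shift of +10 at t = 50. Errors occur only as the break hits, and the device is back on track once the breakpoint is past.

```python
def robust_forecast(x, t, h=1):
    """Forecast x[t + h] as x[t] + h * (x[t] - x[t - 1]): assumes a zero second difference."""
    growth = x[t] - x[t - 1]
    return x[t] + h * growth


# Hypothetical series: linear trend (slope 2) with a location shift of +10 at t = 50, no noise.
x = [2 * t + (10 if t >= 50 else 0) for t in range(100)]

errors = {t: x[t + 1] - robust_forecast(x, t) for t in range(1, 99)}
missed = [t for t, e in errors.items() if e != 0]

print(missed, errors[49], errors[50])   # [49, 50] 10 -10
```

The device first under-forecasts the jump and then over-forecasts it, because the jump briefly contaminates the measured growth rate Δx_T.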

The first difference of the dependent variable has another interpretation as well: it is a single measure that aggregates almost all the information needed in forecasting its future value.

The explanation requires a slight digression. In David's view, economists build congruent, encompassing, cointegrated models to test theories, understand the economy, and conduct policy analysis. These models also need to account for breaks and other non-stationarities. For forecasting, though, these models can be differenced to eliminate deterministic terms such as intercepts and location shifts. Doing so introduces the current growth rate in the model for forecasting the future growth rate, and the current growth rate depends on the cointegrating relationship as a feedback term. This new system thus retains the economics and the policy-relevant causal information that underlie the original model. Also, differencing the model introduces the first difference of the model's other economic variables.

Moreover, because the current growth rate itself is generated by the DGP, it necessarily includes relevant variables for forecasting the future growth rate. By contrast, a model of the current growth rate is a simplification of the DGP and need not include the relevant variables that determine the current growth rate. When forecasting, there is also no need to disentangle the DGP's individual components that enter the current growth rate—unlike when modelling or for policy analysis. The data themselves provide the basis for forecasting. As a practical implication, differencing creates a system that is robust after location shifts because the current growth rate includes all stochastic and deterministic shifts, and also any variables omitted from the forecast model. Moreover, use of the current growth rate to forecast the future growth rate obviates the need to estimate model parameters.

Hendry (2006) derives yet another, related interpretation of the current growth rate, which arises from the standard representation of the vector equilibrium correction model (VEqCM). In the simplest VEqCM, the future growth rate of the dependent variable is forecast from its mean growth rate (the VEqCM's intercept) and the current disequilibrium (the deviation of the cointegrating relation from its equilibrium mean).

Both the mean growth rate and the current disequilibrium employ full-sample estimates of the model's parameters. In the differenced VEqCM (or DVEqCM), however, the mean growth rate is estimated by the current growth rate, and the disequilibrium is estimated by the deviation in the cointegrating relation from its previous value. Both terms in the DVEqCM are estimates that use only current-dated observed growth rates, although the cointegrating coefficients themselves need to be estimated with a longer sample.
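In a common notation (assumed here rather than taken from the text), let γ be the mean growth rate, α the adjustment coefficients, β the cointegrating vectors, and μ the equilibrium mean. The simplest VEqCM and its one-step forecast are shown below. Subtracting the equation dated T from the one dated T + 1, with γ and μ constant over those two periods, gives the DVEqCM, in which γ and μ drop out. Δx_T stands in for the mean growth rate, and β′Δx_T = β′x_T − β′x_{T−1} stands in for the disequilibrium, while α and β still require a longer estimation sample.

```latex
\begin{aligned}
\text{VEqCM:}\quad & \Delta x_{t} = \gamma + \alpha\left(\beta' x_{t-1} - \mu\right) + \epsilon_{t},
  \qquad \widehat{\Delta x}_{T+1\mid T} = \hat{\gamma} + \hat{\alpha}\left(\hat{\beta}' x_{T} - \hat{\mu}\right); \\
\text{difference:}\quad & \Delta x_{T+1} - \Delta x_{T}
  = \alpha\beta'\left(x_{T} - x_{T-1}\right) + \epsilon_{T+1} - \epsilon_{T}
  = \alpha\beta'\Delta x_{T} + \Delta\epsilon_{T+1}; \\
\text{DVEqCM:}\quad & \widetilde{\Delta x}_{T+1\mid T} = \Delta x_{T} + \hat{\alpha}\hat{\beta}'\Delta x_{T}.
\end{aligned}
```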

Forecasts from the VEqCM itself use fixed values of two key VEqCM components—the mean growth rate and the equilibrium mean—shifts in which can cause forecast failure. By contrast, forecasts from the DVEqCM use the current period's observations to estimate those key components and so may be more relevant for forecasting than using the full historical sample.

This approach generates a class of “data-based” forecasting devices that could utilise a single observation (as in the DVEqCM), a subset of observations (as in rolling regressions), or the full sample (as in the VEqCM); see Martinez, Castle and Hendry (2021). The choice of sample highlights a tradeoff between precision in estimation and rapid adaptation. As harbingers of these developments in forecasting, Hendry and Ericsson (1991b) and Campos and Ericsson (1999) formulated such data-based predictors in empirical modelling. Other similar approaches, such as that in Phillips (1995), adapt the forecasts to location shifts through automated variable reselection and updating of parameter estimates. Eitrheim, Husebø and Nymoen (1999) empirically document implications of the taxonomy by comparing real-world forecasts from Norges Bank's macro-model RIMINI with forecasts from simple robust devices, finding that the latter often won at four quarters ahead but lost out at a longer forecast horizon; see also Bårdsen, Eitrheim, Jansen and Nymoen (2005).
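The precision-versus-adaptation tradeoff can be sketched in Python with a hypothetical growth series whose mean jumps from 0.5 to 3.5 at t = 400. Each device forecasts next period's growth by the mean of the last m observations: m = 1 mimics the single-observation device, m = 8 a rolling window (an arbitrary choice), and the full sample the constant-parameter benchmark. These are simplified analogues of the devices in the text, not the specifications in Martinez, Castle and Hendry (2021).

```python
import random

random.seed(3)

# Hypothetical growth rates: mean 0.5 up to t = 399, then 3.5 from t = 400 onwards.
n, break_at, sigma = 460, 400, 1.0
growth = [(0.5 if t < break_at else 3.5) + random.gauss(0.0, sigma) for t in range(n)]


def window_forecast(t, m):
    """Forecast growth[t + 1] by the mean of the last m observations (m=None: full sample)."""
    window = growth[: t + 1] if m is None else growth[t - m + 1 : t + 1]
    return sum(window) / len(window)


def mse(origins, m):
    """Mean squared one-step forecast error over the given forecast origins."""
    errors = [growth[t + 1] - window_forecast(t, m) for t in origins]
    return sum(e * e for e in errors) / len(errors)


stable = range(50, break_at - 1)       # forecast targets all precede the break
shifted = range(break_at + 8, n - 1)   # origins at least eight periods after the break

results = {m: (mse(stable, m), mse(shifted, m)) for m in (1, 8, None)}
for m, (before, after) in results.items():
    label = "full sample" if m is None else f"last {m}"
    print(f"{label:>11}: MSE before break {before:.2f}, after break {after:.2f}")
```

Before the break, the full-sample mean is the most precise device; after it, the full-sample mean adapts only slowly, and the short windows win.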

Nowcasting. The taxonomy of forecast errors also has implications for nowcasting. David and Mike Clements started thinking about nowcasting in a more structured way when they were consulting for the UK Statistics Commission and evaluating how the UK's Office for National Statistics calculated its flash estimates of the national accounts; see Clements and Hendry (2003). Nowcasts can be affected by measurement errors in the forecast origin, that is, the combination (1)+(a) from Section 4.2. Sometimes those errors are systematic and large, as with official economic statistics during the 2008 financial crisis and the more recent COVID-19 pandemic. Improved methods of nowcasting can help reduce real-time forecast problems that arise from mismeasuring the forecast origin.

Large data revisions during the financial crisis and COVID-19 pandemic are not surprising in light of the methods used to produce flash estimates. For example, in the USA and the UK, a flash (or “advance”) estimate of quarterly GDP growth is released about a month after the quarter's end, and that flash estimate is derived in part from many disaggregate components. Observations on some disaggregate components become available too late for inclusion in the flash estimate, so those missing components are “infilled”, based on interpolation models such as Holt-Winters (a form of exponential smoothing).
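The infilling step can be sketched with Holt's linear exponential smoothing, the non-seasonal member of the Holt-Winters family. The smoothing parameters (α = 0.5, β = 0.3) and the component series are hypothetical, and agencies' production methods are more elaborate. The infill simply extrapolates the recent smoothed level and trend, which is why it works in steady growth and misleads after a sudden change.

```python
def holt_infill(series, alpha=0.5, beta=0.3, steps=1):
    """Extrapolate `steps` values with Holt's linear (level + trend) exponential smoothing."""
    level, trend = series[0], series[1] - series[0]   # simple initialisation
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]


# Hypothetical component: steady growth, after which the missing period turns out to collapse.
history = [100.0, 101.0, 102.0, 103.0, 104.0]
infilled = holt_infill(history)[0]    # continues the trend, giving 105
actual = 96.0                         # hypothetical value published after the flash estimate

print(f"infilled {infilled:.1f} vs actual {actual:.1f}: error {actual - infilled:+.1f}")
```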

Such infilling can work reasonably well during times of steady and uniform growth across the economy. However, sudden changes in data behaviour, such as those during the financial crisis, can make interpolation methods inappropriate. During the crisis, infilling led to flash estimates of aggregate economic growth that were systematically above the final data in the downturn and systematically below the final data in the upturn, often by several percentage points per annum; see Ericsson (2017). In 2008, these mismeasurements made it difficult for policymakers to ascertain the timing and extent of the crisis, as Stekler and Symington (2016) and Ericsson (2016) discuss.

Systematic errors such as these have led to proposed improvements in nowcasting, as documented in Mazzi and Ladiray (2017). The taxonomy delineates what does and what does not cause forecast failure and so has direct implications for nowcasting; see Castle, Hendry and Kitov (2017). When a statistical agency estimates (say) GDP growth from a set of disaggregate components, the agency could check whether previous forecasts of those components are close to their now-known outcomes. If they are not, a location shift may be responsible, so any missing disaggregates could be infilled, taking into account information about the recent break. Considerable contemporaneous information is available for nowcasting, including surveys, Google Trends, mobile phone data, prediction markets, and previous historically similar episodes. All could be used to improve the accuracy of forecast-origin estimates. Automatic model selection can help, for example, by building forecasting models of the disaggregated series. An alternative approach is to summarise the information from large numbers of variables by using principal components or factors: see Forni, Hallin, Lippi and Reichlin (2001), Artis, Banerjee and Marcellino (2005), Stock and Watson (2011), and Castle, Clements and Hendry (2013). Regardless, nowcasts that utilise such additional information could be created before the end of the reference period, thereby reducing the delay with which flash estimates appear.
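The check-then-infill idea can be sketched in Python. Everything concrete here is hypothetical: the component names, the model-based infills, the error histories, and the rule that flags a location shift when the latest forecast error exceeds three times the spread of earlier errors. Flagged components are intercept-corrected by their latest error, one simple way of taking the recent break into account.

```python
import statistics


def infill(model_infills, error_history, threshold=3.0):
    """Infill missing components, intercept-correcting any component whose latest
    forecast error is large relative to the spread of its earlier errors."""
    adjusted = {}
    for name, value in model_infills.items():
        errors = error_history[name]                # outcome minus forecast, oldest first
        spread = statistics.pstdev(errors[:-1])
        if abs(errors[-1]) > threshold * spread:   # looks like a location shift
            adjusted[name] = value + errors[-1]
        else:
            adjusted[name] = value
    return adjusted


# Hypothetical components: model-based infills and recent forecast errors.
model_infills = {"retail": 1.0, "construction": 0.5}
error_history = {
    "retail": [0.1, -0.1, 0.1, -0.1, -2.0],        # latest error far outside its usual range
    "construction": [0.1, -0.1, 0.1, -0.1, 0.05],
}
result = infill(model_infills, error_history)
print(result)   # {'retail': -1.0, 'construction': 0.5}
```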

The coronavirus pandemic poses a global challenge, medically, socially, politically, and economically. To better inform decision-making, Jennie Castle, Jurgen Doornik, and David Hendry have been generating short-term (one-week-ahead) forecasts for confirmed cases and deaths from COVID-19; see Castle, Doornik and Hendry (2020a) and Doornik, Castle and Hendry (2020b, 2021). Jennie, Jurgen, and David select their forecast models by Autometrics, incorporating generalisations of impulse indicator saturation. In addition, Castle, Doornik and Hendry (2020b) have been making medium-term (multi-week) forecasts from models utilising path indicator saturation (PathIS), a new saturation technique that saturates across paths, similar to the designer breaks in Pretis, Schneider, Smerdon and Hendry (2016). Both the short-term and medium-term forecasts combine key elements of David's contributions outlined in Sections 2, 3 and 4, including model design through machine learning with diagnostic testing and saturation techniques, and forecast design through robustification in light of the forecast taxonomy. Notably, these forecasts perform well relative to some standard epidemiological models.

In retrospect, David's attitude towards economic forecasting, and the profession's attitude as well, has shifted significantly over the last three decades, and for the better. Many top econometricians are now involved in the theory of forecasting, including Frank Diebold, Hashem Pesaran, Peter Phillips, Lucrezia Reichlin, Jim Stock, Timo Teräsvirta, Ken Wallis, and Mark Watson. Their technical expertise as well as their practical forecasting experience is invaluable in furthering the field. As the taxonomy illustrated, mathematical treatment can help in understanding economic forecasts, with key developments summarised in the books by Hendry and Ericsson (2001), Clements and Hendry (2002a, 2011), and Elliott, Granger and Timmermann (2006).


Source: Cord, Robert A. (ed.). The Palgrave Companion to Oxford Economics. Palgrave Macmillan, 2021. 819 p.
