Understanding Forecasts
Until the early 1990s, David had viewed forecasting as an activity subsumed by model design. That perspective arose naturally from the taxonomy of information for empirical model evaluation and design in Hendry and Richard (1982), and from the framework for exogeneity in Engle, Hendry and Richard (1983).
While these developments were central to improvements in empirical modelling, they did hamper David's understanding of forecasting as a separate discipline in its own right. Moreover, the ubiquity of predictive failure was discouraging.
Policy rekindled David's interest in forecasting and led to major breakthroughs in the understanding of forecasts, particularly through the development of a taxonomy for the sources of forecast error. The catalyst was the 1991 enquiry by the UK Parliament's Treasury and Civil Service Committee into “Official Economic Forecasting”; see the Treasury and Civil Service Committee (1991a, b). As a backdrop to the enquiry, forecasts by HM Treasury missed the 1987 boom in the UK economy and subsequently missed the sharp economic downturn in 1989, with the resulting policy mistakes combining to induce high inflation and high unemployment.
Evidence submitted to the parliamentary Committee included many forecasts from many forecasters and dozens of ex post forecast evaluations that tried to sort out why forecasts had gone wrong. Forecasts from different models frequently conflicted, and the underlying models often suffered forecast failure. As Makridakis and Hibon (2000) and Clements and Hendry (2001) later argued, those realities could not be explained within the standard paradigm that forecasts were the conditional expectations of the variables being forecast. Empirics dominated theory in the enquiry. In fact, there was almost no theory of economic forecasting presented. At the time, most theories of forecasting were from the physical and statistical sciences.
Those theories typically assumed data ergodicity and so were not necessarily relevant to economic forecasting, where intermittent structural breaks are a key data feature.
David submitted evidence on economic forecasting to the parliamentary Committee. Preparation of his report (detailed in Hendry (1991) and to be published in Ericsson (2021)) led to a broader understanding of the subject. David subsequently produced a torrent of insightful evaluations of many existing forecast techniques, including error correction models and cointegration, mean square forecast errors, add factors, leading indicators, pooling of forecasts, multi-step estimation for forecasting, and forecast competitions; see Clements and Hendry (1998b, 1999a) in particular. David also developed a theory of forecasting, which included a taxonomy of forecast errors (initially sketched out in Hendry (1991)) and a theory of unpredictability with implications for parsimony, congruence, and aggregation; see Clements and Hendry (2005a, b), Hendry and Mizon (2014), and Hendry and Hubrich (2011). From that theory of forecasting, David was able to develop and refine tools such as intercept correction, robustification, and nowcasting to improve forecasts themselves; see Section 4.3.
David's renewed interest in forecasting resulted in a remarkable and continuing collaboration with his then DPhil student Mike Clements. Motivated by the encouraging developments in Hendry (1991), David and Mike sought to develop analytical foundations for understanding ex ante forecast failure when the economy is subject to structural breaks, and the forecasts are from misspecified and inconsistently estimated models that are based on incorrect economic theories and selected from inaccurate data. Everything was allowed to be wrong, but the investigator did not know that.
Despite the generality of this framework, David and Mike derived many interesting results about economic forecasting, as shown in Clements and Hendry (1993) and Hendry and Clements (1994a, b).
The theory's empirical content matched the historical record, and it suggested how to improve forecasting methods. Estimation per se was not a key issue. The two important features in their framework were allowing for misspecified models and incorporating structural change in the DGP. With that combination, causal variables need not beat non-causal variables at forecasting. In particular, extrapolative methods could win at forecasting, as shown in Clements and Hendry (1999b).
The implications are fundamental. Ex ante forecast failure should not be used to reject models. A model well-specified in-sample could forecast poorly, and worse than an extrapolative procedure, so the debate between Box-Jenkins models and econometric models needed reinterpretation.
In this context, Clements and Hendry (1998a) brought to the fore the difference between equilibrium correction and error correction. The first induces cointegration, whereas in the latter the model adjusts to eliminate forecast errors. A cointegrated system, which has equilibrium correction, will forecast systematically badly when its equilibrium mean shifts, with the cointegrated system continuing to converge back to the old equilibrium. By contrast, devices such as random walks and exponentially weighted moving averages embody error correction. While an error correction model will temporarily misforecast when an equilibrium mean shifts, it will then adjust relative to the new equilibrium mean. Mike and David's insight explained why the Treasury's cointegrated system had performed so badly in the mid-1980s, following the sharp reduction in UK credit rationing. It also helped Clements and Hendry (1996a) demonstrate the advantageous property of intercept corrections to offset such shifts. Hendry and Ericsson (2001) and Castle, Clements and Hendry (2019) offer highly intuitive nontechnical introductions to forecasts and their uses, challenges, and benefits. Clements and Hendry (2002a) give a compendium.
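The contrast between equilibrium correction and error correction can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' own setup: a scalar first-order autoregression whose equilibrium mean shifts once, an equilibrium-correction forecaster that keeps using the pre-break parameters, and a random walk as the error-correcting device. All names and numerical settings (`MU_OLD`, `MU_NEW`, `RHO`, and so on) are hypothetical.

```python
import numpy as np

# Hypothetical settings, chosen only for illustration.
N, BREAK_AT = 200, 100      # sample length and date of the location shift
MU_OLD, MU_NEW = 0.0, 5.0   # equilibrium mean before and after the shift
RHO, SIGMA = 0.5, 1.0       # autoregressive slope and innovation std. dev.


def simulate(seed=0):
    """Simulate an AR(1) around an equilibrium mean that shifts at BREAK_AT."""
    rng = np.random.default_rng(seed)
    y = np.empty(N)
    y[0] = MU_OLD
    for t in range(1, N):
        mu = MU_OLD if t < BREAK_AT else MU_NEW
        y[t] = mu + RHO * (y[t - 1] - mu) + SIGMA * rng.standard_normal()
    return y


y = simulate()
origins, outcomes = y[:-1], y[1:]   # forecast origin y[t], outcome y[t+1]

# Equilibrium correction with the pre-break parameters: every forecast is
# pulled back towards the OLD equilibrium mean.
eqcm_forecast = MU_OLD + RHO * (origins - MU_OLD)

# Random walk: the forecast is the latest observation, so the device
# re-centres itself on whatever level the data have moved to.
rw_forecast = origins

# Keep only forecasts whose origin lies at or after the shift.
post = slice(BREAK_AT, None)
eqcm_err = outcomes[post] - eqcm_forecast[post]
rw_err = outcomes[post] - rw_forecast[post]

print("post-break bias: EqCM", eqcm_err.mean(), " RW", rw_err.mean())
print("post-break MSFE: EqCM", np.mean(eqcm_err**2), " RW", np.mean(rw_err**2))
```

Under these assumed settings, the equilibrium-correction forecasts have an expected post-break error of (1 - RHO) * (MU_NEW - MU_OLD) = 2.5 in every period, whereas the random-walk errors average close to zero once the data have settled around the new mean.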
David's initial collaborations with Mike Clements, however, examined mean square forecast errors (MSFEs), a standard tool for comparing forecasts from different models.
Clements and Hendry (1993, 1995) questioned their value and generated considerable controversy: the discussants' published comments on Clements and Hendry (1993) are longer than the paper itself. Cointegration was the origin of these two papers.
At its inception in the early 1980s, cointegration had demonstrated many real advantages: in modelling, in economic understanding, and in interpretation. Engle and Yoo (1987) then discovered that imposing cointegration significantly improved forecasts in terms of MSFEs. This result seemed to show yet further value from cointegration, namely in forecasting. Clements and Hendry (1995) replicated Engle and Yoo's Monte Carlo experiments and found that, to the contrary, imposing cointegration did not appear to reduce MSFEs. This discrepancy in results arose because Engle and Yoo (1987) had calculated MSFEs for the variables' levels whereas Clements and Hendry (1995) had calculated MSFEs for the cointegrating combination. Inadvertently, Clements and Hendry (1995) had discovered that data transformations affected MSFEs. Additionally, rankings across models often depended more on the choice of data transformation than on whether or not cointegration was imposed, or even on whether the model included the equilibrium correction term.
Clements and Hendry (1993) formalised algebraically these properties of MSFEs. The ranking of different models' forecasts could alter, depending upon whether and how the variables being forecast were transformed. Ericsson (2008) illustrated this problem by comparing forecasts in levels and forecasts in differences for two models of crude oil spot prices. For forecasts of the level of oil prices, the MSFE for the first model was more than four times that for the second model. However, for forecasts of the change of oil prices, the MSFE for the first model was less than half that for the second model. Thus, a simple transformation of the variable being forecast altered the MSFE ranking of the models, with no change to the models, to the forecasts, or to the underlying data.
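A toy calculation shows how such a ranking switch can arise. The sketch below uses made-up numbers, not the oil price data in Ericsson (2008), and assumes multi-step forecasts of four consecutive periods from a single known forecast origin. The function names `msfe` and `changes` and the two "models" are hypothetical.

```python
import numpy as np


def msfe(actual, forecast):
    """Mean square forecast error over the forecast horizons."""
    return float(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))


def changes(levels, origin):
    """Period-to-period changes implied by a path of levels from a known origin."""
    return np.diff(np.concatenate(([origin], levels)))


# Made-up data: the last observed level and the next four outcomes.
y_origin = 10.0
actual = np.array([11.0, 12.0, 13.0, 14.0])

# Hypothetical model A: level forecasts are always 2 too low,
# so it misses the level but tracks every change after the first.
forecast_a = actual - 2.0
# Hypothetical model B: level forecasts alternate 1 too low and 1 too high,
# so it is close in levels but erratic in changes.
forecast_b = actual + np.array([-1.0, 1.0, -1.0, 1.0])

# The same forecasts, evaluated in levels...
levels_a = msfe(actual, forecast_a)   # 4.0
levels_b = msfe(actual, forecast_b)   # 1.0

# ...and evaluated in changes, with no alteration to models or data.
d_actual = changes(actual, y_origin)
diffs_a = msfe(d_actual, changes(forecast_a, y_origin))   # 1.0
diffs_b = msfe(d_actual, changes(forecast_b, y_origin))   # 3.25

print("levels : A", levels_a, " B", levels_b)   # B ranks first
print("changes: A", diffs_a, " B", diffs_b)     # A ranks first
```

The multi-step design matters for this illustration: with one-step-ahead forecasts from a known origin, the error in forecasting the level and the error in forecasting the change are identical, so no switch could occur.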
Furthermore, the oil price example illustrated that, for a given model, the MSFE was not invariant to the transformation from levels to differences. Clements and Hendry (1993) showed that MSFEs lack robustness when the data are transformed, when forecasts are multivariate, and when forecasts are multi-step ahead. All three situations are common in economics.
Clements and Hendry (1993) also showed that useful comparison of MSFEs required highly restrictive assumptions about the forecasts, namely that the forecasts must be of a single specific variable just one step ahead. Data transformations, multivariate forecasts, and multi-step-ahead forecasts are all outside that limited structure because they imply a vector of forecasts. Clements and Hendry (1993) discussed how the predictive likelihood generalises the MSFE for a vector of forecasts. Moreover, predictive likelihood is the only direction-invariant measure, as it does not depend on nonsingular linear scale-preserving transformations of the system. Even so, predictive likelihood has not been used much for forecast evaluation. Wallis (1993) pioneered its use, but its practical implementation was hindered because its calculation seemed to require having sufficient observations on all the multistep-ahead forecast errors in order to estimate their variance-covariance matrix. Results in Abadir, Distaso and Zikes (2014) encouraged David to revisit predictive likelihood in Hendry and Martinez (2017), where they show that one can evaluate multi-step-ahead system forecasts with relatively few forecast errors. Explicit loss functions also have come back into favour, as in Granger (2001) and Barendse and Patton (2019).
Because MSFEs are widely used for comparing forecasts, David and Mike became interested in the forecasting competitions organised by Spyros Makridakis, the current one at that time being the M3 competition, hosted by the International Journal of Forecasting. Many different time series were divided into subperiods, each of which was then forecast by many methods, albeit usually only one step ahead.
Various evaluation criteria were applied to each forecasting device on each dataset to find which methods had the best ex post forecast performance as measured by the chosen criteria. Those methods with the best forecast performance then “won” the competition. Because parsimonious methods such as damped trend often did well, whereas less parsimonious methods such as econometric models often did poorly, Makridakis and Hibon (2000) concluded that parsimony was key to good forecast performance.
David could not understand why parsimony per se should make models do so well at forecasting. After all, the sample mean of a variable’s level is parsimonious, but it is often a dreadful forecast of the variable’s future values. To understand the empirical results in the M3 competition and, more generally, to help interpret the problems that arise in economic forecasting, David and Mike developed a general analytical framework that describes a taxonomy for forecast errors. Initially, David and Mike solved the taxonomy for vector autoregressive models and simple time-series models. More recently, David has considered open dynamic simultaneous systems and nonlinear formulations.
The taxonomy delineates all possible sources of forecast error—nine sources in total. These sources derive from the three components of a model:
(1) Unobserved terms,
(2) Observed stochastic variables, and
(3) Deterministic terms.
The first component is what the model fails to explain, and it thus includes mismeasurement of the data at the forecast origin, omitted variables, and the innovation errors in the DGP. The second and third components characterise what is modelled, and they often correspond to the slope parameter and the equilibrium mean. Each of the model’s three components is itself subject to three potential problems:
(a) Estimation uncertainty,
(b) Misspecification, and
(c) Change in the DGP’s parameter values,
leading to a 3×3 array of possibilities and implying nine sources of forecast error.
The taxonomy has immediate implications: the consequences of forecast error depend on the sources of forecast error, and the taxonomy allows deriving the effects of each source for a given forecasting device. For instance, the combination (3)+(c) is an out-of-sample structural break involving deterministic terms, as with a change in the equilibrium mean. For equilibrium correction models, that particular combination results in systematic misforecasting. That problem is fundamental, pernicious, and common in economic forecasting. Such predictive failure due to a location shift is easily detected because it induces forecast bias and increases the MSFE, noting that the MSFE includes the squared shift in the mean. Other sources of forecast error can degrade forecast performance as well, but they are often harder to detect and have more benign effects. If forecast errors arise from multiple sources, interactions between sources may also matter.
More generally, the taxonomy reveals which sources of forecast error most affect each forecasting method, thus clarifying why some methods outperform or underperform others, and when. For intermittent location shifts, all methods misforecast at the break. However, after the breakpoint, methods that are not robust to such breaks tend to make systematic forecast errors, whereas robust methods get the forecasts back on track; see Hendry and Doornik (1997).
The taxonomy also shows that rankings of forecasts should not depend particularly on the number of parameters in either the model or the DGP, whereas the rankings do depend on the robustness of the forecasting devices to structural breaks. The design of forecast competitions such as M3 happened to favour robust devices by having many short forecasting subperiods with intermittent location shifts in the data, thus giving the impression that parsimony per se was advantageous in forecasting. Clements and Hendry (2001) showed that many of the key empirical results in the M3 competition were derivable from the taxonomy of forecast errors. Clements and Hendry (1994, 1998b, 1999a, 2006) give comprehensive derivations and analyses of the taxonomy.
One major insight about forecasting came during a seminar in which David was explaining a very early version of the taxonomy. David noticed that the change in the slope coefficient [(2)+(c) above] was multiplied by the deviation of the data at the forecast origin from the data's equilibrium mean. Consequently, if forecasting happened to start when the data were in equilibrium, changes in the slope parameter would not affect the forecast errors. Indeed, if the mean of the data stayed constant and the forecast origin were accurately measured, forecasts would not be systematically biased—even if all the other problems were present. Conversely, out-of-sample location shifts would systematically bias the forecasts, even if the forecast model were the in-sample DGP itself. That realisation in the middle of the seminar astonished David as much as the seminar participants!
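The algebra behind that insight can be sketched in the simplest case. The derivation below is an illustration under stated assumptions rather than the taxonomy itself: a scalar first-order autoregression, in-sample parameters known exactly (so no estimation uncertainty or misspecification), an accurately measured forecast origin, and a one-step-ahead forecast. The symbols \mu, \rho, \mu^{*}, and \rho^{*} are notation introduced here for the pre-break and post-break equilibrium mean and slope.

```latex
% Assumed setting: scalar AR(1); (\mu, \rho) hold in-sample and
% (\mu^{*}, \rho^{*}) hold after the forecast origin T.
\begin{align*}
\text{In-sample DGP:} \quad
  y_t &= \mu + \rho\,(y_{t-1} - \mu) + \varepsilon_t , \\
\text{Forecast from origin } T\text{:} \quad
  \hat{y}_{T+1|T} &= \mu + \rho\,(y_T - \mu) , \\
\text{Post-break outcome:} \quad
  y_{T+1} &= \mu^{*} + \rho^{*}(y_T - \mu^{*}) + \varepsilon_{T+1} , \\
\text{Forecast error:} \quad
  y_{T+1} - \hat{y}_{T+1|T}
    &= (1 - \rho^{*})(\mu^{*} - \mu)
     + (\rho^{*} - \rho)(y_T - \mu)
     + \varepsilon_{T+1} .
\end{align*}
```

The slope change (\rho^{*} - \rho) enters only through its product with (y_T - \mu), the deviation of the forecast origin from the in-sample equilibrium mean, so it vanishes when forecasting starts in equilibrium. Because that deviation has expectation zero, the unconditional forecast bias in this sketch is (1 - \rho^{*})(\mu^{*} - \mu), which is nonzero only if the equilibrium mean shifts.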
Hendry and Mizon (2000a, b) found additional implications of the taxonomy: the best explanatory model need not be the best for forecasting, and the best policy model could conceivably be different from both. Some structural breaks—such as shifts in equilibrium means—are inimical to forecasts from econometric models but not from robust forecasting devices, which themselves may well not explain behaviour. However, such shifts need not affect the relevant policy derivatives. For example, the effect of interest rates on consumers' expenditure could be constant, despite a shift in the target level of savings due to (say) changed government provisions for health care in old age. After the shift, altering the interest rate still could have the expected policy effect, even though the econometric model misforecasted. Because econometric models can be robustified against such forecast failures, it may prove possible to use the same baseline causal econometric model for forecasting and for policy.
This analytical framework represents considerable progress in developing a general theory of forecasting. It does not assume how the model is estimated, how badly misspecified it is, or what changes occur in the economy. Many aspects still need more research, though, including how to forecast breaks, how to best select forecasting models for realistic economic processes, and how to improve forecasts—the next topic.
4.3