Empirical Model Discovery and Theory Evaluation

Hendry (1995) laid the framework for empirical model evaluation and design, and Banerjee, Dolado, Galbraith and Hendry (1993) provided the statistical framework for dealing with cointegration.

However, the actual construction of a model by manually simplifying from general to simple was tedious, time-consuming, and error-prone, not least because there were often many simplification paths to follow. David’s initial empirical studies (of consumers’ expenditure, money demand, and the mortgage and housing markets) highlighted those challenges and difficulties; see Section 5. A twofold serendipity for David led to remarkable breakthroughs in empirical modelling. First, general-to-specific modelling could be automated in computer software with machine learning. Second, the number of potential variables being considered could be more than the number of observations. Hendry and Doornik’s (2014) book Empirical Model Discovery and Theory Evaluation provides the theoretical, statistical, computational, and empirical basis that integrates those breakthroughs.

The first serendipity occurred at a Carnegie-Rochester conference in November 1996. David was the discussant of Faust and Whiteman (1997), who critiqued the Hendry approach to modelling, with David’s formal reply published as Hendry (1997). One of the conference participants was Kevin Hoover, who knew David from Oxford when he (Kevin) was writing his DPhil at Nuffield College in the early 1980s. Over drinks, Kevin expressed scepticism about general-to-specific modelling, with David pointing to the success of his and others’ various empirical modelling efforts. After the conference, Kevin and his student Stephen Perez set out to scientifically challenge David’s claim by constructing a computer-based simulacrum of what general-to-specific modellers did in practice, focusing on path search and diagnostic testing.

Much to Kevin’s surprise, the simulacrum worked well—phenomenally well in fact—and well beyond even David’s own hopes and expectations; see Hoover and Perez (1999).

David immediately saw the potential of this computer-automated approach that employed machine learning. David and his colleague Hans-Martin Krolzig built on Kevin and Stephen’s achievement, developing the econometrics package PcGets (“Gets” for “general to specific”). Subsequently, David and Jurgen Doornik embedded and enhanced that modelling approach directly in their econometrics package PcGive as the routine Autometrics; see Section 3 for further details.

The second serendipity arose through Jan Magnus and Mary Morgan's (1999) econometric modelling competition, in which they invited researchers to analyse two datasets, following different modelling approaches. One dataset was of the US demand for food from 1929 to 1989, building on Tobin's (1950) empirical analysis through 1948. Most investigators discarded the data for the inter-war period and for the Second World War as being too difficult to model. For example, a standard demand model fitted over the whole sample delivered positive price elasticities.

David was a late entrant in the competition, serving as discussant to Siegert (1999), who had analysed the data acting “as if” he were David. In David's follow-up, published as Hendry (1999) and to be reprinted in Ericsson (2021), David aimed to replicate Siegert's and others' findings for the postwar subsample while actually using the whole sample. After all, more data should be better than less, if used in the right way. David thus estimated a given model over the whole sample, including indicator variables (one-off dummy variables for individual observations) for all observations up to the beginning of the post-war period. Several of those indicator variables were highly significant. Three were associated with a food programme in the USA during the Great Depression. Unsurprisingly, the food programme affected the demand for food.

The other significant indicator variables were for years during the Second World War.

David then reversed the whole procedure, estimating the model over the whole sample but including indicators for the post-war period. That was equivalent to estimating the model over the first part of the sample. A few post-war indicators were marginally significant, as the corresponding Chow test revealed.

Finally, David estimated the model over the whole sample, including the indicators selected in the two subsample estimations. Of those indicators, only those for the food programme and the Second World War were significant, and they had clear economic explanations. By including just those indicators, the whole sample could be adequately captured by a single model. The large data variability during the inter-war period and the Second World War also greatly reduced the estimated economic parameters' standard errors relative to those in the same model estimated on the post-war period alone.

In the process, David had included an indicator for every observation, albeit in two large blocks. Model selection could therefore handle more potential variables than there are observations, something previously believed to be impossible, both theoretically and empirically. All indicators could be considered; the key realisation was that they need not all be included at once.
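The two-block idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy, simulated data with two planted outliers, and a fixed critical value of 2.58 (roughly the 1% two-sided normal value). The function names `ols_tstats` and `split_half_iis` are hypothetical and are not taken from Hendry (1999) or from any econometrics package.

```python
import numpy as np

def ols_tstats(y, X):
    """Return OLS coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def split_half_iis(y, X, crit=2.58):
    """Stylised split-half impulse indicator saturation (illustrative only)."""
    T, k = X.shape
    eye = np.eye(T)                      # one impulse indicator per observation
    halves = [np.arange(0, T // 2), np.arange(T // 2, T)]
    retained = []
    for block in halves:
        # Add indicators for one half only, so regressors < observations.
        _, t = ols_tstats(y, np.hstack([X, eye[:, block]]))
        retained.extend(block[np.abs(t[k:]) > crit])
    retained = np.array(sorted(retained), dtype=int)
    # Final stage: re-estimate with the union of retained indicators.
    _, t = ols_tstats(y, np.hstack([X, eye[:, retained]]))
    return retained[np.abs(t[k:]) > crit]

# Simulated data (an assumption for illustration): two large outliers.
rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.5 * x + rng.normal(size=T)
y[10] += 10.0
y[70] -= 10.0

detected = [int(i) for i in split_half_iis(y, X)]
print(detected)
```

Each stage estimates fewer parameters than there are observations, even though across the two stages an indicator has been tried for every observation.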

There are precursors to this approach in the literature. For reference, the canonical case for this problem in model selection is impulse indicator saturation (IIS), in which the set of candidate explanatory variables includes a dummy variable for each observation. The solution to this canonical case is implicit in several existing techniques. For instance, as Salkever (1976) shows, the Chow (1960) statistic for testing predictive failure can be calculated by including zero-one indicator variables for all observations in the forecast period and then testing those indicators’ joint significance. Recursive estimation is another example.

Its “forward” version can be calculated by estimating the model, including an indicator variable for every observation in the latter part of the sample, and then sequentially removing the indicators, one indicator at a time. Both forward and backward versions of recursive estimation can be calculated in this fashion. Together, they require indicators for all observations in the sample and thus analyse as many potential variables as there are observations. Andrews’ (1993) unknown breakpoint test and Bai and Perron’s (1998) generalisation thereon are also interpretable as specific algorithmic implementations of saturation techniques.
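The Salkever (1976) equivalence mentioned above can be checked numerically. The sketch below assumes NumPy and simulated data; the variable names and sample sizes are illustrative. It computes the Chow predictive-failure statistic from two separate regressions and compares it with the joint F-statistic on zero-one indicators for the forecast period.

```python
import numpy as np

def fit(y, X):
    """Return OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(1)
T, H, k = 60, 10, 2              # full sample, forecast period, regressors
T1 = T - H                       # estimation subsample
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.5 * x + rng.normal(size=T)

# (a) Chow statistic: full-sample fit versus fit on the first T1 observations.
_, rss_full = fit(y, X)
beta_sub, rss_sub = fit(y[:T1], X[:T1])
chow = ((rss_full - rss_sub) / H) / (rss_sub / (T1 - k))

# (b) Same statistic via indicators for every forecast-period observation.
D = np.eye(T)[:, T1:]
beta_ind, rss_ind = fit(y, np.hstack([X, D]))
f_indicators = ((rss_full - rss_ind) / H) / (rss_ind / (T - k - H))

print(chow, f_indicators)
```

In this construction the coefficients on the original regressors equal the subsample estimates, and the indicator coefficients equal the forecast errors, which is why saturating a block with indicators amounts to estimating over the remaining observations.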

To understand IIS’s properties, Hendry, Johansen and Santos (2008) considered a stylised version of IIS with a split-half sample, similar to what David undertook empirically in Hendry (1999). Under the null hypothesis that there are no outliers or breaks in the DGP, IIS incurs only a small loss of efficiency. For example, for a sample size of 100, on average one impulse indicator out of the 100 total would be significant at the 1% significance level. Because an impulse indicator merely removes one observation from the sample, the method is 99% efficient under the null hypothesis. IIS is almost costless, despite searching across 100 indicators.
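The arithmetic behind that example can be written out as follows, where the symbols α (the nominal significance level) and T (the sample size) are notation introduced here for illustration rather than taken from the cited papers:

```latex
\[
\mathbb{E}\bigl[\text{indicators retained under the null}\bigr]
  \approx \alpha T = 0.01 \times 100 = 1,
\qquad
\text{efficiency} \approx \frac{T - \alpha T}{T} = 1 - \alpha = 0.99 .
\]
```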

Under the alternative hypothesis, IIS can detect multiple outliers and location shifts (aka structural breaks). Castle, Doornik and Hendry (2012) demonstrate high power for multiple location shifts that are “large enough”. Importantly, IIS can detect breaks that are near or at the ends of the sample. That circumvents an implicit shortcoming of the Andrews and Bai-Perron procedures. Johansen and Nielsen (2009) generalise the theory of IIS to include autoregressive distributed-lag models with or without unit roots and prove that IIS does not affect the rate of convergence of other parameter estimates to their population values.

IIS adds blocks of dummies to estimation and model selection. IIS can consider many blocks, thereby allowing many different alternatives to be considered. This feature of IIS has remarkable implications. Under the null hypothesis, an indicator for a given observation is significant only if it is discrepant. Its significance does not depend particularly on how or how often the indicators are split into blocks, provided that the blocks are large and that multiple search paths are explored.

The alternative hypothesis of multiple unmodelled breaks or outliers is equally important. For ease of discussion, assume two outliers. Detection of one outlier (the first, say) can be difficult unless the other outlier is accounted for. Failing to include that second outlier in the model induces a larger estimated error variance, making the first outlier appear less significant than it actually is. Hence, there is a need to include sufficient indicators to capture all actual outliers.
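The masking effect in the two-outlier case can be illustrated with a small simulation. The sketch below assumes NumPy, a constant-only model, and two planted outliers of six error standard deviations each; the helper `indicator_t` is a hypothetical name introduced for this example.

```python
import numpy as np

def indicator_t(y, X, obs, others=()):
    """t-statistic on the impulse indicator for `obs`,
    optionally including indicators for the observations in `others`."""
    T, k = X.shape
    Z = np.hstack([X, np.eye(T)[:, [obs, *others]]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (T - Z.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[k, k])
    return beta[k] / se

# Simulated data (an assumption for illustration): outliers at 10 and 30.
rng = np.random.default_rng(2)
T = 50
X = np.ones((T, 1))
y = 1.0 + rng.normal(size=T)
y[10] += 6.0
y[30] += 6.0

t_alone = indicator_t(y, X, 10)               # second outlier left unmodelled
t_joint = indicator_t(y, X, 10, others=[30])  # second outlier also dummied
print(t_alone, t_joint)
```

Leaving the second outlier unmodelled inflates the estimated error variance, so the t-statistic on the first indicator is smaller than when both indicators are included.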

Hoover and Perez (1999) showed the advantages of multiple-path contracting searches that are guided by encompassing evaluations. Moreover, the block-search algorithm can be generalised to include candidate variables such as standard economic variables, and not just impulse indicators. Purely contracting searches are not always possible, but the principle of examining many large blocks remains. Blocks help avoid inadvertently eliminating variables that are correlated with already selected variables, and blocks help detect effects that are camouflaged by breaks.

Block searches allow selecting jointly across lag length, functional form, relevant variables, and breaks, even when doing so implies that the number of candidate variables is greater than the number of observations. Such block searches can still be implemented, so long as the number of variables in each block is smaller than the sample size. Block searches can be iterated—and with changing composition—to allow many alternatives to be considered. Under the null, estimates of the parameters of interest are still relatively efficient.
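A simplified sketch of a block search with more candidate variables than observations follows. It is not Autometrics, which uses multiple-path searches and diagnostic testing; it is a one-pass illustration assuming NumPy, simulated data, a fixed critical value of 2.58, and hypothetical function names (`tstats`, `block_search`).

```python
import numpy as np

def tstats(y, X):
    """Return the OLS t-statistics for each column of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

def block_search(y, Z, n_blocks, crit=2.58):
    """Select candidates block by block, then re-test the union of survivors."""
    T, N = Z.shape
    const = np.ones((T, 1))
    keep = []
    for block in np.array_split(np.arange(N), n_blocks):
        assert len(block) + 1 < T        # each block must fit in the sample
        t = tstats(y, np.hstack([const, Z[:, block]]))
        keep.extend(block[np.abs(t[1:]) > crit])
    keep = np.array(sorted(keep), dtype=int)
    assert len(keep) + 1 < T             # the union must also fit
    t = tstats(y, np.hstack([const, Z[:, keep]]))
    return keep[np.abs(t[1:]) > crit]

# Simulated data (an assumption for illustration): 150 candidates, 100
# observations, and only two candidates that actually matter.
rng = np.random.default_rng(3)
T, N = 100, 150
Z = rng.normal(size=(T, N))
relevant = [5, 140]
y = 1.0 + 5.0 * Z[:, 5] - 5.0 * Z[:, 140] + rng.normal(size=T)

selected = [int(j) for j in block_search(y, Z, n_blocks=6)]
print(selected)
```

A fuller implementation would iterate the search with different block compositions, as the text describes, so that a variable's retention does not hinge on one particular partition.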

Under the alternative, it is particularly important to consider all of these complications jointly because they are likely to be connected. As with cointegration, proofs of distributional results involve additional mathematics, such as an iterated one-step approximation to the Huber-skip estimator; see Johansen and Nielsen (2013, 2016).

Other procedures tend to address just one or a few issues, rather than all of them at once. Nonparametric statistics can determine functional form but, in so doing, assume constant parameters, accurate measurements, and inclusion of all relevant variables. Robust statistics can tackle contaminated data but assume an otherwise correct specification. Step-wise regression and Lasso may easily detect a single omitted variable but can fail badly under multiple misspecifications. Those techniques lack a mechanism that ensures capturing all relevant outliers and breaks. The block-search approach aims at considering all complications together. As Hendry and Johansen (2015) show, it can do so without distorting the distribution of the parameter estimates of a correct theory-specified model. In yet another moment of serendipity, David and Søren discovered this result while trying to prove something else.

Hendry and Doornik (2014) thus integrate the computer-automated model selection approach launched by Hoover and Perez (1999) and the IIS technique formulated in Hendry (1999), enhancing and generalising both. Hendry and Doornik (2014) document that automated approaches such as Autometrics avoid the pernicious properties of many earlier approaches, which employed poor algorithms and inappropriate criteria for model selection and evaluation. Whether starting from a large model that nests the DGP or from a model that is the DGP itself, model search à la Autometrics retains roughly the same relevant variables, and it obtains a controlled average number of irrelevant variables. Hendry (2015) and Castle and Hendry (2019) show at an intuitive level how these tools are accessible for empirical macroeconometric modelling of economic time series, illustrating with equations for wages, prices, unemployment, and money demand in the UK.


Source: Cord, Robert A. (ed.). The Palgrave Companion to Oxford Economics. Palgrave Macmillan, 2021. 819 p.
