Monday, November 14, 2011

How to Spot Overfitting

FRBSF Economic Letter: Probability of a Recession vs. Actual Recession Dates

You can teach statistics, but unfortunately you can't teach people not to overfit data. The problem is that it is too tempting to look at some data and keep trying different inputs and functional forms until something fits. In some sense, fitting the data is what a good model does, so the objective, maximizing the R², is encouraged.
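To make the temptation concrete, here is a minimal sketch of a specification search, assuming nothing but random noise: the target and every candidate input are independent draws, so no candidate has real predictive power. The function names, sample size, and number of candidates are my own illustrative choices, not anything from the Fed letter.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """1 - SSE/SST for a given line; can be negative on data the line was not fit to."""
    mean_y = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst


def specification_search(n_obs=40, n_candidates=200, seed=0):
    """Pick the best-fitting noise input on the first half, then score it on the second half."""
    rng = random.Random(seed)
    half = n_obs // 2
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # target: pure noise
    candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

    fits = [fit_line(x[:half], y[:half]) for x in candidates]
    in_sample = [r_squared(x[:half], y[:half], *f) for x, f in zip(candidates, fits)]

    best = max(range(n_candidates), key=lambda i: in_sample[i])
    holdout = r_squared(candidates[best][half:], y[half:], *fits[best])
    return in_sample[best], sum(in_sample) / n_candidates, holdout


if __name__ == "__main__":
    best_r2, mean_r2, holdout_r2 = specification_search()
    print(f"best in-sample R2:    {best_r2:.3f}")
    print(f"average in-sample R2: {mean_r2:.3f}")
    print(f"holdout R2 of winner: {holdout_r2:.3f}")
```

The winner's in-sample R² is by construction at least as high as the average candidate's, and only the person who ran the search knows how many candidates were tried. The holdout number is the one an outsider would want to see.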

But there's no point in fooling yourself; it just wastes time. Only the researcher really knows whether they overfit the problem, because outsiders don't know how the final functional form was chosen (by iterating over a large set of inputs?). Macro forecasting is especially difficult, and anyone familiar with its history would do well to be modest (see here). The above graph from some San Francisco Fed researchers is clearly overfit: the base recession rate has been about 16% since 1945, so the average forecast should be around 16%, not jumping from 0 to 100%. A forecast should cluster around the unconditional expectation, not the extremes. Only with hindsight do these kinds of forecasts make sense.
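A rough way to see why extremes are risky is to score forecasts against outcomes. The sketch below uses a made-up sample of 100 periods with 16 recessions (matching the 16% base rate, but otherwise hypothetical and not the Fed's data) and compares a forecaster who always says 16% with one who says 0% or 100% and times only a quarter of the recessions correctly. The Brier score is just the mean squared error of the probability forecast; lower is better.

```python
def calibration_summary(forecasts, outcomes):
    """Return (mean forecast, realized base rate, Brier score)."""
    n = len(forecasts)
    if n == 0 or n != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and the same length")
    mean_forecast = sum(forecasts) / n
    base_rate = sum(outcomes) / n
    brier = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / n
    return mean_forecast, base_rate, brier


# Hypothetical sample: 16 recession periods followed by 84 expansion periods.
outcomes = [1] * 16 + [0] * 84

# Forecaster A: always forecasts the unconditional rate.
constant = [0.16] * 100

# Forecaster B: only 0% or 100%; catches 4 recessions, misses 12, and raises 12 false alarms.
extreme = [1] * 4 + [0] * 12 + [1] * 12 + [0] * 72

constant_summary = calibration_summary(constant, outcomes)
extreme_summary = calibration_summary(extreme, outcomes)

if __name__ == "__main__":
    print("constant:", constant_summary)
    print("extreme: ", extreme_summary)
```

Both forecasters average 16%, so matching the base rate on average is not enough by itself. The 0-or-100% forecaster only beats the constant one if the timing is right far more often than in this made-up example, which is exactly what is easy in hindsight and hard in real time.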
