Saturday, June 16, 2007

The Value of Home Field Advantage Part I

So let's go back to basics. Really basic stuff.

How much is home field advantage actually worth? It's a pretty straightforward question, but there are several ways to tackle it. From 1994-2006, home field advantage was worth about 2.6362 points on average, but the standard deviation of the results was about 14.0780 points. 69.91% of the games fell within one standard deviation of the mean; in other words, those games landed within about two touchdowns either way of the average result. As a side note, the outcomes are roughly normally distributed, which is one reason linear regression is a reasonable way to try predicting future outcomes. So home field advantage matters, on average, to a small extent. Home teams win 58.81% of games. Between two teams very close in talent, go for the home team, but if one team is clearly better than the other, you're better off picking the better team. That's pretty much the conventional wisdom, isn't it?
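As a rough sanity check on the normal-distribution remark, here is a minimal sketch that treats the home team's margin as exactly normal with the mean and standard deviation quoted above. That is an assumption for illustration only (real scores are discrete and ties are ignored), and the helper name `normal_cdf` is my own. It computes the share of games a normal curve would put within one standard deviation, and the share of games the home team would win, so they can be compared with the observed 69.91% and 58.81%.

```python
import math

MEAN_MARGIN = 2.6362   # average home margin, 1994-2006 (from the post)
SD_MARGIN = 14.0780    # standard deviation of the home margin (from the post)


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Share of games within one standard deviation of the mean under the model.
within_one_sd = (
    normal_cdf(MEAN_MARGIN + SD_MARGIN, MEAN_MARGIN, SD_MARGIN)
    - normal_cdf(MEAN_MARGIN - SD_MARGIN, MEAN_MARGIN, SD_MARGIN)
)

# Share of games with a positive home margin under the model (ties ignored).
home_win_prob = 1.0 - normal_cdf(0.0, MEAN_MARGIN, SD_MARGIN)

print(f"within one SD: {within_one_sd:.2%}")
print(f"home team wins: {home_win_prob:.2%}")
```

Both model figures come out in the same neighborhood as the observed ones, which is about all a check this crude can show.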


So how does a predictor like the spread do in terms of valuing home field advantage? The average spread from 1998-2006 was -2.5346, very close in size to the actual average. The standard deviation, however, is only 5.7783. The extreme outcomes (games decided by 20+ points) have an 18.515% chance of occurring. There's very little statistical incentive to predict such large wins, even though they happen more often than you might expect. In the original research, I ran a few experiments to classify games as big wins or close wins for either team, and the more classes I introduced, the worse classification accuracy became. For the 4-class problem, 28-32% accuracy was the best I could do, only modestly better than the 25% you would get by guessing among four equally likely classes. The prediction systems play the odds and thus have a tighter range of margins than the actual outcomes. Tightening the bounds of the actual range within the training data does not help the accuracy of the prediction systems I've implemented.
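To put the gap between actual margins and spreads in perspective, here is a small sketch comparing how often a 20+ point margin would show up under two normal curves: one using the actual mean and standard deviation, the other using the spread's. Treating both as normal is my simplifying assumption, the two sets of figures cover slightly different year ranges, and the helper names are made up for this example.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_margin_at_least(threshold, mean, sd):
    """P(|X| >= threshold): a win of that size by either team."""
    home_blowout = 1.0 - normal_cdf(threshold, mean, sd)
    away_blowout = normal_cdf(-threshold, mean, sd)
    return home_blowout + away_blowout


# Actual results: home margin, positive when the home team wins (1994-2006).
actual_blowouts = prob_margin_at_least(20.0, 2.6362, 14.0780)

# Spreads: negative when the home team is favored (1998-2006).
spread_blowouts = prob_margin_at_least(20.0, -2.5346, 5.7783)

print(f"20+ point results under the normal model: {actual_blowouts:.2%}")
print(f"20+ point spreads under the normal model: {spread_blowouts:.2%}")
```

Under these assumptions the spread almost never calls for a 20+ point win, while the model of actual results produces them fairly often, though still somewhat less often than the observed 18.515%.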

What's interesting to note is that the value of home field advantage fluctuates a fair amount from year to year. It reached a peak in 2005 and a deep, deep valley in 2006, which caused the accuracy of the spread and my prediction systems to fluctuate similarly, particularly in 2006. The tables below lay out all the specific numbers.

| Season | Average Actual Result | Average Spread* | Proportion of games won by home team | Proportion of games home team was favorite |
|---|---|---|---|---|
| OVERALL** | 2.6362 | -2.5346 | 58.51% | 66.859% |
| 1998 | 3.5042 | -2.4479 | 62.917% | 67.083% |
| 1999 | 3.0645 | -2.3911 | 59.677% | 64.516% |
| 2000*** | 2.8226 | -2.6149 | 57.447% | 68.511% |
| 2001 | 2.0444 | -2.2641 | 55.645% | 65.323% |
| 2002 | 2.2461 | -2.2637 | 57.813% | 64.844% |
| 2003 | 3.5313 | -2.5586 | 61.328% | 70.313% |
| 2004 | 2.5078 | -2.5371 | 56.641% | 66.406% |
| 2005 | 3.6484 | -2.6309 | 58.984% | 67.969% |
| 2006 | 0.84766 | -2.8027 | 53.125% | 66.797% |

| Season | Average Actual Margin of Victory | Average Margin of Victory Predicted by Spread | Proportion of games won by favorite | Proportion of games in which favorite beat the spread |
|---|---|---|---|---|
| OVERALL** | 11.471 | 5.3498 | 65.352% | 48.263% |
| 1998 | 11.504 | 5.7313 | 70.00% | 52.917% |
| 1999 | 11.355 | 5.5565 | 65.726% | 50.403% |
| 2000*** | 11.798 | 5.8234 | 64.682% | 45.957% |
| 2001 | 11.077 | 5.1593 | 65.323% | 49.194% |
| 2002 | 11.105 | 4.9355 | 62.109% | 49.219% |
| 2003 | 11.914 | 4.9792 | 67.13% | 51.389% |
| 2004 | 11.367 | 5.127 | 62.891% | 44.922% |
| 2005 | 11.688 | 5.416 | 72.656% | 48.047% |
| 2006 | 11.426 | 5.5215 | 58.594% | 46.094% |


* Spread is negative when home team is favored.
** Spread covers 1998-2006, but the averages for actual outcomes are from 1994-2006.
*** Spreads for week 4 of 2000 could not be found and are not included in the 2000 spread stats.
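One way to check whether the favorite's win rate really moves with the value of home field advantage is to correlate the yearly columns from the two tables. The sketch below does that with a plain Pearson correlation. The numbers are copied from the 1998-2006 rows above, the function name `pearson_r` is my own, and with only nine seasons the result should be read as suggestive at best.

```python
import math

# Yearly figures for 1998-2006, copied from the tables above.
avg_home_margin = [3.5042, 3.0645, 2.8226, 2.0444, 2.2461,
                   3.5313, 2.5078, 3.6484, 0.84766]
favorite_win_pct = [70.00, 65.726, 64.682, 65.323, 62.109,
                    67.13, 62.891, 72.656, 58.594]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(avg_home_margin, favorite_win_pct)
print(f"correlation between home margin and favorite win rate: {r:.3f}")
```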


Curiously, there's a correlation with the predictive performance of Football Outsiders' DVOA stats as well. In 2005, two-thirds of games were won by the team with the higher DVOA. In 2006, that number fell to 55.80%. Without more years of data, it's hard to say if this is just some natural aberration. But there was something unusual about 2006. Was it rule changes, a change in how rules are enforced, a change in stadiums or playing fields? Is there any way to account for the natural variance from year to year? For prediction systems like linear regression, one could alter the bias coefficient to reduce the bias toward home teams, but there's no guarantee that doing so will improve accuracy.
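To make the bias-coefficient idea concrete, here is a toy sketch of a one-variable linear predictor, home margin = bias + slope × strength gap, in which the bias plays the role of home field advantage. This is not my actual prediction system. The strength gaps and the slope of 1.0 are hypothetical values chosen for illustration, and only the two bias values come from the tables above.

```python
LONG_RUN_BIAS = 2.6362   # average home margin, 1994-2006 (from the tables)
BIAS_2006 = 0.84766      # average home margin in 2006 (from the tables)


def predict_home_margin(strength_gap, bias, slope=1.0):
    """Predicted home margin; strength_gap is home rating minus away rating.

    The rating scale and the default slope are hypothetical.
    """
    return bias + slope * strength_gap


def pick(strength_gap, bias, slope=1.0):
    """Pick the home team whenever the predicted home margin is positive."""
    return "home" if predict_home_margin(strength_gap, bias, slope) > 0 else "away"


# Hypothetical matchups: negative gaps mean the away team is stronger.
strength_gaps = [-7.0, -2.0, -1.0, 0.0, 3.0]

# Matchups where shrinking the bias changes which team gets picked.
changed = [g for g in strength_gaps
           if pick(g, LONG_RUN_BIAS) != pick(g, BIAS_2006)]
print(changed)
```

Only the matchups where the away team is slightly stronger flip from a home pick to an away pick, so changing the bias affects a narrow band of close games and nothing else. That is part of why it may not move overall accuracy much.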

In what ways are the spread and other prediction systems being inefficient in dealing with home-field advantage? One obvious place to start is the weather.
