Spent the first half of the day rafting. There were a couple of rapids, but it was rather tame. The food has been rather bland, I would say: plain meat, potatoes, and rice. I wouldn’t say the prices have been especially cheap either. The afternoon was spent climbing the fortress in Ollantaytambo, which was an interesting Inca ruin. The Incas sure seemed to love building steps and things into the sides of mountains.
Tomorrow morning we get geared up and start hiking the Inca trail that will end at Machu Picchu on Friday. That was the main reason for this trip. So no communication until Friday night at the earliest.
Things have gotten off to an OK start, but it has been a long day. I’m not good at sleeping on planes, so for the most part I have been awake for two days now. Everything was mostly smooth for me. We have white water rafting tomorrow, but I forgot to bring any shoes other than my hiking boots, which I can’t afford to get wet. I’ll just have to hope they allow you to go barefoot. We do have one serious problem: one girl has lost her passport. You need the passport at various checkpoints along the trail, not to mention for getting home again.
I’ve generally used this blog for two functions. Football related posts and travel related posts. It’s not football season so that must mean it is time for another vacation.
On Saturday I am leaving for Peru, where I will be hiking the Inca Trail to Machu Picchu. From Miami I’ll fly overnight to Lima and then on to Cusco on Sunday morning. We’ll tour some of the sites of the Urubamba Sacred Valley on our way to where we’ll stay in Urubamba. You need to acclimate to the altitude, so we don’t start the hike for a couple of days. The second day will be white water rafting and more of the Sacred Valley sites. Then we start the 4 day hike that ends at Machu Picchu. We’ll spend a final day in Cusco, the former capital of the Inca empire.
I’ve been climbing stairs in preparation for the last 8 months so I am looking forward to getting to the real thing. There certainly won’t be internet connections on the hike itself but I should be able to give updates from the hotels before and after as time allows.
During the off-season last year I was thinking about some discussions I had had with another person in the computer rankings world during the prior off-season. Basically, my belief has been that the Vegas line is unbiased and that which team covers the spread is therefore a random event. So it wouldn’t matter whether a computer system is 1 point away from the line or 7 points; each pick would be equally likely (roughly 50%) to be correct against the spread. People write to me asking for the numbers against the spread for subsets, for example when the computer predictions are 3 points or more from the line. My observation over the years has been that these cut points don’t really improve the numbers, so they are not as useful as people think they would be.
So I was thinking about ways to test this belief. Pondering that led me to look at the line itself. The line is approximately unbiased: if you look at the bias numbers, they are historically centered around zero. That in itself goes a long way towards explaining why it is hard to do much better than 50% betting against the line. But if you look at the absolute error, there appears to be a disconnect. The average bias is close to zero, while the average absolute error is 10-11 points. So even though the difference between the line and the actual score is close to zero on average, the line misses by 10 points or more in roughly half of all games. That is mind-blowing when you think about it. Shouldn’t that imply that there are a lot of opportunities to beat the spread? If the line is off by so much so often, why can’t someone or some system consistently find those holes? Personally, I think it goes back to my original theory: the difference between the line and the actual score is random and centered close to zero.
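To make the zero-bias, large-error picture concrete, here is a small simulation sketch. It assumes, purely for illustration, that the difference between the actual margin and the line is normally distributed with mean zero. The standard deviation of 13.3 points is an assumed value, picked only because it gives an average absolute error in the 10-11 point range; it is not fitted to any real games.

```python
import random
import statistics


def simulate_line_errors(n_games=100_000, sd=13.3, seed=0):
    """Simulate (actual home margin - line) as pure noise centered on zero.

    sd=13.3 is an assumed value, not an estimate from real games.
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sd) for _ in range(n_games)]
    bias = statistics.fmean(errors)
    mean_abs_error = statistics.fmean([abs(e) for e in errors])
    # The home side covers when the actual margin beats the line.
    cover_rate = sum(1 for e in errors if e > 0) / n_games
    return bias, mean_abs_error, cover_rate


bias, mean_abs_error, cover_rate = simulate_line_errors()
print(f"bias: {bias:+.2f}")
print(f"mean absolute error: {mean_abs_error:.2f}")
print(f"home cover rate: {cover_rate:.3f}")
```

Under these assumptions the line can miss by double digits on average while each side still covers about half the time, which is all the "random and centered on zero" theory requires.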
So I started to think of ways that you might at least be able to reduce this somewhat extreme variability, or random error. It seemed clear that the most likely candidate was turnovers, which are generally considered to be random bounces of the ball. Hmm, random error, random turnovers: sounds like there could be a connection. The problem was that I don’t collect individual game statistics, so I couldn’t investigate this idea. I eventually found a source of data so that I could look at the 2010 season. What I found was that the turnover margin in a game explained roughly 40% of the difference between the line and the actual outcome. I got very excited.
The 2011 preseason was about to start, so I needed to come up with a system that incorporated turnovers along with the scores. What I did was very simple: I added turnover margin as a new variable in the least squares regression model that I have been running for years. For these models the outcome is the score differential. The variables are a matrix of the games: for each game, the variable for the home team is equal to 1, the variable for the road team is equal to -1, and all other team variables are zero. To this I added the turnover margin for the game. I knew the results early in the season would be meaningless because this system is based only on games of the current year, so it could take some time before it became stable. When it did kick in, it really kicked in. As you can see from the NFL prediction tracker results page, this new system came in first place in 3 out of the 5 categories over the second half of the season.
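For anyone who wants to see the shape of that model, here is a minimal sketch in plain Python. It is not my production code: the teams, games, and function names are all made up for illustration, there is no intercept because the description above does not include one, and one team is pinned at a rating of zero as a reference (an assumption of this sketch, needed because the +1/-1 coding only determines ratings up to a constant).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ratings(games, teams):
    """Least squares fit of home margin on team indicators plus turnovers.

    games: list of (home, road, home_margin, home_turnover_margin).
    The last team in `teams` is the reference, with its rating fixed at 0.
    """
    index = {team: i for i, team in enumerate(teams[:-1])}
    k = len(index) + 1  # team columns plus one turnover column
    rows, y = [], []
    for home, road, margin, turnover_margin in games:
        row = [0.0] * k
        if home in index:
            row[index[home]] = 1.0   # home team coded +1
        if road in index:
            row[index[road]] = -1.0  # road team coded -1
        row[-1] = float(turnover_margin)
        rows.append(row)
        y.append(float(margin))
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    ratings = {team: beta[i] for team, i in index.items()}
    ratings[teams[-1]] = 0.0
    return ratings, beta[-1]


# Made-up, noise-free games built from ratings A=7, B=3, C=-2, D=0
# and a turnover worth 4.5 points, so the fit should recover those values.
teams = ["A", "B", "C", "D"]
games = [
    ("A", "B", 8.5, 1), ("B", "C", 0.5, -1), ("C", "D", -2.0, 0),
    ("D", "A", 2.0, 2), ("A", "C", 0.0, -2), ("B", "D", 7.5, 1),
]
ratings, turnover_value = fit_ratings(games, teams)
# Prediction for A at home against B, assuming a turnover margin of zero.
predicted_margin = ratings["A"] - ratings["B"]
print(ratings, turnover_value, predicted_margin)
```

With real, noisy scores the fit will not be exact, and early in the season there are too few games per team for the ratings to be stable.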
I’m digging into the numbers a little bit more now that the season is over. Looking at the actual regression models for this season, turnovers explained a little less than 40% of the error in the line, but the figure was consistently in the 35-40% range all season long. Compare the regression models with and without the turnover variable: the R-square of the model with turnovers is 0.60, and the R-square of the model without it is 0.33. That is a very large difference for adding only one variable to a model. But of course predicting future games is very different from fitting prior games, so I was never expecting to see the mean error in predictions drop by 40%. That would mean reducing it from 10 points a game down to about 6. So how much did it improve on the original least squares predictions? In straight-up game winners it was 4 games better. Against the spread it was 11 games better. For absolute error it was about a 6.8% improvement, and for mean square error it was an 11% improvement. So all in all I think the results were very good. Now it will be interesting to see if the results are repeatable year to year.
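As a rough illustration of how the two R-square values compare, the sketch below uses the usual centered definition of R-square (an assumption; software sometimes reports a different version for models without an intercept) and a hypothetical helper that asks how much of the variance left over by the smaller model the added variable picks up. The margins in the example are made up.

```python
def r_squared(actual, fitted):
    """Share of the variance in `actual` accounted for by `fitted`."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return 1.0 - ss_residual / ss_total


def share_of_leftover_explained(r2_without, r2_with):
    """Fraction of the smaller model's unexplained variance
    that the added variable accounts for."""
    return (r2_with - r2_without) / (1.0 - r2_without)


# Made-up margins, only to show the two extremes of r_squared.
margins = [3, -7, 10, -4]
mean_margin = sum(margins) / len(margins)
perfect_fit = r_squared(margins, margins)
mean_only_fit = r_squared(margins, [mean_margin] * len(margins))

# The two R-square values quoted in the post.
leftover_share = share_of_leftover_explained(0.33, 0.60)
print(perfect_fit, mean_only_fit, round(leftover_share, 3))
```

With the quoted values, the turnover variable accounts for about 40% of what the ratings-only model left unexplained. That is an in-sample figure, so it says nothing by itself about how much predictions of future games improve.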
My original thought was to go another step: first try to predict the turnover margin in a game, and then plug that into the model to see if it further improved the predictions. The problem was that I wasn’t able to find a way to reliably predict the turnover margin between two teams. That does appear to be pretty random. So for now the predictions from this model assume that the turnover margin will be zero. If you are curious, a turnover was worth an average of 4.53 points this past season. So if a team was favored to win by 3 points and was +1 in turnovers in the game, they averaged a win by about 7.5 points. If they were a 3 point favorite and -1 in turnovers, they averaged losing by about 1.5 points. So you see why the favorites don’t always win. A 3 point favorite that loses the turnover game, on average, loses the game. A touchdown favorite can lose a game by being -2 in turnovers. The average turnover margin was +0.21 in favor of the home team, so I could possibly have tried adding 0.21 * 4.53 = 0.95 points to each home team.
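The turnover arithmetic above can be written out as a short sketch. The 4.53 and 0.21 figures come from this post; the function and constant names are just illustrative.

```python
TURNOVER_VALUE = 4.53      # average points per turnover this past season
HOME_TURNOVER_EDGE = 0.21  # average turnover margin in favor of the home team


def expected_margin(favored_by, turnover_margin, turnover_value=TURNOVER_VALUE):
    """Average margin for a favorite, given its turnover margin in the game."""
    return favored_by + turnover_value * turnover_margin


three_point_fav_plus_one = expected_margin(3, +1)
three_point_fav_minus_one = expected_margin(3, -1)
touchdown_fav_minus_two = expected_margin(7, -2)
home_adjustment = HOME_TURNOVER_EDGE * TURNOVER_VALUE

print(f"{three_point_fav_plus_one:.2f}")   # 7.53
print(f"{three_point_fav_minus_one:.2f}")  # -1.53
print(f"{touchdown_fav_minus_two:.2f}")    # -2.06
print(f"{home_adjustment:.2f}")            # 0.95
```

A negative result means the favorite loses on average, which is the touchdown-favorite case with a -2 turnover margin.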
I’d be interested in hearing anyone’s thoughts for or against my theory that the winner against the spread is random, or any other ideas that explain even more of the error in the line.
Saturday, January 14, 2012
home           p(win)   p(cover)  road           p(win)   p(cover)    line    lineavg
Baltimore      0.68324  0.40416   Houston        0.31676  0.59584      7.5     4.9708
San Francisco  0.45478  0.58543   New Orleans    0.54522  0.41457     -3.5    -1.2070
Green Bay      0.79951  0.55564   N.Y. Giants    0.20049  0.44436      7.5     8.9994
New England    0.90118  0.51568   Denver         0.09882  0.48432     13.5    13.9251
Last week finished the regular season off strong with a 3-0 record in the top 3 games.
So the final results for the NFL season were 11-6 for the top game, and 33-18 for the top 3 games.
I’ll continue to post these odds but will stop the record keeping here.
For the season, when p(cover) was > .55 the record was 61.5%; when p > .60 the record was 65.8%; when p > .65 the record was 70.3%; and when p > .70 the record was 74.7%.
Definitely good results. Hopefully next season can duplicate this. But I do remain skeptical. Overall the mean prediction was 58.1% which is unusually high. So I think a large part of it was that it was just a lucky year.
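For anyone who wants to run the same kind of breakdown on their own picks, here is a minimal sketch of tallying a record above each p(cover) cutoff. The picks in the example are made up for illustration and are not my season results.

```python
def record_above(picks, threshold):
    """Win-loss record for picks whose p(cover) exceeds `threshold`.

    picks: list of (p_cover, covered) pairs, one per picked side.
    Returns (wins, losses, win_pct), with win_pct None if nothing qualifies.
    """
    selected = [covered for p_cover, covered in picks if p_cover > threshold]
    wins = sum(selected)
    losses = len(selected) - wins
    win_pct = wins / len(selected) if selected else None
    return wins, losses, win_pct


# Made-up picks: (p(cover) of the picked side, whether it covered).
example_picks = [
    (0.84, True), (0.80, True), (0.64, False), (0.62, True),
    (0.60, True), (0.58, False), (0.57, True), (0.52, False),
]

for threshold in (0.55, 0.60, 0.65, 0.70):
    wins, losses, win_pct = record_above(example_picks, threshold)
    print(f"p > {threshold:.2f}: {wins}-{losses}")
```

The higher the cutoff, the fewer games qualify, so the percentages at the top end rest on small samples and can swing a lot from luck alone.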
home           p(win)   p(cover)  road           p(win)   p(cover)    line    lineavg
N.Y. Giants    0.53170  0.41657   Atlanta        0.46830  0.58343      3.0    0.82229
Denver         0.27348  0.57908   Pittsburgh     0.72652  0.42092     -8.5   -6.38467
New Orleans    0.79020  0.42735   Detroit        0.20980  0.57265     10.5    8.55825
Houston        0.67296  0.52623   Cincinnati     0.32704  0.47377      4.0    4.68853
For those of you who manipulated your ratings last week to have the 14-1 Green Bay Packers lose at home, I laugh in your general direction. I’m trying to measure computer systems and algorithms, not people.
Friday, December 30, 2011
Last week the top 3 were 2-1, but the lone miss was the top game. The top game is 10-6, and the top 3 are 30-18 for the entire season. The flaw in just taking the top game is injuries. Houston had been coming out as the top game for 4 weeks in a row. This week they at least drop down to #2. But then they are replaced by the Packers, who have nothing to play for and will likely sit out some key players.
home           p(win)   p(cover)  road           p(win)   p(cover)    line    lineavg
Green Bay      0.74835  0.84077   Detroit        0.25165  0.15923     -3.5     7.1342
Houston        0.71324  0.80163   Tennessee      0.28676  0.19837     -3.0     5.9335
New Orleans    0.86991  0.64128   Carolina       0.13009  0.35872      8.0    11.7889
Oakland        0.49489  0.38231   San Diego      0.50511  0.61769      3.0    -0.1341
Minnesota      0.43551  0.39824   Chicago        0.56449  0.60176      1.0    -1.6993
St. Louis      0.10937  0.40242   San Francisco  0.89063  0.59758    -10.5   -13.1400
Arizona        0.51827  0.40405   Seattle        0.48173  0.59595      3.0     0.4762
New England    0.88977  0.58790   Buffalo        0.11023  0.41210     10.5    12.8252
Atlanta        0.89942  0.58358   Tampa Bay      0.10058  0.41642     11.5    13.7745
Jacksonville   0.70473  0.58012   Indianapolis   0.29527  0.41988      3.5     5.6071
Denver         0.66462  0.55512   Kansas City    0.33538  0.44488      3.0     4.4514
N.Y. Giants    0.57280  0.45760   Dallas         0.42720  0.54240      3.0     1.8984
Cleveland      0.23363  0.47898   Pittsburgh     0.76637  0.52102     -7.0    -7.5473
Cincinnati     0.43904  0.51570   Baltimore      0.56096  0.48430     -2.0    -1.5916
Philadelphia   0.79943  0.51173   Washington     0.20057  0.48827      8.5     8.8084
Miami          0.58301  0.48860   N.Y. Jets      0.41699  0.51140      2.5     2.2000
Thursday, December 22, 2011
Last week was the first losing week for a while, 1-2. That makes the top pick 10-5 and the top 3 28-17. The top 3 this week look kind of iffy to me.
home           p(win)   p(cover)  road           p(win)   p(cover)    line    lineavg
Indianapolis   0.14238  0.33667   Houston        0.85762  0.66333     -7.0   -11.5531
Kansas City    0.51407  0.41908   Oakland        0.48593  0.58092      2.5     0.3682
Buffalo        0.46267  0.57666   Denver         0.53733  0.42334     -3.0    -0.9794
N.Y. Jets      0.67304  0.56366   N.Y. Giants    0.32696  0.43634      3.0     4.6692
Tennessee      0.73163  0.45831   Jacksonville   0.26837  0.54169      7.5     6.4131
Washington     0.69951  0.45875   Minnesota      0.30049  0.54125      6.5     5.4255
Pittsburgh     0.94527  0.53214   St. Louis      0.05473  0.46786     16.0    16.8491
New England    0.83880  0.53074   Miami          0.16120  0.46926      9.5    10.3032
Dallas         0.58295  0.52631   Philadelphia   0.41705  0.47369      1.5     2.1901
Green Bay      0.88152  0.47753   Chicago        0.11848  0.52247     13.0    12.4087
Seattle        0.38655  0.48012   San Francisco  0.61345  0.51988     -2.5    -3.0227
Carolina       0.77352  0.51927   Tampa Bay      0.22648  0.48073      7.5     8.0163
Baltimore      0.89135  0.51585   Cleveland      0.10865  0.48415     12.5    12.9159
Cincinnati     0.66302  0.51379   Arizona        0.33698  0.48621      4.0     4.3581
New Orleans    0.72538  0.48773   Atlanta        0.27462  0.51227      6.5     6.1826
Detroit        0.58328  0.48962   San Diego      0.41672  0.51038      2.5     2.2247
Monday, December 19, 2011
I’ve got two new papers (as the statistician co-author) that are in press this month. One is in the journal Epidemiology: “Accounting for Bias Due to Selective Attrition: The Example of Smoking and Cognitive Decline”. The other is in the journal Neuro-Epidemiology: “Characteristics of MR Infarcts Associated with Dementia and Cognitive Function in the Elderly”.
I’ll be heading home for the holidays Tuesday night. So things could get updated slowly.