An NFL Prediction Model - kgalarneau

By Kelly Galarneau
AN NFL PREDICTION MODEL
SPORTS AND MATH, WHAT’S NOT TO LOVE?
Don’t Attack Me Until the End.
 My probability model is not perfect, and
never will be.
 We will discuss the assumptions and
limitations of this model.
 History and Concept
 NCAA tournament
 Rating Systems
Jeff Sagarin’s Ratings
 Published in the USA Today since 1985.
 NCAA men’s basketball tournament
selections
 Bowl Championship Series (BCS) selections
 Secret Formula
 Each team’s rating
 Home field advantage (constant for all teams)
Any Given Sunday
 We know that a team’s performance is not
always constant. On some days they play
better, others worse.
 Many things naturally vary according to the
normal distribution, so perhaps we can
assume NFL ratings do the same.
 In an e-mail, Sagarin suggested that I use a
standard deviation of 15 or 16 (perhaps 15.5).
I chose 16.
Normal Distribution
 Here is a graph for the theoretical performance
of the Tennessee Titans, with a mean of 32.7
and standard deviation of 16.
Normal Curves
 Now compare two teams: the Titans, with an average of 32.7, and
the Lions, with an average of 7.82. We’ll pretend the Lions
are playing at home and add Sagarin’s prescribed home field advantage of 3.02 to their rating.
Monte Carlo Simulation
 Used for simulating probability situations.
 Mostly used for business and finance
applications.
 Allows us to vary a parameter according to
whatever distribution we choose (uniform,
normal, Poisson, exponential, etc.)
 As the parameter is changing, we can observe
the effect on other variables.
 I am using free software from
www.yasai.rutgers.edu. (Rutgers University)
How each trial works
This graph shows the results of one trial, with a win going to the Titans.
 We let the computer pick a random “performance rating” from the normal distribution for each team.
 We see which one is
greater and tally that
as a win for that
team.
 Then repeat 1000
times.
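The trial loop described above can be sketched in a few lines of Python. This is an illustration only, not the YASAI/Excel setup used in the talk: the function name `simulate_matchup`, the fixed seed, and the use of Python's `random` module are assumptions, and the ratings (Titans 35.72, Lions 7.82, standard deviation 16) are the ones shown on the spreadsheet slide.

```python
import random

def simulate_matchup(mean_a, mean_b, sd=16.0, trials=1000, seed=None):
    """Estimate P(team A outperforms team B) by repeated random draws."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    wins_a = 0
    for _ in range(trials):
        perf_a = rng.gauss(mean_a, sd)  # random "performance rating" for A
        perf_b = rng.gauss(mean_b, sd)  # random "performance rating" for B
        if perf_a > perf_b:             # tally the win for the better draw
            wins_a += 1
    return wins_a / trials

# Hypothetical run: Titans (35.72) against Lions (7.82), 1000 trials.
p_titans = simulate_matchup(35.72, 7.82, trials=1000, seed=2008)
print(f"Titans {p_titans:.1%}  Lions {1 - p_titans:.1%}")
```

With only 1000 trials the estimate moves by roughly a percentage point from run to run, so a different seed will give slightly different percentages.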
The graph on the left represents
a game in which the Lions
perform better than average, the
Titans perform worse than
average, but the Titans still get
the win.
Then we repeat…
…1000 times.
The graph on the right
represents a game in
which the Lions pull off
the upset.
Here is a graph of 1000 trials
[Histogram: frequency (0 to 140) of simulated performance ratings by bin (-40 to 80), with one series for the Lions and one for the Titans.]
Here is my actual Excel programming:
Row  B (Team)  C (Rating)  D (S.D.)  E (Normal)         F (If/then)      G (Output)
2    Titans    35.72       16        =gennormal(C2,D2)  =if(E2>E3,1,0)   =simoutput(F2)
3    Lions     7.82        16        =gennormal(C3,D3)  =if(E3>E2,1,0)   =simoutput(F3)
Here is one trial:
Team    Rating  S.D.  Normal    If/then  Output
Titans  35.72   16    25.98819  1        1
Lions   7.82    16    -3.61047  0        0
This matchup gives us (1000 trials): Titans 88.8%, Lions 11.2%
Do we need a simulation?
 I tried to approach the probability calculation
from a theoretical perspective.
 My thought was to assign each team a
random variable X and Y and vary them
according to a normal distribution.
So: X ~ N(μ₁, σ) and Y ~ N(μ₂, σ).
 A statistics textbook led me to the idea of
multiplying their density functions to get a joint (3-D) probability
distribution [X, Y, and P(X,Y)].
A 3-D Probability Distribution
Here, X ~ N(35.72,16) and Y ~ N(10.84,16)
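[Figure: 3-D surface plot of P(X,Y) over the X and Y axes, asking “Where is X > Y?”, with the two regions labeled Titans 88.8% and Lions 11.2%.]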
Integrals
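 To find the probability that one team outperforms the other, we need the volume under this surface over the region where X > Y (or Y > X), which means evaluating a double integral.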
 Mathematica will not evaluate the exact integral, but it will
give us a decimal approximation.
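As a sketch of the integral in question, assume X and Y are independent normals with means μ₁ and μ₂ and a common standard deviation σ (independence is an assumption here, implied by multiplying the two densities). The probability that X exceeds Y is the volume over the half-plane y < x. Because the difference of two independent normals is itself normal, the same quantity also reduces to a single standard normal CDF, written Φ below:

```latex
P(X > Y)
  = \int_{-\infty}^{\infty} \int_{-\infty}^{x}
      \frac{1}{2\pi\sigma^{2}}
      \exp\!\left(-\frac{(x-\mu_1)^{2}}{2\sigma^{2}}\right)
      \exp\!\left(-\frac{(y-\mu_2)^{2}}{2\sigma^{2}}\right)
    \, dy \, dx
  = \Phi\!\left(\frac{\mu_1-\mu_2}{\sigma\sqrt{2}}\right)
```

The second equality follows from X − Y ~ N(μ₁ − μ₂, 2σ²).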
Comparing the results
 Mathematica Input :
NIntegrate[NIntegrate[(1/((16^2)*2*Pi))*(E^((-1/2)*((x-35.72)/16)^2))*(E^((-1/2)*((y-7.82)/16)^2)),{y,x,300}],{x,-300,300}]
 Mathematica gives us:
Titans 89.1%
Lions 10.9%
 The Monte Carlo with 1000 trials gave us:
Titans 88.8%
Lions 11.2%
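The two results can be cross-checked without numerical integration. The sketch below is not part of the original analysis: it assumes the two performances are independent normals with the same standard deviation, in which case the win probability has a closed form, and it uses the usual binomial standard error to gauge how much a 1000-trial simulation is expected to wobble. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def win_probability(mean_a, mean_b, sd=16.0):
    """P(A outperforms B) for independent normals with a shared sd:
    Phi((mean_a - mean_b) / (sd * sqrt(2)))."""
    return NormalDist().cdf((mean_a - mean_b) / (sd * sqrt(2)))

def monte_carlo_std_error(p, trials):
    """Standard error of a win fraction estimated from `trials` draws."""
    return sqrt(p * (1 - p) / trials)

p_titans = win_probability(35.72, 7.82)
se = monte_carlo_std_error(p_titans, 1000)
print(f"Titans {p_titans:.1%}, Lions {1 - p_titans:.1%}, std. error {se:.1%}")
```

Under these assumptions the closed form agrees with the Mathematica figure of about 89.1%, and the standard error of a 1000-trial run is about one percentage point, so the simulated 88.8% is well within sampling noise.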
The Bracket
 Six teams from each conference (AFC and NFC)
make the playoffs. The division winners are
seeded 1-4 by record, and the wild card teams are
seeded 5 and 6 by record.
 In each game the higher seed gets home field
advantage.
 In the “Wild Card” round of the playoffs:
 The 3 seed plays the 6 seed
 The 4 seed plays the 5 seed.
 The 1 and 2 seeds get a first round bye.
 In the “Divisional” round the 1 seed gets to play
the lower seed of the two advancing teams.
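So that means…
If the 3 seed wins: the 1 seed plays the 4 or 5 seed, and the 2 seed plays the 3 seed.
If the 6 seed wins: the 1 seed plays the 6 seed, and the 2 seed plays the 4 or 5 seed.
[Bracket diagrams for the two cases.]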
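The Divisional-round pairing rule can be sketched as a small function. The function name and the tuple layout are illustrative assumptions, not part of the original spreadsheet; the rule itself (the 1 seed plays the lowest-ranked survivor, the higher seed hosts) is the one stated in the bracket description.

```python
def divisional_matchups(wild_card_winners):
    """Return (home_seed, away_seed) pairs for the Divisional round.

    wild_card_winners: the two seeds that advanced from the
    3-vs-6 and 4-vs-5 Wild Card games, in any order.
    """
    better, worse = sorted(wild_card_winners)  # smaller number = higher seed
    # The 1 seed hosts the lowest-ranked survivor (largest seed number);
    # the 2 seed hosts the other survivor.
    return [(1, worse), (2, better)]

# All four ways the Wild Card round can turn out.
for winners in [(3, 4), (3, 5), (6, 4), (6, 5)]:
    print(winners, "->", divisional_matchups(winners))
```

Whenever the 6 seed survives it always travels to the 1 seed, which is why the 2 seed's opponent depends on the 4-vs-5 game in that case.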
Possibilities…
 In a playoff bracket with 12 teams, there are
11 games to be played, with a total of
2^11=2048 possibilities.
 In my Excel spreadsheet I have made use of
the complement rule and referencing so that I
only need to calculate 66 matchups.
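Both counts are easy to check. The sketch below assumes that the 66 matchups correspond to the number of distinct pairs among 12 teams, with the complement rule supplying the other team's probability for each pair; that reading is an inference from the numbers, not something the slides spell out. The team labels are placeholders.

```python
from itertools import combinations
from math import comb

# Placeholder labels: six seeds in each of the two conferences.
teams = [f"{conf}{seed}" for conf in ("AFC", "NFC") for seed in range(1, 7)]
pairs = list(combinations(teams, 2))  # every unordered pair of teams

print(len(teams), "teams,", len(pairs), "distinct pairings")
print("bracket outcomes:", 2 ** (len(teams) - 1))  # 11 games, 2 results each

assert len(pairs) == comb(12, 2)
```

For each pair only one probability p needs to be computed; the opponent's chance is 1 - p.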
Assumptions and Limitations
 We are using offensive and defensive statistics
from the regular season, so we are assuming that
teams continue similar play into the post-season.
 We are assuming that overall performance is
normally distributed.
 We are assuming that home field advantage is
the same constant for all teams.
 We are not taking anything into account for
weather, injuries, matchups that might be
significant, or any other variables.
2007 trial run (AFC)
Seed  Team      Rating  Div. Game  Champ.  Superbowl  Superbowl Win
1     Pats      36.42   100.0%     73.9%   48.3%      32.4%
2     Colts     30.21   100.0%     58.0%   25.1%      13.7%
3     Chargers  29.98   65.7%      28.6%   11.9%      6.5%
4     Steelers  23.55   45.5%      12.8%   4.3%       1.9%
5     Jags      27.9    54.5%      19.2%   8.0%       4.0%
6     Titans    21.79   34.3%      7.4%    2.5%       1.0%
sum                     4          2       1          59.6%
2007 trial run (NFC)
Seed  Team      Rating  Div. Game  Champ.  Superbowl  Superbowl Win
1     Cowboys   27.95   100.0%     62.6%   34.9%      14.5%
2     Packers   29.41   100.0%     65.9%   35.6%      15.6%
3     Seahawks  21.99   57.0%      18.4%   7.3%       2.3%
4     Bucs      19.83   39.9%      12.0%   4.2%       1.2%
5     Giants    28.31   60.1%      26.4%   12.7%      5.1%
6     Redskins  22.21   43.0%      14.7%   5.2%       1.7%
sum                     4          2       1          40.4%
2008 Playoff Prediction
 My OFFICIAL prediction won’t be available
until after week 17 (end of regular season is
Sunday, Dec. 28)
 Over Christmas break you can access it at:
kgalarneau.wikispaces.com/NFL+Prediction
 Feel free to e-mail me with your comments or
suggestions:
[email protected]
2008 Prediction (As of Dec. 15)
standard dev. = 16
home field advantage = 2.58
NFC
Seed  Team       Rating  Div. Game  Champ.  Superbowl  Superbowl Win
1     Giants     28.14   100.0%     63.6%   37.7%      19.0%
2     Panthers   27.05   100.0%     57.5%   28.1%      13.5%
3     Vikings    25.66   55.5%      25.4%   11.8%      5.3%
4     Cardinals  20.66   48.3%      16.3%   5.8%       2.2%
5     Cowboys    23.66   51.7%      20.0%   8.4%       3.6%
6     Bucs       25.31   44.5%      17.2%   8.2%       3.7%
sum                      4          2       1          47.4%
AFC
Seed  Team       Rating  Div. Game  Champ.  Superbowl  Superbowl Win
1     Titans     29.95   100.0%     64.6%   39.6%      22.0%
2     Steelers   29.1    100.0%     65.1%   31.8%      16.9%
3     Jets       20.4    43.6%      14.7%   5.1%       2.0%
4     Broncos    17.61   39.1%      9.7%    2.8%       1.0%
5     Colts      26.39   60.9%      24.3%   10.8%      5.4%
6     Ravens     27.21   56.4%      21.5%   9.9%       5.3%
sum                      4          2       1          52.6%
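The Wild Card column can be sanity-checked against the stated parameters (standard deviation 16, home field advantage 2.58). The sketch below swaps in the closed-form normal calculation for the Monte Carlo simulation used in the talk, and it assumes the home field advantage is simply added to the home team's rating; the function name and data layout are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SD = 16.0          # "standard dev. = 16" from the slide
HOME_EDGE = 2.58   # "home field advantage = 2.58" from the slide

def home_win_probability(home_rating, away_rating):
    """Closed-form P(home team outperforms away team), assuming
    independent normal performances with the same standard deviation."""
    diff = home_rating + HOME_EDGE - away_rating
    return NormalDist().cdf(diff / (SD * sqrt(2)))

# Wild Card games as (home team, rating), (away team, rating).
wild_card_games = [
    (("Vikings", 25.66), ("Bucs", 25.31)),       # NFC 3 vs 6
    (("Cardinals", 20.66), ("Cowboys", 23.66)),  # NFC 4 vs 5
    (("Jets", 20.40), ("Ravens", 27.21)),        # AFC 3 vs 6
    (("Broncos", 17.61), ("Colts", 26.39)),      # AFC 4 vs 5
]
for (home, r_home), (away, r_away) in wild_card_games:
    p = home_win_probability(r_home, r_away)
    print(f"{home} {p:.1%}  {away} {1 - p:.1%}")
```

Under these assumptions the closed-form values should land within roughly a percentage point of the simulated "Div. Game" figures, which is about the sampling noise expected from 1000 trials.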
 Questions? Comments?
But Wait…
…There’s More!
History of Seeds
 The NFL has been using the 12 team playoff
system since 1990.
Seed  Divisional  Champ.  Superbowl  Superbowl Win
1     -           28      18         8
2     -           27      10         5
3     25          7       2          1
4     24          5       4          2
5     12          4       1          1
6     11          1       1          1
sum   72          72      36         18
By Percentage:
Seed  Divisional  Champ.  Superbowl  Superbowl Win
1     -           38.9%   50.0%      44.4%
2     -           37.5%   27.8%      27.8%
3     69.4%       9.7%    5.6%       5.6%
4     66.7%       6.9%    11.1%      11.1%
5     33.3%       5.6%    2.8%       5.6%
6     30.6%       1.4%    2.8%       5.6%
sum   2           1       1          1
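The percentages follow from the counts by simple division. The denominators in the sketch below are inferred from the numbers rather than stated on the slides: 36 Wild Card appearances per seed (18 seasons, two conferences), 72 conference championship slots, 36 Superbowl slots, and 18 Superbowl winners. Treat them as assumptions.

```python
# Counts by seed (1..6) from the table; None marks the first-round bye.
divisional = [None, None, 25, 24, 12, 11]
champ = [28, 27, 7, 5, 4, 1]
superbowl = [18, 10, 2, 4, 1, 1]
sb_win = [8, 5, 1, 2, 1, 1]

def shares(counts, denominator):
    """Convert counts to percentages rounded to one decimal place."""
    return [None if c is None else round(100 * c / denominator, 1)
            for c in counts]

print("Divisional   ", shares(divisional, 36))
print("Champ.       ", shares(champ, 72))
print("Superbowl    ", shares(superbowl, 36))
print("Superbowl Win", shares(sb_win, 18))
```

Note that the Divisional column is a per-seed advancement rate, while the other three columns are each seed's share of the available slots, which is why the Divisional column sums to 2 rather than 1.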
Superbowl Winners by Seed since 1990
[Chart: seed of the Superbowl winner (y-axis, 1 to 6) by year (x-axis, ticks from 1990 to 2006).]
First 10 years: 7 #1 seeds; 9/10 are 1 or 2 seeds
Last 8 years: Only 1 #1 seed; 4/8 are 1 or 2 seeds
The End.
Special thanks to:
Harry Geiser
Steve Havlichek