Football in the ’90’s

Curtis Olswold
University of Iowa
22S Honors Project
Sunday Afternoon Ritual
• A team’s strategy, or tactical approach to a game, is not unique.
• There are three dominant types of offense
that exist in the NFL:
1) Pass oriented
2) Run oriented
3) Balanced (Run and Pass oriented)
Purpose
• Construct a model to predict the points a
team scores.
• Determine the probability that a team wins
given certain factors.
• Investigate whether or not there exists a
significant difference in the points a team
scores by year and week.
Why Am I Doing This?
• Determine which variables affect the number of points a team scores, and how.
• Determine how effective the variables are at determining the outcome of the game: Win or Lose.
• Rules change almost annually in the NFL to increase the number of points teams score. Is it really working?
The Variables
• Score
• Rushing Yards
• Passing Attempts
• Rushing Attempts
• Passing Yards
• Completions
• Outcome
• Interceptions
• Fumbles
The Sample
• A random sample was drawn from the
population of every regular season week
from the 1990 season to the 1999 season.
• Individual team names were not identified
• From each week of each year, a sample of 5 teams was randomly chosen. This gave a sample of 850 observations (170 season-weeks × 5 teams).
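The sampling scheme described above can be sketched as follows. This is only an illustration: the function name `draw_sample`, the 28 anonymous team-game ids per week, and the 17 weeks per season are assumptions made so that the sketch runs, not details taken from the project's data.

```python
import random

def draw_sample(teams_by_week, per_week=5, seed=0):
    """Draw `per_week` team-games at random from every (year, week) stratum."""
    rng = random.Random(seed)
    sample = []
    for (year, week), team_games in sorted(teams_by_week.items()):
        # rng.sample picks without replacement within the stratum
        for game in rng.sample(team_games, per_week):
            sample.append((year, week, game))
    return sample

# Hypothetical stand-in data: 28 anonymous team-game ids per week,
# 17 weeks per season, seasons 1990 through 1999.
teams_by_week = {
    (year, week): list(range(28))
    for year in range(1990, 2000)
    for week in range(1, 18)
}
sample = draw_sample(teams_by_week)
print(len(sample))
```

With 170 strata and 5 draws per stratum, the sketch returns 850 rows, matching the sample size stated on the slide.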
Regression Model
Score = 4.92593
+0.15441 * (Rushing Attempts)
+0.05858 * (Rushing Yards)
- 0.43734 * (Passing Attempts)
+0.20470 * (Pass Completions)
+0.08080 * (Passing Yards)
- 0.56074 * (Interceptions)
- 1.09088 * (Fumbles)
Statistics of the Model
• R² is the proportion of variability in Score that is explained by the model.
• Adjusted R² penalizes R² for each added predictor, so it guards against overcomplicating the model.
• For this model:
R-Square   0.5020
Adj R-Sq   0.4978
• This indicates the model explains just over 50% of the variability in score. The small gap between the two values suggests the model is not overly complex.
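As a rough check, the adjusted value can be recovered from R² with the usual textbook formula. The sketch below assumes that formula, the 850 observations from the sample, and the 7 predictors in the regression model; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Usual adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R-Square from the slide, 850 observations, 7 predictors
adj = adjusted_r2(0.5020, n_obs=850, n_predictors=7)
print(round(adj, 4))
```

Because the reported R-Square is itself rounded, the result agrees with the reported Adj R-Sq only to about three decimal places.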
Significance of Predictors
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.92593          1.66448      2.96     0.0032
ratt         1              0.15441          0.04725      3.27     0.0011
ryds         1              0.05858          0.00785      7.47     <.0001
patt         1             -0.43734          0.06043     -7.24     <.0001
comp         1              0.20470          0.10039      2.04     0.0418
pyds         1              0.08080          0.00534     15.12     <.0001
int          1             -0.56074          0.23748     -2.36     0.0184
fumble       1             -1.09088          0.25782     -4.23     <.0001
Interpretation of Significance
• Every parameter is significantly different from zero at the 0.05 level. This means that each of the variables contributes to the precision of the model.
• The significance level is 0.05, meaning that if a parameter were truly zero (the variable does not help in prediction), there would be only a 5% chance of wrongly rejecting that hypothesis.
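The t values in the Significance of Predictors table are each estimate divided by its standard error. The sketch below recomputes them from the reported figures. The p-values here use the standard normal distribution as a stand-in for Student's t, which is an assumption that is reasonable only because the sample is large, so they differ slightly from the reported ones.

```python
import math

# (estimate, standard error) pairs copied from the Significance of Predictors table
PARAMS = {
    "Intercept": (4.92593, 1.66448),
    "ratt": (0.15441, 0.04725),
    "ryds": (0.05858, 0.00785),
    "patt": (-0.43734, 0.06043),
    "comp": (0.20470, 0.10039),
    "pyds": (0.08080, 0.00534),
    "int": (-0.56074, 0.23748),
    "fumble": (-1.09088, 0.25782),
}

def t_value(estimate, std_error):
    """Test statistic for the hypothesis that the parameter is zero."""
    return estimate / std_error

def two_sided_p_normal(t):
    """Two-sided p-value, using the standard normal in place of Student's t."""
    return math.erfc(abs(t) / math.sqrt(2.0))

for name, (est, se) in PARAMS.items():
    t = t_value(est, se)
    print(f"{name:10s} t = {t:6.2f}   p ~ {two_sided_p_normal(t):.4f}")
```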
Interpretation of the Model
• For every rush attempt a team makes, the model predicts an increase of about 3/20 of a point, with the other variables held fixed.
• Every yard that a team gains on the ground suggests an increase of about 3/50 of a point.
• For each fumble, the team’s predicted score decreases by about 1 and 9/100 points.
Interpretation of the Model
• Each pass attempt decreases a team’s predicted score by about 11/25 of a point.
• However, every completion increases the predicted point total by about 1/5 of a point.
• Every yard gained from the completion of a pass increases the predicted score by about 2/25 of a point.
• Each pass attempt that is intercepted decreases the predicted points scored by just over ½ of a point.
Interpretation of the Model
All of this can be summed up quite simply: A rushing team is superior to a passing team. This is magnified if the team is able to gain substantial yardage per rush. Conversely, if a team passes many times but completes a good percentage of them for good yardage, the negative effect of the pass attempt statistic is less pronounced.
Examples of Prediction
Actual Score   Predicted Value   95% Confidence Limits for the Mean
19             22.4272           21.1840   23.6705
24             26.3893           24.8230   27.9555
13              5.8283            4.3773    7.2793
14              8.6153            6.3462   10.8845
11             14.5838           13.7462   15.4215
13             14.6197           13.3944   15.8451
20             24.8587           23.7494   25.9680
45             31.7385           30.6684   32.8086
20             15.1015           13.7705   16.4324
Explanation of the Predictions
• The above are the predictions for 9 observations of the sample.
• None is exact, nor should exactness be expected.
• They are, however, relatively close to the actual values: six of the nine fall within 5 points, and the largest miss is the 45-point game (predicted at about 31.7).
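How close the nine predictions are can be summarized with the residuals (actual minus predicted). The sketch below uses only the nine pairs listed on the Examples of Prediction slide. The summary measures chosen, mean absolute error and a 5-point threshold, are illustrative choices and not part of the original analysis.

```python
# Actual and predicted scores from the Examples of Prediction slide
actual = [19, 24, 13, 14, 11, 13, 20, 45, 20]
predicted = [22.4272, 26.3893, 5.8283, 8.6153, 14.5838,
             14.6197, 24.8587, 31.7385, 15.1015]

# Residual = actual score minus predicted score
residuals = [a - p for a, p in zip(actual, predicted)]

mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
within_5 = sum(abs(r) <= 5 for r in residuals)
largest_miss = max(residuals, key=abs)

print(f"mean absolute error: {mean_abs_error:.2f}")
print(f"predictions within 5 points: {within_5} of {len(residuals)}")
print(f"largest miss: {largest_miss:.2f}")
```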
Residual: Error vs. Predicted
Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
35   37.52        0.3543
Diagnostic Checking
• Residual, or Prediction Error:
1) Constant Variance:
The residual plot shows a slight megaphone shape. However, the Cook and Weisberg (1) formal test indicates that the null hypothesis of constant variance cannot be rejected.
2) Normality:
The regression coefficients do not rely on the residual normality assumption to be asymptotically normal (2).
Diagnostics Continued
Variable    Variance Inflation
Intercept   0
ratt        2.71820
ryds        2.49829
patt        4.37648
comp        5.67120
pyds        2.57837
int         1.18387
fumble      1.02425
• The Variance Inflation is a measure of the multicollinearity (linear relationship among 2 or more predictors) of the variables.
• None of these indicates severe multicollinearity.
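A variance inflation factor can be translated back into the R² obtained when that predictor is regressed on the others, using the standard definition VIF = 1 / (1 − R²). In the sketch below, the pairing of each value with a variable name assumes the factors are listed in the same order as the parameter estimates. The cutoff of 10 is a common rule of thumb and is not stated in the slides.

```python
# Variance inflation factors from the slide; the variable order is an assumption
VIF = {
    "ratt": 2.71820,
    "ryds": 2.49829,
    "patt": 4.37648,
    "comp": 5.67120,
    "pyds": 2.57837,
    "int": 1.18387,
    "fumble": 1.02425,
}

def implied_r2(vif):
    """R^2 of one predictor regressed on the rest, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif

for name, vif in VIF.items():
    print(f"{name:8s} VIF = {vif:.3f}   implied R^2 = {implied_r2(vif):.3f}")

# Rule-of-thumb check (threshold of 10 is an assumption)
severe = [name for name, vif in VIF.items() if vif >= 10]
```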
Example of a Run vs. Pass
Oriented Offense
• If a team rushes the ball 35 times in a game, gaining 150 yards with 1 fumble, and passes 12 times for 115 yards with no interceptions, then on average it will score 23.5 points. If it throws an interception, the predicted score drops to 22.9.
• Now suppose a team rushes 12 times for 75 yards without fumbling and passes 35 times, completing 19 for 315 yards with 2 interceptions. It will average 24.085 points.
Example of a Balanced Offense
• A team that rushes the ball 25 times for 110 yards with 1 fumble and passes 22 times, completing 12 for 145 yards with 2 interceptions, will score on average 17.57 points per game.
• If it throws only one interception, the predicted score becomes 18.13.
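The run, pass, and balanced examples can be reproduced by plugging the game statistics into the regression equation. The sketch below is one way to do that, with illustrative names (`predict_score`, `COEF`). The run-oriented example does not state the number of completions, so the sketch assumes 7, which is the value that reproduces the quoted 23.5 and 22.9.

```python
# Coefficients from the Regression Model slide
INTERCEPT = 4.92593
COEF = {
    "ratt": 0.15441,    # rushing attempts
    "ryds": 0.05858,    # rushing yards
    "patt": -0.43734,   # passing attempts
    "comp": 0.20470,    # pass completions
    "pyds": 0.08080,    # passing yards
    "int": -0.56074,    # interceptions
    "fumble": -1.09088, # fumbles
}

def predict_score(**stats):
    """Predicted points: intercept plus coefficient times each statistic."""
    return INTERCEPT + sum(COEF[name] * value for name, value in stats.items())

# Run oriented example; comp=7 is an assumption (not stated on the slide)
run_heavy = predict_score(ratt=35, ryds=150, patt=12, comp=7, pyds=115, int=0, fumble=1)
# Pass oriented example
pass_heavy = predict_score(ratt=12, ryds=75, patt=35, comp=19, pyds=315, int=2, fumble=0)
# Balanced example, with 2 interceptions and then with 1
balanced = predict_score(ratt=25, ryds=110, patt=22, comp=12, pyds=145, int=2, fumble=1)
balanced_one_int = predict_score(ratt=25, ryds=110, patt=22, comp=12, pyds=145, int=1, fumble=1)

print(round(run_heavy, 2), round(pass_heavy, 3), round(balanced, 2), round(balanced_one_int, 2))
```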
Statistical Comparison of
Offenses
• Definitions:
1) An offense is run oriented if its rush attempts are at least 1.5 times its pass attempts.
2) An offense is pass oriented if its pass attempts are at least 1.5 times its rush attempts.
3) If neither count is at least 1.5 times the other, the offense is balanced.
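These definitions translate directly into a small classifier. The sketch below is a minimal version. The function name `orientation` is illustrative, and the labels "Run", "Pass" and "Bala" follow the abbreviations used in the ANOVA output.

```python
def orientation(rush_attempts, pass_attempts, ratio=1.5):
    """Classify an offense as 'Run', 'Pass', or 'Bala' (balanced)."""
    if rush_attempts == 0 and pass_attempts == 0:
        raise ValueError("no attempts recorded")
    if rush_attempts >= ratio * pass_attempts:
        return "Run"
    if pass_attempts >= ratio * rush_attempts:
        return "Pass"
    return "Bala"

print(orientation(35, 12))  # rush-heavy game
print(orientation(12, 35))  # pass-heavy game
print(orientation(25, 22))  # neither count reaches 1.5 times the other
```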
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for score
This test controls the Type I experimentwise error rate.
Alpha                                 0.05
Error Degrees of Freedom              847
Error Mean Square                     89.94897
Critical Value of Studentized Range   3.32034
Comparisons significant at the 0.05 level are indicated by ***.
orient Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
Run - Bala          5.2083                     2.8845    7.5320   ***
Run - Pass          9.2933                     6.7978   11.7888   ***
Bala - Pass         4.0850                     2.3737    5.7963   ***
Interpretation of Comparisons
• There is a difference between the orientations of
teams.
• In fact, they are all different from each other!
• Run oriented teams score more points on average than both pass oriented and balanced offenses.
• Balanced offenses score more points on average than pass oriented teams.
• Why? Possibly due to the fact that more time is
used by running the football than by passing.
Determining the Probability of
Winning the Game
• A win is given a value of 1. If a team ties or
loses, they are given a value of 0.
• The only variables that are in a coach’s
immediate control are whether they run or
pass the ball on offense.
• For this reason, only rushing and passing
attempts will be used as independent
variables.
Distribution of Wins and Losses
Frequencies of Game Outcomes
Outcome   Frequency   Percent   Cumulative Frequency   Cumulative Percent
Loss      426         50.12     426                    50.12
Tie       23           2.71     449                    52.82
Win       401         47.18     850                    100.00
Method of Analysis
• Logistic Regression will be used to model
the probability that a team wins.
• The form of the model is:
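$$P(\text{Win}) = \frac{e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}{1 + e^{\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes}}}$$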
Estimation of Parameters
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept   1    -2.1140    0.5403            15.3082     <.0001
ratt        1     0.1259    0.0119           112.5931     <.0001
patt        1    -0.0481    0.0107            20.1301     <.0001
Interpretation of the Model
• Both Rushing and Passing Attempts are
significant factors in determining the
probability of winning a game.
• The parameter estimates are on the log-odds scale (the natural logarithm of the odds of winning).
• Odds Ratios will give more insight into how
the model is affected by rushing and
passing.
Fit of the Model
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square   DF   Pr > ChiSq
9.2815       8    0.3191
This shows that there is no evidence of lack of model fit.
Odds Ratios
Effect   Point Estimate   95% Wald Confidence Limits
ratt     1.134            1.108   1.161
patt     0.953            0.933   0.973
Interpretation of the Odds Ratios
• For a one attempt increase in Rushing
Attempts, the odds in favor of winning
are multiplied by 1.134.
• For each additional Pass Attempt, the odds in favor of winning are multiplied by 0.953.
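The odds ratios are the exponentials of the logistic regression estimates, and the Wald limits come from exponentiating the estimate plus or minus about 1.96 standard errors. The sketch below recomputes them from the reported estimates and standard errors. The function name is illustrative, and small differences from the reported values can arise because the inputs are rounded.

```python
import math

def odds_ratio_with_ci(estimate, std_error, z=1.959964):
    """Odds ratio exp(b) with Wald limits exp(b -/+ z * se)."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper

# Estimates and standard errors from the maximum likelihood table
ratt_or = odds_ratio_with_ci(0.1259, 0.0119)
patt_or = odds_ratio_with_ci(-0.0481, 0.0107)

print("ratt:", [round(v, 3) for v in ratt_or])
print("patt:", [round(v, 3) for v in patt_or])
```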
Conversion into Probabilities
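• The equation for the probability of winning a game:
$$P(\text{Win}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{rushes} + \beta_2 \cdot \text{passes})}}$$
• This yields:
$$P(\text{Win}) = \frac{1}{1 + e^{-(-2.114 + 0.1259 \cdot \text{rushes} - 0.0481 \cdot \text{passes})}}$$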
Some Examples
• For a team rushing 25 times and passing 25 times,
the model yields a probability of .458 that they
will win the game, or a 45.8% chance they will
win.
• If a team rushes 35 times and passes only 15
times, their probability of winning is .827, or
nearly an 83% chance of victory.
• Now, say that team rushes only 15 times and passes 35 times. The probability changes to .129, or a 13% chance of winning.
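These probabilities can be reproduced from the fitted logistic model. The sketch below uses the rounded estimates from the maximum likelihood table, so the third decimal place may differ slightly from the figures quoted. The function name `win_probability` is illustrative.

```python
import math

# Rounded estimates from the maximum likelihood table
B0, B_RUSH, B_PASS = -2.1140, 0.1259, -0.0481

def win_probability(rushes, passes):
    """Logistic model: P(Win) = 1 / (1 + exp(-(b0 + b1*rushes + b2*passes)))."""
    log_odds = B0 + B_RUSH * rushes + B_PASS * passes
    return 1.0 / (1.0 + math.exp(-log_odds))

for rushes, passes in [(25, 25), (35, 15), (15, 35)]:
    print(f"{rushes} rushes, {passes} passes: P(Win) = {win_probability(rushes, passes):.3f}")
```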
What the Model Does Not
Suggest
• Given the model predicts a higher success rate if a
team rushes the ball, it may seem that a team
should never pass. If this is done, the model gives
a 98.5% chance of victory for 50 rushes and no
passes.
• Obviously, if the other team knows you are never
going to pass, you won’t be able to move the ball
10 yards on 3 plays very consistently. This shows
how real world circumstances aren’t always
modeled perfectly.
Another Consideration
• The model also does not take into account mid-game changes in strategy. In other words, the farther behind a team is, the more passes it will attempt. Why? Passing takes less time off the game clock.
Do the Year and Week Affect
Points Scored?
• Year in and year out rules are changed to increase scoring.
• Rule changes include:
(1) 2 point conversions allowed
(2) Defensive Line Encroachment Rules
(3) The 5-Yard Bump Rule on Receivers
(4) Etc.
2 Way ANOVA for Points
Scored, Year and Week
Source    DF    Sum of Squares   Mean Square   F Value   Pr > F
Model     25     321.405283      12.856211     1.11      0.3186
Error     824   9509.705603      11.540905
C Total   849   9831.110886
R-Square   Coeff Var   Root MSE   tscore Mean
0.032693   38.39034    3.397191   8.849077
Source   DF   Type III SS   Mean Square   F Value   Pr > F
year     9    133.2107197   14.8011911    1.28      0.2423
week     16   188.1945631   11.7621602    1.02      0.4333
Interpretation of the 2 Way
ANOVA
• The model indicates there is not sufficient evidence to conclude that Year or Week has an effect on how many points are scored.
• This means that, in this model, the points scored by a team do not differ significantly from week to week or from year to year.
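Each F value in the ANOVA output is a mean square divided by the error mean square, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes those ratios from the reported figures as a consistency check. It does not compute p-values, which would need the F distribution, and the helper names are illustrative.

```python
# Figures from the 2 Way ANOVA output
ERROR_MS = 11.540905
SOURCES = {
    # name: (degrees of freedom, sum of squares)
    "Model": (25, 321.405283),
    "year": (9, 133.2107197),
    "week": (16, 188.1945631),
}
TOTAL_SS = 9831.110886

def mean_square(sum_of_squares, df):
    return sum_of_squares / df

def f_value(sum_of_squares, df, error_ms=ERROR_MS):
    """F statistic: the source's mean square over the error mean square."""
    return mean_square(sum_of_squares, df) / error_ms

for name, (df, ss) in SOURCES.items():
    print(f"{name:6s} MS = {mean_square(ss, df):.4f}   F = {f_value(ss, df):.2f}")

# Share of total variability explained by year and week together
r_square = SOURCES["Model"][1] / TOTAL_SS
print(f"R-Square = {r_square:.6f}")
```

The very small R-Square is in line with the conclusion that year and week explain little of the variation in points scored.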
Conclusion
All 3 statistical models point towards Rushing Attempts as the important statistic in determining the points a team scores, and whether or not they win the game.
Ball control is thus the essence of winning a football game. This is most readily seen in a team that rushes with consistency. A team is in a better position to win if it can run consistently and pass only occasionally.
References
(1) Sanford Weisberg, Applied Linear Regression, 2nd Edition, pp. 135-136. John Wiley and Sons, 1985.
(2) Neter, Kutner, Nachtsheim, Wasserman, Applied Linear Statistical Models, 4th Edition, pp. 54-55. Irwin (Chicago), 1996.
(3) Professor Kate Cowles
University of Iowa
Department of Statistics and Actuarial Science
(4) Data collected from: http://www.mrncaa.com