Transcript Slides

Football for KMS: NFL ‘01
APRIL 30TH 2008
Abhijit Kumar
Kaijia Bao
Vishal Rupani
Course Instructor: Prof. Hsinchun Chen
Agenda
ABHI
VISHAL
KAI
Data Collection
Client Relations
Final Presentation
Data Cleaning
Statistical Analysis
Final Paper
Data Import
Data Transformation
Data Mining
Objectives
Literature Overview
Conclusion
Knowledge Discovery
Statistical Analysis
Data Mining Techniques
Key Findings
KMS Demonstration
Research Objectives
• Pattern identification
• Descriptive Statistics
• Data Mining Techniques
• Prediction
• Developing a strategy
• Fantasy League
Literature Overview
• Moneyball: The Art of Winning an Unfair Game (Michael Lewis)
• Las Vegas Odds (www.VegasInsider.com)
• NFL Fantasy League (www.Nfl.com/fantasy)
Knowledge Discovery Process
Stages: DATA, TRANSFORMATION
DATA
• Pro-Football: 3 tables, 40 columns, 82,346 rows
• Lisa Ordonez: 1 table, 90 columns, 50,417 rows
TRANSFORMATION
• Dependent Variables: Play Decision, Intended Player, Play Direction, Yards
• Independent Variables: Defense, Down, GAP, Halftime Left, Off Ydl, Offense, Play Zone, QTR, ToGo, Total Off TO
• Calculated Variables: GameNum, IsPlayChal, PlayZone, TotalOffTO, PlayDecision, QtrTimeLeft, HalfTimeLeft, GameTimeLeft
Tools: SQL 2005 IS, SQL 2005 AS
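The calculated time variables (QtrTimeLeft, HalfTimeLeft, GameTimeLeft) can be derived from the quarter number and the game clock. The following is a minimal sketch, assuming regulation 15-minute quarters and no overtime. The helper name and its input format are hypothetical; the slides do not show the actual SQL 2005 IS transformation logic.

```python
QUARTER_SECONDS = 15 * 60  # length of a regulation NFL quarter


def time_left(qtr, qtr_seconds_left):
    """Return (QtrTimeLeft, HalfTimeLeft, GameTimeLeft) in seconds.

    Hypothetical helper: assumes qtr is 1 to 4 (overtime is not handled).
    """
    if qtr not in (1, 2, 3, 4):
        raise ValueError("qtr must be 1, 2, 3, or 4")
    if not 0 <= qtr_seconds_left <= QUARTER_SECONDS:
        raise ValueError("qtr_seconds_left must be between 0 and 900")
    # Full quarters still to be played after the current one.
    quarters_left_in_half = qtr % 2   # 1 for Q1 and Q3, 0 for Q2 and Q4
    quarters_left_in_game = 4 - qtr
    half_left = qtr_seconds_left + quarters_left_in_half * QUARTER_SECONDS
    game_left = qtr_seconds_left + quarters_left_in_game * QUARTER_SECONDS
    return qtr_seconds_left, half_left, game_left
```

For example, `time_left(2, 120)` describes a play with two minutes left in the second quarter: the half has 120 seconds left and the game has 120 seconds plus two full quarters.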
Knowledge Discovery Process
Stages: DATA, TRANSFORMATION, PROCESSING, MINING
• Data: Pro-Football (3 tables, 40 columns, 82,346 rows); Lisa Ordonez (1 table, 90 columns, 53,000 rows)
• Variables: Dependent, Independent, Calculated
• Simple Statistics: Play Decision, Intended Player, Play Direction, Yards
• Models: ID3, Neural Networks
• Accuracy: Lift Charts, Classification Matrix
• Tools: SQL 2005 IS, SQL 2005 AS, MS Excel 2007
Dependency Network
Intended Player: Statistics
Top 3 intended players for passes for the 4 teams that played in the semi-finals:
- Pittsburgh: H.Ward (142), P.Burress (121), B.Shaw (44)
- New England: T.Brown (143), D.Patten (93), M.Edwards (39)
- St. Louis: T.Holt (133), M.Faulk (104), I.Bruce (103)
- Philadelphia: J.Thrash (107), D.Staley (89), T.Pinkston (83)
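A ranking like the one above comes down to counting pass targets per team. The following is a minimal sketch using only the Python standard library. The function name, the `(offense, intended_player)` row shape, and the toy rows are assumptions for illustration, not the project's actual schema or data.

```python
from collections import Counter, defaultdict


def top_targets(pass_plays, n=3):
    """Return {offense: [(player, count), ...]} with the n most-targeted players.

    pass_plays: iterable of (offense, intended_player) pairs (assumed shape).
    """
    counts = defaultdict(Counter)
    for offense, player in pass_plays:
        counts[offense][player] += 1
    return {team: c.most_common(n) for team, c in counts.items()}


# Toy rows for illustration only (not the 2001 season data).
plays = [
    ("PIT", "H.Ward"),
    ("PIT", "H.Ward"),
    ("PIT", "P.Burress"),
    ("NE", "T.Brown"),
]
top_two = top_targets(plays, n=2)
```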
Play Direction: Statistics
• Direction of rushes for all plays in the 2001 season
[Figure: rush directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
[Bar chart: Number of Rushes (0 to 600) by Direction]
Yardage: Statistics
• Yardage during each down for Pass and Rush
[Charts: Rushes and Passes. Average Yards Covered (0 to 9) by Yards To Go (1 to 10, > 10), with one series each for Down 1, Down 2, and Down 3]
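The averages behind a yardage chart of this kind are group means. The following is a minimal sketch, assuming each play is a `(play_type, down, yards)` tuple; that row shape, the function name, and the sample rows are hypothetical. Adding yards-to-go to the grouping key would give the finer breakdown the chart plots.

```python
from collections import defaultdict


def average_yards(plays):
    """Return {(play_type, down): mean yards gained}.

    plays: iterable of (play_type, down, yards) tuples (assumed shape).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of yards, play count]
    for play_type, down, yards in plays:
        bucket = totals[(play_type, down)]
        bucket[0] += yards
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Toy rows for illustration only (not the 2001 season data).
sample = [("Rush", 1, 4), ("Rush", 1, 2), ("Pass", 3, 9)]
averages = average_yards(sample)
```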
Play Decision: Statistics
• Play decisions for the 4 teams that played in the semi-finals
[Bar chart: Number of Decisions (0 to 60) by Play Decision Type (Kneel, Field goal, 1pt extra) for New England, Philadelphia, Pittsburgh, and St. Louis]
Play Decision: Analysis Overview
• Discovery of what environmental and/or game factors affect play decision
• Discovery of football expert knowledge through data mining
• Prediction of play decisions based on game factors
Play Decision: ID3 Analysis
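ID3 grows a decision tree by choosing, at each node, the attribute with the highest information gain. The following is a minimal sketch of that criterion only, not of the SQL 2005 AS implementation used in the project. The row format (a list of dicts) and the toy rows are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of target minus its weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy rows for illustration only: Down separates the two decisions perfectly.
rows = [
    {"Down": 1, "PlayDecision": "Rush"},
    {"Down": 1, "PlayDecision": "Rush"},
    {"Down": 4, "PlayDecision": "Field goal"},
    {"Down": 4, "PlayDecision": "Field goal"},
]
gain = information_gain(rows, "Down", "PlayDecision")
```

In this toy case the split removes all uncertainty, so the gain equals the full one bit of entropy in the target.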
Play Decision: Accuracy
Rush Accuracy: Lift Chart
Field Goal Accuracy: Lift Chart
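A lift chart compares how many true cases a model finds in its top-ranked predictions against random selection. The following is a minimal sketch of one common definition of lift at a given fraction of the population. The function name and the `(predicted_probability, is_positive)` input shape are assumptions, and the SQL 2005 AS lift chart may plot a related cumulative quantity instead.

```python
def lift_at(scored, fraction):
    """Lift in the top `fraction` of cases ranked by predicted probability.

    scored: list of (predicted_probability, is_positive) pairs (assumed shape).
    Lift = positive rate in the top slice / overall positive rate.
    """
    if not scored or not 0 < fraction <= 1:
        raise ValueError("need at least one case and 0 < fraction <= 1")
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(1 for _, positive in ranked[:k] if positive) / k
    base_rate = sum(1 for _, positive in ranked if positive) / len(ranked)
    if base_rate == 0:
        raise ValueError("no positive cases, so lift is undefined")
    return top_rate / base_rate


# Toy scores for illustration only: is_positive could mean "the play was a Rush".
scored = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
top_half_lift = lift_at(scored, 0.5)
```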
Play Decision: Classification Matrix
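A classification matrix counts how often each actual play decision was predicted as each class. The following is a minimal sketch with hypothetical function names and toy labels. The per-class figure computed here is the share of each actual class that was predicted correctly.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Return a Counter mapping (actual, predicted) -> number of plays."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def per_class_accuracy(matrix):
    """Fraction of each actual class that the model predicted correctly."""
    totals = Counter()
    for (actual_label, _), count in matrix.items():
        totals[actual_label] += count
    return {label: matrix[(label, label)] / totals[label] for label in totals}


# Toy labels for illustration only (not model output from the project).
actual = ["Rush", "Rush", "Pass", "Field goal"]
predicted = ["Rush", "Pass", "Pass", "Field goal"]
matrix = classification_matrix(actual, predicted)
accuracy = per_class_accuracy(matrix)
```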
Play Decision: Key Findings
• Football strategy can be discovered through data rather than from knowledge experts
• Top 3 factors affecting decision: Down, Off Ydl, Time
• Model accuracy differs depending on the decision being predicted
• Team-specific strategies may be discovered with more data
Play Direction: Analysis Overview
• Discover each team’s strengths and weaknesses in defense and/or offense
• Prediction of play directions based on game factors
[Figure: rush directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
Play Direction: Accuracy
Play Direction: Key Findings (ID3)
Intended Player: Analysis Overview
• Discover each team’s favored recipient of a pass
• Prediction of intended player based on game factors
Intended Player: Lift Chart
Intended Player: Key Findings
• There are 400+ intended players
• Not enough data to accurately predict intended players
• Not enough data to gain knowledge over statistical models
Conclusions
INTENDED PLAYERS
- Insufficient data
- No knowledge gained
PLAY DIRECTION
- Less accurate
- Need to increase sample size
PLAY DECISION
- Enough data to gain knowledge
- Accurate
- Gained knowledge
Future Direction
• Increase sample set
  - More instances of different scenarios
• Incorporate additional information
  - Pro-football-Reference.com
  - VegasInsider.com (Odds for favorites)
• Extend Analysis
  - Nested case (Historical performance)
References
• Prof. Lisa Ordóñez
  - Professor in Statistics
• Steve Aldrich
  - Author of Moneyball in Football
• About Football
  - Glossary of terms
Research Objectives
Literature Overview
Knowledge Discovery
Statistics: Intended Player
Statistics: Play Direction
Statistics: Yardage
Statistics: Play Decision
Accuracy: Lift Charts
Analysis: Play Decision
Analysis: Play Direction
Analysis: Intended Player
Conclusions
Future Directions
System Design
Backup Slide Section
Data Collection
Initial Dataset: 55,000 rows, 90 columns
• Football Outsiders
• Pro-Football
Processing: 47,033 rows, 30 columns
• Cleaning
• Hierarchy
• Relevance
Analysis
• Dependent: 4
• Independent: 10
• Calculated: 9
System Design
[Diagram: NFL KMS]
• FOOTBALL DATA: NFL Season 2001 DB
• Model Building, Testing/Accuracy, Pattern Analysis
• DEFENSE STRATEGY
• METRICS: Accuracy, Performance
• FIELD STRATEGY: Formations, Substitutions, Play Decisions
Yards Analysis
• Yards gained on the play are used as a metric to measure effort
• Discover how environmental and/or game factors affect players’ effort
• Key Findings: Top 4 environmental factors
  - Off Ydl
  - Time
  - Down
  - Gap