Transcript Slide 1
Baseball Enigma:
In Quest of An Optimal Batting Order
Palisade User Conference - 2007
Miami, Florida USA
© 2007 Analytical Advantages, LLC, all rights reserved
Time Line – Exciting Journey!
• 2004: Optimal Batting Order Model
• 2005: “Can you Predict Who wins”
• 2005: Build Baseball Gambling Model
• 2006: Beta Test Baseball Gambling Model
• 2006: Palisade User Conference
• January 2007: First Meeting, Boston Red Sox
• June 2007: First Meeting, New York Yankees
• October 2007: Palisade User Conference
Agenda – Kind of?
• Traditions
• Good batting order characteristics
• Nature of game
• Batter vs. Pitcher
• Field of play
• Umpire bias
Traditional Thinking
• “…the leadoff man should be a
good base stealer.”
• “…number two should be a
contact hitter who can hit behind
a runner.”
• “…bat your best hitter third.”
Characteristics of a Good Batting
Order to Consider
• Runs scored from those on base
• Expected Run Production vs. Actual Runs Scored
• Consistency
Runs are the Currency of Baseball
It’s simple – the more runs you score
the greater likelihood you’ll win.
[Chart: Probability of Winning (0% to 100%) by Runs/Game (1 to 14)]
Source: MLB.COM 2003-2005 AL Scores
When on Base Do You Score?
[Chart: Scoring When On Base. Y-axis: Percent Scoring (31% to 41%)]
Source: MLB.COM
Tabulated by: Universal Analytics, llc
Scoring Expectations:
Doesn’t Mean Winning
[Chart: Expected Runs per Game. Y-axis: Expected Runs/Game (4.00 to 5.75)]
Source: MLB, Baseball Prospectus, CBSSportline.com 2007 Season
Tabulated by: Analytical Advantages, llc
Scoring Effectiveness:
Actual - Expected Runs / Game
(essential to winning)
[Chart: Performance Variation (O/U_runs/game). Y-axis: Expected Runs/Game (-0.75 to 0.75)]
Source: MLB, Baseball Prospectus, CBSSportline.com 2007 Season
Tabulated by: Analytical Advantages, llc
Winning Tied to Effective Scoring
[Chart: Winning as a Function of Run Scoring Efficiency. Y-axis: %Win (20% to 80%); X-axis: Runs/OB (0.3 to 0.44); R² = 0.51]
Source: MLB, Baseball Prospectus, Team Offensive Statistics 1996 - 2007 Seasons
Tabulated by: Analytical Advantages, llc
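The R² on this chart is a coefficient of determination, which for a straight-line fit of %Win against Runs/OB can be computed as sketched below. The five data points are made up for illustration and are not the 1996 - 2007 team statistics; the function name `r_squared` is likewise an assumption.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical (Runs/OB, %Win) pairs, illustration only
runs_per_ob = [0.32, 0.34, 0.36, 0.38, 0.40]
win_pct = [0.42, 0.50, 0.47, 0.56, 0.58]
print(round(r_squared(runs_per_ob, win_pct), 2))
```

An R² of 0.51 would mean that roughly half of the variation in winning percentage is accounted for by runs scored per base runner in a fit of this kind.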
Dichotomous Objectives!
Maximize the Expected Value of Runs
Scored for Each and Every Game
Consistency (small variability)
Not the same for all games (teams)!
Fundamentals of The Game
Runs
Pitcher vs. batters
Field of play (layout, surface and
game temperature)
Umpire characteristics
Understanding performance
Scoring Profile: Interdependencies
(Building Blocks of Scoring)
Runs the result of multiple factors
[Diagram: RUNS at the center, linked to Batter, Pitcher, Park and Ump]
Batter vs. Pitcher:
The Fundamental Conflict
[Chart: Runs per Plate Appearance: Measurement Foundation. Y-axis: Runs / Plate Appearance (0.100 to 0.150); X-axis: Winning Percentage (35% to 65%); series: Pitchers, Batters]
Source: MLB.COM (2006 Season)
Tabulated by: Analytical Advantages, llc
Park Physical Layouts Vary
Ability to Score Dependent Upon Field Configuration
Infield Layout: Only Consistency
• Yankee Stadium (New York)
• Fenway Park
• Skydome (Toronto)
• Camden Yards (Baltimore)
• Tropicana Field (Tampa Bay)
• Wrigley Field (Chicago)
• Minute Maid Park (Houston)
Batters Favored on Artificial Surface
[Chart: Hitting Performance Improved on Artificial Surface. Y-axis: Performance Factor (Avg. = 1.00), 0.80 to 1.30; X-axis: Type of Hit (Singles, Doubles, Triples); series: Artificial, Grass; data labels: 29%, 10%, 7%]
Ballpark Fair Territory:
Yankee Stadium Among Smaller Fields in Baseball
[Chart: Area by Field. Y-axis: Square Feet (90,000 to 106,000)]
Source: MLB.COM
Tabulated by: Universal Analytics, llc
Expected Runs Varies With Park
[Chart: Run Index (0.80 to 1.40) by Venue]
Source: MLB.COM
Tabulated by: Universal Analytics, llc
Quantifying Umpire Bias
[Chart: Comparison of Input Distribution and Gamma(47.19,4.28e-2). X-axis: 1.0 to 3.4, with regions labeled Pitcher Favored and Batter Favored]
Source: Baseball Prospectus 2005-2007
Tabulated by: Universal Analytics, llc
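For a sense of what the fitted Gamma(47.19,4.28e-2) implies, the sketch below computes its mean and standard deviation and draws random values from it. It assumes the first parameter is the shape and the second is the scale; the slide does not say which convention is used, and the seed and sample size are arbitrary.

```python
import math
import random

SHAPE = 47.19    # first Gamma parameter from the slide, assumed to be the shape
SCALE = 4.28e-2  # second Gamma parameter from the slide, assumed to be the scale

# Closed-form moments of a gamma distribution
mean = SHAPE * SCALE
std_dev = math.sqrt(SHAPE) * SCALE

# random.gammavariate(alpha, beta) uses alpha as shape and beta as scale
rng = random.Random(2007)
draws = [rng.gammavariate(SHAPE, SCALE) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print(f"analytic mean {mean:.3f}, std dev {std_dev:.3f}, sample mean {sample_mean:.3f}")
```

Under that assumption the mean is about 2.02, which sits inside the 1.0 to 3.4 range shown on the chart axis.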
On Base Percentage Key
Characteristic
[Chart: On Base Percentage - MLB. Y-axis: On Base Percentage (0.30 to 0.36)]
Source: CBSSportline.com 6/11/2007
Tabulated by: Analytical Advantages, llc
More Wins Require
Scoring Consistency - Redistribution
Yankees’ Probability of Winning Vulnerable,
Especially While Scoring Less Than 7 Runs
[Chart: Percent of Games Won 2006 (0% to 100%) by Runs Per Game (1 to 14); annotation: Areas of high opportunity to win more]
Source: MLB.COM 2006 AL Scores
Therefore:
Effective Batting Order
Key to Winning More!
Batting Order:
Consider the Possibilities
9 batters
Over 200 possible pitchers (1 to 5 per game)
14 American League parks (plus 16 National)
80+ umpires
Hence:
Calculations too numerous
for anyone to enumerate!
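A rough count shows the scale. The sketch below uses the slide's figures as lower bounds and assumes a single pitcher per game, so it understates the real number of combinations (1 to 5 pitchers can appear in a game).

```python
import math

BATTERS = 9
PITCHERS = 200   # "over 200" on the slide, taken as a lower bound
PARKS = 14 + 16  # American League plus National League parks
UMPIRES = 80     # "80+" on the slide, taken as a lower bound

# Number of distinct batting orders for nine batters
orders = math.factorial(BATTERS)

# Game contexts, assuming one pitcher per game (a simplification)
contexts = PITCHERS * PARKS * UMPIRES

print(orders)             # 362880
print(orders * contexts)  # batting order and context combinations
```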
It’s All About Raising
The Probabilities of Winning
Create an Objective Foundation
Simultaneously Integrate the
Building Blocks of Scoring
• Ump Bias
• Field & Temp Elements
• Pitcher Characteristics
• Batter Profile
Method of Approach and Path to Optimality:
Improving the probability of winning
Data driven analytics
Model the process and the resultant game
Stochastic performance
Genetic programming algorithm
Expectations
Genetic Programming
• Random vs. Deterministic
• Population vs. Single Best Solution
• Creating new Solutions through Mutation
• Combining Solutions through Crossover
• Selecting Solutions via “Survival of the Fittest”
• Drawback of Genetic (evolutionary) Algorithms
  • Better compared to what you know
  • Never know when to quit
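The ingredients listed above (a population of candidate solutions, mutation, crossover and survival of the fittest) can be sketched for a nine-slot batting order as follows. This is a plain genetic algorithm over permutations, not the presenters' actual tool. The on-base rates, the slot weights and the `fitness` function are hypothetical stand-ins for a simulated expected-runs objective.

```python
import random

rng = random.Random(42)

# Hypothetical on-base rates for nine batters (labels 0..8), illustration only
ON_BASE = [0.30, 0.41, 0.35, 0.38, 0.33, 0.36, 0.29, 0.34, 0.32]
# Hypothetical slot weights: earlier slots come to bat more often
SLOT_WEIGHT = [9, 8, 7, 6, 5, 4, 3, 2, 1]

def fitness(order):
    """Stand-in objective; a real model would return simulated expected runs."""
    return sum(w * ON_BASE[b] for w, b in zip(SLOT_WEIGHT, order))

def mutate(order):
    """Mutation: swap two batting slots."""
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(a, b):
    """Order crossover: keep a slice of parent a, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [x for x in b if x not in middle]
    return rest[:i] + middle + rest[i:]

def evolve(pop_size=40, generations=60):
    # Random starting population of batting orders
    population = [rng.sample(range(9), 9) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b)
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best, round(fitness(best), 3))
```

The fixed `generations` count is an arbitrary stopping rule, which reflects the "never know when to quit" drawback: the search reports the best order it has seen, with no proof that a better one does not exist.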
Lack of Consistency Burdens Yankee Winning
[Chart: Yankees Most Volatile Scoring Team in American League. Y-axis: Standard Deviation (2.7 to 3.7)]
Scoring Skewness Robs Yankee Scoring Potential
(Getting Runs When You Need Them)
[Chart: Scoring Frequency Shows Yankees Higher Output Bias. Y-axis: Percent (0% to 18%); X-axis: Runs (0 to 20); series: 2006 AL % Averages, 2006 Yankees %, 2007 Yankees %]
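Volatility and skewness are both properties of the runs-per-game distribution, separate from its average. The sketch below compares two made-up ten-game samples with the same mean to show how they differ; the numbers are hypothetical and are not Yankees or American League data.

```python
import math

def describe(runs):
    """Return (mean, standard deviation, skewness) using population formulas."""
    n = len(runs)
    mean = sum(runs) / n
    variance = sum((r - mean) ** 2 for r in runs) / n
    std_dev = math.sqrt(variance)
    skewness = sum((r - mean) ** 3 for r in runs) / n / std_dev ** 3
    return mean, std_dev, skewness

# Hypothetical runs scored over ten games; both samples total 46 runs
steady = [4, 5, 4, 5, 5, 4, 5, 4, 5, 5]
streaky = [1, 2, 1, 2, 1, 2, 1, 2, 14, 20]

for name, runs in (("steady", steady), ("streaky", streaky)):
    mean, std_dev, skewness = describe(runs)
    print(f"{name}: mean {mean:.2f}, std dev {std_dev:.2f}, skewness {skewness:.2f}")
```

The streaky sample piles its runs into two blowouts, so it has a much larger standard deviation and a positive skew even though the averages match.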
Discrete Density Functions for each:
Player, Pitcher, Field, Umpire Combination
[Chart: Outcome Probabilities By Batter (P/U: Perez/Wegner) at Yankee Stadium. Y-axis: 0% to 70%; outcomes: Out, BB, HP, 1, 2, 3, HR; batters: Melky Cabrera, Derek Jeter, Bobby Abreu, Jorge Posada, Miguel Cairo, Josh Phelps, Robinson Cano, Hideki Matsui, Alex Rodriguez]
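A discrete density function of this kind assigns a probability to each plate-appearance outcome, and a simulated at-bat is one random draw from it. A minimal sketch follows, using the seven outcome labels from the chart legend. The probabilities are hypothetical and do not describe any real batter, pitcher, park or umpire.

```python
import random

OUTCOMES = ["Out", "BB", "HP", "1", "2", "3", "HR"]
# Hypothetical probabilities for one batter/pitcher/field/umpire combination
PROBS = [0.66, 0.09, 0.01, 0.16, 0.05, 0.005, 0.025]

def sample_outcome(rng):
    """Draw one plate-appearance outcome from the discrete density."""
    return rng.choices(OUTCOMES, weights=PROBS, k=1)[0]

rng = random.Random(27)
draws = [sample_outcome(rng) for _ in range(10000)]

# Observed frequencies should be close to the input probabilities
freq = {o: draws.count(o) / len(draws) for o in OUTCOMES}
print(freq)
```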
An Example:
Angels vs. Yankees May 27th
65,500
Raises Expected Value of Runs Scored
Expected Runs from 3.8 to 4.0
[Charts: Runs/Game distributions (0 to 20; probability axis marked .08 and .16) in two panels: Original and Optimal]
Optimal Batting Order:
Shifts Distribution to be more Effective (winning)
Key to More Wins!
[Charts: Runs/Game distributions (0 to 20; probability axis marked .08 and .16) in two panels: Original and Optimal]
Subtle Changes in Batting Order Raise
Expected Runs Scored by 7% (huge)
LA Angels at New York May 27 (lost 4-3)
Starting Order
• Cabrera
• Jeter
• Matsui
• Rodriguez
• Giambi
• Cano
• Abreu
• Mientkiewicz
• Nieves
Optimal Order
• Cabrera
• Giambi
• Matsui
• Rodriguez
• Cano
• Jeter
• Abreu
• Mientkiewicz
• Nieves
The Further From Optimum:
The Greater the Probability of Loss
[Chart: Potential High Risk Games Identified. Y-axis: Runs/Game from Optimal (0.00 to 3.50); X-axis: Selected 2006 Games With Boston; series: Loss, Wins; annotations: Probability of Losing over 60%, Vulnerable Games Identified]
The Further From Optimum:
The Greater the Probability of Losing
Runs(Optimal - Original) > .6, P(winning) < 40%
(further from optimal solution)
Runs(Optimal - Original) < .6, P(winning) > 60%
(closer to optimal)
Profiles of Batter, Pitcher, Field and Umpire Yields
Discrete Density Function for Every at Bat
From Density Function to Reaction Matrix
(Finite number of situations)
Number of outs
Players on base
At bat Performance
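One way to read the reaction matrix: each situation is a pair (number of outs, players on base), giving 3 x 8 = 24 states, and each at-bat outcome maps the current state to a new state plus runs scored. The sketch below simulates half-innings on that basis. The outcome probabilities are hypothetical, the same density is used for every batter, and the base-running rules (every runner advances exactly as many bases as the batter, and walks move only forced runners) are simplifying assumptions rather than the presenters' model.

```python
import random

OUTCOMES = ["Out", "BB", "HP", "1", "2", "3", "HR"]
# Hypothetical outcome probabilities, illustration only
PROBS = [0.66, 0.09, 0.01, 0.16, 0.05, 0.005, 0.025]
BASES_GAINED = {"1": 1, "2": 2, "3": 3, "HR": 4}

def react(outs, bases, outcome):
    """Return (outs, bases, runs) after one at-bat.

    bases is a tuple of booleans: (first, second, third).
    """
    first, second, third = bases
    if outcome == "Out":
        return outs + 1, bases, 0
    if outcome in ("BB", "HP"):
        # Only forced runners advance; a run scores if the bases were loaded
        runs = 1 if (first and second and third) else 0
        third = third or (first and second)
        second = second or first
        return outs, (True, second, third), runs
    gained = BASES_GAINED[outcome]
    # Runners (bases 1..3) and the batter (base 0) all advance the same amount
    positions = [i + 1 for i, occupied in enumerate(bases) if occupied] + [0]
    new_positions = [p + gained for p in positions]
    runs = sum(1 for p in new_positions if p >= 4)
    new_bases = tuple(base in new_positions for base in (1, 2, 3))
    return outs, new_bases, runs

def simulate_half_inning(rng):
    """Play at-bats until three outs; return runs scored."""
    outs, bases, runs = 0, (False, False, False), 0
    while outs < 3:
        outcome = rng.choices(OUTCOMES, weights=PROBS, k=1)[0]
        outs, bases, scored = react(outs, bases, outcome)
        runs += scored
    return runs

rng = random.Random(527)
trials = 20000
average_runs = sum(simulate_half_inning(rng) for _ in range(trials)) / trials
print(round(average_runs, 3))
```

A batting-order model would cycle through nine batter-specific densities instead of one, so that the order of the batters changes the simulated runs.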
Game Summary
What Can Be Gained?
Increased expected value of runs for each
and every game,
Greater consistency (reduced sigma),
Skewness favorably shifted,
Confidence doing best with what you’ve
got,
Give Yankee pitchers greater support.
What Can Be Accomplished?
Raise the probability of winning each
and every game,
Increase likelihood of making playoffs,
Significantly enhance the possibility of
getting to and winning the World
Series!
What Can You Expect?
Win between 6 and 15 (or more)
games that would have
otherwise been lost!
Next Step
Only 119 Days
Until Pitchers and Catchers
Show up for Spring Training!