Picking a Winner in the NHL… Is It Possible?


An Investigation By
History
• Since organized hockey took shape in the late 1800s
(the National Hockey League itself was formed in 1917),
there have been people to coach, people to play and
people to watch.
• Hockey is and has been a source of enjoyment
for many people around the world.
• To enhance this enjoyment, people have been
placing wagers on outcomes of certain games.
• At the time of the NHL’s commencement, people
wagered gold, land, and owners even wagered
their players.
Gambling Today
• Today, gambling is more structured and is mostly
organized through the Federal Lottery and Gaming
Commission.
• A report from Statistics Canada showed that Canadians
wagered over 100 million dollars in 2002, on sports
alone. This does not even take into account the money
wagered “under the table”. Statistics Canada also
reported that there was only a return of 14 million
dollars. This means that 86% of the money wagered is
lost!
• It has been the job of many people in Las Vegas and
other “gambling hot spots” to come up with predictions,
and make odds. These people use statistics to make
educated predictions.
Gambling Now…
Synopsis
• Using statistics from the entire 2002-2003
NHL Regular Season, we will use our
knowledge from the Mathematics of Data
Management course to see if we can
predict a Stanley Cup winner!!!
Gambling After
Investigation…
Variables Investigated
Some of the variables that we will be
analyzing are:
• Goals
• Shots
• Face-offs
• Winning Percentage
• Power Play Percentage
• Penalty Kill Percentage
In Summation
Using as many of the tools we have learned as possible,
we will make an educated guess as to who the Stanley Cup
winner will be, and whether there is a reliable method
for picking the winner of any individual game. Basically,
what we are trying to ask is, using statistics and
mathematical theories…
CAN WE
PREDICT A
WINNER?
Procedures Taken
• Collecting the data. This was done by
locating the official score-sheets, face-off
comparisons and super stats.
• The statistics contained in each of the
aforementioned documents were
extracted and organized into charts for
each team, as well as the entire league.
Procedures…
The data contained in the charts was once again extracted and
placed into sub-charts. Some of these sub-charts include the
following:
• Correlation between goals and shots
• Correlation between face-off percentage and goals
• Correlation between face-off percentage and winning
percentage
• Correlation between average shots taken and winning
percentage
• Probability of having more shots than opponent
• Probability of winning more face-offs than opponent
• Correlation between power play percentage and winning
percentage
• Correlation between penalty minutes for and goals against
Procedures…
• This data was then used to create scatter plots
to have a better understanding of the data
• Trend lines were used to pinpoint any type of
relationship between the data
• The correlation coefficient was established,
again using technology, and this was used to
determine how well or how poorly the data fit the
newly discovered trend
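As a rough illustration of how the correlation coefficient relates to the R^2 value printed beside a straight-line trend, the sketch below takes the square root of the R^2 values reported in this deck for the two linear trend lines. The function name and the `slope_sign` parameter are our own for illustration, and the relation only holds for a simple straight-line fit, not for the logarithmic or polynomial trends.

```python
import math

# R^2 values reported in this deck for its two straight-line trend lines.
reported_r_squared = {
    "Penalty minutes vs. goals against": 0.0217,
    "Average shots vs. winning percentage": 0.1328,
}

def r_from_r_squared(r_squared, slope_sign=1):
    """For a simple linear fit, |r| = sqrt(R^2); the sign follows the slope."""
    return math.copysign(math.sqrt(r_squared), slope_sign)

for label, r_squared in reported_r_squared.items():
    print(f"{label}: r = {r_from_r_squared(r_squared):.3f}")
```

Both trend lines slope upward, so the default positive sign applies; the printed values round to the r figures quoted for those two charts (0.147 and 0.364).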
Excel Functions
Some of the commands used in Microsoft Excel were:
• AVERAGE() – used to calculate the mean of the data
• STDEVP() – used to calculate the standard deviation
of the data
• CORREL() – Used to find the correlation coefficient
of the data (Same as PEARSON())
• The Sort command was used quite often to sort data
by different variables
• Sum and Difference functions were used when
analyzing data
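For readers without Excel, here is a minimal Python sketch of what the three functions above compute. The function names simply mirror the Excel ones and are not part of any library. The sample values are the Atlanta, Florida and Minnesota rows from the Team Data chart; three teams are far too few to reproduce the league-wide results, so treat the output as a demonstration of the arithmetic only.

```python
import math

def average(values):
    """Arithmetic mean, like Excel's AVERAGE()."""
    return sum(values) / len(values)

def stdevp(values):
    """Population standard deviation (divides by N), like Excel's STDEVP()."""
    mean = average(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def correl(xs, ys):
    """Pearson correlation coefficient, like Excel's CORREL() / PEARSON()."""
    mean_x, mean_y = average(xs), average(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (stdevp(xs) * stdevp(ys))

# Three rows from the Team Data chart (Atlanta, Florida, Minnesota).
pp_pct = [17.2, 14.2, 14.2]
win_pct = [38, 29, 51]

print("mean PP%      :", round(average(pp_pct), 2))
print("std dev PP%   :", round(stdevp(pp_pct), 2))
print("r (PP%, Win%) :", round(correl(pp_pct, win_pct), 3))
```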
Procedures…
• Certain empirical probabilities were calculated
from the data collected
• Measures of central tendency were calculated
along with measures of spread
• The data collection and organization can be
considered an iterative process (i.e. take data
from a score sheet, place it on the Excel spreadsheet,
and repeat until there are no more data points to fill)
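An empirical probability is simply the number of games in which an event happened divided by the number of games observed. The sketch below shows the idea for "more shots than opponent". The game records are hypothetical and were not taken from the score sheets, and the helper name is our own.

```python
# Hypothetical per-game shot totals as (team, opponent); not real score-sheet data.
games = [(31, 27), (24, 30), (35, 22), (28, 28), (29, 25)]

def empirical_probability(records, event):
    """Fraction of the records for which the event occurred."""
    return sum(1 for record in records if event(record)) / len(records)

# Event: the team out-shot its opponent in that game.
p_outshoot = empirical_probability(games, lambda game: game[0] > game[1])
print(f"P(more shots than opponent) = {p_outshoot:.2f}")
```

The same helper works for face-offs by swapping in face-off totals and keeping the same comparison.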
Data Collected
Team Data
Team data was organized into chart form using
Microsoft Excel. This data was then used to create the
graphs contained in this section.
Team        Winning %   Faceoff %   PP %   PK %   GF
Atlanta     38          46          17.2   81.6   226
Florida     29          46          14.2   81.4   176
Minnesota   51          47          14.2   86     198
(27 more teams; 6 more headings)
Shots vs. Goals
The first scatter plot that we made looked at the correlation
between shots and goals, the most obvious comparison
made by all sports statisticians.
[Figure: Correlation Between Goals and Shots. Scatter plot of
Goals (y-axis) against Shots (x-axis) with logarithmic trend
line y = 1.0811 ln(x) - 0.9321, R^2 = 0.0257]
With this data we found a very weak correlation. The
correlation coefficient, r, is 0.15, well below the 0.33
cutoff for a moderate correlation.
Given this, we can come to the conclusion that
predicting the outcome of a game based on the relation of
goals vs. shots would not be very accurate. The vital
statistics about the data in the graph above are contained
in the following chart.
Mean (Shots) = 28.37
Standard Deviation (Shots) = 6.56
Mean (Goals) = 2.65
Standard Deviation (Goals) = 1.62
Power Play Percentage vs.
Winning Percentage
Many sports commentators get hung up on the idea that a great
power play wins games, but is this true? The following graph will
compare the two variables and it will be discussed whether or
not this prediction made by analysts everywhere is correct.
[Figure: Does an Increased Power Play Percentage Increase the
Winning Percentage? Scatter plot of Winning % (y-axis) against
Power Play Percentage (x-axis) with polynomial trend line
y = 0.2439x^2 - 6.6472x + 85.136, R^2 = 0.3288]
Compared to the last set of data examined, this correlation
would be superior. With a moderate correlation coefficient of
0.568 we can see that as the power play percentage increases, so
does the winning percentage.
Apparently the statisticians that boast this fact are correct.
The vital statistics about the data in the graph above are
contained in the following chart.
Mean (PP%) = 16.36
Standard Deviation (PP%) = 2.86
Mean (Winning Percentage) = 43.67
Standard Deviation (Winning Percentage) = 9.55
Penalty Minutes For vs.
Goals Against
We will examine the perceived ill effects of having to kill a penalty
by looking at the relationship between penalty minutes for and
goals scored against for each game. Although the correlation is
not very strong (r=0.147), it can still be seen that as the
penalty minutes increase, more goals are scored on your team.
[Figure: Does Being In The Box Hurt You In The Net? Scatter plot
of Goals Against (y-axis) against Penalty Minutes (x-axis) with
linear trend line y = 0.0227x + 2.3304, R^2 = 0.0217]
This weak correlation combined with other statistics
could prove to be useful in the future. The vital statistics
about the data in the graph above are contained in the
following chart.
Mean (Penalty Minutes) = 14.26
Standard Deviation (Penalty Minutes) = 10.50
Mean (Goals Against) = 2.65
Standard Deviation (Goals Against) = 1.62
Face-off Percentage vs.
Goals Scored
It seems that in every playoff game, the face-offs are key to
winning the game. On a key face-off win, the game-winning
goal is scored, or when it is lost, the momentum shifts to
the other team. In the hunt for the cup, do the face-offs you
win help you score goals? The graph below will examine
this more closely.
[Figure: Do More Face-off Wins Mean More Goals? Scatter plot of
Goals (y-axis) against Face-off Percentage (x-axis) with
polynomial trend line y = -0.0006x^2 + 0.0616x + 1.0391,
R^2 = 0.001]
Interestingly enough, there is little to no correlation
(r=0.0316), and as can be seen, after a certain point, the
more face-offs you win, the fewer goals you score. The vital
statistics about the data in the graph above are contained in
the following chart.
Mean (Face-off Percentage) = 50.00
Standard Deviation (Face-off Percentage) = 7.26
Mean (Goals) = 2.65
Standard Deviation (Goals) = 1.62
Average Face-Off Winning
Percentage vs. Game
Winning Percentage
As in the last relationship where face-off wins were compared
with goals scored, on our quest to find the Holy Grail of
statistics, we must examine whether winning face-offs improves
your overall winning percentage. This differs from the last
example because it goes on a team average basis, where the
average from the entire season was taken, along with the winning
percentage obtained by each team.
[Figure: Do More Face-off Wins Mean Winning More Games? Scatter
plot of Winning % (y-axis) against Face-off Percentage (x-axis)
with polynomial trend line y = -0.2578x^2 + 26.024x - 611.62,
R^2 = 0.0292]
Similarly to the “Face-off Wins vs. Goals” data, this set
follows an odd trend, where after a certain point, the more
face-offs you win, the worse your winning percentage
becomes. This odd occurrence follows a weak correlation,
with r=0.171. The vital statistics about the data in
the graph above are contained in the following chart.
Mean (Face-Off Percentage) = 50.00
Standard Deviation (Face-Off Percentage) = 2.25
Mean (Winning Percentage) = 43.67
Standard Deviation (Winning Percentage) = 9.55
Average Shots vs.
Winning Percentage
It would be assumed that the more shots you have, the better your
chance at scoring, and therefore the better your chance at
winning, but is that necessarily true? This relationship could very
well show the most promise when it comes to picking a winner
because as any young player learns, “You miss 100% of the
shots you don’t take”. Taught early on, it must be important and
must lead to winning. The following graph shows the results.
[Figure: Do More Shots Lead To More Wins? Scatter plot of
Winning % (y-axis) against Average Shots (x-axis, 23 to 33) with
linear trend line y = 2.0275x - 13.841, R^2 = 0.1328]
The upward slope of the trend line shows that taking more shots
on average leads to a higher winning percentage at the end of
the season. So, apparently taking a lot of shots is important.
The correlation coefficient shows a moderate correlation of
r=0.364.
To win 100% of the games, let’s solve for how many shots you
would need to take per game.
100 = 2.0275x – 13.841
100 + 13.841 = 2.0275x
113.841/2.0275 = x
56.15 = x
Therefore, in order to win every game, you would need to take
on average 56 shots per game.
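The same calculation can be sketched in Python by inverting the trend line. The function names are our own, and the slope and intercept are the rounded values printed on the chart. Note that the chart only covers teams averaging roughly 23 to 33 shots, so a result near 56 is an extrapolation well outside the data and should be read with caution.

```python
SLOPE = 2.0275        # from the trend line y = 2.0275x - 13.841
INTERCEPT = -13.841

def win_pct_from_shots(avg_shots):
    """Winning percentage predicted by the linear trend line."""
    return SLOPE * avg_shots + INTERCEPT

def shots_for_win_pct(target_win_pct):
    """Invert the trend line: average shots needed for a target winning %."""
    return (target_win_pct - INTERCEPT) / SLOPE

print(round(shots_for_win_pct(100), 2))  # 56.15, matching the working above
```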
The vital statistics about the data in the graph above are
contained in the following chart.
Mean (Average Shots) = 28.36
Standard Deviation (Average Shots) = 1.72
Mean (Winning Percentage) = 43.67
Standard Deviation (Winning Percentage) = 9.55
Measures of Central
Tendencies and Measures of
Spread
Mean
Goals per game = 2.7 each
Shots per game = 28.4 each
Penalty minutes per game = 14.3 each
Face-off percentage = 50% each
Note: The shots per game takes into account 50% winning and
50% losing. If it were doubled to be 100% winning you would see
that the amount of shots taken would be 56.8 (28.4 * 2), which is
almost identical to the value we found in the “Average Shots vs.
Winning Percentage” section.
Median
Goals per game = 2
Shots per game = 28
Penalty minutes per game = 12
Face-off percentage = 50%
Mode
Goals per game = 2
Shots per game = 28
Penalty minutes per game = 8
Face-off percentage = 50%
Standard Deviation
When calculating the standard deviation of this data it is
important to realize that we are dealing with a population not a
sample, because we are dealing with every single game that
took place during the 2002-2003 NHL season.
Goals per game = +/- 1.6 each
Shots per game = +/- 6.6 each
Penalty minutes per game = +/- 10.5 each
Face-off percentage = +/- 7.3% each
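To make the population-versus-sample distinction concrete, here is a small sketch using Python's standard `statistics` module. The goal totals are hypothetical and are not the season data. `pstdev` divides by N (the population form, matching the choice described above), while `stdev` divides by N - 1 (the sample form).

```python
import statistics

# Hypothetical goals-per-game values for illustration; not the season data.
goals = [2, 3, 1, 2, 4, 0, 2, 5]

print("mean  :", statistics.mean(goals))
print("median:", statistics.median(goals))
print("mode  :", statistics.mode(goals))

population_sd = statistics.pstdev(goals)  # divides by N
sample_sd = statistics.stdev(goals)       # divides by N - 1
print("population std dev:", round(population_sd, 3))
print("sample std dev    :", round(sample_sd, 3))
```

For the same data the sample form always comes out a little larger, and the gap shrinks as the number of games grows.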
Probabilities
Probability of Having
More Shots Than
Opponent and More
Face-offs (Separate)
Probability of Having
More Shots and More
Face-off Wins
(Combined)
Breakthrough!
This combination of statistics is the biggest breakthrough
when looking at how to pick a winner. As can be seen, the two
NHL 2003 Stanley Cup Finalists sit atop the leader board. We
can ignore Philadelphia because they are a part of the same
conference as New Jersey. So New Jersey and Anaheim are
both in first place in their respective conferences. Let’s
evaluate this further!
Analysis
Taking into account everything we have learned, we can now
start to make a prediction as to who the winners of the
Stanley Cup will be this year. We will evaluate the most
important statistics, face-offs and shots.
Face-offs
We learned earlier in our “Average Face-Off Winning
Percentage vs. Game Winning Percentage” section that
winning too many face-offs can be detrimental to your team’s
chances. We will examine the two Stanley Cup Finalists.
New Jersey has a 51% chance that they will win more face-offs
than their opponent.
Anaheim has a 56% chance that they will win more face-offs
than their opponent.
When we put these percentages into our equation:
New Jersey →
-0.2578(51)^2 + 26.024(51) – 611.62 = Winning Percentage
45.0662 = Win %
Anaheim →
-0.2578(56)^2 + 26.024(56) – 611.62 = Winning Percentage
37.2632 = Win %
Therefore in the face-off category, the advantage goes to the
New Jersey Devils.
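The two substitutions above can be reproduced with a short Python sketch, which also locates the peak of the parabola, the face-off percentage beyond which the trend line starts to fall. The names are our own, and the coefficients are the rounded ones printed on the chart, so the peak may differ slightly from one computed from the full-precision fit.

```python
# Coefficients of the trend line y = -0.2578x^2 + 26.024x - 611.62
A, B, C = -0.2578, 26.024, -611.62

def win_pct_from_faceoffs(faceoff_pct):
    """Winning percentage predicted by the quadratic trend line."""
    return A * faceoff_pct ** 2 + B * faceoff_pct + C

# A downward-opening parabola peaks at x = -B / (2A).
peak_faceoff_pct = -B / (2 * A)

print("New Jersey:", round(win_pct_from_faceoffs(51), 4))
print("Anaheim   :", round(win_pct_from_faceoffs(56), 4))
print("Peak at   :", round(peak_faceoff_pct, 2), "% face-offs won")
```

With these rounded coefficients the peak falls at roughly 50.5%, which is why New Jersey's 51 scores higher than Anaheim's 56.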
Shots
We learned earlier in our “Average Shots vs. Winning
Percentage” section that the more shots you take, the better
your winning percentage. For this we need to look past the
above chart, and back at our Team Stats Chart.
New Jersey averages 31.7 shots per game
Anaheim averages 27.4 shots per game
When we put these into our equation for shots:
New Jersey →
2.0275(31.7) – 13.841 = Winning Percentage
50.43075 = Win %
Anaheim →
2.0275(27.4) – 13.841 = Winning Percentage
41.7125 = Win %
Therefore in the shot category, the advantage goes to the New
Jersey Devils, once again.
From these two pieces of information we can
conclude confidently that the NEW JERSEY
DEVILS will be the 2003 NHL Stanley Cup
Champions.
Testing The Face-off Win
and Shot Equations
In NHL 2003 by EA Sports
Conclusions
To answer the question that we first posed, “Can you predict a
winner?”, the answer is yes. Through our findings on shots and
face-offs, we are confident that the New Jersey Devils will win the
Stanley Cup.
However, these findings are not just limited to this year. They can
be used over and over again. Sometimes they will need to be
adjusted for rule changes and the like, but the principle will
still be the same. To win the game you need two things:
1) More Shots
2) Less than 51.2% Face-off Wins
What About a Split?
If the equations return a different team for each (e.g. New
Jersey with the advantage for shots, and Anaheim with the
advantage for face-offs), then the team with the advantage for
shots shall be chosen as the winner because the correlation
coefficient for shots (r=0.364) is larger than the one for
face-offs (r=0.171), so the data fits that trend better.
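The selection rule, including the split case, can be sketched as follows. The function names and the dictionary layout are hypothetical, chosen only for illustration; the two equations and the finalists' numbers are the ones used in the Analysis section.

```python
def shots_win_pct(avg_shots):
    """Linear trend line from the shots chart."""
    return 2.0275 * avg_shots - 13.841

def faceoff_win_pct(faceoff_pct):
    """Quadratic trend line from the face-off chart."""
    return -0.2578 * faceoff_pct ** 2 + 26.024 * faceoff_pct - 611.62

def pick_winner(team_a, team_b):
    """Return (winner name, whether the two equations disagreed)."""
    teams = (team_a, team_b)
    by_shots = max(teams, key=lambda t: shots_win_pct(t["avg_shots"]))
    by_faceoffs = max(teams, key=lambda t: faceoff_win_pct(t["faceoff_pct"]))
    split = by_shots is not by_faceoffs
    # On a split the shots equation decides, since its r is larger.
    return by_shots["name"], split

new_jersey = {"name": "New Jersey", "avg_shots": 31.7, "faceoff_pct": 51}
anaheim = {"name": "Anaheim", "avg_shots": 27.4, "faceoff_pct": 56}
print(pick_winner(new_jersey, anaheim))  # ('New Jersey', False)
```

Written this way, the rule makes one thing visible: the shots equation always determines the pick, and the face-off equation only tells us whether the two measures agree.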
Discussion
When doing an investigation like this, it is important to note
certain things. The first thing we would like to note is that
when taking data, sometimes there can be a bias. In this
case, since we used all the data that the NHL made available,
rather than a sample, there is no sampling bias. All of the data
is fact, since it actually occurred, so there can be no
discrepancies.
The second thing we would like to note is that, although the
data may look nice and sound nice, and we think we have
predicted the winners, in a game like hockey there are so
many outside factors that the statistics do not capture. Some
of these include:
Sickness
Temperature of the rink
Puck Elasticity
Meals eaten
Loss of sleep the night before
Muscle pulls/ Injuries
New Equipment/ Equipment problems
The list goes on…
The Last Word…
Every aspect of the investigation was done in the way
that we learned in class. This was done to keep the
concepts, theories and mathematical evaluations
familiar, so that all who have taken Data Management,
and even some who haven’t, could understand.
THE END