Graphical presentation of data

Simple graphical methods for presenting and analyzing data
Star/Radar/Spider Plot
Figure 1: A typical radar graph with two plots
Spider diagram
[Figure: spider diagram comparing three series (I, II, III) on five axes scaled from 0 to 10: facial beauty, intelligence, height, figure, and salary.]
Series  Facial beauty  Height  Salary  Figure  Intelligence
III     1              8       9       3       2
II      8              2       4       9       3
I       5              9       4       3       5
Purpose
• The star plot is a method of displaying multivariate
data.
• Each star represents a single observation.
• Typically, star plots are generated in a multi-plot
format with many stars on each page and each star
representing one observation.
• Star plots are used to examine the relative values for
a single data point (e.g., point 3 is large for
variables 2 and 4, small for variables 1, 3, 5, and 6)
and to locate similar or dissimilar points.
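As a rough illustration of how one star is built, the sketch below gives each variable its own equally spaced spoke and sets the spoke length from the variable's value relative to the largest value of that variable across observations. The function names, the scaling rule, and the three-observation data set are assumptions made for illustration; the slides do not specify them.

```python
import math

def scale_columns(rows):
    """Scale each variable (column) to [0, 1] by dividing by its column maximum."""
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)] for row in rows]

def star_vertices(scaled_row):
    """Return the (x, y) vertices of one star.

    Spoke k points at angle 2*pi*k/n and has length scaled_row[k].
    """
    n = len(scaled_row)
    return [
        (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for k, r in enumerate(scaled_row)
    ]

# Hypothetical data: 3 observations, 4 variables (not taken from the slides).
observations = [
    [10.0, 2.0, 5.0, 8.0],
    [5.0, 4.0, 5.0, 2.0],
    [2.5, 1.0, 10.0, 4.0],
]

scaled = scale_columns(observations)
stars = [star_vertices(row) for row in scaled]  # one polygon per observation
```

Joining the vertices of each polygon in order and closing the loop gives one star per observation; stars with similar shapes correspond to similar observations.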
Sample Plot
• The plot below contains the star plots of 16 cars.
• The variable list for the sample star plot is:
1. Price
2. Mileage (MPG)
3. 1978 Repair Record (1 = Worst, 5 = Best)
4. 1977 Repair Record (1 = Worst, 5 = Best)
5. Headroom
6. Rear Seat Room
7. Trunk Space
8. Weight
9. Length
We can look at these plots individually or we can use them to
identify clusters of cars with similar features.
• We can look at the star plot of the Cadillac Seville:
it is one of the most expensive cars,
gets below-average (but not among the worst) gas mileage,
has an average repair record,
and has average-to-above-average roominess and size.
• We can then compare the Cadillac models (the last three plots) with
the AMC models (the first three plots).
The AMC models tend to be inexpensive,
have below-average gas mileage,
and are small in height, weight, and roominess.
The Cadillac models are expensive,
have poor gas mileage,
and are large in both size and roominess.
Questions
The star plot can be used to answer the following
questions:
• What variables are dominant for a given
observation?
• Which observations are most similar, i.e., are there
clusters of observations?
• Are there outliers?
Weakness in Technique
• Star plots are helpful for small-to-moderate-sized
multivariate data sets.
• Their primary weakness is that their effectiveness
is limited to data sets with fewer than a few
hundred points.
• After that, they tend to be overwhelming.
Pie chart
name    salary
Avi     21990
Moshe   19463
Doron   18764
Debora  3354
Amnon   2254
Roza    1123
[Figure: pie chart of salary shares. Avi 33%, Moshe 29%, Doron 28%, Debora 5%, Other 5% (Amnon 3%, Roza 2%).]
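The percentages on the pie chart can be checked against the salary table. The sketch below computes each person's share of the total and merges the smallest slices into "Other"; the 4% cutoff and the function names are assumptions chosen to reproduce the grouping seen on the slide, not something the slide states.

```python
# Salaries from the table above.
salaries = {
    "Avi": 21990,
    "Moshe": 19463,
    "Doron": 18764,
    "Debora": 3354,
    "Amnon": 2254,
    "Roza": 1123,
}

def percent_shares(values):
    """Return each entry's share of the total, in percent."""
    total = sum(values.values())
    return {name: 100.0 * v / total for name, v in values.items()}

def group_small_slices(shares, threshold):
    """Merge slices smaller than `threshold` percent into one 'Other' slice."""
    grouped = {}
    other = 0.0
    for name, pct in shares.items():
        if pct < threshold:
            other += pct
        else:
            grouped[name] = pct
    if other > 0:
        grouped["Other"] = other
    return grouped

shares = percent_shares(salaries)
# Assumption: a 4% cutoff, which puts Amnon and Roza into "Other".
grouped = group_small_slices(shares, threshold=4.0)
labels = {name: f"{round(pct)}%" for name, pct in grouped.items()}
```

Rounding to whole percents is what makes the labels read 33%, 29%, 28%, 5%, and 5%; the unrounded shares sum to exactly 100%.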
Pivot Chart
[Figure: pivot chart of Sum of frequency (vertical axis 0 to 25), with one series per day (1 to 6) and categories by worker (א to ח) and machine (1 to 4).]
Histogram
http://www.stat.sc.edu/~west/javahtml/Histogram.html
[Figure: histogram of A Frequency (vertical axis 0 to 120) over bins labeled 5, 15, ..., 195.]
Cumulative Histogram
[Figure: cumulative histogram of Cumulative % (vertical axis 0.00% to 120.00%) over bins labeled 5, 20, 35, ..., 200.]
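A histogram and its cumulative version come from the same bin counts. The sketch below counts values into equal-width bins and then converts the running total to a cumulative percentage. The bin width of 10 mirrors the histogram slide's axis labels (5, 15, ...); the data values and function names are hypothetical.

```python
def histogram(values, bin_width, n_bins):
    """Count values into n_bins bins: [0, w), [w, 2w), ... Values outside are ignored."""
    counts = [0] * n_bins
    for v in values:
        idx = int(v // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def cumulative_percent(counts):
    """Convert bin counts to the cumulative percentage reached at each bin."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    running = 0
    result = []
    for c in counts:
        running += c
        result.append(100.0 * running / total)
    return result

# Hypothetical measurements (not from the slides).
data = [3, 12, 17, 18, 25, 27, 31, 33, 38, 39]

counts = histogram(data, bin_width=10, n_bins=4)
centers = [10 * (i + 0.5) for i in range(4)]  # bin midpoints: 5, 15, 25, 35
cumulative = cumulative_percent(counts)
```

The last cumulative value is always 100%, which is why a cumulative histogram's curve flattens out at the top of the axis.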
Bihistogram
[Figure: bihistogram of A Frequency and B Frequency; vertical axis from -250 to 150, bins labeled 5, 15, ..., 195.]
Pareto diagram
Week  Product  Number of Returns
1     A        4
1     B        2
1     C        12
1     D        3
2     A        0
2     B        3
2     C        1
2     D        4
3     A        4
3     B        2
3     C        5
3     D        1
4     A        8
4     B        1
4     C        10
4     D        1
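A Pareto diagram orders categories from largest to smallest and overlays the cumulative percentage. The sketch below applies that to the returns data above, totaling returns per product; grouping by product rather than by week is an assumption, as is the function name.

```python
from collections import defaultdict

# (week, product, number of returns), transcribed from the table above.
returns = [
    (1, "A", 4), (1, "B", 2), (1, "C", 12), (1, "D", 3),
    (2, "A", 0), (2, "B", 3), (2, "C", 1), (2, "D", 4),
    (3, "A", 4), (3, "B", 2), (3, "C", 5), (3, "D", 1),
    (4, "A", 8), (4, "B", 1), (4, "C", 10), (4, "D", 1),
]

def pareto_table(records):
    """Return (product, total, cumulative %) rows sorted by total, largest first."""
    totals = defaultdict(int)
    for _week, product, count in records:
        totals[product] += count
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    running = 0
    rows = []
    for product, count in ordered:
        running += count
        rows.append((product, count, 100.0 * running / grand_total))
    return rows

rows = pareto_table(returns)
for product, count, cum_pct in rows:
    print(f"{product}: {count:2d} returns, cumulative {cum_pct:.1f}%")
```

With this grouping, product C alone accounts for 28 of the 61 returns, and C and A together for about 72%.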
Pareto diagram
Pareto diagram
Box-and-Whisker Plot (1)
18 27 34 52 54 59 61 68 78 82 85 87 91 93 100
• 68 is the median
• 52 is the lower quartile
• 87 is the upper quartile
• 35 is the interquartile range (IQR)
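The values above can be reproduced with a short calculation. Quartile conventions differ between textbooks and software; the sketch below assumes the "median of each half" convention (excluding the overall median when the count is odd), which matches the numbers on this slide. The function name is illustrative.

```python
import statistics

# Data from the slide.
data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

def box_summary(values):
    """Median, quartiles, and IQR using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = s[half + 1:] if n % 2 else s[half:]
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return {
        "min": s[0],
        "q1": q1,
        "median": statistics.median(s),
        "q3": q3,
        "max": s[-1],
        "iqr": q3 - q1,
    }

summary = box_summary(data)
```

Other conventions, such as linear interpolation between neighboring values, can give slightly different quartiles for the same data.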
Box-and-Whisker Plot (2)
Box-and-Whisker Plot (3)
There is a useful variation of the box plot that more specifically identifies outliers. To create this variation:
1. Calculate the median and the lower and upper quartiles.
2. Plot a symbol at the median and draw a box between the lower and upper quartiles.
3. Calculate the interquartile range (the difference between the upper and lower quartiles) and call it IQ.
4. Calculate the following points:
L1 = lower quartile - 1.5*IQ
L2 = lower quartile - 3.0*IQ
U1 = upper quartile + 1.5*IQ
U2 = upper quartile + 3.0*IQ
5. The line from the lower quartile to the minimum is now drawn from the lower quartile to the smallest point that is greater than L1. Likewise, the line from the upper quartile to the maximum is now drawn to the largest point smaller than U1.
6. Points between L1 and L2 or between U1 and U2 are drawn as small circles. Points less than L2 or greater than U2 are drawn as large circles.
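The fence calculation can be sketched as follows, using the data and quartiles (52 and 87) from Box-and-Whisker Plot (1). The function names and the extra test points 150 and 200 are hypothetical; they are included only because the slide's own data contains no outliers.

```python
def fences(lower_quartile, upper_quartile):
    """Compute the inner (L1, U1) and outer (L2, U2) fences."""
    iq = upper_quartile - lower_quartile
    return {
        "L1": lower_quartile - 1.5 * iq,
        "L2": lower_quartile - 3.0 * iq,
        "U1": upper_quartile + 1.5 * iq,
        "U2": upper_quartile + 3.0 * iq,
    }

def classify(x, f):
    """Say how a point would be drawn under the rules above."""
    if x < f["L2"] or x > f["U2"]:
        return "large circle"
    if x < f["L1"] or x > f["U1"]:
        return "small circle"
    return "inside fences"

def whisker_ends(values, f):
    """Smallest point greater than L1 and largest point smaller than U1."""
    inside = [v for v in values if f["L1"] < v < f["U1"]]
    return min(inside), max(inside)

data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]
f = fences(52, 87)            # quartiles from Box-and-Whisker Plot (1)
ends = whisker_ends(data, f)  # whiskers reach the data extremes here

# Hypothetical extra points, to show the two outlier classes.
kinds = [classify(x, f) for x in (60, 150, 200)]
```

For this data set the fences are wide enough that both whiskers simply run to the minimum and maximum.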
Questions
The box plot can provide answers to the following questions:
• Is a factor significant?
• Does the location differ between subgroups?
• Does the variation differ between subgroups?
• Are there any outliers?
Importance: Check the significance of a factor
The box plot is an important EDA tool for determining if a factor has a significant effect on the response with respect to either location or variation. The box plot is also an effective tool for summarizing large quantities of information.
Box-and-Whisker Plot (4)
Box-and-Whisker Plot (5)
Scatter diagram
Scatter Plot: No Relationship
Scatter Plot: Strong Linear (positive correlation) Relationship
Scatter Plot: Strong Linear (negative correlation) Relationship
Scatter Plot: Exact Linear (positive correlation) Relationship
Scatter Plot: Quadratic Relationship
Scatter Plot: Sinusoidal Relationship (damped)
Scatter Plot: Variation of Y Does Not Depend on X
Scatter Plot: Variation of Y Does Depend on X
Scatter Plot: Outlier
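The "positive correlation" and "negative correlation" labels on these slides can be made concrete with Pearson's correlation coefficient. The slides do not name a specific measure, so using Pearson's r here is an assumption, and the three small data sets are hypothetical. The sketch also shows why a scatter plot is still needed: an exact quadratic relationship can have a linear correlation of zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
exact_positive = [2 * x + 1 for x in xs]    # exact linear, rising
exact_negative = [-3 * x + 10 for x in xs]  # exact linear, falling
quadratic = [(x - 3) ** 2 for x in xs]      # symmetric parabola

r_pos = pearson_r(xs, exact_positive)
r_neg = pearson_r(xs, exact_negative)
r_quad = pearson_r(xs, quadratic)
```

Here r is 1 for the rising line, -1 for the falling line, and 0 for the parabola, even though the parabola is a perfectly deterministic relationship.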
Run chart (1)
Run chart (2)
Run chart + control limits = control chart
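The slides do not say how the control limits are computed. The sketch below assumes one common simplification: the center line is the mean of a baseline period and the limits sit three sample standard deviations on either side. Practical control charts often estimate the spread differently (for example from moving ranges), and all names and data here are hypothetical.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) from baseline data.

    Assumption: limits are center +/- k sample standard deviations.
    """
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def out_of_control(points, lower, upper):
    """Return (index, value) for every point outside the limits."""
    return [(i, v) for i, v in enumerate(points) if v < lower or v > upper]

# Hypothetical measurements in time order.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]  # used to set the limits
new_points = [10, 13, 15, 9, 6]                   # the run being monitored

lcl, center, ucl = control_limits(baseline)
flagged = out_of_control(new_points, lcl, ucl)
```

Plotting `new_points` in time order gives the run chart; drawing horizontal lines at `lcl`, `center`, and `ucl` turns it into a control chart, and `flagged` lists the points that fall outside the limits.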
Lag Plot-(1)
38
Lag Plot (2)
xt
Interpolate
these…
To get the final
prediction
xt-1
New Point
39
Lag Plot: Random Data
40
Lag Plot: Moderate
Autocorrelation
41
Lag Plot: Strong Autocorrelation
and Autoregressive Model
42
Lag Plot: Sinusoidal Models and
Outliers
43