Patterns in Data
Data Analysis
UNLOCKING THE SECRETS HIDDEN IN YOUR DATA
Why Do Data Analysis?
Avoids incorrect assumptions
Does the data make sense?
Which one is better?
[Figure: two plots of elevation vs. time, each with a different trendline fitted to the same data]
Data: time = 0, 10, 20, 30; elevation = 400000, 399000, 393000, 384000
Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$
Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$
Why Do Data Analysis?
Are your assumptions correct?
Did you collect enough data?
If this is a model of a falling body, which is better?
Be careful: what's better mathematically is not always better scientifically
[Figure: both trendlines extended over x = 0 to 90]
Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$
Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$
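The comparison between the two fits can be checked with a few lines of Python. This is only a sketch: it reuses the four data points and the two trendline equations quoted on the slides (the cubic coefficients are the rounded values shown there), and the helper names `quadratic`, `cubic`, and `r_squared` are made up for this example.

```python
# Elevation data from the slide (time, elevation).
times = [0, 10, 20, 30]
elevations = [400000, 399000, 393000, 384000]


def quadratic(x):
    """Quadratic trendline quoted on the slide."""
    return -20 * x**2 + 60 * x + 400100


def cubic(x):
    """Cubic trendline quoted on the slide (rounded coefficients)."""
    return 0.3333 * x**3 - 35 * x**2 + 216.67 * x + 400000


def r_squared(model, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


r2_quadratic = r_squared(quadratic, times, elevations)
r2_cubic = r_squared(cubic, times, elevations)
print(f"quadratic R^2 = {r2_quadratic:.4f}")
print(f"cubic     R^2 = {r2_cubic:.4f}")

# Extrapolate well beyond the measured range (the data stop at time 30).
for t in (60, 90):
    print(t, quadratic(t), round(cubic(t)))
```

The cubic matches the four measured points almost exactly, but between time 60 and time 90 it starts climbing again, which a falling body would not do. The quadratic has the slightly lower R^2 yet keeps decreasing, so it is the more believable model here.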
Ways to Analyze Data
Plotting Data
Ways to visually understand data
Statistics
Makes it easier to compare data
Mean, Median, Mode
Makes it clear if you have NOISY data
Range, Variance, Standard Deviation
[Figure: Pink and Blue data series plotted with their means (Mean Pink, Mean Blue)]
Ways to Analyze Data
Derivatives (Slopes)
Tell if changes in parameters affect data
Parameter 2 has a greater effect than Parameter 1
Get more information from data
[Figure: lines for Base Case, Parameter 1, and Parameter 2, with labeled slopes of 0.39, 0.16, and 0.08; annotation: "Great Derivative"]
Plotting Data – Extracting from Netlogo
Two ways
1st Way: Write code to extract the data you want (see File Output Example in the Code Examples)
Open file in setup procedure
Create a write-to-file procedure
Plotting Data – Extracting from Netlogo
2nd Way: Extract data from Netlogo graphs
Have Netlogo generate graph on Interface page (example on later slide)
Create a setup-plot procedure and a do-plot procedure
Call the setup-plot procedure in setup procedure
Call do-plot procedure in go procedure
Plotting Data – Extracting from Netlogo
Run model until sufficient data is obtained
Right-click on the graph (PC; use the equivalent click on a Mac) and select Export
Choose a location and file name, then select Save
Excel file is created (see next slide)
Contains all the information in the plot and the input parameters used
Contains excess information about the plot (color, pen down, mode, interval…)
LET’S DO IT – Open Rabbits Grass Weeds
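If you would rather pull the numbers out of the exported file with code than by hand, something like the following Python sketch could work. Everything about the file layout here is an assumption: `SAMPLE_EXPORT` is a simplified, made-up stand-in for an exported plot file, and the function name `extract_xy` is hypothetical. A real export has more header sections and may place several pens side by side, so check your own file before relying on this.

```python
import csv
import io

# Hypothetical, simplified stand-in for an exported plot file.
# A real export contains more header sections than shown here.
SAMPLE_EXPORT = '''"export-plot data"
"Rabbits Grass Weeds"

"PLOTS"
"Populations"
"x min","x max","y min","y max"
"0","100","0","111"

"x","y","color","pen down?"
"0","150","15","true"
"1","148","15","true"
"2","151","15","true"
'''


def extract_xy(text):
    """Return (x, y) pairs from the rows that follow the "x","y",... header."""
    points = []
    in_data = False
    for row in csv.reader(io.StringIO(text)):
        if not in_data:
            # The data table starts after a header row beginning with x, y.
            in_data = row[:2] == ["x", "y"]
            continue
        if len(row) < 2 or row[0] == "":
            break  # a blank or short row ends the table
        points.append((float(row[0]), float(row[1])))
    return points


print(extract_xy(SAMPLE_EXPORT))
```

The sketch keeps only the x and y columns and skips the excess plot information (color, pen down) mentioned above.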
Plotting Data – Extracting from Netlogo
This is what
You need
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
Bar Charts – preferred snacks
Pie Charts – music preference
Pets purchased at pet store
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
Line Graphs – cell phone use
http://www.statcan.ca
Scatter Plots
http://en.wikipedia.org/wiki/Scatterplot
Plotting Data – Activity in Excel
LET’S DO IT
Open File Car Data
Insert Chart
Select type of chart
XY Scatter
Select Data Range
Highlight data to be
plotted
Plotting Data – Activity in Excel
Label each data series
Label the graph and axes
Select where you want the graph to be (on the same worksheet or on another worksheet in the same file)
Statistics
Statistics help you
Summarize data
Describe data
Analyze data
Hard to describe the difference between the two data sets
Now it is easy to summarize, describe and analyze the data…
The blue and the pink data have the same AVERAGE value (mean), but the blue data is “NOISIER” (greater standard deviation). Therefore…
[Figure: two plots of the Noisy and Noisier series around their shared mean, with bands at mean ± 2SD for each series]
Statistics – How to Calculate in Excel
+, -, *, / are used for addition, subtraction, multiplication and division.
Each cell has a label based on its column and row.
Use cell references in calculations instead of typing numbers. Example: =(A4+B4)/C4
Perform a calculation on an entire column by copying and pasting the formula. Warning: this shifts the cell references on each line.
To fix a specific cell, use the $ symbol. Example: =(A4+B4)/$C$1
Excel has many built-in statistical functions
Makes life easy!
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
Definitions
Mean (Average) – Sum divided by the number of data points
Median – Middle data point when arranged from highest to lowest
Mode – Most frequent value
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass
Use data set to calculate Mean (Average) Median, Mode,
Max and Min
Select Cell where you want the value of the function to appear
Select Insert then Function
Select Statistical
Select function wanted (AVERAGE, MEDIAN, or MODE) then hit OK
Select Range of data you want to analyze by clicking on range symbol
and highlighting range. Hit enter or OK
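The same measures can be computed outside Excel. Here is a minimal Python sketch using the standard `statistics` module; the rabbit counts are made-up numbers for illustration, not output from either model.

```python
import statistics

# Hypothetical rabbit counts, made up for illustration.
rabbits = [3, 5, 5, 7, 10]

mean = statistics.mean(rabbits)      # sum divided by the number of data points
median = statistics.median(rabbits)  # middle value of the sorted data
mode = statistics.mode(rabbits)      # most frequent value
largest = max(rabbits)
smallest = min(rabbits)

print(mean, median, mode, largest, smallest)
```

Notice that the mean (6) is pulled above the median (5) by the single large value, which is one reason to look at more than one measure.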
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
Definitions
Range = maximum - minimum
Variance = measures the noise (spread) of the data around the mean value.
Standard Deviation (S) is the square root of the variance. It is the most commonly used measure of spread (same units as the data). Another reason to use S, for data that is roughly normally distributed:
~68% of the data are in the interval Mean – S to Mean + S
~95% of the data are in the interval Mean – 2 S to Mean + 2 S
~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S
[Figure: Rabbit Population. Number of Rabbits vs. Ticks (0 to 2000), showing Rabbits, Mean, Mean - 2 S, and Mean + 2 S]
EXCEL does it for you!!!
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass
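To see how the spread measures separate two data sets that share the same mean, here is a small Python sketch. The `pink` and `blue` lists are hypothetical values chosen so the arithmetic comes out cleanly, and `describe` is a helper name invented for this example. It uses the population versions (`pvariance`, `pstdev`); the `statistics` module also offers `variance` and `stdev`, which divide by n - 1 instead of n.

```python
import statistics

# Hypothetical data sets with the same mean but different noise.
pink = [4, 4, 4, 4, 6, 6, 6, 6]
blue = [2, 4, 4, 4, 5, 5, 7, 9]


def describe(data):
    """Return mean, range, variance, S, and the share of data within mean +/- 2 S."""
    mean = statistics.mean(data)
    s = statistics.pstdev(data)  # population standard deviation
    inside = [v for v in data if mean - 2 * s <= v <= mean + 2 * s]
    return {
        "mean": mean,
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),
        "S": s,
        "within_2S": len(inside) / len(data),
    }


pink_stats = describe(pink)
blue_stats = describe(blue)
print(pink_stats)
print(blue_stats)
```

Both sets have a mean of 5, but the blue set has the larger range and standard deviation, so it is the noisier one.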
Derivatives
What are Derivatives?
A simple calculation using data
Instantaneous rate of change = SLOPE
Why use Derivatives?
Get more information from data
More ways to compare data
Car moving down a road
Data = the distance traveled
Velocity = the 1st derivative of distance
Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: three plots against Time: Distance; Velocity (slope of distance); Acceleration (slope of velocity)]
How to Calculate a Derivative
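Mathematically (you don't have to use this):
$\frac{dx}{dt}$, where x = position and t = time
In Excel (use this):
$\frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}$
=(B3-B2)/(A3-A2), with t in column A and x in column B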
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass
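The slope calculation is the same in any tool. As a sketch, here it is in Python for the car example; the distance readings are hypothetical and `slopes` is a helper name made up for this example. Each slope is (x2 - x1) / (t2 - t1) between neighbouring rows, just like the Excel formula.

```python
# Hypothetical distance readings for a car, taken once per second.
times = [0, 1, 2, 3, 4, 5]
distances = [0, 1, 4, 9, 16, 25]


def slopes(ts, xs):
    """Slope between each pair of neighbouring points: (x2 - x1) / (t2 - t1)."""
    return [
        (xs[i + 1] - xs[i]) / (ts[i + 1] - ts[i])
        for i in range(len(xs) - 1)
    ]


velocity = slopes(times, distances)         # 1st derivative of distance
acceleration = slopes(times[1:], velocity)  # 1st derivative of velocity

print(velocity)
print(acceleration)
```

Each round of slopes gives a list one item shorter than its input, so six distance readings give five velocities and four accelerations. Here the velocity keeps rising while the acceleration stays constant, which is information you could not read directly from the distance column.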