Patterns in Data


Data Analysis
UNLOCKING THE SECRETS HIDDEN IN YOUR DATA
Why Do Data Analysis?
• Avoids incorrect assumptions
• Does the data make sense?
• Which one is better?
[Figure: elevation vs. time, with the same data fitted two ways]
Data:
  time   elevation
  0      400000
  10     399000
  20     393000
  30     384000
Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$
Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$
Why Do Data Analysis?
• Are your assumptions correct?
• Did you collect enough data?
• If this is a model of a falling body, which is better?
• Be careful: what is better mathematically is not always better scientifically
[Figure: both fits extended over time = 0 to 90]
Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$
Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$
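
The comparison between the two fits can be checked numerically. The sketch below uses the four data points and the two fitted equations from the slides. The function names (`quadratic`, `cubic`, `r_squared`) and the extrapolation times are illustrative choices, not part of the original material.

```python
# Data points and fitted equations copied from the slides.
times = [0, 10, 20, 30]
elevations = [400000, 399000, 393000, 384000]


def quadratic(x):
    """Quadratic fit from the slide."""
    return -20 * x**2 + 60 * x + 400100


def cubic(x):
    """Cubic fit from the slide."""
    return 0.3333 * x**3 - 35 * x**2 + 216.67 * x + 400000


def r_squared(model, xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


r2_quadratic = r_squared(quadratic, times, elevations)
r2_cubic = r_squared(cubic, times, elevations)
print(f"R^2 quadratic: {r2_quadratic:.4f}")
print(f"R^2 cubic:     {r2_cubic:.4f}")

# Extrapolate past the measured range (these times are an arbitrary choice).
for t in (30, 60, 90):
    print(f"t={t}: quadratic={quadratic(t):.0f}, cubic={cubic(t):.0f}")
```

The cubic scores the higher R^2 because a cubic can pass exactly through four points. When extrapolated, though, it predicts that the elevation starts increasing again at later times, which a falling body would not do, while the quadratic keeps decreasing.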
Ways to Analyze Data
• Plotting Data
  ◦ Ways to visually understand data
• Statistics
  ◦ Makes it easier to compare data: Mean, Median, Mode
  ◦ Makes it clear if you have NOISY data: Range, Variance, Standard Deviation
[Figure: plot of the Pink and Blue data sets with their means (series: Pink, Mean Pink, Blue, Mean Blue)]
Ways to Analyze Data
• Derivatives (Slopes)
  ◦ Tell if changes in parameters affect data
  ◦ Parameter 2 has a greater effect than Parameter 1
• Get more information from data
[Figure: line plot of three series (Base Case, Parameter 1, Parameter 2) with labeled slopes of 0.39, 0.16, and 0.08; callout: "Great Derivative"]
Plotting Data – Extracting from Netlogo
• Two ways
• 1st way: Write code to extract the data you want (see File Output Example in the Code Examples)
  ◦ Open file in setup procedure
  ◦ Create a write-to-file procedure
Plotting Data – Extracting from Netlogo
• 2nd way: Extract data from Netlogo graphs
  ◦ Have Netlogo generate graph on Interface page (example on later slide)
  ◦ Create a setup-plot procedure and a do-plot procedure
  ◦ Call the setup-plot procedure in setup procedure
  ◦ Call do-plot procedure in go procedure
Plotting Data – Extracting from Netlogo
• Run model until sufficient data obtained
• (PC) Right-click on graph / (Mac)
• Select Export
• Choose location and file name, then select Save
• Excel file is created (next slide)
  ◦ Contains all the information in the plot and the input parameters used
  ◦ Contains excess information about the plot (color, pen down, mode, interval…)
LET’S DO IT – Open Rabbits Grass Weeds
Plotting Data – Extracting from Netlogo
[Screenshot of the exported file; callout: "This is what you need"]
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
• Bar Charts – preferred snacks
• Pie Charts – music preference
Pets purchased at pet store
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
• Line Graphs – cell phone use (http://www.statcan.ca)
• Scatter Plots (http://en.wikipedia.org/wiki/Scatterplot)
Plotting Data – Activity in Excel
LET’S DO IT
• Open File Car Data
• Insert Chart
• Select type of chart
  ◦ XY Scatter
• Select Data Range
  ◦ Highlight data to be plotted
Plotting Data – Activity in Excel
• Label each data series
• Label the graph and the axes
• Select where you want the graph to be (on the same worksheet or on another worksheet in the same file)
Statistics
• Statistics help you
  ◦ Summarize data
  ◦ Describe data
  ◦ Analyze data
Hard to describe the difference between the two data sets
[Figure: plot of the two data sets (series: Noisy, Noisier)]
Now it is easy to summarize, describe, and analyze the data.
The blue and the pink data have the same AVERAGE value (mean), but the blue data is “NOISIER” (greater standard deviation). Therefore…
[Figure: the same data with lines for Mean (both), Noisy ± 2SD, and Noisier ± 2SD]
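
As a small illustration of "same mean, different noise", the sketch below compares two made-up data sets. The values in `pink` and `blue` are hypothetical and were chosen only so that both means come out equal. They are not the data plotted on the slide.

```python
import statistics

# Hypothetical measurements: both sets average 14, but blue scatters more.
pink = [13, 14, 15, 14, 13, 15, 14, 14]
blue = [8, 20, 11, 17, 14, 9, 19, 14]

pink_mean = statistics.mean(pink)
blue_mean = statistics.mean(blue)
pink_sd = statistics.stdev(pink)  # sample standard deviation (n - 1 divisor)
blue_sd = statistics.stdev(blue)

for name, mean, sd in (("pink", pink_mean, pink_sd), ("blue", blue_mean, blue_sd)):
    # The mean +/- 2 SD band matches the lines drawn in the figure legend.
    print(f"{name}: mean={mean}, SD={sd:.2f}, "
          f"band=({mean - 2 * sd:.2f}, {mean + 2 * sd:.2f})")
```

The two means are identical, so the mean alone cannot tell the sets apart. The standard deviation is what shows that the blue set is noisier.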
Statistics – How to Calculate in Excel
• +, -, *, / are used for addition, subtraction, multiplication, and division.
• Each cell has a label based on the column and row.
• Use cells to perform calculations instead of numbers. Example: =(A4+B4)/C4
• Perform calculations on an entire column: copy and paste the equation. Warning: this changes the row number in each cell reference for each line.
• Fix a specific cell: use the $ symbol. Example: =(A4+B4)/$C$1
• Excel has many built-in statistical functions. Makes life easy!
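
The difference between a copied formula and one with a fixed cell can be mimicked outside Excel. The sketch below is only an analogy, with hypothetical column values. `relative` behaves like =(A4+B4)/C4 copied down a column, and `absolute` behaves like =(A4+B4)/$C$1 copied down a column.

```python
# Hypothetical worksheet columns A, B, and C (three rows for brevity).
col_a = [2.0, 4.0, 6.0]
col_b = [8.0, 6.0, 4.0]
col_c = [5.0, 2.0, 10.0]

# Relative reference: each row divides by the C value in its own row.
relative = [(a + b) / c for a, b, c in zip(col_a, col_b, col_c)]

# Absolute reference ($C$1): every row divides by the same fixed cell.
fixed_c = col_c[0]
absolute = [(a + b) / fixed_c for a, b in zip(col_a, col_b)]

print("relative:", relative)
print("absolute:", absolute)
```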
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
• Definitions
  ◦ Mean (Average) – Sum divided by the number of data points
  ◦ Median – Middle data point when arranged from highest to lowest
  ◦ Mode – Most frequent value
LET’S DO IT: StarlogoTNG: Fish and Plankton; Netlogo: Rabbits and Grass
• Use data set to calculate Mean (Average), Median, Mode, Max and Min
  ◦ Select the cell where you want the value of the function to appear
  ◦ Select Insert, then Function
  ◦ Select Statistical
  ◦ Select the function wanted (AVERAGE, MEDIAN, or MODE), then hit OK
  ◦ Select the range of data you want to analyze by clicking on the range symbol and highlighting the range. Hit Enter or OK
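
The same five values can be cross-checked outside Excel. This is a minimal sketch using Python's standard `statistics` module. The list `rabbit_counts` is hypothetical sample data, not output from the Rabbits and Grass model.

```python
import statistics

# Hypothetical population counts (stand-in for exported model data).
rabbit_counts = [120, 135, 150, 135, 160, 142, 135, 170, 128]

mean_value = statistics.mean(rabbit_counts)      # sum / number of points
median_value = statistics.median(rabbit_counts)  # middle value when sorted
mode_value = statistics.mode(rabbit_counts)      # most frequent value
max_value = max(rabbit_counts)
min_value = min(rabbit_counts)

print(f"mean={mean_value:.2f} median={median_value} mode={mode_value} "
      f"max={max_value} min={min_value}")
```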
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
• Definitions
  ◦ Range = maximum - minimum
  ◦ Variance = a measure of the noise (scatter) of the data around the mean value
  ◦ Standard Deviation (S) = the square root of the variance. It is the most commonly used measure of spread because it has the same units as the data. Another reason to use S (for roughly bell-shaped, normally distributed data):
    - ~68% of the data are in the interval Mean – S to Mean + S
    - ~95% of the data are in the interval Mean – 2 S to Mean + 2 S
    - ~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S
EXCEL does it for you!!!
[Figure: Rabbit Population; x-axis: Ticks; y-axis: Number of Rabbits; series: Rabbits, Mean, Mean - 2 S, Mean + 2 S]
LET’S DO IT: StarlogoTNG: Fish and Plankton; Netlogo: Rabbits and Grass
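
The spread measures above can be sketched in a few lines. The data in `values` is hypothetical, and `fraction_within` is an illustrative helper. `statistics.variance` and `statistics.stdev` use the sample (n - 1) form, which is an assumption about which version is wanted.

```python
import statistics

# Hypothetical data set (not taken from the rabbit model).
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)
mean_value = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 divisor
s = statistics.stdev(values)            # square root of the variance


def fraction_within(data, center, half_width):
    """Fraction of points inside [center - half_width, center + half_width]."""
    inside = [v for v in data if center - half_width <= v <= center + half_width]
    return len(inside) / len(data)


within_1s = fraction_within(values, mean_value, s)
within_2s = fraction_within(values, mean_value, 2 * s)

print(f"range={data_range} variance={variance:.3f} S={s:.3f}")
print(f"within 1 S: {within_1s:.0%}, within 2 S: {within_2s:.0%}")
```

With only eight points, the fractions will not match the 68% and 95% figures closely. Those percentages describe large, roughly normally distributed data sets.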
Derivatives
• What are Derivatives?
  ◦ A simple calculation using data
  ◦ Instantaneous rate of change = SLOPE
• Why use Derivatives?
  ◦ Get more information from data
  ◦ More ways to compare data
• Car moving down a road
  ◦ Data = the distance traveled
  ◦ Velocity = the 1st derivative of distance
  ◦ Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: three panels plotted against Time: Distance; Velocity (slope of distance); Acceleration (slope of velocity)]
How to Calculate a Derivative
• Mathematically (you don’t have to use this), where x = position and t = time:
  $\frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}$
• In Excel (use this), with time in column A and position in column B:
  Δx/Δt = (B3 - B2)/(A3 - A2)
LET’S DO IT: StarlogoTNG: Fish and Plankton; Netlogo: Rabbits and Grass
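
The slope formula can be applied to a whole column of data at once. The sketch below uses hypothetical car data in `times` and `distances`, and `slopes` is an illustrative helper. Assigning each velocity to the midpoint of its time interval is a design choice made here, not something stated on the slides.

```python
# Hypothetical car data: time (like column A) and distance (like column B).
times = [0, 2, 4, 6, 8, 10]
distances = [0, 4, 12, 22, 30, 34]


def slopes(xs, ys):
    """Slope between each pair of neighbouring points: (y2 - y1) / (x2 - x1)."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


# Velocity = 1st derivative of distance.
velocities = slopes(times, distances)

# Each velocity belongs to the middle of the interval it was computed over.
mid_times = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]

# Acceleration = 1st derivative of velocity (2nd derivative of distance).
accelerations = slopes(mid_times, velocities)

print("velocity:    ", velocities)
print("acceleration:", accelerations)
```

Each differencing step shortens the list by one, so six distance points give five velocities and four accelerations.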