Patterns in Data - Supercomputing Challenge


Data Analysis
UNLOCKING THE SECRETS HIDDEN IN YOUR DATA
Why Do Data Analysis?
What did you see?
• Makes your data visible
• Helps find obvious patterns
• Does the data make sense?
  - Are your assumptions correct?
  - Did you collect enough data?
[Figure: Rabbit Population. Number of Rabbits vs. Time]
Why Do Data Analysis?
What does it Mean?
• Is there more information in the data?
  - emergent behavior
  - unexpected patterns
• Was the hypothesis correct?
Why Does it Matter?
• Draw conclusions from data
  - More grass gives more rabbits
• To make your project stand out to the Challenge Judges
  - "Appears to be an improvement" vs "Improves the result by a specific amount"
[Figure: Rabbit Population. Number of Rabbits vs. Time]
Ways to Analyze Data
• Plotting Data
  - Ways to visually understand data
• Statistics
  - Makes it easier to compare data
  - Mean, Median, Mode
  - Makes it clear if you have NOISY data
  - Range, Variance, Standard Deviation
[Figure: Pink and Blue data series, each with its mean line (Mean Pink, Mean Blue)]
Ways to Analyze Data
• Derivatives (Slopes)
  - Tell if changes in parameters affect data
  - Parameter 2 has a greater effect than Parameter 1
  - Get more information from data
[Figure: Derivative. Lines for Base Case, Parameter 1, and Parameter 2, with labeled slopes of 0.39, 0.16, and 0.08]
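One way to compare how strongly each parameter affects the result is to fit a straight line to each run and compare the slopes. The sketch below is only an illustration: the run names match the slide, but the data values and the `least_squares_slope` helper are made up for this example and are not taken from the presentation.

```python
def least_squares_slope(xs, ys):
    """Return the slope of the best-fit straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical results from three model runs (illustrative values only).
time = [0, 2, 4, 6, 8, 10]
runs = {
    "Base Case": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "Parameter 1": [0.0, 0.4, 0.8, 1.2, 1.6, 2.0],
    "Parameter 2": [0.0, 0.8, 1.6, 2.4, 3.2, 4.0],
}

slopes = {name: least_squares_slope(time, ys) for name, ys in runs.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.2f}")
```

A larger slope means the output changes faster, so in this made-up data the Parameter 2 run has the greater effect.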
Plotting Data – Extracting from StarlogoTNG
• Data can be extracted from a graph
• Create a graph using the line graph piece and put reset clock on Setup block to clear graph
• After program is run
  - Click on graph in Spaceland
  - Save File – Excel file
This is what you get
LET’S DO IT – Open Fish and Plankton !!
Plotting Data – Extracting from Netlogo
• Two ways
  - 1st Way: Write code to extract the data you want – see File Output Example in the Code Examples
    - Open file in setup procedure
    - Create a write-to-file procedure
Plotting Data – Extracting from Netlogo
• 2nd way: Extract data from Netlogo graphs
  - Have Netlogo generate graph on Interface page (example on later slide)
• Create a setup-plot procedure and a do-plot procedure
• Call the setup-plot procedure in setup procedure
• Call do-plot procedure in go procedure
Plotting Data – Extracting from Netlogo
• Run model until sufficient data is obtained
• (PC) Right Click on Graph/(Mac)
• Select Export
• Choose location and File name - select save
• Excel File is created – Next Slide
• Contains all the information in the plot and input parameters used.
• Contains excess information about the plot (color, pen down, mode, interval…)

LET’S DO IT – Open Rabbits Grass Weeds
Plotting Data – Extracting from Netlogo
This is what you need
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
• Bar Charts – preferred snacks
• Pie Charts – music preference
Pets purchased at pet store
Plotting Data – Different Types of Plots
• Line Graphs – cell phone use
http://www.statcan.ca
• Scatter Plots
http://en.wikipedia.org/wiki/Scatterplot
Plotting Data – Activity in Excel
LET’S DO IT
• Open File Car Data
• Insert Chart
• Select type of chart
• XY Scatter
• Select Data Range
• Highlight data to be plotted
Plotting Data – Activity in Excel
• Label each data series
• Label Graph and Axis
• Select where you want graph to be (on that page, i.e. the same worksheet, or on another worksheet in the same file)
Statistics
• Statistics help you
• Summarize data
• Describe data
• Analyze data
Hard to describe the difference between the two data sets
[Figure: the Noisy and Noisier data sets plotted together]
Now it is easy to summarize, describe and analyze the data….
The blue and the pink data have the same AVERAGE value (mean) but the blue data is “NOISIER” (greater standard deviation). Therefore…
[Figure: the same data with lines for Mean (both), Noisy + 2SD, Noisy - 2SD, Noisier + 2SD, and Noisier - 2SD]
Statistics – How to Calculate in Excel
• +, -, *, / are used for addition, subtraction, multiplication and division.
• Each cell has a label based on the column and row.
• Use cells to perform calculations instead of numbers. Example: =(A4+B4)/C4
• Perform calculations on an entire column - copy and paste the equation. Warning: this changes the row number in each cell reference on every line.
• Fix a specific cell - use the $ symbol. Example: =(A4+B4)/$C$1
• Excel has many built-in statistical functions
Makes life easy!
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
• Definitions
  - Mean (Average) – Sum divided by the number of data points
  - Median – Middle data point when arranged from highest to lowest
  - Mode – Most frequent value
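For readers who want to check the Excel results another way, the three definitions above can be sketched in Python with the standard `statistics` module. The data values are hypothetical, chosen only for illustration, and are not output from the Fish and Plankton or Rabbits and Grass models.

```python
import statistics

# Hypothetical population counts (illustrative values only).
rabbits = [12, 15, 15, 18, 20, 15, 22, 18, 27]

mean_value = statistics.mean(rabbits)      # sum divided by the number of points
median_value = statistics.median(rabbits)  # middle value of the sorted data
mode_value = statistics.mode(rabbits)      # most frequent value

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Max:", max(rabbits), "Min:", min(rabbits))
```

These calls play the same role as the AVERAGE, MEDIAN, and MODE functions in Excel. When the data set has an even number of points, `statistics.median` averages the two middle values.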
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass
• Use data set to calculate Mean (Average), Median, Mode, Max and Min
  - Select Cell where you want the value of the function to appear
  - Select Insert then Function
  - Select Statistical
  - Select function wanted (AVERAGE, MEDIAN, or MODE) then hit OK
  - Select Range of data you want to analyze by clicking on range symbol and highlighting range. Hit enter or OK
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
• Definitions
• Range = maximum - minimum
  - Variance = measures the noise (spread) of the data around the mean value; it is the average squared distance from the mean.
  - Standard Deviation (S) is the square root of the variance. Most commonly used measure of spread (same units as the data). Another reason to use S, for data that is roughly bell-shaped (normally distributed):
    - ~68% of the data are in the interval Mean – S to Mean + S
    - ~95% of the data are in the interval Mean – 2 S to Mean + 2 S
    - ~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S
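The spread measures above can be sketched in Python as a cross-check on the Excel results. The data set is hypothetical and `fraction_within` is a helper invented for this example. The sketch assumes the population formulas (divide by N); the sample formulas (divide by N - 1, `statistics.variance` and `statistics.stdev`) give slightly larger values.

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)
data_range = max(data) - min(data)     # Range = maximum - minimum
variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance


def fraction_within(k):
    """Fraction of data points within k standard deviations of the mean."""
    low = mean_value - k * std_dev
    high = mean_value + k * std_dev
    return sum(low <= x <= high for x in data) / len(data)


print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", std_dev)
print("Fraction within 1 S:", fraction_within(1))
print("Fraction within 2 S:", fraction_within(2))
```

With only a handful of points the fractions will not match 68% and 95% exactly; those percentages describe large, roughly bell-shaped data sets.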
EXCEL does it for you!!!
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass
[Figure: Rabbit Population. Number of Rabbits vs. Ticks, with lines for Rabbits, Mean, Mean - 2 S, and Mean + 2 S]
Derivatives
• What are Derivatives?
• A simple calculation using data
• Instantaneous rate of change = SLOPE
• Why use Derivatives?
• Get more information from data
• More ways to compare data
• Car moving down a road
• Data = the distance traveled
• Velocity = the 1st derivative of distance
• Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: three plots against Time: Distance; Velocity (slope of distance); Acceleration (slope of velocity)]
How to Calculate a Derivative
• Mathematically (you don’t have to use this):
  - x = position
  - t = time
  - $\frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}$
• In Excel (use this), with time in column A and position in column B:
  - =(B3-B2)/(A3-A2)
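The slope formula above can also be sketched in Python, applying it once to get velocity from distance and again to get acceleration from velocity. The `derivative` helper and the car data are hypothetical, made up for this example. Each slope is assigned to the later of its two time points, the same way the Excel formula in row 3 uses rows 2 and 3.

```python
def derivative(times, values):
    """Slope between each pair of neighboring points: (v2 - v1) / (t2 - t1)."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


# Hypothetical car data (illustrative values only).
time = [0, 1, 2, 3, 4]
distance = [0, 1, 4, 9, 16]

# Velocity is the 1st derivative of distance.
velocity = derivative(time, distance)

# Acceleration is the 1st derivative of velocity. The velocity list is one
# entry shorter than the distance list, so it is paired with time[1:].
acceleration = derivative(time[1:], velocity)

print("Velocity:", velocity)
print("Acceleration:", acceleration)
```

Each derivative has one fewer point than the data it came from, which is why the Excel formula starts in the second data row rather than the first.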
LET’S DO IT : StarlogoTNG : Fish and Plankton
Netlogo : Rabbits and Grass