Patterns in Data - Supercomputing Challenge
Data Analysis
UNLOCKING THE SECRETS HIDDEN IN YOUR DATA
Why Do Data Analysis?
Are your assumptions correct?
Did you collect enough data?
[Figure: Rabbit Population, Number of Rabbits vs. Time]
What did you see?
Makes your data visible
Helps find obvious patterns
Does the data make sense?
Why Do Data Analysis?
What does it mean?
Is there more information in the data?
Emergent behavior
Unexpected patterns
Was the hypothesis correct?
Why does it matter?
Draw conclusions from data: more grass gives more rabbits
Make your project stand out to the Challenge Judges: "appears to be an improvement" vs. "improves the result by a specific amount"
[Figure: Rabbit Population, Number of Rabbits vs. Time]
Ways to Analyze Data
Plotting Data
Ways to visually understand data
Statistics
Makes it easier to compare data: Mean, Median, Mode
Makes it clear if you have NOISY data: Range, Variance, Standard Deviation
[Figure: Pink and Blue data series with their means (Mean Pink, Mean Blue)]
Ways to Analyze Data
Derivatives (Slopes)
Tell if changes in parameters affect the data: Parameter 2 has a greater effect than Parameter 1
Get more information from data
[Figure: Derivative chart for Base Case, Parameter 1, and Parameter 2, with fitted slopes of 0.39, 0.16, and 0.08]
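The slopes on the chart summarize how strongly each case changes. The slide does not say how they were computed; one common choice is a least-squares straight-line fit, which is what Excel's SLOPE function and a linear trendline report. Below is a minimal Python sketch assuming that approach; the function name and the three data series are made up (exact straight lines, not the data behind the chart).

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    return sxy / sxx


# Hypothetical runs: the same x values, one y series per case.
x_values = [0, 2, 4, 6, 8, 10]
runs = {
    "Base Case": [1.0, 1.2, 1.4, 1.6, 1.8, 2.0],
    "Parameter 1": [1.0, 1.4, 1.8, 2.2, 2.6, 3.0],
    "Parameter 2": [0.5, 1.3, 2.1, 2.9, 3.7, 4.5],
}

for name, ys in runs.items():
    print(f"{name}: slope = {least_squares_slope(x_values, ys):.2f}")
```

A larger slope means the output responds more strongly; in this made-up example the Parameter 2 line is the steepest.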
Plotting Data – Extracting from StarlogoTNG
Data can be extracted from a graph:
Create a graph using the line graph piece, and put reset clock in the Setup block to clear the graph
After the program has run, click on the graph in Spaceland
Save the file as an Excel file
[Screenshot: This is what you get]
LET’S DO IT – Open Fish and Plankton!!
Plotting Data – Extracting from Netlogo
Two ways
1st way: Write code to extract the data you want (see the File Output Example in the Code Examples)
Open the file in the setup procedure
Create a write-to-file procedure
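The File Output Example itself is NetLogo code. To show the same pattern (open the file once in setup, then write one line per step) outside NetLogo, here is a Python sketch; the class name DataLogger, the file name rabbits.csv, and the rabbit counts are hypothetical stand-ins, not part of NetLogo.

```python
import csv


class DataLogger:
    """Open the output file once (like setup), then append one row per step."""

    def __init__(self, path, columns):
        # Mode "w" starts a fresh file each run, so old output is cleared.
        self._file = open(path, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(columns)  # header row

    def write_to_file(self, *values):
        # Called once per step, like calling write-to-file from go.
        self._writer.writerow(values)

    def close(self):
        self._file.close()


# Hypothetical usage: log a made-up rabbit count for five steps.
logger = DataLogger("rabbits.csv", ["tick", "rabbits"])
rabbits = 100
for tick in range(5):
    logger.write_to_file(tick, rabbits)
    rabbits += 10  # stand-in for whatever the model actually computes
logger.close()
```

The resulting file opens directly in Excel, one row per step.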
Plotting Data – Extracting from Netlogo
2nd way: Extract data from Netlogo graphs
Have Netlogo generate a graph on the Interface page (example on a later slide)
Create a setup-plot procedure and a do-plot procedure
Call the setup-plot procedure in the setup procedure
Call the do-plot procedure in the go procedure
Plotting Data – Extracting from Netlogo
Run the model until enough data has been obtained
(PC) Right-click on the graph / (Mac)
Select Export
Choose a location and file name, then select Save
An Excel file is created (next slide)
It contains all the information in the plot and the input parameters used.
It also contains excess information about the plot (color, pen down, mode, interval…)
LET’S DO IT – Open Rabbits Grass Weeds
Plotting Data – Extracting from Netlogo
[Screenshot: This is what you need]
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
Bar Charts – preferred snacks
Pie Charts – music preference
Pets purchased at pet store
Plotting Data – Different Types of Plots
All plots from http://www.statcan.ca
Line Graphs – cell phone use
http://www.statcan.ca
Scatter Plots
http://en.wikipedia.org/wiki/Scatterplot
Plotting Data – Activity in Excel
LET’S DO IT
Open the file Car Data
Insert Chart
Select the type of chart: XY Scatter
Select the data range: highlight the data to be plotted
Plotting Data – Activity in Excel
Label each data series
Label the graph and the axes
Select where you want the graph to be (on the same worksheet or on another worksheet in the same file)
Statistics
Statistics help you:
Summarize data
Describe data
Analyze data
Without statistics, it is hard to describe the difference between the two data sets.
Now it is easy to summarize, describe, and analyze the data.
The blue and the pink data have the same AVERAGE value (mean), but the blue data is “NOISIER” (greater standard deviation). Therefore…
[Figure: Noisy and Noisier data sets with their shared mean (Mean (both)) and Mean ± 2SD bands for each]
Statistics – How to Calculate in Excel
+, -, *, / are used for addition, subtraction, multiplication, and division.
Each cell has a label based on its column and row.
Use cells instead of numbers to perform calculations. Example: =(A4+B4)/C4
To perform a calculation on an entire column, copy and paste the equation. Warning: this shifts the cell references by one row for each line.
To fix a specific cell, use the $ symbol. Example: =(A4+B4)/$C$1
Excel has many built-in statistical functions
Makes life easy!
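To see why the $ matters when a formula is copied down a column, here is a small Python sketch that imitates the two Excel formulas above; the cell values and the variable names are hypothetical.

```python
# Hypothetical sheet: each dict holds the values of columns A, B and C for one row.
rows = [
    {"A": 2, "B": 4, "C": 3},  # row 4
    {"A": 1, "B": 5, "C": 2},  # row 5
    {"A": 6, "B": 6, "C": 4},  # row 6
]
C1 = 2  # stand-in for the single fixed cell $C$1

# =(A4+B4)/C4 copied down: every reference shifts with the row (relative).
relative = [(r["A"] + r["B"]) / r["C"] for r in rows]

# =(A4+B4)/$C$1 copied down: A and B shift, but $C$1 stays put (absolute).
absolute = [(r["A"] + r["B"]) / C1 for r in rows]

print(relative)  # [2.0, 3.0, 3.0]
print(absolute)  # [3.0, 3.0, 6.0]
```

Each relative result divides by its own row's C value, while every absolute result divides by the same fixed cell.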
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
Definitions
Mean (Average) – Sum divided by the number of data points
Median – Middle data point when the data are sorted in order (with an even number of points, the average of the two middle points)
Mode – Most frequent value
LET’S DO IT: StarlogoTNG: Fish and Plankton
Netlogo: Rabbits and Grass
Use the data set to calculate the Mean (Average), Median, Mode, Max, and Min
Select the cell where you want the value of the function to appear
Select Insert, then Function
Select Statistical
Select the function wanted (AVERAGE, MEDIAN, or MODE), then hit OK
Select the range of data you want to analyze by clicking on the range symbol and highlighting the range. Hit Enter or OK
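The same measures, plus max and min, can be checked outside Excel. A minimal sketch using Python's standard statistics module; the rabbit counts are made-up numbers standing in for an exported model run.

```python
import statistics

# Hypothetical rabbit counts copied from an exported model run.
rabbits = [120, 135, 150, 135, 160, 145, 135, 170]

print("Mean:  ", statistics.mean(rabbits))    # like Excel AVERAGE
print("Median:", statistics.median(rabbits))  # like Excel MEDIAN
print("Mode:  ", statistics.mode(rabbits))    # like Excel MODE
print("Max:   ", max(rabbits))
print("Min:   ", min(rabbits))
```

With an even number of points (eight here), the median is the average of the two middle values, 135 and 145.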
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
Definitions
Range = maximum - minimum
Variance = a measure of the noise of the data around the mean value (roughly, the average of the squared distances from the mean).
Standard Deviation (S) is the square root of the variance. It is the most commonly used measure of spread and has the same units as the data. Another reason to use S: for roughly bell-shaped (normal) data,
~68% of the data are in the interval Mean – S to Mean + S
~95% of the data are in the interval Mean – 2 S to Mean + 2 S
~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S
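A sketch of the spread measures in Python, assuming the sample versions (dividing by n - 1), which is what Excel's VAR.S and STDEV.S compute; the data set is hypothetical. The helper fraction_within (a name made up for this sketch) counts how much of the data falls between Mean – k S and Mean + k S, so it can be compared with the ~68% / ~95% / ~99.7% guide.

```python
import statistics


def spread_summary(data):
    """Range, variance and standard deviation of a list of numbers."""
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "sd": statistics.stdev(data),  # square root of the sample variance
    }


def fraction_within(data, k):
    """Fraction of points between mean - k*S and mean + k*S."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if mean - k * s <= x <= mean + k * s]
    return len(inside) / len(data)


# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(spread_summary(data))
for k in (1, 2, 3):
    print(f"within {k} S: {fraction_within(data, k):.0%}")
```

With only eight points the fractions will not match the percentages closely; the rule of thumb works best for large, roughly bell-shaped data sets.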
[Figure: Rabbit Population, Number of Rabbits vs. Ticks, showing Rabbits, Mean, Mean – 2 S, and Mean + 2 S]
EXCEL does it for you!!!
LET’S DO IT: StarlogoTNG: Fish and Plankton
Netlogo: Rabbits and Grass
[Figure: Distance vs. time, and Velocity (slope of distance) vs. time]
What are Derivatives?
A simple calculation using data
Instantaneous rate of change = SLOPE
Why use Derivatives?
Get more information from data
More ways to compare data
Example: a car moving down a road
Data = the distance traveled
Velocity = the 1st derivative of distance
Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: Acceleration (slope of velocity) vs. Time]
How to Calculate a Derivative
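Mathematically (you don't have to use this), with x = position and t = time, the derivative is
$$\frac{dx}{dt}$$
In Excel, use this instead: the slope between two neighboring data points,
$$\frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}$$
With time in column A and position in column B, this is the Excel formula =(B3-B2)/(A3-A2)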
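The Excel formula filled down a column is a forward difference. A minimal Python sketch of the same calculation, applied once to get velocity from distance and again to get acceleration from velocity; the function name finite_difference and the car data are hypothetical.

```python
def finite_difference(t, x):
    """Slope between consecutive points: (x2 - x1) / (t2 - t1).

    Mirrors filling =(B3-B2)/(A3-A2) down a column, with time in
    column A and position in column B. The result has one fewer
    value than the input.
    """
    if len(t) != len(x):
        raise ValueError("t and x must have the same length")
    return [(x[i] - x[i - 1]) / (t[i] - t[i - 1]) for i in range(1, len(t))]


# Hypothetical car data: position recorded every 2 time units.
time = [0, 2, 4, 6, 8, 10]
distance = [0, 4, 12, 24, 40, 60]

# 1st derivative of distance.
velocity = finite_difference(time, distance)
# 2nd derivative of distance: each velocity is paired with the later
# time of its interval, so the matching times are time[1:].
acceleration = finite_difference(time[1:], velocity)

print("velocity:    ", velocity)
print("acceleration:", acceleration)
```

Each step loses one value because a slope needs two points, just as the Excel formula starts in row 3 rather than row 2. Pairing each velocity with the later time of its interval is one simple convention; using the midpoint of the interval is another common choice.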
LET’S DO IT: StarlogoTNG: Fish and Plankton
Netlogo: Rabbits and Grass