Data Analysis Overview
Data and Data Analysis
UNLOCKING THE SECRETS HIDDEN IN YOUR
DATA
PART 1
Data
What is Data?
Data is information gathered from observation,
experimentation or modeling
Qualitative – not precise (usually descriptive)
Quantitative - Precise (usually numeric)
The output of your model (e.g. number of healthy agents, number of infected agents, time…)
Data
How do we gather data?
Data collection is the systematic recording of information while changing variables (quantities that may assume any given value or set of values).
Collect the output (e.g. number of healthy agents, number of infected agents, time…) while changing the variables (number of devils, number initially infected) of the model
Data
Why should we get data?
To answer questions
To develop understanding
To validate experiments
What should we do with data?
Display – usually graph it to make it easier to see
trends
Analysis – use math skills to uncover patterns and
trends in data sets
Interpretation – involves possible explanations of those patterns and trends.
Extracting Data from StarlogoTNG
There are three ways to extract data from
StarlogoTNG
Collect the data by hand
Create a chart in Starlogo TNG and extract the data to Excel
Create a table in Starlogo TNG and extract the data to Excel
Why Should We Display Data?
What did you see?
Makes your data visible
Helps find obvious patterns
Does the data make sense?
Are your assumptions correct?
Did you collect enough data?
[Figure: Rabbit Population; Number of Rabbits vs. Time]
Why Should We Analyze Data?
What does it mean?
Is there more information in the data?
emergent behavior
unexpected patterns
Was the hypothesis correct?
More grass gives more rabbits
Why does it matter?
Draw conclusions from data
To help you answer questions
Provide visible evidence and support for our conclusions to your audience (e.g. Challenge judges)
Validity of model, experiment, theory, …
[Figure: Rabbit Population; Number of Rabbits vs. Time]
Ways to Analyze Data
Plotting Data
Ways to visually understand data
Statistics
Makes it easier to compare data
Mean, Median, Mode
Makes it clear if you have NOISY data
Range, Variance, Standard Deviation
[Figure: plot of the Pink and Blue data sets with their means (Mean Pink, Mean Blue)]
Ways to Analyze Data
Derivatives (Slopes)
Tell if changes in parameters affect data
Parameter 2 has a greater effect than Parameter 1
Get more information from data
[Figure: "Great Derivative" plot of the Base Case, Parameter 1 and Parameter 2 lines, annotated with Slope = 0.39, Slope = 0.16 and Slope = 0.08]
Collecting Data: Variable Sweeping
Did you collect enough
data?
Did you vary the parameters
throughout their ranges?
If you have sliders (input
variables) in your program,
you need data for the full
range of those sliders.
Minimum 3 runs for a
single variable (low,
medium, high)
With more than one slider (variable), you must vary them separately.
2 variables: perhaps 9 runs (3 levels of one × 3 levels of the other)
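The run count for a sweep comes from taking every combination of the levels chosen for each slider. Here is a minimal sketch in Python. The slider names are borrowed from the Tasmanian Devil activity, but the low, medium and high values are invented for illustration and should be replaced with your own model's ranges.

```python
from itertools import product

# Hypothetical slider levels (low, medium, high); not taken from a real model.
levels = {
    "initial_population": [50, 150, 250],
    "initial_percent_infected": [5, 25, 50],
}

def sweep_plan(levels):
    """Return one dict of slider settings per run, covering every combination."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

plan = sweep_plan(levels)
print(len(plan), "runs")
for settings in plan:
    print(settings)
```

With three levels for each of two sliders, the plan contains 3 × 3 = 9 runs. Repeating each run several times multiplies the total accordingly.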
Collecting Data from Starlogo TNG
Gathering Data by hand
Tasmanian Devils
Variable sweep
More than one variable
Multiple runs at each
variable combination
Average the data
Collecting Data from Starlogo TNG
Let’s Do It
Open Tasmanian Devil
Run a section of the data
sheet
Do variable sweep
Initial Population
Initial Percent Infected
Multiple runs at each set
of variables
Collect output in data
sheet
Number healthy after 200
ticks
Collecting Data from Starlogo TNG
Put Data into Excel
Calculate Averages
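Averaging repeated runs at each variable combination can also be sketched outside Excel. The records below are placeholder numbers, not results from the Tasmanian Devil model, and the three-field layout is an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean

# (initial population, initial percent infected, number healthy after 200 ticks)
# Placeholder values for illustration only.
runs = [
    (100, 10, 62), (100, 10, 58), (100, 10, 60),
    (100, 50, 21), (100, 50, 25), (100, 50, 20),
]

def average_by_settings(runs):
    """Group runs by their input settings and average the output of each group."""
    groups = defaultdict(list)
    for population, percent_infected, healthy in runs:
        groups[(population, percent_infected)].append(healthy)
    return {settings: mean(values) for settings, values in groups.items()}

summary = average_by_settings(runs)
for (population, percent_infected), avg in sorted(summary.items()):
    print(population, percent_infected, avg)
```

Each row printed corresponds to one line of a summary table: the input settings followed by the averaged output.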
Collecting Data from Starlogo TNG
Make a Summary Table
Create XY Charts
Collecting Data from Starlogo TNG
Make a 3D Chart
Plotting Data – Extracting from Starlogo TNG
LET’S DO IT – Tasmanian Devils !!
Data can be extracted from a graph or a table in Starlogo TNG
Create a graph using the line graph block
Put reset clock on Setup block to clear and reset graph
Plotting Data – Extracting from StarlogoTNG
LET’S DO IT – Tasmanian Devils !!
After program is run
Click on graph in Spaceland
Save File – Excel file
Data Analysis: Plotting Data – Types of Plots
All plots from http://www.statcan.ca
Bar Charts – preferred snacks
Pie Charts – music preference
Pets purchased at pet store
Data Analysis: Plotting Data – Types of Plots
All plots from http://www.statcan.ca
XY Graphs – cell phone use
http://www.statcan.ca
Scatter Plots
http://en.wikipedia.org/wiki/Scatterplot
Plotting Data – Activity in Excel
LET’S DO IT
Open Tasmanian Devil Export file (csv file) by double clicking on the file
In EXCEL - Insert Chart
Select type of chart
XY Scatter
Hit the Next button
Plotting Data – Activity in Excel
Select Data Range
Highlight data to be
plotted
Plotting Data – Activity in Excel
Label each data series
NEXT - Label Graph and Axis
Plotting Data – Activity in Excel
Choose where you want the
graph to be
Get your graph
[Figure: Tasmanian Devil Population; Sick and Healthy population vs. Time]
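The exported csv file can also be read programmatically before plotting. This is a minimal sketch that assumes the file has a header row with columns named Time, Sick and Healthy. The real layout of a StarLogo TNG export may differ, and the sample text below is invented.

```python
import csv
import io

# Invented sample standing in for the exported file; a real export may be laid out differently.
sample_csv = """Time,Sick,Healthy
0,10,190
10,40,160
20,75,120
"""

def read_series(text):
    """Return {column name: list of floats} from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    series = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            series[name].append(float(value))
    return series

series = read_series(sample_csv)
print(series["Time"])
print(series["Healthy"])
```

Each column becomes one data series, which matches the "Label each data series" step when building an XY scatter chart.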
Plotting Data – Extracting from Netlogo
Two ways
1st Way: Write code to
extract the data you want –
see File Output Example in
the Code Examples
Open file in setup
procedure
Create a write-to-file
procedure
Plotting Data – Extracting from Netlogo
2nd way: Extract data from
Netlogo graphs
Have Netlogo generate graph on
Interface page (example on later
slide)
Create a setup-plot procedure
and a do-plot procedure
Call the setup-plot procedure
in setup procedure
Call do-plot procedure in go
procedure
Plotting Data – Extracting from Netlogo
LET’S DO IT – Open Rabbits Grass Weeds
Run model until sufficient data
obtained
(PC) Right Click on Graph/
(Mac) Control Click on Graph
Select Export
Choose location and File name - select
save
Excel File is created – Next Slide
Contains all the information in the
plot and input parameters used.
Contains excess information about
the plot (color, pen down, mode,
interval…)
Plotting Data – Extracting from Netlogo
This is what you need
Statistics
Statistics help you
Summarize data
Describe data
Analyze data
Hard to describe the difference between the two data sets
Now it is easy to summarize, describe and analyze the data….
The blue and the pink data have the same AVERAGE value (mean) but the blue data is “NOISIER” (greater standard deviation). Therefore…
[Figure: two plots of the Noisy and Noisier data sets; the second adds the Mean (both) line and the + 2SD and - 2SD lines for each data set]
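A small numeric illustration of the same idea: two data sets can share a mean while differing in standard deviation. The values are invented for this example and are not the pink and blue series from the slide.

```python
from statistics import mean, stdev

# Invented data: both lists average to the same value, but one is more spread out.
noisy = [11, 13, 12, 11, 13, 12]
noisier = [6, 18, 12, 8, 16, 12]

for label, data in (("noisy", noisy), ("noisier", noisier)):
    print(label, "mean =", mean(data), "sd =", round(stdev(data), 2))
```

The mean alone cannot tell the two lists apart; the standard deviation can.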
Statistics – How to Calculate in Excel
+,-,*,/ used for addition,
subtraction, multiplication and
division.
Each cell has a label based on the
column and row.
Use cells to perform calculations
instead of numbers. Example :
=(A4+B4)/C4
Perform calculations on an entire column: copy and paste the formula. Warning: this shifts the cell references for each line.
Fix a specific cell: use the $ symbol. Example: =(A4+B4)/$C$1
Excel has many built in
statistical functions
Makes life easy!
Calculate in Excel Activity
Open a blank spread sheet in Excel
Create 2 columns of numbers
Then Add, Subtract, Multiply and Divide the first row
Copy and paste the formulas
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
Definitions
Mean (Average) – Sum divided by the number of data points
Median – Middle data point when arranged from highest to lowest
Mode – Most frequent value
LET’S DO IT : StarlogoTNG : Fish and Plankton data
Netlogo : Rabbits and Grass data
Use data set to calculate Mean (Average) Median, Mode,
Max and Min
Select Cell where you want the value of the function to appear
Select Insert then Function
Select Statistical
Select function wanted (AVERAGE, MEDIAN, or MODE) then hit OK
Select Range of data you want to analyze by clicking on range symbol
and highlighting range. Hit enter or OK
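For comparison with Excel's AVERAGE, MEDIAN and MODE functions, the same measures can be sketched with Python's standard `statistics` module. The data list is a made-up example, not the Fish and Plankton or Rabbits and Grass data.

```python
from statistics import mean, median, mode

# Made-up data set for illustration.
data = [8, 2, 13, 8, 4, 6, 8]

summary = {
    "mean": mean(data),      # sum divided by the number of data points
    "median": median(data),  # middle value once the data are sorted
    "mode": mode(data),      # most frequent value
    "max": max(data),
    "min": min(data),
}
for name, value in summary.items():
    print(name, value)
```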
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
Definitions
Range = maximum - minimum
Variance = measures the noise of the data around the mean value.
Standard Deviation (S) is the square root of the variance. Most commonly used measure of spread (same units as the data). Another reason to use S, when the data are roughly bell-shaped (normally distributed):
~68% of the data are in the interval Mean – S to Mean + S
~95% of the data are in the interval Mean – 2 S to Mean + 2 S
~99% of the data are in the interval Mean – 3 S to Mean + 3 S
[Figure: Rabbit Population; Number of Rabbits vs. Ticks, showing Rabbits, Mean, Mean + 2 S and Mean - 2 S]
EXCEL does it for you!!!
LET’S DO IT : StarlogoTNG : Fish and Plankton data
Netlogo : Rabbits and Grass data
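The spread measures can be computed directly as well. The sketch below uses an invented data list and the sample variance (dividing by n - 1), which is an assumption about the convention wanted; Excel offers both sample and population versions.

```python
from statistics import mean, variance, stdev

# Invented data set for illustration.
data = [12, 15, 11, 14, 13, 16, 10, 13]

data_range = max(data) - min(data)
m = mean(data)
var = variance(data)   # sample variance (divides by n - 1)
s = stdev(data)        # square root of the sample variance

# Count how many points fall inside the band Mean - 2 S to Mean + 2 S.
low, high = m - 2 * s, m + 2 * s
inside = [x for x in data if low <= x <= high]

print("range =", data_range)
print("mean =", m, "variance =", round(var, 2), "S =", round(s, 2))
print(f"{len(inside)} of {len(data)} points lie within mean ± 2S")
```

Points that fall outside the Mean ± 2 S band are worth a second look, since for roughly bell-shaped data only about 5% are expected there.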
Derivatives
What are Derivatives?
A simple calculation using data
Instantaneous rate of change = SLOPE
Why use Derivatives?
Get more information from data
More ways to compare data
Car moving down a road
Data = the distance traveled
Velocity = the 1st derivative of distance
Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: three plots against Time: Distance, Velocity (slope of distance) and Acceleration (slope of velocity)]
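With data sampled at discrete times, the slope between neighbouring points approximates the derivative. Here is a minimal sketch, assuming a car whose distance grows as the square of time. The numbers are invented and are not read from the slide's plots.

```python
def slopes(times, values):
    """Slope between each pair of neighbouring points (a finite difference)."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]

times = [0, 1, 2, 3, 4]
distance = [0, 1, 4, 9, 16]           # invented data: distance = time squared

velocity = slopes(times, distance)     # 1st derivative of distance
# Each slope is assigned to the midpoint of its time interval.
mid_times = [(a + b) / 2 for a, b in zip(times, times[1:])]
acceleration = slopes(mid_times, velocity)  # 2nd derivative of distance

print("velocity:", velocity)
print("acceleration:", acceleration)
```

Each differencing step shortens the series by one point, so the velocity list has four values and the acceleration list has three.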
A Note on Randomness
This data is not RANDOM
Random means that there is an equal probability of getting each outcome (like rolling a die)
There is scatter in the data but it is not random
[Figure: plot of the Noisy and Noisier data sets]
Other Things to Think About
Is there “scatter” in your model?
Evaluate how the “scatter” affects your results – repeat model runs
Make sure you get enough data to get good statistics
Did you collect enough data?
Did you let the model run long enough? Has the model reached “equilibrium”?
[Figure: two Rabbit Population plots (Number of Rabbits vs. Ticks), one covering 0 to 60 ticks and one covering 0 to 1000 ticks]
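One rough way to judge whether a run has settled is to compare the averages of the last two stretches of the time series. This is only a heuristic sketch. The function name, the window size, the tolerance and both sample series are assumptions chosen for illustration.

```python
from statistics import mean

def looks_settled(series, window=5, tolerance=0.05):
    """True if the means of the last two windows differ by at most `tolerance`
    (as a fraction of the later window's mean). A heuristic, not a proof."""
    if len(series) < 2 * window:
        return False  # not enough data to compare two windows
    earlier = mean(series[-2 * window:-window])
    later = mean(series[-window:])
    if later == 0:
        return earlier == 0
    return abs(later - earlier) / abs(later) <= tolerance

# Invented population counts: one run stopped early, one run left to level off.
still_growing = [10, 20, 40, 80, 120, 150, 180, 200, 220, 240]
levelled_off = [10, 50, 120, 180, 200, 198, 203, 201, 199, 200, 202, 198, 201, 200, 199]

print(looks_settled(still_growing))
print(looks_settled(levelled_off))
```

A population that oscillates can fool a check like this, so it complements plotting the full run rather than replacing it.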