Data and Spreadsheets
Question
What are data and what do they mean to a
scientist?
Dinner at the Urquhart House
Brought to you by the Briggs Multiracial
Alliance
Sunday night
All food provided (probably Chinese)
Contact Mimi Reddy, [email protected]
for details
Data, Statistics, and Spreadsheets
What are data?
What are statistics?
What are spreadsheets?
How can you analyze data with
spreadsheets?
Data
Data are pieces of information
Data can be numbers, words, descriptions
Data have UNITS
The word data is PLURAL, datum is singular
Data about Willoughby:
• Age: 5 (years)
• Height: 47 (inches)
• Weight: 66 (pounds)
• Eyes: Blue
• Favorite word: Wrestle
• Favorite letter: W
Types of Data
Numbers – two types
– Real #s – numbers that can have a decimal part – 28.75 lbs
– Integers – whole numbers – 18 months
Letters – called characters in programming
– W is a character
Words – called strings in programming
– “No thanks” is a string; strings can be individual
words or phrases
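As a rough illustration of these data types, here is a minimal Python sketch using the examples from this slide. Python is an assumption here (the slides do not name a programming language), and Python has no separate character type: a single letter is simply a string of length 1.

```python
# The slide's examples, stored as different data types.
weight_lbs = 28.75       # real number (a float in Python)
age_months = 18          # integer (an int in Python)
favorite_letter = "W"    # character (a str of length 1 in Python)
reply = "No thanks"      # string: can be a single word or a phrase

# Show each value next to the name of its type.
for value in (weight_lbs, age_months, favorite_letter, reply):
    print(repr(value), type(value).__name__)
```

Note that the units (lbs, months) live in the variable names; the numbers themselves carry no units, so they have to be recorded somewhere.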
Statistics and Data
Test Scores:
– Jeff: 88
– Mollie: 92
– Marcie: 88
– Dave: 47
– Karim: 99
– Willoughby: 42
– Benjamin: 0
What statistics can you
calculate to describe
these data?
– Try to think of four
things to describe the
data
Statistics
Statistics are derived from the data
Statistics are descriptions of data
Statistics are meant to simplify the data
Statistics can be misleading
Typical Statistics
Sample Size - number of individuals measured = n
Sum = S
Average or Mean = S/n
Median
– Value of 50th percentile, half of values fall above, half below
Maximum, Minimum, Range (Max-Min)
Mode - most common value
Standard deviation (SD)
Variance (SD²)
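The statistics listed above can be sketched for the test scores from the "Statistics and Data" slide. This is one possible way to do it, assuming Python and its standard `statistics` module; the variable names are illustrative, and the SD and variance use the sample (n-1) versions.

```python
import statistics

# Test scores from the "Statistics and Data" slide.
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}
values = list(scores.values())

n = len(values)                          # sample size
total = sum(values)                      # sum, S
mean = total / n                         # average = S/n
median = statistics.median(values)       # 50th percentile
mode = statistics.mode(values)           # most common value
maximum, minimum = max(values), min(values)
value_range = maximum - minimum          # range = max - min
sd = statistics.stdev(values)            # sample standard deviation
variance = statistics.variance(values)   # sample variance = SD squared

print(f"n={n} sum={total} mean={mean:.1f} median={median} mode={mode}")
print(f"max={maximum} min={minimum} range={value_range}")
print(f"SD={sd:.1f} variance={variance:.1f}")
```

Here the mean (about 65.1) sits well below the median (88) because a few very low scores pull it down, which is one way a single statistic can mislead.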
Analyze these data...
Mean, max, min,
sample size (n)
range, median, mode
• 18
• 33
• 4
• 47
• 49
• 38
• 29
• 4
• 55
Sum = S
mean = average = S/n
• denoted x̄ (x-bar)
median = halfway value when the data are sorted
mode = most common value
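The definitions above can be worked through by hand for the nine values on this slide. The sketch below assumes Python and deliberately avoids a statistics library so that each step (sum, divide, sort, count) stays visible.

```python
from collections import Counter

# The nine values from the "Analyze these data..." slide.
data = [18, 33, 4, 47, 49, 38, 29, 4, 55]

n = len(data)        # sample size
S = sum(data)        # sum
mean = S / n         # mean = average = S/n

# Median: sort the values and take the middle one
# (or the average of the two middle ones when n is even).
ordered = sorted(data)
middle = n // 2
if n % 2 == 1:
    median = ordered[middle]
else:
    median = (ordered[middle - 1] + ordered[middle]) / 2

# Mode: the value that occurs most often.
mode = Counter(data).most_common(1)[0][0]

maximum, minimum = max(data), min(data)
value_range = maximum - minimum

print(f"n={n} S={S} mean={mean:.1f} median={median} mode={mode}")
print(f"max={maximum} min={minimum} range={value_range}")
```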
Spreadsheets
Spreadsheets are tables
              Costa Rica   Nicaragua
Rainforest    625,000      3,712,000
Dry Forest    50,000       300,000
Total         675,000      4,012,000
Spreadsheets allow calculations and
manipulations of data
• Calculations: mean, standard deviation
• Manipulations: sort,
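To make "calculations and manipulations" concrete, here is a small sketch that recomputes the Total row of the forest table and then sorts the countries by that total. Python and the dictionary layout are assumptions for illustration; a spreadsheet would do the same with a sum formula and a sort command.

```python
# Forest areas from the table (the slide does not give the units).
forest = {
    "Costa Rica": {"Rainforest": 625_000, "Dry Forest": 50_000},
    "Nicaragua": {"Rainforest": 3_712_000, "Dry Forest": 300_000},
}

# Calculation: add up each country's column to get the Total row.
totals = {country: sum(areas.values()) for country, areas in forest.items()}

# Manipulation: sort the countries from largest to smallest total.
ranked = sorted(totals, key=totals.get, reverse=True)

for country in ranked:
    print(f"{country}: {totals[country]:,}")
```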
Make a data table:
Fly 1, length 13.4 mm, velocity 27 Kph, age 21 days
Fly 2, length 9.4 mm, velocity 0 Kph, age 220 days
Fly 3, length 9.3 mm, velocity 44 Kph, age 1 day
Fly 4, length 13.4 mm, velocity 17 Kph, age 32 days
Fly 5, length 17.4 mm, velocity 33 Kph, age 11 days
How many columns?
How many rows?
#s go down or across?
Data Table
Fly #   Length (mm)   Velocity (Kph)   Age (days)
1       13.4          27               21
2       9.4           0                220
3       9.3           44               1
4       13.4          17               32
5       17.4          33               11
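The fly data can also be laid out as a table in code, with one row per fly and one column per measurement. This is only a sketch, assuming Python; the column names (with units built in) are made up for illustration.

```python
# One row per fly, one column per variable (units are in the column names).
flies = [
    {"fly": 1, "length_mm": 13.4, "velocity_kph": 27, "age_days": 21},
    {"fly": 2, "length_mm": 9.4, "velocity_kph": 0, "age_days": 220},
    {"fly": 3, "length_mm": 9.3, "velocity_kph": 44, "age_days": 1},
    {"fly": 4, "length_mm": 13.4, "velocity_kph": 17, "age_days": 32},
    {"fly": 5, "length_mm": 17.4, "velocity_kph": 33, "age_days": 11},
]

n_rows = len(flies)         # how many rows?
n_columns = len(flies[0])   # how many columns (including Fly #)?

# A calculation on one column: mean length.
mean_length = sum(row["length_mm"] for row in flies) / n_rows

# A manipulation of the whole table: sort rows by age, oldest first.
oldest_first = sorted(flies, key=lambda row: row["age_days"], reverse=True)

print(f"{n_rows} rows, {n_columns} columns, mean length {mean_length:.2f} mm")
print("Flies from oldest to youngest:", [row["fly"] for row in oldest_first])
```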
Microsoft Excel
Typical spreadsheet program
– Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc was the first)
Has similar controls to MS Word
Now allows graphing (charts)
• very restricted formats, hard to get exactly what you
want
Excel tables and graphs can be copied into
MS Word
Friday’s Assignment
We will work with Microsoft Excel to
analyze some data
Groups of two will submit one finished
spreadsheet for the assignment
Graphs
Many different types of graphs
– Points
– Lines
– Bars
– Pies
Point Graphs
Called X-Y Scatter in MS Excel
Plot points based on X and Y value
Can fit a “REGRESSION LINE” to the data
– Line that best fits the data
X-Y Scatter
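A regression line is usually found by least squares, which is what a spreadsheet trendline does for a straight-line fit. The sketch below assumes Python and uses made-up X and Y values purely for illustration; it computes the slope and intercept of the line that best fits the points.

```python
# Hypothetical (made-up) X-Y points, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares fit: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx

# The best-fit line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"regression line: y = {slope:.2f} * x + {intercept:.2f}")
```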
Bar Graphs
Categorize data into counts or percents
Categories can be descriptive categories
(Windows 98, Windows 2000, …)
Can also be numeric categories
– Height: 60-63, 63-66, etc. or just 61, 62, 63…
– Count up number of people in each group
Histograms are a particular type of bar
graph
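Counting up the number of people in each group can be sketched as follows. The heights are made up for illustration and Python is an assumption. Because the slide's groups share a boundary (63 appears in both 60-63 and 63-66), this sketch puts a boundary value in the higher group.

```python
from collections import Counter

# Hypothetical (made-up) heights in inches.
heights = [61, 62, 64, 65, 65, 67, 68, 62, 70]

BIN_START = 60   # lowest group starts at 60 inches
BIN_WIDTH = 3    # each group spans 3 inches: 60-63, 63-66, ...

def bin_label(height):
    """Return the group label for a height; a boundary value goes in the higher group."""
    low = BIN_START + ((height - BIN_START) // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"

# Count how many people fall in each group.
counts = Counter(bin_label(h) for h in heights)

# Print a simple text histogram, one row per group.
for label in sorted(counts):
    print(label, "#" * counts[label])
```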
Bar Graph
[Figure: bar graph of Starting Salary by year, 1988 to 1994; Y axis from $0 to $50,000]
Histogram
X axis is categories
Y axis is a number or proportion of
observations in that category
[Figure: Histogram Bar Graph, with axis label Number of Crashes]
Regular Bar Graph vs. Histogram Bar Graph
[Figure: bar graph of Starting Salary by year, 1988 to 1994; Y axis from $0 to $50,000]
Distributions
Special type of histogram with continuous
numeric scale at bottom
Normal distribution is a key concept in
statistics
Skewed distribution is one that is
unbalanced
Sample distribution histograms
Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt
The NORMAL Distribution
A NORMAL DISTRIBUTION is the theoretical distribution of values
given natural variation around a MEAN
It is a balanced, humped distribution
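The "balanced, humped" shape can be checked numerically. The sketch below assumes Python 3.8 or later and its standard `statistics.NormalDist` class; the mean of 100 and SD of 15 are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

# A normal distribution with an arbitrary (hypothetical) mean and SD.
dist = NormalDist(mu=100, sigma=15)

# Balanced: exactly half of the values fall below the mean.
below_mean = dist.cdf(100)

# Humped: most values sit close to the mean.
within_one_sd = dist.cdf(115) - dist.cdf(85)    # mean +/- 1 SD
within_two_sd = dist.cdf(130) - dist.cdf(70)    # mean +/- 2 SD

print(f"below the mean: {below_mean:.3f}")
print(f"within 1 SD:    {within_one_sd:.3f}")
print(f"within 2 SD:    {within_two_sd:.3f}")
```

About 68% of values fall within one SD of the mean and about 95% within two, whatever mean and SD are chosen.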
Distributions
Skew is an imbalance in the distribution
Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
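One common way to put a number on skew is the third standardized moment: negative when the long tail is on the low side, positive when it is on the high side, and near zero for a balanced distribution. The sketch below assumes Python and applies that formula to the test scores from the earlier "Statistics and Data" slide; the choice of formula is an assumption, since the slides do not specify one.

```python
import statistics

# Test scores from the "Statistics and Data" slide.
scores = [88, 92, 88, 47, 99, 42, 0]

n = len(scores)
mean = statistics.fmean(scores)
median = statistics.median(scores)

# Second and third central moments (population form, dividing by n).
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n

# Skewness: third moment scaled by the spread so it has no units.
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.1f} median={median} skewness={skewness:.2f}")
```

The skewness comes out negative and the mean falls below the median, both signs that a few low scores stretch the distribution to the left.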
Hypothesis Testing
Statistical Tests are how scientists decide if
data support their hypothesis
(NOT PROVE their hypothesis)
Four major statistical tests: T-test, χ² (chi-square) Test,
Regression, ANOVA
Hypothesis
Processor speed has an effect on the
performance of the computer.
Null Hypothesis
– H0: Processor speed has NO EFFECT on the
performance of a computer.
Statistical Tests and Probability
Statistical tests give a value
That value can be related to a probability
Probability (P) is the likelihood of getting data at
least as extreme as yours if the NULL hypothesis is correct
If P < 0.05 (1/20), then you reject the NULL
hypothesis
T-Test
Compares differences between two means
Formula: T = (x1-x2)/SEM
– x1 and x2 are the two sample means
– SEM is Standard Error of Mean [SD/√N]; with two samples,
combine them as √(SEM1² + SEM2²)
T Values: Difference between the means in
comparison to the amount of spread in your
data
T-Values
If T > 2.5 or 3.0, difference is usually
significant (this depends on your sample
sizes)
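A T value can be worked out by hand from two small samples. The sketch below assumes Python and uses made-up benchmark scores for a slow and a fast processor, echoing the hypothesis slide. It uses the standard error of each mean, SD/√N, and combines the two as √(SEM1² + SEM2²), the unequal-variance (Welch) form; other forms of the t-test pool the two SDs instead.

```python
import math
import statistics

# Hypothetical (made-up) performance scores for two processor speeds.
slow_cpu = [4, 6, 8]
fast_cpu = [10, 12, 14]

def sem(sample):
    """Standard error of the mean: sample SD divided by the square root of N."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Difference between the two means.
mean_difference = statistics.mean(fast_cpu) - statistics.mean(slow_cpu)

# Spread: combine the two standard errors.
se_difference = math.sqrt(sem(slow_cpu) ** 2 + sem(fast_cpu) ** 2)

# T value: difference between means relative to the spread.
t_value = mean_difference / se_difference

print(f"difference={mean_difference} SE={se_difference:.3f} T={t_value:.2f}")
```

With these numbers T is about 3.67, but with only three values per group the sample size matters a great deal, so a T table or software would be needed to turn it into a probability.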