Data and Spreadsheets

Question
 What are data and what do they mean to a
scientist?
Dinner at the Urquhart House
 Brought to you by the Briggs Multiracial
Alliance
 Sunday night
 All food provided (probably Chinese)
 Contact Mimi Reddy, [email protected]
for details
Data, Statistics, and Spreadsheets
 What are data?
 What are statistics?
 What are spreadsheets?
 How can you analyze data with
spreadsheets?
Data
 Data are pieces of information
 Data can be numbers, words, descriptions
 Data have UNITS
 The word data is PLURAL, datum is singular
 Data about Willoughby:
• Age: 5 (years)
• Height: 47 (inches)
• Weight: 66 (pounds)
• Eyes: Blue
• Favorite word: Wrestle
• Favorite letter: W
Types of Data
 Numbers – two types
– Real #s – numbers that can have a decimal part – 28.75 lbs
– Integers – whole numbers – 18 months
 Letters – called characters in programming
– W is a character
 Words – called strings in programming
– “No thanks” is a string; strings can be individual
words or phrases
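
A minimal Python sketch of these data types, using the example values from this slide. The variable names are made up for illustration. Note that Python has no separate character type, so a single letter is stored as a string of length 1.

```python
# Example values from the slide; the variable names are hypothetical.
weight_lbs = 28.75       # real number (Python calls this a float)
age_months = 18          # integer (int)
favorite_letter = "W"    # character (in Python, a str of length 1)
reply = "No thanks"      # string (str)

# Print each value next to the name of its type.
for value in (weight_lbs, age_months, favorite_letter, reply):
    print(repr(value), type(value).__name__)
```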
Statistics and Data
 Test Scores:
– Jeff: 88
– Mollie: 92
– Marcie: 88
– Dave: 47
– Karim: 99
– Willoughby: 42
– Benjamin: 0
 What statistics can you
calculate to describe
these data?
– Try to think of four
things to describe the
data
 stop
Statistics
 Statistics are derived from the data
 Statistics are descriptions of data
 Statistics are meant to simplify the data
 Statistics can be misleading
Typical Statistics
 Sample Size - number of individuals measured = n
 Sum = S
 Average or Mean = S/n
 Median
– Value of 50th percentile, half of values fall above, half below
 Maximum, Minimum, Range (Max-Min)
 Mode - most common value
 Standard deviation
 Variance (SD²)
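
As a sketch, the statistics listed here can be computed for the test scores from the "Statistics and Data" slide with Python's standard `statistics` module. One assumption: the slides do not say whether the standard deviation is the sample or population version, so this uses the sample version (dividing by n - 1).

```python
import statistics

# Test scores from the "Statistics and Data" slide.
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}
values = list(scores.values())

n = len(values)                           # sample size
total = sum(values)                       # sum S
mean = total / n                          # average = S/n
median = statistics.median(values)        # middle value when sorted
mode = statistics.mode(values)            # most common value
value_range = max(values) - min(values)   # max - min
sd = statistics.stdev(values)             # sample standard deviation (assumed)
variance = statistics.variance(values)    # sample variance, equal to sd squared

print(n, total, round(mean, 2), median, mode, value_range, round(sd, 2))
```

Benjamin's 0 pulls the mean well below the median, which is one way a single statistic can mislead.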
Analyze these data...
 Mean, max, min, range, median, mode
 sample size (n)
• 18
• 33
• 4
• 47
• 49
• 38
• 29
• 4
• 55
 Sum S
 mean=average=S/n
• denoted x̄ (x-bar)
 median = halfway
 mode = most common
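
The exercise can also be worked step by step, without a statistics library, to show what each definition means. This is a sketch only; the variable names are made up, and the data are the nine values from this slide.

```python
from collections import Counter

data = [18, 33, 4, 47, 49, 38, 29, 4, 55]  # values from the slide

n = len(data)          # sample size
S = sum(data)          # sum
mean = S / n           # mean = average = S/n

# Median: sort the values and take the one halfway through.
ordered = sorted(data)
middle = n // 2
if n % 2 == 1:
    median = ordered[middle]
else:
    # With an even count, average the two middle values.
    median = (ordered[middle - 1] + ordered[middle]) / 2

# Mode: the value that occurs most often.
mode, mode_count = Counter(data).most_common(1)[0]

maximum, minimum = max(data), min(data)
data_range = maximum - minimum

print(n, S, round(mean, 2), median, mode, maximum, minimum, data_range)
```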
Spreadsheets
 Spreadsheets are tables
             Costa Rica    Nicaragua
Rainforest      625,000    3,712,000
Dry Forest       50,000      300,000
Total           675,000    4,012,000
 Spreadsheets allow calculations and
manipulations of data
• Calculations: mean, standard deviation
• Manipulations: sort,
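
A rough Python equivalent of one spreadsheet calculation and one manipulation, using the forest figures from this slide. The slide gives no units, so none are assumed here, and the variable names are made up.

```python
# Forest figures from the Spreadsheets slide (units not stated on the slide).
areas = {
    "Costa Rica": {"Rainforest": 625_000, "Dry Forest": 50_000},
    "Nicaragua": {"Rainforest": 3_712_000, "Dry Forest": 300_000},
}

# Calculation: the "Total" row is the sum of each country's column.
totals = {country: sum(forests.values()) for country, forests in areas.items()}

# Manipulation: sort the countries from largest total to smallest.
by_total = sorted(totals, key=totals.get, reverse=True)

for country in by_total:
    print(f"{country}: {totals[country]:,}")
```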
Make a data table:
 Fly 1, length 13.4 mm, velocity 27 Kph, age 21 days
 Fly 2, length 9.4 mm, velocity 0 Kph, age 220 days
 Fly 3, length 9.3 mm, velocity 44 Kph, age 1 day
 Fly 4, length 13.4 mm, velocity 17 Kph, age 32 days
 Fly 5, length 17.4 mm, velocity 33 Kph, age 11 days
 How many columns?
 How many rows?
 #s go down or across?
Data Table
Fly #
1
2
3
4
5
Length
Velocity
Age
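
One possible layout, sketched in Python: each fly is a row and each measurement is a column. Putting the flies down the rows is an assumption (the slide leaves the orientation as a question), and the column names with units are made up for illustration.

```python
# One row per fly; values are from the "Make a data table" slide.
flies = [
    {"fly": 1, "length_mm": 13.4, "velocity_kph": 27, "age_days": 21},
    {"fly": 2, "length_mm": 9.4, "velocity_kph": 0, "age_days": 220},
    {"fly": 3, "length_mm": 9.3, "velocity_kph": 44, "age_days": 1},
    {"fly": 4, "length_mm": 13.4, "velocity_kph": 17, "age_days": 32},
    {"fly": 5, "length_mm": 17.4, "velocity_kph": 33, "age_days": 11},
]

n_rows = len(flies)            # one row per fly (not counting the header)
columns = list(flies[0])       # column names, taken from the first row
n_columns = len(columns)

# Print a header row, then one line per fly.
print("  ".join(f"{name:>12}" for name in columns))
for row in flies:
    print("  ".join(f"{row[name]:>12}" for name in columns))
```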
Microsoft Excel
 Typical spreadsheet program
– Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc came first)
 Has similar controls to MS Word
 Now allows graphing (charts)
• very restricted formats, hard to get exactly what you
want
 Excel tables and graphs can be copied into
MS Word
Friday’s Assignment
 We will work with Microsoft Excel to
analyze some data
 Groups of two will submit one finished
spreadsheet for the assignment
Graphs
 Many different types of graphs
– Points
– Lines
– Bars
– Pies
Point Graphs
 Called X-Y Scatter in MS Excel
 Plot points based on X and Y value
 Can fit a “REGRESSION LINE” to the data
– Line that best fits the data
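
"Best fits" is usually taken to mean least squares: the line that minimizes the squared vertical distances to the points. The sketch below assumes that meaning, which the slide does not state. The function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together, and how much X varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In MS Excel, adding a trendline to an X-Y Scatter chart gives a fitted line of this kind.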
X-Y Scatter
Bar Graphs
 Categorize data into counts or percents
 Categories can be descriptive categories
(Windows 98, Windows 2000, …)
 Can also be numeric categories
– Height: 60-63, 63-66, etc. or just 61, 62, 63…
– Count up number of people in each group
 Histograms are a particular type of bar
graph
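
A small sketch of the counting step behind a histogram, using 3-inch height groups like those on this slide. The heights are hypothetical, and so is the rule that a value on a boundary (for example, exactly 63) goes into the higher group; the slide does not say which group it belongs to.

```python
from collections import Counter

# Hypothetical heights in inches, invented for illustration.
heights_in = [61, 62, 62, 64, 65, 65, 65, 67, 68, 70]

BIN_START = 60   # lowest group starts here
BIN_WIDTH = 3    # each group covers 3 inches

def bin_label(height):
    """Return the group label, e.g. '60-63', that a height falls into."""
    low = BIN_START + ((height - BIN_START) // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"

# Count up the number of people in each group.
counts = Counter(bin_label(h) for h in heights_in)

# Draw a text-only bar for each group.
for label in sorted(counts):
    print(f"{label}: {'#' * counts[label]}")
```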
Bar Graph
[Figure: bar graph of Starting Salary by year, 1988 to 1994; Y axis from $0 to $50,000]
Histogram
 X axis is categories
 Y axis is a number or proportion of
observations in that category
[Figure: histogram bar graph; axis label: Number of Crashes]
Regular Bar Graph vs. Histogram Bar Graph
[Figure: regular bar graph of Starting Salary by year, 1988 to 1994; Y axis from $0 to $50,000]
Distributions
 Special type of histogram with continuous
numeric scale at bottom
 Normal distribution is a key concept in
statistics
 Skewed distribution is one that is
unbalanced
Sample distribution histograms
Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt
The NORMAL Distribution
 A NORMAL DISTRIBUTION is the theoretical
distribution of values given natural variation
around a MEAN
 It is a balanced, humped (bell-shaped)
distribution
Distributions
 Skew is an imbalance in the distribution
Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
Hypothesis Testing
 Statistical Tests are how scientists decide if
data support their hypothesis
 (Tests do NOT PROVE their hypothesis)
 Four major statistical tests: T-test, χ² (chi-square)
Test, Regression, ANOVA
Hypothesis
 Processor speed has an effect on the
performance of the computer.
 Null Hypothesis
– H0: Processor speed has NO EFFECT on the
performance of a computer.
Statistical Tests and Probability
 Statistical tests give a value
 That value can be related to a probability (P)
 P is the probability of getting data at least
as extreme as yours if the NULL hypothesis
were true
 If P < 0.05 (1/20), then you reject the NULL
hypothesis
T-Test
 Compares differences between two means
 Formula: T = (x̄1 - x̄2)/SEM
– SEM is Standard Error of the Mean [SD/√N]
 T Values: Difference between mean in
comparison to the amount of spread in your
data
T-Values
 If T > 2.5 or 3.0, difference is usually
significant (this depends on your sample
sizes)
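
A sketch of the T calculation for two groups. The slide's formula uses a single SEM; this sketch assumes one common way to get it for two groups, combining each group's SEM as sqrt(SEM1² + SEM2²) (the unequal-variance, or Welch, form). The function name `t_value` and the benchmark scores are hypothetical, loosely following the processor-speed hypothesis.

```python
import math
import statistics

def t_value(group_a, group_b):
    """T = difference between the means / standard error of that difference."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Standard error of each mean: SD / sqrt(N), using the sample SD.
    sem_a = statistics.stdev(group_a) / math.sqrt(len(group_a))
    sem_b = statistics.stdev(group_b) / math.sqrt(len(group_b))
    # Assumed way of combining the two SEMs (unequal-variance form).
    se_difference = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return (mean_a - mean_b) / se_difference

# Hypothetical benchmark scores, invented for illustration.
fast_processor = [12, 14, 16]
slow_processor = [6, 8, 10]

t = t_value(fast_processor, slow_processor)
print(round(t, 2))  # about 3.67
```

With only three values per group, a T of this size may or may not be significant, which is why the rule of thumb above depends on sample size.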