Transcript Week 2

Organizing & Reporting Data: An Intro
•Statistical analysis works with data sets
A collection of data values on some variables
recorded on a number of cases (records)
 For example, the student data from last week:
Organizing & Reporting Data (cont.):
•Structure of most data sets = “rectangular”
Columns = Variables
Rows = Cases
Cells = individual values
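A minimal sketch of the rectangular layout, written in Python. The variable names and values are made up for demonstration; they are not the student data from last week.

```python
# Hypothetical demonstration data: each dict is one case (row),
# each key is one variable (column).
cases = [
    {"id": 1, "age": 19, "major": "SOC"},
    {"id": 2, "age": 22, "major": "PSY"},
    {"id": 3, "age": 20, "major": "SOC"},
]

variables = list(cases[0].keys())   # the columns
n_cases = len(cases)                # the number of rows
cell = cases[1]["age"]              # one cell: the second case's value on "age"

# A column is the same variable read across all cases.
age_column = [case["age"] for case in cases]
```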
Managing Data: Basic Tasks
•NOTE: Reliance on Codebook for Data Set
– Specify information about variables in the data set
– Indicate Variable Names & Labels
– Indicate Variable Values (codes) & Value Labels
– Indicate “missing values”
•Can Modify Overall Arrangement of Data Set
–Sorting → change the order of the cases in the file
–Selecting → identify a subset of cases to work on
–Transforming → modify the values of a variable
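The codebook and the three basic tasks can be sketched in Python as follows. This is an illustration only: the variable names, value labels, and missing-value codes (9 and -1) are assumptions, not taken from any actual course data set.

```python
# Hypothetical codebook: names, labels, value labels, and missing codes
# are all illustrative assumptions.
codebook = {
    "sex": {"label": "Sex of respondent",
            "values": {1: "Male", 2: "Female"},
            "missing": {9}},
    "age": {"label": "Age in years", "values": {}, "missing": {-1}},
}

data = [
    {"id": 1, "sex": 2, "age": 34},
    {"id": 2, "sex": 1, "age": -1},   # age is missing for this case
    {"id": 3, "sex": 9, "age": 21},   # sex is missing for this case
    {"id": 4, "sex": 2, "age": 58},
]

def is_missing(var, value):
    """True if the codebook marks this value as missing for the variable."""
    return value in codebook[var]["missing"]

# Sorting: change the order of the cases (here by age, ascending).
# Note that the missing code -1 sorts first, like any other number.
sorted_cases = sorted(data, key=lambda c: c["age"])

# Selecting: identify a subset of cases (here, cases with a valid age).
valid_age = [c for c in data if not is_missing("age", c["age"])]

# Transforming: modify the values of a variable (here, replace the numeric
# sex code with its value label; missing codes become None).
labelled = [
    {**c, "sex": None if is_missing("sex", c["sex"])
                 else codebook["sex"]["values"][c["sex"]]}
    for c in data
]
```

The sort result shows why the codebook matters: without knowing that -1 means "missing", the case with id 2 would look like the youngest respondent.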
Organizing & Reporting Data (cont.):
•Where do the data values come from?
a) Raw Data: recorded from responses, records, or
observations
– In their (more-or-less) original form
– Some coding (or editing) operations usually involved
– Usually coded into numerical values (for ease of use)
b) Transformed Data: modified from original values
– Computed values (e.g., rates, %, sums, “imputations”)
– Recoded values (into more correct or meaningful or
useful values)
c) Created Data: values are “made up”
– Simulated values
– Demonstration values
Managing Data: Basic Tasks
•Transforming Data: Variable Transformations
a) Computing new variables from prior ones
• Index = Q1 + Q2 + Q3 + Q4
• Utility = probability * outcome
b) Recode Variable by changing its values
• Change missing values (“blanks”) to “0”
c) Recode Variable into a New Variable
• Age (yrs) → Child (1-11); Juvenile (12-17); Adult (18 & over)
• Age (yrs) → 10-19 yrs; 20-29 yrs; 30-39 yrs; 40-49 yrs;
50-59 yrs; 60-69 yrs; 70-79 yrs; 80-89 yrs; 90-99 yrs.
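The three kinds of transformation above can be sketched in Python. The data values are made up, None is assumed to stand for a blank answer, and ages are assumed to be whole years; the age categories follow the slide.

```python
# Hypothetical cases; None stands for a blank (missing) answer.
cases = [
    {"q1": 3, "q2": 4, "q3": None, "q4": 2, "age": 9},
    {"q1": 5, "q2": 5, "q3": 4, "q4": 4, "age": 15},
    {"q1": 1, "q2": 2, "q3": 2, "q4": None, "age": 42},
]
ITEMS = ("q1", "q2", "q3", "q4")

def age_group(age):
    """Recode age in years into the three categories from the slide."""
    if 1 <= age <= 11:
        return "Child"
    if 12 <= age <= 17:
        return "Juvenile"
    if age >= 18:
        return "Adult"
    return None  # ages below 1 are not covered by the slide's scheme

def age_decade(age):
    """Recode age into fixed 10-year bands: 10-19, 20-29, ..., 90-99."""
    if 10 <= age <= 99:
        low = (age // 10) * 10
        return f"{low}-{low + 9} yrs"
    return None  # outside the bands listed on the slide

for case in cases:
    # b) Recode in place: change blanks to 0.
    for item in ITEMS:
        if case[item] is None:
            case[item] = 0
    # a) Compute a new variable from prior ones.
    case["index"] = sum(case[item] for item in ITEMS)
    # c) Recode into NEW variables; the original "age" is left as is.
    case["age_group"] = age_group(case["age"])
    case["age_decade"] = age_decade(case["age"])
```

Recoding blanks to 0 before summing changes what the index means (a blank counts the same as a score of 0), so whether that recode is appropriate is a judgment call for each data set.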
Computed Data: Some Useful forms
• Rates – numbers divided by populations
• Ratios – one number divided by another
• Indexes – new variable = a sum (or other
combination) of multiple prior variables
• Rescaled Data – a raw score modified by
some mathematical function (e.g., logarithm)
• Standardized scores – Rescaled to standard
units → e.g., Z-scores
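A short Python sketch of several of these computed forms, using made-up numbers. The Z-scores here divide by the population standard deviation (statistics.pstdev); that choice is an assumption, and a course that uses the sample standard deviation would call statistics.stdev instead.

```python
import math
import statistics

# Hypothetical demonstration values.
crimes = 250
population = 50_000
rate_per_1000 = crimes / population * 1000   # rate: a count divided by a population

men, women = 40, 60
sex_ratio = men / women                      # ratio: one number divided by another

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Rescaled data: each raw score passed through a mathematical function.
log_scores = [math.log10(x) for x in scores]

# Standardized scores: distance from the mean in standard-deviation units.
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)               # population SD (an assumption)
z_scores = [(x - mean) / sd for x in scores]
```

For these scores the mean is 5 and the population standard deviation is 2, so a raw score of 9 becomes a Z-score of 2 and a raw score of 2 becomes -1.5.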
Recoded Data: Some Useful forms
•Collapsed (& abbreviated) scores
•Grouped scores – recoding a numeric variable
into a discrete (numeric or ordinal) variable
–Uniform (or fixed-width) groupings → widths of
groups are all the same
[Note the standard rules for forming grouped variables]
–Non-uniform (variable or flexible) groupings →
widths of groups are not all the same
–Normed groupings → grouped by proportions of
cases → e.g., percentiles, quartiles, median-splits
[a special form of non-uniform grouping]
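The three grouping styles can be sketched in Python. The incomes and all cut points are hypothetical; in each scheme the groups do not overlap and every value falls into exactly one group.

```python
import statistics

# Hypothetical incomes in $1000s.
incomes = [12, 18, 25, 31, 38, 44, 52, 67, 80, 150]

def uniform_group(x, width=25):
    """Fixed-width groups: 0-24, 25-49, 50-74, ... (width is an assumption)."""
    low = (x // width) * width
    return f"{low}-{low + width - 1}"

def nonuniform_group(x):
    """Groups of unequal width; the cut points are chosen by the analyst."""
    if x < 20:
        return "under 20"
    if x < 50:
        return "20-49"
    if x < 100:
        return "50-99"
    return "100 and over"

# Normed grouping: a median split puts about half the cases in each group.
median = statistics.median(incomes)

def median_split(x):
    return "low" if x <= median else "high"

uniform = [uniform_group(x) for x in incomes]
nonuniform = [nonuniform_group(x) for x in incomes]
split = [median_split(x) for x in incomes]
```

With the fixed-width scheme the single income of 150 ends up alone in a distant group, while the median split places five cases on each side regardless of how spread out the values are.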
How to recode variables in SPSS?
• Use the Transform option on the top
menu bar to change the data (see
Appendix B in Kirkpatrick/Feeney for details)
• Compute → allows for computing a new
variable from prior variables
• Recode → allows for modifying how a
variable is coded
a) ‘Into same variables’ (change original variable)
b) ‘Into different variables’ (create new variable
with different codes & leave original variable as is)
Representing Data Distributions:
• In statistics, we are working with a collection
of many data points → Our focus is on the
distribution of the whole set of points
• Three forms of presentation for summarizing
distributions of data points:
1. Tabular → tables and lists of numbers
2. Graphical → pictures, shapes, and lines (in
charts, graphs, and diagrams)
3. Verbal → words and phrases
Tabular Presentations: Basic Formats
1) Data Listing: simple inventory of points in the data set
2) Ordered Data Listing: Inventory of data sorted into
groups or arranged in increasing or decreasing order
3) Frequency Table: summary showing each value and the
number of cases having that value (most relevant for
discrete variables)
4) Percentage Table: table with percentages of total cases
given rather than (or in addition to) numerical counts
5) Cumulative Percentage Table: reporting percentages
of total cases which have that specific value or lower.
6) Cross-Tab Table: a “bivariate” frequency distribution of
the values of one variable across the values of another
variable
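Formats 2 through 5 can be built from a single data listing. The sketch below uses Python's collections.Counter on a made-up discrete variable (number of siblings for 10 cases).

```python
from collections import Counter

# Hypothetical data listing: number of siblings reported by 10 cases.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

ordered = sorted(siblings)     # ordered data listing
freq = Counter(siblings)       # frequency table: value -> number of cases
n = len(siblings)

table = []
cumulative = 0
for value in sorted(freq):
    count = freq[value]
    cumulative += count        # cases with this value or lower
    table.append({
        "value": value,
        "freq": count,
        "pct": 100 * count / n,
        "cum_pct": 100 * cumulative / n,
    })

# Print the table with one row per distinct value.
print(f"{'Value':>5} {'Freq':>5} {'Pct':>6} {'CumPct':>7}")
for row in table:
    print(f"{row['value']:>5} {row['freq']:>5} {row['pct']:>6.1f} {row['cum_pct']:>7.1f}")
```

The cumulative percentage column only makes sense when the values have a meaningful order, which is why the rows are sorted before accumulating.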
Cross-Tabulations (cont.)
• What are the parts of a cross-tab?
a) Cells
b) Rows and columns
c) Marginals
d) Grand total
• How to set up a cross-tab?
a) Which variables are in the rows and columns?
b) Use Percentages or Frequencies?
c) How to percentage a cross-tab?
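A sketch of the parts of a cross-tab in Python, on made-up cases. The slide leaves the percentaging question open; the choice here (the presumed independent variable in the rows, percentages computed within each row) is one common convention and is an assumption, not the slide's stated rule.

```python
from collections import Counter

# Hypothetical cases: (sex, opinion) for 8 respondents.
cases = [
    ("F", "Agree"), ("F", "Agree"), ("F", "Disagree"),
    ("M", "Agree"), ("M", "Disagree"), ("M", "Disagree"),
    ("F", "Agree"), ("M", "Disagree"),
]

cells = Counter(cases)                               # cell frequencies
row_totals = Counter(sex for sex, _ in cases)        # row marginals
col_totals = Counter(opinion for _, opinion in cases)  # column marginals
grand_total = len(cases)                             # grand total

# Row percentages: each cell as a percentage of its own row total,
# so every row sums to 100.
row_pct = {
    (r, c): 100 * cells[(r, c)] / row_totals[r]
    for r in row_totals
    for c in col_totals
}
```

Reading across a row then answers "of the cases in this row category, what percentage gave each answer?", and comparing the same column between rows shows how the two groups differ.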
Representing Distributions Graphically: Basic Formats
• Pie Charts
• Bar Charts
– Vertical or Horizontal
– Simple or Grouped
– Stacked
• Histograms
• Line Charts
– Frequency polygons
– Time (Trend) plots
– Relationship plots
Representing Distributions Graphically: Basic Formats (cont.)
• Other Charts (to be dealt with later):
–Box Plots (aka “Box-and-Whiskers”)
–Scatter Plots