Transcript Review1

Review
 We’ve covered three main topics thus far
 Data collection
 Data summarization
 Probability
Data Collection
 We’ve talked about three ways of data collection
 Survey
 Sampling frame, questionnaire, probability sample, convenience sample,
non-response bias, other types of bias
 Observational study
 No assignment of treatments. No causal conclusions
 Randomized experiment
 Random assignment of units/subjects to treatments. If done properly, causal
conclusions are possible (though conclusions might not generalize).
 Why randomize?
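A minimal Python sketch of what random assignment looks like mechanically. The subject IDs, treatment labels, and seed are hypothetical and only there to make the example concrete and reproducible.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)   # seeded only so the example is reproducible
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects and treatment labels.
subjects = [f"subject_{k}" for k in range(1, 9)]
groups = randomly_assign(subjects, ["treatment", "control"], seed=1)
for name, members in groups.items():
    print(name, members)
```

Chance, not the researcher or the subject, decides who gets which treatment, so lurking variables tend to balance across the groups on average.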
Data summarization
 We talked about graphical and numerical summaries for one
variable and for two variables. It is important to identify the type of each variable.
 One categorical/qualitative variable
 graphical: pie chart, bar graph
 numerical: counts/percents/frequencies
 One quantitative variable
 graphical: histogram/boxplot (shape, center, spread, outliers)
 numerical: mean, median, standard deviation, inter-quartile range, range,
percentiles
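A short Python sketch of the numerical summaries for one quantitative variable, using only the standard library. The data are made up for illustration. Note that software packages use slightly different conventions for quartiles, so values from JMP may differ a little from these.

```python
import statistics

# Made-up data: points scored in nine hypothetical games.
points = [61, 68, 70, 72, 75, 77, 80, 84, 98]

mean = statistics.mean(points)
median = statistics.median(points)
sd = statistics.stdev(points)                    # sample SD (divides by n - 1)
q1, q2, q3 = statistics.quantiles(points, n=4)   # quartiles; q2 is the median
iqr = q3 - q1                                    # inter-quartile range
data_range = max(points) - min(points)

print(f"mean={mean:.2f} median={median} sd={sd:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} range={data_range}")
```

Here the mean is a bit larger than the median because the one high value (98) pulls the mean upward, while the median and IQR are less affected by it.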
Data summarization
 Two variables
 One categorical/qualitative and one quantitative
 graphical: side-by-side boxplots
 numerical: means, medians, SDs, IQRs, etc. for each category
 Two quantitative
 graphical: scatterplot (form, direction, strength, outliers)
 numerical: means, SDs, etc. for both. correlation coefficient
 If the association is linear, model it with a straight line: slope and intercept of
the regression line (prediction, interpretation, extrapolation, etc.)
 Two categorical/qualitative
 graphical: plots we didn’t talk about
 numerical: contingency tables; marginal frequencies, conditional frequencies
 Also relative risk and odds ratios
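For the two-categorical case, a minimal Python sketch of the marginal frequency, conditional frequencies, relative risk, and odds ratio from a 2x2 contingency table. The counts and the group/outcome labels are made up for illustration.

```python
# Made-up 2x2 contingency table: rows are groups, columns are outcomes.
table = {
    "group_a": {"event": 30, "no_event": 70},
    "group_b": {"event": 10, "no_event": 90},
}

def risk(row):
    """Conditional proportion of 'event' within one row of the table."""
    return row["event"] / (row["event"] + row["no_event"])

def odds(row):
    """Odds of 'event' within one row: events per non-event."""
    return row["event"] / row["no_event"]

# Marginal frequency of 'event', ignoring group membership.
total = sum(sum(row.values()) for row in table.values())
marginal_event = sum(row["event"] for row in table.values()) / total

# Compare group_a to group_b.
relative_risk = risk(table["group_a"]) / risk(table["group_b"])
odds_ratio = odds(table["group_a"]) / odds(table["group_b"])

print(f"marginal P(event) = {marginal_event:.2f}")
print(f"P(event | group_a) = {risk(table['group_a']):.2f}")
print(f"P(event | group_b) = {risk(table['group_b']):.2f}")
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

The relative risk is a ratio of conditional proportions, while the odds ratio is a ratio of odds, so the two generally give different numbers for the same table.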
Probability
 To find probability of event A
 Enumerate sample space. Count number of outcomes in event
A. Divide by the total number of outcomes
 Easy to do if sample space is small
 Use probability laws to push symbols around
 Independence, mutually exclusive, joint = marginal × conditional
 When the sample space is large, this is the only way to approach things
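Both approaches can be sketched in Python for a small sample space. The example (two fair dice, with events chosen for illustration) is an assumption, not one taken from the course.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event, space=sample_space):
    """P(event) = (# outcomes in the event) / (# outcomes in the space)."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

def sum_is_7(o):      # event A
    return o[0] + o[1] == 7

def first_is_3(o):    # event B
    return o[0] == 3

p_a = prob(sum_is_7)
p_b = prob(first_is_3)
p_a_and_b = prob(lambda o: sum_is_7(o) and first_is_3(o))

# Conditional probability by counting within the outcomes where B occurred.
b_outcomes = [o for o in sample_space if first_is_3(o)]
p_a_given_b = prob(sum_is_7, space=b_outcomes)

print("P(A) =", p_a, " P(B) =", p_b, " P(A and B) =", p_a_and_b)
print("joint = marginal x conditional:", p_a_and_b == p_b * p_a_given_b)
print("A and B independent:", p_a_given_b == p_a)
```

With a sample space too large to list, the same laws are applied symbolically instead of by counting.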
Duke b-ball
 What type of study is this?
 Survey? Randomized experiment? observational study?
 Might it be reasonable to assume that the opponents are a
random sample of all types of opponents Duke could potentially
face?
 If not, then nothing we see can be generalized to teams
Duke might play in the future. (In other words, the population
is the teams that Duke has played so far, and we have
observations on all of them.)
Limitations
 Since this is not a designed experiment, what are the limitations?
 Can we make causal conclusions?
 Nope.
 Is there potential for lurking variables?
 Yup. In fact, I’d bet there are some.
 What type of information does looking at these types of data
provide?
JMP
 Let’s look at a few variables to summarize them graphically
and numerically.
Regression vs correlation coefficient
 Does a change of units change the value?
 Correlation coefficient (no)
 Regression slope (yes)
 Does defining the response and explanatory variable matter?
 Correlation coefficient (no)
 Regression slope (yes)
 Provides direction and strength of linear association
 Correlation coefficient (yes, yes)
 Regression slope (yes, no)
 Quantifies the change in the response per unit change in the explanatory variable (for two quantitative variables with a linear association)
 Correlation coefficient (no)
 Regression slope (yes)
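The first two comparisons above can be checked numerically. This is a minimal Python sketch with made-up height and weight data. The helper functions `correlation` and `slope` are written out by hand for illustration rather than taken from JMP.

```python
import math

def mean(v):
    return sum(v) / len(v)

def correlation(x, y):
    """Correlation coefficient r between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope for predicting y (response) from x (explanatory)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data: height in inches, weight in pounds.
height_in = [62, 65, 68, 70, 73]
weight_lb = [120, 140, 150, 170, 185]
height_cm = [2.54 * h for h in height_in]   # same heights, different units

# Change of units: r is unchanged, the slope is not.
r_in, r_cm = correlation(height_in, weight_lb), correlation(height_cm, weight_lb)
b_in, b_cm = slope(height_in, weight_lb), slope(height_cm, weight_lb)

# Swapping response and explanatory: r is unchanged, the slope is not.
r_swapped = correlation(weight_lb, height_in)
b_swapped = slope(weight_lb, height_in)

print(f"r (inches) = {r_in:.4f}, r (cm) = {r_cm:.4f}, r (swapped) = {r_swapped:.4f}")
print(f"slope (lb per inch) = {b_in:.3f}, slope (lb per cm) = {b_cm:.3f}")
print(f"slope (inches per lb) = {b_swapped:.3f}")
```

The slope carries units (pounds per inch here), which is why rescaling or swapping the variables changes it, while r is unitless and symmetric in the two variables.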
Correlation coefficient vs regression
 Influenced by outliers
 Correlation coefficient (yes)
 Regression slope (yes); such outliers are sometimes called influential points
 Can conclude explanatory variable causes change in the response
variable
 Correlation coefficient (no)
 Regression slope (no)
 Although under a well-designed experiment it is possible
 Must both variables be quantitative?
 Correlation coefficient (yes)
 Regression slope (not necessarily, but I don’t think we’ll be able to
cover regression with a quantitative response and a qualitative explanatory variable, often called ANOVA)
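To see how much one outlier can move both quantities, here is a small Python sketch. The data are made up: five points lying exactly on a line, plus one hypothetical unusual point added afterward.

```python
import math

def r_and_slope(x, y):
    """Return (correlation coefficient, least-squares slope of y on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Made-up data lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean, b_clean = r_and_slope(x, y)

# Add one point that is far from the others in x and well off the line.
r_out, b_out = r_and_slope(x + [15], y + [5])

print(f"without outlier: r = {r_clean:.3f}, slope = {b_clean:.3f}")
print(f"with outlier:    r = {r_out:.3f}, slope = {b_out:.3f}")
```

A single point is enough to take a perfect linear association to a weak one and to flatten the fitted line, which is why it is worth looking at the scatterplot before trusting either number.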