#### Transcript Lecture 8

##### Probability

Probability; Sampling Distribution of the Mean, Standard Error of the Mean; Representativeness of the Sample Mean

##### Probability – Frequency View

Probability is long-run relative frequency, the same as relative frequency in the population.

Dice toss: p(1) = p(2) = … = p(6) = 1/6

Coin flip: p(Head) = p(Tail) = .5

##### Probability & Decision Making

Decision making is like gambling: go with what is likely. The lady tasting tea in England: was the milk poured first or second? She has 5 cups of tea to taste. What is the probability she gets it right? If you cannot tell the difference, how likely is it that you will be right on all cups?

| Cups | Probability all correct | Calculation |
|---|---|---|
| 1 | .5 | ½ |
| 2 | .25 | ½ × ½ |
| 3 | .125 | ½ × ½ × ½ |
| 4 | .0625 | ½ × ½ × ½ × ½ |
| 5 | .03125 | ½ × ½ × ½ × ½ × ½ |

How many cups would it take to convince you? The convention in social science is a probability of .05. Four correct cups has probability .0625, which is above .05; five has probability .03125, which is below it. Using this standard, she would have to get all 5 right to be convincing in her ability. She did, and they were convinced.

##### Frequency Distribution of the Mean

What is the distribution of means if we roll a die once? What is the distribution of means if we roll the die twice and take the average? Three times? (See Excel File ‘dice’)

| Distribution | M | SD |
|---|---|---|
| Raw data, 1 die | 3.5 | 1.87 |
| Sampling distribution of means, average of 2 dice | 3.5 | 1.23 |
| Sampling distribution of means, average of 3 dice | 3.5 | .99 |

Notice the mean, standard deviation, and shape of the distributions.

##### Sampling Distribution

Notion of trials, experiments, replications. Coin toss example (5 flips, number of heads). Repeated estimation of the mean.

A sampling distribution is a distribution of a statistic (not raw data) over all possible samples. It is the same as the distribution over an infinite number of trials. Recall the dice example.

##### Estimator

We use statistics to estimate parameters. Most often, $\bar{X}$ estimates $\mu$.

Suppose we want to estimate the mean height of students at USF. Sample students, estimate M. The accuracy of the estimate depends mostly upon N and SD.

##### Example of Height

Hypothetical data.

[Figure: raw data, height of USF students, $\mu = 66$, $\sigma = 4$; y-axis: relative frequency.]

Note that the graph shows the population.
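The idea of a sampling distribution for the height example can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: it assumes the hypothetical height population is normal with mean 66 and SD 4, and the names `simulate_sample_means`, `POP_MEAN`, `POP_SD`, and the choice of 2000 replications are illustrative only.

```python
import random
import statistics

# Assumed population values for the hypothetical height example.
# Normality is an assumption made only for this sketch.
POP_MEAN = 66.0
POP_SD = 4.0


def simulate_sample_means(n, replications, seed=0):
    """Draw `replications` samples of size `n` and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(replications):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(n=50, replications=2000)

# Middle of the sampling distribution of the mean.
mean_of_means = statistics.fmean(means)
# Spread of the sampling distribution of the mean.
sd_of_means = statistics.pstdev(means)

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {sd_of_means:.2f}")
```

Under these assumptions, the middle of the simulated means should stay near the population mean while their spread comes out far smaller than the raw-data SD of 4. Changing `n` shows how the spread depends on sample size.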
##### Raw Data vs. Sampling Distribution

[Figure: two distributions, raw data and means (N = 50); x-axis: height in inches; y-axis: relative frequency.]

Note the middle and spread of the two distributions. How do they compare?

##### Definition of Bias

Statisticians have worked out the properties of sampling distributions. The middle and spread of the sampling distribution are known. If the mean of the sampling distribution equals the parameter, the statistic is unbiased (otherwise, it is biased). The sample mean $\bar{X}$ is unbiased. The best estimate of $\mu$ is $\bar{X}$.

##### Definition of Standard Error

The standard deviation of the sampling distribution is the standard error. For the mean, it indicates the average distance of the statistic from the parameter.

[Figure: raw data and means (N = 50), with the standard error of the mean marked; x-axis: height in inches.]

##### Formula: Standard Error of Mean

To compute the SEM, use:

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}}$$

For our example:

$$\sigma_{\bar{X}} = \frac{4}{\sqrt{50}} = .57$$

[Figure: raw data and means (N = 50), with the standard error marked; x-axis: height in inches.]

Standard error = SD of means = .57

##### Review

What is a sampling distribution? What is bias? What is the standard error of a statistic?

Suppose we repeatedly sampled 100 people at a time instead of 50 for height at USF. What would the mean of the sampling distribution be? What would the standard deviation of the sampling distribution be?

##### Definition

A sampling distribution is a distribution of _____?

1. parameters
2. samples
3. statistics
4. variables

##### Definition

What is the standard error of the mean?
1. average distance of standard from the error
2. average distance of raw data (X) from the data average (X-bar)
3. square root of the sampling distribution of the variance
4. standard deviation of the sampling distribution of the mean

##### Computation

If the population mean is 50, the population standard deviation is 2, and the sample size is 100, what is the standard error of the mean?

1. .2
2. .5
3. 2
4. 10

##### Deciding whether a Sample Represents a Population

Representativeness: the degree to which the sample distribution resembles the population distribution.

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

We can use the normal distribution to figure the probability of a sample mean. If the sample mean is very unlikely (has a low probability), we conclude the sample does not represent the population. If it is likely, we conclude it does.

Suppose we grab a sample of 49 students and their mean GPA is 3.7. We know the population mean is 3.2 and the population SD is .35. Is the sample representative?

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}} = \frac{.35}{\sqrt{49}} = \frac{.35}{7} = .05$$

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{3.7 - 3.2}{.05} = \frac{.5}{.05} = 10$$

Likely?

[Figure: standard normal curve; x-axis: scores in standard deviations from mu; y-axis: probability (relative frequency). Areas from the z table: 50 percent on each side of the mean; 34.13% between 0 and 1, 13.59% between 1 and 2, and 2.15% between 2 and 3 standard deviations.]

Area beyond z = 10 = ? $p \approx 7.6 \times 10^{-24}$.

Recall that anything beyond z = 2 is rare; anything beyond z = 3 is remote.

##### Rejection Region

The rejection region is the place in the curve that is unlikely if the scenario is true. Area totals to probability.

[Figure: standard normal curve with the bottom 2.5 percent and top 2.5 percent marked; x-axis: scores in standard deviations from mu; y-axis: probability (relative frequency).]

The convention is p = .05. The 5 percent of the area least likely to occur if the scenario is true is the rejection region. In most cases, the extremes of both tails are the places for the rejection region. The sample is unrepresentative if it falls far from the center. For z, the border is ±1.96 for p = .05 with 2 tails. For 1 tail, it is 1.65.
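The borders of the rejection region can be recovered from the standard normal distribution rather than read from a table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper names `critical_z` and `in_rejection_region` are hypothetical, and the one-tailed check looks only at the upper tail, which is an assumption of this sketch. The one-tailed border computes to about 1.645, which the lecture rounds to 1.65.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_z(alpha=0.05, two_tailed=True):
    """Return the positive z border of the rejection region."""
    # With two tails, the rejection area alpha is split evenly between them.
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


def in_rejection_region(z, alpha=0.05, two_tailed=True):
    """True if z falls in the rejection region.

    For one tail, this sketch checks the upper tail only.
    """
    border = critical_z(alpha, two_tailed)
    return abs(z) > border if two_tailed else z > border


print(round(critical_z(0.05, two_tailed=True), 2))   # 1.96
print(round(critical_z(0.05, two_tailed=False), 2))  # 1.64
print(in_rejection_region(10))                        # True
```

A z of 10, as in the GPA example, lies far outside ±1.96, so it falls in the rejection region under the p = .05 convention.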
##### Review

We know the population mean is 50 and the population standard deviation is 10. We grab 100 people at random and find the mean of the sample is 45. Does the sample represent the population?
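One way to work this review question is to compute the standard error of the mean, convert the sample mean to z, and compare it with the two-tailed border of 1.96. The sketch below is illustrative only; the function names `standard_error` and `z_for_sample_mean` are hypothetical and not from the lecture.

```python
import math
from statistics import NormalDist


def standard_error(pop_sd, n):
    """Standard error of the mean: population SD divided by sqrt(N)."""
    return pop_sd / math.sqrt(n)


def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z score of a sample mean within the sampling distribution of the mean."""
    return (sample_mean - pop_mean) / standard_error(pop_sd, n)


# Review question values: mu = 50, sigma = 10, N = 100, sample mean = 45.
sem = standard_error(10, 100)            # 10 / 10 = 1.0
z = z_for_sample_mean(45, 50, 10, 100)   # (45 - 50) / 1.0 = -5.0

# Two-tailed probability of a sample mean at least this far from mu.
two_tailed_p = 2 * NormalDist().cdf(-abs(z))

print(sem, z)
print(abs(z) > 1.96, two_tailed_p < 0.05)
```

Here z is -5, well beyond the ±1.96 border, so under the p = .05 convention the sample would be judged not to represent the population.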