#### Transcript [MSM04]

Hypothesis Testing II

Testing Hypothesis: Test of Significance of Mean for Large Sample (T)
• Example 1: Assume that the average annual income of government employees in the nation is reported by the census bureau to be Rs. 18,750. There was some doubt whether the average yearly income of government employees in Pune was representative of the national wage level. A random sample of 100 government employees in Pune was taken, and their average salary was found to be Rs. 19,240 with a standard deviation of Rs. 2,610. At the α = 0.05 level of significance, can we conclude that the average salary of government employees in Pune is representative of the national wage level? (Standard value of z = 1.96)
• Example 2: A sample of 400 male students is found to have a mean height of 67.47 inches. Can it reasonably be regarded as a sample drawn from a large population with mean height 67.39 inches and standard deviation (S.D.) 1.30 inches? Test at the 5% level of significance. (Standard value of z = 1.96)

Testing Hypothesis: Test of Significance of Mean for Large Sample (L)
• Example 3: The manufacturer of light bulbs claims that the minimum average life of these bulbs is 1600 hours. We want to test this claim. A sample of 100 light bulbs was taken at random, and the average bulb life of this sample was computed to be 1570 hours with a standard deviation of 120 hours. At α = 0.01, let us test the validity of the manufacturer's claim. (Standard value of z = -2.33)

Testing Hypothesis: Test of Significance of Mean for Large Sample (R)
• Example 4: An insurance company claims that it takes 14 days on average to process an auto accident claim, with an S.D. of 6 days. To test the validity of this claim, an investigator randomly selected 36 people who had recently filed claims. This sample revealed that it took the company an average of 16 days to process these claims.
At 99% confidence, check whether it takes the company more than 14 days on average to process a claim. (Standard value of z = 2.33)

Testing Hypothesis: Test of Significance of Mean for Small Sample (T)
• Example 5: A gas station repair shop claims that it can do a lubrication job and oil change in 30 minutes. The consumer protection department wants to test this claim. A sample of 6 cars was sent to the station for an oil change and lubrication. The job took an average of 34 minutes with an SD of 4 minutes. The claim is to be tested at α = 0.05. (Standard value of t = 2.02)

[Figure: t-distribution curves. Normal distribution (df = ∞); t-distribution for sample size n = 15 (df = 15 - 1 = 14); t-distribution for sample size n = 2 (df = 2 - 1 = 1).]

Characteristics of the t-distribution
• Both the t-distribution and the normal distribution are symmetrical.
• The t-distribution is flatter than the normal distribution: it is lower at the mean and higher at the tails.
• As the sample size gets larger, the t-distribution loses its flatness and becomes approximately equal to the normal distribution.
• In fact, when the sample size is more than 30, the t-distribution is so close to the normal distribution that we use the normal to approximate the t.

t-distribution & Degrees of Freedom
• There is a separate t-distribution for each sample size, that is, a different t-distribution for each possible number of degrees of freedom.
• What are degrees of freedom? The degrees of freedom are the number of values we can choose freely: (n - 1).
• When there are two elements in our sample we have (2 - 1) = 1 degree of freedom, and with seven elements in our sample we have (7 - 1) = 6 degrees of freedom.

Testing Hypothesis: Test of Significance of Mean for Small Sample (T)
• Example 6: A claim is made that ICFAI College students have an IQ of 120. To test this claim, a random sample of 10 students was taken and their IQ scores were recorded as follows.
105, 110, 120, 125, 100, 130, 120, 115, 125, 130. At the 0.05 level of significance, test the validity of this claim. (Standard value of t = 2.26)

Testing Hypothesis: Test of Significance of Mean for Small Sample (T)
• Example 7: A dispenser is set to dispense 500 ml of liquid. 15 samples were taken, and it was found that the sample mean is 498 ml and the sample standard deviation is 5 ml. Determine whether the dispenser needs to be readjusted at α = 0.05.
• Ho: μ = 500 ml
• Ha: μ ≠ 500 ml
• As the sample size is n < 30 and σp is unknown, we use the t-statistic.
• $t = \dfrac{\bar{x} - \mu}{\sigma_S / \sqrt{n - 1}} = \dfrac{498 - 500}{5 / \sqrt{14}} = -1.4966$
• At the 5% level of significance, the critical value of t for 14 degrees of freedom is ±2.145. As the calculated t value of -1.4966 lies within the acceptance region, we accept the null hypothesis, i.e. the dispenser dispenses 500 ml of liquid. Thus there is no need to readjust the dispenser.

Testing Hypothesis: Test of Significance of Difference between Two Means (Large Sample) (T)
• Example 8: The mean produce of wheat from a sample of 100 fields is 200 Qt per acre with an SD of 10 Qt. Another sample of 150 fields gives a mean of 220 Qt with an SD of 12 Qt. Can the two samples be considered to have been taken from the same population whose SD is 11 Qt? Use α = 0.05. (Standard value of z = 1.96)
• Example 9: The SD of each of two populations is 2.5, and the means of two large samples of sizes 1000 and 2000 are 67.5 and 68 respectively. Test the equality of means at α = 0.05. n1 = 1000, n2 = 2000, x̄1 = 67.5, x̄2 = 68, σp1 = 2.5 and σp2 = 2.5. (Standard value of z = 1.96)
• Example 10: An INC student is conducting a survey in Solapur and Pune on the hourly wages of landless labourers. Results of the survey are as follows.
Find whether there is a significant difference between the wages of landless labourers in Solapur and Pune at α = 0.05.

| Town | Mean hourly wages | SD | Sample |
|---|---|---|---|
| Solapur | Rs 8.95 | Rs 0.40 | 200 |
| Pune | Rs 9.10 | Rs 0.60 | 175 |

• n1 = 200, n2 = 175, x̄1 = 8.95, x̄2 = 9.10, σp1 = 0.40 and σp2 = 0.60. (Standard value of z = 1.96)

Testing Hypothesis: Test of Significance of Difference between Two Means (Large Sample)
• Example 11: A company manufacturing chocolates wants to compare the average manufacturing time at its two facilities. At facility A, on the basis of 500 samples, the manufacturing time was observed to be 5 hours with a standard deviation of 0.5 hours. At facility B, on the basis of 600 samples, the manufacturing time was observed to be 5.5 hours with a standard deviation of 0.5 hours. Test the equality of means at the 0.05 significance level.
• Example 11, solution:
• H0: µA = µB (there is no difference in the means of the two populations)
• Ha: µA ≠ µB (there is a difference in the means of the two populations)
• nA = 500, nB = 600, σA = 0.5, σB = 0.5, x̄A = 5, x̄B = 5.5

Testing Hypothesis: Test of Significance of Difference between Two Means (Small Sample) (T)
• Example 12: It is desired to find out whether there is any significant difference in the average amount of money carried by male and female students at Ferguson College, Pune, on Valentine's Day. A random sample of 8 male and 10 female students was selected, and the amount of money each had was recorded, giving the following. n1 = 8, n2 = 10, x̄1 = $20.50, x̄2 = $17.00, σS1 = $2.00 and σS2 = $1.50. (Standard value of t for df = 16 is 2.12)
• Example 13: The number of sales, the average size of the sales and the SD for two salespersons A & B are as follows.

| | A | B |
|---|---|---|
| Number of sales | 10 | 17 |
| Average size (in Rs) | 6200 | 5600 |
| Standard deviation (Rs) | 690 | 600 |

Examine whether the figures of average sales size differ significantly (at α = 0.05). (Standard value of t for df = 25 is 2.06)

Testing Hypothesis: Test of Significance of Difference between Two Means (Small Sample)
• Example 14: A supervisor observed the output of two machines. Machine 1 produces 25 units in an hour, of which 5 units were found to be defective, with a standard deviation of 3 units. Machine 2 produces 22 units in an hour, of which 4 units were found to be defective, with a standard deviation of 2 units. Determine whether the output of the two machines differs significantly. x̄1 = 5, x̄2 = 4, n1 = 25, n2 = 22, σS1 = 3, σS2 = 2.

Testing Hypothesis: Test of Significance for Single Proportions (Large Sample) (T)
• Example 15: The sponsor of a television show believes that his studio audience is divided equally between men and women. Out of 400 persons attending the show one day, there were 230 men (57.5%). At α = 0.05, test whether the belief of the sponsor is correct.
• Example 16: A company is contemplating giving VRS to its employees. The human resources director has reported to the president that roughly 60% of the employees are eligible for VRS. The president forms a special committee to assess the eligible candidates. The committee conducts in-depth interviews with 120 employees and finds that, in its judgment, 70% of the sample are qualified for VRS. The president wants to know whether the findings of the human resources director are correct or not. (α = 0.05)
• Example 16, data: pHO = 0.6, qHO = 0.4, n = 120, p̄ = 0.7, q̄ = 0.3

Testing Hypothesis: Test of Significance for Single Proportions (Large Sample) (L)
• Example 17: The mayor of the city claims that 60% of the people of the city follow him and support his policies.
We want to test whether his claim is valid or not. A random sample of 400 persons was taken, and it was found that 220 of these people supported the mayor. At α = 0.01, what can be concluded about the mayor's claim?
• Example 17, data: pHO = 0.6, qHO = 0.4, n = 400, p̄ = 220/400 = 0.55, q̄ = 180/400 = 0.45
• Example 18: An organization wants to introduce flexible work timings if more than 60% of the people are in favor of the policy. For this purpose it conducted a survey among 500 employees and found that 450 employees were in favor of flexible work timings. On the basis of the sample survey, what should the company do? Assume a significance level of 5%.
• Example 18, data:
• pHO = 0.60 (hypothesized value of the population proportion in favor of flexible work timings)
• qHO = 0.40 (hypothesized value of the population proportion not in favor of flexible work timings)
• n = 500
• p̄ = 450/500 = 0.9 (sample proportion in favor of flexible work timings)
• q̄ = 50/500 = 0.1 (sample proportion not in favor of flexible work timings)

Chi-square test (χ²)
• Symbolized by the Greek χ²
• Pronounced “Ki square”
• A test of STATISTICAL SIGNIFICANCE for TABLE data

What do tests of statistical significance tell us?
• Are OBSERVED RESULTS SIGNIFICANTLY DIFFERENT than would be expected BY CHANCE?
• Criterion: α < .05

Testing Hypothesis: Chi-square test
• Evaluates whether observed frequencies for a qualitative variable (or variables) are adequately described by hypothesized or expected frequencies.
– Qualitative (or categorical) data is a set of observations where any single observation is a word or code that represents a class or category.
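Looking back at the single-proportion examples (15 to 18) above, all four rest on the same large-sample statistic, z = (p̄ - pHO) / sqrt(pHO · qHO / n). The sketch below is one way to compute it, not the course's own worked solution: the function name `one_proportion_z` is hypothetical, and the count of 84 for Example 16 is derived here from "70% of 120".

```python
import math


def one_proportion_z(successes, n, p_h0):
    """Return the z statistic for a sample proportion against p_h0."""
    p_bar = successes / n                    # sample proportion
    q_h0 = 1.0 - p_h0                        # hypothesized complement
    std_error = math.sqrt(p_h0 * q_h0 / n)   # standard error under H0
    return (p_bar - p_h0) / std_error


# (label, successes, n, p_H0) taken from Examples 15 to 18
cases = [
    ("Example 15", 230, 400, 0.5),
    ("Example 16", 84, 120, 0.6),   # assumption: 70% of 120 = 84
    ("Example 17", 220, 400, 0.6),
    ("Example 18", 450, 500, 0.6),
]

results = {label: one_proportion_z(x, n, p0) for label, x, n, p0 in cases}
for label, z in results.items():
    print(f"{label}: z = {z:.3f}")
```

Each z value is then compared with the tabulated critical value for the tail and α stated in the example, for instance ±1.96 for a two-tailed test at α = 0.05.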
Testing Hypothesis: Chi-square test
• χ² test for:
– Test of deviation from expected frequencies: test whether the observed frequencies deviate from expected frequencies (e.g. when using a die, there is an a priori chance of 16.67% for each number).
– Test of association: finding a relationship between two or more independent variables (e.g. testing the relation between gender and the use of high or low accents).

Chi-square distribution: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$

Testing Hypothesis: Chi-square test
• In the test of significance of mean, we compare the mean of one sample with the hypothesized population mean.
• In the test of significance of difference between two means, we compare the means of two samples.
• In the chi-square test, we can check the equality of more than two population parameters (like proportions, means).
• If we classify a population into several categories with respect to two attributes (such as age & job performance), we can then use a chi-square test to determine whether the two attributes are independent of each other.

Testing Hypothesis: Chi-square test
• Using the chi-square test:
• The chi-square test enables us to test for the equality of several proportions.
• Chi-square is a test statistic used to test a hypothesis that provides a set of theoretical frequencies with which observed frequencies are compared.
• It is really just a comparison between expected frequencies and observed frequencies among the cells in a crosstabulation table.

Testing Hypothesis: Chi-square test
Conditions for the application of the chi-square test (χ²):
• All raw data for χ² must be frequencies / actual numbers (not percentages & proportions).
• The expected frequency of each cell should be more than 5.
• The chi-square test works only when the sample size is large enough (n > 50).
• The observations drawn need to be random and independent.
• A constraint must be linear.
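The statistic χ² = Σ (fo - fe)² / fe and the expected-frequency condition listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `chi_square_statistic` is hypothetical, and the observed die counts are made up purely for illustration (they do not come from the slides).

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Condition from the slides: each expected frequency should exceed 5.
    if any(fe <= 5 for fe in expected):
        raise ValueError("each expected frequency should be more than 5")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die (made up for illustration).
observed = [8, 12, 9, 11, 13, 7]
# A fair die gives each face an a priori chance of 1/6 (16.67%).
expected = [sum(observed) / 6] * 6

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # categories minus one for a goodness-of-fit test

print(f"chi-square = {chi2:.2f} with df = {df}")
# Compare with the tabulated critical value (about 11.07 for df = 5, alpha = 0.05).
```

Because every discrepancy is squared and divided by a positive expected frequency, the result can never be negative, and it is zero only when every observed count equals its expected count.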
Chi-square test
• Properties of the chi-square test:
• As with the t-distribution, there is a different chi-square distribution for each number of degrees of freedom.
• For a very small number of degrees of freedom, the χ² distribution is severely skewed to the right.
• As the number of degrees of freedom increases, the curve rapidly becomes more symmetrical until the number reaches large values, at which point the distribution can be approximated by the normal.
• The chi-square distribution is a probability distribution.
• The chi-square distribution involves squared observations and hence it is always positive.

Chi-Square: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
• Cannot be negative because all discrepancies are squared.
• Will be zero only in the unusual event that each observed frequency exactly equals the corresponding expected frequency.
• Other things being equal, the larger the discrepancy between the expected frequencies and their corresponding observed frequencies, the larger the observed value of chi-square.
• It is not the size of the discrepancy alone that accounts for a contribution to the value of chi-square, but the size of the discrepancy relative to the magnitude of the expected frequency.
• The value of chi-square depends on the number of discrepancies involved in its calculation.

Chi-square as a test of Goodness of fit
• The chi-square test was developed by Karl Pearson in 1900.
• Chi-square as a test of goodness of fit is used to test whether or not the observed frequency results support a particular hypothesis.
• The test can be used to identify whether the deviation between the observed and estimated values can be attributed to chance.
• In some situations researchers would like to see how well the observed frequency pattern fits the expected frequency pattern. In such cases the chi-square test is used to test whether the fit between the observed distribution and the expected distribution is good.

Testing Hypothesis: Steps in Chi-square test

An Example: 1.
Chi-square test
• Ex: Suppose that 60 children were asked which ice-cream flavour they liked out of the three flavours vanilla, strawberry & chocolate. The answers are recorded as follows. (α = 0.05)

| Flavours | Numbers |
|---|---|
| Vanilla | 17 |
| Strawberry | 24 |
| Chocolate | 19 |

• $\chi^2 = \sum \dfrac{(F_o - F_e)^2}{F_e}$

An Example: 2. Chi-square test
• The following table gives the expected sales (Fe) and actual sales (Fo) of television sets for a company. Using the chi-square test, test whether there is a substantial difference between the observed and expected values. (α = 0.05)

| (Fo) | 57 | 69 | 51 | 83 | 44 | 48 | 35 | 37 |
|---|---|---|---|---|---|---|---|---|
| (Fe) | 59 | 76 | 55 | 75 | 39 | 53 | 30 | 48 |

Analysis of Variance (ANOVA)
• Till now we have dealt with research problems where two means are involved. However, if the problem requires comparing the means of more than two populations, using the z-test or t-test becomes complex and tedious. To avoid this tedious process we can use Analysis of Variance (ANOVA), developed by R. A. Fisher. It is also called the F-test.

Analysis of Variance (ANOVA)
• Objective of ANOVA:
• The objective of the ANOVA test is to test whether there is any significant difference between the means of various samples.
• The ANOVA test uses the variability between the sample means as the basis for analysis. ANOVA measures the variability of data points within samples, and it also measures the variance between the sample means.
• These two variations are compared using the F-test.

Analysis of Variance (ANOVA)
• In its simplest form, it is used to compare means for three or more categories.
– Example: Life Happiness scale and Marital Status (married, never married, divorced)
• Relies on the F-distribution.
– Just like the t-distribution and chi-square distribution, there are several sampling distributions, one for each possible value of df.

What is ANOVA?
• If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests.
– The problem is that the 3 tests would not be independent of each other (i.e., all of the information is known).
• A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error).

Analysis of Variance (ANOVA)
The methodology of ANOVA is based on the following assumptions:
1. Each sample of size ‘n’ is drawn randomly, and each sample is independent of the other samples.
2. The populations are normally distributed.
3. The populations from which the samples are drawn have equal variances.

Two Sources of Variability (ANOVA)
• In ANOVA, an estimate of variability between groups is compared with variability within groups.
– Between-group variation is the variation among the means of the different treatment conditions due to chance (random sampling error) and treatment effects, if any exist.
– Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.

ANOVA: Total Variation Among Scores
• Within-Groups Variation: variation due to chance.
• Between-Groups Variation: variation due to chance and treatment effect (if any exists).

The F Ratio (ANOVA)
• $F = \dfrac{\text{Between-Group Variability}}{\text{Within-Group Variability}}$
• $F = \dfrac{MS_{between}}{MS_{within}}$, i.e. “mean squares between” divided by “mean squares within”
• $MS_{within} = \dfrac{SS_{within}}{df_{within}}$ and $MS_{between} = \dfrac{SS_{between}}{df_{between}}$
• “Sum of squares total”: $SS_{total} = SS_{between} + SS_{within}$
• “Degrees of freedom total”: $df_{total} = df_{between} + df_{within}$

F-distribution
• The F-test is always a one-tailed test.
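The F-ratio formulas above can be traced step by step in code. The sketch below is a plain-Python illustration, not a library routine: `one_way_anova` is a hypothetical helper, and the scores are those of the unrequited-love example that follows, read row by row into the columns Imagined, Retrospective and Current (that column assignment is an assumption about the flattened slide table).

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way ANOVA on a list of groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between: distance of each group mean from the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within: spread of the scores around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Assumed column reading of the example table (see lead-in).
imagined = [7, 6, 5, 6]
retrospective = [12, 8, 9, 11]
current = [8, 10, 12, 10]

anova = one_way_anova([imagined, retrospective, current])
print(f"F({anova['df_between']}, {anova['df_within']}) = {anova['f']:.2f}")
```

The computed F is compared with the tabulated one-tailed critical value for (df between, df within); for (2, 9) at the .05 level that value is about 4.26.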
An Example: ANOVA
• A study compared the felt intensity of unrequited love among three groups: individuals who were currently experiencing unrequited love, individuals who had previously experienced unrequited love and described their experiences retrospectively, and individuals who had never experienced unrequited love but described how they thought they would feel if they were to experience it. Determine the significance of the difference among groups, using the .05 level of significance.

| Imagined | Retrospective | Current |
|---|---|---|
| 7 | 12 | 8 |
| 6 | 8 | 10 |
| 5 | 9 | 12 |
| 6 | 11 | 10 |

An Example: ANOVA
• A psychologist interested in artistic preference randomly assigns a group of 15 subjects to one of three conditions in which they view a series of unfamiliar abstract paintings. The 5 participants in the “famous” condition are led to believe that these are each famous paintings. The 5 participants in the “critically acclaimed” condition are led to believe that these are paintings that are not famous but are highly thought of by a group of professional art critics. The 5 in the control condition are given no special information about the paintings. Does what people are told about paintings make a difference in how well they are liked? Use the .01 level of significance.

| Famous | Critically Acclaimed | No Information |
|---|---|---|
| 10 | 5 | 4 |
| 7 | 1 | 6 |
| 5 | 3 | 9 |
| 10 | 7 | 3 |
| 8 | 4 | 3 |

Testing Hypothesis: Test of Significance of Mean for Small Sample
• Example 5: In order to receive the accident incurrence rates for automobiles, an insurance company wants to assess the damage caused by accidents at a speed of 120 km/hour. A sample of 16 new cars was selected at random, and the company crashed each one at 120 km/hour. The damaged cars were repaired, and it was found that the average repair amount was Rs 2500 with an SD of Rs 950. Estimate the true average damage, in rupees, to all cars due to a crash at 120 km/hour.
Assume α = 0.05.

Testing Hypothesis
• Example: Suppose we are interested in a population of 20 industrial units of the same size, all of which are experiencing excessive labour turnover problems. The past record shows that the mean of the distribution of annual turnover is 320 employees with an SD of 75 employees. A sample of 5 of these industrial units is taken at random, which gives a mean annual turnover of 300 employees. Is the sample mean consistent with the population?

Testing Hypothesis
• Ex: A psychologist is working with people who have had a particular type of major surgery. The psychologist proposes that people will recover from the operation more quickly if friends and family are in the room with them for the first 48 hours after the operation (based on several other studies on social support), but acknowledges that the presence of friends and family may also slow recovery time, due to the added activity and possible stress associated with visitors. It is known that time to recover from this kind of surgery is normally distributed with a mean of 12 days and a standard deviation of 5 days. The procedure of having friends and family in the room for the period after the surgery is done with 9 randomly selected patients. The patients recover in an average of 8 days. Using the .01 level of significance, what should the researcher conclude?

Testing Hypothesis
• State the research hypothesis: Do patients who have friends and family with them following surgery recover more or less quickly than people who do not?
• State the statistical hypothesis: $H_0: \mu = 12$, $H_A: \mu \neq 12$
• Set decision rule: $Z_{crit} = \pm 2.58$
• Calculate the test statistic: $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{8 - 12}{5 / \sqrt{9}} = -2.40$
• Decide if results are significant: Retain H0, since -2.40 > -2.58.
• Interpret results as they relate to the research hypothesis: Patients who have friends and family with them following surgery do not recover significantly faster, or slower, than patients who do not have social support.

Hypothesis Testing: Example 8.6.
A horticulturist knows from experience that the honeybees visiting her orchard weigh .87 gram on average. Feeling that this year’s honeybees look bigger, she decides to weigh a random sample of n = 50 of the bees all together, and she gets an average weight of .91 gram per bee with s = .15 gram.

Summary
• A hypothesis is a statement that is considered to be true until it is proved false.
• The testing of hypothesis is a process of testing the significance of a parameter of the population on the basis of a sample.
• A hypothesis is tested by determining the null and alternative hypotheses.
• In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.
• There are two types of errors in testing of hypothesis: Type I and Type II errors. Rejecting a null hypothesis when it is true is called a Type I error; accepting a null hypothesis when it is false is called a Type II error.
• A two-tailed test of a hypothesis will reject the null hypothesis if the sample mean is significantly higher or lower than the hypothesized population mean.
• The test of significance is done to determine whether the means of two samples drawn from two different sources differ significantly or not.
• In testing of significance where the sample distribution does not follow the normal distribution, the chi-square test is used.
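As a recap of the one-sample tests on a mean, the sketch below reproduces the two worked results quoted in this transcript: Z = -2.40 for the surgery example and t = -1.4966 for the dispenser example. The function names are hypothetical. One assumption is worth flagging: the dispenser value is reproduced only when the standard error is taken as s / sqrt(n - 1); with s / sqrt(n) it would come to about -1.549.

```python
import math


def z_statistic(sample_mean, mu_0, sigma, n):
    """Large-sample or known-sigma test statistic for a single mean."""
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))


def t_statistic(sample_mean, mu_0, s, n):
    """Small-sample statistic; uses s / sqrt(n - 1) (assumption, see lead-in)."""
    return (sample_mean - mu_0) / (s / math.sqrt(n - 1))


def two_tailed_decision(statistic, critical_value):
    """Reject H0 only if the statistic falls outside +/- critical_value."""
    return "reject H0" if abs(statistic) > critical_value else "retain H0"


# Surgery example: mu = 12, sigma = 5, n = 9, sample mean = 8, critical z = 2.58
z = z_statistic(8, 12, 5, 9)
# Dispenser example: mu = 500, s = 5, n = 15, sample mean = 498, critical t = 2.145
t = t_statistic(498, 500, 5, 15)

print(f"z = {z:.2f}:", two_tailed_decision(z, 2.58))
print(f"t = {t:.4f}:", two_tailed_decision(t, 2.145))
```

Both statistics fall inside their acceptance regions, which matches the conclusions stated in the two worked examples.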