Goodness of Fit Test

This is a distribution-free test, i.e., the population need not be normal. The null hypothesis is taken as

 

H0 : observations are in good agreement with a hypothetical distribution/population.

 

If $O_i$ are the observed frequencies and $E_i$ the expected frequencies (i = 1, 2, ..., n), then the statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

follows the chi-square distribution with (n - 1) degrees of freedom. If the calculated value of the statistic is greater than the tabulated value of $\chi^2$ at level of significance $\alpha$, then the null hypothesis is rejected.
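As a numerical illustration, the statistic can be computed directly from the two lists of frequencies. The following is a minimal Python sketch, not a definitive implementation: the function name `chi_square_statistic` is a hypothetical helper, and the data are the weekly accident counts of Problem 25 below, with equal expected frequencies assumed under the hypothesis that conditions were the same in every week.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Weekly accident counts from Problem 25 (10 weeks).
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]
# Assumption: equal expected frequency in every week (total / number of weeks).
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # n - 1 degrees of freedom
print(f"chi-square = {chi2:.1f} with {dof} degrees of freedom")
```

The computed value, 26.6, is larger than the tabulated 5% value for 9 degrees of freedom (16.92), so the hypothesis of equal frequencies would be rejected for these data.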

 

8A. CHI-SQUARE TEST OF INDEPENDENCE

 

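Consider an r x c table, in which the data are classified in two ways into r rows and c columns. Such a table is called a contingency table. A 3 x 3 table can be taken as

            c1        c2        c3        Total

r1          O11       O12       O13       R1

r2          O21       O22       O23       R2

r3          O31       O32       O33       R3

Total       C1        C2        C3        N

where the r's and c's can be taken as attributes, $O_{ij}$ = observed frequencies, $R_i$ and $C_j$ are the row and column totals, and N is the grand total.

In the above, the expected cell frequencies can be calculated as

$$E_{ij} = \frac{R_i \, C_j}{N}$$

Then

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

follows the chi-square distribution with (r - 1)(c - 1) degrees of freedom.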

 

We set the null hypothesis that the attributes are independent,

i.e., H0 : $p_{i1} = p_{i2} = \dots = p_{ic}$, i = 1, 2, ..., r

where $p_{ij}$ = probability of obtaining an observation belonging to the i-th row and the j-th column, and $\sum_{i=1}^{r} p_{ij} = 1$ for each column.

We obtain the tabulated chi-square value at level of significance $\alpha$ with (r - 1)(c - 1) degrees of freedom.

 

If $\chi^2_{cal} > \chi^2_{tab}$, then H0 is rejected.

If $\chi^2_{cal} < \chi^2_{tab}$, then H0 is accepted.
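The procedure for an r x c table (expected frequency of each cell from its row and column totals, then the chi-square sum, then the degrees of freedom) can be sketched in Python as follows. This is only an illustration under stated assumptions: `independence_test` is a hypothetical helper name, and the 2 x 3 table is made-up data that does not come from the text.

```python
def independence_test(table):
    """Return (expected, chi2, dof) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell: (row total) * (column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof


# Hypothetical 2 x 3 table of observed frequencies (not taken from the text).
observed = [[20, 30, 50],
            [30, 30, 40]]
expected, chi2, dof = independence_test(observed)
print(expected)
print(f"chi-square = {chi2:.3f}, d.f. = {dof}")
```

For this hypothetical table the statistic is about 3.111 with 2 degrees of freedom, which is below the tabulated 5% value of 5.99, so H0 would be accepted.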

8B. A 2 x 2 TABLE (SIMPLIFIED FORM)

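In this case, with cell frequencies a, b in the first row and c, d in the second row, and N = a + b + c + d, the statistic can be calculated as

$$\chi^2 = \frac{N (ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$

with (2 - 1)(2 - 1) = 1 degree of freedom.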
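For a 2 x 2 table with first row (a, b) and second row (c, d), the usual shortcut is N(ad - bc)^2 divided by the product of the four marginal totals. The Python sketch below checks, on made-up numbers, that this shortcut agrees with the general sum of (O - E)^2 / E; both function names are hypothetical and the cell frequencies are chosen only for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_general(table):
    """Sum of (O - E)^2 / E with E = row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )


# Hypothetical cell frequencies, chosen only for illustration.
a, b, c, d = 10, 20, 20, 10
shortcut = chi_square_2x2(a, b, c, d)
general = chi_square_general([[a, b], [c, d]])
print(f"shortcut = {shortcut:.3f}, general = {general:.3f}")
```

Both computations give about 6.667 for this table, which illustrates that the simplified form is only a rearrangement of the general statistic.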

 

8C. YATES' CORRECTION

With one degree of freedom, only one of the four cell frequencies can be chosen arbitrarily if the row and column totals are to remain fixed. Yates therefore suggested the following correction in calculating the chi-square statistic.

If ad > bc, reduce a and d by 0.5 and increase b and c by 0.5.

If ad < bc, increase a and d by 0.5 and reduce b and c by 0.5.
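The adjustment rule can be sketched in Python as below. Shifting the cells by 0.5 leaves the row and column totals unchanged and reduces |ad - bc| by N/2, so the result equals the closed form N(|ad - bc| - N/2)^2 / (R1 R2 C1 C2). This is a hedged illustration: `yates_chi_square` is a hypothetical name, and the cell frequencies are an assumption, chosen so that they reproduce the totals quoted in Example 16 below (N = 400, R1 = 211, R2 = 189, C1 = C2 = 200, |ad - bc| = 2600), since the original table is not reproduced in this text.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for [[a, b], [c, d]] after Yates' 0.5 adjustment."""
    if a * d > b * c:
        a, d, b, c = a - 0.5, d - 0.5, b + 0.5, c + 0.5
    elif a * d < b * c:
        a, d, b, c = a + 0.5, d + 0.5, b - 0.5, c - 0.5
    n = a + b + c + d  # the total is unchanged by the adjustment
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed cell frequencies, consistent with the totals quoted in Example 16.
a, b, c, d = 112, 99, 88, 101
corrected = yates_chi_square(a, b, c, d)

# Equivalent closed form: N (|ad - bc| - N/2)^2 / (R1 R2 C1 C2)
n = a + b + c + d
closed_form = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)
print(f"corrected chi-square = {corrected:.2f}, closed form = {closed_form:.2f}")
```

Both values come out as 1.44, the figure obtained in Example 16.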

Example 14. Fit a Poisson distribution to the following data and test the goodness of fit.

Solution. Here mean,

Expected frequencies are obtained by

$$N \cdot \frac{e^{-m} m^{x}}{x!}, \qquad x = 0, 1, 2, 3$$

where N is the total frequency and m is the mean.

Therefore the fitted distribution is

Let H0 : Poisson distribution is a good fit to the above data.

Statistic:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 1.41$$

Let α = 0.05

Degrees of freedom = 5 - 1 - 1 = 3

(1 d.f. lost due to ∑Oi = ∑Ei,

1 d.f. lost due to estimate of mean)

Critical value is $\chi^2_{0.05,\,3} = 7.82$

Since $\chi^2_{cal} = 1.41 < 7.82$

⇒ H0 is accepted

⇒ Poisson distribution is a good fit to the given data.
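The steps of this example (estimate the mean, compute the expected frequencies, form the statistic, count the degrees of freedom) can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `poisson_fit` is hypothetical, the observed counts are illustrative and are not the data of Example 14, the last class is treated as an open class so that ∑Oi = ∑Ei holds, and no pooling of small expected frequencies is attempted.

```python
import math


def poisson_fit(observed):
    """observed[x] is the frequency of the value x (x = 0, 1, 2, ...)."""
    n_total = sum(observed)
    mean = sum(x * f for x, f in enumerate(observed)) / n_total
    expected = [
        n_total * math.exp(-mean) * mean**x / math.factorial(x)
        for x in range(len(observed))
    ]
    # Let the last class absorb the upper tail so the totals agree.
    expected[-1] = n_total - sum(expected[:-1])
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One d.f. lost for equal totals, one for the estimated mean.
    dof = len(observed) - 1 - 1
    return mean, expected, chi2, dof


# Illustrative counts for x = 0, 1, 2, 3, 4 (an assumption, not Example 14).
observed = [109, 65, 22, 3, 1]
mean, expected, chi2, dof = poisson_fit(observed)
print(f"mean = {mean:.2f}, chi-square = {chi2:.2f}, d.f. = {dof}")
```

With 3 degrees of freedom the statistic would be compared with the 5% critical value 7.82 used above. In practice, classes with expected frequencies below 5 are usually pooled first, as is done in Example 15.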

 

Example 15. Fit a binomial distribution to the following data and test the goodness of fit.

q = 0.6343, so that p = 1 - q = 0.3657.

The expected frequencies of the fitted binomial distribution can be calculated from the successive terms of the expansion

$$70\,(0.6343 + 0.3657)^5$$

Hence we obtain

Let H0 : Binomial distribution is a good fit to the above data.

Let us pool the last two expected values (which are less than 5) so that the pooled values in this case are 4 and 4 respectively for observed and expected values.
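The expected frequencies and the pooling step can be checked numerically. The Python sketch below expands 70 (0.6343 + 0.3657)^5 term by term; the constant names are hypothetical, and the values 70, 5, 0.6343 and 0.3657 are those appearing in the solution above.

```python
from math import comb

N_TOTAL = 70   # total frequency, from 70 (q + p)^5
N_TRIALS = 5   # exponent of the expansion
Q = 0.6343
P = 0.3657

# Term x of the expansion: 70 * C(5, x) * p^x * q^(5 - x)
expected = [
    N_TOTAL * comb(N_TRIALS, x) * P**x * Q ** (N_TRIALS - x)
    for x in range(N_TRIALS + 1)
]
for x, e in enumerate(expected):
    print(f"x = {x}: expected frequency = {e:.2f}")

# Pool the last two classes, whose expected values are below 5.
pooled = expected[:-2] + [expected[-2] + expected[-1]]
print(f"pooled last class = {pooled[-1]:.2f}")
```

The last two terms are each below 5, and their sum rounds to 4, in line with the pooled expected value quoted above.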

 

Now d.f. = 6 - 1 - 3 - 1 = 1

(1 d.f.  lost due to ∑Oi = ∑Ei

3 d.f.  lost due to estimate of p, q and mean

1 d.f.  lost due to pooling of two expected values)

 

Let α = 0.05, then $\chi^2_{0.05,\,1} = 3.84$

Also $\chi^2_{cal} < 3.84$

⇒ H0 is accepted

⇒ Binomial distribution is a good fit to the above data.

 

Example 16. The results of polls conducted 2 weeks and 4 weeks before an election, are shown in the following table.

Use the 0.05 level of significance to test whether there has been a change in opinion between the two polls.

 

Solution.

Let H0 : Opinion does not change between the two polls.

H1 : Opinion changes between the two polls.

Since degree of freedom = (2 - 1)(2 - 1) = 1, we have to use Yates' corrected chi-square statistic.

Here N = 400, R1 = 211, R2 = 189, C1 = C2 = 200

$$\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{R_1 R_2 C_1 C_2} = \frac{(2600 - 200)^2}{211 \times 189 \times 100} = 1.44$$

Also $\chi^2_{0.05,\,1} = 3.84$, so $\chi^2_{cal} < \chi^2_{0.05,\,1}$

⇒ H0 is accepted

⇒ Opinion does not change between the two polls.

 

Example 17. A random sample of 220 students in a college were asked to give their opinion, in terms of yes or no, about their college cricket team winning a tournament. The following data were collected:

Test whether there is any association between opinion and class in college [use 5% level of significance].

 

Solution. We display the contingency table with both observed frequencies and expected frequencies:

H0 : There is no association between opinion and class in the college.

H1 : There is an association between opinion and class in the college.

Here degrees of freedom = (3 - 1)(2 - 1) = 2

$$\chi^2 = \sum \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 22.32$$

Since calculated $\chi^2$ > critical value ($\chi^2_{0.05,\,2} = 5.99$)

⇒ H0 is rejected

⇒ H1 can be accepted, i.e., there is an association between opinion and class in the college.

PROBLEMS

(Testing of Mean/Difference of Means)

 

1. A manufacturer of television tubes knows from past experience that the average life of a tube is 2000 hrs with a s.d. of 200 hrs. A sample of 100 tubes has an average life of 1950 hrs. Test at the 0.01 level of significance whether this sample came from a normal population of mean 2000 hrs.

 

2. The mean lifetime of 100 electric bulbs produced by a manufacturing company is estimated to be 1570 hrs with a s.d. of 120 hrs. If µ is the mean lifetime of all the bulbs produced by the company, test the hypothesis µ= 1600 hrs, against the alternative hypothesis µ ≠ 1600 hrs, using 5% level of significance.

 

3. A sample of 400 students is found to have a mean height of 171.38 cm. Can it be reasonably regarded as a sample from a large population with mean height 171.17 cm and s.d. 3.30 cm?

 

4. The mean weight of a random sample of size 100 from a student population is 65.8 kg and the standard deviation is 4 kg. Test at 5% level of significance that the mean weight of the student population is below 72 kg.

 

5. The sales manager of a large company conducted a sample survey in states A and B, taking a sample of 400 salesmen in each case. The results were :

Test whether the average sales are the same in the two states at 1% level of significance.

6. The mean yield of sunflower seeds from a district A was 200 lbs with s.d. = 10 lbs per acre from a sample of 100 plots. In another district B, the mean yield was 210 lbs with s.d. = 12 lbs from a sample of 120 plots. Assuming that the s.d. of yield in the entire state was 12 lbs, test whether there is any significant difference between the mean yields of crops in the two districts (use 1% level of significance).

 

7. An investigation of two kinds of machines in a laboratory showed that 52 failures of the first kind of machine took on the average 74 minutes to repair with a standard deviation of 15 minutes, while 68 failures of the second kind of machine took on the average 92 minutes to repair with a standard deviation of 20 minutes. Test the hypothesis that on the average it takes an equal amount of time to repair either kind of machine.

 

8. The percentage of carbon content of a certain variety of steel has a standard specification 0.05. For 15 samples of steel the percentage of carbon content were found to have an average 0.0482 and standard deviation 0.0012. Do these data reasonably conform to the standard specification ? (Assume that the population of percentages of carbon content is normal)

[Given P (|t| > 4.819) < 0.001 for 11 d.f.]

 

9. The heights of 10 residents of a given locality are found to be 70, 68, 62, 68, 61, 68, 69, 65, 64 and 66 inches. Is it reasonable to believe that the average height is greater than 64 inches ? (Use 5% level of significance).

 

10. A fertilizer machine is set to give 12 kg of nitrate for every quintal bag of fertilizer. Ten 100 kg bags are examined. The percentages of nitrate are as follows :

 

11, 14, 13, 12, 13, 12, 13, 14, 11, 12.

Is there any reason to believe that the machine is defective? (Use 5% level of significance).

11. A drug was administered to 10 patients, and the increments in their blood pressure were recorded to be

6, 3, -2, 4, -3, 4, 6, 0, 0, 2.

Is it reasonable to believe that the drug has no effect on change of blood pressure ? (Use 5% level of significance).

 

12. A sample of size 10 is drawn from each of two normal populations having the same variance which is unknown. If the mean and variance of the sample from the first population are 7 and 26 and those of the sample from the second population are 4 and 10, test at 5% significance level if the two populations have the same mean.

 

13. The sales data of an item in six shops before and after a special promotional campaign are as under :

Can the campaign be judged to be a success ? (Use 5% level of significance)

 

Testing of Proportion/Difference of Proportions

 

14. In a sample of 400 parts manufactured by a factory, the number of defective parts was found to be 30. The company however claims that only 5% of their product is defective. Is the claim tenable ?

 

15. In a sample of 500 people in a town, 280 are tea drinkers and the rest are coffee drinkers. Can we assume that both coffee and tea are equally popular in the town at 1% level of significance ?

 

16. A sample survey shows that out of 800 literate people 480 are employed, whereas out of 600 illiterate people only 350 are employed. Can the difference between the two proportions of employed persons be ascribed to sampling fluctuations ?

 

17. In a sample of 600 students of a certain college, 400 are found to use dot pens. In another college from a sample of 900 students 450 were found to use dot pens. Test whether the two colleges are significantly different with respect to the habit of using dot pens. (Use 5% level of significance).

 

18. In a certain city A, 100 men in a sample of 400 are found to be smokers. In another city B, 300 men in a sample of 800 are found to be smokers. Does this indicate that there is a greater proportion of smokers in B than A ?

 

19. A transportation company claims that only 7% of all lost luggage is never found. If in a random sample, 18 of 200 pieces of lost luggage are not found, test the null hypothesis p = 0.07 against the alternative hypothesis p > 0.07 at 5% level of significance.

 

(Testing of Variances/Chi-square and F-Tests)

 

20. Weights (in kg.) of 10 students are given below :

38, 40, 45, 53, 47, 43, 55, 48, 52, 49.

Can we say that the variance of the distribution of weights of all students from which the above sample of 10 students was drawn, is equal to 20 square kg. ?

 

21. A random sample of size 20 from a normal population gives a sample mean of 42 and a sample standard deviation of 6. Test the hypothesis that the population s.d. is 9.

 

22. If 10 determinations of the specific heat of iron have a standard deviation of 0.0075, test the null hypothesis σ= 0.011 for such determinations. Use the alternative hypothesis σ ≠ 0.011 at 5% level of significance.

 

23. Two random samples are drawn from two populations and the following results were obtained :

Sample I          16        17       18       19       20        21        22        24        26       27

Sample II        19        22        23        25        26        28        29       30         31       32        35    36

 

Find the variances of the two samples and test whether the two populations have the same variance (use 5% level of significance).

 

24. The following results were obtained from two independent random samples :

Test whether the two samples may be regarded as drawn from the same normal population (use 5% level of significance).

 

(Chi-square Test/Goodness of Fit Test)

 

25. The numbers of road accidents per week in a certain area were as follows:

12, 8, 20, 2, 14, 10, 15, 6, 9, 4.

Are these frequencies in agreement with the belief that accident conditions were the same during the 10-week period ?

 

26. A chemical extraction plant processes sea water to collect sodium chloride and magnesium. It is known that sea water contains sodium chloride, magnesium and other elements in the ratio of 62 : 4 : 34. A sample of 200 tons of sea water has resulted in 130 tons of sodium chloride and 6 tons of magnesium. Are these data consistent with the known composition of sea water at 5% level ?

 

27. The following table gives the number of accounting clerks committing errors and not committing errors among trained and untrained clerks working in an organization:

Test the effectiveness of training in preventing the errors.

 

ANSWERS

 

1. Two tailed test, Zcal = - 2.25, accept H0.

2. Zcal = - 2.5, reject H0.

3. Two tailed test, Zcal = 1.27, accept H0 at 5% level of significance.

4. Zcal = - 15.5, accept H1 : µ < 72 kg.

5. Two tailed test, Zcal = 8.82, H0 : µ1 = µ2 is rejected.

6. Two tailed test, Zcal = - 6.15, reject H0.

7. Two tailed test, Zcal = - 5.63, reject H0 at 5% level of significance.

8. Two tailed test, tcal = - 5.81, reject H0.

9. Right tailed test, tcal = 0.72. Average height is not greater than 64 inches.

10. Two tailed test, tcal = 1.46, accept H0.

11. Drug has no effect, tcal = 0.67.

12. tcal = 1.58, accept H0 : µ1 = µ2.

13. Left tailed test, tcal = - 2.78, reject H0 : µ1 = µ2 (mean sales are same).

14. No. Zcal = 2.29, H0 : p = 0.05, H1 : p > 0.05.

15. No. |Zcal| = 2.70 > 2.58.

16. Zcal = 0.6415, two tailed test, accept H0 : P1 = P2, claim is correct.

17. Zcal = 6.38, reject H0 : P1 = P2, two tailed test.

18. |Zcal| = 4.33, H0 : P1 = P2 is rejected at 5% level of significance.

19. Zcal = 1.11, accept H0.

20. $\chi^2_{cal}$ = 14, accept H0 : $\sigma^2$ = 20 (right tailed test) at 5% level.

21. $\chi^2_{cal}$ = 8.89, accept H0 : σ = 9 (right tailed test) at 5% level.

22. $\chi^2_{cal}$ = 4.18, accept H0.

23. Fcal = 27.11/14 = 1.94, d.f. = (11, 9), accept H0 : $\sigma_1^2 = \sigma_2^2$, right tailed test.

24. H0 : $\sigma_1^2 = \sigma_2^2$, Fcal = 2.068, H0 accepted (H1 : $\sigma_1^2 > \sigma_2^2$).

            But H0 : µ1 = µ2 is rejected against H1 : µ1 ≠ µ2.

            So the samples do not come from the same population.

25. $\chi^2_{cal}$ = 26.6, claim is rejected at 5% level of significance.

26. $\chi^2_{cal}$ = 1.025, claim is accepted.

27. $\chi^2_{cal}$ = 8.7147, H1 : claim is correct, is accepted at 5% level and d.f. = 1.