# Chi Square Test of Independence with MS Excel

*test of independence Chi-square is used to determine the relationship between two categorical variables. Examples of pairs of categorical variables are: marital status vs. The level of employment of the respondents, Breed vs. The profession of the owner, the Level of s/n vs. Specialization engineer etc. When calculating tests of independence tested the hypothesis that between the variables there is no connection. Calculations will be done using MS EXCEL 2010 ХИ2.TEST() and the usual formulas.*

Suppose we have *fetch* of data representing a result of a survey of 500 people. People were asked 2 questions: about their marital status (married, civil marriage, are not in a relationship) and their level of employment (full time, part-time employment, unemployed, in the household, in retirement, studying). All the answers placed in a table:

This table is called a *contingency table* (or factor table, eng. Contingency table). The elements at the intersection of table rows and columns is commonly referred to O_{ij} (from the English. Observed, i.e. the observed, actual frequency).

We are interested in the question "does the marital status on Employment?", i.e. is there dependence between the two classification methods *fetch*?

*inference* of this type usually take that *null hypothesis* argues about the lack of subject classification.

Consider the limiting cases. An example of the complete dependence of two categorical variables here is a result of the survey:

In this case, marital status uniquely identifies a job (see the sample file sheet explanation). Conversely, the example of full independence is another result of the survey:

Please note, what percent of employment in this case does not depend on marital status (same for married and not married). This coincides precisely with the phrase *null hypothesis*. If *null hypothesis* fair, then the poll results would have been so allocated in the table that the percentage would be the same regardless of marital status. Using this, we calculate the results of the survey that correspond to *null hypothesis* (see the sample file sheet Example).

we First calculate the probabilities that the element *fetch* would have some employment (see column u_{i}):

where *with* – the number of columns (columns) equal to the number of levels of the variable "marital status".

Then compute the estimated probability that the element *fetch* will have a certain marital status (see a line v_{j}).

where *r* – the number of rows (rows) equal to the number of levels of the variable "Employment".

the Theoretical frequency for each cell E_{ij} (from the English. Expected, i.e. the expected frequency) in the case of independence of variables is calculated by the formula:

E_{ij}=n* u_{i}* v_{j}

you Know, the statistics of X^{2}_{0} for large n is approximately ХИ2-distribution (r-1)(c-1) degrees of freedom (df – degrees of freedom):

**note**: The stats above at C=1 is used to calculate *the goodness of fit Pearson CHI-square* (see article hypothesis testing with Chi-square Pearson in MS EXCEL).

If the computed *fetch* the value of this statistic is "too large" (greater than the threshold), it is *null hypothesis* is rejected. The threshold value is computed based on the significance level, for example, using the formula =ХИ2.OBR.PH(0,05, df).

**note**: *significance Level* is usually taken equal to 0,1, 0,05, 0,01.

*testing of the hypothesis* it is also convenient to calculate p value, which we compare with the *significance level*. the *p**value* is calculated using the ХИ2-distribution (r-1)*(c-1)=df degrees of freedom.

If the probability that a random variable having ХИ2-distribution (r-1)(c-1) *degrees of freedom* will take a value greater than the calculated statistics X^{2}_{0}, i.e. P{X^{2}_{(r-1)*(c-1)}>,X^{2}_{0}}, *significance level*, *null hypothesis* is rejected.

In MS EXCEL *p-value* can be calculated using the formula =ХИ2.DIS.Pkh(X^{2}_{0},df), of course, calculating just before the value of statistics X^{2}_{0} (this is done in example file). However, it's best to use ХИ2.TEST(). As arguments of this function specify the reference to the ranges containing the actual (Observed) and calculated the theoretical frequencies (Expected).

If *significance level* >, *p**values*, then it means the actual and theoretical frequency, calculated from assumptions of justice *null hypothesis*, are distinctly different. Therefore, *null hypothesis* you want to reject.

Using ХИ2.TEST() allows you to speed up the procedure *hypothesis tests*, because it is not necessary to calculate the value of *statistics*. Now, it is sufficient to compare the result of the function ХИ2.TEST() with the specified *significance level*.

**note**: ХИ2.TEST(), the English name CHISQ.TEST, appeared in MS EXCEL 2010. An earlier version of her ХИ2ТЕСТ() available in MS EXCEL 2007 has the same functionality. But as for ХИ2.TEST(), the theoretical frequencies need to be calculated independently.

**TIP**: About test other types of hypotheses, see statistical hypothesis testing in MS EXCEL.