POL242 LAB MANUAL: EXERCISE 3A
Crosstabulation with Nominal Variables
PURPOSE
 To learn how to perform a crosstabulation and practice formulating
hypotheses.
 To learn how to interpret crosstabs where at least one of the variables is
nominal.
 To learn how to measure the strength of the relationship between two
variables.
 To learn how to apply the basic measures of association: phi
and Cramer's V.
MAIN POINTS
Crosstabulation
 Crosstabulation brings together two variables and displays the
relationship between them in a single table. Each column in the crosstab corresponds to a category
of the independent variable, and each row corresponds to a category
of the dependent variable. Hence the dependent variable goes on the left, and the independent variable
goes on the top.
 Each cell represents a unique combination of categories from each of the
variables. For example, in the table below, the cell "G" represents all
the respondents who selected Category I for the independent variable and
Category III for the dependent variable.
 The percentage in each cell is calculated by dividing the number of
respondents in the cell by the total number of respondents for the column. Note: the
cell percentage values will be wrong if the missing values are not eliminated. Pay
attention to the percentages in each cell rather than the number (n) of
respondents in each cell.
 To interpret crosstabs, compare the
column percentages across the rows to see whether they differ.
For instance, in the table below,
compare the percentage values for cells A, B, and C, then compare D, E,
and F, and finally compare G, H, and I. If the column percentages of
cells ABC, and/or DEF, and/or GHI differ markedly from
one another, then you have found a relationship.


                                  INDEPENDENT VARIABLE
DEPENDENT VARIABLE     Category I     Category II     Category III
Category I                 A              B               C
Category II                D              E               F
Category III               G              H               I
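
As a minimal sketch of the column-percentage calculation described above, the Python snippet below divides each cell count by its column total. Python is used only for illustration (it is not part of Webstats), the helper name `column_percentages` is hypothetical, and the counts are borrowed from Example 1 later in this exercise.

```python
# Observed counts from Example 1 (rows = dependent variable categories,
# columns = independent variable categories).
counts = [
    [757, 731],    # Undesirable
    [726, 2621],   # Desirable
]


def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


col_pct = column_percentages(counts)
for row in col_pct:
    # One printed line per row of the crosstab, one value per column.
    print("  ".join(f"{p:5.1f}" for p in row))
```

Reading across each printed row is the comparison the crosstab asks for: the further apart the values in a row, the stronger the suggestion of a relationship.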
Measures of Association:
Nominal data: Phi and Cramer's V
 Measures of Association calculate the strength, and for
ordinal variables the direction, of the
relationship between two variables.
 PHI: Used to measure the strength of the association
between two variables, each of which has only two categories. (It
applies to 2 X 2 nominal tables only.)
 CRAMER'S V: Used to measure the strength of the association
between a nominal variable and either another nominal variable or
an ordinal variable. Both of the variables can have more than 2
categories. (It applies to either nominal X nominal crosstabs, or ordinal
X nominal crosstabs, with no restriction on the number of categories.)

 Interpreting the value of the Level of Association:

LEVEL OF ASSOCIATION   VERBAL DESCRIPTION     COMMENTS
0.00                   No Relationship        Knowing the independent variable does not reduce the number of errors in predicting the dependent variable at all.
.00 to .15             Not generally useful   Not acceptable
.10 to .20             Weak                   Minimally acceptable
.20 to .25             Moderate               Acceptable
.25 to .30             Moderately Strong
.30 to .35             Strong
.35 to .40             Very Strong
.40 to .45             Worrisomely Strong     Either an extremely good relationship or the two variables are measuring the same concept.
.45 to .99             Redundant              The two variables are probably measuring the same concept.
1.00                   Perfect Relationship   If we know the independent variable, we can perfectly predict the dependent variable.
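
The sketch below shows one way a Cramer's V value can be computed and then read against the guidelines above. It uses the standard chi-square-based formula, V = sqrt(chi-square / (N * min(rows - 1, columns - 1))), which for a 2 X 2 table reduces to the magnitude of Phi. Python is used only for illustration; the function names and both small tables are hypothetical. One labelled assumption: the first two bands in the guideline table overlap between .10 and .15, and this sketch treats .15 as the boundary; lower bounds are treated as inclusive.

```python
import math


def cramers_v(table):
    """Cramer's V for a table given as a list of rows of counts.

    Assumes every row total and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# (lower bound, label) pairs from the guideline table.
# ASSUMPTION: .15 is used as the Weak boundary because the source
# bands ".00 to .15" and ".10 to .20" overlap.
BANDS = [
    (0.45, "Redundant"),
    (0.40, "Worrisomely Strong"),
    (0.35, "Very Strong"),
    (0.30, "Strong"),
    (0.25, "Moderately Strong"),
    (0.20, "Moderate"),
    (0.15, "Weak"),
]


def strength_label(value):
    """Map a measure of association to its verbal description."""
    v = round(abs(value), 5)
    if v == 0:
        return "No Relationship"
    if v >= 1:
        return "Perfect Relationship"
    for lower, label in BANDS:
        if v >= lower:
            return label
    return "Not generally useful"


# HYPOTHETICAL tables: identical column percentages in every row,
# and every case on the diagonal.
independent = [[20, 40], [10, 20], [30, 60]]
perfect = [[50, 0, 0], [0, 30, 0], [0, 0, 20]]

for name, table in (("independent", independent), ("perfect", perfect)):
    v = cramers_v(table)
    print(name, round(v, 2), strength_label(v))
```

The two hypothetical tables mark the ends of the scale: identical column percentages give a value of 0, and a table where each column falls entirely in one row gives a value of 1.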
INSTRUCTIONS: Crosstabulating Nominal Data

Select one of the
following Datasets for this exercise: CCFRpop/elites,
Macleans, CRIC, Euro2002.

Enter the Questionnaire for the chosen dataset from the Codebooks link
on the POL242Y website.
 Hypothesize a relationship between two nominal variables in the
dataset.
 For example, using either the Euro2002 or the Macleans data, one might suspect that
support for US leadership in world affairs is related to perceptions of the
post-9/11 motivations of the US. Details on the relevant variables are
offered below.
 Enter Webstats to select your chosen dataset.
 Perform separate trial runs of the Frequency distribution for
each of the variables. Based on the Frequency output, decide how to
recode each variable and identify the missing values.

Set the Analysis in Webstats to Bivariate Crosstabs and hit Proceed.

In 'Step 1,' enter the dependent variable first, followed by the
independent variable. Be sure to put each variable in
the correct entry box. If the variables are placed appropriately, the dependent
variable will appear
on the left of the crosstab and the independent variable will appear across
the top (see the diagram above).
 Select "Phi and Cramer's V (PHI)" in the 'Step 2' entry box. This
section lists other measures of association that you can choose, but since we
are working with nominal data, select "Phi and Cramer's V (PHI)".
 Enter any recodes (if necessary) in 'Step 4' and hit Run.
 When evaluating the measures of association, you should look only at Phi
for 2 by 2 tables and Cramer's V for other nominal tables.
 Determine whether there is a relationship between the variables based on
the column percentages in the crosstab. Then, looking at the value of
the measure of association, use the above guidelines to determine the strength of the
relationship.
 Repeat the analysis until you find a set of variables with a relationship
that has at least a moderate degree of association (> .20).
EXAMPLES
Example #1:
Using phi with two dichotomous variables
 Dataset: Euro2002
 Dependent Variable:
 [Q6AGG] From your point of view, how desirable
is it that the US exert strong leadership in world affairs? Desirable, or
undesirable?

Independent Variable:

[Q28] Do you think the United States
is using these attacks [9/11] as an excuse to enforce its will around the globe or is
it genuinely seeking to protect itself from further attacks?
 Arrow Diagram:

 X → Y
 US motive → desirability of US leadership
 Syntax:

get file="/homes/josephf/webstats/Euro2002.sav".
missing values Q6AGG (5,6).
missing values Q28 (3 to 6).
recode Q6AGG (1=2)(2=1).
value labels Q6AGG 1 'Undesirable' 2 'Desirable'.
crosstab tables=Q6AGG by Q28
 /cells=column count
 /statistics=PHI.

Syntax Legend:

 Missing Values and Recodes: Determined by the trial run of the Frequencies output.

 Crosstab command: This tells SPSS which variables to use in the table. Enter the
Dependent Variable first, then the Independent Variable.

 /cells: This tells SPSS to put column percentages and frequencies in each
cell.
 /statistics: This is
the section of syntax that needs to be included after the crosstab syntax in
order to calculate the Measures of Association. In this case we want to
calculate Phi.
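
To make the order of operations in the syntax concrete, here is a hedged Python sketch that mirrors the same steps: drop cases with missing codes on either variable, recode the dependent variable, then tabulate counts and column percentages. It is not what Webstats or SPSS runs internally. The response pairs are hypothetical and invented for illustration; only the missing-value codes, the recode, and the value labels come from the syntax above.

```python
from collections import Counter

# HYPOTHETICAL (Q6AGG, Q28) answer codes, invented for illustration only.
responses = [(1, 2), (2, 1), (1, 2), (5, 1), (2, 2),
             (1, 4), (1, 1), (2, 1), (1, 2), (6, 3)]

Q6AGG_MISSING = {5, 6}         # missing values Q6AGG (5,6).
Q28_MISSING = {3, 4, 5, 6}     # missing values Q28 (3 to 6).
Q6AGG_RECODE = {1: 2, 2: 1}    # recode Q6AGG (1=2)(2=1).
Q6AGG_LABELS = {1: "Undesirable", 2: "Desirable"}

# A case is kept only if it has a valid code on BOTH variables.
valid = [
    (Q6AGG_RECODE.get(dv, dv), iv)
    for dv, iv in responses
    if dv not in Q6AGG_MISSING and iv not in Q28_MISSING
]
n_missing = len(responses) - len(valid)

cells = Counter(valid)                          # count per (row, column) cell
col_totals = Counter(iv for _, iv in valid)     # count per column

for dv in sorted(Q6AGG_LABELS):
    cols = []
    for iv in sorted(col_totals):
        count = cells[(dv, iv)]
        cols.append(f"{count} ({100 * count / col_totals[iv]:.1f}%)")
    print(f"{Q6AGG_LABELS[dv]:<12}", "  ".join(cols))
print("Number of missing observations:", n_missing)
```

Because a case is dropped when either answer is missing, the crosstab can exclude more cases than either Frequencies run does on its own.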
 Output:
 Q6AGG  The US exert strong leadership in world
 by Q28  United States using these attacks to enf

                       Q28                            Page 1 of 1
            Count    |
            Col Pct  | Excuse t   Seeking
                     | o enforc   to prote      Row
                     |     1    |     2    |   Total
 Q6AGG      ---------+----------+----------+
                1    |    757   |    731   |    1488
  Undesirable        |   51.0   |   21.8   |    30.8
                     +----------+----------+
                2    |    726   |   2621   |    3347
  Desirable          |   49.0   |   78.2   |    69.2
                     +----------+----------+
              Column      1483       3352       4835
               Total      30.7       69.3      100.0

                                                    Approximate
 Statistic         Value      ASE1     Val/ASE0     Significance
 -------------     ------     ----     --------     ------------
 Phi               .29210                           .00000 *1
 Cramer's V        .29210                           .00000 *1

 *1 Pearson chi-square probability

 Number of Missing Observations: 1166
 Crosstab Legend:
 The number at the top of each cell is the number of cases
(n),
and the number at the bottom of each cell is the column percentage.
(You may find that the row totals differ slightly from the figures
you would get from individual Frequency analyses. This is because some
of the people who responded to one variable did not
respond to the second and hence are eliminated by the missing values
statement. So you can expect that the
number of missing cases will be slightly higher in the crosstab than in
the individual frequency analyses.) Among all of the figures
in the output, the most important for your assessment will be the
column percentage for each cell.

Measures of Association Legend:
 For the present time, the only aspects of the measures of association
output that you have to note are the 'Statistic'
and the 'Value' columns.
 Ignore the 'ASE1' and the 'Val/ASE0' columns.
 While you can ignore the 'Approximate Significance' column for the time
being, it will become important after we learn its meaning later in the
course.
 Interpretation of Crosstab:
 Comparing the column percentages for the cells in the 'Desirable' row, we can see
that there is a marked difference. The
column percentages in the 'Desirable'
row are 49.0% and 78.2%. A difference can
also be observed in the 'Undesirable'
row.
 This indicates that the individuals who believed
that the US was using the 9/11 attacks as an excuse to exert its will
across the globe are less likely to believe that it is desirable for the US
to be a strong leader in the world.
 However, those individuals who believe that
the US's actions are genuine attempts to prevent further attacks
are more likely to believe that it is desirable for the US to be a strong
leader in the world.
 Since the crosstab is a 2 X 2 table, we know that Phi is the appropriate
measure of association. The value of Phi is .29, which, using the
guidelines above, means that this is a Moderately Strong association.
 The value of Phi may be negative if the variables are coded in a
particular way. The meaning of a negative measure of association will
be discussed below. For the time being, recognize that the sign only
reflects which diagonal of the table holds most of the cases.
 The Phi of .29
allows us to confirm the conclusion drawn in the crosstab
analysis, namely, that
individuals who believe that the US was using the 9/11 attacks as an excuse
to exert its will across the globe are less likely to believe that it is
desirable for the US to be a strong leader in the world compared to those
individuals who believe that the US was genuinely seeking to protect itself
from other attacks.
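
The Phi value in the output can be reproduced by hand. The sketch below assumes the common shortcut formula for a 2 X 2 table laid out as [[a, b], [c, d]], namely Phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). Python is used only for illustration, and the function name `phi_2x2` is hypothetical. Swapping the two rows, which is what a different recode of Q6AGG would do, flips the sign without changing the magnitude.

```python
import math


def phi_2x2(a, b, c, d):
    """Signed Phi for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Counts from the Example 1 crosstab (Undesirable row first).
phi = phi_2x2(757, 731, 726, 2621)

# Same data with the row order reversed (Desirable row first).
phi_rows_swapped = phi_2x2(726, 2621, 757, 731)

print(round(phi, 3), round(phi_rows_swapped, 3))
```

The first value agrees with the .29210 reported by SPSS to rounding. The second has the same size but the opposite sign, which is why the strength of the association is judged on the magnitude.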
Example #2:
Cramer's V
 Dataset: Macleans
 Dependent Variable:
 [Q36]
Overall, do you approve or disapprove of US military action against
terrorism? Is that strongly or somewhat?

Independent Variable:

[Q18] Overall,
do you feel that the attacks in New York and Washington was an attack on the
United States alone or an attack on western democratic societies, including
Canada?
 Arrow Diagram:
 9/11 → approval of US action
 Syntax:

get file="/homes/josephf/webstats/Macleans 2001.sav".
missing values Q36 (9).
missing values Q18 (9).
crosstab tables=Q36 by Q18
 /cells=column count
 /statistics=PHI.

Syntax Legend:

 Missing Values: Determined by the trial run of the Frequencies output.

 Crosstab command: Enter the Dependent Variable
first, then the Independent Variable.
 /statistics: This is
the section of syntax that calculates measures of association, in this case Phi
and Cramer's V.
 Output:
 Q36  36. US miltary action against terrorism
 by Q18  18. Attacks in New York and Washington

                          Q18                         Page 1 of 1
              Count     |
              Col Pct   | Attack o   Attack o
                        | n United   n wester      Row
                        |     1    |     2    |   Total
 Q36        ------------+----------+----------+
                  1     |    278   |    368   |     646
  Strongly approve      |   47.5   |   64.7   |    56.0
                        +----------+----------+
                  2     |    191   |    160   |     351
  Somewhat approve      |   32.6   |   28.1   |    30.4
                        +----------+----------+
                  3     |     63   |     24   |      87
  Somewhat disappr      |   10.8   |    4.2   |     7.5
                        +----------+----------+
                  4     |     53   |     17   |      70
  Strongly disappr      |    9.1   |    3.0   |     6.1
                        +----------+----------+
                 Column       585        569       1154
                  Total      50.7       49.3      100.0

                                                    Approximate
 Statistic         Value      ASE1     Val/ASE0     Significance
 -------------     ------     ----     --------     ------------
 Phi               .21035                           .00000 *1
 Cramer's V        .21035                           .00000 *1

 *1 Pearson chi-square probability

 Number of Missing Observations: 46
 Interpretation of Crosstab:
 Start by looking at the 'Strongly
approve' row, which represents the people who strongly approved of
US military action against terrorism. In that row, we can see that
the column percentage for those who thought 9/11 was an attack on the US was
47.5%, while it was 64.7% for those individuals who thought it was an attack
on western democracy. This indicates that individuals who thought the
target of the 9/11 attacks was western democracy are more likely to strongly
approve of US military action against terrorism. However, when we look at the 'Somewhat
disapprove' and the 'Strongly
disapprove' rows, we can see that the trend is reversed. In
these two rows, there is a higher column percentage among
those individuals who thought the attacks targeted the US than among those
individuals who thought the attacks targeted western democracy. This
indicates that individuals who thought the target was the US are more
likely to disapprove of US military action against terrorism. There
is not much of a difference when we examine the 'Somewhat
approve' row, but this is of little consequence because there is a
sufficient difference in the other three rows to draw a conclusion.
Taken together, the differences in the rows show that an
individual Canadian is more likely to
approve of US military action against terrorism if they believe
that the 9/11 attacks were against western democracy in general.
Moreover, an individual is more likely to disapprove of US
military action against terrorism if they believe that the US alone was
the target of the 9/11 attacks.
 Interpretation of Cramer's V:
 Since this crosstab involves a nominal variable and an ordinal variable,
the appropriate measure of association is Cramer's V. We do not use
Phi because it is only appropriate for 2 X 2 tables. The Cramer's V value is
.21. Using
the standards above, this relationship is
Moderate.
 We have confirmed that
an individual Canadian is more likely to disapprove of US
military action against terrorism if they believe that the US alone was
the target of the 9/11 attacks. Similarly, a Canadian is more likely to
approve of US military action against terrorism if they believe
that the 9/11 attacks were against western democracy in general.
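
As a check on the reported value, the sketch below recomputes Cramer's V from the Example 2 counts. It assumes the standard chi-square-based definitions: the chi-square-based Phi is sqrt(chi-square / N), and V divides that by sqrt(min(rows - 1, columns - 1)). Python is used only for illustration and the variable names are hypothetical. Because this table has only two columns, the divisor is 1, which would explain why the output prints the same value on the Phi and Cramer's V lines.

```python
import math

# Counts from the Example 2 crosstab (rows = Q36, columns = Q18).
counts = [
    [278, 368],   # Strongly approve
    [191, 160],   # Somewhat approve
    [63, 24],     # Somewhat disapprove
    [53, 17],     # Strongly disapprove
]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Pearson chi-square: compare each observed count with the count
# expected if the column percentages were identical across columns.
chi2 = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

phi_from_chi2 = math.sqrt(chi2 / n)
k = min(len(counts), len(counts[0])) - 1   # 1 for this 4 x 2 table
v = phi_from_chi2 / math.sqrt(k)

print(n, round(chi2, 2), round(v, 3))
```

The result agrees with the .21035 in the output to rounding.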
QUESTIONS FOR REFLECTION
 Did you discover a relevant
relationship in your crosstab based on the column percentages? If so, was it
evident in only one row of the table or in all rows?
 Can you compare the magnitude of a Phi value from one relationship to the
magnitude of a Cramer's V value for another relationship?
DISCUSSION
 When you find a cell that has a substantially different
column percentage
from the other cells in that row, there are usually other rows in the table that
also show a
difference. For example, if you find a difference in the column percentages
for cells ABC, then there is
probably also a difference between DEF, or GHI. This happens because the
column percentages in each column must add up to 100%, so the
column percentage in any given cell influences the column percentages of the other
cells in that column.


                                  INDEPENDENT VARIABLE
DEPENDENT VARIABLE     Category I     Category II     Category III
Category I                 A              B               C
Category II                D              E               F
Category III               G              H               I
 We can readily compare two values of the same measure. But be cautious
about comparing different measures of association to each other. E.g., you
can compare two values of Phi to one another, but be cautious about
comparing a Phi value to a Cramer's V value.
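
The first discussion point can be checked with a short calculation. The sketch below takes the Example 2 counts, computes the column percentages, and then the percentage-point gap between the two columns in each row. Python is used only for illustration and the variable names are hypothetical. Since each column adds up to 100%, the gaps across all rows must cancel out, so a large gap in one row has to be offset in the others.

```python
# Counts from the Example 2 crosstab (rows = Q36, columns = Q18).
counts = [
    [278, 368],   # Strongly approve
    [191, 160],   # Somewhat approve
    [63, 24],     # Somewhat disapprove
    [53, 17],     # Strongly disapprove
]

col_totals = [sum(col) for col in zip(*counts)]
col_pct = [
    [100.0 * cell / total for cell, total in zip(row, col_totals)]
    for row in counts
]

# Gap in percentage points: second column minus first column, row by row.
gaps = [row[1] - row[0] for row in col_pct]

for gap in gaps:
    print(f"{gap:+.1f}")
print("sum of gaps:", round(sum(gaps), 6))
```

The positive gap in the 'Strongly approve' row is balanced by negative gaps in the remaining rows, which is the pattern described in the interpretation of Example 2.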