Basic Statistical Analyses
These commands may be used in a procedure to set the type of
analysis to be performed.
These additional commands may be used in a procedure
when the Advanced Analyses module has been installed.
There are many different types of analyses that can be
performed with StatPac. Most commands are easy to use since much of the
required analysis information comes from the default parameter table.
With the exception of the OPTIONS command, all analysis commands are
mutually exclusive in any given procedure. In other words, a single procedure
cannot perform more than one kind of analysis. A procedure file, however, may
contain many procedures, each performing a different kind of analysis.
The OPTIONS command may be used in any procedure to
override the default values in the parameter table. It is used to control
printing and analysis parameters.
The analysis commands available in StatPac's basic
package are: LIST, FREQUENCIES, DESCRIPTIVE, BREAKDOWN, CROSSTABS, BANNERS,
TTEST, and CORRELATE.
The analysis commands available in the StatPac Advanced
Statistics module are: REGRESS, DISCRIMINANT, PCA, FACTOR, ANOVA, CANONICAL,
CLUSTER, and MAP.
Important User Tip
Any of the analysis commands may be abbreviated by using
only the first two characters of the keyword. For example, FREQUENCIES could
be abbreviated to FR, FACTOR could be abbreviated as FA, and OPTIONS could be
abbreviated as OP.
The LIST command is used to list selected variables in
the data file. The command syntax is:
LIST <Variable list>
If the LIST command does not specify a variable or
variable list, all variables will be listed. When used in this fashion, value
labels will be listed instead of the raw data.
For example, let's say you want to print a report
consisting of only two columns. The first column is variable 7 (AGE) and the
second column is variable 14 (SEX). Either variable numbers or names may be
used to specify the variable list. The command line would be entered as
either:
LIST V7 V14
LI AGE SEX (LIST may be abbreviated as LI)
The keyword RECORD may be used as part of the variable
list to print the record number as one of the columns. For example, the
following parameter line will produce a report consisting of four columns,
the first column being the sequence of the case in the data file:
LIST RECORD V12 V31 V83
You may include as many variables in the
report as can be accommodated by the pitch and orientation of the output.
If too many variables are specified, the output will be truncated. Missing
data will be displayed as a series of dashes.
The LIST command is often used as a way to troubleshoot
a procedure that is not working. For example, if the following procedure
didn't work properly, we might try the LIST command to figure out what went
wrong:
STUDY EXAMPLE
COMPUTE (N4.1) AVG = V1 * V2 / 2
DESCRIPTIVE AVG
..
We could replace the DESCRIPTIVE command with the LIST
command and list the relevant variables. Note that we also added the SELECT
command to limit the printout to the first
twenty-five records (we don't need to list the whole file to find out
why it is not working).
STUDY EXAMPLE
COMPUTE (N4.1) AVG = V1 * V2 / 2
SELECT 1-25
LIST RECORD V1 V2 AVG
..
Example of a List Printout
To list an open-ended variable, simply specify it in the
LIST command. The following would list a variable called
"Comment". The COMPUTE line is used to calculate a record number so
it can be included in the printout. The IF-THEN-SELECT line is used to select
only those who made a comment. The OPTIONS line is used to insert a blank line
between each response.
COMPUTE (N5) REC=RECORD
IF Comment <> " " THEN SELECT
LIST Rec Comment
OPTIONS BL=Y
..
The output might look like this:
Example of a Verbatim Listing
Multiple Response & Combining Variables
There are two options (MR and CB) to control the way
that data gets displayed with the LIST command.
The MR option is used to specify variables you want to
be stacked on top of each other in a single column. The specified variables
will be listed in a single column rather than using multiple columns on the
listing. The variables do not have to be true multiple response variables in
the codebook; you may use the MR option for any variables.
The CB option is used to specify variables that you want
to combine into a single field instead of being treated as individual fields.
For example, if City, State, and Zip were separate variables, you could
display them together using the CB option.
Normally, all variables in a listing would appear
side-by-side. The MR and CB options are used to create an easier-to-read
format.
For example, suppose variables six, seven, and eight are
being used to hold respondents' verbatim answers to a question on a
restaurant survey: "What three things could we do to improve your
dining experience?" Three A70 variables were used. The
following command would produce a listing of the data in a vertical format.
Up to three lines in the report would be displayed for each respondent. The
SELECT command is used to eliminate subjects who did not answer the question.
IF V6 <> " " THEN SELECT
LIST V6-V8
OPTIONS MR=(V6-V8)
..
An example of a single record in the printout might look
like this:
Faster service.
Reduce prices.
Greater selection.
If the CB option were used instead of the MR option, all
three responses would be combined into a single field (giving the appearance
that the three responses were part of the same sentence or paragraph).
IF V6 <> " " THEN SELECT
LIST V6-V8
OPTIONS CB=(V6-V8)
..
The CB option formats the printout so each record in the
listing will use the number of lines that it needs to show the data. The
listing might appear as follows:
Faster service. Reduce prices. Greater selection.
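The difference between the two layouts can be sketched outside of StatPac. The following Python fragment is only an illustration of the idea, not StatPac code: the function names are hypothetical, and it assumes that blank responses are skipped, which matches the "up to three lines" behavior described above.

```python
# Illustration only: mimics the MR (stack) and CB (combine) layouts.
# stack_fields and combine_fields are hypothetical names, not StatPac features.

def stack_fields(record, fields):
    """MR-style: one non-blank value per line, all in a single column."""
    return [record[f].strip() for f in fields if record[f].strip()]

def combine_fields(record, fields):
    """CB-style: join the non-blank values into one field."""
    return " ".join(record[f].strip() for f in fields if record[f].strip())

record = {
    "V6": "Faster service.",
    "V7": "Reduce prices.",
    "V8": "Greater selection.",
}
fields = ["V6", "V7", "V8"]

stacked = stack_fields(record, fields)      # three separate lines
combined = combine_fields(record, fields)   # one running field

print("\n".join(stacked))
print(combined)
```

The stacked form keeps each response visually distinct, while the combined form reads as one running field.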
The MR and CB options may be used in conjunction with
each other to produce desired outputs. For the next example, assume the
following variables:
V1 Name
V2 Street_Address
V3 City
V4 State
V5 Zip
V6 Phone_Number
V7 Fax_Number
V8 Email_Address
We might want to stack Name, Address, City, State, and
Zip into a single column on the printout. We might also want to stack the
Phone and Fax numbers into a single column. In the following procedure, the
CB option is used to combine City, State, and Zip into a single field, and
the MR option is used to specify which variables should be displayed in a
vertical column.
LIST V1-V8
OPTIONS CB=(V3-V5) MR=(V1-V5)(V6-V7)
..
The output might look like this:
Example of a Listing Using the MR and CB Options
Labeling and Spacing Options
Option (Code): Function
Labeling (LB): Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C). Also, LB=0 suppresses labeling, and LB=X suppresses all labeling and page feeds.
Column Width (CW): Sets the minimum width of the columns (in inches).
Column Spacing (CS): Sets the spacing (in inches) between the columns of the listing.
Maximum Width (MW): Sets the maximum width (in inches) that will be used for long alpha variables and multiple response variables.
Blank Line Between Rows (BL): When BL=Y, a blank line will be printed between each row of the listing. When BL=N, no blank line will be printed.
Maximum Pages (MP): Sets the maximum number of pages that will be printed. Its purpose is to prevent an unintentional listing of hundreds or even thousands of pages. If MP=0, the listing will be as long as necessary to print all the output. If MP is set to any other number, that will be the maximum number of pages printed.
A frequency analysis is the simplest of all statistical
procedures. It is ideal for data which has been coded into groups or
categories. The coding can be either alpha- or numeric-type data.
The syntax of the command to run a frequency analysis
is:
FREQUENCIES <Variable list>
For example, to find the percent of males and females in
a sample, you would request a single analysis:
FR SEX (FREQUENCIES may be abbreviated as FR)
Several frequency analyses can be requested with a single
command. For example, to get a frequency analysis of SEX (V4), RACE (V5) and
INCOME (V6), the request could be specified in several ways:
FREQUENCIES SEX, RACE, INCOME
FREQUENCIES SEX RACE INCOME
FREQUENCIES V4 V5 V6
FREQUENCIES V4-V6
Notice that either the variable name or the variable
number may be specified as part of the variable list.
A frequency analysis may be run on alpha- or numeric-type
variables. Missing data will be included in the frequencies only if there is
a value label for missing data (e.g., <BLANK>=No response).
Table Format
Three types of printout formats are built into the
program: expanded, condensed and automatic. The option to control the table
format is:
OPTIONS TF=N
(No table will be printed)
OPTIONS TF=A
(Formatting will be automatic)
OPTIONS TF=E
(Formatting will be expanded)
OPTIONS TF=C
(Formatting will be condensed)
Condensed formatting is especially useful when there are
many unlabeled values. For example, if one of the variables is ID NUMBER,
there are generally no value labels associated with this variable. It is often a
good idea to check the data to be sure that no records were inadvertently
entered twice (i.e. duplicate ID numbers). A condensed frequencies printout
would allow you to quickly determine if any ID NUMBER is specified more than
once. An example of condensed formatting might look like this:
Example of a Compressed Frequencies Printout
Automatic formatting is generally recommended since it
minimizes the amount of paper that will be used. If automatic formatting is
used and there are more than 50 unlabeled categories (no value labels), the
printout will automatically be converted to condensed format. In most other
cases, automatic formatting will produce the
expanded format. An example of expanded formatting might look like this:
Example of an Expanded Frequencies Printout
Print Zero Values
Sometimes there may be a category listed in the value
labels that has no accompanying data. For example, nobody in the sample may
be over 40 years old or make over $30,000 a year. Whether or not you want the
label to appear with a count of zero is a matter of preference. If you want
the reader of your report to know that a category was available, you'd
probably want to print zero values (ZV=Y). If you are interested in saving
space, you might want to exclude zero values (ZV=N).
Sort Type & Sort Order
Frequency analyses are often more meaningful when the
output is displayed in sorted order. When working with nominal-type data and
few categories, the order in which categories are presented is not very important
(e.g., it really doesn't make much difference whether males or females are
listed first). However, as the number of categories increases, it may
be desirable to list those with the highest count first, followed by those
with lower counts. This would be a sort by frequency of response in
descending order. It would be requested with the following options:
OPTIONS ST=F SO=D
(Sort Type is by frequency of response)
(Sort Order is descending)
When data is ordinal, it is more appropriate to present
the output in order defined by the categories themselves. Usually this is the
same as the alpha or numeric code used to represent a category. For example,
take the following two survey questions:
How old are you?      What is your annual income?
A=Under 21            1=Under $10,000
B=21-30               2=$10,000-$20,000
C=31-40               3=$21,000-$30,000
D=Over 40             4=Over $30,000
Both questions are ordinal; the first one is coded alpha
and the second is numeric. It would be desirable to have the frequencies
printout appear in ascending order by the code (the same way they are listed
above). The options statement to do this is:
OPTIONS ST=C SO=A
(Sort Type is by category code)
(Sort Order is ascending)
Notice that this type of sort is generally the way the
information would be specified in the value labels. If this is the case,
sorting by category code will have no effect. Sorting by category codes is
useful if you did not enter value labels for the variable.
If no sort type is specified (ST=N), the output will be
displayed in the same order as specified by the value labels. If the value
labels do not contain all the values in the data file (such as mispunched
data), the unlabeled values will appear on the printout in the order that
they are encountered in the data file.
Additionally, a digit may be added as a suffix to the
SO=A or SO=D. It is used to sort the value labels excluding the last one or
more value labels. This is useful when the last value label is an
"other" category, and you want to sort the value labels, but still
leave the "other" category as the last row in the report. For example, ST=F
SO=D1 would sort the value labels in descending order by frequency, except it
would leave the last value label as the last row regardless of its frequency.
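The effect of the ST and SO options, including the digit suffix, can be sketched in Python. This is an illustration of the described behavior under stated assumptions, not StatPac's own code: the function name, its arguments, and the sample counts are hypothetical.

```python
# Illustration only: sort frequency rows the way ST/SO are described,
# optionally pinning the last `keep_last` value labels in place (the SO=D1 idea).

def sort_categories(rows, sort_type="F", descending=True, keep_last=0):
    """rows: list of (code, label, count) tuples in value-label order."""
    split = len(rows) - keep_last
    head, tail = rows[:split], rows[split:]
    if sort_type == "F":        # sort by frequency of response
        head = sorted(head, key=lambda r: r[2], reverse=descending)
    elif sort_type == "C":      # sort by category code
        head = sorted(head, key=lambda r: r[0], reverse=descending)
    # sort_type "N": leave the value-label order untouched
    return head + tail

rows = [
    (1, "Hotdogs", 12),
    (2, "Hamburgers", 30),
    (3, "Fish", 7),
    (4, "Other", 25),
]

plain = [r[1] for r in sort_categories(rows, "F", True)]
pinned = [r[1] for r in sort_categories(rows, "F", True, keep_last=1)]
print(plain)
print(pinned)
```

With the suffix, "Other" stays in the last row even though its count would otherwise place it second.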
Truncate Labels
Very long value labels may sometimes exceed the space
allocated for them in the printout. In those situations, you may set the
program to either truncate the value labels (TL=Y excludes the ending portion
of the label), or to use multiple lines to print the entire value label (TL=N).
Cumulative Percents
When the frequency table is printed in expanded format,
you may print or exclude cumulative percents with the CP option. This would
be specified as:
OPTIONS CP=Y
(Turn on cumulative percents)
OPTIONS CP=N
(Turn off cumulative percents)
Confidence Intervals
Confidence intervals for proportions can be requested
with the CI option. For example, to request the 95% confidence intervals, you
would use the option CI=95, and the 99% confidence intervals could be
requested by CI=99. Confidence intervals allow us to estimate the proportions
in the population for each of the response
categories. If repeated samples are taken from the population, we would
expect the category proportions to fall within the confidence intervals. When
confidence intervals are requested, cumulative percents will not be printed
regardless of the setting of the CP option.
Confidence intervals are calculated by first computing
the estimated standard error of the proportion, and then using the t
distribution to find the actual interval. Note that the finite population
correction factor (1 - n/N) is used to adjust the standard error if the sample
represents a large proportion (say greater than ten percent) of the
population. In that situation, use the FP option to specify the
population size (i.e., FP=x, where x is the size of the population). If the
FP option is not specified, no correction will be applied.
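The calculation described above can be approximated with a short Python sketch. It is not StatPac's exact formula, which is not given here. The sketch assumes the standard error is sqrt(p(1-p)/n), that the correction multiplies the variance by (1 - n/N), and it substitutes the normal quantile for the t value (a reasonable stand-in only when the sample is large). The function name and arguments are hypothetical.

```python
# Illustration only: confidence interval for a proportion with an optional
# finite population correction. Assumptions are noted in the comments.
from math import sqrt
from statistics import NormalDist

def proportion_ci(count, n, level=95, population=None):
    p = count / n
    se = sqrt(p * (1 - p) / n)                 # assumed form of the standard error
    if population is not None:
        se *= sqrt(1 - n / population)         # assumed use of the (1 - n/N) factor
    # Normal quantile used in place of the t distribution (large-sample stand-in).
    z = NormalDist().inv_cdf(0.5 + level / 200)
    margin = z * se
    return max(0.0, p - margin), min(1.0, p + margin)

low, high = proportion_ci(30, 100, level=95)
low_fp, high_fp = proportion_ci(30, 100, level=95, population=500)
print(round(low, 3), round(high, 3))
print(round(low_fp, 3), round(high_fp, 3))
```

When the sample is a large share of the population, the corrected interval is narrower than the uncorrected one, which is the purpose of the FP option.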
Example of Confidence Intervals Around a Percent
Critical T Probability
After performing a frequency analysis, researchers are
often interested in determining if there is a significant difference between
the various categories. The Chi-square statistic is often used to determine
if the observed frequencies markedly differ from the expected frequencies.
The problem with the Chi-square statistic is that it does not isolate the
significant differences (i.e., it only tells whether or not one exists). StatPac
uses a t-test to compare all possible pairs of categories to determine where
the actual differences lie.
The CT option may be set between 0 and 1. When CT=0, no
t-tests will be performed or printed. If CT=1, the t-statistic and
probability will be printed for all possible pairs of categories. A typical
setting for the critical t probability is 5% (CT=.05). In this case, StatPac
will print the t-statistic and two-tailed probability for all pairs of
categories that have a probability of p=.05 or less. StatPac uses the
following formula to calculate the t-statistic:
Example of a Critical T Probability Analysis
Percentage Base
The percentage base on a frequency analysis can either
be the number of respondents (N) or the total number of responses. If PB=N,
the denominator for calculating percentages will be the number of
respondents. If PB=R, the denominator will be the total number of responses
for all individuals.
Multiple Response
Surveys often include questions in which the respondent
is asked to make more than one response to a single question. An example of
the kind of question that is appropriate for multiple variable response is:
1. Which of the following services did you use?
(Check all that apply)
__ Counseling
__ Job placement
__ Remedial reading
__ Remedial math
__ Resume writing
The multiple response frequency analysis is used to summarize
these kinds of items. When designing a study that includes this type of
question, each choice is considered as a separate variable. The value labels need to be
specified only for the first variable, but it is fine if they are specified
for all the multiple response variables.
V1 "Services_1" Services Used
1=Counseling
2=Job placement
3=Remedial reading
4=Remedial math
5=Resume writing
V2 "Services_2" Services Used
V3 "Services_3" Services Used
V4 "Services_4" Services Used
V5 "Services_5" Services Used
The syntax for the multiple response frequency analysis
is:
FREQUENCIES <Variable list>
OPTIONS MR=Y
In this example, all the variables in the variable list
will be treated as one multiple response variable. Another way to use the MR
option is to respecify the variable numbers (not the variable names) that
should be grouped.
FREQUENCIES <Variable list>
OPTIONS MR=(<Variable list>)
Note that the parentheses are required around the variable
list in the options line. In the above example, the commands would be:
FREQUENCIES V1-V5
OPTIONS MR=(V1-V5)
The output will contain the counts and percents for each
of the response values. That is, how many times code 1 (counseling) was
chosen for any variable, how many times code 2 (job placement) was chosen for
any variable, etc. In other words, it will print the total number of times
that each response was recorded for variables 1, 2, 3, 4 and 5 combined.
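The counting described above, together with the two percentage bases (PB=N and PB=R), can be sketched in Python. This is only an illustration with made-up data. The function name is hypothetical, and treating a respondent as counted when at least one of the variables holds a response is an assumption.

```python
# Illustration only: pool the codes from several variables into one
# multiple response count, then compute percents on either base.
from collections import Counter

def multiple_response(records, variables, percentage_base="N"):
    counts = Counter()
    respondents = 0
    for rec in records:
        answers = [rec[v] for v in variables if rec.get(v) is not None]
        if answers:
            respondents += 1        # assumed: at least one response counts the person
        counts.update(answers)
    if percentage_base == "N":
        denominator = respondents               # number of respondents
    else:
        denominator = sum(counts.values())      # total number of responses
    return {code: (n, 100 * n / denominator) for code, n in sorted(counts.items())}

variables = ["V1", "V2", "V3", "V4", "V5"]
records = [
    {"V1": 1, "V2": 2},     # Counseling, Job placement
    {"V1": 1, "V2": 5},     # Counseling, Resume writing
    {"V1": 2},              # Job placement
]

by_respondents = multiple_response(records, variables, "N")
by_responses = multiple_response(records, variables, "R")
print(by_respondents)
print(by_responses)
```

With respondents as the base, the percents can sum to more than 100 because each person may give several responses. With responses as the base, they sum to 100.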
The options line may be used to specify several multiple
response analyses by using additional sets of parentheses in the MR option.
The following commands would perform three different tasks (each one being a
multiple response analysis on a new set of variables).
FREQUENCIES V1-V20
OPTIONS MR=(V1-V10)(V11-V15)(V16-V20)
Multiple response may also be used when the
questionnaire limits the choices to fewer than the number of possible
responses. For example, the following question asks for two responses from
the same value labels list:
17 & 18. Write the numbers of your two favorite foods
from the list below.
_____
_____
1 = Hotdogs
2 = Hamburgers
3 = Fish
4 = Roast Beef
5 = Chicken
6 = Salad
Notice that there are two variables (17 & 18) that
hold the information for this question. Both variables use the same value
labels and the responses to both variables are weighted equally (i.e. the
first one is not more important than the second). Multiple response assumes
that all variables to be analyzed have the same value labels. In this
example, the command would be:
FREQUENCIES V17 V18
OPTIONS MR=(V17 V18)
Example of a Multiple Response Frequency Analysis
Category Creation
The actual categories in the frequency analysis can be
created either from the study design value labels (CC=L) or from the data
itself (CC=D). When the categories are created from the labels, the value
labels themselves will be used to create the categories for the analysis, and
data that does not match up with a value label code will be counted as
missing. That is, mispunched data will be counted as missing. When categories
are created from the data, all data will be considered valid whether or not
there is a value label for it.
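The two category creation rules can be contrasted with a small Python sketch. It is an illustration of the rule as described, not StatPac code. The function name and the sample data are hypothetical, and None stands in for a blank value.

```python
# Illustration only: CC=L builds categories from the value labels,
# CC=D builds them from whatever values appear in the data.
from collections import Counter

def frequencies(data, value_labels, category_creation="L"):
    counts = Counter()
    missing = 0
    for value in data:
        if value is None:
            missing += 1                    # blank data is always missing
        elif category_creation == "L" and value not in value_labels:
            missing += 1                    # mispunched data counted as missing
        else:
            counts[value] += 1
    return dict(counts), missing

labels = {"M": "Male", "F": "Female"}
data = ["M", "F", "F", "X", None, "M"]      # "X" is a mispunched code

from_labels = frequencies(data, labels, "L")
from_data = frequencies(data, labels, "D")
print(from_labels)
print(from_data)
```

Running the same variable both ways is a quick check for mispunched codes: any category that appears only under CC=D has no value label.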
One Analysis
The one-analysis
option allows you to print frequency analyses for several variables on one
page. This option is especially useful for management reporting when the
information needs to be condensed and concise.
All the variables specified with the OA option must have
the same value labels. An example might be a series of Yes/No questions or
Likert scale items. The important point is that each variable has exactly the
same value labels as the other variables. For example, suppose that variables
21-30 are ten items asking the respondents to rate the item as low, medium or
high. The following commands would produce a one page summary of all ten
items:
FREQUENCIES V21-V30
OPTIONS OA=Y
The one analysis option is limited by the number of
characters that can be printed on a line (i.e., by the pitch and carriage
width of the printer). If there are too many different value labels, they
will not be able to fit on one line and the analysis will be skipped. If this
should happen, try rerunning the analysis using a compressed pitch. As a
general rule, each value label will require ten spaces on the output.
Example of a One-Analysis Printout
The OA option is used in frequency analyses to summarize
the frequencies of several variables that all contain the same value labels.
Note the difference between the OA and MR options. With the multiple response
option (MR), the items are treated as if they are a single variable. The one
analysis option (OA), however, treats each item as a separate analysis. The
results, however, will be summarized on one page.
When the MR option is used in conjunction with the OA
option, the variables in the MR options list will be treated as multiple
response variables. This makes it easy to create nets in a frequency analysis with
the OA=Y option.
For example, if V1-V20 are the twenty variables, we
could add a net by first creating a duplicate copy of V1 with a new name, and
then including the MR option to combine the variables to make the net. The
net will be the sum of the counts of the individual variables that make up
the MR variable list.
STUDY Yourstudy
NEW (N1) "GrandTotal"
COMPUTE GrandTotal = V1
LABELS GrandTotal (1=Agree)(2=Neutral)(3=Disagree)
FREQ GrandTotal V2-V20 V1-V20
OPTIONS OA=Y MR=(GrandTotal V2-V20)
..
The results might look like this:
              Agree   Neutral   Disagree
GrandTotal
Variable 1
Variable 2
Variable 3
etc.
The following example shows how you can use the
MR option in conjunction with the OA option to create complex nets. It also
shows how the reserved word "RECORD" can be used to create blank
lines in the report.
Suppose we are conducting a survey of government
policies. We have nine "Agree/Disagree" items coded as 1=Agree and
2=Disagree. The first three items deal with "Social Policy"; the
next three items with "Foreign Policy"; and the last three items
with "Fiscal Policy". We would like to produce a report that looks
something like this:
Peoples Attitudes Towards Government Policies
(N=x)
                    Agree   Disagree
OVERALL

SOCIAL POLICY
   Item 1
   Item 2
   Item 3

FOREIGN POLICY
   Item 4
   Item 5
   Item 6

FISCAL POLICY
   Item 7
   Item 8
   Item 9
There are four different nets in this report. The
OVERALL net includes all variables. The SOCIAL POLICY net includes the first
three items, the FOREIGN POLICY net the next three items, and the FISCAL
POLICY net the last three items. For this example, Items 1-9 are stored in
variables 1 to 9.
The spacing (indentation) in this example is used only
to make the procedure easier to understand. It is not necessary to use
this type of spacing in your procedures.
STUDY Yourstudy
HEADING Peoples Attitudes Towards Government Policies
COMPUTE (N1) OVERALL=V1
COMPUTE (N1) SOCIAL POLICY=V1
COMPUTE (N1) FOREIGN POLICY=V4
COMPUTE (N1) FISCAL POLICY=V7
LABELS OVERALL
       SOCIAL POLICY
       FOREIGN POLICY
       FISCAL POLICY
       (1=Agree)(2=Disagree)
FREQ OVERALL V2-V9             Produces overall net
     RECORD                    Produces a blank line
     SOCIAL POLICY V2-V3       Produces social policy net
     V1-V3                     Produces 3 social policy variables
     RECORD                    Produces a blank line
     FOREIGN POLICY V5-V6      Produces foreign policy net
     V4-V6                     Produces 3 foreign policy variables
     RECORD                    Produces a blank line
     FISCAL POLICY V8-V9       Produces fiscal policy net
     V7-V9                     Produces 3 fiscal policy variables
OPTIONS SV=N OA=Y
        MR=(OVERALL V2-V9)
           (SOCIAL POLICY V2-V3)
           (FOREIGN POLICY V5-V6)
           (FISCAL POLICY V8-V9)
..
Special Value Label HIDE
When performing a frequency analysis with the OA option, it is
often desirable to display only some of the response categories. Recoding
undesirable categories to missing is one way to exclude them from the table.
This will eliminate the column from the table and from any calculations of
percentages on the table.
For example, assume the following counts for V1:
1          2          3
Agree      Neutral    Disagree   No Response   Total N
30         20         40         10            100
If PB=N, (denominator equals number of respondents), the
percents will be:
Agree      Neutral    Disagree
30%        20%        40%
If PB=R, (denominator equals number of responses), the
percents will be:
Agree        Neutral      Disagree
30/90=33%    20/90=22%    40/90=44%
We could use the following RECODE command to eliminate
the "Neutral" category from the table:
RECODE V1 (2= )
If PB=N, the percents will still be based on a
denominator of 100. If however, PB=R, then the percents will be based on a
denominator of 70 (30+40):
Agree        Disagree
30/70=43%    40/70=57%
The special value label "HIDE" may be used to
suppress printing of a value label without reducing the denominator for the
percents calculations. The following LABELS command could be used to
eliminate the "Neutral" category from the table, while still
including the "Neutral" count in the denominator:
LABELS V1 (1=Agree)(2=Hide)(3=Disagree)
Any row or column that has a value label of
"HIDE" will not be printed, but it will be included in the percent
calculations when PB=R. Note that the percentages are based on the counts for
all value labels (including the "Neutral" category), even though
all the value labels are not displayed in the table.
Agree        Disagree
30/90=33%    40/90=44%
If you only wanted the "Agree's" to show in
the table, you could use the following statements in the procedure. The
percentages in the table would still be based on 90:
LABELS V1 (1=Agree)(2=Hide)(3=Hide)
OPTIONS PB=R
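The arithmetic in this section can be reproduced with a short Python sketch. It is an illustration of the percentages shown above, not StatPac code. The function name and arguments are hypothetical, and rounding to whole percents is chosen to match the figures in the text.

```python
# Illustration only: recoding a category to missing shrinks the PB=R
# denominator, while the HIDE label removes the column but keeps its count.

def table_percents(counts, labels, percentage_base="R", respondents=None):
    # A code recoded to missing is simply absent from `counts`.
    valid = {code: n for code, n in counts.items() if code in labels}
    if percentage_base == "N":
        denominator = respondents           # number of respondents
    else:
        denominator = sum(valid.values())   # number of responses
    return {
        labels[code]: round(100 * n / denominator)
        for code, n in valid.items()
        if labels[code].upper() != "HIDE"   # hidden columns are not printed
    }

counts = {1: 30, 2: 20, 3: 40}
full = {1: "Agree", 2: "Neutral", 3: "Disagree"}
hidden = {1: "Agree", 2: "Hide", 3: "Disagree"}

all_shown = table_percents(counts, full)                # denominator 90
recoded = table_percents({1: 30, 3: 40}, full)          # denominator 70
hide_neutral = table_percents(counts, hidden)           # denominator still 90
by_respondents = table_percents(counts, full, "N", respondents=100)
print(all_shown, recoded, hide_neutral, by_respondents)
```

The "Agree" percent is 43 after recoding but stays at 33 with the HIDE label, because only the recode changes the denominator.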
Print Format
The results from the one analysis option may be printed
as row percents (PF=R), as counts (PF=N), or both (PF=NR). When row percents
are requested, the denominator used to calculate the percents will be the
number of non-missing responses for that particular item. That is, when there
is missing data, the number of valid
responses to a particular question may differ from the number of valid
responses for any of the other questions.
Print Total
The PT option may be used in conjunction with the OA
(one-analysis) option to print the total N for each variable. When there is
considerable missing data, this option is highly recommended since each of
the variables may be using a different N (number of valid responses). For
example, the following commands would produce a one-page report summarizing
variables 21 to 30. An additional column will be included on the output that
lists the number of valid cases for each of the variables.
FREQUENCIES V21-V30
OPTIONS OA=Y PT=Y
Sort Variables
When performing a frequency analysis with the OA=Y
option, you can sort the variables by the contents of the first column of the
results. The SV (sort variables) option may be set to "N" for no
sort, "A" to sort in ascending order, or "D" to sort in
descending order. When no sort is specified, the variables will be listed in
the order that they appear in the analysis command variable list. The SV
option is applicable only when the OA=Y option is specified. However, if the
MR option is also specified, the SV option should be set to N.
Additionally, a digit may be added as a suffix to the
SV=A or SV=D. It is used to sort the variables excluding the last one or more
variables when the OA=Y option is specified. This is useful when the last
variable is an "other" variable, and you want to sort the
variables, but still leave the "other" variable as the last variable. For
example SV=D1 would sort the variables in descending order, except it would
leave the last variable as the last row regardless of its value.
Supplemental Heading
The supplemental heading will only be printed when the
OA=Y option is specified. It is a line of text that will appear before the
first row of the table. The supplemental heading may contain any text and
should be enclosed in quotes. When the pound symbol (#) is used in the
supplemental heading, it will be printed as the number of cases. The SH
option is usually used to indicate who is included in the table. The
following is an example of a supplemental heading:
OPTIONS SH="TOTAL RESPONDENTS = #"
Minimum Denominator
Percentages can be misleading if they are based on a
small denominator. The MD option may be used to suppress the printing of
percentages that are based on a small denominator. The MD option sets the
minimum denominator that StatPac will use for calculating percents. For
example, if MD=5, StatPac will calculate percentages if the denominator is
greater than or equal to 5. If a denominator were less than 5, StatPac would
print dashes instead of the percent. Valid values for MD are between 0 and
100. If MD=0, all percentages will be printed.
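The rule can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the number of dashes, the one-decimal format, and the handling of a zero denominator are assumptions rather than documented StatPac behavior.

```python
# Illustration only: suppress a percent when its denominator is below MD.

def format_percent(count, denominator, minimum_denominator=0):
    # A zero denominator is also shown as dashes (an assumption made here
    # to avoid dividing by zero).
    if denominator == 0 or denominator < minimum_denominator:
        return "-----"
    return f"{100 * count / denominator:.1f}%"

print(format_percent(2, 4, minimum_denominator=5))   # below the minimum
print(format_percent(2, 5, minimum_denominator=5))   # meets the minimum
print(format_percent(1, 3))                          # MD=0 prints everything
```

A denominator equal to the minimum still prints, matching the "greater than or equal to" wording above.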
Print Mean
The mean is generally not calculated for a
frequency because it involves the assumption of interval data. However, there
are some situations where you may want to display the mean as part of a frequency
analysis. The ME option may be used to request the mean (and standard
deviation). When ME=N, no mean will be printed. If ME=Y, the mean will be
printed, and if ME=S, both the mean and standard deviation will be printed.
When used in conjunction with the OA=Y option, a separate mean will be
printed for each variable.
Mean Position
When the ME option is used with the OA=Y option, the
means (and standard deviations) can be printed as the first or last column.
If MP=F, the means will be printed as the first column, and when MP=L, they
will be printed in the last column. When means are printed in the first
column (MP=F), and the SV option is used to sort the variables, they will be
sorted by the means instead of the percents.
Labeling and Spacing Options
Option (Code): Function
Labeling (LB): Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width (LW): Sets the maximum width (in inches) for the variable labels on the stub when OA=Y.
Truncate Labels (TL): Sets long value labels to be truncated when TL=Y.
Exact Width (EW): When OA=Y and EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Width (CW): Sets the minimum width of the columns (in inches) when OA=Y.
Column Spacing (CS): Sets the spacing (in inches) between the columns of the listing when OA=Y.
Extra Spacing (ES): When ES=Y, a blank line will be printed below the table headings. When ES=N, no blank line will be printed.
Blank Line Between Rows (BL): Sets the number of blank lines between rows when OA=Y.
Print Percent Symbol (PP): Sets whether percentage symbols will be shown.
Decimal Places (DP): Sets the number of decimal digits that will be shown.
Print Codes (PC): Sets whether the code (to the left of the equals symbol) will be shown with the value labels.

Open-Ended Response Coding
It is often useful to code open-ended responses into response
categories in order to perform a frequency analysis or crosstabs. The
FREQUENCIES command with the OE option
allows you to examine and code open-ended alpha variables.
Open-ended response coding is requested by performing a
frequency analysis on the variables containing the verbatim text and setting
the option OE=Y. Do not use an IF-THEN SELECT command in the
same procedure.
FREQ Comment
OPTIONS OE=Y
..
When a verbatim response is held in more than one
variable, it is not necessary to specify the MR (multiple response) option.
All variables listed in the FREQUENCIES command will be considered to be part
of the same verbatim comment. The following procedure begins an
open-ended response coding session on
variables one through three. Note that the text for all three variables will
be displayed at the same time during the coding session.
FREQ V1-V3
OPTIONS OE=Y
..
Each time you complete a coding session, StatPac will
create a new study and data file called STATPACVERBATIM. The STATPACVERBATIM
study contains the coded verbatim data, and the frequency analysis will be
performed on this file (i.e., the coded data). Your original study and data
file are not affected by the coding process. All of the coded information is
stored in the STATPACVERBATIM file. In order to use this coded information
in future analyses, the STATPACVERBATIM files must be merged with your study
and data files using the MERGE command.
Occasionally, you may start a coding session, and for
one reason or another, not be able to finish. You can quit the current coding
session and continue at a future time. To continue with a previously
unfinished coding session (i.e., one that you started coding before, but did
not finish), simply run the procedure again. StatPac will detect the
existence of a partially completed STATPACVERBATIM file, and ask if you want
to continue with the previous verbatim coding, or delete the existing
verbatim coding and begin a new coding session.
Click on the Continue Previous Session button to
continue with the previous coding, or the Start New Coding Session button to
delete the existing STATPACVERBATIM files. If your intent is to continue
with a previous session, you can bypass this question by changing the OE
option from OE=Y to OE=C (continuation).
Verbatim Blaster
StatPac for Windows has the Verbatim Blaster module
built in and it will automatically precode all open-ended responses.
Verbatim Blaster processes open-ended responses in two
steps. The first step is called precoding, and the second step is called final coding. You may
modify or interact with the coding process at either or both steps.
The precoding step will be performed first. Verbatim
Blaster will read the file and count all the unique words that occur in the
text. It attempts to combine variations of the same word into a single root
word. For example, it would attempt to combine singulars and plurals,
different tenses, prefixes and suffixes into a single root word. The
precoding is not perfect, but it will catch nearly all variations on each
root word.
The result of the precoding will be to present you with
a list of root words along with the number and percent of respondents who
used each word. The list will be initially sorted in descending order
by frequency of occurrence in the text. Thus, the words at the top of the
list appeared most often in the text. If a respondent used the same word more
than once, it will only be counted as one occurrence in calculating the
frequencies.
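The counting rule described above (each word counted at most once per respondent, list sorted by descending frequency) can be sketched in Python. This is only an illustration, not Verbatim Blaster's actual algorithm: it does not combine word variations into root words, and the function name, sample comments, and exclusion words are all hypothetical.

```python
from collections import Counter
import re

def word_frequencies(comments, exclude=frozenset()):
    """Count how many respondents used each word (once per respondent)."""
    counts = Counter()
    for text in comments:
        # A set keeps each word only once per comment (per respondent).
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(words - set(exclude))
    total = len(comments)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n, 100.0 * n / total) for word, n in ranked]

# Hypothetical verbatim comments, one per respondent.
comments = [
    "The price was too high, price matters",
    "Good service and fair price",
    "Service was slow",
]
table = word_frequencies(comments, exclude={"the", "was", "too", "and"})
for word, n, percent in table:
    print(f"{word:10s} {n:3d} {percent:6.1f}%")
```

Note that "price" appears twice in the first comment but contributes only one occurrence for that respondent.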
The precoding screen appears:
Step 1 - Examine Words
The most important task in Step 1 (precoding) is to familiarize yourself with the basic
content of the verbatim comments.
When no words are selected (highlighted) in the word
list, the Previous and Next buttons will show the previous and next records.
A more important feature is the ability to examine the
context in which specific words are used. First select the word or words you
are interested in exploring by clicking on those words in the word list. Then
the Previous and Next buttons can be used to find the previous and next
comments that used any of the selected words. The selected word(s) will be
shown in red to draw your eyes to that portion of the comment.
Clicking on a word that is not already selected will
select the word. Clicking on a word that is already selected will deselect
the word.
Join Word Variations
During precoding, one of the major functions of
Verbatim Blaster is to combine all the
variations of each root word. It is generally a good idea to review the words
and to combine any variations that Verbatim Blaster may have missed. It will
usually not miss any, but it's still a good idea to check.
In the Type of Sort window, click on Alphabetic. Then
use the scroll bar to scroll through the list. It will be easy to spot variations
of the same root word that were not combined since they will appear next to
each other in the alphabetically sorted list.
If you should find two variations of the same root word,
you will want to join them together so Verbatim Blaster treats them as a
single unique word. First select the words to be joined by clicking once on
each word. The selected words will be highlighted. You may join more than two
words at once by clicking on each word. Then click the Join button. The words
will be joined and the count and percent will be modified to reflect the new
values. The important thing is to join words that are variations of the same
root word or words that have the same meaning.
You may join words during precoding or final coding. However, joining word variations is easiest during
the precoding process because of the ability to perform an alphabetical
sort.
When you have finished joining words, click on Frequency
in the Type of Sort window to re-sort the list in descending order by
frequency of occurrence. During the final coding process, it is usually most
convenient to have the most common responses appear near the top of the
response category list.
Delete and Exclude Words
There are many words that add little meaning to a
sentence. Words like of, the, for, by, this,
and hundreds of others don't add much to the meaning of a respondent's
statement. These words could be excluded from a sentence without
substantially detracting from its meaning.
Modifiers are words that are used to describe the
quantity or magnitude of the word or phrase that follows the modifier. They
are usually adverbs. Examples are usually, mostly, and greatly.
Modifiers do add meaning to a sentence, but they are generally not helpful in
determining the major topic of a sentence.
Verbatim Blaster maintains a list of exclusion/modifier
words in a file called EXCLUDE.TXT. This file is an ASCII text file and may
be edited with any word processor or text editor. It contains an alphabetical
listing of words that do not help identify the primary topic of a sentence.
The first thing Verbatim Blaster does when you ask it to
analyze a text file is to eliminate the exclusion/modifier words. It doesn't
really eliminate the words; it just pretends they're not there.
The exclusion/modifier list distributed with Verbatim
Blaster is fairly complete. However, specific applications may require that
you add new words to this list. You can use any text editor to modify the
EXCLUDE.TXT file. Words added with a text editor will be sorted into
alphabetic order the next time that Verbatim Blaster is run.
You can also add exclusion/modifier words to this file
during precoding. If you see a word that doesn't add substantial
meaning to a sentence, add it to the exclusion/modifier list by clicking on the
word to select it, and then clicking on the Exclude button. The word(s) will
be added to the EXCLUDE.TXT file and will be excluded from all future
open-ended coding sessions.
Deleting words is different from excluding words.
Deleting a word with the delete button will delete the word from the current
session only. Future coding sessions with other verbatim text would show the
word. To delete a word, click on the word to highlight it and click the
Delete button.
Set the Minimum Percent to Display
A typical text file might contain 400 to 500 unique root
words (even after combining all the variations of the same words and
eliminating the exclusion words). Usually, we wouldn't want to look at a list
of this size. Instead, we're most often interested in responses that were
made by more than one respondent. Verbatim Blaster lets you adjust the size
of the word list by specifying a minimum percent. This is the minimum
proportion of respondents that used a word. For example, if you set the
minimum percent equal to five, Verbatim Blaster will display words that were
used by at least five percent of the respondents. Words that were used by
less than five percent of the respondents would be hidden from your view.
At any time during precoding, you may change the
minimum percent. When the minimum percent is set to zero, all words will be
displayed. To change the minimum percent, simply type the new minimum percent
and press enter.
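The effect of the minimum percent setting can be sketched as a simple filter. The function name and the sample percentages below are hypothetical; the sketch only illustrates the "at least" rule described above.

```python
def filter_by_minimum_percent(word_percents, minimum_percent):
    """Keep only words used by at least `minimum_percent` of respondents."""
    return {
        word: percent
        for word, percent in word_percents.items()
        if percent >= minimum_percent
    }

# Hypothetical percent of respondents who used each root word.
word_percents = {"price": 42.0, "service": 17.5, "parking": 5.0, "color": 1.2}

shown = filter_by_minimum_percent(word_percents, 5)       # hides "color"
everything = filter_by_minimum_percent(word_percents, 0)  # shows all words
print(sorted(shown))
```

A word used by exactly five percent of the respondents stays visible when the minimum percent is five.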
Step 2 - Select Words for Categories
The purpose of precoding is to identify the important
words that are mentioned by respondents. "Important" is, of course,
a subjective decision. The minimum percent feature will narrow the number of
words to a manageable list. Step 2 is to select those words that seem to hit
upon the key concepts in respondents' answers. Sometimes these will be easy
to identify, and other times they won't. If the survey question was extremely
specific, it will probably be easy to identify the key concept words, and if
the survey question was quite general, it might be extremely difficult to
identify the key concepts.
"Select Words for Categories" refers to
identifying the key concept words that will be carried forward to the final
coding process. To select a word, or to
deselect a word that has already been selected, click on it in the word list.
If you select any words, only those words will be
carried forward to the final coding process, and words not selected will be
excluded. If you do not select any words, all displayed words will be carried
forward to final coding.
Step 3 - Final Coding
The final coding process is where you refine the coding
and the response category labels. The words selected in the precoding
process provide the foundation for the final response categories. These are the labels you give to the key
concepts. The initial response categories will be the words that were
selected during precoding.
The final coding process involves reviewing the actual
open-ended responses for each respondent, and using your understanding of the
comment(s) to refine the response category labels. A response category label
can be changed at any time. To change the text in a response category label,
double click on the label.
The text window (respondent's verbatim text) will
appear at the left of the screen, and the response categories window
will appear on the right. There will be an arrow to the left of all response
categories that were mentioned by the respondent, and the key words in the
text will be highlighted.
Select and Deselect a Response Category
Use the mouse to select and deselect a response
category. When a response category is selected, an arrow will appear to
the left of that category. This means that the current respondent made a
comment related to that response category. If a response category is not
selected, there will be no arrow. Clicking to the left of the response
category label (in the small area reserved for the arrows) will select or
deselect that response category.
Change Records
A record is the same as a respondent. Thus, when
we say changing records, it simply means displaying a different
respondent's answer. There are two ways to change records.
The first way is used to show a specific desired record.
Click on the record number shown on the top left of the Context window. After
clicking on the record number, change it to the desired record and press
enter.
The second way is to use the Previous and Next buttons
to change to the previous and next records. When no response category labels are
highlighted, the Previous and Next buttons will advance to the previous or
next record numbers. When one or more response category labels are
highlighted, the previous or next record that has been precoded into that
category will be displayed. The current record number is displayed at the top
left of the Context window.
One method of performing the final coding would involve
repeatedly clicking the Next button to review the coding, beginning with the
first respondent and going to the last respondent. Verbatim Blaster will skip
over respondents who did not make a comment.
Another method of final coding would be to examine the
comments for each response category. First, highlight the response category
(or categories) you want to search for. Then click the Next button to search
for the next record that contains a reference to that response category. Each
time you click the Next button, the next record with a reference to that
response category will be displayed. When the last record in the text file is
reached, the search will be stopped.
The search feature provides a quick way to gain a better
understanding of a particular response category. It lets you scan all
comments related to a specific response category. While using the search
feature, the search will be limited to the response category currently being
searched. This makes it extremely easy to scan the relevant text. Scanning a
particular response category will give you a better understanding of the
comments coded into that category.
Change a Response Category Label
Sometimes, it might be necessary to change or delete an
existing response category, or to create a new category. The response
category labels can be changed at any time by simply typing the new text.
Double-click on the response category label you want to change and then you
will be able to edit the category label.
Delete and Create New Response Categories
To create a new response category, highlight a response
category and click the Create Category button. This will insert a blank line
in the response categories so you can type the new response category label on
that line. If you create a new response category using this method, you will
need to go through each record and decide whether or not that record falls
into the newly created response category. Verbatim Blaster does not
automatically code responses based on the words you type.
To delete a response category, highlight it and click
the Delete Category button. The response category will be immediately
eliminated. There is no automatic "undelete", so be careful. You
might use the Delete Category button to eliminate a response category you
consider to be unimportant.
Join Two Response Categories
Sometimes you will want to combine response categories
that you initially thought were different. You may join response categories
(at any time) into a single category. Click the Join button. Then drag and
drop one of the categories onto the other category. The category you
dragged will be deleted and all the responses that were initially assigned to
that category will be reassigned to the category you drop it on. To drag a
category, move the mouse pointer over that category. Press and hold the left
mouse button. While still holding the mouse button, move the mouse so the
category outline is over the category to be joined and then release the mouse
button. Note that once you use the join category feature, there is no way to
return to the unjoined version. There is no automatic "unjoin", so
be careful.
Create a Net Response Category
Creating a net category is a useful method of
aggregating responses. It is similar to joining response categories except
that the secondary response category is not removed as a unique entry in the
response category list.
The most common use of a net category is to summarize a
group of related response categories without affecting the existing
categories. For example, suppose you were evaluating respondents' preferences
for a new food and there were response categories of red, green, and blue.
You might want to create a net category called color. You could use the Join
button to join the three categories, but you would then be unable to break
down the respondents by their individual color choices. Creating a net
category is the solution to the problem.
To create a net category, first create a new blank line
for the Net category. Highlight a category and click on the Insert Category
button to open up a blank line in the response category list. This blank line
will become the net category. Next click the Net button to begin net
creation. Finally, drag the category you want to net and drop it on the blank
line. Additional response categories can now be added to the new net
category. Click Net again, and drag another category to the new Net category.
Response categories may be included in a net one at a time using this method.
If you make a mistake while creating or adding to a net,
click on the Cancel button to cancel the process. If you inadvertently add a
wrong variable to a net, delete the net with the Delete Category button and
recreate the net.
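The difference between joining categories and creating a net can be sketched with sets of record numbers. This is a conceptual illustration only; the data structure, function names, and record numbers are assumptions and do not reflect how StatPac stores coded responses.

```python
def join_categories(categories, source, target):
    """Merge `source` into `target` and remove `source` (like Join)."""
    merged = dict(categories)
    merged[target] = merged[target] | merged.pop(source)
    return merged

def add_net(categories, net_name, members):
    """Add a net category while keeping its member categories (like Net)."""
    netted = dict(categories)
    netted[net_name] = set().union(*(categories[m] for m in members))
    return netted

# Hypothetical coding: category label -> record numbers coded into it.
categories = {"red": {1, 4}, "green": {2, 4}, "blue": {3}}

joined = join_categories(categories, "green", "red")
netted = add_net(categories, "color", ["red", "green", "blue"])
print(joined)
print(netted)
```

After the join, "green" no longer exists as a separate category. After the net, "red", "green", and "blue" are all still available, and record 4 (coded into both red and green) is counted only once in "color".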
Change the Order of the Response Categories
It is sometimes desirable to rearrange the order of the
response categories. To move a response category, click the Move button. Then
select a category, drag it to a new position, and drop it in the new position
in the response category labels list.
Finish the Coding Process
If you wish to exit the open-ended response coding
program before finishing the coding process, make a note of the current
record number and click the Stop For Now button. To continue where you left
off at a future time, run the same procedure again, and select
Continue Previous Session. Then click on the current record number on the top
left of the Context window, type the record number where you left off, and
press enter.
To finish the coding process, and run the frequency
analysis on the coded data, click the Analyze button. The frequency analysis
will be performed on the coded data.
After viewing the results of the analysis, StatPac will ask
if you want to merge the coded verbatim responses. For example, you may want
to use the coded verbatim information in other analyses (e.g. crosstabs of
the verbatim responses with other variables). Since StatPac can only analyze
one study at a time, you should merge the coded responses (i.e., the
STATPACVERBATIM file) with your original study and data files.
After running the merge procedure, your original file
will contain all the original data (including the original verbatim comments)
and the new coded verbatim comments. The coded comments (from the
STATPACVERBATIM file) will be added to the end of your original variables, so
running the procedure will increase the number of variables in your original
study.
Produce a List of Verbatim Comments for Each Response
Category
After merging a coded verbatim file, you can merge the
STATPACVERBATIM procedure file into your existing procedure file. The
STATPACVERBATIM procedure file is created automatically when you do a merge.
It contains a series of procedures to print a listing of the verbatim
responses that were coded into each response category. To merge the file,
position the cursor at the beginning of the line following two dots. Then select
File, Merge, and select STATPACVERBATIM as the file to merge.
Crosstabs is one of the easiest ways to look at the
relationship between two variables, and one of the most popular ways of
examining categorical data.
The syntax for the crosstabs analysis is:
CROSSTABS <Variable list>
BY <Variable list>
For example, let's look at how people's expectations for
learning (EXPECTATION) are related to their satisfaction with a lecture
(SATISFACTION). The command to request this crosstab analysis is:
CR EXPECTATION BY SATISFACTION
(CROSSTABS may be abbreviated CR)
The results will be printed in the form of a
two-dimensional matrix. The first variable (EXPECTATION) will be printed on
the y axis, while the second variable (SATISFACTION) will be printed on the x
axis. The keyword BY is a mandatory part of the statement.
If several different crosstabs are desired, request them
by specifying a variable list instead of an individual variable. For example,
you might be interested in both SATISFACTION with the lecture and the amount
of actual LEARNING that occurred. The command to run this analysis would be:
CROSSTABS EXPECTATION BY SATISFACTION, LEARNING
The matrix size that the crosstabs program can
accommodate depends on the available RAM. The variables themselves may be
alpha or numeric. StatPac will not print a row or column when the total count for
that row or column is zero. Missing data (blanks) will be excluded from the
analysis unless there is a value label for blank data (e.g., BLANK=Missing
data).
Three-way crosstabs may be requested by the following
command:
CROSSTABS <Var. list> BY <Var. list> BY <Var. list>
A three-way crosstab is essentially a series of two-way
crosstabs controlled for a third variable. That is, the two-way crosstabs are
performed on subsets of the data as defined by the third variable. For
example, consider the following crosstabs command:
CROSSTABS EXPECTATION BY SATISFACTION BY SEX
This command will produce two different crosstab tables,
one for males and the other for females. The same results could be obtained
by executing the following two procedures:
IF SEX="M" THEN SELECT
CROSSTABS EXPECTATION BY SATISFACTION
..
IF SEX="F" THEN SELECT
CROSSTABS EXPECTATION BY SATISFACTION
..
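The idea that a three-way crosstab is a set of two-way tables, one per value of the control variable, can be sketched in Python. The records and the function name are hypothetical; this illustrates the concept rather than StatPac's implementation.

```python
from collections import Counter, defaultdict

def three_way_crosstab(records, row_var, col_var, control_var):
    """Return one Counter of (row, column) cell counts per control value."""
    tables = defaultdict(Counter)
    for record in records:
        cell = (record[row_var], record[col_var])
        tables[record[control_var]][cell] += 1
    return dict(tables)

# Hypothetical data records.
records = [
    {"EXPECTATION": "High", "SATISFACTION": "Yes", "SEX": "M"},
    {"EXPECTATION": "High", "SATISFACTION": "No", "SEX": "F"},
    {"EXPECTATION": "Low", "SATISFACTION": "Yes", "SEX": "F"},
    {"EXPECTATION": "High", "SATISFACTION": "Yes", "SEX": "M"},
]
tables = three_way_crosstab(records, "EXPECTATION", "SATISFACTION", "SEX")
for sex in sorted(tables):
    print(sex, dict(tables[sex]))
```

Each entry in `tables` holds the same counts that a separate two-way crosstab on that subset of the data would produce.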
Count/Percent & Observed/Expected Tables
There are two common ways to print crosstabs. One is
number, row percent, column percent, and total percent. The second is observed,
expected, observed minus expected, and the cell's contribution to the total
chi-square. Both of these
tables may be printed or excluded using Y or N options. To print both tables,
use the following options:
OPTIONS CP=Y OE=Y
The chi-square is an important statistic; it is used to
test whether two variables are independent of each other. In other words, do
the observed frequencies in the cells deviate markedly from the frequencies
we would expect if the two variables were not related to each other?
A large chi-square statistic indicates that the observed
frequencies differ significantly from the expected frequencies. A crosstab
with r rows and c columns is said to have (r-1) times (c-1) degrees of
freedom.
Using the chi-square distribution and its associated
degrees of freedom, you can calculate the probability that the differences
between the observed and expected frequencies occurred by chance. Generally,
a probability of .05 or less is considered to be a significant difference;
this probability is termed "probability of chance" in the output.
When a crosstab contains many cells with expected counts less than
five, the probability of chance for the chi-square statistic can be
inaccurate. Therefore, the user should consider grouping some rows and/or
columns if many cells have expected values less than five.
The second way of printing crosstabs (observed/expected
table) is useful in explaining the significance of the chi-square statistic.
The cells with high values in the "contribution to the chi-square"
are the ones that "contribute" the most to the significance of the
chi-square. This is useful in the discussion of the results of a study as
there are often only a few cells which deviate from independence.
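The quantities in the observed/expected table can be reproduced with a short Python sketch using the standard Pearson chi-square formulas. The table values and the function name are hypothetical, and the sketch assumes every row and column total is greater than zero.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom, expected, contributions)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Each cell contributes (observed - expected)^2 / expected.
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected, contributions

observed = [[30, 10], [20, 40]]  # hypothetical cell counts
statistic, df, expected, contributions = chi_square(observed)
# Number of cells whose expected value is less than five.
small_cells = sum(e < 5 for row in expected for e in row)
print(round(statistic, 4), df, small_cells)
```

The cells with the largest values in `contributions` are the ones that deviate most from what independence would predict.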
Example of a Count/Percent Table
Print Format
Each cell of the crosstabs table may contain up to four
numbers. Their meanings are labeled in the upper left corner of the table.
You may choose to print or suppress any of these numbers by using the PF
option. The parameters for this option are:
N Number or observed frequency
R Row percent or expected frequency
C Column percent or observed minus expected
T Total percent or contribution to chi-square
One or more parameters may be used with the PF option. These
should not be separated from each other. For example, if you want to print
the number and total percent, use the following option:
OPTIONS PF=NT
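For the count/percent table, the four numbers in a cell are related as in the following sketch. The dictionary keys mirror the N, R, C, and T parameter letters; the function name and the table values are hypothetical.

```python
def cell_percents(table, i, j):
    """Return number, row %, column %, and total % for cell (i, j)."""
    n = table[i][j]
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    grand_total = sum(sum(row) for row in table)
    return {
        "N": n,
        "R": 100.0 * n / row_total,
        "C": 100.0 * n / col_total,
        "T": 100.0 * n / grand_total,
    }

table = [[30, 10], [20, 40]]  # hypothetical cell counts
cell = cell_percents(table, 0, 0)
print(cell)
```

With PF=NT, only the "N" and "T" values of each cell would be of interest.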
If a table is too large to fit on one page, it will be
split to use as many pages as necessary. The actual number of columns that
can fit on a page is determined by the pitch and carriage width of your
printer.
Category Creation
The actual categories (rows and columns) in the crosstab
analysis can be created either from the study design value labels (CC=L) or
from the data itself (CC=D). When the categories are created from the labels,
the value labels themselves will be used to create the categories for the
analysis, and data that does not match up with a value label code will be
counted as missing. That is, mispunched data will be counted as missing. When
categories are created from the data, all data will be considered valid
whether or not there is a value label for it.
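The difference between the two category creation methods can be sketched as follows. The function name, the mode letters passed as arguments, and the sample data are assumptions made for illustration.

```python
def build_categories(data, value_labels, mode):
    """Tally data by category; mode 'L' uses the labels, 'D' uses the data."""
    counts, missing = {}, 0
    for value in data:
        if mode == "L" and value not in value_labels:
            missing += 1  # no matching value label: counted as missing
            continue
        counts[value] = counts.get(value, 0) + 1
    return counts, missing

value_labels = {"M": "Male", "F": "Female"}  # hypothetical value labels
data = ["M", "F", "F", "X", "M"]             # "X" is a mispunched code

from_labels = build_categories(data, value_labels, "L")
from_data = build_categories(data, value_labels, "D")
print(from_labels)
print(from_data)
```

With categories created from the labels, the mispunched "X" is counted as missing; with categories created from the data, it becomes a category of its own.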
Sort Codes
The actual labeling for the x and y axes is taken from
the value labels. In most circumstances, the order that you entered the value
labels (during the study design) reflects the order in which you want the
value labels to be listed. You can override the order of the value labels in
the study design by using the option (SC=Y). The value labels will then be
displayed in ascending alphabetical or numeric order. This feature is
especially useful when the study design itself does not contain any value
labels. If this option is not used (i.e., SC=N), the order of the value
labels on the printout will reflect the order in which values are encountered
in the data file.
Statistics
When the statistics option is specified, several other
statistics will be calculated and printed.
Example of a Statistics Printout
A discussion of each statistic follows:
Phi
The Phi statistic is calculated and printed for
two-by-two tables. It may be interpreted as a measure of the strength of the
relationship between two variables. When there is no relationship, Phi is
zero. When there is a perfect positive relationship, Phi is one. When there
is a perfect negative relationship, Phi is minus one.
When comparing one crosstab table to another, Phi is
preferable to the chi-square because it corrects for the fact that the
chi-square statistic is directly proportional to the number of cases. In
other words, Phi could be used to compare two crosstabs with unequal N's.
Cramer's V
If Phi is calculated for tables larger than two-by-two,
there is no upper limit to its value. Therefore, the Phi statistic is not
printed for tables greater than two-by-two. Instead, Cramer's V is printed.
Cramer's V adjusts the Phi for the number of rows and columns so that its
maximum value is also one. It may be interpreted exactly like the Phi (e.g.,
a large Cramer's V indicates a high degree of association between the two
variables).
Contingency Coefficient
The contingency coefficient is another measure of
association based on the chi-square statistic. It may be calculated for any
size of table; however, its maximum value will vary depending on the number
of rows or columns. Therefore, the contingency coefficient should only be
used to compare tables with the same numbers of rows and columns.
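The three chi-square based measures above can be sketched with their usual textbook formulas. These formulas are assumed here; the manual does not state which computational form StatPac uses. Phi derived from the chi-square gives only the magnitude, so a separate signed version for two-by-two tables is included. The table values are hypothetical.

```python
import math

def association_measures(chi_square, n, rows, cols):
    """Return phi, Cramer's V, and the contingency coefficient."""
    phi = math.sqrt(chi_square / n)
    cramers_v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))
    contingency = math.sqrt(chi_square / (chi_square + n))
    return phi, cramers_v, contingency

def signed_phi(a, b, c, d):
    """Phi for a two-by-two table [[a, b], [c, d]], keeping its sign."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# The table [[30, 10], [20, 40]] has chi-square 50/3 with N = 100.
phi, cramers_v, contingency = association_measures(50 / 3, 100, 2, 2)
print(round(phi, 4), round(cramers_v, 4), round(contingency, 4))
print(round(signed_phi(30, 10, 20, 40), 4))
```

For a two-by-two table, Cramer's V equals the magnitude of Phi, which is why only one of the two needs to be printed.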
Kendall's Tau Statistics
Kendall's tau statistics are used to measure the
correlation between two sets of rankings. Tau is the number of concordant
pairs of observations minus the number of discordant pairs, adjusted so it has
a range of minus one to plus one. There are three different methods for
standardizing tau (tau-a, tau-b and tau-c). Note that tau-b is only
calculated for square tables.
Gamma
Gamma is similar to the tau statistics except that it
may be interpreted directly as the difference in probability of like rather
than unlike orders for the two variables when they are chosen at random.
Gamma has a value of plus one when all the data is in the diagonal that runs
from the upper-left corner to the lower-right corner of the table. It has a
value of minus one when all the data is concentrated in the upper-right to
lower-left diagonal.
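Concordant and discordant pairs, and the Gamma and tau-a statistics built from them, can be sketched as follows. The sketch assumes the rows and columns are already in their rank order and uses the standard definitions (Gamma = (C - D) / (C + D), tau-a = (C - D) divided by the number of pairs); tau-b and tau-c, which adjust for ties and table size, are not shown. The table values are hypothetical.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered table."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows below row i
                for m in range(n_cols):
                    if m > j:    # below and to the right: same order
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # below and to the left: opposite order
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant

def gamma_and_tau_a(table):
    """Return (Gamma, tau-a); Gamma is undefined when C + D is zero."""
    c, d = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    gamma = (c - d) / (c + d)
    tau_a = (c - d) / (n * (n - 1) / 2)
    return gamma, tau_a

table = [[30, 10], [20, 40]]
print(concordant_discordant(table))
print(gamma_and_tau_a(table))
```

A table with all of its data on the upper-left to lower-right diagonal has no discordant pairs, so Gamma is plus one.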
Cohen's Kappa
Cohen's Kappa is another measure of the degree to which
the data falls on the main diagonal. It is only calculated for square tables.
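Cohen's Kappa compares the observed proportion of cases on the main diagonal with the proportion expected by chance. The sketch below uses the standard formula, which is assumed rather than taken from StatPac's documentation; the table values are hypothetical.

```python
def cohens_kappa(table):
    """Return Cohen's Kappa for a square table of counts."""
    n = sum(sum(row) for row in table)
    # Observed proportion of cases on the main diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Proportion expected on the diagonal if rows and columns were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa([[20, 5], [10, 15]])
print(round(kappa, 4))
```

A Kappa of one means all of the data falls on the main diagonal; a Kappa of zero means the diagonal holds no more data than chance would predict.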
Somers' d
Somers' d is a measure of association for ordered
contingency tables when there is a dependent and independent variable. It may
be interpreted in the same fashion as a regression coefficient.
Odds ratio
The odds ratio is calculated for two-by-two tables. Its
value may vary between zero and infinity. A value greater than one indicates
a positive relationship while a value near zero represents a negative
relationship. A value of one indicates statistical independence. Note that
this is different than most measures of association.
Yule's Q and Yule's Y
Yule's Q is a function of the odds ratio. Unlike the odds
ratio, its value will vary between minus one and plus one. A
value of zero indicates statistical independence, while values of minus one
and one represent perfect negative and positive relationships. It will be
calculated for two-by-two tables.
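The odds ratio and Yule's statistics for a two-by-two table [[a, b], [c, d]] can be sketched with their standard formulas: Q = (OR - 1) / (OR + 1), and Y uses the square root of the odds ratio in the same way. The formulas are assumed here, the cell values are hypothetical, and the sketch assumes no cell is zero.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) of a two-by-two table."""
    return (a * d) / (b * c)

def yules_q(a, b, c, d):
    """Yule's Q, equivalent to (OR - 1) / (OR + 1)."""
    return (a * d - b * c) / (a * d + b * c)

def yules_y(a, b, c, d):
    """Yule's Y, equivalent to (sqrt(OR) - 1) / (sqrt(OR) + 1)."""
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)

a, b, c, d = 30, 10, 20, 40
print(odds_ratio(a, b, c, d))
print(round(yules_q(a, b, c, d), 4), round(yules_y(a, b, c, d), 4))
```

When the two variables are independent, the odds ratio is one and both Q and Y are zero.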
Entropy
Entropy is a measure of disorder; that is, the extent to
which the data is randomly distributed in a contingency table. The greater
the disorder, the greater the entropy statistic. It is useful for comparing
different crosstab tables with each other. A low entropy (near zero)
indicates that the data tends to be clustered in only a few of the possible
categories. A high entropy indicates that the data is evenly distributed
among all the possible categories.
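One common definition of entropy for a contingency table is the Shannon entropy of the cell proportions. The sketch below uses natural logarithms; the logarithm base and any scaling that StatPac applies are not stated in this manual, so treat both as assumptions. The table values are hypothetical.

```python
import math

def entropy(table):
    """Shannon entropy (natural log) of the cell proportions."""
    n = sum(sum(row) for row in table)
    total = 0.0
    for row in table:
        for count in row:
            if count > 0:  # empty cells contribute nothing
                p = count / n
                total -= p * math.log(p)
    return total

clustered = entropy([[100, 0], [0, 0]])  # all data in one cell
even = entropy([[25, 25], [25, 25]])     # data spread evenly
print(clustered, round(even, 4))
```

The clustered table gives an entropy of zero, while the evenly distributed table gives the largest value possible for four cells.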
Yates' Correction
If degrees of freedom equals one (i.e., when the
crosstabs produces a two-by-two table), the chi-square statistic can have
Yates' correction applied and be printed as the "Corrected
chi-square". The option YA=Y will enable Yates' correction for
two-by-two tables, while YA=N will disable it.
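The continuity correction can be sketched with the usual shortcut formula for a two-by-two table [[a, b], [c, d]]. The probability uses the fact that a chi-square with one degree of freedom is the square of a standard normal value. The function names and cell values are hypothetical.

```python
import math

def yates_chi_square(a, b, c, d):
    """Chi-square with the continuity correction for a two-by-two table."""
    n = a + b + c + d
    # Reduce |ad - bc| by n/2, but never below zero.
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def probability_df1(chi_square):
    """Probability of chance for a chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

corrected = yates_chi_square(30, 10, 20, 40)
p = probability_df1(corrected)
print(round(corrected, 4), p)
```

The corrected value is always somewhat smaller than the uncorrected chi-square for the same table (16.67 for these counts).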
Residual Analysis
Residual analysis is one method used for identifying the
categories responsible for a significant chi-square statistic. This involves
calculating the standardized residual for each cell and adjusting it for its
variance. The normal distribution is used to
find the probability of the adjusted residual using a two-tailed test of
significance. A significant adjusted residual indicates that the cell made a
significant contribution to the chi-square statistic.
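A common way to compute adjusted residuals divides the difference between the observed and expected counts by its estimated standard error. The sketch below assumes that formula (StatPac's exact computation is not given in this manual) and uses hypothetical counts.

```python
import math

def adjusted_residuals(observed):
    """Return a table of (adjusted residual, two-tailed probability)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    results = []
    for i, row in enumerate(observed):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Variance adjustment for the row and column proportions.
            variance = (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            z = (o - e) / math.sqrt(e * variance)
            p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal
            out_row.append((z, p))
        results.append(out_row)
    return results

residuals = adjusted_residuals([[30, 10], [20, 40]])
for row in residuals:
    print([(round(z, 3), round(p, 5)) for z, p in row])
```

In a two-by-two table every adjusted residual has the same magnitude, and its square equals the chi-square statistic.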
The residual analysis may be turned on or off with the
option RA=Y and RA=N, respectively. A sample printout of a residual analysis
would look like this:
Example of a Residual Analysis Printout
Interaction Analysis
While many of the statistics indicate whether or not two
variables are related, Goodman's interaction analysis is a method of finding
out if the magnitude of the relationship is caused more by one part of the
table than another. Its purpose is to evaluate all possible combinations of
two-by-two tables for interaction effects.
The interaction is defined as the natural log of the
odds ratio. The purpose of the log function is to take into account the
possibility of a curvilinear relationship. The standard error of the
interaction is calculated as well as the standardized interaction. The standardized
interaction is used to calculate a two-tailed probability using a normal
distribution.
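For a single two-by-two table [[a, b], [c, d]], the interaction, its standard error, and the standardized interaction can be sketched as follows. The standard error formula (square root of the sum of the reciprocal cell counts) is the usual one for a log odds ratio and is assumed here; the cell values are hypothetical and must all be greater than zero.

```python
import math

def interaction(a, b, c, d):
    """Return (interaction, standard error, standardized value, probability)."""
    log_odds = math.log((a * d) / (b * c))  # natural log of the odds ratio
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_odds / std_error
    p = math.erfc(abs(z) / math.sqrt(2))    # two-tailed normal probability
    return log_odds, std_error, z, p

log_odds, std_error, z, p = interaction(30, 10, 20, 40)
print(round(log_odds, 4), round(std_error, 4), round(z, 4), p)
```

In a larger table, the same calculation would be repeated for each two-by-two combination of rows and columns.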
The interaction analysis may be requested with the IA=Y
option. A sample printout would look like this:
Example of an Interaction Analysis Printout
Equiweighting
Equiweighting is a technique to eliminate distortions
from most measures of association caused by column marginal disparities. You should
use Equiweighting whenever there is a dependent/independent variable
relationship (implying causality) and the column totals differ markedly for
each of the categories. Note that Equiweighting only applies to the
observed/expected table and the statistics that are printed with the table.
After Equiweighting, cell frequencies will no longer be integer values.
Equiweighting may be requested with the EQ=Y option.
Labeling and Spacing Options
Option                Code  Function

Labeling              LB    Sets the column headings to print the variable
                            label (LB=E), the variable name (LB=N), or the
                            variable number (LB=C).
Labeling Width        LW    Sets the maximum width (in inches) for the
                            variable labels on the stub.
Exact Width           EW    When EW=Y, the labeling width for the stub will
                            be exactly what is specified with the LW option.
                            When EW=N, the width of the stub will self-adjust
                            based on the length of the stub labels.
Column Width          CW    Sets the minimum width of the columns (in inches).
Column Spacing        CS    Sets the spacing (in inches) between the columns.
Key                   KY    Sets whether the top left corner of the banner
                            will show a legend of the cell contents.
Print Percent Symbol  PP    Sets whether percentage symbols will be shown.
Decimal Places        DP    Sets the number of decimal digits that will be
                            shown.
Print Codes           PC    Sets whether the code (to the left of the equals
                            symbol) will be shown with the value labels.
Label Justification   LJ    Sets the justification for the banner variable
                            label.
Label Underline       LU    Sets whether the banner variable label will be
                            underlined.
Column Justification  CJ    Sets the justification for the banner value label
                            columns.
Column Underline      CU    Sets whether the banner value label columns will
                            be underlined.
Bottom Justify        BJ    Sets whether the banner labels will be bottom
                            justified.

Banner crosstabs are often used in marketing research
when it is important to display several crosstab tables as part of the same
printout. The banners program is similar to the crosstabs program except that multiple
variables may be specified for the x and/or y axis. The variables across the
top of the page are called the banners and the variables down the side of the
page are called the stub. Banners have an advantage over regular crosstabs in that
there is much more control over the appearance of the output. The
disadvantage is that not all the statistical measures of association are
available with banners.
The syntax for the command to run banners is:
BANNERS <Stub variable list> BY <Banner variable list>
The keyword BY is a mandatory part of the command syntax.
The first variable list (called the stub) will be displayed down the side of
the page, and the second variable list (called the banner) will be displayed
across the top of the page. The maximum table size is 250 rows and 60
columns. The variables may be alpha or numeric.
To print banners with variables 1 & 2 on the y axis
and variables 3 to 7 on the x axis, enter the command:
BA V1 V2 BY V3-V7
(BANNERS may be abbreviated as BA)
If a table is too large to fit on one page, it will be
split over as many pages as necessary. The actual number of columns that can
fit on one page is determined by the pitch and carriage width of your
printer. Value labels appearing in the banner heading will be split into
multiple lines as necessary to fit in the banner column widths. The actual
positions of the word splits can be controlled by inserting a vertical bar
into the value labels at the locations where you want the words to split.
Type of Data
Two types of banner tables can be printed. The most
common type is the count/percent table. The data for the rows and columns is
categorical (nominal or ordinal). Each row and column in the table represents
a category. Set TY=C to select the count/percent table. Unlike crosstabs, the
banners program defines its rows and columns from the value labels in the
study design. That is, the program uses the value labels to create the
banner rows and columns. Any data value not having a matching value label in
the study design will be counted as missing; therefore, set up the study
design labels to reflect the headings and labeling for the banners.
The second type of table is when the stub variables are
interval or ratio data. There aren't any defined categories for the rows.
Instead of counts and percents, we would want to see means and standard
deviations in the table. Set TY=P to indicate that the stub variables are
parametric. The table will show means and
standard deviations instead of counts and percents.
Print Format
Each cell of the banners table may contain up to four
numbers. To print or suppress any of these numbers, use the PF option.
The parameters for this option are:
N Number
R Row percent
C Column percent
T Total percent
One or more parameters may be used with the PF option.
These should not be separated from each other. For example, to print the
number and total percent, use the following option:
OPTIONS PF=NT
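As an illustration of what the four PF parameters mean, the following Python sketch computes the numbers for one cell of a simple table of counts. It is not StatPac code: the function name is hypothetical, and it assumes the percentage bases are the plain row, column, and table sums (no missing data, multiple response, or weighting).

```python
def cell_values(table, row, col, pf="NRCT"):
    """Return the values selected by the PF codes for one cell.

    table is a list of rows of counts. The codes follow the manual:
    N = number, R = row percent, C = column percent, T = total percent.
    """
    n = table[row][col]
    row_total = sum(table[row])
    col_total = sum(r[col] for r in table)
    grand_total = sum(sum(r) for r in table)
    values = {
        "N": n,
        "R": 100 * n / row_total,
        "C": 100 * n / col_total,
        "T": 100 * n / grand_total,
    }
    # Values are returned in the order the codes were given (an assumption).
    return [values[code] for code in pf]
```

For example, cell_values(table, 0, 0, "NT") corresponds to PF=NT and returns only the count and the total percent for the first cell.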
Example of a Banners Printout
Alternate Format
When TY=P (the stub is
parametric data), each cell of the banners table may contain up to four
numbers. To print or suppress any of these numbers, use the AF option.
The parameters for this option are:
N Number
M Mean
D Standard Deviation
E Standard Error
One or more parameters may be used with the AF option.
These should not be separated from each other. For example, to print the
number, mean and standard deviation, use the following option:
OPTIONS AF=NMD
Category Creation
In most cases you will want the rows and columns of a
banners table to be a reflection of the value labels. When CC=L
the categories (what StatPac defines as a row or column) will be created from
the value labels in the codebook. That is, a row
on the stub or column in the banners will be created for each value label in
the codebook. If a variable does not have value labels, it will not be
included in the table.
In some cases, however, you might not have previously
assigned value labels to a variable but still want the variable to be
included in the table. Set CC=D to create the categories from the data itself
instead of the value labels. For example, if you had a variable with data
values of 1-5 but no value labels, you could include this variable in the
banner table by setting CC=D. Alternatively, you could use the LABELS command in the
procedure to specify value labels for the variable.
Means & Standard Deviations
The reserved word MEAN may be used to print the mean
average of any row or column variable. The mean will be the average of the
value codes (not the value labels). The
word MEAN should be included in the variable list
immediately following any variable that you want to calculate the mean
average of. Either row or column means may be specified. For example, the
following command has two variables on each axis (V1 and V2 BY V3 and V4).
Means will be printed for both the first and second row variables, and
following only the first column variable.
BANNERS V1 MEAN V2 MEAN BY V3 MEAN V4
The standard deviations or standard errors may also be
printed below the means by specifying the SD=D or SD=E option, respectively.
When SD=E and the FP option is used to specify a finite population size, the
standard error will be calculated using the finite population correction
factor. (See the OPTIONS Command in the Keywords section of this manual.)
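The following Python sketch shows how a mean of value codes, its standard deviation, and a standard error with a finite population correction could be computed from a set of counts. The formulas are the standard textbook ones (sample standard deviation with n-1, correction factor sqrt((N-n)/(N-1))); the manual does not state which formulas StatPac uses, so treat them as assumptions. The function name is hypothetical.

```python
import math

def code_stats(counts, population=None):
    """Mean, standard deviation and standard error of value codes.

    counts maps numeric value codes to frequencies, e.g. {1: 30, 2: 20, 3: 40}.
    population is the finite population size (the FP option), or None.
    Requires at least two cases.
    """
    n = sum(counts.values())
    mean = sum(code * f for code, f in counts.items()) / n
    # Sample variance with n - 1 in the denominator (assumed).
    variance = sum(f * (code - mean) ** 2 for code, f in counts.items()) / (n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    if population is not None:
        # Finite population correction factor (assumed standard form).
        se *= math.sqrt((population - n) / (population - 1))
    return mean, sd, se
```

Note that the mean is computed from the codes themselves, so it is only meaningful when the codes form a numeric scale.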
Row & Column Totals
The reserved word TOTAL can be used with the BANNERS
command to specify row and/or column totals anywhere
in the table. The word TOTAL can be used as a variable name in any position
(and as many times as desired) in either (or both) variable lists. When TOTAL
is used in the first variable list (for the Y axis), a row is included that
displays the totals for all columns in the table. When TOTAL is used in the
second variable list (for the X axis), the table includes a column that
contains row totals. As an example, a command to print totals in the first
row and first column of the banners table would be:
BANNERS TOTAL, SATSCORE, GPA BY TOTAL, CLASS
A row or column total reflects the number of cases
throughout the entire data file in which the value for the row or column
appears. Therefore, the numbers for one particular pair of intersecting
variables may not add up to the number for the row or column total. For
example, if a variable which recorded sex (male/female) is placed on the X
axis against a variable on the Y axis which recorded make of car owned, and
20 of the 100 women who completed the survey did not answer the car question,
the column total for females would be 100, but the sum of the females in all
the rows of the car variable would be only 80.
          Male    Female    Total
Ford        50        20      100
Other       40        60      100
Total      100       100
It is easy to create a total column that reflects the row
totals irrespective of the other cell counts. First use the NEW command to
create a new variable called NET. The value of NET will be initialized as
missing for all cases. Then use the LABELS command to assign a label to
missing data. Since the banners program uses value labels to determine what
is a row and what is a column, it is necessary to use the LABELS command,
even though the label is set to blank. Finally, specify the new NET variable
instead of the TOTAL keyword in the BANNERS command.
NEW (N1) "NET" Totals
LABELS NET (=)
BANNERS TOTAL SCORE BY NET CLASS
Sort Stub
The actual labeling for the x and y axes is taken from
the study design information in the
codebook. In most circumstances, the value labels will reflect the order in
which you want the category codes to be listed. It is possible to override
the order of the value labels in the study design by using the SS option (SS=A
or SS=D). The category codes (value labels) on the stub will then be
displayed in ascending or descending numeric order by frequency.
Additionally, a digit may be added as a suffix to
SS=A or SS=D. The digit sorts the stub while excluding the last one or more
value labels. This is useful when the last value label is an
"other" or "don't know" category, and you want to sort
the stub but still leave the "other" or "don't know" row as
the last row on the stub. For example, SS=D1 would sort the stub in descending
order by frequency, except it would leave the last value label as the last
row regardless of its frequency.
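The effect of the SS option and its digit suffix can be sketched in Python as follows. This is only an illustration of the ordering rule; the function and parameter names are hypothetical.

```python
def sort_stub(rows, order="D", keep_last=0):
    """Sort stub rows by frequency, leaving the last keep_last rows in place.

    rows is a list of (label, frequency) pairs in study-design order.
    order is "A" (ascending) or "D" (descending), as in SS=A and SS=D.
    keep_last is the digit suffix, e.g. 1 for SS=D1.
    """
    if keep_last:
        head, tail = rows[:-keep_last], rows[-keep_last:]
    else:
        head, tail = rows, []
    head = sorted(head, key=lambda r: r[1], reverse=(order == "D"))
    return head + tail
```

With keep_last=1, an "Other" row stays at the bottom of the stub even when it has the highest frequency.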
Compress Output
Compression refers to the
way the program creates page breaks. When compression is on (CO=Y), the
program will attempt to fit as many columns and rows on a page as possible.
That is, page breaks may occur between different value labels of the same
variable. When compression is off (CO=N), the program will break pages
between each variable on the y-axis. Of course, when there are many
categories for a variable, it may be necessary to split up a variable over
successive pages regardless of the compression setting. Setting CO=Y or CO=N
will apply to both the stub and banner. Compression may be selectively applied
to either the stub or banner using CO=S and CO=B, respectively. When compression
is set to the stub (CO=S), page breaks will occur between variables
(not between value labels); however, the program will still attempt to
maximize the number of variables that can fit on a page.
Percentage Base
The percentage base on a banners analysis can either be
the number of respondents (N) or the total number of responses. If PB=N, the
denominator for calculating percentages will be the number of respondents. If
PB=R, the denominator will be the total number of responses for all
individuals.
Special Value Label HIDE
When creating a banner table, it is often desirable to
display only some of the response categories. The LABELS command may be used
to eliminate undesirable categories from the table. This will eliminate the
column from the table and from any calculations of percentages on the table.
For example, assume the following counts for V1:
    1          2           3
Agree    Neutral    Disagree    No Response    Total N
   30         20          40             10        100
If PB=N (denominator equals number of respondents), the
percents will be:
Agree    Neutral    Disagree
  30%        20%         40%
If PB=R (denominator equals number of responses), the
percents will be:
Agree        Neutral      Disagree
30/90=33%    20/90=22%    40/90=44%
We could use the following LABELS command to eliminate
the "Neutral" category from the table:
LABELS V1 (1=Agree)(3=Disagree)
If PB=N, the percents will still be based on a
denominator of 100. If, however, PB=R, then the percents will be based on a
denominator of 70 (30+40):
Agree        Disagree
30/70=43%    40/70=57%
The special value label "HIDE" may be used to
suppress printing of a value label without reducing the denominator for the
percents calculations. The following LABELS command could be used to
eliminate the "Neutral" category from the table, while still
including the "Neutral" count in the denominator:
LABELS V1 (1=Agree)(2=Hide)(3=Disagree)
Any row or column that has a value label of
"HIDE" will not be printed, but it will be included in the percent
calculations when PB=R. Note that the percentages are based on the counts for
all value labels (including the
"Neutral" category), even though all the value labels are not
displayed in the table.
Agree        Disagree
30/90=33%    40/90=44%
If you only wanted the "Agree" responses to show in
the table, you could use the following statements in the procedure. The
percentages in the table would still be based on 90:
LABELS V1 (1=Agree)(2=Hide)(3=Hide)
OPTIONS PB=R
..
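The interaction between the PB option, omitted value labels, and the HIDE label can be reproduced with a short Python sketch. It uses the counts from the example above (30 Agree, 20 Neutral, 40 Disagree, 100 respondents). The function is hypothetical and only mirrors the arithmetic shown in this section.

```python
def percents(counts, labels, total_n, pb="N"):
    """Percentages for the labeled categories of one variable.

    counts maps value codes to counts; labels maps value codes to labels.
    Codes without a label are dropped entirely. Codes labeled HIDE are
    not printed but still count toward the PB=R denominator.
    """
    labeled = {code: counts[code] for code in labels}
    if pb == "N":
        denominator = total_n                 # number of respondents
    else:
        denominator = sum(labeled.values())   # number of responses
    return {
        labels[code]: round(100 * n / denominator)
        for code, n in labeled.items()
        if labels[code].upper() != "HIDE"
    }
```

Calling the function with labels {1: "Agree", 3: "Disagree"} and pb="R" gives 43 and 57, while labeling code 2 as "Hide" gives 33 and 44, matching the tables above.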
Multiple Response
Multiple response variables can be included in banner
crosstabs by using the MR option to combine those variables that should be
interpreted as a single variable. The syntax to combine multiple response
variables is:
OPTIONS MR=(<list 1>)(<list 2>)....(<list n>)
Each variable list represents
a group of multiple response items that should be grouped as if they were a
single variable. Each group must be enclosed in parentheses and specified as
a variable list (individual variables are separated by commas or spaces, and
ranges are specified with a dash). Either variable names or variable numbers
may be specified in the MR option variable list. The V prefix is optional for
variable numbers.
The sequence of variables specified in a multiple
response group list must match a sequence of the x or y axis banner list. For
example, consider the following BANNERS command:
BANNERS V5-V8, V10, V11 BY V1-V3, V5, V6
Multiple response variables
y-axis: 5-8,10,11   x-axis: 1-3,5,6
OPTIONS MR=(1-3)       Groups vars. 1, 2 & 3 on the x-axis
OPTIONS MR=(2,3)       Groups vars. 2 & 3 on the x-axis
OPTIONS MR=(8,10-11)   Groups vars. 8, 10 & 11 on the y-axis
OPTIONS MR=(5,6)       Groups vars. 5 & 6 on the x & y-axis
The following groupings might cause problems:
OPTIONS MR=(1-6)       Groups vars. 1, 2 & 3 on the x-axis, but not variables 5 or 6 because variable 4 was not part of the banners variable list
OPTIONS MR=(6-11)      Groups vars. 6, 7 & 8 on the y-axis, but not variables 10 or 11 because variable 9 was not part of the banners variable list
OPTIONS MR=(10-15)     Groups vars. 10 & 11 on the y-axis and ignores the extra variables
OPTIONS MR=(3-7)       No variables grouped
OPTIONS MR=(11-20)     No variables grouped
OPTIONS MR=(3,2,1)     No variables grouped
OPTIONS MR=(8,11,10)   No variables grouped
In general, the MR option will never cause a fatal
error. If an invalid grouping is found, it is simply ignored and the
variables will not be grouped on the output. The banners program uses the
value labels from the first variable specified in each group list. The MR
option should be used only to group variables which share a list of common
value labels. The value labels must be specified in the study design (i.e.,
they will not automatically be determined from the data file). This was
implemented to prevent mispunched or spurious data from creating its own row
or column in the output.
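The grouping behavior shown in the examples above can be modeled with a small Python sketch. The matching rule used here (start at the first variable in the group, then match consecutive variables on the axis until the sequences differ, and require at least two matches) is inferred from the manual's examples; it is an assumption, not a description of StatPac's internal logic. The function names are hypothetical, and only variable numbers (not names) are handled.

```python
def expand(spec):
    """Expand a group such as "8,10-11" into [8, 10, 11]."""
    result = []
    for part in spec.upper().replace("V", "").replace(" ", ",").split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            result.extend(range(int(low), int(high) + 1))
        else:
            result.append(int(part))
    return result

def grouped(spec, axis):
    """Return the variables on one axis that an MR group would combine."""
    group = expand(spec)
    if not group or group[0] not in axis:
        return []
    start = axis.index(group[0])
    run = []
    for wanted, actual in zip(group, axis[start:]):
        if wanted != actual:
            break
        run.append(actual)
    # A single variable is not a multiple response group.
    return run if len(run) >= 2 else []
```

For the BANNERS command above, the axes would be y_axis = [5, 6, 7, 8, 10, 11] and x_axis = [1, 2, 3, 5, 6]; grouped("1-6", x_axis) then returns [1, 2, 3] and grouped("3,2,1", x_axis) returns an empty list.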
Net Codes
Net categories may be created and displayed on the stub using the NT option in conjunction with the MR option.
The NT option specifies the codes on the stub variable that are to be
interpreted as net categories. Net categories are excluded from the
calculations of totals and means. Multiple categories are separated with a
slash and enclosed in quotes. The general format is:
OPTIONS NT="code/code/code"
For example, suppose you want to create a banner table
where the stub (V5) is a five-point Likert scale. The scale is coded: (1=Very
good) (2=Good) (3=Fair) (4=Poor) (5=Very poor). You want the stub to
contain two net variables and look like this:
1=Very Good
2=Good
NET: Very Good or Good
3=Fair
NET: Poor or Very Poor
4=Poor
5=Very Poor
Mean & SD
The first step is to create a NET variable and compute it
equal to values not used in the originally coded variable. In this example,
6, 7, 8, 9, and 0 are unused, so we could use any of them for the new NET
variable. Then use the LABELS command to relabel the stub categories in the
order you want them to appear. Include the new NET variable in the BANNERS
command as if it were a multiple response variable. Use the MR option to
specify multiple response and use the NT option to specify which codes are
the net categories.
New (N1) "NET"
If V5="1/2" Then Compute NET=6
If V5="4/5" Then Compute NET=7
Labels V5 (1=Very Good) (2=Good) (6=NET: Very Good or
Good) (3=Fair) (7=NET: Poor or Very Poor) (4=Poor) (5=Very Poor)
Banners V5 NET Mean By Total Age Gender Group
Options MR=(V5 NET) NT="6/7"
..
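The counting logic behind this procedure can be illustrated in Python. The sketch below is a simplified model, not StatPac's implementation: it counts each original code, adds a count for each net category, and leaves the net categories out of the total, as described above. The function name and data layout are hypothetical.

```python
from collections import Counter

def stub_counts(values, nets):
    """Count stub categories, including net categories.

    values holds one original code per respondent (V5 in the example).
    nets maps each net code to the set of original codes it combines,
    e.g. {6: {1, 2}, 7: {4, 5}}.
    Returns (counts per code, total excluding the net codes).
    """
    counts = Counter(values)
    for net_code, members in nets.items():
        counts[net_code] = sum(counts[m] for m in members)
    total = sum(n for code, n in counts.items() if code not in nets)
    return counts, total
```

Because the nets are excluded from the total, adding them to the stub does not inflate the base used for percentages.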
Weighting
Weighting is useful when the true incidence in the
population is
known, but data collection yielded a different incidence. In other words,
there was a sampling error (the sample does not adequately represent the
population). Weighting can be used to mathematically increase or decrease the
counts of any banner variables so they more accurately reflect the known
population parameters.
The WEIGHT command in StatPac will create a weighted
file using integer case weights where a probability function is used for the
non-integer portion of the weights. The WT option in the BANNERS command will
not create a new data file, but rather, simply adjusts the counts in the
banner table. The WEIGHT command and the WT option in banners are different
methods of accomplishing the same goal and should not be used together in the
same procedure.
Weighting the Entire Banner Table
Take for example a simple banner table with an automatic
total row and a mean row:
Title (#)
Banners V1 By Total Gender
Options AT=Y AM=Y PC=Y
..
The table might look like this:
Looking at the Total row, we see that our sample had
64.6% males and 35.4% females. However, we know that the population actually
has 55% males and 45% females, so the Total column might be producing an
inaccurate reflection of the total population due to a sampling error. To
correct the problem, we would weight the gender variable so the table
reflects the 55% and 45% proportions that we know exist in the population.
The first step is to calculate the weights for males and
females. The weights are easily calculated by the following formula:
Weight = Desired Percentage / Observed Percentage
Thus, the weight for males would be 55 divided by 64.6 =
.8514 and the weight for females would be 45 divided by 35.4 = 1.2712.
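The same arithmetic, written as a short Python sketch (the function name is hypothetical):

```python
def case_weight(desired_pct, observed_pct):
    """Weight = Desired Percentage / Observed Percentage."""
    return desired_pct / observed_pct

male_weight = round(case_weight(55, 64.6), 4)     # weight for males
female_weight = round(case_weight(45, 35.4), 4)   # weight for females
```

Multiplying each observed percentage by its weight returns the desired percentage, which is a quick way to check the weights before using them.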
Typically, you'll create a variable
that contains the weight for each case. Subsequent procedures would specify
the WT option to weight the entire banner tables by the case weight variable:
STUDY SEGMENT
NEW (N7) "CaseWeight"
IF GENDER = 1 THEN COMPUTE CaseWeight = 0.8514
IF GENDER = 2 THEN COMPUTE CaseWeight = 1.2712
SAVE
..
Banners V1 By Total Age Gender Group
Options WT=(CaseWeight)
..
Banners V11-V20 By Total Age Gender Group
Options WT=(CaseWeight)
..
Weighting Individual Banner Variables in the Table
The other form of the WT option lets you weight
individual banner variables with their own weights.
The format for the WT option when you
want to weight just one banner variable is:
OPTIONS WT=(variable code=weight code=weight)
Spaces or commas may be used within the parentheses to
separate each of the components of the option.
In this example, the codebook specifies 1=Male and
2=Female so the WT options would use codes of 1 and 2.
Title (#)
Banners V1 By Total Gender
Options AT=Y AM=Y PC=Y WT=(Gender 1=.8514 2=1.2712)
..
Rerunning the procedure would produce a weighted
analysis with an adjusted total row and total column.
More than one banner variable may be weighted. The
syntax is the same except additional sets of parentheses are added for each
variable to be weighted.
OPTIONS WT=(variable code=weight code=weight) (variable
code=weight code=weight code=weight)
When the WT option is used, the total column will
reflect the weighted values of the variable that follows it. If more than one
variable is weighted, it would be wise to specify more than one total column.
For example, if ethnicity were coded as an alpha variable (W=White and
B=Black), the following commands would produce a total column for gender and
a total column for ethnicity, and both would be weighted:
Banners V1 By Total Gender Total Ethnicity
Options AT=Y AM=Y PC=Y WT=(Gender 1=.8514 2=1.2712)(Ethnicity
W=.5672 B=1.8141)
..
Fractional Counts
The FC option may be used in conjunction with the WT
option to display fractional cell counts. FC=Y will show the decimal portion
of the cell counts and FC=N will display them as integers. While weighting
does create fractional cell counts, it is often confusing (e.g., how could
there be 178.6 males?). Using FC=N will round all cell counts to whole
numbers, while FC=Y will show the decimal portions.
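A minimal Python sketch of weighted cell counts and the effect of the FC option follows. It assumes that each count is simply multiplied by the weight for its code, and that fractional counts are shown to one decimal place; neither detail is stated in the manual, and the function name is hypothetical.

```python
def weighted_counts(counts, weights, fc=False):
    """Apply per-code weights to unweighted counts.

    counts maps value codes to unweighted counts.
    weights maps value codes to weights; codes without a weight use 1.
    fc=True keeps one decimal place (assumed), fc=False rounds to integers.
    """
    result = {}
    for code, n in counts.items():
        weighted = n * weights.get(code, 1.0)
        result[code] = round(weighted, 1) if fc else round(weighted)
    return result
```

For example, 210 males with a weight of .8514 would be shown as 178.8 when FC=Y and as 179 when FC=N.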
Supplemental Heading
The supplemental heading is a line of text that will
appear after the heading and title, but before the banner table. It may
contain any text and should be enclosed in quotes. When the pound
symbol (#) is used in the supplemental heading, it will be printed as the
number of cases. The SH option is usually used to indicate who is included in
the banner table. The following is an example of a supplemental heading:
OPTIONS SH="BASE: ALL RESPONDENTS (N=#)"
N Equals
The sample size can be displayed in the top left corner
of the table with the NE option. It may contain any text and should be
enclosed in quotes. When the pound symbol (#) is used in the N Equals
option, it will be printed as the number of cases. The NE option is usually
used to indicate who is included in the banner table. The following is an
example of the N Equals option:
OPTIONS NE="(N=#)"
Significance Tests
StatPac offers significance testing in banner tables. To
bypass all significance testing, set the ST option to none (ST=N). The
following options control the type of significance tests:
OPTIONS ST=N   (no significance tests)
OPTIONS ST=P   (t-tests between percents only)
OPTIONS ST=M   (t-tests between means only)
OPTIONS ST=T   (t-tests between percents and means)
OPTIONS ST=C   (chi-square tests for each subtable)
OPTIONS ST=A   (t-tests between means and percents and chi-square tests)
T-Tests Between Proportions and Means
Two-tailed t-tests between column percents and means can
be performed with the ST option. When specified, StatPac will automatically
set the banner to include a code letter for each column, beginning with
column "A". An independent samples t-test will be performed between
all combinations of banner columns, and the results will be displayed in the
table if they are significant at the alpha levels set by the C1 and C2
options.
Upper case letters indicate "high significance" and
lower case letters indicate "moderate significance" (high and
moderate being defined by the values of C1 and C2). For example, suppose
C1=.05 and C2=.01. After running the analysis, you see a cell with the
letters "Ce". This means that the percentage in this cell is
significantly different from the percentage in column C at the .01 level, and
significantly different from the percentage in column E at the .05 level.
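The letter coding can be sketched in Python. Two assumptions are made and should be kept in mind: the test between two column percents is written here as a pooled normal-approximation test (the manual does not give StatPac's exact test statistic), and C2 is taken to be the stricter alpha level, as in the example above. All names are hypothetical.

```python
import math

def two_tailed_p(p1, n1, p2, n2):
    """Two-tailed probability for the difference between two independent
    proportions (given as fractions between 0 and 1).

    Assumed method: pooled standard error with a normal approximation.
    The pooled proportion must not be exactly 0 or 1.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def significance_letter(p, column_letter, c1=0.05, c2=0.01):
    """Return the letter to print for a comparison with another column."""
    if p <= c2:
        return column_letter.upper()   # high significance
    if p <= c1:
        return column_letter.lower()   # moderate significance
    return ""                          # not significant: nothing printed
```

A cell showing "Ce" would correspond to significance_letter returning "C" for the comparison with column C and "e" for the comparison with column E.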
Chi-Square Tests
Banner crosstab tables may be broken down into several
combinations of smaller tables, consisting of one variable on each axis. For
example, the following BANNERS statement could be broken down into three subtables:
BANNERS V1 BY V2, V6, V9
The subtables would be V1 by V2, V1 by V6, and V1 by V9.
It is then possible to calculate a chi-square statistic for each subtable.
Use the option ST=C to request a chi-square analysis for all the combinations
of subtables. The chi-square, degrees of freedom and probability of chance will be printed for
each subtable.
It is not possible to calculate chi-square statistics
for tables with completely missing rows or columns; therefore, if any row or
column in a subtable is completely missing, it will not be included in the
calculation of the chi-square statistic or degrees of freedom (even though it
may be displayed in the count/percent table).
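The handling of empty rows and columns can be illustrated with a Python sketch of the standard Pearson chi-square calculation. This is a textbook formula offered as an illustration under that assumption, not StatPac's source code; the function name is hypothetical, and the probability is omitted because it requires a chi-square distribution function.

```python
def chi_square(table):
    """Return (chi-square, degrees of freedom) for a subtable of counts.

    Rows and columns whose counts are all zero are left out of both the
    statistic and the degrees of freedom. The table must contain at
    least one non-zero count.
    """
    rows = [r for r in table if sum(r) > 0]
    keep = [j for j in range(len(table[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df
```

A three-by-three table with one empty row and one empty column therefore yields the same result as the two-by-two table that remains.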
Example of a Two-Way Chi-Square Statistics Printout
When ST=A, all types of significance testing will be
performed. The output will include the t-tests between percents and means and
two-way chi-square tests.
Yates' Correction
If degrees of freedom equals one (i.e., when the banners
program produces a two-by-two table), the chi-square statistic can have
Yates' correction applied. The option YA=Y will enable Yates' correction for
two-by-two tables, while YA=N will disable it.
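For reference, the usual form of the continuity-corrected chi-square for a two-by-two table is sketched below in Python. The manual does not print the formula, so this is the standard textbook version offered as an assumption; the function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the table

        a  b
        c  d

    All row and column totals must be greater than zero.
    """
    n = a + b + c + d
    # The correction subtracts n/2 from |ad - bc|, but never below zero.
    adjusted = max(0.0, abs(a * d - b * c) - n / 2)
    return n * adjusted ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

The corrected statistic is never larger than the uncorrected one, so the correction makes the test more conservative.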
Zero Rows & Columns
You may choose whether or not to print zero rows and
columns. This situation (of zero rows or columns) could occur if there are
value labels in the study design for which there is no data. If you want the
reader of your report to know that a category exists, you will probably want
to print rows and columns with zero counts (ZR=Y ZC=Y). In most cases,
however, conserving space is more important, so you would set ZR=N and ZC=N.
Automatic Page Title Creation
When performing a series of banner analyses, each having
the same banner columns, and only one y axis variable (per page), it may be
desirable to make the page title the same as the y axis variable label. When
the title is set to a pound symbol in parentheses, (#), the title will become the
variable label for the y axis variable. (This can be changed to the x axis
label using a patch.)
For example, let's say you had several demographic
variables as your banner points, and you wanted to look at several other
variables on the y axis (down the stub). You want a series of tables
that look like this:
Special Study (Page Heading)
The variable label of the "Some Variable" on the y axis (Title)
                  Age           Sex              Income
                Under 21    Male   Female    Low   Middle   High
Some Variable
The following procedure would
produce five similar tables, each on a different page, and each with a
different title:
STUDY Yourstudy
HEADING Special Study
TITLE (#)
BANNERS V1-V5 BY AGE SEX INCOME
OPTIONS CO=N SH=""
..
Total Row Position
When the TOTAL keyword is embedded
between other variables in the banners command line, the TP option is used to
determine whether the total should be printed for the previous variable or
the next variable. In the following example, the total row could be the last
stub for V1 or it could be the first stub for V2, depending on the setting of
the TP option.
BANNERS V1 TOTAL V2 BY AGE SEX INCOME
If TP=L, the last row for V1 will be a total row. If
TP=F, the first row for V2 will be a total row.
Total Counts
The TC option makes it possible to print only the counts
(without the percents) in total rows and total columns, even when percentages
are being printed in the rest of the table. If TC=Y, total rows and total
columns will only contain the counts. If TC=N, total rows and columns will be
defined by the PF option, and will contain the same number of values as the
other cells in the table.
Total Adjustment
The TA option may be used to set how total columns are
calculated. If TA=N, the total columns will be based on the number of
non-missing cases for the stub variable. If TA=Y, the total column counts will be
the sum of the counts for the variable that follows it. If there are no
missing data for the banner variable, the counts will be the same, but if the
banner variable that follows the TOTAL keyword has missing data, the counts
in the total column will be different. Thus, when setting TA=Y, you
could insert the word TOTAL before each banner variable and each total column
might contain different counts.
Total Row Denominator
Normally, a total row will be based on the same
denominator as specified by the PB option (either N, the number of respondents, or R,
the number of responses). If PB is set to R, you can force the percentages
in a total row to be calculated using N as the denominator by setting TD=Y.
This is sometimes handy when the banner contains multiple response variables.
Total Total Intersections
When printing a table that contains both total rows and
total columns, there will be at least one intersection of a total row and a
total column. You must set the precedence as to how the intersection cell is
calculated. It may be based on the sum of the row counts (TT=R) or the sum of
the column counts (TT=C).
Automatic Total Row
The AT option may be used to automatically print a total
row for each variable on the stub. Its purpose is to eliminate the necessity
of having to type the TOTAL keyword for each of the stub variables. In the
previous example, if you wanted each stub variable to begin with a total row,
the command would be:
BANNERS TOTAL V1 TOTAL V2 TOTAL V3 TOTAL V4 TOTAL V5 BY AGE SEX INCOME
OPTIONS CO=N
..
If the AT option is set to "Y", the total rows
will be included in the output, even when the TOTAL keyword is not included
in the stub variable list. The following procedure would produce the
same output as the previous procedure:
BANNERS V1-V5 BY AGE SEX INCOME
OPTIONS CO=N AT=Y TR=F
..
The TR option is used in conjunction with the AT option
to determine whether the total row will be the first or last row on the stub.
Note that if you use the option AT=Y, then you should not use the TOTAL
keyword anywhere in the stub variable list.
Automatic Mean Row
The AM option may be used to automatically print a row
of means for each variable on the stub. Its purpose is to eliminate the
necessity of having to type the MEAN keyword for each of the stub variables.
In the previous example, if you wanted each stub variable to include a row of
means, the command would be:
BANNERS V1 MEAN V2 MEAN V3 MEAN V4 MEAN V5 MEAN BY AGE SEX INCOME
OPTIONS CO=N
..
If the AM option is set to "Y", a row of means
will be included in the output, even when the MEAN keyword is not included in
the stub variable list. The following procedure would produce the same output
as the previous procedure:
BANNERS V1-V5 BY AGE SEX INCOME
OPTIONS CO=N AM=Y
..
Labeling and Spacing Options
Option                          Code  Function
Labeling                        LB    Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width                  LW    Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width                     EW    When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Width                    CW    Sets the minimum width of the columns (in inches).
Column Spacing                  CS    Sets the spacing (in inches) between the columns.
Variable Spacing                VS    Sets the spacing (in inches) between the banner variables.
Key                             KY    Sets whether the top left corner of the banner will show a legend of the cell contents.
Print Percent Symbol            PP    Sets whether percentage symbols will be shown.
Decimal Places                  DP    Sets the number of decimal digits that will be shown.
Print Codes                     PC    Sets whether the code (to the left of the equals symbol) will be shown with the value labels.
Underline Stub Variable Labels  UL    Sets whether the stub variable labels will be underlined.
Bottom Justify                  BJ    Sets whether the banner labels will be bottom justified.
Heading Justification           HJ    Sets the justification for the banner variable labels.
Bottom Justify Heading          BH    Sets whether the banner variable labels will be bottom justified.
Value Label Justification       LJ    Sets the justification for the banner value labels.
Bottom Justify Value Labels     BL    Sets whether the banner labels will be bottom justified.
Extra Spacing                   ES    When ES=Y, a blank line will be printed above and below the banner value labels. When ES=N, no blank lines will be printed.
Code Justification              CJ    Sets the justification for the value label codes.
Print Stub Variable Label       SL    Sets whether the stub variable labels are shown.
Stub Variable Spacing           VY    Sets the number of blank rows between variables on the stub.
Stub Label Spacing              LY    Sets the number of blank rows between value labels on the stub.
Minimum Cell Count
Researchers often choose not to show the percentages for
cells containing a small N. The MC option may be used to suppress the
printing of percentages of cells with low counts. For example, if MC=5, StatPac
will print percentages for cells with counts that are greater than or equal to 5. If a cell has a
count of less than 5, StatPac would print dashes instead of the percent. If
MC=1, StatPac will print dashes for cells where the count is zero. Valid
values for MC are between 0 and 100. If MC=0, all percentages will be
printed.
Minimum Denominator
Percentages can be misleading if they are based on a
small denominator. The MD option may be used to suppress the printing of
percentages that are based on a small denominator. The MD option sets the
minimum denominator that StatPac will use for calculating percents. For
example, if MD=5, StatPac will calculate percentages if the denominator is
greater than or equal to 5. If a denominator were less than 5, StatPac would
print dashes instead of the percent. Valid values for MD are between 0 and
100. If MD=0, all percentages will be printed.
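The MC and MD rules can be sketched in a few lines of Python. This is only an illustration of the suppression logic described above, not StatPac's own code. The function name format_percent, the two-dash placeholder, the one-decimal formatting, and the guard against a zero denominator are all assumptions made for the example.

```python
def format_percent(count, denominator, mc=0, md=0, decimals=1):
    """Return the percent text for one cell, or dashes when it is suppressed."""
    # Suppress when the cell count is below MC or the denominator is below MD.
    # The zero-denominator guard is an assumption added to avoid dividing by zero.
    if count < mc or denominator < md or denominator == 0:
        return "--"
    return f"{100.0 * count / denominator:.{decimals}f}%"


print(format_percent(4, 40, mc=5))  # count below MC, so dashes are shown
print(format_percent(5, 40, mc=5))  # count meets MC, so the percent is shown
print(format_percent(2, 4, md=5))   # denominator below MD, so dashes are shown
```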
Descriptive statistics are usually the first step in the
analysis of interval or ratio data. They reveal central tendency and the
shape of the distribution.
The syntax of the command to run descriptive statistics
is:
DESCRIPTIVE <Variable list>
For example, if you are examining college entrance exam
scores for READING (V7), ARITHMETIC (V12) and VERBAL (V19) skills, descriptive
statistics could be requested with any of the following commands:
DESCRIPTIVE READING, ARITHMETIC, VERBAL
DESCRIPTIVE V7, V12, V19
DE V7 V12 V19
(DESCRIPTIVE may be abbreviated as DE)
A wide variety of descriptive statistics is available.
To print or exclude individual statistics, use the appropriate option.
Missing data (blanks) will always be excluded from the
calculation of descriptive statistics. It will be reported as the number of
missing cases but will not be used for any calculations.
One Analysis
The one-analysis option
allows you to print selected descriptive statistics for several variables on
one page. This option is especially useful for summary reporting, when you
only need a few descriptive statistics for a large number of variables.
All the variables specified with the OA option must be
numeric. For example, suppose that variables 25-34 are ten numeric scores.
The following commands would produce a one-page summary of selected
descriptive statistics for each of the ten items:
DESCRIPTIVE V25-V34
OPTIONS OA=Y
Example of a Descriptive Statistics One-Analysis Printout
Statistics
When the OA option is specified, you may select which
descriptive statistics you want with the ST option. The codes for the ST
option are the same as the specific statistic codes described below. (The
only exception is NC, which stands for the number of valid cases). For
example, the following commands would report the mean, median, unbiased
standard deviation and number of cases for variables 25-34:
DESCRIPTIVE V25-V34
OPTIONS OA=Y ST=(ME MD US NC)
Note that the commands are identical to the previous example
except the ST option is used to identify the specific statistics you want
calculated. The parentheses around the list of statistics are mandatory.
Sort Variables
When performing descriptive statistics with the
one-analysis option (OA=Y), you can sort the variables by the contents of the
first column of the results. The SV (sort variables) option may be set to
"N" for no sort, "A" to sort in ascending order, or
"D" to sort in descending order. When no sort is specified, the
variables will be listed in the order that they appear in the analysis
command variable list. The SV option is applicable only when the OA=Y option
is specified.
Additionally, a digit may be added as a suffix to
SV=A or SV=D. The digit is the number of variables at the end of the list
that are excluded from the sort. This is useful when the last variable is an
"other" variable, and you want to sort the variables but still leave the
"other" variable as the last one. For example, SV=D1 would sort the variables
in descending order, except it would leave the last variable as the last row
regardless of its value.
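The effect of the SV option, including the digit suffix, can be pictured with a short Python sketch. It is an illustration only: the function name sort_rows and the (label, value) row layout are hypothetical, and the labels and values in the example are made up.

```python
def sort_rows(rows, sv="N"):
    """Sort (label, value) rows the way the SV option is described.

    sv is "N", "A" or "D", optionally followed by a digit string giving how
    many trailing rows stay fixed at the bottom (for example "D1").
    """
    order = sv[0].upper()
    keep = min(int(sv[1:] or 0), len(rows))  # rows left in place at the end
    if order == "N":
        return list(rows)
    split = len(rows) - keep
    head, tail = list(rows[:split]), list(rows[split:])
    head.sort(key=lambda row: row[1], reverse=(order == "D"))
    return head + tail


rows = [("Price", 3.1), ("Quality", 4.5), ("Service", 3.9), ("Other", 4.9)]
for label, value in sort_rows(rows, "D1"):
    print(label, value)  # "Other" stays last even though its value is highest
```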
Minimum, Maximum, Range, & Sum
There are four very simple statistics that
give an overall picture of the data. These are the minimum data value,
maximum data value, range (maximum minus the minimum), and sum of the data.
An option line that would enable all of these features is:
OPTIONS MI=Y MA=Y RA=Y SU=Y
Mean, Median, & Mode
The best known descriptive statistics are the mean,
median and mode. They describe the central tendency of a distribution.
The mean (average) is the most popular. It is found by adding the values for
all the (non-missing) cases and dividing by the number of (non-missing)
cases. For example, to find the mean age of all your friends, add all their
ages together and divide by the number of friends. The mean can
present a distorted picture of central tendency if the sample is skewed in
any way.
For example, let's say five people take a test. Their
scores are 10, 12, 14, 18, and 94. (The last person is a genius.) The
mean would be the sum of the scores (10+12+14+18+94 = 148) divided by 5. In
this example, a mean of 29.6 is not a "good" measure of how well people
did on the test in general. When analyzing data, be careful of using only the
mean when the sample has a few very high or very low scores. These
scores tend to skew the shape of the distribution and will distort the mean.
The median provides a measure of central tendency such
that half the sample will be above it and half the sample will be below it.
For skewed distributions this is a better measure of central tendency. In the
previous example, 14 would be the median for the sample of five people.
The mode is the most common score or category, the one
which occurred most frequently. It is possible to have more than one mode if
there is not a single "most frequent score". For example, the
following set of data has two modes: 12 and 16.
12 12 12 13 14 15 15 16 16 16 17 18
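The two worked examples above can be checked with Python's standard statistics module. This is a verification sketch, not part of StatPac. The helper non_missing is a hypothetical name, and representing a blank as None is an assumption made for the example.

```python
from statistics import mean, median, multimode


def non_missing(values):
    """Drop missing entries (represented here as None) before any calculation."""
    return [v for v in values if v is not None]


scores = non_missing([10, 12, 14, 18, 94, None])
print(mean(scores))    # 29.6
print(median(scores))  # 14

# multimode returns every value that shares the highest frequency.
print(multimode([12, 12, 12, 13, 14, 15, 15, 16, 16, 16, 17, 18]))  # [12, 16]
```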
The distribution of many variables follows that of a
bell-shaped curve. This is called a "normal distribution". One must assume that data is
approximately normally distributed for many statistical analyses to be valid.
When a distribution is normal, the mean, median and mode in the population will all be equal.
If they are not equal, the distribution is distorted in some way.
Skewness, Kurtosis, & Kolmogorov-Smirnov
There are basically two ways that a distribution can be
distorted: skewness and kurtosis. Skewness refers to "top heavy" or
"bottom heavy" (i.e., the tail of the curve). If the longest tail
of the curve goes to the right (the curve is top heavy), it is positively
skewed. If it is bottom heavy (the longest tail of the curve goes to the
left), it is negatively skewed. A value of zero for skewness represents a
symmetrical distribution, such as the normal distribution mentioned above.
Kurtosis refers to how peaked or flat the curve is. A
very flat curve is called "platykurtic" and has a kurtosis of less
than three. A very peaked curve is called "leptokurtic" and has a
kurtosis greater than three. A value of three for kurtosis indicates normal
peakedness and the distribution is termed "mesokurtic".
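As a rough illustration of these two measures, the following Python sketch computes moment-based skewness and kurtosis, scaled so that a normal distribution has a kurtosis of three. The manual does not show the exact formulas StatPac uses (for example, whether a small-sample adjustment is applied), so treat the function skewness_kurtosis as an assumption rather than a reproduction of the program.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis) from the central moments of the data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


skew, kurt = skewness_kurtosis([10, 12, 14, 18, 94])
print(skew > 0)  # True: the long tail goes to the right (positive skew)

skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt < 3)  # symmetric data has zero skewness; this set is flat
```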
The Kolmogorov-Smirnov statistic provides a quick check
to determine the degree of normality in the data. The value provides a
relative indication of normality; as the value moves further away from zero,
we can be more certain that the data does not approximate a normal
distribution. The distribution is non-normal:
at the .15 level if KS > .775
at the .10 level if KS > .819
at the .05 level if KS > .895
at the .025 level if KS > .955
at the .01 level if KS > 1.035
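Reading the table above amounts to a simple lookup, sketched here in Python. The function name ks_significance is hypothetical, and the sketch only interprets a KS value that has already been printed; it does not compute the statistic itself.

```python
# Critical values copied from the table above, strictest level first.
KS_CRITICAL = [(0.01, 1.035), (0.025, 0.955), (0.05, 0.895),
               (0.10, 0.819), (0.15, 0.775)]


def ks_significance(ks):
    """Return the strictest listed level at which the data is non-normal.

    Returns None when the KS value does not exceed any critical value.
    """
    for level, critical in KS_CRITICAL:
        if ks > critical:
            return level
    return None


print(ks_significance(0.90))  # 0.05
print(ks_significance(0.50))  # None
```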
Standard Deviation & Variance
The standard deviation is a very useful statistic that
measures the dispersion of scores around the mean. When the data is
approximately normally distributed, about 68 percent of all the scores will be
within plus or minus one standard deviation of the mean and about 95 percent
of all scores will be within two standard deviations of the mean.
The variance is calculated directly from the
distribution of raw scores. It is the sum of the squared deviations of each
score from the arithmetic mean divided by N. The standard deviation is simply
the square root of the variance. The unbiased estimates should be used when
the data is a sample drawn from a population. The formula for the unbiased
estimates of the variance and standard deviation is the same except that N-1
is used in the denominator.
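The difference between the two denominators is easy to see in a short Python sketch. The function name variance and its unbiased flag are names chosen for this example only, and the data values are made up.

```python
from math import sqrt


def variance(values, unbiased=False):
    """Sum of squared deviations from the mean, divided by N or by N-1."""
    n = len(values)
    m = sum(values) / n
    squared = sum((x - m) ** 2 for x in values)
    return squared / (n - 1 if unbiased else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), sqrt(variance(data)))          # 4.0 2.0
print(variance(data, unbiased=True))                 # 32/7, slightly larger
```

The unbiased estimate is always the larger of the two, and the gap shrinks as the number of cases grows.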
Standard Error & Confidence Intervals
Confidence intervals are very important. They allow us
to predict where the mean would fall if another sample is taken. The standard
error of the mean is used to estimate the range within which we would expect
the mean to fall.
Let's say the 95 percent confidence interval for the
mean is 12.4 to 22.8. In repeated samples of the same size, the mean would be
expected to fall between these two values 95 percent of the time. A similar
interpretation can be made for the 99 percent confidence interval. The 95 and
99 percent confidence intervals may be requested using the C5 and C9 options
respectively:
OPTIONS C5=Y C9=Y
The standard error of the mean (the standard deviation divided by the
square root of the number of cases) is used without adjustment when the
sample size is small relative to the population size (say, less
than ten percent). When the sample size represents a substantial proportion
(greater than ten percent) of the population, the standard error is modified
by the finite population correction factor. This has the effect of
reducing the standard error and narrowing the confidence interval band. When
the FP option is used to specify a population size, the standard error will
be adjusted and printed as the "Corrected Standard Error Of The
Mean". (See the OPTIONS command in the Keywords section of this manual
for information on using the FP option.)
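The following Python sketch shows how a standard error, a finite population correction, and a confidence interval fit together. It rests on several assumptions that the manual does not confirm: the correction factor is taken to be the usual sqrt((N-n)/(N-1)), the interval uses a normal critical value (StatPac may use the t distribution instead), and the function name mean_confidence_interval and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95, population=None):
    """Return (low, high, standard_error) for the mean of a sample."""
    n = len(values)
    se = stdev(values) / sqrt(n)  # unbiased standard deviation over sqrt(n)
    if population is not None:
        # Assumed finite population correction: sqrt((N - n) / (N - 1)).
        se *= sqrt((population - n) / (population - 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    m = mean(values)
    return m - z * se, m + z * se, se


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_confidence_interval(sample))                 # no correction
print(mean_confidence_interval(sample, population=20))  # narrower interval
```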
Confidence intervals are accurate only if the
distribution of the data resembles a normal curve. Be careful; using
confidence intervals from non-normal data is risky business.
Example of a Descriptive Statistics Printout
Quartiles & General "iles"
Quartiles are often used in education to divide a
distribution into 4 groups of equal N. A quartile printout will contain three
values (one less than the number of groups). If, for example, the value for
the first (lowest) quartile is 50, it means that 25% of the sample had a
score of 50 or less. You can specify any division with the IL option.
For example, if you specify IL=10, then deciles will be
printed. If the ninth decile (highest) value is 85, it means that 90% of the
distribution had a score of 85 or less, and 10% scored equal to or higher
than 85. The "ile" values are interpolated when necessary. Set
IL=1 to disable the "any iles" option.
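The idea behind the IL option can be tried out with the quantiles function in Python's statistics module. This is a sketch under stated assumptions: the manual does not describe StatPac's interpolation rule, so the cut points below may differ slightly from a StatPac printout, and the function name iles is hypothetical.

```python
from statistics import quantiles


def iles(values, il=4):
    """Return the il-1 cut points that split the data into il equal groups."""
    if il <= 1:
        return []  # IL=1 disables the feature, so there is nothing to report
    return quantiles(values, n=il)


scores = list(range(1, 101))
print(iles(scores, 4))        # three quartile values
print(len(iles(scores, 10)))  # nine decile values
```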
Example of a Quartile Printout
Labeling and Spacing Options
Option | Code | Function
Labeling | LB | Sets the labeling to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing | CS | Sets the spacing (in inches) between the columns.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Label Justification | LJ | Sets the justification for the banner variable label.
Extra Spacing | ES | When ES=Y, a blank line will be printed between each variable on the stub. When ES=N, no blank lines will be printed.

The breakdown program gives descriptive statistics for
one or more criterion variables broken down by one or more subgroup
variables. In other words, the breakdown program provides a way of
summarizing descriptive statistics for many subgroups. The same information
could be obtained by performing multiple descriptive statistics analyses
using the IF-THEN-SELECT command to limit each analysis to the desired
subgroup.
The syntax for the command to invoke the breakdown
program is:
BREAKDOWN <Criterion var. list> BY <Subgroup var. list>
For example, let's say you want descriptive statistics
for AGE; however, you want these statistics broken down by RACE, SEX and
INCOME level. In other words, you are interested in comparing age for each of
the subgroups (e.g., average age of males versus average age of females).
A data file for this analysis would look like this:
AM136   (record 1: race is coded as A, sex is coded as M, income level is coded as 1, age is 36)
BF342   (record 2: race is coded as B, sex is coded as F, income level is coded as 3, age is 42)
The criterion variable is AGE (V4). It is this variable
that you will be calculating descriptive statistics for, so it must be
interval or ratio-type data.
Up to ten subgroup variables may be included in the
subgroup variable list. These variables may
be either alpha or numeric. In our example, these would be: RACE (V1), SEX
(V2) and INCOME (V3). Each of the subgroup variables may contain up to 100
categories (value labels).
Any of the following commands would perform the
analysis:
BREAKDOWN AGE BY RACE, SEX, INCOME
BREAKDOWN AGE BY RACE - INCOME
BREAKDOWN AGE BY RACE SEX INCOME
BR V4 BY V1 - V3   (BREAKDOWN may be abbreviated as BR)
Notice that the keyword BY is mandatory. This is
necessary because you may want a breakdown on several criterion variables.
That is, several different variables may be broken down by the same subgroup
variables.
When a criterion variable list is specified, it is
equivalent to performing a different breakdown for each criterion variable.
For example, both AGE and IQSCORE could be broken down by RACE, SEX and
INCOME:
BREAKDOWN AGE IQSCORE BY RACE SEX INCOME
In this example, two tasks will be performed, one for each variable in
the criterion variable list (AGE and IQSCORE). They are:
BREAKDOWN AGE BY RACE SEX INCOME
BREAKDOWN IQSCORE BY RACE SEX INCOME
When specifying a criterion variable list, care must be
taken to ensure that each variable in the criterion variable list is
different from those in the subgroup variable list. That is, a variable
cannot be broken down by itself.
The output from the breakdown program will print the
mean, standard deviation, number of cases, and percent for each of the
subgroups.
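What the breakdown program reports for one subgroup variable can be sketched in Python as follows. This is an illustration only. The function name breakdown and the dictionary layout are hypothetical, the first two records come from the data file example above, and the last two records are made up so that each subgroup has more than one case.

```python
from collections import defaultdict
from statistics import mean, stdev


def breakdown(records, criterion, subgroup):
    """Mean, standard deviation, count and percent of criterion per subgroup."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(criterion)
        if value is not None:  # missing criterion values are skipped
            groups[record.get(subgroup)].append(value)
    total = sum(len(values) for values in groups.values())
    return {
        code: {
            "mean": mean(values),
            "sd": stdev(values) if len(values) > 1 else 0.0,
            "n": len(values),
            "percent": 100.0 * len(values) / total,
        }
        for code, values in groups.items()
    }


records = [
    {"RACE": "A", "SEX": "M", "INCOME": 1, "AGE": 36},
    {"RACE": "B", "SEX": "F", "INCOME": 3, "AGE": 42},
    {"RACE": "A", "SEX": "F", "INCOME": 2, "AGE": 30},  # made-up record
    {"RACE": "B", "SEX": "M", "INCOME": 1, "AGE": 44},  # made-up record
]
for code, stats in breakdown(records, "AGE", "SEX").items():
    print(code, stats)
```

Running the function once per subgroup variable (RACE, SEX, INCOME) corresponds to the single command BREAKDOWN AGE BY RACE SEX INCOME.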
Example of a Breakdown Printout
Sort Type & Sort Order
The output from the breakdown analyses may be more meaningful
when the subgroup categories are displayed in sorted order. If no sort is
selected (ST=N), the subgroup categories will be displayed in the order they
appear in the study design. If the study design does not contain all the
values in the data file (such as mis-punched data), the unlabeled values
will appear on the printout in the order that they are encountered in the
data file.
You can sort the subgroup categories by frequency of
response using the option ST=F, or by the category codes themselves (ST=C).
For example, the following options would sort the categories by frequency of
response in descending order:
OPTIONS ST=F SO=D   (Sort Type is by frequency of response; Sort Order is descending)
In most cases, you'll probably want to have the
breakdown printout appear in ascending order by the code. The options
statement to do this is:
OPTIONS ST=C SO=A   (Sort Type is by category code; Sort Order is ascending)
Notice that this type of sort is generally the way the
information would be listed in the study design. If this is the case, sorting
by category code will have no effect. Sorting by category codes is useful if
you did not enter value labels for the subgroup variable.
Print Missing
When the value of a subgroup variable is missing for a case, that case may
be included in or excluded from the analysis with the PM option. When PM=Y,
all cases with a missing subgroup value will be grouped into a unique category
and descriptive statistics for the criterion variable will be reported for the
"missing category".
Category Creation
Sometimes there may be a subgroup category listed in the
study design that has no accompanying data. For instance, nobody in the
sample may be over 60 years old. Whether or not you want the label to appear
with a count of zero is a matter of preference.
The actual categories (value labels) in the breakdown
analysis can be created either from the study design value labels (CC=L) or
from the data itself (CC=D). When the categories are created from the labels,
the value labels themselves will be used to create the categories for the
analysis, and data that does not match up with a value label code will be
counted as missing. When categories are created from the data, all data will
be considered valid whether or not there is a value label for it.
Percentage Base
In addition to means and standard deviations, the
breakdown analysis also prints counts and percents for each of the
categories. The denominator for the percentages can either be the number of
respondents or the total number of responses. If PB=N, the denominator for
calculating percentages will be the number of respondents (i.e., the number
of records in the data file). If PB=R, the denominator will be the total
number of responses for all individuals.
Labeling & Spacing Options
Option | Code | Function
Labeling | LB | Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub when OA=Y.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing | CS | Sets the spacing (in inches) between the columns when OA=Y.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Print Percent Symbol | PP | Sets whether the percent symbol is shown.
Print Codes | PC | Sets whether the codes are printed with the value labels.
Label Justification | LJ | Sets the justification for the banner variable label when OA=Y.
Extra Spacing | ES | When ES=Y, a blank line will be printed after each variable name or label on the stub. When ES=N, no blank lines will be printed.

The t-test is a relatively simple
statistic to test the difference between two means even when the sample sizes
are small (less than 30). The two variables must be interval or ratio-type
data. StatPac lets you test the difference whether the N's are equal or unequal.
The primary advantage of the t statistic is that it allows us to test the
difference between samples with small numbers of cases.
The t distribution depends on the size of the samples.
With small samples, the t distribution is leptokurtic; however, as the sample
size exceeds 30, the t distribution approaches that of the normal curve. The
standard error of the difference is used to establish a range where the
difference between the true means of the two populations would be expected to
fall.
The significance of the t statistic depends upon the
hypothesis the researcher plans to test. This hypothesis should be developed
before collecting the data. If you are interested in determining whether there
is a significant difference between two means, but you do not know which of the
means is greater, use the two-tailed test. If you are interested in testing the
specific hypothesis that one mean is greater than the other, use the
one-tailed test.
There are two basic kinds of t-tests: one for matched
pairs and the other for independent groups.
T-Test For Matched Pairs
If each subject or unit being tested was measured in
both groups, then the appropriate t-test is for matched pairs. To perform
this type of analysis, you must enter the data so that both observations for
a subject are in the same data record. An example of an appropriate use of
the t-test for matched pairs might be to compare pretest and posttest scores
where each person took both a PRETEST (V1) and a POSTTEST (V2). Both values
are contained in each data record. An example of a data file for this
analysis would look like this:
8592   (record 1: Pretest = 85 & Posttest = 92)
7689   (record 2: Pretest = 76 & Posttest = 89)
5276   (record 3: Pretest = 52 & Posttest = 76)
The syntax of the command to perform one or more matched
pairs t-tests is:
TTEST <Variable list> WITH <Variable list>
The keyword WITH is mandatory if a variable list is
specified (i.e., more than one t-test is requested). If only one t-test is
being requested, the keyword WITH may be omitted. In our pretest-posttest
example, any of the following commands could be used:
TTEST PRETEST WITH POSTTEST
TTEST PRETEST POSTTEST
TT V1 WITH V2   (TTEST may be abbreviated as TT)
In a matched pairs t-test, it does not matter which
variable is listed first in the command. Identical results would be obtained
with the command:
TTEST POSTTEST WITH PRETEST
When a variable list is specified as part of the TTEST
command, more than one t-test analysis will be performed. For example, the
following command will perform four t-test analyses (V1 with V9, V1 with V23,
V7 with V9, and V7 with V23):
TTEST V1 V7 WITH V9 V23
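The matched pairs calculation can be sketched in Python using the three pretest and posttest records shown earlier. This is the textbook paired t statistic (the mean difference divided by its standard error); the manual does not print StatPac's formula, so the function name paired_t and the choice of formula are assumptions. The last lines show how two variable lists expand into every pairing.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees_of_freedom) for matched pairs of observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


pretest = [85, 76, 52]
posttest = [92, 89, 76]
print(paired_t(pretest, posttest))
print(paired_t(posttest, pretest))  # same size of t, opposite sign

# Two variable lists are paired in every combination, giving four analyses.
print(list(product(["V1", "V7"], ["V9", "V23"])))
```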
Example of a t-Test for Matched Pairs Printout
T-Test For Independent Groups
The other kind of t-test is for independent groups and
it is used for non-correlated data. If each case in the data file is to be
assigned to one group or the other based on another variable, use the t-test for
independent groups. For example, to compare reading scores between males and
females, split the reading scores into two groups depending upon whether the
person is male or female. (Each record in the data file is assigned to one
group or the other.)
M83   (record 1: male with score of 83)
F91   (record 2: female with score of 91)
F84   (record 3: female with score of 84)
The syntax for the command to perform an independent
groups t-test is:
TTEST <Var. list> WITH <Grouping var. list>=(<Code 1>)(<Code 2>)
As in the matched pairs t-test, the keyword WITH is only
mandatory if a variable list is specified for the analyzed variable or the
grouping variable. If only one t-test is requested, its use is optional.
In the above example, SCORE is the variable under
analysis and SEX is the variable used to assign records to one group or the
other. Either of the following commands would perform this t-test:
TTEST SCORE WITH SEX=(M)(F)
TTEST SCORE WITH SEX=(M/m)(F/f)
In the second example, notice that both upper and lower
case codes are specified; they are separated from each other by a slash. This
is done just in case our data entry operators were not consistent in the way
they entered the data. That is, sometimes a male was designated with an
"M", and other times with an "m". If you are certain that
upper case was always used, you could use the first command.
When a variable list is specified, several t-tests will
be performed. For example, the following statement would request three
different t-tests between males and females (one for each type of score):
TTEST SCORE1 SCORE2 SCORE3 WITH SEX=(M)(F)
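For comparison with the matched pairs case, here is a Python sketch of an independent groups t statistic. It assumes the pooled-variance form of the test, which the manual does not confirm as StatPac's method, and the function name independent_t is hypothetical. The first three records come from the data file example above; the fourth is made up so that each group has two cases.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t, degrees_of_freedom) using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


records = [("M", 83), ("F", 91), ("F", 84), ("M", 78)]  # last record is made up

# Codes separated by a slash, as in SEX=(M/m)(F/f), all fall in one group.
males = [score for sex, score in records if sex in set("M/m".split("/"))]
females = [score for sex, score in records if sex in set("F/f".split("/"))]
print(independent_t(males, females))
```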
There is no limit on the number of codes that can be
specified as part of a group. For example, let's say an INCOME variable is
coded into five income groups:
What is your annual income?
1=Under $10,000
2=$10,000 - $20,000
3=$21,000 - $30,000
4=$31,000 - $40,000
5=Over $40,000
To compare scores for those that make up to $30,000 with
those that make over $30,000 per year, the command would be:
TTEST SCORE WITH INCOME=(1/2/3)(4/5)
When performing a t-test for independent groups, the
program will accept a wide variety of user styles and formats. Two basic
formats are possible. These are:
(<Code>/<Code>) or (<Code>-<Code>)
All of the following would be valid requests when
entering the code(s) or value(s) to split the data into two groups. Notice
that the reserved words LO and HI are valid when specifying a range of codes or
values.
(A/B/D)   (Place all cases with codes A, B or D in this group)
(A-D)     (Place all cases with codes A to D in this group)
(LO-D)    (Place all cases with codes up to D in this group)
(6-9)     (Place all cases with codes 6 to 9 in this group)
(LO-21)   (Place all cases with up to a 21 in this group)
(22-HI)   (Place all cases with a 22 or higher in this group)
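One way to picture how these group specifications select cases is the Python sketch below. It is a hypothetical reading of the rules, not StatPac's parser: the function names are invented, numbers are assumed to sort before letters, and negative values are not handled because the dash is treated as the range separator.

```python
def _key(text):
    """Build a sort key so numeric and alpha codes can both be compared."""
    try:
        return (0, float(text), "")
    except ValueError:
        return (1, 0.0, text)


def in_group(value, spec):
    """Return True if value matches a spec such as (1/2/3), (A-D) or (LO-21)."""
    key = _key(str(value))
    for part in spec.strip("()").split("/"):
        if "-" in part:
            low, high = part.split("-", 1)
            above = low == "LO" or _key(low) <= key    # LO means no lower limit
            below = high == "HI" or key <= _key(high)  # HI means no upper limit
            if above and below:
                return True
        elif _key(part) == key:
            return True
    return False


print(in_group(3, "(1/2/3)"), in_group(4, "(1/2/3)"))      # True False
print(in_group("B", "(A-D)"), in_group("E", "(A-D)"))      # True False
print(in_group(21, "(LO-21)"), in_group(30, "(22-HI)"))    # True True
```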
Example of a t-Test for Independent Groups Printout
Non-Parametric Statistics
The non-parametric equivalents of the t-test can be
requested with the NP=Y option. Either the Wilcoxon test or the Mann-Whitney
U test will be printed depending on whether you
are performing a matched pairs or independent groups t-test.
The Wilcoxon test for correlated samples is the
non-parametric equivalent of the matched pairs t-test. The differences between
the paired values are computed and ranked by absolute size. The Wilcoxon
test statistic is the smaller of the sum of the ranks of the positive
differences and the sum of the ranks of the negative differences.
If the number of cases is greater than or equal to ten, the probability is
calculated from the normal distribution. When there are fewer than ten cases,
refer to the appendix to determine the probability.
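The Wilcoxon statistic can be computed by hand for a small file, and the Python sketch below follows the standard signed-rank procedure. Dropping pairs with a zero difference and giving tied differences their average rank are common conventions that are assumed here; the manual does not say how StatPac treats them. The function name wilcoxon_t is hypothetical.

```python
def wilcoxon_t(first, second):
    """Return the smaller of the positive and negative signed-rank sums."""
    # Pairs with no difference carry no sign and are dropped (assumed convention).
    diffs = sorted((b - a for a, b in zip(first, second) if b != a), key=abs)
    positive = negative = 0.0
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        rank = (i + 1 + j) / 2  # average rank shared by tied differences
        for d in diffs[i:j]:
            if d > 0:
                positive += rank
            else:
                negative += rank
        i = j
    return min(positive, negative)


print(wilcoxon_t([85, 76, 52], [92, 89, 76]))          # 0.0, every score rose
print(wilcoxon_t([10, 20, 30, 40], [12, 19, 35, 36]))  # 4.0
```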
Example of Wilcoxon Statistic Printout
The Mann-Whitney U test is the non-parametric equivalent
of the t-test for independent groups. It may be used to evaluate the
difference between two population distributions. The data from both groups is
first ranked together. The Mann-Whitney U is the number of times that a value
in one group is smaller than a value in the other group.
For sample sizes of less than twenty, refer to the
appendix to find the probability of U. If the sample size is twenty or more,
the distribution approximates the normal distribution, and the normal deviate
will be used to calculate the probability. The Mann-Whitney U may be selected
by using MW=Y or suppressed by using MW=N.
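The counting definition of U translates directly into Python. In this sketch a tie between two values counts as one half, and the smaller of the two possible counts is reported; both are common conventions assumed for the example rather than documented StatPac behavior. The function name mann_whitney_u is hypothetical.

```python
def mann_whitney_u(group1, group2):
    """Count how often a value in one group is smaller than one in the other."""
    u1 = sum(
        1.0 if a < b else 0.5 if a == b else 0.0
        for a in group1
        for b in group2
    )
    u2 = len(group1) * len(group2) - u1  # the same count from the other side
    return min(u1, u2)


print(mann_whitney_u([83, 78], [91, 84]))      # 0.0, the groups do not overlap
print(mann_whitney_u([1, 4, 6], [2, 3, 5]))    # 4.0
```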
Example of Mann-Whitney U Statistic Printout
Labeling and Spacing Options
Option | Code | Function
Labeling | LB | Sets the column headings to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Labeling Width | LW | Sets the maximum width (in inches) for the variable labels on the stub.
Exact Width | EW | When EW=Y, the labeling width for the stub will be exactly what is specified with the LW option. When EW=N, the width of the stub will self-adjust based on the length of the stub labels.
Column Spacing | CS | Sets the spacing (in inches) between the columns.
Decimal Places | DP | Sets the number of decimal digits that will be shown.
Print Percent Symbol | PP | Sets whether the percent symbol is shown.
Label Justification | LJ | Sets the justification for the banner labels.
Blank Lines Between Rows | BL | Sets the number of blank lines between each variable on the stub when OA=Y.

Correlation is a measure of association between two variables.
A correlation coefficient can be calculated for ordinal, interval or
ratio-type data. This program can print descriptive statistics and a
correlation matrix for up to 88 variables.
The syntax of the command to run a correlation analysis
is:
CORRELATE <Variable list>
For example, to run a simple correlation of EDUCATION
and INCOME, you could type any of the following commands:
CORRELATE EDUCATION INCOME
CORRELATE V1, V2
CO V1 V2   (CORRELATE can be abbreviated as CO)
The correlation program can also correlate more than two
variables. For example, to print a correlation matrix of AGE, INCOME, ASSETS
and TESTSCORE, you would type the command:
CORRELATE AGE, INCOME, ASSETS, TESTSCORE
The output would contain a correlation matrix of all
possible combinations of variable pairs. Several statistics can be printed
for each pair of variables. These are the correlation coefficient, number of
valid records, standard error of the estimate, t statistic and probability
of t.
Type of Correlation Coefficient
StatPac can calculate two different kinds of correlation
coefficients: Spearman's rank-difference correlation coefficient and
Pearson's product-moment correlation coefficient. When calculating a
correlation coefficient for ordinal data, choose Spearman's rank-difference
technique. For interval or ratio-type data, select Pearson's product-moment
formula.
It is your responsibility to select the appropriate type
of statistic. This can be accomplished by using the TY option. The TY option
may be specified as S (Spearman's) or P (Pearson's). For example, when
analyzing interval-type variables, type:
OPTIONS TY=P
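The difference between the two coefficients shows up clearly on a small made-up data set. In the Python sketch below, Pearson's coefficient is computed from the raw values, and Spearman's is computed as Pearson's coefficient on the ranks, which matches the rank-difference formula when there are no ties. The function names are hypothetical and the code is not StatPac's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank the values from 1 upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in order[i:j]:
            result[k] = (i + 1 + j) / 2
        i = j
    return result


def spearman(x, y):
    """Rank correlation, computed as Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x, but not in a straight line
print(pearson(x, y))   # a little below 1
print(spearman(x, y))  # 1.0, because the rank order is identical
```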
Descriptive Statistics
Descriptive statistics can be selected or rejected with
the option DS=Y or DS=N. If Pearson's product-moment correlation is selected,
the output will include the number of records, mean and standard deviation.
Only the number of records will be printed if Spearman's rank-difference
correlation is selected.
Example of a Descriptive Statistics Printout
Simple Correlation Matrix
The correlation matrix may be printed or suppressed with
the SC=Y or SC=N option respectively. Most of the time, you'll probably want
to print the correlation matrix. However, there may be times when you only
want descriptive statistics and/or Cronbach's alpha reliability statistic.
Correlation Coefficient
The correlation coefficient(s) can be printed with the CC
option. The option CC=Y will print the correlation coefficient while CC=N
will suppress it.
The value of a correlation coefficient can vary from
minus one to plus one. A minus one indicates a perfect negative correlation,
while a plus one indicates a perfect positive correlation. A correlation of
zero means there is no relationship between the two variables.
Number Of Cases
The number of cases (records) used to calculate the
correlation coefficient can be printed with NC=Y. This may or may not be the
same as the number of records in the data file. If either the X or Y value is
missing from a pair of data, the record will be skipped and not included in
the analysis.
Standard Error
The standard error of the estimate for a correlation
coefficient measures the standard deviation of the data points as they are
distributed around the regression line. The standard error of the estimate
can be used to specify the limits and confidence interval for a correlation
coefficient. It can only be calculated for interval or ratio-type data. The
standard error can be printed using the option SE=Y.
T Statistic
The significance of the correlation coefficient is
determined from Student's t statistic. The formula to calculate the t
statistic depends upon which type of correlation coefficient is specified.
The t statistic can be printed or not by using the option TT=Y or TT=N,
respectively. Although StatPac does not calculate the F statistic, it is
simply the square of the t statistic.
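For a Pearson coefficient, the usual textbook conversion to a t statistic is shown in the Python sketch below, along with F as the square of t. The manual says the formula depends on the type of coefficient but does not print it, so this should be read as the standard formula rather than a confirmed description of StatPac. The function name correlation_t is hypothetical, and the formula is undefined when r is exactly plus or minus one.

```python
from math import sqrt


def correlation_t(r, n):
    """Return (t, degrees_of_freedom, F) for a correlation r based on n cases."""
    df = n - 2
    t = r * sqrt(df / (1 - r * r))
    return t, df, t * t  # F is simply the square of t


t, df, f = correlation_t(0.6, 27)
print(round(t, 4), df, round(f, 4))
```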
Probability Of Chance
The probability of the t statistic indicates how likely it is that
a correlation coefficient at least as large as the one observed would occur by
chance if the true correlation is zero. It can be printed with the option
PR=Y. StatPac uses a two-tailed test to derive this probability from the t
distribution. Probabilities of .05 or less are generally considered
significant, implying that there is a relationship between the two variables.
When the t statistic is calculated for Spearman's
rank-difference correlation coefficient, there must be at least 30 cases
before the t distribution can be used to determine the probability. If there
are fewer than 30 cases, use the table in the appendix to find the
probability of the correlation coefficient.
Example of a Correlation Matrix Printout
Cronbach's Alpha
Cronbach's alpha is a measure of the internal
consistency of a group of items. It provides a unique estimate of reliability
for a given test administration. The value of Cronbach's alpha may vary
between zero and one. In general, it is a lower bound to the reliability of a
scale of items. In other words, Cronbach's alpha tends to be a very
conservative measure of reliability.
As well as being a measure of the reliability of a scale
of items, Cronbach's alpha may also be interpreted as an estimate of the
correlation of one test with an alternative form containing the same number
of items.
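Cronbach's alpha is commonly computed from the item variances and the variance of the total score, as in the Python sketch below. The manual does not print the formula StatPac uses, so this is the standard textbook form, offered as an assumption. The function name cronbach_alpha and the item scores are made up for the example.

```python
from statistics import variance


def cronbach_alpha(items):
    """items holds one list of scores per item, in the same respondent order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


items = [
    [1, 2, 3, 4, 5],  # item 1, five respondents
    [2, 2, 3, 5, 5],  # item 2
    [1, 3, 3, 4, 4],  # item 3
]
print(round(cronbach_alpha(items), 4))
```

Items that rise and fall together across respondents, as these do, produce a value close to one.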
Labeling and Spacing Options
Option | Code | Function
Labeling | LB | Sets the labeling for descriptive statistics to print the variable label (LB=E), the variable name (LB=N), or the variable number (LB=C).
Column Width | CW | Sets the minimum width of the columns (in inches).
Column Spacing | CS | Sets the spacing (in inches) between the columns.
Decimal Places | DP | Sets the number of decimal digits that will be shown.

