StatPac for Windows User's Guide
Basic Statistical Analyses

These commands may be used in a procedure to set the type of analysis to be performed.
These additional commands may be used in a procedure when the Advanced Analyses module has been installed.
There are many different types of analyses that can be performed with StatPac. Most commands are easy to use since much of the required analysis information comes from the default parameter table.

With the exception of the OPTIONS command, the analysis commands are mutually exclusive in any given procedure. In other words, a single procedure cannot perform more than one kind of analysis. A procedure file, however, may contain many procedures, each performing a different kind of analysis.

The OPTIONS command may be used in any procedure to override the default values in the parameter table. It is used to control printing and analysis parameters.

The analysis commands available in StatPac's basic package are: LIST, FREQUENCIES, DESCRIPTIVE, BREAKDOWN, CROSSTABS, BANNERS, TTEST, and CORRELATE. The analysis commands available in the StatPac Advanced Statistics module are: REGRESS, DISCRIMINANT, PCA, FACTOR, ANOVA, CANONICAL, CLUSTER, and MAP.

Important User Tip
Any of the analysis commands may be abbreviated by using only the first two characters of the keyword. For example, FREQUENCIES could be abbreviated as FR, FACTOR could be abbreviated as FA, and OPTIONS could be abbreviated as OP.
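The two-character abbreviation rule works because no two keywords in the lists above share their first two letters. The following Python sketch is only an illustration of that rule, not StatPac's own parser. The function names are hypothetical, and accepting lowercase input and longer prefixes such as FREQ are assumptions.

```python
# Analysis keywords as listed in this guide (basic package, Advanced
# Statistics module, and OPTIONS).
KEYWORDS = [
    "LIST", "FREQUENCIES", "DESCRIPTIVE", "BREAKDOWN", "CROSSTABS",
    "BANNERS", "TTEST", "CORRELATE",
    "REGRESS", "DISCRIMINANT", "PCA", "FACTOR", "ANOVA", "CANONICAL",
    "CLUSTER", "MAP",
    "OPTIONS",
]


def build_abbreviation_table(keywords):
    """Map each two-letter prefix to its keyword; fail if two keywords collide."""
    table = {}
    for word in keywords:
        prefix = word[:2]
        if prefix in table:
            raise ValueError(f"{word} and {table[prefix]} share the prefix {prefix}")
        table[prefix] = word
    return table


ABBREVIATIONS = build_abbreviation_table(KEYWORDS)


def expand_keyword(token):
    """Return the full keyword for a token, matched on its first two letters.

    Assumption: input is case-insensitive and any longer prefix of the
    keyword (for example FREQ) is accepted as well.
    """
    token = token.upper()
    full = ABBREVIATIONS.get(token[:2])
    if full is None or not full.startswith(token):
        raise ValueError(f"unknown analysis keyword: {token}")
    return full
```

For example, expand_keyword("FR") returns "FREQUENCIES" and expand_keyword("OP") returns "OPTIONS".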
The LIST command is used to list selected variables in the data file. The command syntax is:
LIST <Variable list>
If the LIST command does not specify a variable or variable list, all variables will be listed. When used in this fashion, value labels will be listed instead of the raw data. For example, let's say you want to print a report consisting of only two columns. The first column is variable 7 (AGE) and the second column is variable 14 (SEX). Either variable numbers or names may be used to specify the variable list. The command line would be entered as either:
LIST V7 V14
LI AGE SEX     (LIST may be abbreviated as LI)
The keyword RECORD may be used as part of the variable list to print the record number as one of the columns. For example, the following parameter line will produce a report consisting of four columns, the first column being the sequence of the case in the data file:
LIST RECORD V12 V31 V83
You may specify as many variables in the report as can be accommodated by the pitch and orientation of the output. If too many variables are specified, the output will be truncated. Missing data will be displayed as a series of dashes. The LIST command is often used as a way to troubleshoot a procedure that is not working. For example, if the following procedure didn't work properly, we might try the LIST command to figure out what went wrong:
STUDY EXAMPLE
COMPUTE (N4.1) AVG = V1 * V2 / 2
DESCRIPTIVE AVG
..

We could replace the DESCRIPTIVE command with the LIST command and list the relevant variables. Also note that we added the SELECT command to limit the printout to the first twenty-five records (i.e., we don't need to list the whole file to find out why it is not working).
STUDY EXAMPLE
COMPUTE (N4.1) AVG = V1 * V2 / 2
SELECT 1-25
LIST RECORD V1 V2 AVG
..
Example of a List Printout
To list an open-ended variable, simply specify it in the LIST command. The following would list a variable called "Comment". The COMPUTE line is used to calculate a record number so it can be included in the printout. The IF-THEN-SELECT line is used to select only those who made a comment. The OPTIONS line is used to insert a blank line between each response.
COMPUTE (N5) REC=RECORD
IF Comment <> " " THEN SELECT
LIST Rec Comment
OPTIONS BL=Y
..

The output might look like this:
Example of a Verbatim Listing
Multiple Response & Combining Variables

There are two options (MR and CB) to control the way that data gets displayed with the LIST command.

The MR option is used to specify variables you want to be stacked on top of each other in a single column. The specified variables will be listed in a single column rather than using multiple columns on the listing. The variables do not have to be true multiple response variables in the codebook; you may use the MR option for any variables.

The CB option is used to specify variables that you want to combine into a single field instead of being treated as individual fields. For example, if City, State, and Zip were separate variables, you could display them together using the CB option.

Normally, all variables in a listing would appear side-by-side. The MR and CB options are used to create an easier-to-read format.

For example, suppose variables six, seven, and eight are being used to hold respondents' verbatim answers to a question on a restaurant survey: "What three things could we do to improve your dining experience?" Three A70 variables were used. The following command would produce a listing of the data in a vertical format. Up to three lines in the report would be displayed for each respondent. The SELECT command is used to eliminate subjects who did not answer the question.
IF V6 <> " " THEN SELECT
LIST V6-V8
OPTIONS MR=(V6-V8)
..

An example of a single record in the printout might look like this:
Faster service.
Reduce prices.
Greater selection.
If the CB option were used instead of the MR option, all three responses would be combined into a single field (giving the appearance that the three responses were part of the same sentence or paragraph).
IF V6 <> " " THEN SELECT
LIST V6-V8
OPTIONS CB=(V6-V8)
..
The CB option formats the printout so each record in the listing will use the number of lines that it needs to show the data. The listing might appear as follows:
Faster service. Reduce prices. Greater selection.
The MR and CB options may be used in conjunction with each other to produce desired outputs. For the next example, assume the following variables:
V1 Name
V2 Street_Address
V3 City
V4 State
V5 Zip
V6 Phone_Number
V7 Fax_Number
V8 Email_Address
We might want to stack Name, Address, City, State, and Zip into a single column on the printout. We might also want to stack the Phone and Fax numbers into a single column. In the following procedure, the CB option is used to combine City, State, and Zip into a single field, and the MR option is used to specify which variables should be displayed in a vertical column.
LIST V1-V8
OPTIONS CB=(V3-V5) MR=(V1-V5)(V6-V7)
..
The output might look like this:
Example of a Listing Using the MR and CB Options
Labeling and Spacing Options
A frequency analysis is the simplest of all statistical procedures. It is ideal for data which has been coded into groups or categories. The coding can be either alpha or numeric-type data. The syntax of the command to run a frequency analysis is:
FREQUENCIES <Variable list>
For example, to find the percent of males and females in a sample, you would request a single analysis:
FR SEX     (FREQUENCIES may be abbreviated as FR)
Several frequency analyses can be requested with a single command. For example, to get a frequency analysis of SEX (V4), RACE (V5) and INCOME (V6), the request could be specified in several ways:
FREQUENCIES SEX, RACE, INCOME
FREQUENCIES SEX RACE INCOME
FREQUENCIES V4 V5 V6
FREQUENCIES V4-V6
Notice that either the variable name or the variable number may be specified as part of the variable list. A frequency analysis may be run on alpha or numeric-type variables. Missing data will be included in the frequencies only if there is a value label for missing data (e.g., <BLANK>=No response).

Table Format

Three types of printout formats are built into the program: expanded, condensed and automatic. The option to control the table format is:
OPTIONS TF=N     (No table will be printed)
OPTIONS TF=A     (Formatting will be automatic)
OPTIONS TF=E     (Formatting will be expanded)
OPTIONS TF=C     (Formatting will be condensed)
Condensed formatting is especially useful when there are many unlabeled values. For example, if one of the variables is ID NUMBER, there are generally no value labels associated with this variable. It is often a good idea to check the data to be sure that no records were inadvertently entered twice (i.e. duplicate ID numbers). A condensed frequencies printout would allow you to quickly determine if any ID NUMBER is specified more than once. An example of condensed formatting might look like this:
Example of a Condensed Frequencies Printout
Automatic formatting is generally recommended since it minimizes the amount of paper that will be used. If automatic formatting is used and there are more than 50 unlabeled categories (no value labels), the printout will automatically be converted to condensed format. In most cases, automatic formatting will result in the expanded format. An example of expanded formatting might look like this:
Example of an Expanded Frequencies Printout
Print Zero Values

Sometimes there may be a category listed in the value labels that has no accompanying data. For example, nobody in the sample may be over 40 years old or make over $30,000 a year. Whether or not you want the label to appear with a count of zero is a matter of preference. If you want the reader of your report to know that a category was available, you'd probably want to print zero values (ZV=Y). If you are interested in saving space, you might want to exclude zero values (ZV=N).

Sort Type & Sort Order

Frequency analyses are often more meaningful when the output is displayed in sorted order. When working with nominal-type data and few categories, the order in which categories are presented is not very important (e.g., it really doesn't make much difference whether males or females are listed first). However, as the number of categories increases, it may be desirable to list those with the highest count first, followed by those with lower counts. This would be a sort by frequency of response in descending order. It would be requested with the following options:
OPTIONS ST=F SO=D     (Sort Type is by frequency of response; Sort Order is descending)
When data is ordinal, it is more appropriate to present the output in order defined by the categories themselves. Usually this is the same as the alpha or numeric code used to represent a category. For example, take the following two survey questions:
How old are you?      What is your annual income?
A=Under 21            1=Under $10,000
B=21-30               2=$10,000-$20,000
C=31-40               3=$21,000-$30,000
D=Over 40             4=Over $30,000
Both questions are ordinal; the first one is coded alpha and the second is numeric. It would be desirable to have the frequencies printout appear in ascending order by the code (the same way they are listed above). The options statement to do this is:
OPTIONS ST=C SO=A     (Sort Type is by category code; Sort Order is ascending)
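The two sort requests shown so far can be sketched in Python as follows. This is an illustration only, not StatPac's implementation. The function name and the (code, label, count) row layout are hypothetical, and the keep_last parameter is an assumed stand-in for the digit suffix on SO (for example SO=D1) covered in this section. How StatPac orders categories with equal counts is not stated in this guide; the sketch simply keeps their original order.

```python
def sort_rows(rows, sort_type="N", descending=False, keep_last=0):
    """Order frequency-table rows.

    rows       -- list of (code, label, count) tuples in value-label order
    sort_type  -- "N" no sort, "F" by frequency, "C" by category code
    descending -- False for ascending (SO=A), True for descending (SO=D)
    keep_last  -- number of trailing rows left in place (assumed meaning
                  of the digit suffix, e.g. 1 for SO=D1)
    """
    split = max(0, len(rows) - keep_last)
    head, tail = list(rows[:split]), list(rows[split:])
    if sort_type == "F":
        head.sort(key=lambda row: row[2], reverse=descending)
    elif sort_type == "C":
        head.sort(key=lambda row: row[0], reverse=descending)
    # With sort_type "N" the rows stay in value-label order.
    return head + tail


# Hypothetical counts, for illustration only.
rows = [
    ("A", "Under 21", 12),
    ("B", "21-30", 30),
    ("C", "31-40", 18),
    ("D", "Other", 25),
]
by_frequency = sort_rows(rows, sort_type="F", descending=True, keep_last=1)
```

In this example the "Other" row stays last even though its count is higher than two of the sorted rows.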
Notice that this type of sort is generally the way the information would be specified in the value labels. If this is the case, sorting by category code will have no effect. Sorting by category codes is useful if you did not enter value labels for the variable.

If no sort type is specified (ST=N), the output will be displayed in the same order as specified by the value labels. If the value labels do not contain all the values in the data file (such as mispunched data), the unlabeled values will appear on the printout in the order that they are encountered in the data file.

Additionally, a digit may be added as a suffix to SO=A or SO=D. It is used to sort the value labels excluding the last one or more value labels. This is useful when the last value label is an "other" category, and you want to sort the value labels but still leave the "other" category as the last row in the report. For example, ST=F SO=D1 would sort the value labels in descending order by frequency, except it would leave the last value label as the last row regardless of its frequency.

Truncate Labels

Very long value labels may sometimes exceed the space allocated for them in the printout. In those situations, you may set the program to either truncate the value labels (TL=Y excludes the ending portion of the label), or to use multiple lines to print the entire value label (TL=N).

Cumulative Percents

When the frequency table is printed in expanded format, you may print or exclude cumulative percents with the CP option. This would be specified as:
OPTIONS CP=Y     (Turn on cumulative percents)
OPTIONS CP=N     (Turn off cumulative percents)

Confidence Intervals

Confidence intervals for proportions can be requested with the CI option. For example, to request the 95% confidence intervals, you would use the option CI=95, and the 99% confidence intervals could be requested by CI=99. Confidence intervals allow us to estimate the proportions in the population for each of the response categories. If repeated samples were taken from the population, we would expect the intervals from about that percentage of the samples to contain the population proportion. When confidence intervals are requested, cumulative percents will not be printed regardless of the setting of the CP option.

Confidence intervals are calculated by first computing the estimated standard error of the proportion, and then using the t distribution to find the actual interval. Note that the finite population correction factor (1-n/N) is used to adjust the standard error if the sample represents a large proportion (say greater than ten percent) of the population. When the sample is a large share of the population, use the FP option to specify the population size (i.e., FP=x, where x is the size of the population). If the FP option is not specified, no correction will be applied.
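The interval calculation described above can be sketched in Python. This is a rough illustration under stated assumptions, not StatPac's exact computation. The function name is hypothetical, the standard error uses the common form sqrt(p(1-p)/n), the correction factor (1-n/N) is applied under the square root, and the normal quantile from the standard library is substituted for the t distribution mentioned in the text (the two are close for large samples).

```python
import math
from statistics import NormalDist


def proportion_ci(count, n, level=0.95, population=None):
    """Approximate confidence interval for a category proportion.

    count      -- number of respondents in the category
    n          -- sample size
    level      -- confidence level, e.g. 0.95 for CI=95
    population -- population size (the FP option); None means no correction
    """
    p = count / n
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; placing it under the square root
        # is an assumption of this sketch.
        se *= math.sqrt(1 - n / population)
    # Normal approximation standing in for the t distribution.
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    return max(0.0, p - critical * se), min(1.0, p + critical * se)


# 50 of 100 respondents chose a category; no population size given.
low, high = proportion_ci(50, 100)
# Same sample drawn from a population of 400 (sample is 25% of it).
low_fp, high_fp = proportion_ci(50, 100, population=400)
```

With the correction applied, the interval is narrower than the uncorrected one, which reflects the reduced uncertainty when the sample covers a large share of the population.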
Example of Confidence Intervals Around a Percent
Critical T Probability

After performing a frequency analysis, researchers are often interested in determining if there is a significant difference between the various categories. The Chi-square statistic is often used to determine if the observed frequencies markedly differ from the expected frequencies. The problem with the Chi-square statistic is that it does not isolate the significant differences (i.e., it only tells whether or not one exists).

StatPac uses a t-test to compare all possible pairs of categories to determine where the actual differences lie. The CT option may be set between 0 and 1. When CT=0, no t-tests will be performed or printed. If CT=1, the t-statistic and probability will be printed for all possible pairs of categories. A typical setting for the critical t probability is 5% (CT=.05). In this case, StatPac will print the t-statistic and two-tailed probability for all pairs of categories that have a probability of p=.05 or less. StatPac uses the following formula to calculate the t-statistic:
Example of a Critical T Probability Analysis
Percentage Base

The percentage base on a frequency analysis can either be the number of respondents (N) or the total number of responses. If PB=N, the denominator for calculating percentages will be the number of respondents. If PB=R, the denominator will be the total number of responses for all individuals.

Multiple Response

Surveys often include questions in which the respondent is asked to make more than one response to a single question. An example of the kind of question that is appropriate for multiple variable response is:
1. Which of the following services did you use? (Check all that apply)
__ Counseling
__ Job placement
__ Remedial reading
__ Remedial math
__ Resume writing
The multiple response frequency analysis is used to summarize these kinds of items. When designing a study that includes this type of question, each choice is considered as a separate variable. The value labels need to be specified only for the first variable, but it is fine if they are specified for all the multiple response variables.
V1 "Services_1" Services Used
1=Counseling
2=Job placement
3=Remedial reading
4=Remedial math
5=Resume writing
V2 "Services_2" Services Used
V3 "Services_3" Services Used
V4 "Services_4" Services Used
V5 "Services_5" Services Used
The syntax for the multiple response frequency analysis is:
FREQUENCIES <Variable list>
OPTIONS MR=Y
In this example, all the variables in the variable list will be treated as one multiple response variable. Another way to use the MR option is to re-specify the variable numbers (not the variable names) that should be grouped.
FREQUENCIES <Variable list>
OPTIONS MR=(<Variable list>)
Note that the parentheses are required around the variable list in the options line. In the above example, the commands would be:
FREQUENCIES V1-V5
OPTIONS MR=(V1-V5)
The output will contain the counts and percents for each of the response values. That is, how many times code 1 (counseling) was chosen for any variable, how many times code 2 (job placement) was chosen for any variable, etc. In other words, it will print the total number of times that each response was recorded for variables 1, 2, 3, 4 and 5 combined. The options line may be used to specify several multiple response analyses by using additional sets of parentheses in the MR option. The following commands would perform three different tasks (each one being a multiple response analysis on a new set of variables).
FREQUENCIES V1-V20
OPTIONS MR=(V1-V10)(V11-V15)(V16-V20)
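The counting that a single multiple response group performs, together with the two percentage bases (PB=N and PB=R), can be sketched in Python. This is an illustration only, not StatPac's implementation. The function names are hypothetical, missing answers are represented as None, and counting a respondent only when at least one answer was given is an assumption about how N is determined.

```python
from collections import Counter


def multiple_response(records):
    """Pool the answers from several variables into one set of counts.

    records -- one tuple per respondent holding the codes stored in the
               grouped variables; None marks a variable left blank
    Returns (counts per code, number of respondents with any answer).
    """
    counts = Counter()
    respondents = 0
    for answers in records:
        given = [code for code in answers if code is not None]
        if given:
            respondents += 1
        counts.update(given)
    return counts, respondents


def percents(counts, base):
    """Percentage of each code relative to the chosen base."""
    return {code: 100.0 * n / base for code, n in sorted(counts.items())}


# Hypothetical data for V1-V5 of the services question.
records = [
    (1, 2, None, None, None),
    (2, None, None, None, None),
    (1, 3, 5, None, None),
    (None, None, None, None, None),
]
counts, respondents = multiple_response(records)
responses = sum(counts.values())
by_respondents = percents(counts, respondents)  # like PB=N
by_responses = percents(counts, responses)      # like PB=R
```

With the respondent base, percentages can add up to more than 100 because one person may give several answers; with the response base they always total 100.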
Multiple response may also be used when the questionnaire limits the number of choices to fewer than the number of possible responses. For example, the following question asks for two responses from the same value labels list:
17 & 18. Write the numbers of your two favorite foods from the list below.
_____ _____
1 = Hotdogs
2 = Hamburgers
3 = Fish
4 = Roast Beef
5 = Chicken
6 = Salad
Notice that there are two variables (17 & 18) that hold the information for this question. Both variables use the same value labels and the responses to both variables are weighted equally (i.e. the first one is not more important than the second). Multiple response assumes that all variables to be analyzed have the same value labels. In this example, the command would be:
FREQUENCIES V17 V18
OPTIONS MR=(V17 V18)
Example of a Multiple Response Frequency Analysis
Category Creation

The actual categories in the frequency analysis can be created either from the study design value labels (CC=L) or from the data itself (CC=D). When the categories are created from the labels, the value labels themselves will be used to create the categories for the analysis, and data that does not match up with a value label code will be counted as missing. That is, mispunched data will be counted as missing. When categories are created from the data, all data will be considered valid whether or not there is a value label for it.

One Analysis

The one-analysis option allows you to print frequency analyses for several variables on one page. This option is especially useful for management reporting when the information needs to be condensed and concise. All the variables specified with the OA option must have the same value labels. An example might be a series of Yes/No questions or Likert scale items. The important point is that each variable has exactly the same value labels as the other variables. For example, suppose that variables 21-30 are ten items asking the respondents to rate the item as low, medium or high. The following commands would produce a one-page summary of all ten items:
FREQUENCIES V21-V30
OPTIONS OA=Y
The one analysis option is limited by the number of characters that can be printed on a line (i.e., by the pitch and carriage width of the printer). If there are too many different value labels, they will not be able to fit on one line and the analysis will be skipped. If this should happen, try rerunning the analysis using a compressed pitch. As a general rule, each value label will require ten spaces on the output.
Example of a One-Analysis Printout
The OA option is used in frequency analyses to summarize the frequencies of several variables that all contain the same value labels. Note the difference between the OA and MR options. With the multiple response option (MR), the items are treated as if they are a single variable. The one analysis option (OA), however, treats each item as a separate analysis. The results, however, will be summarized on one page. When the MR option is used in conjunction with the OA option, the variables in the MR options list will be treated as multiple response variables. This makes it easy to create nets in a frequencies analysis with the OA=Y option. For example, if V1-V20 are the twenty variables, we could add a net by first creating a duplicate copy of V1 with a new name, and then including the MR option to combine the variables to make the net. The net will be the sum of the counts of the individual variables that make up the MR variable list.
STUDY Yourstudy
NEW (N1) "Grand-Total"
COMPUTE Grand-Total = V1
LABELS Grand-Total (1=Agree)(2=Neutral)(3=Disagree)
FREQ Grand-Total V2-V20 V1-V20
OPTIONS OA=Y MR=(Grand-Total V2-V20)
..
The results might look like this:
                Agree      Neutral    Disagree
Grand-Total     --------   --------   --------
Variable 1      --------   --------   --------
Variable 2      --------   --------   --------
Variable 3      --------   --------   --------
etc.
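The layout above, one row per variable plus a net row that sums them, can be sketched in Python. This is an illustration only, not StatPac's implementation. The function names and the dictionary-based record layout are hypothetical.

```python
from collections import Counter


def one_analysis(data, variables, codes):
    """Count each code separately for every variable (one row per variable).

    data      -- list of dict records, e.g. {"V1": 1, "V2": 3}
    variables -- variable names to tabulate, in row order
    codes     -- value-label codes shared by all variables, in column order
    """
    table = {}
    for var in variables:
        counts = Counter(record.get(var) for record in data)
        table[var] = [counts[code] for code in codes]
    return table


def net_row(table, variables):
    """Sum the per-variable counts column by column to form a net."""
    return [sum(column) for column in zip(*(table[var] for var in variables))]


# Hypothetical data: two items coded 1=Agree, 2=Neutral, 3=Disagree.
data = [
    {"V1": 1, "V2": 2},
    {"V1": 1, "V2": 1},
    {"V1": 3, "V2": 2},
]
table = one_analysis(data, ["V1", "V2"], codes=[1, 2, 3])
grand_total = net_row(table, ["V1", "V2"])
```

Here the net row holds the combined counts for each value label across both items, matching the description of a net as the sum of the counts of the variables in the MR list.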
The following example shows how you can use the MR option in conjunction with the OA option to create complex nets. It also shows how the reserved word "RECORD" can be used to create blank lines in the report. Suppose we are conducting a survey of government policies. We have nine "Agree/Disagree" items coded as 1=Agree and 2=Disagree. The first three items deal with "Social Policy"; the next three items with "Foreign Policy"; and the last three items with "Fiscal Policy". We would like to produce a report that looks something like this:
Peoples Attitudes Towards Government Policies
(N=x)              Agree    Disagree

OVERALL            -----    -----

SOCIAL POLICY      -----    -----
   Item 1          -----    -----
   Item 2          -----    -----
   Item 3          -----    -----

FOREIGN POLICY     -----    -----
   Item 4          -----    -----
   Item 5          -----    -----
   Item 6          -----    -----

FISCAL POLICY      -----    -----
   Item 7          -----    -----
   Item 8          -----    -----
   Item 9          -----    -----

There are four different nets in this report. The OVERALL net includes all variables. The SOCIAL POLICY net includes the first three items, the FOREIGN POLICY net the next three items, and the FISCAL POLICY net the last three items. For this example, Items 1-9 are stored in variables 1 to 9. The spacing (indentation) in this example is used only to make the procedure easier to understand. It is not necessary to use this type of spacing in your procedures.
STUDY Yourstudy
HEADING Peoples Attitudes Towards Government Policies
COMPUTE (N1) OVERALL=V1
COMPUTE (N1) SOCIAL POLICY=V1
COMPUTE (N1) FOREIGN POLICY=V4
COMPUTE (N1) FISCAL POLICY=V7
LABELS OVERALL SOCIAL POLICY FOREIGN POLICY FISCAL POLICY (1=Agree)(2=Disagree)
FREQ OVERALL V2-V9             Produces overall net
     RECORD                    Produces a blank line
     SOCIAL POLICY V2-V3       Produces social policy net
     V1-V3                     Produces 3 social policy variables
     RECORD                    Produces a blank line
     FOREIGN POLICY V5-V6      Produces foreign policy net
     V4-V6                     Produces 3 foreign policy variables
     RECORD                    Produces a blank line
     FISCAL POLICY V8-V9       Produces f