StatPac for Windows User's Guide


System Requirements and Installation

System Requirements


Unregistering & Removing the Software from a PC

Network Operation

Updating to a More Recent Version

Backing-Up a Study

Processing Time

Server Demands and Security

Technical Support

Notice of Liability

Paper & Pencil and CATI Survey Process

Internet Survey Process

Basic File Types

Codebooks (.cod)

Data Manager Forms (.frm)

Data Files (.dat)

Internet Response Files (.asc or .txt)

Email Address Lists (.lst or .txt)

Email Logs (.log)

Rich Text Files (.rtf)

HTML Files (.htm)

Perl Script (.pl)

Password Files (.text)

Exported Data Files (.txt and .csv and .mdb)

Email Body Files (.txt or .htm)

Sample File Naming Scheme for a Survey

Customizing the Package

Problem Recognition and Definition

Creating the Research Design

Methods of Research


Data Collection

Reporting the Results



Systematic and Random Error

Formulating Hypotheses from Research Questions

Type I and Type II Errors

Types of Data


One-Tailed and Two-Tailed Tests

Procedure for Significance Testing

Bonferroni's Theorem

Central Tendency


Standard Error of the Mean

Inferences with Small Sample Sizes

Degrees of Freedom

Components of a Study Design

Elements of a Variable

Variable Format

Variable Name

Variable Label

Value Labels

Valid Codes

Skip Codes for Branching

Data Entry Control Parameters

Missing OK

Auto Advance

Caps Only

Codebook Tools

The Grid

Codebook Libraries

Duplicating Variables

Insert & Delete Variables

Move Variables

Starting Columns

Print a Codebook

Variable Detail Window

Codebook Creation Process

Method 1 - Create a Codebook from Scratch

Method 2 – Create a Codebook from a Word-Processed Document

Spell Check a Codebook

Multiple Response Variables

Missing Data

Changing Information in a Codebook


Data Input Fields

Form Naming Conventions

Form Creation Process

Using the Codebook to Create a Form

Using a Word-Processed Document to Create a Form

Variable Text Formatting

Field Placement

Value Labels

Variable Separation

Variable Label Indent

Value Labels Indent

Space between Columns

Valid Codes

Skip Codes

Variable Numbers

Variable List and Detail Windows

Data Input Settings

Select a Specific Variable

Finding Text in the Form

Replacing Text in the Form

Saving the Codebook or Workspace


Keyboard And Mouse Functions

Create A New Data File

Edit Or Add To An Existing Data File

Select A Different Data File

Change Fields

Change Records

Enter A New Data Record

View Data For A Specified Record Number

Find Records That Contain Specified Data

Duplicate A Field From The Previous Record

Delete A Record

Data Input Settings

Compact Data File

Double Entry Verification

Print A Data Record

Variable List & Detail Windows

Data File Format


HTML Email Surveys

Plain Text Email Surveys


Item Numbering

Codebook Design for a Plain Text Email Survey

Capturing a Respondent's Email Address

Filtering Email to a Mailbox

General Considerations for Plain Text Email


Internet Survey Process

Server Setup

Create the HTML Survey Pages

Upload the Files to the Web server

Test the survey

Download and import the test data

Delete the test data from the server

Conduct the survey

Download and import the data

Display a survey closed message

Server Setup

FTP Login Information

Paths & Folder Information

Design Considerations for Internet Surveys

Special Variables for Internet Surveys

Script to Create the HTML

Command Syntax & Help

Saving and Loading Styles

Survey Generation Procedure

Script Editor

Imbedded HTML Tags

Primary Settings

HTML Name (HTMLName=)

Banner Image(s)  (BannerImage=)

Heading  (Heading=)

Finish Text & Finish URL (FinishText= and FinishURL=)

Cookie (Cookie=)

IP Control (IPControl=)

Allow Cross Site (AllowCrossSite=)

URL to Survey Folder  (WebFolderURL=)

Advanced Settings - Header & Footer

FootnoteText & FootnoteURL

Advanced Settings - Finish & Popups



HelpWindowWidth & HelpWindowHeight

Advanced Settings - Control

Advanced Settings - Fonts & Colors

Global Attributes

Heading, Title, Text, & Footnote Attributes

Instructions, Question, and Response Attributes

Advanced Settings - Passwords - Color & Banner Image

Advanced Settings - Passwords - Text & Control

Advanced Settings - Passwords - Single vs. Multiple

Password (single password method)

PasswordFile (multiple passwords method)

PasswordField & ID Field (multiple passwords method)


Advanced Settings - Passwords - Technical Notes

Advanced Settings - Server Overrides

Branching and Piping

Randomization (Rotations)

Survey Creation Script - Overview

Using Commands More than Once in a Script

Survey Creation - Specify Text

Survey Creation - Spacing and pagination



Survey Creation - Images and Links



Survey Creation - Help Windows

Survey Creation - Popup Windows

Survey Creation - Objects

Radio Buttons for a Single Variable

Radio Buttons for Grouped Variables (matrix style)

DropDown Menu

TextBox for a Single Variable

Adding a TextBox to a Radio Button, CheckBox, or Radio Button Matrix

TextBoxes for Grouped Variables

Sliders for Single or Grouped Variables

CheckBox for Multiple Response Variables


Uploading and Downloading Files from the Server

Auto Transfer


Summary of the Most Common Script Commands


Format of an Email Address File

Extract Email Addresses

List Statistics

Join Two or More Lists

Split a List

Clean, Sort, and Eliminate Duplicates

Add ID Numbers to a List

Create a List of Nonresponders

Subtract One List From Another List

Merge an Email List into a StatPac Data File

Send Email Invitations

Using an ID Number to Track Responses

Email Address File

Body Text File

Sending Email


Mouse and Keyboard Functions

Designing Analyses

Continuation Lines

Comment Lines

V Numbers



Variable List

Variable Detail

Find Text

Replace Text


Load, Save, and Merge Procedure Files

Print a Procedure File

Run a Procedure File

Results Editor


Table of Contents

Automatically Generate Topline Procedures

Keyword Index

Keywords Overview

Categories of Keywords

Keyword Help

Ordering Keywords

Global and Temporary Keywords

Permanently Change a Codebook and Data File

Backup a Study

STUDY Command

DATA Command

SAVE Command

WRITE Command

MERGE Command


TITLE Command


LABELS Command


SELECT and REJECT Commands

NEW Command

LET Command

STACK Command

RECODE Command



IF-THEN … ELSE Command

SORT Command

WEIGHT Command


LAG Command


DUMMY Command

RUN Command

REM Command

Reserved Words

Reserved Word RECORD

Reserved Word TOTAL

Reserved Word MEAN

Reserved Word TIME

Analyses Index

Analyses Overview

LIST Command

TTEST Command


Advanced Analyses Index



LOGIT and PROBIT Commands

PCA Command

FACTOR Command



ANOVA Command


MAP Command

Advanced Analyses Bibliography

Utility Programs

Import and Export

StatPac and Prior Versions of StatPac Gold

Access and Excel

Comma Delimited and Tab Delimited Files

Files Containing Multiple Data Records per Case

Internet Files

Email Surveys

Merging Data Files

Concatenate Data Files

Merge Variables and Data



Quick Codebook Creation

Check Codebook and Data


Random Number Table

Random Digit Dialing Table

Select Random Records from Data File

Compare Data Files


Date Conversions

Currency Conversion

Dichotomous Multiple Response

Statistics Calculator Menu

Distributions Menu

Normal distribution

T distribution

F distribution

Chi-square distribution

Counts Menu

Chi-square test

Fisher's Exact Test

Binomial Test

Poisson Distribution Events Test

Percents Menu

Choosing the Proper Test

One Sample t-Test between Percents

Two Sample t-Test between Percents

Confidence Intervals around a Percent

Means Menu

Mean and Standard Deviation of a Sample

Matched Pairs t-Test between Means

Independent Groups t-Test between Means

Confidence Interval around a Mean

Compare a Sample Mean to a Population Mean

Compare Two Standard Deviations

Compare Three or more Means

Correlation Menu

Sampling Menu

Sample Size for Percents

Sample Size for Means

Codebook Design

Components of a Study Design

All surveys begin by creating a codebook. The codebook contains the format and labels of each variable. If your survey contains 20 items, the codebook will usually contain 20 variables (more if any item is a multiple response question).

If the survey will be administered by paper and pencil or CATI, there will also be a data entry form. The form refers to the screens that the data entry person will see while entering and editing data. The codebook and form usually have the same file name; only the file extensions (.cod and .frm) are different.

If the survey will be administered over the Internet or as an e-mail survey, a data entry form is not necessary.



There are many ways to design the codebook and form. The best way depends on whether you have already typed the survey with a word processor. When you run the program, the main screen will be displayed. The left side of the screen is used for the Workspace window, and the right side shows the list of variables in the codebook.


Elements of a Variable

The codebook defines the variables in the study. Each item on the survey is a variable, so the number of variables in the codebook is usually the same as the number of items. (Several variables are required to define a multiple response item, so the number of variables in the codebook may exceed the number of items on the survey.)

There are five components to a variable. These are:

Variable Format

Variable Name

Variable Label

Value Labels (including valid codes and skip codes)

Data Entry Control Parameters

The Variable Format is mandatory because it defines where and how the data is stored in the data file.  All other components are optional.


Variable Format

The Variable Format defines the structure of the variable. This is the only information that is mandatory when defining a new variable. Once you define the structure of a variable, it will exist in the codebook.

The syntax for the Variable Format is shown below; the parts are written together without spaces (for example, N5.2):

 <Var. Type> <No. of Cols> . <Decimal>

The following are examples of variable formats:

N5        a numeric variable using 5 columns

N5.2     a numeric variable using 5 columns;
             the format of the variable will be ##.##

N2.0     a numeric variable using 2 columns;
             the variable will always be an integer

A1        an alpha variable using 1 column

A250    an alpha variable using 250 columns
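The format statement is regular enough to check mechanically. The following Python sketch is only an illustration (StatPac does not expose these functions; `parse_format`, `numeric_pattern`, and `VarFormat` are hypothetical names). It splits a format statement into its type, width, and optional decimal component, and reproduces the ##.## layout of N5.2, where the decimal point occupies one of the five columns.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the documented syntax <Var. Type><No. of Cols>.<Decimal>.
FORMAT_RE = re.compile(r"([NA])(\d+)(?:\.(\d+))?")


class VarFormat(NamedTuple):
    var_type: str            # "N" = numeric, "A" = alpha
    width: int               # number of columns reserved in the data record
    decimals: Optional[int]  # None when the decimal component is omitted


def parse_format(text: str) -> VarFormat:
    """Parse a format statement such as 'N5.2' or 'A250'."""
    m = FORMAT_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a valid format statement: {text!r}")
    decimals = int(m.group(3)) if m.group(3) is not None else None
    return VarFormat(m.group(1), int(m.group(2)), decimals)


def numeric_pattern(fmt: VarFormat) -> str:
    """Layout of a numeric field with fixed decimals, e.g. N5.2 -> '##.##'."""
    if fmt.var_type != "N" or fmt.decimals is None:
        raise ValueError("a pattern needs a numeric format with decimals")
    if fmt.decimals == 0:
        return "#" * fmt.width  # always an integer
    whole = fmt.width - fmt.decimals - 1  # the decimal point uses one column
    return "#" * whole + "." + "#" * fmt.decimals


print(numeric_pattern(parse_format("N5.2")))  # ##.##
```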

Variable Type

StatPac has two types of variables: numeric and alpha.  Different analyses can be performed depending on the variable type.  StatPac requires that a variable type be specified as either N or A.

Numeric variables may contain numbers, a decimal point, a plus or minus sign, and a D or E if scientific notation is used.  Alpha variables may contain any character (letters, numbers, and special characters).  StatPac automatically left-justifies alpha variables and right-justifies numeric variables.

An example of a numeric variable might be the following question on a survey:

How many years of formal education have you completed?

The response is always a number, and it fits in two columns of the data file.  A numeric-type specification is required for interval or ratio data.

Some questions have coded responses and could use the alpha-type format.  An example of an alpha-type question on a survey would be:

Which product do you prefer?

A = Product A

B = Product B

C = Product C

N = No preference

In this question, the responses are coded into categories.  The categories are not arithmetically related.  That is, a response of C does not mean twice as much product as a response of A.  Nominal and ordinal data can use either an alpha or numeric format.

Another example of an alpha variable would be an open-ended response. The respondent could answer anything to the following question:

What could we do to improve our product?

Likert-scale questions and preference scales are often given a numeric format so that descriptive statistics can be calculated.  This is a generally accepted procedure in marketing and social science research, the assumption being that the perceived intervals between the selections are equal.

Number of Columns

The number of columns component of the format statement is the field width allocated for the variable: the number of characters needed to write the longest data value.  There is no maximum number of columns for an alpha variable, although the practical limit for data entry is about 1,000 characters.  For a numeric variable, the maximum field width is 22 columns.

The field width for numeric variables must be large enough to hold the number, a plus or minus sign, and a decimal point (if necessary).  For example, a numeric one-to-ten scale would require two characters; racing times for a hundred-meter sprint (with accuracy to the hundredth of a second) would require five characters (two for the seconds, one for the decimal point, and two for the hundredths of a second). An alpha variable to hold an entire open-ended sentence might require 150 characters.

It is very important that you leave a sufficient number of columns for your data.  After you begin entering data, changing the number of columns for a variable becomes more complex (since the data already entered must be restructured).  When in doubt, allow more columns rather than fewer.

Decimal Places

The decimal format is the number of significant decimal places that the variable will contain.  This component of the format statement is optional and may be omitted.  If <decimal> is not specified, the data will be stored exactly as entered (with or without a decimal point).  In the format statement itself, the number of decimal digits is preceded by a decimal point.
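To see how the field width, sign, decimal point, and decimal places interact, here is a minimal Python sketch that renders an entered value for an N<width>.<decimals> field. `format_numeric` is a hypothetical helper, not part of StatPac; it uses Python's own rounding (which may not match StatPac's) and simply reports a value that does not fit as an error.

```python
from typing import Optional


def format_numeric(entered: str, width: int, decimals: Optional[int]) -> str:
    """Render an entered number for an N<width>.<decimals> field (sketch).

    With no decimal component the text is kept exactly as entered;
    otherwise it is rewritten with `decimals` places using Python's rounding.
    """
    text = entered.strip()
    if decimals is not None:
        text = f"{float(text):.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{text!r} needs {len(text)} columns; the field has {width}")
    return text.rjust(width)  # numeric variables are right-justified


print(repr(format_numeric("9.58", 5, 2)))  # ' 9.58'
print(repr(format_numeric("-3.5", 5, 2)))  # '-3.50' (the sign uses a column)
print(repr(format_numeric("7", 2, 0)))     # ' 7'
```

A value such as 123.4 in an N5.2 field would need six columns (123.40) and is rejected, which is why the width must allow for the sign and decimal point as well as the digits.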


Variable Name

The variable name is simply a name that may be used to reference the variable when designing analyses. While the variable name is optional, its use is highly recommended. As a general rule, the variable name is a short word or abbreviation. Its primary purpose is to help you keep track of variables while designing analyses.

There are several rules governing variable names.  All of these are automatically checked by StatPac, so it will not be possible to enter an invalid variable name.

1. A variable name must be different from all other variable names and may not be the same as any analysis keyword.  (The keywords are listed in another section of the manual.)

2. The first character of a variable name may not be a number or a space.

3. A variable name may not be the same as a V number.  For example, you cannot name a variable "V12".

4. A variable name may not contain a comma or period.  The variable name may include a space; however, for the purpose of clarity, we recommend using a dash or underscore character instead of a space.

5. A variable name may not be D, E, RECORD, TIME, LO, HI, WITH, BY, THEN, TOTAL or MEAN.  These words have special meaning to StatPac.
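Because the naming rules are simple, they can be expressed as a short check. This Python sketch is illustrative only: `name_problems` is a hypothetical function, the analysis keyword list is passed in rather than reproduced, and names are compared case-insensitively, which is an assumption the manual does not state.

```python
import re
from typing import Iterable, List

# Words with special meaning to StatPac (rule 5).
RESERVED = {"D", "E", "RECORD", "TIME", "LO", "HI", "WITH", "BY", "THEN", "TOTAL", "MEAN"}
V_NUMBER = re.compile(r"V\d+")  # rule 3: names such as V12 are not allowed


def name_problems(name: str, existing: Iterable[str], keywords: Iterable[str] = ()) -> List[str]:
    """Return the rules a proposed variable name breaks (empty list if valid).

    Assumption: comparisons ignore case.
    """
    problems = []
    upper = name.upper()
    if upper in {n.upper() for n in existing}:
        problems.append("duplicates another variable name")
    if upper in {k.upper() for k in keywords}:
        problems.append("is an analysis keyword")
    if not name or name[0].isdigit() or name[0] == " ":
        problems.append("starts with a number or a space")
    if V_NUMBER.fullmatch(upper):
        problems.append("looks like a V number")
    if "," in name or "." in name:
        problems.append("contains a comma or period")
    if upper in RESERVED:
        problems.append("is a reserved word")
    return problems


print(name_problems("Q1_income", ["Age"]))  # []
print(name_problems("V12", []))             # ['looks like a V number']
```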


Variable Label

The variable label is a written description of the variable. For surveys, the variable label is usually the question itself. There are no restrictions on the content or length of a variable label. It may contain any character on the keyboard.

When creating a series of multiple response variables, identical variable labels should be used for each of the multiple response items. This tells StatPac how to format the data entry form, and thereby improves data entry.


Value Labels

Value labels are used in the reports to label the response categories.  They may include any upper or lower case character except a semicolon, which is reserved for skip codes. The format for a value label is:

<Code>=<Label>

The code on the left of the equals sign is what will be typed during data entry.  The label on the right of the equals sign is the definition of the code and will be used to label the output.

In the following example, there are four value labels.  These are entered as four separate lines.

1=6 years or less

2=7 to 9 years

3=10 to 12 years

4=Over 12 years

There are no spaces between the code and the equals sign. There are also no spaces between the equals sign and the label. The code on the left of the equals sign may not be greater than the field width defined in the format statement. There is not a limit on the length of the value label (on the right of the equals symbol); however, short value labels (20 or fewer characters) generally produce more condensed and easier to read printouts.
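The <Code>=<Label> convention can be parsed in a few lines. This hypothetical Python helper, `parse_value_labels`, builds a code-to-label map and enforces the rule that a code may not be wider than the field. As assumptions made for the illustration, it ignores any skip code after a semicolon, any line without an equals sign (such as a valid-code range), and the # wildcard.

```python
from typing import Dict, Iterable


def parse_value_labels(lines: Iterable[str], width: int) -> Dict[str, str]:
    """Build a {code: label} map from <Code>=<Label> lines (sketch)."""
    labels: Dict[str, str] = {}
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop any skip code
        if "=" not in line:
            continue  # blank line or a valid-code entry such as 1-3/5
        code, label = line.split("=", 1)
        if code == "#":
            continue  # '#' means any value; it is not a labeled code
        if len(code) > width:
            raise ValueError(f"code {code!r} is wider than the {width}-column field")
        labels[code] = label
    return labels


education = ["1=6 years or less", "2=7 to 9 years", "3=10 to 12 years", "4=Over 12 years"]
print(parse_value_labels(education, 1))
```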

For alpha variables, it is important to note that upper and lower case characters are different.  When you enter the code on the left of the equals sign, use the same case that you plan to use when entering the data. For example, if the data entry person will be entering a lower case m and f for male and female, the value labels would be:

m=Male

f=Female

The value labels also define what will be accepted as valid data during data entry.  Whenever a value label is specified, the code (on the left of the equals sign) will be interpreted as a valid code during data entry.  If no value labels are specified, all data will be considered valid.

Many variables do not need any value labels.  They are required only when a coded response will be entered.  Numeric interval and ratio data, as well as open-ended alpha data, do not require any value labels.

The following questions would not need value labels:

What is your age?

What is your first name?

What score did you get on the test?

What is your favorite number?

What would best describe your feelings?


Valid Codes

When there are no value labels (such as a test score variable), valid codes for data entry can still be specified by simply typing the valid codes or ranges.  The format for entering valid codes is:

<Code or Range>

<Code or Range>

<Code or Range>

In this case, each valid code (or valid range) is entered on a different line.  Alternatively, a slash (/) may be used to list a series of valid codes on the same line:

<Code or Range>/<Code or Range>/<Code or Range>

The following examples illustrate various ways to specify valid codes.

1/3/5      accept codes 1, 3 & 5

1-3         accept codes 1 to 3

1-3/5      accept 1 to 3 & 5

15-99     accept values 15 to 99

A-D       accept codes A, B, C and D

A-D/X   accept codes A, B, C, D and X

#            accept anything

Notice that the pound symbol (#) is used to specify "accept any number or letter during data entry".  If the field is numeric, this means any number is an acceptable value.  If the field is alpha, it means that any character is acceptable input.  When the valid codes, labels, and skips field is completely empty, any input will be accepted (i.e., it is the same as the # symbol). If the pound symbol is specified, it should be the last line of the value labels for the variable.
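Valid-code lists behave like a small pattern language. The following Python sketch (`is_valid` is a hypothetical name, not a StatPac function) checks an entry against one specification line. It assumes inclusive ranges, numeric comparison for numeric fields, and plain character ordering for alpha ranges (so case matters); it does not handle negative numbers inside ranges.

```python
def is_valid(entry: str, spec: str, numeric: bool) -> bool:
    """Check an entry against a valid-code spec such as '1-3/5' or 'A-D/X'."""
    entry = entry.strip()
    if not spec.strip():
        return True  # an empty spec accepts anything, like '#'
    convert = float if numeric else str
    try:
        value = convert(entry)
    except ValueError:
        return False  # non-numeric text typed into a numeric field
    for token in spec.split("/"):
        token = token.strip()
        if token == "#":
            return True
        lo, dash, hi = token.partition("-")
        if dash:  # an inclusive range such as 15-99 or A-D
            if convert(lo) <= value <= convert(hi):
                return True
        elif value == convert(token):
            return True
    return False


print(is_valid("5", "1-3/5", numeric=True))   # True
print(is_valid("4", "1-3/5", numeric=True))   # False
print(is_valid("x", "A-D/X", numeric=False))  # False (lower case x)
```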


Skip Codes for Branching

Skip codes allow you to specify conditions for passing over certain variables during data entry depending on the values entered for other variables.  This is commonly referred to as branching.

For instance, if variable 6 contains responses to the question "Have you ever read Music Magazine?" and variable 7 stores answers to the question "How much do you like Music Magazine?", you would want to skip to variable 8 for a person who responded "No" to variable 6.

A semicolon and "branch to" number may be used on a <Code>=<Label> line to control data entry branching.  For the Music Magazine example above, the value labels would be:

Y=Yes

N=No ;8

Note that the semicolon and the number of the variable to branch to follow the value label.  In this example, the space before the semicolon is for readability only.  All of the following lines are equivalent:

N=No;8

N=No                     ;8

 N=No                    ;               8

As another example, consider a questionnaire that includes a "dwelling" variable for which 1=Apartment, 2=Condominium, and 3=House.  If three separate sections within the questionnaire corresponded to each type of dwelling, the value labels and skip codes for the dwelling variable could be:

1=Apartment        ;14

2=Condominium ;23

3=House                               ;29

The skip codes would direct data entry to variables 14, 23 or 29 depending on whether a 1, 2 or 3 was entered.  Again note that the spacing is for readability only.

The pound symbol (#) may be used in a skip code to mean any value or code.  That is, it is an absolute jump to a variable regardless of the data entered.  For example, #=;14 means to jump to variable 14 after entering the current field.  This feature is useful when you want to end a branch and rejoin with a common variable, as in the dwelling example above.
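The simple skip rules can be modeled as a lookup performed after each field is entered. In this Python sketch, `next_variable` is a hypothetical function. It assumes the lines are checked in order (so a # line placed last acts as a catch-all), ignores complex skips, and, following the manual's rule for nonexistent variable numbers, returns None (end of the questionnaire) when the target is past the last variable.

```python
from typing import Optional, Sequence


def next_variable(current: int, entry: str, labels: Sequence[str], n_vars: int) -> Optional[int]:
    """Variable number to go to after `current`, or None to end the questionnaire."""
    target = current + 1  # default: no skip, go to the next variable
    for line in labels:
        parts = line.split(";")
        if len(parts) != 2:
            continue  # no skip code, or a complex skip (not handled here)
        code = parts[0].split("=", 1)[0].strip()
        if code == "#" or code == entry.strip():
            target = int(parts[1])  # int() ignores the surrounding spaces
            break
    return target if target <= n_vars else None


dwelling = ["1=Apartment        ;14", "2=Condominium ;23", "3=House ;29"]
print(next_variable(3, "2", dwelling, 50))  # 23
```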

Complex branching is also supported. This means that a branch can be based on the response to a previous variable. The following example shows how to use complex branching. Assume these are the value labels for variable 10. If the data for variable 10 is entered as a 1 or 2, the complex skip will be evaluated. In this example, the skip pattern is the same regardless of whether a 1 or 2 is entered: if the previous response to variable 5 was 1, the skip will go to variable 25, and if the previous response to variable 5 was 2, the skip will go to variable 30. A response of 3 for variable 10 would skip to variable 35.

1=Yes  ; #V5=1 ; 25  #V5=2 ; 30

2=No  ; #V5=1 ; 25  #V5=2 ; 30

3=No Answer  ; 35

Note that a semicolon is used to begin the complex skip and before each "skip to" variable. Also note that a pound symbol is used to start each portion of the complex skip. All spacing is optional. Complex skip patterns are not automatically updated if you insert or delete a variable from the codebook. Therefore, they are generally added after the structure of the codebook has been finalized.
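The complex skip syntax can be evaluated with a regular expression. This Python sketch (`complex_skip` is a hypothetical name) evaluates one value-label line against earlier responses. It assumes the conditions are tested left to right and that, when no condition matches, data entry simply continues with the next variable; the manual does not say what happens in that case.

```python
import re
from typing import Mapping, Optional

# One condition: '#V<var>=<value> ; <target>'
CONDITION = re.compile(r"#V(\d+)=([^;#]+?)\s*;\s*(\d+)")


def complex_skip(label_line: str, previous: Mapping[int, str]) -> Optional[int]:
    """Return the 'skip to' variable for one value-label line, or None.

    None means no skip applies (assumed: continue with the next variable).
    """
    _, has_skip, rest = label_line.partition(";")
    rest = rest.strip()
    if not has_skip or not rest:
        return None  # plain value label without a skip code
    if not rest.startswith("#"):
        return int(rest)  # ordinary skip such as '3=No Answer ; 35'
    for var, value, target in CONDITION.findall(rest):
        if previous.get(int(var), "").strip() == value.strip():
            return int(target)
    return None  # no condition matched


line = "1=Yes  ; #V5=1 ; 25  #V5=2 ; 30"
print(complex_skip(line, {5: "1"}))  # 25
print(complex_skip(line, {5: "2"}))  # 30
```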

If you specify a skip to a nonexistent variable number, it will be interpreted as an instruction to branch to the end of the questionnaire.  For example, if you have a survey with fifty questions, a skip to variable ninety-nine would mean to immediately end the current questionnaire, and begin a new interview with the next respondent.

Be careful when defining skip codes, as it is quite possible to create an endless data entry loop.


Data Entry Control Parameters

The Data Entry Control parameters determine how the data entry program will operate. They can be set independently for each variable, and are all of the yes/no variety.

Missing OK

The decision whether or not to allow missing data for a particular variable depends upon the variable itself.  For example, an ID number may be something you want to make mandatory during data entry (no missing data will be allowed). Some variables, however, should accept missing data.  For example, in surveys, respondents may leave questions blank or simply prefer not to answer others; in agricultural research, some of the crop dies; in public health research, participants move, etc. When in doubt, missing data should be allowed.  This only means that the data entry person will be able to skip over this variable if they need to.

For Internet surveys, we strongly recommend allowing missing data for all variables. Because it is so easy for a person to leave an Internet survey, it is exceedingly important that they not become frustrated by the process. Requiring input when a respondent does not wish to answer an item will very likely result in a partially completed survey.

Auto Advance

When the Auto Advance is set, the cursor will automatically move to the next field when the current field is filled with characters.  This means that during the data entry process, if you type the same number of characters that were reserved for field width (in the format statement), you will not need to press <enter> to move to the next field. This will significantly speed up the data entry process since it eliminates a keystroke (i.e., <enter>) for each variable. This parameter will be ignored for Internet surveys.

Caps Only

The Caps Only parameter determines whether the characters typed on the keyboard will be converted to upper case letters in the data file.  This is especially useful if a field is coded alpha, and you do not want the data entry operator to be able to inadvertently enter lower case characters.  It is identical to using the caps lock on your keyboard. This parameter will be ignored for Internet surveys.
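The three yes/no parameters can be pictured as a small per-field policy. The Python sketch below is a hypothetical model, not StatPac's implementation (`EntryControls`, `on_keystrokes`, and `on_enter` are illustrative names). It assumes characters typed beyond the field width are discarded, and it does not model the fact that Internet surveys ignore Auto Advance and Caps Only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EntryControls:
    width: int                 # field width from the format statement
    missing_ok: bool = True    # "when in doubt, missing data should be allowed"
    auto_advance: bool = False
    caps_only: bool = False


def on_keystrokes(typed: str, ctl: EntryControls) -> Tuple[str, bool]:
    """Return the field text so far and whether the cursor should advance."""
    text = typed.upper() if ctl.caps_only else typed
    text = text[: ctl.width]  # assumed: extra characters are discarded
    advance = ctl.auto_advance and len(text) == ctl.width
    return text, advance


def on_enter(text: str, ctl: EntryControls) -> str:
    """Finish a field with <enter>; a blank field is rejected unless Missing OK is set."""
    if not text.strip() and not ctl.missing_ok:
        raise ValueError("this variable requires a value")
    return text


sex = EntryControls(width=1, auto_advance=True, caps_only=True)
print(on_keystrokes("m", sex))  # ('M', True)
```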


Codebook Tools

There are two main tools for entering and changing the information in the codebook: the Grid and the Variable Detail window. Either tool may be used at any time. Generally, the Grid is used when you are beginning a new codebook, and the Variable Detail window is used to make changes to individual variables. There is also an Analysis utility program, “Quick Codebook Creation”, that creates a codebook from an extended format statement.


The Grid

One method of designing a codebook is to use the Grid. Click on the Grid button and the Grid will be displayed.



A row in the Grid represents a variable. If your study has 50 variables, there will be 50 rows in the Grid. When you start the Grid, only one row will be showing. More rows will appear as needed as you enter the codebook. When you enter a variable format for the current variable, a blank row for a new variable will appear.

To begin entering information into the Grid, click in the name field of the first row.

Use the Tab and Shift+Tab keys to move from one column to the next. You can also use the left or right mouse button to select a field.

The Variable Label and Value Label fields will display a larger window when you enter those fields. If either of these windows is showing, you can minimize it by clicking on it with the right mouse button, or clicking on another field.


Codebook Libraries

There are many features to make the codebook design easier. One of these is the ability to load variables from other codebooks. In other words, you can establish a "library" of commonly used questions. The library can be a codebook that you designed especially for this purpose, or it can be a codebook that you used for a previous study.

To load a variable or variables from a library, select File, Open Library.



After loading the library, you can choose one or more variables to copy to the new codebook. To select multiple variables, hold down the control key while you click on the individual variables in the library. After selecting the variables, click on the Copy To Grid Button in the top left corner of the library window.

Duplicating Variables

Many times, consecutive variables in the codebook are similar. While working with the Grid, you can copy the information from the previous variable to the current variable. While entering a new variable, click on the Duplicate Button to repeat all the information from the previous variable. StatPac will automatically change the variable name, since two variables cannot share the same name. Fields that already contain information in the current variable will not be overwritten.

The Duplicate Button is especially useful when creating a series of variables that share the same value labels or a series of multiple response variables. For example, if you are entering a series of variables that all use the same value labels, you could enter the variable format, name, label, and then click the Duplicate button to repeat the value labels from the previous variable. When entering multiple response variables, you could use the Duplicate Button to repeat the entire variable.

The library feature can also be used to duplicate variables in the current codebook. Unlike the Duplicate Button (which duplicates only the previous variable), the library can be used to duplicate variable(s) that appear anywhere in the codebook. First save the codebook by clicking on the Save button or selecting File, Save Codebook. Then click on the row where you want the new variables to be inserted. Select File, Open Library and select the current codebook as the library. Finally select the variables you want to duplicate, and click the Copy To Grid Button.

Insert & Delete Variables

Normally, while you are designing a study, variables are added one after another to the end of the existing variables. However, you can also insert a new variable in the middle of the codebook.

Click on the Grid row you want to be immediately below the new variable. Then click the Insert Button to open up a blank new row in the Grid.

To delete a variable, first click on the Grid row you want to delete. Then click the Delete Button.

Move Variables

The order of the variables can be changed using the Up and Down Arrow Buttons. First, click on the variable you want to move. Then click the Up or Down Arrow Buttons to move the variable.

Starting Columns

Starting columns refer to the beginning location of the variables in the data record. During data entry, each variable that is entered will be stored in the data record beginning at a certain location. The starting columns are these locations.

Starting columns are determined automatically, so you do not need to be concerned about them. The program assigns them as you enter new variables into the Grid, so that the data record stores variables in consecutive (contiguous) columns. Because they are handled automatically, starting columns are not displayed as part of the Grid. They can be displayed by selecting Options, Show Start Columns.
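With a contiguous layout, each starting column is one plus the sum of the widths of the variables before it. This Python sketch illustrates that arithmetic and assembles one fixed-width record using the justification rules given earlier. `start_columns` and `build_record` are hypothetical names, and padding fields with spaces is an assumption made for the illustration.

```python
from typing import List, Sequence, Tuple


def start_columns(widths: Sequence[int]) -> List[int]:
    """1-based starting column of each variable when fields are contiguous."""
    starts: List[int] = []
    col = 1
    for width in widths:
        starts.append(col)
        col += width
    return starts


def build_record(values: Sequence[str], formats: Sequence[Tuple[str, int]]) -> str:
    """Concatenate fields: alpha left-justified, numeric right-justified."""
    fields = []
    for value, (var_type, width) in zip(values, formats):
        text = value.strip()
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} columns")
        fields.append(text.ljust(width) if var_type == "A" else text.rjust(width))
    return "".join(fields)


formats = [("N", 3), ("A", 1), ("N", 2)]
print(start_columns([w for _, w in formats]))         # [1, 4, 5]
print(repr(build_record(["7", "m", "42"], formats)))  # '  7m42'
```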

Print a Codebook

To print a codebook, select File, Print, Codebook. The Print Dialog window will give you the opportunity to choose various printing options. Printing a codebook is especially important if you give your data file to someone else, since the codebook will tell them exactly how the data is formatted.


The Variable Selection lets you select which variables from the codebook will be printed. The list of variables to print can use spaces or commas to separate variables, and dashes to indicate a range of variables.
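Expanding such a list is straightforward. The Python sketch below (`parse_selection` is a hypothetical name) accepts variable numbers separated by spaces or commas, with dashes for inclusive ranges; it assumes the list contains variable numbers only, not variable names.

```python
import re
from typing import List


def parse_selection(spec: str) -> List[int]:
    """Expand a selection such as '1-5, 8 10' into [1, 2, 3, 4, 5, 8, 10]."""
    spec = re.sub(r"\s*-\s*", "-", spec)  # allow '1 - 5' as well as '1-5'
    numbers: List[int] = []
    for token in re.split(r"[\s,]+", spec.strip()):
        if not token:
            continue
        lo, dash, hi = token.partition("-")
        if dash:
            numbers.extend(range(int(lo), int(hi) + 1))
        else:
            numbers.append(int(token))
    return numbers


print(parse_selection("1-5, 8 10"))  # [1, 2, 3, 4, 5, 8, 10]
```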

A codebook printout will always include the variable numbers, names, and formats for the variables. The What To Print items let you select what additional information from the codebook will be printed.

Variable Labels - When this parameter is set, variable labels will be printed.

Value Labels -  When this parameter is set, value labels will be printed.

Valid Codes - When this parameter is set, valid codes will be printed.  This specifically refers to valid codes that are not part of a <Code>=<Label>.

Skip Codes - When this parameter is set, skip patterns will be printed as part of the value labels.


Variable Detail Window

To show the Variable Detail window, select View, Variable Detail. If the Variable List window is showing, you can also double click on a variable to open the Variable Detail window. The Variable Detail window lets you add or modify nearly all the information in the codebook. While the layout is different, it gives you the same functionality as the Grid.

The Variable Detail window can be moved around the screen by pressing the mouse button on any gray area of the window and dragging the window to a new location.

When you change any of the fields in the Variable Detail window, the change is instantly reflected in the codebook. See Elements of a Variable for a complete description of each field.


Codebook Creation Process

The basic steps involved in designing a codebook depend upon whether or not you have a survey typed with a word processor.

Method 1: If you do not have a word-processed survey, you are essentially "starting from scratch" and it will be necessary to manually enter the labeling for the codebook. Once the codebook is completed, StatPac can automatically create a data entry form, a document that can be loaded into your word processor, an Internet survey, or an e-mail survey.

Method 2: If you already have a word-processed survey, considerable time can be saved by loading it into the Workspace window and then copying text from it to the codebook labels in the Variable Detail window.

Method 1 - Create a Codebook from Scratch

There are three ways to set up a new codebook:

1. Use the codebook design features that are built into the program. The Grid and Variable Detail tools let you create and edit variables, and extract variables from other studies or from libraries of questions. A library of questions is simply a codebook containing commonly asked questions. Rather than retyping a question for each new survey, you can extract it from a library.

2. Use Quick Codebook Creation (an Analysis utility program) to enter a format statement that describes the variables and their formats. This is the fastest way to create a new codebook. However, the codebook it creates will not have any variable names, variable labels, or value labels (although these can easily be added later).

3. If you import data from another format, a codebook will be created. Depending on the import format, the codebook may or may not have variable names.

Method 2 – Create a Codebook from a Word-Processed Document

Save the survey with your word processor in .rtf (Rich Text Format). In StatPac, select File, Open, Rich Text File, and load the word-processed document into the workspace. You can also load a text (.txt) document into the workspace.

Activate the Variable Detail window by selecting View, Variable Detail, or by double clicking on “<New>” in the Variable List window. Then create the codebook one variable at a time by specifying a format for the variable, and copying selected text from the form to the Variable Detail window.

When creating a new variable, first type its format into the Variable Format field. Then copy text from the workspace to the Variable Detail window to fill in the rest of the variable information.

To copy text, first highlight the text on the form. It is automatically copied to the clipboard when you highlight it, so it is not necessary to select Edit, Copy, or press <Ctrl C>. Next, click one of the fields in the Variable Detail window, and the text will be pasted into that field. You can copy text from the form to the Variable Name, Variable Label, or Value Labels fields. Depending on the text, you may need to edit it in the Variable Detail window. This feature may be turned off by selecting Format and then unchecking Semi-Automatic Copy/Paste.

While using the Variable Detail Window, right clicking in the Variable Label field will repeat the previous question in its entirety. You could use this when creating multiple response variables. Right clicking in the Value Labels field will repeat only the value labels from the previous question. You could use this when creating a series of questions all using the same value labels (e.g., a series of Likert scale items).


Spell Check a Codebook

To check the spelling in a codebook, select Design, Spell Check. The spelling checker dialog box will be shown.


The default dictionary for the spelling check is American English. The software also includes spelling dictionaries for British English, French, Spanish, and German. To change the dictionary that StatPac uses, you must edit the StatPac.ini file. Find the line that says DictionaryName = English. Change the word "English" to "British", "French", "Spanish", or "German".


Multiple Response Variables

If an item on a questionnaire allows for more than one response, it is called a multiple response item.  For instance, in the following question we would need to allow for five possible responses:

Which of the following brands of toothpaste have you used in the last year?  (Check all brands you've used)

    _____ Gleem

    _____ Colgate

    _____ Pepsodent

    _____ Crest

    _____ Other

Each of the five choices is viewed as a unique variable.  That is, five variables would be required to accommodate all possible responses. 

When designing a study in the Grid, using the Duplicate button will properly create all multiple response variables.

Generally, the following conventions are observed when creating multiple response variables.

1. The format for all multiple response variables must be the same.

2. The same (identical) variable label should be given to each of the multiple response variables.

3. If you will be creating a Web survey from the codebook, the number of variables must be the same as the number of value labels. Since there are five choices (value labels), there must be five identical variables.

The five variables for our example would contain the following information:


V1 Format:

V1 Name: Toothpaste

V1 Label: Which of the following brands of toothpaste have you used in the last year?

V1 Value Labels:

1=Gleem

2=Colgate

3=Pepsodent

4=Crest

5=Other

The second, third, fourth and fifth variables would be identical to the first variable, except the variable names would be: Toothpaste_2, Toothpaste_3, Toothpaste_4, and Toothpaste_5.

Note the above example uses value labels and a skip code. The skip code says to skip to variable six if nothing is entered for a variable.

The Missing OK parameter should be set to "Yes" for all five variables.
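The conventions above can be sketched in code to show what the Duplicate button is expected to produce. This Python sketch is illustrative only. The `Variable` class is a hypothetical, simplified stand-in for a codebook entry, and the format string "N1" is an assumed placeholder rather than a confirmed StatPac format. The sketch creates one variable per value label, gives every copy the same format and label, and appends _2, _3, and so on to the name.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variable:
    # Hypothetical, simplified stand-in for one codebook entry.
    name: str
    fmt: str
    label: str
    value_labels: tuple[str, ...]
    missing_ok: bool = True  # blanks allowed, as recommended for these variables


def duplicate_for_multiple_response(first: Variable, count: int) -> list[Variable]:
    """Return `first` plus count-1 identical copies named <name>_2 ... <name>_<count>."""
    copies = [replace(first, name=f"{first.name}_{i}") for i in range(2, count + 1)]
    return [first, *copies]


brands = ("1=Gleem", "2=Colgate", "3=Pepsodent", "4=Crest", "5=Other")
first = Variable(
    name="Toothpaste",
    fmt="N1",  # assumed placeholder format: a one-column numeric code
    label="Which of the following brands of toothpaste have you used in the last year?",
    value_labels=brands,
)
# One variable per value label, as required when creating a Web survey.
group = duplicate_for_multiple_response(first, len(brands))
print([v.name for v in group])
```

Because `dataclasses.replace` copies every field except the ones named, the format, label, value labels, and Missing OK setting are guaranteed to match across the group.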

Note that during data entry, any toothpaste code can be entered for any variable.  That is, if a person had only checked Crest, a "4" would be typed for the first variable.  For the second variable, the data entry person would just press <enter> and this would cause the program to skip to variable six (the continuation of the questionnaire).
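The data entry behavior described above can be modeled with a short Python simulation. This is a hypothetical model of the keying sequence, not StatPac code. Each entry is what the operator types for the next variable in the group. An empty entry plays the role of the skip code, so entry stops for the group and the remaining variables keep their blank (missing) value.

```python
def enter_multiple_response(entries: list[str], group: list[int]) -> dict[int, str]:
    """Simulate keying one respondent's answers into a multiple response group.

    Each item in `entries` is what the operator types for the next variable in
    `group`. An empty entry acts like the skip code: entry stops for the group,
    and the remaining variables stay blank (a space, i.e. missing).
    """
    record = {var: " " for var in group}
    for var, entry in zip(group, entries):
        if entry == "":
            break  # skip code: continue at the first variable after the group
        record[var] = entry
    return record


# A respondent who checked only Crest: type "4" for V1, then press <Enter> for V2.
print(enter_multiple_response(["4", ""], group=[1, 2, 3, 4, 5]))
# {1: '4', 2: ' ', 3: ' ', 4: ' ', 5: ' '}
```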

Sometimes surveys ask questions that limit the number of responses.  For example, the following questionnaire item limits the respondent to two choices, even though there are five items listed. Note that the following method of limiting the number of choices may not be used for Web surveys.

From the following list, choose the two items most important to you.  (Two only please)

_____ Friendship

_____ Love

_____ Financial security

_____ Freedom

_____ Spirituality

In this example, we controlled the number of responses by the way we asked the question.  Two variables need to be created to hold the responses to this item (one for each check).

The study design would contain two multiple response variables for this item:


V1 Format:

V1 Name: IMPORTANT

V1 Label: From the following list, choose the two items most important to you.

V1 Value Labels:

1=Friendship

2=Love

3=Financial security

4=Freedom

5=Spirituality

V2 Format:

V2 Name: IMPORTANT_2

V2 Label: From the following list, choose the two items most important to you.

V2 Value Labels:

1=Friendship

2=Love

3=Financial security

4=Freedom

5=Spirituality

Both variables have the same format, variable label, and value labels, so either variable will accept valid codes 1-5.  Notice that the first variable also contains a skip pattern that says to jump to variable three if nothing is entered for the first variable.

The two variables IMPORTANT and IMPORTANT_2 are not weighted.  That is, they could be swapped without affecting the results of any analysis (one is not more important than the other).  Codes were assigned to each of the possible responses.
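The following sketch shows why order does not matter when responses are weighted equally. A multiple response frequency count can be thought of as counting every mention of a code, whichever variable holds it. This is a hedged Python sketch: the function name and the one-list-per-respondent layout are illustrative assumptions, not StatPac internals. Blank entries are treated as missing and are not counted.

```python
from collections import Counter


def tally_multiple_response(cases: list[list[str]]) -> Counter:
    """Count how often each code was chosen across all variables in a group.

    Blank (space) entries are missing data and are not counted.
    """
    counts: Counter = Counter()
    for case in cases:
        counts.update(code for code in case if code.strip())
    return counts


# Each inner list holds one respondent's IMPORTANT and IMPORTANT_2 codes.
cases = [["1", "3"], ["3", "5"], ["2", " "]]
swapped = [list(reversed(case)) for case in cases]  # exchange the two variables
print(tally_multiple_response(cases) == tally_multiple_response(swapped))  # True
```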

If the above question were asked in the following way, the variables would be weighted; that is, one variable would be more important than the other:

From the following list, write a 1 next to the item that is most important to you and a 2 next to the item that is second most important to you.

_____ Friendship

_____ Love

_____ Financial security

_____ Freedom

_____ Spirituality

Notice that this is no longer a true multiple response question; it is really asking two different questions (which item is first and which is second).  Unlike the previous examples, the two responses are not weighted equally.  Whenever a question asks the respondent to rank a list of items in some sort of prioritized order, it is not multiple response.  Instead, it is essentially a series of separate (but related) variables.  Two variables would be created for this question, each having its own variable name, label, and value labels:


V1 Format:

V1 Name:

V1 Label: From the following list, what item is the most important to you?

V1 Value Labels:

1=Friendship

2=Love

3=Financial security

4=Freedom

5=Spirituality

V2 Format:

V2 Name:

V2 Label: From the following list, what item is the second most important to you?

V2 Value Labels:

1=Friendship

2=Love

3=Financial security

4=Freedom

5=Spirituality

While both variables in this example share the same value labels, they are still considered separate variables.  The criterion for deciding whether a question is multiple response is priority.  If all responses are weighted equally, the question is appropriate for multiple response.  If the question involves any sort of ranking of the items, it is best viewed as a series of individual variables.

When StatPac copies variables from the codebook to the data entry form, variables with the same variable label are interpreted as multiple response variables, and they will be automatically grouped together on the data entry template and in the HTML created for Web surveys.
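The grouping rule can be sketched in a few lines of Python. This is an illustration only. It assumes that the variables sharing a label are adjacent in the codebook, which is the case when they are created with the Duplicate button. StatPac's actual matching rule may differ. The names FIRST and SECOND are hypothetical, because the ranked example above does not give variable names.

```python
from itertools import groupby


def group_by_label(variables: list[tuple[str, str]]) -> list[list[str]]:
    """Group consecutive (name, label) pairs that share an identical label."""
    return [
        [name for name, _ in run]
        for _, run in groupby(variables, key=lambda var: var[1])
    ]


codebook = [
    ("IMPORTANT", "From the following list, choose the two items most important to you."),
    ("IMPORTANT_2", "From the following list, choose the two items most important to you."),
    ("FIRST", "From the following list, what item is the most important to you?"),
    ("SECOND", "From the following list, what item is the second most important to you?"),
]
for names in group_by_label(codebook):
    kind = "multiple response" if len(names) > 1 else "single variable"
    print(kind, names)
```

The two ranked variables have different labels, so they stay separate, which matches the distinction drawn above between multiple response and ranked items.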


Missing Data

Missing data may be handled in one of two ways.  Regardless of the method used, missing data can easily be recoded later using the Analysis program.

In most cases, no special provisions need to be made regarding what to do with missing data.  If any variable in the data file is left blank, it will be treated as a missing value and will be excluded from the analysis.  The analysis will print the number of missing cases, but will not include these when performing any statistical test.

The other method of handling missing data is to enter an additional value label for it, as in this example:

A=6 years or less

B=7 to 9 years

C=10 to 12 years

D=Over 12 years

   =Missing Cases

Note that the code (on the left of the equals sign) is a space.  All missing data in StatPac is stored as spaces (blanks) during data entry.

When a variable is numeric, it is not appropriate to specify a value label of <space>=Value Label.  Since a space is not a valid numeric code, it cannot be included in a numeric calculation.  Therefore, missing data will automatically be excluded from most analyses of interval or ratio numeric data.  It is possible, however, to recode missing data to a valid numeric value (such as zero), so that it will be included in the analyses.  Also, several multivariate procedures include an option to use mean substitution for missing data.

It is important to understand the consequences of recoding numeric missing data to something else.  Zero and missing are not the same.  Analytical techniques involving computations on a variable treat zero differently than missing data.  Missing data is excluded from all numerical calculations, whereas zeros are treated just like any other numeric value.
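A small worked example shows the practical difference between zero and missing. In this Python sketch, blank entries represent missing data, as described above, and the function names are illustrative. Three answered values average 5.0. Recoding the two blanks to zero pulls the mean down to 3.0. Mean substitution, by contrast, leaves the mean unchanged.

```python
def mean_excluding_missing(values: list[str]) -> float:
    """Average the numeric entries, skipping blanks (missing data)."""
    numbers = [float(v) for v in values if v.strip()]
    if not numbers:
        raise ValueError("every value is missing")
    return sum(numbers) / len(numbers)


def recode_missing(values: list[str], replacement: str) -> list[str]:
    """Replace blank entries with a numeric code, as a recode would."""
    return [v if v.strip() else replacement for v in values]


data = ["4", " ", "6", " ", "5"]
print(mean_excluding_missing(data))                       # 5.0  (n = 3)
print(mean_excluding_missing(recode_missing(data, "0")))  # 3.0  (n = 5)
# Mean substitution: replace blanks with the mean of the answered values.
print(mean_excluding_missing(recode_missing(data, str(mean_excluding_missing(data)))))  # 5.0
```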


Changing Information in a Codebook

When initially designing a codebook and form, you can change any information for any variable. You can also insert new variables and delete existing variables.  This will continue to be true up to the time that data is entered into a data file.  After that, StatPac will issue a warning when you load a codebook that has an associated data file. StatPac gives this warning because these operations (i.e., adding new variables and deleting existing variables) would make the existing data file no longer match the codebook.  You can, however, change any other study design information at any time.

If you receive the warning message, StatPac will let you activate a safety feature that prevents inadvertent additions or deletions of variables by disabling the Insert and Delete Buttons.



If you choose not to utilize the safety feature, be careful not to inadvertently add, delete, or change the order of any variables since this would make the existing data file incompatible with the modified codebook. However, you may still make changes to any other codebook information including a variable's format.

If you change the format of a variable, the associated data file is adjusted accordingly. For example, if you change a variable format from A50 to A100, all the existing data records would be 50 characters too short. However, when you save the revised codebook, each data record will be padded with spaces so that it matches the new codebook information. Note that this feature normally changes only one data file (the one with the same name as the codebook). Advanced users may wish to change multiple data files that all use the same codebook. To enable changing multiple data files, edit StatPac.ini and set AllowMultipleDataFiles = 1.
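The padding step can be pictured with a short Python sketch. It assumes fixed-width data records whose fields begin at 1-based starting columns taken from the codebook. StatPac's own implementation is not documented here, so treat this as a model of the idea only. The sketch inserts spaces immediately after the widened field so that every later field shifts to its new starting column.

```python
def widen_field(record: str, start_col: int, old_width: int, new_width: int) -> str:
    """Widen one fixed-width field in a data record by inserting spaces.

    `start_col` is the 1-based starting column of the field. Spaces are
    inserted right after the old field, so later fields move to their new
    starting columns and existing characters keep their order.
    """
    if new_width < old_width:
        raise ValueError("this sketch only handles widening a field")
    end = start_col - 1 + old_width    # 0-based index just past the old field
    record = record.ljust(end)         # records may be short if trailing blanks were trimmed
    padding = " " * (new_width - old_width)
    return record[:end] + padding + record[end:]


# Field 2 starts in column 3 and is widened from 5 to 8 characters.
print(repr(widen_field("12Hello9", start_col=3, old_width=5, new_width=8)))  # '12Hello   9'
```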

Advanced users may wish to turn on or turn off the safety feature so the prompt is not displayed. The CodebookSafety parameter can be edited in the StatPac.ini file to control this feature. Set CodebookSafety = 1 to always enable the safety feature, CodebookSafety = 2 to always disable the safety feature, and CodebookSafety = 0 (the default) to ask you each time that a codebook is loaded.
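The StatPac.ini settings mentioned in this guide (DictionaryName, AllowMultipleDataFiles, and CodebookSafety) are normally edited by hand in a text editor. The following Python sketch makes the same kind of one-line edit programmatically. It assumes the file uses simple `Name = value` lines, as this guide shows, but the full layout of StatPac.ini has not been verified. The function name `set_ini_value` is hypothetical. The sketch refuses to add a key that is not already present, rather than guessing where the key belongs.

```python
import re


def set_ini_value(text: str, key: str, value: str) -> str:
    """Return INI-style text with the existing `key = ...` line set to `key = value`.

    The key is matched case-insensitively at the start of a line; all other
    lines are left untouched. Raises KeyError if the key is not present.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE)
    if not pattern.search(text):
        raise KeyError(key)
    return pattern.sub(lambda _match: f"{key} = {value}", text, count=1)


ini_text = "DictionaryName = English\nCodebookSafety = 0\n"
ini_text = set_ini_value(ini_text, "CodebookSafety", "1")  # always enable the safety feature
print(ini_text)
```

To apply the sketch to the real file, read StatPac.ini with `pathlib.Path.read_text()`, pass the text through the function, and write the result back with `write_text()`.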

Note that the above information applies only when you load a codebook for which there is an associated data file.

This is important because entering a few records of dummy data is often the best way to discover errors in the study design.  You would begin a typical project by designing the variables and creating a form.  Then you could enter a few records into the data file as a test.

Dummy records might reveal a variable on the questionnaire that was inadvertently omitted from the study, an alpha field that is not wide enough to hold a response, or the need for some other major change to the study design. If you don't need the data file (i.e., it contains only dummy test data), you can simply delete it. To delete a data file, select File, Open, Data File, right-click the data file you wish to delete, and select Delete.

If you have already entered a substantial number of real data records and then discover that you need to add a new variable, you cannot simply add the variable to the codebook.  Doing so would make the format of the codebook different from the data file.  Instead, new variables should be created in an analysis, where both the codebook and the data file will be updated to include the new variable.