Codebook Design
Components of a Study Design
All surveys begin by creating a codebook.
The codebook contains the format and labels of each variable. If your survey
contains 20 items, the codebook will also contain 20 items.
If the survey will be administered by paper and pencil
or CATI there will also be a data entry form. The form refers
to the screens that the data entry person will see while entering and editing
data. The codebook and form usually have the same file names. Only the file
extensions are different.
If the survey will be administered over the Internet or
as an e-mail survey, a data entry form is not necessary.

There are many ways to design the codebook and form. The
best way depends upon whether or not you already have typed the survey with a
word-processor. When you run the program, the main screen will be displayed.
The left side of the screen is for the Workspace window and the right side of
the screen is used to show the list of variables in the codebook.
The codebook defines the variables in the study. Each
item on the survey is a variable. Thus, the number of variables in the
codebook is the same as the number of items. (Note that several variables are
required to define a multiple response item, so the number of variables in
the codebook might actually exceed the number of items on the survey.)
There are five components to a variable. These are:
Variable Format
Variable Name
Variable Label
Value Labels (including valid codes and skip
codes)
Data Entry Control Parameters
The Variable Format is mandatory because it defines
where and how the data is stored in the data file. All other
components are optional.
The Variable Format defines the structure of the
variable. This is the only information that is mandatory when defining a new
variable. Once you define the structure of a variable, it will exist in the
codebook.
The syntax for the Variable Format is:
<Var. Type><No. of Cols>.<Decimal>
The following are examples of variable formats:
N5      a numeric variable using 5 columns
N5.2    a numeric variable using 5 columns; the format of the variable will be ##.##
N2.0    a numeric variable using 2 columns; the variable will always be an integer
A1      an alpha variable using 1 column
A250    an alpha variable using 250 columns
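The format syntax can be illustrated with a short Python sketch that splits a format into its three parts. This is not part of StatPac; the names parse_format and VariableFormat are hypothetical, and accepting lower case letters is an assumption. The limit of 22 columns for numeric variables comes from the Number of Columns section of this manual. The manual does not say whether a decimal part is meaningful for alpha variables, so the sketch does not check for it.

```python
import re
from typing import NamedTuple, Optional


class VariableFormat(NamedTuple):
    var_type: str            # "N" (numeric) or "A" (alpha)
    columns: int             # field width in columns
    decimals: Optional[int]  # None when the decimal part is omitted


_FORMAT_RE = re.compile(r"^([NA])(\d+)(?:\.(\d+))?$")


def parse_format(text: str) -> VariableFormat:
    """Split a format such as 'N5.2' into type, columns, and decimals."""
    # Assumption: lower case input is accepted and converted to upper case.
    match = _FORMAT_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid variable format: {text!r}")
    var_type, columns, decimals = match.groups()
    width = int(columns)
    if width < 1:
        raise ValueError("a variable needs at least one column")
    # The manual gives 22 columns as the maximum for a numeric variable.
    if var_type == "N" and width > 22:
        raise ValueError("a numeric variable may use at most 22 columns")
    return VariableFormat(var_type, width,
                          None if decimals is None else int(decimals))


print(parse_format("N5.2"))
print(parse_format("A250"))
```

Returning None for a missing decimal part keeps "N2" (stored as entered) distinct from "N2.0" (always an integer).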
Variable Type
StatPac has two types of variables: numeric and
alpha. Different analyses can be performed depending on the variable
type. StatPac requires that a variable type be specified as either N or
A.
Numeric variables may contain numbers, a decimal point,
a plus or minus sign and a D or E if scientific notation is used. Alpha
variables may contain any character (letters, numbers and special
characters). StatPac automatically left justifies alpha variables and
right justifies numeric variables.
An example of a numeric variable might be the following
question on a survey:
How many years of formal education have you completed?
The response would always be a number that could be
contained in two columns of the data file. The response would also
always be numeric. A numeric-type specification is required for
interval or ratio-type-data.
Some questions have coded responses and could use the
alpha-type format. An example of alpha-type question on a survey would
be:
Which product do you prefer?
A = Product A
B = Product B
C = Product C
N = No preference
In this question, the responses are coded into
categories. The categories are not arithmetically related. That
is, a response of C does not mean
twice as much product as response A. Nominal and ordinal-type data can
use either an alpha or numeric format.
Another example of an alpha variable would be an
open-ended response. The respondent could answer anything to the following
question:
What could we do to improve our product?
Likert-scale questions and preference scales are often
given a numeric format so that descriptive statistics can be
calculated. This is a generally accepted procedure in marketing and
social science research, the assumption being that the perceived intervals
between the selections are equal.
Number of Columns
The number of columns component of the format statement is the field
width
allocated for the variable. This is the number of characters needed to
write the longest data value. There is no maximum number of columns
for an alpha variable, although the practical limit for data entry is about
1,000 characters. For a numeric variable, the maximum number of columns
is 22.
The field width for numeric variables must be large
enough to hold the number, a plus or minus sign, and a decimal point (if necessary).
For example, a numeric one-to-ten scale would require two characters; racing
times for a hundred meter sprint (with accuracy to the hundredth of a second)
would require five characters (two for the seconds, one for the decimal
point, and two for the hundredths of seconds). An alpha variable to hold an
entire open-ended sentence might require 150 characters.
It is very important that you leave a sufficient number
of columns for your data. After you begin entering data, changing the
number of columns for a variable will become more complex (since this
requires restructuring of the data already entered). When in doubt,
allow more columns rather than less.
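If you already have some sample values, the field width can be estimated before you define the variable. The following Python sketch is an illustration only (required_width is a hypothetical helper, not a StatPac feature). It writes each value with a fixed number of decimal places and reports the longest result, counting any minus sign and decimal point.

```python
def required_width(values, decimals=0):
    """Return the widest text form among values written with fixed decimals."""
    return max(len(f"{value:.{decimals}f}") for value in values)


# A one-to-ten scale needs two columns because of the value 10.
print(required_width(range(1, 11)))

# Sprint times recorded to the hundredth of a second need five columns.
print(required_width([9.58, 12.34], decimals=2))
```

Treat the result as a lower bound and add columns if larger values are possible later.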
Decimal Places
The decimal format is the number of significant decimal
places that the variable will contain. This component of the format
statement is optional and may be omitted. If <decimal> is
not specified, the data will be stored exactly as entered (with or without a
decimal point). In the format statement itself, the number of decimal
digits is preceded by a decimal point.
The variable name is simply a name that may be used to
reference the variable when designing analyses. While the variable name is
optional, its use is highly recommended. As a general rule, the variable name
is a short word or abbreviation. Its primary purpose is to help you keep
track of variables while designing analyses.
There are several rules governing variable names.
All of these are automatically checked by StatPac so it will not be possible
to enter an invalid variable name.
1. A variable name must be unique from all other
variable names and may not be the same as any analysis keyword. (The
keywords are listed in another section of the manual.)
2. The first character of a variable name may not be a number
or a space.
3. A variable name may not be the same as a V
number. For example, you cannot name a variable "V12".
4. A variable name may not contain a comma or
period. The variable name may include a space; however, for the purpose
of clarity, we recommend using a dash or underscore character instead of a
space.
5. A variable name may not be D, E, RECORD, TIME, LO,
HI, WITH, BY, THEN, TOTAL or MEAN. These words have special meaning to
StatPac.
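The rules above can be summarized in a short Python sketch. StatPac performs these checks itself, so this is only an illustration of the logic. The function name_problems is hypothetical, the comparison is assumed to ignore case, and the list of analysis keywords must be supplied by the caller because it is documented elsewhere.

```python
import re

# Words that rule 5 lists as having special meaning.
RESERVED = {"D", "E", "RECORD", "TIME", "LO", "HI",
            "WITH", "BY", "THEN", "TOTAL", "MEAN"}


def name_problems(name, existing_names=(), analysis_keywords=()):
    """Return a list of rule violations for a proposed variable name."""
    problems = []
    upper = name.upper()  # assumption: names are compared without regard to case
    if upper in {n.upper() for n in existing_names}:
        problems.append("rule 1: name is already used")
    if upper in {k.upper() for k in analysis_keywords}:
        problems.append("rule 1: name is an analysis keyword")
    if name and name[0] in "0123456789 ":
        problems.append("rule 2: first character is a number or space")
    if re.fullmatch(r"V\d+", upper):
        problems.append("rule 3: name looks like a V number")
    if "," in name or "." in name:
        problems.append("rule 4: name contains a comma or period")
    if upper in RESERVED:
        problems.append("rule 5: name is a reserved word")
    return problems


print(name_problems("Toothpaste"))   # no problems
print(name_problems("V12"))          # breaks rule 3
print(name_problems("2nd,choice"))   # breaks rules 2 and 4
```

An empty name returns no problems because the variable name is optional.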
The variable label is a written description of the
variable. For surveys, the variable label is usually the question itself.
There are no restrictions on the content or length of a variable label. It
may contain any character on the keyboard.
When creating a series of multiple response variables,
identical variable labels should be used for each of the multiple response
items. This tells StatPac how to format the data entry form, and thereby
improves data entry.
Value labels are used in the reports to label the
response categories. They may include any upper or lower case character
except a semicolon. The format for a value label is:
<Code>=<Label>
The code on the left of the equals sign is what will be
typed during data entry. The label on the right of the equals sign is
the definition of the code and will be used to label the output.
In the following example, there are four value
labels. These are entered as four separate lines.
1=6 years or less
2=7 to 9 years
3=10 to 12 years
4=Over 12 years
There are no spaces between the code and the equals
sign. There are also no spaces between the equals sign and the label. The
code on the left of the equals sign may not be longer than the field width
defined in the format statement. There is no limit on the length of the
value label (on the right of the equals symbol); however, short value labels
(20 or fewer characters) generally produce more condensed and easier to read
printouts.
For alpha variables, it is important to note that upper
and lower case characters are different. When you enter the code on the
left of the equals sign, the code should be the same case as you plan to
enter the data. For example, if the data entry person will be entering a
lower case m and f for male and female, the value labels would be:
m=Male
f=Female
The value labels also define what will be accepted as
valid data during data entry. Whenever a value label is specified,
the code (on the left of the equals sign) will be interpreted as a valid code
during data entry. If no value labels are specified, all data will be
considered valid.
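As an illustration of how value labels double as a list of valid codes, here is a minimal Python sketch. The helper names parse_value_labels and is_valid_entry are hypothetical, and the sketch handles only plain <Code>=<Label> lines (no valid-code ranges or skip codes).

```python
def parse_value_labels(lines, field_width):
    """Map each code to its label, preserving the case of the code."""
    labels = {}
    for line in lines:
        code, sep, label = line.partition("=")
        if not sep:
            raise ValueError(f"missing equals sign: {line!r}")
        if len(code) > field_width:
            raise ValueError(f"code {code!r} is wider than {field_width} column(s)")
        labels[code] = label
    return labels


def is_valid_entry(entry, labels):
    """With no value labels everything is accepted; otherwise the code must be listed."""
    return not labels or entry in labels


labels = parse_value_labels(["m=Male", "f=Female"], field_width=1)
print(is_valid_entry("m", labels))   # True
print(is_valid_entry("M", labels))   # False, because case matters for alpha codes
```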
Many variables do not need any value labels. They
are required only when a coded response will be entered. Numeric
interval and ratio data, as well as open ended alpha data, do not require any
value labels.
The following questions would not need value labels:
What is your age?
What is your first name?
What score did you get on the test?
What is your favorite number?
What would best describe your feelings?
When there are no value labels (such as a test score variable), valid codes
for data entry can still be specified by simply typing the valid codes or
ranges. The format for entering valid codes is:
<Code or Range>
<Code or Range>
<Code or Range>
In this case, each valid code (or valid range) is
entered on a different line. Alternately, a slash (/) may be used to
list a series of valid codes on the same line:
<Code or Range>/<Code or Range>/<Code or Range>
The following examples illustrate various ways to
specify valid codes.
1/3/5     accept codes 1, 3 & 5
1-3       accept codes 1 to 3
1-3/5     accept 1 to 3 & 5
15-99     accept values 15 to 99
A-D       accept codes A, B, C and D
A-D/X     accept codes A, B, C, D and X
#         accept anything
Notice that the pound symbol (#) is used to specify
"accept any number or letter during data entry". If the field
is numeric, this means any number is an acceptable value. If the field
is alpha, it means that any character is acceptable input. When the
valid codes, labels, and skips field is completely empty, any input will be
accepted (i.e., it is the same as the # symbol). If the pound symbol is
specified, it should be the last line of the value labels for the variable.
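The valid code notation can be sketched in Python as follows. This is an illustration under stated assumptions, not StatPac's own logic: is_accepted is a hypothetical name, numeric codes are assumed to be non-negative, a numeric range is assumed to accept any number between its end points, and an alpha range is assumed to cover single characters only.

```python
def is_accepted(value, spec, numeric):
    """Check one entered value against a valid-code line such as '1-3/5'."""
    if numeric:
        try:
            number = float(value)
        except ValueError:
            return False                 # not a number at all
    for part in spec.split("/"):
        part = part.strip()
        if part == "#":
            return True                  # '#' accepts anything
        low, dash, high = part.partition("-")
        if not dash:
            high = low                   # a single code is a one-item range
        if numeric:
            if float(low) <= number <= float(high):
                return True
        elif dash:
            # Assumption: alpha ranges compare single characters, case sensitive.
            if len(value) == 1 and low <= value <= high:
                return True
        elif value == low:
            return True
    return False


print(is_accepted("5", "1-3/5", numeric=True))    # True
print(is_accepted("4", "1-3/5", numeric=True))    # False
print(is_accepted("C", "A-D/X", numeric=False))   # True
```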
Skip codes allow you to specify conditions for
passing over certain variables during data entry depending on the values
entered for other variables. This is commonly referred to as branching.
For instance, if variable 6 contains responses to the
question "Have you ever read Music Magazine?" and variable 7 stored
answers to the question "How much do you like Music Magazine?", you
would want to skip to variable 8 for a person who responded "No" to
variable 6.
A semicolon and "branch to" number may be used
on a <Code>=<Label> line to control data entry
branching. For the Music Magazine example above, the value labels would
be:
Y=Yes
N=No ;8
Note that the semicolon and variable to branch to follow
the value label. In this example, the space before the semicolon is for
readability only. All of the following lines are equivalent:
N=No;8
N=No ;8
N=No ; 8
As another example, consider a questionnaire that
includes a "dwelling" variable for which 1=Apartment,
2=Condominium, and 3=House. If three separate sections within the questionnaire
corresponded to each type of dwelling, the value labels and skip codes
for the dwelling variable could be:
1=Apartment    ;14
2=Condominium  ;23
3=House        ;29
The skip codes would direct data entry to
variables 14, 23 or 29 depending on whether a 1, 2 or 3 was entered.
Again note that the spacing is for readability only.
The pound symbol (#) may be used in a skip code
to mean any value or code. That is, it is an
absolute jump to a variable regardless of the data entered. For
example, #=;14 means to jump to variable 14 after entering the current
field. This feature is useful when you want to end a branch and rejoin
with a common variable, as in the dwelling example above.
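The branching rules described so far can be sketched in Python. The helpers parse_skip_line and next_variable are hypothetical and are not part of StatPac. The sketch assumes that a skip on the matching code takes precedence over a # line, and that data entry otherwise continues with the next variable.

```python
def parse_skip_line(line):
    """Split 'N=No ;8' into (code, label, skip_target); target is None if absent."""
    body, sep, target = line.partition(";")
    code, _, label = body.partition("=")
    skip_to = int(target) if sep and target.strip() else None
    return code.strip(), label.strip(), skip_to


def next_variable(current, entered, lines):
    """Return the number of the variable that follows the current one."""
    any_value_target = None
    for line in lines:
        code, _label, skip_to = parse_skip_line(line)
        if code == entered and skip_to is not None:
            return skip_to               # skip attached to the entered code
        if code == "#":
            any_value_target = skip_to   # absolute jump for any value
    if any_value_target is not None:
        return any_value_target
    return current + 1                   # no skip: continue with the next variable


magazine = ["Y=Yes", "N=No ;8"]
print(next_variable(6, "Y", magazine))   # 7
print(next_variable(6, "N", magazine))   # 8
```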
Complex branching is also supported. This means that a
branch can be based on the response to a previous variable. The following is
an example of how to use complex branching. Assume it is the value
labels for variable 10. If the data for variable 10 is entered as a 1 or 2,
the complex skip will be evaluated. In this example, the skip pattern is the
same regardless of whether a 1 or 2 is entered. If the previous response to
variable 5 was 1 then the skip will go to variable 25, and if the previous
response to variable 5 was 2 then the skip will go to variable 30. A response
of 3 for this variable would skip to variable 35.
1=Yes ; #V5=1 ; 25 #V5=2 ; 30
2=No ; #V5=1 ; 25 #V5=2 ; 30
3=No Answer ; 35
Note that a semicolon is used to begin the complex skip
and before each "skip to" variable. Also note that a pound symbol
is used to start each portion of the complex skip. All spacing is optional.
Complex skip patterns are not automatically updated if you insert or delete a
variable from the codebook. Therefore, they are
generally added after the structure of the codebook has been finalized.
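To make the complex skip notation concrete, here is a Python sketch that evaluates one value label line once the entered code has already been matched to that line. The name resolve_skip is hypothetical, and the behavior when no condition matches (no skip) is an assumption, since the manual does not describe that case.

```python
import re

# One condition looks like '#V5=1 ; 25': variable number, required value, target.
_CONDITION_RE = re.compile(r"#\s*V(\d+)\s*=\s*([^;\s]+)\s*;\s*(\d+)")


def resolve_skip(line, previous):
    """Return the skip target for one value-label line, or None if no skip applies.

    previous maps variable numbers to the values already entered for them.
    """
    _, sep, skip_part = line.partition(";")
    if not sep:
        return None                              # no skip on this line
    conditions = _CONDITION_RE.findall(skip_part)
    if not conditions:
        skip_part = skip_part.strip()
        return int(skip_part) if skip_part else None   # simple skip such as '; 35'
    for variable, value, target in conditions:
        if previous.get(int(variable)) == value:
            return int(target)
    return None                                  # assumption: no match means no skip


line = "1=Yes ; #V5=1 ; 25 #V5=2 ; 30"
print(resolve_skip(line, {5: "1"}))   # 25
print(resolve_skip(line, {5: "2"}))   # 30
```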
If you specify a skip to a nonexistent variable
number, it will be interpreted as an instruction to branch to the end of the
questionnaire. For example, if you have a survey with fifty questions,
a skip to variable ninety-nine would mean to immediately end the
current questionnaire, and begin a new interview with the next respondent.
Be careful when defining skip codes, as it is
quite possible to create an endless data entry loop.
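One simple precaution is to look for skips that point to the current variable or to an earlier one, since only those can send data entry back over fields already visited. The Python sketch below is a hypothetical helper (backward_skips is not a StatPac feature) and assumes you have listed the skip targets for each variable number by hand.

```python
def backward_skips(skips):
    """Return (variable, target) pairs whose target is not after the variable.

    skips maps a variable number to the list of skip targets defined for it.
    """
    return [(variable, target)
            for variable, targets in sorted(skips.items())
            for target in targets
            if target <= variable]


# Variable 29 jumps back to variable 10, which could loop forever.
print(backward_skips({6: [8], 10: [14, 23, 29], 29: [10]}))
```

A backward skip is not always an error, but each one reported deserves a second look.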
Data Entry Control Parameters
The Data Entry Control parameters determine how the data
entry program will operate. They can be set independently for each variable,
and are all of the yes/no variety.
Missing OK
The decision whether or not to allow missing data for a particular variable depends upon the
variable itself. For example, ID number may be something you want to
make mandatory during data entry (no missing data will be allowed). Some
variables however, should accept missing data. For example, in surveys,
respondents may leave questions blank or simply prefer not to answer others;
in agricultural research, some of the crop dies; in public health research,
participants move, etc. When in doubt, missing data should be allowed.
This only means that the data entry person will be able to skip over this
variable if they need to.
For Internet surveys, we
strongly recommend allowing missing data for all variables. The ease with which
a person can leave an Internet survey makes it exceedingly important that
they not become frustrated by the process. Requiring input when a respondent
does not wish to answer an item will most assuredly result in a partially
completed survey.
Auto Advance
When the Auto Advance is set, the cursor will
automatically move to the next field when the current field is filled with
characters. This means that during the data entry process, if you type
the same number of characters that were reserved for field width (in the
format statement), you will not need to
press <enter> to move to the next field. This will significantly
speed up the data entry process since it eliminates a keystroke (i.e., <enter>)
for each variable. This parameter will be ignored for Internet surveys.
Caps Only
The Caps Only parameter determines whether the
characters typed on the keyboard will be converted to upper case letters in
the data file. This is especially useful if a field is coded alpha, and
you do not want the data entry operator to be able to inadvertently enter
lower case characters. It is identical to using the caps lock on your
keyboard. This parameter will be ignored for Internet surveys.
There are two main tools for entering and changing the
information in the codebook: the Grid and the Variable Detail. Either
tool may be used at any time. Generally, the Grid is used when you are
beginning a new codebook, and the Variable Detail is used to make changes to
individual variables. There is also an Analysis utility program, “Quick
Codebook Creation”, that creates a codebook from an extended format statement.
One method of designing a codebook is to use the Grid.
Click on the Grid button and the Grid will be displayed.

A row in the Grid represents a variable. If your study
has 50 variables, there will be 50 rows in the Grid. When you start the Grid,
only one row will be showing. More rows will appear as needed as you enter
the codebook. When you enter a variable format
for the current variable, a blank row for a new variable will appear.
To begin entering information into the Grid, click in
the name field of the first row.
Use the Tab and Shift-Tab keys to move from one
column to the next. You can also use the left or right mouse button to
select a field.
The Variable Label and Value Label fields will display a
larger window when you enter those fields. If either of these windows is
showing, you can minimize it by clicking on it with the right mouse button,
or clicking on another field.

Codebook Libraries
There are many features to make the codebook design
easier. One of these is the ability to load variables from other codebooks.
In other words, you can establish a "library"
of commonly used questions. The library can be a codebook that you designed
especially for this purpose, or it can be a codebook that you used for a
previous study.
To load a variable or variables from a library, select
File, Open Library.

After loading the library, you can choose one or more
variables to copy to the new codebook. To select multiple variables, hold
down the control key while you click on the individual variables in the
library. After selecting the variables, click on the Copy To Grid Button in
the top left corner of the library window.
Duplicating Variables
Many times, consecutive variables in the codebook are
similar. While working with the Grid, you can copy the information from the
previous variable to the current variable. While entering a new variable,
click on the Duplicate Button to repeat all
the information from the previous variable. StatPac will automatically change
the variable name since two variables cannot share the same name. StatPac
will not duplicate any fields that are not blank in the current variable.
The Duplicate Button is especially useful when creating
a series of variables that share the same value labels or a series of multiple response variables. For example, if you are entering a
series of variables that all use the same value labels, you could enter the
variable format, name, label, and then click the Duplicate button to repeat
the value labels from the previous variable. When entering multiple response
variables, you could use the Duplicate Button to repeat the entire variable.
The library feature can also be used to duplicate
variables in the current codebook. Unlike the Duplicate Button (which
duplicates only the previous variable), the library can be used to duplicate
variable(s) that appear anywhere in the codebook. First save the codebook by
clicking on the Save button or selecting File, Save Codebook. Then click on
the row where you want the new variables to be inserted. Select File, Open
Library and select the current codebook as the library. Finally select the
variables you want to duplicate, and click the Copy To Grid Button.
Insert & Delete Variables
Normally, while you are designing a study, variables are
added one after another to the end of the existing variables. However, you
can also insert a new variable in the middle of the codebook.
Click on the Grid row you want to be immediately below
the new variable. Then click the Insert Button to open up a blank new row in
the Grid.
To delete a variable, first click on the Grid row you
want to delete. Then click the Delete Button.
Move Variables
The order of the variables can be changed using the Up
and Down Arrow Buttons. First, click on the variable you want to move. Then
click the Up or Down Arrow Buttons to move the variable.
Starting Columns
Starting columns refer to the beginning location of the
variables in the data record. During data entry, each variable that is
entered will be stored in the data record beginning at a certain location.
The starting columns are these locations.
Starting columns are automatically determined, and you
do not need to be concerned about them. That is, starting columns are
assigned by the program while entering new variables into the Grid. They are
assigned so the data record will store variables in consecutive (contiguous)
columns. Thus, the starting columns are being automatically handled by the
program and not displayed as part of the Grid. They can be displayed by
selecting Options, Show Start Columns.
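The way contiguous starting columns follow from the variable formats can be sketched in Python. The function starting_columns is hypothetical, and numbering the first column as 1 is an assumption.

```python
import re


def starting_columns(formats):
    """Return the starting column of each variable in record order."""
    starts = []
    next_column = 1                      # assumption: columns are numbered from 1
    for fmt in formats:
        # The field width is the number that follows the N or A.
        width = int(re.match(r"[NA](\d+)", fmt).group(1))
        starts.append(next_column)
        next_column += width             # the next variable begins right after
    return starts


print(starting_columns(["N5", "N5.2", "A1", "A250"]))   # [1, 6, 11, 12]
```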
Print a Codebook
To print a codebook, select File, Print, Codebook. The
Print Dialog window will give you the opportunity to choose various printing
options. Printing a codebook is especially important if you give your data
file to someone else, since the codebook will tell them exactly how the data
is formatted.

The Variable Selection lets you select which variables
from the codebook will be printed. The list of variables to print can use
spaces or commas to separate variables, and dashes to indicate a range of
variables.
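As an illustration of the list notation, the following Python sketch expands a selection into individual variable numbers. The name parse_selection is hypothetical, and the sketch assumes the list contains variable numbers only.

```python
import re


def parse_selection(text):
    """Expand a list such as '1-3, 7 10-12' into individual variable numbers."""
    numbers = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue                     # ignore empty pieces
        first, dash, last = token.partition("-")
        if dash:
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(first))
    return numbers


print(parse_selection("1-3, 7 10-12"))   # [1, 2, 3, 7, 10, 11, 12]
```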
A codebook printout will always include the variable
numbers, names, and formats for the variables. The What To Print items let
you select what additional information from the codebook will be printed.
Variable Labels - When this parameter is set, variable
labels will be printed.
Value Labels - When this parameter is set, value
labels will be printed.
Valid Codes - When this parameter is set, valid codes
will be printed. This specifically refers to valid codes that are not
part of a <Code>=<Label>.
Skip Codes- When this parameter is set, skip patterns
will be printed as part of the value labels.
To show the Variable Detail window, select View,
Variable Detail. If the Variable List window
is showing, you can also double click on a variable to open the Variable
Detail window. The variable window gives you the ability to add or modify
nearly all the information in the codebook. While the layout is different, it
gives you the same functionality as the Grid.
The Variable Detail window can be moved around the
screen by pressing the mouse button on any gray area of the window and
dragging the window to a new location.
When you change any of the fields in the Variable Detail
window, the change is instantly reflected in the codebook. See Elements of a
Variable for a complete description of each field.

Codebook Creation Process
The basic steps involved in designing a codebook depend
upon whether or not you have a survey typed with a word processor.
Method 1: If you do not have a word-processed
survey, you are essentially "starting from scratch" and it will be
necessary to manually enter the labeling for the codebook. Once the codebook
is completed, StatPac can automatically create a data entry form that can be
loaded into your word processor, an Internet survey, or an e-mail survey.
Method 2: If you already have a word-processed
survey, considerable time can be saved by loading it into the Workspace
window and then copying text from it to the codebook labels in the Variable
Detail window.
Method 1 - Create a Codebook from Scratch
There are three ways to set up a new codebook:
1. Use the codebook design features that are built into
the program. The Grid and Variable Detail tools let you create and edit
variables, as well as being able to extract variables from other studies or libraries
of questions. A library of questions is simply a codebook with commonly
asked questions. Rather than retyping a question with each new survey, you
can extract it from a library.
2. Use Quick Codebook Creation (an Analysis utility
program) to enter a format statement that describes the variables and their
format. This is the fastest way to create a new codebook. However, the
codebook it creates will not have any variable names, labels or value
labels (although these can easily be added later).
3. If you import data from another format, a codebook
will be created. Depending on the import format, the codebook may or may not
have variable names.
Method 2 – Create a Codebook from a Word-Processed Document
Save the survey with your word processor in .rtf (Rich
Text Format). In StatPac, select File, Open, Rich Text File, and load the
word-processed document into the workspace. You can also load a text
(.txt) document into the workspace.
Activate the Variable Detail window by selecting View,
Variable Detail, or by double clicking on “<New>” in the Variable
List window. Then create the codebook one variable at a time by
specifying a format for the variable, and copying selected text from the form
to the Variable Detail window.
When creating a new variable, first type its format into
the Variable Format field. Then copy text from the workspace to the Variable
Detail window to fill in the rest of the variable information.
To copy text, first highlight the text on the form. It
will automatically be copied to the clipboard when you highlight it. That is,
it is not necessary to select Edit, Copy, or press <Ctrl C>. Next,
click on one of the fields in the Variable Detail window. The text will be
copied to the Variable Detail window. You can copy text from the form to the
Variable Name, Variable Label, or Value Labels fields. Depending on the text,
you may need to edit it in the Variable Detail window. This feature may be
turned off by selecting Format, and then unchecking Semi-Automatic
Copy/Paste.
While using the Variable Detail Window, right clicking in
the Variable Label field will repeat the previous question in its entirety.
You could use this when creating multiple response variables. Right clicking
in the Value Labels field will repeat only the value labels from the previous
question. You could use this when creating a series of questions all using the
same value labels (e.g., a series of Likert scale items).
To check the spelling in a codebook, select Design,
Spell Check. The spelling checker dialog box will be shown.

The default dictionary for the spelling check is
American English. The software also includes spelling dictionaries for
British English, French, Spanish, and German. To change the dictionary that
StatPac uses, you must edit the StatPac.ini file. Find the line that says DictionaryName
= English. Change the word "English" to "British",
"French", "Spanish", or "German".
Multiple Response Variables
If an item on a questionnaire allows for more than one
response, it is called a multiple response item. For instance, in the
following question we would need to allow for five possible responses:
Which of the following brands of toothpaste have you
used in the last year? (Check all brands you've used)
_____ Gleem
_____ Colgate
_____ Pepsodent
_____ Crest
_____ Other
Each of the five choices is viewed as a unique
variable. That is, five variables would be required to accommodate all
possible responses.
When designing a study in the Grid, using the Duplicate
button will properly create all multiple response variables.
Generally, the following conventions are observed when
creating multiple response variables.
1. The format for all multiple response variables must
be the same.
2. The same (identical) variable label should be given
to each of the multiple response variables.
3. If you will be creating a Web survey from the
codebook, the number of variables must be the same as the number of
value labels.
Since there are five choices (value labels), there must be five identical
variables.
The five variables for our
example would contain the following information:
V1 Format:        N1
V1 Name:          Toothpaste
V1 Label:         Which of the following brands of toothpaste have you
                  used in the last year?
V1 Value Labels:  1=Gleem
                  2=Colgate
                  3=Pepsodent
                  4=Crest
                  5=Other
                  ;6
The second, third, fourth and fifth variables would be
identical to the first variable, except the variable names would be:
Toothpaste_2, Toothpaste_3, Toothpaste_4, and Toothpaste_5.
Note the above example uses value labels and a skip
code. The skip code says to skip to variable six if nothing is entered for a
variable.
The Missing OK parameter
should be set to "Yes" for all five variables.
Note that during data entry, any toothpaste code can be
entered for any variable. That is, if a person had only checked Crest,
a "4" would be typed for the first variable. For the second
variable, the data entry person would just press <enter> and
this would cause the program to skip to variable six (the continuation of the
questionnaire).
Sometimes surveys ask questions that limit the number of
responses. For example, the following questionnaire item limits the
respondent to two choices, even though there are five items listed. Note that
the following method of limiting the number of choices may not be used for
Web surveys.
From the following list, choose the two items most
important to you. (Two only please)
_____ Friendship
_____ Love
_____ Financial security
_____ Freedom
_____ Spirituality
In this example, we controlled the number of responses
by the way we asked the question. Two variables need to be created to
hold the responses to this item (one for each check).
The study design would contain two variables for these
multiple response variables:
V1 Format:        N1
V1 Name:          Important
V1 Label:         From the following list, choose the two items most
                  important to you.
V1 Value Labels:  1=Friendship
                  2=Love
                  3=Financial security
                  4=Freedom
                  5=Spirituality
                  ;3

V2 Format:        N1
V2 Name:          Important_2
V2 Label:         From the following list, choose the two items most
                  important to you.
V2 Value Labels:  1=Friendship
                  2=Love
                  3=Financial security
                  4=Freedom
                  5=Spirituality
Both items have the same format and variable
label. Variable and value labels are only assigned to the first
variable. The second variable will accept valid codes 1-5. Notice
that the first variable also contains a skip pattern that says jump to
variable three if nothing is specified for the first variable.
The two variables IMPORTANT and IMPORTANT_2 are not
weighted. That is, they could be swapped without affecting the results
of any analysis (one is not more important than the other). Codes were
assigned to each of the possible responses.
If the above question was asked in the following way,
the variables would be weighted; that is, one variable is more important than
the other:
From the following list, write a 1 next to the item
that is most important to you and a 2 next to the item that is second most
important to you.
_____ Friendship
_____ Love
_____ Financial security
_____ Freedom
_____ Spirituality
Notice that this is no longer a true multiple response
question; it is really asking two different questions (which is first and
which is second). Unlike the previous examples, both responses are not
weighted equally. Whenever a question asks the respondent to rank a
list of items in some sort of prioritized order, it is not multiple
response. Instead, it is essentially a series of separate (but related)
variables. Two variables would be created for this question, each
having its own variable name, label and value labels:
V1 Format:        N1
V1 Name:          Most_Important
V1 Label:         From the following list, what item
                  is the most important to you.
V1 Value Labels:  1=Friendship
                  2=Love
                  3=Financial security
                  4=Freedom
                  5=Spirituality

V2 Format:        N1
V2 Name:          Second_Most_Impt
V2 Label:         From the following list, what item
                  is the second most important to you.
V2 Value Labels:  1=Friendship
                  2=Love
                  3=Financial security
                  4=Freedom
                  5=Spirituality
While both variables in this example share the same
value labels, they are still considered separate variables. The
criterion for deciding whether a question is multiple response is
priority. If all responses are weighted equally, the question
is appropriate for multiple response. If the question involves any sort
of ranking of the items, it is best viewed as a series of individual
variables.
When StatPac copies variables from the codebook
to the data entry form, variables with the same variable label are
interpreted as multiple response variables, and they will be
automatically grouped together on the data entry template and in the HTML
created for Web surveys.
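The grouping rule can be pictured with a short Python sketch. It is an assumption-laden illustration rather than StatPac's actual logic: the function group_multiple_response is hypothetical, it assumes that only adjacent variables with identical labels are grouped, and the label text and the AGE variable are made up for the example.

```python
from itertools import groupby


def group_multiple_response(variables):
    """Group adjacent variables that share the same variable label.

    `variables` is a list of (name, label) pairs in codebook order.
    Returns a list of (label, [names]) tuples, one per group.
    """
    return [
        (label, [name for name, _ in group])
        for label, group in groupby(variables, key=lambda v: v[1])
    ]


codebook = [
    ("IMPORTANT", "What is important to you?"),    # hypothetical label text
    ("IMPORTANT_2", "What is important to you?"),  # same label, so same group
    ("AGE", "Age of respondent"),                  # hypothetical variable
]

for label, names in group_multiple_response(codebook):
    print(label, "->", names)
```

Here the first two variables form one multiple response group and AGE stands alone.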
Missing data may be handled in one of two ways.
Regardless of the method used, it is easy to change missing data using the
Analysis program.
In most cases, no special provisions need to be made
regarding what to do with missing data. If any variable in the data
file is left blank, it will be treated as a missing
value and will be excluded from the analysis. The analysis will print
the number of missing cases, but will not include these when performing any
statistical test.
The other method of handling missing cases is to enter
an additional value label.
A=6 years or less
B=7 to 9 years
C=10 to 12 years
D=Over 12 years
=Missing Cases
Note that the code (to the left of the equals sign) is a
space. All missing data in StatPac is stored as spaces (blanks)
during data entry.
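The effect of labeling the space code can be sketched in Python as a simple frequency tally. This is a minimal illustration and not how StatPac computes frequencies; the frequencies function is hypothetical, and the value labels are the ones listed above, with a single space as the code for missing cases.

```python
from collections import Counter

# Value labels from the example above; the last code is a single space.
VALUE_LABELS = {
    "A": "6 years or less",
    "B": "7 to 9 years",
    "C": "10 to 12 years",
    "D": "Over 12 years",
    " ": "Missing Cases",
}


def frequencies(codes):
    """Count how often each labeled code occurs, blanks included."""
    counts = Counter(codes)
    return {label: counts.get(code, 0) for code, label in VALUE_LABELS.items()}


# Five records; two of them were left blank during data entry.
print(frequencies(["A", " ", "C", "A", " "]))
```

With the extra value label, the two blank records are reported under "Missing Cases" as a category of their own.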
When a variable is numeric, it is not appropriate to
specify a value label of <space>=Value Label. Since a
space is not a valid numeric code, it cannot be included in a numeric
calculation. Therefore, missing data will automatically be excluded
from most analyses of interval or ratio numeric data. It is possible,
however, to recode missing data to a valid numeric value (such as zero), so
that it will be included in the analyses. Also, several multivariate
procedures include an option to use mean substitution for missing data.
It is important to understand the consequences of
recoding numeric missing data to something else. Zero and missing are
not the same. Analytical techniques involving computations on a
variable treat zero differently than missing data. Missing data is
excluded from all numerical calculations, whereas zeros are treated just like
any other numeric value.
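A small Python sketch shows how much the choice matters for something as simple as a mean. The helper names are hypothetical and None stands in for a blank field; this is an illustration of the arithmetic, not StatPac's implementation.

```python
def mean_excluding_missing(values):
    """Mean of the non-missing values; None marks a missing value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def recode_missing(values, replacement=0):
    """Replace every missing value with a valid numeric value."""
    return [replacement if v is None else v for v in values]


scores = [4, None, 2, None, 6]

# Missing values excluded: (4 + 2 + 6) / 3
print(mean_excluding_missing(scores))                  # 4.0

# Missing values recoded to zero: (4 + 0 + 2 + 0 + 6) / 5
print(mean_excluding_missing(recode_missing(scores)))  # 2.4
```

Recoding the two blanks to zero adds two cases to the denominator and pulls the mean from 4.0 down to 2.4.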
Changing Information in a Codebook
When initially designing a codebook and form, you can
change any information for any variable. You can also insert new variables
and delete existing variables. This will continue to be true up to the
time that data is entered into a data file. After that, StatPac will
issue a warning when you load a codebook that has an associated data file.
StatPac gives this warning because these operations (i.e., adding new variables
and deleting existing variables) would make the existing data file no longer
match the codebook. You can, however,
change any other study design information at any time.
If you receive the warning message, StatPac will let you
activate a safety feature that prevents inadvertent additions or deletions of
variables by disabling the Insert and Delete Buttons.

If you choose not to utilize the safety feature, be
careful not to inadvertently add, delete, or change the order of any
variables since this would make the existing data file incompatible with the
modified codebook. However, you may still make changes to any other codebook
information including a variable's format.
If you change the format of a variable, the associated
data file is adjusted accordingly. For example, if you change a variable format
from A50 to A100, all the existing data records would be 50 characters too
short. However, when you save the revised codebook, each data record will be
padded with spaces so it matches the new codebook information. Note that this
feature normally only changes one data file (the one with the same name as
the codebook). Advanced users may wish to change multiple data files that all
use the same codebook. To enable changing multiple data files, edit
StatPac.ini and set AllowMultipleDataFiles = 1.
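The padding step can be illustrated with a short Python sketch. It assumes a simple fixed-width record in which each variable occupies a known range of columns; the widen_field function and the sample record layout are hypothetical and are not StatPac's actual file handling.

```python
def widen_field(record, start, old_width, new_width):
    """Return `record` with one fixed-width field padded to a new width.

    `start` is the zero-based column where the field begins. The field's
    existing text is kept and spaces are added on its right, so every
    field after it shifts by the same amount.
    """
    if new_width < old_width:
        raise ValueError("this sketch only handles widening a field")
    end = start + old_width
    field = record[start:end].ljust(new_width)
    return record[:start] + field + record[end:]


# Hypothetical record: an N1 variable, an A50 comment, then an A1 variable.
record = "7" + "Likes the product".ljust(50) + "B"
widened = widen_field(record, start=1, old_width=50, new_width=100)

print(len(record), len(widened))  # 52 102
```

The comment text and the variables on either side are unchanged; only 50 spaces are added so the record matches the A100 format.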
Advanced users may wish to turn on or turn off the
safety feature so the prompt is not displayed. The CodebookSafety
parameter can be edited in the StatPac.ini file to control this feature. Set CodebookSafety
= 1 to always enable the safety feature, CodebookSafety = 2 to
always disable the safety feature, and CodebookSafety = 0 (the
default) to ask you each time that a codebook is loaded.
Note that the above information applies only when you
load a codebook for which there is an associated data file.
This is important because entering a few records of
dummy data is often the best way to discover errors in the study
design. You would begin a typical project by designing the variables
and creating a form. Then you could enter a few records into the data
file as a test.
Dummy records can reveal a variable on the questionnaire
that was inadvertently omitted from the study, an alpha field that is not wide
enough to hold a response, or some other problem that requires a major change
to the study design. If you don't need the data file (i.e., it's just dummy
test data), you can simply delete it. To delete a data file, select File, Open,
Data File. Right click on the data file you wish to delete and select Delete.
If you have already entered a substantial number of real
data records, and then discover you need to add a new variable, you cannot
simply add the variable to the codebook. Doing so would make the format
of the codebook different from that of the data file. Instead, new variables
should be created in an analysis, where both the codebook and the data file
will be updated to include the new variable.
|