Utilities
Seven utility programs are provided to give you greater
control over, and more versatility with, studies and data files. They can be run from
the Analysis, Utilities menu.

The Import and Export program
will allow you to read files created by other software or write files that
can be read by other software. Several formats are supported: Access,
Excel, all prior versions of StatPac, all prior versions of
StatPac Gold, comma delimited, tab delimited, multiple record files,
Internet response files, and plain-text e-mail.
The Merge program is used to merge variables from
different studies and data files or to rearrange the sequence of variables in
a file. It can merge data from up to five individual data files. It can also
be used to concatenate (join) data files using the same codebook.
The Aggregate program
is used to create a true or compositional aggregate study and data file. Aggregate files are useful for summarizing
subgroups of data.
The Codebook program is used to quickly create a
codebook, or to check a codebook and data file for errors. Its Check Codebook & Data utility
is used when you suspect that there is a problem in the codebook or data
file. If a specific procedure won't run, this program can sometimes provide a
solution. A common use of this program is when you are planning to use a data
file created from a source other than StatPac, and you want to make sure that
your study design matches the data file.
The Sampling program is used to generate a random
number table, create a random digit dialing table for telephone studies, and
to select a random sample from a data file.
The Compare Data Files program is used to compare
two data files for differences. It is used to check the accuracy of data
entry when a double entry system has been used.
The Statistics Calculator is used to calculate
distributions, probabilities and other statistics from proportions and
summary data.
StatPac can import and export information to other
software. Select Utilities, and then select Import or Export. Import means
you want to convert a non-StatPac file into StatPac for Windows format.
Export means you want to convert a StatPac for Windows file to a different
format. When importing or exporting data, the original file(s) will remain
intact and one or more new files will be created.
When importing, select the type of file and name of the
file to be imported. If you are importing from previous versions of StatPac,
it will be assumed that the codebook/study name and the data file name are the
same. The names for the StatPac for Windows codebook and data file may be
different.
Each of the import and export file formats is explained
below.
StatPac and Prior Versions of StatPac Gold
Prior versions of StatPac and StatPac Gold used a
"codebook" file or "study design" files to store variable
format and label information. StatPac for Windows stores all this
information in a codebook file. Because older versions of StatPac have
limited labeling space, some labels may be truncated when exporting to an
older version of StatPac.
The import program assumes that the data file name is
the same as the study file name for the previous version. Thus, the
"Name of the File to Import" will indirectly also specify the data
file name. For example, if the file to import is SURVEY.EZ0, the program will
try to import a data file called SURVEY.DAT from the same folder. If the data
file does not exist, only the codebook will be imported. StatPac for Windows
stores data in the same format as prior versions of StatPac (fixed format
sequential ASCII). Therefore, if there is not a matching data file name, you
can simply copy the old data file to the folder where you imported the
codebook and use it without modification.
Access and Excel
StatPac can import or export to a variety of common database/worksheet
formats. The appropriate extension will be used when you
select the type of database to be imported or exported. When importing to
StatPac, the import procedure will create a new StatPac codebook and data
file. Do not create a StatPac codebook before doing the import or it will be
overwritten by the import procedure.
When importing from Lotus to StatPac, the default
variable names are the worksheet column letters (e.g. Column A, Column B,
etc.). If your worksheet contains locked column headings, they will be used
as variable names for those columns. The column headings may be locked in
Lotus by using the /Worksheet Titles Horizontal command. If the column titles
are not locked, they will be written as the first record in the StatPac data
file (an undesirable situation). Also, be sure your worksheet does not
contain any empty rows or dividing lines between the titles and the first row
of data.
Comma Delimited and Tab Delimited Files
StatPac can import and export comma and tab delimited
files. There are many software packages that can interchange data in this
format. A comma delimited file is a sequential ASCII file where the variables
are separated from each other by commas (rather than each variable using a
fixed number of characters). In a tab delimited file, the separator is a tab
character. Tab delimited imports and exports are generally more reliable than
comma delimited ones, because tab characters rarely occur inside data values,
whereas commas often do.
When importing, StatPac creates a new codebook based on
the field widths required to hold the data. If a codebook already exists, you
may use it instead of creating a new codebook. If a data file already exists,
you'll be offered the option of appending to the existing data file or
deleting it. Selecting append will add the newly imported data to the end of
the existing data file.
Many software packages write quotation marks around
alpha fields, while others do not. When importing a comma delimited file, all
quotation marks will automatically be eliminated, since StatPac does not use
quotes. When exporting to a comma delimited file, any field containing a
comma will always be enclosed in quotes. The StatPac.ini file contains a
setting QuoteAlphaFields. When QuoteAlphaFields=1, all alpha
fields will be quoted when exporting to comma delimited. When set to zero,
only fields containing a comma will be enclosed in quotes.
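The quoting rule can be illustrated with a minimal Python sketch. It is not StatPac's code: the function names and the alpha_columns argument (the positions of the alpha variables) are hypothetical, and quotation marks embedded inside a field are not handled on export.

```python
import csv


def export_comma_delimited(records, alpha_columns, quote_alpha_fields=0):
    """Build comma-delimited text following the QuoteAlphaFields rule.

    records: list of records, each a list of field strings.
    alpha_columns: set of field positions holding alpha variables (hypothetical input).
    """
    lines = []
    for record in records:
        fields = []
        for position, value in enumerate(record):
            # A comma inside a field always forces quotes;
            # QuoteAlphaFields=1 also quotes every alpha field.
            quote = "," in value or (quote_alpha_fields == 1 and position in alpha_columns)
            fields.append(f'"{value}"' if quote else value)
        lines.append(",".join(fields))
    return "\n".join(lines)


def import_comma_delimited_line(line):
    """Split one comma-delimited line, removing the enclosing quotation marks."""
    return next(csv.reader([line]))
```

With QuoteAlphaFields=0, only "Smith, John" is quoted; with QuoteAlphaFields=1, every field in an alpha column is quoted as well.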
Many software packages can read or write a header record
in comma and tab delimited files. A header record is usually the first record
in the data file. It contains the names of the variables instead of actual
data. If you are importing a comma or tab delimited file and don't know if
there is a header record, load the file into your word processor and look at
it. If the first line in the file contains the names of the variables, there is a
header record. If the first record looks just like the other records, it's
data, and not a header record.
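Looking at the file is the most reliable check, but when many files must be inspected a rough heuristic can help. The Python sketch below is an assumption, not StatPac's logic: it treats the first line as a header when some column is numeric in every later row but not in the first row. It splits on the delimiter naively, so quoted fields containing the delimiter are not handled.

```python
def is_number(text):
    """True if the field parses as a number (surrounding spaces allowed)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def looks_like_header(lines, delimiter=","):
    """Guess whether the first non-blank line is a header record."""
    rows = [line.rstrip("\r\n").split(delimiter) for line in lines if line.strip()]
    if len(rows) < 2:
        return False  # not enough records to compare against
    first, rest = rows[0], rows[1:]
    for column, name in enumerate(first):
        numeric_below = all(column < len(row) and is_number(row[column]) for row in rest)
        if numeric_below and not is_number(name):
            return True  # text above a numeric column: probably a variable name
    return False
```

A file whose every column is alpha gives this heuristic nothing to go on, so it will report "no header"; in that case, look at the file directly.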
When exporting to a tab or comma delimited file, StatPac
will give the option to convert the raw data to the value labels. For
example, if the first variable were gender and coded 1=Male and 2=Female, a
normal export would write a 1 or 2 for the variable. If the Expand To Text
option is selected, it would write Male or Female to each data record instead
of the raw data.
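Conceptually, Expand To Text is a lookup from each raw code to its value label. A minimal Python sketch, assuming value labels are held in a dictionary per variable (a hypothetical structure) and that codes without a label are written unchanged:

```python
def expand_to_text(record, value_labels):
    """Replace raw codes with value labels where a label exists.

    record: dict mapping variable name -> raw code (as text).
    value_labels: dict mapping variable name -> {code: label}.
    """
    return {
        name: value_labels.get(name, {}).get(code, code)  # unlabeled codes pass through
        for name, code in record.items()
    }
```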
The tab delimited import utility can be used to import a
text file for Verbatim Blaster open-ended response coding. For example, you
might have used Microsoft Word to enter verbatim comments into a .txt file.
Each person's comments were entered as a paragraph (i.e., a continuous
string of text ending with a carriage return). This file can be imported as a
tab delimited file. Since there are actually no tabs in the file, StatPac
will correctly import the text into a codebook and data file containing a
single variable. The variable will be an alpha type and will be as long as
necessary to hold the open-ended comments.
Exported tab delimited files may use a .txt or .tsv (tab
separated values) extension. Exported comma delimited files may use a .txt
or .csv (comma separated values) extension.
When exporting to a tab or comma delimited file, keep in
mind that many programs (Excel, Access, etc.) limit the number of columns to
255 (while StatPac can have as many as 2,000). If your codebook has more than
255 variables, an export to an Access file is preferred because it will split
the data into multiple tables as necessary. Otherwise, you’ll have to
use the Write command to create a series of codebooks and data files (each
containing 255 or fewer variables) and then export each one individually to a
delimited file.
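Planning the series of smaller files amounts to cutting the variable list into consecutive groups of at most 255. A minimal Python sketch (the function name is hypothetical; the 255 limit is the one quoted above):

```python
def split_variables(variable_names, max_columns=255):
    """Return consecutive groups of at most max_columns variables, in codebook order."""
    return [
        variable_names[start:start + max_columns]
        for start in range(0, len(variable_names), max_columns)
    ]
```

For a 600-variable codebook this gives three groups: variables 1-255, 256-510 and 511-600, each of which would become its own codebook, data file and delimited export.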
Files Containing Multiple Data Records per Case
Many researchers want to use data that is in card-image format on a
mainframe computer. Also, many data entry services are capable of only
punching data in card-image format. While it is relatively easy to download
data from a mainframe, it often comes in 80-column format. If there is only
one record per case, this data can be read by StatPac without performing an
import. However, when there is more than one "card image" (i.e.,
record) per case, it becomes necessary to concatenate (join) the
"card-image" records together to produce a StatPac readable file.
Importing a multiple record file that looks like this...
Card 1 Case 1
Card 2 Case 1
Card 3 Case 1
Card 1 Case 2
Card 2 Case 2
Card 3 Case 2
etc.
will become a StatPac file that looks like this...
Card 1 Case 1 Card 2 Case 1 Card 3 Case 1
Card 1 Case 2 Card 2 Case 2 Card 3 Case 2
etc.
StatPac requires that a data record be a continuous
stream of characters terminated with a carriage return and linefeed. This
program will read a file in multiple record format and create a new data file
with one record per case. The filename should have a .txt extension.
StatPac assumes that there are 80 characters in each
record of the multiple record file. If the "card-image" record
length is less than 80, StatPac will pad the records with spaces before
combining them.
You will need to specify how many records there are for
each case. If the downloaded data file has 3 records per case, you will
answer 3 (even if the third "card-image" record is only partially
used).
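The joining step can be sketched in Python as follows. This is an illustration under stated assumptions, not the import program itself: lines are the card-image records without their line endings, shorter cards are padded with spaces to the record length (80 by default), and longer cards are left as they are.

```python
def join_card_images(lines, records_per_case, record_length=80):
    """Combine every records_per_case card-image lines into one record per case."""
    # Pad each card to the fixed record length before joining.
    cards = [line.rstrip("\r\n").ljust(record_length) for line in lines]
    if len(cards) % records_per_case != 0:
        raise ValueError("record count is not a multiple of records_per_case")
    return [
        "".join(cards[start:start + records_per_case])
        for start in range(0, len(cards), records_per_case)
    ]
```

With three 80-column cards per case, each output record is 240 characters long.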
Internet Files
The preferred method of performing Internet surveys is
to store the responses in a file on the server. When using this method,
responses are stored in ASCII (.asc) text
format. When you're ready to perform an analysis, download the file to your
local computer using an FTP program or Auto Transfer. If you use a different
FTP program, be sure to set it to download the file as an ASCII (not binary)
file.
If you use Auto Transfer, the downloaded file will
automatically be imported into StatPac. If you manually download the file,
you will need to use this import utility to convert the .asc file to StatPac data.
Downloaded Internet response files are not automatically
deleted from your server. Therefore, each time you download the responses, it
will be the entire set of responses since the beginning of the survey.
StatPac will offer you the choice of deleting the existing data file or
appending to the end of it. Since the downloaded file is usually the entire
data set, you would normally want to replace the existing data file with the
newly downloaded data.
Email Surveys
Because of the variety of Email programs, it is not
possible to describe the exact steps you must take to import a returned Email
survey. Each Email program operates a little differently, and you will need
to experiment with your program.
StatPac provides import capabilities for CGI and plain
text Email surveys. CGI Email would be produced by a survey placed on a web
site that used StatPac's email method of capturing responses. A plain-text
survey would be produced by a survey that was simply part of the text in the
body of an Email.
Select e-mail as the import type and use the browse
button to select the file to be imported. Usually, this would be a .mbx file
(i.e., a mailbox in Outlook Express or Eudora into which the e-mails were filtered).
Use the browse button to select the existing StatPac codebook and specify the name of
the data file. If the data file does not exist, it will be created by the
import procedure. If it does exist, the new data will be appended to the end
of the existing data file. Finally, select Text as the Email type and click
OK. StatPac will advise you if any errors were encountered during the import.
If so, the notepad will appear on the menu bar. Click Notepad on your menu
bar to see a description of the errors.
Setting Defaults for the Email Import
An e-mail consists of two parts. The first part is the
e-mail header and the second part is the contents (or body) of the e-mail.
The header contains many lines that are often hidden by e-mail readers, but
can be seen by loading an e-mail into the notepad. StatPac must be able to
properly identify where the header starts and stops in order to know where
the e-mail body begins. The settings in the StatPac.ini file may be adjusted to be compatible with your
e-mail reader or language.
The StartEmailHeader and EndEmailHeader settings mark the boundaries of the
header section. StartEmailHeader should be set to the text that begins the header
section, and EndEmailHeader should be set to the text that begins the last e-mail
header line.
If you are manually copying and pasting incoming e-mails into a mailbox file, it
may be important to change these settings. The default values for these
parameters are:
StartEmailHeader = Return-Path:
EndEmailHeader = X-UIDL:
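The role of the two settings can be shown with a minimal Python sketch. It assumes (hypothetically) that a header begins on the first line starting with the StartEmailHeader text and ends on the first later line starting with the EndEmailHeader text; StatPac's actual parsing may differ.

```python
def split_email(lines, start_header="Return-Path:", end_header="X-UIDL:"):
    """Split one e-mail into (header lines, body lines) using the two marker settings."""
    start = next((i for i, line in enumerate(lines) if line.startswith(start_header)), None)
    if start is None:
        raise ValueError("StartEmailHeader text not found")
    end = next((i for i in range(start, len(lines)) if lines[i].startswith(end_header)), None)
    if end is None:
        raise ValueError("EndEmailHeader text not found")
    # The header runs through the EndEmailHeader line; the body is everything after it.
    return lines[start:end + 1], lines[end + 1:]
```

If your e-mail reader writes a different last header line, changing end_header (EndEmailHeader) is what moves the point where the body is considered to begin.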
Other e-mail parameters may also be set in the
StatPac.ini file. StartEmailField and EndEmailField can be used to
change the brackets from [ and ] to other characters. The EmailPrefix
parameter tells StatPac what line contains the name/e-mail address of the
respondent. The EmailVarName is the name of the StatPac codebook variable
that will automatically capture the respondent's e-mail address in a
plain-text e-mail, and the EmailDateField parameter is used to get the date
of the e-mail in order to more precisely report which e-mails contained
errors. By modifying these parameters, StatPac can be made to work with any
e-mail reader or language.
There are two basic ways that data files can be
merged. The first is called concatenation, and it is used
to merge two or more data files that contain the same variables in the same
order. The second type of merge lets you join data containing different
variables. Select Utilities, Merge, and then the type of merge you want to
perform.

Concatenate Data Files
Many times, several data entry operators will
simultaneously enter data into separate data files on
their own machines. When all the data has been entered, the files can be
merged into one large file by concatenating (joining) them.
For example, let's say you have three months of data in
three separate files (JAN.DAT, FEB.DAT and MAR.DAT). The following DOS
command would create a new file called QUARTER1.DAT containing all three
months of data:
COPY /B JAN.DAT+FEB.DAT+MAR.DAT QUARTER1.DAT
You could then run your analysis on all the data for the first quarter.
The concatenation-style merge assumes that the codebook(s) for all the data files are exactly the same. The
Merge program will let you concatenate any number of data files into a new
(larger) data file. You can type the data file names or use the browse button
to select data files. Only one data file name should appear per line.
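Concatenation simply appends the records of each file, in the order listed, to a new file. A minimal Python sketch, assuming records end with a carriage return and linefeed (as StatPac requires) and that every file uses the same codebook; the function name is hypothetical:

```python
from pathlib import Path


def concatenate_data_files(input_paths, output_path):
    """Append each data file's records, in order, to a new output file."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            data = Path(path).read_bytes()
            if data and not data.endswith(b"\n"):
                data += b"\r\n"  # make sure the next file starts on a new record
            out.write(data)
```

Copying in binary mode keeps the original line endings untouched.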
Do not confuse concatenating files with the MERGE
utility program. If all your data files reference identical study information
(contain the same variables in the same order), use concatenation to merge
your data into one file. If your data files, however, contain different
variables, use the MERGE utility program.

Merge Variables and Data
The merge program allows you to extract selected
variables from up to five studies and create an entirely new study that will
be saved on disk. If data files have already been entered for any of the
studies, they can also be restructured to match the new study format.
Do not confuse the function of this program with data
file concatenation. If two data files have identical formats (i.e., they
contain the same variables in the same order), the data files should be
merged with the concatenation program.
The restructure and merge program can be used to
reorganize a single study (and data file) or to combine several studies (and
their associated data files). It allows complete versatility with regard to
which variables are selected from each of the studies and the order of the
variables.
The program will ask for the name(s) of the codebooks,
data files, and common variables that will be utilized. For each specified
codebook, also enter the name of the associated data file (if one exists). If
no data file is specified for a particular study, the program will use blanks
for all variables requested from that study.

Also select the common variable in each of the studies.
This refers to a variable that can be used to match up the records from each
data file (e.g., "CASE ID"). If there is not a common variable, it
is imperative that the data files contain the same number of records in
the same order. That is, record one from data file one should represent the
same respondent (case) as record one from data file two.
If a data record is missing in any of the data files, it
could cause data from one file to be matched with the wrong data from another
file. Therefore, it is always a good idea to have a common variable in each
of the data files (and associated study information) that represents a unique
case identification number. All data files must be sorted by this variable
before running this program. If one of the data files is missing a
particular record, blanks will be merged into the output file.
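The effect of matching on a common variable can be illustrated with a minimal Python sketch. For brevity it matches cases with a dictionary lookup rather than the sequential pass over sorted files that the Merge program requires, so it illustrates the result, not StatPac's method. Each input is a hypothetical mapping from a fixed-width case ID to the text of the selected variables, and widths gives the number of blanks to write when a case is missing from a file.

```python
def merge_by_case_id(files, widths):
    """Join records from several files on a common case ID.

    files: list of dicts mapping case ID -> record text from that file.
    widths: record width for each file, used to write blanks for missing cases.
    """
    merged = []
    for case_id in sorted(set().union(*files)):
        parts = [records.get(case_id, " " * width) for records, width in zip(files, widths)]
        merged.append(case_id + "".join(parts))
    return merged
```

Case 002 is missing from the second file and case 003 from the first, so blanks fill those positions in the output records.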
Click OK to continue. The study numbers and names will
be displayed, and the program will request the format for the new
study. The format statement defines the
structure of the new codebook.

The general format for creating a new file structure is:
(<Study number>) <Variables> or <Variable range>
An example of a format statement is:
(1) 1-3,8,4 (3) 2-7 (2) 9 14 (1) 12
This statement indicates that the new study format
should contain variables in the following order:
From study 1 - variables 1, 2, 3, 8 and 4
From study 3 - variables 2, 3, 4, 5, 6 and 7
From study 2 - variables 9 and 14
From study 1 - variable 12
Notice that the study number is enclosed in parentheses
with no spaces. Individual variables may be separated by either commas or
spaces. A range of variables is specified by a dash (minus sign) with no
spaces on either side of the dash. If the format statement requires more than
one line, just continue typing and word-wrap will correctly break the line.
Variables may be specified in any order. The study
numbers will be displayed at the top of the screen and are assigned by the
computer simply for convenience when specifying the new study format. The
individual variable numbers for each codebook can be determined by examining
the Variable Names windows.
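The way a format statement expands into an ordered variable list can be sketched in Python. This is a hypothetical parser written from the rules above, not the Merge program's own: unlike the program, it does not check that the variables exist, and it ignores characters other than digits, dashes and parentheses.

```python
import re


def parse_format_statement(statement):
    """Expand e.g. '(1) 1-3,8,4 (3) 2-7' into [(study, variable), ...] in order."""
    result = []
    study = None
    # Tokens: a study number in parentheses, a range such as 2-7, or a single variable.
    for token in re.findall(r"\(\d+\)|\d+-\d+|\d+", statement):
        if token.startswith("("):
            study = int(token[1:-1])
        elif study is None:
            raise ValueError("variable listed before any study number")
        elif "-" in token:
            first, last = map(int, token.split("-"))
            result.extend((study, variable) for variable in range(first, last + 1))
        else:
            result.append((study, int(token)))
    return result
```

Applied to the example statement, this yields the same 14 variables, in the same order, as the list shown above.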
The new study format will be checked for validity before
processing begins. If errors are found, you will be asked to re-enter the
format. The new study and data file (if specified) will be written.
The aggregate utility program creates a new study and
data file that consist of aggregate statistics for subgroups of the original
data. Any descriptive statistic may be included in the aggregate files. The
program allows the creation of both compositional and true aggregate files.
For example, let's say we've distributed a questionnaire
to 200 people in each of 50 communities. After performing some preliminary
analyses, we want to compare the communities on a number of the interval or
ratio-type questions. We could, of course, use the IF-THEN SELECT and WRITE
commands to create subfiles for each of the communities and then perform
descriptive statistics analyses on each of the subfiles. Obviously, this
would be a very time consuming procedure. The aggregate utility program provides a much more
efficient way to derive this information.
By using the aggregate program, we could create a new
codebook and data file that just contain the descriptive statistics we
desire. Each record in the new aggregate file would represent one community.
The record would contain the descriptive statistics for the community as a
whole (and not the raw data from the original file). Since there are 50
communities, the aggregate file would contain 50 records. This type of
aggregate file is called a true aggregate file. It is made up of just the
aggregate statistics and does not contain the original data collected. After
creating a true aggregate file, the LIST command could be used to print a
summary of the descriptive statistics for the communities.
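A true aggregate can be sketched in Python with itertools.groupby, which only combines adjacent records; this is why the data file must first be sorted by the group code. The sketch assumes records are dictionaries and that a missing value is stored as None, and it computes only the number of valid cases and the mean. The names are hypothetical, not StatPac's.

```python
from itertools import groupby
from statistics import mean


def true_aggregate(records, group_key, variable):
    """Return one (group code, valid cases, mean) tuple per group.

    groupby only combines adjacent records, so records must already be
    sorted (or at least grouped) by group_key.
    """
    rows = []
    for code, cases in groupby(records, key=lambda record: record[group_key]):
        values = [case[variable] for case in cases if case[variable] is not None]
        rows.append((code, len(values), mean(values) if values else None))
    return rows
```

If the same community appears in two separate runs of an unsorted file, it produces two aggregate records instead of one.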
The other type of aggregate file is referred to as a
compositional file. Using the same example as above, let's say we want to
compare each case in our original file to the descriptive statistic for the
community. For example, we might want to compare the individual's age with
the mean age in that person's community. In other words, we want each record
in the aggregate data file to contain both the original raw data and the
descriptive statistic for the community as a whole. The compositional
aggregate file will contain the same number of records as the original raw
data file; however, it will contain more variables (the original variables
plus the aggregate statistics).
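A compositional aggregate keeps every original record and appends the group statistic to it. A minimal Python sketch under the same assumptions as a true aggregate (dictionaries, None for missing values, records grouped by the group code); new_name is a hypothetical name for the added variable:

```python
from itertools import groupby
from statistics import mean


def compositional_aggregate(records, group_key, variable, new_name):
    """Return a copy of every record with its group's mean added as new_name."""
    output = []
    for _, cases in groupby(records, key=lambda record: record[group_key]):
        cases = list(cases)  # the group is read twice: once for the mean, once for output
        values = [case[variable] for case in cases if case[variable] is not None]
        group_mean = mean(values) if values else None  # None stands in for blanks
        output.extend({**case, new_name: group_mean} for case in cases)
    return output
```

The output has one record per original case, so an individual's AGE can be compared directly with MEAN_AGE for that person's community.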
When creating either a true or compositional aggregate
file, a new study information file will also be automatically created to
match the new aggregate data file.
Before running the aggregate program, the data file must
be sorted by the variable that contains the group code. For example, if
you plan to create an aggregate file by community, the data file must be
sorted by community before running the aggregate utility program. The sort
order (ascending or descending) is not important; what matters is that all
cases from the same community fall together in the file. The aggregate
program will accommodate at least 1,000 individual groups.
To sort the file, you might use the following procedure:
STUDY GOVT
SORT (A) COMMUNITY
SAVE
..
Then run the Aggregate program. It will ask for the
codebook name, data file and the variable containing the group code. This
refers to the codebook and data file that already exist (not the new
aggregate files). The variable containing the group code is the same variable
that was used to sort the data file before running this program. In this
example, it is the "community" variable. You must also select the
type of aggregate file to be created, either compositional or true.

Click OK to continue. Now you can select the variable(s)
for which you want to calculate aggregate statistics. Select the desired
variable. Then click on the statistics you want for that variable. Each time
you click on a statistic, an aggregate statement will be created in the
Aggregate Statement window. Each aggregate statement will create one new
aggregate variable.

When performing a compositional aggregate procedure, the
new aggregate variables will be added to the end of each data record. If the
study and data file contain 10 variables, and you type two aggregate
statements, the new aggregate variables would be added as variables 11 and
12.
When performing a true aggregate procedure, the first
variable in the aggregate file will always be the group code (that is, the
variable used to determine the groups). Each aggregate statement will produce
a statistic that is added as the next variable in the file. The first
aggregate statement would create variable two, the next variable three, and
so forth.
Aggregate statistics can only be calculated for
numeric-type variables. There is one exception to this rule: If the
variable used to split the data file into groups is alpha, you may still
calculate the number of valid cases. In our example, if community were coded
alpha, it would be acceptable to ask for the number of valid cases (statistic
17) for this variable.
Each aggregate statement you enter will create a new
variable in the aggregate file. After entering all the aggregate statements,
click OK. A new codebook will be created. The new variable labels in
this study will include both the original labels and the types of statistics.
After the new study has been created, the program will perform all the
aggregate calculations and write the new data file.
Because many calculations are involved in creating an
aggregate file, the program may take some time to finish. It will display a
message informing you of successful completion.
If any statistic cannot be calculated, or if there is
an insufficient number of columns to hold the aggregate statistic, the output
file will contain spaces for that variable. For example, if you requested the
mode, and the group was multi-modal, the aggregate statistic would be stored
as blanks.
There are two utility programs for codebooks. The Quick
Codebook Creation utility creates an entire codebook using a single
FORTRAN-like statement. The Check Codebook & Data utility is used to
verify the integrity of the codebook and to fix errors in the file.
Quick Codebook Creation
The fastest way to create a codebook is to use the Quick
Codebook Creation program. However, this will create a "barebones"
codebook consisting of only the format for each variable. In most cases,
you’ll want to use the Grid or Variable Detail window to create a new
codebook.
Select Analysis, Utilities, Codebook, Quick Codebook
Creation. You will need to enter a file name for the new codebook and a format
statement.
This is essentially a data definition statement and is similar to a FORTRAN-style
format statement.

The Format Statement defines the number and type of
variables that will be in the new study. It is the combination of
all the individual variable formats. Using the format
statement can save considerable time if variable and value labels are not
required, or if you plan to use a fixed format data file from another source.
The syntax for each component of a format statement is:
<No. of Vars.> <Var. Type> <No. of Cols> . <Decimals>
(written without spaces, as in 3N7.2)
<No. of Vars.> is the number of
consecutive variables that use the format defined by the next three
parameters. If this component of the format statement is omitted, the
default is one.
<Var. Type> is always A or N and
refers to whether the variable(s) are alpha or numeric. StatPac
automatically left justifies alpha variables and right justifies numeric
variables.
<No. of Cols> is the field width
allocated for the variable(s). This is the total field width for the
variable(s) and it must be large enough to hold a plus or minus sign and a
decimal point if necessary.
. <Decimals> is the number of
significant decimal places that the variable(s) will contain. This
component of the format statement is optional and may be omitted. If <Decimals>
is not specified, the data will be stored exactly as entered (with or without
a decimal point).
Examples of Format Statements
Format          Result
1N5             creates 1 numeric variable using 5 columns
N5              creates 1 numeric variable using 5 columns
12N3            creates 12 numeric variables, each using 3 columns
N5.2            creates 1 numeric variable using 5 columns; the format of
                the variable will be ##.##
7N2.0           creates 7 numeric variables, each using 2 columns; the format
                of the variables will be ## (always rounded to an integer)
A1              creates 1 alpha variable using 1 column
2A35            creates 2 alpha variables, each using 35 columns
5N4 2A1 3N7.2   creates a study with 10 variables:
                1-5 are numeric, each using 4 columns;
                6-7 are alpha, using 1 column each;
                8-10 are numeric, using 7 columns each,
                with 2 significant decimal places
Check Codebook and Data
This utility program will verify the integrity of a
codebook and data file. If errors are found, the program will attempt to fix
them. If you have created a codebook to match a foreign data file (one
created by a program other than StatPac), use this program to make sure that
the data record lengths match the codebook you created.
Select the codebook and data file to be checked and
click OK. If the program corrects any errors, they will be listed in the
notepad.
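One of the checks described, comparing data record lengths with the codebook, can be sketched in Python. This is an assumption about how such a check might work, not the utility's actual algorithm: the expected length is taken to be the sum of the codebook field widths, and each record's length is measured without its line ending.

```python
def check_record_lengths(lines, field_widths):
    """Return (record number, actual length) for each record whose length
    differs from the total width defined by the codebook."""
    expected = sum(field_widths)
    problems = []
    for record_number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # ignore the line ending
        if length != expected:
            problems.append((record_number, length))
    return problems
```

Note that some programs trim trailing spaces from records, so a short record from a foreign file may simply have a blank last field.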

The Sampling program is used to generate a random number
table, create a random digit dialing table for telephone studies, and to select a random
sample from a data file.

Random Number Table
When planning to conduct a survey, choosing the sample
is just as important as the survey itself. If the sample is incorrectly
chosen, any results are likely to be distorted. That is, the characteristics
of the sample will not represent the characteristics of the population.
One of the best ways to choose a sample is to use a
random sampling technique. If the sample is randomly chosen from the
population, it is likely to represent the population. That is, characteristics of the
sample are likely to be found in similar proportions in the population.
The classical method of selecting the sample is to give
each case in the population a number and then randomly select numbers until
the sample size is achieved. The Random Number Table function of the Sampling
program prints a random number table for this purpose.

You should first select whether the numbers should be
selected with or without replacement. When replacement is used, a number may
be selected more than once (selection does not eliminate it from being
available for future selection). When random numbers are selected without
replacement, the selection of a number eliminates it from the pool of
available numbers. The algorithm used for selection without replacement will
display the random numbers in sequential order.
Enter the number of random numbers you want to be
printed. This relates to the sample size determined with the Statistics
Calculator. Be sure to add a sufficient number to the ideal sample size to
accommodate a pilot test and replacement of nonresponders (if part of your
study design).
Enter the smallest allowable random number and the
largest allowable random number. Typically, the lowest value would be one
and the highest value would be equal to the number of cases in the
population.
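The difference between selection with and without replacement can be shown with a minimal Python sketch using the standard random module. It is not StatPac's generator; the function name is hypothetical, and sorting the without-replacement result assumes that "sequential order" means ascending order.

```python
import random


def random_number_table(count, low, high, with_replacement, seed=None):
    """Draw count random integers between low and high inclusive."""
    rng = random.Random(seed)  # a seed makes the table reproducible
    if with_replacement:
        # A number may be drawn more than once.
        return [rng.randint(low, high) for _ in range(count)]
    # Without replacement: each number leaves the pool once drawn.
    # Raises ValueError if count exceeds the size of the range.
    return sorted(rng.sample(range(low, high + 1), count))
```

For a population of 200 cases and a sample of 10, random_number_table(10, 1, 200, with_replacement=False) returns ten distinct case numbers in ascending order.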
Enter the name of the StatPac codebook
and data file to store the random numbers and click OK. A StatPac codebook
and data file will be created that contains one variable called
"RANDOM". Finally, the random numbers will be displayed in a
compressed format in the Notepad. You do not need to save them with Notepad
since they are already stored in a StatPac data file.
Random Digit Dialing Table
Telephone surveys sometimes use random digit dialing to
secure the sample. While this method will result in many non-working or
non-voice numbers, it will produce a random sample of people who have
telephones. Since local prefix codes are set (i.e., predefined by the phone
company), only the last four digits of a phone number can be randomly
selected. The random number method of creating a telephone file allows
you to specify a series of local prefix codes and the number of random telephone
numbers you want created for each prefix code.
There is an important consideration to keep in mind when
creating a random digit file. Many of the random numbers will not be
useful. For example, a number may be non-existent, a business office,
or a fax or computer line. There are several algorithms for maximizing
the number of home phone numbers; however, these techniques have generally
produced poor results and are not included in StatPac. Therefore, it is
usually a good idea to select more phone numbers than you actually need.
The random number utility program allows you to specify
any number of prefixes and to specify how many numbers you want from each
prefix. For local surveys, the prefix will be three digits (the local
exchange); for long distance surveys, the prefix will be seven digits (i.e.,
1 + three digits for the area code + three digits for the local exchange).

In the Local Exchange examples on the screen display, 50
numbers would be created with a 929 prefix and 35 numbers would be created
with a 987 prefix. For the Long Distance examples, 25 numbers would be
created that begin with 1-612-925 and 50 numbers would be created that begin
with 1-807-927.
After you have finished typing the prefixes and
quantities, click OK to create the phone number file. A StatPac codebook and data file will
be created that contains one variable called "TELEPHONE_NUMBER".
Finally, the random numbers will be displayed in a compressed format in the
Notepad. You do not need to save them with Notepad since they are already
stored in a StatPac data file.
The actual technique used to create the file is called
random number selection without replacement. This means that as a phone
number is selected, it will be eliminated from the pool of available numbers
for the next selection. This eliminates the possibility of selecting
the same number (with the same prefix) twice.
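The following Python sketch shows the idea of creating numbers for each prefix by drawing the last four digits without replacement. It is an illustration only, not StatPac's code. The function name and the dash-separated output format are assumptions. A three-digit prefix is treated as a local exchange, and a seven-digit prefix is treated as 1 + area code + exchange.

```python
import random
from typing import Dict, List, Optional


def random_digit_numbers(prefix_counts: Dict[str, int],
                         seed: Optional[int] = None) -> List[str]:
    """Create random telephone numbers for each prefix.

    prefix_counts maps a 3-digit local exchange (e.g. "929") or a 7-digit
    long-distance prefix (e.g. "1612925") to the quantity wanted.
    """
    rng = random.Random(seed)
    numbers = []
    for prefix, count in prefix_counts.items():
        if count > 10_000:
            raise ValueError(f"prefix {prefix} has only 10,000 possible numbers")
        # random.sample returns distinct values, so no last-four-digit
        # combination is chosen twice for the same prefix.
        for suffix in rng.sample(range(10_000), count):
            if len(prefix) == 7:   # long distance: 1-AAA-EEE-NNNN
                numbers.append(f"{prefix[0]}-{prefix[1:4]}-{prefix[4:]}-{suffix:04d}")
            else:                  # local exchange: EEE-NNNN
                numbers.append(f"{prefix}-{suffix:04d}")
    return numbers


local = random_digit_numbers({"929": 50, "987": 35}, seed=1)
long_distance = random_digit_numbers({"1612925": 25, "1807927": 50}, seed=1)
print(len(local), len(long_distance))  # 85 75
```

The quantities in the usage lines mirror the Local Exchange and Long Distance examples described above. Because each prefix has only 10,000 possible four-digit endings, the sketch rejects requests larger than that.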
Depending on the number of prefixes and the quantities
from each prefix, the actual creation of the file may take a little
while. Please be patient; the program will inform you when the sample
selection has been completed.
Select Random Records from Data File
With this utility, you can select a specified number of
random records from a data file and write them to a new data file. If you
have a very large database and a long procedure file, you might use this
utility to create a shorter data file, and perform a test run of the
procedure file on it.
Enter the name of the existing data file, the name of the
new data file, and the number of records to be selected and written to the
new data file.
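Conceptually, the selection picks distinct record positions at random and copies those records. The Python sketch below assumes, purely for illustration, a plain-text data file with one record per line. Real StatPac data files may be laid out differently, and the function names are hypothetical. Keeping the chosen records in their original order is a design choice of this sketch.

```python
import random


def sample_records(records, n, seed=None):
    """Return n distinct records chosen at random, kept in original order."""
    if n > len(records):
        raise ValueError("cannot select more records than the file contains")
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in chosen]


def select_random_records(in_path, out_path, n, seed=None):
    """Copy n random records from in_path to out_path.

    Assumes one record per line, which is an illustration only.
    """
    with open(in_path, encoding="utf-8") as f:
        records = f.read().splitlines()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("".join(r + "\n" for r in sample_records(records, n, seed)))
```

For example, `select_random_records("full.dat", "test.dat", 100)` (hypothetical file names) would write 100 randomly chosen records to a shorter file for a test run.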
Compare Data Files
Many data entry operators use a double entry method of
data verification. Data is entered into one data file and the same data
is re-entered into another data file. The two data files are then
compared for differences.
The purpose of this utility program is to identify
possible errors in the data; it does not have any editing features.

Enter the name of the StatPac codebook and the names of the two data files to be
compared. The data files should contain the same number of records in
the same order.
Upon completion, the total number of errors will be
reported. If differences are found, the record numbers and the variables
that differ will be shown in the Notepad. Use Notepad to print the list of
errors.
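The comparison itself is a field-by-field check of corresponding records. This Python sketch is an illustration and not StatPac's implementation. It assumes that each record has already been split into a list of field values in codebook order, and that the variable names are supplied as a separate list. Both of these inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def compare_records(variables: Sequence[str],
                    first: List[Sequence[str]],
                    second: List[Sequence[str]]) -> List[Tuple[int, str]]:
    """Return (record_number, variable) for every field that differs.

    Record numbers start at 1. Both data sets must hold the same number of
    records in the same order, as double entry assumes.
    """
    if len(first) != len(second):
        raise ValueError("data files must contain the same number of records")
    differences = []
    for record_number, (a, b) in enumerate(zip(first, second), start=1):
        for name, x, y in zip(variables, a, b):
            if x != y:
                differences.append((record_number, name))
    return differences


variables = ["ID", "AGE", "SEX"]
entry_1 = [["1", "34", "M"], ["2", "51", "F"]]
entry_2 = [["1", "43", "M"], ["2", "51", "F"]]
errors = compare_records(variables, entry_1, entry_2)
print(len(errors), errors)  # 1 [(1, 'AGE')]
```

The length of the returned list corresponds to the total number of errors. Like the utility, the sketch only reports differences and does not edit either data set.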
StatPac supports only two data types, alpha and numeric.
This can make it difficult to work with date and currency variables. These
utilities simplify the task of working with them.
The conversion utilities read an existing codebook and
data file, and create a new codebook and data file with a new converted
variable. The original date or currency variable is not modified and will remain
“as is” in the codebook and data. Instead, a new variable (the converted
field) is created and added to the end of the codebook and data.
Date Conversions
The most common operations performed on dates are sorting and
selecting. Typically, a date is stored as an alpha variable because it
contains non-numeric characters such as slashes or dashes. Regardless of the
format, sorting by date or selecting the records between two dates can be
difficult unless the date is converted to a numeric eight-column (N8)
variable in the format YYYYMMDD. Because the year comes first, followed by
the month and then the day, sorting these numbers in ascending order also
puts the dates in chronological order.

The first function will take one or more date variables
in any format and create new N8 variable(s) in YYYYMMDD format. The new N8
variable(s) can be used with the Sort command to sort a file by date. They
can also be used with the Select command to select a range of dates.
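A minimal Python sketch of the conversion follows. Recognizing "any format" is approximated here by trying a short, hypothetical list of formats in order. Ambiguous dates such as 10/5/2005 are read month first, which is an assumption you would adjust for your own data.

```python
from datetime import date, datetime

# Formats tried in order (an assumption, not StatPac's list). Ambiguous
# dates such as 10/5/2005 are read month first.
DATE_FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d", "%Y/%m/%d", "%m/%d/%y")


def parse_date(text: str) -> date:
    """Parse a date stored as alpha text, trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def to_yyyymmdd(text: str) -> int:
    """Convert a date string to an eight-digit YYYYMMDD number."""
    d = parse_date(text)
    return d.year * 10_000 + d.month * 100 + d.day


dates = ["10/5/2005", "2004-12-31", "3-15-2005"]
print(sorted(dates, key=to_yyyymmdd))  # ['2004-12-31', '3-15-2005', '10/5/2005']
# Selecting a range of dates becomes a plain numeric comparison:
in_2005 = [d for d in dates if 20050101 <= to_yyyymmdd(d) <= 20051231]
```

Sorting the original strings alphabetically would put "10/5/2005" first. The numeric YYYYMMDD key gives the chronological order instead.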
The second function will calculate the number of days
between two dates. The two dates can be in any date format, and the new
variable (number of days) will be in N5 format. The absolute value of the
difference between the two dates will be calculated and added to the end of
the new codebook and data file.
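The calculation amounts to parsing both dates and taking the absolute difference, so the order of the two dates does not matter. In this Python sketch, the list of accepted formats and the function names are assumptions. The check against 99,999 reflects the five-column width of an N5 field. Python's `datetime` module accounts for month lengths and leap years.

```python
from datetime import date, datetime

# Hypothetical list of accepted formats; month first for ambiguous dates.
DATE_FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d")


def parse_date(text: str) -> date:
    """Parse a date stored as alpha text, trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def days_between(first: str, second: str) -> int:
    """Absolute number of days between two dates, in either order."""
    days = abs((parse_date(second) - parse_date(first)).days)
    if days > 99_999:  # would not fit in a five-column (N5) field
        raise ValueError("difference too large for an N5 variable")
    return days


print(days_between("1/1/2005", "2005-03-01"))  # 59
print(days_between("2005-03-01", "1/1/2005"))  # 59
```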
The third function will create an English text version
of a date in “D Mon, YYYY” format (e.g., 5 Oct 2005). This makes it possible
to use the List command afterward to create an easily readable listing of
the data.
Currency Conversion
The currency conversion utility is useful for adding or
removing the $ or £ symbols, interpreting a K or M suffix, and removing
commas from currency fields.
When conducting Internet surveys (where respondents enter
their own responses), currency fields can create problems. You can require
numeric input, but that is often frustrating for respondents who want to
enter something like 50K, 10M, or $25,000. If you believe respondents will
want to enter anything other than a plain number, you can specify the field
as alpha in the codebook (which will accept any input from the respondent).
After the survey is closed, use this utility to convert the data to a numeric
field.
The CurrencySymbol setting in the defaults file (StatPac.ini)
can be set to your country’s currency symbol. When the alpha field is
converted to a number, the currency symbol and commas will be removed, the
letter K will multiply the value by one thousand, and the letter M will
multiply the value by one million.
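The conversion rules above can be sketched in Python as follows. The `symbol` parameter stands in for the CurrencySymbol setting. The function name, the case-insensitive K and M suffixes, and the use of `Decimal` to avoid floating-point rounding (for example, 1.1K becoming 1100.0000000000002) are all assumptions of this sketch, not documented StatPac behavior.

```python
from decimal import Decimal, InvalidOperation

MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def currency_to_number(text: str, symbol: str = "$") -> Decimal:
    """Convert entries such as '$25,000', '50K' or '10M' to a number.

    `symbol` plays the role of the CurrencySymbol setting; the parameter
    name and parsing rules are illustrative assumptions.
    """
    s = text.strip().replace(symbol, "").replace(",", "").strip()
    multiplier = 1
    if s and s[-1].upper() in MULTIPLIERS:  # K = thousand, M = million
        multiplier = MULTIPLIERS[s[-1].upper()]
        s = s[:-1].strip()
    try:
        return Decimal(s) * multiplier
    except InvalidOperation:
        raise ValueError(f"cannot convert {text!r} to a number") from None


for entry in ["$25,000", "50K", "10M", " $1.5 K "]:
    print(entry.strip(), "->", currency_to_number(entry))
```

Entries that still cannot be read as a number raise an error here. A real conversion would need a policy for such entries, such as leaving them blank.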
