The Statistics Calculator
Statistical Analysis Tests At Your Fingertips
Means menu
Researchers usually use the results
from a sample to make inferential
statements about the population. When the
data is interval or ratio scaled, it is
usually described in terms of central
tendency and variability. Means and
standard deviations are reported in
most research.
The Means menu has seven
selections:
Mean and standard deviation of a sample
This menu selection will let you enter
data for a variable and calculate the
mean, unbiased standard deviation,
standard error of the mean, and median.
Data is entered using a standard
spreadsheet interface. Finite population
correction is incorporated into the
calculation of the standard error of the
mean, so the population size should be
specified whenever the sample size is
greater than ten percent of the
population size.
Example
A sample of ten
was randomly chosen from a large
population. The ten scores were:
20 22 54 32 41
43 47 51 45 35
----------------------------------------------------
Mean = 39.0
Unbiased standard deviation = 11.6
Standard error of the mean = 3.7
Median = 42.0
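The figures above can be reproduced with a short Python sketch. It is not the calculator's own code; the function name `describe` is hypothetical, and the finite population correction is left out because the example population is large.

```python
import math
import statistics

def describe(scores):
    """Return mean, unbiased SD, standard error of the mean, and median."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)      # unbiased: divides by n - 1
    se = sd / math.sqrt(n)             # no finite population correction
    median = statistics.median(scores)
    return mean, sd, se, median

scores = [20, 22, 54, 32, 41, 43, 47, 51, 45, 35]
mean, sd, se, median = describe(scores)
print(f"Mean = {mean:.1f}")
print(f"Unbiased standard deviation = {sd:.1f}")
print(f"Standard error of the mean = {se:.1f}")
print(f"Median = {median:.1f}")
```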
Matched pairs t-test between means
The matched pairs t-test is used in
situations where two measurements are
taken for each respondent. It is often
used in experiments where there are
before-treatment and after-treatment
measurements. The t-test is used to
determine if there is a reliable
difference between the mean of the
before-treatment and the mean of the
after-treatment measurements.
Pretreatment Post-treatment
Johnny -------------------- Johnny
Martha -------------------- Martha
Jenny ---------------------- Jenny
Sometimes, in very sophisticated
(i.e., expensive) experiments, two groups
of subjects are individually matched on
one or more demographic characteristics.
One group is exposed to a treatment
(experimental group) and the other is not
(control group).
Experimental                      Control
Johnny -------------------------- Fred
Martha -------------------------- Sharon
Jenny --------------------------- Linda
The t-test works with small or large
N's because it automatically takes into
account the number of cases in
calculating the probability level. The
magnitude of the t-statistic depends on
the number of cases (subjects). The
t-statistic, in conjunction with the
degrees of freedom, is used to calculate
the probability that a difference this
large between the means would occur by
chance alone. If the probability is less
than the critical alpha level, then we
say that a significant difference exists
between the two means.
Example
An example of a
data set for a matched-pairs t-test might
look like this:
| Pretest | Post-test |
| 8       | 31        |
| 13      | 37        |
| 22      | 45        |
| 25      | 28        |
| 29      | 50        |
| 31      | 37        |
| 35      | 49        |
| 38      | 25        |
| 42      | 36        |
| 52      | 69        |
-----------------------------------------------------------
Var. 1: Mean = 29.5 Unbiased SD = 13.2
Var. 2: Mean = 40.7 Unbiased SD = 13.0
t-statistic = 2.69
Degrees of freedom = 9
Two-tailed probability = .025
You might make a
statement in a report like this: The mean
pretest score was 29.5 and the mean
post-test score was 40.7. A matched-pairs
t-test was performed to determine if the
difference was significant. The
t-statistic was significant at the .05
critical alpha level, t(9)=2.69, p=.025.
Therefore, we reject the null hypothesis
and conclude that post-test scores were
significantly higher than pretest scores.
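As a rough illustration of the matched-pairs calculation, the sketch below works on the difference scores for each respondent. It uses only the Python standard library; the helper names (`t_pdf`, `two_tailed_p`, `paired_t_test`) are hypothetical, and the p-value is approximated by numerically integrating the t density, which is an assumption of this sketch rather than the calculator's documented method.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability via Simpson's rule on [0, |t|] (steps is even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def paired_t_test(before, after):
    """t-statistic, degrees of freedom, and two-tailed p for paired scores."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    t = statistics.mean(diffs) / se
    return t, n - 1, two_tailed_p(t, n - 1)

pretest = [8, 13, 22, 25, 29, 31, 35, 38, 42, 52]
posttest = [31, 37, 45, 28, 50, 37, 49, 25, 36, 69]
t, df, p = paired_t_test(pretest, posttest)
print(f"t({df}) = {t:.2f}, two-tailed p = {p:.3f}")
```

Working on the differences is what separates this test from the independent groups t-test: each respondent serves as his or her own control, so only the variability of the changes enters the standard error.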
Independent groups t-test between means
This menu selection is used to
determine if there is a difference
between two means taken from different
samples. If you know the mean, standard
deviation and size of both samples, this
program may be used to determine if there
is a reliable difference between the
means.
One measurement is taken for each
respondent. Two groups are formed by
splitting the data based on some other
variable. The groups may contain a
different number of cases. There is not a
one-to-one correspondence between the
groups.
Original Data Set      After Splitting Data into 2 Groups
| Score | Sex |        | Males | Females |
| 25    | M   |        | 25    | 27      |
| 27    | F   |        | 19    | 17      |
| 17    | F   |        |       | 21      |
| 19    | M   |        |       |         |
| 21    | F   |        |       |         |
Sometimes the two groups are
formed because the data was collected
from two different sources.
| School A Scores | School B Scores |
| 525             | 427             |
| 492             | 535             |
| 582             | 600             |
| 554             |                 |
| 520             |                 |
There are actually two different
formulas to calculate the t-statistic
for independent groups. The t-statistics
calculated by both formulas will be
similar but not identical. Which formula
you choose depends on whether the
variances of the two groups are equal or
unequal. In actual practice, most
researchers assume that the variances are
unequal because it is the most
conservative approach and is least likely
to produce a Type I error. Thus, the
formula used in Statistics Calculator
assumes unequal variances.
Example
Two new product
formulas were developed and tested. A
twenty-point scale was used to measure
the level of product approval. Six
subjects tested the first formula. They
gave it a mean rating of 12.3 with a
standard deviation of 1.4. Nine subjects
tested the second formula, and they gave
it a mean rating of 14.0 with a standard
deviation of 1.7. The question we might
ask is whether the observed difference
between the two formulas is reliable.
Mean of the first group: 12.3
Unbiased standard deviation of the first group: 1.4
Sample size of the first group: 6
Mean of the second group: 14.0
Unbiased standard deviation of the second group: 1.7
Sample size of the second group: 9
-------------------------------------------------------------------
t value = 2.03
Degrees of freedom = 13
Two-tailed probability = .064
You might make a
statement in a report like this: An
independent groups t-test was performed
to compare the mean ratings between the
two formulas. The t-statistic was not
significant at the .05 critical alpha
level, t(13)=2.03, p=.064. Therefore, we
fail to reject the null hypothesis and
conclude that there was no significant
difference between the ratings for the
two formulas.
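The two formulas mentioned earlier can be compared directly from the summary figures. The sketch below is illustrative only (the function names `pooled_t` and `welch_t` are hypothetical). For this example the pooled, equal-variance form gives t = 2.03 with 13 degrees of freedom, matching the output shown above, while the unequal-variance (Welch) form gives a slightly larger t of about 2.11 with fractional degrees of freedom of about 12.3.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance t-statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance t-statistic with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m2 - m1) / se, df

t_pooled, df_pooled = pooled_t(12.3, 1.4, 6, 14.0, 1.7, 9)
t_welch, df_welch = welch_t(12.3, 1.4, 6, 14.0, 1.7, 9)
print(f"Pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.2f}")
```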
Confidence interval around a mean
You can calculate confidence intervals
around a mean if you know the sample size
and standard deviation.
The standard error of the mean is
estimated from the standard deviation and
the sample size. It is used to establish
the confidence interval (the range within
which we would expect the mean to fall in
repeated samples taken from the
population). The standard error of the
mean is an estimate of the standard
deviation of those repeated samples.
The formula for the standard error of
the mean provides an accurate estimate
when the sample size is very small
compared to the size of the population.
In marketing research, this is usually
the case since the populations are quite
large. Thus, in most situations the
population size may be left blank because
the population is very large compared to
the sample. However, when the sample is
more than ten percent of the population,
the population size should be specified
so that the finite population correction
factor can be used to adjust the estimate
of the standard error of the mean.
Example
Suppose that an
organization has 5,000 members. Prior to
their membership renewal drive, 75
members were randomly selected and
surveyed to find out their priorities for
the coming year. The mean age of
the sample was 53.1 years and the unbiased
standard deviation was 4.2 years. What is
the 90% confidence interval around the
mean? Note that the population size can
be left blank because the sample size of
75 is less than ten percent of the
population size.
Mean: 53.1
Unbiased standard deviation: 4.2
Sample size: 75
Population size: (left blank -or- 5000)
Desired confidence interval (%): 90
-------------------------------------------------
Standard error of the mean = .485
Degrees of freedom = 74
90% confidence interval = 53.1 ± .8
Confidence interval range = 52.3 - 53.9
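The interval can be sketched in a few lines of Python. Two assumptions are made here: the finite population correction is taken to be the standard factor sqrt((N - n) / (N - 1)), and the critical t value (about 1.666 for 74 degrees of freedom at 90% confidence) is read from a t table and passed in by hand. The function name `confidence_interval` is hypothetical.

```python
import math

def confidence_interval(mean, sd, n, t_critical, population_size=None):
    """Return (standard error, lower limit, upper limit) for a mean."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        # Assumed form of the finite population correction factor
        se *= math.sqrt((population_size - n) / (population_size - 1))
    margin = t_critical * se
    return se, mean - margin, mean + margin

# Population size left blank, as in the example above
se, low, high = confidence_interval(53.1, 4.2, 75, t_critical=1.666)
print(f"SE = {se:.3f}, interval = {low:.1f} - {high:.1f}")

# Same data with the population size of 5,000 specified
se_fpc, low_fpc, high_fpc = confidence_interval(
    53.1, 4.2, 75, t_critical=1.666, population_size=5000
)
print(f"SE with correction = {se_fpc:.3f}")
```

With a sample that is only 1.5 percent of the population, the correction shrinks the standard error very slightly and the rounded interval stays the same, which is why the population size can be left blank here.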
Compare a sample mean to a population mean
Occasionally, the mean of the
population is known (perhaps from a
previous census). After drawing a sample
from the population, it might be helpful
to compare the mean of your sample to the
mean of the population. If the means are
not significantly different from each
other, you could make a strong argument
that your sample provides an adequate
representation of the population. If,
however, the mean of your sample is
significantly different from the
population mean, something may have gone
wrong during the sampling process.
Example
After selecting
a random sample of 18 people from a very
large population, you want to determine
if the average age of the sample is
representative of the average age of the
population. From previous research, you
know that the mean age of the population
is 32.0. For your sample, the mean age
was 28.0 and the unbiased standard
deviation was 3.2. Is the mean age of
your sample significantly different from
the mean age in the population?
Sample mean = 28
Unbiased standard deviation = 3.2
Sample size = 18
Population size = (left blank)
Mean of the population = 32
---------------------------------------
Standard error of the mean = .754
t value = 5.303
Degrees of freedom = 17
Two-tailed probability = .0001
The two-tailed
probability of the t-statistic is very
small. Thus, we would conclude that the
mean age of our sample is significantly
less than the mean age of the population.
This could be a serious problem because
it suggests that some kind of age bias
was inadvertently introduced into the
sampling process. It would be prudent for
the researcher to investigate the problem
further.
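The calculation behind this example is a one-sample t-test. A minimal sketch, assuming no finite population correction (the population is very large) and using the hypothetical function name `one_sample_t`:

```python
import math

def one_sample_t(sample_mean, sd, n, population_mean):
    """Standard error, t-statistic, and df for a sample mean vs. a known mean."""
    se = sd / math.sqrt(n)
    t = (sample_mean - population_mean) / se
    return se, t, n - 1

se, t, df = one_sample_t(28.0, 3.2, 18, 32.0)
print(f"SE = {se:.3f}, t = {t:.3f}, df = {df}")
```

The sign of t is negative here because the sample mean is below the population mean; the output above reports its magnitude.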
Compare two standard deviations
The F-ratio is used to compare
variances. In its simplest form, it is
the variance of one group divided by the
variance of another group. When used in
this way, the larger variance (by
convention) is the numerator and the
smaller is the denominator. Since the
groups might have different sample
sizes, the numerator and the denominator
have their own degrees of freedom.
Example
Two samples were
taken from the population. One sample had
25 subjects and a standard deviation of
4.5 on some key variable. The other
sample had 12 subjects and a standard
deviation of 6.4 on the same key
variable. Is there a significant
difference between the variances of the
two samples?
First standard deviation: 4.5
First sample size: 25
Second standard deviation: 6.4
Second sample size: 12
-----------------------------------------
F-ratio = 2.023
Degrees of freedom = 11 and 24
Probability that the difference was due to chance = .072
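The F-ratio and its two degrees of freedom follow directly from the convention of placing the larger variance in the numerator. A minimal sketch (the function name `variance_ratio` is hypothetical, and the probability is not computed here):

```python
def variance_ratio(sd1, n1, sd2, n2):
    """F-ratio with the larger variance on top, plus numerator/denominator df."""
    (big_var, big_n), (small_var, small_n) = sorted(
        [(sd1**2, n1), (sd2**2, n2)], reverse=True
    )
    return big_var / small_var, big_n - 1, small_n - 1

f_ratio, df_num, df_den = variance_ratio(4.5, 25, 6.4, 12)
print(f"F = {f_ratio:.3f}, df = {df_num} and {df_den}")
```

Note that the numerator degrees of freedom (11) come from the smaller sample, because that sample has the larger standard deviation.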
Compare three or more means
Analysis of variance (ANOVA) is used
when testing for differences between
three or more means.
In an ANOVA, the F-ratio is used to
compare the variance between the groups
to the variance within the groups. For
example, suppose we have two groups of
data. In the best of all possible worlds,
all the people in group one would have
very similar scores. That is, the group
is cohesive, and there would be very
little variability in scores within the
group. All the people in group two would
also have similar scores (although
different from group one). Again, there
is very little variability within the
group. Both groups have very little
variability within their group;
however, there might be substantial
variability between the groups.
The ratio of the between groups
variability (numerator) to the within
groups variability (denominator) is the
F-ratio. The larger the F-ratio, the more
certain we are that there is a difference
between the groups.
If the probability of the F-ratio is
less than or equal to your critical alpha
level, it means that there is a
significant difference between at least
two of the groups. The F-ratio does not
tell which group(s) are different from
the others, just that there is a
difference.
After finding a significant F-ratio,
we do "post-hoc" (after the
fact) tests on the factor to examine the
differences between levels. There is a
wide variety of post-hoc tests, but one
of the most common is to do a series of
special t-tests between all the
combinations of levels for that factor.
For the post-hoc "lsd" (least
significant difference) t-tests, use the
same critical alpha level that you used
to test for the significance of the
F-ratio.
Example
A company has
offices in four cities with sales
representatives in each office. At each
location, the average number of sales per
salesperson was calculated. The company
wants to know if there are significant
differences between the four offices with
respect to the average number of sales
per sales representative.
| Group | Mean | SD   | N  |
| 1     | 3.29 | 1.38 | 7  |
| 2     | 4.90 | 1.45 | 10 |
| 3     | 7.50 | 1.38 | 6  |
| 4     | 6.00 | 1.60 | 8  |
-----------------------------------------------------------------------------------
| Source | df | SS    | MS   | F    | p     |
| Factor | 3  | 62.8  | 20.9 | 9.78 | .0002 |
| Error  | 27 | 57.8  | 2.13 |      |       |
| Total  | 30 | 120.6 |      |      |       |
Post-hoc t-tests
| Groups Compared | t-value | df | p     |
| 1 & 2           | 2.23    | 15 | .0412 |
| 1 & 3           | 5.17    | 11 | .0003 |
| 1 & 4           | 3.58    | 13 | .0034 |
| 2 & 3           | 3.44    | 14 | .0040 |
| 2 & 4           | 1.59    | 16 | .1325 |
| 3 & 4           | 1.90    | 12 | .0819 |
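The ANOVA table can be rebuilt from the group means, standard deviations, and sample sizes alone. The sketch below is illustrative and uses hypothetical names (`anova_from_summary`, `lsd_t`). For the post-hoc tests it assumes that each t-value divides the difference between two means by a standard error based on the ANOVA error mean square; that assumption gives values close to those in the table, with small differences expected because the inputs are rounded.

```python
import math
from itertools import combinations

# group: (mean, unbiased SD, N), taken from the example table
groups = {1: (3.29, 1.38, 7), 2: (4.90, 1.45, 10),
          3: (7.50, 1.38, 6), 4: (6.00, 1.60, 8)}

def anova_from_summary(groups):
    """One-way ANOVA sums of squares, df, and F-ratio from summary statistics."""
    total_n = sum(n for _, _, n in groups.values())
    grand_mean = sum(m * n for m, _, n in groups.values()) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups.values())
    ss_within = sum((n - 1) * sd**2 for _, sd, n in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    ms_within = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_within
    return ss_between, ss_within, df_between, df_within, ms_within, f_ratio

def lsd_t(groups, ms_within, a, b):
    """Post-hoc t-value for groups a and b (assumed to use the error MS)."""
    (m1, _, n1), (m2, _, n2) = groups[a], groups[b]
    return abs(m1 - m2) / math.sqrt(ms_within * (1 / n1 + 1 / n2))

ss_b, ss_w, df_b, df_w, ms_w, f_ratio = anova_from_summary(groups)
print(f"Factor: df = {df_b}, SS = {ss_b:.1f}, F = {f_ratio:.2f}")
print(f"Error:  df = {df_w}, SS = {ss_w:.1f}")

post_hoc = {(a, b): lsd_t(groups, ms_w, a, b) for a, b in combinations(groups, 2)}
for (a, b), t in post_hoc.items():
    print(f"{a} & {b}: t = {t:.2f}")
```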