What does "statistical significance" really mean?
Many researchers get very excited when they have
discovered a "statistically significant"
finding, without really understanding what it means. When
a statistic is significant, it simply means that you are
very sure the result is not just a chance fluctuation in
your sample. It doesn't mean the finding is important or
that it has any decision-making utility.
For example, suppose we give 1,000 people an IQ test,
and we ask if there is a significant difference between
male and female scores. The mean score for males is 98
and the mean score for females is 100. We use an
independent groups t-test and find that the difference is
significant at the .001 level. The big question is,
"So what?" The difference between 98 and 100
on an IQ test is a very small difference, so small, in
fact, that it's not even important.
Then why did the t-statistic come out significant?
Because there was a large sample size. When you have a
large sample size, very small differences will be
detected as significant. This means that you are very
sure that the difference is real (i.e., it didn't happen
by fluke). It doesn't mean that the difference is large
or important. If we had only given the IQ test to 25
people instead of 1,000, the two-point difference between
males and females would not have been significant.
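The effect of sample size can be sketched numerically. The
example above does not state how spread out the scores are,
so the standard deviation of 15 used here is an assumption
(a common IQ scaling), as are the group sizes and the
function names. The p-value uses the normal approximation
to the t distribution, which is rough for small samples.

```python
from math import sqrt
from statistics import NormalDist


def t_statistic(mean1, mean2, sd, n1, n2):
    """Independent-groups t statistic, assuming both groups share one SD."""
    standard_error = sd * sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error


def approx_two_tailed_p(t):
    """Two-tailed p-value from the normal approximation to t."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


ASSUMED_SD = 15  # assumption: the text does not give the spread of scores

# Same two-point difference (100 vs. 98), two hypothetical sample sizes.
for n_females, n_males in [(13, 12), (500, 500)]:
    t = t_statistic(100, 98, ASSUMED_SD, n_females, n_males)
    p = approx_two_tailed_p(t)
    print(f"n = {n_females + n_males:4d}  t = {t:.2f}  approx. p = {p:.3f}")
```

Because the standard deviation is assumed, the p-values
printed will not necessarily match the .001 level quoted in
the example. The point is only that the same two-point
difference produces a much larger t-statistic when the
sample is large.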
Significance is a statistical term that tells how sure
you are that a difference or relationship exists. To say
that a significant difference or relationship exists only
tells half the story. We might be very sure that a
relationship exists, but is it a strong, moderate, or
weak relationship? After finding a significant
relationship, it is important to evaluate its strength.
Significant relationships can be strong or weak.
Significant differences can be large or small. With a
large enough sample, even a weak relationship or a small
difference will be significant.
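One common way to express the strength of a difference,
separately from its significance, is a standardized effect
size such as Cohen's d. The text does not name a specific
measure, so the choice of Cohen's d here is an assumption,
as is the standard deviation of 15 for the IQ example.

```python
def cohens_d(mean1, mean2, pooled_sd):
    """Standardized mean difference: the gap between means in SD units."""
    return (mean1 - mean2) / pooled_sd


ASSUMED_SD = 15  # assumption: the text does not give the spread of scores

d = cohens_d(100, 98, ASSUMED_SD)
print(f"Cohen's d = {d:.2f}")
```

Sample size does not appear anywhere in this formula, so
the value is the same whether 25 or 1,000 people were
tested. By Cohen's widely quoted rule of thumb, a d of
about 0.2 is already considered a small effect.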
Many researchers use the word "significant"
to describe a finding that may have decision-making
utility to a client. From a statistician's viewpoint,
this is an incorrect use of the word. However, the word
"significant" has virtually universal meaning
to the public. Thus, many researchers use the word
"significant" to describe a difference or
relationship that may be strategically important to a
client (regardless of any statistical tests). In these
situations, the word "significant" is used to
advise a client to take note of a particular difference
or relationship because it may be relevant to the
company's strategic plan. The word
"significant" is not the exclusive domain of
statisticians and either use is correct in the business
world. Thus, for the statistician, it may be wise to
adopt a policy of always referring to "statistical
significance" rather than simply
"significance" when communicating with the
public.
One-Tailed and Two-Tailed Significance Tests
One important concept in significance testing is
whether you use a one-tailed or two-tailed test of
significance. The answer is that it depends on your
hypothesis. When your research hypothesis states the
direction of the difference or relationship, then you use
a one-tailed probability. For example, a one-tailed test
would be used to test these null hypotheses: Females will
not score significantly higher than males on an IQ test.
Blue collar workers will not buy significantly more
product than white collar workers. Superman is not
significantly stronger than the average person. In each
case, the null hypothesis (indirectly) predicts the
direction of the difference. A two-tailed test would be
used to test these null hypotheses: There will be no
significant difference in IQ scores between males and
females. There will be no significant difference in the
amount of product purchased between blue collar and white
collar workers. There is no significant difference in
strength between Superman and the average person. For a
symmetric test statistic such as t, the one-tailed
probability is exactly half the two-tailed probability,
provided the difference falls in the predicted direction.
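The relationship between the two probabilities can be
sketched with a standard normal (z) statistic standing in
for whatever symmetric statistic is being tested. The value
1.8 and the function name are illustrative assumptions, and
the one-tailed hypothesis is taken to predict a positive
difference.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return (one-tailed p for a predicted positive difference, two-tailed p)."""
    one_tailed = 1 - NormalDist().cdf(z)
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_tailed, two_tailed


# Difference falls in the predicted direction.
one_tailed, two_tailed = tail_probabilities(1.8)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")

# Same size of difference, but opposite to the prediction.
opposite, _ = tail_probabilities(-1.8)
print(f"one-tailed p when the direction is wrong = {opposite:.3f}")
```

With this illustrative value the one-tailed result falls
below .05 while the two-tailed result does not, which is
why the choice of test matters. When the difference runs
against the prediction, the one-tailed probability is large
rather than half the two-tailed value.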
There has been a raging controversy (for about the last
hundred years) over whether it is ever appropriate
to use a one-tailed test. The rationale is that if you
already know the direction of the difference, why bother
doing any statistical tests? While it is generally safest
to use a two-tailed test, there are situations where a
one-tailed test seems more appropriate. The bottom line
is that it is the choice of the researcher whether to use
one-tailed or two-tailed research questions.
Procedure Used to Test for Significance
Whenever we perform a significance test, it involves
comparing a test value that we have calculated to some
critical value for the statistic. Whatever type of
statistic we are calculating (e.g., a t-statistic, a
chi-square statistic, an F-statistic, etc.), the
procedure to test for significance is the same.
- Decide on the critical alpha level you will use
(i.e., the error rate you are willing to accept).
- Conduct the research.
- Calculate the statistic.
- Compare the statistic to a critical value
obtained from a table.
If your statistic is higher than the critical value
from the table:
- Your finding is significant.
- You reject the null hypothesis.
- The probability is small that the difference or
relationship happened by chance, and p is less
than the critical alpha level (p < alpha).
If your statistic is lower than the critical value
from the table:
- Your finding is not significant.
- You fail to reject the null hypothesis.
- The probability is high that the difference or
relationship happened by chance, and p is greater
than the critical alpha level (p > alpha).
Modern computer software can calculate exact
probabilities for most test statistics. If you have an
exact probability from computer software, simply compare
it to your critical alpha level. If the exact probability
is less than the critical alpha level, your finding is
significant, and if the exact probability is greater than
your critical alpha level, your finding is not
significant. Using a table is not necessary when you have
the exact probability for a statistic.
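The two routes described above, comparing the statistic to
a critical value and comparing an exact probability to
alpha, lead to the same decision. The sketch below assumes
a two-tailed test on a standard normal (z) statistic, with
the critical value computed rather than read from a table.
The function names and the sample values 1.5 and 2.5 are
illustrative only.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-tailed critical z for the chosen alpha (what a table would list)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def exact_p(z):
    """Exact two-tailed probability for a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significant_by_table(z, alpha):
    """Route 1: compare the size of the statistic to the critical value."""
    return abs(z) > critical_value(alpha)


def significant_by_p(z, alpha):
    """Route 2: compare the exact probability to alpha."""
    return exact_p(z) < alpha


ALPHA = 0.05
for z in (1.5, 2.5):
    verdict = "reject" if significant_by_p(z, ALPHA) else "fail to reject"
    agree = significant_by_table(z, ALPHA) == significant_by_p(z, ALPHA)
    print(f"z = {z}: p = {exact_p(z):.3f}, {verdict} the null "
          f"(routes agree: {agree})")
```

For a two-tailed test it is the absolute value of the
statistic that is compared to the critical value. Other
statistics, such as chi-square or F, would need their own
distributions in place of the normal one used here.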