**Inferential statistics make it possible
for researchers to make statements about a population based upon a sample
taken from the population. In order for these statements to be accurate,
the sample must be representative of the population and the underlying
assumptions of the statistical test being used must be met. Even when
randomization is used, there is always a possibility for sampling error to
infect the results and, therefore, make a statement less accurate (i.e.,
less valid). Sometimes, however, randomization is not possible
(especially in educational research) and intact groups must be matched
and selected. Obviously, this lack of randomization can introduce
sampling bias into a study, infect its results and, consequently, make any
statement that generalizes the findings to the population less accurate
(i.e., less valid). In addition, if the assumptions that underlie statistical
tests are violated (e.g., parametric tests require a normal distribution of
scores), then bias enters into the statistical tests being used, the
results of those tests are less accurate (i.e., less valid), and generalizations
to the population from the sample are more problematic. The former
condition is not the fault of the researcher, while the latter
condition is.**

**Most statistical models assume
error-free measurement, particularly of the independent variables.
But researchers know that there is no such thing as error-free
measurement because of random, chance variation in the population being
sampled. (The only case in which error-free measurement is possible
is a census.) That being the case, the larger the amount of
measurement error, the more likely it is that a researcher will not
find a statistically significant result. To conceptualize this, think
about how parametric tests of significance report a ratio where the
denominator is the measure of error. Thus, the larger the denominator
(the error), the larger the numerator (the difference between groups) must
be in order to attain a statistically significant finding.**
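To make that ratio concrete, here is a minimal sketch of a two-sample t-style ratio. The group statistics (a 5-point mean difference, standard deviations of 10 and 20, thirty subjects per group) are invented purely for illustration: holding the mean difference constant, doubling the error term halves the statistic, making significance harder to reach.

```python
import math

def t_ratio(mean_diff, sd, n_per_group):
    """Two-sample t-style statistic (equal n, equal SDs):
    numerator = difference between group means,
    denominator = standard error of that difference."""
    se = sd * math.sqrt(2 / n_per_group)  # the error term (denominator)
    return mean_diff / se

# The same 5-point mean difference under two levels of measurement error:
low_error = t_ratio(mean_diff=5, sd=10, n_per_group=30)
high_error = t_ratio(mean_diff=5, sd=20, n_per_group=30)
print(low_error)   # larger ratio: easier to reach significance
print(high_error)  # doubling the error halves the ratio
```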

**STATISTICAL POWER**

**The alpha level (aka, "probability
level," "level of significance") is the probability
that the researcher will make a Type I error (rejecting a null
hypothesis that is true) and is selected by the researcher prior to the
research project. Statistical power, by contrast, is the probability
that a test will detect a real effect, that is, avoid a Type II error.
Thus, the "higher" (or "bigger") the alpha level
(e.g., α = .05), the more likely it is that the researcher will reject a true
null hypothesis (a Type I error). That is, the researcher judges that a
significant difference exists between the sample means when there isn't
one. On the other hand, the "lower" (or "smaller") the alpha level
(e.g., α = .001), the more likely it is that the researcher will accept
a false null hypothesis (a Type II error). That is, the researcher
judges that there is not a significant difference between the sample
means when there is one.**

**Assuming the researcher can't conduct a census
yet wants to ensure that it will be tougher to detect a significant difference
between two sample means (that is, to reject the null hypothesis), the
researcher will select a smaller alpha level. For example, where α = .05 (a
"bigger" alpha, a 5 in 100 probability of error) the researcher is
more likely to find a significant difference than where α = .001 (a "smaller"
alpha, a 1 in 1,000 probability of error).
The challenge confronting the researcher is that if α is
set either too high or too low, it is likely that the researcher will make a
wrong determination regarding the null hypothesis. Don't forget: the null
hypothesis says that there is no difference between the means, that is, that they are
equal at a stated probability of error (i.e., the probability that the researcher is wrong).**
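A short simulation illustrates the tradeoff. The scores, sample sizes, trial count, and seed below are arbitrary, hypothetical choices; the point is that when the null hypothesis is actually true (both groups drawn from the same population), a test at α = .05 commits a Type I error far more often than one at α = .001:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
trials = 2000
rejections_05 = 0
rejections_001 = 0

for _ in range(trials):
    # Both samples come from the SAME population: the null is true.
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    p = stats.ttest_ind(a, b).pvalue
    rejections_05 += p < 0.05    # Type I error at the "bigger" alpha
    rejections_001 += p < 0.001  # Type I error at the "smaller" alpha

print(rejections_05 / trials)   # hovers near .05
print(rejections_001 / trials)  # hovers near .001
```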

**This is the point where confusion can
enter into the picture, especially as students begin to learn these
relatively basic concepts. To avoid any confusion, re-read the
statements in the preceding paragraph by applying them to the Null
Hypothesis Chart:**

**Null Hypothesis Chart**

| Decision | Null Hypothesis True | Null Hypothesis False |
|----------|----------------------|-----------------------|
| __Accept__ | Correct Decision | Type II Error |
| __Reject__ | Type I Error | Correct Decision |

**A Type I Error occurs when the researcher
rejects a true null hypothesis. That is, the researcher says there
is a significant difference between the sample means when, in fact,
there is none. A Type II Error occurs when the researcher
accepts a false null hypothesis. That is, the researcher says that there
is no significant difference when, in reality, there is one. Thus, if an
analysis has little statistical power, the researcher is likely to
overlook or miss the desired outcome because the analysis
lacked the power to detect a significant difference
that actually was there.**

**The question, then, becomes: how does a
researcher increase power?**

**There are three ways to accomplish this,
all of which are interrelated, meaning that they impact one another. The
obvious first choice is to increase the sample size, which decreases the
amount of sampling error present in the sample. The second choice
involves adjusting the significance level *a priori*
(e.g., changing α = .01 to α = .05). The third choice is to alter the effect
size, that is, to seek an outcome of a statistical test that departs more
from the null hypothesis. Thus, as the sample size, significance level, and
effect size increase, so does the power of the significance
test. This is logical, because power increases automatically with an
increase in the sample size, and virtually any difference can be made
significant if the sample is large enough.**
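These relationships can be sketched with a standard normal approximation to the power of a two-sided, two-sample test. The effect size, sample sizes, and alpha levels below are arbitrary, illustrative choices, not values from the text:

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for
    standardized effect size d with n subjects per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # critical value at alpha
    noncentrality = d * (n_per_group / 2) ** 0.5  # expected test statistic
    return 1 - z.cdf(z_crit - noncentrality)

# Power rises with sample size (holding d = .5, alpha = .05 fixed) ...
print(approx_power(0.5, 30))
print(approx_power(0.5, 100))
# ... and shrinks when the alpha level is made "smaller" at the same n:
print(approx_power(0.5, 30, alpha=0.01))
```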

**EFFECT SIZE**

**Effect size is a numerical way of
expressing the strength or magnitude of a reported relationship, be it
causal or not.**

**The basic formula for calculating
effect size is to subtract the mean of the control group from the mean of the
experimental group and then divide that difference by the standard
deviation of the control group's scores. Effect size is expressed
as a decimal number and, while values greater than 1.00 are possible, they
do not occur very often. Thus, an effect size near .00 means that, on
average, the experimental and control groups performed the same; a positive
effect size means that, on average, the experimental group performed
better than the control group; and a negative effect size means that, on
average, the control group performed better than the experimental group
did. For positive effect sizes, the larger the number, the more effective
the experimental treatment. As a general rule of thumb, an effect size in
the .20s (e.g., .27) indicates a treatment that produces a relatively
small effect, whereas an effect size in the .80s (e.g., .88) indicates a
powerful treatment.**
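The formula just described can be expressed as a one-line function; the group means and standard deviation below are invented solely to show the arithmetic:

```python
def effect_size(mean_exp, mean_ctrl, sd_ctrl):
    """Standardized mean difference: (experimental mean - control mean)
    divided by the control group's standard deviation."""
    return (mean_exp - mean_ctrl) / sd_ctrl

# Hypothetical test scores: control M = 74, control SD = 10
print(effect_size(82, 74, 10))  # 0.8: a powerful treatment
print(effect_size(76, 74, 10))  # 0.2: a relatively small effect
print(effect_size(70, 74, 10))  # negative: control outperformed
```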

**Thus, the greater an effect size the
researcher desires, the greater the difference has to be between the
experimental group and control group means.**

**This having been said, how might someone
think about effect size?**

**The best way is to relate the concept to
what one already knows. As a teacher, for example, one may be
wondering (all research begins with a problem) about the best way to
help students learn course-related material. That teacher may be asking:
Should I use cooperative learning groups, assign homework, or
assign and grade homework?**

**Conveniently, a meta-analysis has found
cooperative learning to produce an effect size of .76; the effect size for
assigned homework is .28; the effect size for graded homework is .79 (Walberg,
1984).**

**So, the teacher should consider a
strategy that uses cooperative learning with graded homework. The worst of
all strategies would be for the teacher to assign homework and hand it
back to the students with no meaningful comments communicated on the
homework.**

**Now, all of that is well and good. But,
at the same time, in seeking a greater effect size, one opens the door to
the possibility of committing a Type I error. That is, the researcher
becomes more likely to reject a true null hypothesis. (Go back to the Null
Hypothesis Chart and check it out.)**

**What all of this means, then, is that
the researcher must decide *a priori* how much statistical
significance (meaning that the results are unlikely to have occurred by chance at a
predetermined level of probability) is needed to determine the practical
significance of the study.**

**References**

Walberg, H. J. (1984). Improving the productivity of America's
schools. *Educational Leadership, 41*(8), 19-27.