Disability Policy Document Archive

Opportunities and risks of high-stakes student testing

Date Mailed: Friday, March 9th 2001 06:51 AM

From the file in portable document format:
http://www.law.harvard.edu/civilrights/conferences/SpecEd/heubertpaper2.pdf

Background

High-Stakes Testing: Opportunities and Risks for Students of
Color, English-Language Learners, and Students with Disabilities 1

Jay P. Heubert, J.D., Ed.D.
Teachers College, Columbia University; Columbia Law School

Introduction. The stated objective of the "standards" movement in
American public education is to hold all schools, teachers and
students to high standards of teaching and learning. 2 The
movement reflects awareness that student proficiencies in
literacy and mathematics largely determine success in school and
employment (Murnane and Levy, 1996; Sum, 1999). 

Accountability can take many forms. It is now common, for
example, for schools and school districts to receive favorable or
adverse publicity based on student test scores. In some states,
school districts or schools are subject to specific rewards or
sanctions based on student performance. This paper focuses not on
school or teacher accountability but only on tests that have high
stakes for individual students. They are "high-stakes" tests
because they are used in making decisions about which students
will be promoted or retained in grade and which will receive
high-school diplomas. Section 1 below briefly describes the
growth and current scope of promotion and graduation testing in
the United States. Section 2 explores current controversies
regarding the likely effects of promotion and graduation tests on
minority students (especially blacks, Latinos, and Native
Americans), low-SES students, English-language learners, and
students with disabilities. While many agree that high-stakes
testing will affect such students in significant ways, there are
disputes over whether the effects will be beneficial or harmful.
Section 3 describes some important and broadly accepted norms of
appropriate test use, which, if observed, would reduce the
negative effects of high-stakes testing. Section 4 describes some
elements of a sound testing program.

1 This paper will be published as a chapter in Pines, M., ed.,
The Continuing Challenge: Moving the Youth Agenda Forward (Policy
Issues Monograph 00-02, Sar Levitan Center for Social Policy
Studies). Baltimore, MD: Johns Hopkins University Press. A
shorter version was published in September 2000 as "Graduation
and promotion testing: Potential benefits and risks for minority
students, English-language learners, and students with
disabilities." Poverty and Race 9 (5): 1-2, 5-7. Washington, DC:
Poverty and Race Research Action Council.

2 In principle, standards-based reform has three key elements:
(1) state standards that identify what students should know and
be able to do, (2) efforts to align teaching and learning with
the state standards, and (3) student assessments, also aligned
with the state standards, the results of which are used to hold
school systems, schools, educators and students "accountable" for
improvements in teaching and learning (Elmore, 2000).

1. The extent of graduation and promotion testing in the U.S.
Graduation testing has gone through several stages of development
in the U.S., and varies considerably from state to state. In the
1970s and 1980s, a number of states adopted requirements under
which students had to pass "minimum competency tests" as a
condition of getting high-school diplomas, even if the students
had satisfied all other requirements for graduation. In the late
1980s and 1990s - responding in part to A Nation At Risk, a
report that warned of "a rising tide of mediocrity" in American
public education, and to the rise of today's "standards"
movement, which emphasizes high standards for all students - some
states replaced minimum competency tests with graduation exams
measuring knowledge and skills at the tenth-grade level or
higher. At present, about 23 states require students to pass
graduation tests (American Federation of Teachers (AFT), 1999),
up from eighteen in 1998 (National Research Council (NRC), 1999).
The number is expected to increase to 29 by 2003 (Shore et al.,
2000). Of the 23, fourteen now set graduation-test standards at
the tenth-grade level or higher (AFT, 1999).

In response to concerns about "social promotion," a rapidly
growing number of states - thirteen, about twice as many as a
year ago - now require students to pass standardized tests as a
condition of grade-to-grade promotion (AFT, 1999). In addition,
many school districts, particularly in urban areas, have also
adopted promotion-test policies. This means that large numbers of
the nation's minority students and English-language learners are
now subject to state or local promotion-test programs.

Further, under current federal law, students with disabilities
and English-language learners - whom many states and school
districts have traditionally exempted from large-scale
assessments - must now be included in state and local testing
programs, with accommodation and alternative assessment where
necessary. To serve this objective, states and school districts
must not only assess such students but also publish disaggregated
data on their performance (Individuals with Disabilities
Education Act, 1997; Improving America's Schools Act, 1994).
Significantly, federal law takes no position on whether states
and districts should use test results to determine whether
individual students will receive high-school diplomas or be
promoted to the next grade.

2. Effects of high-stakes testing. Many researchers and
practitioners believe that standards-based reform and high-stakes
testing will have the greatest impact on blacks, Latinos,
English-language learners, students with disabilities, and
low-SES students. There are serious disputes, however, over
whether promotion and graduation testing will help such students
or hurt them. Proponents of standards-based reform and
high-stakes testing point out that these students are among those
who are most often educated poorly, and who therefore have the
most to gain from a movement whose central objective is to hold
all schools, teachers and students to high standards of teaching
and learning. Meanwhile, critics of high-stakes testing fear that
many such children will be harmed by high-stakes tests: that they
will disproportionately be retained in grade or denied
high-school diplomas - both of which have highly negative
consequences for students - because their schools do not expose
them to the knowledge and skills that students need to pass the
tests.

Both arguments are plausible and, as discussed below, both find
support in the literature. The story is complex, however, and the
evidence incomplete. 

Even on graduation tests that measure basic skills, for example,
minority students and students with disabilities usually fail at
higher rates than other students, especially in the years after
such tests are first introduced. In the 1970s, when minimum
competency tests gained popularity, 20 percent of black students,
compared with 2 percent of white students - a discrepancy of ten
to one - initially failed Florida's graduation tests and were
denied high-school diplomas (Debra P. v. Turlington, 1979). And
while many students with disabilities were excluded from state
graduation-test programs (NRC, 1999), those who did participate
failed at rates over 50 percent (McLaughlin, 2000).

For a variety of reasons, failure rates typically decline among
all groups in the years after a new graduation test is introduced
(Linn, 2000). This was true of the early minimum competency
tests; after a few years, for example, black failure rates were
far lower than 20 percent. It also appears to be true for
graduation tests adopted more recently. Texas, for example, which
has a graduation test set at the seventh- or eighth-grade level
(Schrag, 2000), reports that pass rates of blacks and Latinos
roughly doubled between 1994 and 1998, and that the gap in
failure rates between whites, blacks, and Latinos narrowed
considerably during that time (Viadero, 2000). Even so, 1998 data
from the Texas graduation tests show continuing disparities:
cumulative failure rates of 17.6 percent for black students, 17.4
percent for Hispanic students, and 6.7 percent for white students
(Natriello and Pallas, 1999). Data for students with disabilities
are harder to find, but they show a similar pattern.

On one hand, there is evidence that many students with
disabilities do pass state tests in higher numbers over time
(Ysseldyke et al., 1998); New York reports, for example, that the
number of students with disabilities who passed the state's
English Regents exam in 1998-99 was nearly twice as high as the
number who took the exam two years earlier (Keller, 2000). On the
other hand, 1998 data from fourteen states show gaps that remain
quite high: Students with disabilities consistently fail state
graduation tests at rates 35 to 40 percentage points higher than
those for nondisabled students (Ysseldyke et al., 1998). 
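
The comparisons above use two different yardsticks: a ratio of
failure rates (20 percent versus 2 percent, "ten to one") and a
gap in percentage points (35 to 40 points). The short Python
sketch below is an illustration added for clarity, not part of
the original analysis. The function names are hypothetical, and
the 38 percent baseline is an assumption borrowed from the
national NAEP-based estimate cited later in this paper.

```python
def rate_ratio(group_rate, reference_rate):
    """How many times higher one failure rate is than another."""
    return group_rate / reference_rate


def point_gap(group_rate, reference_rate):
    """Difference between two failure rates, in percentage points."""
    return group_rate - reference_rate


def projected_rate(reference_rate, gap_points):
    """Failure rate implied by adding a percentage-point gap to a
    baseline rate, capped at 100 percent."""
    return min(100.0, reference_rate + gap_points)


# Florida in the 1970s (figures from the text): 20 vs. 2 percent.
print(rate_ratio(20.0, 2.0))  # 10.0
print(point_gap(20.0, 2.0))   # 18.0

# Assumed baseline of 38 percent plus a 35- to 40-point gap.
low = projected_rate(38.0, 35.0)
high = projected_rate(38.0, 40.0)
print(low, high)              # 73.0 78.0
```

Under these assumptions the projection is 73 to 78 percent, close
to the range of 75 to 80 percent that the paper gives later for
students with disabilities.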

An important, largely unanswered question concerns the extent to
which improved pass rates on graduation tests actually reflect
improved teaching and learning on the part of teachers and
students. Such improvements are plainly one explanation, and the
most desirable one. During the 1980s, however, when many states
reported sharply improved pass rates on graduation tests, scores
on the National Assessment of Educational Progress (NAEP) - a
highly regarded nationally administered examination - showed
little or no improvement in student learning. Indeed, evidence
that minimum competency tests were not producing improved student
performance on the NAEP is one reason why the current standards
movement emphasizes higher standards, and why some states have
been raising graduation-test standards. More recent fourth- and
eighth-grade NAEP scores suggest improvements in student
mathematics performance - especially for black students, Latino
students and low-SES students - during the period 1990-96,
particularly in some states (including Texas and North Carolina)
that invested heavily in smaller class sizes, preschool programs,
and better resources for teachers (Grissmer et al., 2000). Gains
reported on state tests continued to exceed the improvements
measured by NAEP, however, and it is unclear to what extent
improved fourth and eighth grade NAEP scores are due to high-
stakes graduation testing rather than to the specific educational
interventions just mentioned. 

What factors other than improved achievement may explain
increased pass rates on state tests? First, it is well known that
scores on a test can increase as students become familiar with
that test's format, "with or without real improvement in the
broader achievement constructs that tests and assessments are
intended to measure" (Linn, 2000: 4). Studies show that
improvements on a state's tests may not be confirmed when
students take other tests that supposedly measure the same
knowledge and skills (Koretz et al., 1991; Koretz and Barron,
1998). In such circumstances, increases on state tests could be
due in part to "teaching to the test," i.e., a focus on subject
matter and formats that appear on the test, and in part to
students' growing familiarity with that test's format (Mehrens,
1998).

Second, some states may reduce high failure rates, actual or
projected, by making the state graduation tests easier or by
setting lower cutoff scores that students must achieve to pass.
In New York, for example, failure rates on a state test dropped
substantially after the state created a temporary "low pass"
category for students who were below the state's original passing
score. Similarly, increased pass rates in Texas may be due in
part to changes in the test that made it easier for students to
pass (Schrag, 2000).

Third, if low-achieving students are not part of the test-taking
population, then the pass rates of those who remain will be
higher - even if the achievement of those who actually take the
test has not improved. Thus, reported pass rates should be viewed
in the context of such factors as (a) dropout rates; (b) whether
states count among dropouts, or include in computing graduation
rates, students who choose (or are even encouraged) to leave
school to pursue general equivalency diplomas 3; (c) exemptions
of students with disabilities or English-language learners from
the test-taking population, which are far higher in some states
than in others (Ysseldyke et al., 1998) 4; and (d) improper
testing accommodations that may artificially inflate some
students' scores (Sack, 2000; Allington, 2000).

Not surprisingly, there is also a spirited debate about whether
graduation testing causes increased dropout rates. On one hand,
it appears that many low achievers start to disengage from school
well before graduation tests loom. On the other hand, there are
reputable scholars who argue - credibly - that fear of failing a
graduation test increases the likelihood that low achievers will
leave school (Clarke et al., 2000). 5 Also, the current climate
of accountability places new pressures on schools to increase
student pass rates, which in turn can lead to increased and/or
understated dropout rates (Schrag, 2000). Unfortunately, this
critical issue is complicated by a lack of uniformity among the
states in defining and counting dropouts (Viadero, 2000).

Even as these debates continue, other developments are
fundamentally changing the landscape. One such development,
already noted, is that some states are raising the bar: setting
higher standards on state graduation exams. The most ambitious
states are adopting graduation tests that reflect "world-class"
standards such as those embodied in NAEP.

Based on national NAEP data, about 38 percent of all students
would fail tests that reflect such "world-class" standards if
they were administered today (Linn, 2000).

For minority students and English-language learners, moreover,
there is clear evidence that failure rates on tests embodying
"world-class" standards would be extremely high - about 80
percent 6 - at least at first. These predictions are consistent
with recent data from Massachusetts, where students have begun
taking graduation tests that reflect "world-class" standards. 7
For students with disabilities, it is reasonable to assume that
initial failure rates on such tests would also be very high: in
the 75 to 80 percent range. 8

3 It is well known that the general equivalency diploma, or GED,
has far less value than a regular high-school diploma in terms of
an individual's future opportunities for education or employment.

4 In 1998, for example, New York and Massachusetts included over
90 percent of students with disabilities in their state
assessment programs, compared with 50 percent in Texas (Ysseldyke
et al., 1998).

5 Such fears are presumably greater in states where
graduation-test standards are higher.

6 These estimates are based on the proportion of students scoring
below "basic" on the NAEP. For example, in 1996, 40 percent of
students taking the eighth-grade math test scored below "basic,"
and in the District of Columbia public schools roughly 80 percent
scored below "basic" (Linn, 2000, citing Reese et al., 1997).

Second, the proliferation of large-scale promotion testing, which
is especially pronounced in large, urban school districts (AFT,
1999), has led to sharply higher rates of retention in grade,
especially for black students, Latino students, and
English-language learners. In New York City, Chicago, and other
cities, hundreds of thousands of students, the vast majority
black, Latino, and/or English-language learners, have failed
promotion tests and been retained in grade, and it is reasonable
to expect that students with disabilities would also be retained
in large numbers.

The single strongest predictor of whether students will drop out
of school is whether they have been retained in grade. The rapid
growth of promotion testing, particularly in our large cities, is
therefore likely to create an increasingly large class of
students - disproportionately composed of blacks, Latinos,
English-language learners, students with disabilities, and
low-SES students - who are at increased risk of dropout by virtue
of having been retained in grade one or more times. Those
retained in grade even once are much likelier to drop out later
than are students not retained, and the effects are even greater
for students retained more than once (NRC, 1999; Hauser, 1999;
Shepard and Smith, 1989). 9 Moreover, much of the increase in
dropout rates shows up only years later, and the harm is thus
largely invisible at the time retention occurs. In this sense,
retention in grade is somewhat like high blood pressure.

Promotion testing is thus likely to increase, perhaps
significantly, the numbers of students who suffer the serious
consequences of dropping out. 10 It is also likely to reduce the
numbers of students who remain in school long enough to take
graduation tests. It would be unfortunate - and hardly evidence
of success - if states, school districts, or schools achieved
high pass rates on graduation tests because large numbers of low
achievers had already left school and were no longer among the
test takers. Given the relationships between promotion testing,
retention in grade, and increased dropout rates, promotion-
testing policies warrant closer attention than they have received
thus far. 
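
The arithmetic behind this concern is simple, and a small Python
sketch may make it concrete. The cohort below is entirely
hypothetical (100 students, 70 of whom would pass); it is an
illustration, not data from any state, and the function name is
an assumption made for the example.

```python
def pass_rate(passers, test_takers):
    """Share of test takers who pass, as a fraction between 0 and 1."""
    if test_takers <= 0:
        raise ValueError("test_takers must be positive")
    return passers / test_takers


# Hypothetical cohort: 100 students, 70 of whom would pass the test.
cohort, would_pass = 100, 70

# Case 1: everyone stays in school and takes the test.
everyone = pass_rate(would_pass, cohort)

# Case 2: 20 students who would have failed leave school first.
# No one has learned more, but the pool of test takers shrinks.
left_early = 20
remaining = pass_rate(would_pass, cohort - left_early)

# Measured against the original cohort, nothing has changed.
cohort_based = pass_rate(would_pass, cohort)

print(f"{everyone:.1%} {remaining:.1%} {cohort_based:.1%}")
# prints: 70.0% 87.5% 70.0%
```

Reporting pass rates against the original cohort, alongside
dropout and exemption counts, keeps this effect visible.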

Promotion and graduation testing may also have unintended
consequences for teachers. As noted above, high-stakes testing
is intended to raise teacher motivation and effectiveness, and
there is evidence that with appropriate professional development,
support, resources, and time, teaching effectiveness can improve
significantly (Elmore, 2000). There is already evidence, however,
that the negative publicity associated with poor test scores can
lead experienced teachers to leave urban schools for the suburbs
(see, e.g., Lee, 1998). Plainly, efforts to improve
low-performing urban schools - and to educate all children
effectively - will be undermined if those schools lose strong
teachers.

7 In Massachusetts, roughly 40 percent of white students failed
the "MCAS" in 1999, compared with 80 percent of black students
and 82 percent of Hispanic students. Passing the MCAS is not now
required for graduation, but soon will be.

8 As noted earlier, students with disabilities consistently fail
state tests at rates 35 to 40 percentage points higher than those
for nondisabled students (Ysseldyke et al., 1998). If the failure
rate for nondisabled students is 38 percent, the estimated
failure rate for students with disabilities would be in the range
of 75 to 80 percent.

9 Retention has other negative consequences as well. Strong
evidence indicates that retained students are less well off
academically and socially than similar low-performing students
who are promoted (NRC, 1999; Hauser, 1999; Shepard and Smith,
1989).

10 These include sharply reduced earnings, reduced prospects for
employment and further education, and significantly increased
risk of involvement with the criminal justice system.

As noted above, policies that lead to improved teaching and
learning are likely to benefit minority students,
English-language learners, and students with disabilities even
more than they do other students. In New York, Education
Commissioner Richard Mills defends stringent graduation-test
requirements partly because he hopes they will bring an end to
low-track classes, in which students - most of them black
students, Latino students and/or English-language learners -
typically receive poor-quality, low-level instruction. This
position is grounded in solid evidence that placement in typical
low-track classes is educationally harmful for students (NRC,
1999; Oakes, Gamoran and Page, 1992), and that students will
learn more if they are placed in more demanding classes (NRC,
1999; Weckstein, 1999).

Advocates for minority children and low-SES children hope that
high standards will provide the political and legal leverage
needed to improve resources and school effectiveness so that all
children receive the high-quality instruction they need to be
able to meet demanding academic standards. Disability-rights
groups likewise hope that state standards and tests will drive
teachers to upgrade the individualized education programs (IEPs)
of students with disabilities, so that IEPs reflect more of the
knowledge and skills that nondisabled students are expected to
acquire - and here, too, there is evidence that higher
expectations and improved instruction lead to improved
achievement (Individuals With Disabilities Education Act, 1997;
Ysseldyke et al., 1998). Moreover, some proponents of high-stakes
testing argue that the fear of negative consequences - retention
or diploma denial for students, negative publicity and (in rare
instances) adverse personnel action for educators - can be a
positive force, one that increases the motivation of teachers to
teach and students to learn.

3. Standards of appropriate test use: widely accepted, sometimes
ignored. Whether graduation testing helps or hurts low achievers
depends largely on whether such tests are used to promote
high-quality education for all children - the stated objective of
standards-based reform - or to penalize students for not having
the knowledge and skills that they have not been taught in
school. This is the principal theme that Education Secretary
Richard Riley, a strong proponent of standards-based reform,
emphasized in his February 22, 2000 "State of American Education"
address. Riley called for a "midcourse review" of the standards
movement, a step he said was needed "because there is a gap
between what we know we should be doing and what we are doing"
(Riley, 2000: 6).

Specifically, Secretary Riley said that state standards should be
"challenging but realistic ... [Y]ou have to help students and
teachers prepare for these [high-stakes] tests - they need the
preparation time and resources to succeed, and the test must be
on matters that they have been taught" (Riley, 2000: 7). He also
advised states not to rely on any single measure of students'
knowledge in making high-stakes decisions: "All states should
incorporate multiple ways of measuring learning" (Riley, 2000:
6).

Not coincidentally, perhaps, these concerns are also reflected in
norms of appropriate test use that the testing profession, the
National Research Council, and the American Educational Research
Association (AERA) have articulated, and only quite recently. The
Standards for Educational and Psychological Testing, issued in
December 1999 by the American Educational Research Association,
the American Psychological Association, and the National Council
on Measurement in Education (and referred to here as the Joint
Standards), assert that promotion and graduation tests should
cover only the "content and skills that students have had an
opportunity to learn" (AERA, APA, and NCME, 1999: 146, Standard
13.5). The Congressionally mandated NRC study, High Stakes:
Testing for Tracking, Promotion, and Graduation, reached a
similar conclusion in 1999: "Tests should be used for high-stakes
decisions ... only after schools have implemented changes in
teaching and curriculum that ensure that students have been
taught the knowledge and skills on which they will be tested"
(NRC, 1999). So does the AERA, which, in a July 2000 Policy
Statement Concerning High Stakes Testing, recommends the
following "condition[] essential to sound implementation of
high-stakes educational testing programs": "When content
standards and associated tests are introduced as a reform to ...
improve current practice, opportunities to access appropriate
materials and retraining consistent with the intended changes
should be provided before ... students are sanctioned for failing
to meet the new standards" (AERA, 2000: 2).

Unfortunately, there are often discrepancies between what
high-stakes tests measure and what students have been taught.
Results of a recent ten-state study led by Andrew Porter suggest
that there is surprisingly little overlap between a state's
standards and what teachers in the state say they are actually
teaching students. The actual overlap ranged from a low of 5
percent to a high of 46 percent, depending on the subject, grade
level, and state (Boser, 2000). If these states use promotion or
graduation testing, or are representative of practice elsewhere
in the U.S., then some states and school districts appear to be
using promotion and graduation tests in a manner inconsistent
with widely accepted norms of appropriate test use. Moreover,
such discrepancies are likely to be particularly high where
minority students, English-language learners, and students with
disabilities are concerned, 11 and where students are expected to
master "world-class" standards. 12

11 Minority students are often overrepresented among those who do
not receive high-quality curriculum and instruction, including
those assigned to low-track classes. There are also many students
with disabilities whose IEPs do not ensure that students receive
the instruction they need to pass large-scale promotion and
graduation tests, partly because such students have not
traditionally been included in large-scale assessment programs.
Similarly, many English-language learners have not had the
opportunity to acquire the subject-matter knowledge or the levels
of English proficiency they need to pass such tests.

12 In most of the nation, much needs to be done before
world-class curriculum and instruction will be in place (National
Academy of Education, 1995).

Similarly, as noted above, increasing numbers of states and
school districts automatically deny promotion or high-school
diplomas to students who fail state or local tests, regardless of
how well the students have performed on other measures of
achievement, such as course grades. Secretary Riley is not alone
in believing that states and school districts should weigh
information other than test scores in making high-stakes
decisions about promotion and graduation. The NRC study (1999:
279) emphasizes that educators should always buttress test score
information with "other relevant information about the student's
knowledge and skills, such as grades, teacher recommendations,
and extenuating circumstances" when making high-stakes decisions
about individual students. This is also consistent with the
testing profession's Joint Standards, which state that "in
elementary or secondary education, a decision or characterization
that will have a major impact on a test taker should not
automatically be made on the basis of a single test score. Other
relevant information ... should be taken into account if it will
enhance the overall validity of the decision" (AERA, APA, and
NCME, 1999: 146, Standard 13.7). Similarly, the AERA Policy
Statement (AERA, 2000: 2) provides that "[d]ecisions that affect
individual students' life chances or educational opportunities
should not be made on the basis of test scores alone. Other
relevant information should be taken into account to enhance the
overall validity of such decisions."

Why is it so important to use multiple measures in making
important decisions about individuals? The answer is that any
single measure is inevitably imprecise and limited in the
information it provides. Proponents of high-stakes testing
sometimes point out the problems associated with exclusive
reliance on student grades in making promotion and graduation
decisions: there has been considerable grade inflation during the
last three decades, for example, and there is considerable
variation between teachers, schools, and school districts in what
particular grades mean. They are right. But that does not mean
that grades should be ignored altogether.

For standardized tests, like grades, are limited in what they
measure. It is well known, for example, that grades are a far
better measure than standardized tests of student motivation
over time, a factor critical to later success in school and in
the workplace. Moreover, as the following examples illustrate, even the
best standardized tests are far less precise than most people
realize: First, what are the chances that two students with
identical "real achievement" will score more than 10 percentile
points apart on the same Stanford 9 test? For two ninth graders
who are really at the 45th percentile in math, the answer is 57
percent of the time. In 4th grade reading, the probability is 42
percent. Second, how often will a student who really belongs at
the 50th percentile according to national test norms actually
score within 5 percentile points of that ranking on a test? The
answer is only about 30 percent of the time in mathematics and 42
percent in reading (Viadero, 1999: 3, citing Rogosa, 1999). 
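
One way to see why figures of this size are plausible is a simple
measurement-error model. The Python sketch below is an
illustration under stated assumptions, not Rogosa's method: it
assumes normally distributed scores, a classical model in which
an observed score is the student's true standing plus independent
error, and hypothetical reliability values of 0.90 and 0.95 that
are not taken from this paper or from Stanford 9 documentation.
The function name is likewise an assumption.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # norm scale, in standard-deviation units


def prob_within_band(true_percentile, half_width, reliability):
    """Probability that an observed score lands within `half_width`
    percentile points of the student's true percentile.

    Assumes observed scores have unit variance on the norm scale,
    so measurement error has variance (1 - reliability).
    """
    error = NormalDist(0.0, (1.0 - reliability) ** 0.5)
    true_z = STANDARD_NORMAL.inv_cdf(true_percentile / 100.0)
    lower_z = STANDARD_NORMAL.inv_cdf((true_percentile - half_width) / 100.0)
    upper_z = STANDARD_NORMAL.inv_cdf((true_percentile + half_width) / 100.0)
    return error.cdf(upper_z - true_z) - error.cdf(lower_z - true_z)


for reliability in (0.90, 0.95):  # hypothetical values
    p = prob_within_band(50, 5, reliability)
    print(f"reliability {reliability:.2f}: {p:.0%} within 5 points")
```

Even with reliabilities this high, the model places a student who
truly belongs at the 50th percentile within 5 points of that rank
well under half the time, which is the same order of magnitude as
the figures reported above.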

Given the imprecision of grades and test scores, judgments based
on combinations of both are more accurate and reliable than those
based on either by itself. To use either one by itself when both
are readily available is like telling one's physician to conduct
a physical exam relying only on a thermometer or only on a single
blood test. Unfortunately, as Secretary Riley noted, "there is a
gap between what we know we should be doing and what we are
doing." This is the case in the many states and school districts
that make promotion or graduation decisions relying solely on
student test scores. Such practices, though widespread, do not
seem consistent with norms of appropriate test use. 

To complicate matters, there is at present no satisfactory
mechanism for ensuring that states and school districts respect
even widely accepted norms of appropriate, nondiscriminatory test
use. The two existing mechanisms - professional discipline
through the associations that produce the Joint Standards, or
legal enforcement through the courts or administrative agencies -
have complementary shortcomings. Professional associations such
as the American Educational Research Association, the American
Psychological Association, and the National Council on
Measurement in Education have detailed standards, but lack
mechanisms for monitoring or enforcing compliance with those
standards. For courts and federal civil-rights agencies, the
reverse is true; they have complaint procedures and enforcement
power, but lack specific, legally enforceable standards on the
appropriate use of high-stakes tests. Recognizing the problem,
the U.S. Department of Education's Office for Civil Rights has
released a draft resource guide that, while not legally binding,
aims to promote appropriate use of high-stakes tests. 13

4. Elements of a sound testing policy. Given these concerns, what
are some elements of a sound high-stakes testing policy within
the larger context of standards-based reform? First, states
should adopt standards for what students should know and be able
to do. And while such standards continually evolve, this is
something virtually all the states have done (AFT, 1999). Second,
policymakers and educators should strive to bring the curriculum
into alignment with the state's standards; according to Lauren
Resnick, a national leader of the standards movement, many states
are still experiencing problems at this stage. 14 A third step is
to bring actual instruction into line with the state standards
and curriculum. This objective is a challenging one, requiring
substantial investments in staff development. Teachers - and
administrators, who are increasingly called upon to serve as
instructional leaders (Elmore, 2000) - need considerable training
about how to enact the new curriculum, how to identify the
aspects of the curriculum that create problems for students, and
how best to address students' learning needs. Some schools will
also have to upgrade facilities. Here, too, there is evidence of
major gaps (Natriello, 1998), including Andrew Porter's recent
findings (Boser, 2000) about the limited overlap between state
standards and what teachers say they teach.

Note that the steps described thus far do not involve high-stakes
testing. There is no reason why states cannot use large-scale
assessments to help drive changes in curriculum and instruction,
and many states do. But the Joint Standards, the 1999 NRC study,
and the July 2000 AERA Policy Statement all assume that the
preceding measures will be in place before such instruments become
high-stakes tests for students. As noted
above, all three say that tests should be used to decide whether
individual students will be promoted or given high-school diplomas
only after students have been taught the kinds of knowledge and
skill that the tests measure. This is not the situation in every
state. Often, graduation testing and promotion testing precede the
alignment of curriculum and instruction with state standards
(Elmore, 2000), and in many cases the tests are not well aligned
with state standards: "There is little evidence to suggest that
exit exams in current use have been validated properly against the
defined curriculum and actual instruction; rather, it appears that
many states may not have taken adequate steps to validate their
assessment instruments, and that proper studies would reveal
important weaknesses" (NRC, 1999: 179, citing Stake, 1998).

The Joint Standards (1999), the NRC study (1999), and the AERA
Policy Statement (2000) describe measures a state or school
district should take if it elects to use tests for high-stakes
purposes. One, just noted, is not to use tests for high-stakes
purposes until schools are actually teaching students the
relevant knowledge and skills. Second, test users should make
sure that a high-stakes test is valid for its intended purpose.
This may sound obvious, but it is not something every test user
does. Chicago, for example, has gotten national publicity for its
use of the Iowa Test of Basic Skills (ITBS) in making promotion
decisions, but the district's chief accountability officer has
candidly acknowledged that the ITBS is not valid as a measure of
which students should be promoted or held back. Third, test
developers should take students with disabilities,
English-language learners, minority students and other groups into
account beginning with initial test development, and should take
steps to ensure that the test is equally valid for all major
student populations that will take it (NRC, 1999; AERA, APA, NCME,
1999; AERA, 2000).

13 The draft, dated June 6, 2000, is entitled The Use of Tests
When Making High-Stakes Decisions for Students: A Resource Guide
for Educators and Policymakers. It draws heavily on the Joint
Standards and the 1999 NRC study.

14 Personal conversation with Lauren Resnick, January 7, 2000,
Washington, DC.

Fourth, test users should not rely solely on test-score
information in making promotion and graduation decisions (NRC,
1999; AERA, APA, NCME, 1999; AERA, 2000). Instead, as colleges do,
states and school districts should look at multiple measures of
student achievement and readiness, and allow high achievement on
one measure to balance lower performance on another. Further, some
states measure not only absolute achievement - in the form of a
percentage of students passing a test - but also improvement over
time (i.e., higher percentages of students passing a test). And
some states measure whether school districts or schools are
succeeding in closing the gap between high-achieving and
low-achieving students. Each of these measures adds something
important. An absolute standard signals that schools set high
expectations for all students rather than lower expectations for
some. A standard based on improvement recognizes that different
students, schools, and school districts start out at different
places, and rewards progress. A standard based on whether schools
are closing the achievement gap - between white students and
minority students, between nondisabled students and students with
disabilities, between native English speakers and English-language
learners - encourages schools to pay more attention to these very
important goals.
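
The difference among an absolute standard, an improvement
standard, and a gap-closing standard can be illustrated with a
little arithmetic. What follows is only a minimal sketch in
Python: the function names and the pass counts are hypothetical,
chosen for illustration, and are not drawn from any state's
accountability system or from the sources cited here.

```python
# Illustrative sketch only: the function names and the pass counts
# below are hypothetical, not taken from any state's reporting system.

def pass_rate(passed, tested):
    """Absolute standard: percentage of tested students who passed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * passed / tested


def improvement(rate_earlier, rate_later):
    """Improvement standard: change in pass rate, in percentage points."""
    return rate_later - rate_earlier


def gap(rate_group_a, rate_group_b):
    """Achievement gap: difference in pass rates between two groups."""
    return rate_group_a - rate_group_b


def gap_is_closing(gap_earlier, gap_later):
    """Gap standard: True if the gap is smaller in the later year."""
    return abs(gap_later) < abs(gap_earlier)


# Hypothetical school, two years, two student groups.
year1_a = pass_rate(80, 100)   # 80.0
year1_b = pass_rate(20, 50)    # 40.0
year2_a = pass_rate(84, 100)   # 84.0
year2_b = pass_rate(30, 50)    # 60.0

print(improvement(year1_b, year2_b))            # 20.0
print(gap_is_closing(gap(year1_a, year1_b),
                     gap(year2_a, year2_b)))    # True
```

In this made-up example, group B shows the larger improvement and
the gap narrows, yet group B's absolute pass rate (60 percent)
remains well below group A's. No single figure captures all three
facts, which is consistent with the point that each measure adds
something important.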

Fifth, a test use is inappropriate unless it leads to the best
available treatment or placement for students (NRC, 1999). This
means that states and school districts should refrain from using
test scores (or other information) to justify educational
decisions that are demonstrably harmful to students. Based on the
weight of research evidence, two placements or treatments that
typically harm students are retention in grade and placement in
typical low-track classes (NRC, 1999; Hauser, 1999; Oakes,
Gamoran, and Page, 1992). 

Retention and low-track placements are inimical to the goal of
helping all students reach high levels of achievement. Both are
inconsistent with principles of appropriate test use.

Sixth, the debate over high-stakes testing often frames the issue
in "either-or" terms: Either we promote a student who is not ready
or we retain him in grade. Either we give someone a diploma or we
deny a diploma. Neither alternative is attractive, of course, but
there is almost always another, better, approach. Any information
schools can use to make a promotion or graduation decision can be
used years earlier - before the "gate" is reached - to determine
which children are performing poorly and to help get them the
support they will need to be able to meet high standards.
Teachers typically know, long before a promotion or graduation
test, which students will need help if they are to pass.
Effective early intervention is critical, as recent research
shows (Grissmer et al., 2000). 

Seventh, tests by themselves do not improve learning, any more
than a thermometer reduces fever. At best, good tests provide
information. It is important that this information, along with
information from other sources, be available - in an
understandable form - to policymakers, educators, parents and
students. And it is equally important for all concerned to know
which policies and practices are likeliest to produce improved
teaching and learning (Elmore, 2000; Grissmer et al., 2000).
Educators and parents also need access to the resources that it
takes to make the necessary changes in teaching and learning.
Unfortunately, it is well known that many school districts and
schools lack the resources they need to enable all children to
reach high levels of achievement (National Academy of Education,
1995; NRC, 1999).

Last but not least, these questions all call for additional
research: on what interventions work, on how treatments effective
in some settings can be implemented widely, and, not least, on
how high-stakes testing policies affect student learning and
dropout rates, for students generally and for such important
groups as students of color, English-language learners, and
students with disabilities. 15

In conclusion. The standards movement and high-stakes testing
present both opportunities and risks to students of color,
English-language learners, and students with disabilities. These
students are among those who stand to benefit most if all students
receive high-quality instruction. Such students are also at great
risk, however, especially in states that administer high-stakes
promotion and graduation tests before having made the improvements
in instruction that will enable all students to meet the
standards. Even failure rates far below 75 to 80 percent are
plainly unacceptable, for these students and for our entire
society. Educating all students to high levels is something no
society has achieved to date, and reaching that objective will
obviously be no simple matter. Promotion and graduation tests are
one part of this picture, and there are those who question the
necessity or desirability of such testing even as it becomes more
widespread. One thing is clear, however: If states and school
districts are going to use high-stakes testing, then it is
critical that such testing be done well. The
basic principles of appropriate test use are relatively clear and
enjoy broad support among researchers (NRC, 1999; AERA, APA,
NCME, 1999; AERA, 2000). 

States and school districts that disregard these principles put
their students at risk - and also themselves. The prospect of
high failure rates has already produced a political backlash
against some states' high-stakes testing programs, and lawsuits
are also likely, if only because there exist no alternatives by
which to ensure appropriate use of tests that affect students'
life chances in such important ways. 

The stakes are high.

15 As the NRC study (1999: 281) notes, "[h]igh-stakes testing
programs should routinely include a well-designed evaluation
component. Policymakers should monitor both the intended and
unintended consequences of high stakes assessments on all students
and on significant subgroups of students, including minorities,
English-language learners, and students with disabilities."

REFERENCES

Allington, R. (2000). "Letters: On Special Education
Accommodations." Education Week 19 (35): 48.

American Educational Research Association (2000). AERA Position
Statement Concerning High-Stakes Testing in PreK-12 Education.
Available: http://www.aera.net/about/policy/stakes.htm

American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education
(1999). Standards for Educational and Psychological Testing.
Washington, DC: American Psychological Association. 

American Federation of Teachers (1999). Making Standards Matter
1999. Washington, DC: American Federation of Teachers. 

American Federation of Teachers (1998). Making Standards Matter
1998. Washington, DC: American Federation of Teachers. 

Clarke, M., W. Haney, and G. Madaus (2000). "High Stakes Testing
and High-School Completion." Boston: National Board on
Educational Testing and Public Policy 1 (3), 111.

Council of Chief State School Officers (1999). Trends in State
Student Assessment Programs. Washington, DC: Council of Chief
State School Officers. 

Cronbach, L. J. (1971). Test validation. In Educational
Measurement, 2d Edition, R. L. Thorndike, ed. Washington, DC:
American Council on Education. 

Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979); aff'd
in part and rev'd in part, 644 F.2d 397 (5th Cir. 1981); rem'd,
564 F. Supp. 177 (M.D. Fla. 1983); aff'd, 730 F.2d 1405 (11th Cir.
1984).

Elmore, R. (2000). Building a New Structure For School Leadership.
Washington, DC: The Albert Shanker Institute.

Grissmer, D. (2000). Improving Student Achievement: What State
NAEP Scores Tell Us. Santa Monica, CA: Rand.

GI Forum v. Texas Education Agency, 87 F. Supp. 2d 667 (W.D. Tex.
2000).

Harvard Educational Review. (1994). Symposium: equity in
educational assessment. Harvard Educational Review: 64 (1).

Hauser, R. (1999). "Should We End Social Promotion? Truth and
Consequences." In Orfield, G. and M. Kornhaber, eds., Raising
Standards or Raising Barriers? Inequality and High Stakes Testing
in Education. New York: The Century Fund.

Improving America's Schools Act of 1994, 20 U.S.C. sections 6301
et seq.

Individuals with Disabilities Education Act, 20 U.S.C. section
1401 et seq. (1997).

Keller, B. (2000). "More N.Y. Special Education Students Passing
State Tests." Education Week 19 (31): 33.

Kober, J. and M. Feuer (1996). Title I Testing and Assessment:
Challenging Standards for Disadvantaged Children. Washington, DC:
National Academy Press.

Koretz, D., R. Linn, S. Dunbar, and L. Shepard (1991). "The
Effects of High-Stakes Testing on Achievement: Preliminary
Findings About Generalization Across Tests." Paper presented at
the annual meeting of the American Educational Research
Association, Chicago, IL.

Koretz, D. and S. Barron (1998). The Validity of Gains on the
Kentucky Instructional Results Information Systems (KIRIS). Santa
Monica, CA: Rand. 

Lee, J. (1998). "Using High Stakes Test Results to Give
Disadvantaged Kids Access to Outstanding Responsive Teachers."
Paper presented at the Harvard Civil Rights Project/Teachers
College Conference on High-Stakes Testing and Civil Rights,
December 4, 1998, New York. 

Linn, R. (2000). "Assessments and accountability." Educational
Researcher 29 (2), 4-16.

McLaughlin, M. (2000). "High Stakes Testing and Students with
Disabilities." Presentation at the National Research Council
Conference on the Role of the Law in Achieving High Standards for
All. Washington, DC, June 30.

Mehrens, W. A. (1998). Consequences of Assessment: What is the
Evidence? Vice Presidential Address for Division D, annual meeting
of the American Educational Research Association, San Diego.

Murnane, R. and F. Levy (1996). Teaching the New Basic Skills. New
York: The Free Press.

National Academy of Education (1995). Improving Education Through
Standards-Based Reform, M. McLaughlin, L. Shepard, and J. O'Day,
eds. Washington, DC: National Academy of Education.

National Research Council, Heller, K. A., W. H. Holtzman, and S.
Messick, eds. (1982). Placing Children in Special Education: A
Strategy for Equity. Committee on Child Development Research and
Public Policy, National Research Council. Washington, DC: National
Academy Press.

National Research Council, Heubert, J., and R. Hauser, eds.
(1999). High Stakes: Testing for Tracking, Promotion, and
Graduation. Committee on Appropriate Test Use. Washington, DC:
National Academy Press.

Natriello, G. (1998). The New Regents High School Graduation
Requirements: Estimating the Resources Necessary to Meet the New
Standards. New York: The Community Service Society.

Natriello, G. and A. Pallas (1999). The Development and Impact of
High Stakes Testing. Paper presented at the Conference on Civil
Rights Implications of High-Stakes Testing, sponsored by the
Harvard Civil Rights Project, Teachers College, and Columbia Law
School.

Oakes, J., A. Gamoran, and R. Page (1992). "Curriculum
Differentiation: Opportunities, Outcomes, and Meanings." In
Jackson, P., ed., Handbook of Research on Curriculum. New York:
MacMillan Publishing Company.

Reese, C. M., Miller, K. E., Mazzeo, J., and Dossey, J. A. (1997).
NAEP 1996 Mathematics Report Card for the Nation and the States.
Washington, DC: National Center for Education Statistics.

Riley, R. W. (2000). Setting New Expectations. Seventh Annual
State of American Education Address. Paper presented at Southern
High School (Durham, NC, February 22, 2000). 

Sack, J. (2000). "Researchers Warn of Possible Pitfalls in Spec.
Ed. Testing." Education Week 19 (32): 12.

Schrag, P. (2000). "Too Good to Be True." The American Prospect 4
(11), 46.

Shepard, L. A. (1993). "Evaluating Test Validity." In L.
Darling-Hammond (ed.), The Review of Research in Education, 19,
405-450.

Shepard, L. A. and M. L. Smith, eds. (1989). Flunking Grades:
Research and Policies on Retention. London: Falmer Press. 

Shore, A., G. Madaus, and M. Clarke (2000). Guidelines for Policy
Research on Educational Testing. Boston: National Board on
Educational Testing and Public Policy 1 (4), 1-7.

Stake, R. (1998, July). "Some Comments on Assessment in U.S.
Education." Educational Policy Analysis Archives [on-line serial],
6 (14). Available: http://epaa.asu.edu/epaa/v6n14.htm

Sum, A. (1999). Literacy in the Labor Force. Washington, DC:
National Center for Education Statistics.

Title VI, Civil Rights Act of 1964, 42 U.S.C. section 2000(d).

Title VI regulations, 34 C.F.R. sections 100 et seq.

Viadero, D. (1999). "Stanford Report Questions Accuracy of Tests."
Education Week 19 (6): 3.

Viadero, D. (2000). "Testing System in Texas Yet to Get Final
Grade." Education Week, May 31: 1.

Weckstein, P. (1999). "School Reform and Enforceable Rights to an
Adequate Education." In Law and School Reform: Six Strategies for
Promoting Educational Equity, J. Heubert, ed. New Haven: Yale
University Press.

Wilgoren, J. (2000). "National Study Examines Reasons Why Pupils
Excel." New York Times, July 26: 14.

Ysseldyke, J. E., M. L. Thurlow, K. L. Langenfeld, J. R. Nelson,
E. Teelucksingh, and A. Seyfarth. (1998). Educational Results for
Students with Disabilities: What Do the Data Tell Us? Minneapolis,
MN: National Center on Educational Outcomes.
