Teachers' responses to test-driven accountability pressures:

"If I change, will my scores drop?"

 

Reference:
Miller, S. D. (1995). Teachers' responses to test-driven accountability pressures: "If I change, will my scores drop?" Reading Research and Instruction, 34, 332-351.

Samuel D. Miller

University of North Carolina at Greensboro

Abstract

This study describes how seven third grade teachers modified their skills-based reading and language arts programs because they believed students were unable to apply skills in authentic learning situations. Teachers modified their instruction by increasing the number of opportunities students had to write extended prose and to work together on assignments that lasted for extended periods of time. To evaluate the program, we examined how the teachers' instructional changes corresponded to changes in students' reading and language arts standardized achievement test scores, special education referrals, and retentions. Across the project's two years, teachers decreased the total number of reading and language arts assignments by 62% while increasing the percentage of their multiple-day (by 16.3 percentage points) and collaborative (by 10.4 percentage points) paragraph-level writing assignments. Language arts scores in four of the seven classrooms increased over the project's duration (compared with each teacher's scores for the three years before the intervention), and fewer students were retained (a decrease of 81%) or referred for special education services (a decrease of 47%). Students experienced few difficulties with the instructional changes; the teachers' major difficulty was grading writing assignments. Teachers' concerns that students' standardized achievement test scores would drop as a result of their instructional changes persisted despite evidence to the contrary. The discussion focuses on how the teachers' concerns were reinforced by the school district's distribution and interpretation of standardized achievement test results.


Standardized achievement tests have become the primary criterion of educational effectiveness as administrators and politicians attempt to hold teachers accountable for students' academic performance (Darling-Hammond & Wise, 1985; Haladyna, Nolen, & Hass, 1991; Smith, 1991). While the debate over the benefits of using standardized achievement tests as a criterion continues within various educational and political circles (Brown, 1989; Hiebert & Calfee, 1992; Smith, 1991; Valencia, Pearson, Peters, & Wixson, 1989), teachers face the daily dilemma of presenting instructional activities that both increase students' standardized achievement test scores and ensure their acquisition of independent reading and writing skills. This study examines why seven third grade teachers modified their skills-based reading and language arts program, describes the instructional changes they made over a two-year period, and examines corresponding changes in students' reading and language arts standardized achievement test scores, special education referrals, and end-of-grade retentions.

Theoretical Context

This study extends the findings of an earlier project that examined why elementary school teachers selected certain reading and language arts assignments and how their students reacted to them (Miller, Adkins, & Hooper, 1993). The teachers in that study believed worksheet-type assignments increased students' standardized achievement test scores because such classwork focused on the very skills these instruments covered. They also stated that they avoided activities involving lengthy writing or discussion because such activities limited the number of test-defined skills they could cover. Subsequent evaluations of the teachers' instructional activities at several points in the school year confirmed their assertions: they relied primarily on skills-based worksheet activities, with few lengthy writing assignments or discussions. Consequently, despite having average standardized achievement test scores and passing grades, most students in that study had few, if any, opportunities to read or write extended prose.

While several explanations are possible for why these teachers used a skills-based instructional approach, their statements are consistent with a growing concern that teachers are responding to test-driven accountability pressures in ways that are counterproductive to students' long-term educational needs (Brown, 1991b; Cohen & Spillane, 1992; Madaus, 1988; Paris, Lawton, Turner, & Roth, 1991; Resnick & Resnick, 1985; Shepard, 1991; Smith, 1989; Smith & O'Day, 1990). Studies of how instructional practices influence standardized achievement test scores have been mostly speculative (Brown, 1991a; Cannell, 1988; Jennings & Nathan, 1977; Shepard, 1991; Smith, 1991), and few have documented deliberate attempts by teachers to raise standardized achievement test scores by altering their instructional practices (Darling-Hammond & Wise, 1985; Mehrens & Kaminski, 1988; Smith, 1991). As a result, there are few recommendations for teachers who want an instructional approach that addresses both the short-term goal of higher standardized achievement test scores and the long-term goal of giving students adequate opportunities to practice sophisticated reading and writing skills.

Koretz (1988) presented a conceptual framework to explain how teachers might respond to accountability pressures to increase students' standardized achievement test scores. At the most basic level, Koretz suggested, teachers may raise students' scores by providing answers to actual test items, as teachers in several states have done. A second option was for teachers to focus primarily on the skills included on standardized achievement tests and to ignore any activities that would interfere with covering those skills. This option is consistent with the instructional practices found by Haladyna et al. (1991), Miller et al. (1993), and Paris, Wasik, and Turner (1991b). A more positive option was for teachers to design instructional activities that required higher-level literacy skills, such as writing persuasive essays, designing and evaluating different hypotheses, or reading and analyzing challenging books (activities suggested by Koretz). This option differed from the previous two by placing its primary emphasis on students' long-term educational needs rather than on the immediate need to raise standardized achievement test scores. Implicit in this option is the assumption that standardized achievement test performance will improve if students successfully complete challenging assignments. Koretz argued that teachers sacrifice the quality of their instruction whenever they follow either of the first two options. Others have raised similar concerns (Anderson, Hiebert, Scott, & Wilkinson, 1985; Brown, 1989; Bennett, Desforges, Cockburn, & Wilkinson, 1984; Smith, 1991; Thomas, Strage & Curley, 1988).

Study's Goals and Evaluation Framework

The seven third grade teachers who participated in this study faced many of the test-driven accountability pressures described in that earlier study and in other studies (Darling-Hammond & Wise, 1985; Haladyna et al., 1991; Paris et al., 1991b; Smith, 1991). Each year they developed a school improvement plan based on how well their students performed on a commercial criterion-referenced reading skills test, end-of-unit basal criterion-referenced tests, state-designed criterion-referenced social studies and science tests, and standardized achievement tests. Students' scores on these measures were expected to increase by a specified number of points over a five-year period. The teachers stated that they initially responded to pressures to increase students' test scores by emphasizing the very skills those instruments measured. Over time, however, they became concerned because students were unable, and in some cases unwilling, to apply their knowledge to situations where they had to write coherent prose and read connected text (hereafter referred to as authentic learning situations). More importantly, the teachers believed the students' difficulties were related to their own emphasis on isolated-skills practice. As a result, the teachers decided to redesign their instructional activities so that they would be consistent with Koretz's third option.

My participation in the project began after the principal and teachers asked the university for assistance. Central to our efforts was the assumption that instructional changes would influence both the types of knowledge students acquired and their attitudes toward learning (Blumenfeld, Mergendoller, & Swarthout, 1987; Doyle, 1983; Stodolsky, 1988). After extensive discussions, we identified three objectives, each of which we believed would facilitate students' success in authentic learning situations. The first was to increase the number of opportunities students had to write extended prose (i.e., single or multiple paragraphs). An examination of the teachers' instructional activities before this objective was implemented indicated that students completed most assignments by writing single words or by underlining or crossing out responses (described more fully in a later section). Research supports a link between this modification and the ability to transfer knowledge to authentic learning situations (Merritt, 1978; Paris, Calfee, Filby, Hiebert, Pearson, Valencia, & Wolf, 1992; Weinstein & Mayer, 1986; Wittrock, 1986). The second objective was to increase the number of paragraph-level writing assignments that required student collaboration, so that students would have frequent opportunities to share ideas and receive feedback about their work. This modification was implemented in the belief that it would promote knowledge transfer (Slavin, 1990). The final objective was to increase the number of opportunities students had to complete paragraph-level writing assignments that lasted more than one day. The reason for this modification was the assumption that such assignments promote the acquisition of self-regulatory learning behaviors that facilitate knowledge transfer (Borkowski, Carr, Rellinger, & Pressley, 1990; Paris, Wasik, & Turner, 1991a; Pressley, Goodchild, Fleet, Zajchowski, & Evans, 1989). An example of how these three criteria were used to redesign the teachers' instruction is included in the Materials and Procedure section.

At the project's start, the teachers said that one indication of success would be if the students' test scores did not drop, a concern they voiced repeatedly throughout the two years. No teacher mentioned the possibility that students' standardized achievement test scores would improve as they implemented the desired instructional changes. Their concern about how redesigned instructional activities might affect standardized achievement test performance strongly influenced the project's evaluation. First, the teachers and principal wanted to document how their instructional activities changed over time, so that the evidence could be used in a possible discussion about the relative importance of various educational goals. If standardized achievement test scores declined as the teachers suspected, that discussion would focus on whether different stakeholders wanted the school to attend to the short-term goal of higher standardized achievement test scores or the long-term goal of requiring instructional activities in which students wrote extended prose in authentic learning situations. The teachers and principal also wanted to evaluate how the instructional changes influenced students' standardized achievement test scores. They understood the difficulty of assuming a direct causal link between instructional practices and students' standardized test performance, yet believed such an evaluation was needed because of their district's accountability concerns; they thought it would be naive to ignore the pressures they experienced to raise students' scores. The teachers rejected using another school as a control because they did not believe a comparable site existed in the district. They also rejected using control classes within the school because they thought doing so would undermine collegiality.
We finally agreed to compare each teacher's students' standardized achievement test scores (national percentiles for reading and language arts) for the project's two years with her students' scores in the years before the project. During this time, the teachers used the same commercial reading and language arts texts and completed school improvement plans that included similar assessment instruments. As the project ended, a guidance counselor who had not been present at any of our planning meetings asked that students' special education referrals and end-of-grade retentions be included in the evaluation, because she suspected that fewer students had been referred for special services or retained. Teachers were interviewed at the end of the project to evaluate their perceptions of the project's successes and difficulties.

Method

Subjects

Seven third grade teachers volunteered to participate in the project. They averaged 16.8 years of teaching experience (minimum = 11 years; maximum = 28.5 years). At the start of the second year, one teacher retired and was replaced by a first-year teacher. All teachers were female. All but one were white, and the remaining teacher was African-American.

The school is located in the Piedmont area of North Carolina. Class sizes ranged from 20 to 25 students across the five years for which standardized achievement test scores were obtained. About 15% of the students were African-American and lived mainly in nearby public housing. The remaining students were white and lived in nearby upper-middle-class neighborhoods. The school has a reputation in its district for high academic standards. Standardized achievement test data from the year before the project revealed that about one-third of the students scored above the 90th percentile on the California Achievement Test (Total Battery Score).

Materials and Procedure

Standardized Achievement Tests. Students' total reading and language arts scores were obtained from the California Achievement Test, which is administered at the end of each year beginning in third grade. At the third grade level, the total reading score measures vocabulary and comprehension. The vocabulary subtest consists of items measuring a student's ability to select synonyms, antonyms, homonyms, and the correct words in context. The comprehension subtest focuses on a student's ability to recall facts, analyze characters, identify central thoughts, interpret events, and identify different forms of writing. At the third grade level, the total language arts score measures language mechanics and expression. The language mechanics subtest consists of items measuring a student's ability to capitalize pronouns, nouns, adjectives, beginning words, and titles, and to punctuate with periods, question marks, exclamation points, commas, colons, semicolons, and quotation marks. The language expression subtest focuses on how well students use nouns, pronouns, verbs, adjectives, and adverbs to make a statement. All items are in multiple-choice format.

Assignment Evaluation. Reading and language arts assignments were collected by a graduate student or me at five two-week intervals each year (October, December, January, March, and April), for a total of 10 collections across the project's two years. Consistent with the project's goals, teachers received feedback after each collection about the total number of assignments, the amount of writing each assignment required, its duration, and whether it was completed alone or with others (social organization). Writing was coded as (a) simple marks [underlining, numbering, copying, single words or phrases], (b) single or multiple sentences, or (c) single or multiple paragraphs. Duration was coded as (a) single or (b) multiple days. A multiple-day assignment was any assignment for which the teacher required more than one class period for completion. An assignment presented on one day, completed as homework, and discussed the next day was coded as a single-day assignment. Social organization was coded as (a) alone or (b) with others. During the second year, we also recorded whether the teacher used the assignment to determine a student's grade. This criterion was added at the teachers' request because they wanted to examine whether their grading practices differed. Xeroxed copies of the students' work were coded during the first year. During the second year, planning sheets were developed that provided the necessary coding information. Teachers stated that the assignment collection findings represented normal instruction in their classrooms.
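The coding scheme above can be expressed as a small data structure. The sketch below is ours, not the study's coding instrument. It assumes each collected assignment is recorded once with its writing level, the number of class periods the teacher required, and whether students worked with others. The class names, the `class_periods` field, and the `percent` helper are all hypothetical. The sketch encodes the multiple-day rule, under which homework finished overnight does not count as a second day. It then computes the kind of percentages reported in Table 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Writing(Enum):
    SIMPLE_MARKS = "a"  # underlining, numbering, copying, single word or phrase
    SENTENCES = "b"     # single or multiple sentences
    PARAGRAPHS = "c"    # single or multiple paragraphs


class Duration(Enum):
    SINGLE_DAY = "a"
    MULTIPLE_DAY = "b"


class Social(Enum):
    ALONE = "a"
    WITH_OTHERS = "b"


@dataclass
class Assignment:
    writing: Writing
    # Hypothetical field: class periods the teacher required for completion.
    # Work presented in class, finished as homework, and discussed the next
    # day is recorded as 1, matching the single-day rule described above.
    class_periods: int
    collaborative: bool
    graded: Optional[bool] = None  # coded only in Year 2; None for Year 1

    def duration(self) -> Duration:
        return Duration.MULTIPLE_DAY if self.class_periods > 1 else Duration.SINGLE_DAY

    def social(self) -> Social:
        return Social.WITH_OTHERS if self.collaborative else Social.ALONE


def percent(items: List[Assignment], keep: Callable[[Assignment], bool]) -> float:
    """Percentage of items for which keep(item) is true; 0.0 for an empty list."""
    if not items:
        return 0.0
    return 100.0 * sum(1 for a in items if keep(a)) / len(items)


if __name__ == "__main__":
    # A made-up collection, for illustration only (not study data).
    collection = [
        Assignment(Writing.SIMPLE_MARKS, 1, False),
        Assignment(Writing.PARAGRAPHS, 3, True),
        Assignment(Writing.SENTENCES, 1, False),
        Assignment(Writing.PARAGRAPHS, 1, False),
    ]
    paragraphs = [a for a in collection if a.writing is Writing.PARAGRAPHS]
    # Share of all assignments that were paragraph-level.
    print(percent(collection, lambda a: a.writing is Writing.PARAGRAPHS))
    # Share of paragraph-level assignments that were multiple-day (Table 1's layout).
    print(percent(paragraphs, lambda a: a.duration() is Duration.MULTIPLE_DAY))
```

Table 1's multiple-day, collaborative, and graded columns are percentages of paragraph-level assignments. For that reason, the second calculation filters to paragraphs before applying `percent`.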

Teacher Interviews. Teachers were interviewed individually in May of the second year by a graduate assistant. Each teacher was asked to describe how she had changed her instruction ("What changes have occurred in your classroom as a result of the project?"), the difficulties she experienced during the project ("What was the most difficult thing for you?"), how her students reacted to these changes ("How did your students react to these changes?"), and whether she believed her changes influenced students' standardized achievement test scores ("Do you think what you are doing in your classroom will affect students' standardized achievement test scores?"). The interviews were taped and transcribed. Given the small number of participants, every interview was read by both the study's author and a graduate student. Furthermore, because the interview consisted primarily of low-inference descriptive items, few, if any, disagreements arose when coding the data.

Planning. The teachers and I met weekly during the school day to design instructional activities that incorporated as many of the project's three goals as possible. Our first step was to read all the stories in a basal unit and select those that supported a common theme. We generally selected 25% to 33% of the available stories (before the intervention, teachers used about 90% of all stories). Teachers generated a list of activities that supported the theme, and each teacher selected the activities she wanted to use. Most of the activities we listed (over 90%) did not come from the students' texts, because few of the assignments in these materials had characteristics consistent with the project's three goals (Miller & Blumenfeld, 1993). When designing instructional activities, teachers attempted to develop assignments related to students' interests and backgrounds. Teachers discussed how they planned to use each selected activity; no effort was made to have teachers use the same set of activities. Further discussions focused on each teacher's successes and difficulties in implementing different activities, e.g., how the activities were introduced and how students responded to them. Frequent visits were made to the classrooms to assist with the project's implementation.

The following instructional unit is an example of how teachers redesigned their reading and language arts lessons. Teachers read all the stories in a basal unit and identified the theme "How things change over time." One teacher began her unit with the story "An Oak Tree," which documents how a log changed during its journey down a river to a paper mill. For the first assignment, students selected objects or people they knew and described how they had changed over time. A discussion about the paper mill led to a visit to the local newspaper and the development of a class newspaper. A class thank-you letter, to which all students contributed, was published in the local paper. During this time, students read stories about producing a newspaper, held jobs on the class paper, and wrote newspaper articles. The class learned how to conduct interviews and interviewed relatives about their personal goals and how they had changed over time. Some of the interviews appeared in the class newspaper. The students then read a story about a person whose goal was to design the Alaskan flag and whose entry was accepted. Students wrote to Alaska's Chamber of Commerce to request information about this person, then wrote stories for the class newspaper. Each of these activities involved planning, paragraph-level writing, and social collaboration; most lasted for several days or weeks. The entire theme lasted five weeks. Had the teacher simply followed the basal instructor's manual, as she had previously done, students would have completed 18 assignments in two weeks, with only one requiring paragraph-level writing.

Teachers stated that this procedure increased the time they spent planning lessons but shortened the time they spent collecting and grading assignments. Moreover, teachers commented that this procedure placed greater demands on their ability to develop lessons and probably would not have been possible without planning periods during the school day.

Results

Assignment Evaluation

Table 1 presents the average number of weekly assignments for each teacher at four points in time (two for each of the two years) and their characteristics (writing level, social collaboration, and duration). During the first collection, teachers had an average of 25.9 reading and language arts assignments per week (SD = 5.2). About one-quarter of these assignments (27.7%) required paragraph-level writing. The assignments rarely lasted more than a single lesson (1.0%) or required student collaboration (6%). Thus, each day students generally completed a reading, a language arts, a spelling, a journal writing, and a homework assignment, most of which they completed alone within a single class period. These first-collection findings were consistent with the teachers' statements that they used a skills-based instructional approach.

Table 1

Total Number of Assignments and Their Characteristics for Eight Third Grade Teachers at Five Two-Week Periods Over Two Years

 

            Total number        % paragraph-level   % of paragraph-level assignments that were:
            of assignments      assignments         Multiple-day        Collaborative       Graded

High Implementation Group
Teacher F   50 / 24 / 14        30 / 17 / 36        20 / 00 / 60        20 / 00 / 40        --- / 25 / 60
Teacher G   45 / 24 / 14        33 / 17 / 36        00 / 25 / 50        13 / 25 / 50        --- / 25 / 83
Teacher H   46 / 25 / 13        29 / 28 / 46        00 / 43 / 50        00 / 43 / 50        --- / 13 / 33
Teacher I   40 / 24 / 16        20 / 29 / 38        30 / 43 / 50        17 / 43 / 50        --- / 43 / 66
Mean        45.2 / 24.3 / 11.8  28.0 / 22.8 / 40.8  12.5 / 27.8 / 52.5  12.5 / 27.8 / 47.5  --- / 16.4 / 68.8
SD          4.1 / 0.5 / 5.3     5.6 / 6.7 / 4.6     15.0 / 20.4 / 5.0   8.8 / 20.4 / 5.0    --- / 8.8 / 9.9

Low Implementation Group
Teacher K   60 / 20 / 20        12 / 35 / 20        29 / 14 / 75        00 / 14 / 25        --- / 14 / 25
Teacher L   45 / 23 / 21        40 / 22 / 19        00 / 00 / 50        00 / 60 / 50        --- / 20 / 25
Teacher M   70 / 17 / 22        21 / 24 / 23        00 / 25 / 20        10 / 00 / 40        --- / 20 / 33
Teacher N   55 / 26 / 24        29 / 19 / 25        00 / 60 / 33        00 / 40 / 33        --- / 00 / 40
Mean        57.5 / 21.5 / 21.7  25.5 / 25.0 / 21.8  7.3 / 21.3 / 44.5   2.5 / 28.5 / 37.0   --- / 13.5 / 30.8
SD          10.4 / 3.9 / 1.7    11.9 / 6.9 / 2.8    14.5 / 25.8 / 23.7  5.0 / 26.8 / 10.6   --- / 9.4 / 7.2

Note. Each cell gives Year 1 (Oct.) / Year 2 (Oct.) / Year 2 (May) values. Grading was coded only in Year 2 (--- = not collected).

A repeated-measures ANOVA showed that the total number of weekly assignments decreased consistently across the two years as the teachers modified their instruction (F(3, 18) = 60.16, p = .0001). The average fell from 26.21 at the first collection (SD = 4.89) to 14.29 at the second (SD = 2.78), 10.43 at the third (SD = 2.46), and 9.79 at the final collection (SD = 1.63). Post hoc Scheffé tests showed that all differences between earlier and later collections were significant at the .05 level except the last two comparisons.
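The degrees of freedom reported for this test follow from the design. With four collections, the collection effect has 4 - 1 = 3 degrees of freedom. With seven teachers, the error term has (4 - 1)(7 - 1) = 18. The minimal sketch below implements a one-way repeated-measures ANOVA, with teachers as subjects and collections as conditions, to show where those numbers come from. The function name is ours, and this is not the software used in the study. The numbers in the check are made up for illustration.

```python
from typing import Sequence, Tuple


def rm_anova(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data[i][j] is the score of subject i (e.g., a teacher) under
    condition j (e.g., a collection point). Returns (F, df_effect, df_error).
    """
    n = len(data)     # subjects
    k = len(data[0])  # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    row_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_effect = n * sum((m - grand) ** 2 for m in col_means)    # between conditions
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)  # stable subject differences
    ss_error = ss_total - ss_effect - ss_subjects               # subject-by-condition residual

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    if ss_error <= 0:
        raise ValueError("no residual variance; F is undefined")
    f = (ss_effect / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error


if __name__ == "__main__":
    # Three made-up subjects measured under three conditions.
    print(rm_anova([[2, 4, 6], [3, 5, 8], [4, 6, 7]]))
```

This design differs from a one-way ANOVA on independent groups because it removes the teachers' stable differences (`ss_subjects`) from the error term. A 7 x 4 input (seven teachers, four collections) yields degrees of freedom (3, 18), matching the reported test.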

Within this context, the percentage of paragraph-level writing assignments remained relatively stable across the two years (27.7% and 28.1%, respectively). Meanwhile, multiple-day assignments increased by 16.3 percentage points (from 1.0% to 17.3%) and collaborative assignments increased by 10.4 percentage points (from 6% to 16.4%). Thus, over the two-year period, the average weekly number of reading and language arts assignments decreased markedly (62%), with corresponding increases in the percentage of multiple-day and collaborative assignments. As Table 1 shows, this profile reflects the changes that occurred in each teacher's classroom.
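This paragraph reports two different kinds of change. The increases of 16.3 and 10.4 for multiple-day and collaborative assignments are differences between two percentages (percentage points). The 62%, by contrast, is a decrease relative to the starting number of assignments. The short sketch below keeps the two apart. The function names are ours. The assignment figure assumes the 62% compares the first-collection average reported earlier (25.9 assignments per week) with the final-collection mean (9.79).

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, in percent (negative = decrease)."""
    return 100.0 * (after - before) / before


# Figures taken from the text.
multi_day = percentage_point_change(1.0, 17.3)      # about 16.3 points
collaborative = percentage_point_change(6.0, 16.4)  # about 10.4 points
# Assumption: the 62% compares 25.9 (first collection) with 9.79 (final collection).
assignments = relative_change(25.9, 9.79)           # about -62.2 percent

print(round(multi_day, 1), round(collaborative, 1), round(assignments))
```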

The relative stability of paragraph-level writing assignments across the two years masks important differences in how teachers redesigned these assignments. At the beginning of the project, most paragraph assignments (more than 80%) involved 10 to 15 minutes of journal writing before the start of school. Teachers usually did not read these entries, did not use them to determine students' grades, and did not require students to revise them. The teachers replaced the journal entries with assignments that required more complex forms of writing. Students now wrote to explain their reactions to a reading selection, to create stories, or to gather information about a particular topic. The new writing assignments often lasted one or more weeks and involved multiple drafts, peer collaboration, and frequent revision. Thus, while the overall percentage of paragraph writing assignments remained stable across the two years, their demands changed dramatically.

Standardized Test Scores, Retentions, and Special Education Referrals

To evaluate how the teachers' instructional activities corresponded to changes in students' standardized achievement test scores, each teacher's scores for the project's two years (T2) were compared with her scores from the previous three years of teaching (T1). This comparison was possible for every teacher except one who had been at the school for only two years; for this teacher, we used her scores for the two years she was at the school. We could not compare the cohorts' achievement before third grade because standardized achievement tests are not given before the third grade level. The principal and teachers, however, stated that they had no reason to suspect that the ability levels of their students varied over the five years on which our comparison was based. Total national percentile scores for reading and language arts were obtained from the California Achievement Test, which is administered each April in grades 3 through 8. We conducted separate ANOVAs for the reading and language arts scores, with teacher, intervention (prior to and during), and their interaction as the independent variables. For the reading analysis, there were main effects for teacher (F(1, 761) = 3.241, p = .0502) and intervention (F(6, 761) = 3.241, p = .0038). The intervention main effect favored the intervention years (M = 67.31 versus 63.87). The teacher main effect was not examined further because differences among teachers were expected and did not interact with the intervention. For the language arts analysis, there were main effects for teacher (F(1, 761) = 10.10, p = .0001) and intervention (F(1, 761) = 19.43, p = .0001), as well as a teacher-by-intervention interaction (F(1, 761) = 2.29, p = .0334). The intervention main effect favored the intervention years (M = 72.23 versus 64.77).
Post hoc comparisons on the two-way interaction (paired t-tests) revealed significant language arts gains for four teachers (Teacher 1: t = 2.20, p = .033; Teacher 2: t = 2.41, p = .020; Teacher 3: t = 2.14, p = .041; Teacher 5: t = 3.46, p = .001). Table 2 lists the means and standard deviations for each teacher's standardized reading and language arts achievement test scores across the two time periods. None of the teachers' reading or language arts scores decreased significantly over the two-year intervention period.

Table 2

April means and standard deviations by classroom for standardized reading and language arts achievement test scores prior to (T1) and during the intervention (T2)

     

                        Reading             Language Arts
                  n     M        SD         M        SD
Teacher 1   T1    64    71.41    21.97      76.89    20.84
            T2    48    78.17    19.08      85.42    13.61
Teacher 2   T1    69    61.15    25.51      63.99    27.27
            T2    47    70.66    25.14      76.36    22.73
Teacher 3   T1    43    58.67    30.04      55.33    28.71
            T2    47    67.89    23.84      71.43    22.12
Teacher 4   T1    67    65.96    27.53      72.28    26.34
            T2    47    64.72    24.04      72.59    22.27
Teacher 5   T1    69    58.78    30.62      52.97    27.12
            T2    50    65.50    25.08      69.80    22.73
Teacher 6   T1    69    66.51    25.17      68.90    25.16
            T2    48    63.85    25.68      70.32    25.89
Teacher 7   T1    70    62.49    27.64      59.30    28.71
            T2    47    60.36    26.99      59.96    26.06
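The ANOVA results are summarized only as significance tests, so it may help to see the direction of each classroom's raw change. The sketch below simply subtracts the T1 means in Table 2 from the T2 means. It is descriptive arithmetic, not a re-analysis, and the names `TABLE_2_MEANS` and `mean_changes` are ours. Every language arts mean rose. The reading means of Teachers 4, 6, and 7 fell, each by less than three percentile points.

```python
# Table 2 means: teacher -> (reading T1, reading T2, language arts T1, language arts T2)
TABLE_2_MEANS = {
    1: (71.41, 78.17, 76.89, 85.42),
    2: (61.15, 70.66, 63.99, 76.36),
    3: (58.67, 67.89, 55.33, 71.43),
    4: (65.96, 64.72, 72.28, 72.59),
    5: (58.78, 65.50, 52.97, 69.80),
    6: (66.51, 63.85, 68.90, 70.32),
    7: (62.49, 60.36, 59.30, 59.96),
}


def mean_changes(table):
    """Return {teacher: (reading change, language arts change)} as T2 minus T1."""
    return {
        teacher: (round(r2 - r1, 2), round(l2 - l1, 2))
        for teacher, (r1, r2, l1, l2) in table.items()
    }


changes = mean_changes(TABLE_2_MEANS)
for teacher, (reading, language_arts) in changes.items():
    print(f"Teacher {teacher}: reading {reading:+.2f}, language arts {language_arts:+.2f}")
```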

The next analyses examined retentions and special education referrals. Retention data were available for the three years prior to the intervention, but special education referral data were available for only the two years prior to it. Teachers retained 23 students during the control period (10, 7, and 6) and 3 during the intervention (1 and 2), an average yearly decrease of 81%. Teachers referred 32 students for special education testing during the control period (14 and 18) and 17 during the intervention (7 and 10), a yearly decrease of 47%. These differences were spread uniformly among the teachers.
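The comparison periods differ in length: three control years for retentions, two for referrals, and two intervention years. For that reason, the yearly decreases are taken here to compare average counts per year. The minimal sketch below works under that assumption, using the counts above; the function names are ours. It reproduces the 47% referral figure and gives about 80% for retentions, close to the reported 81%.

```python
from typing import Sequence


def yearly_average(counts: Sequence[int]) -> float:
    """Average count per year over a period."""
    return sum(counts) / len(counts)


def yearly_decrease(before: Sequence[int], during: Sequence[int]) -> float:
    """Percent decrease in the average yearly count from one period to the next."""
    b, d = yearly_average(before), yearly_average(during)
    return 100.0 * (b - d) / b


retention = yearly_decrease([10, 7, 6], [1, 2])  # about 7.67 per year down to 1.5 per year
referral = yearly_decrease([14, 18], [7, 10])    # 16 per year down to 8.5 per year
print(f"retentions: {retention:.1f}% decrease; referrals: {referral:.1f}% decrease")
```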

Teachers' Interviews

Teachers' descriptions of their instructional changes reflected the goals they set at the project's beginning. Each teacher noted that she was requiring more writing assignments and integrating the reading and language arts curricula. Teacher 4 stated:

I think more quality work has been done, instead of an emphasis on quantity. I've tried to incorporate more lessons in a single assignment. We've really been doing more writing and less repetitious assignments, less purple masters, really getting more quality from the students.

Teacher 7 also contrasted the difference between her previous emphasis on isolated skill practice and the project's intended changes:

In the beginning I planned separately for each subject. I've now started doing more integration. See I really thought giving more assignments was better. And you know I was really proud of the fact that I could pack in a lot of stuff in one day. In the past I'd give them a skills lesson and then give them a skills worksheet to check up, maybe for a grade or whatever, because that is easier and it's easier for me to check and it's faster for them to do ... The more I get used to this way the more I like it.

There was less consensus about the difficulties teachers had in implementing the project's goals. The most common difficulty, mentioned by three teachers, was grading writing assignments, and their concern was whether such grading could be objective. Teacher 1 stated, "The most difficult thing is the grading part of the writing assignments. I'm still not sure how to grade the writing assignments and be totally fair." Two other teachers found the planning sessions difficult because they had to decide for themselves how they would reach the project's goals. Teacher 2 stated:

When we first started this project we were expecting to be told and shown exactly what to do and I think as time went on I realized we were going to have to decide what we were going to do and that seemed to work better for me and my classroom.

Another teacher simply stated that change of any kind was difficult for her, whereas the remaining teacher said she had experienced no real difficulties.

Teachers' responses to the question of how students reacted to the instructional changes were split: four teachers said students reacted positively, while the other three said students had difficulties at first. For example, Teacher 1 said, "They've enjoyed reading and writing a whole lot better than in the past," and Teacher 6 said, "They reacted positively -- they used to hate coming to reading." Representing the second group, Teacher 2 stated:

In the beginning they (the students) did not want to get their papers back, have to correct them, and then return them for a grade. They complained verbally about that, but after a few times of doing it, they expected to have it back, fix it, and return it. So their attitudes have changed.

In sum, none of the teachers saw any evidence that students experienced long-term difficulties with their instructional changes.

Six of the seven teachers were not convinced that their new instructional approach would have positive effects on standardized achievement test scores. Teacher 2's statements reflected this opinion:

This sort of thing teaches the children to think on their own, and to write more on their own and express themselves more individually which is the ultimate goal I'm sure. And yet, as far as the testing goes, I don't know what kind of a difference it will make.

Teacher 3's comments echoed this view:

Honest to goodness, I think they will go down because the CAT is not based on the whole-child learning. The CAT is based on individual skills and the way I teach now, I don't emphasize individual skills. I don't use the same vocabulary the CAT uses.

Teacher 1 was the only one who believed her students' standardized achievement test performances would improve: "I think it really did affect the CAT scores -- not everyone, but three-fourths of the class." She was the only teacher who saw a relationship between her instructional changes and her students' standardized achievement test scores.

Discussion

The teachers in this study became frustrated with their skills-based reading and language arts instruction because they believed students were not able to apply their skills to authentic learning situations. As a result, they altered their instructional approach so that students would have more opportunities to read and write extended prose while studying together for extended periods of time. Teachers were concerned that such changes would lower students' standardized achievement test scores, a concern they maintained despite evidence to the contrary.

Assignment Evaluation

At the start of the project, teachers generally required a daily assignment in each of reading, language arts, spelling, and journal writing; students were rarely required to collaborate, and the vast majority of assignments were completed within a designated class period. Teachers viewed this pattern of instruction as typical of their practices before the project began. They emphasized the skills covered by standardized achievement tests and avoided any assignments that limited the number of skills they could cover. As the project's goals were implemented, the number of assignments students were required to complete decreased by more than half as their classwork became more complex. Instead of finishing an assignment every 25 minutes by underlining a response or writing a single word, students now wrote multiple drafts as they studied collaboratively on assignments that lasted for several days or weeks.

Standardized Test Scores, Retentions, and Special Education Referrals

Language arts standardized achievement test scores increased over the project's two years in four of the seven classrooms. This improvement is noteworthy because, during the intervention, teachers placed far less emphasis on the language arts skills that standardized achievement tests cover. When teachers did address these skills, which was far less often than before the intervention, they did so within the context of students' writing. No differences were found in the classes' standardized reading achievement test scores.

Coupled with this improvement in standardized language arts achievement test performance was a marked decrease across all classrooms in the number of student retentions and special education referrals: retentions dropped by about 80%, and special education referrals dropped by about half. Teachers were not surprised that fewer students were being retained or referred for testing. They believed their lower achievers had benefited most from the instructional changes because these students were now required to participate in more demanding activities. Although the teachers felt the lowest achievers often needed more direction than their higher-achieving peers, they believed these students gained the most from learning and applying skills in authentic learning contexts.

Teachers' Interviews

Teachers expressed concern at the start of the project that students might struggle with the intended instructional changes because they were not accustomed to completing such challenging assignments. These concerns were not realized: teachers reported that students had few, if any, difficulties with the new assignments, and any concerns students expressed diminished as they became accustomed to the new routines. The teachers' main concern was grading writing assignments, a concern that persisted throughout the project. It should be noted, however, that parents expressed no more concerns about how teachers graded writing assignments than they had about the grading of worksheet-type assignments. Teachers were not convinced that their instructional changes had any direct effect on students' standardized achievement test performances, reasoning that the new approach did not emphasize the skills found on those tests. Such doubts reflected the teachers' belief that they had to choose between focusing on standardized achievement test skills and focusing on students' long-term literacy needs; they saw very little overlap between the two.

Implications

The study's first implication concerns why gains appeared on only one of the two standardized achievement measures and in only some classrooms. Teachers were not surprised that language arts, rather than reading, showed the greatest change on the standardized achievement tests, because they believed the project altered language arts instruction more than it altered reading instruction. Students now completed a greater variety of writing assignments, wrote for longer periods, and shared their writing with each other. In contrast, teachers did not think students read any more than they had in previous years. Students often read a particular story several times, but the amount of time they spent reading remained relatively unchanged. While this contrast might explain why gains appeared only on the language arts tests, it does not explain why gains appeared in some classrooms and not others. The assignment evaluation data offer a possible explanation. First, three of the four teachers with gains on the standardized language arts achievement test graded a higher percentage of paragraph-level writing assignments than did their colleagues (Teacher 1 = 55%, Teacher 2 = 44%, Teacher 3 = 50%, and Teacher 5 = 33%, versus Teacher 4 = 33%, Teacher 6 = 31%, and Teacher 7 = 31%). Second, classroom visits indicated that these teachers' multiple-day assignments generally lasted more than a week, whereas their colleagues' multiple-day assignments usually lasted only three or four days. Thus, gains on the standardized tests appear to have occurred in the classrooms where students completed the most complex assignments and were held accountable for them. Moreover, none of the teachers experienced a decrease in scores. As discussed below, however, teachers were not convinced that these instructional changes had a strong effect on students' standardized achievement test performances.

The second implication concerns the teachers' doubts about the relationship between instructional practices and students' standardized achievement test scores. To provide a context for this discussion, I should state that neither the teachers nor I believed a single study could explicate this complex relationship. What is most critical to this study is the limited information teachers received about test results and the way district administrators interpreted it. For example, each year teachers received standardized achievement test scores, which they placed in students' permanent record folders. Attention was drawn to how a class's average scores differed from the school's, and how the school's differed from the district's and the state's. Any change, however minimal, was viewed as significant. A drop of even two points was viewed as noteworthy! In addition, teachers could not remember how their scores had changed across several years, and they never received such longitudinal information. Teachers said that even if they had received it, they lacked the expertise to interpret it. The question of how a teacher's instructional practices might relate to her students' standardized achievement test scores was therefore never adequately addressed, because teachers had neither the information nor the expertise to answer it. What might have been an empirical question became a highly charged emotional issue with no possible resolution. Such an atmosphere seemed to undermine teachers' views of themselves and their students. Thus, the intractability of the teachers' concerns about the possible negative effects of any instructional change was reinforced by factors largely under the district's control: the distribution and interpretation of students' standardized achievement test scores.

Future Research and Limitations

The study's limitations are related to the complex relationship that exists between teachers' instructional practices and students' standardized achievement test scores (Porter, 1989; Porter, Archbald, & Tyree, 1991). At the end of the project, standardized achievement test scores had not dropped as teachers feared; on the contrary, the instructional changes appeared to have a modest positive effect in several classrooms. This improvement must be interpreted cautiously, because direct causality between instructional practices and standardized achievement test performance cannot be established by a single study. At a minimum, as stated by Koretz (1988), such findings may offer a glimmer of hope for teachers who seek to improve students' standardized achievement test scores without sacrificing the quality of their instruction. Further studies are needed so that teachers and administrators better understand the relationships between different instructional practices and various short- and long-term educational goals.


References

Anderson, R. C., Hiebert, E. H., Scott, J. A., & Wilkinson, I. A. (1985). Becoming a nation of readers. Washington, DC: National Institute of Education.

Bennett, N., Desforges, C., Cockburn, A., & Wilkinson, B. (1984). The quality of pupil learning experiences. Hillsdale, NJ: Erlbaum.

Blumenfeld, P. C., Mergendoller, J., & Swarthout, D. (1987). Task as a heuristic for understanding student learning and motivation. Journal of Curriculum Studies, 19, 135-148.

Borkowski, J., Carr, M., Rellinger, E., & Pressley, M. (1990). Self-regulated cognition: Interdependence of metacognition, attributions, and self-esteem. In B. F. Jones & L. Idol (Eds.), Dimensions of thinking and cognitive instruction (pp. 53-92). Hillsdale, NJ: Erlbaum.

Brown, R. G. (1989). Testing and thoughtfulness. Educational Leadership, 46, 31-34.

Brown, R. G. (1991a). Schools of thought: How the politics of literacy shape thinking in the classroom. San Francisco: Jossey-Bass.

Brown, R. G. (1991b). Policy and the rationalization of schooling. In E. H. Hiebert (Ed.), Literacy for a diverse society: Perspectives, practices, and policies (pp. 279-298). New York: Teachers College Press.

Cannell, J. J. (1989). How public educators cheat on standardized achievement tests. Albuquerque, NM: Friends of Education.

Cohen, D. K., & Spillane, J. P. (1992). Policy and practice: The relations between governance and instruction. In G. Grant (Ed.), Review of research in education (Vol. 18, pp. 3-50). Washington, DC: American Educational Research Association.

Darling-Hammond, L., & Wise, A. E. (1985). Beyond standardization: State standards and school improvement. Elementary School Journal, 85, 315-336.

Doyle, W. (1983). Academic work. Review of Educational Research, 53, 159-199.

Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 20, 2-7.

Hiebert, E. H., & Calfee, R. C. (1992). Assessing literacy: From standardized tests to portfolios and performances. In S. J. Samuels & A. E. Farstrup (Eds.), What research has to say about reading instruction (pp. 70-101). Newark, DE: International Reading Association.

Jennings, W., & Nathan, J. (1977). Startling, disturbing research on school program effectiveness. Kappan, 58, 568-572.

Koretz, D. (1988). Arriving in Lake Wobegon: Are standardized tests exaggerating achievement and distorting instruction? American Educator, 12, 8-15.

Madaus, G. (1988). The influence of testing on the curriculum. In L. Tanner (Ed.), Critical issues in the curriculum (pp. 83-121). Chicago: University of Chicago Press.

Mehrens, W. A., & Kaminski, J. (1988). Using commercial test preparation materials for improving standardized test scores: Fruitful, fruitless, or fraudulent? Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Mehrens, W. A., & Kaminski, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless, or fraudulent? Educational Measurement: Issues and Practice, 8, 14-22.

Merritt, J. (1978). Reading, writing and relevance. London: Open University Press.

Miller, S. D., & Blumenfeld, P. C. (1993). Characteristics of tasks used for skill instruction in two basal readers. Elementary School Journal, 93, 33-47.

Miller, S. D., Adkins, T., & Hooper, M. L. (1993). Why teachers select certain tasks and students' reactions to them. JRB: A Journal of Literacy, 25, 69-96.

Paris, S. G., Calfee, R. C., Filby, N., Hiebert, E. H., Pearson, P. D., Valencia, S. W., & Wolf, K. P. (1992). A framework for authentic literacy assessment. The Reading Teacher, 46, 88-99.

Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991a). A developmental perspective on standardized achievement testing. Educational Researcher, 20, 12-20.

Paris, S. G., Wasik, B. A., & Turner, J. C. (1991b). The development of strategic readers. In R. Barr, M. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (Vol. II, pp. 609-640). New York: Longman.

Porter, A. C. (1989). External standards and good teaching: The pros and cons of telling teachers what to do. Educational Evaluation and Policy Analysis, 11, 343-356.

Porter, A. C., Archbald, D. A., & Tyree, A. K. (1991). Reforming the curriculum: Will empowerment policies replace control? In S. H. Fuhrman & B. Malen (Eds.), The politics of curriculum and testing (pp. 11-36). New York: Falmer Press.

Pressley, M., Goodchild, R., Fleet, J., Zajchowski, R., & Evans, E. D. (1989). The challenges of classroom strategy instruction. Elementary School Journal, 89, 301-342.

Resnick, D. P., & Resnick, L. B. (1985). Standards, curriculum, and performance: A historical and comparative perspective. Educational Researcher, 14, 5-20.

Shepard, L. A. (1991). Negative policies for dealing with diversity: When does assessment and diagnosis turn into sorting and segregation? In E. H. Hiebert (Ed.), Literacy for a diverse society: Perspectives, practices, and policies (pp. 279-298). New York: Teachers College Press.

Slavin, R. E. (1990). Cooperative learning: Theory, research, and practice. Englewood Cliffs, NJ: Prentice Hall.

Smith, F. (1989). Insult to intelligence: The bureaucratic invasion of our classrooms. Portsmouth, NH: Heinemann.

Smith, J. L. (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20, 8-11.

Smith, M., & O'Day, J. (1990). Systemic school reform. In S. Fuhrman & B. Malen (Eds.), The politics of curriculum and testing (pp. 233-267). Philadelphia: Falmer Press.

Stodolsky, S. S. (1984). Frameworks for studying instructional processes in peer work groups. In P. L. Peterson, L. C. Wilkinson, & M. T. Hallinan (Eds.), The social context of instruction: Group organization and group processes (pp. 107-124). New York: Academic Press.

Stodolsky, S. (1988). The subject matters! Classroom activity in math and social studies. Chicago: University of Chicago Press.

Thomas, J. T., Strage, A., & Curley, R. (1988). Improving students' self-directed learning: Issues and guidelines. Elementary School Journal, 88, 313-326.

Valencia, S. W., Pearson, P. D., Peters, C. W., & Wixson, K. K. (1989). Theory and practice in statewide reading assessment: Closing the gap. Educational Leadership, 46, 57-64.

Weinstein, C., & Mayer, R. (1986). The teaching of learning strategies. In M. Wittrock (Ed.), Handbook of research on teaching (pp. 315-327). New York: Macmillan.

Wittrock, M. (1986). Students' thought processes. In M. Wittrock (Ed.), Handbook of research on teaching (pp. 297-314). New York: Macmillan.