An Analysis of the Assessment Requirements
Mandated by IASA Title I Legislation 1

Mark D. Reckase
Michigan State University
May 1999

Overview

Title I of the Improving America's Schools Act of 1994 (IASA), the reauthorization and expansion of the Elementary and Secondary Education Act of 1965 (ESEA), is extremely ambitious legislation of very broad scope. While the full text of the legislation is much too lengthy to reproduce here, the statement of purpose, with its nine mechanisms for achieving that purpose, is presented below to provide the motivation for the following sections of this paper.

Statement of Purpose. The purpose of this title is to enable schools to provide opportunities for children served to acquire the knowledge and skills contained in the challenging State content standards and to meet the challenging State performance standards developed for all children. This purpose shall be accomplished by:

    1. ensuring high standards for all children and aligning the efforts of States, local educational agencies, and schools to help children served under this title to reach such standards;
    2. providing children an enriched and accelerated educational program, including, when appropriate, the use of the arts, through schoolwide programs or through additional services that increase the amount and quality of instructional time so that children served under this title receive at least the classroom instruction that other children receive;
    3. promoting schoolwide reform and ensuring access of children (from the earliest grades) to effective instructional strategies and challenging academic content that includes intensive complex thinking and problem-solving experiences;
    4. significantly upgrading the quality of instruction by providing staff in participating schools with substantial opportunities for professional development;
    5. coordinating services under all parts of this title with each other, with other educational services, and, to the extent feasible, with health and social services programs funded from other sources;
    6. affording parents meaningful opportunities to participate in the education of their children at home and at school;
    7. distributing resources, in amounts sufficient to make a difference, to areas and schools where needs are the greatest;
    8. improving accountability, as well as teaching and learning, by using State assessment systems designed to measure how well children served under this title are achieving challenging State student performance standards expected of all children; and
    9. providing greater decision-making authority and flexibility to schools and teachers in exchange for greater responsibility for student performance.

(IASA, Sec.1001(d))

Parts (1), (8), and (9) of the Statement of Purpose provide the catalyst for this paper. Collectively, these parts indicate that assessment should be used to determine whether schools are meeting their responsibility to provide educational programs that will help students to meet standards. Parts (2) through (7) elaborate on the needs to be addressed by the program and the audience for assessment results. That is, the assessments should facilitate reform by providing useful information to students, parents, teachers, and educational administrators.

U.S. Department of Education (USDE) Secretary Richard Riley (1999) gives a concise restatement of this purpose as it applies to the current educational context. The goal of the legislation is to "raise expectations for all children, helping States and school districts to set high standards and establish goals for improving student achievement. The 1994 Act included provisions to improve teaching and learning, increase flexibility and accountability for states and local districts, strengthen parent and community involvement, and target resources to the highest poverty schools and communities." Because of the current emphasis on high standards, this statement stresses the accountability use of the assessment results.

The goal of this paper is to present the results of an analysis of the ESEA and IASA legislation and supporting documentation, focused specifically on determining the role of assessment in the implementation of the legislation. Both the stated and implied assessment requirements are considered, and practical implementation issues are discussed. The paper first presents an overview of the legislation, followed by a summary of assessment requirements. Finally, an approach to assessment for Title I purposes is described.

Implementation of Title I.

The ESEA/IASA legislation is very complex and dynamic. While the general goals of the legislation have remained the same over its 33-year existence, the details of the legislation and the specifics of implementation have been fine-tuned over the years. The dynamic nature of the legislation explains, to some extent, why Title I reforms are not yet complete. The outcomes of the program are continually being evaluated, resulting in refinements and reorientations of the goals. As the goals change, the requirements for receiving Title I funds change, and the states and school districts must adapt to the changes. One result is that many states and districts are still in the process of phasing in Title I related reforms.

Another reason that Title I implementation is ongoing is that the process of educational reform takes time. A recent report from the National Assessment of Title I (NATI) acknowledges the slow pace of educational reform.

Early standards-setting states found that it takes three to five years for standards setting to move from state legislation, to the development of consensus by teachers, parents, businesses, and institutions of higher education, to implementation. At the end of that period, standards exist, but assessments, implementation activities, and professional development are just being planned (USDE, 1999).

This same report indicates that despite the slow pace of change, progress is being made toward the stated goals of Title I legislation. "The recent achievement gains of students whom Title I is intended to benefit provide clear indication that Title I, and the larger educational system it supports, is moving in the right direction (USDE, 1999)." Moreover, the IASA legislation states that "The achievement gap between disadvantaged children and other children has been reduced by half over the last two decades (IASA Sec. 1001(b)(1))." Whether these results can be attributed to the Title I program is open to debate, but Secretary Riley (1999) considers them a positive indication that efforts are moving in the right direction.

Changing Focus.

In line with the dynamic nature of Title I, the 1994 reauthorization changed the focus of the program to some degree. In the initial ESEA legislation, the goal was stated as: "payments under this title will be used for programs and projects...(A) which are designed to meet the special educational needs of educationally deprived children in school attendance areas having high concentrations of children from low-income families and (B) which are of sufficient size, scope, and quality to give reasonable promise of substantial progress toward meeting those needs." (Sec. 205(a)(1)) This goal was to be accomplished by ensuring that students from low-income families receive the same challenging curriculum as their peers from higher-income families.

The IASA reauthorization continued to acknowledge that "the most urgent need for educational improvement is in schools with high concentrations of children from low-income families," (Sec. 1001(b)(2)) but it also extended the program to "children with limited English proficiency, children of migrant workers, children with disabilities, Indian children, children who are neglected or delinquent, and young children and their parents who are in need of family-literacy services..." (Sec. 1001(b)(3)) Also, a general school improvement theme was added. Title I programs "need to become even more efficient in improving schools in order to enable all children to achieve high standards..." (Sec. 1001(b)(4)) The theme of school improvement is further reinforced by reference to Goal 3 of the Goals 2000: Educate America Act:

    3. STUDENT ACHIEVEMENT AND CITIZENSHIP. By the year 2000, all students will leave grades 4, 8, and 12 having demonstrated competency over challenging subject matter including English, mathematics, science, foreign languages, civics and government, economics, arts, history, and geography, and every school in America will ensure that all students learn to use their minds well, so they may be prepared for responsible citizenship, further learning, and productive employment in our Nation’s modern economy. (Sec. 102(3))

The overall result of these changes to ESEA is that IASA funds can be used for a wider variety of educational purposes than were funded under the original ESEA. The breadth of these purposes makes evaluation of the success of IASA extremely challenging. Since assessment is a critical component in the evaluation of success, the assessment requirements for IASA Title I reflect the challenging nature of this task.

Assessment Requirements for IASA Title I.

The assessment requirements for IASA Title I are quite explicit and extensive. They include:

    1. yearly student assessments in mathematics and reading or language arts;
    2. the same assessments for all children;
    3. assessments aligned with challenging content and performance standards;
    4. assessments that are reliable and valid and meet technical standards for quality;
    5. assessments administered at some time during grades 3 through 5, grades 6 through 9, and grades 10 through 12;
    6. assessments involving multiple up-to-date measures of performance;
    7. assessments measuring higher order thinking skills and understanding;
    8. reasonable adaptations and accommodations for students with diverse learning needs;
    9. for students with limited English proficiency, assessment in the language most likely to yield accurate and reliable information on what students know and can do in subject matter areas other than English;
    10. interpretive and descriptive reports for individual students;
    11. reports by State, local education agency, and school, disaggregated by gender, racial and ethnic group, English proficiency status, migrant status, disability status, and economic status; and
    12. reports of the accomplishment of adequate yearly progress goals.
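Requirement (11), disaggregated reporting, amounts computationally to grouping student results by a reporting category and summarizing each subgroup. A minimal sketch, assuming a simple dictionary data layout (the function name and layout are illustrative, not part of any Title I specification):

```python
from collections import defaultdict

def disaggregate(scores, group_of):
    """Group student scores by a reporting category (e.g., gender,
    English proficiency status, migrant status) and return each
    subgroup's mean score."""
    groups = defaultdict(list)
    for student, score in scores.items():
        groups[group_of[student]].append(score)
    return {g: sum(v) / len(v) for g, v in groups.items()}
```

The same grouping step would apply whatever summary statistic a state reports, whether a mean scale score or the percentage of students reaching the proficient level.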

Some of these requirements appear contradictory. For example, (2) specifies that the same assessment be used for all children, while (8) and (9) suggest different assessments for students with special needs. Other requirements need to be combined to clearly indicate the characteristics of the needed assessments. Number (5) indicates the grade levels at which students need to be tested.

Overall, the assessment requirements place extensive demands on schools that receive Title I funds. At the time of the implementation of the IASA legislation, many states did not have assessment programs that could support all of these requirements. To realistically address the issue, the legislation allows states to develop and phase in the necessary assessment features over a period of years. However, if appropriate assessments are not in place by 2001, States applying for Title I funds will need to use assessment programs that have been approved for other States.

The twelve requirements for assessments in the legislation can be organized around four major topics: (1) stimulating instructional improvement; (2) documenting achievement of well defined goals; (3) using multiple measures to describe student capabilities; and (4) holding schools accountable for results. Each of these topics is discussed below.

Stimulating Instructional Improvement. In recent years the use of assessment has shifted from being solely a source of information about the functioning of educational programs to a mechanism for changing instructional practice. For example, Lesh and Lamon (1992) indicate that "High-stakes tests are widely regarded as powerful leverage points to influence curriculum reform, because such tests tend to be aimed precisely at the infrastructure of schooling." (p. 8) The USDE (1996) indicates that "Research suggests that high standards, when coupled with valid and reliable assessments and aligned support, can exert a powerful influence over what children are taught and how much they learn." Riley (1999) considers "basic skills exams at different grade levels" a desirable way to make things happen.

Title I legislation contains a number of features suggesting that assessment should guide instruction. First, the assessments are required to measure higher order thinking and understanding. The implication is that if schools do not currently teach higher order thinking and understanding, they should change their curriculum to include such material. Second, adequate yearly progress must be defined and its level of accomplishment documented. The legislation requires the use of content and performance standards and assessments that are aligned with those standards. These features of the legislation oblige schools to make their goals more explicit. If schools comply with all of these requirements, the result will be tightly integrated goals, instructional interventions, and assessments.

While there is some evidence that Title I is succeeding in moving target schools toward an integrated instruction/assessment program, there is also some concern that simple checklists are used to determine whether States are meeting Title I requirements. "Simple measures of the status of state-adopted standards run the risk of oversimplification. Instead, sensitive analyses must take into account the multiple dimensions that standards currently address and the flexibility that states have to address standards in unique ways (USDE, 1999)." Thus, merely specifying goals for educational programs is not sufficient. The educational approach must be carefully monitored to determine whether the spirit of the legislation is being followed as well as the letter of the law.

Documenting Achievement. The IASA legislation requires that states assess students to determine whether they have met the state’s performance standards. Specifically, the legislation requires:

Each State plan shall demonstrate that the State has developed or adopted a set of high-quality, yearly student assessments, including assessments in at least mathematics and reading or language arts, that will be used as the primary means of determining the yearly performance of each local educational agency and school served under this part in enabling all children served under this part to meet the State’s student performance standards. (Sec. 1111(b)(3))

An additional requirement for the assessments is included in the Federal regulations for the 1994 IASA Title I. These regulations state that three levels of standards must be defined. These include "two levels of high performance, proficient and advanced, that determine how well children are mastering the material in the State’s content standards," (Sec. 200.2(a)(2)(i)(B)) and "a third level of performance, partially proficient, to provide complete information to measure the progress of lower performing children toward achieving to the proficient and advanced levels of performance." (Sec. 200.2(a)(2)(i)(C))

States applying for Title I funds must produce a state assessment plan and submit that plan to the USDE for approval. The guidelines for reviewing the assessments are provided in "Reviewer Guidance for State Content and Performance Standards Under Title I" (USDE, undated). The plan must document that standards and assessments are in place or are scheduled to be in place. The state must also document that the standards are challenging. If a state is not developing standards for all students, it must develop them for the students participating in Title I, Part A programs (USDE, 1999). The goal is to determine whether "students in high-poverty schools will show gains in reading/language arts and math at least comparable to those of other students in their state (USDE, 1996)."

While there are detailed requirements and guidelines for Title I assessments, states also have substantial flexibility in how they meet the requirements. Title I does not require a specific type of assessment. If a state has its own accountability measures, they can be used for Title I purposes. However, the USDE will encourage states to produce an accountability system if one does not exist because it is "good educational policy and the right thing to do for the children." The definition of adequate yearly progress and the overall standards for performance are also left to the states to define. However, the assessments must be carefully aligned to the goals and standards that the state specifies.

Using Multiple Measures. The IASA requires that the state assessments "involve multiple up-to-date measures of student performance, including measures that assess higher order thinking skills and understanding (Sec. 1111(b)(3)(E))." The implications of this part of the legislation for assessment practices are unclear. The regulations that support the IASA legislation give little guidance. The only relevant comment seems to be: "For example, an assessment in an academic subject such as social studies may sufficiently measure performance in reading/language arts. Particularly at the secondary level, the Secretary believes it may be especially appropriate to measure performance in reading/language arts through assessment in content areas." (USDE, 1995) This could be interpreted to mean that a content area test in social studies could be used as one of the multiple measures used to assess reading/language arts proficiency.

Although the requirement of multiple measures is part of the legislation, the 1996 NATI report provides no information about states’ compliance with it. In fact, the report makes no mention of the need to use multiple measures to document the performance of Title I students; nor does the final report on Title I (USDE, 1999). It seems that this requirement is not given much emphasis in practice.

Holding Schools Accountable. IASA/ESEA Title I provides money to schools to help them educate disadvantaged students to high educational standards. Attached to the acceptance of these funds is the requirement that schools show that these goals have been accomplished. The ESEA states "that effective procedures, including provision for appropriate objective measurements of educational achievement, will be adopted for evaluating at least annually the effectiveness of the programs in meeting the special educational needs of educationally deprived children." (Sec. 205(a)(5)) Further, local education agencies (LEAs) are to make annual reports of progress to the state educational agency about the educational achievement of students in Title I programs.

There has been continuing concern that the required evaluations of programs have not been done well. The first report on the progress made by the Title I schools provides this comment on the quality of the documentation that was provided.

This report represents the first national effort at self-evaluation of broad educational programs designed to assist educationally deprived children. Although it falls short of long-range goals for accurate assessment of progress, it provides a guide for state and local agencies to improve their evaluation procedures, and it illuminates the need for more attention to the testing and assessment objectives of the Elementary and Secondary Education Act of 1965. (U.S. Department of Health, Education and Welfare, 1967)

Bailey and Mosher (1968) were less diplomatic in their review of Title I evaluation procedures. They felt the reports of progress "fell woefully shy of measured and objective evaluations of accomplishments." (p.128) "The precise impact upon the educational achievement of affected youngsters, however, could not generally be measured. " (p.166)

Then Commissioner of Education Harold Howe II (U.S. Office of Education, 1967) reported that "Forty states presented incomplete test data and eleven presented none." The reasons given for the failure to provide the required data were:

  1. There was not enough uniformity in the objective tests to justify a compilation.
  2. Post testing was not attempted because the state was committed to obtain only baseline data in fiscal year 1966.
  3. Test data could not be compiled.
  4. There were no statewide testing programs.
  5. Appropriate measuring instruments were not available.

Despite these inadequacies in reporting, the first Annual Report indicated that some states felt "Title I money resulted in higher student achievement, especially in reading, and especially with children."

The early problems with holding schools accountable have not disappeared. Jaeger and Tucker (1998) suggest, at the time of writing A Guide to Practice for Title I and Beyond, that "states and school districts have not yet begun collecting the information needed to address all Title I requirements..." Similarly, a National Center for Educational Statistics (1998) report indicates that principals "are probably not fully aware of what implementing the changes (required by Title I) would entail, and that they are not very far along in the process of implementation."

Several approaches have been or are being put in place to hold schools accountable. Congress mandated a National Assessment of Title I (NATI). The mandate requires examination of "both student and system performance at the national level (USDE, 1996)." Riley (1999) calls for "strong accountability mechanisms" to demonstrate that Title I funds are used appropriately. To support the accountability function, Riley (1999) recommends an annual report card at the state, district, and school level as a requirement for receiving IASA funds. Among other things, the report card should provide information on student achievement. This information should be reported by demographic subgroup to allow greater focus on gaps between disadvantaged students and other students. The Voluntary National Test is suggested as one vehicle for the accountability role. "The inclusion of all children in appropriate assessments is intended to hold school systems accountable for all children, whether or not they have limited-English-proficiency or disabilities." (USDE, 1996)

Summary. This overview of the major assessment goals for Title I should give some sense of the complexity of the assessment requirements that are built into the legislation. It is not enough to give a test and look for improvement. The assessments must be tightly connected to standards, and they must be sensitive to changes in students’ performance. The assessments should support instruction, but also be useful for holding schools accountable for properly using Title I funds. These are all very demanding requirements.

To further define the assessment requirements for Title I, the next section of this paper will summarize the characteristics of the student population and the educational content to be assessed. These features must be considered when assessments for Title I are designed and developed.

The Assessment Targets

Student Population.

Eligibility for Title I funds under the original ESEA (1965) authorization was based on a number of criteria. First, students were identified who were from families with incomes less than a "low-income factor." That factor was initially set at $2,000. Then, a school district was eligible for funds if it had at least 100 students, or 3% of its student population, whichever was less, in the low-income group. In no case could the number be less than 10. Students in the age range from five to 17 could be considered in determining the number of students with low-income status. The Department of Commerce was charged with providing data for determining the number of students who met the Title I criteria.
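The eligibility computation described above reduces to a simple threshold rule. A sketch under the stated criteria (the function name and signature are illustrative; the statute, not this sketch, governs any edge cases):

```python
def title_i_eligible(low_income_count: int, total_enrollment: int) -> bool:
    """Sketch of the original ESEA (1965) district-eligibility rule:
    a district qualified if its count of low-income students (ages 5-17)
    reached 100 or 3% of enrollment, whichever was less, subject to an
    absolute floor of 10 students."""
    threshold = min(100, 0.03 * total_enrollment)
    threshold = max(threshold, 10)  # the qualifying count could never be below 10
    return low_income_count >= threshold
```

For example, a district of 1,000 students would need min(100, 30) = 30 low-income students to qualify, while a district of 200 students would face the floor of 10 rather than min(100, 6) = 6.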

During the first year of Title I implementation (1965), about 92% of all LEAs were eligible for Title I funds. Of those that were eligible, about 70% actually participated. The first-year report indicates that Title I served 8.3 million children. Most of these children (65%) were enrolled in pre-school through grade six, and almost all (92%) were from public schools (Bailey & Mosher, 1968).

IASA maintained the focus on students from "higher poverty schools and communities," but the 1994 reauthorization provided more flexibility in the use of Title I funds so that all students in eligible schools could benefit from the funding. This flexibility allows students to be helped with programs funded by Title I without being labeled Title I students. The result is that the population of students that should be assessed for Title I purposes is broader than the one that meets the strict Title I definition.

The current program reaches children "primarily in the early elementary grades: one in five first graders participates" (USDE, 1996). In 1996-97, 11 million students were served by Title I. The ethnic characteristics of this population were: 36% white; 30% Hispanic; and 28% African-American. Overall, 17% were limited English proficient. Students attended approximately 51,000 schools. Of these schools, about 44% use Title I funds for schoolwide programs. The rest use funds for in-classroom or pull-out programs (National Center for Educational Statistics, 1998). While the number of students served by Title I has increased over the years, there is still the need to work with "more high-poverty high schools (USDE, 1996)."

The statistics in the USDE reports give the impression that the Title I student population is well defined, but in reality, there is confusion in the definition of the target population. By default, enrollment in Free and Reduced Price Lunch Programs seems to be the criterion for being considered for Title I, but whole school programs bring in a variety of other students (New York State Education Department, 1998). IASA also specifically includes students in other categories besides economically disadvantaged. Students with limited English proficiency, children of migrant workers, those with disabilities, Indian children, children who are neglected or delinquent, and young children and their parents who are in need of family literacy services are also eligible under the IASA legislation. Jaeger and Tucker (1998) see the addition of these categories as raising issues about the "specification of appropriate definitions for categories of students."

In addition to the targeted categories of students, the Federal regulations for IASA give targeted grade levels. Assessments must be administered at some time during Grades 3 through 5, Grades 6 through 9, and Grades 10 through 12 (Sec. 200.4(b)(5)). However, the actual Title I population contains relatively few secondary school students. Data from 1995 indicate that only 30 percent of public secondary schools participate. These numbers may increase with a renewed emphasis on secondary education.

Content Domains.

The IASA legislation explicitly specifies that schools that use Title I funds must assess their students "in at least mathematics and reading or language arts." (Sec. 1111(b)(3)) There is little additional guidance in the IASA regulations concerning definitions of these curriculum areas or how to determine whether assessments are aligned with the content domains. Currently, the only source of guidance in this area is an informal set of instructions for reviewers of programs (USDE, undated). More detailed information about the definitions of the curriculum areas is presented in a series of reports by the Council of Chief State School Officers (CCSSO). According to CCSSO (1998), 37 states have curriculum standards ready for implementation in English/language arts and 42 states have them in mathematics. CCSSO (1997) gives examples of content standards and benchmarks for states that have produced such standards.

The general impression left by these documents is that states are quite varied in the way they present standards. For example, Colorado has six mathematics standards for Grade 8; Illinois has five mathematics standards for the middle school/junior high level. The standards seem to be on the same general topics, but they are at different levels of detail. For example, the Colorado Model Content Standards for Mathematics include a number sense standard that is described in this way:

Students develop number sense and use numbers and number relationships in problem-solving situations and communicate the reasoning used in solving these problems.

The number sense standard in Illinois Academic Standards in Learning Areas is:

Demonstrate knowledge and sense of numbers and their representation, including basic operations, ratios and proportions, by using multiple ways of obtaining exact values and estimates to understand patterns involving numbers and their applications.

While these two standards cover roughly the same content, they are at different levels of detail and have slightly different emphases. This variability in level of detail and focus is a general characteristic of all state content standards.

In order to provide a framework for addressing the assessment of content standards relative to the Title I requirements, a conceptual assessment model will be presented followed by an example of its implementation. The model is very general and should apply equally well to the wide variety of state assessment systems.

A Conceptual Model for the Interaction of Assessment and Instruction

Assessment does not occur in a vacuum. It is always embedded within some cultural setting. Assessments required by Title I occur within the environment of the educational system. To provide a context for further discussion of assessment issues related to Title I, a brief discussion of the functioning of assessment within the educational system is provided. Special emphasis is given to the impact of assessment on instruction.

Goals for Instruction.

The goals for instruction are to (1) specify the content and skills to be learned, (2) provide a framework for developing the content and skills in students, and (3) provide support to the students as they acquire the content and skills. Assessment can be conceptualized as both a part of the instructional process and an external independent device for monitoring the quality of education. In recent years, there has been an increased emphasis on assessment as part of the instructional process rather than as an external monitoring device. For example, Wiggins (1998), in his book Educative Assessment, indicates "The proposals presented in this book are all based on a simple principle: assessment should be deliberately designed to improve and educate student performance, not merely to audit it as most school tests currently do." (p. xi)

In the specific case of Title I, assessment has both the goal of supporting instruction and the goal of auditing the success of the instructional intervention. To meet these dual goals, the assessments need to provide two different types of information. The first is information that can inform the instructional process. Many have argued that this information should include models of good performance, diagnostic information that can be used to customize instruction to meet the needs of individuals or small groups of students, and timely feedback that can be used to reinforce progress or provide early warning to those not meeting expected levels of performance (e.g., Tombari & Borich, 1999). The second type of information concerns the functioning of educational institutions at the classroom, school, or district level. Such information supports the accountability requirement of the Title I program. Despite the different levels of detail and types of use for the two types of assessment information, both focus on the same complex knowledge and skill domains: the domains defined by each state’s curriculum in reading/language arts and mathematics.

Assessment Model.

In a very general sense, assessments are based on observations of the interactions of students with assessment tasks. An evaluation of the product of a student’s interaction with the task provides the assessment information. For example, for multiple-choice tests, students’ answer choices are the products of the interaction of students with test items. These answer choices are evaluated and converted to 0/1 item scores. Frequently, the item scores are summed to produce raw test scores, but testing programs like NAEP use more complex IRT-based methods for converting the item scores to reported scores (Educational Testing Service, 1990).
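As a concrete illustration of this scoring model, the sketch below converts multiple-choice answer choices into 0/1 item scores and a summed raw score. The answer key and student responses are invented for the example; they do not come from any actual assessment.

```python
# Minimal sketch of multiple-choice scoring: each answer choice is evaluated
# against a key and converted to a 0/1 item score; the item scores are then
# summed to a raw score. Key and responses are hypothetical.

def score_responses(responses, key):
    """Return the 0/1 item scores and the raw (summed) score."""
    item_scores = [1 if r == k else 0 for r, k in zip(responses, key)]
    return item_scores, sum(item_scores)

key = ["B", "D", "A", "C", "B"]       # hypothetical 5-item answer key
student = ["B", "D", "C", "C", "B"]   # one student's answer choices

items, raw = score_responses(student, key)
print(items, raw)   # [1, 1, 0, 1, 1] 4
```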

In recent years, there has been greater emphasis on having students interact with more elaborate tasks to support the instructional role of assessment (Mitchell, 1992). The essays produced in response to writing prompts and the outcomes of hands-on science assessments are of this type. Highly trained readers using carefully prepared scoring rubrics evaluate the products of these interactions. The end result of most assessment is a numerical score from each student/task interaction that is combined or aggregated to provide the assessment information that is needed. It is not always necessary to give assessment results in quantitative form. Results can also be provided as narratives or using other descriptive devices. Results in non-quantitative forms are more difficult to aggregate, but they can be very useful in supporting instructional goals.

Portfolio assessment provides a means of collecting interactions of the students with even more elaborate tasks. Very often the tasks are instructional activities themselves, rather than fixed tests. Still, the assessment result is an evaluation of the product produced by the interaction of the student with the task. The evaluation of the product is more difficult, expensive, and time consuming, but the same conceptual model applies.

In some cases, the important outcome of the interaction of the student with the task is a process rather than a product. Process outcomes can be evaluated by observers and accumulated or aggregated to give quantitative measures of performance.

Implications for Assessment Design.

When the assessment of student capabilities is considered as the collection of the outcomes of the interactions between students and assessment tasks, the test design process becomes one of determining how the tasks will be selected and how the environment for the interactions with the tasks will be designed. The possible assessment tasks vary in the amount of time they take and the types of products that they generate. There are three different philosophies for the selection of assessment tasks. The first is that the tasks are considered a sample from a domain of possible tasks related to a particular content area. Although the theory underlying this approach assumes random sampling from a well-defined set of tasks (Nunnally, 1967, p. 175), anything close to a random sample is seldom used in practice. At best, the domain sampling approach results in assessment tasks that span the content given in a domain definition.
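The domain sampling idea can be sketched in a few lines: given a pool of tasks organized by content strand, draw a fixed number of tasks from each strand so that the resulting form spans the domain definition. The strand names and pool sizes below are hypothetical.

```python
# Sketch of domain sampling: a stratified random draw from a task pool so that
# every content strand in the domain definition is represented on the form.
# Strand names, task identifiers, and pool sizes are hypothetical.
import random

def sample_tasks(pool, n_per_strand, seed=0):
    """Draw n_per_strand tasks at random from each content strand."""
    rng = random.Random(seed)   # fixed seed for a reproducible form
    return {strand: rng.sample(tasks, n_per_strand)
            for strand, tasks in pool.items()}

pool = {
    "number sense":  [f"NS-{i}" for i in range(40)],
    "geometry":      [f"GE-{i}" for i in range(40)],
    "data analysis": [f"DA-{i}" for i in range(40)],
}

form = sample_tasks(pool, n_per_strand=5)
print({strand: len(tasks) for strand, tasks in form.items()})  # 5 tasks per strand
```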

The second philosophy for selecting tasks for a test is identifying tasks that provide information about the location of an examinee on a continuum defined by a hypothetical construct. The goal is to select tasks that are highly related to performance on the hypothetical continuum (Lord & Novick, 1968, p. 351). The resulting scale is created to provide measures of the hypothetical construct. This model for assessment does not match the requirements for Title I since the curriculum areas are not usually defined as unidimensional continua. Rather, they are complex collections of knowledge and skill domains.

The third philosophy of test development requires selecting tasks that are indicators of particular capabilities, but that do not themselves necessarily require the capabilities. Performance on the indicators is summarized as an index. This approach is used in economics to create such measures as the Dow Jones Industrial Index. The Dow Jones Industrial Index is not based on a representative sample of stocks, but rather is based on a collection of stocks that are judged to be important indicators of the health of the economy. It seems possible to construct a similar indicator of academic performance. In fact, a "market-basket" index has been suggested as a means for reporting NAEP results (Mislevy, 1998). While useful as a summary measure, the indicator approach does not provide detailed information that can be used to guide classroom instruction.

Of these three philosophies of test development, the domain sampling approach seems to best match Title I requirements. The remaining portion of this paper will use the domain sampling approach to test development to frame the discussion of the issues related to three major factors of assessment in Title I. These are: (1) the reporting requirements for Title I; (2) the content domains to be assessed; and (3) the match of the assessments to the student population.

Issues in Assessment of Title I Outcomes

Reporting Issues.

Assessment devices are developed to provide information about the capabilities of specific individuals for purposes specified by the audiences for the assessment results. To the extent that the needs of the audiences are met, the assessment approach is successful. For this reason, the discussion of assessment issues related to Title I begins with the reporting requirements for the assessments. These requirements indicate the types of information that need to be provided by the assessments.

 

The review of assessment requirements for Title I indicates that accountability is the major focus of the reporting requirement. Secretary Riley clearly makes this point. Educational administrators at the school, state, and Federal level are the primary audience for accountability information. Secondary audiences are the teachers, parents, and students, but there is little evidence that reports of Title I results are reaching teachers, parents, and students in a form that is useful to them. Certainly, state testing programs provide reports of results for teachers, parents, and students, but these reports would exist even if Title I did not.

There are two facets to the accountability requirements for Title I assessments. The first facet relates to the goal of closing the gaps between the performance of at-risk groups and the performance of the rest of the student population. In order to determine whether the gaps are closing, the legislation requires that results be disaggregated by gender, racial/ethnic, economic, migrant, and disability groups. To perform the disaggregations, demographic information about the students must be collected, and students must be accurately classified into the categories. These classifications may be difficult for economic, migrant, and disability groups. As mentioned earlier, economic disadvantage has been operationally defined as enrollment in a free or reduced-price lunch program. This operational definition of economic disadvantage is easy to use, but it may not adequately capture the intent of the legislation.

For the information on the gap between the achievement of disadvantaged groups and the rest of the student population to be useful, it must be accurate and comparable over time so that trends in the success of Title I programs can be monitored. To achieve these requirements, the difficulty level of the assessment must be matched to the purpose so that sufficient measurement precision can be achieved. If the assessments are too difficult, resulting in large standard errors of measurement, gaps in performance levels for target groups will not be detectable.

The second facet of the accountability requirements for Title I is monitoring gains in school improvement. The legislation indicates that results should be reported by state, LEA, and school. The percent of students exceeding standards should be reported, and targets for adequate yearly progress should be specified. All of these reporting requirements are aimed at the assessment of growth.

The assessment of growth has always been a challenging psychometric problem because each test score contains error, and growth measures based on multiple assessments accumulate that error. In order to assess growth, assessment tools must be comparable and sufficiently reliable that accumulated error does not mask changes in performance. The assessments also need to be focused on the range of capabilities where change is most likely to occur.
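The accumulation of error in growth measures can be made concrete with a standard result: if the errors on the two assessments are independent, the standard error of a gain score is the square root of the sum of the squared standard errors of measurement. The numbers below are hypothetical.

```python
# Illustration of error accumulation in gain scores: with independent errors,
# SE(gain) = sqrt(SEM_pre^2 + SEM_post^2). Scale-point values are hypothetical.
import math

def se_of_gain(se_pre, se_post):
    """Standard error of a gain score, assuming independent errors."""
    return math.sqrt(se_pre**2 + se_post**2)

# Two assessments, each with a SEM of 3 scale points:
se = se_of_gain(3.0, 3.0)
print(round(se, 2))   # 4.24 - a true gain smaller than this is hard to detect
```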

The orientation of Title I assessments toward accountability has overshadowed requirements for student reports. Title I legislation indicates that descriptive and interpretive reports should be prepared for individual students. However, no specific requirements for such reports could be found.

Implications. The reporting requirements for Title I indirectly specify requirements for the assessment procedures used to support the program. These requirements are summarized here.

    1. The assessments must be focused on the levels of performance of targeted groups. In many cases, these targeted groups have very low levels of performance. To be appropriate for these groups of students, the level of difficulty of the assessment tools should be matched to the capabilities of the students.

    2. The assessments must be of sufficient accuracy that differences in mean levels of performance for the different target groups can be detected. The assessments should also be sensitive to gains in performance.

    3. To track gains over years and between groups, comparable reporting mechanisms are needed. Allowing LEP students to be assessed in their native language is a particularly challenging part of the comparability requirement. It is very difficult to equate tests written in different languages, or translated from one language to another. Yet, if the results on the assessments are to be compared, equating, or some other form of comparability, is an absolute requirement.

Together, these three requirements imply a particular type of assessment. It is one that is appropriate for the target groups, in particular not too difficult, with a fairly high level of measurement precision. It would be best if the same assessments were used for all students so that comparability would be assured, but if that is not possible, then it must be possible to transform the assessment results to a form that is directly comparable. Creating such transformations is a very difficult psychometric challenge.

Student Population.

The two major factors in the reporting requirements, the need to document growth and the concern with gaps between the target population and the rest of the student population, suggest targeting the assessments at the performance levels of students receiving Title I services. Such targeting will result in instruments that are sensitive to changes in performance of those students. However, since there is also a need to compare the performance of targeted students with those in the regular population, the assessments must provide reasonable measurements across the entire range of achievement.

Students receiving Title I services are of two major types. First, there are those who have been identified as being educationally deprived because of certain demographic factors. The demographic factors include: children from low-income families, children of migrant workers, American Indian children, and neglected or delinquent children. There is nothing in these classifications that is explicitly related to level of achievement, but historically, these groups have had low levels of achievement.

The second type of student receiving Title I services has characteristics that actively influence the educational process. Students with limited English proficiency and those with disabilities fall into this category. The legislation indicates that assessment accommodations should be used with these students, but it must still be possible to compare the performance of these students with those in the general student population.

In addition to the major types of students that receive Title I services, different ages are specified in the legislation. Students must be assessed at least during grades 3 through 5, grades 6 through 9, and grades 10 through 12. Since it is unlikely that a single assessment can accurately assess all of these grade levels, this requirement implies at least three levels of assessment.

Implications. The makeup of the examinee population has the following implications for the development of the assessment instruments for use with Title I.

  1. Assessments must be produced for at least three grade levels.
  2. Assessments should be most sensitive to changes in performance for those in the lower part of the achievement distribution. Such instruments will allow accurate documentation of adequate yearly progress for the students served by Title I.
  3. Assessments should have enough breadth of coverage that comparisons between the performance of Title I students and the rest of the student population can be accurately made.
  4. In non-language areas, accommodations should be available for LEP students.
  5. Assessment accommodations should be available for students with disabilities.
  6. It should be possible to compare results from assessments with accommodations to those from the regular assessments.

These requirements suggest the development of assessments that are focused at the capabilities of the typical Title I student, but that have substantial ceiling so that growth can be documented and comparisons can be made to the level of performance of the rest of the student population. They also suggest that the challenge of producing assessments with accommodations that can be compared to those without accommodations needs to be met. This is an extremely difficult problem that will require extensive development efforts and sophisticated psychometric analyses.

Content.

The Title I legislation indicates that at least two content areas must be assessed: mathematics and reading/language arts. Of course, language arts must be assessed in English, but mathematics can be assessed using another language if such procedures will give a better estimate of students’ mathematical skills.

The assessments are to evaluate students’ acquisition of challenging content and whether they meet performance standards that have been developed by the state. These standards are to include complex thinking and problem-solving skills, and higher-order thinking skills and understanding. The legislation also indicates that the same assessments should be used for all children, and that the standards apply to all children.

A domain sampling approach to assessment development best meets these assessment specifications. The requirement that results be reported relative to standards of performance suggests that the goal is to determine what portion of the domain specified by the content standards has been acquired. Producing assessments that meet these requirements is not an easy task because the content domain is typically not very well specified and it is difficult to produce a sampling plan from that domain.

As an example of the problem, consider the content specification from the Colorado Model Content Standards for Mathematics discussed earlier.

"Students develop number sense and use numbers and number relationships in problem-solving situations and communicate the reasoning used in solving these problems." What is the domain that is defined by this content standard? Is it possible to determine whether an item assesses that domain or not? What are the boundaries of the domain? What types of complex thinking and problem solving should be required by the assessment relative to this standard? What proportion of an assessment should be devoted to this domain? These questions are not easily answered. A detailed domain definition is needed that clearly delimits the boundaries of the domain. From the definition it must be possible to clearly determine whether an assessment task is appropriate for the domain or not. The relative importance of each content standard must be determined so that the emphasis of the content on the assessment can be set.

Implications. These issues have the following implications for assessment development.

  1. The domains implied by each content standard should be explicitly defined. This includes clear delineation of the boundaries of the domain.
  2. The level of higher-order thinking skills required in each domain should be specified.
  3. Rationales should be developed for the emphasis given to each content standard in the development of the assessment instruments.

The requirements listed above, when combined with the requirement that the same assessments be used for all students, dictate a very challenging assessment development task. The match to challenging content standards and complex thinking skills will force the assessments to be fairly difficult. The fact that they must apply to all children means that there must be a sufficient floor on the assessments to document growth for struggling students. This implies assessments that evaluate very broad ranges of skills, from those just being developed to those demonstrated by students who have mastered the material. A sampling plan for the domains of content and skills will need to ensure that skills at the low end of the range are evaluated as well as those at the high end.

Technical Requirements.

The technical psychometric requirements for the assessments are that they be reliable and valid, and that comparisons be possible with versions adapted for LEP students and versions with accommodations for disabilities. Of course, reliability and validity refer to specific applications of assessments. The reliability requirement implies that the assessments should result in substantial variation in the true score estimates and that the standard errors of the estimates of achievement should be acceptably small. The standard errors must be small enough to allow sensitive evaluations of growth in performance as well as evaluations of differences in performance for targeted groups.

The validity requirement implies that the assessments provide good estimates of accomplishment of the domain of content given in the content standards. Information on the validity of assessments of this type is typically given by careful judgments of the match between assessment tasks and the content domain definition.

The need for adaptations and accommodations implies that solid equating or concordance procedures be developed for the various versions of the assessments, so that the required comparisons can be made and so that it can be shown that the different versions of the assessments do not have differential effects for different examinee groups.

Implications. The collection of technical requirements implies that the following features will be needed in the assessment procedures.

  1. The assessments must be sufficiently long and have enough breadth of coverage to provide small standard errors for the Title I student population, small standard errors at cut scores for performance standards, and sufficient precision to allow group comparisons to be made.
  2. The assessments should be evaluated to determine the match to the content standards and the inclusion of higher-order thinking skills.
  3. Psychometric methods, such as equating and concordance methods or differential item functioning procedures, should be used to develop methods for comparing results for adapted versions of tests for LEP students and for accommodations for students with disabilities.

This collection of requirements suggests that a fairly sophisticated set of assessment tools needs to be developed for Title I. It seems unlikely that an off-the-shelf, standardized assessment will meet all of the requirements, especially when each state has developed different content standards. The assessments will need to be carefully designed to meet both the content and the psychometric requirements given in the legislation.

An Approach to Title I Assessment

While there are many different assessment approaches that can be used to meet all of the IASA Title I requirements, a general approach will be presented as an example of the procedure that might be followed to meet all of the guidelines. This example is not meant to match the procedures used by any state, nor is it meant to prescribe methods to be used by any state. The goal is to describe, in a very general way, one approach to meeting the many requirements implied by the IASA legislation and regulations.

Content Specifications.

The initial step in developing specifications for an assessment that meets the IASA requirements is developing a detailed description of the content to be measured by the assessment. To accomplish this goal, the test developers must internalize the content standards that have been produced by a state and create a domain description for the assessment tasks. The domain description must be sufficiently detailed that a reviewer can determine whether assessment tasks are consistent with the domain description or not.

The traditional approach to representing content specifications is to produce a two-way table with content areas specifying the rows and levels of cognitive complexity specifying the columns. This is especially appropriate for Title I assessments because the legislation requires the inclusion of higher-order thinking skills and comprehension. The cells of the table define sub-domains of content and cognitive levels. It is from these sub-domains that assessment tasks are sampled. Table 1 provides a simple representation of the typical table of specifications.

Table 1

Example of Table of Specifications

                     Cognitive Level 1    Cognitive Level 2    Cognitive Level 3
   Content Area 1    Subdomain            Subdomain            Subdomain
   Content Area 2    Subdomain            Subdomain            Subdomain
   Content Area 3    Subdomain            Subdomain            Subdomain

When this type of table of specifications is developed, the emphasis for each content area and cognitive level is specified as well, usually as a number of assessment tasks, or a number of points awarded, for each cell in the table. The task of the test developer is to produce or select a set of assessment tasks that adequately represent the subdomains within the limits of the number of tasks or testing time that can be allocated to each cell. This part of the assessment plan can also include requirements for multiple measures. For example, if the ability to synthesize ideas presented in multiple documents is the skill to be assessed, it is unlikely that this can be accomplished with multiple-choice items. The type of assessment should be matched to the qualitative characteristics of the subdomain.
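A table of specifications of this sort can also be treated as data in assessment-development tooling. The sketch below, with hypothetical content areas, cognitive levels, and item counts, shows how the cell allocations determine the emphasis totals for each content area and the overall length of the form.

```python
# Sketch of a table of specifications held as data: each (content area,
# cognitive level) cell is assigned an item count, and row totals give the
# emphasis for each content area. All names and counts are hypothetical.

spec = {
    ("Number sense", "Recall"): 6,
    ("Number sense", "Application"): 5,
    ("Number sense", "Problem solving"): 4,
    ("Geometry", "Recall"): 4,
    ("Geometry", "Application"): 4,
    ("Geometry", "Problem solving"): 3,
}

def emphasis_by_area(spec):
    """Total item allocation for each content area (row totals)."""
    totals = {}
    for (area, _level), n in spec.items():
        totals[area] = totals.get(area, 0) + n
    return totals

print(emphasis_by_area(spec))   # {'Number sense': 15, 'Geometry': 11}
print(sum(spec.values()))       # 26 items on the form
```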

For IASA, test plans of this sort are needed for mathematics and reading/language arts, for each of the three grade ranges specified in the legislation. The number of content areas and cognitive levels will depend on the content standards that have been produced by the state.

Psychometric Specifications.

Along with the content specifications for the assessments, psychometric specifications are needed to ensure that the purposes for the assessments can be supported. The purposes indicate that it must be possible to distinguish between average performance levels for target groups, document growth for Title I students against standards for adequate yearly progress, and accurately classify students relative to performance standards. To achieve these goals, assessments must be produced that provide information with small standard errors of measurement at the performance standards and at the levels of performance where growth is most likely to occur.

Psychometric test specifications can be conveniently produced using item response theory methodology through target information functions. These target information functions can be derived from specifications for the desired standard errors at points along an achievement scale. Tasks can then be selected from the content subdomains so that the resulting assessment device matches the target information function.

Similar psychometric specifications can be developed using classical test theory. The desired standard errors of measurement are the starting point. Using the classical test theory model, an assumption about the distribution of student performance is also needed so that a target mean score and reliability can be estimated.
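The classical test theory version of this calculation is direct: the standard error of measurement is the score standard deviation times the square root of one minus the reliability, so a target standard error and an assumed score spread determine the reliability that must be achieved. The numbers below are hypothetical.

```python
# Classical test theory sketch: SEM = SD * sqrt(1 - reliability), which can
# be inverted to find the reliability needed for a target SEM. The assumed
# score standard deviation and target SEM are hypothetical.
import math

def sem_ctt(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

def reliability_needed(sd, target_sem):
    """Reliability required to reach a target SEM for a given score SD."""
    return 1.0 - (target_sem / sd) ** 2

print(round(sem_ctt(sd=10.0, reliability=0.91), 2))           # 3.0
print(round(reliability_needed(sd=10.0, target_sem=3.0), 2))  # 0.91
```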

Because of the need to assess growth, it would be desirable that the test specifications consider means for linking the score scales over grade levels. This can be done if it is possible to administer at least part of an assessment to students at more than one grade level. Related methods will also be needed to determine how to translate scores on assessments in different languages onto a common scale so comparisons can be made. Similar procedures will be needed for assessments using accommodations for disabilities.

In order for the scales from these different versions of the assessments to be linked, all of the assessments must be measuring the same combinations of skills. If the skills measured are highly similar, it may be possible to formally equate the different versions of the assessments. If the different versions of the assessments have non-overlapping content, if, for example, use of a different language on a mathematics test changes the statement of the problems, then the best that can be accomplished is a concordance relationship between the assessments.

At a minimum, the assessment designs should plan to include methods for investigating comparability of assessment scores at different levels and for different accommodations. Depending on the results of the comparability studies, plans for equating, or plans for developing a concordance should be developed to produce scores that can be used to assess growth, compare groups, and determine the proportion above performance standards.
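In its simplest form, a concordance maps a score on one version of an assessment to the score on another version with the same percentile rank in a common group. The coarse sketch below illustrates the idea with two small hypothetical score distributions; operational equipercentile linking would use smoothed distributions and much larger samples.

```python
# Minimal sketch of an equipercentile-style concordance between two test
# versions, X and Y. The score lists are hypothetical; real linking studies
# use large samples and smoothed score distributions.
from bisect import bisect_right

def percentile_rank(score, scores):
    """Fraction of the group scoring at or below the given score."""
    ordered = sorted(scores)
    return bisect_right(ordered, score) / len(ordered)

def concordant_score(x_score, x_scores, y_scores):
    """Y score holding the same percentile rank that x_score holds on X."""
    pr = percentile_rank(x_score, x_scores)
    ordered_y = sorted(y_scores)
    idx = min(int(pr * len(ordered_y)), len(ordered_y) - 1)
    return ordered_y[idx]

x_scores = [10, 12, 14, 16, 18, 20]   # hypothetical scores on version X
y_scores = [30, 35, 40, 45, 50, 55]   # hypothetical scores on version Y
print(concordant_score(14, x_scores, y_scores))   # 45
```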

Summary and Conclusions

The purposes for this paper have been to review the assessment requirements for ESEA and IASA Title I legislation, to determine the practical implications of those requirements, and to suggest a general approach that can be used to meet the requirements. The information provided about the assessment requirements was obtained from the text of the legislation, the regulations that have been produced to implement the legislation, and a number of documents that have been developed to help state educational programs meet the requirements for Title I funding. The reference list gives a sampling of these materials. A much more extensive set of sources exists, and many were reviewed in the development of this paper.

Title I assessment requirements focus predominantly on accountability. Schools must be able to document that the funds that they receive from the Federal Government for Title I purposes are being used effectively to improve the educational programs for economically, culturally, and physically disadvantaged students. Effectiveness is defined in the legislation as reducing the differences in performance between Title I and other students, improving the performance of targeted students, and generally helping all students in schools that receive Title I funds to achieve performance standards that have been set by the states.

To document effectiveness for Title I purposes, states must use assessments that are sensitive to changes in performance of Title I students on knowledge and skills specified in state content standards. These standards must include challenging materials with at least some component of higher order thinking skills and comprehension. There must also be accommodations for differences in native language and for disabilities. Psychometric procedures must be used to allow reasonable comparisons despite the differences in assessments brought about by the accommodations.

Overall, the assessment requirements in Title I legislation and regulations are very challenging. To meet the requirements, state-of-the-art procedures in test design, equating, and standard setting must be applied.

References

Bailey, S.K. & Mosher, E.K. (1968). ESEA: The Office of Education Administers a Law. Syracuse, NY: Syracuse University Press.

Blank, R.K., Manise, J.G., Brathwaite, B.C., & Lanesen, D. (1998). State Education Indicators with a Focus on Title I. Washington, DC: Council of Chief State School Officers and U.S. Department of Education.

Council of Chief State School Officers (1998). Standards, Graduation, Assessment, Teacher Licensure, Time and Attendance: A 50-State Report. Washington DC: Author.

Council of Chief State School Officers (1997). Mathematics and Science Content Standards and Curriculum Frameworks: State Progress and Implementation. Washington DC: Author.

Educational Testing Service (1990). 1990 NAEP Technical Report. Princeton, NJ: Author.

Elementary and Secondary Education Act of 1965, Public Law No. 89-10, Section 2,201-212.

Goals 2000: Educate America Act. Public Law No. 103-227.

Hoff, D.J.(1998).Civil rights group claims softening of Title I standards. Education Week. 18, September 23,1998.

Improving America’s School Act of 1994. Public Law No. 103-382, Section 2,1001-14802.

Jaeger, R.M. & Tucker, C.G. (1998). A Guide to Practice for Title I and Beyond. Washington, DC: Council of Chief State School Officers.

Lesh, R. & Lamon, S.J. (1992). Trends, goals, and priorities in mathematics assessment. In R. Lesh & S.J. Lamon (Eds.), Assessment of Authentic Performance in School Mathematics. Washington, DC: American Association for the Advancement of Science.

Lord, F.M. & Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Mislevy, R. J. (1998). Implications of market-basket reporting for achievement- level setting. Applied Measurement in Education. 11(1), 49-63.

Mitchell, R. (1992). Testing for Learning: How New Approaches to Evaluation Can Improve American Schools. New York: The Free Press.

National Center for Educational Statistics (1998). Status of educational reform in public elementary and secondary schools: Principals’ perspectives ( Statistical Analysis Report NCES 98-025). Washington DC: U.S. Department of Education.

New York State Education Department (1998). 1997-98 LEAP Manual: Student Assessment Data Collection. Albany, NY: The University of the State of New York.

Nunnally, J. C. (1967). Psychometric Theory. New York: McGraw-Hill.

Riley, R. W. (1999, February). Statement before the U.S. Senate Committee on Health, Education, Labor, and Pensions on the Reauthorization of the Elementary and Secondary Education Act of 1965. Washington, DC: Department of Education.

Sack, J. L. (1998). Groups revving up for ESEA reauthorization. Education Week, 18, November 5, 1998.

Tombari, M. & Borich, G. (1999). Authentic Assessment in the Classroom: Applications and Practice. Upper Saddle River, NJ: Merrill.

U.S. Department of Education (Undated). Reviewer Guidance for State Content and Performance Standards under Title I. Washington DC: Author.

U.S. Department of Education (1999). Promising Results, Continuing Challenges: Final Report of the National Assessment of Title I. Washington DC: Author.

U.S. Department of Education (1996). Mapping Out the National Assessment of Title I: the Interim Report. Washington DC: Author.

U.S. Department of Education (1995). Title I- Helping Disadvantaged Children Meeting High Standards: Final Regulations. Washington DC: Author.

U.S. Department of Health, Education and Welfare. (1967). The First Year of Title I. Washington, DC: U.S. Government Printing Office.

U.S. Office of Education (1967). First Annual Report, Title I, Elementary and Secondary Education Act of 1965. Washington, DC: U.S. Government Printing Office.

Wiggins, G. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco: Jossey-Bass.