- Kim H. Koh, Werklund School of Education, University of Calgary
Authentic tasks replicate real-world challenges and standards of performance that experts or professionals typically face in the field. The term “authentic assessment” was first coined by Grant Wiggins in K‒12 educational contexts. Authentic assessment is an effective measure of intellectual achievement or ability because it requires students to demonstrate their deep understanding, higher-order thinking, and complex problem solving through the performance of exemplary tasks. Hence authentic assessment can serve as a powerful tool for assessing students’ 21st-century competencies in the context of global educational reforms. The review begins with a detailed explanation of the concept of authentic assessment. There is a substantial body of literature focusing on the definitions of authentic assessment. However, only those that are original and relevant to educational contexts are included. Some of the criteria for authentic assessment defined by the authors overlap with each other, but their definitions are consistent. A comparison of authentic assessment and conventional assessment reveals that different purposes are served, as evidenced by the nature of the assessment and item response format. Examples of both types of assessments are included. Three major themes are examined within authentic assessment research in educational contexts: authentic assessment in educational or school reforms, teacher professional learning and development in authentic assessment, and authentic assessment as tools or methods used in a variety of subjects or disciplines in K‒12 schooling and in higher education institutions. Among these three themes, most studies were focused on the role of authentic assessment in educational or school reforms. Future research should focus on building teachers’ capacity in authentic assessment and assessment for learning through a critical inquiry approach in school-based professional learning communities or in teacher education programs.
To enable the power of authentic assessment to unfold in the classrooms of the 21st century, it is essential that teachers are not only assessment literate but also competent in designing and using authentic assessments to support student learning and mastery of the 21st-century competencies.
The term “authentic assessment” was first coined in 1989 by Grant Wiggins in K‒12 educational contexts. According to Wiggins (1989, p. 703), authentic assessment is “a true test” of intellectual achievement or ability because it requires students to demonstrate their deep understanding, higher-order thinking, and complex problem solving through the performance of exemplary tasks. Authentic tasks replicate real-world challenges and “standards of performance” that experts or professionals (e.g., mathematicians, scientists, writers, doctors, teachers, or designers) typically face in the field (Wiggins, 1989, p. 703). For instance, authentic tasks in mathematics need to elicit the kind of thinking and reasoning used by mathematicians when they solve problems.
In the assessment literature, some authors have argued that the term “authentic” was first introduced by Archbald and Newmann (1988) in the context of learning and assessment (Cumming & Maxwell, 1999; Palm, 2008). However, the term “authentic” in Archbald and Newmann (1988) was associated with achievement rather than assessment. A few years later, Newmann and Archbald (1992) provided a detailed explanation of authentic achievement. Cumming and Maxwell (1999) have aptly pointed out that authentic assessment and authentic achievement are interrelated, as it is important to identify the desired student learning outcomes and realign the methods of assessment to them. Authentic assessment should be rooted in authentic achievement to ensure a close alignment between assessment tasks and desired learning outcomes. This alignment is of paramount importance in the worldwide climate of curriculum and assessment reform, which places greater emphasis on the development of students’ 21st-century competencies—including critical and creative thinking, complex problem solving, effective communication, collaboration, self-directed and lifelong learning, responsible citizenship, and information technological literacy, just to name a few.
In addition to K‒12 education, “authentic assessment” was further defined by Gulikers, Bastiaens, and Kirschner (2004) in the context of professional and vocational training that incorporates competence-based curricula and assessments. To better prepare students for their future workplace, there is a need for assessment tasks used in professional and vocational education to resemble the tasks students will encounter in their future professional practice. Authentic assessments in competence-based education should create opportunities for students to integrate learning and working in practice, which results in students’ mastery of professional skills needed in their future workplace.
Authentic assessment has played a pivotal role in driving curricular and instructional changes in the context of global educational reforms. Since the 1990s, teacher education and professional development programs in many education systems around the globe have focused on the development of assessment literacy for teachers and teacher candidates, which encompasses teacher competence in the design, adaptation, and use of authentic assessment tasks or performance assessment tasks to engage students in in-depth learning of subject matter and to promote their mastery of the 21st-century competencies (e.g., Darling-Hammond & Snyder, 2000; Koh, 2011a, 2011b, 2014; Shepard et al., 2005; Webb, 2009). Although many of the 21st-century competencies are not new, they have become increasingly in demand in colleges and workplaces that have shifted from lower-level cognitive and routine manual tasks to higher-level analytic and interactive tasks (e.g., collaborative problem solving) (Darling-Hammond & Adamson, 2010). The amount of new information is increasing at an exponential rate due to the advancement of digital technology. Hence, rote learning and regurgitation of facts or procedures are no longer suitable in contemporary educational contexts. Rather, students are expected to be able to find, organize, interpret, analyze, evaluate, synthesize, and apply new information or knowledge to solve non-routine problems.
Students’ mastery of the essential 21st-century competencies will enable them to succeed in colleges, to thrive in a fast-changing global economy, and to live meaningfully in a complex, technologically connected world. According to Darling-Hammond and Adamson (2010), the role of performance assessment is critical in helping both teachers and students to achieve the 21st-century standards of assessment and learning. Many authors in extant research have used “performance assessment” and “authentic assessment” interchangeably (e.g., Arter, 1999; Darling-Hammond & Adamson, 2010). Some authors have distinguished between performance assessment and authentic assessment (Meyer, 1992; Palm, 2008; Wiggins, 1989). A thorough review of the literature suggests that there is a need to differentiate performance assessment from authentic assessment.
All authentic assessments are performance assessments because they require students to construct extended responses, to perform on something, or to produce a product. Both process and product matter to authentic assessments, and hence formative assessment—such as open questioning, descriptive feedback, self- and peer assessments—can be easily incorporated into authentic assessments. In other words, the process is as important as the product. As such, authentic assessments also capture students’ dispositions such as positive habits of mind, growth mindset, persistence in solving complex problems, resilience and grit, and self-directed learning. The use of scoring criteria and human judgments are two of the essential components of authentic assessments (Wiggins, 1989).
Although all performance assessments include constructed responses or performances on open-ended tasks, not all performance assessments are authentic. As Arter (1999) pointed out, the two essential components of a performance assessment are tasks and criteria. This suggests that the line between performance assessment and authentic assessment is thin. Hence, the authenticity of a performance assessment or performance-based task is best determined by Gulikers et al.’s (2004) five dimensions of authenticity; Koh and Luke’s (2009) criteria for authentic intellectual quality; Newmann, Marks, and Gamoran’s (1996) “intellectual quality” criteria; and Wiggins’s (1989) four key features of authentic assessment. The dimensional framework proposed by Gulikers et al. is appropriate for use with assessments in professional and vocational training contexts including higher education institutions, while the frameworks of Wiggins (1989), Newmann et al. (1996), and Koh and Luke (2009) are appropriate for use with assessments in K‒12 school contexts. The criteria for authentic intellectual quality by Koh and Luke (2009) have also been linked to the Singapore Classroom Coding Scheme, which was developed by Luke, Cazden, Lin, and Freebody (2005) to conduct classroom observations of teachers’ instructional practices. Some of the criteria for authentic intellectual quality were adapted from Newmann et al.’s (1996) authentic intellectual work, Lingard, Ladwig, Mills, Bahr, Chant, and Warry’s (2001) productive pedagogy and assessment, and the New South Wales model of quality teaching (Ladwig, 2009). Lingard et al. (2001) have used the term “rich tasks” instead of authentic tasks in the Queensland School Reform Longitudinal Study. According to the authors, rich tasks are open-ended tasks that enable students to connect their learning to real-world issues and problems.
In short, this section presents a detailed explanation of the concept of authentic assessment. The remaining sections of this article will include a comparison of authentic assessment and conventional assessment, criteria for authenticity in authentic assessment, authentic assessment research in educational contexts (research problems/questions and methods included), and future research in authentic assessment.
Authentic Assessment Versus Conventional Assessment
Authentic assessment serves as an alternative to conventional assessment. Conventional assessment is limited to standardized paper-and-pencil/pen tests, which emphasize objective measurement. Standardized tests employ closed-ended item formats such as true‒false, matching, or multiple choice. The use of these item formats is believed to increase efficiency of test administration, objectivity of scoring, reliability of test scores, and cost-effectiveness as machine scoring and large-scale administration of test items are possible. However, it is widely recognized that traditional standardized testing restricts the assessment of higher-order thinking skills and other essential 21st-century competencies due to the nature of the item format. From an objective measurement or psychometric perspective, rigorous and higher-level learning outcomes (e.g., critical thinking, complex problem solving, collaboration, and extended communication) are too subjective to be tested. An overemphasis on objective measurement and closed-ended item formats has led to the testing of discrete bits of facts and procedures. As such, curriculum is fragmented and dumbed down as many of the desired learning outcomes are measured as atomized bits of knowledge and skills.
Standardized paper-and-pen tests are administered in uniform ways to ascertain student achievement for summative purposes (i.e., grading and reporting at the end of a unit or a semester, certification at the completion of a course). At the classroom level, standardized tests are typically used in summative assessment at the end of instruction. Assessment is seen to be detached from instruction. Large-scale administration of standardized paper-and-pen tests is often used for cross-national comparisons of student achievement. The use of standardized paper-and-pen tests on a large-scale basis is predominant in state/provincial assessments and international assessments. Examples of state/provincial assessments are the Foundation Skills Assessment (FSA) in British Columbia, Canada; the Provincial Achievement Tests (PAT) in Alberta, Canada; and the National Assessment of Educational Progress (NAEP) in the United States. International assessments include the Trends in International Mathematics and Science Study (TIMSS); the Progress in International Reading Literacy Study (PIRLS); and the Programme for International Student Assessment (PISA). The closed-ended item response format in standardized tests tends to encourage students to fill in the bubbles or provide short answers using their rote memorization of discrete facts and procedures. Students are either rewarded or punished depending on whether they get that one answer right according to the answer keys or marking schemes. Such a testing format is aligned with the behaviorist learning theory that promotes the use of rewards to reinforce positive behaviors and of sanctions to remove negative behaviors.
Both summative and international assessments are high stakes because student achievement data derived from these assessments are used for making important decisions or policies, which may lead to unintended consequences for students, teachers, or school administrators. Oftentimes, teacher job performance is evaluated based on student performance on high-stakes assessments. In many high-performative education systems, teachers are held accountable by policy makers, parents, and school administrators for students’ performance. Such a high accountability demand has led to teachers’ tendency to teach to the content and format of state/provincial, national, or international assessments. For example, Koh and Luke’s (2009) large-scale empirical study of the quality of teachers’ assessment tasks in Singapore, one of the high-performative education systems in the world, has shown that worksheets and summative tests were two of the most commonly used assessment methods in the teaching of core subject areas such as English, mathematics, and science at both elementary and secondary levels. Teachers’ instructional practices were driven by preparing students for high-stakes examinations. As a result, the intended curriculum was reduced to a drill-and-practice of decontextualized factual and procedural knowledge.
Authentic assessments are characterized by open-ended tasks that require students to construct extended responses, to perform an act, or to produce a product in a real-world context, or in a context that mimics the real world. Examples of authentic assessments include projects, portfolios, writing an article for a newsletter or newspaper, performing a dance or drama, designing a digital artifact, creating a poster for a science fair, debates, and oral presentations. According to Wiggins (1989), authentic tasks must “involve students in the actual challenges, standards, and habits needed for success in the academic disciplines or in the workplace” (p. 706). In other words, authentic tasks need to be designed to replicate the authentic intellectual challenges and standards facing experts or professionals in the field. Such assessment tasks are deemed able to engage and motivate learners when they perceive the relevance of the tasks to the real world or when they find that completing the tasks is meaningful for their learning.
The purpose of authentic assessment is to provide students with ample opportunity to engage in authentic tasks so as to develop, use, and extend their knowledge, higher-order thinking, and other 21st-century competencies. Authentic tasks are often performance-based and include complex and ill-structured problems that are well aligned with the rigorous and higher-order learning objectives in a reformed vision of curriculum (Shepard, 2000). Most professional challenges in the current and future workplace require individuals to strike a balance between individual and group achievement (Wiggins, 1989). The nature of authentic tasks enables students to learn how to achieve such a balance by engaging in independent learning of possible solutions and by collaborating with peers in a socially supportive learning environment over an extended period of time. As such, authentic tasks also support problem-based learning, inquiry-based learning, and other learner-centered pedagogical approaches. Productive discourse or extended communication in a social context is important in the process of arriving at solutions to problems. Hence students are able to “experience what it is like to do tasks in workplace and other real-life contexts” (Wiggins, 1998, p. 24). John Dewey, a prominent philosopher of education, underscored the importance of experience in education by arguing that learners cannot know something without directly experiencing it. Dewey inspired the use of the project method in his laboratory school at the University of Chicago from 1896 to 1904. The project method enabled children to reflect on and critically examine their prior beliefs or preexisting knowledge in the light of new experiences. Children were expected to learn content knowledge and procedural skills in a context that was relevant to their real-world lives. The context usually entailed a complex, real-life problem or authentic project, with many levels of embedded problems and solutions.
The project method was further defined as a “hearty purposeful act” by Kilpatrick (1918, p. 320) in his essay “The Project Method,” which became known worldwide.
Authentic tasks assess not only students’ authentic performance or work, but also their dispositions such as persistence in solving messy and complex problems, positive habits of mind, growth mindset, resilience and grit, and self-directed learning. Because the use of scoring rubrics is a key component of authentic assessment, it enables the provision of descriptive feedback and of self- and peer assessment using criteria and standards in the form of holistic or analytic rubrics. It is important that students receive timely and formative feedback from the teacher and/or peers so that they are able to use the feedback to improve the quality of their performance or work. Such a formative assessment or assessment for learning practice has long been advocated in key assessment literature that urges teachers to use classroom assessment to support student learning or to promote a learner-centered classroom culture (e.g., Black & Wiliam, 1998; Shepard, 2000). From a social-constructivist learning approach (Shepard, 2000), the opportunities for productive discourse or dialogue in the process of collaborating with peers and of giving/receiving peer feedback in completing authentic tasks underscore the importance of co-construction of knowledge and meaning-making through socially supported interactions.
Since the 1990s, the social-constructivist learning theory has played a key role in the curriculum and assessment reform movement. The social-constructivist learning theory was named an emergent constructivist paradigm in Shepard’s (2000) reconceptualization of classroom assessment practice for the 21st century. The emergent constructivist paradigm was characterized by the shared principles of a reformed vision of curriculum, cognitive and constructivist/social-constructivist learning theories, and classroom assessment. The shared principles emphasize that all students can learn, and thus they must be given an equal opportunity to be exposed to intellectually challenging subject matter and assessment tasks that are aimed at developing their higher-order thinking, problem solving, and dispositions. The principles of classroom assessment in Shepard’s (2000) emergent constructivist paradigm are similar to those that characterize authentic assessment.
Criteria for Authenticity in Authentic Assessment
There is a substantial curriculum and assessment literature focusing on the features or characteristics of authentic assessment. The use of “features” and “characteristics” seems to suggest that the authenticity of an assessment or a task can be quantified. I prefer to use the term “criteria” to determine and describe the degree of authenticity of an assessment or a task. This section includes a review of the relevant literature on the criteria of authentic assessment.
According to Wiggins (1989, 1998), assessment is central to learning and must be linked to real-world demands. In these articles, some of the criteria for authentic assessment are overlapping. They can be summarized into eight criteria:
First, authentic assessment “is realistic” (Wiggins, 1998, p. 22). This means that the authentic task or tasks must replicate how a student’s knowledge, skills, and/or dispositions are assessed in a real-world context. In other words, the authentic task or tasks should replicate or simulate the real-world contexts in which adults are assessed in the workplace, in social life, and in personal life. This enables students to experience what it is like to work or perform in real-life contexts, which are often messy, ambiguous, and unpredictable. Such a “learning by doing” experience is in line with Dewey’s experiential education.
Second, the authentic task or tasks require students to make good judgments and be creative and innovative in solving complex and non-routine problems or performing a task in new situations. This enables the assessment of transferable skills to new tasks or contexts. In addition, students need to be competent and confident in using a repertoire of knowledge, skills, and dispositions to tackle and complete authentic tasks that are intellectually challenging. Hence, authentic tasks serve as an effective tool for assessing students’ demonstrations of critical thinking, complex problem solving, and creativity and innovation. These are some of the essential 21st-century competencies.
Third, an authentic assessment or task enables students to deeply engage in the subject or discipline through critical thinking and inquiry. Instead of rote learning and reproduction of facts and procedures, students need to be able to think, act, and communicate like experts in the subject or discipline. This is akin to Shulman’s (2005) signature pedagogies.
Fourth, in authentic assessment, students are given opportunities to rehearse, practice, look for useful resources, and receive timely quality feedback so as to improve the quality of performance or product. Students also need to present their work publicly and be given the opportunity to defend it. This suggests that assessment for learning or formative assessment practice can be easily incorporated into authentic assessment.
Fifth, authentic tasks look for multiple sources of evidence of student performance over time and the reasons or explanations behind the success and failure of a performance. In addition, both reliability and validity of judgment about complex performance depend upon multiple sources of evidence gained over many performances across multiple occasions. To ensure fairness and equity, the teacher must be provided with informative data on students’ strengths and weaknesses at the end of each assessment. This will ensure that the teacher’s feedback is aimed at helping all students to make progress toward the standards.
Sixth, a multifaceted scoring system is used, and scoring criteria must be transparent. Sharing of scoring criteria explicitly with students will enable them to understand and internalize the criteria of success.
Seventh, student self-assessment must play a pivotal role in authentic assessment.
Finally, the reliability or defensibility of teachers’ professional judgment or scoring of student performance or work is achieved through social moderation, in which teachers of the same subjects gather to set criteria and standards for scoring, and to compare their scores (Klenowski & Wyatt-Smith, 2010).
Authentic achievement rather than authentic assessment is used in Newmann and Archbald (1992). They identify three criteria or standards for authentic achievement, namely, construction of knowledge, disciplined inquiry, and value beyond school. In their later work, Newmann et al. (1996) and Newmann, Bryk, and Nagaoka (2001) have used the term “criteria for authentic intellectual work” instead of “standards for authentic achievement.” Definitions of the three criteria for authentic achievement are as follows:
Construction of Knowledge
This criterion clearly indicates that students need to engage in construction or production of knowledge instead of reproduction of knowledge. Construction of knowledge is expressed in written and oral discourse. Examples of construction of knowledge are writing an article for a newsletter, performing a musical piece of work, creating a poster for a science fair, completing a group project, and designing a digital portfolio. All of these authentic assessments require students to engage in higher-order thinking, problem solving, communication, and collaboration. At the same time, students also need to present and defend their work in public.
Disciplined Inquiry
This criterion suggests that students need to be actively involved in critical inquiry within academic subjects or professional disciplines. Disciplined inquiry consists of three main components: prior knowledge base, in-depth understanding, and elaborated communication (Newmann et al., 2001, p. 15). Students’ authentic performance is built on their prior knowledge in a subject or discipline. To engage in critical inquiry, students need to be able to tap into their prior knowledge base or the content knowledge that they have acquired before. The prior knowledge base or previously learned content knowledge includes facts, terminologies, vocabularies, concepts, theories, algorithms, procedures, and conventions. In-depth understanding refers to the ability to probe deeper into a problem and to organize, interpret, analyze, evaluate, and synthesize different types of knowledge or information that can be used to solve the problem. In-depth understanding helps students to engage actively in intellectual discourse or in making extended communication to explain their solutions to the problem. All experts or professionals in a subject or discipline are expected to use sophisticated forms of written and oral communication (i.e., elaborated communication) to carry out their work and to express their solutions to problems.
Value Beyond School
This criterion underscores the importance of having a value dimension in assessment tasks. To be intrinsically motivating for students, authentic tasks must have aesthetic, utilitarian, or personal value in the eyes of the learner.
Newmann et al. (1996) have pointed out that all three of these criteria are necessary for assessing the authenticity of student performance across grade levels and subject areas. They aptly stated that “construction of knowledge through disciplined inquiry to produce discourse, products, or performance that have value beyond success in school can serve as a standard of intellectual quality for assessing the authenticity of student performance” (Newmann et al., 1996, p. 287). However, they also cautioned that not all instructional activities and assessment tasks will meet all the three criteria at all times.
Building upon the three criteria of authentic achievement, Newmann et al. (1996) have further developed seven criteria for assessing the intellectual quality of assessment tasks. The criteria are organization of information, consideration of alternatives, disciplinary content, disciplinary process, elaborated written communication, problem connected to the world, and audience beyond the school. Organization of information and consideration of alternatives reflect the importance of assessing students’ higher-order thinking or critical thinking in solving real-world problems. Disciplinary content emphasizes students’ ability to engage in critical inquiry into the ideas, theories, and perspectives central to their academic subject or professional discipline, while disciplinary process refers to the ability to use sound methods of inquiry, research, and communication, which is central to their academic subject or professional discipline. The use of elaborated written communication suggests that authentic tasks must involve students in using extended communication or sustained writing to express deep understanding and problem solving. The last two criteria, namely, problem connected to the world and audience beyond the school, indicate that assessment tasks need to expose students to the real-world issues or problems that they encounter in their daily lives or are likely to encounter in their future colleges, workplaces, and lives.
Gulikers et al. (2004) have proposed five criteria for defining authentic assessment in the context of professional and vocational training. Similar to Wiggins (1989) and Newmann and Archbald (1992), they contend that authenticity of assessment is a multifaceted concept. In determining the authenticity of an assessment, there is a need to take into account students’ perceptions of authenticity. In other words, students’ perceptions of the meaningfulness or relevance of the assessment is central to the determination of authenticity. The five criteria for authenticity or dimensions of authenticity are task, physical context, social context, assessment form, and criteria (Gulikers et al., 2004). The criteria are summarized below:
Task
Using Messick’s (1994) question of authentic to what, Gulikers et al. (2004) have argued that the degree of authenticity of an assessment or a task is measured against a criterion situation. According to them, “a criterion situation reflects a real-life situation that students can be confronted with in their work placement or future professional life, which serves as a basis for designing an authentic assessment” (Gulikers et al., 2004, p. 75). Therefore, an authentic assessment task should resemble the complexity of the knowledge, skills, and dispositions required in the criterion situation. And students should see the relevance or meaning of their performances on the authentic task to their future professions. The degree of authenticity of an assessment task can further be determined by whether the task requires multiple solutions and whether it is ill-structured and involves multiple disciplines.
Physical Context
In this criterion, three components are identified by Gulikers et al. (2004) to determine the degree of authenticity of an assessment: similarity to the professional work space (fidelity), availability of professional resources (methods/tools/materials, relevant or irrelevant information), and time given to complete the assessment task. Sufficient time for the completion of a task is important so that students’ thinking and acting will not be restricted by time constraints. Many professional activities in real life involve planning and execution of tasks over an extended period of time.
Social Context
The social processes of an authentic assessment must resemble those of a professional context. If the professional context or real-life situation requires collaboration with peers in solving problems, then the assessment should also involve students in collaboration and problem solving. However, it is important to note that if a professional context or real-life situation typically requires individual work then the assessment should not enforce collaboration. In other words, fidelity of the social processes in authentic assessment to those in a real-life situation is essential.
Assessment Form
The authenticity of assessment form is determined by the degree to which students are observed for their demonstrations of competences when performing on a task or creating a product. The observation will enable an inference about students’ competences in future professional contexts. The authenticity of the form of assessment also depends on the use of multiple tasks and indicators of learning. This is similar to Wiggins’s (1989) multifaceted scoring system, which emphasizes the use of multiple evidences of student performance. Many measurement and assessment experts also advocate for the use of multiple methods or tasks and multiple indicators of learning to ensure the accuracy, fairness, reliability, and validity of professional judgment about student performance (Messick, 1994; Shavelson, Baxter, & Gao, 1993; Wiggins, 1989). Hence, students’ professional competence should neither be assessed by a single task nor be judged based on a single performance.
Criteria
Scoring criteria used in authentic assessment should be based on criteria used in professional practice or a real-life situation. In addition, scoring criteria should concern the development of relevant professional competence, which means that assessment of students’ learning progression is an important practice in the context of authentic assessment. Similar to Wiggins (1989), Gulikers et al. (2004) have argued that scoring criteria must be transparent and be shared explicitly with students to facilitate their learning. Hence, criterion-referenced rubrics should be used to judge students’ performance or work in authentic assessment.
Research in Authentic Assessment
Since the 1990s, research in authentic assessment has focused on three themes: authentic assessment in educational or school reforms, teacher professional learning or development in authentic assessment, and authentic assessment as tools or methods used in a variety of subjects or disciplines in K‒12 schooling and in higher education institutions. Among these three themes, most studies have focused on the role of authentic assessment in educational or school reforms. Due to space limitations, only key studies concentrating on authentic assessment in educational or school reforms have been reviewed.
Authentic Assessment in Educational or School Reforms
Since the late 1990s, authentic assessment has become a key lever for educational or school reforms that aim to develop students’ 21st-century competencies and prepare them for a global knowledge-based economy in a technologically connected world. In the curriculum frameworks of many education systems, there is a shift from low-level learning outcomes (e.g., factual knowledge and procedural skills) to higher-order learning outcomes (i.e., higher-order thinking, problem solving, and other essential 21st-century competencies). Likewise, teachers have been urged to move toward the use of social-constructivist, learner-centered pedagogy, authentic assessment, and formative assessment. Such changes have resulted in a substantial body of research focusing on teachers’ assessment practices and building teachers’ capacity in classroom assessment.
In the United States, Newmann and his associates (Newmann et al., 1996; Newmann et al., 2001) conducted empirical studies to examine the impact of authentic pedagogy on student performance in Chicago public elementary schools. The focus of Newmann et al.’s (1996) study was to determine the relationship between authentic pedagogy and student performance in schools that used authentic pedagogy as a school reform initiative. Authentic pedagogy comprised authentic instruction and authentic assessment based on the criteria for authentic intellectual work. The study involved teachers who taught mathematics and social studies in three different grades ranging from elementary schools to high schools. Data included classroom observations of the teachers’ daily lessons and analyses of the assessment tasks and students’ written responses to the tasks that were embedded within the lessons. The data were analyzed using the criteria for authentic intellectual work. Student responses to the assessment tasks were used as evidence of student performance.
Most studies on educational or assessment reforms have used standardized test scores as an indicator of improved student learning even when an educational innovation involves a new form of assessment. Student responses to tasks or student work samples are embedded within teachers’ instructional practices and hence serve as a better indicator of student performance. Newmann et al. (1996) found that authentic pedagogy was strongly associated with students’ authentic academic performance at all grade levels in both mathematics and social studies. Students who were exposed to assessment tasks with high intellectual demands demonstrated higher authentic performance than students who did not have the same exposure. In addition, the effects of authentic pedagogy were found to be equitably distributed among students of diverse social backgrounds, indicating that all students should have equal access to the standards of intellectual quality. The findings suggest that student performance is dependent on the quality of teachers’ assessment tasks, and that authentic assessment can play a pivotal role in raising the quality of students’ learning and performance irrespective of their gender, ethnic group, and socioeconomic status. Authentic assessment can serve as a powerful mechanism to ensure equitable learning opportunities and outcomes for all students.
In a second study, Newmann et al. (2001) examined the effects of authentic assignments or assessments on students’ authentic intellectual work in the day-to-day classroom and students’ achievement in high-stakes standardized tests. Samples of classroom assignments were collected from 19 elementary schools in Chicago. The study involved approximately 5,000 students and their teachers in grades 3, 6, and 8. These grades were purposefully selected because of the relevance of using test scores from both the statewide and national testing programs. This allowed the researchers to “link teacher assignments both to student performance on state tests of reading, writing, and mathematics and to results from the national norm-reference tests of reading and mathematics” (Newmann et al., 2001, p. 16). In addition to test scores, teacher assignments in writing and mathematics were analyzed for their intellectual demands. A group of teachers from the Chicago public schools was trained to judge the quality of teacher assignments using scoring rubrics that consisted of the criteria for authentic intellectual work. Newmann et al. (2001) found that when teachers organized instruction around authentic assignments, students not only produced more authentic, intellectually complex work but also achieved higher scores in both statewide and national tests in reading and mathematics. Similar results were noted in some very disadvantaged classrooms. Newmann et al. (2001) also pointed out that the intellectual demands in teacher assignments or assessment tasks played a far more important role than a particular teaching strategy or pedagogical method in influencing student engagement in learning. Hence, professional development for teachers should focus on their capacity in designing and using curriculum materials and classroom assessments that include high authentic intellectual challenge.
Newmann et al.’s (1996) work, originating in the United States, has been adapted and expanded in the Queensland School Reform Longitudinal Study (Lingard et al., 2001). The criteria for authentic intellectual work provided the basis for the Queensland model of productive pedagogies, assessment, and performance (Lingard et al., 2001). In Lingard et al.’s (2001) criteria for productive assessment, the three Newmann criteria of authentic intellectual work were extended to include knowledge criticism, technical metalanguage, inclusive knowledge, and explicitness of expectations as new indicators. Similar to Newmann et al.’s (1996) authentic pedagogy, productive pedagogies were intellectually demanding, connected to the real world, supportive of student learning, and diversity valuing. Lingard et al. (2001) found that the levels of intellectual or cognitive demand of teachers’ assessment tasks were positively associated with the quality of students’ performance as evidenced in students’ written work. This important finding led to the New Basics trial of curriculum in grades 1‒9 in Queensland schools. The New Basics curriculum was aligned with productive pedagogies and rich tasks (i.e., authentic tasks). The trial yielded positive outcomes. As such, the use of rich tasks and teacher-moderated judgment of students’ work in response to rich tasks have become exemplary assessment practices in many Queensland schools. Such exemplary assessment practices are applauded by policy makers, school administrators, educators, and researchers around the globe. This has led to the Core 1 Pedagogy and Assessment project in Singapore (Luke, Freebody, Lau, & Gopinathan, 2005).
Both the Newmann et al. (1996) and Lingard et al. (2001) studies served as the basis for Koh and Luke’s (2009) study of Singaporean teachers’ assessment practices. As one of the world’s high-performing education systems, Singapore has launched a variety of educational reforms since the beginning of the 21st century. Like their counterparts in other developed countries, Singaporean teachers have been urged to implement new forms of assessment (i.e., authentic assessment and formative assessment) to capture higher-order learning outcomes in the intended curriculum. The Koh and Luke study was conducted to examine Singaporean teachers’ assessment practices as well as the quality of teachers’ assessment tasks and the quality of students’ work in grades 5 and 9 in seven subject areas: English, social studies, mathematics, sciences, Mandarin Chinese, Malay, and Tamil. It was the first large-scale empirical study of teachers’ assessment practices and the data were drawn from a representative sample of Singaporean classrooms. Following the framework of Newmann et al. (1996) and the work from Anderson and Krathwohl (2001), Marzano (1992), and Nitko (2004), Koh and Luke (2009) have devised nine criteria for assessing the quality of teachers’ assessment tasks and six criteria for assessing the quality of students’ work in response to the assessment tasks.
The nine criteria for assessment tasks were depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing, task clarity and organization, connections to the real world beyond the classroom, supportive task framing, student control, and explicit performance standards or marking criteria. The six criteria for assessing the quality of students’ work included depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing, quality of students’ writing or answers, and connections to the real world beyond the classroom (Koh, 2011a).
Brief descriptions of the criteria are as follows:
Depth of Knowledge
According to the revised Bloom’s taxonomy of intended student learning outcomes, there are three types of knowledge, namely, factual knowledge, procedural knowledge, and advanced concepts or conceptual knowledge (Anderson & Krathwohl, 2001). Factual knowledge is knowledge of discrete and decontextualized content elements (i.e., bits of information), while procedural knowledge entails knowledge of using discipline-specific skills, rules, algorithms, techniques, tools, and methods. Conceptual knowledge involves knowledge of complex, organized, and structured knowledge forms (e.g., how a particular subject matter is organized and structured, how the different parts or bits of information are interconnected and interrelated in a more systematic manner, and how these parts function together). All three types of knowledge are essential for student learning.
Knowledge Criticism
Based on models of critical literacy and critical pedagogy, knowledge criticism is a predisposition to the generation of alternative perspectives, critical arguments, and new solutions or knowledge (Luke, 2004). Knowledge criticism enables students to judge the value, credibility, and soundness of different sources of information or knowledge through comparison and critique rather than to accept and present all information or knowledge as given.
Knowledge Manipulation
Knowledge manipulation calls for an application of higher-order thinking and reasoning skills in the reconstruction of texts, intellectual artifacts, and knowledge. It involves organization, interpretation, analysis, synthesis, and/or evaluation of different sources of knowledge or information (Anderson & Krathwohl, 2001). Authentic assessments or tasks should provide students with more opportunities to make their own hypotheses and generalizations in order to solve problems, arrive at conclusions, or discover new meanings, rather than only to reproduce information expounded by the teacher or textbooks, or to reproduce fragments of knowledge and preordained procedures.
Sustained Writing
This criterion aims to gauge the degree to which the assessment task requires and generates production of extended chunks of prose. Authentic assessments or tasks must ask students to elaborate on their nuances/understanding, explanations, arguments, or conclusions through the generation of sustained written prose.
Task clarity and organization, student control, and explicit performance standards or marking criteria are conceptualized based on Marzano’s (1992) learning-centered instruction. The assumption here is that the explicitness of the procedures and criteria for the assessment task provides clear goals and explicit criteria and language for the assessment of value. The incorporation of these criteria into the classroom assessment provides students with ample opportunity to engage in formative assessment or assessment for learning, which contributes to their self-directed learning, independent learning, and critical thinking.
Task Clarity and Organization
The assessment task is framed logically and has instructions that are easy to understand, so that students do not misinterpret the task or miss information. The written instructions, guidelines, worksheets, and other textual advance organizers must be clear and well organized.
Connections to the Real World Beyond the Classroom
This criterion assesses the degree to which the assessment task and affiliated artifacts are connected to an activity, function, or task in a real-world situation.
Supportive Task Framing
Teachers’ scaffolding of an assignment or assessment task—that is, providing some structure and guidance—can assist students to accomplish a complex task (Nitko, 2004). There are three types of scaffolding: content, procedural, and strategic. For highly intellectual tasks, teachers should place more emphasis on strategic scaffolding.
Student Control
Teachers provide students with the opportunity to determine the parameters of a task such as topics or questions to answer, alternative procedures, tools and resources to use (e.g., textbook, Internet, or newspaper), length of writing or response, or performance or marking criteria.
Explicit Performance Standards/Marking Criteria
The assessment task is accompanied by the teacher’s clear expectations for students’ performance, and the marking criteria are made explicit to the students. Reference to only technical or procedural requirements (e.g., the number of examples, length of an essay or response) is not taken as evidence of explicit performance standards or marking criteria. This criterion underscores the importance of sharing scoring criteria with students explicitly, which is also a key criterion espoused by Wiggins (1989) and Gulikers et al. (2004). To ensure fairness and equity, students need to know in advance the specific and differentiated criteria for what may count as “value,” quality, or success at completion of the task.
Given that the theoretical underpinnings of the criteria for assessing the quality of students’ work are similar to those for teachers’ assessment tasks, they will not be repeated here. Readers who are interested in the criteria and indicators used to judge the quality of teachers’ assessment tasks and students’ work across different subject areas can refer to Koh (2011a).
Future Research: The Remaining Questions
In the context of professional and vocational training, Gulikers, Bastiaens, Kirschner, and Kester (2008) have argued that the notion of authenticity is subjective and students’ perceptions of the authenticity of an assessment or a task can influence the quality of their learning. Their study has shown that there is a difference between teachers’ and students’ perceptions of assessment authenticity. As such, it is important to take into account students’ perceptions of meaningfulness or relevance of an assessment or a task to their real-life situations. Further, this finding also supports another crucial aspect of authentic assessment task design, that is, students must be involved in the process of determining and negotiating the assessment or task parameters (i.e., student control).
There has been a substantial body of research in teacher professional learning and development in classroom assessment or formative assessment. Many of these studies have focused on formative assessment or assessment for learning and models of effective professional development. Koh (2011b) conducted a two-year intervention study with a group of elementary teachers in Singapore to examine the effects of ongoing, sustained professional development in authentic assessment task design on the teachers’ assessment literacy, specifically teachers’ capacity in designing and implementing authentic assessment tasks. To enhance teachers’ understanding and internalization of the criteria for authentic intellectual quality in designing authentic tasks, only five of the key criteria were used in Koh’s (2011b) study: depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing or extended communication, and connections to the real world beyond the classroom. The study demonstrated positive results in improving teachers’ assessment literacy through ongoing, sustained professional development in authentic assessment task design in English, mathematics, and science at the elementary school level. In addition, in-depth interviews with the participating teachers showed that their conceptions of authentic assessment had greatly improved by the end of the two-year professional development. In a second study, Koh, Burke, Luke, Gong, and Tan (in press) found that Chinese language teachers had difficulty incorporating certain knowledge manipulation criteria into their assessment tasks despite a quick grasp of the design principles of authentic assessment.
Webb (2009) has called for professional development in mathematics education to focus on “helping teachers to develop a ‘designers’ eye’ for selecting, adapting, and designing tasks to assess student understanding” (p. 3). Although the term “authentic assessment” was not directly used by Webb (2009), it can be inferred that authentic assessment is the most effective way of assessing student understanding across different subjects or disciplines.
Given that teachers need to have a “designers’ eye” (Webb, 2009, p. 3) or to be critical and intelligent consumers of high-quality authentic assessment or performance assessment, it is important for professional development and teacher education programs to provide both inservice and preservice teachers with ample opportunity to engage in authentic assessment task design and analysis of student work. Future research should focus on building teachers’ capacity in authentic assessment and assessment for learning through a critical inquiry approach in school-based professional learning communities or in teacher education programs. According to Wyatt-Smith and Gunn (2009), the critical inquiry approach refers to teachers’ ability to reflect on and understand the assessment processes and practices in actual sociocultural contexts in relation to four important lenses: (1) conceptions of the knowledge domains and competencies to be assessed; (2) conceptions of the alignment between assessment, teaching, and learning, and its enactment in practice; (3) teacher judgment practices in relation to standards, assessment task design, student work samples, and social moderation; and (4) curriculum literacies or discipline-specific language demands. To enable the power of authentic assessment to unfold in the classrooms of the early 21st century, it is essential that teachers be critical designers and reflective practitioners of classroom assessment tasks that support student learning and mastery of the 21st-century competencies.
Teachers’ capacity to design and implement authentic assessment is of paramount importance in the current era of competency-based education. In fact, authentic assessment has been used in International Baccalaureate programs and has also been incorporated into school-based assessments in several nations that perform highly on PISA: Singapore, Hong Kong, Finland, and Australia. However, it is worth noting that the success of authentic assessment initiatives can be hindered by changes in school leadership or governmental policies. For example, the No Child Left Behind (NCLB) Act of 2001 in the United States and the National Assessment Program—Literacy and Numeracy (NAPLAN, 2008) in Queensland, Australia have posed challenges to school-based, teacher-moderated assessment due to an overemphasis on “back to basics” and high-stakes accountability testing of students’ academic achievement.
In 2010, the launch of the Common Core Standards in the United States brought significant changes in curriculum, assessment, and instruction. The standards define the 21st-century knowledge, skills, and dispositions students should have mastered within their K‒12 education so that they are well prepared for achieving their academic and career aspirations as well as personal well-being in an increasingly complex and competitive world. Ideally, the Common Core Standards have created opportunities for the development and implementation of authentic assessments or performance assessments in English language arts and mathematics. However, a heavy focus on the use of student assessment data for accountability purposes has led to pushback from state governments, teachers, and parents. A lack of teacher autonomy in the design and use of assessments to help students achieve the 21st-century educational outcomes has defeated the original purpose of the Common Core. Hence, it is important for policy makers in the United States and other countries to model the Finnish education system, in which teachers are given full autonomy to develop and implement classroom assessments that support student learning.
- Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman.
- Archbald, D., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Reston, VA: National Association of Secondary School Principals.
- Arter, J. (1999). Teaching about performance assessment. Educational Measurement: Issues and Practice, 18(2), 30–44.
- Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.
- Cumming, J. J., & Maxwell, G. S. (1999). Contextualising authentic assessment. Assessment in Education: Principles, Policy & Practice, 6(2), 177‒194.
- Darling-Hammond, L., & Adamson, F. (2010). Beyond basic skills: The role of performance assessment in achieving 21st century standards of learning. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.
- Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16, 523‒545.
- Gulikers, J. T. M., Bastiaens, T. J., & Kirschner, P. A. (2004). A five-dimensional framework for authentic assessment. Educational Technology Research and Development, 52(3), 67–86.
- Gulikers, J. T. M., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2008). Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. Journal of Vocational Education & Training, 60(4), 401‒412.
- Kilpatrick, W. H. (1918). The project method. Teachers College Record, 19, 319–335.
- Klenowski, V., & Wyatt-Smith, C. (2010). Standards, teacher judgement and moderation in contexts of national curriculum and assessment reform. Assessment Matters, 1, 84–108.
- Koh, K. (2011a). Improving teachers’ assessment literacy. Singapore: Pearson Education South Asia.
- Koh, K. (2011b). Improving teachers’ assessment literacy through professional development. Teaching Education, 22(3), 255‒276.
- Koh, K. (2014). Authentic assessment, teacher judgment and moderation in a context of high accountability. In C. Wyatt-Smith, V. Klenowski, & P. Colbert (Eds.), Designing assessment for quality learning (Vol. 1, pp. 249‒264). Dordrecht, The Netherlands: Springer.
- Koh, K., & Luke, A. (2009). Authentic and conventional assessment in Singapore schools: An empirical study of teacher assignments and student work. Assessment in Education: Principles, Policy & Practice, 16(3), 291‒318.
- Koh, K., Burke, L. E. C. A., Luke, A., Gong, W., & Tan, C. (in press). Developing the assessment literacy of teachers in Chinese language classrooms: A focus on assessment task design. Language Teaching Research.
- Ladwig, J. (2009). Working backwards towards curriculum: On the curricular implications of quality teaching. Curriculum Journal, 20(3), 271‒286.
- Lingard, B., Ladwig, J., Mills, M., Bahr, M., Chant, D., & Warry, M. (2001). The Queensland School Reform Longitudinal Study. Brisbane: Education Queensland.
- Luke, A. (2004). Two takes on the critical. In B. Norton & K. Toohey (Eds.), Critical pedagogies and language learning (pp. 1–14). Cambridge, U.K.: Cambridge University Press.
- Luke, A., Cazden, C., Lin, A., & Freebody, P. (2005). A coding scheme for the analysis of Singapore classrooms. Technical paper. Singapore: Centre for Research in Pedagogy and Practice.
- Luke, A., Freebody, P., Lau, S., & Gopinathan, S. (2005). Towards research-based innovation and reform: Singapore schooling in transition. Asia Pacific Journal of Education, 25(1), 5–28.
- Marzano, R. J. (1992). A different kind of classroom: Teaching with dimensions of learning. Alexandria, VA: The Association for Supervision and Curriculum Development.
- Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessment. Educational Researcher, 23(2), 13–23.
- Meyer, C. (1992). What’s the difference between authentic and performance assessment? Educational Leadership, 49(8), 39–40.
- Newmann, F. M., & Archbald, D. A. (1992). The nature of authentic academic achievement. In H. Berlak, F. M. Newmann, E. Adams, D. A. Archbald, T. Burgess, J. Raven, & T. A. Romberg (Eds.), Toward a new science of educational testing and assessment (pp. 71–84). Albany: State University of New York Press.
- Newmann, F. M., Bryk, A. S., & Nagaoka, J. K. (2001). Authentic intellectual work and standardized tests: Conflict or coexistence? Improving Chicago’s schools. Chicago: Consortium on Chicago School Research.
- Newmann, F. M., Marks, H. M., & Gamoran, A. (1996). Authentic pedagogy and student performance. American Journal of Education, 104(4), 280–312.
- Nitko, A. J. (2004). Educational assessment of students (4th ed.). Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
- Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the literature. Practical Assessment, Research & Evaluation, 13(4), 1‒10.
- Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30(3), 215–232.
- Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4‒14.
- Shepard, L., Hammerness, K., Darling-Hammond, L., Rust, F., Snowden, J. B., Gordon, E., … Pacheco, A. (2005). In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world (pp. 275–326). San Francisco: John Wiley.
- Shulman, L. S. (2005). Signature pedagogies in the professions. Daedalus, 134(3), 52–59.
- Webb, D. C. (2009). Designing professional development for assessment. Educational Designer, 1(2), 1–26.
- Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703‒713.
- Wiggins, G. (1998). Educational assessment: Designing assessments to inform and improve student performance. San Francisco: John Wiley.
- Wyatt-Smith, C., & Gunn, S. (2009). Towards theorising assessment as critical inquiry. In J. J. Cumming & C. Wyatt-Smith (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 83–101). London: Springer.