
Printed from Oxford Research Encyclopedias, Anthropology. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).

date: 06 May 2021


Mary Odell Butler, University of Maryland at College Park


Evaluation anthropology is an organic synthesis of evaluation and anthropology in which each reinforces the other. Anthropology contributes the theory of culture as a primary mode of human adaptation, ethnography, and a methodology that is sensitive to the context embeddedness of human activity. The evaluation side adds the rigorous science needed for evaluations to be credible to decision makers. This science includes grounding analyses and conclusions in evidence and developing a rationale that permits evaluation users to reconstruct the arguments underlying conclusions. Program evaluations are concerned with the value of human interventions in achieving important social goals. They form the evidence base for maintaining, changing, or eliminating programs, decisions that affect communities, careers, finances, and the welfare of both staff and clients.

There are special ethical concerns in evaluations because of the risk to program staff who provide evaluators with information about their agencies and programs. Confidentiality is critical because programs are a “small world” in which opinions and speech mannerisms can permit identification by those who work in the same or similar programs. Thus, evaluators must be cautious about the use of quotes, position descriptions, and attributions to avoid professional damage to program staff and clients.

Program evaluation has become an important field for practicing anthropologists. Anthropologists who wish to do evaluations should network with other anthropologists doing evaluations and with organizations that employ evaluators, read about and attend seminars on evaluation theories and methods, and consider adding skills such as network analysis, economics, and decision theory that will add to their value as members of evaluation teams.


This article describes the theoretical and methodological basis of evaluation and links it to the practice of anthropology. It describes an approach that combines the ethnographic basis of cultural anthropology with the scientific assessment of activities that characterizes the profession of evaluation. The long, solitary fieldwork characteristic of traditional ethnography has become less accessible to anthropologists in recent times because of funding limitations and the movement of anthropologists out of the academic contexts that supported such research (Trouillot 1991, 19). Evaluation provides a strategy to investigate culture that is connected to contemporary problems, deepens the anthropological understanding of how culture operates, utilizes the special skills of ethnography, and produces useful output to guide individuals, communities, and organizations as they seek to meet human needs under changing conditions.

Definition and Features

Evaluation is a scientific endeavor conducted for the purpose of describing the worth, value, and effectiveness of some activity directed to serving a human need or solving a human problem (Shadish, Cook, and Leviton 1991, 20). Evaluands, the term for the entity being evaluated, may be programs, projects, products, media campaigns, and curricula—almost anything that is delivered to communities of humans. Evaluation is defined by usefulness to a client. It is oriented to action and generates knowledge for decision making (Centers for Disease Control and Prevention 1999). It is applied research. Although experimental methods and data from medical assessments may support evaluation, the field itself is not to be confused with clinical trials or medical evaluation.

Evaluation calls on research skills that are familiar to most people trained in a social science field, including anthropology. To do an evaluation, one must develop a research question, which evaluators often call an evaluation question. One then constructs a research protocol that describes the research in detail and guides evaluation activities, including data collection, data analysis, how findings are to be constructed from the data, and how results are to be reported. Instrumentation is part of the protocol. Evaluation research may differ from traditional ethnographic fieldwork in that it frequently integrates several disciplines and often uses mixed methods of data collection and analysis.

Evaluation is a profession with its own principles, competencies, and standards.1 It has a body of method and theory of its own, not deducible from any other discipline (Patton 2015; Rossi, Lipsey, and Freeman 1999; Weiss 1997). To become an evaluator, one must take courses, read the literature, and participate in professional meetings. But once discovered, evaluation can become an arena in which to do anthropology. I was impressed with the scientific rigor of evaluation and its flexibility as a way of understanding why the tasks and activities that people try to do succeed or fail.

There are many things—programs, products, projects—that are the subject of evaluation. All evaluations are derived from similar intellectual foundations. They share similar methods and many theoretical orientations. The distinctions among types of evaluations affect the structure and tasks of people’s jobs and the kinds of clients or organizations for whom they work. For this article, the focus is on program evaluations. Programs are organized sets of activities designed to achieve some purpose or meet some need in communities and are usually administered by government agencies, schools and universities, charitable organizations, religious organizations, or community groups.

The conditions for a successful program are developing a program that can work (efficacy), implementing it correctly (program implementation fidelity), delivering it to the right people (as specified in the program plan), and determining whether it achieves what it was supposed to achieve in the short and long run (outcome and impact evaluation). Types of program evaluation that address these conditions include evaluability assessment, process evaluation, outcome or impact evaluation, and cost-effectiveness studies.

Evaluability assessment asks whether the program is designed well enough and is coherent enough that it can be implemented correctly (Rossi et al. 1999, 157). If it is not, there is no sense in evaluating the program, since one cannot determine whether success or failure has anything to do with its design or its correct implementation.

Process evaluation seeks to discover if the program is being implemented correctly and completely. Part of process evaluation may be regular monitoring of implementation. Without effective process evaluation, you cannot attribute the outcomes that you observe to the program (Rossi et al. 1999, 176). Outcome or impact evaluation occurs when, given evaluability and a favorable process evaluation, the evaluator wishes to assess the degree to which the program has achieved its desired outcome in the short term and long term (Chen 2005, 195).

Cost-effectiveness evaluation often appears as one element of a more extensive evaluation design. It is used to estimate the worth of the evaluand, given its merit (Rossi et al. 1999, 374). Put another way, even if the program is a good one, is it worth what it costs? Cost-effectiveness studies raise many measurement issues. Cost-effectiveness is a fairly crude measure of program costs per unit of outcome (e.g., the cost of an intervention per client served). Cost–benefit analysis is a much more rigorous measure that requires estimates of both direct (monetary) and indirect (e.g., opportunity costs, lost income) costs (Rossi et al. 1999, 72). Cost-effectiveness studies are more feasible and are usually sufficient for most program evaluations (Butler 2015, 26).
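A worked illustration of the crude cost-effectiveness measure described above, cost per unit of outcome, follows. The program cost ($120,000) and the number of clients served (400) are hypothetical figures chosen only to show the arithmetic; they do not come from any evaluation discussed in this article.

```latex
\[
\text{cost per unit of outcome} = \frac{\text{total program cost}}{\text{units of outcome}}
\qquad\text{e.g.}\qquad
\frac{\$120{,}000}{400\ \text{clients served}} = \$300\ \text{per client served}
\]
```

A cost–benefit analysis of the same program would additionally need estimates of indirect costs such as opportunity costs and lost income, which is why it is the more demanding of the two approaches.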

Formative and summative evaluation are important terms often encountered in evaluation. These concepts come from the work of Michael Scriven and distinguish between evaluations that generate input used to improve program design (formative) and those that demonstrate program effectiveness (summative). Formative evaluation is done to improve the program and may be done in the early days of implementation. Summative evaluation is done to determine the merit of the program in light of its outcomes. Scriven quotes Robert Stake, another evaluator: “When the cook tastes the soup, that’s formative; when the guest tastes the soup, that’s summative” (Scriven 1991, 19).

Organizations and communities use evaluation to support program planning, and in government agencies program planning and evaluation are often housed in the same office. It is, at least in theory, this feedback of evaluation into organizations that governs the fate of programs. In practice, however, the role of evaluation is often politicized. For example, some results do not permit action, even if they are accurate. Cost-effectiveness data compiled as part of evaluations may tell funders to eliminate crucial public health programs (like immunizations) or to close schools that are not showing good enough test results for what they cost. Such actions are not usually feasible because terminating these programs would impose high costs on the health and welfare of the populations they serve.

Evaluation as Science

The basic scientific requirement for evaluation is specification of an evaluation question, developing evidence to answer the question using a clear and technically sound method, and analysis that grounds findings in evidence. Evaluations are designed to support programmatic and funding decisions that affect the professional careers of program designers and managers and that may involve large sums of money. People undertake evaluations because they need to know something to support a decision rather than because they want to test a theory (Butler 2015, 11).

The assertion that evaluation must be science is a controversial one. A vast body of literature in both anthropology and evaluation argues for naturalistic approaches that are grounded in the experience of participants rather than an a priori research design (Lincoln and Guba 1985, 70; Strauss and Corbin 1990, 43; Wolcott 2008, 69). Evaluation is applied research that must be grounded in rigorous science if clients are to trust it enough to act on it. Like all science, it must focus on testable hypotheses, evidence-based claims, falsifiability, and replicability (Andersen and Hepburn 2016).

Questions asked of evaluations should be specific and clear enough that the scientist can develop criteria for knowing whether the predictions of the study are met. Data become evidence when they are analyzed or summarized in a way that shows the analytic linkage of data to hypotheses. Logically, hypotheses cannot be confirmed, only falsified: a single valid contrary observation is enough to falsify a hypothesis, whereas confirming it would require observing all possible cases (Kraft 1974, 190; Popper 2014, 40). Finally, both the design and the implementation of the evaluation must take into account several principles that protect the quality of the data (see Table 1).

Table 1. Quality Control Criteria for Evaluation Data

Construct validity: The concept underlying the research adequately represents the evaluand.

Internal validity: The evaluation tests for and demonstrates a likely causal relationship between the independent variables and the evaluand.

External validity: It is demonstrated that the results of the evaluation can or cannot be generalized to other instances of similar evaluands.

Reliability: It is demonstrated that the methodology of the study can be repeated with similar outcomes.

Note: Modified from Yin 2009, 40–45.

Not all of these criteria will be met completely by all evaluations. For example, external validity is not essential to an evaluation that is exploratory in nature, seeking to discover the relevant conditions affecting some evaluand. However, all of these criteria must be considered and built into the evaluation design and discussed in the reporting so that users of evaluation may be able to trust in the scientific soundness of the evaluation design and execution.

Scientific Paradigms in Evaluation

Both evaluation and anthropology lie on a continuum of understanding about the conditions of knowledge development that have emerged over the past few centuries (Kuhn 2012, 145). The entire continuum uses the scientific method. Paradigmatic differences arise in the position of scientists on the locus of causation in human behavior. Some of the dimensions of paradigms in social science are shown in Figure 1. Approaches to research vary across these continua.2

Figure 1. Some of the dimensions of paradigms in social science.

These paradigms contain concepts, assumptions, appropriate questions, and useful answers. They underlie how researchers, evaluators, and others study people and phenomena, but the paradigms themselves are seldom directly examined; they just seem “right” to those who have been educated within the community that uses them. Paradigms become obvious only when they are pushed to the surface in opposition to competing paradigms (Kuhn 2012, 10–11). Social scientists in particular tend to be quite invested in paradigmatic positions. Paradigms are so fundamental to thinking that it is often difficult to articulate the assumptions underlying our research, because those assumptions seem obvious to those who hold them. Anthropological archaeologists and biological anthropologists are often positivist in orientation. Cultural and social anthropologists are usually most familiar with the constructivist position. Evaluators can be found arguing both positions.

Positivists believe that there is an objective truth external to the observer that can be explored empirically using a set of rules encoded in the scientific method (Kuhn 2012, 10; Popper 1974, 13). Others (e.g., post-positivists, constructivists, interpretivists, naturalists) believe that reality is constructed by observers and participants in a situation and can only be discovered by exploration and cross-checking with those who experienced it (Clifford 1999, 8; Fox 1991, 1; Lincoln and Guba 1985, 71; Strauss and Corbin 1990). Different scholars use different terms for their views of the world, each differing slightly from the others. This article uses the word “constructivist” for consistency (Guba and Lincoln 1989, 124; Schwandt 1994, 125).

Positivists believe that knowledge is built by direct observation and is then tested against hypotheses about how the world operates. Perhaps most importantly, positivists assume that the scientific method is applied in the same way no matter what kind of problems are being considered, although the kinds of data collection, analysis, and reporting may differ. Positivist science focuses on close control of the methods used and questions asked and is often quantitative in approach. It is assumed that the application of the scientific method to social problems, as to those of the natural sciences, produces ever more agreement about observations, leading to a closer description of reality (Popper 2014, 108).

Those of the constructivist school believe that knowledge is indirect and filtered through the observer. Therefore, understanding is built up by inference from observation grounded in emergent properties of situations (Schwandt 1994, 125–126). Constructivist thinking is concerned with methods and questions that are context- and case-specific and are built on observation and careful documentation and analysis of what is observed.

Constructivists are trained in science, and the constructivist paradigm is often guided by scientific principles. However, they see social science as a different kind of inquiry, requiring a different epistemology than that used in the natural sciences. The investigator tries to surface an understanding of the relations among elements in a specific case or situation using inductive methods. The results of such inductive reasoning can never be conclusive, but properly analyzed and presented, they can be persuasive (Lincoln and Guba 1985, 101). The validity of qualitative research is assessed by such methods as agreement of multiple observers, triangulation across data sources, and consistency of findings across multiple researchers and multiple cases (Kidder and Fine 1987, 58; LeCompte and Schensul 2010, 195). The constructivist paradigm is often qualitative in approach, resting on teasing out participants’ perceptions of what is happening as well as the symbols and relationships observed in the environment.

The Quantitative–Qualitative Issue

In evaluation, the relative utility or superiority of qualitative and quantitative approaches is a frequent topic of discussion proceeding from these basic epistemological positions (House 1994, 15; Reichardt and Rallis 1994a). Quantitative methods focus on numeric data and analyze them using mathematical methods, most often statistics. Quantitative evaluations often try to infer the properties of populations from a subset of respondents chosen to be representative of some category of subject (Snedecor and Cochran 1967, 4).

Qualitative methods describe elements in narrative or text form, and these data are analyzed using content analysis techniques (Miles and Huberman 1994; Noblit and Hare 1988; Yin 2009). Quantitative methods are often used in approaches near the positivist end of the paradigm continuum, and qualitative methods near the constructivist end. However, this is not a rigid rule. In fact, research designs that incorporate both kinds of data collection into mixed-method approaches are very common among social scientists no matter where they lie on the theoretical spectrum.

Anthropologists are often invited onto evaluation teams because they are perceived to be experts in qualitative methods. Qualitative research is about description rather than enumeration. The basic assumption is that if situations are described in detail, in what Clifford Geertz called “thick description” (Geertz 1973, 6), the relationships among elements can be inferred through logical rather than statistical inference. Most qualitative research treats the observed entity as dependent on the physical, social, and interpersonal context in which it is embedded. When context is “controlled out” or eliminated from consideration, the fundamental operations that are the focus of the program and of the evaluation may be lost. Qualitative evaluations seek understanding by carefully compiling information on attributes or characteristics of some unit of analysis (a program, a phenomenon, a population, or an individual) and producing findings based on observations and discussions with people. The unit of analysis for a qualitative study must be specified so that all stakeholders can understand unambiguously what is or is not included in the study. Data are then analyzed using content analysis or another diagnostic method that permits characterization and synthesis of findings across units of analysis (Marquart 1990; Miles and Huberman 1994; Yin 2009).

In evaluations that combine quantitative and qualitative methods, differences of opinion about the utility or superiority of each can bring theoretical misunderstandings to the surface; for example, team members working from different paradigms may approach evaluation design differently. However, because the limitations of qualitative and quantitative methodologies differ, their combined use in mixed methods evaluations may be stronger than either used alone. For example, Reichardt and Rallis (1994b) cite a study of school change in which the case study revealed a type of school, the dynamic school, that successfully implemented changes, but the case study data could not show how many schools exhibited this set of characteristics. When the study incorporated quantitative analysis of a large U.S. Department of Education data set, the dynamic school pattern turned out to be widespread (Goldring and Rallis 1993). This kind of complementarity produces evaluations that view programs from several perspectives, strengthening confidence in the findings.

For all practical purposes, paradigm differences matter less than collaboration around results. In studies that combine quantitative and qualitative approaches, results can be analyzed together provided both types of findings are analyzed in a manner that respects the differing analytic assumptions for each data source. For example, it is not advisable to count the number of times focus group members bring up some variable defined for quantitative data, nor is it justifiable to infer unmeasured motivations from regression coefficients. For qualitative and quantitative work to operate well together, both kinds of data must be analyzed in such a way that data of all types may be used as evidence (Greene and Caracelli 1997).

When it comes to selecting research methods, evaluation is a matter of identifying the best possible methods for the situation under investigation and then implementing them in a technically sound fashion (Patton 2018). In many of the evaluations in which the author has participated, programs have been assessed using a mixed-method approach with both qualitative and quantitative components. Frequently, this approach includes a survey, an assessment of attitudes in a population, an ethnographic study of program design and implementation, and an assessment of outcomes. Substantive information on how the evaluand relates to demographic factors, how population attitudes are distributed, and how many people have engaged with the program is central to generating and interpreting findings across mixed methods.

Not all evaluators agree on the rules for selecting methodologies to fit evaluation conditions. In 2002, the U.S. Department of Education promoted a methodological hierarchy that prioritized scientifically based approaches, defined as randomized controlled trials (RCTs), experiments, and quasi-experimental research. The highest priority went to RCTs, in which subjects are randomly assigned to experimental and control groups before any intervention begins. Other experiments and quasi-experiments ranked below them, followed by correlational studies, with case studies at the bottom.
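As a minimal sketch of the random assignment that defines an RCT, the following Python function shuffles a list of subject identifiers and splits it into experimental and control groups. The function name `randomly_assign`, the fixed seed, and the equal-split rule are illustrative assumptions, not part of the Department of Education's definition or of any particular evaluation.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Hypothetical helper: randomly split subjects into two groups of (nearly) equal size.

    A seeded random.Random instance makes the assignment reproducible, so it
    can be recorded before any intervention begins.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)  # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return {
        "experimental": shuffled[:half],
        "control": shuffled[half:],  # receives the extra subject if the count is odd
    }


# Usage: assign ten hypothetical subjects, numbered 1 to 10.
groups = randomly_assign(range(1, 11), seed=42)
print(groups["experimental"])
print(groups["control"])
```

Because group membership depends only on chance, pre-existing differences among subjects should, on average, be balanced across the two groups, which is what allows an RCT to attribute outcome differences to the intervention.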

There has been pushback from evaluation experts on the appropriateness of the Department of Education’s hierarchy of methods. Holding the RCT as the “gold standard” omits the reasoning process that many experts call “evaluative thinking,” that is, consciously and scrupulously applying the most appropriate methodology available for the problem at hand (Vo and Archibald 2018). RCTs and experiments are well understood by psychologists and medical researchers and may be the best method for determining the outcomes of interventions with individuals. But these designs are seldom possible with programs because it is difficult to control the study well enough to avoid cross-contamination of experimental and control groups. These rigorous approaches are also prohibitively costly for evaluations in which the unit of analysis is a program implementation or a community, because they require samples large enough to detect significant differences. Finally, these designs control out any systemic effects that may be present, and such system effects are often what an evaluation is seeking to uncover.

No single conceptual approach is appropriate for all evaluations. Each type of evaluation has its own purpose as well as its strengths and weaknesses. Just as a survey is different from a set of focus groups, an ethnographic evaluation produces insights that differ from those of a quasi-experimental evaluation. If the client needs to know about the outcome or impact of an intervention on a population, a survey is essential, especially if statistical or quantitative evidence matters. If the client is interested in explaining why a particular program or process functions as it does, an ethnographic or qualitative evaluation is useful. To understand differences among demographic groups, an experimental design for the evaluation is appropriate.

Evaluation History in the United States

I focus this account of the development of evaluation as a discipline on the United States; it therefore reflects my own experience rather than the importance of evaluation worldwide. Historically, evaluation is an American invention, and the vast majority of evaluations have been conducted in the United States (Connor 1985, 20). However, evaluation has become a worldwide practice and is important in many countries (Chelimsky and Shadish 1997, 145; Chelimsky 1997, 54). Organizations are in place to provide professional support for evaluators across regions and around the world. For example, the International Organization for Cooperation in Evaluation (IOCE) has identified 158 voluntary associations for program evaluators in 110 countries.3 Another important trend is the appearance of indigenous evaluators working with their own people or as evaluators in non-indigenous settings. These evaluators are clearly an asset when working with their own groups, but they can also be found in all of the settings in which evaluations are done. They can sensitize non-indigenous evaluators to the various cultural matrices within which they collaborate with indigenous peoples (Cram 2018, 121).

The emergence of evaluation in the 20th century closely followed the development of social science more generally. Several trends in social policy in the United States formed a background for this field. They include the development of educational innovations from the 1930s to the 1950s, the War on Poverty in the 1960s, and the emphasis on accountability in the 1980s and 1990s (Shadish et al. 1991, 22). Qualitative work was traditionally the “handmaiden” of quantitative evaluations. People did qualitative studies to support the design and instrumentation of quasi-experimental evaluations.

The beginning of evaluation in the United States is linked to the work of Ralph Tyler in the Eight-Year Study of New York City Schools, begun in 1934 (Tyler 1991). The Eight-Year Study set out to evaluate the effectiveness of open, student-driven activities on educational outcomes. As an educational psychologist, Tyler favored experimental uses of evaluation that could show measurable quantitative outcomes. However, he was aware of the effect of socioeconomic differences on educational outcomes. He used this socioeconomic information either impressionistically or as variables that classified responses into categories.

Experimental and quasi-experimental evaluation became important in the 1960s and early 1970s and is still critically important in evaluation (Cook and Campbell 1979). But the basic lineaments of this approach came from a particular intellectual context of experimental evaluation, one that focused on rigorous scientific methodology and measurement of program effects using mathematical models (House 1994; Shadish et al. 1991, 121). First, there was a widespread belief that all social problems could be solved by carefully thought-out social programming, particularly in the United States during the era of “the Great Society.” Social programming was never again as important and well funded as it was under the Lyndon B. Johnson and Richard M. Nixon administrations. Pilot programs and demonstration programs were plentiful, and all of them required evaluation for accountability and effectiveness (Alkin and King 2016). The U.S. federal government was focused on developing programs in housing, education, and employment to meet the needs of the poor. Such programs were easier to assess using experimental methods because demonstration projects were implemented under ideal conditions: substantial funding, plenty of staff, and time to incorporate evaluations as part of program design (Campbell and Stanley 1963; Scriven 1967).

During this period in the United States, the belief in a rational, scientific approach to accomplish almost any goal was a central cultural tenet. The first moon landing occurred in 1969. Computers came into widespread use in the 1960s, and people were just beginning to glimpse their potential for use in social research. The microprocessor was introduced in 1971. People believed in science. It was a time of great optimism and faith in the potential of science to solve not only technical problems but social ones as well (Butler 2015, 31; Rossi et al. 1999, 15).

Evaluation theory began to emerge in the field of education in the 1940s, but program evaluation emerged in many different forms in connection with the federal War on Poverty in the 1960s. Qualitative and mixed-methods evaluations also began to “come into their own” at this time, eventually culminating in the kinds of evaluation that anthropologists are well qualified to do (Chelimsky 1997, 61). Five evaluation approaches often used by evaluation anthropologists are described here: Michael Quinn Patton’s Utilization-Focused Evaluation (Patton 1997); Patton’s Developmental Evaluation (Patton 2011); Case Study Evaluation, developed fully by Robert Yin (2009); Responsive Evaluation, developed by Robert Stake (2006); and Empowerment Evaluation, developed by David Fetterman along with many others (Fetterman 1994; Fetterman, Rodríguez-Campos, and Zukoski 2018; Fetterman and Wandersman 2005). These are not the only approaches anthropologists use to evaluate programs, and they may be used together or in combination with other approaches. All of them are suitable for mixed-method approaches.

Utilization-Focused Evaluation

Michael Quinn Patton’s Utilization-Focused Evaluation has been used widely since the late 1990s. Patton, trained as a sociologist, appreciates anthropological approaches and has published in the anthropological literature (Patton 2005). One of the most important aspects of his thinking about evaluation is his belief that people are the critical factor in making any program work.

Utilization-focused evaluations have a built-in user focus. Potential users participate in the design, implementation, and reporting of the evaluation results such that users have ownership of the evaluation from the beginning. Patton identifies those who are likely to be primary users of evaluation—program designers, program staff, and potential audiences for programs—and engages them in negotiation about the evaluation questions, the evaluation design, and the kind of process and information that will be useful to them. Patton’s approach is politically conscious. He is articulate on the manifest and hidden political impact of evaluation and insists that political contexts be unearthed (Patton 1997, 3).

Developmental Evaluation

Developmental Evaluation, also developed by Patton, takes into account the complex systems in which many programs are introduced and operate. Developmental evaluation supports ongoing innovation during the course of an evaluation, freeing evaluators from the common assumption that the evaluation design must be frozen during outcome or summative evaluation (Patton 2011).

Developmental evaluations rely on complex adaptive systems (CAS) models when considering local programs as part of more extensive program designs. CAS are non-linear and dynamic, accommodating internal and bottom-up changes as well as external ones. They are very sensitive to initial conditions, so that a small change at the local level can reverberate through the system of multiple implementations as other parts of the system adapt to it (Lansing 2003). The CAS model is useful to evaluation because an innovation in a local program can, and often does, radiate to other entities and adapt to changes in context or other developments. The dynamic nature of the CAS accommodates these kinds of changes without pretending that conditions in systems can ever be factored out of observation.

I once worked on an evaluation of a Centers for Disease Control and Prevention (CDC) program that supported providers delivering services to HIV-positive men who have sex with men (MSM) in advising their clients to be tested and treated for syphilis, a major risk factor for HIV transmission. The evaluation covered implementations in HIV clinics in eight cities located in all parts of the United States. Management of HIV and sexually transmitted diseases (STDs) is sensitive to local conditions: the risk behaviors of the local MSM population, the diffusion of information within and around that population, and community attitudes and policies toward HIV and homosexuality. There was no “typical” manifestation of this program. Moreover, the effectiveness with which programs were delivered was highly sensitive to small changes in policy or in attitudes toward homosexuality and HIV (Hoover et al. 2010).

At the time this evaluation was done, the CAS model was not in widespread use. To accommodate the variability in local programs, we embedded the evaluation in a contextual matrix, so that each program was analyzed as operating at the center of a system with multiple levels of variables. The core of the model was provider–patient communication, which was embedded analytically in several levels of context: individual characteristics, health system effects, community differences, and legal and policy conditions. The model was complex and sometimes quite awkward. For example, the kinds of data available, especially contextual data, varied greatly from one community to another. Nevertheless, the model succeeded in incorporating the considerable variability in local programs.

Case Study Evaluation

Case study evaluations are often done by anthropologists. In my own work for government clients, I used case study evaluations more than any other type. There are many varieties of case study; the approach described here is taken from the work of Robert Yin. A case study as used in the context of evaluation is “an empirical inquiry that investigates a contemporary phenomenon in its real-life context using multiple data sources” (Yin 2009, 18). The strength of case studies in evaluating programs comes from their capacity to provide testable descriptions of programs in situations where it is difficult to distinguish programs from their context. For example, substance abuse prevention programs, no matter how faithfully implemented, depend on where they are implemented, in terms of both rural–urban context and, to some extent, geographic location. It is seldom possible to argue that context is constant across multiple implementations. The case study evaluation approach treats context as a variable, a characteristic of the program or of a specific program implementation. Case studies can be exploratory, descriptive, or explanatory, or they can test hypotheses, using logical inference to derive conclusions from the systematic comparison of evidence and from deduction (Yin 1993, 3).

Responsive Evaluation

Responsive evaluation, developed by Robert Stake (1980), uses the perceptions of those who participate in the program as guides to evaluation design, implementation, and interpretation in a case study format. The most important perspectives come from people who are in some way stakeholders in the program. However, control of the evaluation remains with the evaluator; participants do not control design or interpretation, as they do in, for example, empowerment evaluation, discussed below. It is the responsibility of the evaluator to collect the perceptions, draw tentative conclusions, and report findings in such a way that readers can develop their own interpretations of effectiveness, from their own perspectives, as they use the evaluation (Stake 1980, 76).

Stake’s evaluation strategy distinguishes between the program design that governs all program implementations and the specific implementations that are the output of the design. Stake defines the case as a system with complementary components that are assembled to achieve some purpose (Stake 2006, 2). Cases are studied as examples of a wider universe of similar cases with the same purpose. Stake calls the two sides of this relationship, a single case of a program and the universe of programs operating under a single rubric, the “case” and the “quintain” (Stake 2006, 4). A case is a single instance or sample, while a quintain is the broader set of similar entities from which the case is drawn. For example, the case may be the drug-free school activities sponsored by your local high school. The quintain from which this case is drawn is all drug-free school programs in New York City, the State of New York, or the United States. Stake asserts that evaluators study the similarities and differences among cases to understand the quintain better.

Stake’s methodology focuses attention on both the generality and the specificity of findings. Scientists tend to place more emphasis on the generalizable, while many evaluation professionals and practitioners prefer more particular findings on which they can act. It is best to incorporate both the general and the particular in evaluations. Yet, with a finite budget and finite time, evaluators must often choose where to put the emphasis. They must negotiate a balance between attending to the elements that tie the quintain together and the situational details of individual cases. The design of the evaluation involves identifying what needs to be known about the quintain, studying each case in terms of its own circumstances, interpreting patterns within each case, and then analyzing patterns across the cases to arrive at general findings about the quintain.

Stake’s distinction also offers a way to look at programs with multiple implementations that are embedded in their own contexts and have their own, often idiosyncratic, characteristics. The case–quintain approach allows variation in the questions asked of individual cases while still addressing the larger issues driving the evaluation.

Empowerment Evaluation

Empowerment evaluation came to evaluation from anthropology and was developed by David Fetterman, an anthropologist. Empowerment evaluation is the use of evaluation concepts, techniques, and findings to foster improvement and self-determination on the part of evaluated people or programs (Fetterman 1994; Fetterman, Kaftarian, and Wandersman 1996). Control of the evaluation is vested in the community affected by the program, project, or product being evaluated. Community members design, implement, analyze, and make recommendations based on their own needs. The evaluator is there as a facilitator and technical expert.

Many in the evaluation community have questioned whether empowerment evaluation can be considered evaluation at all, since it deviates from the traditional definition of evaluation as a study of the quality, effectiveness, or efficacy of some intervention. From an ethical perspective, empowering one group may result in the disempowerment of another. The choice of whose values to support and advocate is not clear in any absolute sense but relies on the judgment of the evaluator. Such issues revolve around the justice and propriety of doing evaluations that take strong social positions (Cousins 2005; Miller and Campbell 2006; Smith 2007).

Power relations between the ethnographer and members of the population studied are important to anthropologists. Fetterman clarifies the differences among collaborative, participatory, and empowerment evaluation approaches in terms of the control exercised by the evaluator (Fetterman et al. 2018, 2). Collaborative evaluations are designed and implemented by the evaluator in consultation with the stakeholders, who remain involved throughout the evaluation, although control of the evaluation always remains with the professional evaluator. Participatory evaluations are jointly controlled and implemented by the evaluators and the participants, and they encourage participants to take a full part in the design, implementation, and reporting of the evaluation. In empowerment evaluation, the evaluation is controlled by program staff, program participants, and the community, with evaluators acting as critical friends and coaches who help keep the evaluation on track and provide support as requested. Differences in the researcher–subject relationship may produce different results. For example, in doing case studies, it can be critical to include more than one site visit in the design if at all possible. I have found that respondents are much more forthcoming in on-site visits after the first. Trust and rapport improve with every site visit.

Key Concepts Anthropologists Bring to Evaluation

Some of the critical assets that anthropologists can bring to evaluation are deep understandings of culture as it is lived, including how it influences thought and how it operates. The method–theory nexus that is ethnography is the principal research approach of cultural anthropology. Ethnography supports extracting the diverse perspectives of stakeholders and incorporating them into evaluation design. And it is often a method for structuring data collection and data analysis. Anthropologists have experience working as participants in communities. For traditional anthropological fieldwork, the first step is building rapport with people in the field site and bringing them into the research as soon as is feasible. This trust-building process is equally important in evaluation.


Culture has always been difficult for anthropologists to describe. The uncertainty of anthropology about what precisely comprises culture is visible in the numerous definitions of culture that appear in the anthropological literature. Most students of anthropology wrestle with the culture concept through undergraduate and graduate school and continue to do so throughout their careers. In my own graduate work, I was assigned a classic work of anthropology that presented some 490 definitions of culture (Kroeber and Kluckhohn 1954). From this exercise I learned that, for most anthropologists, defining culture is an arbitrary and heuristic exercise: culture is defined in a way that is not misleading so that the work of describing its manifestations can proceed. Following tradition, I define culture as the learned, shared ideas and behaviors that people have because they live in a particular group of human beings (Harris 1980, 106).

Anthropologists contribute their understanding of culture from both emic and etic perspectives (Harris 1968, 569; Pike 1954, 8). “Emic” is the interpretation of culture from the perspective of the insider. In evaluation, the insiders are the stakeholders—program participants, program staff, and volunteers. The test for goodness of fit of an emic explanation is the agreement of those within the observed culture that the anthropologist’s description is an accurate explanation of what they are doing with a particular aspect of cultural behavior. One gets this perspective by asking the participant, “What is happening here?” or “What would you do in this situation?” “Etic” is the interpretation of culture from the perspective of the outsider, often the community from which the ethnographer comes. In evaluation, the relevant outsider is often the client. The test for goodness of fit is the agreement on the part of the outsider (or scientific) community about what is, in fact, occurring (Butler 2015, 62).

Most evaluators also recognize culture as a factor in their work, but simple recognition is not adequate for most evaluations. Normally the culture surrounding the evaluand must be explored rather than taken at face value. Anthropologists bring with them an understanding of culture as a dynamic system rather than a set of descriptions, a sensitivity to the variability of culture within single populations, a sense of the nature of cultural adaptation to changed circumstances, and a feel for the sometimes dramatic difference between what people say they do and what they are observed to do. Another part of sensitivity to culture has to do with understanding language and the workings of symbol systems. People do not always say what they mean or mean what they say. An understanding of the arbitrary linkage of symbols to the things they symbolize helps in disentangling the meaning of utterances from speech habits and differences in word usage. A key element in any evaluation is systems thinking, especially in modeling how programs function for the purpose of evaluation (Hargreaves and Podems 2012). Even grounded theory approaches, which start from an inductive position, must specify a research question and the boundaries of what will be studied (Strauss and Corbin 1990, 36).

The most important contribution of the culture concept to evaluation is that the same ideas, social organizations, and technologies may have different meanings in different groups of people or even within the same group of people. I have observed that when people come looking for an anthropologist to help with an evaluation, they are looking for someone to help them figure out which questions are important to one or more of the constituencies they serve, including constituencies within the programs themselves. This exploration involves using the culture concept to identify relationships, what those relationships accomplish, and which beliefs actually manifest in behavior.


Another important skill that anthropologists bring to evaluations is ethnography. Here I define ethnography as investigation based on regular contact with the people relevant to a problem, which is examined in its own context by compiling the varied viewpoints of those who operate in the situation being investigated. Ethnography is an approach to investigating some kind of human community by seeking to understand cultural events as they are experienced by the insider. Methodologically, ethnography relies on multiple data sources and data collection methods, but the key strategies are open interviewing and careful observation of what is happening in the community. To the extent possible, ethnographers use participant observation, taking part in events alongside the insiders. For example, in doing evaluations of community programs, evaluators may attend coalition meetings, walk the environment with engineers, or observe patterns of use of products in domestic settings. The ethnographer’s purpose is to move closer to an insider understanding of what is happening. Direct or participant observation of events provides an independent point from which to assess what people are saying. It helps counter bias in the accounts of stakeholders.

Ethnography focuses on actions that groups of people consider worth doing and why. What outcomes are desirable? What are people willing to do to achieve these outcomes? Do people agree on what is important and how it can be achieved? Are agreements and disagreements crystallized around substantial differences in world view or are they differing perspectives on the same ideas and events? How is any intervention embedded in a community that includes those who deliver it, those who pay for it, and those who receive it?

To be useful in evaluation, ethnography must be conducted in such a way that it produces empirical results and goes beyond chatting with people about what’s happening. As is the case in any evaluation, its purpose is to build a corpus of evidence to be used by agencies, organizations, and planners to maintain, reinforce, and improve social responses to problems, human needs, and special situations. No body of data is intrinsically evidence. Evidence is built by clear specifications of the questions to be asked, a well-managed data collection, and careful documentation of the data analysis.


Many of the interesting things that people do happen in some kind of community, whether it is an agricultural village, Wall Street, or the virtual world of Facebook. Multiple players in multiple communities connect for some purpose, only to break into new constellations as new realities emerge and mature. Communities are based not only on geographic contiguity but also on kinship, social or economic bonds, professional or religious associations, and anything else that ties people together so that they can achieve some purpose or function. Communities can be identified as sets of linkages in overlapping networks.

For our purposes here—the evaluation of human activities—the concept of communities defines the context in which our research must be located. It is no longer useful to pretend that it is possible to hold context aside somehow and have enough cultural evidence available to support meaningful interpretation of human activities. Most people live in multiple communities—workplaces, families, churches, volunteer associations, athletic leagues—the list could go on. These community memberships may reinforce or contradict each other. One must look for these linkages in data collection and analysis.

Evaluation Anthropology

Evaluation anthropology is grounded in an overlap between the fields of evaluation and anthropology. Evaluation is the study of the value of programs, projects, products, or any human activity in which some outcome is a consideration in success or choice of alternative solutions to social problems. Anthropology at its broadest is the study of culture. Evaluation anthropology concerns itself with the culture of values—how people use the programs being evaluated and what governs their choice and their performance in these contexts.

Anthropology provides to evaluations a paradigmatic position that does not dismiss competing paradigms as invalid but treats them as culturally embedded frames for interpreting experience. The anthropological theory of culture seeks to incorporate human thought and behavior in context along with a commitment to investigating value as a cultural phenomenon that may vary from one stakeholder to another. The ethnographic methodology combines multiple methods to bring into the research all sources that can provide useful insight into the situation investigated, including quantitative data (Bernard 2011, 384; LeCompte and Schensul 2010, 126). Finally, ethnographic analysis is comparative and holistic in that it seeks empirical patterns across multiple cases rather than striving for generalization to all possible sites.

The evaluation side of evaluation anthropology brings to the mix a theoretical orientation to the feasibility, acceptability, and value of human activities, established relative to the diverse group of people who are affected by the thing evaluated: the stakeholders. Evaluation gives anthropologists the theoretical and methodological framework within which to build investigations that are credible, methodologically sound, and useful to those who commission evaluations. Methodologically, evaluation brings scientific rigor and an eye for evidence, and anthropologists who evaluate are accustomed to working in interdisciplinary teams. In evaluation anthropology, evaluation methods and theories are applied to cultural systems to establish the credibility of the work.

Evaluation anthropology seeks to ground the evaluation both in what is known of the “real world” context in which evaluands exist and in the effects of culture on their operation. Evaluation anthropologists understand the world as a dynamic system constantly adapting to local and more distant influences. Entities being evaluated are embedded in a multilayered matrix of politics, economics, ecology, and local priorities. Evaluation planning requires preliminary work, often ethnographic, on what is important to stakeholders.

The evaluation anthropology approach can be used at any phase of evaluation. Ethnographic input in the early stages of designing an evaluation can lead evaluators to recast the design to address specific local needs. I was part of a case study evaluation of a project on teen pregnancy prevention curricula in high schools. Its first phase was an exploratory case study, conducted in eight states to support evaluation design and pilot testing, prior to a more intensive five-year evaluation of three of the eight state programs.

The study involved multiple methods, including documentary data, cost data, community and organization records, and an ethnographic assessment of early program implementation in each state. We prepared a protocol with the study purpose, the method, an analysis plan, and a budget as part of our proposal to do the more intensive Phase 2 project. The work began, as usual, with meetings with the client, followed by networking out to find stakeholders. We started with program staff recommended by the client and, through them, networked within each state program to find others. However, budget constraints limited how much of this preliminary work we could do. We developed the design and instruments and went to one of the states to pilot the study in a designated community.

When we arrived, we discovered that we had missed an essential racial and ethnic divide between the African American population and the Latina/o population, both of whom were potential recipients of the program services. A culturally specific way of running coalitions that had worked well in the African American community seriously misfired in the Latina/o one. The Latinas/os perceived the African American staff of the program as rude and pushy in moving toward what the Latinas/os considered premature closure on which questions would be asked and of whom. The problem surfaced in a coalition meeting in which an undercurrent of hostility on both sides was covered by surface politeness and an inability to focus on anything substantive. The coalition leaders were genuinely confused about this. They would not openly blame the other ethnic group for the coalition’s problems, but both sides tended (very politely) to attribute them to intransigence on the part of the other group. It became clear that in this site, at least, much of the data collection would need to be conducted separately in each group to learn how each group understood the program was supposed to operate, how each used the program, and how each actually behaved. This perspective was then investigated in the other seven states and incorporated into the study design. This sort of tailoring of study protocols to the context of specific evaluation sites is one of the strengths of evaluation anthropology.

Evaluation as Transdisciplinary Research

Evaluation anthropology is a transdisciplinary study and is more than an arithmetic combination of several ways of doing things. Transdisciplinary research is the pursuit of an integrated approach from interdisciplinary teams that is synergistic and not fully a product of any specific discipline. Building transdisciplinary approaches is a highly collaborative process. There is, of course, a continuum with multidisciplinary research at one end and transdisciplinary research at the other. To integrate the capabilities of an evaluation team that includes other professionals and also incorporates the stakeholders, the evaluation anthropologist must seek and respect the perspectives of others and their contributions to the evaluation enterprise. The power of transdisciplinary work lies in bringing multiple ways of doing science together for complementarity and triangulation. We can see our research from several points of view, and we can compare our findings by building up knowledge from multiple perspectives. It is this integrative character that adds value to evaluation and brings the conclusions closer to whatever version of “truth” we can jointly achieve.

A word of advice on constructing evaluations under the evaluation anthropology scheme: it is a bad idea to decide on a method before you understand what the evaluation needs to ascertain. If a method is chosen prematurely, some questions will be missed, and the evaluator may not even be aware that they were there to be missed. Always lead with the question(s) the evaluation will ask. What needs to be known? How will the results be used? Choosing the method first is a particular risk when doing evaluations under contract to businesses or government agencies. Often a potential client will contact an evaluator to do a survey, or focus groups, or an ethnographic study that may be a poor choice for discovering what they need to know. When this happens, it is best to quietly sell them on the idea of defining the evaluation question before moving on to choose a method.

Why Evaluation Anthropology?

Why make this thing called evaluation anthropology? Why can’t there just be anthropologists who evaluate or evaluators who use anthropology as a tool to evaluate? Aren’t there already enough specialties to remember? A new field of research should be labeled only if there is some reason why such a designation is useful.

The integrated application of evaluation and anthropological methods and theories is a stronger approach to situations in which culture, especially cultural variability, either determines the success of a program or positions it to fail because of cultural misunderstanding on the part of stakeholders. In such situations, we can design and implement evaluations with the goal of bringing all of the evidence we can find to bear on a single set of evaluation questions. Both anthropology and evaluation are part of the equation.

Evaluation and anthropology can be stronger in combination than either is without the other. Evaluators may not have the training to conduct an ethnography that can incorporate the cultural variation surrounding programs. Similarly, anthropologists may not have adequate knowledge of, or experience with, evidence-based approaches to evaluation to be effective. The concept of culture helps anthropologists to describe the armature of the sociocultural systems in which things operate. Evaluation supports the rigorous methodologies and the scientific orientation needed for evaluations to be deemed credible, valid, and reliable by those who will use them to make policy.

A key issue in both evaluation and applied anthropology is linking method, theory, and practice so that they reinforce each other. For anthropologists, practice of the discipline in any context is an enterprise in which certainty and truth are elusive and context dependent. We combine method and theory to focus on what is important given the circumstances. Such a focus is much less comfortable than the world of the .05 significance level, which is a very important orientation for many evaluators who work with quantitative approaches. But evaluation anthropology is also less comfortable than the theoretical certainties of which anthropologists are fond. For example, the distinction between the anthropologist and the “other” is eroding: the people whom anthropologists study increasingly live in postmodern societies in which the differences among peoples diminish and the cultures in which people act are transformed (Trouillot 1991).

Finally, defining evaluation anthropology as a subfield of anthropology can help anthropologists gain recognition in the wider world of evaluators and evaluation users, who may then come to appreciate the strength of anthropology in building successful evaluations. Often evaluation anthropologists, like too many anthropological practitioners, have given up their identity as anthropologists. The de-identification of practitioners as anthropologists, in evaluation and elsewhere, damages the job prospects of other anthropologists. We should ensure that our colleagues and our clients attribute our work to anthropology. Otherwise our team members may never be aware that what we have accomplished is anthropology (Butler 2015, 181). I have always made sure that anyone I work with in multidisciplinary teams knows that I am an anthropologist.

Ethics in Evaluation Anthropology

Ethics is a set of culturally defined principles that guide the protection of persons in the pursuit of some professional or personal responsibility. An important part of this definition is “culturally defined,” in the sense of being part of the shared understanding of members of specific kinds of professions and workplaces. Ethics includes principles that embody values that define acceptable practice in terms of doing good and doing no harm. Ethics also reinforces the responsibilities of those who serve and protect the people with whom they work and study (Berreman 2003, 51; Malefyt and Morais 2017, 2). One source of guidance for cultural sensitivity in evaluations is the American Evaluation Association’s Public Statement on Cultural Competence in Evaluation, which notes that cultural competence is a stance that recognizes cultural diversity rather than a specific set of skills.4

The basis of most ethics statements in the social sciences and medicine dates to the publication of the Belmont Report (1979) by the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, hereafter called the Commission. The Commission, made up of scientists, medical professionals, ethicists, and legal experts, was charged to “identify the basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects and to develop guidelines which should be followed to assure that such research is conducted in accordance with those principles.”

The Commission came up with three standards for ethical research with human subjects: respect for persons, beneficence, and justice. Respect for persons specifies that ethical research must respect the autonomy of persons as the decision makers in their own lives. It requires that subjects participate in research voluntarily after having been informed of the purpose of the research, what their participation will involve, any benefits that will accrue to them, and the potential risks of their participation. Beneficence ordinarily refers to doing good for people, but the Commission went beyond this common usage to define beneficence as an obligation to do no harm to research participants and to maximize possible benefits and minimize potential harm to them in the course of research. Justice means that the benefits of research should accrue to those who bear its burdens, even if there is also a benefit to the larger society. An injustice occurs when some benefit to which a person is entitled is denied without good reason or when some burden is imposed unduly. Realistically, it is difficult to tell a priori what the benefits should be and who is likely to receive them.

The open nature of ethnographic research poses challenges in informing people of the risks of exposure and the limits of confidentiality (Fluhr-Lobban 2003; Mabry 1999). The same privacy concerns arise in evaluation. Because evaluation is designed to assign values to programs, to their staff, and to their recipients, it is by design a form of critical scrutiny. It is difficult for people to say no to requests for interviews in connection with their jobs, and it is hard to protect people’s identity. Evaluations tend to occur in small worlds where people are known to each other. Usually there are only so many people who could have made a specific point, and often people can be recognized by speech mannerisms or opinions.

Wittingly or unwittingly, evaluation poses risks to the jobs, the careers, and the livelihoods of those who cooperate with evaluators. It affects the way in which the needs of population subgroups are met, especially subgroups that are vulnerable to the failure of social services, public health, and education. For those who design, fund, and implement programs, there are political risks that are not always obvious to external evaluators. For this reason, we must be scrupulous in protecting the identity of those who help us. Data should be “sanitized,” that is, checked carefully for clues to identification, such as names of specific program activities, clues to the location of the program being discussed, and names of coworkers or partners. I never use direct quotations in evaluation reports. While they add interest and read well, the risk of unintentional exposure of someone’s identity is too high for me. Among people who work together closely, speech mannerisms can give someone away.

In evaluation and in anthropology, the conditions that affect research are difficult to identify in advance. It may be impossible to anticipate all contingencies affecting an evaluation. Some events that are expected to be problems never occur. Other unexpected situations arise and need to be addressed during the course of research. Consequently, the protection of all stakeholder interests in an evaluation depends heavily on the vigilance and judgment of researchers.

Ethical principles are designed to help researchers identify likely issues and plan ahead to avoid violating people’s rights, privacy, and short- and long-range interests. Evaluators are obligated to become familiar with the ethical principles that apply to conducting evaluations. The professional organizations of most fields involved in social research promulgate their own ethical principles or guidelines to ensure the protection of human subjects.

The standards of three professional organizations in the United States are summarized in Table 2: the Guiding Principles of the American Evaluation Association (AEA), the Principles of Professional Responsibility of the American Anthropological Association (AAA), and the statement of Ethical and Professional Responsibilities of the Society for Applied Anthropology (SfAA). All three incorporate the same basic ideas but differ in emphasis because of the different issues each organization faces. Evaluators work in areas that affect decision making and the welfare of populations, and AEA focuses on technical adequacy, the honesty of evaluations, and the interests of stakeholders. The AAA Principles of Professional Responsibility reflect the academic nature of the association; they emphasize the highly variable application of anthropological methods in field situations around the world and the quality of academic work. SfAA serves applied anthropologists, and its statement is phrased in terms of practitioners’ responsibilities to the many communities that may be affected by their actions. All three are phrased as guidance or principles. None of these organizations assumes responsibility for arbitrating ethical problems or imposing sanctions for ethical violations.

Table 2. Themes in Ethical Principles for Evaluation and Anthropology

| Theme | American Evaluation Association Guiding Principles | American Anthropological Association Principles of Professional Responsibility | Society for Applied Anthropology Statement of Ethics and Professional Responsibilities |
|---|---|---|---|
| Technical adequacy | Strongly emphasized; adhere to highest technical standards; make clear the limitations of methods; communicate results accurately; practice within your competence | Implied but vague; be responsible for accuracy of reported work; maintain integrity in research | Accurately report our competencies; report research accurately; attempt to avoid misuse of our work |
| Honesty and integrity | Honesty to clients; reveal conflicts of interest; present results honestly | Honor the moral rules of scientific and scholarly conduct; don’t cheat or falsify results | Implied but not explicitly discussed |
| Respect for people | Emphasized; include all persons involved in the evaluation or defined stakeholders | Strongly emphasized; do no harm; respect well-being; establish a working partnership with the research community; determine in advance whether participants want recognition or anonymity and honor their request | Full disclosure of research; only voluntary participation; maintain confidentiality and advise people of limits of confidentiality |
| The common good | Serve diversity or public interest; disseminate results; seek a balance between client needs and those of other stakeholders | Disseminate research whenever possible; be candid about biases; explicitly endorses advocacy | Do not recommend actions harmful to the community studied; vague on good to society as a whole |
| Teach responsibly | Not mentioned | Teach well and sensitively; teach ethics; acknowledge student work | Teach well; orient to needs of larger society; teach ethics; acknowledge student work |

Building Careers in Evaluation Anthropology

Identifying and Securing a Position

Anthropologists can and do build careers in evaluation. However, they must learn about the evaluation market before investigating specific options. One strategy is to treat the job search as a research project, using an ethnographic approach to discover information and insights. It may be the most important ethnography an anthropologist ever does, because it can serve as a foundation for approaching future evaluation projects. During one period of unemployment, I used to joke with my family: “Yes, I looked for a job today. There weren’t any jobs for anthropologists in the Washington Post.” That approach characterized my job search, because I did not know where to look or what to look for. I learned how unlikely it was to find a job listing for an “evaluator,” an “evaluation anthropologist,” or even a “social researcher.” An organization seeking an evaluator may title and describe the position in a variety of ways. Here are just a few titles of positions for applicants with a master’s degree that I noticed online:

  • Program Evaluator
  • Evaluation Technical Associate/Assistant
  • Research Associate
  • Qualitative Team Manager
  • Study Coordinator
  • External Evaluator⁵

How does one find job openings? The web, of course! It is not the only way, but it is a sure place to start. In the United States, the American Anthropological Association (AAA), the Society for Applied Anthropology (SfAA), the Ethnographic Praxis in Industry Conference (EPIC), and the professional evaluation associations all have websites that post jobs. Outside the United States, there are hundreds of professional evaluation associations around the world, one of which almost certainly serves your country. An alternative route is to search the websites of professional associations in specialties that are likely to use evaluations (e.g., education, environment, public health, international development).

Once a list of potential employers is identified, one should move on to informational interviews to learn about specific jobs or careers. Typically, this involves identifying people who hold jobs that interest you, contacting them, and requesting a short amount of time (e.g., 15–20 minutes) to interview them in person or by telephone. It is appropriate to ask for advice about a particular job, job search, or career. Asking for a few minutes to learn about someone’s job, how they got it, and what it involves is a common job search strategy, and most professionals understand informational interviews and respond positively to such requests. Informational interviewing often leads the job seeker to a job. Even if it does not, it equips the job seeker to do well in interviews for jobs found in other ways, such as networking, attending job events at professional meetings, surfing the web, and volunteering in professional and community organizations.

General Suggestions and Advice

Here are some general thoughts that may be useful in entering the field of evaluation. It is helpful to become familiar with evaluation theory and method and to be prepared to talk about them in job interviews. That learning process has begun with this article. Keeping current can also score points in an interview, for example by showing that you are aware of the latest controversies in evaluation, in high-profile issues in the field, or in the organization interviewing you. Learn about evaluation in your own area(s) of interest. What evaluations have been done of efforts to protect tribal rights, prevent teenagers from smoking, market software, or whatever you care about?

Build and track a network of contacts. Keep track of everyone you meet who has any connection to what you want to do. Start today. Your network is the most important asset you have, and you will go back to it over and over. Make sure the people in your network know you are looking for a job.

Apply for many jobs. Don’t make the mistake of falling in love with a job in an interview and stopping your job search while you wait to see if it pans out. If you get a second job offer, call the first interviewer and ask how close he or she is to making a decision. Then make your decision in light of what that person tells you. There is no reason not to ask about a decision on filling a position. Indeed, employers expect those questions.

Think about how much you want to be paid. When I was interviewing people for evaluator jobs, I was always amazed at how few candidates for entry-level positions had even thought about this. It is not bad form to have salary expectations; employers will ask you about them. Salary is also a good question for informational interviews.

Do not be afraid to take a job that falls short of your dreams. It is only a job. You have probably heard that you need to have a job to find a job. If you do your job well, offers will often come across your desk. But first, you need a foot in the door. Do not take a job you loathe but do not be too picky either.

Acquiring Specialty Knowledge

One reason that anthropologists do well as evaluators is that they are “good learners.” Fieldwork and the experiences that happen there help anthropologists learn content and learn it quickly. Useful knowledge and tools for me included:

Statistics—I took a doctoral exam in statistics; it was the best thing I ever did. It is important to be able to understand statistics, plan project protocols with statisticians, and report and explain statistical results.

Programming—Anthropologists often hold a variety of jobs as their careers develop. One job I held was as a Statistical Analysis System (SAS) programmer. Understanding the logic of quantitative analysis was a big help later when I began building mixed-method evaluation designs. I especially recommend learning some data management and text analysis software.

Languages—I learned Spanish doing my doctoral work in Guatemala; I consistently use it in my evaluation work. Other languages may be necessary in other locations. It is difficult to do either ethnography or evaluation without local language skills.

Project management—Any opportunity to manage research in any capacity has direct application to evaluations—including volunteer work.

Financial management—Estimating and managing a budget are skills pertinent to management and leadership positions.

During job interviews, it is appropriate to mention any useful skills one has picked up. I once got a job because I happened to mention that I had programmed in SAS in a previous position. Skills such as programming appeal to employers, often beyond the specifics of a job description. They may help an employer who desperately needs, for example, someone who can help with computer analysis.

Future Directions

Ethnographic evaluation will become increasingly important in an era of globalization, retrenchment from traditional forms of liberalism, and pressure on trade (Wasson, Butler, and Copeland-Carson 2012). More than ever, it is critical to understand from the insider’s perspective how events unfold. Ethnographic approaches to evaluation, alone or as part of larger evaluation agendas, have become more common in recent years. Rapid assessment procedures in particular become more important every year because they provide a technically sound and cost-effective way to compile the information needed to guide a policy or program on a short turnaround (Beebe 2001).

International evaluation is already an important component of program evaluation. For example, the United States Agency for International Development (USAID) funds about 200 evaluations per year to support program development, program modification, and project monitoring. USAID is a logical place for evaluation anthropologists, given their focus on cross-cultural evaluations and on studies in contexts outside the United States. My advice to anthropologists seeking to do international evaluations is to target their job search to agencies and employers already doing this work. Evaluation is a very difficult field to enter without experience, and the experience that matters most is less that of individuals than that of the organizations commissioned to do these kinds of evaluations (Archibald, Sharrock, Buckley, and Young 2018).

Indigenous evaluation is another area that can be expected to grow in the coming generation. Evaluators who come culturally from the communities being evaluated have advantages in gaining entry to the field and in reviewing and critiquing the findings of studies conducted in their own communities. Indigenous evaluators can be sensitive to the need to uphold the sovereignty of their people and to issues of intergenerational trauma resulting from colonialism, forced relocation, and assimilationist policies (Cram 2018). Indigenous evaluation is a logical place for empowerment evaluation, which is likely to become more common there and also in other types of evaluation with marginalized and disempowered populations.

I recommend that we all pay attention to developmental evaluation and complex adaptive systems (Patton 2011; Hargreaves and Podems 2012). These approaches to modeling human systems can accommodate the complexity and nonlinearity of many of the economic and sociopolitical issues that arise with postmodernity (Lansing 2003). As awareness of the complexity of the globalizing world grows, this kind of modeling is likely to become very important because it allows us to incorporate complexity without sacrificing the validity of results.

Further Reading

  • Butler, Mary Odell, and Jacqueline Copeland-Carson, eds. 2005. Creating Evaluation Anthropology: Introducing an Emerging Subfield. NAPA Bulletin 24. Arlington, VA: American Anthropological Association.
  • House, Ernest R., and Kenneth R. Howe. 1999. Values in Evaluation and Social Research. Thousand Oaks, CA: SAGE.
  • Malefyt, Timothy de Waal, and Robert J. Morais, eds. 2017. Ethics in the Anthropology of Business: Explorations in Theory, Practice and Pedagogy. New York, NY: Routledge.
  • Rossi, Peter H., Mark W. Lipsey, and Gary T. Henry. 2019. Evaluation: A Systematic Approach, 8th ed. Thousand Oaks, CA: SAGE.
  • Strauss, Anselm, and Juliet Corbin. 2015. Basics of Qualitative Research: Grounded Theory Procedures and Techniques, 4th ed. Thousand Oaks, CA: SAGE.
  • Yin, Robert K. 2018. Case Study Research and Applications: Design and Methods. Thousand Oaks, CA: SAGE.

References

  • Alkin, Marvin C., and Jean A. King. 2016. “The Historical Development of Evaluation Use.” American Journal of Evaluation 37 (4): 568–579.
  • Andersen, Hanne, and Brian Hepburn. 2016. “Scientific Method.” The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta. Stanford, CA: Metaphysics Research Lab, Stanford University.
  • Archibald, Thomas, Guy Sharrock, Jane Buckley, and Stacey Young. 2018. “Every Practitioner a ‘Knowledge Worker’: Promoting Evaluative Thinking to Enhance Learning and Adaptive Management in International Development.” In Evaluative Thinking, special issue, New Directions for Evaluation 2018 (158): 73–91.
  • Beebe, James. 2001. Rapid Assessment Process: An Introduction. Walnut Creek, CA: Altamira Press.
  • Bernard, H. Russell. 2011. Research Methods in Anthropology, 5th ed. Lanham, MD: Altamira Press.
  • Berreman, Gerald D. 2003. “Ethics versus ‘Realism’ in Anthropology: Redux.” In Ethics and the Profession of Anthropology, 2nd ed., edited by Carolyn Fluehr-Lobban, 51–83. Walnut Creek, CA: Altamira Press.
  • Butler, Mary Odell. 2006. “Translating Evaluation Anthropology.” In Creating Evaluation Anthropology: Introducing an Emerging Subfield, edited by Mary Odell Butler and Jacqueline Copeland-Carson, 17–30. NAPA Bulletin 24. Arlington, VA: American Anthropological Association.
  • Butler, Mary Odell. 2015. Evaluation: A Culture Systems Approach. New York: Taylor & Francis.
  • Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand-McNally.
  • Centers for Disease Control and Prevention. 1999. “A Framework for Program Evaluation in Public Health.” Morbidity and Mortality Weekly Report 48: RR-11.
  • Chelimsky, Eleanor. 1997. “The Political Environment of Evaluation and What It Means for the Development of the Field.” In Evaluation for the 21st Century: A Handbook, edited by Eleanor Chelimsky and William R. Shadish, 53–71. Thousand Oaks, CA: SAGE.
  • Chelimsky, Eleanor, and William R. Shadish, eds. 1997. Evaluation for the 21st Century: A Handbook. Thousand Oaks, CA: SAGE.
  • Clifford, James. 1999. “Introduction: Partial Truths.” In Writing Culture: The Poetics and Politics of Ethnography, 1–19. Berkeley, CA: University of California Press. First published 1986.
  • Chen, Huey-Tsyh. 2005. Practical Program Evaluation. Thousand Oaks, CA: SAGE.
  • Connor, Ross F. 1985. “International and Domestic Evaluation: Comparisons and Insights.” New Directions for Evaluation 25: 19–28.
  • Cook, Thomas D., and Donald T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues in Field Settings. Boston: Houghton-Mifflin.
  • Cousins, J. Bradley. 2005. “Will the Real Empowerment Evaluation Please Stand Up?” In Empowerment Evaluation Principles in Practice, edited by David M. Fetterman and Abraham Wandersman, 183–208. New York: Guilford Press.
  • Cram, Fiona. 2018. “Conclusion: Lessons about Indigenous Evaluation.” New Directions for Evaluation 159: 121–132.
  • Fetterman, David M. 1994. “Empowerment Evaluation.” Evaluation Practice 15 (1): 1–15.
  • Fetterman, David M., Shakeh H. Kaftarian, and Abraham Wandersman, eds. 1996. Empowerment Evaluation: Knowledge and Tools for Self-Assessment. Thousand Oaks, CA: SAGE.
  • Fetterman, David M., Liliana Rodríguez-Campos, Ann P. Zukoski, and Contributors. 2018. Collaborative, Participatory, and Empowerment Evaluation: Stakeholder Involvement Approaches. New York: Guilford Press.
  • Fetterman, David M., and Abraham Wandersman, eds. 2005. Empowerment Evaluation Principles in Practice. New York: Guilford Press.
  • Fluehr-Lobban, Carolyn. 2003. “Introduction.” In Ethics and the Profession of Anthropology, 2nd ed., edited by Carolyn Fluehr-Lobban, 1–28. Walnut Creek, CA: Altamira Press.
  • Fox, Richard G. 1991. “Introduction: Working in the Present.” In Recapturing Anthropology: Working in the Present, edited by Richard G. Fox, 1–16. Santa Fe, NM: School of American Research Press.
  • Geertz, Clifford. 1973. The Interpretation of Cultures. New York: Basic Books.
  • Goldring, Ellen B., and Sharon F. Rallis. 1993. Principals of Dynamic Schools. Newbury Park, CA: SAGE.
  • Greene, Jennifer C., and Valerie J. Caracelli. 1997. “Defining and Describing the Paradigm Issue in Mixed-Method Evaluation.” New Directions for Evaluation 74: 5–17.
  • Guba, Egon G., and Yvonna S. Lincoln. 1989. Fourth Generation Evaluation. Newbury Park, CA: SAGE.
  • Hargreaves, Margaret B., and Donna Podems. 2012. “Advancing Systems Thinking in Evaluation: A Review of Four Publications.” American Journal of Evaluation 33 (3): 462–470.
  • Harris, Marvin. 1980. Culture, People, Nature. New York: Harper & Row.
  • Harris, Marvin. 1968. The Rise of Anthropological Theory. New York: Crowell.
  • Hoover, Karen W., Mary O. Butler, Kimberly Workowski, Felix Carpio, Stephen Follansbee, and Beau Gratzer. 2010. “STD Screening of HIV-infected MSM in HIV Clinics.” Sexually Transmitted Diseases 37 (12): 771–776.
  • House, Ernest R. 1994. “Integrating the Quantitative and Qualitative.” New Directions for Program Evaluation 61: 3–22.
  • Kidder, Louise, and Michelle Fine. 1987. “Qualitative and Quantitative Methods: When Stories Converge.” In Multiple Methods in Program Evaluation, special issue, New Directions for Program Evaluation 1987 (35): 57–75.
  • Kraft, Victor. 1974. “Popper and the Vienna Circle.” In The Philosophy of Karl Popper, vol. 14, 185–204. La Salle, IL: The Library of Living Philosophers.
  • Kroeber, Alfred, and Clyde Kluckhohn. 1954. Culture. New York: Vintage Books.
  • Kuhn, Thomas S. 2012. The Structure of Scientific Revolutions, 4th ed. Chicago: University of Chicago Press.
  • Lansing, J. Stephen. 2003. “Complex Adaptive Systems.” Annual Review of Anthropology 32: 183–204.
  • LeCompte, Margaret D., and Jean J. Schensul. 2010. Designing and Conducting Ethnographic Research, Ethnographers Toolkit, Book 1, 2nd ed. Lanham, MD: Rowman & Littlefield.
  • Lincoln, Yvonna S., and Egon G. Guba. 1985. Naturalistic Inquiry. Newbury Park, CA: SAGE.
  • Love, Arnold, and Craig Russon. 2004. “Evaluation Standards in International Context.” New Directions for Evaluation 104: 5–14.
  • Mabry, Linda. 1999. “Circumstantial Ethics.” American Journal of Evaluation 20 (2): 199–212.
  • Malefyt, Timothy de Waal, and Robert J. Morais. 2017. “Introduction: Capitalism, Work and Ethics.” In Ethics in the Anthropology of Business: Explorations in Theory, Practice and Pedagogy, edited by Timothy de Waal Malefyt and Robert J. Morais, 1–22. New York, NY: Routledge.
  • Marquart, Jules M. 1990. “A Pattern Matching Approach to Link Program Theory and Evaluation Data.” New Directions for Program Evaluation 47: 93–107.
  • Miles, Matthew B., and A. Michael Huberman. 1994. Qualitative Data Analysis: An Expanded Sourcebook, 2nd ed. Thousand Oaks, CA: SAGE.
  • Miller, Robin L., and Rebecca Campbell. 2006. “Taking Stock of Empowerment Evaluation: An Empirical Review.” American Journal of Evaluation 27 (3): 296–319.
  • National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. 1979, April. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
  • Noblit, George W., and Dwight R. Hare. 1988. Meta-Ethnography: Synthesizing Qualitative Studies. Newbury Park, CA: SAGE.
  • Patton, Michael Quinn. 2018. “A Historical Perspective on the Evolution of Evaluative Thinking.” In Evaluative Thinking, special issue, New Directions for Evaluation 2018 (158): 11–28.
  • Patton, Michael Quinn. 2015. Qualitative Research and Evaluation Methods, 4th ed. Thousand Oaks, CA: SAGE.
  • Patton, Michael Quinn. 2011. Developmental Evaluation. New York: Guilford Press.
  • Patton, Michael Quinn. 2005. “The View from Evaluation.” In Creating Evaluation Anthropology: Introducing an Emerging Subfield, edited by Mary Odell Butler and Jacqueline Copeland-Carson, 31–40. NAPA Bulletin 24. Arlington, VA: American Anthropological Association.
  • Patton, Michael Q. 1997. Utilization-Focused Evaluation. Thousand Oaks, CA: SAGE.
  • Pike, Kenneth. 1954. Language in Relation to a Unified Theory of the Structure of Human Behavior, vol. 1. Glendale, CA: Summer Institute of Linguistics.
  • Popper, Karl. 2014. The Logic of Scientific Discovery. Mansfield Center, CT: Martino Publishing.
  • Popper, Karl. 1974. “The Autobiography of Karl Popper.” In The Philosophy of Karl Popper, vol. 14, 1–181. La Salle, IL: The Library of Living Philosophers.
  • Reichardt, Charles S., and Sharon F. Rallis. 1994a. “The Relationship Between the Qualitative and Quantitative Research Traditions.” New Directions for Program Evaluation 61: 5–11.
  • Reichardt, Charles S., and Sharon F. Rallis. 1994b. “Qualitative and Quantitative Inquiries Are Not Incompatible: A Call for a New Partnership.” New Directions for Program Evaluation 61: 85–90.
  • Rossi, Peter H., Mark Lipsey, and Howard E. Freeman. 1999. Evaluation: A Systematic Approach, 6th ed. Thousand Oaks, CA: SAGE.
  • Schwandt, Thomas A. 1994. “Constructivist, Interpretative Approaches to Human Inquiry.” In Handbook of Qualitative Research, edited by Norman Denzin and Yvonna S. Lincoln, 118–137. Thousand Oaks, CA: SAGE.
  • Scriven, Michael. 1967. “The Methodology of Evaluation.” In Perspectives of Curriculum Evaluation, edited by Ralph W. Tyler, Robert M. Gagné, and Michael Scriven. Chicago: Rand McNally.
  • Scriven, Michael. 1991. “Beyond Formative and Summative Evaluation.” In Evaluation and Education: At Quarter Century, edited by Milbrey Wallin McLaughlin and Denis Charles Phillips, 19–64. Chicago: University of Chicago Press.
  • Shadish, William R., Thomas D. Cook, and Laura C. Leviton. 1991. Foundations of Program Evaluation. Newbury Park, CA: SAGE.
  • Smith, Nick L. 2007. “Empowerment Evaluation as Evaluation Ideology.” American Journal of Evaluation 28 (2): 169–178.
  • Snedecor, George W., and William G. Cochran. 1967. Statistical Methods, 6th ed. Ames: Iowa State University Press.
  • Stake, Robert E. 1980. “Program Evaluation, Particularly Responsive Evaluation.” In Rethinking Educational Research, edited by W. B. Dockerell and David Hamilton. London: Hodder and Stoughton.
  • Stake, Robert E. 1994. “Case Studies.” In Handbook of Qualitative Research, edited by Norman Denzin and Yvonna S. Lincoln, 236–247. Thousand Oaks, CA: SAGE.
  • Stake, Robert E. 2006. Multiple Case Study Analysis. New York: Guilford Press.
  • Strauss, Anselm, and Juliet Corbin. 1990. Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: SAGE.
  • Trouillot, Michel-Rolph. 1991. “Anthropology and the Savage Slot: The Poetics and Politics of Otherness.” In Recapturing Anthropology: Working in the Present, edited by Richard G. Fox, 17–44. Santa Fe, NM: School of American Research Press.
  • Tyler, Ralph W. 1991. “General Statement on Program Evaluation.” In Evaluation and Education: At Quarter Century, edited by Milbrey Wallin McLaughlin and Denis Charles Phillips, 3–17. Chicago: University of Chicago Press.
  • United States Department of Education. 1992. “Use of Scientifically Based Research in Education.” Working Group Conference, February.
  • Vo, Anne T., and Thomas Archibald. 2018. “Evaluative Thinking: New Directions in Evaluative Thinking.” New Directions for Evaluation 158: 139–147.
  • Wasson, Christina, Mary Odell Butler, and Jacqueline Copeland-Carson, eds. 2012. Applying Anthropology in the Global Village. Walnut Creek, CA: Left Coast Press.
  • Weiss, Carol H. 1997. Evaluation: Methods for Studying Programs and Policies. Thousand Oaks, CA: SAGE.
  • Wolcott, Harry F. 2008. Ethnography: A Way of Seeing, 2nd ed. Lanham, MD: Altamira Press.
  • Yin, Robert K. 2009. Case Study Research: Design and Methods, 4th ed. Thousand Oaks, CA: SAGE.
  • Yin, Robert K. 1993. Applications of Case Study Research. Newbury Park, CA: SAGE.