Monitoring and Evaluation of Sexual and Reproductive Health Programs
- Janine Barden-O'Fallon, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
- Erin McCallum, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
Monitoring and evaluation (M&E) can be defined as the systematic collection, analysis, and use of data to answer questions about program performance and achievements. An M&E system encompasses all the activities related to setting up, collecting, reporting, and using program information. A robust, well-functioning M&E system can provide program stakeholders with the information necessary to carry out a responsive and successful program intervention and is therefore a critical tool for program management. There are many tools and techniques needed for successful M&E of sexual and reproductive health (SRH) programs. These include frameworks to visually depict the organization of the program, its context and goals, and the logic of its M&E system. Essential practices of M&E also include continuous stakeholder engagement, the development of indicators to measure program activities and outcomes, the collection and use of data to calculate the indicators, and the design and implementation of evaluation research to assess the benefits of the program.
Over time, language around “M&E” has evolved, and multiple variations of the phrase are in use, including “MEL” (monitoring, evaluation, and learning), “MER” (monitoring, evaluation, and reporting), and “MERL” (monitoring, evaluation, research, and learning), to name but a few. These terms bring to the forefront a particular emphasis of the M&E system, with an apparent trend toward the use of “MEL” to emphasize the importance of organizational learning. Despite this trend, “M&E” continues to be the most widely known and understood phrase and implicitly includes activities such as learning, research, and reporting within a robust system.
Monitoring and evaluation (M&E) systems provide valuable information for decision makers at all levels—leadership, staff, funders, and other stakeholders—on fidelity to implementation plans and strategies, effectiveness, efficiency, relevance, and sustainability. M&E can identify and measure the influence of individual-, family-, community-, and environmental- (or structural-) level factors associated with health outcomes. M&E information can be used to strengthen the design and implementation of health interventions and improve the use of resources, thereby increasing cost-effectiveness. Program decision makers can use M&E information to answer a number of questions about program performance and achievements, such as: How well was the program implemented? Did the program achieve its objectives? Were the results attributable to program efforts? Which program activities were more (or less) effective? Did the program reach its intended beneficiaries? At what cost? What impact did the program have on the health status of intended beneficiaries?
Global health programs are implemented in a context of limited budgets. Since the late 1980s, the practice of M&E has gained support for its ability to identify effective programmatic responses and improve accountability and the efficient use of funds, such that in the early 21st century, it is considered a routine component of most health programs (Oyediran et al., 2014). In the family planning field, beginning in the early 1990s, prominent donors and organizations engaged in family planning programs responded to a need for systematic information by concentrating more resources and efforts on improved M&E (Barden-O’Fallon & Bisgrove, 2016). More than 30 years of attention and investment in M&E has resulted in significant advances in measurement, data collection, tool development, and M&E capacity within the family planning field. Although sexual and reproductive health (SRH) programs include services beyond family planning, and therefore require a wider range of measures, data, and tools, M&E of SRH has benefited from these advances. The M&E of SRH programs continues to build on this foundation, ever evolving as new programmatic priorities emerge and technologies for data collection, compilation, analysis, and dissemination continue to advance.
M&E has also been used in recent years as a critical tool for advancing human rights by ensuring equity and accessibility within SRH programs. SRH programs frequently focus on at-risk or vulnerable populations and may be implemented in low-resource or humanitarian settings due to a heightened need for SRH services. At the outset of a program, M&E teams can consider how equity and accessibility can be assured for beneficiaries within a program and then monitor specific data regarding these outcomes throughout the life of a program. After a program is completed, evaluation can determine a program’s overall accessibility, as well as its overall impact on equity outcomes in a population, which can then inform future programs within the population or setting. When programs are implemented in settings with political or social stigmas and constraints surrounding SRH services, M&E helps to ensure that services are received by those who most need them.
M&E as Part of Program Information
Every SRH program has an information life cycle, beginning with assessment of the health problem the program will aim to address, and ending with learning about the program and its achievements. M&E builds off information collected during the program assessment and planning stage, when decisions are made about the nature and context of the health problem and the design of the interventions that are best suited to address the identified problems. M&E information is then generated during program implementation to guide decision-making while the program is carried out. Additionally, findings from program evaluation can feed into the information life cycle of the next program by informing decisions about “What’s next?,” whether it be an extension of the program, a replication of the program, or a change in focus or strategy.
Programmatic information needs have been expressed through the use of a “stairway” (Rugg et al., 2004; Peersman & Rugg, 2010; UNAIDS, 2019). The stairway schematic was originally developed for application to national HIV strategies but has since been adapted and used for a wide variety of health topics (e.g., MEASURE Evaluation, 2014a). In this schematic, programmatic information needs build on each other: the base rung of the staircase identifies the health problem and what should be done to address it, and progressive rungs address the programmatic response, how well the response is implemented, and whether the response is making a difference, as determined through M&E activities. See figure 1.
M&E as an Information System
M&E is a complex system, requiring financial resources, program support, dedicated staff, and technical skills; thus, there are many ways that the system can fail. Some of the common problems evidenced in SRH program M&E information systems are a lack of sufficient resources, lack of coordination among actors, staff turnover, poor planning, and poor M&E leadership. These issues are especially prevalent in low-resource settings or with low-resourced programs and represent an ongoing challenge to successful M&E systems and the generation of high-quality evidence to support decision-making. One key tool of M&E is the performance monitoring plan (PMP) (also known as a performance management plan, and, more recently, a MEL plan), which can be used to strategically address some of these issues. This is accomplished by specifying, in detail, exactly what needs to be measured, how it will be measured, when it will be measured, who is responsible for measuring it, and what will be done with the measures once they are obtained.
Definition of “Monitoring” and “Evaluation” and Their Complementary Functions
M&E systems are mainly composed of two different, but complementary, processes for generating and using programmatic information. Monitoring is the routine tracking and reporting of priority information about a project, program, intervention, or initiative. Monitoring almost always includes routine tracking of inputs (i.e., the resources needed to carry out planned program activities), activities (also referred to as “processes”), and outputs (i.e., what the program produces), and can also be used for tracking of outcomes and impacts (i.e., the effects of program implementation) (see the “Logic Models and Logical Frameworks” section). Monitoring requires data at multiple time points and is primarily used to keep track of implementation progress in order to reflect on progress to date or make adjustments to the program during implementation to improve efficiency. Monitoring data are often required by program funders; the periodicity for reporting the data usually follows the funder requirements and may be as frequent as monthly or quarterly or as infrequent as annually. Monitoring can be considered an “internal” M&E process, in that it is often planned and carried out by the program itself. Furthermore, the data for monitoring are most typically generated and used by the programs themselves, through record-keeping, observations, and/or participant surveys. Monitoring data may also come from client or patient health records and/or health information systems (HISs).
Evaluation is the systematic collection of information about a project, program, intervention, or initiative, in order to determine its merit or worth. Evaluation implies a judgment of an intervention’s value and is thus an invaluable tool to help determine which interventions should be continued, replicated, or scaled up. Unlike monitoring, evaluation is designed to attribute outcomes and impacts to program activities. Evaluation can also help determine which aspects of the program contributed most to its success, and whether the program was successful in reaching its intended beneficiary population.
While evaluations can be “internal” processes, more typically evaluations are “external,” meaning that they are conducted by those unaffiliated with the program itself. Evaluations can be formative, conducted before implementation of a program to assess feasibility, acceptability, and appropriateness; process, conducted during program implementation to assess program delivery; or summative, completed after the program to assess the program’s success. Common types of evaluation include:
Process Evaluation
This type of evaluation, also known as “performance evaluation,” assesses the implementation of the program. Process evaluations answer questions related to implementation quality and coverage, such as: Were the program components delivered as designed? Did the program reach its intended beneficiaries? Process evaluation is similar to program monitoring in that it relies heavily on program-generated data, often related to program outputs. However, process evaluation usually incorporates additional data, using qualitative or mixed methods to generate evidence beyond what may have been required through regular monitoring activities.
Outcome Evaluation
This type of summative evaluation measures change in the health knowledge, attitudes, practices, and/or behaviors associated with the program activities among its beneficiaries. Outcome evaluations often rely on M&E frameworks, such as the logic model, to link program activities to positive changes in outcomes, including the degree to which program objectives were achieved.
Impact Evaluation
Impact evaluations are summative evaluations that move a step beyond outcome evaluations to assess program effectiveness in achieving its ultimate goals and are therefore often assessed at the population level. Impact evaluations may include elements of process and outcome evaluations to understand how well the program was implemented, whether its main outcomes were achieved, and what the overall impact of the program was on a population’s health status (often assessing change in morbidity, mortality, or fertility).
Outcome and impact evaluation may use similar designs, methods, and data sources. Occasionally, impact evaluations are classified as such based solely on the design and methodology. For example, the U.S. Agency for International Development (USAID) has defined impact evaluations as those that
measure the change in a development outcome that is attributable to a defined intervention. Impact evaluations are based on models of cause and effect and require a credible and rigorously defined counterfactual to control for factors other than the intervention that might account for the observed change. (USAID, 2010)
However, outcome evaluations that assess change in knowledge, attitudes, practices, and/or behaviors could also use an evaluation design with a rigorously defined counterfactual while not assessing the consequence of these changes on health status at the population level.
There can be confusion within M&E language, as the same type of program information can often be collected for multiple M&E purposes. For example, assessing a change in health behaviors and practices could be part of a “summative evaluation,” “outcome monitoring,” “performance monitoring,” and/or “program evaluation.” In fact, “process evaluation” (or performance evaluation) and “program monitoring” (or performance monitoring) are often used interchangeably, as the activities required for each are virtually identical. The benefit of this overlap is that information can serve multiple purposes—monitoring information can also be used for evaluation, for instance. However, it also means that clarity is needed when communicating about M&E activities, especially regarding the purpose and intended use of the data.
M&E Frameworks
M&E frameworks serve as a visual, organizational schematic for M&E systems. By identifying the goals of a program and the logical connection between program activities and outcomes, frameworks help ensure program success. Frameworks are often created early in a program’s life cycle, as input and agreement from all stakeholders are desired in the development process to ensure that all involved hold the same expectations for goals and outcomes. The development of frameworks thus serves as a uniting experience and a vehicle for better understanding of the program goals for staff and stakeholders alike. At best, frameworks are used throughout the program to illustrate the ongoing relationship between the program and its implementation context, the short- and long-term outcomes expected as a result of program activities, the data needed for monitoring and evaluating program implementation and at what time intervals, and who is responsible for which activities. Programs are most successful when frameworks are used by program staff for these comprehensive purposes; however, frameworks are also sometimes developed and used solely to comply with funder requirements.
There are many existing, evidence-based frameworks commonly used to inform the M&E process. These can be used with their original designs or adapted to best suit a program’s specific needs. Depending on the program, it may even be necessary to build a framework from the ground up, using available tools and guides.
Conceptual Frameworks
Conceptual frameworks assist M&E teams in understanding the broader context of a program by examining the relationships between the individual, organizational, and structural factors within a setting. These frameworks also explore how program activities and outcomes may be influenced by broader factors such as political and cultural contexts (UN Women, 2010). Attention to the relationships between program activities and external factors can help ensure that implementers collect the necessary data to show programmatic success. Conceptual models are therefore most successful when the program objectives and anticipated outcomes are well defined and understood (Hortsman et al., 2002). Visually, conceptual frameworks can take many forms, such as Venn diagrams, tables, or other illustrations that clearly demonstrate relationships between concepts or variables of interest. Theory of change models are one type of conceptual framework commonly used in SRH programming. These models focus on the program’s logic flow and how it will achieve its goals given a particular context. While conceptual frameworks and theory of change models do not provide the details sufficient to operationalize an M&E system, they are an important complement to other frameworks. Discipline-based theories of social and behavior change, such as the Theory of Planned Behavior (Ajzen, 1991), the Social Ecological Model (McLeroy et al., 1988), and many others, are often used to underpin the frameworks, thus providing a theory-based foundation to the program as well as eliminating the need to develop a model from scratch. By predicting how relationships between factors might influence outcomes of the project, conceptual frameworks help implementers identify strategies to address any related barriers that may arise. Conceptual frameworks are also critical for use in evaluation to help evaluators identify and rule out non-programmatic factors that may contribute to observed changes.
Logic Models and Logical Frameworks (LogFrames)
In contrast to conceptual frameworks, logic models and logical frameworks, or LogFrames, are frequently used as the organizing frameworks for an M&E system. These models provide a logical, step-by-step explanation of how a program intends to achieve its desired outcomes. These models thereby focus on the internal logic of a program. While there are a few variations of logic models, most include project steps in this order: inputs, activities (or “processes”), outputs, outcomes, and impact. Inputs are the resources needed to implement activities; in turn, activities are what implementers do with these resources. SRH programming includes a wide range of potential program areas and activities. For example, program activities may include training to improve counseling skills or SRH service delivery, introduction of new clinical practices, dissemination of new information or treatment protocols, provision of health education at the individual or community level, distribution of condoms or other contraceptive methods, and/or communication with stakeholders who impact health services or health outcomes, to name just a few. Often, programs have multiple activities that together are intended to contribute to the improvement of health outcomes. Outputs are the products expected to result from implemented activities, and outcomes are the expected changes in health knowledge, behavior, and/or practice, as a result of producing the outputs. Finally, impact is the population-level effect or change that results from each previous step. In this way, the logic model demonstrates an “if, then” sequence of steps in which each step is contingent on the step preceding it (Ladd et al., n.d.). Some logic models also include moderators, which are external factors that may influence the logical process, as overarching pieces of the model that display which moderators might affect which steps (Centers for Disease Control and Prevention, 2018). 
By incorporating these elements, M&E teams can determine which moderating factors may influence each step and which data are crucial to monitor at each step of the program. Figure 2 shows an example of a logic model with definitions of each component.
The main difference between a logic model and LogFrame is that the LogFrame uses a matrix structure to include the performance measures and information sources that correspond to the activities, outputs, outcomes (or objectives), and goals (or impacts). LogFrames thereby incorporate information that would otherwise be detailed in a separate indicator matrix.
Results Frameworks
Results frameworks are comparable to logic models in that they also focus on the cause-and-effect processes within a program. Results frameworks usually take the shape of a flow chart. Generally, they start with one or more strategic objectives (SOs) as the priority focus, and then list at least one primary intermediate result (IR) needed to obtain this objective. The SO is similar to an impact, while the primary IRs reflect health outcomes that need to change in order to achieve the SO. From each primary IR flow multiple secondary, tertiary, and lower-level sub-IRs, which are needed to achieve the primary IR, often listed in order of significance. Some models also include outputs and indicators associated with each IR and sub-IR. Either way, the discrete IRs or outputs should be defined and measurable. A key difference from logic models and LogFrames is that results frameworks do not lay out specific programmatic steps in a linear fashion. Figure 3 shows a blank template for a results framework.
Adapting and Developing New Frameworks
It is often necessary to adapt known M&E frameworks or develop new M&E frameworks from scratch, depending on a program’s specific goals and implementation strategies. At the very least, frameworks should always include the program’s goals and outcomes and the activities needed to achieve them. The inclusion of other factors may depend on the purpose of the framework and who will be using it, especially whether it will be used internally for program management or externally to explain the program to stakeholders or other audiences. Various global frameworks centered on specific topics within SRH have been developed, such as the conceptual framework for reproductive empowerment, that can be adapted and applied at the program level or serve as a guide for framework development (Edmeades et al., 2018).
Clear, well-thought-out frameworks allow implementers to examine how project factors influence one another, how specific actions lead to desired goals, and what outcomes can be expected at the conclusion of a project. The use of frameworks is therefore crucial to successful M&E.
Indicators
An indicator can be defined as a variable that measures one aspect of a program or a health outcome. Indicators are used to generate the information needed for the M&E system. Indicators provide crucial M&E information at every level and stage of program implementation: indicators for inputs and processes can help program management monitor whether the program is being carried out as planned. Indicators on outputs can help determine whether the program produced its expected results, while indicators on outcomes and impacts are used to determine whether health outcomes improved and whether the improvement was enough to be considered “successful.” Indicators should link directly to the activities, outputs, and outcomes identified in the M&E frameworks.
Each indicator selected for the M&E system must be defined in clear, unambiguous terms, so that all stakeholders understand what is being measured and how to interpret the information generated. These details are specified in Indicator Reference Sheets that are included in the annex of a performance monitoring plan (PMP) or monitoring, evaluation, and learning (MEL) plan. Indicators are expected to vary over time and are therefore presented as neutral, or non-directional. For example, in the indicator, “Percent of girls vaccinated with two doses of HPV vaccine by age 15 years,” the measure is allowing for the possibility that the coverage of the HPV vaccine may increase or decrease over time. Even though the program anticipates that its work will result in an increased percentage of girls vaccinated against HPV, the indicator allows for the possibility that an increase may not be achieved, perhaps due to a reduction in program funding over time, a change in the focus of program activities, or other factors that may be outside the program’s control.
The identification and selection of indicators can be a difficult endeavor for inexperienced and experienced M&E staff alike, especially for those working in programs with complex concepts or difficult-to-measure outcomes, such as may be the case for programs aiming to improve “reproductive empowerment” or “quality of SRH services,” for example. Many resources exist to help with indicator selection, including the Family Planning/Reproductive Health Database, which consists of over 400 core SRH indicators with full Indicator Reference Sheets, and the WHO Mother and Newborn Information for Tracking Outcomes and Results (MoNITOR) database with indicators for maternal and newborn, child, adolescent, aging, cross-cutting, and global strategy areas. The criteria shown in table 1 can serve as a guide to the selection of strong indicators and avoid inefficiency and the collection of data that do not contribute to knowledge-based practice.
Table 1. Checklist for Selecting Strong SRH Indicators
Application for M&E of SRH programs (mark each criterion that is met):
- The indicator measures what it is supposed to measure, either directly or by proxy. If available, a standard indicator and definition are used.
- The indicator provides the same information, with as little bias as possible, each time it is used.
- The indicator is defined in clear, unambiguous terms.
- The data for the indicator can be collected by the program using data sources that are available and accessible. (Data at the beneficiary/client, health care provider, or health facility levels may be most easily accessible for SRH service delivery programs.)
- The indicator can vary in any direction.
- The indicator provides information when needed, at the appropriate intervals.
- The indicator is directly linked to a programmatic action, output, or outcome.
- If the program will have an impact evaluation, the indicator links to a specific health impact.
- The process of collecting the indicator data, analyzing the information, and using the results is feasible for the program; the technical and financial resources are available.
- The cost of collecting data for the indicator is proportionate to the usefulness of the indicator. (Duplication of effort should be avoided; outcome indicators at the national and regional levels are often available through the Demographic and Health Surveys and other similar surveys.)
Note: Adapted from Barden-O’Fallon and Reynolds (2017). In the public domain.
Program indicators are often quantitative. Metrics used for quantitative indicators include counts (number of providers trained, number of condoms distributed, etc.); calculations such as percentages, rates, and ratios (percentage of facilities with trained providers, total fertility rate, maternal mortality ratio, etc.); and composite measures such as indexes (e.g., quality of care index comprising the sum of scores on six quality outcome indicators). Qualitative information can also be useful to SRH M&E. Qualitative indicators include thresholds (similar to milestones), such as the presence/absence of a policy or practice (e.g., a national law prohibiting all forms of female genital mutilation) or whether a practice has met a predetermined level or standard (e.g., 100% of clinics received program support). Qualitative information may also include use of quotes, summaries, or case studies to provide contextual information needed for interpretation of quantitative information.
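As a sketch of the metric types described above, the short Python example below computes a count, a percentage, and a summed composite index. All names and values are hypothetical, invented purely for illustration; they do not come from any actual program or dataset.

```python
# Illustrative calculations for common quantitative indicator metrics.
# All values and names below are hypothetical.

def percentage(numerator: int, denominator: int) -> float:
    """Percentage indicator, e.g., percentage of facilities with trained providers."""
    if denominator == 0:
        raise ValueError("denominator must be nonzero")
    return 100.0 * numerator / denominator

# Count indicator: number of providers trained (a tally of training records)
providers_trained = len(["provider_a", "provider_b", "provider_c"])

# Percentage indicator: facilities with at least one trained provider
facilities_with_trained = percentage(numerator=18, denominator=24)

# Composite indicator: a quality-of-care index formed by summing scores
# on six quality outcome indicators (each scored 0 or 1 in this sketch)
quality_scores = [1, 0, 1, 1, 1, 0]
quality_index = sum(quality_scores)

print(providers_trained)        # 3
print(facilities_with_trained)  # 75.0
print(quality_index)            # 4
```

Note that rates and ratios (e.g., the total fertility rate or the maternal mortality ratio) follow the same numerator-over-denominator pattern but use population-level denominators rather than program records.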
Standardized indicator definitions and language should be used whenever possible, so that programs align their data systems to generate information that can be compared, combined, or in other ways contribute to programmatic knowledge. Standardized indicators are more often available for outcome and impact measures and can be found in compendiums and databases, as well as from similar programs or previous iterations of the program itself. Output indicators, by contrast, are often program-specific. However, there are standardized generic versions of indicators for common outputs that can be modified to be program specific. For example, “Number of health workers trained in cervical cancer screening” is a generic form of an indicator that could be modified by a program to be more specific, such as “Number of nurses trained in cervical cancer screening by NAME OF PROGRAM.”
Using Indicators to Measure Equity and Accessibility
Indicators are important tools for monitoring program equity and accessibility. While equity and accessibility are not always explicitly identified as program outcomes, stakeholders and implementers are often very concerned that the program causes no harm and is accessible to all eligible beneficiaries. As with any other indicator, indicators of equity and accessibility should be identified in the planning stage of M&E when possible. Prior to program implementation, gender, racial, ethnic, wealth, age, education, and/or other inequities should be identified and included in conceptual frameworks, as should how such inequities may affect the inputs, process, outputs, outcomes, and impact of the program. Monitoring of equity and accessibility throughout the life of the program allows implementers to make any necessary changes during program delivery (World Health Organization and Joint United Nations Programme on HIV/AIDS, 2016).
SRH programs are often implemented in contexts of gender inequity. Some SRH programs are designed to improve health outcomes by directly addressing gender inequity, such as those focused on mitigating gender-based violence or ending female genital mutilation. Other programs, such as those focused on increasing access to contraception or improving safe birthing practices, may not directly address gender inequity, yet gender dynamics may influence program success. At a minimum, programs should disaggregate indicators by sex (or gender identity); in some cases, this may be sufficient to monitor the potential differential effects of the program. However, most SRH programs will require more specific indicators to assess the program’s impact on gender relationships and to ensure that gender inequity is not exacerbated by the program. Examples of specific indicators related to gender equity are: “Percentage of women who are able to leave the house without permission,” “Percentage of women who have experienced violence in the 12 months preceding the survey,” and “Proportion of respondents age 15–49 who believe that, if her husband has an STI, a wife can propose condom use,” among many others.
Many SRH programs specifically target children, youth, and adolescents to reduce access barriers caused by existing cultural, political, or religious stigma in a given setting toward young people’s engagement in sexual activity and/or SRH education and knowledge. For programs targeting children, youth, adolescents, and adults, age-sensitive and specific indicators should be developed to ensure that youth and adolescents are not overlooked or underserved. Often, this involves disaggregating indicators by relevant age groups (Adamou, 2020). This may also involve indicators that measure age-specific data, such as “Number of adolescent girls who attended at least 80% of SRH sessions,” “Percentage of parents and caregivers who believe adolescent girls should have access to SRH information and services,” or “Contraceptive prevalence rate among adolescents.” Individuals with intersectional minority identities of age, race/ethnicity, sex, or gender often face increased health-related stigma, which may curtail access to services (Rai et al., 2020). The development of indicators specific to intersectional minority identities can allow M&E teams to further assess whether program interventions are equitably serving intended beneficiaries. For example, an intersectional indicator including age and marital status is “Percent of sexually active, unmarried adolescents who consistently use condoms.” Intersectionality can also be assessed in other indicators using disaggregation. Finally, by using indicators to measure equity and accessibility within SRH programs, M&E teams ensure that the systems of oppression that create the inequity necessitating intervention are not reinforced, but instead mitigated and diminished.
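The disaggregation described above can be sketched in a few lines of Python. The client records, field names, and age bands below are hypothetical examples constructed for illustration, not a prescribed data format or standard grouping.

```python
# Sketch of disaggregating a count indicator by age group and sex.
# Client records and age groupings below are invented for illustration.
from collections import Counter

clients = [
    {"age": 16, "sex": "F", "received_service": True},
    {"age": 17, "sex": "M", "received_service": False},
    {"age": 24, "sex": "F", "received_service": True},
    {"age": 31, "sex": "F", "received_service": True},
    {"age": 15, "sex": "F", "received_service": True},
]

def age_group(age: int) -> str:
    """Assign a hypothetical adolescent-sensitive age band."""
    if age < 15:
        return "<15"
    if age <= 19:
        return "15-19"
    if age <= 24:
        return "20-24"
    return "25+"

# "Number of clients who received the service," disaggregated by (age group, sex)
served = Counter(
    (age_group(c["age"]), c["sex"]) for c in clients if c["received_service"]
)
for (group, sex), count in sorted(served.items()):
    print(f"{group} {sex}: {count}")  # 15-19 F: 2, then 20-24 F: 1, then 25+ F: 1
```

The same pattern extends to intersectional disaggregation: adding marital status or another attribute to the grouping key yields counts for each combination of identities.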
A key criterion for indicators is that they are measurable, meaning that the data needed to calculate the indicator are available (or can be generated) and are accessible to the program. This criterion tends to receive less consideration when indicators are selected for program M&E, mainly because data sources are not yet fully known at the beginning of a program. M&E data come from both routine and non-routine sources.
Routine data are collected on a regular, or routine, basis and are most often used for indicators requiring a count (“Number of . . .”). These data are suited for monitoring the progress of activities and include program-specific data generated by the program itself (recordkeeping and administrative), as well as service-delivery data generated by a health facility or community outreach worker (client/patient information and service statistics). Programs often keep records of their program activities and accomplishments, deliverables, participant information, and program details relating to costs, inputs, management, and organization. This type of program information is typically required by donors for accountability and transparency and therefore is included in all M&E systems. Because these data are also more easily collected, programs may fall into a trap of being overly reliant on program-generated output-level information. However, complementary data are needed to assess program coverage as well as most outcomes and impacts.
Knowledge of the service delivery data environment, how data are collected and reported, and whether the data can be accessed at the health facility or through a health information system (HIS) is vital for program M&E systems. Service delivery data may be obtained through paper-based or digital records kept by health facilities, or they may be available through an HIS with approved access to the system. In some contexts, service delivery data are publicly available through reports. Other types of information systems that produce routine data include vital registration systems and surveillance systems. The main benefit of using data that are already being collected by an HIS is that the program will not need to generate the data, minimizing costs and resources. Another key benefit is that the use of routine service data for program M&E can help strengthen the HIS itself: data points that receive attention, and that are requested and used, may be more likely to be recorded accurately and on time. Thus, using health service data can improve HIS data quality.
However, a number of challenges facing HIS can affect the quality of the data needed by SRH programs. In low- and middle-income countries, the accurate collection, reporting, analysis, and use of routine data from HIS are challenging tasks that span health areas (Braa et al., 2007). This is particularly true for the family planning field, which has paid relatively little attention to the strengthening of routine HIS, causing the field to fall behind other health areas, such as HIV and malaria (Adamou et al., 2020; Family Planning 2020, 2017). These challenges include poor data quality (such as double-counting and incomplete reporting), centralization of information management, a lack of relevant indicators, and missing information on private-sector health services. Programs seeking to use routine data in their M&E systems may therefore be tempted to develop a parallel information system to sidestep these data quality issues. While in the short term this may produce accurate information for use by a program, it introduces redundancy and adds to the reporting burden at service delivery points. This can be extremely problematic for a country’s health information ecosystem (Khurana, 2021). One tool available to assess national HIS performance and identify areas for strengthening is the Performance of Routine Information System Management (PRISM) toolkit.
Non-routine data are collected infrequently, sometimes only once, and are often part of evaluation research. These types of data come from surveys, censuses, and other research activities, including qualitative studies, that generate findings related to the program’s progress or results. Surveys are therefore often used to assess program outcomes and impacts. An important consideration for the use of surveys in M&E is that they are resource intensive, requiring sufficient time, funds, and expertise to plan, carry out, and analyze. Indicators requiring survey data should therefore be assessed for feasibility: Are survey data already available, or will they need to be collected by the program? Will the information generated by these indicators justify the resources needed to collect the data?
The types of surveys common to SRH programs include population-based (household) surveys, such as the Demographic and Health Survey (DHS), the Multiple Indicator Cluster Survey (MICS), or Performance Monitoring for Action (PMA), which collect data on a wide variety of health knowledge, attitude, and practice outcomes. Surveys may include biomarkers and collect data on health conditions and diseases. Population-based surveys are generally representative of the population of interest, whether the general population or a sub-population.
Facility-based surveys can collect information related to service readiness (through an inventory or audit), quality of care (through client-provider observations), health worker knowledge and attitudes (through provider interviews), and client experiences and satisfaction with services (through client exit interviews). These types of surveys are useful to program M&E because they can be tailored to fit the specific needs of the program. Unlike population-based surveys, facility-based surveys collect data representative of facility operations and functions. However, their client record data represent only the health-seeking segment of the general population. This is an important consideration for programs interested in reducing barriers to access, for example, as data from non-users of the services would not be collected. Facility-based surveys may include both public- and private-sector health facilities and collect more detailed information than what is available through the routine HIS. In comparison to HIS, programs have more control over survey data quality. Depending on the context, SRH programs may need to allow for additional data collection time if some services are rarely or only periodically provided. SRH programs may be able to use secondary data from facility-based surveys, such as the Service Provision Assessment (SPA) or the Service Availability and Readiness Assessment (SARA). Additionally, spatial data collected by Geographic Information Systems (GIS) may be available to use for mapping health service and program coverage.
Regardless of the source or type of data, it is important to recognize that in some situations it is difficult to obtain accurate measures of certain outcomes due to social, political, or legal constraints. For example, in countries where abortion and abortion-related services are illegal, abortion data may be neither collected in existing population-level surveys nor recorded in national HIS. When participating in program surveys or interviews on the topic, participants may not be honest about their experiences with abortion due to social or religious stigma or fear of legal consequences. Under-reporting of health behavior and status is common for any topic that is stigmatized or illegal in the local setting. Inattention to this issue can therefore lead to biased and inaccurate estimates of program outcomes.
Constraints to data collection also exist in humanitarian settings, such as those that are experiencing or recovering from war, displacement, and natural disaster. Various and ever-changing ethical, legal, and structural factors may pose barriers to accessing data sources. Specifically, quantitative data may be difficult to collect if access cannot be granted to national HIS or if population-level surveys are not feasible to conduct, perhaps due to continual migration of the population. The triangulation of data sources in these settings is therefore difficult, but likely not impossible, if data collection methods and tools are flexible and adaptable (International Committee of the Red Cross, 2020). The challenges posed to data collection in humanitarian settings may also contribute to inaccurate measures of the outcomes and impacts of a program. This is due to the high likelihood of co-occurring programs functioning in close proximity to one another, thus introducing a possibility for program contamination (Smith & Blanchet, 2019). This highlights the importance of involving local stakeholders on the ground to inform the M&E process at all stages, as they have the most insight and knowledge on the issues at hand.
Special Considerations for Data Collection and Use in M&E
Following defined ethical guidelines is required when collecting data from human subjects. This is especially crucial in SRH contexts, as SRH data may reveal personal, stigmatized, or potentially criminalizing information, and the individuals or groups whom the data represent may be considered at-risk or vulnerable. To start, informed consent must be obtained from those who are providing their personal information to be used as data, usually by way of a written or digital signature. If data are collected from non-direct, administrative sources, informed consent from the participant is still required if the data identify the participant (UNAIDS, 2019). In most settings, individuals under 18 years of age cannot legally provide consent; parents or guardians must typically provide consent on their behalf. All plans for informed consent must be approved by an Institutional Review Board (IRB) before data collection begins.
When creating a plan and corresponding indicators for data collection, researchers should carefully determine what data are necessary for M&E and opt for data that are nonpersonal and non-identifying when possible. It is important to consider how data, once disseminated, may unintentionally identify participants to their local communities or even on a larger scale, depending on the reach of dissemination. This is especially important when a small cell count exists for a specific indicator, as deductive disclosure may be possible (UNAIDS, 2019). It is of particular concern when measuring abortion-related services, sex work, drug use, or other activities that are illegal or highly stigmatized in some settings. It is also important to consider the case of children and youth, who, if identified by the public via data dissemination, may have stigma and safety risks follow them as they develop into adults, with reverberating negative impacts on health outcomes as well as on social, educational, and employment opportunities. If it is not possible to ensure that participants will not be identifiable via disseminated data, then disseminating such data is unethical. Data collected by programs must remain safeguarded, confidential, private, and, wherever possible, de-identified and aggregated.
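One common safeguard against deductive disclosure is cell suppression: withholding any disaggregated count small enough to single out individuals. The sketch below illustrates the idea with a hypothetical count table and an arbitrary threshold of 5; actual thresholds and rules, including how zeros are treated, are set by a program's data-protection policy.

```python
def suppress_small_cells(table, threshold=5):
    """Return a copy of a count table with risky small cells suppressed (None).

    Illustrative rule only: counts of zero are kept, counts at or above the
    threshold are kept, and anything in between is withheld before dissemination.
    """
    return {
        group: (count if count == 0 or count >= threshold else None)
        for group, count in table.items()
    }

# Hypothetical counts of clients receiving a stigmatized service,
# disaggregated by district and age group.
counts = {
    ("District A", "15-19"): 3,
    ("District A", "20-24"): 42,
    ("District B", "15-19"): 0,
    ("District B", "20-24"): 17,
}
safe = suppress_small_cells(counts)
```

The tension with equity-focused disaggregation is visible here: the finer the disaggregation, the smaller the cells, and the more of the table must be withheld.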
Evaluation Design
The main objective of an evaluation is to influence decisions that affect programming and funding. How complex and precise the evaluation design is depends on who the decision makers are and on what types of decisions will be taken as a consequence of the evaluation findings. Different decision makers not only demand different types of information but also vary in their requirements of how precise the findings must be. Information may be needed on the quality of the service provision, acceptability and utilization of the services by the intended beneficiaries, program coverage, cost, health impact, or some combination of these. The precision of evidence needed, or how well the design isolates the program effects in comparison to other factors, may range from plausibility (whether the program seems to have caused change) to probability (exact estimate of the program’s impact). Design choices are also shaped by resource availability, especially the costs in terms of funds, skills, and time required for implementation of the evaluation. The selection of the evaluation design is thus made through a consideration of (a) what information is needed by decision makers, (b) the level of precision that decision makers will accept, (c) the context in which the evaluation will take place, and (d) the resource costs associated with potential designs. Generally, the more information needed and the more precise the findings must be, the more complex, expensive, and resource intensive the design will be. The appropriate evaluation design for any given context is always a compromise among key stakeholders between these factors. Careful consideration of design options and an understanding of what the different designs will provide is therefore necessary to make informed program evaluation design selections.
The Issue of Causality
When a new program or intervention is being developed, it may be tested on a pilot scale to determine whether it is going to work. The information obtained from such a pilot would then be used in the decision to continue, replicate, or expand the intervention or to discontinue it. This requires a study design that produces valid inferences about cause and effect. Did the program activities cause a change in the outcomes? How much of a change did the program cause? To fully answer these questions, evaluation designs need to establish causality by demonstrating that (a) the program came before the change in the outcome, (b) the outcome would not have changed in the absence of the program, and (c) other possible causes of the change can be ruled out. Because evaluators are not able to measure what would have happened in the absence of the program, an estimate is needed. A counterfactual is a measure that is “against the facts.” In program evaluation, a counterfactual is used to compare what would have happened in the absence of the program with what actually did happen. Many methods exist to estimate counterfactuals; some offer more precision than others and are thus considered stronger designs, though counterfactuals can never be known for certain and are thus always estimates.
Randomized Experimental Designs
Randomized experimental designs offer one of the strongest evaluation designs in terms of estimating the counterfactual. The design requires the random assignment of units to an exposed (intervention) or non-exposed (control) group before program implementation begins. Randomization can be carried out at the individual beneficiary level; at a group level, such as a health clinic or school; or at a geographic level, such as a neighborhood, village, district, or state. Given a sufficiently large sample, randomization ensures that intervention and control groups are equivalent with respect to all relevant characteristics other than exposure to the program to be evaluated. The control group therefore serves as the counterfactual, with the only difference between the two groups being exposure to the program. This is the strongest approach to isolating the program effect and is why randomized experimental designs are often considered the “gold standard” for outcome and impact evaluation.
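As a sketch of how group-level (cluster) randomization might be carried out, the following assigns a hypothetical list of clinics to intervention and control arms. The clinic names and the seed are placeholders; a real trial would document the randomization procedure in the evaluation protocol, often stratifying or balancing arms by design.

```python
import random

def randomize_clusters(clusters, seed=None):
    """Randomly split clusters (e.g., clinics, schools, villages)
    into intervention and control arms of (near-)equal size."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible and auditable
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Twenty hypothetical clinics, assigned 10 to each arm.
arms = randomize_clusters([f"clinic_{i:02d}" for i in range(1, 21)], seed=42)
```

Recording the seed and the ordered input list allows stakeholders to verify after the fact that assignment was genuinely random rather than influenced by program placement decisions.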
Non-Randomized Experimental Designs
Non-randomized experimental designs, also known as quasi-experimental designs, follow the same logic as experimental designs by developing intervention and non-intervention groups for comparison. However, quasi-experimental designs do not include randomization of individuals or groups. The non-intervention group, in this case referred to as the comparison group, still acts as the counterfactual. Without randomization to ensure that relevant characteristics are equally distributed between intervention and non-intervention groups, the quasi-experimental approach is more vulnerable to biases that can affect the validity of measures. In particular, selection (or participation) bias is more prone to occur. Selection bias refers to the observed and unobserved factors that may influence an individual’s or group’s participation in a program and thereby influence the measure of program outcomes (often inflating the estimated success of the program). A number of methods are available to reduce non-randomization bias, such as using matching to construct the comparison group. In this approach, nonparticipants are selected during the design phase of the evaluation according to characteristics similar to those of program participants. Matching can also be done during the analysis phase of the evaluation with the use of statistical techniques. Matching is limited to characteristics that are observed and measurable. While quasi-experiments produce weaker estimates of the counterfactual than randomized experiments, these designs are often more feasible to implement. Programs may be reluctant to deny the intervention to those who want it; intervention groups or areas may have already been identified; or the program may have already initiated program activities at the time of the evaluation design. In some instances, baseline, or pre-intervention, data may not be feasible to collect, and thus the evaluation must rely solely on endline, or post-intervention, data.
In these situations, assumptions of causality are difficult to meet, and, thus, biased estimates of program effects can be produced. Evaluators can use conceptual frameworks to show theory-based linkages between program activities and observed measures to strengthen the argument of causality.
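Matching on observed covariates can be sketched as a nearest-neighbor search. The example below uses hypothetical data and a naive squared Euclidean distance on age and parity to pair each participant with the most similar nonparticipant, without replacement. In practice, evaluators typically match on propensity scores and standardized covariates rather than raw distances.

```python
def nearest_neighbor_match(participants, pool, covariates):
    """Match each participant to the closest nonparticipant on the given
    observed covariates (squared Euclidean distance), without replacement."""
    available = list(pool)
    matches = {}
    for p in participants:
        best = min(
            available,
            key=lambda c: sum((p[k] - c[k]) ** 2 for k in covariates),
        )
        matches[p["id"]] = best["id"]
        available.remove(best)  # each comparison unit is used at most once
    return matches

# Hypothetical program participants and a pool of nonparticipants.
participants = [
    {"id": "p1", "age": 22, "parity": 1},
    {"id": "p2", "age": 35, "parity": 3},
]
pool = [
    {"id": "c1", "age": 23, "parity": 1},
    {"id": "c2", "age": 34, "parity": 3},
    {"id": "c3", "age": 50, "parity": 5},
]
matches = nearest_neighbor_match(participants, pool, covariates=["age", "parity"])
```

The limitation noted above is visible in the code: only the fields listed in `covariates` influence the match, so any unobserved difference between participants and nonparticipants remains a potential source of bias.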
Non-Experimental Designs
Non-experimental designs do not use the experimental approach of comparison between intervention and non-intervention groups. Thus, there is no control or comparison group. These types of designs assess whether program activities met their objectives and can assess changes in outcomes. However, without a comparison group, the designs are not able to isolate the program effect to the same degree as experimental designs. Non-experimental designs, including pre-/post-program or longitudinal measurement, are often referred to as observational studies because they are limited to descriptions of whether changes occurred rather than providing attribution to the changes. Without a comparison group, it may be difficult to determine whether observed changes in health outcomes were due to the program and not caused by outside influences. These designs are useful when resource constraints or ethical concerns prevent the use of an experimental design, or when the precision of the impact estimate is not of great concern. These designs are also useful when there is no obvious comparison group, such as when national-level programs are expected to reach an entire population.
When selecting an evaluation design, it is important that M&E staff work collaboratively with the implementation team from the beginning of the program if possible, so that evaluation design and program implementation can inform each other. M&E staff also need to understand the various trade-offs between one design over another and be explicit with stakeholders about the reasons for the selection of a particular design, the weaknesses of the design, and how the weaknesses will or won’t be mitigated through implementation of the evaluation.
It is important to note that all evaluation designs have relative strengths and weaknesses. The descriptions of evaluation designs provided here focus on the ability to isolate the program effect by constructing strong estimates of counterfactuals. However, in the “real world” of SRH program evaluation, a number of contextual, ethical, and nontechnical factors can also influence the feasibility of implementing strong evaluation designs. One practical constraint has to do with timing: if a program is already underway when the evaluation is commissioned, there is very little likelihood that a randomized experimental design can be implemented. Likewise, if a program is already completed, a non-experimental design may be the only option. For this reason, a best practice is to think about evaluation at the beginning of program implementation, even during program planning, so that all options for evaluation design can be considered. Another very common practical constraint deals with program placement. Most SRH programs are not placed at random, meaning there is usually something specific about the target population that the program is intending to address (e.g., reaching underserved populations due to area of residence, age, race, education, or other characteristics). The very reason the program is placed in a particular area may be a reason that a control or comparison group is hard to identify. Additionally, there are often other programs acting in the same area or with the same population, meaning that “pure” comparison groups may not exist. In such cases, a comparison may be measuring the effects of one program versus another program rather than the program versus no program at all.
Finally, practical constraints to the implementation of strong evaluation designs are also related to human resources and capacity: turnover of stakeholders and decision makers during the course of evaluation may mean that funds are cut or interest wanes; expectations of multiple stakeholders and competing objectives and research questions can overwhelm the capacity of what the evaluation can deliver; buy-in from stakeholders or people in positions of authority may be difficult to receive. Often, pressure for rapid results may limit time needed for full analysis and interpretation, and decisions to modify program design may be made while evaluations are ongoing.
Much guidance exists on the planning of evaluations (see, e.g., Iskarpatyoti et al., 2017) as well as on the selection of methods (see, e.g., Gertler et al., 2016). Ultimately, any evaluation must be designed to ensure that the information generated is useful to decision makers, that it is feasible to implement given the context and available resources, that it will provide accurate information, and that it will be conducted ethically and with regard for the individuals involved in and affected by the evaluation (Patton, 1997). The particular design and methods used should be those that best meet these criteria.
M&E of SRH programs covers the essential elements and processes involved in M&E including concepts and terminology, frameworks, indicators, data sources, and evaluation study designs, specifically in the SRH context. As public health evolves, innovations in M&E continue to transform its practice and purpose. M&E training is commonly a part of public health education programs, and obtaining M&E knowledge is easily accessible by way of online M&E training programs, guides, tools, examples, case studies, and peer-reviewed research. More emphasis is being placed on using M&E systems for program learning and adaptive management. Likewise, more interest has developed in using locally collected and managed routine data for M&E systems, and due to improvements in quality and accessibility, incentive to independently develop parallel information systems has waned. There is also growing interest in using “big data,” or large sources of data collected outside of the program, such as through social media, especially for behavior change communication interventions. Monitoring remains crucial to the implementation of global initiatives and agendas, such as the Sustainable Development Goals and FP2030, as they capture progress at all levels. Similarly, monitoring and evaluating the equity and accessibility of public health interventions helps to advance human rights. M&E approaches continue to adapt and evolve as programmatic evidence continues to be generated and knowledge of successful practices in SRH programming continues to grow.
- Bamberger, M., & Mabry, L. (2019). Real world evaluation: Working under budget, time, data, and political constraints (3rd ed.). SAGE.
- Data for Impact. (n.d.). Family planning and reproductive health indicators database. University of North Carolina at Chapel Hill.
- Lance, P., Spencer, J., & Janko, M. (2016). Data science for global health. MEASURE Evaluation, University of North Carolina at Chapel Hill.
- MEASURE Evaluation. (2019). Performance of Routine Information System Management (PRISM) toolkit: PRISM tools. MEASURE Evaluation, University of North Carolina at Chapel Hill.
- RTI International. (n.d.). MERLA 101 course. Monitoring, evaluation, research, learning, and adapting.
- U.S. Agency for International Development. (n.d.). Monitoring, evaluation and learning toolkits. Learning Lab.
- World Health Organization. (2022). Inequality monitoring in sexual, reproductive, maternal, newborn, child and adolescent health: A step-by-step manual.
- Adamou, B. (2020). Gaps in global monitoring and evaluation of adolescent and youth reproductive health. MEASURE Evaluation, University of North Carolina at Chapel Hill.
- Adamou, B., Barden-O’Fallon, J., Williams, K., & Selim, A. (2020). Routine family planning data in the low- and middle-income country context: A synthesis of findings from 17 small research grants. Global Health: Science and Practice, 8(4), 799–812.
- Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.
- Barden-O’Fallon, J., & Bisgrove, E. Z. (2016). Monitoring & evaluation in family planning: Strengths, weaknesses, and future directions (Working paper no. WP-16-163). MEASURE Evaluation, University of North Carolina at Chapel Hill.
- Barden-O’Fallon, J., & Reynolds, Z. (2017). Measuring family planning service delivery: An assessment of selected indicators across implementing partners (Technical report no. TR-17-194). MEASURE Evaluation, University of North Carolina at Chapel Hill.
- Braa, J., Hanseth, O., Heywood, A., Mohammed, W., & Shaw, V. (2007). Developing health information systems in developing countries: The flexible standards strategy. Management Information Systems Quarterly, 31(2), 381–402.
- Centers for Disease Control and Prevention. (2018). Program evaluation framework checklist for step 2.
- Edmeades, J., Hinson, L., Sebany, M., & Murithi, L. (2018). A conceptual framework for reproductive empowerment: Empowering individuals and couples to improve their health [Brief]. International Center for Research on Women.
- Family Planning 2020. (2017). FP2020: The way ahead 2016–2017.
- Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. J. (2016). Impact evaluation in practice (2nd ed.). Inter-American Development Bank and World Bank.
- Hortsman, R. G., Cleland, J., Douthwaite, M., Ambegaokar, M., & Salway, S. (2002). Monitoring and evaluation of sexual and reproductive health interventions—A manual for the EC/UNFPA initiative for reproductive health in Asia. Multi-Country PBF Network.
- International Committee of the Red Cross. (2020). Acquiring and analysing data in support of evidence-based decision: A guide for humanitarian work.
- Iskarpatyoti, B. S., Sutherland, B., & Reynolds, H. W. (2017). Getting to an evaluation plan: A six step process from engagement to evidence. A workbook. MEASURE Evaluation, University of North Carolina at Chapel Hill.
- Khurana, N. (2021). Issue analysis: A use-driven approach to data governance can promote the quality of routine health data in India. Global Health: Science and Practice, 9(2), 238–245.
- Ladd, S., Jernigan, J., Watkins, N., Farris, R., Minta, B., & Brown, S. (n.d.). Evaluation guide: Developing and using a logic model. Centers for Disease Control and Prevention.
- McLeroy, K. R., Bibeau, D., Steckler, A., & Glanz, K. (1988). An ecological perspective on health promotion programs. Health Education Quarterly, 15(4), 351–377.
- MEASURE Evaluation. (2014a). GIS and HIV: Linking HIV databases in Rwanda. A case study (Working paper SR-14-86). MEASURE Evaluation, University of North Carolina at Chapel Hill.
- MEASURE Evaluation. (2014b). Applying geospatial tools to Rugg’s staircase method for monitoring and evaluation: MEASURE Evaluation’s case studies (Working paper WP-14-154). MEASURE Evaluation, University of North Carolina at Chapel Hill.
- Oyediran, K. A., Makinde, O. A., & Mullen, S. (2014). Monitoring and evaluation of sexual and reproductive health programmes. In F. Okonofua (Ed.), Confronting the challenge of reproductive health in Africa: A textbook for students and development practitioners (pp. 441–474). BrownWalker Press.
- Patton, M. Q. (1997). Utilization-focused evaluation. SAGE.
- Peersman, G., & Rugg, D. (2010). Basic terminology and frameworks for monitoring and evaluation. UNAIDS.
- Rai, S. S., Peters, R. M. H., Syurina, E. V., Rai, S. S., Peters, R. M. H., Syurina, E. V., Irwanto, I., Naniche, D., & Zweekhorst, M. B. M. (2020). Intersectionality and health-related stigma: Insights from experiences of people living with stigmatized health conditions in Indonesia. International Journal for Equity in Health, 19(206), 1–15.
- Rugg, D., Carael, M., Boerma, J. T., & Novak, J. (2004). Global advances in monitoring and evaluation of HIV/AIDS: From AIDS case reporting to program improvement [Special issue]. Global Advances in HIV/AIDS Monitoring and Evaluation, 103, 33–48.
- Smith, J., & Blanchet, K. (2019). Research methodologies in humanitarian crises. Elrha.
- Joint United Nations Programme on HIV/AIDS (UNAIDS). (2019). Rights-based monitoring and evaluation of national HIV responses.
- UN Women (2010). Monitoring and evaluation frameworks (3 parts). Virtual Knowledge Centre to End Violence Against Women and Girls.
- U.S. Agency for International Development. (2010). ADS chapter 203: Assessing and learning.
- Women’s Refugee Commission and the United Nations Children’s Fund. (2020, May 28). Toolkit for monitoring and evaluating adolescent sexual and reproductive health interventions in safe spaces.
- World Health Organization and Joint United Nations Programme on HIV/AIDS. (2016). A tool for strengthening gender-sensitive national HIV and sexual and reproductive health (SRH) monitoring and evaluation systems.
1. Efficiency is the consideration of benefits weighed against costs. “Efficient” programs and interventions are those in which health outcomes are achieved in the most economical way. This term is different from “effectiveness” and “efficacy,” which relate to the ability of an intervention to achieve desired health outcomes under “real-world” or ideal/controlled conditions, respectively.