Do Good. Better. Guidebook Chapter 19: Outcome Measurement

When you are done with this section, you will be able to...

Describe the importance of outcome measurement in social impact evaluation.
Explain how single-group designs track participant change over time.
Recognize how outcome data informs program improvement and decision-making.

INTRODUCTION

Outcome measurement provides organizations with a structured way to track progress, evaluate participant change, and use evidence to strengthen decision-making. Rather than relying solely on assumptions or anecdotal observations, organizations can use outcome data to assess whether their interventions are moving participants toward the desired goals outlined in their theory of change.

This chapter explores the core components of effective outcome measurement and how they are applied in social impact work. It also examines how organizations collect and interpret outcome data, the strengths and limitations of common measurement approaches, and the role outcome measurement plays in continuous learning, accountability, and program improvement.

WHAT IS OUTCOME MEASUREMENT?

Outcome measurement is the systematic assessment of changes in participants or systems. Since an outcome is defined as the measured change in a social issue’s negative consequences, the goal of outcome measurement is to determine whether meaningful change has occurred within the affected population by the end of the intervention period. Although evaluation comes after implementation in the Social Impact Cycle, outcome measurement can take place at any point while an intervention is being implemented.

By regularly examining participant outcomes, organizations can identify which aspects of their intervention are working well, which need adjustment, and where additional support may be required. These insights inform day-to-day decision-making, guide intervention refinement, and strengthen accountability to stakeholders who rely on credible evidence of progress.

Effective outcome measurement is built on three key steps. Organizations must:

Articulate their theory of change and logic model to clarify how program activities are expected to lead to short-, medium-, and long-term outcomes.
Select appropriate outcome measures that accurately capture the specific outcomes the theory of change anticipates.
Develop an evaluation design that specifies how and when data will be collected, from whom, and under what conditions.

Together, these elements form the foundation of a coherent outcome measurement approach that links intended change, actual change, and measurement strategies. The following subsections explore each of these components in greater depth and explain how they function as key elements of effective outcome measurement.

Theory of Change and Logic Model

In previous chapters, a theory of change was described as a logical framework that explains how your intervention is expected to create change. The theory of change describes the causal pathways leading from current conditions to expected outcomes. Remember: the theory of change is different from a logic model. A logic model is a visual companion to the theory of change that explains “what” will happen. The theory of change provides an overview of “why” it will happen.

Within outcome measurement, the theory of change and logic model function as practical guides that shape what data should be collected, when it should be collected, and how results should be interpreted. By clearly mapping the expected progression from activities to outcomes, they reveal which specific changes should be measured as evidence of progress, as well as appropriate measurement points such as baseline, mid-program, and follow-up. The theory of change also provides a reference point for interpreting results. Because it makes explicit the anticipated pathways of change, organizations can compare actual outcomes with those predicted in the theory of change and logic model. If the observed changes do not align with expectations, this signals a need to pause and reassess underlying assumptions, implementation quality, or contextual factors influencing participant progress.

By periodically examining whether real outcomes match those anticipated in the theory of change and logic model, organizations can make timely adjustments to program activities, supports, or delivery strategies. These adjustments help ensure that implementation remains aligned with intended pathways of change and that the program continues progressing toward its desired outcomes and ultimate impact. Utilizing outcome measurement in this iterative way allows organizations to engage in continuous learning rather than waiting until the end of a program to evaluate results.

Real World Example: Reading Partners is a one-on-one literacy tutoring program that pairs trained volunteers with students who are behind grade level in reading.¹ Guided by a theory of change that links individualized tutoring to improved literacy skills and broader academic success, the program uses outcome measurement to track progress at key stages. For example, evaluators collect baseline reading assessments when students enter the program, monitor students’ skill development throughout tutoring, and administer follow-up assessments when students complete the program. If interim data shows that students are not achieving the expected short-term gains in reading proficiency, program staff can examine factors that influence student success, such as tutor training, session frequency, or curriculum alignment, and make targeted adjustments. In this way, the theory of change acts as a guide for outcome measurement.

Outcome Measures

Outcome measures are the specific methods, instruments, and tools used to collect data on the anticipated changes spelled out in the theory of change and logic model.²

Methods describe the procedures that will be used to collect and analyze outcome data. Common methods include surveys, structured interviews, focus groups, classroom observations, or review of administrative records.
Instruments take abstract concepts of what will be measured, such as “improved well-being” or “school readiness,” and transform them into concrete items, questions, or tasks that can be scored and interpreted. Some frequently used instruments include validated questionnaires (such as anxiety or depression scales), developmental screening assessments, reading or math tests, and structured skill checklists.
Tools describe the physical or digital aids used to administer instruments and carry out the chosen method. Standard tools include paper surveys, online surveys (e.g., using Google Forms or SurveyMonkey), tablets for data entry, observation rubrics, and case management software to store and organize results.

Together, methods, instruments, and tools work to translate abstract goals into a systematic assessment for evaluating the changes caused by an intervention. These resources also help organizations move beyond intuition or anecdotal evidence and instead rely on empirical information to understand what is working and what is not. Over time, the accumulation of outcome data helps identify patterns of effectiveness and reveals which program components are most strongly associated with positive change. These insights help organizations identify best practices, inform program refinement, and guide decisions about scaling, modifying, or discontinuing specific strategies.

Real World Example: Head Start, a program that provides comprehensive early childhood education services to low-income children, uses outcome measures to evaluate changes in children’s cognitive, language, and social-emotional skills over time. They use standardized developmental assessments as their data-collection method, conducted when students enter the program, at certain intervals within the program, and when they exit the program to track progress toward school readiness goals. The specific questions and type of assessment act as their instrument, and the online assessment platforms and paper aids act as their tools. By using consistent tools, instruments, and methods, Head Start programs can compare results with other locations, monitor compliance and quality improvement efforts, and adjust support for children who are not demonstrating expected gains. This example illustrates how well-chosen outcome measures translate broad goals—such as improved early learning and development—into concrete, measurable indicators that guide program improvement and demonstrate progress to funders and policymakers.

Evaluation Design

An evaluation design is the conceptual plan or structure that describes how you will collect and analyze data to answer your key evaluation questions.³ Derived from the program goals outlined in your theory of change and logic model, these key evaluation questions address the goals of your intervention and specify the information you need from the evaluation.

When designing an evaluation, you must make three core decisions:

Who you’ll study: Who comprises your treatment, comparison, or control groups? How will participants be selected?
When you’ll collect data: When will baseline data be collected? When will follow-up measurements occur? Will there be interim measurements?
What you’ll measure: Which outputs and outcomes will you track to answer your evaluation questions?

Together, these decisions form the foundation of a strong evaluation design. By clearly defining who will be studied, when data will be collected, and what indicators will be measured, evaluators can generate more reliable and meaningful findings. A thoughtful evaluation design not only strengthens the credibility of the results but also ensures the evaluation produces actionable insights that support learning, decision-making, and improved social impact outcomes.
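One way to make these three decisions concrete is to write them down in a single structured record and check it for gaps. The sketch below is only an illustration: the `EvaluationDesign` class, its field names, and the example values are all hypothetical and are not drawn from any particular evaluation framework.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationDesign:
    """A hypothetical record of the three core design decisions."""
    # Who you'll study
    groups: list[str]
    selection_method: str
    # When you'll collect data (weeks from program start; 0 means baseline)
    measurement_weeks: list[int]
    # What you'll measure
    outcomes: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return a list of gaps that would weaken the design."""
        issues = []
        if not self.groups:
            issues.append("no study group defined")
        if 0 not in self.measurement_weeks:
            issues.append("no baseline measurement")
        if not any(week > 0 for week in self.measurement_weeks):
            issues.append("no follow-up measurement")
        if not self.outcomes:
            issues.append("no outcomes selected")
        return issues


# Illustrative values only
design = EvaluationDesign(
    groups=["tutoring participants"],
    selection_method="all students enrolled in the fall cohort",
    measurement_weeks=[0, 10, 20],  # baseline, interim, follow-up
    outcomes=["reading assessment score"],
)
print(design.problems())  # an empty list means no gaps were found
```

A record like this does not replace a written evaluation plan, but it can make it easier to notice a missing baseline or an undefined outcome before data collection begins.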

WHAT DOES IT LOOK LIKE TO CONDUCT AN OUTCOME MEASUREMENT?

To understand whether participants actually change over time, you need at least two data points on the same outcome: an initial reference point and one or more follow-up measurements. The initial data point is generally referred to as the baseline and is typically collected just before participants begin an intervention. Later measurements are taken several months after, or at the program’s completion, and are compared to the baseline to gauge the extent and direction of the change. When multiple follow-up measurements are collected, organizations can also examine whether early gains are sustained, increase, or fade after the intervention ends.
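The comparison described above can be sketched in a few lines of code. This is a minimal illustration, assuming one score at baseline, one at program completion, and one at a later follow-up, with higher scores meaning better outcomes. The function name `describe_trajectory` and the `tolerance` threshold are hypothetical choices made for this example.

```python
def describe_trajectory(baseline, end_of_program, later_follow_up, tolerance=1.0):
    """Classify what happened to an early gain by the time of a later follow-up.

    Assumes higher scores are better. `tolerance` is the smallest change
    treated as meaningful; the default of 1.0 is an arbitrary placeholder.
    """
    early_gain = end_of_program - baseline
    later_change = later_follow_up - end_of_program

    if early_gain <= 0:
        return "no early gain"
    if later_change > tolerance:
        return "increased"
    if later_change < -tolerance:
        return "faded"
    return "sustained"


# Illustrative scores only
print(describe_trajectory(40, 60, 60.5))  # sustained
print(describe_trajectory(40, 60, 70))    # increased
print(describe_trajectory(40, 60, 45))    # faded
```

In practice, the threshold for a meaningful change should come from the instrument being used rather than from a fixed number.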

The type of data you collect should be guided by your theory of change and evaluation questions. Quantitative tools such as standardized scales, behavioral indicators, or administrative records can capture measurable shifts in outcomes, while qualitative methods such as interviews or open-ended responses can help explain how and why those changes occurred. Using both types of data together often provides a more complete picture of participant progress than either approach alone.

Single-Group Design (Performance Measurement)

The most common method for measuring participant change over time is a single-group design, often called a “pre-post” or “before-and-after” measurement. This design measures outcomes for one group of participants before and after an intervention, then compares the two measurements to determine whether a change occurred among the participants. It is most useful for tracking participant progress, monitoring outcomes over time, and supporting internal learning and program improvement. Single-group design is an excellent tool for showing whether change occurred. However, because it does not control for external factors or use a comparison group, it cannot establish whether the intervention caused that change. Even so, it remains valuable for understanding overall progress and informing operational improvements.

How it works: Measure a baseline characteristic before your intervention, deliver your intervention, and measure again—in the exact same way—after the intervention. By comparing the before-and-after measurements, you can see whether change occurred among your participants.

Example: A literacy nonprofit measures students’ reading levels at the beginning of its tutoring program (baseline: 40% reading at grade level), provides 20 weeks of one-on-one tutoring, and measures again at the end (outcome: 75% reading at grade level). The nonprofit can report that the share of participants reading at grade level rose by 35 percentage points.
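The arithmetic behind a before-and-after comparison like this one can be sketched as follows. The group of 20 students is an assumption made purely for illustration, chosen so the percentages match the example (8 of 20 is 40%, and 15 of 20 is 75%). The function names are hypothetical.

```python
# Hypothetical data: True means the student reads at grade level.
baseline = [True] * 8 + [False] * 12    # 8 of 20 at grade level before tutoring
follow_up = [True] * 15 + [False] * 5   # 15 of 20 at grade level after tutoring


def percent_at_grade_level(results):
    """Return the percentage of students reading at grade level."""
    return 100 * sum(results) / len(results)


def pre_post_change(before, after):
    """Compare two measurements of the same group, taken the same way."""
    pre = percent_at_grade_level(before)
    post = percent_at_grade_level(after)
    return {
        "baseline_pct": pre,
        "follow_up_pct": post,
        # Difference between the two percentages
        "percentage_point_change": post - pre,
        # Size of the change relative to the starting percentage
        "relative_change_pct": 100 * (post - pre) / pre,
    }


summary = pre_post_change(baseline, follow_up)
print(summary["percentage_point_change"])  # 35.0
print(summary["relative_change_pct"])      # 87.5
```

Note the difference between the two figures: moving from 40% to 75% is a change of 35 percentage points, but an 87.5 percent increase relative to the baseline. Reports should state clearly which of the two is being used.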

Strengths

Formative evaluation: You can make improvements to your program while it’s still running based on participant feedback and early results gathered through your testing.

Recognizable change: At a quick glance, the design shows whether change is occurring in your desired direction.

Easy implementation: When resources are limited, but you need to track basic progress, a before-and-after test is a valuable and accessible option.

Internal learning tool: The simple model helps provide a basic outline for internal learning and program refinement.

Limitations

The primary limitation of the single-group design is its low internal validity, meaning the design has limited capacity to identify a true cause-and-effect relationship. As a result, you cannot confidently claim your program caused the observed improvements. Other factors may explain the improvement, including:

Maturation: People naturally change over time, which could lead to improvement even without your program.
History: External events in the economy, the environment, or the work of other, unrelated organizations might cause change.
Regression to the mean: People who start at extreme levels tend to move toward average levels naturally over time, regardless of intervention.
Testing effects: Sometimes people improve simply because they’ve taken the same test multiple times and become familiar with it.
Selection effects: The people who choose to participate in your program might be more motivated or have more resources than those who don’t, meaning they might improve even without your help.
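Regression to the mean is often the least intuitive of these threats, so a small simulation may help. The sketch below rests on simplified assumptions: each simulated person has a stable underlying level, every test score adds random noise to that level, and no intervention takes place between the two tests. All numbers are invented for illustration.

```python
import random

random.seed(19)  # fixed seed so the sketch is repeatable

N = 10_000

# Each simulated person has a stable underlying level...
true_level = [random.gauss(50, 10) for _ in range(N)]
# ...and each test score is that level plus random day-to-day noise.
pre = [level + random.gauss(0, 10) for level in true_level]
post = [level + random.gauss(0, 10) for level in true_level]  # no program in between

# Select the lowest scorers at baseline, as a program targeting high need might.
cutoff = sorted(pre)[N // 5]  # roughly the bottom 20 percent
selected = [i for i in range(N) if pre[i] <= cutoff]

mean_pre = sum(pre[i] for i in selected) / len(selected)
mean_post = sum(post[i] for i in selected) / len(selected)

print(f"Selected group, baseline mean:  {mean_pre:.1f}")
print(f"Selected group, follow-up mean: {mean_post:.1f}")
print(f"Apparent gain with no program:  {mean_post - mean_pre:.1f}")
```

Because the group was chosen for its unusually low baseline scores, part of that low score was bad luck on test day, and the follow-up average drifts back toward the overall average. A single-group design would record this drift as improvement even though nothing was delivered.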

The Bottom Line: Single-group designs can demonstrate that change occurred among participants, but they cannot determine with confidence why that change occurred. Consequently, they are most appropriately used for internal learning, monitoring progress, and guiding program improvement rather than making strong causal claims.

WHAT ARE THE STRENGTHS AND LIMITATIONS OF OUTCOME MEASUREMENT?

Outcome measurement offers a practical and accessible way for organizations to understand whether participants are experiencing meaningful change. By tracking outcomes at multiple points in time, organizations can document progress, communicate results to stakeholders, and build a credible record of their program’s contributions. Outcome data also supports transparency and accountability, demonstrating that resources are being used to produce observable benefits for the people served. Because these measures can often be collected through surveys, assessments, or administrative data, they are feasible for many programs to implement on an ongoing basis.

However, outcome measurement also has important limitations. Changes observed among participants cannot always be attributed solely to the program, since other factors in participants’ lives may influence results. Without a comparison group, it is difficult to determine whether similar changes would have occurred in the absence of the intervention. In addition, outcomes may not capture the full complexity of participant experiences or the longer-term impacts of the intervention. Measures can also be fallible depending on how questions are worded, when data is collected, and whether participants complete follow-up assessments.

Recognizing both the strengths and limitations of outcome measurement helps organizations interpret their findings responsibly. Rather than overstating conclusions, practitioners can use outcome data as one important source of evidence within a broader evaluation strategy.

SUMMARY

Outcome measurement offers a practical and accessible way for organizations to understand whether participants are experiencing meaningful change. By tracking outcomes at multiple points in time, outcome measurement helps organizations document progress, make improvements, communicate results to stakeholders, and build a credible record of their program’s contributions. Outcome data also supports transparency and accountability, demonstrating that resources are being used to produce observable benefits for the people served.

Ultimately, this evidence-based approach moves organizations away from guesswork, providing a shared language for staff and stakeholders to collaborate on informed, systematic improvements. Though outcome measurements alone are insufficient to prove causation, they provide substantial value for organizations looking for accessible evaluation methods. To prove an intervention is responsible for the change among participants, you must apply the impact assessment principles discussed in the next chapter.

ENDNOTES

1 - Robin Tepper Jacob, Catherine Armstrong, and Jacklyn Altuna Willard, Mobilizing Volunteer Tutors to Improve Student Literacy: Implementation, Impacts, and Costs of the Reading Partners Program (New York: MDRC, 2015).
2 - Institute of Education Sciences, What Works Clearinghouse. “Module 5: Outcome Measures (WWC Group Design Standards Training).” U.S. Department of Education, n.d.
3 - BetterEvaluation, “Evaluation Design,” Manager’s Guide to Evaluation, n.d.; Humanitarian Global, “Evaluation Designs,” n.d.

Resources published and shared by the Ballard Center are not necessarily endorsed by BYU or The Church of Jesus Christ of Latter-day Saints.