The PEARLS Framework

How we evaluate governance research for trustworthiness


What Is PEARLS?

The PEARLS framework is a six-criterion evaluation tool designed to help practitioners distinguish research worth acting on from research that sounds compelling but doesn't hold up under scrutiny. Each criterion is scored on a 1–5 scale, producing a total between 6 and 30 points.

PEARLS was developed specifically for school board governance — a domain where the evidence base is thinner than in instructional practice, where advocacy-funded studies circulate alongside rigorous academic work, and where decision-makers rarely have time for deep methodological review. The framework is not a replacement for expert judgment; it's a structured starting point.

Default to 1. Each criterion begins at the floor score unless specific, observable evidence warrants a higher rating. This conservative default counteracts the natural tendency to give benefit of the doubt to studies that confirm existing beliefs or come from prestigious sources.

The Six PEARLS Criteria

P
Participants
Who was studied, and how were they selected?
Score 1–5

Examine the sample: its size, demographic composition, and how participants were recruited or selected. A study's findings are only as generalizable as its participants are representative of the population you care about. Ask: are these the students, districts, or boards that resemble the context where this finding would be applied?

1
Very small, non-representative sample with no rationale for its selection. Participants clearly mismatch the intended application context.
3
Sample is adequate for the study type but has meaningful limitations in generalizability or contextual relevance to your setting.
5
Large, diverse, and well-matched to the population of interest. Strong demographic and contextual relevance. Selection process is transparent and defensible.
E
Execution
How rigorous was the research design?
Score 1–5

Evaluate the methodology and study design. Well-designed studies (randomized controlled trials, quasi-experimental designs, rigorous longitudinal tracking) provide substantially more robust evidence than surveys, case studies, or opinion aggregation. The design should be appropriate to the question being asked and described clearly enough that it could, in principle, be replicated.

1
Methods are unclear, poorly explained, or fundamentally flawed. Design cannot support the conclusions claimed.
3
Methodologically sound for its study type, but not designed to establish causal impact. Describes what is, not what works.
5
Rigorous design that provides credible evidence of what works: randomized controlled trial, quasi-experimental design, or longitudinal tracking of outcomes with appropriate controls.
A
Analysis
How were data interpreted and conclusions drawn?
Score 1–5

Assess how the data were analyzed and what conclusions were drawn from them. The central question here is whether the claims match the evidence. This criterion pays particular attention to the distinction between correlation and causation — one of the most frequently abused distinctions in education research. Look for whether authors acknowledge what their findings do and don't prove.

1
Conclusions are unclear, overreaching, or not supported by the data presented. Causal claims are made without causal evidence.
3
Data are reasonably interpreted but some claims stretch beyond the evidence. Study limitations are only partially acknowledged.
5
Analysis is transparent, precise, and well-aligned with what the design can support. The distinction between correlation and causation is explicit. Limitations are clearly acknowledged.
R
Relevance
How timely and applicable is this to your context?
Score 1–5

Evaluate timeliness and contextual fit. Research conducted in a different era, geography, or demographic context may not transfer. Consider: when was the data actually collected (not just published)? Was the context — district size, student demographics, policy environment — meaningfully similar to the context in which you'd apply the findings?

1
Outdated research or findings from a totally irrelevant context. Using this to inform current decisions would be a stretch.
3
Findings are somewhat applicable, or the topic is timely but the specific context lacks direct transferability to your situation.
5
Highly timely and directly relevant. You can readily imagine drawing on this research to inform practice or policy in your specific context.
S
Scope
Does it account for complexity and acknowledge its own limits?
Score 1–5

Evaluate the breadth and depth of the study — and crucially, whether it acknowledges what it doesn't explain. Research that examines multiple relevant variables, accounts for contextual factors, and honestly discusses alternative explanations is more trustworthy than research that treats a complex phenomenon as if one variable explains everything.

1
Narrow and shallow — misses key variables, skips essential context, and fails to recognize or disclose its own limitations.
3
Demonstrates depth or breadth, but not both. Addresses some complexity without fully accounting for the range of factors at play.
5
Comprehensive and reflective. Addresses sufficient breadth and depth. Explicitly acknowledges what the research does and doesn't explain, and where alternative interpretations are plausible.

Score Interpretation

Sum the six criterion scores for a total between 6 and 30. The interpretation bands below describe what each score range implies for decision-making. The thresholds are not arbitrary cutoffs; they reflect the compounding effect of multiple weaknesses. A study with several 2s and 3s may still be informative reading, but it is not an appropriate foundation for policy change.

Score Range | Rating | What It Means for Decisions
6–12 | Severely Limited | Do not use as a basis for decisions. May be useful for understanding a debate's history, but has fundamental evidentiary problems.
13–19 | Insufficient | May inform curiosity but not decisions. Treat as background reading or a prompt for further inquiry, not a source of actionable evidence.
20–25 | Informative | Useful context that can contribute to a decision alongside stronger evidence. Exercise caution: do not rely on this study alone.
26–30 | Trustworthy | Appropriate basis for practice and policy decisions. Findings can be cited with confidence when the context matches your situation.
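The scoring arithmetic is simple enough to sketch in a few lines. The Python below is a minimal, hypothetical helper (the names `pearls_total` and `pearls_rating` are illustrative, not part of any published PEARLS tool). It applies the default-to-1 rule to any criterion left unscored, checks that each score falls in the 1–5 range, sums the six scores, and maps the total onto the interpretation bands. The `L` key is treated as an opaque sixth criterion, since this page does not define it; the worked examples use it to record funding and conflicts of interest.

```python
from __future__ import annotations

# PEARLS criterion keys, in order. The "L" criterion is not defined in this
# section; the worked examples use it for funding and conflicts of interest.
CRITERIA = ("P", "E", "A", "R", "L", "S")

# (lowest total, highest total, rating), copied from the interpretation table.
BANDS = (
    (6, 12, "Severely Limited"),
    (13, 19, "Insufficient"),
    (20, 25, "Informative"),
    (26, 30, "Trustworthy"),
)


def pearls_total(scores: dict[str, int]) -> int:
    """Sum the six criterion scores, treating any unscored criterion as 1."""
    unknown = set(scores) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = 0
    for key in CRITERIA:
        # "Default to 1": a criterion stays at the floor unless evidence warrants more.
        value = scores.get(key, 1)
        if not 1 <= value <= 5:
            raise ValueError(f"{key} must be between 1 and 5, got {value}")
        total += value
    return total


def pearls_rating(total: int) -> str:
    """Return the interpretation band for a total between 6 and 30."""
    for low, high, rating in BANDS:
        if low <= total <= high:
            return rating
    raise ValueError(f"total must be between 6 and 30, got {total}")


if __name__ == "__main__":
    examples = {
        "Hart & Risley (1995)": {"P": 2, "E": 3, "A": 3, "R": 4, "L": 4, "S": 2},
        "Concord grape juice (2016)": {"P": 2, "E": 3, "A": 2, "R": 3, "L": 1, "S": 2},
        "Wakefield et al. (1998)": {},  # nothing above the floor: all six default to 1
    }
    for name, scores in examples.items():
        total = pearls_total(scores)
        print(f"{name}: {total}/30, {pearls_rating(total)}")
```

Run as a script, it prints the totals for the three studies scored later on this page (18, 13, and 6), each paired with the band the interpretation table assigns. Treating an unscored criterion as 1 rather than raising an error mirrors the framework's conservative default; a stricter tool might instead require all six scores to be entered explicitly.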

PEARLS in Practice: Three Examples

The following examples apply the PEARLS framework to three well-known studies, all from outside governance research: one from early childhood language development, one from nutrition and cognitive health, and one from the intersection of health and public policy. They illustrate how the framework handles a range of methodological quality, including a study that has been formally retracted.

Hart & Risley (1995) — "The 30 Million Word Gap"
42 families observed over 2.5 years; word exposure correlated with later vocabulary and language outcomes.
18/30 (Insufficient)
P 2 · E 3 · A 3 · R 4 · L 4 · S 2
  • P=2: 42 families, homogeneous and self-selected. No randomization. Sample is too small and non-diverse to support broad generalization.
  • E=3: Careful longitudinal observation over 2.5 years — methodologically serious — but the tiny sample severely limits what can be inferred.
  • A=3: The correlation between word exposure and outcomes is credibly documented. However, policy extrapolations routinely overstate what the original finding showed.
  • R=4: Directly relevant to early childhood development and education policy; the topic remains active and consequential.
  • L=4: Federally and university funded. No commercial conflicts identified.
  • S=2: Narrowly focused on word counts. Misses socioeconomic confounders, quality of interaction, and alternative explanations that subsequent research has surfaced.
18/30 — Insufficient. Genuinely influential but consistently over-cited relative to its evidentiary weight. The "30 million words" figure has become a policy shorthand that outran the study's actual claims.
Concord Grape Juice & Cognitive Function (2016)
Study reporting memory and cognitive improvements following daily Concord grape juice consumption. Funded by Welch Foods.
13/30 (Insufficient)
P 2 · E 3 · A 2 · R 3 · L 1 · S 2
  • P=2: Small sample of older adults with mild memory complaints — not a general-population sample. Self-selected health-concerned participants may differ meaningfully from general adults.
  • E=3: The randomized, placebo-controlled design is a genuine methodological strength. However, the study's short duration limits what can be concluded about whether any effects persist.
  • A=2: Positive findings reported prominently; effect sizes are modest and the study was not pre-registered, raising the possibility of selective outcome reporting.
  • R=3: Moderately relevant to anyone specifically interested in cognitive health, but the population and intervention are highly specific.
  • L=1: Funded by Welch Foods, the manufacturer of Concord grape juice. This is a direct commercial conflict — the funder has a financial stake in positive results.
  • S=2: Single product, single outcome measure, short time horizon. No exploration of mechanism or alternative explanations for the effect.
13/30 — Insufficient. The randomized design is a real strength, but the funding conflict and narrow scope undermine confidence in the findings. The L=1 alone should prompt caution.
Wakefield et al. (1998) — MMR Vaccine & Autism
Retracted Lancet paper claiming a causal link between the MMR vaccine and autism in 12 children.
6/30 (Severely Limited)
P 1 · E 1 · A 1 · R 1 · L 1 · S 1
  • P=1: n=12, and children were specifically selected to show a pattern — not a representative sample by any definition.
  • E=1: Case series with no controls, no randomization, no blinding. The underlying patient data were later found to have been fraudulently misrepresented.
  • A=1: Causal conclusions vastly exceeded evidentiary basis. Conclusions were not merely overstated — they were later found to be fabricated.
  • R=1: The study has been comprehensively refuted by dozens of large-scale subsequent studies. Citing it as evidence today would actively mislead.
  • L=1: Wakefield received undisclosed payments from litigation attorneys who were seeking evidence against vaccine manufacturers at the time of the study.
  • S=1: Three symptoms, 12 children, no acknowledgment of limitations, no alternative explanations considered.
6/30 — Severely Limited. This is the lowest possible PEARLS score, with every criterion at the floor. The paper was retracted by The Lancet in 2010, and Wakefield lost his medical license. The example also shows that PEARLS scoring does not depend on hindsight: the sample, design, and scope flaws were scoreable at publication.

How We Grade School Board Research

School board governance is a methodologically challenging research domain. Outcomes are diffuse and long-cycle. The number of boards in any study is usually small. Variables like board composition, board behavior, leadership culture, and district context are difficult to isolate from one another. Most research in this space is correlational, not causal.

When applying PEARLS to governance research specifically, we weight the following factors:

A note on the evidence base: Most school board research scores in the 18–24 range on the PEARLS scale. Very few studies clear 26. This is not a criticism of the researchers — it reflects the genuine difficulty of conducting rigorous governance research at scale. We are transparent about this limitation throughout our research summaries. When we cite a study rated Informative, we say so. When the best available evidence scores 20, we note that a 20 is the ceiling, not a floor.