When Correlation Repeats Across 50 States: The NAEP Evidence Behind My Senate Testimony
How Reading and Math Trends Shift After State-by-State Digital Adoption
Starting in 1984, U.S. states began introducing mandatory seatbelt laws. Almost immediately, the adopting states reported significant declines in traffic deaths and serious injuries - yet many observers dismissed these trends as coincidence.
Then more states adopted seatbelt laws. And each time they did, fatalities fell.
This is called staggered policy adoption, and it is one of the strongest natural experiments available to social scientists for identifying causal effects. When different jurisdictions implement the same policy at different times, and outcomes shift in alignment with those adoption dates rather than with a single calendar year, researchers gain strong evidence that the policy itself is driving the change.
Why does this matter?
Staggered EdTech Adoption
During my recent Senate testimony, I stated that when NAEP performance is aligned with state-level digital adoption, scores plateau and then decline.
I should note that in that testimony, I misspoke slightly - referring to ‘one-to-one’ deployment when the analysis, in fact, examines broader, statewide digital adoption. The distinction matters, and I want to be precise.
Since that testimony, I’ve received repeated requests to publish the underlying data. What follows is that analysis.
In the United States, education policy is largely controlled at the state level. As a result, digital infrastructure was not embedded into classrooms everywhere at once. Some states operationalized statewide digital testing and instructional systems earlier; others followed later.
When each state’s digital inflection year is aligned within a common event-time framework and mapped against NAEP performance, a striking empirical signal emerges.
For readers outside the United States, the National Assessment of Educational Progress (NAEP) is the largest nationally representative assessment of student learning. Often called ‘The Nation’s Report Card’, it tests 4th and 8th grade reading and mathematics every two years. Importantly, NAEP remains anchored to its original 1992 scoring scale, allowing genuine longitudinal comparison.
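To make concrete what aligning each state's digital inflection year within a common event-time framework involves, here is a minimal sketch in Python. The file names, column names, and table structure are illustrative assumptions on my part - this is not the actual dataset or code behind the analysis.

```python
# Minimal sketch of event-time alignment (illustrative only).
# Assumes a long-format table of NAEP results with columns:
#   state, year, grade, subject, score
# and a hypothetical file mapping each state to its digital inflection year.
import pandas as pd

naep = pd.read_csv("naep_state_scores.csv")               # hypothetical: state, year, grade, subject, score
inflection = pd.read_csv("digital_inflection_years.csv")  # hypothetical: state, inflection_year

# Re-center every state's observations on its own adoption year, so that
# event_time = 0 marks the first NAEP cycle at or after digital adoption.
df = naep.merge(inflection, on="state")
df["event_time"] = df["year"] - df["inflection_year"]

# Pool across states at each relative year to trace the common trajectory.
event_means = (
    df.groupby(["grade", "subject", "event_time"])["score"]
      .mean()
      .reset_index()
)
print(event_means.head())
```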
(NOTE: Florida and Texas represent the early and late ends of adoption. Because each lacks meaningful comparator states at those extremes, their outermost data points were excluded from slope estimation to avoid distortion.)
Across state after state, scores in both 4th and 8th grade rose steadily for many years prior to large-scale digital adoption. After adoption, however, the trajectory shifts - often sharply - toward decline.
To put numbers to this, the estimated post-adoption slopes are:
Grade 4 math: -1.45 points per year
Grade 4 reading: -1.07 points per year
Grade 8 math: -1.81 points per year
Grade 8 reading: -1.16 points per year
The magnitude increases from grade 4 to grade 8, consistent with intensifying digital exposure in later schooling.
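For readers who want to see how a slope of this kind can be computed, the sketch below fits an ordinary least-squares line of score against event time, restricted to post-adoption observations. It continues from the hypothetical table in the earlier sketch and simplifies the actual estimation (for example, it ignores the Florida and Texas endpoint exclusions noted above).

```python
# Illustrative post-adoption slope (points per year), continuing the sketch above.
import numpy as np

def post_adoption_slope(df, grade, subject):
    """OLS slope of score on event_time, using post-adoption observations only."""
    sub = df[(df["grade"] == grade) &
             (df["subject"] == subject) &
             (df["event_time"] >= 0)]
    slope, _intercept = np.polyfit(sub["event_time"], sub["score"], deg=1)
    return slope

for grade, subject in [(4, "math"), (4, "reading"), (8, "math"), (8, "reading")]:
    print(f"Grade {grade} {subject}: {post_adoption_slope(df, grade, subject):+.2f} points per year")
```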
But Correlation Isn’t Causation…Right?
‘Correlation is not causation.’
Although strictly speaking true, this is one of the most misused phrases in public debate.
This warning only makes sense when examining a single, isolated dataset.
For example, swimming correlates with ice cream consumption. That sounds meaningful until you realize both are driven by a third variable: heat. In summer months, when temperatures rise, people tend to swim and eat ice cream. The correlation is real - but there is no genuine causal link between them.
Now imagine something different.
Imagine that the same correlation appears repeatedly across different populations, different cultures, different countries, different age groups, and independent datasets - and that it demonstrates a consistent dose-response pattern: as exposure increases, outcomes worsen proportionally.
This is how we move beyond loose association and into the realm of genuine patterns that justify stronger causal inference.
Returning to swimming and ice cream - the correlation disappears outside summer conditions. It does not hold for indoor winter swimming. It does not hold for athletes training year-round. It does not hold in countries where dairy consumption is minimal. It does not hold where swimming is mandatory (such as in military training or among sea nomad communities).
With classroom technology, the opposite occurs.
Across international assessments (PISA, TIMSS, PIRLS), and across national testing (NAEP), the same pattern appears again and again.
Routine digital exposure in instructional contexts is associated with weaker academic outcomes in a dose-response relationship. This pattern appears across states, countries, grade levels, subjects, and years.
And notably, there is no robust, internationally consistent signal suggesting that routine classroom digital exposure systematically improves learning outcomes. To be fair, there are isolated positive findings - but they do not replicate across contexts with the strength, scale, or consistency of the negative association.
Is this causation in the strictest sense? No. But it does allow for far stronger inference than the slogan “correlation is not causation” would suggest.
That phrase was meant to prevent naive conclusions - not to justify dismissing replicated, cross-context patterns spanning decades. To dismiss this data may feel like scientific rigor - but in truth, it’s little more than selective skepticism.
So Now Then…
As in the seatbelt example, the repeated alignment between policy timing and outcome-change across jurisdictions provides strong evidence that digital technology is contributing to the observed declines in academic performance.
Three additional clarifications.
First, this is not a COVID story. Performance declines begin prior to the pandemic and align with widespread digital adoption - not lockdown timing. To remove any ambiguity, the graphs above exclude data from the 2022 NAEP cycle - the year most disrupted by COVID effects.
Second, as noted above, NAEP scores are not periodically re-normed. Unlike many assessments that reset scoring scales over time, NAEP remains locked to its original 1992 scale. As a result, these declines reflect genuine changes in measured performance rather than statistical adjustments.
Third, to strengthen the causal inference drawn from large-scale, repeated correlational patterns, two additional elements are necessary: converging academic research and plausible biological mechanisms that explain the findings more convincingly than competing alternatives.
As explored in The Digital Delusion, both lines of evidence exist - and they converge on the same patterns observed in the large-scale data.
A Note About Digital Inflection Points
Unfortunately, there is no central database listing when digital technology shifted from peripheral to central within each state's education system. As such, I have included my own determinations below.
NOTE: Digital inflection years were defined as the first operational period in which statewide accountability systems or legislative mandates required routine computer-based administration or mandatory online instructional participation, thereby institutionalizing digital infrastructure across districts. Where a clear statewide trigger existed (e.g., mandated online coursework or formal transitions to computer-based statewide assessments), the implementation year of that policy was used. In states without a single discrete triggering event, the inflection year was determined by the operational transition to routine computer-based statewide assessment across the majority of districts, as evidenced by coordinated device deployment, broadband readiness, and testing infrastructure requirements. Sensitivity analyses shifting inflection years by ±1–2 years did not materially alter event-time estimates, indicating that results are robust to reasonable timing variation.
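For completeness, here is a sketch of what the ±1-2 year sensitivity check described above could look like in practice: shift every state's inflection year by a fixed offset and re-estimate a post-adoption slope each time. It reuses the hypothetical tables and slope function from the earlier sketches and is illustrative only, not the analysis code itself.

```python
# Illustrative sensitivity check: shift every state's inflection year by
# -2..+2 years and re-estimate one post-adoption slope each time.
for offset in range(-2, 3):
    shifted = naep.merge(inflection, on="state")
    shifted["event_time"] = shifted["year"] - (shifted["inflection_year"] + offset)
    slope = post_adoption_slope(shifted, grade=8, subject="math")
    print(f"offset {offset:+d} years: grade 8 math slope = {slope:+.2f} points per year")
```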