China's gaokao examination is the largest mass test in the world. Image: X Screengrab

This week, while families gathered outside examination halls across China, some in red qipao for luck, 12.9 million students sat for the gaokao, the world’s largest annual standardized test.

On the other side of the Pacific, American higher education is moving in the opposite direction. Roughly 90% of ranked four-year US colleges no longer require the SAT or ACT.

Two of the world’s largest education systems have reached different answers to the same question: how do you fairly measure a young mind? Increasingly, artificial intelligence is rewriting those answers.

China has doubled down on the single high-stakes exam. The gaokao remains, in the words of one researcher, a pillar of educational equity and social stability, even as Beijing reframes it from pure exam-based selection toward broader evaluation.

It is also being wired to national priorities. This year’s reforms added new majors in fields such as embodied intelligence, rare-earth science and the low-altitude economy, steering test-takers toward strategic workforce gaps.

The United States took the opposite turn, and it is now having second thoughts. Many colleges dropped test requirements during the pandemic; Yale, MIT and Dartmouth have since reinstated them.

This spring, more than 1,000 University of California (UC) faculty urged the system to restore at least a math requirement, citing preparation gaps so severe that instructors must reteach middle-school mathematics.

Average scores have fallen, too: fewer than 40% of SAT takers now meet the College Board’s own readiness benchmarks — the level it defines as a 75% chance of earning at least a C in entry-level college courses. Grade inflation, the faculty wrote, has left transcripts “nearly meaningless.”

Here is the genuinely interesting part, and it has little to do with which system is better: AI is exposing the hidden assumptions beneath each model.

The American admissions essay was long prized as the humane counterweight to cold test scores. It has become the system’s softest target.

A growing share of applicants now use AI to brainstorm, outline or draft their personal statements. Survey data cited by Inside Higher Ed indicates that about half use it to brainstorm and roughly one in five to produce a first draft. A small industry has sprung up to “humanize” chatbot prose.

The UC professors drew the obvious conclusion. In an era of AI-assisted essays and inflated transcripts, a standardized score is the closest thing colleges have to a signal that a machine cannot easily fake. The irony is sharp: the very subjectivity that made holistic admissions feel fairer is now its greatest weakness.

China faces the mirror image. Because everything rides on one sealed, synchronized event, the gaokao is comparatively easy to defend.

During last year’s exam, the country’s leading chatbots — ByteDance’s Doubao, Alibaba’s Qwen, Tencent’s Yuanbao, Moonshot’s Kimi and DeepSeek — temporarily switched off image recognition and question-answering during testing hours, citing fairness, as Bloomberg and The Guardian reported.

A tightly proctored, synchronized exam is structurally more resistant to AI-assisted fraud than any take-home application file.

Standardization, the feature critics call rigid, turns out to be a powerful integrity firewall. But that same rigidity is a limit. A system built to be cheat-proof and uniform is poorly suited to measuring the creativity and judgment an AI-saturated economy will increasingly reward.

Strip away the national rivalry and a counterintuitive lesson emerges. AI is not pushing assessment toward something more human and open-ended, as many predicted. For now, it is pushing the other way — toward measures that are harder to automate and easier to verify.

America is rediscovering the standardized test out of necessity, not nostalgia, because AI eroded the alternatives faster than anyone expected. China’s exam-centric model, long mocked in the West as a pressure cooker, happens to be resilient to the same threat.

Yet resilience is not the same as wisdom. A test that machines cannot beat is not automatically a test that measures what matters. The deeper question is whether any single instrument — an essay, a multiple-choice exam, a GPA — can survive a technology capable of imitating or solving most of them.

Even the UC faculty conceded that scores should serve as a readiness check, not a mechanical ranking tool.

The most useful frame, then, is not China versus America, or exams versus essays. It is simpler than that. Every assessment we have inherited was designed for a world without AI, and that world is gone.

The students who sat the gaokao this week, and the American teenagers wondering whether to take the SAT, are early test cases in the same experiment. The country that learns fastest will not be the one with the toughest exam or the most polished essay.

It will be the one willing to ask, honestly, what we are actually trying to measure — now that a machine can fake almost everything we used to count.

Y. Tony Yang is an Endowed Professor at the George Washington University in Washington, D.C.
