The next phase of global AI competition will hinge less on who can scale today’s transformer models the fastest and more on who can reinvent the architecture to deliver comparable capability at a fraction of the power cost, according to a top United States scientist.
In simple terms, a transformer is the core AI architecture that learns patterns in vast amounts of data by weighing relationships between words or symbols, while a large language model (LLM) is a transformer trained at scale to generate and reason with human-like text. Popular LLM examples include ChatGPT and DeepSeek.
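As a rough illustration of what “weighing relationships between words” looks like in practice, the sketch below computes scaled dot-product attention – the core operation inside a transformer layer – over a few toy token vectors. The dimensions, random weights and data are placeholders invented for this example, not any particular model’s parameters.

```python
import numpy as np

# Toy "sentence" of 4 tokens, each an 8-dimensional vector.
# In a real transformer these come from learned embeddings; here they are random.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Learned projections map tokens to queries, keys and values (random stand-ins here).
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v

# Each token's query is compared with every token's key, giving a matrix of
# attention weights (each row sums to 1) that says how strongly the tokens relate.
scores = Q @ K.T / np.sqrt(K.shape[-1])
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# Each token's output is a weighted mix of all tokens' values - the
# "weighing relationships" step the article refers to.
output = weights @ V
print(weights.round(2))  # rows show how much each token attends to the others
```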
“I would like to see alternatives to the transformer model to give us this kind of thinking without the high energy use that we have now,” Jennifer Chayes, dean of the College of Computing, Data Science, and Society at the University of California, Berkeley, told Asia Times in an interview in Hong Kong.
“It’s very basic mathematical questions that will underlie that,” she said. “Nobody knows exactly what to do there. People are trying different approaches.”
She added that some of the world’s best computer scientists are grappling with this question because of its enormous societal stakes, particularly around energy consumption and climate change. She acknowledged, however, that these researchers are reluctant to devote themselves fully to such a difficult problem for fear that years of effort could yield little return and jeopardize their careers.
Chayes praised China’s DeepSeek for using “knowledge distillation” methods to train its AI models, noting that the process consumes far less energy than traditional AI training.
“That was very clever. That’s an innovation,” she said, adding that distillation techniques are now being applied beyond mainstream AI development and into scientific research, including her own work in chemistry and materials science.
“No matter how much money you have, you can’t generate enough chemistry data to make a foundation chemistry model,” she said. “So, how do you distill and post-train your AI models? How do you integrate the training with experiments in cases where there is very sparse data? These are huge foundational questions.”
Knowledge distillation – or, simply, distillation – is a widely used AI training technique. It can be understood as a student who keeps asking questions of a knowledgeable teacher; over enough rounds, the student learns to give the same answers and ends up nearly as capable as the teacher.
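To make the student-teacher analogy concrete, here is a minimal sketch of a standard distillation setup, in which a small “student” network is trained to match the softened output probabilities of a larger “teacher.” It is a generic illustration in PyTorch, not a description of DeepSeek’s actual training pipeline; the model sizes, temperature and data are placeholders.

```python
import torch
import torch.nn.functional as F

# Placeholder models: a frozen "teacher" and a trainable "student".
teacher = torch.nn.Linear(128, 10)   # stands in for a large pretrained model
student = torch.nn.Linear(128, 10)   # stands in for the smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

temperature = 2.0                    # softens the teacher's probabilities
x = torch.randn(32, 128)             # a dummy batch of inputs

with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)

# The student is pushed to reproduce the teacher's answer distribution
# (KL divergence) - the "student keeps asking the teacher" step.
student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

loss.backward()
optimizer.step()
```

Because the student learns from the teacher’s already-digested outputs rather than from raw data at full scale, each training step carries more signal, which is one reason distillation can cut the compute and energy needed to reach a given capability.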
On January 22, 2025, a group of DeepSeek researchers published a paper stating that the training of the DeepSeek‑R1 model relied on distilled data from Alibaba’s Tongyi Qianwen (Qwen) and Meta’s Llama models. The team said the total training cost of DeepSeek‑R1 was about US$5.58 million, roughly 1.1% of the estimated US$500 million spent on training Meta’s Llama 3.1.
Brain drain in the US
Commenting on US chip export controls on China, Chayes said she did not believe the restrictions had had a negative impact on Chinese researchers. Instead, she argued that the pressure may have encouraged scientists in China to pursue more innovative ways to increase computing power and efficiency.
“If you’re under pressure, you are going to make greater breakthroughs. And DeepSeek certainly did that,” she said. “I see that people are getting more computing power at Tsinghua University than they’re being given at US universities.”
At the same time, Chayes noted that many American universities face a different challenge when competing with Chinese counterparts, as they continue to lose talent to large technology companies, a dynamic that can leave academic researchers feeling disadvantaged in access to people and resources.
The debate over export controls has played out against a shifting policy backdrop. Reuters reported on January 30 that China had given DeepSeek approval to buy Nvidia’s H200 AI chips, although Nvidia chief executive Jensen Huang said on the same day that his company had not received confirmation and that Beijing was still finalizing the license.
In April 2025, the US government banned exports of Nvidia’s H20 AI chips to China amid rising political tensions between Washington and Beijing, a restriction that was reversed that July. Washington now also allows the export of the more capable H200 chips to China.
New Shaw Prize category
Before joining UC Berkeley in 2020, Chayes spent more than two decades as a technical fellow at Microsoft, where she led major research programs and helped shape the company’s long-term research agenda. She founded and led the Theory Group at Microsoft Research, and later established Microsoft Research New England and the New York City lab. She has also chaired the selection committees for the Turing Award.
She has recently accepted an invitation from Tony Chan, the former president of King Abdullah University of Science and Technology, to chair the selection committee for the Shaw Prize’s newly established award in the computer science category. The move expands the Shaw Prize beyond its three long-standing fields of mathematical sciences, astronomy, and life science and medicine.
The selection committee brings together some of the most influential figures in the AI sector, including John Hennessy, chairman of the board of Alphabet Inc, which operates Google; Yann LeCun, former chief scientist of Meta AI and widely recognized as a “godfather of AI”; and Harry Shum, the former executive vice president of Microsoft’s AI and Research group.
Chayes stressed that the selection committee has deep knowledge of AI development in China and is well placed to assess research from mainland Chinese scientists alongside work from the rest of the world under the Shaw Prize’s established nomination and review process.
“Harry Shum has spent much of his career in China,” she said. “And personally, in 1997–98, I worked with (Taiwanese computer scientist) Kai-Fu Lee to open Microsoft Research Asia in Beijing.”
She added that in the more than two decades leading up to her move to Berkeley in 2020, she mentored many computer scientists in Beijing and built extensive personal relationships with research leaders across China.
Chayes said she has worked closely with Chinese researchers, both in mainland China and in the United States, over recent decades and holds a positive view of them.
“I feel that, on average, researchers from China work harder than those from the rest of the world. That’s something that I love about them,” she said. “They’re really my type of people.”
She said the Shaw Prize’s first laureate in computer science, to be announced in spring 2027, will be selected purely on scientific merit and could come from anywhere in the world.
“We’ve been discussing who the obvious candidates are. We want to make sure who gets nominated. Some of them are Chinese, some are from Europe and America. Some are Chinese in America,” she said.
She cited a mainland Chinese postdoctoral researcher who had worked in North America and is now based in Hong Kong as an example of the kind of globally mobile scholar who could be considered.
Chayes also described Ya-Qin Zhang, a chair professor at Tsinghua University and former president of Baidu, as an old friend, saying she first knew him when he was head of Microsoft Research Asia in 1998. She added that Chinese-born American computer scientist Fei-Fei Li is also a close friend, and that the two continue to work together on an AI expert panel appointed by California Governor Gavin Newsom in September 2024.
