“Scant evidence” – Google’s AI chemistry claims were very misleading
AI discovered 2.2 million new “materials”, said DeepMind. Chemistry professors investigated and found hallucinations, repetition, and known crystals.
In November 2023 Google DeepMind broke the news to the world that its new AI tool “GNoME” had “helped discover 2.2 million new crystals”, that is, the structures of pure elements, minerals, metals, and intermetallic compounds. The news seemed revolutionary: “Our work represents an order-of-magnitude expansion in stable materials known to humanity,” the company claimed in a paper published in the journal Nature.
The new materials “have the potential to develop future transformative technologies ranging from superconductors, powering supercomputers, and next-generation batteries” DeepMind said in a separate blog post.
It described the find as “equivalent to nearly 800 years’ worth of knowledge” and added boldly that it was “contributing 380,000 materials that we predict to be stable to the Materials Project, which is now processing the compounds and adding them into its online database.”
The truth starts to crystallise…
Just over four months later, that claim has been ruthlessly dismissed – with fresh analysis by two leading chemistry professors finding “scant evidence for compounds that fulfill the trifecta of novelty, credibility, and utility” and hinting gently that many of the “materials” (a term they reject) may have been effectively hallucinated – even as they recognise potential in AI.
“There is clearly a great need to incorporate domain expertise in materials synthesis and crystallography,” said Professor Emeritus Anthony Cheetham and Distinguished Professor Ram Seshadri (both of the University of California, Santa Barbara) in an April 8, 2024 paper in Chemistry of Materials.
“A recurrent issue that is found throughout the GNoME database is that many of the entries are based upon the ordering of metal ions that are unlikely to be ordered in the real world,” Cheetham and Seshadri warned.
See also: Anthropic’s CISO drinks the AI kool aid - backpedals frantically on security analysis claim
Frustratingly, the whole database was a mess, they said.
"We believe that experimentalists, who presumably are an important target audience, would find it helpful if the results were presented in a more organized manner, rather than as a seemingly random walk through the periodic table. For example, it would be useful to list all the oxides together, as well as the fluorides, chlorides, bromides, etc. This could no doubt be done by the user with the help of the search function in the database, but it would be helpful if it was done in the parent listing, given the enormous number of entries...
"The compositions are [also] often not presented in a manner that an experimental materials chemist would find appropriate or helpful, nor are the usual rules of chemical nomenclature followed."
"Already known..."
Having reviewed a sample of the “material” structures generated by GNoME, they said “a large fraction of the 384,870 compositions adopt structures that are already known and can be found in the ICSD database.”
(That's the Inorganic Crystal Structure Database, the world's largest database of fully determined inorganic crystal structures. As of April 2024 it held 299,045 entries.)
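At the level of composition alone, the duplicate check involved can be sketched in a few lines. This is a hedged illustration only – the formula lists are invented rather than real GNoME or ICSD data, it assumes the open-source pymatgen library for formula normalisation, and the professors' actual analysis compared full crystal structures, a much harder task:

```python
# Toy composition-level novelty screen. Requires pymatgen (pip install pymatgen);
# all formula lists below are invented for illustration.
from pymatgen.core import Composition

known_icsd = {"NaCl", "TiO2", "MgAl2O4"}              # stand-in for an ICSD snapshot
proposed = ["Ti2O4", "K2NbF7", "Na4Cl4", "CsPbBr3"]   # stand-in for GNoME output

# Normalise both sides to reduced formulas so that Ti2O4 and TiO2 match.
known_reduced = {Composition(f).reduced_formula for f in known_icsd}

for formula in proposed:
    reduced = Composition(formula).reduced_formula
    status = "already known" if reduced in known_reduced else "candidate"
    print(f"{formula:>8} -> {reduced:>8}: {status}")
```

Matching reduced formulas is only a first filter: two compounds with the same composition can adopt entirely different structures, which is why the structural comparison the professors performed matters.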
DeepMind really should not have called its work on “crystalline inorganic compounds” “materials”, the two added. They took particular aim at the claim of “an order-of-magnitude expansion in stable materials known to humanity”, saying: “We would respectfully suggest that the work by Merchant et al. does not report any new materials but reports a list of proposed compounds. In our view, a compound can be called a material when it exhibits some functionality and, therefore, has potential utility.”
The two professors added: “Since no functionality has been demonstrated for the 384,870 compositions in the Stable Structure database, they cannot yet be regarded as materials”.
So was the GNoME approach a waste of time? Not entirely: above all, the work would have benefited from significantly more domain expertise, the two professors noted. (The Stack revisited the original DeepMind paper and found that none of its authors are “pure” chemists; two have doctorates in physics, the others are computer scientists.)
Cheetham and Seshadri said: “Much could be achieved... by embedding a knowledge of solid-state chemistry into their methodology… While we are confident that [AI and ML] have a bright future in the field of materials discovery, more work needs to be done before that promise is fulfilled.”
LLMs + research is hard...
Their research perhaps recalls US Senator Chris Murphy's claim in 2023 that “ChatGPT taught itself to do advanced chemistry. It wasn't built into the model. Nobody programmed it to learn complicated chemistry. It decided to teach itself, then made its knowledge available to anyone who asked. Something is coming. We aren't ready.”
As Brian O'Neill, an associate professor of computer science at Western New England University gently explained in response, ChatGPT “didn’t learn the rules of chemistry, or physics, or math, or anything. It learned how phrases and sentences and documents about chemistry are constructed... it didn’t learn chemistry – it learned how chemistry papers are written.”
That makes generative AI outputs often superficially convincing. But as this week's Chemistry of Materials paper shows, it also complicates research: there are not enough expert chemistry professors in the world to sense-check every LLM's "ground-breaking" output. And when, as Professor Emily Bender puts it in a joint paper with Professor Chirag Shah of the University of Washington, LLMs “do not have any understanding of what they are producing, any communicative intent, any model of the world, or any ability to be accountable for the truth of what they are saying”, robust analysis is more important than ever.
Vast computational power emphatically has its place in the sciences. But robust scrutiny by domain experts, perhaps earlier in the research cycle, will continue to be critical if real innovation is going to happen.
As Terrence Sejnowski, Professor at the Salk Institute for Biological Studies put it in a February 17, 2023 paper in Neural Computation: "What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a reverse Turing test."