ChatGPT accelerates chemistry discovery for climate response, study shows

August 7, 2023

Photo: Zhiling Zheng and a MOF-powered water harvester. (Photo courtesy of Zhiling Zheng)

UC Berkeley experts taught ChatGPT how to quickly create datasets on difficult-to-aggregate research about certain materials that can be used to fight climate change, according to a new paper published in the Journal of the American Chemical Society.

These datasets on the synthesis of the highly porous materials known as metal-organic frameworks (MOFs) will inform predictive models. The models will accelerate chemists' ability to create or optimize MOFs, including ones that alleviate water scarcity and capture air pollution. Because AI-fueled chatbots do the text mining, all chemists – not just coders – can build these databases.

"In a world where you have sparse data, now you can build large datasets," said Omar Yaghi, the Berkeley chemistry professor who invented MOFs and an author of the study. "There are hundreds of thousands of MOFs that have been reported, but nobody has been able to mine that information. Now we can mine it, tabulate it and build large datasets."

This breakthrough by experts at the College of Computing, Data Science, and Society's Bakar Institute of Digital Materials for the Planet (BIDMaP) will speed the development of efficient, cost-effective MOFs, an urgent need as the planet warms. The approach can also be applied to other areas of chemistry. It is one example of how AI can augment and democratize scientific research.

"We show that ChatGPT can be a very helpful assistant," said Zhiling Zheng, lead author of the study and a chemistry Ph.D. student at Berkeley. "Our ultimate goal is to make [research] much easier."

Other authors of the study, "ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis," include the Department of Chemistry's Oufan Zhang and the Department of Electrical Engineering and Computer Sciences' Christian Borgs and Jennifer Chayes. All are affiliated with BIDMaP, except Zhang.

Certain authors are also affiliated with the Kavli Energy Nanoscience Institute, the Department of Mathematics, the Department of Statistics, the School of Information and the KACST-UC Berkeley Center of Excellence for Nanomaterials for Clean Energy Applications.

'A substantial jump' in AI for science

The team guided ChatGPT to quickly conduct a literature review. They curated 228 relevant papers. Then they enabled ChatGPT to process the relevant sections in those papers and to extract, clean and organize that data.

To help them teach ChatGPT to generate accurate and relevant information, they adapted an approach called "prompt engineering" into "ChemPrompt Engineering." They developed prompts that avoided asking ChatGPT for made-up or misleading content; laid out detailed directions that explained to the chatbot the context and format for the response; and provided the large language model with a template or instructions for extracting data.
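
For readers who want a concrete picture, the three prompt-writing principles described above might look something like the following Python sketch. It is an illustration only, not the team's actual prompts or code: the field names, the prompt wording and the helper functions build_prompt and parse_reply are all assumptions made for this example, and no chatbot is actually called.

```python
# Illustrative sketch only. The field names and prompt wording are
# assumptions made for this example, not the prompts used in the study.
FIELDS = ["compound", "metal_source", "linker", "solvent", "temperature", "time"]


def build_prompt(paragraph: str) -> str:
    """Assemble an extraction prompt that follows the three principles."""
    header = " | ".join(FIELDS)
    return (
        # Principle 1: discourage made-up content by allowing an explicit "N/A".
        "Answer only from the text provided. "
        "If a value is not stated, write N/A; do not guess.\n"
        # Principle 2: give context and detailed directions for the response.
        "The text is an experimental section describing the synthesis of "
        "metal-organic frameworks (MOFs). Report one row per compound.\n"
        # Principle 3: supply a template for the extracted data.
        f"Reply with rows in exactly this format:\n{header}\n\n"
        f"Text:\n{paragraph}"
    )


def parse_reply(reply: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited reply into one dictionary per row."""
    rows = []
    for line in reply.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip a repeated header row and any line with the wrong cell count.
        if len(cells) != len(FIELDS) or cells == FIELDS:
            continue
        rows.append(dict(zip(FIELDS, cells)))
    return rows
```

In this sketch, a chemist would pass a paragraph from a paper to build_prompt, send the resulting text to a chatbot, and hand the reply to parse_reply to get rows ready for a spreadsheet or dataset. The fixed template is what makes the reply easy to tabulate, and the "N/A" instruction gives the model an alternative to inventing a missing value.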

The chatbot's literature review – and the experts' approach – was successful. ChatGPT finished in a fraction of an hour what would have taken a student years to complete, said Borgs, BIDMaP's director. It mined the synthetic conditions of MOFs with 95% accuracy, Yaghi said.

"One big area of how you do 'AI for science' is probing literature more effectively. This is really a substantial jump in doing natural language processing in chemistry," said Chayes, dean of the College of Computing, Data Science, and Society. "And to use it, you can just be a chemist, not a computer scientist."

This development will speed up MOF-related research, including efforts aimed at combating climate change, said Borgs. With natural disasters becoming more severe and frequent, we need that saved time, he said.

Yaghi noted that using AI in this way is still new. Like any new tool, experts will need time to identify its shortcomings and address them. But it's worth investing the effort, he said.

"If we don't use it, then we can't make it better. If we can't make it better, then we will have missed a whole area that society is already using," Yaghi said. "AI has transformed many other sectors of our society – commerce, banking, travel. Why not transform science?"
