A new statistical technique allows researchers to safely use the predictions obtained from machine learning to test scientific hypotheses. This image, generated by the DALL-E AI system, shows an artistic interpretation of the technique, called prediction-powered inference. Illustration courtesy of Michael Jordan.
Over the past decade, AI has permeated nearly every corner of science: Machine learning models have been used to predict protein structures, estimate the fraction of the Amazon rainforest that has been lost to deforestation and even classify faraway galaxies that might be home to exoplanets.
But while AI can be used to speed scientific discovery — helping researchers make predictions about phenomena that may be difficult or costly to study in the real world — it can also lead scientists astray. In the same way that chatbots sometimes "hallucinate," or make things up, machine learning models can sometimes present misleading or downright false results.
In a paper published online today (Thursday, Nov. 9) in Science, researchers at the University of California, Berkeley, present a new statistical technique for safely using the predictions obtained from machine learning models to test scientific hypotheses.
The technique, called prediction-powered inference (PPI), uses a small amount of real-world data to correct the output of large, general models — such as AlphaFold, which predicts protein structures — in the context of specific scientific questions.
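To make the idea of correcting model output with a small amount of real-world data concrete, here is a minimal Python sketch of the simplest case: estimating an average. It takes the model's average prediction on a large unlabeled set and shifts it by the average error the model makes on a small set where the true values are known. The function name `ppi_mean_interval`, the synthetic data and the simulated model bias of 0.5 are all illustrative assumptions, and the interval relies on a standard normal approximation. It is a sketch of the general idea, not the authors' implementation.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def ppi_mean_interval(labeled_truth, labeled_preds, unlabeled_preds, alpha=0.05):
    """Return (estimate, lower, upper) for a population mean.

    Hypothetical helper for illustration only.
    labeled_truth   -- real measurements on the small labeled set
    labeled_preds   -- model predictions for those same items
    unlabeled_preds -- model predictions on the large unlabeled set
    """
    n = len(labeled_truth)
    big_n = len(unlabeled_preds)

    # How far off the model is on each item where the truth is known.
    errors = [y - f for y, f in zip(labeled_truth, labeled_preds)]

    # Prediction-only average, shifted by the average measured error.
    estimate = fmean(unlabeled_preds) + fmean(errors)

    # Normal-approximation standard error combining both sources of noise.
    std_err = math.sqrt(variance(unlabeled_preds) / big_n + variance(errors) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, estimate - z * std_err, estimate + z * std_err


# Synthetic demonstration (assumption): a model that overestimates by about 0.5.
rng = random.Random(0)


def simulate(size):
    truth = [rng.gauss(10.0, 2.0) for _ in range(size)]
    preds = [y + 0.5 + rng.gauss(0.0, 0.3) for y in truth]
    return truth, preds


demo_truth, demo_preds = simulate(200)        # small labeled set
_, demo_unlabeled = simulate(20000)           # large set with predictions only

naive = fmean(demo_unlabeled)                 # trusts the model blindly
corrected, lower, upper = ppi_mean_interval(demo_truth, demo_preds, demo_unlabeled)
print(f"naive: {naive:.2f}  corrected: {corrected:.2f}  interval: ({lower:.2f}, {upper:.2f})")
```

In this toy setup the naive average inherits the model's bias, while the corrected estimate is pulled back toward the true mean using only 200 labeled points. The interval widens when the model's errors are large or erratic, which is how the approach stays cautious about a model whose quality is unknown.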
"These models are meant to be general: They can answer many questions, but we don't know which questions they answer well and which questions they answer badly — and if you use them naively, without knowing which case you're in, you can get bad answers," said study author Michael Jordan, the Pehong Chen Distinguished Professor of electrical engineering and computer science and of statistics at UC Berkeley. "With PPI, you're able to use the model, but correct for possible errors, even when you don’t know the nature of those errors at the outset."
For the complete story, visit the source article at UC Berkeley News: How to use AI for discovery — without leading science astray