Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment

Bibliographic Details
Main Authors	Abdel-Rehim, Abbi; Zenil, Hector; Orhobor, Oghenejokpeme; Fisher, Marie; Collins, Ross J; Bourne, Elizabeth; Fearnley, Gareth W; Tate, Emma; Smith, Holly X; Soldatova, Larisa N; King, Ross D
Format	Journal Article
Language	English
Published	20.05.2024
Subjects	Computer Science - Learning; Quantitative Biology - Cell Behavior; Quantitative Biology - Quantitative Methods

Cover

Loading…

More Information
Summary:	Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. Such outputs are called 'hallucinations' and are a serious problem in many applications. However, in science, hallucinations may be useful: they are novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the use of LLMs as a source of scientific hypotheses in the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel pairs of FDA-approved non-cancer drugs that target the MCF7 breast cancer cell line relative to the non-tumorigenic breast cell line MCF10A. In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations (out of 12 tested) with synergy scores above the positive controls: itraconazole + atenolol, disulfiram + simvastatin and dipyridamole + mebendazole. GPT4 was then asked to generate new combinations after considering its initial results. It then discovered three more combinations with positive synergy scores (out of four tested): disulfiram + fulvestrant, mebendazole + quinacrine and disulfiram + quinacrine. A limitation of GPT4 as a generator of hypotheses was that its explanations for them were formulaic and unconvincing. We conclude that LLMs are an exciting novel source of scientific hypotheses.
DOI:	10.48550/arxiv.2405.12258