Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy

Schoenegger, Philipp ORCID: 0000-0001-9930-487X, Tuminauskaite, Indre, Park, Peter S., Valdece Sousa Bastos, Rafael and Tetlock, Philip E. (2024) Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy. Science Advances, 10 (45). ISSN 2375-2548

Text (sciadv.adp1528) - Published Version
Available under License Creative Commons Attribution.

Identification Number: 10.1126/sciadv.adp1528

Abstract

Human forecasting accuracy improves through the “wisdom of the crowd” effect, in which aggregated predictions tend to outperform individual ones. Past research suggests that individual large language models (LLMs) tend to underperform compared to human crowd aggregates. We simulate a wisdom of the crowd effect with LLMs. Specifically, we use an ensemble of 12 LLMs to make probabilistic predictions about 31 binary questions, comparing them with those made by 925 human forecasters in a 3-month tournament. We show that the LLM crowd outperforms a no-information benchmark and is statistically indistinguishable from the human crowd. We also observe human-like biases, such as the acquiescence bias. In another study, we find that LLM predictions (of GPT-4 and Claude 2) improve when exposed to the median human prediction, increasing accuracy by 17 to 28%. However, simply averaging human and machine forecasts yields more accurate results. Our findings suggest that LLM predictions can rival the human crowd’s forecasting accuracy through simple aggregation.
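
The aggregation idea described in the abstract can be illustrated with a minimal sketch. It assumes the crowd forecast is the median of the individual probabilities and that accuracy is scored with the Brier score (squared error between a probability and the 0/1 outcome); neither choice is stated in the abstract, so both are assumptions made for illustration. The function names and all numbers are hypothetical and are not data from the study.

```python
from statistics import median


def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a binary outcome (0 or 1)."""
    return (prob - outcome) ** 2


def crowd_forecast(forecasts: list[float]) -> float:
    """Aggregate individual forecasts for one question (assumed rule: the median)."""
    return median(forecasts)


def mean_brier(probs: list[float], outcomes: list[int]) -> float:
    """Average Brier score over a set of questions; lower is better."""
    return sum(brier_score(p, o) for p, o in zip(probs, outcomes)) / len(outcomes)


def average_forecasts(a: list[float], b: list[float]) -> list[float]:
    """Simple per-question average of two forecast sources."""
    return [(x + y) / 2 for x, y in zip(a, b)]


if __name__ == "__main__":
    # Hypothetical toy data: three models, two binary questions.
    llm_forecasts = [[0.7, 0.6, 0.8], [0.3, 0.4, 0.2]]
    human_median = [0.8, 0.1]  # hypothetical human crowd medians
    outcomes = [1, 0]          # hypothetical resolutions

    llm_crowd = [crowd_forecast(q) for q in llm_forecasts]
    no_information = [0.5] * len(outcomes)  # always predict 50%
    hybrid = average_forecasts(llm_crowd, human_median)

    print("LLM crowd:     ", mean_brier(llm_crowd, outcomes))
    print("No-information:", mean_brier(no_information, outcomes))
    print("Human crowd:   ", mean_brier(human_median, outcomes))
    print("Human + LLM:   ", mean_brier(hybrid, outcomes))
```

A forecaster who always answers 50% scores 0.25 on every binary question under this metric, which is why that value serves as a natural no-information reference point in the sketch.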

Item Type:	Article
Additional Information:	© 2024 The Author(s)
Divisions:	Management
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date Deposited:	04 Oct 2024 16:00
Last Modified:	28 Jul 2025 01:39
URI:	http://eprints.lse.ac.uk/id/eprint/125626