Benchmarking OpenAI's APIs and other Large Language Models for repeatable and efficient question answering across multiple documents

Filipovska, Elena, Mladenovska, Ana, Bajrami, Merxhan, Dobreva, Jovana, Hillman, Velislava, Lameski, Petre and Zdravevski, Eftim (2024) Benchmarking OpenAI's APIs and other Large Language Models for repeatable and efficient question answering across multiple documents. Annals of Computer Science and Intelligence Systems (2024). pp. 107-117. ISSN 2300-5963

Text (3979) - Published Version
Available under License Creative Commons Attribution.

Scopus publication

Identification Number: 10.15439/2024F3979

Abstract

The rapid growth in the volume and complexity of documents across domains calls for automated methods that make information extraction and analysis more efficient and accurate. This paper evaluates the efficiency and repeatability of OpenAI's APIs and other Large Language Models (LLMs) in automating question answering across multiple documents, focusing on the Data Privacy Policy (DPP) documents of selected EdTech providers. We test how well these models perform on large-scale text processing using OpenAI's models (GPT 3.5 Turbo, GPT 4, GPT 4o) and APIs in several setups: direct API calls (i.e., one-shot learning), LangChain, and Retrieval Augmented Generation (RAG) systems. We also evaluate a local deployment, with FAISS, of a quantized LLM (Llama-2-13B-chat-GPTQ). Through systematic evaluation against predefined use cases and a range of metrics, including response format, execution time, and cost, the study aims to identify good practices for document analysis. Our findings show that calling OpenAI's LLMs through the API is a workable alternative for accelerating document analysis when local GPU-powered infrastructure is not a viable option, particularly for long texts. Local deployment, on the other hand, is valuable for keeping the data within private infrastructure. The quantized models retain substantial relevance even with fewer parameters than ChatGPT and do not impose processing restrictions on the number of tokens. Beyond confirming the usefulness of LLMs for document analysis, the study offers insights on using them for better efficiency and data governance.
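The three metrics named in the abstract (response format, execution time, and cost), together with repeatability, can be illustrated with a minimal sketch. This is not the authors' harness. The function names, the JSON answer format, the word-count token proxy, and the price per 1,000 tokens are all hypothetical, and `stub_model` stands in for a real API call or local model so the example runs offline.

```python
import json
import time
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    answer: str
    seconds: float
    valid_format: bool
    cost_usd: float


def is_valid_json_answer(text: str) -> bool:
    """Response-format check (assumed format): a JSON object with an 'answer' key."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "answer" in parsed


def estimate_cost(prompt: str, answer: str, usd_per_1k_tokens: float) -> float:
    """Rough cost estimate; whitespace-separated words stand in for tokens."""
    tokens = len(prompt.split()) + len(answer.split())
    return tokens / 1000 * usd_per_1k_tokens


def benchmark(
    ask: Callable[[str], str],
    prompt: str,
    runs: int = 5,
    usd_per_1k_tokens: float = 0.002,  # hypothetical price, not a real tariff
) -> List[RunResult]:
    """Ask the same question several times and record the metrics for each run."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        answer = ask(prompt)
        elapsed = time.perf_counter() - start
        results.append(
            RunResult(
                answer=answer,
                seconds=elapsed,
                valid_format=is_valid_json_answer(answer),
                cost_usd=estimate_cost(prompt, answer, usd_per_1k_tokens),
            )
        )
    return results


def repeatability(results: List[RunResult]) -> float:
    """Share of runs that returned the most common answer (1.0 = identical every time)."""
    counts = Counter(r.answer for r in results)
    return counts.most_common(1)[0][1] / len(results)


def stub_model(prompt: str) -> str:
    """Deterministic placeholder for a real model call."""
    return json.dumps({"answer": "yes"})


if __name__ == "__main__":
    runs = benchmark(stub_model, "Does the policy state a data retention period?", runs=3)
    print("repeatability:", repeatability(runs))
    print("all valid format:", all(r.valid_format for r in runs))
    print("total cost (USD):", sum(r.cost_usd for r in runs))
```

Replacing `stub_model` with a function that calls a hosted API or a local quantized model would leave the rest of the harness unchanged, which is the point of passing the model in as a plain callable.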

Item Type:	Article
Additional Information:	© 2024 The Author(s)
Divisions:	Media and Communications
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science; H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
Date Deposited:	08 Jan 2025 12:51
Last Modified:	14 Dec 2025 05:58
URI:	http://eprints.lse.ac.uk/id/eprint/126674
