
Beyond the Hype. Deploying and Evaluating a Conversational Agent Using LLMs in an Academic Setting

Berrizbeitia, Francisco ORCID: https://orcid.org/0000-0002-1542-8435 and Chalifour, Joshua ORCID: https://orcid.org/0000-0001-7663-0509 (2024) Beyond the Hype. Deploying and Evaluating a Conversational Agent Using LLMs in an Academic Setting. In: Access 2024, 21 Oct - 23 Oct, Montreal. (Unpublished)

beyond-the-hype.pdf - Presentation slideshow (application/pdf), 855kB
Available under License Spectrum Terms of Access.

Abstract

This presentation will cover our ongoing work investigating and deploying generative AI technology in the context of libraries and memory institutions. Libraries have long provided online chat services, whether staffed by people or machines, but taking advantage of generative AI requires new technical approaches and new considerations around the ethics and usefulness of conversational agents. We will discuss our development of a chatbot configured to deliver academic library information services, including a protocol for assessing and guiding implementation decisions as well as for evaluating the tool’s utility.

Our initial step in developing the chatbot involved building a knowledge base (stored in an in-house metadata management system) that could be connected to generative AI technology. Next, we experimented with a variety of open-source and proprietary language models to understand how each performs. We are testing the following approaches: a closed-source large language model (Bing Chat / Gemini / ChatGPT) prompted to act as reference personnel; a context-aware closed-source LLM (OpenAI GPT); and a context-aware open-source LLM (Llama). We are testing with questions that a useful chatbot should be able to answer, and the chatbot’s responses under each approach are evaluated comparatively.
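The contrast between the prompted-only and context-aware approaches comes down to how the prompt is built. The Python sketch below is illustrative only: the function names and the toy keyword-overlap retrieval are hypothetical stand-ins for whatever lookup the in-house knowledge base provides, not the project's actual code.

    # Illustrative contrast between the approaches described above. The
    # retrieval here is a toy stand-in for the in-house knowledge base.

    def retrieve_context(question, knowledge_base, k=3):
        """Toy retrieval: rank passages by word overlap with the question."""
        words = set(question.lower().split())
        ranked = sorted(
            knowledge_base,
            key=lambda passage: len(words & set(passage.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def reference_prompt(question):
        """Approach 1: the model is simply told to act as reference personnel."""
        return ("You are a reference librarian at an academic library. "
                "Answer the patron's question.\n\nQuestion: " + question)

    def context_aware_prompt(question, knowledge_base):
        """Approaches 2 and 3: the same question, grounded in retrieved passages."""
        context = "\n".join(retrieve_context(question, knowledge_base))
        return ("You are a reference librarian. Answer using only the context "
                "below; if it is insufficient, say so.\n\nContext:\n" + context +
                "\n\nQuestion: " + question)

Either prompt can then be sent to any of the model backends under test, which keeps the comparison between approaches independent of any one vendor's API.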

A key objective of this project is developing the testing protocol and evaluation framework. Reference questions often require a dynamic conversation that iterates on the direction of inquiry, which makes it challenging to evaluate outputs as merely accurate or inaccurate. Our study builds on Lai (2023) to develop a testing protocol incorporating multiple dimensions of user interaction. The protocol will support the interrogation of ethical concerns around these technologies and their application. We are operationalizing aspects of the LC Labs AI Planning Framework (Library of Congress, 2023) to define use cases for generative AI in information services, along with ethical criteria.
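One way to picture a multi-dimensional evaluation record, as opposed to a single accurate/inaccurate verdict, is sketched below in Python. The dimension names and the unweighted mean are illustrative assumptions, not the rubric from Lai (2023) or the LC Labs framework.

    # Sketch of scoring one chatbot response on several dimensions rather
    # than a single accurate/inaccurate verdict. Dimension names and the
    # unweighted mean are assumptions, not the study's actual protocol.

    from dataclasses import dataclass, field
    from statistics import mean

    @dataclass
    class ResponseEvaluation:
        approach: str    # e.g. "prompted closed-source", "context-aware GPT", "Llama"
        question: str
        scores: dict = field(default_factory=dict)  # dimension -> 1..5 rating

        def overall(self):
            """Unweighted mean; a real protocol might weight dimensions."""
            return mean(self.scores.values())

    ev = ResponseEvaluation(
        approach="context-aware GPT",
        question="How do I book a study room?",
        scores={"accuracy": 4, "completeness": 3, "tone": 5, "ethical_risk": 4},
    )
    print(f"{ev.approach}: overall {ev.overall():.2f}")

Keeping per-dimension scores rather than a single verdict lets the same set of test questions surface where each approach fails, for instance a fluent but ungrounded answer scoring high on tone and low on accuracy.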

Divisions: Concordia University > Library
Item Type: Conference or Workshop Item (Lecture)
Refereed: No
Authors: Berrizbeitia, Francisco and Chalifour, Joshua
Date: 22 October 2024
ID Code: 994745
Deposited By: Francisco Berrizbeitia
Deposited On: 04 Nov 2024 19:52
Last Modified: 04 Nov 2024 19:52
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
