Towards Trustworthy AI for Academic Policy Navigation: A Human Evaluation of a RAG-Powered Chatbot

Authors: Meacham, S. et al.

Journal: AI and Society

Publisher: Springer Nature

eISSN: 1435-5655

ISSN: 0951-5666

Abstract:

Navigating institutional policies poses significant challenges for students and staff due to the hierarchical, legalistic language and fragmented access points across university systems. While Large Language Models (LLMs) like GPT-4o offer natural language fluency, their lack of grounding and risk of hallucination limit their trustworthiness in academic domains. This study presents and evaluates a Retrieval-Augmented Generation (RAG) chatbot, purpose-built for Bournemouth University's Code of Practice for Research Degrees. We combine Pinecone vector search, layout-aware document chunking, hybrid reranking, and GPT-4o-based answer generation to ensure contextual relevance and citation transparency. Evaluation via the RAGAS framework and BERTScore yielded a faithfulness score of 0.9597, outperforming baseline LLMs. Simulated usability feedback from doctoral students highlighted strengths in clarity (90%) and source attribution (93%). By integrating semantic retrieval, human-centered evaluation, and domain-specific preprocessing, this work demonstrates a scalable pathway toward trustworthy AI assistants for institutional policy navigation.

Source: Manual