Real-World AI Evaluation Design and Planning

Authors: Westling, C., Briggs, M., Skeadas, T.

Conference: ICSIS 2026

Dates: 09/06/2026

Publication Date: 12/06/2026

Abstract:

Understanding how AI systems behave in the real world is becoming imperative as companies, organizations, and governments rapidly adopt and deploy this technology. Using a novel framework for real-world AI evaluation, CIRCLE [1], we present a set of activities for testing AI systems in deployment contexts, including field testing and red teaming. We demonstrate how these activities can produce specific outcomes of interest to stakeholders outside the AI stack. The CIRCLE framework is rooted in an understanding of the AI lifecycle that moves beyond traditional model-centric evaluation techniques. Through a hypothetical case study from an education setting, we show how evaluation approaches responsive to the views of stakeholders outside the traditional AI stack yield systems aligned with stakeholder objectives, support the aims of building more trustworthy and safer AI systems, and enable better decisions about their deployment.

Source: Manual