Large-Scale Heuristic Evaluation for UX Using Multimodal Models
KIXLAB’s Grant Project with Samsung CX Insight Team (MX Division)
Overview of the Large-Scale Heuristic Evaluation Framework.
Phase 1. Taxonomy of Functional and Evaluation Dimensions for Generative AI Assessment
Goal: Systematically categorize functions and evaluation factors required to assess generative AI systems.
- Identified limitations of directly applying traditional static evaluation metrics to generative AI features
- Structured a taxonomy covering:
  - Roles of generative AI
    - Input enhancement, output generation, output refinement
    - Synthesis, adaptation, and context/state management
  - Modalities and interaction types
    - Text, image, sound
    - Input (e.g., typing, drawing, recording) and output interactions (e.g., viewing, adjusting)
  - User factors
    - Domain expertise
    - Prior experience
  - Evaluation goals
    - Usability, system capability, user behavior understanding, and changes in users’ mental models
  - Task design and evaluation setup
    - Automated vs. heuristic tasks
    - Open-ended vs. closed-ended user tasks
  - Measurement dimensions
    - Subjective (e.g., usability, creativity, interactivity)
    - Objective (e.g., task completion, time, output quality)
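To make the taxonomy concrete, the sketch below encodes it as a small Python data model. All class and field names (Role, Modality, GenAIFeature, and so on) are hypothetical illustrations, not the project’s actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Role(Enum):
    """Roles of generative AI in a feature."""
    INPUT_ENHANCEMENT = auto()
    OUTPUT_GENERATION = auto()
    OUTPUT_REFINEMENT = auto()
    SYNTHESIS = auto()
    ADAPTATION = auto()
    CONTEXT_STATE_MANAGEMENT = auto()

class Modality(Enum):
    TEXT = auto()
    IMAGE = auto()
    SOUND = auto()

@dataclass
class EvaluationSetup:
    """Task design and measurement dimensions for one assessment."""
    automated: bool          # automated vs. heuristic task
    open_ended: bool         # open-ended vs. closed-ended user task
    subjective_measures: list[str] = field(default_factory=list)  # e.g., usability, creativity
    objective_measures: list[str] = field(default_factory=list)   # e.g., task completion, time

@dataclass
class GenAIFeature:
    """One generative AI feature characterized along the taxonomy."""
    name: str
    roles: list[Role]
    input_modalities: list[Modality]
    output_modalities: list[Modality]
    user_expertise: str      # user factor: domain expertise
    prior_experience: str    # user factor: prior experience with similar tools

# Example: characterizing a hypothetical image-editing assistant
feature = GenAIFeature(
    name="photo_retouch_assistant",
    roles=[Role.OUTPUT_REFINEMENT, Role.ADAPTATION],
    input_modalities=[Modality.IMAGE, Modality.TEXT],
    output_modalities=[Modality.IMAGE],
    user_expertise="novice",
    prior_experience="has used filter-based editors",
)
```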
Phase 2. Exploring LMM-based Usability Evaluation Methods
Research Questions:
- How can LMMs be used to discover diverse usability issues?
- To what extent can LMMs assist usability evaluation?
- Can LMMs identify subtle usability issues?
Methods:
- Applied three heuristic-based evaluation perspectives:
  - Nielsen Norman Group usability heuristics
  - User journey–based evaluation
  - User needs–based evaluation
- Used LMMs to extract usability issues from given user scenarios and evaluation prompts
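
A minimal sketch of this extraction step, assuming an OpenAI-style multimodal chat API; the model name, prompt wording, and the extract_usability_issues helper are illustrative, not the project’s actual pipeline:

```python
import base64
from openai import OpenAI  # assumption: OpenAI-style multimodal chat API

client = OpenAI()

HEURISTIC_PROMPT = (
    "You are a UX evaluator. Using Nielsen's usability heuristics, "
    "list the usability issues visible in this UI screenshot for the "
    "following user scenario. Cite the violated heuristic for each issue.\n"
    "Scenario: {scenario}"
)

def extract_usability_issues(screenshot_path: str, scenario: str) -> str:
    """Ask a large multimodal model to flag heuristic violations in a screenshot."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": HEURISTIC_PROMPT.format(scenario=scenario)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# issues = extract_usability_issues("settings_screen.png",
#                                   "A first-time user tries to enable dark mode")
```

The same helper covers the other two perspectives by swapping HEURISTIC_PROMPT for a journey-based or needs-based prompt.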
Findings:
- Identified usability issues that are visually salient within specific user scenarios
- Discovered usability issues unfamiliar to evaluators or difficult to articulate manually
- Enabled evaluation from diverse perspectives by considering different user needs and contexts
Phase 3. Toward Practical AI-based Heuristic Evaluation
Observed limitations of AI-based evaluation:
- Limited coverage of diverse and dynamic user scenarios
- Tendency to generate usability issues outside predefined scenarios or evaluation scopes
Follow-up Research Question:
- How can human evaluators (UI/UX experts) and AI collaborate to overcome each other’s limitations?
Explored Collaboration Scenarios:
- AI identifies potential usability issues, which human experts review and refine
- Human experts provide contextual information and evaluation criteria to guide AI assessment
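
As a rough illustration of the first scenario, the review step could be prototyped as a simple triage loop; the Issue format and triage function below are a hypothetical sketch, not the project’s tooling.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    description: str
    heuristic: str            # e.g., "Visibility of system status"
    verdict: str = "pending"  # pending | confirmed | rejected
    note: str = ""            # expert refinement, e.g., added context

def triage(ai_issues: list[Issue]) -> list[Issue]:
    """Human expert reviews each AI-proposed issue: confirm, reject, or refine.

    Confirmed and refined issues can feed back as examples and scope
    constraints for the next AI evaluation round (the second scenario).
    """
    for issue in ai_issues:
        print(f"[{issue.heuristic}] {issue.description}")
        choice = input("confirm (c) / reject (r) / refine (e)? ").strip().lower()
        if choice == "c":
            issue.verdict = "confirmed"
        elif choice == "r":
            issue.verdict = "rejected"  # e.g., out of scenario scope
        else:
            issue.verdict = "confirmed"
            issue.note = input("add missing context or criteria: ")
    return [i for i in ai_issues if i.verdict == "confirmed"]
```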
Goal:
- To refine the AI-based heuristic evaluation roles identified in Phase 2 and to explore their applicability as practical guidelines for real-world design workflows