Large-Scale Heuristic Evaluation for UX Using Multimodal Models

KIXLAB's Grant Project with Samsung CX Insight Team (MX Division)

Grant Project: KIXLAB, Samsung CX Insight Team (MX Division)
My Role: User Interface, Evaluation, Multimodal LLM
Project Period: 2025

Figure: Overview of the Large-Scale Heuristic Evaluation Framework.

Phase 1. Taxonomy of Functional and Evaluation Dimensions for Generative AI Assessment

Goal: Systematically categorize functions and evaluation factors required to assess generative AI systems.

  • Identified limitations of directly applying traditional static evaluation metrics to generative AI features
  • Structured a taxonomy covering the following dimensions (see the data-structure sketch after this list):
    • Roles of generative AI
      • Input enhancement, output generation, output refinement
      • Synthesis, adaptation, and context/state management
    • Modalities and interaction types
      • Text, image, sound
      • Input (e.g., typing, drawing, recording) and output interactions (e.g., viewing, adjusting)
    • User factors
      • Domain expertise
      • Prior experience
    • Evaluation goals
      • Usability, system capability, user behavior understanding, and changes in users’ mental models
    • Task design and evaluation setup
      • Automated vs. heuristic tasks
      • Open-ended vs. closed-ended user tasks
    • Measurement dimensions
      • Subjective (e.g., usability, creativity, interactivity)
      • Objective (e.g., task completion, time, output quality)
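
To make the taxonomy concrete, one way to encode it is as a typed data structure that positions each generative AI feature along the dimensions above. The following is a minimal Python sketch; the class and field names (GenAIFeature, Modality, and so on) and the example feature are illustrative assumptions, not artifacts from the project.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    SOUND = "sound"


@dataclass
class GenAIFeature:
    """One generative AI feature, positioned along the taxonomy dimensions."""
    name: str
    role: str                       # e.g., "input enhancement", "output generation"
    modalities: list[Modality]
    input_interactions: list[str]   # e.g., ["typing", "drawing", "recording"]
    output_interactions: list[str]  # e.g., ["viewing", "adjusting"]
    user_factors: dict[str, str]    # e.g., {"domain expertise": "novice"}
    evaluation_goal: str            # e.g., "usability", "system capability"
    task_type: str                  # "open-ended" or "closed-ended"
    subjective_measures: list[str] = field(default_factory=list)
    objective_measures: list[str] = field(default_factory=list)


# Hypothetical example: an image-editing feature classified under the taxonomy.
feature = GenAIFeature(
    name="photo object eraser",
    role="output refinement",
    modalities=[Modality.IMAGE],
    input_interactions=["drawing"],
    output_interactions=["viewing", "adjusting"],
    user_factors={"domain expertise": "novice", "prior experience": "low"},
    evaluation_goal="usability",
    task_type="open-ended",
    subjective_measures=["usability", "creativity"],
    objective_measures=["task completion", "time"],
)
```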

Phase 2. Exploring Large Multimodal Model (LMM)-based Usability Evaluation Methods

Research Questions:

  • How can LMMs be used to discover diverse usability issues?
  • To what extent can LMMs assist usability evaluation?
  • Can LMMs identify subtle usability issues?

Methods:

  • Applied three heuristic-based evaluation perspectives:
    • Nielsen’s usability heuristics (Nielsen Norman Group)
    • User journey–based evaluation
    • User needs–based evaluation
  • Used LMMs to extract usability issues from given user scenarios and evaluation prompts (see the sketch below)
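
As a rough illustration of this setup, the sketch below sends one screenshot plus a scenario-specific heuristic prompt to a vision-capable model. It assumes the OpenAI Python SDK; the model name, prompt wording, file path, and the evaluate_screen helper are hypothetical and stand in for the project’s actual pipeline.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HEURISTIC_PROMPT = (
    "You are a UX evaluator. Using Nielsen's usability heuristics, "
    "list the usability issues visible in this screen for the scenario below. "
    "For each issue, name the violated heuristic and describe the evidence.\n\n"
    "Scenario: {scenario}"
)


def evaluate_screen(image_path: str, scenario: str) -> str:
    """Send one screenshot plus a user scenario to a multimodal model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": HEURISTIC_PROMPT.format(scenario=scenario)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


issues = evaluate_screen(
    "screens/settings.png",  # hypothetical screenshot
    "A first-time user tries to enable the AI photo eraser from Settings.",
)
print(issues)
```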

Findings:

  • Identified usability issues that are visually salient within specific user scenarios
  • Discovered usability issues unfamiliar to evaluators or difficult to articulate manually
  • Enabled evaluation from diverse evaluator perspectives by considering different user needs and contexts

Phase 3. Toward AI-based Heuristic Evaluation in Practice

Observed limitations of AI-based evaluation:

  • Limited coverage of diverse and dynamic user scenarios
  • Tendency to generate usability issues outside predefined scenarios or evaluation scopes (a simple scope filter is sketched below)
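
One lightweight mitigation for out-of-scope issues is to post-filter the AI output against the UI elements that define the evaluation scope. Below is a minimal keyword-based sketch; the in_scope helper, the issue format, and the element list are all hypothetical.

```python
def in_scope(issue: dict, scope_elements: set[str]) -> bool:
    """Keep only issues that mention at least one in-scope UI element."""
    text = (issue["description"] + " " + issue.get("evidence", "")).lower()
    return any(element in text for element in scope_elements)


# Scope: the settings flow for enabling the AI photo eraser (example only).
scope = {"settings", "toggle", "photo eraser", "permission dialog"}

raw_issues = [
    {"description": "The photo eraser toggle label is ambiguous."},
    {"description": "The home screen clock widget overlaps app icons."},  # out of scope
]
filtered = [issue for issue in raw_issues if in_scope(issue, scope)]
# filtered keeps only the first issue
```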

Follow-up Research Question:

  • How can human evaluators (UI/UX experts) and AI collaborate to overcome each other’s limitations?

Explored Collaboration Scenarios:

  • AI identifies potential usability issues, which human experts review and refine (sketched after this list)
  • Human experts provide contextual information and evaluation criteria to guide AI assessment
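
One possible shape for the first scenario (AI proposes, expert triages) is a review loop in which each AI-generated issue is accepted, refined, or rejected by a human expert. The sketch below is an assumption about how such a workflow could look, not the project’s implementation; the triage function, Verdict labels, and demo expert are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    REFINE = "refine"
    REJECT = "reject"


@dataclass
class IssueReview:
    issue: str             # AI-proposed usability issue
    verdict: Verdict       # expert decision
    expert_note: str = ""  # refinement or rejection rationale


def triage(ai_issues: list[str],
           expert: Callable[[str], tuple[Verdict, str]]) -> list[IssueReview]:
    """AI proposes issues; a human expert accepts, refines, or rejects each one.
    Accepted and refined issues feed the final report; rejections are kept for audit."""
    return [IssueReview(issue, *expert(issue)) for issue in ai_issues]


# Toy expert policy: reject anything that mentions an out-of-scope screen.
def demo_expert(issue: str) -> tuple[Verdict, str]:
    if "home screen" in issue.lower():
        return Verdict.REJECT, "outside the evaluated flow"
    return Verdict.ACCEPT, ""


reviews = triage(
    ["Toggle label is ambiguous.", "Home screen widget overlaps icons."],
    demo_expert,
)
```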

Goal:

  • To refine AI-based heuristic evaluation roles identified in Phase 2 and explore their applicability as practical guidelines for real-world design workflows