Large-Scale Heuristic Evaluation for UX Using Multimodal Models

KIXLAB's Grant Project with Samsung CX Insight Team (MX Division)

Grant Project: KIXLAB, Samsung CX Insight Team (MX Division)
My Role: User Interface, Evaluation, Multimodal LLM
Project Period: 2025

Figure: Overview of the Large-Scale Heuristic Evaluation Framework.

Phase 1. Taxonomy of Functional and Evaluation Dimensions for Generative AI Assessment

Goal: Systematically categorize functions and evaluation factors required to assess generative AI systems.

  • Identified limitations of directly applying traditional static evaluation metrics to generative AI features
  • Structured a taxonomy covering the following dimensions (see the data-structure sketch after this list):
    • Roles of generative AI
      • Input enhancement, output generation, output refinement
      • Synthesis, adaptation, and context/state management
    • Modalities and interaction types
      • Text, image, sound
      • Input (e.g., typing, drawing, recording) and output interactions (e.g., viewing, adjusting)
    • User factors
      • Domain expertise
      • Prior experience
    • Evaluation goals
      • Usability, system capability, user behavior understanding, and changes in users’ mental models
    • Task design and evaluation setup
      • Automated vs. heuristic tasks
      • Open-ended vs. closed-ended user tasks
    • Measurement dimensions
      • Subjective (e.g., usability, creativity, interactivity)
      • Objective (e.g., task completion, time, output quality)
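
To make the taxonomy concrete, one way to encode it is as a typed data structure that positions each generative AI feature along the dimensions above. The following is a minimal Python sketch; the class and field names (GenAIFeature, Modality, and so on) and the example feature are illustrative assumptions, not artifacts from the project.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    SOUND = "sound"


@dataclass
class GenAIFeature:
    """One generative AI feature, positioned along the taxonomy dimensions."""
    name: str
    role: str                       # e.g., "input enhancement", "output generation"
    modalities: list[Modality]
    input_interactions: list[str]   # e.g., ["typing", "drawing", "recording"]
    output_interactions: list[str]  # e.g., ["viewing", "adjusting"]
    user_factors: dict[str, str]    # e.g., {"domain expertise": "novice"}
    evaluation_goal: str            # e.g., "usability", "system capability"
    task_type: str                  # "open-ended" or "closed-ended"
    subjective_measures: list[str] = field(default_factory=list)
    objective_measures: list[str] = field(default_factory=list)


# Hypothetical example: an image-editing feature classified under the taxonomy.
feature = GenAIFeature(
    name="photo object eraser",
    role="output refinement",
    modalities=[Modality.IMAGE],
    input_interactions=["drawing"],
    output_interactions=["viewing", "adjusting"],
    user_factors={"domain expertise": "novice", "prior experience": "low"},
    evaluation_goal="usability",
    task_type="open-ended",
    subjective_measures=["usability", "creativity"],
    objective_measures=["task completion", "time"],
)
```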

Phase 2. Exploring Large Multimodal Model (LMM)-based Usability Evaluation Methods

Research Questions:

  • How can LMMs be used to discover diverse usability issues?
  • To what extent can LMMs assist usability evaluation?
  • Can LMMs identify subtle usability issues?

Methods:

  • Applied three heuristic-based evaluation perspectives:
    • Nielsen’s usability heuristics (Nielsen Norman Group)
    • User journey–based evaluation
    • User needs–based evaluation
  • Used LMMs to extract usability issues from given user scenarios and evaluation prompts (see the sketch below)
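
As a rough illustration of this setup, the sketch below sends one screenshot plus a scenario-specific heuristic prompt to a vision-capable model. It assumes the OpenAI Python SDK; the model name, prompt wording, file path, and the evaluate_screen helper are hypothetical and stand in for the project’s actual pipeline.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HEURISTIC_PROMPT = (
    "You are a UX evaluator. Using Nielsen's usability heuristics, "
    "list the usability issues visible in this screen for the scenario below. "
    "For each issue, name the violated heuristic and describe the evidence.\n\n"
    "Scenario: {scenario}"
)


def evaluate_screen(image_path: str, scenario: str) -> str:
    """Send one screenshot plus a user scenario to a multimodal model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": HEURISTIC_PROMPT.format(scenario=scenario)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


issues = evaluate_screen(
    "screens/settings.png",  # hypothetical screenshot
    "A first-time user tries to enable the AI photo eraser from Settings.",
)
print(issues)
```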

Findings:

  • Identified usability issues that are visually salient within specific user scenarios
  • Discovered usability issues unfamiliar to evaluators or difficult to articulate manually
  • Enabled evaluation from diverse evaluator perspectives by considering different user needs and contexts

Phase 3. Toward AI-based Heuristic Evaluation in Practice

Observed limitations of AI-based evaluation:

  • Limited coverage of diverse and dynamic user scenarios
  • Tendency to generate usability issues outside predefined scenarios or evaluation scopes (a simple scope filter is sketched below)
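
One lightweight mitigation for out-of-scope issues is to post-filter the AI output against the UI elements that define the evaluation scope. Below is a minimal keyword-based sketch; the in_scope helper, the issue format, and the element list are all hypothetical.

```python
def in_scope(issue: dict, scope_elements: set[str]) -> bool:
    """Keep only issues that mention at least one in-scope UI element."""
    text = (issue["description"] + " " + issue.get("evidence", "")).lower()
    return any(element in text for element in scope_elements)


# Scope: the settings flow for enabling the AI photo eraser (example only).
scope = {"settings", "toggle", "photo eraser", "permission dialog"}

raw_issues = [
    {"description": "The photo eraser toggle label is ambiguous."},
    {"description": "The home screen clock widget overlaps app icons."},  # out of scope
]
filtered = [issue for issue in raw_issues if in_scope(issue, scope)]
# filtered keeps only the first issue
```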

Follow-up Research Question:

  • How can human evaluators (UI/UX experts) and AI collaborate to overcome each other’s limitations?

Explored Collaboration Scenarios:

  • AI identifies potential usability issues, which human experts review and refine (sketched after this list)
  • Human experts provide contextual information and evaluation criteria to guide AI assessment
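
One possible shape for the first scenario (AI proposes, expert triages) is a review loop in which each AI-generated issue is accepted, refined, or rejected by a human expert. The sketch below is an assumption about how such a workflow could look, not the project’s implementation; the triage function, Verdict labels, and demo expert are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    REFINE = "refine"
    REJECT = "reject"


@dataclass
class IssueReview:
    issue: str             # AI-proposed usability issue
    verdict: Verdict       # expert decision
    expert_note: str = ""  # refinement or rejection rationale


def triage(ai_issues: list[str],
           expert: Callable[[str], tuple[Verdict, str]]) -> list[IssueReview]:
    """AI proposes issues; a human expert accepts, refines, or rejects each one.
    Accepted and refined issues feed the final report; rejections are kept for audit."""
    return [IssueReview(issue, *expert(issue)) for issue in ai_issues]


# Toy expert policy: reject anything that mentions an out-of-scope screen.
def demo_expert(issue: str) -> tuple[Verdict, str]:
    if "home screen" in issue.lower():
        return Verdict.REJECT, "outside the evaluated flow"
    return Verdict.ACCEPT, ""


reviews = triage(
    ["Toggle label is ambiguous.", "Home screen widget overlaps icons."],
    demo_expert,
)
```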

Goal:

  • To refine AI-based heuristic evaluation roles identified in Phase 2 and explore their applicability as practical guidelines for real-world design workflows