---
mw_bundle: 1
id: BT59F4sC
title: "Gaze control macOS prototype"
url: https://memory.wiki/b/BT59F4sC
document_count: 2
updated: 2026-06-02T17:32:00.470Z
analysis_generated_at: 2026-06-02T17:16:27.380Z
analysis_stale: true
source: "memory.wiki"
---
# Gaze control macOS prototype

> ⚠ _Analysis may be stale — one or more member docs were edited after the last analysis run. Re-run the canvas to refresh._

## Summary

This collection documents an approach to building a Vision Pro-style webcam controller for macOS. It pairs a working prototype with a design document that sets the strategy. Together they trace a path from a raw implementation toward a more polished product, weighing technical constraints against interaction design principles. The core insight is that the inherent limitations of webcam gaze tracking call for snapping focus to UI elements rather than driving a traditional floating cursor.

## Themes

- Vision Pro interaction paradigm
- Webcam-based multimodal control
- Accessibility-driven design
- Prototype-to-product methodology

## Cross-document insights

- Preserving jitter in v0 is a deliberate product strategy: it lets users experience firsthand why the snap-to-UI approach is necessary, instead of having it explained in theory.
- The project treats webcam gaze accuracy (roughly 1–4 cm, with jitter, per the design document) as a limit rooted in biology (eye saccades) rather than technology, which is why it pivots to UI element targeting instead of pursuing higher precision.
- The multimodal approach divides labor among modalities: eyes for coarse region selection, head pose for fine adjustment, and fingers for activation, matching each task to human motor control capabilities.
- The webcam-only constraint drives innovation by forcing creative solutions within realistic hardware limits instead of requiring specialized eye trackers.
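
The "fingers for activation" role can be illustrated with a small pinch detector. The following is a minimal sketch, not the prototype's actual code: it assumes hand landmarks arrive as 21 normalized `(x, y)` points in MediaPipe's hand landmark order (wrist 0, thumb tip 4, index fingertip 8, middle finger MCP 9). The `PinchDetector` class, the `pinch_ratio` helper, and both thresholds are hypothetical names and values chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Landmark indices in MediaPipe's 21-point hand model.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9


def pinch_ratio(landmarks: Sequence[Point]) -> float:
    """Thumb-index gap divided by hand size, so the value is scale-invariant."""
    hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_size == 0:
        return math.inf
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size


class PinchDetector:
    """Reports a click once per pinch, using two thresholds (hysteresis)."""

    def __init__(self, close_below: float = 0.25, open_above: float = 0.40):
        # Assumed tuning values; real thresholds would need per-camera tuning.
        self.close_below = close_below
        self.open_above = open_above
        self.pinched = False

    def update(self, landmarks: Sequence[Point]) -> bool:
        """Return True only on the frame where a pinch begins."""
        ratio = pinch_ratio(landmarks)
        if not self.pinched and ratio < self.close_below:
            self.pinched = True
            return True
        if self.pinched and ratio > self.open_above:
            self.pinched = False
        return False
```

Using separate close and open thresholds keeps a hand hovering near the boundary from firing repeated clicks, which matters when the landmark positions themselves are noisy.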

## Key takeaways

- Success requires abandoning traditional floating cursor gaze control in favor of UI element snapping due to fundamental webcam accuracy limitations
- The prototype deliberately preserves raw jitter to motivate the v1 snap-to-UI design rather than attempting to smooth it away
- Multimodal input strategy assigns specific roles to each modality based on human motor control capabilities rather than trying to make any single input perfect

## Open questions / gaps

- No discussion of performance benchmarks, latency requirements, or computational overhead for real-time tracking
- Missing error handling strategies for poor lighting conditions, user movement, or hardware limitations
- Lack of user testing methodology or metrics for evaluating interaction effectiveness
- No consideration of privacy implications or user consent for continuous webcam monitoring

## Notable connections

- **doc:xH0i5alB** ↔ **doc:1_r5VqyU** — The design document provides the theoretical framework and strategic vision that the prototype code implements as a proof-of-concept validation.

## Concepts (this bundle)

- **Vision Pro Interaction Paradigm**
- **Webcam Gaze Limitations**
- **Snap-to-UI-Element**
- **Multimodal Input Strategy**
- **Deliberate Jitter Strategy**
- **9-Point Calibration**
- **Prototype-to-Product Pipeline**
- **Webcam-Only Constraint**
- **Tuning Parameters**

## Concept relations

- **Vision Pro Interaction Paradigm** ↔ **Snap-to-UI-Element** — enables through
- **Webcam Gaze Limitations** ↔ **Snap-to-UI-Element** — necessitates solution
- **Deliberate Jitter Strategy** ↔ **Snap-to-UI-Element** — motivates need for

## Documents

### 1. [tuning knobs](https://memory.wiki/1_r5VqyU)
A raw prototype of a Vision Pro-style controller. It tracks iris gaze from a webcam to move the macOS cursor and detects thumb-index pinches to trigger clicks. MediaPipe provides face and hand detection, and a quadratic calibration map converts normalized gaze ratios to screen coordinates.
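
A quadratic calibration map of this kind can be fitted by least squares from the calibration samples. The sketch below is an illustration under assumptions, not the prototype's code: the feature set `[1, gx, gy, gx*gy, gx^2, gy^2]`, the function names, and the synthetic 3x3 grid are all hypothetical. It uses only the standard library so it runs without NumPy.

```python
from typing import List, Sequence, Tuple


def features(gx: float, gy: float) -> List[float]:
    """Quadratic feature vector for one normalized gaze sample (assumed form)."""
    return [1.0, gx, gy, gx * gy, gx * gx, gy * gy]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_axis(gaze: Sequence[Tuple[float, float]],
             targets: Sequence[float]) -> List[float]:
    """Least-squares coefficients for one screen axis via normal equations."""
    rows = [features(gx, gy) for gx, gy in gaze]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def predict(coeffs: Sequence[float], gx: float, gy: float) -> float:
    """Map a gaze ratio pair to a screen coordinate on one axis."""
    return sum(c * f for c, f in zip(coeffs, features(gx, gy)))


if __name__ == "__main__":
    # Synthetic 9-point grid: gaze ratios paired with made-up screen x targets.
    grid = [(gx, gy) for gy in (0.2, 0.5, 0.8) for gx in (0.2, 0.5, 0.8)]
    screen_x = [1440 * (gx - 0.2) / 0.6 for gx, _ in grid]
    coeff_x = fit_axis(grid, screen_x)
    print(round(predict(coeff_x, 0.5, 0.5)))
```

The same `fit_axis` call would be repeated with screen y targets to get the second axis. Nine points over six coefficients leaves a little redundancy, so a single noisy calibration sample shifts the fit less than it would with an exactly determined system.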

### 2. [Gaze Control — webcam-only multimodal macOS controller](https://memory.wiki/xH0i5alB)
A macOS controller project uses only the built-in webcam to enable Vision Pro-style interaction through gaze-based element selection combined with head pose, finger pinches, and gestures. Rather than tracking a cursor, gaze selects UI elements via the Accessibility API while pinches activate them, overcoming webcam gaze imprecision by snapping focus to nearby on-screen elements.
*sections:* Goal: Build a Vision Pro–style controller for a flat screen (regular Mac display) driven by | Hard constraints: Webcam is the ONLY input sensor. No Tobii, no Leap Motion, no depth cam.; Target OS: macOS (Apple Silicon).; The end user (project owner) does not write code directly — explain changes plainly and | Core design principle (do not violate): Do NOT build a floating gaze cursor. Webcam gaze is only accurate to ~1–4 cm and jitters; Gaze selects/highlights the UI element you're looking at (focus), it does not paint pixels.; A pinch activates the focused element.; Use the macOS Accessibility API (AXUIElement) to read on-screen element frames so gaze can | …
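
The snapping idea can be sketched as a nearest-frame lookup. This is a minimal illustration under assumptions: in the described design the element frames would come from the macOS Accessibility API (AXUIElement), whereas here they are hard-coded. The `Element` type, the `snap` function, and the `max_distance` cutoff are hypothetical names and values.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    """An on-screen UI element frame in screen points (stand-in for AX data)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def distance_to_frame(el: Element, gx: float, gy: float) -> float:
    """Distance from the gaze point to the frame; 0 if the point is inside."""
    dx = max(el.x - gx, 0.0, gx - (el.x + el.width))
    dy = max(el.y - gy, 0.0, gy - (el.y + el.height))
    return math.hypot(dx, dy)


def snap(elements: Sequence[Element], gx: float, gy: float,
         max_distance: float = 80.0) -> Optional[Element]:
    """Return the element nearest to the gaze estimate, or None if too far."""
    best = min(elements, key=lambda e: distance_to_frame(e, gx, gy), default=None)
    if best is None or distance_to_frame(best, gx, gy) > max_distance:
        return None
    return best


if __name__ == "__main__":
    buttons = [Element("Save", 100, 100, 80, 30),
               Element("Cancel", 220, 100, 80, 30)]
    # Jittery gaze samples scattered around the Save button.
    for gx, gy in [(131, 97), (162, 128), (149, 141), (178, 109)]:
        target = snap(buttons, gx, gy)
        print(target.name if target else None)
```

Every jittery sample in the demo resolves to the same element, so the focus highlight stays put even though the raw estimate moves by tens of points. A pinch would then activate whichever element currently holds focus.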

