Introduction

agent-device is a CLI for automating iOS simulators + physical devices and Android emulators + devices from agents. It provides:

  • Accessibility snapshots for UI understanding
  • Deterministic interactions (tap, type, scroll)
  • Session-aware workflows and replay

If you know agent-browser, this is the mobile-native counterpart for iOS/Android UI automation. For exploratory QA and bug-hunting workflows, see skills/dogfood/SKILL.md in this repository.

What it’s good at

  • Capturing structured UI state for LLMs
  • Driving common UI actions with refs or semantic selectors
  • Replaying flows for regression checks

Platform support highlights

  • iOS core runner commands: snapshot, diff snapshot, wait, click, fill, get, is, find, press, long-press, focus, type, scroll, scrollintoview, back, home, app-switcher, open (app), close, screenshot, apps, appstate, install, install-from-source, reinstall, trigger-app-event.
  • iOS appstate is session-scoped on the selected target device.
  • iOS simulator-only: alert, pinch, settings, push, clipboard.
  • Session performance metrics: perf/metrics is available on iOS and Android and currently reports startup timing sampled from open command round-trip duration.
  • iOS record supports simulators and physical devices.
    • Simulators use native simctl io ... recordVideo.
    • Physical devices use runner screenshot capture (XCUIScreen.main.screenshot() frames) stitched into MP4, so FPS is best-effort (not guaranteed 60 even with --fps 60).
    • Physical-device recording requires an active app session context (open <app> first).
    • Physical-device recording defaults to 15 FPS and supports --fps caps.
  • Android supports the same core interaction set, plus push notification simulation, clipboard read/write, and keyboard status|get|dismiss via adb shell commands.
  • App-event triggers are available on iOS and Android through app-defined deep-link hooks (trigger-app-event), using active session context or explicit device selectors.

Architecture (high level)

  1. CLI sends requests to the daemon.
  2. The daemon manages sessions and dispatches to platform drivers.
  3. iOS uses XCTest runner for snapshots and input on simulators and physical devices.
  4. Android uses ADB-based tooling.

Example

# Navigate and get snapshot
agent-device open Settings --platform ios
agent-device snapshot -i
# Output
# Page: Contacts
# App: com.apple.MobileAddressBook
# Snapshot: 44 nodes
# @e1 [application] "Contacts"
#  @e2 [window]
#    @e3 [other]
#  @e4 [other] "Lists"
#    @e5 [navigation-bar] "Lists"
#      @e6 [button] "Lists"
#      @e7 [text] "Contacts"
#    @e8 [other] "John Doe"

# Click and fill
agent-device click @e8
agent-device snapshot -i
agent-device diff snapshot
agent-device fill @e5 "Doe 2"
agent-device close

Need React or React Native expertise you can count on?