Android Action Kernel

An AI agent that controls Android devices through the Accessibility API. Give it a goal in plain English and it navigates the device autonomously using a Perception → Reasoning → Action loop.

How It Works

Perceive: Captures the screen's accessibility tree via `adb shell uiautomator dump` and parses it into interactive UI elements with coordinates and state.
Reason: Sends the screen context, action history, and goal to an LLM, which decides the next action and returns it as a JSON object.
Act: Executes the action (tap, type, swipe, launch app, etc.) via ADB.
Repeat: Diffs the screen state, detects stuck loops, and continues until the goal is done or MAX_STEPS is reached.
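The four steps above can be sketched as a single loop. This is only an illustration under stated assumptions: `Step`, `Deps`, `runGoal`, and the wording of the recovery hint are hypothetical names that do not come from kernel.ts, and the device and LLM calls are injected so the control flow can be read (and exercised) without a phone attached.

```typescript
// Hypothetical types; the real kernel.ts may structure this differently.
interface Step {
  action: string;
  [key: string]: unknown;
}

interface Deps {
  // Perceive: returns short summaries of the interactive elements on screen.
  perceive(): Promise<string[]>;
  // Reason: asks the LLM for the next step; hint is non-null when stuck.
  decide(goal: string, screen: string[], history: Step[], hint: string | null): Promise<Step>;
  // Act: performs the step on the device.
  execute(step: Step): Promise<void>;
}

async function runGoal(
  goal: string,
  deps: Deps,
  maxSteps = 30,
  stuckThreshold = 3,
): Promise<Step[]> {
  const history: Step[] = [];
  let previous: string | null = null;
  let unchanged = 0;

  for (let i = 0; i < maxSteps; i++) {
    const screen = await deps.perceive();

    // Diff against the previous screen by comparing serialized snapshots.
    const fingerprint = JSON.stringify(screen);
    unchanged = fingerprint === previous ? unchanged + 1 : 0;
    previous = fingerprint;

    // After enough unchanged screens, pass a recovery hint to the LLM.
    const hint =
      unchanged >= stuckThreshold
        ? "The screen has not changed; try a different action."
        : null;

    const step = await deps.decide(goal, screen, history, hint);
    history.push(step);
    if (step.action === "done") break;
    await deps.execute(step);
  }
  return history;
}
```

The defaults mirror MAX_STEPS and STUCK_THRESHOLD from the configuration table; the delay between steps is left out to keep the sketch short.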

When the accessibility tree is empty (as in games, WebViews, and Flutter apps), the agent falls back to screenshot-based vision.
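Whether the fallback triggers depends on what the accessibility dump yields. The sketch below shows one way to pull clickable elements out of the XML and test for the empty case. The `bounds="[x1,y1][x2,y2]"` and `clickable` attributes are part of the uiautomator dump format; the function names, the `UiElement` shape, and the regex-based parsing are assumptions made for illustration, not the logic in sanitizer.ts.

```typescript
// Hypothetical element shape; sanitizer.ts may expose different fields.
interface UiElement {
  text: string;
  center: { x: number; y: number };
}

// Read one attribute value out of a single <node ...> tag.
// Simplification: XML entities such as &amp; are not decoded.
function attr(tag: string, name: string): string {
  const m = tag.match(new RegExp(`\\s${name}="([^"]*)"`));
  return m ? m[1] : "";
}

// Collect clickable nodes and compute the tap point at the center of each.
function parseInteractive(xml: string): UiElement[] {
  const elements: UiElement[] = [];
  for (const [tag] of xml.matchAll(/<node\b[^>]*>/g)) {
    if (attr(tag, "clickable") !== "true") continue;
    const b = attr(tag, "bounds").match(/^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/);
    if (!b) continue;
    const [x1, y1, x2, y2] = b.slice(1).map(Number);
    elements.push({
      text: attr(tag, "text"),
      center: { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) },
    });
  }
  return elements;
}

// Screenshots are only worth taking when nothing usable was parsed
// and the feature is switched on (VISION_ENABLED in the configuration).
function needsVisionFallback(elements: UiElement[], visionEnabled: boolean): boolean {
  return visionEnabled && elements.length === 0;
}
```

A production parser would more likely use a real XML library; the regex keeps the example dependency-free.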

Prerequisites

Bun 1.0+
Android SDK Platform Tools (ADB in PATH)
Android device connected via USB or Wi-Fi ADB
API key for one of: Groq, OpenAI, AWS Bedrock, or OpenRouter

Quick Start

cd android-action-kernel
bun install
cp .env.example .env
# Edit .env — set LLM_PROVIDER and the corresponding API key
bun run src/kernel.ts

The agent will prompt you for a goal, then start controlling the device.

Configuration

Copy .env.example to .env. Key settings:

Variable	Default	Description
`LLM_PROVIDER`	`groq`	`groq`, `openai`, `bedrock`, or `openrouter`
`MAX_STEPS`	`30`	Maximum actions before stopping
`STEP_DELAY`	`2`	Seconds between actions (lets UI settle)
`STUCK_THRESHOLD`	`3`	Unchanged screens before recovery kicks in
`VISION_ENABLED`	`true`	Screenshot fallback when accessibility tree is empty
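As a rough illustration of how these variables and defaults could be read, here is a minimal loader. The variable names and default values come from the table above; `KernelConfig`, `loadConfig`, and the parsing rules (for example, treating anything other than `false` as enabled) are assumptions and may not match config.ts.

```typescript
// Hypothetical config shape; field names are illustrative.
interface KernelConfig {
  llmProvider: string;
  maxSteps: number;
  stepDelaySeconds: number;
  stuckThreshold: number;
  visionEnabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): KernelConfig {
  // Parse a numeric variable, falling back when it is unset or not a number.
  const num = (key: string, fallback: number): number => {
    const raw = env[key];
    if (raw === undefined || raw.trim() === "") return fallback;
    const parsed = Number(raw);
    return Number.isFinite(parsed) ? parsed : fallback;
  };

  return {
    llmProvider: env["LLM_PROVIDER"] ?? "groq",
    maxSteps: num("MAX_STEPS", 30),
    stepDelaySeconds: num("STEP_DELAY", 2),
    stuckThreshold: num("STUCK_THRESHOLD", 3),
    // Assumption: only the literal string "false" disables vision.
    visionEnabled: (env["VISION_ENABLED"] ?? "true").toLowerCase() !== "false",
  };
}
```

In a Bun process this would typically be called as `loadConfig(process.env)`.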

LLM Providers

Provider	Key Variable	Default Model
Groq (free tier)	`GROQ_API_KEY`	`llama-3.3-70b-versatile`
OpenAI	`OPENAI_API_KEY`	`gpt-4o`
AWS Bedrock	AWS credential chain	`us.meta.llama3-3-70b-instruct-v1:0`
OpenRouter	`OPENROUTER_API_KEY`	`anthropic/claude-3.5-sonnet`

Available Actions

The agent can perform 15 actions:

Category	Actions
Navigation	`tap`, `longpress`, `swipe`, `enter`, `back`, `home`
Text	`type`, `clear`
App Control	`launch` (by package, activity, or URI with extras)
Data	`screenshot`, `clipboard_get`, `clipboard_set`
System	`shell`, `wait`, `done`
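Because the LLM returns its decision as JSON, it helps to check the action name before anything is sent to the device. The sketch below uses the 15 action names from the table; the `action` field name and the `parseActionName` helper are hypothetical, since the exact JSON schema is not documented here.

```typescript
// The 15 action names listed in the table above.
const ACTIONS = [
  "tap", "longpress", "swipe", "enter", "back", "home",
  "type", "clear",
  "launch",
  "screenshot", "clipboard_get", "clipboard_set",
  "shell", "wait", "done",
] as const;

type ActionName = (typeof ACTIONS)[number];

// Returns the action name if the reply is a JSON object naming a known
// action, otherwise null. Assumes the name lives in an "action" field.
function parseActionName(raw: string): ActionName | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const name = (parsed as Record<string, unknown>)["action"];
  if (typeof name !== "string") return null;
  return (ACTIONS as readonly string[]).includes(name) ? (name as ActionName) : null;
}
```

Rejecting unknown names early gives the loop a chance to re-prompt the model instead of running a malformed command.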

Project Structure

src/
  kernel.ts          # Main agent loop (entry point)
  actions.ts         # ADB action implementations with retry
  llm-providers.ts   # LLM abstraction (OpenAI, Groq, Bedrock, OpenRouter)
  sanitizer.ts       # Accessibility XML parser
  config.ts          # Environment config loader
  constants.ts       # ADB keycodes, coordinates, defaults
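Since constants.ts holds fixed screen coordinates, values calibrated on one resolution need rescaling on another. The sketch below shows one way to do that. `adb shell wm size` is a real command that prints a line such as `Physical size: 1080x2400`; the `Swipe` tuple layout, the helper names, and the reference height used in the usage line are assumptions, because the actual shape of SWIPE_COORDS is not shown here.

```typescript
// Hypothetical layout: [startX, startY, endX, endY] in pixels.
type Swipe = [number, number, number, number];

interface Size {
  width: number;
  height: number;
}

// Parse the output of `adb shell wm size`.
function parseWmSize(output: string): Size | null {
  const m = output.match(/Physical size:\s*(\d+)x(\d+)/);
  return m ? { width: Number(m[1]), height: Number(m[2]) } : null;
}

// Scale a swipe from the resolution it was calibrated on to the target one.
function scaleSwipe(swipe: Swipe, from: Size, to: Size): Swipe {
  const sx = to.width / from.width;
  const sy = to.height / from.height;
  return [
    Math.round(swipe[0] * sx),
    Math.round(swipe[1] * sy),
    Math.round(swipe[2] * sx),
    Math.round(swipe[3] * sy),
  ];
}

// Usage: an upward swipe calibrated on an assumed 1080x2000 reference screen,
// rescaled for a 720x1600 device.
const scaled = scaleSwipe(
  [540, 1500, 540, 500],
  { width: 1080, height: 2000 },
  { width: 720, height: 1600 },
);
```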

Notes

Swipe coordinates in constants.ts are calibrated for 1080px-wide screens. Adjust SWIPE_COORDS for different resolutions.
The agent automatically detects stuck loops and injects recovery hints after STUCK_THRESHOLD steps without screen changes.
ADB commands retry with exponential backoff (up to MAX_RETRIES attempts).
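The retry behaviour can be sketched as a small generic wrapper. Only MAX_RETRIES is named in this document; the 500 ms base delay, the doubling schedule, and the `withRetry` and `backoffDelay` helpers are assumptions for illustration and may differ from actions.ts.

```typescript
// Delay before the next attempt: base, 2x base, 4x base, ...
function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Run fn up to maxRetries times in total, sleeping between failed attempts.
// The sleep function is injectable so the wrapper can be tested without waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  baseMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No need to wait after the final attempt.
      if (attempt < maxRetries - 1) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}
```

An ADB call would be passed in as `fn`, so a transient failure such as a briefly disconnected device gets a few spaced-out attempts before the error reaches the agent loop.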
