From 2c17ba40e875ba67871d2f68ced1c7d86a65748a Mon Sep 17 00:00:00 2001
From: Sanju Sivalingam
Date: Tue, 17 Feb 2026 17:21:34 +0530
Subject: [PATCH] docs: add Android companion app design

Co-Authored-By: Claude Opus 4.6
---
 docs/plans/2026-02-17-android-app-design.md | 397 ++++++++++++++++++++
 1 file changed, 397 insertions(+)
 create mode 100644 docs/plans/2026-02-17-android-app-design.md

diff --git a/docs/plans/2026-02-17-android-app-design.md b/docs/plans/2026-02-17-android-app-design.md
new file mode 100644
index 0000000..b1d5e59
--- /dev/null
+++ b/docs/plans/2026-02-17-android-app-design.md
@@ -0,0 +1,397 @@
+# Android Companion App Design
+
+> DroidClaw Android app: the eyes and hands of the AI agent. Connects to the Hono server via WebSocket, captures accessibility trees and screenshots, executes gestures on command, and supports device-initiated goals.
+
+**Date:** 2026-02-17
+**Scope:** Full v1 (all 4 phases)
+**Package:** `com.thisux.droidclaw`
+
+---
+
+## Architecture Overview
+
+Three independent layers with clear boundaries:
+
+```
+┌──────────────────────────────────────────────┐
+│                   UI Layer                   │
+│ MainActivity + Compose (Home, Settings, Logs)│
+│ Observes StateFlows from services            │
+├──────────────────────────────────────────────┤
+│               Connection Layer               │
+│ ConnectionService (foreground service)       │
+│ ReliableWebSocket (Ktor) + CommandRouter     │
+├──────────────────────────────────────────────┤
+│             Accessibility Layer              │
+│ DroidClawAccessibilityService (system svc)   │
+│ ScreenTreeBuilder + GestureExecutor          │
+│ ScreenCaptureManager (MediaProjection)       │
+└──────────────────────────────────────────────┘
+```
+
+- **Accessibility Layer**: System-managed service. Reads screen trees, executes gestures, captures screenshots. Runs independently of app UI.
+- **Connection Layer**: Foreground service with Ktor WebSocket. Bridges accessibility to server. Handles reconnection, heartbeat, message queuing.
+- **UI Layer**: Compose with bottom nav. Observes service state via `StateFlow`. Goal input, settings, logs.
+
+---
+
+## Project Structure
+
+```
+android/app/src/main/java/com/thisux/droidclaw/
+├── DroidClawApp.kt                         # Application class (DataStore init)
+├── MainActivity.kt                         # Compose host + bottom nav
+├── accessibility/
+│   ├── DroidClawAccessibilityService.kt    # System service, tree capture
+│   ├── ScreenTreeBuilder.kt                # NodeInfo → UIElement list
+│   └── GestureExecutor.kt                  # Node-first actions + dispatchGesture fallback
+├── connection/
+│   ├── ConnectionService.kt                # Foreground service, Ktor WebSocket
+│   ├── ReliableWebSocket.kt                # Reconnect, heartbeat, message queue
+│   └── CommandRouter.kt                    # Dispatches server commands → GestureExecutor
+├── capture/
+│   └── ScreenCaptureManager.kt             # MediaProjection screenshots
+├── model/
+│   ├── UIElement.kt                        # Mirrors @droidclaw/shared types
+│   ├── Protocol.kt                         # WebSocket message types
+│   └── AppState.kt                         # Connection status, steps, etc.
+├── data/
+│   └── SettingsStore.kt                    # DataStore for API key, server URL
+├── ui/
+│   ├── screens/
+│   │   ├── HomeScreen.kt                   # Status + goal input + live log
+│   │   ├── SettingsScreen.kt               # API key, server URL, battery opt
+│   │   └── LogsScreen.kt                   # Step history
+│   └── theme/                              # Existing Material 3 theme
+└── util/
+    ├── BatteryOptimization.kt              # OEM-specific exemption helpers
+    └── DeviceInfo.kt                       # Model, Android version, screen size
+```
+
+---
+
+## Dependencies
+
+| Library | Version | Purpose |
+|---------|---------|---------|
+| `io.ktor:ktor-client-cio` | 3.1.x | HTTP/WebSocket client (coroutine-native) |
+| `io.ktor:ktor-client-websockets` | 3.1.x | WebSocket plugin for Ktor |
+| `org.jetbrains.kotlinx:kotlinx-serialization-json` | 1.7.x | JSON serialization |
+| `org.jetbrains.kotlinx:kotlinx-coroutines-android` | 1.9.x | Coroutines |
+| `androidx.datastore:datastore-preferences` | 1.1.x | Persistent settings (API key, server URL) |
+| `androidx.lifecycle:lifecycle-service` | 2.8.x | Service lifecycle |
+| `androidx.navigation:navigation-compose` | 2.8.x | Bottom nav routing |
+| `androidx.compose.material:material-icons-extended` | latest | Nav icons |
+
+---
+
+## Permissions
+
+```xml
+
+
+
+
+
+
+```
+
+Plus the accessibility service declaration:
+```xml
+
+
+
+
+
+
+```
+
+---
+
+## Layer 1: Accessibility Service
+
+### DroidClawAccessibilityService
+
+System-managed service. Android starts/stops it based on user toggling it in Settings > Accessibility.
+
+**State exposed via companion StateFlow** (no binding needed):
+```kotlin
+companion object {
+    val isRunning = MutableStateFlow(false)
+    val lastScreenTree = MutableStateFlow<List<UIElement>>(emptyList())
+    var instance: DroidClawAccessibilityService? = null
+}
+```
+
+**Lifecycle:**
+- `onServiceConnected()`: Set `isRunning.value = true`, store `instance`
+- `onAccessibilityEvent()`: Capture events for window changes, content changes
+- `onInterrupt()` / `onDestroy()`: Set `isRunning.value = false`, clear `instance`
+
+### ScreenTreeBuilder
+
+Walks `rootInActiveWindow` depth-first and extracts:
+- Bounds (Rect), center coordinates (x, y)
+- text, contentDescription, className, viewIdResourceName
+- State flags: enabled, checked, focused, scrollable, clickable, longClickable
+- Parent context (parent class, parent description)
+
+**Output:** `List<UIElement>` matching `@droidclaw/shared` UIElement type.
+
+**Null handling:** `rootInActiveWindow` returns null during screen transitions. Retry with exponential backoff (50ms, 100ms, 200ms) up to 3 attempts. If still null, return empty list (server uses vision fallback).
+
+**Memory safety:** `AccessibilityNodeInfo` must be recycled on API levels below 33; from API 33 `recycle()` is deprecated and has no effect. Use an extension:
+```kotlin
+import android.view.accessibility.AccessibilityNodeInfo
+
+// Runs the block on this node and recycles the node afterwards, even if the block throws.
+inline fun <T> AccessibilityNodeInfo.use(block: (AccessibilityNodeInfo) -> T): T {
+    try { return block(this) } finally { recycle() }
+}
+```
+
+**Screen hash:** `computeScreenHash()` hashes element IDs, text, and centers. The server uses it for stuck-loop detection.
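
The screen hash described above could be computed along these lines. This is a minimal sketch, not the project's implementation: the trimmed-down `UIElement` fields, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration.

```kotlin
import java.security.MessageDigest

// Hypothetical, reduced stand-in for the UIElement model in this design.
data class UIElement(val id: String, val text: String, val centerX: Int, val centerY: Int)

// Hashes element IDs, text, and centers in list order and returns a short hex digest.
fun computeScreenHash(elements: List<UIElement>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    for (e in elements) {
        // A unit separator between fields and a newline between elements keep
        // ("ab", "c") from producing the same byte stream as ("a", "bc").
        val record = listOf(e.id, e.text, e.centerX.toString(), e.centerY.toString())
            .joinToString("\u001F") + "\n"
        digest.update(record.toByteArray(Charsets.UTF_8))
    }
    // Assumption: 16 hex characters (64 bits) is enough to compare consecutive screens.
    return digest.digest().joinToString("") { "%02x".format(it) }.take(16)
}

fun main() {
    val before = listOf(UIElement("search", "Search", 540, 800))
    val after = listOf(UIElement("search", "lofi beats", 540, 800))
    // An unchanged hash across steps would suggest the agent is stuck on the same screen.
    println("before=${computeScreenHash(before)} after=${computeScreenHash(after)}")
}
```

Because the digest depends on list order, the sketch assumes the tree walk visits nodes in a stable order between captures.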
+
+### GestureExecutor
+
+Node-first strategy for all actions:
+
+| Action | Primary (node) | Fallback (gesture) |
+|--------|----------------|-------------------|
+| tap | `performAction(ACTION_CLICK)` on node at (x,y) | `dispatchGesture()` tap at coordinates |
+| type | `performAction(ACTION_SET_TEXT)` on focused node | Character-by-character gesture taps |
+| long_press | `performAction(ACTION_LONG_CLICK)` | `dispatchGesture()` hold 1000ms |
+| swipe | n/a | `dispatchGesture()` path from start→end |
+| scroll | `performAction(ACTION_SCROLL_FORWARD/BACKWARD)` on scrollable parent | Swipe gesture |
+| back | `performGlobalAction(GLOBAL_ACTION_BACK)` | n/a |
+| home | `performGlobalAction(GLOBAL_ACTION_HOME)` | n/a |
+| notifications | `performGlobalAction(GLOBAL_ACTION_NOTIFICATIONS)` | n/a |
+| launch | `startActivity(packageManager.getLaunchIntentForPackage(packageName))` | n/a |
+| clear | Focus node → select all → delete | n/a |
+| enter | `performAction(ACTION_IME_ENTER)` or keyevent KEYCODE_ENTER | n/a |
+
+**Result reporting:** Each action returns `ActionResult { success: Boolean, error: String? }`.
+
+---
+
+## Layer 2: Connection Service
+
+### ConnectionService
+
+Foreground service with persistent notification.
+
+**Lifecycle:**
+1. User taps "Connect" → service starts
+2. Reads API key + server URL from DataStore
+3. Creates `ReliableWebSocket` and connects
+4. Notification shows: "DroidClaw - Connected to server" (or "Reconnecting...")
+5. Notification has "Disconnect" action button
+6. Service stops when the user disconnects in the app or taps the notification action
+
+**State exposed:**
+```kotlin
+companion object {
+    val connectionState = MutableStateFlow<ConnectionState>(ConnectionState.Disconnected)
+    val currentSteps = MutableStateFlow>(emptyList())
+    val currentGoalStatus = MutableStateFlow<GoalStatus>(GoalStatus.Idle)
+    var instance: ConnectionService? = null
+}
+```
+
+### ReliableWebSocket
+
+Wraps Ktor `WebSocketSession` with reliability:
+
+- **Connect:** `HttpClient { install(WebSockets) }` → `client.webSocket(serverUrl + "/ws/device")`
+- **Auth handshake:** First message: `{ type: "auth", apiKey: "dc_xxx", deviceInfo: { model, android, screenWidth, screenHeight } }`
+- **Wait for:** `{ type: "auth_ok", deviceId: "uuid" }` or `{ type: "auth_error" }` → close + surface error
+- **Heartbeat:** Ktor WebSocket has built-in ping/pong. Configure `pingIntervalMillis = 30_000`
+- **Reconnect:** On connection loss, exponential backoff: 1s → 2s → 4s → 8s → max 30s. Reset backoff on successful auth.
+- **Message queue:** `Channel(Channel.BUFFERED)` for outbound messages. Drained when connected, buffered when disconnected.
+- **State:** Emits `ConnectionState` (Disconnected, Connecting, Connected, Error(message))
+
+### CommandRouter
+
+Receives JSON from WebSocket, parses, dispatches:
+
+```
+"get_screen"     → ScreenTreeBuilder.capture() → send screen response
+"get_screenshot" → ScreenCaptureManager.capture() → compress, base64, send
+"execute"        → GestureExecutor.execute(action) → send result response
+"ping"           → send { type: "pong" }
+"goal_started"   → update UI state to running
+"step"           → append to currentSteps, update UI
+"goal_completed" → update UI state to completed
+"goal_failed"    → update UI state to failed
+```
+
+All responses include the `requestId` from the command for server-side Promise resolution.
+
+---
+
+## Layer 3: Screen Capture
+
+### ScreenCaptureManager
+
+MediaProjection-based screenshot capture.
+
+**Setup:**
+1. Request `MediaProjection` via `MediaProjectionManager.createScreenCaptureIntent()`
+2. User grants consent (Android system dialog)
+3. Create `VirtualDisplay` → `ImageReader` (RGBA_8888)
+4. Keep projection alive in ConnectionService scope
+
+**Capture flow:**
+1. Server requests screenshot
+2. Acquire latest `Image` from `ImageReader`
+3. Convert to `Bitmap`
+4. Scale to max 720px width (maintain aspect ratio)
+5. Compress to JPEG quality 50
+6. Return `ByteArray`
+
+**Edge cases:**
+- **Android 14+:** Per-session consent. Projection dies if user revokes or after reboot. Re-prompt on next connect.
+- **FLAG_SECURE:** Returns black frame. Detect by checking if all pixels are black (sample corners). Report `error: "secure_window"` to server.
+- **Projection unavailable:** Graceful degradation. The server works from the accessibility tree only, with no screenshot available for the vision fallback.
+
+---
+
+## Layer 4: Data & Settings
+
+### SettingsStore
+
+Preferences DataStore for persistent settings:
+
+| Key | Type | Default |
+|-----|------|---------|
+| `api_key` | String | `""` |
+| `server_url` | String | `"wss://localhost:8080"` |
+| `device_name` | String | Device model name |
+| `auto_connect` | Boolean | `false` |
+
+Exposed as `Flow` for reactive UI updates.
+
+---
+
+## Layer 5: UI
+
+### Navigation
+
+Bottom nav with 3 tabs:
+- **Home** (icon: `Home`): connection status, goal input, live steps
+- **Settings** (icon: `Settings`): API key, server URL, permissions checklist
+- **Logs** (icon: `History`): past session history
+
+### HomeScreen
+
+```
+┌─────────────────────────────┐
+│ ● Connected to server       │ ← status badge (green/yellow/red)
+├─────────────────────────────┤
+│ [Enter a goal...    ] [Run] │ ← goal input + submit
+├─────────────────────────────┤
+│ Step 1: tap (540, 800)      │ ← live step log
+│   "Tapping the search icon" │
+│                             │
+│ Step 2: type "lofi beats"   │
+│   "Typing the search query" │
+│                             │
+│ ✓ Goal completed (5 steps)  │ ← final status
+└─────────────────────────────┘
+```
+
+- Goal input disabled when not connected or when a goal is running
+- Steps stream in real-time via `ConnectionService.currentSteps` StateFlow
+- Status transitions: idle → running → completed/failed
+
+### SettingsScreen
+
+```
+┌─────────────────────────────┐
+│ API Key                     │
+│   [dc_••••••••••••••] [Edit]│
+├─────────────────────────────┤
+│ Server URL                  │
+│   [wss://your-server.app  ] │
+├─────────────────────────────┤
+│ Setup Checklist             │
+│ ✓ API key configured        │
+│ ✗ Accessibility service     │ ← tap to open Android settings
+│ ✗ Screen capture permission │ ← tap to grant
+│ ✓ Battery optimization off  │
+└─────────────────────────────┘
+```
+
+- Warning cards for missing setup items
+- Deep-links to Android system settings for accessibility toggle
+- Battery optimization request via `ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS`
+
+### LogsScreen
+
+- In-memory list of past sessions: goal text, step count, success/failure, timestamp
+- Tap to expand → shows all steps with action + reasoning
+- Clears on app restart (persistent storage is v2)
+
+---
+
+## WebSocket Protocol (Device Side)
+
+### Device → Server
+
+| Message | When |
+|---------|------|
+| `{ type: "auth", apiKey, deviceInfo }` | On connect |
+| `{ type: "screen", requestId, elements, screenHash }` | Response to get_screen |
+| `{ type: "screenshot", requestId, image }` | Response to get_screenshot |
+| `{ type: "result", requestId, success, error? }` | Response to execute |
+| `{ type: "goal", text }` | User submits goal on phone |
+| `{ type: "pong" }` | Response to ping |
+
+### Server → Device
+
+| Message | When |
+|---------|------|
+| `{ type: "auth_ok", deviceId }` | Auth succeeded |
+| `{ type: "auth_error", message }` | Auth failed |
+| `{ type: "get_screen", requestId }` | Agent loop needs screen tree |
+| `{ type: "get_screenshot", requestId }` | Vision fallback |
+| `{ type: "execute", requestId, action }` | Execute tap/type/swipe/etc |
+| `{ type: "ping" }` | Heartbeat check |
+| `{ type: "step", step, action, reasoning }` | Live step update (for phone UI) |
+| `{ type: "goal_started", sessionId }` | Agent loop started |
+| `{ type: "goal_completed", sessionId }` | Agent loop done |
+| `{ type: "goal_failed", sessionId, error }` | Agent loop failed |
+
+---
+
+## Battery Optimization
+
+OEM-specific battery killers are the #2 reliability problem after Google Play policy.
+
+**Strategy:**
+1. Detect if battery optimization is disabled: `PowerManager.isIgnoringBatteryOptimizations()`
+2. If not, show warning card in Settings with button to request exemption
+3. For aggressive OEMs (Xiaomi, Huawei, Samsung, OnePlus, Oppo, Vivo), show additional guidance linking to dontkillmyapp.com
+4. ConnectionService uses `PARTIAL_WAKE_LOCK` to prevent CPU sleep during active goals
+5. Foreground service notification keeps process priority high
+
+---
+
+## Distribution
+
+- **Primary:** APK sideload from droidclaw.ai
+- **Secondary:** F-Droid
+- **NOT Play Store:** Google Play policy (Nov 2025) explicitly prohibits autonomous AI action execution via AccessibilityService
+
+---
+
+## Known Limitations
+
+1. **FLAG_SECURE apps** (banking, password managers) block screenshots; some also hide content from the accessibility tree
+2. **WebView/Flutter** apps may return empty accessibility trees; the server falls back to vision
+3. **Android 14+** requires per-session MediaProjection consent
+4. **Android 16 Advanced Protection** will auto-revoke accessibility for non-accessibility tools
+5. **dispatchGesture()** can be detected/ignored by some apps; the node-first strategy mitigates this
+6. **rootInActiveWindow** returns null during transitions; retry with backoff
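
The retry-with-backoff handling for a null `rootInActiveWindow` might look like the following. It is a sketch under assumptions: `retryOnNull` is a hypothetical helper, the lookup is passed in as a lambda so the example runs without Android, and "up to 3 attempts" is read here as one initial try plus three delayed retries (50 ms, 100 ms, 200 ms).

```kotlin
// Hypothetical helper: calls `block`, and while it returns null, waits for the
// next delay in `delaysMs` and tries again. Returns null once delays run out.
fun <T : Any> retryOnNull(
    delaysMs: List<Long> = listOf(50L, 100L, 200L),
    sleep: (Long) -> Unit = { Thread.sleep(it) }, // injectable so tests need not wait
    block: () -> T?
): T? {
    block()?.let { return it }
    for (delayMs in delaysMs) {
        sleep(delayMs)
        block()?.let { return it }
    }
    return null // caller returns an empty list; the server then uses its vision fallback
}

fun main() {
    var calls = 0
    val waited = mutableListOf<Long>()
    // Simulated root lookup that is null twice (mid-transition) and then succeeds.
    val root = retryOnNull(sleep = { waited += it }) {
        calls++
        if (calls < 3) null else "root"
    }
    println("$root after $calls calls, waits=$waited") // prints "root after 3 calls, waits=[50, 100]"
}
```

In the service itself a suspending `delay` would likely replace `Thread.sleep`, since blocking the thread that delivers accessibility events would stall the very updates the retry is waiting for.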