From 0c0efe9b1e10c6849e93e2ab3ec63e6c5091f47f Mon Sep 17 00:00:00 2001 From: Sanju Sivalingam Date: Tue, 17 Feb 2026 20:17:06 +0530 Subject: [PATCH] chore: remove docs/plans from repo and gitignore it Contains product architecture, roadmaps, and implementation plans that should not be public. Co-Authored-By: Claude Opus 4.6 --- .gitignore | 1 + docs/plans/2026-02-17-android-app-design.md | 397 --- docs/plans/2026-02-17-android-app-plan.md | 2564 ----------------- .../2026-02-17-option1-implementation-plan.md | 2394 --------------- .../2026-02-17-option1-web-backend-design.md | 357 --- 5 files changed, 1 insertion(+), 5712 deletions(-) delete mode 100644 docs/plans/2026-02-17-android-app-design.md delete mode 100644 docs/plans/2026-02-17-android-app-plan.md delete mode 100644 docs/plans/2026-02-17-option1-implementation-plan.md delete mode 100644 docs/plans/2026-02-17-option1-web-backend-design.md diff --git a/.gitignore b/.gitignore index e73ba2f..04b39fa 100644 --- a/.gitignore +++ b/.gitignore @@ -7,6 +7,7 @@ competitor/ logs/ kernel_screenshot.png window_dump.xml +docs/plans/ docs/architecture-web-flow.md docs/INTENT.md OPTION1-IMPLEMENTATION.md diff --git a/docs/plans/2026-02-17-android-app-design.md b/docs/plans/2026-02-17-android-app-design.md deleted file mode 100644 index b1d5e59..0000000 --- a/docs/plans/2026-02-17-android-app-design.md +++ /dev/null @@ -1,397 +0,0 @@ -# Android Companion App Design - -> DroidClaw Android app: the eyes and hands of the AI agent. Connects to the Hono server via WebSocket, captures accessibility trees and screenshots, executes gestures on command, and supports device-initiated goals. 
- -**Date:** 2026-02-17 -**Scope:** Full v1 (all 4 phases) -**Package:** `com.thisux.droidclaw` - ---- - -## Architecture Overview - -Three independent layers with clear boundaries: - -``` -┌──────────────────────────────────────────────┐ -│ UI Layer │ -│ MainActivity + Compose (Home, Settings, Logs)│ -│ Observes StateFlows from services │ -├──────────────────────────────────────────────┤ -│ Connection Layer │ -│ ConnectionService (foreground service) │ -│ ReliableWebSocket (Ktor) + CommandRouter │ -├──────────────────────────────────────────────┤ -│ Accessibility Layer │ -│ DroidClawAccessibilityService (system svc) │ -│ ScreenTreeBuilder + GestureExecutor │ -│ ScreenCaptureManager (MediaProjection) │ -└──────────────────────────────────────────────┘ -``` - -- **Accessibility Layer**: System-managed service. Reads screen trees, executes gestures, captures screenshots. Runs independently of app UI. -- **Connection Layer**: Foreground service with Ktor WebSocket. Bridges accessibility to server. Handles reconnection, heartbeat, message queuing. -- **UI Layer**: Compose with bottom nav. Observes service state via `StateFlow`. Goal input, settings, logs. 
- ---- - -## Project Structure - -``` -android/app/src/main/java/com/thisux/droidclaw/ -├── DroidClawApp.kt # Application class (DataStore init) -├── MainActivity.kt # Compose host + bottom nav -├── accessibility/ -│ ├── DroidClawAccessibilityService.kt # System service, tree capture -│ ├── ScreenTreeBuilder.kt # NodeInfo → UIElement list -│ └── GestureExecutor.kt # Node-first actions + dispatchGesture fallback -├── connection/ -│ ├── ConnectionService.kt # Foreground service, Ktor WebSocket -│ ├── ReliableWebSocket.kt # Reconnect, heartbeat, message queue -│ └── CommandRouter.kt # Dispatches server commands → GestureExecutor -├── capture/ -│ └── ScreenCaptureManager.kt # MediaProjection screenshots -├── model/ -│ ├── UIElement.kt # Mirrors @droidclaw/shared types -│ ├── Protocol.kt # WebSocket message types -│ └── AppState.kt # Connection status, steps, etc. -├── data/ -│ └── SettingsStore.kt # DataStore for API key, server URL -├── ui/ -│ ├── screens/ -│ │ ├── HomeScreen.kt # Status + goal input + live log -│ │ ├── SettingsScreen.kt # API key, server URL, battery opt -│ │ └── LogsScreen.kt # Step history -│ └── theme/ # Existing Material 3 theme -└── util/ - ├── BatteryOptimization.kt # OEM-specific exemption helpers - └── DeviceInfo.kt # Model, Android version, screen size -``` - ---- - -## Dependencies - -| Library | Version | Purpose | -|---------|---------|---------| -| `io.ktor:ktor-client-cio` | 3.1.x | HTTP/WebSocket client (coroutine-native) | -| `io.ktor:ktor-client-websockets` | 3.1.x | WebSocket plugin for Ktor | -| `org.jetbrains.kotlinx:kotlinx-serialization-json` | 1.7.x | JSON serialization | -| `org.jetbrains.kotlinx:kotlinx-coroutines-android` | 1.9.x | Coroutines | -| `androidx.datastore:datastore-preferences` | 1.1.x | Persistent settings (API key, server URL) | -| `androidx.lifecycle:lifecycle-service` | 2.8.x | Service lifecycle | -| `androidx.navigation:navigation-compose` | 2.8.x | Bottom nav routing | -| 
`androidx.compose.material:material-icons-extended` | latest | Nav icons | - ---- - -## Permissions - -```xml - - - - - - -``` - -Plus the accessibility service declaration: -```xml - - - - - - -``` - ---- - -## Layer 1: Accessibility Service - -### DroidClawAccessibilityService - -System-managed service. Android starts/stops it based on user toggling it in Settings > Accessibility. - -**State exposed via companion StateFlow** (no binding needed): -```kotlin -companion object { - val isRunning = MutableStateFlow(false) - val lastScreenTree = MutableStateFlow<List<UIElement>>(emptyList()) - var instance: DroidClawAccessibilityService? = null -} -``` - -**Lifecycle:** -- `onServiceConnected()`: Set `isRunning = true`, store `instance` -- `onAccessibilityEvent()`: Capture events for window changes, content changes -- `onInterrupt()` / `onDestroy()`: Set `isRunning = false`, clear `instance` - -### ScreenTreeBuilder - -Walks `rootInActiveWindow` depth-first, extracts: -- Bounds (Rect), center coordinates (x, y) -- text, contentDescription, className, viewIdResourceName -- State flags: enabled, checked, focused, scrollable, clickable, longClickable -- Parent context (parent class, parent description) - -**Output:** `List<UIElement>` matching `@droidclaw/shared` UIElement type. - -**Null handling:** `rootInActiveWindow` returns null during screen transitions. Retry with exponential backoff (50ms, 100ms, 200ms) up to 3 attempts. If still null, return empty list (server uses vision fallback). - -**Memory safety:** `AccessibilityNodeInfo` must be recycled. Use extension: -```kotlin -inline fun <T> AccessibilityNodeInfo.use(block: (AccessibilityNodeInfo) -> T): T { - try { return block(this) } finally { recycle() } -} -``` - -**Screen hash:** `computeScreenHash()`: hash of element IDs + text + centers. Used by server for stuck-loop detection.
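The null-handling retry and the screen hash described above can be sketched in plain Kotlin, without any Android classes. This is only an illustration under stated assumptions: `Element`, `retryOnNull`, and `screenHash` are hypothetical names, `Thread.sleep` stands in for whatever delay mechanism the service uses, and SHA-256 truncated to 12 hex characters is an arbitrary choice, since the design only says "hash of element IDs + text + centers".

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for UIElement: only the fields the hash needs.
data class Element(val id: String, val text: String, val center: Pair<Int, Int>)

// Calls `attempt` once per delay entry and returns the first non-null result.
// Sleeps for the matching delay after each failed attempt; returns null if
// every attempt fails (the caller can then fall back to an empty list).
fun <T : Any> retryOnNull(delaysMs: List<Long>, attempt: () -> T?): T? {
    for (delayMs in delaysMs) {
        val result = attempt()
        if (result != null) return result
        Thread.sleep(delayMs)
    }
    return null
}

// Digest over id, text, and center of every element, in list order.
// The digest algorithm and the 12-character truncation are assumptions.
fun screenHash(elements: List<Element>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    for (el in elements) {
        val line = "${el.id}|${el.text}|${el.center.first},${el.center.second}\n"
        digest.update(line.toByteArray())
    }
    return digest.digest().joinToString("") { "%02x".format(it) }.take(12)
}

fun main() {
    var calls = 0
    // Simulates a root that is null on the first two reads, then available.
    val tree = retryOnNull(listOf(50L, 100L, 200L)) {
        calls++
        if (calls < 3) null else listOf(Element("btn_ok", "OK", 540 to 800))
    }
    println(screenHash(tree ?: emptyList()))
}
```

Two captures of an unchanged screen produce the same hash, which is what lets the server notice that an action had no visible effect.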
- -### GestureExecutor - -Node-first strategy for all actions: - -| Action | Primary (node) | Fallback (gesture) | -|--------|----------------|-------------------| -| tap | `performAction(ACTION_CLICK)` on node at (x,y) | `dispatchGesture()` tap at coordinates | -| type | `performAction(ACTION_SET_TEXT)` on focused node | Character-by-character gesture taps | -| long_press | `performAction(ACTION_LONG_CLICK)` | `dispatchGesture()` hold 1000ms | -| swipe | — | `dispatchGesture()` path from start→end | -| scroll | `performAction(ACTION_SCROLL_FORWARD/BACKWARD)` on scrollable parent | Swipe gesture | -| back | `performGlobalAction(GLOBAL_ACTION_BACK)` | — | -| home | `performGlobalAction(GLOBAL_ACTION_HOME)` | — | -| notifications | `performGlobalAction(GLOBAL_ACTION_NOTIFICATIONS)` | — | -| launch | `startActivity(packageManager.getLaunchIntentForPackage())` | — | -| clear | Focus node → select all → delete | — | -| enter | `performAction(ACTION_IME_ENTER)` or keyevent KEYCODE_ENTER | — | - -**Result reporting:** Each action returns `ActionResult { success: Boolean, error: String? }`. - ---- - -## Layer 2: Connection Service - -### ConnectionService - -Foreground service with persistent notification. - -**Lifecycle:** -1. User taps "Connect" → service starts -2. Reads API key + server URL from DataStore -3. Creates `ReliableWebSocket` and connects -4. Notification shows: "DroidClaw - Connected to server" (or "Reconnecting...") -5. Notification has "Disconnect" action button -6. Service stops when user disconnects or notification action tapped - -**State exposed:** -```kotlin -companion object { - val connectionState = MutableStateFlow(ConnectionState.Disconnected) - val currentSteps = MutableStateFlow>(emptyList()) - val currentGoalStatus = MutableStateFlow(GoalStatus.Idle) - var instance: ConnectionService? 
= null -} -``` - -### ReliableWebSocket - -Wraps Ktor `WebSocketSession` with reliability: - -- **Connect:** `HttpClient { install(WebSockets) }` → `client.webSocket(serverUrl + "/ws/device")` -- **Auth handshake:** First message: `{ type: "auth", apiKey: "dc_xxx", deviceInfo: { model, android, screenWidth, screenHeight } }` -- **Wait for:** `{ type: "auth_ok", deviceId: "uuid" }` or `{ type: "auth_error" }` → close + surface error -- **Heartbeat:** Ktor WebSocket has built-in ping/pong. Configure `pingIntervalMillis = 30_000` -- **Reconnect:** On connection loss, exponential backoff: 1s → 2s → 4s → 8s → max 30s. Reset backoff on successful auth. -- **Message queue:** `Channel(Channel.BUFFERED)` for outbound messages. Drained when connected, buffered when disconnected. -- **State:** Emits `ConnectionState` (Disconnected, Connecting, Connected, Error(message)) - -### CommandRouter - -Receives JSON from WebSocket, parses, dispatches: - -``` -"get_screen" → ScreenTreeBuilder.capture() → send screen response -"get_screenshot"→ ScreenCaptureManager.capture() → compress, base64, send -"execute" → GestureExecutor.execute(action) → send result response -"ping" → send { type: "pong" } -"goal_started" → update UI state to running -"step" → append to currentSteps, update UI -"goal_completed"→ update UI state to completed -"goal_failed" → update UI state to failed -``` - -All responses include the `requestId` from the command for server-side Promise resolution. - ---- - -## Layer 3: Screen Capture - -### ScreenCaptureManager - -MediaProjection-based screenshot capture. - -**Setup:** -1. Request `MediaProjection` via `MediaProjectionManager.createScreenCaptureIntent()` -2. User grants consent (Android system dialog) -3. Create `VirtualDisplay` → `ImageReader` (RGBA_8888) -4. Keep projection alive in ConnectionService scope - -**Capture flow:** -1. Server requests screenshot -2. Acquire latest `Image` from `ImageReader` -3. Convert to `Bitmap` -4. 
Scale to max 720px width (maintain aspect ratio) -5. Compress to JPEG quality 50 -6. Return `ByteArray` - -**Edge cases:** -- **Android 14+:** Per-session consent. Projection dies if user revokes or after reboot. Re-prompt on next connect. -- **FLAG_SECURE:** Returns black frame. Detect by checking if all pixels are black (sample corners). Report `error: "secure_window"` to server. -- **Projection unavailable:** Graceful degradation. Server works with accessibility tree only (vision fallback without actual screenshot). - ---- - -## Layer 4: Data & Settings - -### SettingsStore - -Preferences DataStore for persistent settings: - -| Key | Type | Default | -|-----|------|---------| -| `api_key` | String | `""` | -| `server_url` | String | `"wss://localhost:8080"` | -| `device_name` | String | Device model name | -| `auto_connect` | Boolean | `false` | - -Exposed as `Flow` for reactive UI updates. - ---- - -## Layer 5: UI - -### Navigation - -Bottom nav with 3 tabs: -- **Home** (icon: `Home`) — connection status, goal input, live steps -- **Settings** (icon: `Settings`) — API key, server URL, permissions checklist -- **Logs** (icon: `History`) — past session history - -### HomeScreen - -``` -┌─────────────────────────────┐ -│ ● Connected to server │ ← status badge (green/yellow/red) -├─────────────────────────────┤ -│ [Enter a goal... 
] [Run] │ ← goal input + submit -├─────────────────────────────┤ -│ Step 1: tap (540, 800) │ ← live step log -│ "Tapping the search icon" │ -│ │ -│ Step 2: type "lofi beats" │ -│ "Typing the search query" │ -│ │ -│ ✓ Goal completed (5 steps) │ ← final status -└─────────────────────────────┘ -``` - -- Goal input disabled when not connected or when a goal is running -- Steps stream in real-time via `ConnectionService.currentSteps` StateFlow -- Status transitions: idle → running → completed/failed - -### SettingsScreen - -``` -┌─────────────────────────────┐ -│ API Key │ -│ [dc_••••••••••••••] [Edit]│ -├─────────────────────────────┤ -│ Server URL │ -│ [wss://your-server.app ] │ -├─────────────────────────────┤ -│ Setup Checklist │ -│ ✓ API key configured │ -│ ✗ Accessibility service │ ← tap to open Android settings -│ ✗ Screen capture permission │ ← tap to grant -│ ✓ Battery optimization off │ -└─────────────────────────────┘ -``` - -- Warning cards for missing setup items -- Deep-links to Android system settings for accessibility toggle -- Battery optimization request via `ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS` - -### LogsScreen - -- In-memory list of past sessions: goal text, step count, success/failure, timestamp -- Tap to expand → shows all steps with action + reasoning -- Clears on app restart (persistent storage is v2) - ---- - -## WebSocket Protocol (Device Side) - -### Device → Server - -| Message | When | -|---------|------| -| `{ type: "auth", apiKey, deviceInfo }` | On connect | -| `{ type: "screen", requestId, elements, screenHash }` | Response to get_screen | -| `{ type: "screenshot", requestId, image }` | Response to get_screenshot | -| `{ type: "result", requestId, success, error? 
}` | Response to execute | -| `{ type: "goal", text }` | User submits goal on phone | -| `{ type: "pong" }` | Response to ping | - -### Server → Device - -| Message | When | -|---------|------| -| `{ type: "auth_ok", deviceId }` | Auth succeeded | -| `{ type: "auth_error", message }` | Auth failed | -| `{ type: "get_screen", requestId }` | Agent loop needs screen tree | -| `{ type: "get_screenshot", requestId }` | Vision fallback | -| `{ type: "execute", requestId, action }` | Execute tap/type/swipe/etc | -| `{ type: "ping" }` | Heartbeat check | -| `{ type: "step", step, action, reasoning }` | Live step update (for phone UI) | -| `{ type: "goal_started", sessionId }` | Agent loop started | -| `{ type: "goal_completed", sessionId }` | Agent loop done | -| `{ type: "goal_failed", sessionId, error }` | Agent loop failed | - ---- - -## Battery Optimization - -OEM-specific battery killers are the #2 reliability problem after Google Play policy. - -**Strategy:** -1. Detect if battery optimization is disabled: `PowerManager.isIgnoringBatteryOptimizations()` -2. If not, show warning card in Settings with button to request exemption -3. For aggressive OEMs (Xiaomi, Huawei, Samsung, OnePlus, Oppo, Vivo), show additional guidance linking to dontkillmyapp.com -4. ConnectionService uses `PARTIAL_WAKE_LOCK` to prevent CPU sleep during active goals -5. Foreground service notification keeps process priority high - ---- - -## Distribution - -- **Primary:** APK sideload from droidclaw.ai -- **Secondary:** F-Droid -- **NOT Play Store:** Google Play policy (Nov 2025) explicitly prohibits autonomous AI action execution via AccessibilityService - ---- - -## Known Limitations - -1. **FLAG_SECURE apps** (banking, password managers) block both tree and screenshots -2. **WebView/Flutter** apps may return empty accessibility trees — server falls back to vision -3. **Android 14+** requires per-session MediaProjection consent -4. 
**Android 16 Advanced Protection** will auto-revoke accessibility for non-accessibility tools -5. **dispatchGesture()** can be detected/ignored by some apps — node-first strategy mitigates -6. **rootInActiveWindow** returns null during transitions — retry with backoff diff --git a/docs/plans/2026-02-17-android-app-plan.md b/docs/plans/2026-02-17-android-app-plan.md deleted file mode 100644 index 31040a3..0000000 --- a/docs/plans/2026-02-17-android-app-plan.md +++ /dev/null @@ -1,2564 +0,0 @@ -# DroidClaw Android App Implementation Plan - -> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. - -**Goal:** Build the DroidClaw Android companion app — a Jetpack Compose app that connects to the Hono server via WebSocket, captures accessibility trees and screenshots, executes gestures, and lets users submit goals from the phone. - -**Architecture:** Three-layer architecture (Accessibility → Connection → UI). The AccessibilityService captures screen trees and executes gestures. A foreground ConnectionService manages the Ktor WebSocket. Compose UI with bottom nav (Home/Settings/Logs) observes service state via companion-object StateFlows. - -**Tech Stack:** Kotlin, Jetpack Compose, Ktor Client WebSocket, kotlinx.serialization, DataStore Preferences, MediaProjection API, AccessibilityService API. 
- ---- - -## Existing Project State - -The Android project is a fresh Compose scaffold: -- `android/app/build.gradle.kts` — Compose app with AGP 9.0.1, Kotlin 2.0.21, compileSdk 36, minSdk 24 -- `android/gradle/libs.versions.toml` — version catalog with basic Compose + lifecycle deps -- `android/app/src/main/java/com/thisux/droidclaw/MainActivity.kt` — Hello World Compose activity -- `android/app/src/main/java/com/thisux/droidclaw/ui/theme/` — Default Material 3 theme (Color, Type, Theme) -- Shared TypeScript types in `packages/shared/src/types.ts` and `packages/shared/src/protocol.ts` define the data models and WebSocket protocol the Android app must mirror. - ---- - -### Task 1: Add Dependencies & Build Config - -**Files:** -- Modify: `android/gradle/libs.versions.toml` -- Modify: `android/build.gradle.kts` (root) -- Modify: `android/app/build.gradle.kts` - -**Step 1: Add version catalog entries** - -Add to `android/gradle/libs.versions.toml`: - -```toml -[versions] -agp = "9.0.1" -coreKtx = "1.10.1" -junit = "4.13.2" -junitVersion = "1.1.5" -espressoCore = "3.5.1" -lifecycleRuntimeKtx = "2.6.1" -activityCompose = "1.8.0" -kotlin = "2.0.21" -composeBom = "2024.09.00" -ktor = "3.1.1" -kotlinxSerialization = "1.7.3" -kotlinxCoroutines = "1.9.0" -datastore = "1.1.1" -lifecycleService = "2.8.7" -navigationCompose = "2.8.5" -composeIconsExtended = "1.7.6" - -[libraries] -# ... keep existing entries ... 
-ktor-client-cio = { group = "io.ktor", name = "ktor-client-cio", version.ref = "ktor" } -ktor-client-websockets = { group = "io.ktor", name = "ktor-client-websockets", version.ref = "ktor" } -ktor-client-content-negotiation = { group = "io.ktor", name = "ktor-client-content-negotiation", version.ref = "ktor" } -ktor-serialization-kotlinx-json = { group = "io.ktor", name = "ktor-serialization-kotlinx-json", version.ref = "ktor" } -kotlinx-serialization-json = { group = "org.jetbrains.kotlinx", name = "kotlinx-serialization-json", version.ref = "kotlinxSerialization" } -kotlinx-coroutines-android = { group = "org.jetbrains.kotlinx", name = "kotlinx-coroutines-android", version.ref = "kotlinxCoroutines" } -datastore-preferences = { group = "androidx.datastore", name = "datastore-preferences", version.ref = "datastore" } -lifecycle-service = { group = "androidx.lifecycle", name = "lifecycle-service", version.ref = "lifecycleService" } -navigation-compose = { group = "androidx.navigation", name = "navigation-compose", version.ref = "navigationCompose" } -compose-icons-extended = { group = "androidx.compose.material", name = "material-icons-extended", version.ref = "composeIconsExtended" } - -[plugins] -android-application = { id = "com.android.application", version.ref = "agp" } -kotlin-compose = { id = "org.jetbrains.kotlin.plugin.compose", version.ref = "kotlin" } -kotlin-serialization = { id = "org.jetbrains.kotlin.plugin.serialization", version.ref = "kotlin" } -``` - -**Step 2: Add serialization plugin to root build.gradle.kts** - -In `android/build.gradle.kts`, add: -```kotlin -plugins { - alias(libs.plugins.android.application) apply false - alias(libs.plugins.kotlin.compose) apply false - alias(libs.plugins.kotlin.serialization) apply false -} -``` - -**Step 3: Add plugin and dependencies to app build.gradle.kts** - -In `android/app/build.gradle.kts`: -```kotlin -plugins { - alias(libs.plugins.android.application) - alias(libs.plugins.kotlin.compose) - 
alias(libs.plugins.kotlin.serialization) -} - -// ... android block stays the same, but fix compileSdk ... -android { - namespace = "com.thisux.droidclaw" - compileSdk = 36 - - defaultConfig { - applicationId = "com.thisux.droidclaw" - minSdk = 24 - targetSdk = 36 - versionCode = 1 - versionName = "1.0" - testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner" - } - // ... rest stays the same ... - kotlinOptions { - jvmTarget = "11" - } -} - -dependencies { - // Existing - implementation(libs.androidx.core.ktx) - implementation(libs.androidx.lifecycle.runtime.ktx) - implementation(libs.androidx.activity.compose) - implementation(platform(libs.androidx.compose.bom)) - implementation(libs.androidx.compose.ui) - implementation(libs.androidx.compose.ui.graphics) - implementation(libs.androidx.compose.ui.tooling.preview) - implementation(libs.androidx.compose.material3) - - // New - Ktor WebSocket - implementation(libs.ktor.client.cio) - implementation(libs.ktor.client.websockets) - implementation(libs.ktor.client.content.negotiation) - implementation(libs.ktor.serialization.kotlinx.json) - - // New - Serialization - implementation(libs.kotlinx.serialization.json) - - // New - Coroutines - implementation(libs.kotlinx.coroutines.android) - - // New - DataStore - implementation(libs.datastore.preferences) - - // New - Lifecycle service - implementation(libs.lifecycle.service) - - // New - Navigation - implementation(libs.navigation.compose) - implementation(libs.compose.icons.extended) - - // Test deps stay the same - testImplementation(libs.junit) - androidTestImplementation(libs.androidx.junit) - androidTestImplementation(libs.androidx.espresso.core) - androidTestImplementation(platform(libs.androidx.compose.bom)) - androidTestImplementation(libs.androidx.compose.ui.test.junit4) - debugImplementation(libs.androidx.compose.ui.tooling) - debugImplementation(libs.androidx.compose.ui.test.manifest) -} -``` - -**Step 4: Sync and verify build compiles** - -Run: 
`cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 5: Commit** - -```bash -git add android/gradle/libs.versions.toml android/build.gradle.kts android/app/build.gradle.kts -git commit -m "feat(android): add Ktor, serialization, DataStore, navigation dependencies" -``` - ---- - -### Task 2: Data Models - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/model/UIElement.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/model/Protocol.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/model/AppState.kt` - -These mirror `packages/shared/src/types.ts` and `packages/shared/src/protocol.ts`. - -**Step 1: Create UIElement.kt** - -```kotlin -package com.thisux.droidclaw.model - -import kotlinx.serialization.Serializable - -@Serializable -data class UIElement( - val id: String = "", - val text: String = "", - val type: String = "", - val bounds: String = "", - val center: List<Int> = listOf(0, 0), - val size: List<Int> = listOf(0, 0), - val clickable: Boolean = false, - val editable: Boolean = false, - val enabled: Boolean = false, - val checked: Boolean = false, - val focused: Boolean = false, - val selected: Boolean = false, - val scrollable: Boolean = false, - val longClickable: Boolean = false, - val password: Boolean = false, - val hint: String = "", - val action: String = "read", - val parent: String = "", - val depth: Int = 0 -) -``` - -**Step 2: Create Protocol.kt** - -```kotlin -package com.thisux.droidclaw.model - -import kotlinx.serialization.Serializable -import kotlinx.serialization.json.JsonObject - -// Device → Server messages -@Serializable -data class AuthMessage( - val type: String = "auth", - val apiKey: String, - val deviceInfo: DeviceInfoMsg?
= null -) - -@Serializable -data class DeviceInfoMsg( - val model: String, - val androidVersion: String, - val screenWidth: Int, - val screenHeight: Int -) - -@Serializable -data class ScreenResponse( - val type: String = "screen", - val requestId: String, - val elements: List<UIElement>, - val screenshot: String? = null, - val packageName: String? = null -) - -@Serializable -data class ResultResponse( - val type: String = "result", - val requestId: String, - val success: Boolean, - val error: String? = null, - val data: String? = null -) - -@Serializable -data class GoalMessage( - val type: String = "goal", - val text: String -) - -@Serializable -data class PongMessage( - val type: String = "pong" -) - -// Server → Device messages (parsed via discriminator) -@Serializable -data class ServerMessage( - val type: String, - val requestId: String? = null, - val deviceId: String? = null, - val message: String? = null, - val sessionId: String? = null, - val goal: String? = null, - val success: Boolean? = null, - val stepsUsed: Int? = null, - val step: Int? = null, - val action: JsonObject? = null, - val reasoning: String? = null, - val screenHash: String? = null, - // Action-specific fields - val x: Int? = null, - val y: Int? = null, - val x1: Int? = null, - val y1: Int? = null, - val x2: Int? = null, - val y2: Int? = null, - val duration: Int? = null, - val text: String? = null, - val packageName: String? = null, - val url: String? = null, - val code: Int?
= null -) -``` - -**Step 3: Create AppState.kt** - -```kotlin -package com.thisux.droidclaw.model - -enum class ConnectionState { - Disconnected, - Connecting, - Connected, - Error -} - -enum class GoalStatus { - Idle, - Running, - Completed, - Failed -} - -data class AgentStep( - val step: Int, - val action: String, - val reasoning: String, - val timestamp: Long = System.currentTimeMillis() -) - -data class GoalSession( - val sessionId: String, - val goal: String, - val steps: List<AgentStep>, - val status: GoalStatus, - val timestamp: Long = System.currentTimeMillis() -) -``` - -**Step 4: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 5: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/model/ -git commit -m "feat(android): add data models (UIElement, Protocol, AppState)" -``` - ---- - -### Task 3: DataStore Settings - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/data/SettingsStore.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/DroidClawApp.kt` -- Modify: `android/app/src/main/AndroidManifest.xml` (add Application class) - -**Step 1: Create SettingsStore.kt** - -```kotlin -package com.thisux.droidclaw.data - -import android.content.Context -import androidx.datastore.core.DataStore -import androidx.datastore.preferences.core.Preferences -import androidx.datastore.preferences.core.booleanPreferencesKey -import androidx.datastore.preferences.core.edit -import androidx.datastore.preferences.core.stringPreferencesKey -import androidx.datastore.preferences.preferencesDataStore -import kotlinx.coroutines.flow.Flow -import kotlinx.coroutines.flow.map - -val Context.dataStore: DataStore<Preferences> by preferencesDataStore(name = "settings") - -object SettingsKeys { - val API_KEY = stringPreferencesKey("api_key") - val SERVER_URL = stringPreferencesKey("server_url") - val DEVICE_NAME = stringPreferencesKey("device_name") - val AUTO_CONNECT =
booleanPreferencesKey("auto_connect") -} - -class SettingsStore(private val context: Context) { - - val apiKey: Flow<String> = context.dataStore.data.map { prefs -> - prefs[SettingsKeys.API_KEY] ?: "" - } - - val serverUrl: Flow<String> = context.dataStore.data.map { prefs -> - prefs[SettingsKeys.SERVER_URL] ?: "wss://localhost:8080" - } - - val deviceName: Flow<String> = context.dataStore.data.map { prefs -> - prefs[SettingsKeys.DEVICE_NAME] ?: android.os.Build.MODEL - } - - val autoConnect: Flow<Boolean> = context.dataStore.data.map { prefs -> - prefs[SettingsKeys.AUTO_CONNECT] ?: false - } - - suspend fun setApiKey(value: String) { - context.dataStore.edit { it[SettingsKeys.API_KEY] = value } - } - - suspend fun setServerUrl(value: String) { - context.dataStore.edit { it[SettingsKeys.SERVER_URL] = value } - } - - suspend fun setDeviceName(value: String) { - context.dataStore.edit { it[SettingsKeys.DEVICE_NAME] = value } - } - - suspend fun setAutoConnect(value: Boolean) { - context.dataStore.edit { it[SettingsKeys.AUTO_CONNECT] = value } - } -} -``` - -**Step 2: Create DroidClawApp.kt** - -```kotlin -package com.thisux.droidclaw - -import android.app.Application -import com.thisux.droidclaw.data.SettingsStore - -class DroidClawApp : Application() { - lateinit var settingsStore: SettingsStore - private set - - override fun onCreate() { - super.onCreate() - settingsStore = SettingsStore(this) - } -} -``` - -**Step 3: Register Application class in AndroidManifest.xml** - -Add `android:name=".DroidClawApp"` to the `<application>` tag: - -```xml - -``` - -**Step 4: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 5: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/data/ android/app/src/main/java/com/thisux/droidclaw/DroidClawApp.kt android/app/src/main/AndroidManifest.xml -git commit -m "feat(android): add DataStore settings and Application class" -``` - ---- - -### Task 4: Accessibility Service + ScreenTreeBuilder
-**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/accessibility/DroidClawAccessibilityService.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/accessibility/ScreenTreeBuilder.kt` -- Create: `android/app/src/main/res/xml/accessibility_config.xml` -- Modify: `android/app/src/main/AndroidManifest.xml` (add service declaration) - -**Step 1: Create accessibility_config.xml** - -Create `android/app/src/main/res/xml/accessibility_config.xml`: - -```xml - - -``` - -**Step 2: Create ScreenTreeBuilder.kt** - -```kotlin -package com.thisux.droidclaw.accessibility - -import android.graphics.Rect -import android.view.accessibility.AccessibilityNodeInfo -import com.thisux.droidclaw.model.UIElement -import java.security.MessageDigest - -object ScreenTreeBuilder { - - fun capture(rootNode: AccessibilityNodeInfo?): List<UIElement> { - if (rootNode == null) return emptyList() - val elements = mutableListOf<UIElement>() - walkTree(rootNode, elements, depth = 0, parentDesc = "") - return elements - } - - private fun walkTree( - node: AccessibilityNodeInfo, - elements: MutableList<UIElement>, - depth: Int, - parentDesc: String - ) { - try { - val rect = Rect() - node.getBoundsInScreen(rect) - - val text = node.text?.toString() ?: "" - val contentDesc = node.contentDescription?.toString() ?: "" - val viewId = node.viewIdResourceName ?: "" - val className = node.className?.toString() ?: "" - val displayText = text.ifEmpty { contentDesc } - - val isInteractive = node.isClickable || node.isLongClickable || - node.isEditable || node.isScrollable || node.isFocusable - - if (isInteractive || displayText.isNotEmpty()) { - val centerX = (rect.left + rect.right) / 2 - val centerY = (rect.top + rect.bottom) / 2 - val width = rect.width() - val height = rect.height() - - val action = when { - node.isEditable -> "type" - node.isScrollable -> "scroll" - node.isLongClickable -> "longpress" - node.isClickable -> "tap" - else -> "read" - } - - elements.add( - UIElement( - id = viewId, - text =
displayText, - type = className.substringAfterLast("."), - bounds = "[${rect.left},${rect.top}][${rect.right},${rect.bottom}]", - center = listOf(centerX, centerY), - size = listOf(width, height), - clickable = node.isClickable, - editable = node.isEditable, - enabled = node.isEnabled, - checked = node.isChecked, - focused = node.isFocused, - selected = node.isSelected, - scrollable = node.isScrollable, - longClickable = node.isLongClickable, - password = node.isPassword, - hint = node.hintText?.toString() ?: "", - action = action, - parent = parentDesc, - depth = depth - ) - ) - } - - for (i in 0 until node.childCount) { - val child = node.getChild(i) ?: continue - try { - walkTree(child, elements, depth + 1, className) - } finally { - child.recycle() - } - } - } catch (_: Exception) { - // Node may have been recycled during traversal - } - } - - fun computeScreenHash(elements: List<UIElement>): String { - val digest = MessageDigest.getInstance("MD5") - for (el in elements) { - digest.update("${el.id}|${el.text}|${el.center}".toByteArray()) - } - return digest.digest().joinToString("") { "%02x".format(it) }.take(12) - } -} -``` - -**Step 3: Create DroidClawAccessibilityService.kt** - -```kotlin -package com.thisux.droidclaw.accessibility - -import android.accessibilityservice.AccessibilityService -import android.util.Log -import android.view.accessibility.AccessibilityEvent -import android.view.accessibility.AccessibilityNodeInfo -import com.thisux.droidclaw.model.UIElement -import kotlinx.coroutines.delay -import kotlinx.coroutines.flow.MutableStateFlow -import kotlinx.coroutines.runBlocking - -class DroidClawAccessibilityService : AccessibilityService() { - - companion object { - private const val TAG = "DroidClawA11y" - val isRunning = MutableStateFlow(false) - val lastScreenTree = MutableStateFlow<List<UIElement>>(emptyList()) - var instance: DroidClawAccessibilityService?
= null - } - - override fun onServiceConnected() { - super.onServiceConnected() - Log.i(TAG, "Accessibility service connected") - instance = this - isRunning.value = true - } - - override fun onAccessibilityEvent(event: AccessibilityEvent?) { - // We capture on-demand via getScreenTree(), not on every event - } - - override fun onInterrupt() { - Log.w(TAG, "Accessibility service interrupted") - } - - override fun onDestroy() { - super.onDestroy() - Log.i(TAG, "Accessibility service destroyed") - instance = null - isRunning.value = false - } - - /** - * Capture current screen tree with retry for null rootInActiveWindow. - * Returns empty list if root is still null after retries (server uses vision fallback). - */ - fun getScreenTree(): List<UIElement> { - val delays = longArrayOf(50, 100, 200) - for (delayMs in delays) { - val root = rootInActiveWindow - if (root != null) { - try { - val elements = ScreenTreeBuilder.capture(root) - lastScreenTree.value = elements - return elements - } finally { - root.recycle() - } - } - runBlocking { delay(delayMs) } - } - Log.w(TAG, "rootInActiveWindow null after retries") - return emptyList() - } - - /** - * Find the deepest interactive node whose bounds contain the given coordinates. - */ - fun findNodeAt(x: Int, y: Int): AccessibilityNodeInfo? { - val root = rootInActiveWindow ?: return null - return findNodeAtRecursive(root, x, y) - } - - private fun findNodeAtRecursive( - node: AccessibilityNodeInfo, - x: Int, - y: Int - ): AccessibilityNodeInfo? 
{ - val rect = android.graphics.Rect() - node.getBoundsInScreen(rect) - - if (!rect.contains(x, y)) { - node.recycle() - return null - } - - // Check children (deeper = more specific) - for (i in 0 until node.childCount) { - val child = node.getChild(i) ?: continue - val found = findNodeAtRecursive(child, x, y) - if (found != null) { - node.recycle() - return found - } - } - - // This node contains the point and no child does - return if (node.isClickable || node.isLongClickable || node.isEditable) { - node - } else { - node.recycle() - null - } - } -} -``` - -**Step 4: Add service declaration + permissions to AndroidManifest.xml** - -```xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -``` - -**Step 5: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 6: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/accessibility/ android/app/src/main/res/xml/accessibility_config.xml android/app/src/main/AndroidManifest.xml -git commit -m "feat(android): add AccessibilityService and ScreenTreeBuilder" -``` - ---- - -### Task 5: GestureExecutor - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/accessibility/GestureExecutor.kt` - -**Step 1: Create GestureExecutor.kt** - -This implements the node-first strategy: try `performAction()` on accessibility nodes first, fall back to `dispatchGesture()` with coordinates. 
- -```kotlin -package com.thisux.droidclaw.accessibility - -import android.accessibilityservice.AccessibilityService -import android.accessibilityservice.GestureDescription -import android.content.Intent -import android.graphics.Path -import android.net.Uri -import android.os.Bundle -import android.util.Log -import android.view.accessibility.AccessibilityNodeInfo -import com.thisux.droidclaw.model.ServerMessage -import kotlinx.coroutines.suspendCancellableCoroutine -import kotlin.coroutines.resume - -data class ActionResult(val success: Boolean, val error: String? = null, val data: String? = null) - -class GestureExecutor(private val service: DroidClawAccessibilityService) { - - companion object { - private const val TAG = "GestureExecutor" - } - - suspend fun execute(msg: ServerMessage): ActionResult { - return try { - when (msg.type) { - "tap" -> executeTap(msg.x ?: 0, msg.y ?: 0) - "type" -> executeType(msg.text ?: "") - "enter" -> executeEnter() - "back" -> executeGlobalAction(AccessibilityService.GLOBAL_ACTION_BACK) - "home" -> executeGlobalAction(AccessibilityService.GLOBAL_ACTION_HOME) - "notifications" -> executeGlobalAction(AccessibilityService.GLOBAL_ACTION_NOTIFICATIONS) - "longpress" -> executeLongPress(msg.x ?: 0, msg.y ?: 0) - "swipe" -> executeSwipe( - msg.x1 ?: 0, msg.y1 ?: 0, - msg.x2 ?: 0, msg.y2 ?: 0, - msg.duration ?: 300 - ) - "launch" -> executeLaunch(msg.packageName ?: "") - "clear" -> executeClear() - "clipboard_set" -> executeClipboardSet(msg.text ?: "") - "clipboard_get" -> executeClipboardGet() - "paste" -> executePaste() - "open_url" -> executeOpenUrl(msg.url ?: "") - "switch_app" -> executeLaunch(msg.packageName ?: "") - "keyevent" -> executeKeyEvent(msg.code ?: 0) - "open_settings" -> executeOpenSettings() - "wait" -> executeWait(msg.duration ?: 1000) - else -> ActionResult(false, "Unknown action: ${msg.type}") - } - } catch (e: Exception) { - Log.e(TAG, "Action ${msg.type} failed", e) - ActionResult(false, e.message) - } - } - - 
private suspend fun executeTap(x: Int, y: Int): ActionResult { - // Try node-first - val node = service.findNodeAt(x, y) - if (node != null) { - try { - if (node.performAction(AccessibilityNodeInfo.ACTION_CLICK)) { - return ActionResult(true) - } - } finally { - node.recycle() - } - } - // Fallback to gesture - return dispatchTapGesture(x, y) - } - - private suspend fun executeType(text: String): ActionResult { - val focused = findFocusedNode() - if (focused != null) { - try { - val args = Bundle().apply { - putCharSequence(AccessibilityNodeInfo.ACTION_ARGUMENT_SET_TEXT_CHARSEQUENCE, text) - } - if (focused.performAction(AccessibilityNodeInfo.ACTION_SET_TEXT, args)) { - return ActionResult(true) - } - } finally { - focused.recycle() - } - } - return ActionResult(false, "No focused editable node found") - } - - private fun executeEnter(): ActionResult { - val focused = findFocusedNode() - if (focused != null) { - try { - // Try IME action first (AccessibilityAction.ACTION_IME_ENTER, API 30+) - if (focused.performAction(AccessibilityNodeInfo.AccessibilityAction.ACTION_IME_ENTER.id)) { - return ActionResult(true) - } - } finally { - focused.recycle() - } - } - // No key-event fallback is available through the accessibility API; - // report failure without triggering any other global action - return ActionResult(false, "Enter key fallback not available via accessibility") - } - - private fun executeGlobalAction(action: Int): ActionResult { - val success = service.performGlobalAction(action) - return ActionResult(success, if (!success) "Global action failed" else null) - } - - private suspend fun executeLongPress(x: Int, y: Int): ActionResult { - // Try node-first - val node = service.findNodeAt(x, y) - if (node != null) { - try { - if (node.performAction(AccessibilityNodeInfo.ACTION_LONG_CLICK)) { - return ActionResult(true) - } - } finally { - node.recycle() - } - } - // Fallback: gesture hold at point - return dispatchSwipeGesture(x, y, x, y, 1000) - } - - private suspend fun executeSwipe(x1: Int, y1: Int, x2: Int, y2: Int, duration: Int): 
ActionResult { - return dispatchSwipeGesture(x1, y1, x2, y2, duration) - } - - private fun executeLaunch(packageName: String): ActionResult { - val intent = service.packageManager.getLaunchIntentForPackage(packageName) - ?: return ActionResult(false, "Package not found: $packageName") - intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK) - service.startActivity(intent) - return ActionResult(true) - } - - private fun executeClear(): ActionResult { - val focused = findFocusedNode() - if (focused != null) { - try { - val args = Bundle().apply { - putCharSequence(AccessibilityNodeInfo.ACTION_ARGUMENT_SET_TEXT_CHARSEQUENCE, "") - } - if (focused.performAction(AccessibilityNodeInfo.ACTION_SET_TEXT, args)) { - return ActionResult(true) - } - } finally { - focused.recycle() - } - } - return ActionResult(false, "No focused editable node to clear") - } - - private fun executeClipboardSet(text: String): ActionResult { - val clipboard = service.getSystemService(android.content.Context.CLIPBOARD_SERVICE) as android.content.ClipboardManager - val clip = android.content.ClipData.newPlainText("droidclaw", text) - clipboard.setPrimaryClip(clip) - return ActionResult(true) - } - - private fun executeClipboardGet(): ActionResult { - val clipboard = service.getSystemService(android.content.Context.CLIPBOARD_SERVICE) as android.content.ClipboardManager - val text = clipboard.primaryClip?.getItemAt(0)?.text?.toString() ?: "" - return ActionResult(true, data = text) - } - - private fun executePaste(): ActionResult { - val focused = findFocusedNode() - if (focused != null) { - try { - if (focused.performAction(AccessibilityNodeInfo.ACTION_PASTE)) { - return ActionResult(true) - } - } finally { - focused.recycle() - } - } - return ActionResult(false, "No focused node to paste into") - } - - private fun executeOpenUrl(url: String): ActionResult { - val intent = Intent(Intent.ACTION_VIEW, Uri.parse(url)).apply { - addFlags(Intent.FLAG_ACTIVITY_NEW_TASK) - } - service.startActivity(intent) - 
return ActionResult(true) - } - - private fun executeKeyEvent(code: Int): ActionResult { - // AccessibilityService doesn't have direct keyevent dispatch - // Use instrumentation or shell command via Runtime - return try { - Runtime.getRuntime().exec(arrayOf("input", "keyevent", code.toString())) - ActionResult(true) - } catch (e: Exception) { - ActionResult(false, "keyevent failed: ${e.message}") - } - } - - private fun executeOpenSettings(): ActionResult { - val intent = Intent(android.provider.Settings.ACTION_SETTINGS).apply { - addFlags(Intent.FLAG_ACTIVITY_NEW_TASK) - } - service.startActivity(intent) - return ActionResult(true) - } - - private suspend fun executeWait(duration: Int): ActionResult { - kotlinx.coroutines.delay(duration.toLong()) - return ActionResult(true) - } - - // --- Gesture Helpers --- - - private suspend fun dispatchTapGesture(x: Int, y: Int): ActionResult { - val path = Path().apply { moveTo(x.toFloat(), y.toFloat()) } - val stroke = GestureDescription.StrokeDescription(path, 0, 50) - val gesture = GestureDescription.Builder().addStroke(stroke).build() - return dispatchGesture(gesture) - } - - private suspend fun dispatchSwipeGesture( - x1: Int, y1: Int, x2: Int, y2: Int, duration: Int - ): ActionResult { - val path = Path().apply { - moveTo(x1.toFloat(), y1.toFloat()) - lineTo(x2.toFloat(), y2.toFloat()) - } - val stroke = GestureDescription.StrokeDescription(path, 0, duration.toLong()) - val gesture = GestureDescription.Builder().addStroke(stroke).build() - return dispatchGesture(gesture) - } - - private suspend fun dispatchGesture(gesture: GestureDescription): ActionResult = - suspendCancellableCoroutine { cont -> - service.dispatchGesture( - gesture, - object : AccessibilityService.GestureResultCallback() { - override fun onCompleted(gestureDescription: GestureDescription?) { - if (cont.isActive) cont.resume(ActionResult(true)) - } - override fun onCancelled(gestureDescription: GestureDescription?) 
{ - if (cont.isActive) cont.resume(ActionResult(false, "Gesture cancelled")) - } - }, - null - ) - } - - private fun findFocusedNode(): AccessibilityNodeInfo? { - return service.rootInActiveWindow?.findFocus(AccessibilityNodeInfo.FOCUS_INPUT) - } -} -``` - -**Step 2: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 3: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/accessibility/GestureExecutor.kt -git commit -m "feat(android): add GestureExecutor with node-first strategy" -``` - ---- - -### Task 6: Screen Capture (MediaProjection) - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/capture/ScreenCaptureManager.kt` - -**Step 1: Create ScreenCaptureManager.kt** - -```kotlin -package com.thisux.droidclaw.capture - -import android.app.Activity -import android.content.Context -import android.content.Intent -import android.graphics.Bitmap -import android.graphics.PixelFormat -import android.hardware.display.DisplayManager -import android.hardware.display.VirtualDisplay -import android.media.ImageReader -import android.media.projection.MediaProjection -import android.media.projection.MediaProjectionManager -import android.util.DisplayMetrics -import android.util.Log -import android.view.WindowManager -import kotlinx.coroutines.flow.MutableStateFlow -import java.io.ByteArrayOutputStream - -class ScreenCaptureManager(private val context: Context) { - - companion object { - private const val TAG = "ScreenCapture" - const val REQUEST_CODE = 1001 - val isAvailable = MutableStateFlow(false) - } - - private var mediaProjection: MediaProjection? = null - private var virtualDisplay: VirtualDisplay? = null - private var imageReader: ImageReader? 
= null - private var screenWidth = 720 - private var screenHeight = 1280 - private var screenDensity = DisplayMetrics.DENSITY_DEFAULT - - fun initialize(resultCode: Int, data: Intent) { - val mgr = context.getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager - mediaProjection = mgr.getMediaProjection(resultCode, data) - - val wm = context.getSystemService(Context.WINDOW_SERVICE) as WindowManager - val metrics = DisplayMetrics() - @Suppress("DEPRECATION") - wm.defaultDisplay.getRealMetrics(metrics) - screenWidth = metrics.widthPixels - screenHeight = metrics.heightPixels - screenDensity = metrics.densityDpi - - // Scale down for capture - val scale = 720f / screenWidth - val captureWidth = 720 - val captureHeight = (screenHeight * scale).toInt() - - imageReader = ImageReader.newInstance(captureWidth, captureHeight, PixelFormat.RGBA_8888, 2) - virtualDisplay = mediaProjection?.createVirtualDisplay( - "DroidClaw", - captureWidth, captureHeight, screenDensity, - DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR, - imageReader!!.surface, null, null - ) - - mediaProjection?.registerCallback(object : MediaProjection.Callback() { - override fun onStop() { - Log.i(TAG, "MediaProjection stopped") - release() - } - }, null) - - isAvailable.value = true - Log.i(TAG, "Screen capture initialized: ${captureWidth}x${captureHeight}") - } - - fun capture(): ByteArray? 
{ - val reader = imageReader ?: return null - val image = reader.acquireLatestImage() ?: return null - return try { - val planes = image.planes - val buffer = planes[0].buffer - val pixelStride = planes[0].pixelStride - val rowStride = planes[0].rowStride - val rowPadding = rowStride - pixelStride * image.width - - val bitmap = Bitmap.createBitmap( - image.width + rowPadding / pixelStride, - image.height, - Bitmap.Config.ARGB_8888 - ) - bitmap.copyPixelsFromBuffer(buffer) - - // Crop padding - val cropped = Bitmap.createBitmap(bitmap, 0, 0, image.width, image.height) - if (cropped != bitmap) bitmap.recycle() - - // Check for secure window (all black) - if (isBlackFrame(cropped)) { - cropped.recycle() - Log.w(TAG, "Detected FLAG_SECURE (black frame)") - return null - } - - // Compress to JPEG - val stream = ByteArrayOutputStream() - cropped.compress(Bitmap.CompressFormat.JPEG, 50, stream) - cropped.recycle() - stream.toByteArray() - } finally { - image.close() - } - } - - private fun isBlackFrame(bitmap: Bitmap): Boolean { - // Sample 4 corners + center - val points = listOf( - 0 to 0, - bitmap.width - 1 to 0, - 0 to bitmap.height - 1, - bitmap.width - 1 to bitmap.height - 1, - bitmap.width / 2 to bitmap.height / 2 - ) - return points.all { (x, y) -> bitmap.getPixel(x, y) == android.graphics.Color.BLACK } - } - - fun release() { - virtualDisplay?.release() - virtualDisplay = null - imageReader?.close() - imageReader = null - mediaProjection?.stop() - mediaProjection = null - isAvailable.value = false - } -} -``` - -**Step 2: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 3: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/capture/ -git commit -m "feat(android): add ScreenCaptureManager with MediaProjection" -``` - ---- - -### Task 7: ReliableWebSocket (Ktor) - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/connection/ReliableWebSocket.kt` - -**Step 1: 
Create ReliableWebSocket.kt** - -```kotlin -package com.thisux.droidclaw.connection - -import android.util.Log -import com.thisux.droidclaw.model.AuthMessage -import com.thisux.droidclaw.model.ConnectionState -import com.thisux.droidclaw.model.DeviceInfoMsg -import com.thisux.droidclaw.model.ServerMessage -import io.ktor.client.HttpClient -import io.ktor.client.engine.cio.CIO -import io.ktor.client.plugins.websocket.WebSockets -import io.ktor.client.plugins.websocket.webSocket -import io.ktor.websocket.Frame -import io.ktor.websocket.close -import io.ktor.websocket.readText -import kotlinx.coroutines.CancellationException -import kotlinx.coroutines.CoroutineScope -import kotlinx.coroutines.Job -import kotlinx.coroutines.channels.Channel -import kotlinx.coroutines.delay -import kotlinx.coroutines.flow.MutableStateFlow -import kotlinx.coroutines.flow.StateFlow -import kotlinx.coroutines.isActive -import kotlinx.coroutines.launch -import kotlinx.serialization.encodeToString -import kotlinx.serialization.json.Json - -class ReliableWebSocket( - private val scope: CoroutineScope, - private val onMessage: suspend (ServerMessage) -> Unit -) { - companion object { - private const val TAG = "ReliableWS" - private const val MAX_BACKOFF_MS = 30_000L - } - - // @PublishedApi internal (not private) so the public inline sendTyped() can access it - @PublishedApi internal val json = Json { ignoreUnknownKeys = true; encodeDefaults = true } - - private val _state = MutableStateFlow(ConnectionState.Disconnected) - val state: StateFlow<ConnectionState> = _state - - private val _errorMessage = MutableStateFlow<String?>(null) - val errorMessage: StateFlow<String?> = _errorMessage - - private val outbound = Channel<String>(Channel.BUFFERED) - private var connectionJob: Job? = null - private var client: HttpClient? = null - private var backoffMs = 1000L - private var shouldReconnect = true - - var deviceId: String? 
= null - private set - - fun connect(serverUrl: String, apiKey: String, deviceInfo: DeviceInfoMsg) { - shouldReconnect = true - connectionJob?.cancel() - connectionJob = scope.launch { - while (shouldReconnect && isActive) { - try { - _state.value = ConnectionState.Connecting - _errorMessage.value = null - connectOnce(serverUrl, apiKey, deviceInfo) - } catch (e: CancellationException) { - throw e - } catch (e: Exception) { - Log.e(TAG, "Connection failed: ${e.message}") - _state.value = ConnectionState.Error - _errorMessage.value = e.message - } - if (shouldReconnect && isActive) { - Log.i(TAG, "Reconnecting in ${backoffMs}ms") - delay(backoffMs) - backoffMs = (backoffMs * 2).coerceAtMost(MAX_BACKOFF_MS) - } - } - } - } - - private suspend fun connectOnce(serverUrl: String, apiKey: String, deviceInfo: DeviceInfoMsg) { - val httpClient = HttpClient(CIO) { - install(WebSockets) { - pingIntervalMillis = 30_000 - } - } - client = httpClient - - // Convert wss:// to proper URL path - val wsUrl = serverUrl.trimEnd('/') + "/ws/device" - - httpClient.webSocket(wsUrl) { - // Auth handshake - val authMsg = AuthMessage(apiKey = apiKey, deviceInfo = deviceInfo) - send(Frame.Text(json.encodeToString(authMsg))) - Log.i(TAG, "Sent auth message") - - // Wait for auth response - val authFrame = incoming.receive() as? 
Frame.Text - ?: throw Exception("Expected text frame for auth response") - - val authResponse = json.decodeFromString<ServerMessage>(authFrame.readText()) - when (authResponse.type) { - "auth_ok" -> { - deviceId = authResponse.deviceId - _state.value = ConnectionState.Connected - _errorMessage.value = null - backoffMs = 1000L // Reset backoff on success - Log.i(TAG, "Authenticated, deviceId=$deviceId") - } - "auth_error" -> { - shouldReconnect = false // Don't retry auth errors - _state.value = ConnectionState.Error - _errorMessage.value = authResponse.message ?: "Authentication failed" - close() - return@webSocket - } - else -> { - throw Exception("Unexpected auth response: ${authResponse.type}") - } - } - - // Launch outbound sender - val senderJob = launch { - for (msg in outbound) { - send(Frame.Text(msg)) - } - } - - // Read incoming messages - try { - for (frame in incoming) { - if (frame is Frame.Text) { - val text = frame.readText() - try { - val msg = json.decodeFromString<ServerMessage>(text) - onMessage(msg) - } catch (e: CancellationException) { - throw e // Never swallow coroutine cancellation - } catch (e: Exception) { - Log.e(TAG, "Failed to parse or handle message: ${e.message}") - } - } - } - } finally { - senderJob.cancel() - } - } - - httpClient.close() - client = null - _state.value = ConnectionState.Disconnected - } - - fun send(message: String) { - outbound.trySend(message) - } - - inline fun <reified T> sendTyped(message: T) { - send(json.encodeToString(message)) - } - - fun disconnect() { - shouldReconnect = false - connectionJob?.cancel() - connectionJob = null - client?.close() - client = null - _state.value = ConnectionState.Disconnected - _errorMessage.value = null - deviceId = null - } -} -``` - -**Step 2: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 3: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/connection/ReliableWebSocket.kt -git commit -m "feat(android): add ReliableWebSocket with Ktor, reconnect, auth handshake" -``` - ---- - -### Task 8: CommandRouter - -**Files:** -- 
Create: `android/app/src/main/java/com/thisux/droidclaw/connection/CommandRouter.kt` - -**Step 1: Create CommandRouter.kt** - -```kotlin -package com.thisux.droidclaw.connection - -import android.util.Base64 -import android.util.Log -import com.thisux.droidclaw.accessibility.DroidClawAccessibilityService -import com.thisux.droidclaw.accessibility.GestureExecutor -import com.thisux.droidclaw.accessibility.ScreenTreeBuilder -import com.thisux.droidclaw.capture.ScreenCaptureManager -import com.thisux.droidclaw.model.AgentStep -import com.thisux.droidclaw.model.GoalStatus -import com.thisux.droidclaw.model.PongMessage -import com.thisux.droidclaw.model.ResultResponse -import com.thisux.droidclaw.model.ScreenResponse -import com.thisux.droidclaw.model.ServerMessage -import kotlinx.coroutines.flow.MutableStateFlow - -class CommandRouter( - private val webSocket: ReliableWebSocket, - private val captureManager: ScreenCaptureManager? -) { - companion object { - private const val TAG = "CommandRouter" - } - - val currentGoalStatus = MutableStateFlow(GoalStatus.Idle) - val currentSteps = MutableStateFlow<List<AgentStep>>(emptyList()) - val currentGoal = MutableStateFlow("") - val currentSessionId = MutableStateFlow<String?>(null) - - private var gestureExecutor: GestureExecutor? = null - - fun updateGestureExecutor() { - val svc = DroidClawAccessibilityService.instance - gestureExecutor = if (svc != null) GestureExecutor(svc) else null - } - - suspend fun handleMessage(msg: ServerMessage) { - Log.d(TAG, "Handling: ${msg.type}") - - when (msg.type) { - "get_screen" -> handleGetScreen(msg.requestId!!) 
- "ping" -> webSocket.sendTyped(PongMessage()) - - // Action commands — all have requestId - "tap", "type", "enter", "back", "home", "notifications", - "longpress", "swipe", "launch", "clear", "clipboard_set", - "clipboard_get", "paste", "open_url", "switch_app", - "keyevent", "open_settings", "wait" -> handleAction(msg) - - // Goal lifecycle - "goal_started" -> { - currentSessionId.value = msg.sessionId - currentGoal.value = msg.goal ?: "" - currentGoalStatus.value = GoalStatus.Running - currentSteps.value = emptyList() - Log.i(TAG, "Goal started: ${msg.goal}") - } - "step" -> { - val step = AgentStep( - step = msg.step ?: 0, - action = msg.action?.toString() ?: "", - reasoning = msg.reasoning ?: "" - ) - currentSteps.value = currentSteps.value + step - Log.d(TAG, "Step ${step.step}: ${step.reasoning}") - } - "goal_completed" -> { - currentGoalStatus.value = if (msg.success == true) GoalStatus.Completed else GoalStatus.Failed - Log.i(TAG, "Goal completed: success=${msg.success}, steps=${msg.stepsUsed}") - } - - else -> Log.w(TAG, "Unknown message type: ${msg.type}") - } - } - - private fun handleGetScreen(requestId: String) { - updateGestureExecutor() - val svc = DroidClawAccessibilityService.instance - val elements = svc?.getScreenTree() ?: emptyList() - val packageName = try { - svc?.rootInActiveWindow?.packageName?.toString() - } catch (_: Exception) { null } - - // Optionally include screenshot - var screenshot: String? 
= null - if (elements.isEmpty()) { - // Vision fallback: capture screenshot - val bytes = captureManager?.capture() - if (bytes != null) { - screenshot = Base64.encodeToString(bytes, Base64.NO_WRAP) - } - } - - val response = ScreenResponse( - requestId = requestId, - elements = elements, - screenshot = screenshot, - packageName = packageName - ) - webSocket.sendTyped(response) - } - - private suspend fun handleAction(msg: ServerMessage) { - updateGestureExecutor() - val executor = gestureExecutor - if (executor == null) { - webSocket.sendTyped( - ResultResponse( - requestId = msg.requestId!!, - success = false, - error = "Accessibility service not running" - ) - ) - return - } - - val result = executor.execute(msg) - webSocket.sendTyped( - ResultResponse( - requestId = msg.requestId!!, - success = result.success, - error = result.error, - data = result.data - ) - ) - } - - fun reset() { - currentGoalStatus.value = GoalStatus.Idle - currentSteps.value = emptyList() - currentGoal.value = "" - currentSessionId.value = null - } -} -``` - -**Step 2: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 3: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/connection/CommandRouter.kt -git commit -m "feat(android): add CommandRouter for dispatching server commands" -``` - ---- - -### Task 9: ConnectionService (Foreground Service) - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/connection/ConnectionService.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/util/DeviceInfo.kt` - -**Step 1: Create DeviceInfo.kt** - -```kotlin -package com.thisux.droidclaw.util - -import android.content.Context -import android.util.DisplayMetrics -import android.view.WindowManager -import com.thisux.droidclaw.model.DeviceInfoMsg - -object DeviceInfoHelper { - fun get(context: Context): DeviceInfoMsg { - val wm = context.getSystemService(Context.WINDOW_SERVICE) as 
WindowManager - val metrics = DisplayMetrics() - @Suppress("DEPRECATION") - wm.defaultDisplay.getRealMetrics(metrics) - return DeviceInfoMsg( - model = android.os.Build.MODEL, - androidVersion = android.os.Build.VERSION.RELEASE, - screenWidth = metrics.widthPixels, - screenHeight = metrics.heightPixels - ) - } -} -``` - -**Step 2: Create ConnectionService.kt** - -```kotlin -package com.thisux.droidclaw.connection - -import android.app.Notification -import android.app.NotificationChannel -import android.app.NotificationManager -import android.app.PendingIntent -import android.content.Context -import android.content.Intent -import android.os.Build -import android.os.IBinder -import android.os.PowerManager -import android.util.Log -import androidx.core.app.NotificationCompat -import androidx.lifecycle.LifecycleService -import androidx.lifecycle.lifecycleScope -import com.thisux.droidclaw.DroidClawApp -import com.thisux.droidclaw.MainActivity -import com.thisux.droidclaw.R -import com.thisux.droidclaw.capture.ScreenCaptureManager -import com.thisux.droidclaw.model.ConnectionState -import com.thisux.droidclaw.model.GoalMessage -import com.thisux.droidclaw.model.GoalStatus -import com.thisux.droidclaw.model.AgentStep -import com.thisux.droidclaw.util.DeviceInfoHelper -import kotlinx.coroutines.flow.MutableStateFlow -import kotlinx.coroutines.flow.first -import kotlinx.coroutines.launch - -class ConnectionService : LifecycleService() { - - companion object { - private const val TAG = "ConnectionSvc" - private const val CHANNEL_ID = "droidclaw_connection" - private const val NOTIFICATION_ID = 1 - - val connectionState = MutableStateFlow(ConnectionState.Disconnected) - val currentSteps = MutableStateFlow<List<AgentStep>>(emptyList()) - val currentGoalStatus = MutableStateFlow(GoalStatus.Idle) - val currentGoal = MutableStateFlow("") - val errorMessage = MutableStateFlow<String?>(null) - var instance: ConnectionService? 
= null - - const val ACTION_CONNECT = "com.thisux.droidclaw.CONNECT" - const val ACTION_DISCONNECT = "com.thisux.droidclaw.DISCONNECT" - const val ACTION_SEND_GOAL = "com.thisux.droidclaw.SEND_GOAL" - const val EXTRA_GOAL = "goal_text" - } - - private var webSocket: ReliableWebSocket? = null - private var commandRouter: CommandRouter? = null - private var captureManager: ScreenCaptureManager? = null - private var wakeLock: PowerManager.WakeLock? = null - - override fun onCreate() { - super.onCreate() - instance = this - createNotificationChannel() - } - - override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int { - super.onStartCommand(intent, flags, startId) - - when (intent?.action) { - ACTION_CONNECT -> { - startForeground(NOTIFICATION_ID, buildNotification("Connecting...")) - connect() - } - ACTION_DISCONNECT -> { - disconnect() - stopSelf() - } - ACTION_SEND_GOAL -> { - val goal = intent.getStringExtra(EXTRA_GOAL) ?: return START_NOT_STICKY - sendGoal(goal) - } - } - - return START_NOT_STICKY - } - - private fun connect() { - lifecycleScope.launch { - val app = application as DroidClawApp - val apiKey = app.settingsStore.apiKey.first() - val serverUrl = app.settingsStore.serverUrl.first() - - if (apiKey.isBlank() || serverUrl.isBlank()) { - connectionState.value = ConnectionState.Error - errorMessage.value = "API key or server URL not configured" - stopSelf() - return@launch - } - - captureManager = ScreenCaptureManager(this@ConnectionService) - - val ws = ReliableWebSocket(lifecycleScope) { msg -> - commandRouter?.handleMessage(msg) - } - webSocket = ws - - val router = CommandRouter(ws, captureManager) - commandRouter = router - - // Forward state - launch { - ws.state.collect { state -> - connectionState.value = state - updateNotification( - when (state) { - ConnectionState.Connected -> "Connected to server" - ConnectionState.Connecting -> "Connecting..." 
- ConnectionState.Error -> "Connection error" - ConnectionState.Disconnected -> "Disconnected" - } - ) - } - } - launch { - ws.errorMessage.collect { errorMessage.value = it } - } - launch { - router.currentSteps.collect { currentSteps.value = it } - } - launch { - router.currentGoalStatus.collect { currentGoalStatus.value = it } - } - launch { - router.currentGoal.collect { currentGoal.value = it } - } - - // Acquire wake lock during active connection - acquireWakeLock() - - val deviceInfo = DeviceInfoHelper.get(this@ConnectionService) - ws.connect(serverUrl, apiKey, deviceInfo) - } - } - - private fun sendGoal(text: String) { - webSocket?.sendTyped(GoalMessage(text = text)) - } - - private fun disconnect() { - webSocket?.disconnect() - webSocket = null - commandRouter?.reset() - commandRouter = null - captureManager?.release() - captureManager = null - releaseWakeLock() - connectionState.value = ConnectionState.Disconnected - } - - override fun onDestroy() { - disconnect() - instance = null - super.onDestroy() - } - - override fun onBind(intent: Intent): IBinder? 
{ - super.onBind(intent) - return null - } - - // --- Notification --- - - private fun createNotificationChannel() { - if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) { - val channel = NotificationChannel( - CHANNEL_ID, - "DroidClaw Connection", - NotificationManager.IMPORTANCE_LOW - ).apply { - description = "Shows when DroidClaw is connected to the server" - } - val nm = getSystemService(NotificationManager::class.java) - nm.createNotificationChannel(channel) - } - } - - private fun buildNotification(text: String): Notification { - val openIntent = PendingIntent.getActivity( - this, 0, - Intent(this, MainActivity::class.java), - PendingIntent.FLAG_IMMUTABLE - ) - - val disconnectIntent = PendingIntent.getService( - this, 1, - Intent(this, ConnectionService::class.java).apply { - action = ACTION_DISCONNECT - }, - PendingIntent.FLAG_IMMUTABLE - ) - - return NotificationCompat.Builder(this, CHANNEL_ID) - .setContentTitle("DroidClaw") - .setContentText(text) - .setSmallIcon(R.drawable.ic_launcher_foreground) - .setOngoing(true) - .setContentIntent(openIntent) - .addAction(0, "Disconnect", disconnectIntent) - .build() - } - - private fun updateNotification(text: String) { - val nm = getSystemService(NotificationManager::class.java) - nm.notify(NOTIFICATION_ID, buildNotification(text)) - } - - // --- Wake Lock --- - - private fun acquireWakeLock() { - val pm = getSystemService(Context.POWER_SERVICE) as PowerManager - wakeLock = pm.newWakeLock( - PowerManager.PARTIAL_WAKE_LOCK, - "DroidClaw::ConnectionWakeLock" - ).apply { - acquire(10 * 60 * 1000L) // 10 minutes max - } - } - - private fun releaseWakeLock() { - wakeLock?.let { - if (it.isHeld) it.release() - } - wakeLock = null - } -} -``` - -**Step 3: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 4: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/connection/ConnectionService.kt 
android/app/src/main/java/com/thisux/droidclaw/util/DeviceInfo.kt -git commit -m "feat(android): add ConnectionService foreground service and DeviceInfo helper" -``` - ---- - -### Task 10: UI — Navigation + HomeScreen - -**Files:** -- Modify: `android/app/src/main/java/com/thisux/droidclaw/MainActivity.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/ui/screens/HomeScreen.kt` - -**Step 1: Create HomeScreen.kt** - -```kotlin -package com.thisux.droidclaw.ui.screens - -import android.content.Intent -import androidx.compose.foundation.background -import androidx.compose.foundation.layout.Arrangement -import androidx.compose.foundation.layout.Box -import androidx.compose.foundation.layout.Column -import androidx.compose.foundation.layout.Row -import androidx.compose.foundation.layout.Spacer -import androidx.compose.foundation.layout.fillMaxSize -import androidx.compose.foundation.layout.fillMaxWidth -import androidx.compose.foundation.layout.height -import androidx.compose.foundation.layout.padding -import androidx.compose.foundation.layout.size -import androidx.compose.foundation.lazy.LazyColumn -import androidx.compose.foundation.lazy.items -import androidx.compose.foundation.shape.CircleShape -import androidx.compose.material3.Button -import androidx.compose.material3.Card -import androidx.compose.material3.MaterialTheme -import androidx.compose.material3.OutlinedTextField -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.runtime.collectAsState -import androidx.compose.runtime.getValue -import androidx.compose.runtime.mutableStateOf -import androidx.compose.runtime.remember -import androidx.compose.runtime.setValue -import androidx.compose.ui.Alignment -import androidx.compose.ui.Modifier -import androidx.compose.ui.draw.clip -import androidx.compose.ui.graphics.Color -import androidx.compose.ui.platform.LocalContext -import androidx.compose.ui.unit.dp -import 
com.thisux.droidclaw.connection.ConnectionService -import com.thisux.droidclaw.model.ConnectionState -import com.thisux.droidclaw.model.GoalStatus - -@Composable -fun HomeScreen() { - val context = LocalContext.current - val connectionState by ConnectionService.connectionState.collectAsState() - val goalStatus by ConnectionService.currentGoalStatus.collectAsState() - val steps by ConnectionService.currentSteps.collectAsState() - val currentGoal by ConnectionService.currentGoal.collectAsState() - val errorMessage by ConnectionService.errorMessage.collectAsState() - - var goalInput by remember { mutableStateOf("") } - - Column( - modifier = Modifier - .fillMaxSize() - .padding(16.dp) - ) { - // Status Badge - Row( - verticalAlignment = Alignment.CenterVertically, - modifier = Modifier.fillMaxWidth() - ) { - Box( - modifier = Modifier - .size(12.dp) - .clip(CircleShape) - .background( - when (connectionState) { - ConnectionState.Connected -> Color(0xFF4CAF50) - ConnectionState.Connecting -> Color(0xFFFFC107) - ConnectionState.Error -> Color(0xFFF44336) - ConnectionState.Disconnected -> Color.Gray - } - ) - ) - Text( - text = when (connectionState) { - ConnectionState.Connected -> "Connected to server" - ConnectionState.Connecting -> "Connecting..." 
- ConnectionState.Error -> errorMessage ?: "Connection error" - ConnectionState.Disconnected -> "Disconnected" - }, - style = MaterialTheme.typography.bodyLarge, - modifier = Modifier.padding(start = 8.dp) - ) - } - - Spacer(modifier = Modifier.height(8.dp)) - - // Connect/Disconnect button - Button( - onClick = { - val intent = Intent(context, ConnectionService::class.java).apply { - action = if (connectionState == ConnectionState.Disconnected || connectionState == ConnectionState.Error) { - ConnectionService.ACTION_CONNECT - } else { - ConnectionService.ACTION_DISCONNECT - } - } - context.startForegroundService(intent) - }, - modifier = Modifier.fillMaxWidth() - ) { - Text( - when (connectionState) { - ConnectionState.Disconnected, ConnectionState.Error -> "Connect" - else -> "Disconnect" - } - ) - } - - Spacer(modifier = Modifier.height(16.dp)) - - // Goal Input - Row( - modifier = Modifier.fillMaxWidth(), - horizontalArrangement = Arrangement.spacedBy(8.dp) - ) { - OutlinedTextField( - value = goalInput, - onValueChange = { goalInput = it }, - label = { Text("Enter a goal...") }, - modifier = Modifier.weight(1f), - enabled = connectionState == ConnectionState.Connected && goalStatus != GoalStatus.Running, - singleLine = true - ) - Button( - onClick = { - if (goalInput.isNotBlank()) { - val intent = Intent(context, ConnectionService::class.java).apply { - action = ConnectionService.ACTION_SEND_GOAL - putExtra(ConnectionService.EXTRA_GOAL, goalInput) - } - context.startService(intent) - goalInput = "" - } - }, - enabled = connectionState == ConnectionState.Connected - && goalStatus != GoalStatus.Running - && goalInput.isNotBlank() - ) { - Text("Run") - } - } - - // Current goal - if (currentGoal.isNotEmpty()) { - Spacer(modifier = Modifier.height(8.dp)) - Text( - text = "Goal: $currentGoal", - style = MaterialTheme.typography.titleSmall, - color = MaterialTheme.colorScheme.primary - ) - } - - Spacer(modifier = Modifier.height(16.dp)) - - // Step Log - LazyColumn( 
- modifier = Modifier.weight(1f), - verticalArrangement = Arrangement.spacedBy(8.dp) - ) { - items(steps) { step -> - Card( - modifier = Modifier.fillMaxWidth() - ) { - Column(modifier = Modifier.padding(12.dp)) { - Text( - text = "Step ${step.step}: ${step.action}", - style = MaterialTheme.typography.titleSmall - ) - if (step.reasoning.isNotEmpty()) { - Text( - text = step.reasoning, - style = MaterialTheme.typography.bodySmall, - color = MaterialTheme.colorScheme.onSurfaceVariant - ) - } - } - } - } - } - - // Goal Status - if (goalStatus == GoalStatus.Completed || goalStatus == GoalStatus.Failed) { - Spacer(modifier = Modifier.height(8.dp)) - Text( - text = if (goalStatus == GoalStatus.Completed) { - "Goal completed (${steps.size} steps)" - } else { - "Goal failed" - }, - style = MaterialTheme.typography.titleMedium, - color = if (goalStatus == GoalStatus.Completed) { - Color(0xFF4CAF50) - } else { - MaterialTheme.colorScheme.error - } - ) - } - } -} -``` - -**Step 2: Rewrite MainActivity.kt with bottom nav** - -```kotlin -package com.thisux.droidclaw - -import android.os.Bundle -import androidx.activity.ComponentActivity -import androidx.activity.compose.setContent -import androidx.activity.enableEdgeToEdge -import androidx.compose.foundation.layout.fillMaxSize -import androidx.compose.foundation.layout.padding -import androidx.compose.material.icons.Icons -import androidx.compose.material.icons.filled.History -import androidx.compose.material.icons.filled.Home -import androidx.compose.material.icons.filled.Settings -import androidx.compose.material3.Icon -import androidx.compose.material3.NavigationBar -import androidx.compose.material3.NavigationBarItem -import androidx.compose.material3.Scaffold -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.runtime.getValue -import androidx.compose.ui.Modifier -import androidx.navigation.NavDestination.Companion.hierarchy -import 
androidx.navigation.NavGraph.Companion.findStartDestination -import androidx.navigation.compose.NavHost -import androidx.navigation.compose.composable -import androidx.navigation.compose.currentBackStackEntryAsState -import androidx.navigation.compose.rememberNavController -import com.thisux.droidclaw.ui.screens.HomeScreen -import com.thisux.droidclaw.ui.screens.LogsScreen -import com.thisux.droidclaw.ui.screens.SettingsScreen -import com.thisux.droidclaw.ui.theme.DroidClawTheme - -sealed class Screen(val route: String, val label: String) { - data object Home : Screen("home", "Home") - data object Settings : Screen("settings", "Settings") - data object Logs : Screen("logs", "Logs") -} - -class MainActivity : ComponentActivity() { - override fun onCreate(savedInstanceState: Bundle?) { - super.onCreate(savedInstanceState) - enableEdgeToEdge() - setContent { - DroidClawTheme { - MainNavigation() - } - } - } -} - -@Composable -fun MainNavigation() { - val navController = rememberNavController() - val screens = listOf(Screen.Home, Screen.Settings, Screen.Logs) - - Scaffold( - modifier = Modifier.fillMaxSize(), - bottomBar = { - NavigationBar { - val navBackStackEntry by navController.currentBackStackEntryAsState() - val currentDestination = navBackStackEntry?.destination - - screens.forEach { screen -> - NavigationBarItem( - icon = { - Icon( - when (screen) { - is Screen.Home -> Icons.Filled.Home - is Screen.Settings -> Icons.Filled.Settings - is Screen.Logs -> Icons.Filled.History - }, - contentDescription = screen.label - ) - }, - label = { Text(screen.label) }, - selected = currentDestination?.hierarchy?.any { it.route == screen.route } == true, - onClick = { - navController.navigate(screen.route) { - popUpTo(navController.graph.findStartDestination().id) { - saveState = true - } - launchSingleTop = true - restoreState = true - } - } - ) - } - } - } - ) { innerPadding -> - NavHost( - navController = navController, - startDestination = Screen.Home.route, - modifier = 
Modifier.padding(innerPadding) - ) { - composable(Screen.Home.route) { HomeScreen() } - composable(Screen.Settings.route) { SettingsScreen() } - composable(Screen.Logs.route) { LogsScreen() } - } - } -} -``` - -**Step 3: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 4: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/ui/screens/HomeScreen.kt android/app/src/main/java/com/thisux/droidclaw/MainActivity.kt -git commit -m "feat(android): add HomeScreen with goal input and bottom nav" -``` - ---- - -### Task 11: UI — SettingsScreen - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/ui/screens/SettingsScreen.kt` -- Create: `android/app/src/main/java/com/thisux/droidclaw/util/BatteryOptimization.kt` - -**Step 1: Create BatteryOptimization.kt** - -```kotlin -package com.thisux.droidclaw.util - -import android.content.Context -import android.content.Intent -import android.net.Uri -import android.os.PowerManager -import android.provider.Settings - -object BatteryOptimization { - fun isIgnoringBatteryOptimizations(context: Context): Boolean { - val pm = context.getSystemService(Context.POWER_SERVICE) as PowerManager - return pm.isIgnoringBatteryOptimizations(context.packageName) - } - - fun requestExemption(context: Context) { - val intent = Intent(Settings.ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS).apply { - data = Uri.parse("package:${context.packageName}") - } - context.startActivity(intent) - } - - fun openAccessibilitySettings(context: Context) { - context.startActivity(Intent(Settings.ACTION_ACCESSIBILITY_SETTINGS)) - } -} -``` - -**Step 2: Create SettingsScreen.kt** - -```kotlin -package com.thisux.droidclaw.ui.screens - -import androidx.compose.foundation.layout.Arrangement -import androidx.compose.foundation.layout.Column -import androidx.compose.foundation.layout.Row -import androidx.compose.foundation.layout.Spacer -import 
androidx.compose.foundation.layout.fillMaxSize -import androidx.compose.foundation.layout.fillMaxWidth -import androidx.compose.foundation.layout.height -import androidx.compose.foundation.layout.padding -import androidx.compose.foundation.rememberScrollState -import androidx.compose.foundation.verticalScroll -import androidx.compose.material.icons.Icons -import androidx.compose.material.icons.filled.CheckCircle -import androidx.compose.material.icons.filled.Error -import androidx.compose.material3.Card -import androidx.compose.material3.CardDefaults -import androidx.compose.material3.Icon -import androidx.compose.material3.MaterialTheme -import androidx.compose.material3.OutlinedButton -import androidx.compose.material3.OutlinedTextField -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.runtime.collectAsState -import androidx.compose.runtime.getValue -import androidx.compose.runtime.mutableStateOf -import androidx.compose.runtime.remember -import androidx.compose.runtime.rememberCoroutineScope -import androidx.compose.runtime.setValue -import androidx.compose.ui.Alignment -import androidx.compose.ui.Modifier -import androidx.compose.ui.graphics.Color -import androidx.compose.ui.platform.LocalContext -import androidx.compose.ui.text.input.PasswordVisualTransformation -import androidx.compose.ui.unit.dp -import com.thisux.droidclaw.DroidClawApp -import com.thisux.droidclaw.accessibility.DroidClawAccessibilityService -import com.thisux.droidclaw.capture.ScreenCaptureManager -import com.thisux.droidclaw.util.BatteryOptimization -import kotlinx.coroutines.launch - -@Composable -fun SettingsScreen() { - val context = LocalContext.current - val app = context.applicationContext as DroidClawApp - val scope = rememberCoroutineScope() - - val apiKey by app.settingsStore.apiKey.collectAsState(initial = "") - val serverUrl by app.settingsStore.serverUrl.collectAsState(initial = "wss://localhost:8080") - - var 
editingApiKey by remember(apiKey) { mutableStateOf(apiKey) } - var editingServerUrl by remember(serverUrl) { mutableStateOf(serverUrl) } - - val isAccessibilityEnabled by DroidClawAccessibilityService.isRunning.collectAsState() - val isCaptureAvailable by ScreenCaptureManager.isAvailable.collectAsState() - val isBatteryExempt = remember { BatteryOptimization.isIgnoringBatteryOptimizations(context) } - - Column( - modifier = Modifier - .fillMaxSize() - .padding(16.dp) - .verticalScroll(rememberScrollState()), - verticalArrangement = Arrangement.spacedBy(16.dp) - ) { - Text("Settings", style = MaterialTheme.typography.headlineMedium) - - // API Key - OutlinedTextField( - value = editingApiKey, - onValueChange = { editingApiKey = it }, - label = { Text("API Key") }, - modifier = Modifier.fillMaxWidth(), - visualTransformation = PasswordVisualTransformation(), - singleLine = true - ) - if (editingApiKey != apiKey) { - OutlinedButton( - onClick = { scope.launch { app.settingsStore.setApiKey(editingApiKey) } } - ) { - Text("Save API Key") - } - } - - // Server URL - OutlinedTextField( - value = editingServerUrl, - onValueChange = { editingServerUrl = it }, - label = { Text("Server URL") }, - modifier = Modifier.fillMaxWidth(), - singleLine = true - ) - if (editingServerUrl != serverUrl) { - OutlinedButton( - onClick = { scope.launch { app.settingsStore.setServerUrl(editingServerUrl) } } - ) { - Text("Save Server URL") - } - } - - Spacer(modifier = Modifier.height(8.dp)) - - // Setup Checklist - Text("Setup Checklist", style = MaterialTheme.typography.titleMedium) - - ChecklistItem( - label = "API key configured", - isOk = apiKey.isNotBlank(), - actionLabel = null, - onAction = {} - ) - - ChecklistItem( - label = "Accessibility service", - isOk = isAccessibilityEnabled, - actionLabel = "Enable", - onAction = { BatteryOptimization.openAccessibilitySettings(context) } - ) - - ChecklistItem( - label = "Screen capture permission", - isOk = isCaptureAvailable, - actionLabel = 
null, - onAction = {} - ) - - ChecklistItem( - label = "Battery optimization disabled", - isOk = isBatteryExempt, - actionLabel = "Disable", - onAction = { BatteryOptimization.requestExemption(context) } - ) - } -} - -@Composable -private fun ChecklistItem( - label: String, - isOk: Boolean, - actionLabel: String?, - onAction: () -> Unit -) { - Card( - modifier = Modifier.fillMaxWidth(), - colors = CardDefaults.cardColors( - containerColor = if (isOk) { - MaterialTheme.colorScheme.secondaryContainer - } else { - MaterialTheme.colorScheme.errorContainer.copy(alpha = 0.3f) - } - ) - ) { - Row( - modifier = Modifier - .fillMaxWidth() - .padding(12.dp), - verticalAlignment = Alignment.CenterVertically, - horizontalArrangement = Arrangement.SpaceBetween - ) { - Row( - verticalAlignment = Alignment.CenterVertically, - horizontalArrangement = Arrangement.spacedBy(8.dp) - ) { - Icon( - imageVector = if (isOk) Icons.Filled.CheckCircle else Icons.Filled.Error, - contentDescription = if (isOk) "OK" else "Missing", - tint = if (isOk) Color(0xFF4CAF50) else MaterialTheme.colorScheme.error - ) - Text(label) - } - if (!isOk && actionLabel != null) { - OutlinedButton(onClick = onAction) { - Text(actionLabel) - } - } - } - } -} -``` - -**Step 3: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 4: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/ui/screens/SettingsScreen.kt android/app/src/main/java/com/thisux/droidclaw/util/BatteryOptimization.kt -git commit -m "feat(android): add SettingsScreen with checklist and BatteryOptimization util" -``` - ---- - -### Task 12: UI — LogsScreen - -**Files:** -- Create: `android/app/src/main/java/com/thisux/droidclaw/ui/screens/LogsScreen.kt` - -**Step 1: Create LogsScreen.kt** - -```kotlin -package com.thisux.droidclaw.ui.screens - -import androidx.compose.foundation.clickable -import androidx.compose.foundation.layout.Arrangement -import 
androidx.compose.foundation.layout.Column -import androidx.compose.foundation.layout.Row -import androidx.compose.foundation.layout.fillMaxSize -import androidx.compose.foundation.layout.fillMaxWidth -import androidx.compose.foundation.layout.padding -import androidx.compose.foundation.lazy.LazyColumn -import androidx.compose.foundation.lazy.items -import androidx.compose.material3.Card -import androidx.compose.material3.MaterialTheme -import androidx.compose.material3.Text -import androidx.compose.runtime.Composable -import androidx.compose.runtime.collectAsState -import androidx.compose.runtime.getValue -import androidx.compose.runtime.mutableStateOf -import androidx.compose.runtime.remember -import androidx.compose.runtime.setValue -import androidx.compose.ui.Modifier -import androidx.compose.ui.graphics.Color -import androidx.compose.ui.unit.dp -import com.thisux.droidclaw.connection.ConnectionService -import com.thisux.droidclaw.model.GoalStatus - -@Composable -fun LogsScreen() { - val steps by ConnectionService.currentSteps.collectAsState() - val goalStatus by ConnectionService.currentGoalStatus.collectAsState() - val currentGoal by ConnectionService.currentGoal.collectAsState() - - Column( - modifier = Modifier - .fillMaxSize() - .padding(16.dp) - ) { - Text("Logs", style = MaterialTheme.typography.headlineMedium) - - if (currentGoal.isNotEmpty()) { - Row( - modifier = Modifier - .fillMaxWidth() - .padding(vertical = 8.dp), - horizontalArrangement = Arrangement.SpaceBetween - ) { - Text( - text = currentGoal, - style = MaterialTheme.typography.titleSmall - ) - Text( - text = when (goalStatus) { - GoalStatus.Running -> "Running" - GoalStatus.Completed -> "Completed" - GoalStatus.Failed -> "Failed" - GoalStatus.Idle -> "Idle" - }, - color = when (goalStatus) { - GoalStatus.Running -> Color(0xFFFFC107) - GoalStatus.Completed -> Color(0xFF4CAF50) - GoalStatus.Failed -> MaterialTheme.colorScheme.error - GoalStatus.Idle -> Color.Gray - } - ) - } - } - - if 
(steps.isEmpty()) { - Text( - text = "No steps recorded yet. Submit a goal to see agent activity here.", - style = MaterialTheme.typography.bodyMedium, - color = MaterialTheme.colorScheme.onSurfaceVariant, - modifier = Modifier.padding(top = 16.dp) - ) - } else { - LazyColumn( - modifier = Modifier.weight(1f), - verticalArrangement = Arrangement.spacedBy(8.dp) - ) { - items(steps) { step -> - var expanded by remember { mutableStateOf(false) } - Card( - modifier = Modifier - .fillMaxWidth() - .clickable { expanded = !expanded } - ) { - Column(modifier = Modifier.padding(12.dp)) { - Text( - text = "Step ${step.step}: ${step.action}", - style = MaterialTheme.typography.titleSmall - ) - if (expanded && step.reasoning.isNotEmpty()) { - Text( - text = step.reasoning, - style = MaterialTheme.typography.bodySmall, - color = MaterialTheme.colorScheme.onSurfaceVariant, - modifier = Modifier.padding(top = 4.dp) - ) - } - } - } - } - } - } - } -} -``` - -**Step 2: Verify build compiles** - -Run: `cd android && ./gradlew assembleDebug` -Expected: BUILD SUCCESSFUL - -**Step 3: Commit** - -```bash -git add android/app/src/main/java/com/thisux/droidclaw/ui/screens/LogsScreen.kt -git commit -m "feat(android): add LogsScreen with expandable step cards" -``` - ---- - -### Task 13: Final Integration & Build Verification - -**Files:** -- All previously created files - -**Step 1: Full clean build** - -Run: `cd android && ./gradlew clean assembleDebug` -Expected: BUILD SUCCESSFUL with 0 errors - -**Step 2: Verify APK exists** - -Run: `ls -la android/app/build/outputs/apk/debug/app-debug.apk` -Expected: File exists - -**Step 3: Commit all remaining changes** - -```bash -cd android && git add -A && git status -git commit -m "feat(android): complete DroidClaw v1 companion app - -- Accessibility service with ScreenTreeBuilder and GestureExecutor -- Ktor WebSocket with reliable reconnection and auth handshake -- Foreground ConnectionService with notification -- MediaProjection screen capture 
with vision fallback -- DataStore settings for API key, server URL -- Compose UI with bottom nav (Home, Settings, Logs) -- Home: connection status, goal input, live step log -- Settings: API key, server URL, setup checklist -- Logs: expandable step cards with reasoning" -``` - ---- - -## Execution Notes - -**Build requirement:** The Android project requires Android Studio or at minimum the Android SDK with API 36 installed. Gradle commands run via `./gradlew` wrapper in the `android/` directory. - -**Testing on device:** After building, install via `adb install android/app/build/outputs/apk/debug/app-debug.apk`. Then: -1. Open Settings > Accessibility > DroidClaw and enable the service -2. Open the app > Settings tab > enter API key and server URL -3. Go to Home tab > tap Connect -4. Enter a goal and tap Run - -**Not covered in v1 (future work):** -- Persistent session history (LogsScreen clears on restart) -- MediaProjection consent flow wired through UI (currently needs manual setup) -- Auto-connect on boot -- OEM-specific battery guidance (dontkillmyapp.com links) diff --git a/docs/plans/2026-02-17-option1-implementation-plan.md b/docs/plans/2026-02-17-option1-implementation-plan.md deleted file mode 100644 index a74586b..0000000 --- a/docs/plans/2026-02-17-option1-implementation-plan.md +++ /dev/null @@ -1,2394 +0,0 @@ -# Option 1: Web + Backend Implementation Plan - -> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. - -**Goal:** Build the SvelteKit dashboard and Hono.js backend so users can sign up, manage API keys, configure LLM providers, connect Android devices via WebSocket, and run the DroidClaw agent loop from the browser. - -**Architecture:** Monorepo with `packages/shared/` (types), `server/` (Hono + Bun WebSocket + agent loop), and `web/` (SvelteKit dashboard). Both services share the same Postgres via Drizzle. Better Auth handles user auth (SvelteKit) and API key verification (Hono). 
The agent loop runs server-side, sending commands to connected phones via WebSocket. - -**Tech Stack:** SvelteKit 2 (Svelte 5, node adapter), Hono.js (Bun), Drizzle ORM, Postgres, Better Auth (apiKey plugin), Tailwind v4, Valibot, TypeScript. - -**Design doc:** `docs/plans/2026-02-17-option1-web-backend-design.md` - ---- - -## Task 1: Shared Types Package - -**Files:** -- Create: `packages/shared/package.json` -- Create: `packages/shared/tsconfig.json` -- Create: `packages/shared/src/index.ts` -- Create: `packages/shared/src/types.ts` -- Create: `packages/shared/src/protocol.ts` - -**Step 1: Create package.json** - -```json -{ - "name": "@droidclaw/shared", - "version": "0.0.1", - "type": "module", - "exports": { - ".": "./src/index.ts" - }, - "scripts": { - "typecheck": "tsc --noEmit" - }, - "devDependencies": { - "typescript": "^5.9.2" - } -} -``` - -**Step 2: Create tsconfig.json** - -```json -{ - "compilerOptions": { - "target": "ES2022", - "module": "ES2022", - "moduleResolution": "bundler", - "strict": true, - "esModuleInterop": true, - "skipLibCheck": true, - "declaration": true, - "outDir": "dist", - "rootDir": "src" - }, - "include": ["src/**/*.ts"] -} -``` - -**Step 3: Create types.ts** - -Port the core types from `src/sanitizer.ts` (UIElement) and `src/actions.ts` (ActionDecision, ActionResult) into shared types that work for both ADB and WebSocket connections. 
- -```typescript -// packages/shared/src/types.ts - -export interface UIElement { - id: string; - text: string; - type: string; - bounds: string; - center: [number, number]; - size: [number, number]; - clickable: boolean; - editable: boolean; - enabled: boolean; - checked: boolean; - focused: boolean; - selected: boolean; - scrollable: boolean; - longClickable: boolean; - password: boolean; - hint: string; - action: "tap" | "type" | "longpress" | "scroll" | "read"; - parent: string; - depth: number; -} - -export interface ActionDecision { - action: string; - coordinates?: [number, number]; - text?: string; - direction?: string; - reason?: string; - package?: string; - activity?: string; - uri?: string; - extras?: Record<string, string>; - command?: string; - filename?: string; - think?: string; - plan?: string[]; - planProgress?: string; - skill?: string; - query?: string; - url?: string; - path?: string; - source?: string; - dest?: string; - code?: number; - setting?: string; -} - -export interface ActionResult { - success: boolean; - message: string; - data?: string; -} - -export interface DeviceInfo { - model: string; - androidVersion: string; - screenWidth: number; - screenHeight: number; -} - -export interface ScreenState { - elements: UIElement[]; - screenshot?: string; // base64 PNG - packageName?: string; - fallbackReason?: string; -} -``` - -**Step 4: Create protocol.ts** - -```typescript -// packages/shared/src/protocol.ts - -import type { UIElement, ActionResult, DeviceInfo } from "./types.js"; - -// --- Device -> Server messages --- - -export type DeviceMessage = - | { type: "auth"; apiKey: string; deviceInfo?: DeviceInfo } - | { type: "screen"; requestId: string; elements: UIElement[]; screenshot?: string; packageName?: string } - | { type: "result"; requestId: string; success: boolean; error?: string; data?: string } - | { type: "goal"; text: string } - | { type: "pong" }; - -// --- Server -> Device messages --- - -export type ServerToDeviceMessage = - | { type: 
"auth_ok"; deviceId: string } - | { type: "auth_error"; message: string } - | { type: "get_screen"; requestId: string } - | { type: "tap"; requestId: string; x: number; y: number } - | { type: "type"; requestId: string; text: string } - | { type: "swipe"; requestId: string; x1: number; y1: number; x2: number; y2: number; duration?: number } - | { type: "enter"; requestId: string } - | { type: "back"; requestId: string } - | { type: "home"; requestId: string } - | { type: "longpress"; requestId: string; x: number; y: number } - | { type: "launch"; requestId: string; packageName: string } - | { type: "clear"; requestId: string } - | { type: "clipboard_set"; requestId: string; text: string } - | { type: "clipboard_get"; requestId: string } - | { type: "paste"; requestId: string } - | { type: "open_url"; requestId: string; url: string } - | { type: "switch_app"; requestId: string; packageName: string } - | { type: "notifications"; requestId: string } - | { type: "keyevent"; requestId: string; code: number } - | { type: "open_settings"; requestId: string } - | { type: "wait"; requestId: string; duration?: number } - | { type: "ping" } - | { type: "goal_started"; sessionId: string; goal: string } - | { type: "goal_completed"; sessionId: string; success: boolean; stepsUsed: number }; - -// --- Server -> Dashboard messages --- - -export type DashboardMessage = - | { type: "device_online"; deviceId: string; name: string } - | { type: "device_offline"; deviceId: string } - | { type: "step"; sessionId: string; step: number; action: Record<string, unknown>; reasoning: string; screenHash: string } - | { type: "goal_started"; sessionId: string; goal: string; deviceId: string } - | { type: "goal_completed"; sessionId: string; success: boolean; stepsUsed: number }; -``` - -**Step 5: Create index.ts (barrel export)** - -```typescript -export * from "./types.js"; -export * from "./protocol.js"; -``` - -**Step 6: Install dependencies and verify typecheck** - -```bash -cd packages/shared && bun 
install && bun run typecheck -``` - -**Step 7: Commit** - -```bash -git add packages/shared -git commit -m "feat: add @droidclaw/shared types package" -``` - ---- - -## Task 2: Hono Server Scaffolding - -**Files:** -- Create: `server/package.json` -- Create: `server/tsconfig.json` -- Create: `server/src/index.ts` -- Create: `server/src/env.ts` -- Create: `server/src/db.ts` -- Create: `server/src/auth.ts` -- Create: `server/.env.example` -- Create: `server/Dockerfile` - -**Step 1: Create package.json** - -```json -{ - "name": "@droidclaw/server", - "version": "0.0.1", - "type": "module", - "scripts": { - "dev": "bun --watch src/index.ts", - "start": "bun src/index.ts", - "typecheck": "tsc --noEmit", - "db:push": "drizzle-kit push", - "db:generate": "drizzle-kit generate", - "db:migrate": "drizzle-kit migrate" - }, - "dependencies": { - "hono": "^4.7.0", - "better-auth": "^1.3.27", - "drizzle-orm": "^0.44.5", - "postgres": "^3.4.7" - }, - "devDependencies": { - "@types/bun": "^1.1.0", - "drizzle-kit": "^0.31.4", - "typescript": "^5.9.2" - } -} -``` - -**Step 2: Create tsconfig.json** - -```json -{ - "compilerOptions": { - "target": "ES2022", - "module": "ES2022", - "moduleResolution": "bundler", - "strict": true, - "esModuleInterop": true, - "skipLibCheck": true, - "outDir": "dist", - "rootDir": "src", - "types": ["bun-types"], - "paths": { - "@droidclaw/shared": ["../packages/shared/src"] - } - }, - "include": ["src/**/*.ts"] -} -``` - -**Step 3: Create .env.example** - -``` -DATABASE_URL="postgres://user:password@host:port/db-name" -PORT=8080 -CORS_ORIGIN="http://localhost:5173" -``` - -**Step 4: Create env.ts** - -```typescript -// server/src/env.ts - -export const env = { - DATABASE_URL: process.env.DATABASE_URL!, - PORT: parseInt(process.env.PORT || "8080"), - CORS_ORIGIN: process.env.CORS_ORIGIN || "http://localhost:5173", -}; - -if (!env.DATABASE_URL) { - throw new Error("DATABASE_URL is not set"); -} -``` - -**Step 5: Create db.ts** - -```typescript -// 
server/src/db.ts - -import { drizzle } from "drizzle-orm/postgres-js"; -import postgres from "postgres"; -import { env } from "./env.js"; - -const client = postgres(env.DATABASE_URL); -export const db = drizzle(client); -``` - -**Step 6: Create auth.ts** - -Better Auth instance with apiKey plugin, pointing to same Postgres. No sveltekitCookies — Hono uses its own session middleware. - -```typescript -// server/src/auth.ts - -import { betterAuth } from "better-auth"; -import { apiKey } from "better-auth/plugins"; -import { drizzleAdapter } from "better-auth/adapters/drizzle"; -import { db } from "./db.js"; - -export const auth = betterAuth({ - database: drizzleAdapter(db, { - provider: "pg", - }), - plugins: [apiKey()], -}); -``` - -**Step 7: Create index.ts** - -Minimal Hono app with Better Auth handler, CORS, health check. WebSocket upgrade via Bun.serve. - -```typescript -// server/src/index.ts - -import { Hono } from "hono"; -import { cors } from "hono/cors"; -import { auth } from "./auth.js"; -import { env } from "./env.js"; - -const app = new Hono(); - -// CORS for dashboard -app.use( - "*", - cors({ - origin: env.CORS_ORIGIN, - allowHeaders: ["Content-Type", "Authorization"], - allowMethods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"], - credentials: true, - }) -); - -// Better Auth handler -app.on(["POST", "GET"], "/api/auth/*", (c) => { - return auth.handler(c.req.raw); -}); - -// Health check -app.get("/health", (c) => c.json({ status: "ok" })); - -// Start server with WebSocket support -const server = Bun.serve({ - port: env.PORT, - fetch: app.fetch, - websocket: { - open(ws) { - console.log("WebSocket connected"); - }, - message(ws, message) { - // placeholder — Task 4 implements device/dashboard handlers - }, - close(ws) { - console.log("WebSocket disconnected"); - }, - }, -}); - -console.log(`Server running on port ${server.port}`); -``` - -**Step 8: Create Dockerfile** - -```dockerfile -FROM oven/bun:1 - -WORKDIR /app - -COPY packages/shared 
./packages/shared -COPY server ./server - -WORKDIR /app/server -RUN bun install - -EXPOSE 8080 -CMD ["bun", "src/index.ts"] -``` - -**Step 9: Install dependencies and verify** - -```bash -cd server && bun install && bun run typecheck -``` - -**Step 10: Start dev server and test health endpoint** - -```bash -cd server && bun run dev -# In another terminal: -curl http://localhost:8080/health -# Expected: {"status":"ok"} -``` - -**Step 11: Commit** - -```bash -git add server -git commit -m "feat: scaffold Hono server with auth and health check" -``` - ---- - -## Task 3: Extended Database Schema - -**Files:** -- Modify: `web/src/lib/server/db/schema.ts` (add new tables) -- Modify: `web/src/lib/server/auth.ts` (add apiKey plugin) -- Modify: `web/src/lib/auth-client.ts` (add apiKey client plugin) - -**Step 1: Add apiKey plugin to Better Auth server config** - -In `web/src/lib/server/auth.ts`, add the apiKey plugin: - -```typescript -import { betterAuth } from 'better-auth'; -import { sveltekitCookies } from 'better-auth/svelte-kit'; -import { apiKey } from 'better-auth/plugins'; -import { drizzleAdapter } from 'better-auth/adapters/drizzle'; -import { db } from './db'; -import { getRequestEvent } from '$app/server'; - -export const auth = betterAuth({ - database: drizzleAdapter(db, { - provider: 'pg' - }), - plugins: [sveltekitCookies(getRequestEvent), apiKey()], - emailAndPassword: { - enabled: true - } -}); -``` - -**Step 2: Add apiKey client plugin** - -In `web/src/lib/auth-client.ts`: - -```typescript -import { createAuthClient } from 'better-auth/svelte'; -import { apiKeyClient } from 'better-auth/client/plugins'; - -export const authClient = createAuthClient({ - baseURL: 'http://localhost:5173', - plugins: [apiKeyClient()] -}); -``` - -**Step 3: Add new tables to schema.ts** - -Append to `web/src/lib/server/db/schema.ts`: - -```typescript -import { pgTable, text, timestamp, boolean, integer, jsonb } from 'drizzle-orm/pg-core'; - -// ... 
existing user, session, account, verification tables stay unchanged ... - -export const llmConfig = pgTable('llm_config', { - id: text('id').primaryKey(), - userId: text('user_id') - .notNull() - .references(() => user.id, { onDelete: 'cascade' }), - provider: text('provider').notNull(), // openai | groq | ollama | bedrock | openrouter - apiKey: text('api_key').notNull(), // encrypted - model: text('model'), - createdAt: timestamp('created_at').defaultNow().notNull(), - updatedAt: timestamp('updated_at') - .defaultNow() - .$onUpdate(() => new Date()) - .notNull() -}); - -export const device = pgTable('device', { - id: text('id').primaryKey(), - userId: text('user_id') - .notNull() - .references(() => user.id, { onDelete: 'cascade' }), - name: text('name').notNull(), - lastSeen: timestamp('last_seen'), - status: text('status').notNull().default('offline'), // online | offline - deviceInfo: jsonb('device_info'), // { model, androidVersion, screenWidth, screenHeight } - createdAt: timestamp('created_at').defaultNow().notNull() -}); - -export const agentSession = pgTable('agent_session', { - id: text('id').primaryKey(), - userId: text('user_id') - .notNull() - .references(() => user.id, { onDelete: 'cascade' }), - deviceId: text('device_id') - .notNull() - .references(() => device.id, { onDelete: 'cascade' }), - goal: text('goal').notNull(), - status: text('status').notNull().default('running'), // running | completed | failed | cancelled - stepsUsed: integer('steps_used').default(0), - startedAt: timestamp('started_at').defaultNow().notNull(), - completedAt: timestamp('completed_at') -}); - -export const agentStep = pgTable('agent_step', { - id: text('id').primaryKey(), - sessionId: text('session_id') - .notNull() - .references(() => agentSession.id, { onDelete: 'cascade' }), - stepNumber: integer('step_number').notNull(), - screenHash: text('screen_hash'), - action: jsonb('action'), - reasoning: text('reasoning'), - result: text('result'), - timestamp: 
timestamp('timestamp').defaultNow().notNull() -}); -``` - -**Step 4: Generate and run migration** - -```bash -cd web && bun run db:generate && bun run db:push -``` - -**Step 5: Verify Better Auth apiKey table was created** - -Better Auth auto-manages its `api_key` table. Check with: - -```bash -cd web && bun run db:studio -``` - -Verify tables exist: `user`, `session`, `account`, `verification`, `api_key`, `llm_config`, `device`, `agent_session`, `agent_step`. - -**Step 6: Commit** - -```bash -git add web/src/lib/server/db/schema.ts web/src/lib/server/auth.ts web/src/lib/auth-client.ts web/drizzle/ -git commit -m "feat: add apiKey plugin and new schema tables" -``` - ---- - -## Task 4: Hono WebSocket Handlers - -**Files:** -- Create: `server/src/ws/sessions.ts` -- Create: `server/src/ws/device.ts` -- Create: `server/src/ws/dashboard.ts` -- Modify: `server/src/index.ts` (wire up WebSocket upgrade with path routing) - -**Step 1: Create sessions.ts (in-memory session manager)** - -```typescript -// server/src/ws/sessions.ts - -import type { ServerWebSocket } from "bun"; -import type { DeviceInfo } from "@droidclaw/shared"; - -export interface ConnectedDevice { - deviceId: string; - userId: string; - ws: ServerWebSocket<WebSocketData>; - deviceInfo?: DeviceInfo; - connectedAt: Date; -} - -export interface DashboardSubscriber { - userId: string; - ws: ServerWebSocket<WebSocketData>; -} - -export interface WebSocketData { - path: string; // "/ws/device" or "/ws/dashboard" - userId?: string; - deviceId?: string; - authenticated: boolean; -} - -// request/response tracking for command-response pattern -export interface PendingRequest { - resolve: (value: unknown) => void; - reject: (reason: Error) => void; - timer: ReturnType<typeof setTimeout>; -} - -class SessionManager { - // deviceId -> ConnectedDevice - devices = new Map<string, ConnectedDevice>(); - // userId -> deviceId[] (one user can have multiple devices) - userDevices = new Map<string, Set<string>>(); - // userId -> DashboardSubscriber[] - dashboardSubscribers = new Map<string, DashboardSubscriber[]>(); - // requestId -> 
PendingRequest (for command-response pattern) - pendingRequests = new Map<string, PendingRequest>(); - - addDevice(device: ConnectedDevice) { - this.devices.set(device.deviceId, device); - const userDevs = this.userDevices.get(device.userId) ?? new Set<string>(); - userDevs.add(device.deviceId); - this.userDevices.set(device.userId, userDevs); - } - - removeDevice(deviceId: string) { - const device = this.devices.get(deviceId); - if (device) { - this.devices.delete(deviceId); - const userDevs = this.userDevices.get(device.userId); - if (userDevs) { - userDevs.delete(deviceId); - if (userDevs.size === 0) this.userDevices.delete(device.userId); - } - } - } - - getDevice(deviceId: string): ConnectedDevice | undefined { - return this.devices.get(deviceId); - } - - getDevicesForUser(userId: string): ConnectedDevice[] { - const deviceIds = this.userDevices.get(userId); - if (!deviceIds) return []; - return [...deviceIds] - .map((id) => this.devices.get(id)) - .filter((d): d is ConnectedDevice => d !== undefined); - } - - addDashboardSubscriber(sub: DashboardSubscriber) { - const subs = this.dashboardSubscribers.get(sub.userId) ?? 
[]; - subs.push(sub); - this.dashboardSubscribers.set(sub.userId, subs); - } - - removeDashboardSubscriber(ws: ServerWebSocket<WebSocketData>) { - for (const [userId, subs] of this.dashboardSubscribers) { - const filtered = subs.filter((s) => s.ws !== ws); - if (filtered.length === 0) { - this.dashboardSubscribers.delete(userId); - } else { - this.dashboardSubscribers.set(userId, filtered); - } - } - } - - // send message to all dashboard subscribers for a user - notifyDashboard(userId: string, message: object) { - const subs = this.dashboardSubscribers.get(userId); - if (!subs) return; - const data = JSON.stringify(message); - for (const sub of subs) { - sub.ws.send(data); - } - } - - // send command to device, return promise that resolves when device responds - sendCommand(deviceId: string, command: object, timeout = 15_000): Promise<unknown> { - const device = this.devices.get(deviceId); - if (!device) return Promise.reject(new Error("device not connected")); - - const requestId = crypto.randomUUID(); - const commandWithId = { ...command, requestId }; - - return new Promise((resolve, reject) => { - const timer = setTimeout(() => { - this.pendingRequests.delete(requestId); - reject(new Error(`command timeout: ${JSON.stringify(command)}`)); - }, timeout); - - this.pendingRequests.set(requestId, { resolve, reject, timer }); - device.ws.send(JSON.stringify(commandWithId)); - }); - } - - // resolve a pending request (called when device sends a response) - resolveRequest(requestId: string, data: unknown) { - const pending = this.pendingRequests.get(requestId); - if (pending) { - clearTimeout(pending.timer); - this.pendingRequests.delete(requestId); - pending.resolve(data); - } - } -} - -export const sessions = new SessionManager(); -``` - -**Step 2: Create device.ts (device WebSocket handler)** - -```typescript -// server/src/ws/device.ts - -import type { ServerWebSocket } from "bun"; -import { auth } from "../auth.js"; -import { sessions, type WebSocketData } from "./sessions.js"; -import 
type { DeviceMessage } from "@droidclaw/shared"; - -export async function handleDeviceMessage( - ws: ServerWebSocket<WebSocketData>, - raw: string -) { - let msg: DeviceMessage; - try { - msg = JSON.parse(raw); - } catch { - ws.send(JSON.stringify({ type: "error", message: "invalid JSON" })); - return; - } - - // handle auth handshake - if (msg.type === "auth") { - try { - const result = await auth.api.verifyApiKey({ - body: { key: msg.apiKey }, - }); - - if (!result || !result.valid || !result.key) { - ws.send(JSON.stringify({ type: "auth_error", message: "invalid API key" })); - ws.close(); - return; - } - - const deviceId = crypto.randomUUID(); - ws.data.userId = result.key.userId; - ws.data.deviceId = deviceId; - ws.data.authenticated = true; - - sessions.addDevice({ - deviceId, - userId: result.key.userId, - ws, - deviceInfo: msg.deviceInfo, - connectedAt: new Date(), - }); - - ws.send(JSON.stringify({ type: "auth_ok", deviceId })); - - // notify dashboard subscribers - sessions.notifyDashboard(result.key.userId, { - type: "device_online", - deviceId, - name: msg.deviceInfo?.model ?? "Unknown Device", - }); - - console.log(`Device ${deviceId} connected for user ${result.key.userId}`); - } catch (e) { - ws.send( - JSON.stringify({ - type: "auth_error", - message: e instanceof Error ? 
e.message : "auth failed", - }) - ); - ws.close(); - } - return; - } - - // all other messages require authentication - if (!ws.data.authenticated) { - ws.send(JSON.stringify({ type: "auth_error", message: "not authenticated" })); - ws.close(); - return; - } - - switch (msg.type) { - case "screen": - case "result": - // resolve the pending command request - sessions.resolveRequest(msg.requestId, msg); - break; - - case "goal": - // device-initiated goal: will be handled by the agent loop (Task 6) - // for now, just log it - console.log(`Goal from device ${ws.data.deviceId}: ${msg.text}`); - break; - - case "pong": - // heartbeat response: device is alive - break; - } -} - -export function handleDeviceClose(ws: ServerWebSocket<WebSocketData>) { - if (ws.data.deviceId && ws.data.userId) { - sessions.removeDevice(ws.data.deviceId); - sessions.notifyDashboard(ws.data.userId, { - type: "device_offline", - deviceId: ws.data.deviceId, - }); - console.log(`Device ${ws.data.deviceId} disconnected`); - } -} -``` - -**Step 3: Create dashboard.ts (dashboard WebSocket handler)** - -```typescript -// server/src/ws/dashboard.ts - -import type { ServerWebSocket } from "bun"; -import { auth } from "../auth.js"; -import { sessions, type WebSocketData } from "./sessions.js"; - -export async function handleDashboardMessage( - ws: ServerWebSocket<WebSocketData>, - raw: string -) { - let msg: { type: string; [key: string]: unknown }; - try { - msg = JSON.parse(raw); - } catch { - ws.send(JSON.stringify({ type: "error", message: "invalid JSON" })); - return; - } - - // auth via session token (sent as first message) - if (msg.type === "auth") { - try { - const token = msg.token as string; - const session = await auth.api.getSession({ - headers: new Headers({ Authorization: `Bearer ${token}` }), - }); - - if (!session) { - ws.send(JSON.stringify({ type: "auth_error", message: "invalid session" })); - ws.close(); - return; - } - - ws.data.userId = session.user.id; - ws.data.authenticated = true; - - 
sessions.addDashboardSubscriber({ - userId: session.user.id, - ws, - }); - - ws.send(JSON.stringify({ type: "auth_ok" })); - - // send current device list - const devices = sessions.getDevicesForUser(session.user.id); - for (const device of devices) { - ws.send( - JSON.stringify({ - type: "device_online", - deviceId: device.deviceId, - name: device.deviceInfo?.model ?? "Unknown Device", - }) - ); - } - } catch { - ws.send(JSON.stringify({ type: "auth_error", message: "auth failed" })); - ws.close(); - } - return; - } - - if (!ws.data.authenticated) { - ws.send(JSON.stringify({ type: "auth_error", message: "not authenticated" })); - return; - } - - // dashboard messages handled here (e.g., goal submission via WebSocket) - // REST endpoint POST /goals is the primary way; this is a secondary path -} - -export function handleDashboardClose(ws: ServerWebSocket<WebSocketData>) { - sessions.removeDashboardSubscriber(ws); -} -``` - -**Step 4: Update index.ts with WebSocket upgrade routing** - -Replace the placeholder websocket handlers in `server/src/index.ts`: - -```typescript -// server/src/index.ts - -import { Hono } from "hono"; -import { cors } from "hono/cors"; -import { auth } from "./auth.js"; -import { env } from "./env.js"; -import { handleDeviceMessage, handleDeviceClose } from "./ws/device.js"; -import { handleDashboardMessage, handleDashboardClose } from "./ws/dashboard.js"; -import type { WebSocketData } from "./ws/sessions.js"; - -const app = new Hono(); - -app.use( - "*", - cors({ - origin: env.CORS_ORIGIN, - allowHeaders: ["Content-Type", "Authorization"], - allowMethods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"], - credentials: true, - }) -); - -app.on(["POST", "GET"], "/api/auth/*", (c) => { - return auth.handler(c.req.raw); -}); - -app.get("/health", (c) => c.json({ status: "ok" })); - -const server = Bun.serve<WebSocketData>({ - port: env.PORT, - fetch(req, server) { - const url = new URL(req.url); - - // WebSocket upgrade for device connections - if (url.pathname === 
"/ws/device") { - const upgraded = server.upgrade(req, { - data: { path: "/ws/device", authenticated: false }, - }); - if (upgraded) return undefined; - return new Response("WebSocket upgrade failed", { status: 400 }); - } - - // WebSocket upgrade for dashboard connections - if (url.pathname === "/ws/dashboard") { - const upgraded = server.upgrade(req, { - data: { path: "/ws/dashboard", authenticated: false }, - }); - if (upgraded) return undefined; - return new Response("WebSocket upgrade failed", { status: 400 }); - } - - // all other requests go to Hono - return app.fetch(req); - }, - websocket: { - open(ws) { - console.log(`WebSocket opened: ${ws.data.path}`); - }, - message(ws, message) { - const raw = typeof message === "string" ? message : new TextDecoder().decode(message); - if (ws.data.path === "/ws/device") { - handleDeviceMessage(ws, raw); - } else if (ws.data.path === "/ws/dashboard") { - handleDashboardMessage(ws, raw); - } - }, - close(ws) { - if (ws.data.path === "/ws/device") { - handleDeviceClose(ws); - } else if (ws.data.path === "/ws/dashboard") { - handleDashboardClose(ws); - } - }, - }, -}); - -console.log(`Server running on port ${server.port}`); -``` - -**Step 5: Verify typecheck** - -```bash -cd server && bun run typecheck -``` - -**Step 6: Commit** - -```bash -git add server/src/ws/ server/src/index.ts -git commit -m "feat: add WebSocket handlers for device and dashboard connections" -``` - ---- - -## Task 5: Hono REST Routes - -**Files:** -- Create: `server/src/routes/devices.ts` -- Create: `server/src/routes/goals.ts` -- Create: `server/src/routes/health.ts` -- Modify: `server/src/index.ts` (mount routes) - -**Step 1: Create session middleware for REST routes** - -```typescript -// server/src/middleware/auth.ts - -import type { Context, Next } from "hono"; -import { auth } from "../auth.js"; - -export async function sessionMiddleware(c: Context, next: Next) { - const session = await auth.api.getSession({ - headers: c.req.raw.headers, - 
}); - - if (!session) { - return c.json({ error: "unauthorized" }, 401); - } - - c.set("user", session.user); - c.set("session", session.session); - await next(); -} -``` - -**Step 2: Create devices route** - -```typescript -// server/src/routes/devices.ts - -import { Hono } from "hono"; -import { sessionMiddleware } from "../middleware/auth.js"; -import { sessions } from "../ws/sessions.js"; - -const devices = new Hono(); - -devices.use("*", sessionMiddleware); - -// list connected devices for the authenticated user -devices.get("/", (c) => { - const user = c.get("user"); - const userDevices = sessions.getDevicesForUser(user.id); - - return c.json( - userDevices.map((d) => ({ - deviceId: d.deviceId, - name: d.deviceInfo?.model ?? "Unknown Device", - deviceInfo: d.deviceInfo, - connectedAt: d.connectedAt.toISOString(), - })) - ); -}); - -export { devices }; -``` - -**Step 3: Create goals route** - -```typescript -// server/src/routes/goals.ts - -import { Hono } from "hono"; -import { sessionMiddleware } from "../middleware/auth.js"; -import { sessions } from "../ws/sessions.js"; - -const goals = new Hono(); - -goals.use("*", sessionMiddleware); - -// submit a goal for a connected device -goals.post("/", async (c) => { - const user = c.get("user"); - const body = await c.req.json<{ deviceId: string; goal: string }>(); - - if (!body.deviceId || !body.goal) { - return c.json({ error: "deviceId and goal are required" }, 400); - } - - const device = sessions.getDevice(body.deviceId); - if (!device) { - return c.json({ error: "device not connected" }, 404); - } - - if (device.userId !== user.id) { - return c.json({ error: "device does not belong to you" }, 403); - } - - // TODO (Task 6): start agent loop for this device+goal - // For now, acknowledge the goal - const sessionId = crypto.randomUUID(); - - return c.json({ - sessionId, - deviceId: body.deviceId, - goal: body.goal, - status: "queued", - }); -}); - -export { goals }; -``` - -**Step 4: Extract health route** - 
-```typescript -// server/src/routes/health.ts - -import { Hono } from "hono"; -import { sessions } from "../ws/sessions.js"; - -const health = new Hono(); - -health.get("/", (c) => { - return c.json({ - status: "ok", - connectedDevices: sessions.devices.size, - }); -}); - -export { health }; -``` - -**Step 5: Mount routes in index.ts** - -Add to `server/src/index.ts` after the CORS middleware, replacing the inline health check: - -```typescript -import { devices } from "./routes/devices.js"; -import { goals } from "./routes/goals.js"; -import { health } from "./routes/health.js"; - -// ... after CORS and auth handler ... - -app.route("/devices", devices); -app.route("/goals", goals); -app.route("/health", health); -``` - -Remove the old inline `app.get("/health", ...)`. - -**Step 6: Verify typecheck and test** - -```bash -cd server && bun run typecheck -``` - -**Step 7: Commit** - -```bash -git add server/src/routes/ server/src/middleware/ server/src/index.ts -git commit -m "feat: add REST routes for devices, goals, and health" -``` - ---- - -## Task 6: Agent Loop (Server-Side) - -**Files:** -- Create: `server/src/agent/loop.ts` -- Create: `server/src/agent/llm.ts` -- Create: `server/src/agent/stuck.ts` -- Modify: `server/src/routes/goals.ts` (wire up agent loop) -- Modify: `server/src/ws/device.ts` (handle device-initiated goals) - -This is the biggest task. It adapts the existing `src/kernel.ts` logic to work over WebSocket instead of ADB. - -**Step 1: Create llm.ts** - -Adapt `src/llm-providers.ts` — same LLM provider factory, but reads config from the user's `llm_config` DB row instead of env vars. - -```typescript -// server/src/agent/llm.ts - -// This file adapts src/llm-providers.ts to work with per-user LLM config. -// The SYSTEM_PROMPT, provider factory, and response parsing all come from -// the existing codebase. 
Key differences: -// - Config comes from DB (llm_config table) not env vars -// - Same LLMProvider interface -// - Same parseJsonResponse() logic - -// Import the SYSTEM_PROMPT and provider logic from existing src/ -// OR copy and adapt the relevant portions. -// The exact approach depends on whether we want to share code via -// packages/shared or duplicate for server independence. - -// For v1: duplicate the SYSTEM_PROMPT and provider factory here. -// The prompt is ~200 lines and changes rarely. Duplication is acceptable -// for deployment independence (server deploys without src/). - -export interface LLMConfig { - provider: string; // openai | groq | ollama | bedrock | openrouter - apiKey: string; - model?: string; -} - -export interface LLMProvider { - getAction( - systemPrompt: string, - userPrompt: string, - imageBase64?: string - ): Promise<string>; -} - -export function getLlmProvider(config: LLMConfig): LLMProvider { - // Adapt from src/llm-providers.ts - // Each provider uses config.apiKey and config.model - // instead of reading from process.env - throw new Error("TODO: adapt from src/llm-providers.ts"); -} - -export function parseJsonResponse(raw: string): Record<string, unknown> | null { - // Same logic as src/llm-providers.ts parseJsonResponse() - // Handle clean JSON and markdown-wrapped code blocks - const cleaned = raw.replace(/```(?:json)?\s*/g, "").replace(/```/g, "").trim(); - try { - return JSON.parse(cleaned); - } catch { - return null; - } -} -``` - -> **Note for implementer:** Copy the SYSTEM_PROMPT, provider implementations (OpenAI, Groq, etc.), and parseJsonResponse from `src/llm-providers.ts`. Adapt each provider constructor to accept `LLMConfig` instead of reading env vars. The core logic is identical. - -**Step 2: Create stuck.ts** - -```typescript -// server/src/agent/stuck.ts - -// Adapted from kernel.ts stuck-loop detection. -// Same algorithm: track recent actions in a sliding window, -// detect repetition, inject recovery hints. 
- -export interface StuckDetector { - recordAction(action: string, screenHash: string): void; - isStuck(): boolean; - getRecoveryHint(): string; - reset(): void; -} - -export function createStuckDetector(windowSize: number = 5): StuckDetector { - const recentActions: string[] = []; - const recentHashes: string[] = []; - - return { - recordAction(action: string, screenHash: string) { - recentActions.push(action); - recentHashes.push(screenHash); - if (recentActions.length > windowSize) recentActions.shift(); - if (recentHashes.length > windowSize) recentHashes.shift(); - }, - - isStuck(): boolean { - if (recentActions.length < 3) return false; - // all recent actions are the same - const allSame = recentActions.every((a) => a === recentActions[0]); - // all recent screen hashes are the same - const allSameHash = recentHashes.every((h) => h === recentHashes[0]); - return allSame || allSameHash; - }, - - getRecoveryHint(): string { - return ( - "STUCK DETECTED: You have been repeating the same action or seeing the same screen. " + - "Try a completely different approach: scroll to find new elements, go back, " + - "use the home button, or try a different app." 
- ); - }, - - reset() { - recentActions.length = 0; - recentHashes.length = 0; - }, - }; -} -``` - -**Step 3: Create loop.ts (main agent loop)** - -```typescript -// server/src/agent/loop.ts - -import { sessions } from "../ws/sessions.js"; -import { getLlmProvider, parseJsonResponse, type LLMConfig } from "./llm.js"; -import { createStuckDetector } from "./stuck.js"; -import type { UIElement, ScreenState, ActionDecision } from "@droidclaw/shared"; - -export interface AgentLoopOptions { - deviceId: string; - userId: string; - goal: string; - llmConfig: LLMConfig; - maxSteps?: number; - onStep?: (step: AgentStep) => void; - onComplete?: (result: AgentResult) => void; -} - -export interface AgentStep { - stepNumber: number; - action: ActionDecision; - reasoning: string; - screenHash: string; -} - -export interface AgentResult { - success: boolean; - stepsUsed: number; - sessionId: string; -} - -function computeScreenHash(elements: UIElement[]): string { - const parts = elements.map( - (e) => `${e.id}|${e.text}|${e.center[0]},${e.center[1]}|${e.enabled}|${e.checked}` - ); - return parts.join(";"); -} - -function actionToCommand(action: ActionDecision): object { - switch (action.action) { - case "tap": - return { type: "tap", x: action.coordinates?.[0], y: action.coordinates?.[1] }; - case "type": - return { type: "type", text: action.text }; - case "enter": - return { type: "enter" }; - case "back": - return { type: "back" }; - case "home": - return { type: "home" }; - case "swipe": - case "scroll": - return { type: "swipe", x1: action.coordinates?.[0], y1: action.coordinates?.[1], x2: 540, y2: 400 }; - case "longpress": - return { type: "longpress", x: action.coordinates?.[0], y: action.coordinates?.[1] }; - case "launch": - return { type: "launch", packageName: action.package }; - case "clear": - return { type: "clear" }; - case "clipboard_set": - return { type: "clipboard_set", text: action.text }; - case "clipboard_get": - return { type: "clipboard_get" }; - case 
"paste": - return { type: "paste" }; - case "open_url": - return { type: "open_url", url: action.url }; - case "switch_app": - return { type: "switch_app", packageName: action.package }; - case "notifications": - return { type: "notifications" }; - case "keyevent": - return { type: "keyevent", code: action.code }; - case "open_settings": - return { type: "open_settings" }; - case "wait": - return { type: "wait", duration: 2000 }; - case "done": - return { type: "done" }; - default: - return { type: action.action }; - } -} - -export async function runAgentLoop(options: AgentLoopOptions): Promise { - const { - deviceId, - userId, - goal, - llmConfig, - maxSteps = 30, - onStep, - onComplete, - } = options; - - const sessionId = crypto.randomUUID(); - const llm = getLlmProvider(llmConfig); - const stuck = createStuckDetector(); - let lastScreenHash = ""; - - // notify dashboard - sessions.notifyDashboard(userId, { - type: "goal_started", - sessionId, - goal, - deviceId, - }); - - let stepsUsed = 0; - let success = false; - - try { - for (let step = 0; step < maxSteps; step++) { - stepsUsed = step + 1; - - // 1. Get screen state from device - const screenResponse = (await sessions.sendCommand(deviceId, { - type: "get_screen", - })) as ScreenState & { type: string; requestId: string }; - - const elements = screenResponse.elements ?? []; - const screenHash = computeScreenHash(elements); - const screenshot = screenResponse.screenshot; - - // 2. Build prompt - let userPrompt = `GOAL: ${goal}\n\nSTEP: ${step + 1}/${maxSteps}\n\n`; - userPrompt += `SCREEN ELEMENTS:\n${JSON.stringify(elements, null, 2)}\n\n`; - - if (screenHash === lastScreenHash) { - userPrompt += "NOTE: Screen has not changed since last action.\n\n"; - } - - if (stuck.isStuck()) { - userPrompt += stuck.getRecoveryHint() + "\n\n"; - } - - lastScreenHash = screenHash; - - // 3. 
Call LLM - // TODO: use the actual SYSTEM_PROMPT from llm.ts once adapted - const rawResponse = await llm.getAction( - "You are a phone automation agent...", // placeholder - userPrompt, - elements.length < 3 ? screenshot : undefined - ); - - // 4. Parse response - const parsed = parseJsonResponse(rawResponse); - if (!parsed || !parsed.action) { - stuck.recordAction("parse_error", screenHash); - continue; - } - - const action = parsed as unknown as ActionDecision; - stuck.recordAction(action.action, screenHash); - - // 5. Check for "done" - if (action.action === "done") { - success = true; - break; - } - - // 6. Report step to dashboard - const stepData: AgentStep = { - stepNumber: step + 1, - action, - reasoning: action.reason ?? "", - screenHash, - }; - onStep?.(stepData); - sessions.notifyDashboard(userId, { - type: "step", - sessionId, - step: step + 1, - action, - reasoning: action.reason ?? "", - screenHash, - }); - - // 7. Execute action on device - const command = actionToCommand(action); - await sessions.sendCommand(deviceId, command); - - // 8. 
Brief pause between steps - await new Promise((r) => setTimeout(r, 500)); - } - } catch (error) { - console.error(`Agent loop error: ${error}`); - } - - const result: AgentResult = { success, stepsUsed, sessionId }; - - // notify dashboard - sessions.notifyDashboard(userId, { - type: "goal_completed", - sessionId, - success, - stepsUsed, - }); - - onComplete?.(result); - return result; -} -``` - -**Step 4: Wire up agent loop in goals route** - -Update `server/src/routes/goals.ts` to start the agent loop: - -```typescript -// Replace the TODO in goals.ts POST handler: - -import { runAgentLoop } from "../agent/loop.js"; -import { db } from "../db.js"; - -// Inside the POST handler, after validation: - -// fetch user's LLM config from DB -// TODO: query llm_config table for this user -// For now, use a hardcoded placeholder config -const llmConfig = { provider: "groq", apiKey: "TODO", model: "TODO" }; - -// NOTE: runAgentLoop generates its own sessionId internally, so this id -// does not match the one carried by dashboard events -const sessionId = crypto.randomUUID(); - -// start agent loop in background (don't await; it runs async) -runAgentLoop({ - deviceId: body.deviceId, - userId: user.id, - goal: body.goal, - llmConfig, -}).catch((err) => console.error("Agent loop failed:", err)); - -return c.json({ - sessionId, - deviceId: body.deviceId, - goal: body.goal, - status: "running", -}); -``` - -**Step 5: Verify typecheck** - -```bash -cd server && bun run typecheck -``` - -**Step 6: Commit** - -```bash -git add server/src/agent/ server/src/routes/goals.ts -git commit -m "feat: add agent loop with LLM integration and stuck detection" -``` - ---- - -## Task 7: Switch SvelteKit to Node Adapter - -**Files:** -- Modify: `web/package.json` (swap adapter) -- Modify: `web/svelte.config.js` (use node adapter) - -**Step 1: Install node adapter, remove cloudflare adapter** - -```bash -cd web && bun remove @sveltejs/adapter-cloudflare && bun add -D @sveltejs/adapter-node -``` - -**Step 2: Update svelte.config.js** - -```javascript -import adapter from '@sveltejs/adapter-node'; -import { vitePreprocess } from 
'@sveltejs/vite-plugin-svelte'; - -/** @type {import('@sveltejs/kit').Config} */ -const config = { - preprocess: vitePreprocess(), - kit: { - experimental: { - remoteFunctions: true - }, - adapter: adapter(), - alias: { - '@/*': './src/lib/*' - } - }, - compilerOptions: { - experimental: { - async: true - } - } -}; - -export default config; -``` - -**Step 3: Verify build** - -```bash -cd web && bun run build -``` - -**Step 4: Commit** - -```bash -git add web/package.json web/svelte.config.js web/bun.lock -git commit -m "feat: switch SvelteKit from Cloudflare to node adapter" -``` - ---- - -## Task 8: Dashboard Layout & Navigation - -**Files:** -- Modify: `web/src/routes/+layout.svelte` (add nav) -- Create: `web/src/routes/+layout.server.ts` (load session) -- Modify: `web/src/routes/+page.svelte` (redirect logic) -- Create: `web/src/routes/dashboard/+layout.svelte` (dashboard shell) -- Create: `web/src/routes/dashboard/+layout.server.ts` (auth guard) -- Create: `web/src/routes/dashboard/+page.svelte` (overview) - -**Step 1: Create root layout.server.ts** - -```typescript -// web/src/routes/+layout.server.ts - -import type { LayoutServerLoad } from './$types'; - -export const load: LayoutServerLoad = async ({ locals }) => { - return { - user: locals.user ?? null - }; -}; -``` - -**Step 2: Update root +page.svelte (redirect)** - -```svelte - - -``` - -**Step 3: Create dashboard layout.server.ts (auth guard)** - -```typescript -// web/src/routes/dashboard/+layout.server.ts - -import { redirect } from '@sveltejs/kit'; -import type { LayoutServerLoad } from './$types'; - -export const load: LayoutServerLoad = async ({ locals }) => { - if (!locals.user) { - redirect(307, '/login'); - } - - return { - user: locals.user - }; -}; -``` - -**Step 4: Create dashboard +layout.svelte** - -```svelte - - - -
- - -
- {@render children?.()} -
-
-``` - -**Step 5: Create dashboard overview page** - -```svelte - - - -

Dashboard

-

Welcome back, {data.user.name}.

- - -``` - -**Step 6: Verify dev server** - -```bash -cd web && bun run dev -``` - -Navigate to `http://localhost:5173` — should redirect to `/login` or `/dashboard`. - -**Step 7: Commit** - -```bash -git add web/src/routes/ -git commit -m "feat: add dashboard layout with navigation and auth guard" -``` - ---- - -## Task 9: API Keys Page - -**Files:** -- Create: `web/src/lib/api/api-keys.remote.ts` -- Create: `web/src/lib/schema/api-keys.ts` -- Create: `web/src/routes/dashboard/api-keys/+page.svelte` - -**Step 1: Create Valibot schema** - -```typescript -// web/src/lib/schema/api-keys.ts - -import { object, string, pipe, minLength } from 'valibot'; - -export const createKeySchema = object({ - name: pipe(string(), minLength(1)) -}); -``` - -**Step 2: Create remote functions** - -```typescript -// web/src/lib/api/api-keys.remote.ts - -import { form, query, getRequestEvent } from '$app/server'; -import { auth } from '$lib/server/auth'; -import { createKeySchema } from '$lib/schema/api-keys'; - -export const listKeys = query(async () => { - const { locals } = getRequestEvent(); - if (!locals.user) return []; - - const keys = await auth.api.listApiKeys({ - headers: getRequestEvent().request.headers - }); - - return keys ?? []; -}); - -export const createKey = form(createKeySchema, async (data) => { - const { request } = getRequestEvent(); - - const result = await auth.api.createApiKey({ - body: { - name: data.name, - prefix: 'dc', - expiresIn: undefined, // no expiry by default - remaining: undefined // unlimited - }, - headers: request.headers - }); - - return result; -}); - -export const deleteKey = form(async (formData: FormData) => { - const { request } = getRequestEvent(); - const keyId = formData.get('keyId') as string; - - await auth.api.deleteApiKey({ - body: { keyId }, - headers: request.headers - }); -}); -``` - -**Step 3: Create API Keys page** - -```svelte - - - -

API Keys

- -
-

Create New Key

-
{ - // capture the returned key value after submission - }} - class="flex gap-3" - > - - {#each createKey.fields.name.issues() ?? [] as issue (issue.message)} -

{issue.message}

- {/each} - -
- - {#if newKeyValue} -
-

Copy your API key now. It won't be shown again.

- {newKeyValue} -
- {/if} -
- -
-

Your Keys

- {#if keys.length === 0} -

No API keys yet. Create one to connect your Android device.

- {:else} -
- {#each keys as key (key.id)} -
-
-

{key.name ?? 'Unnamed Key'}

-

- {key.prefix}_{'*'.repeat(20)} · Created {new Date(key.createdAt).toLocaleDateString()} -

-
-
- - -
-
- {/each} -
- {/if} -
-``` - -> **Note for implementer:** The `createKey` remote function returns the full key value only on creation. The `listKeys` response only shows the prefix (hashed key). The page needs to capture the creation response to show the full key once. The exact mechanism depends on how remote functions return data in Svelte 5 async mode — check the existing `auth.remote.ts` pattern and adapt. You may need to use `$effect` or a callback to capture the created key value. - -**Step 4: Verify dev server** - -```bash -cd web && bun run dev -``` - -Navigate to `/dashboard/api-keys`. - -**Step 5: Commit** - -```bash -git add web/src/lib/api/api-keys.remote.ts web/src/lib/schema/api-keys.ts web/src/routes/dashboard/api-keys/ -git commit -m "feat: add API keys management page" -``` - ---- - -## Task 10: Settings Page (LLM Config) - -**Files:** -- Create: `web/src/lib/api/settings.remote.ts` -- Create: `web/src/lib/schema/settings.ts` -- Create: `web/src/routes/dashboard/settings/+page.svelte` - -**Step 1: Create Valibot schema** - -```typescript -// web/src/lib/schema/settings.ts - -import { object, string, pipe, minLength, optional } from 'valibot'; - -export const llmConfigSchema = object({ - provider: pipe(string(), minLength(1)), - apiKey: pipe(string(), minLength(1)), - model: optional(string()) -}); -``` - -**Step 2: Create remote functions** - -```typescript -// web/src/lib/api/settings.remote.ts - -import { form, query, getRequestEvent } from '$app/server'; -import { db } from '$lib/server/db'; -import { llmConfig } from '$lib/server/db/schema'; -import { eq } from 'drizzle-orm'; -import { llmConfigSchema } from '$lib/schema/settings'; - -export const getConfig = query(async () => { - const { locals } = getRequestEvent(); - if (!locals.user) return null; - - const config = await db - .select() - .from(llmConfig) - .where(eq(llmConfig.userId, locals.user.id)) - .limit(1); - - if (config.length === 0) return null; - - // mask the API key for display - return { - 
...config[0], - apiKey: config[0].apiKey.slice(0, 8) + '...' + config[0].apiKey.slice(-4) - }; -}); - -export const updateConfig = form(llmConfigSchema, async (data) => { - const { locals } = getRequestEvent(); - if (!locals.user) return; - - const existing = await db - .select() - .from(llmConfig) - .where(eq(llmConfig.userId, locals.user.id)) - .limit(1); - - if (existing.length > 0) { - await db - .update(llmConfig) - .set({ - provider: data.provider, - apiKey: data.apiKey, - model: data.model ?? null - }) - .where(eq(llmConfig.userId, locals.user.id)); - } else { - await db.insert(llmConfig).values({ - id: crypto.randomUUID(), - userId: locals.user.id, - provider: data.provider, - apiKey: data.apiKey, - model: data.model ?? null - }); - } -}); -``` - -**Step 3: Create Settings page** - -```svelte - - - -

Settings

- -
-

LLM Provider

- -
- - - - - - - -
- - {#if config} -

- Current: {config.provider} · Key: {config.apiKey} - {#if config.model}· Model: {config.model}{/if} -

- {/if} -
-``` - -**Step 4: Verify dev server** - -```bash -cd web && bun run dev -``` - -Navigate to `/dashboard/settings`. - -**Step 5: Commit** - -```bash -git add web/src/lib/api/settings.remote.ts web/src/lib/schema/settings.ts web/src/routes/dashboard/settings/ -git commit -m "feat: add LLM provider settings page" -``` - ---- - -## Task 11: Devices Page - -**Files:** -- Create: `web/src/lib/api/devices.remote.ts` -- Create: `web/src/routes/dashboard/devices/+page.svelte` -- Create: `web/src/routes/dashboard/devices/[deviceId]/+page.svelte` - -**Step 1: Create remote functions** - -```typescript -// web/src/lib/api/devices.remote.ts - -import { query, getRequestEvent } from '$app/server'; -import { env } from '$env/dynamic/private'; - -const SERVER_URL = env.SERVER_URL || 'http://localhost:8080'; - -export const listDevices = query(async () => { - const { request } = getRequestEvent(); - - const res = await fetch(`${SERVER_URL}/devices`, { - headers: { - cookie: request.headers.get('cookie') ?? '' - } - }); - - if (!res.ok) return []; - return res.json(); -}); -``` - -> **Note for implementer:** The dashboard calls the Hono server's `/devices` endpoint. In production on Railway, `SERVER_URL` points to the Hono server's internal URL. The session cookie is forwarded so Hono can verify the user. You may need to adjust the Hono session middleware to accept forwarded cookies from SvelteKit. - -**Step 2: Create devices list page** - -```svelte - - - -

Devices

- -{#if devices.length === 0} -
-

No devices connected.

-

- Install the Android app, paste your API key, and your device will appear here. -

- - Create an API key - -
-{:else} -
- {#each devices as device (device.deviceId)} - -
-

{device.name}

-

- Connected {new Date(device.connectedAt).toLocaleString()} -

-
- -
- {/each} -
-{/if} -``` - -**Step 3: Create device detail page (goal input + live logs)** - -```svelte - - - -

Device: {deviceId.slice(0, 8)}...

- -
-
- -
- - -
-
- - {#if steps.length > 0} -

Steps

-
- {#each steps as step (step.step)} -
-

Step {step.step}: {step.action}

-

{step.reasoning}

-
- {/each} -
- {/if} - - {#if status === 'completed'} -

Goal completed successfully.

- {:else if status === 'failed'} -

Goal failed.

- {/if} -
-``` - -> **Note for implementer:** The device detail page needs a SvelteKit API route (`/api/goals`) that proxies to the Hono server, or it can call the Hono server directly via `PUBLIC_SERVER_URL`. The live step stream requires a WebSocket connection from the browser to Hono's `/ws/dashboard` endpoint. Implement the WebSocket connection in a `$effect` block that connects when the page mounts and disconnects on unmount. - -**Step 4: Add `SERVER_URL` and `PUBLIC_SERVER_WS_URL` to web/.env.example** - -Append to `web/.env.example`: - -``` -SERVER_URL="http://localhost:8080" -PUBLIC_SERVER_WS_URL="ws://localhost:8080" -``` - -**Step 5: Commit** - -```bash -git add web/src/lib/api/devices.remote.ts web/src/routes/dashboard/devices/ web/.env.example -git commit -m "feat: add devices page with goal input and step log" -``` - ---- - -## Task 12: Wire Up Goal Proxy API Route - -**Files:** -- Create: `web/src/routes/api/goals/+server.ts` - -The device detail page needs to POST goals to the Hono server. Create a SvelteKit API route that proxies the request. - -**Step 1: Create the API route** - -```typescript -// web/src/routes/api/goals/+server.ts - -import { json, error } from '@sveltejs/kit'; -import { env } from '$env/dynamic/private'; -import type { RequestHandler } from './$types'; - -const SERVER_URL = env.SERVER_URL || 'http://localhost:8080'; - -export const POST: RequestHandler = async ({ request, locals }) => { - if (!locals.user) { - return error(401, 'Unauthorized'); - } - - const body = await request.json(); - - const res = await fetch(`${SERVER_URL}/goals`, { - method: 'POST', - headers: { - 'Content-Type': 'application/json', - cookie: request.headers.get('cookie') ?? '' - }, - body: JSON.stringify(body) - }); - - if (!res.ok) { - const err = await res.json().catch(() => ({ error: 'Unknown error' })); - return error(res.status, err.error ?? 
'Failed to submit goal'); - } - - return json(await res.json()); -}; -``` - -**Step 2: Commit** - -```bash -git add web/src/routes/api/goals/ -git commit -m "feat: add goal proxy API route" -``` - ---- - -## Task 13: Environment & Dockerfiles - -**Files:** -- Create: `web/Dockerfile` -- Modify: `server/.env.example` (finalize) -- Modify: `web/.env.example` (finalize) - -**Step 1: Create web Dockerfile** - -```dockerfile -FROM oven/bun:1 AS builder - -WORKDIR /app -COPY web/package.json web/bun.lock ./ -RUN bun install --frozen-lockfile - -COPY web/ . -RUN bun run build - -FROM oven/bun:1 - -WORKDIR /app -COPY --from=builder /app/build ./build -COPY --from=builder /app/package.json . -COPY --from=builder /app/node_modules ./node_modules - -EXPOSE 3000 -ENV PORT=3000 -CMD ["bun", "build/index.js"] -``` - -**Step 2: Finalize server .env.example** - -``` -DATABASE_URL="postgres://user:password@host:port/db-name" -PORT=8080 -CORS_ORIGIN="http://localhost:5173" -``` - -**Step 3: Finalize web .env.example** - -``` -DATABASE_URL="postgres://user:password@host:port/db-name" -SERVER_URL="http://localhost:8080" -PUBLIC_SERVER_WS_URL="ws://localhost:8080" -``` - -**Step 4: Commit** - -```bash -git add web/Dockerfile server/.env.example web/.env.example -git commit -m "feat: add Dockerfiles and finalize env examples" -``` - ---- - -## Task 14: Integration Smoke Test - -**No new files. Manual verification.** - -**Step 1: Start Postgres (local or Railway)** - -Ensure `DATABASE_URL` is set in both `web/.env` and `server/.env`. - -**Step 2: Run migrations** - -```bash -cd web && bun run db:push -``` - -**Step 3: Start both servers** - -```bash -# terminal 1 -cd web && bun run dev - -# terminal 2 -cd server && bun run dev -``` - -**Step 4: Manual test flow** - -1. Open `http://localhost:5173` -> redirects to `/login` -2. Sign up with email/password -3. Redirected to `/dashboard` -4. Go to `/dashboard/api-keys` -> create a key named "Test Device" -5. Copy the key -6. 
Go to `/dashboard/settings` -> set provider to "groq", enter API key, model -7. Go to `/dashboard/devices` -> shows "No devices connected" -8. Test Hono health: `curl http://localhost:8080/health` -> `{"status":"ok","connectedDevices":0}` -9. Test WebSocket auth (using wscat or similar): - ```bash - wscat -c ws://localhost:8080/ws/device - # send: {"type":"auth","apiKey":"dc_your_key_here"} - # expect: {"type":"auth_ok","deviceId":"..."} - ``` -10. Dashboard devices page should now show the connected device - -**Step 5: Commit any fixes from smoke test** - -```bash -git add -A && git commit -m "fix: address issues found during integration smoke test" -``` - ---- - -## Future: Android App (Task 15+) - -Not building now. The server is ready for Android connections via: -- `ws://server:8080/ws/device` — WebSocket endpoint -- API key auth on handshake -- Full command protocol defined in `@droidclaw/shared` - -When building the Android app, follow the structure in `OPTION1-IMPLEMENTATION.md` and the `android/` plan in the design doc. The Kotlin data classes should mirror `@droidclaw/shared` types. 
diff --git a/docs/plans/2026-02-17-option1-web-backend-design.md b/docs/plans/2026-02-17-option1-web-backend-design.md deleted file mode 100644 index 32636ea..0000000 --- a/docs/plans/2026-02-17-option1-web-backend-design.md +++ /dev/null @@ -1,357 +0,0 @@ -# Option 1: Web Dashboard + Backend Design - -> Date: 2026-02-17 -> Status: Approved -> Scope: Web (SvelteKit) + Backend (Hono.js) + Android app plan - ---- - -## Decisions - -- **Monorepo**: `web/` (SvelteKit dashboard) + `server/` (Hono.js backend) + `android/` (future) -- **Separate Hono server** for WebSocket + agent loop (independent lifecycle from dashboard) -- **SvelteKit** with node adapter for dashboard (deploy to Railway) -- **Multiple API keys** per user with labels (Better Auth apiKey plugin) -- **LLM config on dashboard only** (BYOK -- user provides their own API keys) -- **Goals sent from both** web dashboard and Android app -- **Dashboard v1**: API keys, LLM config, connected devices, goal input, step logs -- **Server runs the agent loop** (phone is eyes + hands) -- **Shared Postgres** on Railway (both services connect to same DB) -- **Build order**: web + server first, Android later - ---- - -## Monorepo Structure - -``` -droidclaw/ -├── src/ # existing CLI agent (kernel.ts, actions.ts, etc.) -├── web/ # SvelteKit dashboard (existing, extend) -├── server/ # Hono.js backend (WebSocket + agent loop) -├── android/ # Kotlin companion app (future) -├── packages/shared/ # shared TypeScript types -├── package.json # root -└── CLAUDE.md -``` - ---- - -## Auth & API Key System - -Both apps share the same Postgres DB and the same Better Auth tables. - -SvelteKit handles user-facing auth (login, signup, sessions). Hono verifies API keys from Android devices. - -### Better Auth Config - -Both apps use Better Auth with the `apiKey` plugin. SvelteKit adds `sveltekitCookies`, Hono adds session middleware. 
- -```typescript -// shared pattern -plugins: [ - apiKey() // built-in API key plugin -] -``` - -### Flow - -1. User signs up/logs in on SvelteKit dashboard (existing) -2. Dashboard "API Keys" page -- user creates keys with labels (e.g., "Pixel 8", "Work Phone") -3. Better Auth's apiKey plugin handles create/list/delete -4. User copies key, pastes into Android app SharedPreferences -5. Android app connects to Hono server via WebSocket, sends API key in handshake -6. Hono calls `auth.api.verifyApiKey({ body: { key } })` -- if valid, establishes device session -7. Dashboard WebSocket connections use session cookies (user already logged in) - -### Database Schema - -Better Auth manages: `user`, `session`, `account`, `verification`, `api_key` - -Additional tables (Drizzle): - -``` -llm_config - - id: text PK - - userId: text FK -> user.id - - provider: text (openai | groq | ollama | bedrock | openrouter) - - apiKey: text (encrypted) - - model: text - - createdAt: timestamp - - updatedAt: timestamp - -device - - id: text PK - - userId: text FK -> user.id - - name: text - - lastSeen: timestamp - - status: text (online | offline) - - deviceInfo: jsonb (model, androidVersion, screenWidth, screenHeight) - - createdAt: timestamp - -agent_session - - id: text PK - - userId: text FK -> user.id - - deviceId: text FK -> device.id - - goal: text - - status: text (running | completed | failed | cancelled) - - stepsUsed: integer - - startedAt: timestamp - - completedAt: timestamp - -agent_step - - id: text PK - - sessionId: text FK -> agent_session.id - - stepNumber: integer - - screenHash: text - - action: jsonb - - reasoning: text - - result: text - - timestamp: timestamp -``` - ---- - -## Hono Server Architecture (`server/`) - -``` -server/ -├── src/ -│ ├── index.ts # Hono app + Bun.serve with WebSocket upgrade -│ ├── auth.ts # Better Auth instance (same DB, apiKey plugin) -│ ├── middleware/ -│ │ ├── auth.ts # Session middleware (dashboard WebSocket) -│ │ └── api-key.ts # API 
key verification (Android WebSocket) -│ ├── ws/ -│ │ ├── device.ts # WebSocket handler for Android devices -│ │ ├── dashboard.ts # WebSocket handler for web dashboard (live logs) -│ │ └── sessions.ts # In-memory session manager (connected devices + active loops) -│ ├── agent/ -│ │ ├── loop.ts # Agent loop (adapted from kernel.ts) -│ │ ├── llm.ts # LLM provider factory (adapted from llm-providers.ts) -│ │ ├── stuck.ts # Stuck-loop detection -│ │ └── skills.ts # Multi-step skills (adapted from skills.ts) -│ ├── routes/ -│ │ ├── devices.ts # GET /devices -│ │ ├── goals.ts # POST /goals -│ │ └── health.ts # GET /health -│ ├── db.ts # Drizzle instance (same Postgres) -│ └── env.ts # Environment config -├── package.json -├── tsconfig.json -└── Dockerfile -``` - -### Key Design Points - -1. **Bun.serve() with WebSocket upgrade** -- Hono handles HTTP, Bun native WebSocket handles upgrades. No extra WS library. - -2. **Two WebSocket paths:** - - `/ws/device` -- Android app connects with API key - - `/ws/dashboard` -- Web dashboard connects with session cookie - -3. **sessions.ts** -- In-memory map tracking connected devices, active agent loops, dashboard subscribers. - -4. **Agent loop (loop.ts)** -- Adapted from kernel.ts. Same perception/reasoning/action cycle. Sends WebSocket commands instead of ADB calls. - -5. **Goal submission:** - - Dashboard: POST /goals -> starts agent loop -> streams steps via dashboard WebSocket - - Android: device sends `{ type: "goal", text: "..." }` -> same agent loop - ---- - -## SvelteKit Dashboard (`web/`) - -Follows existing patterns: remote functions (`$app/server` form/query), Svelte 5 runes, Tailwind v4, Valibot schemas. 
- -### Route Structure - -``` -web/src/routes/ -├── +layout.svelte # add nav bar -├── +layout.server.ts # load session for all pages -├── +page.svelte # redirect: logged in -> /dashboard, else -> /login -├── login/+page.svelte # existing -├── signup/+page.svelte # existing -├── dashboard/ -│ ├── +layout.svelte # dashboard shell (sidebar nav) -│ ├── +page.svelte # overview: connected devices, quick goal input -│ ├── api-keys/ -│ │ └── +page.svelte # list keys, create with label, copy, delete -│ ├── settings/ -│ │ └── +page.svelte # LLM provider config (provider, API key, model) -│ └── devices/ -│ ├── +page.svelte # list connected devices with status -│ └── [deviceId]/ -│ └── +page.svelte # device detail: send goal, live step log -``` - -### Remote Functions - -``` -web/src/lib/api/ -├── auth.remote.ts # existing (signup, login, signout, getUser) -├── api-keys.remote.ts # createKey, listKeys, deleteKey (Better Auth client) -├── settings.remote.ts # getConfig, updateConfig (LLM provider/key) -├── devices.remote.ts # listDevices (queries Hono server) -└── goals.remote.ts # submitGoal (POST to Hono server) -``` - -Dashboard WebSocket for live step logs connects directly to Hono server from the browser (not through SvelteKit). 
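Because the browser talks to the Hono server directly, the page itself has to fold incoming messages into UI state. The following is a minimal sketch of that step, assuming the Server -> Dashboard message shapes from the WebSocket Protocol section of this document. `DashboardMessage`, `DashboardState`, `initialState`, and `reduceDashboard` are hypothetical names chosen for illustration; they are not part of the planned code.

```typescript
// Message shapes copied from the "Server -> Dashboard" protocol examples.
// `action` is left as `unknown` because its exact type is not shown here.
type DashboardMessage =
  | { type: "device_online"; deviceId: string; name: string }
  | { type: "device_offline"; deviceId: string }
  | { type: "goal_started"; sessionId: string; goal: string; deviceId: string }
  | { type: "step"; sessionId: string; step: number; action: unknown; reasoning: string; screenHash: string }
  | { type: "goal_completed"; sessionId: string; success: boolean; stepsUsed: number };

// Hypothetical UI state for the device detail page.
interface DashboardState {
  onlineDevices: Record<string, string>; // deviceId -> display name
  status: "idle" | "running" | "completed" | "failed";
  steps: { step: number; reasoning: string }[];
}

const initialState: DashboardState = { onlineDevices: {}, status: "idle", steps: [] };

// Pure reducer: returns a new state for each message and never mutates the input.
function reduceDashboard(state: DashboardState, msg: DashboardMessage): DashboardState {
  switch (msg.type) {
    case "device_online":
      return { ...state, onlineDevices: { ...state.onlineDevices, [msg.deviceId]: msg.name } };
    case "device_offline": {
      const remaining = { ...state.onlineDevices };
      delete remaining[msg.deviceId];
      return { ...state, onlineDevices: remaining };
    }
    case "goal_started":
      // A new goal resets the step log.
      return { ...state, status: "running", steps: [] };
    case "step":
      return { ...state, steps: [...state.steps, { step: msg.step, reasoning: msg.reasoning }] };
    case "goal_completed":
      return { ...state, status: msg.success ? "completed" : "failed" };
  }
}
```

In a page component, the WebSocket `onmessage` handler could parse each frame with `JSON.parse` and pass it through a reducer like this one. Keeping the reducer pure makes it testable without a live socket.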
- ---- - -## WebSocket Protocol - -### Device -> Server (Android app sends) - -```json -// Handshake -{ "type": "auth", "apiKey": "dc_xxxxx" } - -// Screen tree response -{ "type": "screen", "requestId": "uuid", "elements": [], "screenshot": "base64?", "packageName": "com.app" } - -// Action result -{ "type": "result", "requestId": "uuid", "success": true, "error": null, "data": null } - -// Goal from phone -{ "type": "goal", "text": "open youtube and search lofi" } - -// Heartbeat -{ "type": "pong" } -``` - -### Server -> Device (Hono sends) - -```json -// Auth -{ "type": "auth_ok", "deviceId": "uuid" } -{ "type": "auth_error", "message": "invalid key" } - -// Commands (all 22 actions) -{ "type": "get_screen", "requestId": "uuid" } -{ "type": "tap", "requestId": "uuid", "x": 540, "y": 1200 } -{ "type": "type", "requestId": "uuid", "text": "lofi beats" } -{ "type": "swipe", "requestId": "uuid", "x1": 540, "y1": 1600, "x2": 540, "y2": 400 } -{ "type": "enter", "requestId": "uuid" } -{ "type": "back", "requestId": "uuid" } -{ "type": "home", "requestId": "uuid" } -{ "type": "launch", "requestId": "uuid", "packageName": "com.google.android.youtube" } -// ... remaining actions follow same pattern - -// Heartbeat -{ "type": "ping" } - -// Goal lifecycle -{ "type": "goal_started", "sessionId": "uuid", "goal": "..." } -{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 } -``` - -### Server -> Dashboard (live step stream) - -```json -// Device status -{ "type": "device_online", "deviceId": "uuid", "name": "Pixel 8" } -{ "type": "device_offline", "deviceId": "uuid" } - -// Step stream -{ "type": "step", "sessionId": "uuid", "step": 3, "action": {}, "reasoning": "...", "screenHash": "..." 
} -{ "type": "goal_started", "sessionId": "uuid", "goal": "...", "deviceId": "uuid" } -{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 } -``` - ---- - -## Shared Types (`packages/shared/`) - -``` -packages/shared/ -├── src/ -│ ├── types.ts # UIElement, Bounds, Point -│ ├── commands.ts # Command, CommandResult type unions -│ ├── actions.ts # ActionDecision type (all 22 actions) -│ └── protocol.ts # WebSocket message types -├── package.json # name: "@droidclaw/shared" -└── tsconfig.json -``` - -Replaces duplicated types across src/, server/, web/. Android app mirrors in Kotlin via @Serializable data classes. - ---- - -## Android App (future, plan only) - -``` -android/ -├── app/src/main/kotlin/ai/droidclaw/companion/ -│ ├── DroidClawApp.kt -│ ├── MainActivity.kt # API key input, setup checklist, status -│ ├── accessibility/ -│ │ ├── DroidClawAccessibilityService.kt -│ │ ├── ScreenTreeBuilder.kt -│ │ └── GestureExecutor.kt -│ ├── capture/ -│ │ └── ScreenCaptureService.kt -│ ├── connection/ -│ │ ├── ConnectionService.kt # Foreground service -│ │ ├── ReliableWebSocket.kt # Reconnect, heartbeat, message queue -│ │ └── CommandRouter.kt -│ └── model/ -│ ├── UIElement.kt # Mirrors @droidclaw/shared types -│ ├── Command.kt -│ └── DeviceInfo.kt -├── build.gradle.kts -└── AndroidManifest.xml -``` - -Follows OPTION1-IMPLEMENTATION.md structure. Not building now, but server protocol is designed for it. - ---- - -## Deployment (Railway) - -| Service | Source | Port | Notes | -|---|---|---|---| -| web | `web/` | 3000 | SvelteKit + node adapter | -| server | `server/` | 8080 | Hono + Bun.serve | -| postgres | Railway managed | 5432 | Shared by both services | - -Both services get the same `DATABASE_URL`. Web calls Hono via Railway internal networking for REST. Browser connects directly to Hono's public URL for WebSocket. 
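The split between internal REST calls and the public WebSocket URL can be captured in one small helper. This is a sketch under stated assumptions: the variable names `SERVER_URL` and `PUBLIC_SERVER_WS_URL` and their localhost defaults come from the `.env.example` entries in the implementation plan, while `ServerEndpoints` and `resolveServerEndpoints` are hypothetical names introduced only for this example.

```typescript
// Hypothetical helper: derives the two base URLs the dashboard needs.
interface ServerEndpoints {
  restBase: string;    // used server-side by SvelteKit (internal networking)
  dashboardWs: string; // used by the browser (public URL)
}

function resolveServerEndpoints(env: Record<string, string | undefined>): ServerEndpoints {
  // Drop trailing slashes so path joining never produces "//".
  const strip = (url: string): string => url.replace(/\/+$/, "");

  // Defaults mirror the local-development values in .env.example.
  const restBase = strip(env["SERVER_URL"] || "http://localhost:8080");
  const wsBase = strip(env["PUBLIC_SERVER_WS_URL"] || "ws://localhost:8080");

  return { restBase, dashboardWs: `${wsBase}/ws/dashboard` };
}
```

On Railway, `SERVER_URL` would presumably hold the internal service address and `PUBLIC_SERVER_WS_URL` the public `wss://` address. The exact values depend on the deployment and are not specified here.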
- ---- - -## Data Flow - -``` -USER (browser) HONO SERVER PHONE (Android app) - | | | - | signs in (SvelteKit) | | - | creates API key | | - | | | - | | { type: "auth", key: "dc_xxx" } - | |<------------------------------| - | | { type: "auth_ok" } | - | |------------------------------>| - | | | - | POST /goals | | - | "open youtube, search lofi" | | - |------------------------------>| | - | | { type: "get_screen" } | - | |------------------------------>| - | | | - | | { type: "screen", elements } | - | |<------------------------------| - | | | - | | LLM: "launch youtube" | - | | | - | { type: "step", action } | { type: "launch", pkg } | - |<------------------------------|------------------------------>| - | | | - | | { success: true } | - | |<------------------------------| - | | | - | ... repeat until done ... | | - | | | - | { type: "goal_completed" } | { type: "goal_completed" } | - |<------------------------------|------------------------------>| -```
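The sequence in the diagram can be traced with a small in-memory simulation. The sketch below is an illustration only and is not the planned `loop.ts`: `FakeDevice`, `decide`, and `runGoal` are hypothetical names, the device answers synchronously instead of over a WebSocket, and the "LLM" is a hard-coded function. Only the message `type` strings and field names are taken from the protocol section.

```typescript
// Simplified message shapes (assumption: a two-command subset of the protocol).
type Command =
  | { type: "get_screen"; requestId: string }
  | { type: "launch"; requestId: string; packageName: string };

type DeviceReply =
  | { type: "screen"; requestId: string; elements: string[]; packageName: string }
  | { type: "result"; requestId: string; success: boolean };

type DashboardEvent =
  | { type: "step"; sessionId: string; step: number; action: Command }
  | { type: "goal_completed"; sessionId: string; success: boolean; stepsUsed: number };

// Stand-in for the phone. The real device replies asynchronously over /ws/device.
class FakeDevice {
  private foreground = "com.android.launcher";

  handle(cmd: Command): DeviceReply {
    if (cmd.type === "get_screen") {
      return { type: "screen", requestId: cmd.requestId, elements: [], packageName: this.foreground };
    }
    this.foreground = cmd.packageName; // "launch" brings the app to the front
    return { type: "result", requestId: cmd.requestId, success: true };
  }
}

// Stand-in for the LLM: launch the target app unless it is already in front.
function decide(currentPackage: string, targetPackage: string, requestId: string): Command | null {
  if (currentPackage === targetPackage) return null; // goal reached
  return { type: "launch", requestId, packageName: targetPackage };
}

// Perceive -> decide -> act, collecting the events the dashboard would receive.
function runGoal(device: FakeDevice, targetPackage: string, maxSteps = 5): DashboardEvent[] {
  const sessionId = "session-1"; // fixed id keeps the example deterministic
  const events: DashboardEvent[] = [];
  let stepsUsed = 0;
  let success = false;

  while (stepsUsed < maxSteps) {
    const screen = device.handle({ type: "get_screen", requestId: `req-${stepsUsed}-screen` });
    if (screen.type !== "screen") break;

    const action = decide(screen.packageName, targetPackage, `req-${stepsUsed}-act`);
    if (action === null) {
      success = true;
      break;
    }

    stepsUsed += 1;
    events.push({ type: "step", sessionId, step: stepsUsed, action });

    const result = device.handle(action);
    if (result.type !== "result" || !result.success) break;
  }

  events.push({ type: "goal_completed", sessionId, success, stepsUsed });
  return events;
}
```

Calling `runGoal(new FakeDevice(), "com.google.android.youtube")` yields one `step` event followed by a `goal_completed` event, matching the launch-then-finish path in the diagram. A real loop would also need timeouts, request/response correlation by `requestId`, and the stuck detection described in the plan.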