chore: remove docs/plans from repo and gitignore it

Contains product architecture, roadmaps, and implementation plans that should not be public.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.gitignore
@@ -7,6 +7,7 @@ competitor/
 logs/
 kernel_screenshot.png
 window_dump.xml
+docs/plans/
 docs/architecture-web-flow.md
 docs/INTENT.md
 OPTION1-IMPLEMENTATION.md

@@ -1,397 +0,0 @@
# Android Companion App Design

> DroidClaw Android app: the eyes and hands of the AI agent. Connects to the Hono server via WebSocket, captures accessibility trees and screenshots, executes gestures on command, and supports device-initiated goals.

**Date:** 2026-02-17
**Scope:** Full v1 (all 4 phases)
**Package:** `com.thisux.droidclaw`

---

## Architecture Overview

Three independent layers with clear boundaries:

```
┌──────────────────────────────────────────────┐
│ UI Layer                                     │
│ MainActivity + Compose (Home, Settings, Logs)│
│ Observes StateFlows from services            │
├──────────────────────────────────────────────┤
│ Connection Layer                             │
│ ConnectionService (foreground service)       │
│ ReliableWebSocket (Ktor) + CommandRouter     │
├──────────────────────────────────────────────┤
│ Accessibility Layer                          │
│ DroidClawAccessibilityService (system svc)   │
│ ScreenTreeBuilder + GestureExecutor          │
│ ScreenCaptureManager (MediaProjection)       │
└──────────────────────────────────────────────┘
```

- **Accessibility Layer**: System-managed service. Reads screen trees, executes gestures, captures screenshots. Runs independently of app UI.
- **Connection Layer**: Foreground service with Ktor WebSocket. Bridges accessibility to server. Handles reconnection, heartbeat, message queuing.
- **UI Layer**: Compose with bottom nav. Observes service state via `StateFlow`. Goal input, settings, logs.

---

## Project Structure

```
android/app/src/main/java/com/thisux/droidclaw/
├── DroidClawApp.kt  # Application class (DataStore init)
├── MainActivity.kt  # Compose host + bottom nav
├── accessibility/
│   ├── DroidClawAccessibilityService.kt  # System service, tree capture
│   ├── ScreenTreeBuilder.kt  # NodeInfo → UIElement list
│   └── GestureExecutor.kt  # Node-first actions + dispatchGesture fallback
├── connection/
│   ├── ConnectionService.kt  # Foreground service, Ktor WebSocket
│   ├── ReliableWebSocket.kt  # Reconnect, heartbeat, message queue
│   └── CommandRouter.kt  # Dispatches server commands → GestureExecutor
├── capture/
│   └── ScreenCaptureManager.kt  # MediaProjection screenshots
├── model/
│   ├── UIElement.kt  # Mirrors @droidclaw/shared types
│   ├── Protocol.kt  # WebSocket message types
│   └── AppState.kt  # Connection status, steps, etc.
├── data/
│   └── SettingsStore.kt  # DataStore for API key, server URL
├── ui/
│   ├── screens/
│   │   ├── HomeScreen.kt  # Status + goal input + live log
│   │   ├── SettingsScreen.kt  # API key, server URL, battery opt
│   │   └── LogsScreen.kt  # Step history
│   └── theme/  # Existing Material 3 theme
└── util/
    ├── BatteryOptimization.kt  # OEM-specific exemption helpers
    └── DeviceInfo.kt  # Model, Android version, screen size
```

---

## Dependencies

| Library | Version | Purpose |
|---------|---------|---------|
| `io.ktor:ktor-client-cio` | 3.1.x | HTTP/WebSocket client (coroutine-native) |
| `io.ktor:ktor-client-websockets` | 3.1.x | WebSocket plugin for Ktor |
| `org.jetbrains.kotlinx:kotlinx-serialization-json` | 1.7.x | JSON serialization |
| `org.jetbrains.kotlinx:kotlinx-coroutines-android` | 1.9.x | Coroutines |
| `androidx.datastore:datastore-preferences` | 1.1.x | Persistent settings (API key, server URL) |
| `androidx.lifecycle:lifecycle-service` | 2.8.x | Service lifecycle |
| `androidx.navigation:navigation-compose` | 2.8.x | Bottom nav routing |
| `androidx.compose.material:material-icons-extended` | latest | Nav icons |

---

## Permissions

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_CONNECTED_DEVICE" />
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
<uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />
<uses-permission android:name="android.permission.WAKE_LOCK" />
```

Plus the accessibility service declaration:
```xml
<service
    android:name=".accessibility.DroidClawAccessibilityService"
    android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
    android:exported="false">
    <intent-filter>
        <action android:name="android.accessibilityservice.AccessibilityService" />
    </intent-filter>
    <meta-data
        android:name="android.accessibilityservice"
        android:resource="@xml/accessibility_config" />
</service>
```

---

## Layer 1: Accessibility Service

### DroidClawAccessibilityService

System-managed service. Android starts and stops it when the user toggles it in Settings > Accessibility.

**State exposed via companion StateFlow** (no binding needed):
```kotlin
companion object {
    // Observed by the UI and ConnectionService without binding to the service
    val isRunning = MutableStateFlow(false)
    val lastScreenTree = MutableStateFlow<List<UIElement>>(emptyList())
    // Set in onServiceConnected(), cleared in onInterrupt()/onDestroy()
    var instance: DroidClawAccessibilityService? = null
}
```

**Lifecycle:**
- `onServiceConnected()`: Set `isRunning = true`, store `instance`
- `onAccessibilityEvent()`: Capture events for window changes, content changes
- `onInterrupt()` / `onDestroy()`: Set `isRunning = false`, clear `instance`

### ScreenTreeBuilder

Walks `rootInActiveWindow` depth-first, extracts:
- Bounds (Rect), center coordinates (x, y)
- text, contentDescription, className, viewIdResourceName
- State flags: enabled, checked, focused, scrollable, clickable, longClickable
- Parent context (parent class, parent description)

**Output:** `List<UIElement>` matching `@droidclaw/shared` UIElement type.

**Null handling:** `rootInActiveWindow` returns null during screen transitions. Retry with exponential backoff (50ms, 100ms, 200ms) up to 3 attempts. If still null, return empty list (server uses vision fallback).

**Memory safety:** `AccessibilityNodeInfo` should be recycled on API levels below 33 (from API 33 `recycle()` is deprecated and has no effect). Use extension:
```kotlin
// Runs `block` on the node, then recycles it even if `block` throws.
inline fun <T> AccessibilityNodeInfo.use(block: (AccessibilityNodeInfo) -> T): T {
    try { return block(this) } finally { recycle() }
}
```

**Screen hash:** `computeScreenHash()` -- hash of element IDs + text + centers. Used by server for stuck-loop detection.

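
As a rough illustration of the screen hash, the sketch below fingerprints a screen from element IDs, text, and centers, then flags a stuck loop when the same fingerprint repeats. It is written in TypeScript to line up with the `@droidclaw/shared` types; the `UIElementLike` shape, the FNV-1a hash, and the repeat threshold of 3 are all assumptions for illustration, since the design does not specify them.

```typescript
// Hypothetical element shape; the real UIElement lives in @droidclaw/shared.
interface UIElementLike {
  id: string;
  text: string;
  center: [number, number];
}

// 32-bit FNV-1a over a string (an assumed choice; the design names no algorithm).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Fingerprint built from element IDs + text + centers, in tree order.
function computeScreenHash(elements: UIElementLike[]): string {
  const parts = elements.map(
    (e) => `${e.id}|${e.text}|${e.center[0]},${e.center[1]}`
  );
  return fnv1a(parts.join("\n"));
}

// True when the last `threshold` hashes are all identical.
function isStuck(recentHashes: string[], threshold = 3): boolean {
  if (recentHashes.length < threshold) return false;
  const tail = recentHashes.slice(-threshold);
  return tail.every((h) => h === tail[0]);
}
```

A server-side loop could append each step's hash to a list and call `isStuck` before asking the LLM for the next action.
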

### GestureExecutor

Node-first strategy for all actions:

| Action | Primary (node) | Fallback (gesture) |
|--------|----------------|-------------------|
| tap | `performAction(ACTION_CLICK)` on node at (x,y) | `dispatchGesture()` tap at coordinates |
| type | `performAction(ACTION_SET_TEXT)` on focused node | Character-by-character gesture taps |
| long_press | `performAction(ACTION_LONG_CLICK)` | `dispatchGesture()` hold 1000ms |
| swipe | -- | `dispatchGesture()` path from start→end |
| scroll | `performAction(ACTION_SCROLL_FORWARD/BACKWARD)` on scrollable parent | Swipe gesture |
| back | `performGlobalAction(GLOBAL_ACTION_BACK)` | -- |
| home | `performGlobalAction(GLOBAL_ACTION_HOME)` | -- |
| notifications | `performGlobalAction(GLOBAL_ACTION_NOTIFICATIONS)` | -- |
| launch | `startActivity(packageManager.getLaunchIntentForPackage())` | -- |
| clear | Focus node → select all → delete | -- |
| enter | `performAction(ACTION_IME_ENTER)` or keyevent KEYCODE_ENTER | -- |

**Result reporting:** Each action returns `ActionResult { success: Boolean, error: String? }`.

---

## Layer 2: Connection Service

### ConnectionService

Foreground service with persistent notification.

**Lifecycle:**
1. User taps "Connect" → service starts
2. Reads API key + server URL from DataStore
3. Creates `ReliableWebSocket` and connects
4. Notification shows: "DroidClaw - Connected to server" (or "Reconnecting...")
5. Notification has "Disconnect" action button
6. Service stops when user disconnects or notification action tapped

**State exposed:**
```kotlin
companion object {
    val connectionState = MutableStateFlow<ConnectionState>(ConnectionState.Disconnected)
    val currentSteps = MutableStateFlow<List<AgentStep>>(emptyList())
    val currentGoalStatus = MutableStateFlow<GoalStatus>(GoalStatus.Idle)
    var instance: ConnectionService? = null
}
```

### ReliableWebSocket

Wraps Ktor `WebSocketSession` with reliability:

- **Connect:** `HttpClient { install(WebSockets) }` → `client.webSocket(serverUrl + "/ws/device")`
- **Auth handshake:** First message: `{ type: "auth", apiKey: "dc_xxx", deviceInfo: { model, android, screenWidth, screenHeight } }`
- **Wait for:** `{ type: "auth_ok", deviceId: "uuid" }` or `{ type: "auth_error" }` → close + surface error
- **Heartbeat:** Ktor WebSocket has built-in ping/pong. Configure `pingIntervalMillis = 30_000`
- **Reconnect:** On connection loss, exponential backoff: 1s → 2s → 4s → 8s → max 30s. Reset backoff on successful auth.
- **Message queue:** `Channel<String>(Channel.BUFFERED)` for outbound messages. Drained when connected, buffered when disconnected.
- **State:** Emits `ConnectionState` (Disconnected, Connecting, Connected, Error(message))

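
The reconnect schedule above can be sketched as a small pure function plus a counter that resets on successful auth. This is a TypeScript illustration of the arithmetic only (the Android client would do the same in Kotlin); the names `reconnectDelayMs` and `Backoff` are hypothetical.

```typescript
// Delay before reconnect attempt `attempt` (0-based):
// 1s, 2s, 4s, 8s, 16s, then capped at 30s.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

class Backoff {
  private attempt = 0;

  // Returns the delay for the next attempt and advances the counter.
  next(): number {
    const delay = reconnectDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }

  // Call after the server answers with auth_ok.
  reset(): void {
    this.attempt = 0;
  }
}
```

Resetting only after `auth_ok`, rather than after the socket opens, keeps a server that accepts connections but rejects auth from being hammered at 1s intervals.
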

### CommandRouter

Receives JSON from WebSocket, parses, dispatches:

```
"get_screen"     → ScreenTreeBuilder.capture() → send screen response
"get_screenshot" → ScreenCaptureManager.capture() → compress, base64, send
"execute"        → GestureExecutor.execute(action) → send result response
"ping"           → send { type: "pong" }
"goal_started"   → update UI state to running
"step"           → append to currentSteps, update UI
"goal_completed" → update UI state to completed
"goal_failed"    → update UI state to failed
```

All responses include the `requestId` from the command for server-side Promise resolution.

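
One way the server might pair responses with commands is a map from `requestId` to a pending Promise resolver. The sketch below is a minimal TypeScript illustration under that assumption; `PendingRequests`, its method names, and the counter-based IDs are hypothetical, and a real implementation would likely use UUIDs and add timeouts.

```typescript
// Minimal response shape: every device response carries the originating requestId.
type DeviceResponse = { type: string; requestId: string; [key: string]: unknown };

class PendingRequests {
  private pending = new Map<string, (response: DeviceResponse) => void>();
  private counter = 0;

  // Registers a request; the caller sends `requestId` to the device
  // and awaits `response`.
  create(): { requestId: string; response: Promise<DeviceResponse> } {
    this.counter += 1;
    const requestId = `req-${this.counter}`;
    const response = new Promise<DeviceResponse>((resolve) => {
      this.pending.set(requestId, resolve);
    });
    return { requestId, response };
  }

  // Resolves the matching promise; returns false for unknown or duplicate ids.
  settle(message: DeviceResponse): boolean {
    const resolve = this.pending.get(message.requestId);
    if (!resolve) return false;
    this.pending.delete(message.requestId);
    resolve(message);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

The WebSocket message handler would call `settle` for every `screen`, `screenshot`, and `result` message, so the agent loop can simply `await` the response of each command it sends.
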

---

## Layer 3: Screen Capture

### ScreenCaptureManager

MediaProjection-based screenshot capture.

**Setup:**
1. Request `MediaProjection` via `MediaProjectionManager.createScreenCaptureIntent()`
2. User grants consent (Android system dialog)
3. Create `VirtualDisplay` → `ImageReader` (RGBA_8888)
4. Keep projection alive in ConnectionService scope

**Capture flow:**
1. Server requests screenshot
2. Acquire latest `Image` from `ImageReader`
3. Convert to `Bitmap`
4. Scale to max 720px width (maintain aspect ratio)
5. Compress to JPEG quality 50
6. Return `ByteArray`

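
The scaling step in the capture flow is simple arithmetic; the TypeScript sketch below shows it for illustration. The function name `scaledSize` is hypothetical, and rounding the height to the nearest pixel is an assumption.

```typescript
// Scales down to at most `maxWidth` pixels wide, preserving aspect ratio.
// Images already narrow enough are returned unchanged.
function scaledSize(
  width: number,
  height: number,
  maxWidth = 720
): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}
```

For example, a 1080x2400 capture becomes 720x1600 before JPEG compression.
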

**Edge cases:**
- **Android 14+:** Per-session consent. Projection dies if user revokes or after reboot. Re-prompt on next connect.
- **FLAG_SECURE:** Returns black frame. Detect by checking if all pixels are black (sample corners). Report `error: "secure_window"` to server.
- **Projection unavailable:** Graceful degradation. Server works with accessibility tree only (vision fallback without actual screenshot).

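
The corner-sampling check for FLAG_SECURE frames might look like the following. This is a TypeScript sketch of the logic only; it assumes tightly packed RGBA_8888 pixels with no row padding (real `ImageReader` buffers can have a row stride wider than the image), and the `inset` of 8 pixels is an arbitrary choice.

```typescript
// Samples four points near the corners of an RGBA_8888 frame and reports
// whether all of them are pure black.
// Assumes 4 bytes per pixel, row-major, no row padding.
function isLikelySecureFrame(
  rgba: Uint8Array,
  width: number,
  height: number,
  inset = 8
): boolean {
  const xs = [Math.min(inset, width - 1), Math.max(width - 1 - inset, 0)];
  const ys = [Math.min(inset, height - 1), Math.max(height - 1 - inset, 0)];
  for (const y of ys) {
    for (const x of xs) {
      const i = (y * width + x) * 4;
      // Alpha is ignored; any non-zero colour channel means real content.
      if (rgba[i] !== 0 || rgba[i + 1] !== 0 || rgba[i + 2] !== 0) return false;
    }
  }
  return true;
}
```

Sampling only the corners is cheap but can misfire on genuinely dark screens, so a caller may want to sample a few interior points as well before reporting `secure_window`.
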

---

## Layer 4: Data & Settings

### SettingsStore

Preferences DataStore for persistent settings:

| Key | Type | Default |
|-----|------|---------|
| `api_key` | String | `""` |
| `server_url` | String | `"wss://localhost:8080"` |
| `device_name` | String | Device model name |
| `auto_connect` | Boolean | `false` |

Exposed as `Flow<T>` for reactive UI updates.

---

## Layer 5: UI

### Navigation

Bottom nav with 3 tabs:
- **Home** (icon: `Home`) -- connection status, goal input, live steps
- **Settings** (icon: `Settings`) -- API key, server URL, permissions checklist
- **Logs** (icon: `History`) -- past session history

### HomeScreen

```
┌─────────────────────────────┐
│ ● Connected to server       │  ← status badge (green/yellow/red)
├─────────────────────────────┤
│ [Enter a goal...    ] [Run] │  ← goal input + submit
├─────────────────────────────┤
│ Step 1: tap (540, 800)      │  ← live step log
│   "Tapping the search icon" │
│                             │
│ Step 2: type "lofi beats"   │
│   "Typing the search query" │
│                             │
│ ✓ Goal completed (5 steps)  │  ← final status
└─────────────────────────────┘
```

- Goal input disabled when not connected or when a goal is running
- Steps stream in real-time via `ConnectionService.currentSteps` StateFlow
- Status transitions: idle → running → completed/failed

### SettingsScreen

```
┌─────────────────────────────┐
│ API Key                     │
│ [dc_••••••••••••••]   [Edit]│
├─────────────────────────────┤
│ Server URL                  │
│ [wss://your-server.app    ] │
├─────────────────────────────┤
│ Setup Checklist             │
│ ✓ API key configured        │
│ ✗ Accessibility service     │  ← tap to open Android settings
│ ✗ Screen capture permission │  ← tap to grant
│ ✓ Battery optimization off  │
└─────────────────────────────┘
```

- Warning cards for missing setup items
- Deep-links to Android system settings for accessibility toggle
- Battery optimization request via `ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS`

### LogsScreen

- In-memory list of past sessions: goal text, step count, success/failure, timestamp
- Tap to expand → shows all steps with action + reasoning
- Clears on app restart (persistent storage is v2)

---

## WebSocket Protocol (Device Side)

### Device → Server

| Message | When |
|---------|------|
| `{ type: "auth", apiKey, deviceInfo }` | On connect |
| `{ type: "screen", requestId, elements, screenHash }` | Response to get_screen |
| `{ type: "screenshot", requestId, image }` | Response to get_screenshot |
| `{ type: "result", requestId, success, error? }` | Response to execute |
| `{ type: "goal", text }` | User submits goal on phone |
| `{ type: "pong" }` | Response to ping |

### Server → Device

| Message | When |
|---------|------|
| `{ type: "auth_ok", deviceId }` | Auth succeeded |
| `{ type: "auth_error", message }` | Auth failed |
| `{ type: "get_screen", requestId }` | Agent loop needs screen tree |
| `{ type: "get_screenshot", requestId }` | Vision fallback |
| `{ type: "execute", requestId, action }` | Execute tap/type/swipe/etc |
| `{ type: "ping" }` | Heartbeat check |
| `{ type: "step", step, action, reasoning }` | Live step update (for phone UI) |
| `{ type: "goal_started", sessionId }` | Agent loop started |
| `{ type: "goal_completed", sessionId }` | Agent loop done |
| `{ type: "goal_failed", sessionId, error }` | Agent loop failed |

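
The two tables map naturally onto a discriminated union keyed on `type`. The TypeScript sketch below shows how the Server → Device messages might be typed in `@droidclaw/shared` and which reply each one expects; the field types (for example `step: number`, `action: unknown`) and the helper names are assumptions, since the tables list only field names.

```typescript
type ServerToDevice =
  | { type: "auth_ok"; deviceId: string }
  | { type: "auth_error"; message: string }
  | { type: "get_screen"; requestId: string }
  | { type: "get_screenshot"; requestId: string }
  | { type: "execute"; requestId: string; action: unknown }
  | { type: "ping" }
  | { type: "step"; step: number; action: unknown; reasoning: string }
  | { type: "goal_started"; sessionId: string }
  | { type: "goal_completed"; sessionId: string }
  | { type: "goal_failed"; sessionId: string; error: string };

const KNOWN_TYPES: string[] = [
  "auth_ok", "auth_error", "get_screen", "get_screenshot", "execute",
  "ping", "step", "goal_started", "goal_completed", "goal_failed",
];

// Parses a raw frame; returns null for invalid JSON or an unknown `type`.
// Field-level validation is omitted in this sketch.
function parseServerMessage(raw: string): ServerToDevice | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const type = (value as { type?: unknown }).type;
    if (typeof type !== "string" || KNOWN_TYPES.indexOf(type) === -1) return null;
    return value as ServerToDevice;
  } catch {
    return null;
  }
}

// Which Device → Server message answers this one, if any.
function replyTypeFor(
  message: ServerToDevice
): "screen" | "screenshot" | "result" | "pong" | null {
  switch (message.type) {
    case "get_screen": return "screen";
    case "get_screenshot": return "screenshot";
    case "execute": return "result";
    case "ping": return "pong";
    default: return null;
  }
}
```

The Kotlin side would mirror the same shape with a sealed class hierarchy of `@Serializable` types.
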

---

## Battery Optimization

OEM-specific battery killers are the #2 reliability problem after Google Play policy.

**Strategy:**
1. Detect if battery optimization is disabled: `PowerManager.isIgnoringBatteryOptimizations()`
2. If not, show warning card in Settings with button to request exemption
3. For aggressive OEMs (Xiaomi, Huawei, Samsung, OnePlus, Oppo, Vivo), show additional guidance linking to dontkillmyapp.com
4. ConnectionService uses `PARTIAL_WAKE_LOCK` to prevent CPU sleep during active goals
5. Foreground service notification keeps process priority high

---

## Distribution

- **Primary:** APK sideload from droidclaw.ai
- **Secondary:** F-Droid
- **NOT Play Store:** Google Play policy (Nov 2025) explicitly prohibits autonomous AI action execution via AccessibilityService

---

## Known Limitations

1. **FLAG_SECURE apps** (banking, password managers) return black screenshots, and some also restrict what the accessibility tree exposes
2. **WebView/Flutter** apps may return empty accessibility trees; the server falls back to vision
3. **Android 14+** requires per-session MediaProjection consent
4. **Android 16 Advanced Protection** will auto-revoke accessibility for non-accessibility tools
5. **dispatchGesture()** can be detected or ignored by some apps; the node-first strategy mitigates this
6. **rootInActiveWindow** returns null during transitions; retry with backoff

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -1,357 +0,0 @@
# Option 1: Web Dashboard + Backend Design

> Date: 2026-02-17
> Status: Approved
> Scope: Web (SvelteKit) + Backend (Hono.js) + Android app plan

---

## Decisions

- **Monorepo**: `web/` (SvelteKit dashboard) + `server/` (Hono.js backend) + `android/` (future)
- **Separate Hono server** for WebSocket + agent loop (independent lifecycle from dashboard)
- **SvelteKit** with node adapter for dashboard (deploy to Railway)
- **Multiple API keys** per user with labels (Better Auth apiKey plugin)
- **LLM config on dashboard only** (BYOK -- user provides their own API keys)
- **Goals sent from both** web dashboard and Android app
- **Dashboard v1**: API keys, LLM config, connected devices, goal input, step logs
- **Server runs the agent loop** (phone is eyes + hands)
- **Shared Postgres** on Railway (both services connect to same DB)
- **Build order**: web + server first, Android later

---

## Monorepo Structure

```
droidclaw/
├── src/  # existing CLI agent (kernel.ts, actions.ts, etc.)
├── web/  # SvelteKit dashboard (existing, extend)
├── server/  # Hono.js backend (WebSocket + agent loop)
├── android/  # Kotlin companion app (future)
├── packages/shared/  # shared TypeScript types
├── package.json  # root
└── CLAUDE.md
```

---

## Auth & API Key System

Both apps share the same Postgres DB and the same Better Auth tables.

SvelteKit handles user-facing auth (login, signup, sessions). Hono verifies API keys from Android devices.

### Better Auth Config

Both apps use Better Auth with the `apiKey` plugin. SvelteKit adds `sveltekitCookies`, Hono adds session middleware.

```typescript
// shared pattern
plugins: [
  apiKey() // built-in API key plugin
]
```

### Flow

1. User signs up/logs in on SvelteKit dashboard (existing)
2. Dashboard "API Keys" page -- user creates keys with labels (e.g., "Pixel 8", "Work Phone")
3. Better Auth's apiKey plugin handles create/list/delete
4. User copies the key and pastes it into the Android app, which stores it in SharedPreferences
5. Android app connects to Hono server via WebSocket, sends API key in handshake
6. Hono calls `auth.api.verifyApiKey({ body: { key } })` -- if valid, establishes device session
7. Dashboard WebSocket connections use session cookies (user already logged in)

### Database Schema

Better Auth manages: `user`, `session`, `account`, `verification`, `api_key`

Additional tables (Drizzle):

```
llm_config
  - id: text PK
  - userId: text FK -> user.id
  - provider: text (openai | groq | ollama | bedrock | openrouter)
  - apiKey: text (encrypted)
  - model: text
  - createdAt: timestamp
  - updatedAt: timestamp

device
  - id: text PK
  - userId: text FK -> user.id
  - name: text
  - lastSeen: timestamp
  - status: text (online | offline)
  - deviceInfo: jsonb (model, androidVersion, screenWidth, screenHeight)
  - createdAt: timestamp

agent_session
  - id: text PK
  - userId: text FK -> user.id
  - deviceId: text FK -> device.id
  - goal: text
  - status: text (running | completed | failed | cancelled)
  - stepsUsed: integer
  - startedAt: timestamp
  - completedAt: timestamp

agent_step
  - id: text PK
  - sessionId: text FK -> agent_session.id
  - stepNumber: integer
  - screenHash: text
  - action: jsonb
  - reasoning: text
  - result: text
  - timestamp: timestamp
```

---

## Hono Server Architecture (`server/`)

```
server/
├── src/
│   ├── index.ts  # Hono app + Bun.serve with WebSocket upgrade
│   ├── auth.ts  # Better Auth instance (same DB, apiKey plugin)
│   ├── middleware/
│   │   ├── auth.ts  # Session middleware (dashboard WebSocket)
│   │   └── api-key.ts  # API key verification (Android WebSocket)
│   ├── ws/
│   │   ├── device.ts  # WebSocket handler for Android devices
│   │   ├── dashboard.ts  # WebSocket handler for web dashboard (live logs)
│   │   └── sessions.ts  # In-memory session manager (connected devices + active loops)
│   ├── agent/
│   │   ├── loop.ts  # Agent loop (adapted from kernel.ts)
│   │   ├── llm.ts  # LLM provider factory (adapted from llm-providers.ts)
│   │   ├── stuck.ts  # Stuck-loop detection
│   │   └── skills.ts  # Multi-step skills (adapted from skills.ts)
│   ├── routes/
│   │   ├── devices.ts  # GET /devices
│   │   ├── goals.ts  # POST /goals
│   │   └── health.ts  # GET /health
│   ├── db.ts  # Drizzle instance (same Postgres)
│   └── env.ts  # Environment config
├── package.json
├── tsconfig.json
└── Dockerfile
```

### Key Design Points

1. **Bun.serve() with WebSocket upgrade** -- Hono handles HTTP, Bun native WebSocket handles upgrades. No extra WS library.

2. **Two WebSocket paths:**
   - `/ws/device` -- Android app connects with API key
   - `/ws/dashboard` -- Web dashboard connects with session cookie

3. **sessions.ts** -- In-memory map tracking connected devices, active agent loops, dashboard subscribers.

4. **Agent loop (loop.ts)** -- Adapted from kernel.ts. Same perception/reasoning/action cycle. Sends WebSocket commands instead of ADB calls.

5. **Goal submission:**
   - Dashboard: POST /goals -> starts agent loop -> streams steps via dashboard WebSocket
   - Android: device sends `{ type: "goal", text: "..." }` -> same agent loop

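
As a rough picture of what `sessions.ts` could hold, the sketch below tracks connected devices and at most one active agent loop per device. The class and method names are hypothetical, the one-loop-per-device rule is an assumption, and dashboard subscribers are left out for brevity.

```typescript
// Hypothetical connection handle; `send` would wrap the Bun WebSocket in practice.
interface DeviceConnection {
  deviceId: string;
  userId: string;
  send(message: string): void;
}

class SessionManager {
  private devices = new Map<string, DeviceConnection>();
  private activeLoops = new Map<string, string>(); // deviceId -> sessionId

  addDevice(connection: DeviceConnection): void {
    this.devices.set(connection.deviceId, connection);
  }

  // Dropping a device also drops any loop that was driving it.
  removeDevice(deviceId: string): void {
    this.devices.delete(deviceId);
    this.activeLoops.delete(deviceId);
  }

  devicesForUser(userId: string): DeviceConnection[] {
    const result: DeviceConnection[] = [];
    this.devices.forEach((connection) => {
      if (connection.userId === userId) result.push(connection);
    });
    return result;
  }

  // Returns false when the device is offline or already running a goal.
  startLoop(deviceId: string, sessionId: string): boolean {
    if (!this.devices.has(deviceId) || this.activeLoops.has(deviceId)) return false;
    this.activeLoops.set(deviceId, sessionId);
    return true;
  }

  endLoop(deviceId: string): void {
    this.activeLoops.delete(deviceId);
  }
}
```

Both goal entry points (POST /goals and the device's `goal` message) could call `startLoop` first, so a second goal for a busy device is rejected in one place.
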

---

## SvelteKit Dashboard (`web/`)

Follows existing patterns: remote functions (`$app/server` form/query), Svelte 5 runes, Tailwind v4, Valibot schemas.

### Route Structure

```
web/src/routes/
├── +layout.svelte  # add nav bar
├── +layout.server.ts  # load session for all pages
├── +page.svelte  # redirect: logged in -> /dashboard, else -> /login
├── login/+page.svelte  # existing
├── signup/+page.svelte  # existing
├── dashboard/
│   ├── +layout.svelte  # dashboard shell (sidebar nav)
│   ├── +page.svelte  # overview: connected devices, quick goal input
│   ├── api-keys/
│   │   └── +page.svelte  # list keys, create with label, copy, delete
│   ├── settings/
│   │   └── +page.svelte  # LLM provider config (provider, API key, model)
│   └── devices/
│       ├── +page.svelte  # list connected devices with status
│       └── [deviceId]/
│           └── +page.svelte  # device detail: send goal, live step log
```

### Remote Functions

```
web/src/lib/api/
├── auth.remote.ts  # existing (signup, login, signout, getUser)
├── api-keys.remote.ts  # createKey, listKeys, deleteKey (Better Auth client)
├── settings.remote.ts  # getConfig, updateConfig (LLM provider/key)
├── devices.remote.ts  # listDevices (queries Hono server)
└── goals.remote.ts  # submitGoal (POST to Hono server)
```

Dashboard WebSocket for live step logs connects directly to Hono server from the browser (not through SvelteKit).

---

## WebSocket Protocol

### Device -> Server (Android app sends)

```json
// Handshake
{ "type": "auth", "apiKey": "dc_xxxxx" }

// Screen tree response
{ "type": "screen", "requestId": "uuid", "elements": [], "screenshot": "base64?", "packageName": "com.app" }

// Action result
{ "type": "result", "requestId": "uuid", "success": true, "error": null, "data": null }

// Goal from phone
{ "type": "goal", "text": "open youtube and search lofi" }

// Heartbeat
{ "type": "pong" }
```

### Server -> Device (Hono sends)

```json
// Auth
{ "type": "auth_ok", "deviceId": "uuid" }
{ "type": "auth_error", "message": "invalid key" }

// Commands (all 22 actions)
{ "type": "get_screen", "requestId": "uuid" }
{ "type": "tap", "requestId": "uuid", "x": 540, "y": 1200 }
{ "type": "type", "requestId": "uuid", "text": "lofi beats" }
{ "type": "swipe", "requestId": "uuid", "x1": 540, "y1": 1600, "x2": 540, "y2": 400 }
{ "type": "enter", "requestId": "uuid" }
{ "type": "back", "requestId": "uuid" }
{ "type": "home", "requestId": "uuid" }
{ "type": "launch", "requestId": "uuid", "packageName": "com.google.android.youtube" }
// ... remaining actions follow same pattern

// Heartbeat
{ "type": "ping" }

// Goal lifecycle
{ "type": "goal_started", "sessionId": "uuid", "goal": "..." }
{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 }
```

### Server -> Dashboard (live step stream)

```json
// Device status
{ "type": "device_online", "deviceId": "uuid", "name": "Pixel 8" }
{ "type": "device_offline", "deviceId": "uuid" }

// Step stream
{ "type": "step", "sessionId": "uuid", "step": 3, "action": {}, "reasoning": "...", "screenHash": "..." }
{ "type": "goal_started", "sessionId": "uuid", "goal": "...", "deviceId": "uuid" }
{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 }
```
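
On the browser side, the dashboard stream can be folded into UI state with a pure reducer. The TypeScript sketch below is one possible shape, not the dashboard's actual store: the `DashboardState` fields and the `applyEvent` name are hypothetical, while the event fields follow the JSON examples above.

```typescript
type DashboardEvent =
  | { type: "device_online"; deviceId: string; name: string }
  | { type: "device_offline"; deviceId: string }
  | { type: "step"; sessionId: string; step: number; action: unknown; reasoning: string; screenHash: string }
  | { type: "goal_started"; sessionId: string; goal: string; deviceId: string }
  | { type: "goal_completed"; sessionId: string; success: boolean; stepsUsed: number };

interface DashboardState {
  onlineDeviceIds: string[];
  steps: { sessionId: string; step: number; reasoning: string }[];
  results: { sessionId: string; success: boolean }[];
}

const emptyState: DashboardState = { onlineDeviceIds: [], steps: [], results: [] };

// Returns a new state; the input state is never mutated.
function applyEvent(state: DashboardState, event: DashboardEvent): DashboardState {
  switch (event.type) {
    case "device_online":
      if (state.onlineDeviceIds.indexOf(event.deviceId) !== -1) return state;
      return { ...state, onlineDeviceIds: state.onlineDeviceIds.concat(event.deviceId) };
    case "device_offline":
      return { ...state, onlineDeviceIds: state.onlineDeviceIds.filter((id) => id !== event.deviceId) };
    case "goal_started":
      // A new session starts with no steps; older sessions are kept for the log view.
      return state;
    case "step":
      return {
        ...state,
        steps: state.steps.concat({ sessionId: event.sessionId, step: event.step, reasoning: event.reasoning }),
      };
    case "goal_completed":
      return { ...state, results: state.results.concat({ sessionId: event.sessionId, success: event.success }) };
  }
}
```

A WebSocket `onmessage` handler would parse each frame and assign `state = applyEvent(state, event)`, which fits a Svelte 5 `$state` variable.
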

---

## Shared Types (`packages/shared/`)

```
packages/shared/
├── src/
│   ├── types.ts  # UIElement, Bounds, Point
│   ├── commands.ts  # Command, CommandResult type unions
│   ├── actions.ts  # ActionDecision type (all 22 actions)
│   └── protocol.ts  # WebSocket message types
├── package.json  # name: "@droidclaw/shared"
└── tsconfig.json
```

Replaces duplicated types across src/, server/, web/. Android app mirrors in Kotlin via @Serializable data classes.

---

## Android App (future, plan only)

```
android/
├── app/src/main/kotlin/ai/droidclaw/companion/
│   ├── DroidClawApp.kt
│   ├── MainActivity.kt  # API key input, setup checklist, status
│   ├── accessibility/
│   │   ├── DroidClawAccessibilityService.kt
│   │   ├── ScreenTreeBuilder.kt
│   │   └── GestureExecutor.kt
│   ├── capture/
│   │   └── ScreenCaptureService.kt
│   ├── connection/
│   │   ├── ConnectionService.kt  # Foreground service
│   │   ├── ReliableWebSocket.kt  # Reconnect, heartbeat, message queue
│   │   └── CommandRouter.kt
│   └── model/
│       ├── UIElement.kt  # Mirrors @droidclaw/shared types
│       ├── Command.kt
│       └── DeviceInfo.kt
├── build.gradle.kts
└── AndroidManifest.xml
```

Follows OPTION1-IMPLEMENTATION.md structure. Not building now, but server protocol is designed for it.

---

## Deployment (Railway)

| Service | Source | Port | Notes |
|---|---|---|---|
| web | `web/` | 3000 | SvelteKit + node adapter |
| server | `server/` | 8080 | Hono + Bun.serve |
| postgres | Railway managed | 5432 | Shared by both services |

Both services get the same `DATABASE_URL`. Web calls Hono via Railway internal networking for REST. Browser connects directly to Hono's public URL for WebSocket.

---

## Data Flow

```
USER (browser)                  HONO SERVER                     PHONE (Android app)
|                               |                               |
|  signs in (SvelteKit)         |                               |
|  creates API key              |                               |
|                               |                               |
|                               |  { type: "auth", apiKey: "dc_xxx" }
|                               |<------------------------------|
|                               |  { type: "auth_ok" }          |
|                               |------------------------------>|
|                               |                               |
|  POST /goals                  |                               |
|  "open youtube, search lofi"  |                               |
|------------------------------>|                               |
|                               |  { type: "get_screen" }       |
|                               |------------------------------>|
|                               |                               |
|                               |  { type: "screen", elements } |
|                               |<------------------------------|
|                               |                               |
|                               |  LLM: "launch youtube"        |
|                               |                               |
|  { type: "step", action }     |  { type: "launch", pkg }      |
|<------------------------------|------------------------------>|
|                               |                               |
|                               |  { success: true }            |
|                               |<------------------------------|
|                               |                               |
|  ... repeat until done ...    |                               |
|                               |                               |
|  { type: "goal_completed" }   |  { type: "goal_completed" }   |
|<------------------------------|------------------------------>|
```