chore: remove docs/plans from repo and gitignore it

Contains product architecture, roadmaps, and implementation plans
that should not be public.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sanju Sivalingam
2026-02-17 20:17:06 +05:30
parent c395f9d83e
commit 0c0efe9b1e
5 changed files with 1 additions and 5712 deletions

.gitignore

@@ -7,6 +7,7 @@ competitor/
 logs/
 kernel_screenshot.png
 window_dump.xml
+docs/plans/
 docs/architecture-web-flow.md
 docs/INTENT.md
 OPTION1-IMPLEMENTATION.md


@@ -1,397 +0,0 @@
# Android Companion App Design
> DroidClaw Android app: the eyes and hands of the AI agent. Connects to the Hono server via WebSocket, captures accessibility trees and screenshots, executes gestures on command, and supports device-initiated goals.
**Date:** 2026-02-17
**Scope:** Full v1 (all 4 phases)
**Package:** `com.thisux.droidclaw`
---
## Architecture Overview
Three independent layers with clear boundaries:
```
┌──────────────────────────────────────────────┐
│ UI Layer │
│ MainActivity + Compose (Home, Settings, Logs)│
│ Observes StateFlows from services │
├──────────────────────────────────────────────┤
│ Connection Layer │
│ ConnectionService (foreground service) │
│ ReliableWebSocket (Ktor) + CommandRouter │
├──────────────────────────────────────────────┤
│ Accessibility Layer │
│ DroidClawAccessibilityService (system svc) │
│ ScreenTreeBuilder + GestureExecutor │
│ ScreenCaptureManager (MediaProjection) │
└──────────────────────────────────────────────┘
```
- **Accessibility Layer**: System-managed service. Reads screen trees, executes gestures, captures screenshots. Runs independently of app UI.
- **Connection Layer**: Foreground service with Ktor WebSocket. Bridges accessibility to server. Handles reconnection, heartbeat, message queuing.
- **UI Layer**: Compose with bottom nav. Observes service state via `StateFlow`. Goal input, settings, logs.
---
## Project Structure
```
android/app/src/main/java/com/thisux/droidclaw/
├── DroidClawApp.kt # Application class (DataStore init)
├── MainActivity.kt # Compose host + bottom nav
├── accessibility/
│ ├── DroidClawAccessibilityService.kt # System service, tree capture
│ ├── ScreenTreeBuilder.kt # NodeInfo → UIElement list
│ └── GestureExecutor.kt # Node-first actions + dispatchGesture fallback
├── connection/
│ ├── ConnectionService.kt # Foreground service, Ktor WebSocket
│ ├── ReliableWebSocket.kt # Reconnect, heartbeat, message queue
│ └── CommandRouter.kt # Dispatches server commands → GestureExecutor
├── capture/
│ └── ScreenCaptureManager.kt # MediaProjection screenshots
├── model/
│ ├── UIElement.kt # Mirrors @droidclaw/shared types
│ ├── Protocol.kt # WebSocket message types
│ └── AppState.kt # Connection status, steps, etc.
├── data/
│ └── SettingsStore.kt # DataStore for API key, server URL
├── ui/
│ ├── screens/
│ │ ├── HomeScreen.kt # Status + goal input + live log
│ │ ├── SettingsScreen.kt # API key, server URL, battery opt
│ │ └── LogsScreen.kt # Step history
│ └── theme/ # Existing Material 3 theme
└── util/
├── BatteryOptimization.kt # OEM-specific exemption helpers
└── DeviceInfo.kt # Model, Android version, screen size
```
---
## Dependencies
| Library | Version | Purpose |
|---------|---------|---------|
| `io.ktor:ktor-client-cio` | 3.1.x | HTTP/WebSocket client (coroutine-native) |
| `io.ktor:ktor-client-websockets` | 3.1.x | WebSocket plugin for Ktor |
| `org.jetbrains.kotlinx:kotlinx-serialization-json` | 1.7.x | JSON serialization |
| `org.jetbrains.kotlinx:kotlinx-coroutines-android` | 1.9.x | Coroutines |
| `androidx.datastore:datastore-preferences` | 1.1.x | Persistent settings (API key, server URL) |
| `androidx.lifecycle:lifecycle-service` | 2.8.x | Service lifecycle |
| `androidx.navigation:navigation-compose` | 2.8.x | Bottom nav routing |
| `androidx.compose.material:material-icons-extended` | latest | Nav icons |
---
## Permissions
```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_CONNECTED_DEVICE" />
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
<uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />
<uses-permission android:name="android.permission.WAKE_LOCK" />
```
Plus the accessibility service declaration:
```xml
<service
    android:name=".accessibility.DroidClawAccessibilityService"
    android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"
    android:exported="false">
    <intent-filter>
        <action android:name="android.accessibilityservice.AccessibilityService" />
    </intent-filter>
    <meta-data
        android:name="android.accessibilityservice"
        android:resource="@xml/accessibility_config" />
</service>
```
---
## Layer 1: Accessibility Service
### DroidClawAccessibilityService
System-managed service. Android starts/stops it based on user toggling it in Settings > Accessibility.
**State exposed via companion StateFlow** (no binding needed):
```kotlin
companion object {
    val isRunning = MutableStateFlow(false)
    val lastScreenTree = MutableStateFlow<List<UIElement>>(emptyList())
    var instance: DroidClawAccessibilityService? = null
}
```
**Lifecycle:**
- `onServiceConnected()`: Set `isRunning = true`, store `instance`
- `onAccessibilityEvent()`: Capture events for window changes, content changes
- `onInterrupt()` / `onDestroy()`: Set `isRunning = false`, clear `instance`
### ScreenTreeBuilder
Walks `rootInActiveWindow` depth-first, extracts:
- Bounds (Rect), center coordinates (x, y)
- text, contentDescription, className, viewIdResourceName
- State flags: enabled, checked, focused, scrollable, clickable, longClickable
- Parent context (parent class, parent description)
**Output:** `List<UIElement>` matching `@droidclaw/shared` UIElement type.
**Null handling:** `rootInActiveWindow` returns null during screen transitions. Retry with exponential backoff (50ms, 100ms, 200ms) up to 3 attempts. If still null, return empty list (server uses vision fallback).
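**Memory safety:** On API levels below 33, `AccessibilityNodeInfo` instances should be recycled after use (from API 33 on, `recycle()` is deprecated and has no effect). Use extension:
```kotlin
// Runs block on this node and recycles the node afterwards, even if block throws.
inline fun <T> AccessibilityNodeInfo.use(block: (AccessibilityNodeInfo) -> T): T {
    try { return block(this) } finally { recycle() }
}
```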
**Screen hash:** `computeScreenHash()` — hash of element IDs + text + centers. Used by server for stuck-loop detection.
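A minimal sketch of such a hash, written in TypeScript so it lines up with the `@droidclaw/shared` types (the Android app would need an equivalent Kotlin port). The `HashableElement` shape and the choice of 32-bit FNV-1a are assumptions; the design only states that the hash covers element IDs, text, and centers.

```typescript
// Hypothetical subset of UIElement: only the fields the design says feed the hash.
interface HashableElement {
  id: string;
  text: string;
  center: { x: number; y: number };
}

// 32-bit FNV-1a over the UTF-16 code units of a string.
// Assumed hash function; any stable, deterministic hash would serve.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// Joins id, text, and center of every element in order, then hashes the result.
function computeScreenHash(elements: HashableElement[]): string {
  const parts = elements.map((e) => `${e.id}|${e.text}|${e.center.x},${e.center.y}`);
  return fnv1a(parts.join("\n"));
}
```

Because element order is part of the input, two captures hash identically only if the tree walk is deterministic, which a depth-first traversal should be.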
### GestureExecutor
Node-first strategy for all actions:
| Action | Primary (node) | Fallback (gesture) |
|--------|----------------|-------------------|
| tap | `performAction(ACTION_CLICK)` on node at (x,y) | `dispatchGesture()` tap at coordinates |
| type | `performAction(ACTION_SET_TEXT)` on focused node | Character-by-character gesture taps |
| long_press | `performAction(ACTION_LONG_CLICK)` | `dispatchGesture()` hold 1000ms |
| swipe | — | `dispatchGesture()` path from start→end |
| scroll | `performAction(ACTION_SCROLL_FORWARD/BACKWARD)` on scrollable parent | Swipe gesture |
| back | `performGlobalAction(GLOBAL_ACTION_BACK)` | — |
| home | `performGlobalAction(GLOBAL_ACTION_HOME)` | — |
| notifications | `performGlobalAction(GLOBAL_ACTION_NOTIFICATIONS)` | — |
| launch | `startActivity(packageManager.getLaunchIntentForPackage())` | — |
| clear | Focus node → select all → delete | — |
| enter | `performAction(ACTION_IME_ENTER)` or keyevent KEYCODE_ENTER | — |
**Result reporting:** Each action returns `ActionResult { success: Boolean, error: String? }`.
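The node-first strategy in the table can be sketched as a small combinator. This is an illustration in TypeScript rather than the app's Kotlin; `Attempt` and `nodeFirst` are hypothetical names, and only the `ActionResult` shape comes from the design.

```typescript
// Mirrors the ActionResult shape described above.
interface ActionResult {
  success: boolean;
  error?: string;
}

// Hypothetical: an attempt is any function that tries one way of performing an action.
type Attempt = () => ActionResult;

// Try the node-based action first; use the gesture fallback only if the primary fails.
// Either strategy may be null, matching the "—" cells in the table.
function nodeFirst(primary: Attempt | null, fallback: Attempt | null): ActionResult {
  const errors: string[] = [];
  for (const attempt of [primary, fallback]) {
    if (attempt === null) continue;
    const result = attempt();
    if (result.success) return result;
    errors.push(result.error ?? "unknown error");
  }
  return {
    success: false,
    error: errors.length > 0 ? errors.join("; ") : "no strategy available",
  };
}
```

Collecting both error strings keeps the failure report useful to the server when the node action and the gesture fail for different reasons.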
---
## Layer 2: Connection Service
### ConnectionService
Foreground service with persistent notification.
**Lifecycle:**
1. User taps "Connect" → service starts
2. Reads API key + server URL from DataStore
3. Creates `ReliableWebSocket` and connects
4. Notification shows: "DroidClaw - Connected to server" (or "Reconnecting...")
5. Notification has "Disconnect" action button
6. Service stops when user disconnects or notification action tapped
**State exposed:**
```kotlin
companion object {
    val connectionState = MutableStateFlow<ConnectionState>(ConnectionState.Disconnected)
    val currentSteps = MutableStateFlow<List<AgentStep>>(emptyList())
    val currentGoalStatus = MutableStateFlow<GoalStatus>(GoalStatus.Idle)
    var instance: ConnectionService? = null
}
```
### ReliableWebSocket
Wraps Ktor `WebSocketSession` with reliability:
- **Connect:** `HttpClient { install(WebSockets) }` → `client.webSocket(serverUrl + "/ws/device")`
- **Auth handshake:** First message: `{ type: "auth", apiKey: "dc_xxx", deviceInfo: { model, android, screenWidth, screenHeight } }`
- **Wait for:** `{ type: "auth_ok", deviceId: "uuid" }` or `{ type: "auth_error" }` → close + surface error
- **Heartbeat:** Ktor WebSocket has built-in ping/pong. Configure `pingIntervalMillis = 30_000`
- **Reconnect:** On connection loss, exponential backoff: 1s → 2s → 4s → 8s → max 30s. Reset backoff on successful auth.
- **Message queue:** `Channel<String>(Channel.BUFFERED)` for outbound messages. Drained when connected, buffered when disconnected.
- **State:** Emits `ConnectionState` (Disconnected, Connecting, Connected, Error(message))
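The reconnect schedule above can be expressed as a pure function plus a small counter. This TypeScript sketch assumes the delay keeps doubling (16s after 8s) until it reaches the 30s cap; `reconnectDelayMs` and `Backoff` are hypothetical names.

```typescript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, 8s, ... capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** exponent);
}

// Tracks the attempt counter; reset() models "reset backoff on successful auth".
class Backoff {
  private attempt = 0;

  // Returns the delay for the current attempt and advances the counter.
  next(): number {
    const delay = reconnectDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }

  reset(): void {
    this.attempt = 0;
  }
}
```

A production version would likely add random jitter so that many devices do not reconnect in lockstep after a server restart; that is omitted here.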
### CommandRouter
Receives JSON from WebSocket, parses, dispatches:
```
"get_screen" → ScreenTreeBuilder.capture() → send screen response
"get_screenshot"→ ScreenCaptureManager.capture() → compress, base64, send
"execute" → GestureExecutor.execute(action) → send result response
"ping" → send { type: "pong" }
"goal_started" → update UI state to running
"step" → append to currentSteps, update UI
"goal_completed"→ update UI state to completed
"goal_failed" → update UI state to failed
```
All responses include the `requestId` from the command for server-side Promise resolution.
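On the server side, that Promise resolution could look roughly like the following TypeScript sketch. `PendingRequests` and `DeviceResponse` are hypothetical names, and a counter stands in for the UUIDs the protocol uses so the example has no dependencies.

```typescript
// Hypothetical shape of any device response that answers a command.
interface DeviceResponse {
  type: string;
  requestId: string;
  [key: string]: unknown;
}

// Correlates device responses with the commands that are still waiting for them.
class PendingRequests {
  private pending = new Map<string, (response: DeviceResponse) => void>();
  private counter = 0;

  // Registers a command and returns its requestId plus a promise for the response.
  create(): { requestId: string; response: Promise<DeviceResponse> } {
    this.counter += 1;
    const requestId = `req-${this.counter}`; // stand-in for a UUID
    const response = new Promise<DeviceResponse>((resolve) => {
      this.pending.set(requestId, resolve);
    });
    return { requestId, response };
  }

  // Call for every incoming device message; returns false for unknown requestIds.
  resolve(message: DeviceResponse): boolean {
    const resolver = this.pending.get(message.requestId);
    if (resolver === undefined) return false;
    this.pending.delete(message.requestId);
    resolver(message);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Timeouts are left out of this sketch; without them a command sent to a device that disconnects would wait forever, so a real implementation would presumably reject pending entries on disconnect.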
---
## Layer 3: Screen Capture
### ScreenCaptureManager
MediaProjection-based screenshot capture.
**Setup:**
1. Request `MediaProjection` via `MediaProjectionManager.createScreenCaptureIntent()`
2. User grants consent (Android system dialog)
3. Create `VirtualDisplay` → `ImageReader` (RGBA_8888)
4. Keep projection alive in ConnectionService scope
**Capture flow:**
1. Server requests screenshot
2. Acquire latest `Image` from `ImageReader`
3. Convert to `Bitmap`
4. Scale to max 720px width (maintain aspect ratio)
5. Compress to JPEG quality 50
6. Return `ByteArray`
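Step 4 of the capture flow comes down to a little arithmetic, shown here in TypeScript as an illustration. `scaledSize` is a hypothetical helper, and the choice never to upscale smaller frames is an assumption.

```typescript
// Scales (width, height) so that width <= maxWidth while preserving aspect ratio.
// Frames already narrower than maxWidth are returned unchanged (assumed behavior).
function scaledSize(
  width: number,
  height: number,
  maxWidth: number = 720
): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}
```

For example, a 1080×2400 frame becomes 720×1600, which cuts the pixel count to under half before JPEG compression.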
**Edge cases:**
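- **Android 14+:** Per-session consent. Projection dies if user revokes or after reboot. Re-prompt on next connect. The foreground service hosting the projection must also declare the `mediaProjection` service type and hold the `FOREGROUND_SERVICE_MEDIA_PROJECTION` permission, which the manifest above does not yet list.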
- **FLAG_SECURE:** Returns black frame. Detect by checking if all pixels are black (sample corners). Report `error: "secure_window"` to server.
- **Projection unavailable:** Graceful degradation. Server works with accessibility tree only (vision fallback without actual screenshot).
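The corner-sampling check for `FLAG_SECURE` frames might look like this TypeScript sketch. It assumes a tightly packed RGBA_8888 buffer (no row padding), and `looksLikeSecureFrame` is a hypothetical name. It is only a heuristic: a genuinely black screen produces the same result.

```typescript
// Returns true when all four corner pixels of an RGBA_8888 buffer are pure black.
// Assumes 4 bytes per pixel and no row stride padding.
function looksLikeSecureFrame(pixels: Uint8Array, width: number, height: number): boolean {
  if (width <= 0 || height <= 0 || pixels.length < width * height * 4) return false;
  const corners: Array<[number, number]> = [
    [0, 0],
    [width - 1, 0],
    [0, height - 1],
    [width - 1, height - 1],
  ];
  return corners.every(([x, y]) => {
    const offset = (y * width + x) * 4; // byte order: R, G, B, A
    return pixels[offset] === 0 && pixels[offset + 1] === 0 && pixels[offset + 2] === 0;
  });
}
```

Sampling a few interior points as well would reduce false positives from apps with black borders, at negligible cost.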
---
## Layer 4: Data & Settings
### SettingsStore
Preferences DataStore for persistent settings:
| Key | Type | Default |
|-----|------|---------|
| `api_key` | String | `""` |
| `server_url` | String | `"wss://localhost:8080"` |
| `device_name` | String | Device model name |
| `auto_connect` | Boolean | `false` |
Exposed as `Flow<T>` for reactive UI updates.
---
## Layer 5: UI
### Navigation
Bottom nav with 3 tabs:
- **Home** (icon: `Home`) — connection status, goal input, live steps
- **Settings** (icon: `Settings`) — API key, server URL, permissions checklist
- **Logs** (icon: `History`) — past session history
### HomeScreen
```
┌─────────────────────────────┐
│ ● Connected to server │ ← status badge (green/yellow/red)
├─────────────────────────────┤
│ [Enter a goal... ] [Run] │ ← goal input + submit
├─────────────────────────────┤
│ Step 1: tap (540, 800) │ ← live step log
│ "Tapping the search icon" │
│ │
│ Step 2: type "lofi beats" │
│ "Typing the search query" │
│ │
│ ✓ Goal completed (5 steps) │ ← final status
└─────────────────────────────┘
```
- Goal input disabled when not connected or when a goal is running
- Steps stream in real-time via `ConnectionService.currentSteps` StateFlow
- Status transitions: idle → running → completed/failed
### SettingsScreen
```
┌─────────────────────────────┐
│ API Key │
│ [dc_••••••••••••••] [Edit]│
├─────────────────────────────┤
│ Server URL │
│ [wss://your-server.app ] │
├─────────────────────────────┤
│ Setup Checklist │
│ ✓ API key configured │
│ ✗ Accessibility service │ ← tap to open Android settings
│ ✗ Screen capture permission │ ← tap to grant
│ ✓ Battery optimization off │
└─────────────────────────────┘
```
- Warning cards for missing setup items
- Deep-links to Android system settings for accessibility toggle
- Battery optimization request via `ACTION_REQUEST_IGNORE_BATTERY_OPTIMIZATIONS`
### LogsScreen
- In-memory list of past sessions: goal text, step count, success/failure, timestamp
- Tap to expand → shows all steps with action + reasoning
- Clears on app restart (persistent storage is v2)
---
## WebSocket Protocol (Device Side)
### Device → Server
| Message | When |
|---------|------|
| `{ type: "auth", apiKey, deviceInfo }` | On connect |
| `{ type: "screen", requestId, elements, screenHash }` | Response to get_screen |
| `{ type: "screenshot", requestId, image }` | Response to get_screenshot |
| `{ type: "result", requestId, success, error? }` | Response to execute |
| `{ type: "goal", text }` | User submits goal on phone |
| `{ type: "pong" }` | Response to ping |
### Server → Device
| Message | When |
|---------|------|
| `{ type: "auth_ok", deviceId }` | Auth succeeded |
| `{ type: "auth_error", message }` | Auth failed |
| `{ type: "get_screen", requestId }` | Agent loop needs screen tree |
| `{ type: "get_screenshot", requestId }` | Vision fallback |
| `{ type: "execute", requestId, action }` | Execute tap/type/swipe/etc |
| `{ type: "ping" }` | Heartbeat check |
| `{ type: "step", step, action, reasoning }` | Live step update (for phone UI) |
| `{ type: "goal_started", sessionId }` | Agent loop started |
| `{ type: "goal_completed", sessionId }` | Agent loop done |
| `{ type: "goal_failed", sessionId, error }` | Agent loop failed |
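The server-to-device table maps naturally onto a discriminated union. In this TypeScript sketch the field types are assumptions (the table gives names only), and `parseServerMessage` is a hypothetical helper that checks the `type` tag but does no field-level validation.

```typescript
// Server → device messages from the table above; field types are assumed.
type ServerMessage =
  | { type: "auth_ok"; deviceId: string }
  | { type: "auth_error"; message: string }
  | { type: "get_screen"; requestId: string }
  | { type: "get_screenshot"; requestId: string }
  | { type: "execute"; requestId: string; action: unknown }
  | { type: "ping" }
  | { type: "step"; step: number; action: unknown; reasoning: string }
  | { type: "goal_started"; sessionId: string }
  | { type: "goal_completed"; sessionId: string }
  | { type: "goal_failed"; sessionId: string; error: string };

const SERVER_MESSAGE_TYPES = new Set<string>([
  "auth_ok", "auth_error", "get_screen", "get_screenshot", "execute",
  "ping", "step", "goal_started", "goal_completed", "goal_failed",
]);

// Parses a raw text frame; returns null for invalid JSON or an unknown type tag.
function parseServerMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string" || !SERVER_MESSAGE_TYPES.has(type)) return null;
  return value as ServerMessage; // individual fields are not validated in this sketch
}
```

With the union in place, a `switch (message.type)` in the router gets exhaustiveness checking from the compiler.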
---
## Battery Optimization
OEM-specific battery killers are the #2 reliability problem after Google Play policy.
**Strategy:**
1. Detect if battery optimization is disabled: `PowerManager.isIgnoringBatteryOptimizations()`
2. If not, show warning card in Settings with button to request exemption
3. For aggressive OEMs (Xiaomi, Huawei, Samsung, OnePlus, Oppo, Vivo), show additional guidance linking to dontkillmyapp.com
4. ConnectionService uses `PARTIAL_WAKE_LOCK` to prevent CPU sleep during active goals
5. Foreground service notification keeps process priority high
---
## Distribution
- **Primary:** APK sideload from droidclaw.ai
- **Secondary:** F-Droid
- **NOT Play Store:** Google Play policy (Nov 2025) explicitly prohibits autonomous AI action execution via AccessibilityService
---
## Known Limitations
1. **FLAG_SECURE apps** (banking, password managers) block both tree and screenshots
2. **WebView/Flutter** apps may return empty accessibility trees — server falls back to vision
3. **Android 14+** requires per-session MediaProjection consent
4. **Android 16 Advanced Protection** will auto-revoke accessibility for non-accessibility tools
5. **dispatchGesture()** can be detected/ignored by some apps — node-first strategy mitigates
6. **rootInActiveWindow** returns null during transitions — retry with backoff

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,357 +0,0 @@
# Option 1: Web Dashboard + Backend Design
> Date: 2026-02-17
> Status: Approved
> Scope: Web (SvelteKit) + Backend (Hono.js) + Android app plan
---
## Decisions
- **Monorepo**: `web/` (SvelteKit dashboard) + `server/` (Hono.js backend) + `android/` (future)
- **Separate Hono server** for WebSocket + agent loop (independent lifecycle from dashboard)
- **SvelteKit** with node adapter for dashboard (deploy to Railway)
- **Multiple API keys** per user with labels (Better Auth apiKey plugin)
- **LLM config on dashboard only** (BYOK -- user provides their own API keys)
- **Goals sent from both** web dashboard and Android app
- **Dashboard v1**: API keys, LLM config, connected devices, goal input, step logs
- **Server runs the agent loop** (phone is eyes + hands)
- **Shared Postgres** on Railway (both services connect to same DB)
- **Build order**: web + server first, Android later
---
## Monorepo Structure
```
droidclaw/
├── src/ # existing CLI agent (kernel.ts, actions.ts, etc.)
├── web/ # SvelteKit dashboard (existing, extend)
├── server/ # Hono.js backend (WebSocket + agent loop)
├── android/ # Kotlin companion app (future)
├── packages/shared/ # shared TypeScript types
├── package.json # root
└── CLAUDE.md
```
---
## Auth & API Key System
Both apps share the same Postgres DB and the same Better Auth tables.
SvelteKit handles user-facing auth (login, signup, sessions). Hono verifies API keys from Android devices.
### Better Auth Config
Both apps use Better Auth with the `apiKey` plugin. SvelteKit adds `sveltekitCookies`, Hono adds session middleware.
```typescript
// shared pattern
plugins: [
  apiKey() // built-in API key plugin
]
### Flow
1. User signs up/logs in on SvelteKit dashboard (existing)
2. Dashboard "API Keys" page -- user creates keys with labels (e.g., "Pixel 8", "Work Phone")
3. Better Auth's apiKey plugin handles create/list/delete
4. User copies the key and pastes it into the Android app, which stores it locally (SharedPreferences)
5. Android app connects to Hono server via WebSocket, sends API key in handshake
6. Hono calls `auth.api.verifyApiKey({ body: { key } })` -- if valid, establishes device session
7. Dashboard WebSocket connections use session cookies (user already logged in)
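Steps 5 and 6 of this flow could be handled along these lines. The verifier is injected as a plain function so the sketch does not depend on Better Auth's actual return shape; `VerifyKey`, `extractApiKey`, and `handleAuthFrame` are hypothetical names.

```typescript
type AuthReply =
  | { type: "auth_ok"; deviceId: string }
  | { type: "auth_error"; message: string };

// Hypothetical verifier signature; in the design this role is played by
// Better Auth's verifyApiKey, whose real result shape is not assumed here.
type VerifyKey = (key: string) => Promise<{ valid: boolean }>;

// Pulls the API key out of the first WebSocket frame, or returns null
// if the frame is not a well-formed auth message.
function extractApiKey(raw: string): string | null {
  try {
    const msg: unknown = JSON.parse(raw);
    if (typeof msg !== "object" || msg === null) return null;
    const { type, apiKey } = msg as { type?: unknown; apiKey?: unknown };
    return type === "auth" && typeof apiKey === "string" && apiKey.length > 0 ? apiKey : null;
  } catch {
    return null;
  }
}

// Verifies the key and builds the reply the server would send back to the device.
async function handleAuthFrame(
  raw: string,
  verify: VerifyKey,
  newDeviceId: () => string
): Promise<AuthReply> {
  const key = extractApiKey(raw);
  if (key === null) return { type: "auth_error", message: "expected auth message" };
  const { valid } = await verify(key);
  return valid
    ? { type: "auth_ok", deviceId: newDeviceId() }
    : { type: "auth_error", message: "invalid key" };
}
```

Keeping the parsing step separate from verification means malformed first frames can be rejected without a database round trip.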
### Database Schema
Better Auth manages: `user`, `session`, `account`, `verification`, `api_key`
Additional tables (Drizzle):
```
llm_config
- id: text PK
- userId: text FK -> user.id
- provider: text (openai | groq | ollama | bedrock | openrouter)
- apiKey: text (encrypted)
- model: text
- createdAt: timestamp
- updatedAt: timestamp
device
- id: text PK
- userId: text FK -> user.id
- name: text
- lastSeen: timestamp
- status: text (online | offline)
- deviceInfo: jsonb (model, androidVersion, screenWidth, screenHeight)
- createdAt: timestamp
agent_session
- id: text PK
- userId: text FK -> user.id
- deviceId: text FK -> device.id
- goal: text
- status: text (running | completed | failed | cancelled)
- stepsUsed: integer
- startedAt: timestamp
- completedAt: timestamp
agent_step
- id: text PK
- sessionId: text FK -> agent_session.id
- stepNumber: integer
- screenHash: text
- action: jsonb
- reasoning: text
- result: text
- timestamp: timestamp
```
---
## Hono Server Architecture (`server/`)
```
server/
├── src/
│ ├── index.ts # Hono app + Bun.serve with WebSocket upgrade
│ ├── auth.ts # Better Auth instance (same DB, apiKey plugin)
│ ├── middleware/
│ │ ├── auth.ts # Session middleware (dashboard WebSocket)
│ │ └── api-key.ts # API key verification (Android WebSocket)
│ ├── ws/
│ │ ├── device.ts # WebSocket handler for Android devices
│ │ ├── dashboard.ts # WebSocket handler for web dashboard (live logs)
│ │ └── sessions.ts # In-memory session manager (connected devices + active loops)
│ ├── agent/
│ │ ├── loop.ts # Agent loop (adapted from kernel.ts)
│ │ ├── llm.ts # LLM provider factory (adapted from llm-providers.ts)
│ │ ├── stuck.ts # Stuck-loop detection
│ │ └── skills.ts # Multi-step skills (adapted from skills.ts)
│ ├── routes/
│ │ ├── devices.ts # GET /devices
│ │ ├── goals.ts # POST /goals
│ │ └── health.ts # GET /health
│ ├── db.ts # Drizzle instance (same Postgres)
│ └── env.ts # Environment config
├── package.json
├── tsconfig.json
└── Dockerfile
```
### Key Design Points
1. **Bun.serve() with WebSocket upgrade** -- Hono handles HTTP, Bun native WebSocket handles upgrades. No extra WS library.
2. **Two WebSocket paths:**
- `/ws/device` -- Android app connects with API key
- `/ws/dashboard` -- Web dashboard connects with session cookie
3. **sessions.ts** -- In-memory map tracking connected devices, active agent loops, dashboard subscribers.
4. **Agent loop (loop.ts)** -- Adapted from kernel.ts. Same perception/reasoning/action cycle. Sends WebSocket commands instead of ADB calls.
5. **Goal submission:**
- Dashboard: POST /goals -> starts agent loop -> streams steps via dashboard WebSocket
- Android: device sends `{ type: "goal", text: "..." }` -> same agent loop
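A minimal sketch of the in-memory bookkeeping `sessions.ts` is described as doing. `SessionManager` and `DeviceConnection` are hypothetical names, and the rule of one active goal per device is an assumption.

```typescript
// Hypothetical view of a connected device as the server sees it.
interface DeviceConnection {
  deviceId: string;
  userId: string;
  send: (message: string) => void;
}

class SessionManager {
  private devices = new Map<string, DeviceConnection>();
  private activeGoals = new Map<string, string>(); // deviceId -> sessionId

  addDevice(conn: DeviceConnection): void {
    this.devices.set(conn.deviceId, conn);
  }

  // Dropping a device also drops any goal that was running on it.
  removeDevice(deviceId: string): void {
    this.devices.delete(deviceId);
    this.activeGoals.delete(deviceId);
  }

  devicesForUser(userId: string): DeviceConnection[] {
    return Array.from(this.devices.values()).filter((d) => d.userId === userId);
  }

  // Starts a goal unless the device is offline or already busy; returns whether it started.
  startGoal(deviceId: string, sessionId: string): boolean {
    if (!this.devices.has(deviceId) || this.activeGoals.has(deviceId)) return false;
    this.activeGoals.set(deviceId, sessionId);
    return true;
  }

  finishGoal(deviceId: string): void {
    this.activeGoals.delete(deviceId);
  }
}
```

Both goal entry points (POST /goals and the device's `goal` message) could call `startGoal` first, so a second goal cannot start on a busy device regardless of where it came from.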
---
## SvelteKit Dashboard (`web/`)
Follows existing patterns: remote functions (`$app/server` form/query), Svelte 5 runes, Tailwind v4, Valibot schemas.
### Route Structure
```
web/src/routes/
├── +layout.svelte # add nav bar
├── +layout.server.ts # load session for all pages
├── +page.svelte # redirect: logged in -> /dashboard, else -> /login
├── login/+page.svelte # existing
├── signup/+page.svelte # existing
├── dashboard/
│ ├── +layout.svelte # dashboard shell (sidebar nav)
│ ├── +page.svelte # overview: connected devices, quick goal input
│ ├── api-keys/
│ │ └── +page.svelte # list keys, create with label, copy, delete
│ ├── settings/
│ │ └── +page.svelte # LLM provider config (provider, API key, model)
│ └── devices/
│ ├── +page.svelte # list connected devices with status
│ └── [deviceId]/
│ └── +page.svelte # device detail: send goal, live step log
```
### Remote Functions
```
web/src/lib/api/
├── auth.remote.ts # existing (signup, login, signout, getUser)
├── api-keys.remote.ts # createKey, listKeys, deleteKey (Better Auth client)
├── settings.remote.ts # getConfig, updateConfig (LLM provider/key)
├── devices.remote.ts # listDevices (queries Hono server)
└── goals.remote.ts # submitGoal (POST to Hono server)
```
Dashboard WebSocket for live step logs connects directly to Hono server from the browser (not through SvelteKit).
---
## WebSocket Protocol
### Device -> Server (Android app sends)
```json
// Handshake
{ "type": "auth", "apiKey": "dc_xxxxx" }
// Screen tree response
{ "type": "screen", "requestId": "uuid", "elements": [], "screenshot": "base64?", "packageName": "com.app" }
// Action result
{ "type": "result", "requestId": "uuid", "success": true, "error": null, "data": null }
// Goal from phone
{ "type": "goal", "text": "open youtube and search lofi" }
// Heartbeat
{ "type": "pong" }
```
### Server -> Device (Hono sends)
```json
// Auth
{ "type": "auth_ok", "deviceId": "uuid" }
{ "type": "auth_error", "message": "invalid key" }
// Commands (all 22 actions)
{ "type": "get_screen", "requestId": "uuid" }
{ "type": "tap", "requestId": "uuid", "x": 540, "y": 1200 }
{ "type": "type", "requestId": "uuid", "text": "lofi beats" }
{ "type": "swipe", "requestId": "uuid", "x1": 540, "y1": 1600, "x2": 540, "y2": 400 }
{ "type": "enter", "requestId": "uuid" }
{ "type": "back", "requestId": "uuid" }
{ "type": "home", "requestId": "uuid" }
{ "type": "launch", "requestId": "uuid", "packageName": "com.google.android.youtube" }
// ... remaining actions follow same pattern
// Heartbeat
{ "type": "ping" }
// Goal lifecycle
{ "type": "goal_started", "sessionId": "uuid", "goal": "..." }
{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 }
```
### Server -> Dashboard (live step stream)
```json
// Device status
{ "type": "device_online", "deviceId": "uuid", "name": "Pixel 8" }
{ "type": "device_offline", "deviceId": "uuid" }
// Step stream
{ "type": "step", "sessionId": "uuid", "step": 3, "action": {}, "reasoning": "...", "screenHash": "..." }
{ "type": "goal_started", "sessionId": "uuid", "goal": "...", "deviceId": "uuid" }
{ "type": "goal_completed", "sessionId": "uuid", "success": true, "stepsUsed": 12 }
```
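Each `step` message carries a `screenHash`, which is the natural input for the stuck-loop detection planned in `stuck.ts`. The following is one possible approach, not the project's actual algorithm: `StuckDetector` is a hypothetical name and the threshold of 3 identical consecutive hashes is an assumption.

```typescript
// Flags a stuck loop when the same screenHash is observed `threshold` times in a row.
class StuckDetector {
  private readonly threshold: number;
  private lastHash: string | null = null;
  private repeats = 0;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Feed the screenHash of each step; returns true once the screen has
  // stayed unchanged for `threshold` consecutive observations.
  observe(screenHash: string): boolean {
    if (screenHash === this.lastHash) {
      this.repeats += 1;
    } else {
      this.lastHash = screenHash;
      this.repeats = 1;
    }
    return this.repeats >= this.threshold;
  }
}
```

This only catches a screen that does not change at all; an agent bouncing between two screens would need a check over a window of recent hashes instead.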
---
## Shared Types (`packages/shared/`)
```
packages/shared/
├── src/
│ ├── types.ts # UIElement, Bounds, Point
│ ├── commands.ts # Command, CommandResult type unions
│ ├── actions.ts # ActionDecision type (all 22 actions)
│ └── protocol.ts # WebSocket message types
├── package.json # name: "@droidclaw/shared"
└── tsconfig.json
```
Replaces duplicated types across src/, server/, web/. Android app mirrors in Kotlin via @Serializable data classes.
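As an illustration of what `types.ts` might hold, here is a sketch of the three named types plus two helpers. The field lists are assumptions (the design names `UIElement`, `Bounds`, and `Point` without defining them), and `centerOf` and `elementAt` are hypothetical helpers.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Assumed minimal field set; the real type likely carries more state flags.
interface UIElement {
  id: string;
  text: string;
  bounds: Bounds;
  center: Point;
}

// Center of a bounding box, rounded to whole pixels.
function centerOf(b: Bounds): Point {
  return { x: Math.round((b.left + b.right) / 2), y: Math.round((b.top + b.bottom) / 2) };
}

// Finds the smallest element whose bounds contain the point, which is one way
// a coordinate-based tap could be mapped back to a node.
function elementAt(elements: UIElement[], p: Point): UIElement | undefined {
  const area = (b: Bounds): number => (b.right - b.left) * (b.bottom - b.top);
  return elements
    .filter((e) => p.x >= e.bounds.left && p.x < e.bounds.right && p.y >= e.bounds.top && p.y < e.bounds.bottom)
    .sort((a, b) => area(a.bounds) - area(b.bounds))[0];
}
```

Preferring the smallest containing element avoids resolving every tap to the root container.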
---
## Android App (future, plan only)
```
android/
├── app/src/main/kotlin/ai/droidclaw/companion/
│ ├── DroidClawApp.kt
│ ├── MainActivity.kt # API key input, setup checklist, status
│ ├── accessibility/
│ │ ├── DroidClawAccessibilityService.kt
│ │ ├── ScreenTreeBuilder.kt
│ │ └── GestureExecutor.kt
│ ├── capture/
│ │ └── ScreenCaptureService.kt
│ ├── connection/
│ │ ├── ConnectionService.kt # Foreground service
│ │ ├── ReliableWebSocket.kt # Reconnect, heartbeat, message queue
│ │ └── CommandRouter.kt
│ └── model/
│ ├── UIElement.kt # Mirrors @droidclaw/shared types
│ ├── Command.kt
│ └── DeviceInfo.kt
├── build.gradle.kts
└── AndroidManifest.xml
```
Follows OPTION1-IMPLEMENTATION.md structure. Not building now, but server protocol is designed for it.
---
## Deployment (Railway)
| Service | Source | Port | Notes |
|---|---|---|---|
| web | `web/` | 3000 | SvelteKit + node adapter |
| server | `server/` | 8080 | Hono + Bun.serve |
| postgres | Railway managed | 5432 | Shared by both services |
Both services get the same `DATABASE_URL`. Web calls Hono via Railway internal networking for REST. Browser connects directly to Hono's public URL for WebSocket.
---
## Data Flow
```
USER (browser) HONO SERVER PHONE (Android app)
| | |
| signs in (SvelteKit) | |
| creates API key | |
| | |
| | { type: "auth", apiKey: "dc_xxx" }
| |<------------------------------|
| | { type: "auth_ok" } |
| |------------------------------>|
| | |
| POST /goals | |
| "open youtube, search lofi" | |
|------------------------------>| |
| | { type: "get_screen" } |
| |------------------------------>|
| | |
| | { type: "screen", elements } |
| |<------------------------------|
| | |
| | LLM: "launch youtube" |
| | |
| { type: "step", action } | { type: "launch", pkg } |
|<------------------------------|------------------------------>|
| | |
| | { success: true } |
| |<------------------------------|
| | |
| ... repeat until done ... | |
| | |
| { type: "goal_completed" } | { type: "goal_completed" } |
|<------------------------------|------------------------------>|
```