Commit Graph

24 Commits

Author SHA1 Message Date
Sanju Sivalingam
8ef15af97a debug: add logging to device auth for hash mismatch investigation 2026-02-18 12:27:56 +05:30
Sanju Sivalingam
05d1cc657d fix code 2026-02-18 12:01:23 +05:30
Sanju Sivalingam
d03be7365e debug: add logging to session middleware for auth investigation 2026-02-18 11:59:59 +05:30
Sanju Sivalingam
68ca812267 revert(server): use direct DB queries for all auth validation
Reverts middleware and dashboard WS to direct DB session lookups.
Replaces auth.api.verifyApiKey in device WS with direct DB query
using SHA-256 hash matching, removing dependency on BETTER_AUTH_SECRET
for auth validation.
2026-02-18 11:46:48 +05:30
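A minimal sketch of the SHA-256 hash matching this commit describes, assuming the stored hash is lowercase hex and that rows are fetched by a direct DB query (replaced here by an in-memory array). `ApiKeyRow`, `hashApiKey`, and `findKeyOwner` are hypothetical names, not taken from the repository.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical row shape; the real schema is not shown in the commit.
interface ApiKeyRow {
  userId: string;
  keyHash: string; // assumption: lowercase hex SHA-256 of the raw key
}

// Hash the raw key presented by the device (hex encoding is an assumption).
function hashApiKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

// Stand-in for the direct DB query: return the owner of the matching hash.
function findKeyOwner(rawKey: string, rows: ApiKeyRow[]): string | null {
  const presented = Buffer.from(hashApiKey(rawKey), "hex");
  for (const row of rows) {
    const stored = Buffer.from(row.keyHash, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    if (stored.length === presented.length && timingSafeEqual(stored, presented)) {
      return row.userId;
    }
  }
  return null;
}
```

Because only a hash comparison is involved, nothing in this sketch needs a server-side secret, which is consistent with the commit's stated goal of removing the BETTER_AUTH_SECRET dependency from validation.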
Sanju Sivalingam
a1ec1ac731 fix(agent): use device screen dimensions for scroll/swipe coordinates
Swipe coordinates were hardcoded for 1080x2400 screens, causing scrolls
to fail on devices with different resolutions. Now reads screenWidth and
screenHeight from DeviceInfo and computes coordinates proportionally.
2026-02-18 10:48:37 +05:30
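The proportional computation described in this commit might look roughly like the following. `DeviceInfo`, `screenWidth`, and `screenHeight` come from the commit message; the `Swipe` type, the function name, and the 0.5/0.7/0.3 fractions are assumptions chosen for illustration.

```typescript
interface DeviceInfo {
  screenWidth: number;
  screenHeight: number;
}

// Hypothetical swipe shape: start and end points in device pixels.
interface Swipe {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Scroll down by swiping up along the vertical centre line.
// The fractions (0.5, 0.7, 0.3) are illustrative, not the project's values.
function scrollDownSwipe(info: DeviceInfo): Swipe {
  const x = Math.round(info.screenWidth * 0.5);
  return {
    x1: x,
    y1: Math.round(info.screenHeight * 0.7),
    x2: x,
    y2: Math.round(info.screenHeight * 0.3),
  };
}
```

With these fractions a 1080x2400 device swipes from (540, 1680) to (540, 720), while a 720x1280 device swipes from (360, 896) to (360, 384), so the gesture stays on screen at either resolution.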
Sanju Sivalingam
81d78684a5 refactor: use better-auth api for session validation in server middleware and websocket 2026-02-18 10:38:15 +05:30
Sanju Sivalingam
792b42974f feat(agent): implement server-side multi-step skills
Skills (copy_visible_text, find_and_tap, submit_message, read_screen,
wait_for_content, compose_email) were CLI-only using direct ADB. The
server prompt advertised them but they silently failed when chosen.

Now intercepted in the agent loop before actionToCommand() and executed
server-side using existing WebSocket primitives (get_screen, tap, swipe,
clipboard_set). Each skill replaces 3-8 LLM calls with deterministic
server-side logic.
2026-02-18 00:58:59 +05:30
Sanju Sivalingam
db995e4913 fix(agent): prevent stuck loop by adding action history to LLM prompt
The UI agent had no memory of previous actions — each step was a fresh
single-shot LLM call. After typing and sending a message, the LLM saw
an empty text field and retyped the message in a loop.

- Add RECENT_ACTIONS (last 5 actions with text/result) to user prompt
- Add chat app completion detection rule to dynamic prompt
- Add send-success hints for WhatsApp and Messages apps
- Add git convention to CLAUDE.md (no co-author lines)
2026-02-18 00:53:13 +05:30
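As a rough illustration of the RECENT_ACTIONS idea, the sketch below keeps the last five actions and renders them as a prompt section. The `ActionRecord` shape, the function name, and the line format are assumptions; only the section name and the limit of five come from the commit message.

```typescript
// Hypothetical record of one executed step.
interface ActionRecord {
  action: string;
  text?: string; // e.g. the text that was typed, if any
  result: string;
}

const MAX_RECENT_ACTIONS = 5;

// Render the most recent actions as a block for the user prompt.
function formatRecentActions(history: ActionRecord[]): string {
  const recent = history.slice(-MAX_RECENT_ACTIONS);
  if (recent.length === 0) return "RECENT_ACTIONS: (none)";
  const lines = recent.map((a, i) => {
    const text = a.text !== undefined ? ` text=${JSON.stringify(a.text)}` : "";
    return `${i + 1}. ${a.action}${text} -> ${a.result}`;
  });
  return ["RECENT_ACTIONS:", ...lines].join("\n");
}
```

Including the typed text and the result lets the model see that a message was already typed and sent, which is the situation the commit says caused the retyping loop.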
Sanju Sivalingam
9193b02d36 fix(agent): address code review issues
- Add empty goal guard in parser (returns done instead of passthrough)
- Replace `as any` casts in pipeline.ts with proper ActionDecision types
- Add runtime type guards for untrusted LLM output in classifier
- Add intent action to dynamic prompt so UI agent can fire intents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 00:32:14 +05:30
Sanju Sivalingam
3769b21ed1 refactor(agent): delete preprocessor.ts (replaced by parser.ts) 2026-02-18 00:28:50 +05:30
Sanju Sivalingam
d5c3466554 feat(agent): wire intent-first pipeline into all entrypoints
Replace preprocessor+runAgentLoop with runPipeline in both device.ts
(WebSocket) and goals.ts (REST). The pipeline orchestrates: deterministic
parser (stage 1) -> LLM classifier (stage 2) -> lean UI agent (stage 3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 00:28:13 +05:30
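The three-stage ordering can be sketched as a simple fall-through, under the assumption that each earlier stage either resolves the goal or passes it on. The `Stages` interface, `Outcome` type, and `runPipelineSketch` are hypothetical; the real `runPipeline` signature is not shown in the log, and real stages would presumably be asynchronous.

```typescript
// Which stage resolved the goal, and what it produced (hypothetical shape).
interface Outcome {
  stage: 1 | 2 | 3;
  result: string;
}

// Assumption: stages 1 and 2 return null to pass the goal onward.
interface Stages {
  parse: (goal: string) => string | null; // stage 1: deterministic parser
  classify: (goal: string) => string | null; // stage 2: LLM classifier in the real system
  runUiAgent: (goal: string) => string; // stage 3: lean UI agent
}

// Try the cheapest stage first and fall through only when it declines.
function runPipelineSketch(goal: string, stages: Stages): Outcome {
  const parsed = stages.parse(goal);
  if (parsed !== null) return { stage: 1, result: parsed };

  const classified = stages.classify(goal);
  if (classified !== null) return { stage: 2, result: classified };

  return { stage: 3, result: stages.runUiAgent(goal) };
}
```

Ordering the stages this way means a goal that the parser can resolve never incurs an LLM call, and the UI agent only runs when neither earlier stage handles the goal.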
Sanju Sivalingam
18b8509081 feat(agent): add pipeline mode with dynamic prompts to agent loop
When pipelineMode is enabled in AgentLoopOptions, the loop uses
buildDynamicPrompt() with per-screen context (editable fields,
scrollable elements, app hints, stuck state) instead of the static
mega-prompt. Legacy mode (default) is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 00:24:25 +05:30
Sanju Sivalingam
3f389c5de6 feat(agent): add dynamic prompt builder for Stage 3 UI agent 2026-02-18 00:22:24 +05:30
Sanju Sivalingam
91a828452b feat(agent): add Stage 2 LLM goal classifier 2026-02-18 00:15:56 +05:30
Sanju Sivalingam
5dd199e0b8 feat(agent): add Stage 1 deterministic goal parser 2026-02-18 00:09:15 +05:30
Sanju Sivalingam
122bf87e72 feat(agent): add app-specific hints registry 2026-02-18 00:07:24 +05:30
Sanju Sivalingam
e300f04e13 feat: installed apps, stop goal, auth fixes, remote commands
- Android: fetch installed apps via PackageManager, send to server on connect
- Android: add QUERY_ALL_PACKAGES permission for full app visibility
- Android: fix duplicate Intent import, increase accessibility retry window
- Android: default server URL to ws:// instead of wss://
- Server: store installed apps in device metadata JSONB
- Server: inject installed apps context into LLM prompt
- Server: preprocessor resolves app names from device's actual installed apps
- Server: add POST /goals/stop endpoint with AbortController cancellation
- Server: rewrite session middleware to direct DB token lookup
- Server: goals route fetches user's saved LLM config from DB
- Web: show installed apps in device detail Overview tab with search
- Web: add Stop button for running goals
- Web: replace API routes with remote commands (submitGoal, stopGoal)
- Web: add error display for goal submission failures
- Shared: add InstalledApp type and apps message to protocol
2026-02-17 22:50:18 +05:30
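One way the AbortController cancellation behind POST /goals/stop could work is sketched below: a registry maps each device to the controller of its running goal, and the loop checks the signal between steps. The registry, the function names, and the synchronous loop are assumptions for illustration; only the use of AbortController is stated in the commit.

```typescript
// Hypothetical registry: one AbortController per device with a running goal.
const runningGoals = new Map<string, AbortController>();

// Register a goal and hand its signal to the agent loop.
function startGoal(deviceId: string): AbortSignal {
  if (runningGoals.has(deviceId)) {
    throw new Error(`goal already running on ${deviceId}`);
  }
  const controller = new AbortController();
  runningGoals.set(deviceId, controller);
  return controller.signal;
}

// What a stop handler might call: abort and forget the controller.
function stopGoal(deviceId: string): boolean {
  const controller = runningGoals.get(deviceId);
  if (!controller) return false;
  controller.abort();
  runningGoals.delete(deviceId);
  return true;
}

// Simplified loop: checks the signal before every step.
function runSteps(signal: AbortSignal, maxSteps: number, onStep: (n: number) => void): number {
  let steps = 0;
  while (steps < maxSteps && !signal.aborted) {
    steps++;
    onStep(steps);
  }
  return steps;
}
```

Checking `signal.aborted` between steps stops the loop at the next step boundary rather than mid-action; the same signal could also be passed to `fetch` so that an in-flight LLM request is cancelled.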
Sanju Sivalingam
fae5fd3534 fix: goals route now finds devices by persistent DB ID, not connection UUID 2026-02-17 21:22:43 +05:30
Sanju Sivalingam
bf92ff4742 feat: handle heartbeat messages, update battery in DB + dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 21:01:06 +05:30
Sanju Sivalingam
c395f9d83e feat: add DB persistence, real-time WebSocket, goal preprocessor, and Android companion app
- Add device/session/step DB persistence in server agent loop
- Add goal preprocessor for compound goals (e.g., "open YouTube and search X")
- Add step-level logging to agent loop
- Fix dashboard WebSocket auth (direct DB token lookup instead of auth.api)
- Fix web layout to use locals.session.token instead of cookie
- Add dashboard-ws.svelte.ts WebSocket store with auto-reconnect
- Rewrite devices page with direct DB queries and real-time updates
- Add device detail page with live step display and session history
- Add Android companion app resources, themes, and screen capture consent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 20:12:41 +05:30
Sanju Sivalingam
4c8241c964 feat: add agent loop with LLM integration and stuck detection
Server-side agent loop that adapts the CLI kernel to work over WebSocket.
Three new modules: stuck detection, LLM provider abstraction (OpenAI/Groq/
OpenRouter), and the main perception-reasoning-action loop. Also wires up
the goals route to start agent loops with duplicate-device protection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:27:26 +05:30
Sanju Sivalingam
577c195862 feat: add REST routes for devices, goals, and health
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:21:11 +05:30
Sanju Sivalingam
8fe3ad9926 feat: add WebSocket handlers for device and dashboard connections
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:17:29 +05:30
Sanju Sivalingam
bc014fd587 feat: scaffold Hono server with auth and health check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:07:19 +05:30