Sanju Sivalingam 610fd04818 10x improvement: vision, multi-turn memory, planning, streaming, smart filtering, logging
- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting
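The resolution-aware swipe computation from the first bullet could look roughly like this. A minimal sketch, assuming the output format of `adb shell wm size`; the function names and the 0.25/0.75 screen fractions are illustrative, not the repo's actual code:

```typescript
// Parse the output of `adb shell wm size`, e.g. "Physical size: 1080x2400".
function parseWmSize(out: string): { width: number; height: number } {
  const m = out.match(/(\d+)x(\d+)/);
  if (!m) throw new Error(`unexpected wm size output: ${out}`);
  return { width: Number(m[1]), height: Number(m[2]) };
}

// Compute swipe endpoints as fractions of the screen, so gestures scale
// with any device resolution instead of hard-coding pixel coordinates.
function swipeCoords(
  dir: "up" | "down",
  w: number,
  h: number,
): [number, number, number, number] {
  const x = Math.round(w / 2);
  const [y1, y2] =
    dir === "up"
      ? [Math.round(h * 0.75), Math.round(h * 0.25)] // finger drags upward
      : [Math.round(h * 0.25), Math.round(h * 0.75)];
  return [x, y1, x, y2];
}

const { width, height } = parseWmSize("Physical size: 1080x2400");
const [x1, y1, x2, y2] = swipeCoords("up", width, height);
// Would be executed as: adb shell input swipe <x1> <y1> <x2> <y2> <durationMs>
console.log(x1, y1, x2, y2); // 540 1800 540 600
```

The fraction-based endpoints are the point of the change: the same "swipe up" works on a 720p phone and a 1440p tablet.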

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:32:58 +05:30
Description
Turn an old phone into an AI agent: give it a goal in plain English. It reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
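The read/think/act loop in the description boils down to mapping each model decision onto an `adb shell input` command. A hedged sketch, assuming a simplified action schema (the real decision format, per the commit message, also carries think/plan/planProgress fields):

```typescript
// Hypothetical per-step decision returned by the LLM.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number }
  | { kind: "done" };

// Translate one decision into the `adb shell` command the agent would run.
// Returns null for "done", which terminates the loop.
function toAdbCommand(a: Action): string | null {
  switch (a.kind) {
    case "tap":
      return `input tap ${a.x} ${a.y}`;
    case "type":
      // `input text` requires spaces encoded as %s.
      return `input text ${a.text.replace(/ /g, "%s")}`;
    case "swipe":
      return `input swipe ${a.x1} ${a.y1} ${a.x2} ${a.y2} 300`;
    case "done":
      return null;
  }
}

// Loop outline (screen capture and LLM call elided):
//   while (true) {
//     const screen = captureScreen();          // screenshot + UI dump
//     const action = await decide(goal, screen); // LLM with chat history
//     const cmd = toAdbCommand(action);
//     if (cmd === null) break;                 // goal reached
//     adbShell(cmd);
//   }
```

Keeping the mapping in one pure function makes it easy to unit-test the action vocabulary without a device attached.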
2 MiB
Languages
TypeScript 48.1%
Kotlin 32.8%
HTML 9.1%
Svelte 8.6%
Shell 1.0%
Other 0.3%