- Auto-detect screen resolution and compute dynamic swipe coordinates - Detect foreground app each step via dumpsys activity - Smart element filtering: deduplicate by position, score by relevance, compact to essentials - Session logging with crash-safe .partial.json writes and final summary - Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes) - Multi-turn conversation memory: maintain full chat history across steps with trimming - Multi-step planning: think/plan/planProgress fields on every LLM decision - Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock) - Comprehensive README with examples, architecture docs, and troubleshooting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3.2 KiB
3.2 KiB