10x improvement: vision, multi-turn memory, planning, streaming, smart filtering, logging

- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting
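
The smart element filtering pass (deduplicate by position, score by relevance, compact to a cap) can be sketched roughly as below. The `UiElement` shape, the scoring weights, and the `filterElements` name are illustrative assumptions, not the repo's actual types:

```typescript
// Sketch of a smart-element-filtering step: dedupe UI elements that share
// screen coordinates, score the survivors, and keep only the top N.
interface UiElement {
  text: string;
  bounds: { x: number; y: number };
  clickable: boolean;
}

function filterElements(elements: UiElement[], maxElements = 40): UiElement[] {
  // Deduplicate: keep the first element seen at each (x, y) position.
  const seen = new Set<string>();
  const deduped = elements.filter((el) => {
    const key = `${el.bounds.x},${el.bounds.y}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  // Score: clickable elements and elements with visible text rank higher.
  const score = (el: UiElement) =>
    (el.clickable ? 2 : 0) + (el.text.trim() ? 1 : 0);

  // Rank by score and compact to the configured MAX_ELEMENTS cap.
  return deduped.sort((a, b) => score(b) - score(a)).slice(0, maxElements);
}
```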

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Sanju Sivalingam
Date:   2026-02-06 10:32:58 +05:30
parent a23c94ddd6
commit 610fd04818

9 changed files with 1512 additions and 215 deletions


@@ -11,9 +11,32 @@ MAX_RETRIES=3 # Retries on ADB/network failures
 STUCK_THRESHOLD=3 # Steps before stuck-loop recovery kicks in
 # ===========================================
-# Vision Fallback (when accessibility tree is empty)
+# Vision Mode
 # ===========================================
-VISION_ENABLED=true # Auto-capture screenshot when UI elements not found
+# "off" — never capture screenshots
+# "fallback" — only when accessibility tree is empty (default)
+# "always" — send screenshot every step (uses more tokens, best accuracy)
+VISION_MODE=fallback
+# ===========================================
+# Smart Element Filtering
+# ===========================================
+MAX_ELEMENTS=40 # Max UI elements sent to LLM (scored & ranked)
+# ===========================================
+# Session Logging
+# ===========================================
+LOG_DIR=logs # Directory for session JSON logs
+# ===========================================
+# Multi-turn Memory
+# ===========================================
+MAX_HISTORY_STEPS=10 # How many past steps to keep in conversation context
+# ===========================================
+# Streaming Responses
+# ===========================================
+STREAMING_ENABLED=true # Stream LLM responses (shows progress dots)
 # ===========================================
 # LLM Provider: "groq", "openai", "bedrock", or "openrouter"
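
The three `VISION_MODE` values above reduce to a simple gate on whether a screenshot gets captured each step. A minimal sketch, assuming hypothetical names (`VisionMode`, `shouldCaptureScreenshot`) that are not necessarily what the code uses:

```typescript
// Decide whether to attach a base64 screenshot this step, based on the
// configured VISION_MODE and whether the accessibility tree came back empty.
type VisionMode = "off" | "fallback" | "always";

function shouldCaptureScreenshot(mode: VisionMode, treeEmpty: boolean): boolean {
  switch (mode) {
    case "off":
      return false; // never send screenshots
    case "fallback":
      return treeEmpty; // only when no UI elements were found (default)
    case "always":
      return true; // every step: more tokens, best accuracy
  }
}
```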
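
Likewise, `MAX_HISTORY_STEPS` implies a trimming rule for the multi-turn conversation memory: keep the system prompt, drop the oldest exchanges. A sketch under the assumption that one step is one user/assistant pair (the `ChatMessage` type and `trimHistory` helper are illustrative):

```typescript
// Trim conversation history: always retain system messages, then keep only
// the most recent MAX_HISTORY_STEPS user/assistant exchanges.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function trimHistory(messages: ChatMessage[], maxSteps = 10): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // One step = one user message + one assistant reply, so 2 * maxSteps lines.
  return [...system, ...rest.slice(-2 * maxSteps)];
}
```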