10x improvement: vision, multi-turn memory, planning, streaming, smart filtering, logging

- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting
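
The smart element filtering pass (deduplicate by position, score by relevance, compact to a cap) can be sketched roughly as below. The `UiElement` shape, the scoring weights, and the `filterElements` name are illustrative assumptions, not the repo's actual types:

```typescript
// Sketch of a smart-element-filtering step: dedupe UI elements that share
// screen coordinates, score the survivors, and keep only the top N.
interface UiElement {
  text: string;
  bounds: { x: number; y: number };
  clickable: boolean;
}

function filterElements(elements: UiElement[], maxElements = 40): UiElement[] {
  // Deduplicate: keep the first element seen at each (x, y) position.
  const seen = new Set<string>();
  const deduped = elements.filter((el) => {
    const key = `${el.bounds.x},${el.bounds.y}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  // Score: clickable elements and elements with visible text rank higher.
  const score = (el: UiElement) =>
    (el.clickable ? 2 : 0) + (el.text.trim() ? 1 : 0);

  // Rank by score and compact to the configured MAX_ELEMENTS cap.
  return deduped.sort((a, b) => score(b) - score(a)).slice(0, maxElements);
}
```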

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Sanju Sivalingam
Date:   2026-02-06 10:32:58 +05:30
parent a23c94ddd6
commit 610fd04818

9 changed files with 1512 additions and 215 deletions


@@ -11,9 +11,32 @@ MAX_RETRIES=3 # Retries on ADB/network failures
 STUCK_THRESHOLD=3 # Steps before stuck-loop recovery kicks in
 # ===========================================
-# Vision Fallback (when accessibility tree is empty)
+# Vision Mode
 # ===========================================
-VISION_ENABLED=true # Auto-capture screenshot when UI elements not found
+# "off" — never capture screenshots
+# "fallback" — only when accessibility tree is empty (default)
+# "always" — send screenshot every step (uses more tokens, best accuracy)
+VISION_MODE=fallback
+# ===========================================
+# Smart Element Filtering
+# ===========================================
+MAX_ELEMENTS=40 # Max UI elements sent to LLM (scored & ranked)
+# ===========================================
+# Session Logging
+# ===========================================
+LOG_DIR=logs # Directory for session JSON logs
+# ===========================================
+# Multi-turn Memory
+# ===========================================
+MAX_HISTORY_STEPS=10 # How many past steps to keep in conversation context
+# ===========================================
+# Streaming Responses
+# ===========================================
+STREAMING_ENABLED=true # Stream LLM responses (shows progress dots)
 # ===========================================
 # LLM Provider: "groq", "openai", "bedrock", or "openrouter"
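
The three `VISION_MODE` values above reduce to a simple gate on whether a screenshot gets captured each step. A minimal sketch, assuming hypothetical names (`VisionMode`, `shouldCaptureScreenshot`) that are not necessarily what the code uses:

```typescript
// Decide whether to attach a base64 screenshot this step, based on the
// configured VISION_MODE and whether the accessibility tree came back empty.
type VisionMode = "off" | "fallback" | "always";

function shouldCaptureScreenshot(mode: VisionMode, treeEmpty: boolean): boolean {
  switch (mode) {
    case "off":
      return false; // never send screenshots
    case "fallback":
      return treeEmpty; // only when no UI elements were found (default)
    case "always":
      return true; // every step: more tokens, best accuracy
  }
}
```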
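
Likewise, `MAX_HISTORY_STEPS` implies a trimming rule for the multi-turn conversation memory: keep the system prompt, drop the oldest exchanges. A sketch under the assumption that one step is one user/assistant pair (the `ChatMessage` type and `trimHistory` helper are illustrative):

```typescript
// Trim conversation history: always retain system messages, then keep only
// the most recent MAX_HISTORY_STEPS user/assistant exchanges.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function trimHistory(messages: ChatMessage[], maxSteps = 10): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // One step = one user message + one assistant reply, so 2 * maxSteps lines.
  return [...system, ...rest.slice(-2 * maxSteps)];
}
```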