10x improvement: vision, multi-turn memory, planning, streaming, smart filtering, logging
- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
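The "dynamic swipe coordinates" bullet can be sketched roughly as follows: parse the device resolution from `adb shell wm size` and express swipe endpoints as fractions of the screen instead of hard-coded pixels. Function names here are illustrative, not the project's actual code:

```python
import re

def parse_screen_size(wm_size_output: str) -> tuple[int, int]:
    """Parse `adb shell wm size` output, e.g. 'Physical size: 1080x2400'."""
    m = re.search(r"(\d+)x(\d+)", wm_size_output)
    if not m:
        raise ValueError(f"unexpected wm size output: {wm_size_output!r}")
    return int(m.group(1)), int(m.group(2))

def swipe_coords(width: int, height: int, direction: str) -> tuple[int, int, int, int]:
    """Compute swipe endpoints as fractions of the screen, so gestures
    scale to any resolution (integer math avoids float rounding)."""
    cx = width // 2
    if direction == "up":    # start at 70% of height, end at 30%
        return cx, height * 7 // 10, cx, height * 3 // 10
    if direction == "down":
        return cx, height * 3 // 10, cx, height * 7 // 10
    raise ValueError(f"unsupported direction: {direction}")
```

The resulting tuple maps directly onto `adb shell input swipe x1 y1 x2 y2`.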
@@ -11,9 +11,32 @@ MAX_RETRIES=3 # Retries on ADB/network failures
STUCK_THRESHOLD=3 # Steps before stuck-loop recovery kicks in
# ===========================================
# Vision Mode
# ===========================================
VISION_ENABLED=true    # Auto-capture a screenshot when UI elements are not found
# "off"      — never capture screenshots
# "fallback" — only when the accessibility tree is empty (default)
# "always"   — send a screenshot every step (uses more tokens, best accuracy)
VISION_MODE=fallback
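The three vision modes described in the comments above reduce to a small decision function. This is a minimal sketch of that logic (the function name is illustrative, not the project's actual code):

```python
def should_send_screenshot(mode: str, tree_empty: bool) -> bool:
    """Decide whether to attach a screenshot to the LLM request,
    based on VISION_MODE and the state of the accessibility tree."""
    if mode == "off":
        return False        # never capture
    if mode == "always":
        return True         # every step, regardless of the tree
    # "fallback" (default): only when the accessibility tree is empty
    return tree_empty
```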
# ===========================================
# Smart Element Filtering
# ===========================================
MAX_ELEMENTS=40        # Max UI elements sent to LLM (scored & ranked)
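Per the commit message, filtering deduplicates by position, scores by relevance, and caps the result at MAX_ELEMENTS. A sketch of that pipeline, with an assumed scoring heuristic (clickable and labeled elements rank higher); the element shape and weights are illustrative:

```python
def filter_elements(elements: list[dict], max_elements: int = 40) -> list[dict]:
    """Deduplicate UI elements by (x, y) position, rank by a relevance
    score, and keep only the top max_elements for the LLM prompt."""
    def score(el: dict) -> int:
        s = 0
        if el.get("clickable"):
            s += 2          # actionable elements matter most
        if el.get("text"):
            s += 1          # labeled elements are easier to reference
        return s

    seen: set[tuple] = set()
    unique = []
    for el in elements:
        pos = (el.get("x"), el.get("y"))
        if pos in seen:     # drop overlapping duplicates
            continue
        seen.add(pos)
        unique.append(el)

    unique.sort(key=score, reverse=True)   # stable sort keeps tree order on ties
    return unique[:max_elements]
```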
# ===========================================
# Session Logging
# ===========================================
LOG_DIR=logs           # Directory for session JSON logs
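The commit message mentions "crash-safe .partial.json writes and final summary". One way to get that property, sketched here with illustrative names: rewrite a `.partial.json` after every step so a crash still leaves a readable log, then promote it to the final `.json` on clean exit:

```python
import json
import os

def write_step_log(log_path: str, session: dict) -> None:
    """Rewrite the .partial.json after every step; a mid-run crash
    still leaves a log of everything up to the last completed step."""
    with open(log_path + ".partial.json", "w") as f:
        json.dump(session, f, indent=2)

def finalize_log(log_path: str) -> None:
    """On clean exit, atomically promote the partial file to the
    final .json log (os.replace is atomic on POSIX)."""
    os.replace(log_path + ".partial.json", log_path + ".json")
```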
# ===========================================
# Multi-turn Memory
# ===========================================
MAX_HISTORY_STEPS=10   # How many past steps to keep in conversation context
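The commit message describes "full chat history across steps with trimming". A minimal sketch of such a trim, assuming the common chat layout of one system message followed by one user/assistant pair per step (that layout is an assumption, not confirmed by the source):

```python
def trim_history(messages: list[dict], max_steps: int = 10) -> list[dict]:
    """Keep the system prompt plus the last max_steps exchanges.
    Assumes one user/assistant pair per step (2 messages each)."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-(max_steps * 2):]
```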
# ===========================================
# Streaming Responses
# ===========================================
STREAMING_ENABLED=true # Stream LLM responses (shows progress dots)
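The "progress dots" behavior is provider-agnostic once each SDK's stream is reduced to an iterable of text chunks. A sketch of that consumer (illustrative, not the project's actual code):

```python
import sys

def consume_stream(chunks) -> str:
    """Accumulate streamed text chunks into the full response,
    printing one dot per chunk as a progress indicator."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        sys.stdout.write(".")
        sys.stdout.flush()
    sys.stdout.write("\n")
    return "".join(parts)
```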
# ===========================================
# LLM Provider: "groq", "openai", "bedrock", or "openrouter"