# Android Action Kernel Configuration (TypeScript/Bun)
# Copy this file to .env and fill in your settings
# cp .env.example .env

# ===========================================
# Agent Configuration
# ===========================================
MAX_STEPS=30 # Maximum steps before stopping (30 for complex multi-app tasks)
STEP_DELAY=2 # Seconds to wait between steps
MAX_RETRIES=3 # Retries on ADB/network failures
STUCK_THRESHOLD=3 # Steps before stuck-loop recovery kicks in

# ===========================================
# Vision Mode
# ===========================================
# "off": never capture screenshots
# "fallback": only when the accessibility tree is empty (default)
# "always": send a screenshot every step (uses more tokens, best accuracy)
VISION_MODE=fallback
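# Note (assumption: the selected model must accept image input): when
# screenshots are sent ("fallback" or "always"), pair this setting with a
# multimodal model, e.g. OPENAI_MODEL=gpt-4o or OPENROUTER_MODEL=openai/gpt-4o.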
# ===========================================
# Smart Element Filtering
# ===========================================
MAX_ELEMENTS=40 # Max UI elements sent to the LLM (scored & ranked)

# ===========================================
# Session Logging
# ===========================================
LOG_DIR=logs # Directory for session JSON logs

# ===========================================
# Multi-turn Memory
# ===========================================
MAX_HISTORY_STEPS=10 # How many past steps to keep in conversation context

# ===========================================
# Streaming Responses
# ===========================================
STREAMING_ENABLED=true # Stream LLM responses (shows progress dots)

# ===========================================
# LLM Provider: "groq", "openai", "bedrock", or "openrouter"
# ===========================================
LLM_PROVIDER=groq

# ===========================================
# Groq Configuration (Free tier available)
# Get your key at: https://console.groq.com
# ===========================================
GROQ_API_KEY=gsk_your_key_here
GROQ_MODEL=llama-3.3-70b-versatile
# Other models: llama-3.1-8b-instant (faster, higher rate limits)

# ===========================================
# OpenAI Configuration
# Get your key at: https://platform.openai.com
# ===========================================
OPENAI_API_KEY=sk-your_key_here
OPENAI_MODEL=gpt-4o
# Other models: gpt-4o-mini (faster, cheaper)

# ===========================================
# AWS Bedrock Configuration
# Uses the default AWS credential chain (run 'aws configure' first)
# ===========================================
AWS_REGION=us-east-1
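# Alternatively, set credentials here instead of running 'aws configure'.
# Assumes the app uses the standard AWS SDK default credential chain, which
# reads these environment variables. Uncomment either AWS_PROFILE or the
# key pair:
# AWS_PROFILE=default
# AWS_ACCESS_KEY_ID=your_access_key_id
# AWS_SECRET_ACCESS_KEY=your_secret_access_key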
BEDROCK_MODEL=us.meta.llama3-3-70b-instruct-v1:0
# Other models:
# anthropic.claude-3-sonnet-20240229-v1:0
# anthropic.claude-3-haiku-20240307-v1:0
# meta.llama3-8b-instruct-v1:0

# ===========================================
# OpenRouter Configuration (via Vercel AI SDK)
# Access 200+ models through a single API
# Get your key at: https://openrouter.ai/keys
# ===========================================
OPENROUTER_API_KEY=sk-or-v1-your_key_here
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
# Popular models:
# anthropic/claude-3.5-sonnet (best reasoning)
# openai/gpt-4o (multimodal)
# google/gemini-2.0-flash-001 (fast + cheap)
# meta-llama/llama-3.3-70b-instruct (open source)
# mistralai/mistral-large (European provider)
# deepseek/deepseek-chat (cost efficient)