droidclaw/android-action-kernel/.env.example
Sanju Sivalingam 610fd04818 10x improvement: vision, multi-turn memory, planning, streaming, smart filtering, logging
- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 10:32:58 +05:30

# Android Action Kernel Configuration (TypeScript/Bun)
# Copy this file to .env and fill in your settings
# cp .env.example .env
# ===========================================
# Agent Configuration
# ===========================================
MAX_STEPS=30 # Maximum steps before stopping (30 for complex multi-app tasks)
STEP_DELAY=2 # Seconds to wait between steps
MAX_RETRIES=3 # Retries on ADB/network failures
STUCK_THRESHOLD=3 # Steps before stuck-loop recovery kicks in
# ===========================================
# Vision Mode
# ===========================================
# "off" — never capture screenshots
# "fallback" — only when accessibility tree is empty (default)
# "always" — send screenshot every step (uses more tokens, best accuracy)
VISION_MODE=fallback
# ===========================================
# Smart Element Filtering
# ===========================================
MAX_ELEMENTS=40 # Max UI elements sent to LLM (scored & ranked)
# ===========================================
# Session Logging
# ===========================================
LOG_DIR=logs # Directory for session JSON logs
# ===========================================
# Multi-turn Memory
# ===========================================
MAX_HISTORY_STEPS=10 # How many past steps to keep in conversation context
# ===========================================
# Streaming Responses
# ===========================================
STREAMING_ENABLED=true # Stream LLM responses (shows progress dots)
# ===========================================
# LLM Provider: "groq", "openai", "bedrock", or "openrouter"
# ===========================================
LLM_PROVIDER=groq
# ===========================================
# Groq Configuration (Free tier available)
# Get your key at: https://console.groq.com
# ===========================================
GROQ_API_KEY=gsk_your_key_here
GROQ_MODEL=llama-3.3-70b-versatile
# Other models: llama-3.1-8b-instant (faster, higher rate limits)
# ===========================================
# OpenAI Configuration
# Get your key at: https://platform.openai.com
# ===========================================
OPENAI_API_KEY=sk-your_key_here
OPENAI_MODEL=gpt-4o
# Other models: gpt-4o-mini (faster, cheaper)
# ===========================================
# AWS Bedrock Configuration
# Uses AWS credential chain (run 'aws configure' first)
# ===========================================
AWS_REGION=us-east-1
BEDROCK_MODEL=us.meta.llama3-3-70b-instruct-v1:0
# Other models:
# anthropic.claude-3-sonnet-20240229-v1:0
# anthropic.claude-3-haiku-20240307-v1:0
# meta.llama3-8b-instruct-v1:0
# ===========================================
# OpenRouter Configuration (via Vercel AI SDK)
# Access 200+ models through a single API
# Get your key at: https://openrouter.ai/keys
# ===========================================
OPENROUTER_API_KEY=sk-or-v1-your_key_here
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
# Popular models:
# anthropic/claude-3.5-sonnet (best reasoning)
# openai/gpt-4o (multimodal)
# google/gemini-2.0-flash-001 (fast + cheap)
# meta-llama/llama-3.3-70b-instruct (open source)
# mistralai/mistral-large-latest (European)
# deepseek/deepseek-chat (cost efficient)
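
The general agent settings above could be read and validated at startup along the following lines. This is a minimal sketch, not the project's actual loader: the names `AgentConfig`, `loadConfig`, `readInt`, `readEnum`, and `readBool` are hypothetical, and the fallback values simply mirror the values shown in this example file (only the `fallback` vision mode is stated to be a default in the comments above). Provider-specific keys such as `GROQ_API_KEY` are left out for brevity.

```typescript
// Hypothetical config loader for the variables in .env.example.
// All type and function names here are illustrative assumptions.
type VisionMode = "off" | "fallback" | "always";
type LlmProvider = "groq" | "openai" | "bedrock" | "openrouter";
type Env = Record<string, string | undefined>;

interface AgentConfig {
  maxSteps: number;
  stepDelaySeconds: number;
  maxRetries: number;
  stuckThreshold: number;
  visionMode: VisionMode;
  maxElements: number;
  logDir: string;
  maxHistorySteps: number;
  streamingEnabled: boolean;
  llmProvider: LlmProvider;
}

const VISION_MODES: readonly VisionMode[] = ["off", "fallback", "always"];
const LLM_PROVIDERS: readonly LlmProvider[] = ["groq", "openai", "bedrock", "openrouter"];

// Returns the trimmed value, or undefined when the key is unset or empty.
function readRaw(env: Env, key: string): string | undefined {
  const raw = env[key]?.trim();
  return raw === undefined || raw === "" ? undefined : raw;
}

// Parses a non-negative integer, falling back when the key is unset.
function readInt(env: Env, key: string, fallback: number): number {
  const raw = readRaw(env, key);
  if (raw === undefined) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Accepts only one of the allowed string values.
function readEnum<T extends string>(env: Env, key: string, allowed: readonly T[], fallback: T): T {
  const raw = readRaw(env, key);
  if (raw === undefined) return fallback;
  if (!(allowed as readonly string[]).includes(raw)) {
    throw new Error(`${key} must be one of ${allowed.join(", ")}, got "${raw}"`);
  }
  return raw as T;
}

// Treats "true" (any case) as true and any other set value as false.
function readBool(env: Env, key: string, fallback: boolean): boolean {
  const raw = readRaw(env, key);
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

function loadConfig(env: Env): AgentConfig {
  return {
    maxSteps: readInt(env, "MAX_STEPS", 30),
    stepDelaySeconds: readInt(env, "STEP_DELAY", 2),
    maxRetries: readInt(env, "MAX_RETRIES", 3),
    stuckThreshold: readInt(env, "STUCK_THRESHOLD", 3),
    visionMode: readEnum(env, "VISION_MODE", VISION_MODES, "fallback"),
    maxElements: readInt(env, "MAX_ELEMENTS", 40),
    logDir: readRaw(env, "LOG_DIR") ?? "logs",
    maxHistorySteps: readInt(env, "MAX_HISTORY_STEPS", 10),
    streamingEnabled: readBool(env, "STREAMING_ENABLED", true),
    llmProvider: readEnum(env, "LLM_PROVIDER", LLM_PROVIDERS, "groq"),
  };
}
```

Under Bun, which loads `.env` automatically, a call such as `loadConfig(process.env)` would be the natural entry point. Failing fast on an unknown `VISION_MODE` or `LLM_PROVIDER` surfaces typos at startup rather than partway through a run.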