New --flow mode executes scripted YAML steps without an LLM, mapping 17
commands (tap, type, swipe, scroll, etc.) to existing actions. Elements
are found by matching text, hint, or id in the accessibility tree.
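
The text/hint/id matching described above could look roughly like the sketch
below. The `UiNode` shape, the `findElement` name, and the rule that exact
matches win over substring matches are assumptions made for illustration; the
actual types and matching priority in the codebase may differ.

```typescript
// Hypothetical node shape; the real accessibility-tree type may differ.
interface UiNode {
  text?: string;
  hint?: string;
  id?: string;
  center: [number, number]; // tap target in screen pixels
}

// Return the first node whose text, hint, or id matches the query.
// Assumption: exact (case-insensitive) matches win over substring matches.
function findElement(nodes: UiNode[], query: string): UiNode | undefined {
  const q = query.trim().toLowerCase();
  if (q === "") return undefined;

  // Collect the searchable fields that are actually present on a node.
  const fields = (n: UiNode): string[] =>
    [n.text, n.hint, n.id]
      .filter((v): v is string => typeof v === "string")
      .map((v) => v.toLowerCase());

  return (
    nodes.find((n) => fields(n).some((f) => f === q)) ??
    nodes.find((n) => fields(n).some((f) => f.includes(q)))
  );
}
```

With this ordering, a step that targets "Sign in" would pick a button labelled
exactly "Sign in" even if a "Sign in later" link appears earlier in the tree.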
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New actions: open_url, switch_app, notifications, pull_file, push_file, keyevent, open_settings
- Workflow system: runWorkflow() for multi-app sub-goal sequences with --workflow CLI flag
- Export runAgent() with {success, stepsUsed} return for workflow integration
- Fix clipboard_set shell escaping (single-quote wrapping, matching skills.ts)
- Improve type action escaping for backticks, $, !, ?, brackets, braces
- Move parseJsonResponse to llm-providers.ts and export it
- Update SYSTEM_PROMPT and Zod schema for 22 total actions
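
For reference, single-quote wrapping as mentioned in the clipboard_set fix is
commonly done as in the sketch below. The `shellQuote` name is hypothetical,
and this is the generic POSIX-shell technique rather than a copy of what
skills.ts does.

```typescript
// Wrap a string in single quotes for a POSIX shell. Inside single quotes
// nothing is expanded, so $, backticks, and ! are passed through literally.
// An embedded single quote is handled by closing the quoted run, emitting
// an escaped quote, and reopening:  it's  ->  'it'\''s'
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}
```

The quoted result can then be placed into a shell command string as a single
argument, whatever punctuation the original text contained.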
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes the unnecessary nesting: all source, config, and docs now live
at the project root for simpler paths and commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect screen resolution and compute dynamic swipe coordinates
- Detect foreground app each step via dumpsys activity
- Smart element filtering: deduplicate by position, score by relevance, compact to essentials
- Session logging with crash-safe .partial.json writes and final summary
- Real multimodal vision: send base64 screenshots to LLMs (off/fallback/always modes)
- Multi-turn conversation memory: maintain full chat history across steps with trimming
- Multi-step planning: think/plan/planProgress fields on every LLM decision
- Streaming responses for all 4 providers (OpenAI, Groq, OpenRouter, Bedrock)
- Comprehensive README with examples, architecture docs, and troubleshooting
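
As an illustration of the dynamic swipe coordinates item above, the sketch
below derives swipe endpoints from a detected resolution. The function names,
the 30% travel distance, and the use of `wm size` style output
("Physical size: 1080x2400") as the input are assumptions, not the values or
detection method used in this change.

```typescript
interface Size { width: number; height: number; }
interface Swipe { x1: number; y1: number; x2: number; y2: number; }
type Direction = "up" | "down" | "left" | "right";

// Parse a line such as "Physical size: 1080x2400" (assumed input format).
function parseWmSize(output: string): Size | undefined {
  const m = /(\d+)x(\d+)/.exec(output);
  return m ? { width: Number(m[1]), height: Number(m[2]) } : undefined;
}

// Compute swipe endpoints centred on the screen. "up" means the finger
// moves upward, so y decreases from start to end.
function swipeCoords(size: Size, dir: Direction): Swipe {
  const cx = Math.round(size.width / 2);
  const cy = Math.round(size.height / 2);
  const dx = Math.round(size.width * 0.3);  // assumed travel: 30% of width
  const dy = Math.round(size.height * 0.3); // assumed travel: 30% of height
  switch (dir) {
    case "up":    return { x1: cx, y1: cy + dy, x2: cx, y2: cy - dy };
    case "down":  return { x1: cx, y1: cy - dy, x2: cx, y2: cy + dy };
    case "left":  return { x1: cx + dx, y1: cy, x2: cx - dx, y2: cy };
    case "right": return { x1: cx - dx, y1: cy, x2: cx + dx, y2: cy };
  }
}
```

Scaling from the detected size keeps the gesture inside the screen on both
small and large devices, which hard-coded pixel values cannot guarantee.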
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>