Clean up for public release: remove logs, debug artifacts, and future plans from tracking; rewrite readme minimal and lowercase

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sanju Sivalingam
2026-02-14 19:58:47 +05:30
parent 389ac81c98
commit 8b9f0a4e6e
39 changed files with 75 additions and 8763 deletions

README.md

@@ -1,237 +1,142 @@
# DroidClaw
# droidclaw
Give it a goal in plain English. It figures out what to tap, type, and swipe on your Android phone to get it done.
ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.
It reads the screen (accessibility tree + optional screenshot), sends it to an LLM, gets back a JSON action like `{"action": "tap", "coordinates": [540, 1200]}`, executes it via ADB, and repeats. Perception → reasoning → action, in a loop.
## See it work
reads the screen (accessibility tree + optional screenshot), asks an llm what to do, executes via adb, repeats.
```
$ bun run src/kernel.ts
Enter your goal: Open YouTube and search for "lofi hip hop"
enter your goal: open youtube and search for "lofi hip hop"
--- Step 1/30 ---
Think: I'm on the home screen. I should launch YouTube directly.
Decision: launch — Open YouTube app (842ms)
--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)
--- Step 2/30 ---
Think: YouTube is open. I need to tap the search icon.
Decision: tap — Tap search icon at top right (623ms)
--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)
--- Step 3/30 ---
Think: Search field is focused and ready.
Decision: type — Type "lofi hip hop" (501ms)
--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)
--- Step 4/30 ---
Decision: enter — Submit the search (389ms)
--- step 4/30 ---
action: enter (389ms)
--- Step 5/30 ---
Think: Search results showing lofi hip hop videos. Done.
Decision: done (412ms)
Task completed successfully.
--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```
## Quick start
## setup
You need: **Bun**, **ADB**, and an **API key** for any LLM provider.
you need **bun**, **adb**, and an api key for any llm provider.
```bash
# Install Bun
curl -fsSL https://bun.sh/install | bash
# Install ADB (macOS)
brew install android-platform-tools
# Clone and setup
bun install
cp .env.example .env
```
Edit `.env` — fastest way to start is with Groq (free tier):
edit `.env` — fastest way to start is with groq (free tier):
```bash
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```
Get your key at [console.groq.com](https://console.groq.com).
### Connect your phone
Enable USB Debugging: Settings → About Phone → tap "Build Number" 7 times → Developer Options → USB Debugging.
connect your phone (usb debugging on):
```bash
adb devices # should show your device
```
### Run it
```bash
bun run src/kernel.ts
```
Type a goal and watch your phone do it.
## workflows
## Workflows
Workflows chain multiple goals across apps. Way more powerful than single goals.
chain goals across apps:
```bash
bun run src/kernel.ts --workflow examples/weather-to-whatsapp.json
```
### 34 ready-to-use workflows included
**Messaging** — whatsapp-reply, whatsapp-broadcast, whatsapp-to-email, telegram-channel-digest, telegram-send-message, slack-standup, slack-check-messages, email-digest, email-reply, translate-and-reply
**Social Media** — social-media-post (Twitter + LinkedIn), social-media-engage, instagram-post-check
**Productivity** — morning-briefing, calendar-create-event, notes-capture, notification-cleanup, do-not-disturb, github-check-prs, screenshot-share-slack
**Research** — google-search-report, news-roundup, multi-app-research, price-comparison
**Lifestyle** — food-order, uber-ride, maps-commute, check-flight-status, spotify-playlist, youtube-watch-later, fitness-log, expense-tracker, wifi-password-share, weather-to-whatsapp
Each workflow is a simple JSON file:
each workflow is a simple json file:
```json
{
"name": "Slack Daily Standup",
"name": "slack standup",
"steps": [
{
"app": "com.Slack",
"goal": "Open #standup channel, type the standup message and send it.",
"formData": {
"Message": "Yesterday: Finished API integration\nToday: Writing tests\nBlockers: None"
}
"goal": "open #standup channel, type the message and send it",
"formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
}
]
}
```
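Steps run in order, so a workflow can span several apps. As a purely illustrative sketch (this is not the shipped `examples/weather-to-whatsapp.json`; the goals are made up, and the package names are just the stock ones for the Google app and WhatsApp), a two-step cross-app workflow could look like:
```json
{
  "name": "weather to whatsapp",
  "steps": [
    {
      "app": "com.google.android.googlequicksearchbox",
      "goal": "search for today's weather and note the forecast"
    },
    {
      "app": "com.whatsapp",
      "goal": "open the chat with Mom and send a short summary of today's weather"
    }
  ]
}
```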
## What it can do
35 ready-to-use workflows in `examples/` — messaging, social media, productivity, research, lifestyle.
22 actions + 6 multi-step skills. Some example goals:
## deterministic flows
```
Open WhatsApp and send "I'm running late" to Mom
Turn on WiFi
Search Google for "best restaurants near me"
Open YouTube and play the first trending video
Copy tracking number from Amazon and search it on Google
```
## LLM providers
Pick one. They all work.
| Provider | Cost | Vision | Best for |
|---|---|---|---|
| **Groq** | Free tier | No | Getting started fast |
| **OpenRouter** | Pay per token | Yes | 200+ models (Claude, Gemini, etc.) |
| **OpenAI** | Pay per token | Yes | Best accuracy with GPT-4o |
| **AWS Bedrock** | Pay per token | Yes | Enterprise / Claude on AWS |
for repeatable tasks that don't need ai, use yaml flows:
```bash
# Groq (recommended to start)
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
GROQ_MODEL=llama-3.3-70b-versatile
# OpenRouter
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your_key_here
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
# OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your_key_here
OPENAI_MODEL=gpt-4o
# AWS Bedrock (uses aws configure credentials)
LLM_PROVIDER=bedrock
AWS_REGION=us-east-1
BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml
```
## Config
no llm calls, just step-by-step adb commands.
All in `.env`. Here's what matters:
## providers
| Setting | Default | What it does |
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free tier | no | fastest to start |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
## config
all in `.env`:
| key | default | what |
|---|---|---|
| `MAX_STEPS` | 30 | Steps before giving up |
| `STEP_DELAY` | 2 | Seconds between actions (UI settle time) |
| `STUCK_THRESHOLD` | 3 | Steps before stuck-loop recovery kicks in |
| `VISION_MODE` | fallback | `off` / `fallback` (screenshot when accessibility tree is empty) / `always` |
| `MAX_ELEMENTS` | 40 | UI elements sent to LLM (scored & ranked) |
| `MAX_HISTORY_STEPS` | 10 | Past steps kept in conversation context |
| `STREAMING_ENABLED` | true | Stream LLM responses token-by-token |
| `LOG_DIR` | logs | Session logs directory |
| `MAX_STEPS` | 30 | steps before giving up |
| `STEP_DELAY` | 2 | seconds between actions |
| `STUCK_THRESHOLD` | 3 | steps before stuck recovery |
| `VISION_MODE` | fallback | `off` / `fallback` / `always` |
| `MAX_ELEMENTS` | 40 | ui elements sent to llm |
## How it works
## how it works
Each step: dump accessibility tree → score & filter elements → optionally screenshot → send to LLM → execute action → log → repeat.
each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.
The LLM thinks before acting:
the llm thinks before acting — returns `{ think, plan, action }`. if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
```json
{
"think": "Search field is focused. I should type the query.",
"plan": ["Launch YouTube", "Tap search", "Type query", "Submit"],
"planProgress": "Step 3: typing query",
"action": "type",
"text": "lofi hip hop"
}
```
**Stuck detection** — if the screen doesn't change for 3 steps, the kernel tells the LLM to try a different approach.
**Vision fallback** — when the accessibility tree is empty (games, WebViews, Flutter), it falls back to sending a screenshot.
**Conversation memory** — the LLM sees its full history of observations and decisions, so it won't repeat itself.
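For orientation, here is a minimal sketch of that loop in TypeScript. The helper names, the `Decision` shape, and the control-flow details are hypothetical simplifications; the real loop lives in `src/kernel.ts`.
```typescript
// Hypothetical sketch of the perception → reasoning → action loop.
// Helper names and the Decision shape are illustrative, not the real API.
type Decision = { think: string; action: string; text?: string; coordinates?: [number, number] };

declare function dumpAccessibilityTree(): Promise<string>;        // adb shell uiautomator dump
declare function filterAndRank(xml: string, max: number): string; // score & keep the top elements
declare function takeScreenshot(): Promise<string>;               // vision fallback
declare function askLLM(input: { goal: string; elements: string; screenshot?: string; stuck: boolean }): Promise<Decision>;
declare function executeAction(d: Decision): Promise<void>;       // tap / type / swipe via adb

async function runGoal(goal: string, maxSteps = 30): Promise<void> {
  let lastScreen = "";
  let unchangedSteps = 0;

  for (let step = 1; step <= maxSteps; step++) {
    const xml = await dumpAccessibilityTree();                    // perception
    const elements = filterAndRank(xml, 40);                      // MAX_ELEMENTS
    const screenshot = xml.trim() ? undefined : await takeScreenshot();

    unchangedSteps = xml === lastScreen ? unchangedSteps + 1 : 0; // stuck detection
    lastScreen = xml;

    const decision = await askLLM({ goal, elements, screenshot, stuck: unchangedSteps >= 3 });
    if (decision.action === "done") return;                       // goal reached

    await executeAction(decision);                                // action
    await Bun.sleep(2000);                                        // STEP_DELAY: let the UI settle
  }
}
```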
## Architecture
## source
```
src/
kernel.ts — Main agent loop
actions.ts — 22 actions + ADB retry logic
skills.ts — 6 multi-step skills (read_screen, submit_message, etc.)
workflow.ts — Workflow orchestration engine
llm-providers.ts — 4 LLM providers + system prompt
sanitizer.ts — Accessibility XML parser + smart filtering
config.ts — Env config
constants.ts — Keycodes, coordinates, defaults
logger.ts — Session logging
kernel.ts main loop
actions.ts 22 actions + adb retry
skills.ts 6 multi-step skills
workflow.ts workflow orchestration
flow.ts yaml flow runner
llm-providers.ts 4 providers + system prompt
sanitizer.ts accessibility xml parser
config.ts env config
constants.ts keycodes, coordinates
logger.ts session logging
```
## Commands
## troubleshooting
```bash
bun install # Install dependencies
bun run src/kernel.ts # Start the agent
bun run build # Compile to dist/
bun run typecheck # Type-check (tsc --noEmit)
```
**"adb: command not found"** — install adb or set `ADB_PATH` in `.env`
## Troubleshooting
**"no devices found"** — check usb debugging is on, tap "allow" on the phone
**"adb: command not found"** — Install ADB or set `ADB_PATH=/full/path/to/adb` in `.env`.
**agent repeating** — stuck detection handles this. if it persists, use a better model
**"no devices found"** — Run `adb devices`. Check USB debugging is enabled and you tapped "Allow" on the phone.
## license
**Agent keeps repeating the same action** — Stuck loop detection handles this automatically. If it persists, try a more capable model (GPT-4o, Claude).
**High token usage** — Set `VISION_MODE=off`, lower `MAX_ELEMENTS` to 20, lower `MAX_HISTORY_STEPS` to 5, or use a cheaper model.
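For example, a lower-token `.env` profile using those settings:
```bash
# reduce token usage (defaults: VISION_MODE=fallback, MAX_ELEMENTS=40, MAX_HISTORY_STEPS=10)
VISION_MODE=off
MAX_ELEMENTS=20
MAX_HISTORY_STEPS=5
```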
## Docs
- [Use Cases](docs/use-cases.md) — 50+ examples across 15 categories
- [ADB Commands](docs/adb-commands.md) — 750+ shell commands reference
- [Capabilities & Limitations](docs/capabilities-and-limitations.md)
## License
MIT
mit