# droidclaw

ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.

reads the screen (accessibility tree + optional screenshot), asks an llm what to do, executes via adb, repeats.

```
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```

## setup

you need **bun**, **adb**, and an api key for any llm provider.

```bash
bun install
cp .env.example .env
```

edit `.env` — fastest way to start is with groq (free tier):

```bash
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```

connect your phone (usb debugging on):

```bash
adb devices            # should show your device
bun run src/kernel.ts
```

## workflows

chain goals across apps:

```bash
bun run src/kernel.ts --workflow examples/weather-to-whatsapp.json
```

each workflow is a simple json file:

```json
{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}
```

35 ready-to-use workflows in `examples/` — messaging, social media, productivity, research, lifestyle.

## deterministic flows

for repeatable tasks that don't need ai, use yaml flows:

```bash
bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml
```

no llm calls, just step-by-step adb commands.
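
the flow schema isn't spelled out in this readme, so here is a purely illustrative sketch of what a flow file could look like — the step and field names below are assumptions, not the actual schema:

```yaml
# hypothetical flow file. field names are illustrative, not the real schema.
name: send whatsapp
steps:
  - action: launch
    app: com.whatsapp
  - action: tap
    text: "Search"
  - action: type
    text: "hello from droidclaw"
  - action: key
    name: enter
```

check `examples/flows/send-whatsapp.yaml` for the real format.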

## providers

| provider | cost | vision | notes |
|---|---|---|---|
| groq | free tier | no | fastest to start |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
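
supporting four providers behind one switch usually means a shared interface. the actual one lives in `src/llm-providers.ts`; the sketch below is a guess at its shape (all names here are assumptions, including the dummy provider):

```typescript
// hypothetical sketch of a pluggable provider contract.
// the real interface in src/llm-providers.ts may differ.
interface AgentAction {
  think?: string;
  plan?: string;
  action: string;
}

interface LlmProvider {
  name: string;
  supportsVision: boolean;
  decide(prompt: string, screenshot?: Uint8Array): Promise<AgentAction>;
}

// dummy provider used only to show the shape of the contract
const echoProvider: LlmProvider = {
  name: "echo",
  supportsVision: false,
  async decide(prompt: string): Promise<AgentAction> {
    return { think: `saw: ${prompt}`, action: "done" };
  },
};

const result = await echoProvider.decide("home screen");
console.log(result.action); // "done"
```

a shape like this is what lets vision-capable providers (openrouter, openai, bedrock) accept a screenshot while groq ignores it.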

## config

all in `.env`:

| key | default | what |
|---|---|---|
| `MAX_STEPS` | 30 | steps before giving up |
| `STEP_DELAY` | 2 | seconds between actions |
| `STUCK_THRESHOLD` | 3 | steps before stuck recovery |
| `VISION_MODE` | fallback | `off` / `fallback` / `always` |
| `MAX_ELEMENTS` | 40 | ui elements sent to llm |
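
the keys above map to environment variables with fallbacks. the real parsing lives in `src/config.ts`; a minimal sketch of how such defaults are typically read (the helper name is an assumption):

```typescript
// hypothetical helper; the real parsing in src/config.ts may differ.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  // fall back on missing or non-numeric values
  return Number.isFinite(parsed) ? parsed : fallback;
}

const config = {
  maxSteps: envNumber("MAX_STEPS", 30),
  stepDelay: envNumber("STEP_DELAY", 2),
  stuckThreshold: envNumber("STUCK_THRESHOLD", 3),
  maxElements: envNumber("MAX_ELEMENTS", 40),
  visionMode: process.env.VISION_MODE ?? "fallback",
};

console.log(config.maxSteps); // 30 unless MAX_STEPS is set in the environment
```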

## how it works

each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.

the llm thinks before acting — returns `{ think, plan, action }`. if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
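
the "screen unchanged for 3 steps" check can be sketched as a small pure helper. this is an illustration only, not the kernel's actual api (class and method names are assumptions):

```typescript
// hypothetical sketch of stuck detection; the kernel's real logic may differ.
class StuckDetector {
  private lastFingerprint = "";
  private repeats = 0;

  constructor(private threshold = 3) {}

  // feed a fingerprint of the current screen (e.g. a hash of the
  // accessibility tree); returns true when recovery should kick in.
  observe(fingerprint: string): boolean {
    if (fingerprint === this.lastFingerprint) {
      this.repeats += 1;
    } else {
      this.lastFingerprint = fingerprint;
      this.repeats = 0;
    }
    return this.repeats >= this.threshold;
  }
}

const detector = new StuckDetector(3);
detector.observe("home");              // false (first sighting)
detector.observe("home");              // false (1 repeat)
detector.observe("home");              // false (2 repeats)
console.log(detector.observe("home")); // true (3 repeats, trigger recovery)
```

any new fingerprint resets the counter, so normal progress through an app never trips recovery.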

## source

```
src/
  kernel.ts          main loop
  actions.ts         22 actions + adb retry
  skills.ts          6 multi-step skills
  workflow.ts        workflow orchestration
  flow.ts            yaml flow runner
  llm-providers.ts   4 providers + system prompt
  sanitizer.ts       accessibility xml parser
  config.ts          env config
  constants.ts       keycodes, coordinates
  logger.ts          session logging
```

## troubleshooting

**"adb: command not found"** — install adb or set `ADB_PATH` in `.env`

**"no devices found"** — make sure usb debugging is on and tap "allow" on the phone

**agent keeps repeating itself** — stuck detection handles this automatically. if it persists, switch to a stronger model

## license

mit
|