# droidclaw
ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.
reads the screen (accessibility tree + optional screenshot), asks an llm what to do, executes via adb, repeats.
```
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```
## setup

you need bun, adb, and an api key for one of the supported llm providers.

```sh
bun install
cp .env.example .env
```
edit .env — fastest way to start is with groq (free tier):
```
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```
connect your phone (usb debugging on):
```sh
adb devices   # should show your device
bun run src/kernel.ts
```
## workflows

chain goals across apps:

```sh
bun run src/kernel.ts --workflow examples/weather-to-whatsapp.json
```
each workflow is a simple json file:
```json
{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}
```
35 ready-to-use workflows in examples/ — messaging, social media, productivity, research, lifestyle.
## deterministic flows

for repeatable tasks that don't need ai, use yaml flows:

```sh
bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml
```
no llm calls, just step-by-step adb commands.
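a flow file is plain yaml run top to bottom. a minimal sketch of what one might look like — the step vocabulary below is illustrative, not necessarily the runner's actual schema; check examples/flows/ for real files:

```yaml
# hypothetical flow shape — see examples/flows/ for the real schema
name: send whatsapp
steps:
  - launch: com.whatsapp
  - tap: { text: "Search" }
  - type: "alice"
  - tap: { text: "alice" }
  - type: "on my way"
  - tap: { desc: "Send" }
```

because every step is a fixed adb command, flows run fast and cost nothing, at the price of breaking when the ui changes.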
## providers
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free tier | no | fastest to start |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
## config

all in .env:
| key | default | what |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
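the keys above map to plain env parsing with fallbacks. a minimal sketch of how a loader might read them — the helper and field names here are assumptions for illustration, not necessarily what config.ts does:

```typescript
// illustrative config loader: key names match the table above, but
// envInt and the config field names are assumptions, not droidclaw's code.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name];
  const n = raw === undefined ? NaN : Number.parseInt(raw, 10);
  // fall back when the variable is unset or not a number
  return Number.isNaN(n) ? fallback : n;
}

const config = {
  maxSteps: envInt("MAX_STEPS", 30),
  stepDelaySec: envInt("STEP_DELAY", 2),
  stuckThreshold: envInt("STUCK_THRESHOLD", 3),
  visionMode: process.env.VISION_MODE ?? "fallback", // off / fallback / always
  maxElements: envInt("MAX_ELEMENTS", 40),
};
```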
## how it works
each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.
the llm thinks before acting — it returns { think, plan, action }. if the screen doesn't change for STUCK_THRESHOLD steps (default 3), stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
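the stuck check can be as simple as fingerprinting the visible elements each step and comparing the last few fingerprints. a sketch of that idea — screenKey and detectStuck are illustrative names, not kernel.ts's actual functions:

```typescript
// fingerprint the visible screen state so consecutive steps can be compared
// (illustrative sketch, not droidclaw's actual implementation)
function screenKey(elements: { text: string; bounds: string }[]): string {
  return elements.map((e) => `${e.text}@${e.bounds}`).join("|");
}

// stuck recovery triggers when the same fingerprint repeats `threshold` times
function detectStuck(history: string[], threshold = 3): boolean {
  if (history.length < threshold) return false;
  const recent = history.slice(-threshold);
  return recent.every((k) => k === recent[0]);
}
```

in the loop, you would push screenKey(elements) after every step and trigger recovery once detectStuck(history, STUCK_THRESHOLD) returns true.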
## source

```
src/
  kernel.ts         main loop
  actions.ts        22 actions + adb retry
  skills.ts         6 multi-step skills
  workflow.ts       workflow orchestration
  flow.ts           yaml flow runner
  llm-providers.ts  4 providers + system prompt
  sanitizer.ts      accessibility xml parser
  config.ts         env config
  constants.ts      keycodes, coordinates
  logger.ts         session logging
```
## troubleshooting

- "adb: command not found" — install adb or set ADB_PATH in .env
- "no devices found" — make sure usb debugging is on and tap "allow" on the phone's prompt
- agent keeps repeating the same action — stuck detection usually recovers; if it persists, try a stronger model
## license
mit