# droidclaw

ai agent that controls your android phone. give it a goal in plain english — it figures out what to tap, type, and swipe.

reads the screen (accessibility tree + optional screenshot), asks an llm what to do, executes via adb, repeats.

```
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```

## setup

you need bun, adb, and an api key for one of the supported llm providers.

```sh
bun install
cp .env.example .env
```

edit .env — fastest way to start is with groq (free tier):

```
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```

connect your phone (usb debugging on):

```sh
adb devices   # should show your device
bun run src/kernel.ts
```

## workflows

chain goals across apps:

```sh
bun run src/kernel.ts --workflow examples/weather-to-whatsapp.json
```

each workflow is a simple json file:

```json
{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}
```

35 ready-to-use workflows in examples/ — messaging, social media, productivity, research, lifestyle.
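the shape of a workflow file can be captured in a couple of types. this is a sketch inferred from the json example above; the real definitions live in workflow.ts and may differ:

```typescript
// sketch of the workflow schema, inferred from the json example above.
// not the actual types from workflow.ts.
interface WorkflowStep {
  app: string;                        // android package to launch
  goal: string;                       // plain-english goal for the llm
  formData?: Record<string, string>;  // values the agent can fill into fields
}

interface Workflow {
  name: string;
  steps: WorkflowStep[];
}

// minimal runtime check before handing a parsed file to the orchestrator
function isWorkflow(x: unknown): x is Workflow {
  const w = x as Workflow;
  return typeof w?.name === "string" &&
    Array.isArray(w?.steps) &&
    w.steps.every(s => typeof s?.app === "string" && typeof s?.goal === "string");
}
```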

## deterministic flows

for repeatable tasks that don't need ai, use yaml flows:

```sh
bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml
```

no llm calls, just step-by-step adb commands.
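a flow file is plain yaml. the exact step vocabulary lives in flow.ts and the bundled examples; the sketch below is illustrative only, and the step names are assumptions, not the real schema:

```yaml
# illustrative only: check examples/flows/ for the real step names
name: send whatsapp
steps:
  - launch: com.whatsapp
  - tap: "Search"
  - type: "alice"
  - key: enter
```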

## providers

| provider   | cost      | vision | notes            |
|------------|-----------|--------|------------------|
| groq       | free tier | no     | fastest to start |
| openrouter | per token | yes    | 200+ models      |
| openai     | per token | yes    | gpt-4o           |
| bedrock    | per token | yes    | claude on aws    |

## config

all in .env:

| key             | default  | what                        |
|-----------------|----------|-----------------------------|
| MAX_STEPS       | 30       | steps before giving up      |
| STEP_DELAY      | 2        | seconds between actions     |
| STUCK_THRESHOLD | 3        | steps before stuck recovery |
| VISION_MODE     | fallback | off / fallback / always     |
| MAX_ELEMENTS    | 40       | ui elements sent to llm     |
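for example, a .env tuned for slow apps might bump the delays and keep screenshots on every step (the values below are just illustrations of the knobs above, not recommendations):

```
# illustrative values only
STEP_DELAY=4
STUCK_THRESHOLD=5
VISION_MODE=always
```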

## how it works

each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.

the llm thinks before acting: it returns `{ think, plan, action }`. if the screen doesn't change for `STUCK_THRESHOLD` steps (3 by default), stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter apps), the agent falls back to screenshots.
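in code, one iteration of that loop reduces to something like this sketch. the function names and action shapes here are illustrative, not the actual kernel.ts api:

```typescript
// illustrative sketch of one agent step; not the real kernel.ts api
type Action =
  | { kind: "launch"; pkg: string }
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface Decision {
  think: string;  // the llm's reasoning about the current screen
  plan: string;   // what it intends to do next
  action: Action; // the single action to execute now
}

// returns true when the llm decides the goal is reached
async function step(
  dumpTree: () => Promise<string>,           // accessibility xml via adb
  askLlm: (tree: string) => Promise<Decision>,
  execute: (a: Action) => Promise<void>,     // runs the action via adb
): Promise<boolean> {
  const tree = await dumpTree();
  const decision = await askLlm(tree);       // llm returns { think, plan, action }
  await execute(decision.action);
  return decision.action.kind === "done";
}
```

the caller just runs `step` in a loop until it returns true or `MAX_STEPS` is hit.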

## source

```
src/
  kernel.ts          main loop
  actions.ts         22 actions + adb retry
  skills.ts          6 multi-step skills
  workflow.ts        workflow orchestration
  flow.ts            yaml flow runner
  llm-providers.ts   4 providers + system prompt
  sanitizer.ts       accessibility xml parser
  config.ts          env config
  constants.ts       keycodes, coordinates
  logger.ts          session logging
```

## troubleshooting

- `adb: command not found`: install adb or set `ADB_PATH` in `.env`
- `no devices found`: check that usb debugging is on, then tap "allow" on the phone when prompted
- agent keeps repeating itself: stuck detection usually handles this; if it persists, switch to a stronger model

## license

mit