diff --git a/site/index.html b/site/index.html
new file mode 100644
index 0000000..4e86423
--- /dev/null
+++ b/site/index.html
@@ -0,0 +1,943 @@
give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
every step is a loop. dump the accessibility tree, filter interactive elements, send to an llm, execute the action, repeat.
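that loop can be sketched in typescript - the helper names here (`dumpScreen`, `askLlm`, `execute`) are illustrative stand-ins, not the actual kernel.ts api:

```typescript
// minimal sketch of the perceive → think → act loop.
// dumpScreen / askLlm / execute are stubbed placeholders, not the real kernel.ts code.
type Action = { kind: "tap" | "type" | "launch" | "done"; detail?: string };
type Decision = { think: string; action: Action };

// canned "model replies" so the sketch runs without a device or an llm
const script: Decision[] = [
  { think: "home screen, launching youtube", action: { kind: "launch", detail: "youtube" } },
  { think: "results showing, done", action: { kind: "done" } },
];
let cursor = 0;

async function dumpScreen(): Promise<string[]> { return ["<mock element>"]; }
async function askLlm(_goal: string, _els: string[]): Promise<Decision> { return script[cursor++]; }
async function execute(a: Action): Promise<void> { console.log(`action: ${a.kind}`); }

export async function runLoop(goal: string, maxSteps = 30): Promise<number> {
  for (let step = 1; step <= maxSteps; step++) {
    const els = await dumpScreen();                 // dump accessibility tree
    const d = await askLlm(goal, els);              // model decides the next action
    console.log(`--- step ${step}/${maxSteps} ---\nthink: ${d.think}`);
    if (d.action.kind === "done") return step;      // goal reached
    await execute(d.action);                        // adb tap / type / launch / ...
  }
  return maxSteps;                                  // gave up at the step cap
}
```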
captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
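the coordinate math is the core of that step - a minimal sketch, assuming uiautomator's `bounds="[x1,y1][x2,y2]"` attribute format (the real sanitizer.ts does more, like filtering by clickable state):

```typescript
// turn a uiautomator bounds string like "[0,96][1080,240]" into a tap point.
// sketch only - the actual sanitizer.ts parser handles the full xml tree.
export function centerOf(bounds: string): { x: number; y: number } {
  const m = bounds.match(/\[(\d+),(\d+)\]\[(\d+),(\d+)\]/);
  if (!m) throw new Error(`bad bounds: ${bounds}`);
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  // tap the center of the element's rectangle
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}
```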
sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
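the reply might be shaped like this - field names here are assumptions for illustration, not the exact contract in llm-providers.ts:

```typescript
// assumed shape of a model reply: reasoning first, then the chosen action.
interface AgentReply {
  think: string;                                    // reasoning about the current screen
  plan: string;                                     // what it intends to do next
  action: { name: string; args?: Record<string, string> };
}

// parse and sanity-check the raw model output before executing anything
export function parseReply(raw: string): AgentReply {
  const r = JSON.parse(raw);
  if (typeof r.think !== "string" || !r.action?.name) {
    throw new Error("model reply missing think/action");
  }
  return r as AgentReply;
}
```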
executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
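a few of those actions map onto raw adb commands roughly like this - a hypothetical subset, with action names chosen for illustration (the `%s` escape for spaces in `input text` is real adb behavior):

```typescript
// map a handful of actions to the adb commands they would shell out to.
// illustrative subset - actions.ts covers 22 actions plus retry logic.
export function toAdb(a: { kind: string; x?: number; y?: number; text?: string; pkg?: string }): string {
  switch (a.kind) {
    case "tap":
      return `adb shell input tap ${a.x} ${a.y}`;
    case "type":
      // `input text` can't take literal spaces; %s is the escape
      return `adb shell input text "${a.text?.replace(/ /g, "%s")}"`;
    case "back":
      return "adb shell input keyevent KEYCODE_BACK";
    case "launch":
      return `adb shell monkey -p ${a.pkg} 1`;
    default:
      throw new Error(`unmapped action: ${a.kind}`);
  }
}
```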
if the screen doesn't change for 3 steps, stuck recovery kicks in. an empty accessibility tree falls back to screenshots.
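stuck detection can be as simple as counting consecutive identical screen dumps - a sketch assuming the default threshold of 3, not the actual kernel.ts implementation:

```typescript
// count consecutive identical screen dumps; past the threshold, signal recovery.
// sketch only - threshold mirrors the STUCK_THRESHOLD default of 3.
export function makeStuckDetector(threshold = 3) {
  let last = "";
  let repeats = 0;
  return (screenDump: string): boolean => {
    repeats = screenDump === last ? repeats + 1 : 0;  // reset on any change
    last = screenDump;
    return repeats >= threshold;                      // true → trigger stuck recovery
  };
}
```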
type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.
run it and describe what you want. the agent figures out the rest.
$ bun run src/kernel.ts
enter your goal: send "running late, 10 mins" to Mom on whatsapp
chain goals across multiple apps. steps are natural language; the llm handles the navigation.
{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google...", "goal": "search chennai weather" },
    { "goal": "share to Sanju" }
  ]
}
fixed taps and types. no llm, instant execution. for repeatable tasks.
appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- tap: "Contact Name"
- type: "hello from droidclaw"
- tap: "Send"
delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.
open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys needed for those services.
install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.
# from anywhere:
adb connect <phone-tailscale-ip>:5555
bun run src/kernel.ts --workflow morning.json
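scheduling that at 8am could look like the crontab entry below - the repo path, tailnet ip, and log file are all placeholders:

```shell
# hypothetical crontab entry: run the morning workflow at 8am daily.
# /path/to/droidclaw, 100.x.y.z, and the log path are placeholders.
0 8 * * * cd /path/to/droidclaw && adb connect 100.x.y.z:5555 && bun run src/kernel.ts --workflow morning.json >> /tmp/droidclaw.log 2>&1
```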
that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it runs apps that don't have apis.
unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes, it adapts. it reads the screen, understands context, and makes decisions - across any app installed on the device.
22 actions + 6 multi-step skills. here's the reality.
git clone https://github.com/thisuxhq/droidclaw.git
cd droidclaw && bun install
cp .env.example .env
edit .env - the fastest way to start is groq (free tier):
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free | no | fastest to start |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
enable usb debugging in developer options, plug in via usb.
adb devices # should show your device
bun run src/kernel.ts
| key | default | description |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
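a hypothetical .env tweak for slower devices - values are illustrative, and any key left unset keeps the default from the table:

```shell
# illustrative .env overrides (any key not set falls back to its default)
MAX_STEPS=50        # allow longer tasks before giving up
STEP_DELAY=3        # slower apps need more settle time between actions
VISION_MODE=always  # screenshot every step (needs a vision-capable provider)
```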
ready to use. workflows are ai-powered (json), flows are deterministic (yaml).
kernel.ts         main loop
actions.ts        22 actions + adb retry
skills.ts         6 multi-step skills
workflow.ts       workflow orchestration
flow.ts           yaml flow runner
llm-providers.ts  4 providers + system prompt
sanitizer.ts      accessibility xml parser
config.ts         env config
constants.ts      keycodes, coordinates
logger.ts         session logging