diff --git a/site/index.html b/site/index.html
new file mode 100644
index 0000000..4e86423
--- /dev/null
+++ b/site/index.html
@@ -0,0 +1,943 @@
+droidclaw - ai agent for android
+
+
experimental
+

turn old phones into
ai agents

+

give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.

+ + +
+
+ + + + droidclaw +
+
$ bun run src/kernel.ts
+enter your goal: open youtube and search for "lofi hip hop"
+
+--- step 1/30 ---
+think: i'm on the home screen. launching youtube.
+action: launch (842ms)
+
+--- step 2/30 ---
+think: youtube is open. tapping search icon.
+action: tap (623ms)
+
+--- step 3/30 ---
+think: search field focused.
+action: type "lofi hip hop" (501ms)
+
+--- step 4/30 ---
+action: enter (389ms)
+
+--- step 5/30 ---
+think: search results showing. done.
+action: done (412ms)
+
+
+
+ +
+ + +
+
+ +

perceive, reason, act, adapt

+

the agent runs in a loop: dump the accessibility tree, filter it down to interactive elements, send them to an llm, execute the chosen action, repeat.
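here is the loop as a minimal typescript sketch. it is not droidclaw's kernel.ts. the UIElement and Action types and the perceive / decide / execute callbacks are hypothetical stand-ins for the uiautomator dump, the llm call, and the adb command.

```typescript
// hypothetical shapes; droidclaw's real types may differ
type UIElement = { text: string; x: number; y: number; clickable: boolean };
type Action = { action: string; [field: string]: unknown };

interface AgentDeps {
  perceive: () => Promise<UIElement[]>; // e.g. uiautomator dump + xml parse
  decide: (goal: string, screen: UIElement[]) => Promise<Action>; // llm call
  execute: (action: Action) => Promise<void>; // adb command
}

// perceive -> filter -> reason -> act, until the model says "done" or steps run out
async function runAgent(goal: string, deps: AgentDeps, maxSteps = 30): Promise<boolean> {
  for (let step = 1; step <= maxSteps; step++) {
    const all = await deps.perceive();
    const interactive = all.filter((el) => el.clickable); // only send tappable elements
    const action = await deps.decide(goal, interactive);
    if (action.action === "done") return true;
    await deps.execute(action);
  }
  return false; // gave up after maxSteps
}
```

each stage is passed in as a callback, so the loop can be tested without a phone or an api key.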

+ +
+
+ +

1. perceive

+

captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
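here is a rough sketch of the perceive step. it assumes the standard uiautomator dump format, where each node element carries attributes such as text, resource-id, content-desc, clickable, enabled and bounds="[x1,y1][x2,y2]". the ScreenElement shape and the keep/skip rule are assumptions made for illustration. the regex also stands in for a real xml parser, and sanitizer.ts may work differently.

```typescript
// hypothetical element shape for illustration
type ScreenElement = { text: string; id: string; x: number; y: number; clickable: boolean; enabled: boolean };

// "[x1,y1][x2,y2]" (uiautomator's bounds format) -> center point to tap
function boundsCenter(bounds: string): { x: number; y: number } | null {
  const m = /^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/.exec(bounds);
  if (!m) return null;
  const x1 = Number(m[1]), y1 = Number(m[2]), x2 = Number(m[3]), y2 = Number(m[4]);
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}

// regex-based sketch: reads attributes off each <node ...> tag.
// a real implementation should use an xml parser (this skips entity decoding like &amp;).
function parseElements(xml: string): ScreenElement[] {
  const out: ScreenElement[] = [];
  for (const tag of xml.match(/<node\b[^>]*>/g) ?? []) {
    const attrs: Record<string, string> = {};
    const attrRe = /([\w-]+)="([^"]*)"/g;
    let m: RegExpExecArray | null;
    while ((m = attrRe.exec(tag)) !== null) attrs[m[1] ?? ""] = m[2] ?? "";
    const center = boundsCenter(attrs["bounds"] ?? "");
    const text = attrs["text"] || attrs["content-desc"] || "";
    const clickable = attrs["clickable"] === "true";
    // keep what the llm can act on or read; skip empty layout containers
    if (!center || (!clickable && !text)) continue;
    out.push({
      text,
      id: attrs["resource-id"] ?? "",
      x: center.x,
      y: center.y,
      clickable,
      enabled: attrs["enabled"] === "true",
    });
  }
  return out;
}
```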

+
+
+ +

2. reason

+

sends the screen state and the goal to an llm. the model returns three fields (think, plan, action), so it explains its reasoning before it acts.
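here is a hedged sketch of turning the model's reply into an action. the think / plan / action field names come from the text above. everything else is assumed and not taken from llm-providers.ts, including the extra fields and the brace-slicing recovery.

```typescript
// hypothetical response shape; droidclaw's real prompt/schema may differ
interface Decision {
  think: string; // reasoning about the current screen
  plan: string; // what it intends to do next
  action: string; // e.g. "tap", "type", "done"
  [field: string]: unknown; // action-specific fields such as coordinates or text
}

function parseDecision(raw: string): Decision {
  // models sometimes wrap the json in prose or markdown fences: take the outermost {...}
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no json object in llm response");
  const obj = JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;
  const action = obj["action"];
  if (typeof action !== "string") throw new Error("llm response has no action");
  const think = obj["think"];
  const plan = obj["plan"];
  return {
    ...obj,
    think: typeof think === "string" ? think : "",
    plan: typeof plan === "string" ? plan : "",
    action,
  };
}
```

slicing from the first { to the last } tolerates chatty replies. it breaks if the prose after the json contains a closing brace, so a stricter prompt or a provider's json mode is more robust.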

+
+
+ +

3. act

+

executes the chosen action via adb: tap, type, swipe, launch an app, press back, and more. 22 actions are available.
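here is a sketch of how a few of those actions map onto real adb commands: input tap, input text, input swipe, input keyevent, and monkey for launching an app. the AdbAction names are hypothetical. actions.ts has its own 22 actions and retry logic.

```typescript
// hypothetical action shapes; the real actions.ts defines its own names
type AdbAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; ms?: number }
  | { kind: "back" }
  | { kind: "launch"; pkg: string };

// quote for the device-side shell, which re-parses everything after `adb shell`
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// map an action to the argv you would pass to the adb binary
function adbArgs(a: AdbAction): string[] {
  switch (a.kind) {
    case "tap":
      return ["shell", "input", "tap", String(a.x), String(a.y)];
    case "type":
      // `input text` reads %s as a space, so encode spaces before quoting
      return ["shell", "input", "text", shellQuote(a.text.replace(/ /g, "%s"))];
    case "swipe":
      return ["shell", "input", "swipe", String(a.x1), String(a.y1), String(a.x2), String(a.y2), String(a.ms ?? 300)];
    case "back":
      return ["shell", "input", "keyevent", "4"]; // 4 = KEYCODE_BACK
    case "launch":
      // start the app's launcher activity without knowing its activity name
      return ["shell", "monkey", "-p", a.pkg, "-c", "android.intent.category.LAUNCHER", "1"];
  }
}
```

to run one, pass the argv to the adb binary, e.g. Bun.spawn(["adb", ...adbArgs(action)]). input text also turns any literal %s in the text into a space.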

+
+
+ +

4. adapt

+

if the screen doesn't change for 3 steps, stuck recovery kicks in. if the accessibility tree comes back empty, it falls back to screenshots.
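here is one way to implement the two adapt rules. it assumes a screen "fingerprint", such as the raw xml dump. the first part is a counter that fires once the fingerprint has stayed the same for STUCK_THRESHOLD steps. the second is a check that mirrors the VISION_MODE setting (off / fallback / always). the class and function names are hypothetical.

```typescript
// tracks how many consecutive steps the screen fingerprint stayed the same
class StuckDetector {
  private last: string | null = null;
  private unchanged = 0;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // call once per step with e.g. the raw xml dump or a hash of it
  observe(fingerprint: string): boolean {
    if (fingerprint === this.last) {
      this.unchanged++;
    } else {
      this.last = fingerprint;
      this.unchanged = 0;
    }
    return this.unchanged >= this.threshold; // true -> trigger stuck recovery
  }
}

type VisionMode = "off" | "fallback" | "always";

// whether this step should include a screenshot for the llm
function useScreenshot(mode: VisionMode, elementCount: number): boolean {
  return mode === "always" || (mode === "fallback" && elementCount === 0);
}
```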

+
+
+
+
+ +
+ + +
+
+ +

interactive, workflows, or flows

+

type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.

+ +
+
+
+ +

interactive

+
+ just type +

run it and describe what you want. the agent figures out the rest.

+
$ bun run src/kernel.ts
+enter your goal: send "running
+late, 10 mins" to Mom on whatsapp
+
+
+
+ +

workflows

+
+ ai-powered · json +

chain goals across multiple apps. each step is a natural-language goal, and the llm handles the navigation.

+
{
+  "name": "weather to whatsapp",
+  "steps": [
+    { "app": "com.google...",
+      "goal": "search chennai weather" },
+    { "goal": "share to Sanju" }
+  ]
+}
+
+
+
+ +

flows

+
+ instant · yaml +

fixed taps and types. no llm, instant execution. for repeatable tasks.

+
appId: com.whatsapp
+name: Send WhatsApp Message
+---
+- launchApp
+- tap: "Contact Name"
+- type: "hello from droidclaw"
+- tap: "Send"
+
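to show why flows are instant but brittle, here is a sketch of a runner for steps like the yaml above, after they have been parsed (parsing the yaml itself is left to a yaml library). FlowStep, FlowDevice and runFlow are hypothetical names, not flow.ts's api, and findByText is assumed to dump the screen fresh on each call.

```typescript
// hypothetical: steps as they might look after yaml parsing
type FlowStep = "launchApp" | { tap: string } | { type: string };

interface FlowDevice {
  launch(appId: string): void;
  findByText(label: string): { x: number; y: number } | undefined;
  tap(x: number, y: number): void;
  type(text: string): void;
}

// replay fixed steps in order; no llm, so nothing can recover from a ui change
function runFlow(appId: string, steps: FlowStep[], device: FlowDevice): void {
  for (const step of steps) {
    if (step === "launchApp") {
      device.launch(appId);
    } else if ("tap" in step) {
      const target = device.findByText(step.tap);
      if (!target) throw new Error(`flow step failed: no element labelled "${step.tap}"`);
      device.tap(target.x, target.y);
    } else {
      device.type(step.type);
    }
  }
}
```

there is no llm in the loop, so a renamed button simply throws instead of being worked around.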
+
+ +
+
+

workflows

+
    +
  • json format, uses ai
  • +
  • handles ui changes and popups
  • +
  • slower (llm calls each step)
  • +
  • best for complex multi-app tasks
  • +
+
+
+

flows

+
    +
  • yaml format, no ai needed
  • +
  • breaks if ui changes
  • +
  • instant execution
  • +
  • best for simple repeatable tasks
  • +
+
+
+
+
+ +
+ + +
+
+ +

what you can build with this

+

delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.

+ +
+
+ +

delegate to ai apps on-device

+

open google's ai mode, ask a question, grab the answer, and forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses the apps on your phone as tools, so you don't need api keys for those services.

+
+
+ +

remote control with tailscale

+

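install tailscale on the phone and your laptop, then connect adb over the tailnet (run adb tcpip 5555 over usb first so the phone listens for adb on port 5555). your phone is now a remote agent you can control from anywhere, and you can run workflows from a cron job, e.g. every morning at 8am.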

+
# from anywhere:
+adb connect <phone-tailscale-ip>:5555
+bun run src/kernel.ts --workflow morning.json
+
+
+ +

old phones, always on

+

that old android phone in a drawer can now post standups to slack, check flight prices, digest telegram channels, and forward the weather to whatsapp. it works with apps that don't have apis.

+
+
+ +

automation with ai intelligence

+

unlike fixed, predefined tap sequences, the agent reasons about each screen. if a button moves, a popup appears, or the layout changes, it adapts: it reads the screen, understands the context, and decides what to do next.

+
+
+
+
+ +
+ + +
+
+ +

things it can do right now

+

across any app installed on the device.

+ +
+
+
+ +

messaging

+
+
    +
  • send whatsapp messages to saved or unsaved numbers
  • +
  • reply to latest sms
  • +
  • compose emails via gmail
  • +
  • send telegram messages to groups
  • +
  • post standups to slack
  • +
  • broadcast to multiple contacts
  • +
+
+
+
+ +

research

+
+
    +
  • search google, collect results
  • +
  • ask chatgpt / gemini, grab answer
  • +
  • check weather, stocks, flights
  • +
  • compare prices across apps
  • +
  • translate via google translate
  • +
  • compile multi-source digests
  • +
+
+
+
+ +

social

+
+
    +
  • post to instagram, twitter/x
  • +
  • like and comment on posts
  • +
  • check engagement metrics
  • +
  • save youtube videos to watch later
  • +
  • follow / unfollow accounts
  • +
  • check linkedin notifications
  • +
+
+
+
+ +

productivity

+
+
    +
  • morning briefing across apps
  • +
  • create calendar events
  • +
  • capture notes in google keep
  • +
  • check github pull requests
  • +
  • set alarms and reminders
  • +
  • triage notifications
  • +
+
+
+
+ +

lifestyle

+
+
    +
  • order food from delivery apps
  • +
  • book an uber ride
  • +
  • play songs on spotify
  • +
  • check commute on maps
  • +
  • log workouts, track expenses
  • +
  • toggle do not disturb
  • +
+
+
+
+ +

device control

+
+
    +
  • toggle wifi, bluetooth, airplane mode
  • +
  • adjust brightness, volume
  • +
  • force stop or clear cache
  • +
  • grant/revoke permissions
  • +
  • install/uninstall apps
  • +
  • run any adb shell command
  • +
+
+
+
+
+ +
+ + +
+
+ +

what works and what doesn't

+

22 actions + 6 multi-step skills. here's the reality.

+ +
+
+

works well

+
    +
  • native android apps with standard ui
  • +
  • multi-app workflows that chain goals
  • +
  • device settings via shell commands
  • +
  • text input, navigation, taps
  • +
  • stuck detection + recovery
  • +
  • vision fallback for empty trees
  • +
+
+
+

unreliable

+
    +
  • flutter, react native, games
  • +
  • webviews (incomplete tree)
  • +
  • drag & drop, multi-finger
  • +
  • notification interaction
  • +
  • clipboard on android 12+
  • +
  • captchas and bot detection
  • +
+
+
+

can't do

+
    +
  • banking apps (FLAG_SECURE)
  • +
  • biometrics (fingerprint, face)
  • +
  • bypass encrypted lock screen
  • +
  • access other apps' private data
  • +
  • audio or camera streams
  • +
  • pinch-to-zoom gestures
  • +
+
+
+
+
+ +
+ + +
+
+ +

getting started

+ +
+
+ 1 +

clone and install

+
git clone https://github.com/thisuxhq/droidclaw.git
+cd droidclaw && bun install
+cp .env.example .env
+
+
+ 2 +

configure an llm provider

+

edit .env. the fastest way to start is groq (free tier):

+
LLM_PROVIDER=groq
+GROQ_API_KEY=gsk_your_key_here
provider   | cost      | vision | notes
-----------|-----------|--------|-----------------
groq       | free      | no     | fastest to start
openrouter | per token | yes    | 200+ models
openai     | per token | yes    | gpt-4o
bedrock    | per token | yes    | claude on aws
+
+
+ 3 +

connect your phone

+

enable usb debugging in developer options, plug in via usb.

+
adb devices   # should show your device
+bun run src/kernel.ts
+
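if the device doesn't show up as expected, the state column tells you why. here is a small sketch that parses adb devices output. it assumes the usual format: a "List of devices attached" header, then one serial and state per line. parseAdbDevices is a hypothetical helper, not part of droidclaw.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // "device" = ready; "unauthorized" = accept the rsa prompt on the phone; "offline" = reconnect
}

// parse the text printed by `adb devices`
function parseAdbDevices(output: string): AdbDevice[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    // drop blanks, the header, and "* daemon ..." startup notices
    .filter((line) => line !== "" && !line.startsWith("List of devices") && !line.startsWith("*"))
    .map((line) => {
      const [serial = "", state = ""] = line.split(/\s+/);
      return { serial, state };
    });
}
```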
+
+ 4 +

tune (optional)

key             | default  | description
----------------|----------|----------------------------
MAX_STEPS       | 30       | steps before giving up
STEP_DELAY      | 2        | seconds between actions
STUCK_THRESHOLD | 3        | steps before stuck recovery
VISION_MODE     | fallback | off / fallback / always
MAX_ELEMENTS    | 40       | ui elements sent to the llm
+
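here is a sketch of how these keys could be read using the defaults above. loadTuneConfig and the field names are hypothetical, and config.ts may differ. blank or non-numeric values fall back to the default, and an unknown VISION_MODE falls back to "fallback".

```typescript
type VisionMode = "off" | "fallback" | "always";

interface TuneConfig {
  maxSteps: number;
  stepDelaySec: number;
  stuckThreshold: number;
  visionMode: VisionMode;
  maxElements: number;
}

type Env = Record<string, string | undefined>;

// unset, blank, or non-numeric values fall back to the documented default
function numFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

function loadTuneConfig(env: Env): TuneConfig {
  const mode = env["VISION_MODE"];
  return {
    maxSteps: numFromEnv(env, "MAX_STEPS", 30),
    stepDelaySec: numFromEnv(env, "STEP_DELAY", 2),
    stuckThreshold: numFromEnv(env, "STUCK_THRESHOLD", 3),
    visionMode: mode === "off" || mode === "always" ? mode : "fallback",
    maxElements: numFromEnv(env, "MAX_ELEMENTS", 40),
  };
}
```

under bun you would pass process.env.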
+
+
+
+ +
+ + +
+
+ +

35 workflows + 5 flows

+

ready to use. workflows are ai-powered (json), flows are deterministic (yaml).

+ +
+ + + messaging 10 workflows + + +
+ +
+
+ +
+ + + social 4 workflows + + +
+ +
+
+ +
+ + + productivity 8 workflows + + +
+ +
+
+ +
+ + + research 6 workflows + + +
+ +
+
+ +
+ + + lifestyle 8 workflows + + +
+ +
+
+ +
+ + + flows 5 deterministic + + +
+ +
+
+
+
+ + +
+
+
+
+ +

10 files in src/

+
+
kernel.ts          main loop
+actions.ts         22 actions + adb retry
+skills.ts          6 multi-step skills
+workflow.ts        workflow orchestration
+flow.ts            yaml flow runner
+llm-providers.ts   4 providers + system prompt
+sanitizer.ts       accessibility xml parser
+config.ts          env config
+constants.ts       keycodes, coordinates
+logger.ts          session logging
+
+
+
+ + + + + +