Update architecture doc: add simple ELI5 explanation + kernel bridge details

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sanju Sivalingam
2026-02-06 17:38:26 +05:30
parent 434c5bb826
commit caa7c9a371


@@ -4,11 +4,194 @@ Complete breakdown of what happens when you type "Send Mom I'll be late tonight"
--- ---
## The 3 Pieces ## You Have 3 Physical Things
```
📱 Phone 1 — your daily phone (in your pocket, anywhere)
💻 Laptop — sitting at home, always on, plugged in
📱 Phone 2 — the agent phone (sitting next to laptop, on WiFi)
```
The laptop does EVERYTHING. It runs both the web app AND the kernel.
```
Phone 1 = the remote control (just a web browser)
Laptop = the brain (runs the website + AI + sends ADB commands)
Phone 2 = the robot hands (receives ADB commands, taps/types on screen)
```
Phone 1 never talks to Phone 2 directly. Everything goes through the laptop.
---
## What's Running on the Laptop
```
💻 Your Laptop (home desk, always on)
┌─────────────────────────────────────────┐
│ │
│ 1. SvelteKit app (the website) │
│ - Shows a text box + run button │
│ - Listens on port 3000 │
│ │
│ 2. Kernel (the AI brain) │
│ - Gets called BY the SvelteKit app │
│ - Talks to Groq/OpenAI over internet│
│ - Sends ADB commands to Phone 2 │
│ │
│ 3. ADB connection to Phone 2 │
│ - adb connect 192.168.1.42:5555 │
│ - Already paired, always connected │
│ │
│ 4. Tailscale (just networking) │
│ - Makes this laptop reachable │
│ from anywhere as 100.64.0.2 │
│ │
└─────────────────────────────────────────┘
```
---
## What Happens Step by Step
### Step 0: Setup (one time)
```
You plug Phone 2 into laptop USB
You run: adb tcpip 5555
You unplug Phone 2, put it on charger next to laptop
Laptop runs: adb connect 192.168.1.42:5555 ← Phone 2's WiFi IP
Now laptop can control Phone 2 wirelessly
You start the SvelteKit app: bun run dev
Tailscale is running on laptop + Phone 1
```
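The two ADB commands in the setup above can be generated rather than typed by hand. This is a minimal sketch, not the project's actual setup script: `wirelessSetupSteps`, `AdbStep`, and `isIPv4` are hypothetical names, and the IP and port are the example values used in this doc.

```typescript
// Hypothetical helper: builds the argv for each one-time setup step.
// It only constructs commands; running them is left to the caller.
type AdbStep = { description: string; argv: string[] };

// Accepts dotted-quad IPv4 only, e.g. "192.168.1.42".
function isIPv4(ip: string): boolean {
  const parts = ip.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

function wirelessSetupSteps(phoneIp: string, port = 5555): AdbStep[] {
  if (!isIPv4(phoneIp)) {
    throw new Error(`not an IPv4 address: ${phoneIp}`);
  }
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${port}`);
  }
  return [
    {
      // Run while Phone 2 is still plugged in over USB.
      description: "switch the phone's adb daemon to TCP mode",
      argv: ["adb", "tcpip", String(port)],
    },
    {
      // Run after unplugging; Phone 2 must be on the same WiFi.
      description: "connect to the phone over WiFi",
      argv: ["adb", "connect", `${phoneIp}:${port}`],
    },
  ];
}
```

Each `argv` can be handed to whatever process spawner the laptop uses (for example `execFile` from `node:child_process`). Keeping the builder free of side effects makes it easy to check without a phone attached.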
### Step 1: You open the web app
```
📱 Phone 1 (you're at a coffee shop)
│ You open browser: http://100.64.0.2:3000
│ ^^^^^^^^^^^^^^^^
│ This is your laptop's Tailscale IP
│ Tailscale encrypts this and tunnels it to your laptop at home
💻 Laptop (at home)
SvelteKit serves the web page back to your phone's browser
```
### Step 2: You type the goal and hit Run
```
📱 Phone 1 browser
│ You type: "Send Mom I'll be late tonight"
│ You tap: [RUN]
│ Browser sends: POST http://100.64.0.2:3000/api/run
│ body: { goal: "Send Mom I'll be late tonight" }
💻 Laptop receives this HTTP request
│ SvelteKit API route catches it
│ Calls: kernel.run("Send Mom I'll be late tonight")
│ NOW THE KERNEL LOOP STARTS (on the laptop):
```
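Before the API route calls `kernel.run()`, it probably wants to check that the body really contains a goal. The sketch below shows one way to do that check; `parseRunBody`, `RunRequest`, and `ParseResult` are hypothetical names, and the 400 status and the trimming are assumptions, not something this doc specifies.

```typescript
// Hypothetical validation step for POST /api/run.
// Input is the already-parsed JSON body; output says whether to call the kernel.
type RunRequest = { goal: string };

type ParseResult =
  | { ok: true; value: RunRequest }
  | { ok: false; status: 400; error: string };

function parseRunBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "body must be a JSON object" };
  }
  const goal = (body as Record<string, unknown>).goal;
  if (typeof goal !== "string" || goal.trim() === "") {
    return { ok: false, status: 400, error: "goal must be a non-empty string" };
  }
  // Trim so stray whitespace from the text box never reaches the LLM prompt.
  return { ok: true, value: { goal: goal.trim() } };
}
```

In the route, an `ok: true` result would be passed on as `kernel.run(result.value.goal)`, and anything else would be returned to the browser as an error response.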
### Step 3: Kernel loop (runs on laptop, controls Phone 2)
```
💻 Laptop 📱 Phone 2
│ │
│ adb shell uiautomator dump ────────────────────>│
│ "tell me what's on your screen" │
│ │ (sends XML back)
│<──────────────────────────────────────────────────│
│ │
│ Parses XML: "home screen, WhatsApp icon at 540,800"
│ │
│ Sends to Groq API ──────────────> ☁️ Internet │
│ "screen shows home, goal is send msg to Mom" │
│ │
│ Groq replies: { action: "launch", package: "com.whatsapp" }
│ │
│ adb shell monkey -p com.whatsapp -c android.intent.category.LAUNCHER 1 ──>│
│ "open WhatsApp" │ (WhatsApp opens)
│ │
│ (waits 2 seconds) │
│ │
│ adb shell uiautomator dump ────────────────────>│
│ "what's on screen now?" │
│ │ (sends XML back)
│<──────────────────────────────────────────────────│
│ │
│ "WhatsApp is open, I see search icon" │
│ │
│ ... repeats 5 more times until message is sent │
│ │
│ Kernel returns: { success: true, steps: 7 } │
│ │
```
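The loop in the diagram above (read screen, ask the LLM, act, wait, repeat) can be sketched as a small function with its side effects injected. This is an illustration under assumptions, not the real `kernel.ts`: `runLoop`, `KernelDeps`, `maxSteps`, and every `Action` variant other than `launch` are hypothetical.

```typescript
// Only the "launch" shape appears in this doc; the other variants are assumed.
type Action =
  | { action: "launch"; package: string }
  | { action: "tap"; x: number; y: number }
  | { action: "type"; text: string }
  | { action: "done" };

// Side effects are injected so the loop can run without a phone or an API key.
interface KernelDeps {
  dumpScreen(): Promise<string>; // e.g. wraps `adb shell uiautomator dump`
  decide(goal: string, screenXml: string): Promise<Action>; // e.g. wraps the LLM call
  execute(action: Action): Promise<void>; // e.g. wraps `adb shell ...`
  settle(): Promise<void>; // e.g. waits ~2 seconds for the UI
}

type RunResult = { success: boolean; steps: number };

async function runLoop(
  goal: string,
  deps: KernelDeps,
  maxSteps = 20, // assumed safety cap so a confused model cannot loop forever
): Promise<RunResult> {
  let executed = 0;
  while (executed < maxSteps) {
    const screenXml = await deps.dumpScreen();
    const next = await deps.decide(goal, screenXml);
    if (next.action === "done") {
      return { success: true, steps: executed };
    }
    await deps.execute(next);
    executed += 1;
    await deps.settle();
  }
  return { success: false, steps: executed };
}
```

Here `steps` counts executed actions, so a run that launches WhatsApp and then taps once reports `steps: 2`. Whether the real kernel counts the same way is not stated in this doc.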
### Step 4: Result comes back to your phone
```
💻 Laptop
│ kernel.run() finished
│ SvelteKit sends HTTP response back
📱 Phone 1 (still at coffee shop)
│ Browser shows: "Done! Sent in 7 steps (12.4s)"
```
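The status line shown in the browser could be built from the kernel result plus the elapsed time. A minimal sketch follows; `formatResult` is a hypothetical name, and the failure wording is an assumption since this doc only shows the success case.

```typescript
// Hypothetical formatter for the message Phone 1 displays.
type RunResult = { success: boolean; steps: number };

function formatResult(result: RunResult, elapsedMs: number): string {
  // One decimal place, matching the "12.4s" style used in this doc.
  const seconds = (elapsedMs / 1000).toFixed(1);
  return result.success
    ? `Done! Sent in ${result.steps} steps (${seconds}s)`
    : `Failed after ${result.steps} steps (${seconds}s)`;
}
```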
---
## Where Tailscale Fits
Tailscale is just a wire. It connects Phone 1 to the laptop when they're on different networks. If they're on the same WiFi, you don't even need Tailscale.
```
Without Tailscale: Phone 1 ──WiFi──> Laptop ──WiFi──> Phone 2
(must be same WiFi)
With Tailscale: Phone 1 ──Tailscale tunnel──> Laptop ──WiFi──> Phone 2
(works from anywhere)
```
Tailscale is invisible to the kernel. The kernel doesn't know or care about Tailscale. It just talks to ADB like normal. Tailscale just makes the network path between Phone 1's browser and the laptop work across the internet.
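From the browser's side, the only thing that changes between the two setups is the host in the URL. Tailscale hands out IPv4 addresses from the 100.64.0.0/10 range, so a small check can tell which kind of address Phone 1 is using. This is a sketch for illustration; `isTailscaleIPv4` and `appUrl` are hypothetical helpers, and port 3000 is the example port from this doc.

```typescript
// True when the address falls inside 100.64.0.0/10,
// i.e. first octet 100 and second octet 64 through 127.
function isTailscaleIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return false;
  if (!parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) return false;
  const first = Number(parts[0]);
  const second = Number(parts[1]);
  return first === 100 && second >= 64 && second <= 127;
}

// Same URL shape either way; only the host differs.
function appUrl(host: string, port = 3000): string {
  return `http://${host}:${port}`;
}
```

For example, `appUrl("100.64.0.2")` and `appUrl("192.168.1.100")` both point at the same SvelteKit app; the first works from anywhere Tailscale is running, the second only on the home WiFi.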
```
WHAT GETS INSTALLED WHERE:
Phone 1: Tailscale app (from Play Store)
Laptop: Tailscale daemon (curl install)
Phone 2: NOTHING. Just USB debugging ON.
```
---
---
# Technical Deep Dive
Detailed diagrams for implementation reference.
---
## The 3 Pieces (Technical View)
``` ```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PHONE 1 │ │ SERVER │ │ PHONE 2 │ │ PHONE 1 │ │ LAPTOP/SERVER │ │ PHONE 2 │
│ (your daily) │ │ (Raspberry Pi, │ │ (agent phone) │ │ (your daily) │ │ (Raspberry Pi, │ │ (agent phone) │
│ │ │ VPS, laptop) │ │ │ │ │ │ VPS, laptop) │ │ │
│ Browser with │ │ SvelteKit app │ │ Android phone │ │ Browser with │ │ SvelteKit app │ │ Android phone │
@@ -30,7 +213,7 @@ When all 3 devices are on the same home/office WiFi:
``` ```
┌──────────────────────── Home WiFi (192.168.1.x) ────────────────────────┐ ┌──────────────────────── Home WiFi (192.168.1.x) ────────────────────────┐
│ │ │ │
│ Phone 1 Server Phone 2 │ │ Phone 1 Laptop Phone 2 │
│ 192.168.1.10 192.168.1.100 192.168.1.42 │ │ 192.168.1.10 192.168.1.100 192.168.1.42 │
│ │ │ │
│ Browser ──HTTP──> SvelteKit (:3000) │ │ Browser ──HTTP──> SvelteKit (:3000) │
@@ -53,8 +236,6 @@ When all 3 devices are on the same home/office WiFi:
└──────────────────────────────────────────────────────────────────────────┘ └──────────────────────────────────────────────────────────────────────────┘
``` ```
**Problem:** Only works when you're home. Leave the house = can't reach the server.
--- ---
## With Tailscale (From Anywhere) ## With Tailscale (From Anywhere)
@@ -77,7 +258,7 @@ Tailscale creates a private encrypted network across all your devices, no matter
┌──────│──────────────────────────────────────────┐ ┌──────│──────────────────────────────────────────┐
│ │ │ │ │ │
│ ┌──▼──────────────────┐ ┌────────────────┐ │ │ ┌──▼──────────────────┐ ┌────────────────┐ │
│ │ Server │ │ Phone 2 │ │ │ │ Laptop │ │ Phone 2 │ │
│ │ Tailscale: 100.64.0.2│ │ 192.168.1.42 │ │ ← Same local │ │ Tailscale: 100.64.0.2│ │ 192.168.1.42 │ │ ← Same local
│ │ │ │ │ │ network │ │ │ │ │ │ network
│ │ SvelteKit + Kernel │────│ ADB WiFi :5555 │ │ │ │ SvelteKit + Kernel │────│ ADB WiFi :5555 │ │
@@ -88,21 +269,15 @@ Tailscale creates a private encrypted network across all your devices, no matter
└───────────────────────────────────────────────────┘ └───────────────────────────────────────────────────┘
``` ```
**What Tailscale does:**
- Phone 1 (anywhere) can reach Server at `100.64.0.2` as if they're on the same network
- Encrypted WireGuard tunnel, no port forwarding, no public IP needed
- Server + Phone 2 stay at home, always connected via local WiFi
- Phone 2 does NOT need Tailscale — only Phone 1 and Server need it
--- ---
## The Full Sequence — Step by Step ## The Full Sequence — With Timestamps
Here's exactly what happens when you type "Send Mom I'll be late tonight" and hit enter: Here's exactly what happens when you type "Send Mom I'll be late tonight" and hit enter:
``` ```
TIME PHONE 1 (browser) SERVER (SvelteKit + Kernel) PHONE 2 (agent) TIME PHONE 1 (browser) LAPTOP (SvelteKit + Kernel) PHONE 2 (agent)
───── ───────────────── ────────────────────────── ──────────────── ───── ───────────────── ────────────────────────── ────────────────
0.0s User types goal 0.0s User types goal
"Send Mom I'll be "Send Mom I'll be
@@ -194,7 +369,7 @@ TIME PHONE 1 (browser) SERVER (SvelteKit + Kernel) PHONE 2 (
--- ---
## The Communication Layers ## The 4 Communication Layers
``` ```
┌─────────────────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────────────────┐
@@ -211,7 +386,7 @@ TIME PHONE 1 (browser) SERVER (SvelteKit + Kernel) PHONE 2 (
│ ▼ │ │ │ ▼ │ │
│ LAYER 3: Web Server │ │ LAYER 3: Web Server │
│ ┌───────────────────────────────────────────────────────────────┐ │ │ ┌───────────────────────────────────────────────────────────────┐ │
│ │ SvelteKit backend (server-side, runs on the Server) │ │ │ │ SvelteKit backend (server-side, runs on the Laptop) │ │
│ │ - API route: POST /api/run { goal } │ │ │ │ - API route: POST /api/run { goal } │ │
│ │ - Starts kernel.run() as async task │ │ │ │ - Starts kernel.run() as async task │ │
│ │ - Streams step updates back to browser via SSE │ │ │ │ - Streams step updates back to browser via SSE │ │
@@ -238,7 +413,7 @@ TIME PHONE 1 (browser) SERVER (SvelteKit + Kernel) PHONE 2 (
│ ┌───────────────────────────────────────────────────────────────┐ │ │ ┌───────────────────────────────────────────────────────────────┐ │
│ │ ADB over WiFi (TCP connection to Phone 2) │ │ │ │ ADB over WiFi (TCP connection to Phone 2) │ │
│ │ │ │ │ │ │ │
│ │ Server ──TCP:5555──> Phone 2 │ │ │ │ Laptop ──TCP:5555──> Phone 2 │ │
│ │ │ │ │ │ │ │
│ │ Commands: │ │ │ │ Commands: │ │
│ │ adb shell uiautomator dump (read screen) │ │ │ │ adb shell uiautomator dump (read screen) │ │
@@ -254,45 +429,7 @@ TIME PHONE 1 (browser) SERVER (SvelteKit + Kernel) PHONE 2 (
--- ---
## Where Tailscale Fits ## SvelteKit App Structure
Tailscale is NOT part of the execution flow. It's a **network layer** that makes Phone 1 able to reach the Server when they're on different networks.
```
WITHOUT TAILSCALE:
Phone 1 ──192.168.1.x──> Server ──192.168.1.x──> Phone 2
✓ Works on same WiFi
✗ Doesn't work from outside
WITH TAILSCALE:
Phone 1 ──100.64.0.1──┐
│ Tailscale tunnel
│ (encrypted WireGuard)
Server ──100.64.0.2 ──┘──192.168.1.x──> Phone 2
✓ Works from ANYWHERE
✓ No port forwarding
✓ No public IP needed
✓ Encrypted
WHAT GETS INSTALLED WHERE:
Phone 1: Tailscale app (from Play Store)
Server: Tailscale daemon (curl install)
Phone 2: NOTHING. Just USB debugging ON.
```
Tailscale is invisible to the kernel. The kernel doesn't know or care about Tailscale. It just talks to ADB like normal. Tailscale just makes the network path between Phone 1's browser and the Server work across the internet.
---
## The SvelteKit App Structure
``` ```
web/ web/
@@ -312,7 +449,7 @@ web/
├── package.json ├── package.json
└── svelte.config.js └── svelte.config.js
kernel (existing): kernel (existing, no changes needed except kernel.ts):
├── src/ ├── src/
│ ├── kernel.ts ← Modified: export run() function │ ├── kernel.ts ← Modified: export run() function
│ ├── actions.ts ← No changes │ ├── actions.ts ← No changes
@@ -331,7 +468,7 @@ kernel (existing):
YOU type "Send Mom I'll be late tonight" YOU type "Send Mom I'll be late tonight"
Phone 1 browser ──HTTP POST──> Server (SvelteKit API route) Phone 1 browser ──HTTP POST──> Laptop (SvelteKit API route)
│ │ │ │
│ ▼ │ ▼
│ kernel.run(goal) │ kernel.run(goal)
@@ -365,7 +502,7 @@ Phone 1 browser ──HTTP POST──> Server (SvelteKit API route)
## One-Line Summary ## One-Line Summary
``` ```
Browser (Phone 1) ──HTTP──> SvelteKit (Server) ──ADB WiFi──> Android (Phone 2) Browser (Phone 1) ──HTTP──> SvelteKit (Laptop) ──ADB WiFi──> Android (Phone 2)
├──HTTPS──> LLM API (cloud) for decisions ├──HTTPS──> LLM API (cloud) for decisions