Building an LLM Robot with My Son — EP 9. 4-Month Retrospective, and What Comes Next

When I wrote EP 0, there was a half-assembled acrylic chassis on the desk with one wheel turning in reverse. Now that robot walks to the kitchen on its own and finds a water glass. Can't pick it up — but it finds it. Four months.

What Worked

The agent harness approach held up

Injecting domain knowledge through CLAUDE.md worked better than expected. The fatigue of repeating the same context every session disappeared — the AI wrote code that respected the project's rules without having to be reminded. The file started at ten lines and grew to 120. That growth is a record of the project itself.

My son completed a behavior through eight prompts

The scene from EP 5 is the one that stays with me. He sat down alone, iterated through eight prompts, and built a working obstacle-avoidance behavior. He never touched a line of code. "I made this" was not wrong.

Apple Silicon local LLM actually works

112 tok/s on ...

Building an LLM Robot with My Son — EP 8. My Son Gave the AI Robot Its First Real Command

EP 6 connected the LLM server. EP 7 migrated to the Pi. This episode: the camera joins. Qwen2.5-VL-7B is now on the LLM server — the multimodal variant that accepts image input alongside text. Camera frames from the robot get sent with each request, and the model decides what to do based on what it sees. Camera + sensors + LLM + robot, all connected at once for the first time.

Switching to Qwen2.5-VL

From text-only Qwen2.5-7B to Qwen2.5-VL-7B. Same family, so the harness barely changed. Three things were different:

New section added to CLAUDE.md:

    ## Vision Input
    - Camera resolution: 640×480
    - Transmission format: JPEG (quality 70)
    - Frame timing: sent only at command request time (not continuous streaming)
    - Image + sensor data sent together

    ## LLM input format (vision mode)
    { "image": "<base64 encoded JPEG>", "sensor": "dist:45", "instruct...
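The input format is truncated in the excerpt above, but the gist is one JPEG frame plus the sensor string in a single JSON payload. Here's a minimal sketch of how the robot side could assemble such a request with OpenCV; the camera index, the helper name, and the final (truncated) field name are assumptions for illustration, not the project's actual code.

```python
# A sketch of assembling one vision-mode request: a 640x480 frame as a
# quality-70 JPEG, base64-encoded, sent together with the sensor string.
# The camera index and the "instruction" field name are assumptions.
import base64
import json

import cv2  # OpenCV, for camera capture and JPEG encoding


def build_vision_request(instruction: str, distance_cm: int) -> str:
    cap = cv2.VideoCapture(0)                        # robot camera (placeholder index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera frame capture failed")

    # JPEG quality 70, matching the CLAUDE.md vision section quoted above.
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
    if not ok:
        raise RuntimeError("JPEG encoding failed")

    return json.dumps({
        "image": base64.b64encode(jpeg.tobytes()).decode("ascii"),
        "sensor": f"dist:{distance_cm}",
        "instruction": instruction,                  # field name assumed; excerpt truncates here
    })
```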

Building an LLM Robot with My Son — EP 7. Upgrading the Robot Brain from Arduino to Raspberry Pi

Arduino had reached its limit. The setup from EP 6 — Arduino plus a Python bridge laptop — worked, but it meant the robot was physically tethered to a laptop by a USB cable. An "autonomous" robot on a leash felt wrong. Migrating to the Pi would let the bridge and the ROS2 nodes run inside the robot itself. True independence.

Also, my son had been asking about this for weeks. "When does the real computer go in?" had been the recurring question. The time had come.

Pi 4 vs Pi 5 vs Banana Pi

Three options, one criterion: the LLM doesn't run on the Pi. The Pi handles only the ROS2 nodes, the camera pipeline, and the bridge role. Heavy inference stays on the Mac LLM server.

| Device | Approx. price | RAM | USB 3.0 | Thermals | Notes |
|---|---|---|---|---|---|
| Raspberry Pi 4B 4GB | $55 | 4GB | 2 ports | Hot | Widely available |
| Raspberry Pi 5 4GB | $60 | 4GB | 2 ports | Better | Stock inconsistent |
| Banana Pi M5 | $45 | 4GB | ...
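The excerpt only describes the division of labor, so here's a minimal sketch of what a bridge-style ROS2 node on the Pi could look like, written with rclpy. The topic names, the String message type, and the single-node layout are illustrative assumptions, not the project's actual node graph.

```python
# Minimal rclpy sketch: a node on the Pi that listens to sensor readings and
# republishes whatever command comes back from the LLM server. Topic names and
# the String message type are illustrative placeholders.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class LlmBridgeNode(Node):
    def __init__(self):
        super().__init__("llm_bridge")
        self.cmd_pub = self.create_publisher(String, "robot_cmd", 10)
        self.create_subscription(String, "sensor_data", self.on_sensor, 10)

    def on_sensor(self, msg: String) -> None:
        # In the real bridge this is where the request to the Mac LLM server
        # would go; here a stop command stands in for the server's reply.
        command = String()
        command.data = "stop"  # placeholder for the LLM server's decision
        self.cmd_pub.publish(command)


def main() -> None:
    rclpy.init()
    node = LlmBridgeNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```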

Building an LLM Robot with My Son — EP 6. Connecting the Robot to the LLM Server over LAN

The robot needed to talk to the LLM server. Until now the robot ran standalone — HC-SR04 measuring distance, motors responding to code. That works for basic behavior. But the whole point of this project is an LLM that makes decisions: the robot sends camera frames and sensor data to the LLM server, the LLM decides what to do, and the command comes back. That communication layer had to be built. This episode is about how the robot (edge) ↔ LLM server (Mac) connection gets made.

Three Options

- WebSocket: bidirectional real-time communication. Simple to implement, and because it's HTTP-based, firewall issues are minimal. Works well for a setup where the robot streams data and the server streams commands back.
- gRPC: Google's RPC framework. Protocol Buffers serialization means smaller payloads than WebSocket. Type safety and streaming support are both there. But setup is heavier — Protobuf schemas need to be maintained...
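The excerpt cuts off before the decision, but of the options listed, WebSocket is the quickest one to prototype. Here's a minimal sketch of the robot side of that option, using the third-party websockets package; the server address, the message fields, and the half-second pacing are illustrative assumptions, not the project's actual protocol.

```python
# Minimal sketch of the WebSocket option from the robot side: send the latest
# sensor reading, wait for the server's command, repeat. Address and message
# format are placeholders.
import asyncio
import json

import websockets

SERVER_URI = "ws://192.168.0.10:8765"  # hypothetical LLM server on the home LAN


async def run_robot_loop() -> None:
    async with websockets.connect(SERVER_URI) as ws:
        while True:
            # Send the latest sensor reading; a real loop would read HC-SR04 here.
            await ws.send(json.dumps({"sensor": "dist:45"}))

            # Wait for the LLM server's decision and act on it.
            command = json.loads(await ws.recv())
            print("command from server:", command)

            await asyncio.sleep(0.5)  # pacing; tune to the robot's control rate


if __name__ == "__main__":
    asyncio.run(run_robot_loop())
```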

Building an LLM Robot with My Son — EP 5. My Son's First Day Coding a Robot with AI

The day came when my son wanted to try alone. Up until now I'd always been close — sitting next to him, typing alongside him, stepping in when errors appeared. But this afternoon: "I'll do it myself." A weekend afternoon. I went to another room. About thirty minutes later he came out. "Dad, the robot keeps turning left."

His First Prompts

Later we looked at everything he'd typed that day. The first prompt:

"make the robot avoid obstacles by going left"

Claude Code produced code. He uploaded it. The robot moved forward, detected an obstacle, stopped, turned left. So far correct. After turning, it didn't go forward again. Just kept turning left.

Second prompt:

"make it go forward again after avoiding the obstacle"

Code was revised. Uploaded. This time: detect obstacle, stop, turn left, go forward again. But every turn was the sa...
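The behavior he converged on (forward, stop on an obstacle, turn left, resume forward) is easy to sketch. The robot at this point was still the Arduino build, so the Python below is only an illustration of the logic, with simulated stand-ins for the sensor and motor calls rather than the code Claude Code actually produced.

```python
# Illustrative sketch of the obstacle-avoidance loop described above.
# The hardware helpers are simulated; on the real robot they would wrap the
# HC-SR04 reading and the motor driver calls.
import random
import time

OBSTACLE_CM = 20        # treat anything closer than this as an obstacle
TURN_TIME_S = 0.6       # one fixed turn size for every obstacle


def read_distance_cm() -> float:
    return random.uniform(5, 100)        # placeholder for an HC-SR04 reading


def drive_forward() -> None:
    print("forward")                     # placeholder for the motor driver


def stop() -> None:
    print("stop")


def turn_left() -> None:
    print("turn left")


def avoid_obstacles(steps: int = 20) -> None:
    for _ in range(steps):
        if read_distance_cm() > OBSTACLE_CM:
            drive_forward()              # path clear: keep going
        else:
            stop()                       # obstacle: stop first
            turn_left()                  # then a fixed-size left turn
            time.sleep(TURN_TIME_S)
            drive_forward()              # resume forward after the turn
        time.sleep(0.05)                 # small loop delay


if __name__ == "__main__":
    avoid_obstacles()
```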

Building an LLM Robot with My Son — EP 4. Choosing the Right Local LLM for Robot Control

We needed to pick a model. Connecting a local LLM to the robot means committing to a specific open-source model. If we were using a cloud API, this decision would be trivial — just call GPT-4o or Claude. But our architecture runs a local LLM server on the home LAN. We had to test and decide ourselves.

I set three evaluation criteria.

- Tool use — to send structured commands like "forward" or "stop," the model needs to reliably call JSON functions. If it sometimes returns proper JSON and sometimes writes prose explanations, parsing fails. Consistency matters more than peak performance.
- Korean language — my son gives instructions in Korean, and I want to read debug output in Korean. A model that drifts into English mid-response is just harder to use.
- Vision — we don't need it now, but we'll need camera frame input later. If the model has a vision variant in the same...
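The tool-use criterion boils down to a parsing test: a reply either becomes an executable command or it doesn't. Here's a minimal sketch of that check; the action names and the flat JSON schema are my illustrative assumptions, not the project's actual command format.

```python
# Minimal sketch of the "tool use" check: the model's reply must parse as a
# structured command every time, or the robot has nothing to execute.
import json

ALLOWED_ACTIONS = {"forward", "backward", "left", "right", "stop"}


def parse_command(model_reply: str) -> dict | None:
    """Return a valid command dict, or None if the reply is not usable."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # the model wrote prose instead of JSON
    if data.get("action") not in ALLOWED_ACTIONS:
        return None  # JSON, but not a command the robot can execute
    return data


# Consistency matters more than peak quality: run the same instruction many
# times and count how often the reply is actually parseable.
def consistency_rate(replies: list[str]) -> float:
    parsed = [r for r in replies if parse_command(r) is not None]
    return len(parsed) / len(replies) if replies else 0.0
```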

Building an LLM Robot with My Son — EP 3. Local LLM Speed Compared Across Mac M1, M4, and M4 Pro

The first time I ran a local LLM on the Mac mini M1, I watched Qwen2.5-7B output tokens one character at a time and paused for a second. About 8 tokens per second. Not slow, exactly. But whether that's fast enough for real-time robot control is a different question — how long does it take from the robot sending a camera frame to receiving a command back? That needed a measurement, not a guess.

I had three Macs already: Mac mini M1 16GB, Mac mini M4 24GB, MacBook Pro M4 Pro 14" 24GB. Same prompt, same model, three machines. The comparison made itself.

Test Setup

Model: Qwen2.5-7B-Instruct, Q4_K_M quantization. Both mlx-lm and the llama.cpp Metal backend, measured separately.

Metrics:
- tok/s: tokens generated per second
- TTFT: Time to First Token
- Memory usage: at 32K and 128K context
- Thermals: CPU/GPU temperature after 5 minutes of sustained load

Three prompt types:...
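For anyone who wants to reproduce this kind of measurement, here's a minimal sketch that times tok/s and TTFT against an OpenAI-compatible local endpoint (llama.cpp's llama-server and mlx_lm.server both expose one). The URL, port, and model name are placeholders, and treating each streamed chunk as one token is only an approximation.

```python
# Minimal sketch: measure TTFT and approximate tok/s from a streaming request
# to an OpenAI-compatible local server. URL and model name are placeholders.
import json
import time

import requests

URL = "http://localhost:8080/v1/chat/completions"  # hypothetical local server


def benchmark(prompt: str, max_tokens: int = 256) -> dict:
    payload = {
        "model": "qwen2.5-7b-instruct",   # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }
    start = time.perf_counter()
    first_token_at = None
    chunks = 0

    with requests.post(URL, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            delta = json.loads(data)["choices"][0]["delta"].get("content")
            if delta:
                if first_token_at is None:
                    first_token_at = time.perf_counter()
                chunks += 1

    end = time.perf_counter()
    ttft = (first_token_at or end) - start
    gen_time = end - (first_token_at or start)
    # Each streamed chunk is treated as roughly one token (approximation).
    return {
        "ttft_s": round(ttft, 3),
        "approx_tok_s": round(chunks / gen_time, 1) if gen_time > 0 else None,
    }


if __name__ == "__main__":
    print(benchmark("Describe what the robot should do if the path is blocked."))
```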