The honest measurements come from live telemetry on the Pi. Protocol-level token compression cuts spawn context from 68 tokens to 55 across the sample runs in run_telemetry.jsonl, a 19.1 percent reduction directly visible in the data. Earlier drafts of the README quote higher combined figures; those compare against a hypothetical naive transcript-paste baseline that I did not run side by side, so I treat them as a single-machine directional signal, not a benchmark. Hailo HEFs are wired for the Whisper and CLIP encoders on the AI HAT; Qwen3 stays on CPU because LLM and embedding GGUFs do not run on the Hailo-10H.
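
The 19.1 percent figure is plain arithmetic on the two token counts, and it can be rechecked from a JSONL telemetry file with a few lines of Python. The sketch below is illustrative only: the field names `baseline_tokens` and `compressed_tokens` are assumptions, not the actual schema of run_telemetry.jsonl, and the single sample record just reuses the 68 and 55 figures quoted above.

```python
import json
from typing import Iterable


def reduction_percent(before: float, after: float) -> float:
    """Percent reduction going from `before` tokens to `after` tokens."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def aggregate_reduction(
    lines: Iterable[str],
    before_key: str = "baseline_tokens",    # hypothetical field name
    after_key: str = "compressed_tokens",   # hypothetical field name
) -> float:
    """Sum both token counts over all JSONL records, then compare the totals."""
    before_total = 0
    after_total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        before_total += record[before_key]
        after_total += record[after_key]
    return reduction_percent(before_total, after_total)


# Illustrative record only; real telemetry would be read from the file instead.
sample = ['{"baseline_tokens": 68, "compressed_tokens": 55}']
print(round(aggregate_reduction(sample), 1))  # 19.1
```

To run it against real telemetry, pass an open file handle in place of `sample` and adjust the two key names to whatever the log actually records.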