Projects
All projects →
Agent Skills Eval (2026)
TypeScript SDK and CLI for evaluating agentskills.io-style AI agent skills with LLM judges, YAML suites, JSONL logs, and HTML reports.
Bench AI (2026)
One prompt, many models - compare LLM output quality, latency, tokens, and cost from a CLI or web UI.
Polite Retry (2026)
A TypeScript retry library designed to avoid retry amplification in distributed systems.
Devbox (2026)
A minimal self-hosted browser development environment with code-server, zsh, sudo, and the opencode CLI.
Grandma Qwen2.5 3B Instruct GGUF (2026)
A fine-tuned witty grandma chat model based on Qwen2.5 3B Instruct, exported in GGUF format for local inference.
Writing
All posts →
Harness Engineering: A Beginner's Guide to Single-Agent and Multi-Agent Systems
Most people building agents miss the point. The model is the easy part: the harness around it is where agents are won or lost. A beginner-friendly walk through single-agent and multi-agent systems, when each helps, and the read-vs-write rule that reconciles the famous Anthropic-vs-Cognition debate.
Devbox: Cloud VS Code, Opencode, And A Full AI Dev Setup In One Container
Why I built Devbox, a small Docker image that packages code-server and opencode into a self-hosted browser development environment.
Polite Retry: How to Stop Your Node.js Retries from Taking Down Your Own Backend
Fine-Tuning A Small Model: Building Grandma Qwen
A practical, personal article about fine-tuning a small Qwen model with Unsloth into a warm, witty character model and exporting it for local inference.