AVAILABLE FOR INTERESTING PROBLEMS · BAY AREA

Rishabh Mehan

Building AI tools, reliability experiments, and practical software systems.

Email · GitHub · LinkedIn · X · RSS

Projects

Agent Skills Eval · 2026

TypeScript SDK and CLI for evaluating agentskills.io-style AI agent skills with LLM judges, YAML suites, JSONL logs, and HTML reports.

Agent evaluation CLI/Open source

AI Agents · LLM Evals · TypeScript

Bench AI

One prompt, many models - compare LLM output quality, latency, tokens, and cost from a CLI or web UI.

AI evaluation toolkit/Live

AI · LLM · TypeScript

Polite Retry · 2026

A TypeScript retry library designed to avoid retry amplification in distributed systems.

Resilient systems library/Open source

TypeScript · JavaScript · Reliability

Devbox

A minimal self-hosted browser development environment with code-server, zsh, sudo, and the opencode CLI.

Remote development image/Open source

Docker · Devtools · code-server

Grandma Qwen2.5 3B Instruct GGUF · 2026

A fine-tuned witty grandma chat model based on Qwen2.5 3B Instruct, exported in GGUF format for local inference.

Fine-tuned local LLM/Model

Hugging Face · LLM · GGUF

Writing

Harness Engineering: A Beginner's Guide to Single-Agent and Multi-Agent Systems

Most people building agents miss the point. The model is the easy part; the harness around it is where agents are won or lost. A beginner-friendly walkthrough of single-agent and multi-agent systems, when each helps, and the read-vs-write rule that reconciles the famous Anthropic-vs-Cognition debate.

Jun 9, 2026 · 19 min read

Devbox: Cloud VS Code, Opencode, And A Full AI Dev Setup In One Container

Why I built Devbox, a small Docker image that packages code-server and opencode into a self-hosted browser development environment.

May 11, 2026 · 6 min read

Polite Retry: How to Stop Your Node.js Retries from Taking Down Your Own Backend

May 6, 2026 · 9 min read

Fine-Tuning A Small Model: Building Grandma Qwen

A practical, personal article about fine-tuning a small Qwen model with Unsloth into a warm, witty character model and exporting it for local inference.

May 5, 2026 · 6 min read