The AIGENSA Labs

Where we run experiments on real agents and LLMs in production. We measure what breaks, what scales, and what survives. Two tools did.

Tools Born in the Lab

Eval Arena

Invite-Only

Born from 100+ production deployments across Insurance, Healthcare, and GovTech. Eval Arena delivers business-specific evaluation recommendations, focused on the 20% of tests that prevent 80% of production failures.

Learn More

jl: Jupyter CLI

Open Source

Built after losing thousands of tokens to MCP schema overhead in Claude Code sessions. jl replaces jupyter-mcp-server with a single bash call: direct REST API, stateful kernel, SSH tunnel support.

View on GitHub

What We're Studying

Agent Reliability Patterns
LLM Cost Optimization
Evaluation Methodology
Multi-Agent Coordination
Context Efficiency

Follow the Research

We write about what we learn: architecture decisions, failure patterns, and the trade-offs that don't show up in benchmarks.

Read the Blog