The AIGENSA Labs
Where we run experiments on real agents and LLMs in production. We measure what breaks, what scales, and what survives. Two tools did.
Tools Born in the Lab
Eval Arena
Invite-Only
Born from 100+ production deployments across Insurance, Healthcare, and GovTech. Eval Arena delivers business-specific evaluation recommendations, focused on the 20% of tests that prevent 80% of production failures.
Learn More
jl: Jupyter CLI
Open Source
Built after losing thousands of tokens to MCP schema overhead in Claude Code sessions. jl replaces jupyter-mcp-server with a single bash call: direct REST API, stateful kernel, SSH tunnel support.
View on GitHub
What We're Studying
Follow the Research
We write about what we learn: architecture decisions, failure patterns, and the trade-offs that don't show up in benchmarks.
Read the Blog