# Ecosystem
AgentV is the evaluation layer in the AI agent lifecycle. It works alongside runtime governance and observability tools; each layer addresses a distinct concern, with little overlap between them.
## The Three Layers

| Layer | Tool | Question it answers |
|---|---|---|
| Evaluate (pre-production) | AgentV | "Is this agent good enough to deploy?" |
| Govern (runtime) | Agent Control | "Should this action be allowed?" |
| Observe (runtime) | Langfuse | "What is the agent doing in production?" |
## AgentV — Evaluate

Offline evaluation and testing. Run eval cases against agents, score them with deterministic code graders and LLM judges, detect regressions, and gate CI/CD pipelines. Everything lives in Git.
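As a sketch of how a deterministic code grader and an LLM-judge score might be combined into one result (the function names and weighting are illustrative assumptions, not AgentV's actual API):

```python
# Illustrative scoring sketch: blend a deterministic code grader with an
# LLM-judge score. grade_format/combined_score are hypothetical names.
import re

def grade_format(output: str) -> float:
    """Deterministic code grader: 1.0 if the output contains an order ID."""
    return 1.0 if re.search(r"ORD-\d{6}", output) else 0.0

def combined_score(output: str, judge_score: float, weight: float = 0.5) -> float:
    """Blend the deterministic grade with an LLM-judge score, both in [0, 1]."""
    return weight * grade_format(output) + (1 - weight) * judge_score
```

A code grader gives a hard, reproducible signal; the LLM judge covers qualities that are hard to express as code.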
```sh
agentv eval evals/my-agent.yaml
```

## Agent Control — Govern
Runtime guardrails. Intercepts agent actions (tool calls, API requests) and evaluates them against configurable policies. Deny, steer, warn, or log — without changing agent code. Pluggable evaluators with confidence scoring.
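A minimal sketch of this interception flow, assuming a simple policy format and a first-match-wins rule (none of this is Agent Control's actual API):

```python
# Schematic runtime policy check for an intercepted tool call.
# Policy shape, Verdict, and evaluate() are illustrative assumptions.
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    action: str        # "deny" | "steer" | "warn" | "log"
    confidence: float
    reason: str

def evaluate(tool_name: str, args: dict, policies: list[dict]) -> Verdict:
    """Check an intercepted tool call against ordered policies; first match wins."""
    for policy in policies:
        if policy["tool"] == tool_name and re.search(policy["pattern"], str(args)):
            return Verdict(policy["action"], policy.get("confidence", 1.0),
                           policy["reason"])
    return Verdict("log", 1.0, "no policy matched")

policies = [
    {"tool": "shell", "pattern": r"rm\s+-rf", "action": "deny",
     "reason": "destructive command"},
]
```

Because the check wraps the tool call rather than the agent, policies can change without touching agent code.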
## Langfuse — Observe

Production observability. Traces agent execution with explicit Tool/LLM/Retrieval observation types, ingests evaluation scores, and provides dashboards for debugging and monitoring. Self-hostable.
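As a rough data-model sketch of what a trace with typed observations and ingested scores carries (this is a plain illustration, not the Langfuse SDK):

```python
# Schematic model of agent-native observation types and score ingestion.
# Not Langfuse's API; field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Observation:
    type: str            # "TOOL" | "LLM" | "RETRIEVAL"
    name: str
    input: str
    output: str

@dataclass
class Trace:
    name: str
    observations: list[Observation] = field(default_factory=list)
    scores: dict[str, float] = field(default_factory=dict)

trace = Trace("support-agent-run")
trace.observations.append(
    Observation("RETRIEVAL", "kb_search", "refund policy", "3 docs"))
trace.observations.append(
    Observation("LLM", "draft_reply", "3 docs + question", "reply text"))
trace.scores["helpfulness"] = 0.9   # score ingested from an external evaluator
```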
## How They Connect

```
Define evals (YAML in Git)
        |
        v
Run evals locally or in CI (AgentV)
        |
        v
Deploy agent to production
        |
        v
Enforce policies on tool calls (Agent Control)
        |                    |
        v                    v
Trace execution (Langfuse)   Log violations (Agent Control)
        |
        v
Feed production traces back into evals (AgentV)
```

The feedback loop is key: Langfuse traces surface real-world failures that become new AgentV eval cases. Agent Control deny/steer events identify safety gaps that become new test scenarios.
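The trace-to-eval step of that loop can be sketched as a small transform; the dict shapes here are illustrative assumptions, not a real schema:

```python
# Sketch of the feedback loop: turn a failing production trace into a
# new eval case. Field names are hypothetical.
def trace_to_eval_case(trace: dict) -> dict:
    """Build an eval case from a trace that was flagged as a failure."""
    return {
        "name": f"regression-{trace['id']}",
        "input": trace["input"],
        "expected_behavior": trace.get("expected", "no policy violation"),
        "source": "production-trace",
    }

case = trace_to_eval_case({"id": "t-41", "input": "cancel my subscription"})
```

Once committed to Git alongside the other evals, the failure is re-checked on every CI run.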
## Traditional Software Analogy

This maps to how traditional software works:
| Traditional | AI Agent Equivalent |
|---|---|
| Test suite (Jest, pytest) | AgentV |
| WAF / auth middleware | Agent Control |
| APM / logging (Datadog) | Langfuse |
## When to Use What

AgentV handles:
- Eval definition and execution
- Code + LLM graders
- Regression detection and CI/CD gating
- Multi-provider A/B comparison
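A minimal sketch of the regression-gating idea, assuming per-case scores and a tolerance (the `gate` helper and thresholds are illustrative, not AgentV's API):

```python
# Sketch of CI/CD gating on eval scores: fail the pipeline when any eval
# case regresses past a tolerance versus the baseline. Values illustrative.
import sys

def gate(current: dict[str, float], baseline: dict[str, float],
         tol: float = 0.02) -> list[str]:
    """Return the names of eval cases that regressed beyond tolerance."""
    return [name for name, score in current.items()
            if score < baseline.get(name, 0.0) - tol]

regressions = gate({"refund-flow": 0.91, "greeting": 0.99},
                   {"refund-flow": 0.92, "greeting": 0.98})
if regressions:   # in CI, a nonzero exit status blocks the deploy
    sys.exit(f"regressed: {regressions}")
```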
Agent Control handles:
- Runtime policy enforcement (deny/steer/warn/log)
- Pre/post execution evaluation of agent actions
- Pluggable evaluators (regex, JSON, SQL, LLM-based)
- Centralized control plane with dashboard
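The "pluggable evaluators" bullet above can be pictured as interchangeable checkers behind one interface; the class names here are illustrative assumptions:

```python
# Sketch of pluggable evaluators (regex- and JSON-based) sharing one
# check() interface. Hypothetical names, not Agent Control's API.
import json
import re

class RegexEvaluator:
    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)

    def check(self, payload: str) -> bool:
        return self.pattern.search(payload) is not None

class JsonEvaluator:
    def __init__(self, required_keys: list[str]):
        self.required = required_keys

    def check(self, payload: str) -> bool:
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            return False
        return all(k in data for k in self.required)
```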
Langfuse handles:
- Production tracing with agent-native observation types
- Live evaluation automation on trace ingestion
- Score ingestion from external evaluators
- Team dashboards and debugging