KyraDB YCSB Performance: Sub-Millisecond Latency for Agentic Architectures

KyraDB is an event-sourced context engine built for the agent era. Its core architecture — an immutable context log as the single source of truth, with graph, vector, search, ATOM facts, and identity as async read model projections — is designed for one specific constraint: agents need causally-consistent, auditable context at low latency, at scale.
We ran the standard Yahoo! Cloud Serving Benchmark (YCSB) to validate the storage layer that underpins all of this.
Benchmark Configuration
The test was run under the following conditions:
- Infrastructure: Azure Standard_L16as_v4
- Dataset Target: 128 GiB logical, 109,425,919 records
- Execution: 16 client threads, 64 partitions
- Iterations: 1 warmup + 3 measured iterations, 1M ops per measured workload iteration
- Build: Release
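As a quick sanity check on the dataset figures, the 128 GiB target and the record count together imply a fixed logical record size. The sketch below is only arithmetic on the numbers listed above; it assumes the record count was obtained by rounding up the target size divided by a per-record size, and `bytes_per_record` is an illustrative name, not a KyraDB or YCSB setting.

```python
GIB = 1024 ** 3

target_bytes = 128 * GIB          # 128 GiB logical dataset target
record_count = 109_425_919        # record count from the configuration

# Average logical bytes per record, rounded to the nearest whole byte.
bytes_per_record = round(target_bytes / record_count)

# Assumption: the record count is the target divided by the record size,
# rounded up. Integer ceiling division avoids floating-point error.
implied_record_count = -(-target_bytes // bytes_per_record)

print(f"logical bytes per record: {bytes_per_record}")
print(f"implied record count:     {implied_record_count:,}")
```

If the assumption holds, the implied record count matches the configured one exactly, which suggests the dataset was sized from a fixed per-record footprint rather than measured after loading.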
YCSB Results
| Workload | Operations/Sec | p95 Read Latency | p95 Write/Update Latency |
|---|---|---|---|
| A (50/50 read/update) | 68,532 | 0.168 ms | 0.954 ms |
| B (95/5 read/update) | 154,291 | 0.203 ms | 0.101 ms |
| C (read-only) | 166,365 | 0.141 ms | n/a |
| D (read-latest/insert) | 266,170 | 0.023 ms | 0.477 ms (insert) |
| E (scan/insert) | 55,319 | 0.415 ms (scan) | 0.797 ms (insert) |
| F (read/RMW) | 64,229 | 0.173 ms | 0.940 ms (RMW) |
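One way to cross-check the table is Little's law: in a closed-loop benchmark, mean time per operation is roughly concurrency divided by throughput. The sketch below assumes each of the 16 client threads keeps exactly one operation in flight, which the configuration does not state explicitly. The result is a mean that includes client-side time, so it is not directly comparable to the p95 columns.

```python
CLIENT_THREADS = 16  # from the benchmark configuration

# Throughput in operations per second, copied from the results table.
throughput = {
    "A": 68_532,
    "B": 154_291,
    "C": 166_365,
    "D": 266_170,
    "E": 55_319,
    "F": 64_229,
}

def implied_mean_latency_ms(ops_per_sec: float, concurrency: int = CLIENT_THREADS) -> float:
    """Little's law: mean time per operation = concurrency / throughput."""
    return concurrency / ops_per_sec * 1000.0

implied = {name: implied_mean_latency_ms(ops) for name, ops in throughput.items()}

for name, ms in implied.items():
    print(f"Workload {name}: ~{ms:.3f} ms mean per operation (client-observed)")
```

Under that assumption, every workload comes out well below one millisecond of mean client-observed time per operation.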
What These Numbers Mean for Agentic Systems
KyraDB queries follow a deterministic path: a natural language query resolves to a CompiledQuerySpec, routes through the intent engine, assembles context across read models, and commits an immutable ContextBundle the agent acts on. Every stage depends on the storage layer performing. These results validate that it does.
Context log append path
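Workload D: 266,170 ops/s, 0.023 ms p95 read, 0.477 ms p95 insert
The write path in KyraDB does one thing: append to the immutable context log and return a causal token. Workload D is the closest YCSB analogue, since it inserts new records while reading the most recently written ones. At 266k ops/s with a 0.477 ms p95 insert, the storage layer leaves ample headroom for ingestion from Slack, CRM, HRIS, and agent write-backs.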
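To make the append-and-token pattern concrete, here is a minimal in-memory sketch. It is not KyraDB's API: `ContextLog`, `CausalToken`, and `CountingReadModel` are hypothetical names, and the token is assumed to be a plain log offset. The point is only to show how a read model can tell whether it has caught up to a given write.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class CausalToken:
    """Hypothetical token: the log offset of the appended event."""
    offset: int


class ContextLog:
    """Toy append-only log; events are never updated or removed."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> CausalToken:
        self._events.append(event)
        return CausalToken(offset=len(self._events) - 1)

    def read_from(self, offset: int) -> List[Dict[str, Any]]:
        return self._events[offset:]


class CountingReadModel:
    """Toy projection: counts events per source and tracks its log position."""

    def __init__(self) -> None:
        self.applied_upto = -1
        self.counts: Dict[str, int] = {}

    def catch_up(self, log: ContextLog) -> None:
        # Apply every event the projection has not seen yet, in log order.
        for event in log.read_from(self.applied_upto + 1):
            source = event["source"]
            self.counts[source] = self.counts.get(source, 0) + 1
            self.applied_upto += 1

    def is_consistent_with(self, token: CausalToken) -> bool:
        # True once the projection reflects the write behind this token.
        return self.applied_upto >= token.offset


log = ContextLog()
model = CountingReadModel()

token = log.append({"source": "slack", "text": "deploy finished"})
before = model.is_consistent_with(token)   # projection has not run yet
model.catch_up(log)
after = model.is_consistent_with(token)    # projection now includes the write

print(before, after, model.counts)
```

In this sketch a reader holding the token can wait until `is_consistent_with` returns true before querying the projection, which is one simple way to get read-your-writes behavior over asynchronously updated read models.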
Read model queries — context retrieval and RAG
Workloads B & C: 154k–166k ops/s, sub-0.21 ms p95
Vector, document, and search read models all derive from the context log. These throughput numbers represent the storage-layer ceiling for concurrent knowledge queries inside LLM reasoning loops. With p95 latency under 0.21 ms, storage reads are unlikely to bottleneck agent execution.
Graph-primed relational queries
Workloads A & F: 64k–69k ops/s, sub-ms p95 updates
Deep relational queries in KyraDB are graph-primed before semantic retrieval. The continuous read-modify-write pattern of Workload F mirrors how the graph read model is updated as new signals arrive from the context log. At 0.940 ms p95 for read-modify-write (F) and 0.954 ms p95 for updates (A), this path stays under one millisecond, well within agent response budgets.
Episodic memory and causal delta retrieval
Workload E: 55,319 ops/s, 0.415 ms scan
Compliance queries, causal history reconstruction, and agent audit trails all require sequential scan access. 55k ops/s at sub-0.5 ms scan latency keeps these paths responsive without dedicated infrastructure.
In this benchmark, KyraDB held p95 latency under one millisecond for every operation type across all six workloads. For agents operating on shared, causally-consistent context, these results indicate that the storage layer is unlikely to be the bottleneck.
Agent load benchmarking results coming next.