KyraDB YCSB Performance: Sub-Millisecond Latency for Agentic Architectures

May 20, 2026

KyraDB YCSB performance spotlight on Azure

KyraDB is an event-sourced context engine built for the agent era. Its core architecture uses an immutable context log as the single source of truth, with graph, vector, search, ATOM facts, and identity maintained as asynchronous read-model projections. It is designed around one specific constraint: agents need causally consistent, auditable context at low latency and at scale.

We ran the standard Yahoo! Cloud Serving Benchmark (YCSB) to validate the storage layer that underpins all of this.

Benchmark Configuration

The test was run under the following conditions:

Infrastructure: Azure Standard_L16as_v4
Dataset Target: 128 GiB logical, 109,425,919 records
Execution: 16 client threads, 64 partitions
Iterations: 1 warmup + 3 measured iterations, 1M ops per measured workload iteration
Build: Release
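
As a quick sanity check on the dataset target, the sketch below derives the average logical record size from the two figures above. It assumes that "128 GiB" means 128 × 2^30 bytes and that records spread evenly across the 64 partitions; the per-record size is inferred here, not stated in the configuration.

```python
# Average logical record size implied by the dataset target.
# Assumption: "GiB" is the binary unit (2**30 bytes).
DATASET_BYTES = 128 * 2**30
RECORD_COUNT = 109_425_919
PARTITIONS = 64

bytes_per_record = DATASET_BYTES / RECORD_COUNT
# Assumption: records are spread evenly across partitions.
records_per_partition = RECORD_COUNT / PARTITIONS

print(f"{bytes_per_record:.1f} bytes per record")
print(f"{records_per_partition:,.0f} records per partition")
```

This works out to about 1,256 bytes per record and roughly 1.7 million records per partition.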

YCSB Results

Workload	Operations/Sec	p95 Read Latency	p95 Write/Update Latency
A (50/50 read/update)	68,532	0.168 ms	0.954 ms
B (95/5 read/update)	154,291	0.203 ms	0.101 ms
C (read-only)	166,365	0.141 ms	n/a
D (read-latest/insert)	266,170	0.023 ms	0.477 ms (insert)
E (scan/insert)	55,319	0.415 ms (scan)	0.797 ms (insert)
F (read/RMW)	64,229	0.173 ms	0.940 ms (RMW)
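
One way to cross-check the table is Little's law: in a closed-loop benchmark, concurrency is approximately throughput multiplied by mean latency. The sketch below applies it under the assumption that each of the 16 client threads keeps exactly one operation in flight (the usual YCSB client model, though this post does not state it), and also estimates how long a 1M-operation iteration lasts. The resulting mean latencies are estimates that include client-side overhead, not measured values.

```python
# Throughput figures (ops/s) copied from the results table.
THROUGHPUT = {
    "A": 68_532,
    "B": 154_291,
    "C": 166_365,
    "D": 266_170,
    "E": 55_319,
    "F": 64_229,
}
CLIENT_THREADS = 16            # from the benchmark configuration
OPS_PER_ITERATION = 1_000_000  # from the benchmark configuration


def implied_mean_latency_ms(ops_per_sec: float, concurrency: int = CLIENT_THREADS) -> float:
    """Little's law: mean latency = concurrency / throughput.

    Assumes one operation in flight per client thread.
    """
    return concurrency / ops_per_sec * 1_000


def iteration_seconds(ops_per_sec: float, ops: int = OPS_PER_ITERATION) -> float:
    """Wall-clock time to complete `ops` operations at a steady rate."""
    return ops / ops_per_sec


for name, rate in THROUGHPUT.items():
    print(f"{name}: ~{implied_mean_latency_ms(rate):.3f} ms mean, "
          f"~{iteration_seconds(rate):.1f} s per iteration")
```

For Workload C this gives roughly 0.096 ms per operation, which sits below the reported 0.141 ms p95 read latency, as expected for a mean. It also shows that each measured iteration lasts between roughly 4 and 18 seconds at these rates.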

What These Numbers Mean for Agentic Systems

KyraDB queries follow a deterministic path: a natural language query resolves to a CompiledQuerySpec, routes through the intent engine, assembles context across read models, and commits an immutable ContextBundle the agent acts on. Every stage depends on how fast the storage layer can read and write, and these results show it holding sub-millisecond p95 latency under every YCSB workload.

Context log append path

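Workload D: 266,170 ops/s, 0.023 ms p95 read, 0.477 ms p95 insert

The write path in KyraDB does one thing: append to the immutable context log and return a causal token. Workload D is the closest YCSB analogue, because it inserts new records while reading the most recently written ones. The 266,170 ops/s figure covers that combined read/insert mix (95% reads and 5% inserts in the standard YCSB definition), so it is not an insert-only rate. With inserts completing in 0.477 ms at p95, ingestion from Slack, CRM, HRIS, and agent write-backs has headroom before it queues.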
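
To separate the append rate from the headline figure, the sketch below splits Workload D's throughput by operation type. It assumes the standard YCSB core mix for Workload D (95% reads, 5% inserts); this post does not state the proportions actually used, so treat the split as an estimate.

```python
WORKLOAD_D_OPS_PER_SEC = 266_170  # from the results table

# Assumption: standard YCSB Workload D proportions.
READ_PROPORTION = 0.95
INSERT_PROPORTION = 0.05

reads_per_sec = WORKLOAD_D_OPS_PER_SEC * READ_PROPORTION
inserts_per_sec = WORKLOAD_D_OPS_PER_SEC * INSERT_PROPORTION

print(f"reads:   ~{reads_per_sec:,.0f}/s")
print(f"inserts: ~{inserts_per_sec:,.0f}/s")
```

Under that assumption, roughly 13,000 of the 266,170 ops/s are appends and the rest are reads of recently written records.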

Read model queries — context retrieval and RAG

Workloads B & C: 154k–166k ops/s, sub-0.21 ms p95 reads

Vector, document, and search read models all derive from the context log. These throughput numbers represent the storage-layer ceiling, on this configuration, for concurrent knowledge queries inside LLM reasoning loops. At that rate, storage reads should not bottleneck agent execution.
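
The relationship between the context log and its read models can be illustrated with a toy event-sourcing sketch. Everything here (`ContextLog`, `KeywordIndex`, and the use of a log offset as the causal token) is a hypothetical illustration of the general pattern, not KyraDB's API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLog:
    """Append-only log; the returned offset stands in for a causal token."""
    _events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        self._events.append(dict(event))  # copy so callers cannot mutate history
        return len(self._events) - 1

    def read_from(self, offset: int) -> list[dict]:
        return self._events[offset:]


@dataclass
class KeywordIndex:
    """Read model derived entirely from the log (a stand-in for a search projection)."""
    applied_up_to: int = -1  # offset of the last event folded into the index
    _index: dict[str, set[int]] = field(default_factory=dict)

    def catch_up(self, log: ContextLog) -> None:
        start = self.applied_up_to + 1
        for i, event in enumerate(log.read_from(start), start=start):
            for word in event["text"].lower().split():
                self._index.setdefault(word, set()).add(i)
            self.applied_up_to = i

    def search(self, word: str, at_least: int) -> set[int]:
        # Causal-consistency check: refuse to answer from a stale projection.
        if self.applied_up_to < at_least:
            raise RuntimeError("projection has not yet applied the requested token")
        return self._index.get(word.lower(), set())


log = ContextLog()
index = KeywordIndex()
log.append({"text": "Renewal call scheduled"})
token = log.append({"text": "Renewal contract signed"})

index.catch_up(log)  # a real system would run this asynchronously
print(index.search("renewal", at_least=token))  # {0, 1}
```

The point of the token is that a reader can demand a projection that has applied at least its own write; a projection that lags behind raises instead of returning stale context.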

Graph-primed relational queries

Workloads A & F: 64k–68k ops/s, sub-ms updates

Deep relational queries in KyraDB are graph-primed before semantic retrieval. The continuous read-modify-write pattern of Workload F mirrors how the graph read model is updated as new signals arrive from the context log. At 0.940 ms p95 for read-modify-write (Workload F) and 0.954 ms p95 for updates (Workload A), this path stays below one millisecond, well within agent response budgets.

Episodic memory and causal delta retrieval

Workload E: 55,319 ops/s, 0.415 ms scan

Compliance queries, causal history reconstruction, and agent audit trails all require sequential scan access. 55k ops/s at sub-0.5 ms p95 scan latency keeps these paths responsive without dedicated infrastructure.

Across these runs, KyraDB kept p95 latency below one millisecond for every measured operation type; the highest was 0.954 ms, for Workload A updates. For agents operating on shared, causally consistent context, that makes the storage layer unlikely to be the bottleneck.

Agent load benchmarking results are coming next.