BENCHMARKED · 1B keys · 70/30 read/write · NVMe · May 2026
The legacy floor: 33ms P99 ───────▶ The new floor: 4.9ms P99

P99 read latency · 1 billion keys · Commodity NVMe

VeltrixDB is the extreme point-lookup engine for teams whose Redis bill grew faster than their revenue — a persistent, NVMe-backed alternative built for microsecond reads at billions of keys. Closed source. Bare-metal honest.

<5ms
P99 Latency
Sustained at 1B keys, mixed read/write load.
2M+
Requests / sec
Per node, with linear horizontal scale.
1B+
Keys / cluster
3-node cluster. 256 GB cache · 3 TB NVMe per node.
3–5×
Cost reduction
Versus DynamoDB at equivalent scale.
Why VeltrixDB

What actually changes when you switch.

Not a feature checklist. The real outcomes your engineers, finance team, and users feel — within weeks of deployment.

01 — Latency

Latency that never surprises you.

Sub-5ms reads at any scale, under any write load. No degradation during flash sales. No spikes when background cleanup runs. Consistent — always.

<5ms
P99 · guaranteed
02 — Cost

A bill you can actually predict.

Pay for fixed infrastructure, not per million operations. Your bill doesn't spike when a campaign goes viral or a cron job runs a full-table scan.

3–5×
Lower vs DynamoDB
03 — Portability

Any cloud. Zero lock-in.

One Helm chart runs identically on GKE, EKS, AKS, or bare metal. Switch providers without rewriting your data layer or renegotiating vendor contracts.

4 platforms
GKE · EKS · AKS · Bare
04 — Compaction

Background work, invisible to users.

Compaction and GC run on a completely separate I/O path. Your users never feel it. No more 3 AM pages because cleanup chose the wrong moment.

0 interference
Separate I/O path
05 — Observability

Observability from day one.

50+ Prometheus metrics, pre-wired Grafana dashboards, and alert rules ship in the Helm chart. No custom instrumentation. No dark weeks.

50+ metrics
ServiceMonitor · Alerts
06 — Time to prod

Zero to production in minutes.

One Helm command deploys storage, replication, auto-scaling, and monitoring. No weeks of tuning. No specialist hire. Day-one productive.

94s
helm install → ready
Sound familiar

Every fast-growing team hits the same wall.

Your database was fine at 1M users. Somewhere between 10M and 100M, things got expensive. Then slow. Then both.

Our database bill tripled and we didn't ship a single feature.

More data means more reads. More reads mean bigger Redis clusters or paying DynamoDB per million ops. The bill grows faster than revenue.

— VP Engineering, Series C
Latency is fine at 9 AM, then spikes to 500ms at noon.

Traditional databases run internal cleanup jobs that fight your user requests for disk access. Peak traffic + compaction = your worst nightmare.

— Staff SRE, fintech
We're burning through SSDs every few months.

Databases without key-value separation rewrite your data over and over. Every unnecessary rewrite is wear on expensive NVMe — a tax on your latency.

— Platform Lead, ad-tech
Compared

Why replace what you're running today.

A feature-by-feature look at how VeltrixDB stacks up against the databases teams typically outgrow.

Capability             | Redis             | DynamoDB            | Cassandra     | VeltrixDB
Sub-5ms P99 at 1B keys | In-memory only    | ~12ms typical       | ~80ms typical | 4.9ms · NVMe-backed
Predictable infra cost | RAM-bound         | Pay per million ops | Ops overhead  | Fixed infrastructure
Compaction-free reads  | No compaction     | Hidden, but spikes  | Major spikes  | Separate I/O path
Cloud portability      | Self-managed work | AWS-only            | Yes           | Helm · any cloud
Observability included | Roll your own     | CloudWatch only     | Roll your own | 50+ metrics + Grafana
Time to production     | Days              | Minutes             | Weeks         | 94 seconds
By the numbers

Results that speak for themselves.

Benchmarked on real cloud hardware — GCP N2 nodes with 8 NVMe SSDs, 64 cores, 480 GB RAM.

85%
Reduction in read latency
From 33ms down to 4.9ms P99
10×
Less unnecessary rewriting
Compaction only touches keys, never values
3–5×
Lower infrastructure cost
vs DynamoDB at equivalent scale
1B+
Keys on a 3-node cluster
256 GB cache · 3 TB NVMe per node

P99 read latency at 1 billion keys — head to head

Lower is better. Sustained mixed read/write load. Equivalent hardware.

P99 latency in ms, as charted:
Cassandra: ~80ms
LSM-tree DB: ~33ms
DynamoDB: ~12ms
VeltrixDB · cache miss: 4.9ms
VeltrixDB · cache hit: 0.3ms
VeltrixDB achieves <5ms P99 even on cache misses because values are read directly from NVMe via zero-copy io_uring — no compaction, no page-cache eviction, no surprises.
Where it fits

Built for teams where microseconds move money.

We don't pretend to be a general-purpose database. VeltrixDB is a hyper-specialized point-lookup engine — if you're running one of these workloads at extreme scale, this was built for you.

01 — AdTech & Real-Time Bidding

User profile lookups inside the 50ms RTB budget.

Fetch the user profile, score the bid, write the impression — millions of times per second. Point lookups are the entire workload. Range queries aren't needed. Your RAM budget and bare-metal NVMe match the architecture perfectly.

User profiles · Bid scoring · Impression writes · DSP / SSP
02 — HFT & Crypto Exchanges

Lock-free architecture, microsecond determinism.

Order books, position state, and risk checks where every microsecond is a P&L line item. Lock-free shards, dedicated I/O queues, and zero-syscall reads — the architecture was built for trading desks who never accept jitter.

Order books · Positions · Risk state · Tick storage
03 — Persistent caches & session stores

Redis-class speed, without RAM-class bills.

When your dataset has outgrown the budget of "keep everything in RAM," but you still need sub-millisecond access. DRAM index over NVMe values gives you 5–10× the working set on the same hardware spend — persistent, durable, no rehydration after restart.

Session stores · Hot caches · CDN edge · Feature lookups
04 — Fraud detection & rate limiting

Continuous writes, instantaneous lookups.

Stream transactions in, score them against history, decide in single-digit milliseconds. Write amplification of 1.0× means your NVMe survives the write rate; point-lookup speed means your risk engine never has to fall back to "approve and reconcile."

Fraud scoring · Rate limits · Velocity checks · User history
Pricing

Predictable pricing, no surprise bills.

Fixed per-cluster pricing. No charge per million ops. No egress surprises. Scale your data, not your invoice.

Starter

For early-stage teams

Validate sub-5ms reads on your workload before committing.

$499 / month
  • 3-node cluster, single region
  • Up to 100M keys
  • 64 GB cache · 1 TB NVMe per node
  • Community support · 8×5 response
  • 50+ Prometheus metrics
Enterprise

For Fortune 500 platforms

Dedicated capacity, custom security review, named TAM.

Custom
  • Unlimited cluster size + global active-active
  • 99.99% uptime SLA · contractual
  • Dedicated TAM + solutions architect
  • Custom security review + DPA
  • Bring-your-own-cloud or air-gapped
  • Migration services included
Cost calculator

Move the slider. Watch the savings.

Pricing assumes a 70/30 read/write workload. DynamoDB On-Demand is benchmarked at $1.00 per million blended ops (RCUs + WCUs + storage + IO).

How much would your team save per month?

Drag to see the gap. We'll model your exact workload during the demo — these are starter numbers.

You save $7,260 · 3.6× cheaper
Monthly operations: 10 billion ops/mo (slider range 1B to 100B)
DynamoDB On-Demand (AWS): $10,000 / mo
$1.00 / M blended ops · 10B ops → $10,000
VeltrixDB (fixed): $2,740 / mo
Growth tier $2,490 + 5B overflow · same NVMe hardware
Numbers are illustrative · actual quote depends on read/write mix, payload, region. Get an exact quote for your workload →
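
The calculator's 10B-ops example can be reproduced with a few lines of arithmetic. This is a minimal sketch, not a price list: only the $1.00 per million DynamoDB rate and the $2,490 Growth base come from the page. The 5B included-ops allowance and the $50 per billion overflow rate are assumptions inferred from that single example, and the function names are illustrative.

```python
# Illustrative cost model mirroring the calculator's 10B-ops example.
# From the page: $1.00 per million blended DynamoDB ops, $2,490 Growth base.
# Assumed (inferred from one data point): 5B ops included, $50 per extra billion.

DYNAMODB_USD_PER_MILLION_OPS = 1.00
GROWTH_BASE_USD = 2_490
INCLUDED_OPS = 5_000_000_000          # assumption
OVERFLOW_USD_PER_BILLION_OPS = 50     # assumption


def dynamodb_monthly_usd(ops: int) -> float:
    """Per-operation billing: cost grows linearly with traffic."""
    return ops / 1_000_000 * DYNAMODB_USD_PER_MILLION_OPS


def veltrixdb_monthly_usd(ops: int) -> float:
    """Fixed base price plus an overflow charge above the included allowance."""
    overflow_ops = max(0, ops - INCLUDED_OPS)
    return GROWTH_BASE_USD + overflow_ops / 1_000_000_000 * OVERFLOW_USD_PER_BILLION_OPS


def monthly_savings(ops: int) -> tuple:
    """Return (dollars saved per month, DynamoDB-to-VeltrixDB cost ratio)."""
    ddb = dynamodb_monthly_usd(ops)
    vdb = veltrixdb_monthly_usd(ops)
    return ddb - vdb, ddb / vdb


if __name__ == "__main__":
    saved, ratio = monthly_savings(10_000_000_000)
    print(f"saved ${saved:,.0f} per month, ratio {ratio:.2f}x")
```

Under these assumptions the gap widens with volume, because the per-op side scales linearly while the fixed side grows only through the overflow term. Real quotes depend on read/write mix, payload, and region, as noted above.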
Architecture

Fast, simple, and it just works.

All the complexity is hidden. You write data. You read data. At any scale. Without weekend on-call incidents.

Live data path · 1024 shards · Read packet · Write packet
App · WAL · VLog · Index · Cache
Single-key write · 128 B value · TCP binary protocol
PATH 01 / 03 — DURABLE WRITE

Value lands on NVMe before ack.

Group-commit WAL batches concurrent writes into a single fsync. The value goes straight to the VLog via O_DIRECT, and the index keeps only a 24-byte pointer. Each value is written to the VLog once and never rewritten by compaction, so write amplification stays at 1.0×.

01 · App → WAL · ~12µs
02 · WAL fsync (group-commit) · ~80µs
03 · VLog → NVMe O_DIRECT · ~30µs
04 · Index pointer update · ~8µs
05 · Cache warm → ack · ~5µs
End-to-end · P99 ~0.2 ms
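
As a sanity check on the stage timings above, the budget can be summed directly. The sketch assumes the five stages run strictly one after another and that each figure is a typical value; the page states neither, so treat the result as a rough budget rather than a measured P99.

```python
# Stage timings copied from the durable-write path above, in microseconds.
WRITE_PATH_US = {
    "App -> WAL": 12,
    "WAL fsync (group-commit)": 80,
    "VLog -> NVMe O_DIRECT": 30,
    "Index pointer update": 8,
    "Cache warm -> ack": 5,
}

P99_BUDGET_US = 200  # the ~0.2 ms end-to-end figure quoted above


def total_write_latency_us(stages: dict) -> int:
    """Sum the stages, assuming they execute sequentially."""
    return sum(stages.values())


def dominant_stage(stages: dict) -> str:
    """Return the name of the stage that costs the most time."""
    return max(stages, key=stages.get)


if __name__ == "__main__":
    total = total_write_latency_us(WRITE_PATH_US)
    print(f"sum of stages: {total} us, headroom: {P99_BUDGET_US - total} us")
    print(f"dominant stage: {dominant_stage(WRITE_PATH_US)}")
```

The fsync dominates the budget, which is why batching many writes behind one group commit matters more than shaving any other stage.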
Under the hood

Not a wrapper. A storage engine, rebuilt.

We didn't put a faster cache in front of a slow database. We re-engineered the storage layer at the kernel boundary — two technical bets explain everything.

Technical whitepaper · 42 pages

Want the full architectural document?

Block-by-block diagrams of the write/read paths, the durability proof behind the 99.999% SLA, the sharding model, the LIRS cache internals, and the chaos-engineering failure matrix — all in one PDF. Written for the engineer who'll be on call for this in production.

9 chapters · 18 diagrams · v0.9.4 · May 2026
01
Storage engine

Key–value separation, WiscKey-style.

Classical LSM trees rewrite every value during compaction — a 10× write amplification tax that destroys NVMe lifetime and detonates tail latency. We split keys from values.

Index shards hold only 24-byte VLog pointers — small, cache-resident, and fast to compact. Values live in an append-only VLog on NVMe and are never rewritten by compaction. Garbage collection runs on a dedicated I/O queue that cannot stall reads.

The benchmark we expose on the performance page shows write amplification of 1.0× sustained — RocksDB and Cassandra typically run at 6×–12×. Multiply that by your NVMe replacement cycle and the cost story writes itself.
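
To make the key-value separation idea concrete, here is a toy in-memory sketch of the general WiscKey-style technique. It is not VeltrixDB's implementation. The page states only that a pointer is 24 bytes, so the field layout below (segment id, offset, length, CRC32) is a hypothetical choice, and `TinyKVStore` is an illustrative name.

```python
import io
import struct
import zlib
from typing import Dict, Optional

# Hypothetical 24-byte pointer layout: segment id (8 B), offset (8 B),
# value length (4 B), CRC32 of the value (4 B). Only the size is from the page.
POINTER = struct.Struct("<QQII")
assert POINTER.size == 24


class TinyKVStore:
    """Toy store: values live in an append-only log, the index holds pointers."""

    def __init__(self) -> None:
        self.vlog = io.BytesIO()               # stands in for the VLog on NVMe
        self.index: Dict[bytes, bytes] = {}    # key -> packed 24-byte pointer

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.vlog.seek(0, io.SEEK_END)   # always append, never overwrite
        self.vlog.write(value)
        self.index[key] = POINTER.pack(0, offset, len(value), zlib.crc32(value))

    def get(self, key: bytes) -> Optional[bytes]:
        packed = self.index.get(key)
        if packed is None:
            return None
        _segment, offset, length, crc = POINTER.unpack(packed)
        self.vlog.seek(offset)
        value = self.vlog.read(length)
        if zlib.crc32(value) != crc:
            raise IOError("value log corruption detected")
        return value


if __name__ == "__main__":
    store = TinyKVStore()
    store.put(b"user:42", b"first profile")
    store.put(b"user:42", b"second profile")
    print(store.get(b"user:42"))
```

Overwriting a key only swaps the small pointer; the stale value stays in the log until garbage collection reclaims it. That is why reorganising the index never has to move values.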

Write amp
1.0×
vs 6–12× elsewhere
Index pointer
24 B
per key, cache-resident
GC interference
0
separate I/O queue
Inspired by the WiscKey paper (FAST '16) — production-hardened for sharded multi-cloud deployments.
See the benchmark · Cost impact
02
Kernel I/O

io_uring with SQPOLL. Zero syscalls on the hot path.

Traditional databases pay a syscall and context-switch tax on every read. We bypass it entirely — the kernel polls our submission queue, and read latency collapses.

In io_uring SQPOLL mode, a kernel thread polls a shared ring buffer for I/O submissions, so our hot path issues reads with zero syscalls per operation for as long as that poller thread stays awake. O_DIRECT sidesteps the page cache for deterministic NVMe latency: no eviction noise, no surprise jitter from another tenant.

Each of our 1024 shards owns a dedicated submission queue. At the kernel boundary it looks less like a database and more like a 1024-way parallel storage controller. The architecture is laid out in our technical whitepaper, and you can see the result live on the cluster dashboard.
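
A one-queue-per-shard design needs a stable mapping from key to shard. The sketch below shows one common way to do that, using hash-mod routing. The shard count of 1024 is from the page, but the hash function is not documented here, so CRC32 is only a stand-in, and `shard_for` and `queue_depths` are hypothetical helper names.

```python
import zlib
from collections import Counter
from typing import Iterable

NUM_SHARDS = 1024  # shard count stated on the page


def shard_for(key: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard id. CRC32 is a stand-in for the undocumented hash."""
    return zlib.crc32(key) % num_shards


def queue_depths(keys: Iterable[bytes], num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many requests each shard's queue would receive."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    keys = [f"user:{i}".encode() for i in range(100_000)]
    depths = queue_depths(keys)
    print(f"shards used: {len(depths)}, busiest shard: {max(depths.values())} keys")
```

Because the mapping is deterministic, a given key always lands on the same queue, so reads and writes for that key never need cross-shard coordination.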

Syscalls / read
0
SQPOLL kernel thread
Parallel queues
1024
one per shard
P99 NVMe read
4.9 ms
under sustained writes
Closed source. The architecture is fully documented — happy to walk a security-review team through every component.
Get the whitepaper · Live cluster metrics
Transparency

Live cluster, live metrics.

A snapshot from one of our internal benchmark clusters. Every number on this page traces back to a metric you can read in Prometheus.

veltrixdb · cluster-prod-01 · us-east-1 · Snapshot · May 21 2026 · 14:42 UTC · uptime 187 d
Active keys
1.04B
↗ +2.1M / hr
Reads scrubbed (lifetime)
108B+
14.3k/s sustained
Writes processed
3.42B+
WAL group-commit 80µs
P99 read latency
4.9 ms
stable for 187 d
Chart: Read latency · P50 / P95 / P99 · last 24h
Chart: Ops/sec per shard · 1024 shards
Cache hit rate 94.2% · Write amp 1.0× · SLA breaches 0
Exposed via /metrics · Prometheus + Grafana ship in the Helm chart
Honest fit check

What it won't do — and what it needs.

We'd rather lose the deal than oversell. VeltrixDB is a scalpel, not a Swiss Army knife — here's the truth about hardware, query patterns, and memory before we get on a call.

Hard requirements we won't bend on.

Last updated · May 2026
Bare-metal NVMe or local SSD only
Optimised strictly for direct-attached NVMe with Linux io_uring. Not recommended for AWS EBS, Azure managed disks, or any network-attached storage — they break the latency model.
Point lookups only
Purpose-built for high-throughput GET / SET / DEL at scale. Range queries, secondary indexes, joins, and full-text search are not supported natively — wrong workload for VeltrixDB.
RAM budget for the ART index
The adaptive radix index lives in DRAM. Plan ~12 GB RAM per 100M keys (varies with key size). 1B keys on a node needs ~120 GB RAM. You will not run this on a t3.medium.
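
The RAM rule of thumb above translates into a quick sizing helper. This is a sketch that uses only the page's planning figure of ~12 GB per 100M keys. It ignores the cache, the operating system, and key-size variation, so real nodes need headroom on top. The function names are illustrative.

```python
GB_PER_100M_KEYS = 12.0  # planning figure from the page; varies with key size


def index_ram_gb(num_keys: int, gb_per_100m: float = GB_PER_100M_KEYS) -> float:
    """Estimate the DRAM needed for the index alone."""
    return num_keys / 100_000_000 * gb_per_100m


def max_keys_for_ram(ram_gb: float, gb_per_100m: float = GB_PER_100M_KEYS) -> int:
    """Estimate how many keys fit if all of ram_gb went to the index."""
    return int(ram_gb / gb_per_100m * 100_000_000)


if __name__ == "__main__":
    print(f"1B keys need about {index_ram_gb(1_000_000_000):.0f} GB of index RAM")
    # A t3.medium has 4 GiB of memory in total.
    print(f"4 GB would index at most {max_keys_for_ram(4):,} keys")
```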
Linux kernel ≥ 5.10 required for io_uring SQPOLL · AVX2 CPU baseline · 3-node minimum for production durability
Talk to an architect before you commit →
Global Infrastructure

One binary. Any cloud. Anywhere.

Deploy on GKE, EKS, AKS, or bare metal with a single Helm command. Identical NVMe performance everywhere. No retuning, no specialist hire, no lock-in.

Edge presence

Replicate across 7 continents — without paying egress to do it.

Active-active replication keeps every region within milliseconds of writes — failover is automatic, traffic shifts before your alerting fires. Run on your own VPC, your own NVMe, your own networking.

14
Active regions
99.999%
SLA at edge
GCP
Google Cloud · GKE
n2-highmem-96
Local NVMe SSDs
AWS
Amazon · EKS
i3en / im4gn
Local NVMe
AZURE
Microsoft · AKS
Lsv3 NVMe series
Full StorageClass
SELF-HOST
Bare metal
Direct NVMe · any Linux
No hypervisor overhead
~/veltrixdb · zsh
$ helm repo add veltrixdb https://charts.veltrixdb.com
"veltrixdb" has been added to your repositories
$ helm install veltrixdb veltrixdb/veltrixdb \
    --namespace veltrixdb --create-namespace
StorageClass · veltrixdb-nvme
StatefulSet · 3 replicas, anti-affinity
ServiceMonitor · 50+ metrics
PodDisruptionBudget · minAvailable 2
Ready in 94s
$
⎈ Kubernetes-native
StatefulSet · anti-affinity · PDB
⚓ Helm chart
one command · full values.yaml
🤖 K8s operator
auto-scale · reshard · self-heal
📊 Prometheus + Grafana
50+ metrics · alerts · dashboards
FAQ

Questions every CTO asks first.

The exact concerns we hear in every initial conversation — answered up front so you don't have to chase them.

Is VeltrixDB a drop-in replacement for Redis or DynamoDB?
For point-lookup workloads — yes. We ship a Redis-wire-compatible mode and a native client. If your application uses GET, SET, DEL, HGET-style commands, swap the connection string and you're running. If you rely on range scans, secondary indexes, or full-text search, VeltrixDB is the wrong tool — we'll tell you that in the demo.
How do you actually achieve sub-5ms P99 on disk-backed reads?
Three things: io_uring SQPOLL eliminates syscalls on the hot path, a 256 GB LIRS cache absorbs ~94% of reads, and key-value separation means compaction never touches the values — so background work never competes with your user requests for NVMe bandwidth.
What does the bill actually look like at scale?
You pay for the infrastructure (NVMe nodes), not per million operations. A 3-node Growth cluster serving 2M ops/sec typically lands between $2,490 and $4,800/month in cloud costs — customers migrating from DynamoDB usually see 3–5× reduction. We'll model your workload during the demo.
How do data durability and replication work?
Synchronous WAL with group-commit fsync. Multi-region clusters replicate over QUIC with active-active conflict resolution. RPO is zero for single-region failures, <30s for cross-region. The 99.99% SLA is contractual on Growth and Enterprise tiers.
What's the security and compliance story?
SOC 2 Type II and ISO 27001 certified. AES-256 encryption at rest, TLS 1.3 in transit, mTLS between nodes. GDPR and HIPAA compliant. Enterprise customers get custom DPAs, BYOK / KMS, VPC peering, and air-gapped deployment options.
How long does production migration take?
Median: two sprints. Helm install is 94 seconds. Then shadow-read traffic for 3–7 days to validate, then flip the connection. Enterprise migrations include a dedicated solutions architect end-to-end — no extra cost.
What happens if I outgrow my plan?
Clusters scale online — add nodes, the operator reshards automatically without read interruption. You can upgrade tiers any time with no data migration; we just unlock the additional capacity and support level on your existing cluster.
Is there a free trial or open-source version?
Yes — 14-day free trial on the Starter tier (full feature parity, just smaller capacity). Core client libraries are MIT-licensed. The server is source-available under a BSL-style license that converts to Apache 2.0 after 4 years.
Resources

Built for teams who read the docs.

Every claim on this page has a benchmark, a paper, or a config file behind it. Go deep before you commit.

Get the benchmark report in your inbox.

One email. The full methodology, raw P99 distributions, and a side-by-side cost analysis vs DynamoDB, Redis Enterprise, and Cassandra at 1B keys.

Email us at veltrixdb@gmail.com
Why we built this

Born from a 3 AM page.

VeltrixDB started the night a flash sale buried our Redis cluster under 8 GB of session state. The on-call channel filled with screenshots of P99 climbing past 400 ms. The bill that quarter was already 3× what we'd budgeted.

We tried every workaround you've tried — read replicas, write batching, a DynamoDB migration that ended with a quarter of unpredictable per-op billing. None of it solved the actual problem.

The real problem was that every popular database rewrites your values during compaction. It's a 10× amplification tax on storage and a guaranteed spike in tail latency exactly when you can least afford it.

So we built VeltrixDB the way we wished a database had been built when we were on call: key–value separation so compaction never touches values, io_uring so the kernel never gets in the way, and fixed cluster pricing so the bill never surprises you.

Now we ship it to teams who recognise their own war stories in ours.

AG
Akanksha Gupta
Founder, VeltrixDB

Go from 33ms to <5ms this quarter.

33ms P99 ───────▶ 4.9ms P99
No commitment · 30-min session · Free migration analysis included