Two reasons, always the same.
Every team that walked us through a DynamoDB-to-VeltrixDB migration showed up with the same two concerns. The bill, and the tail.
If you're reading this guide, you probably already know your why. But for the engineering manager you'll be selling this migration to, the case lands cleanest as two numbers:
- Cost · DynamoDB on-demand bills at roughly $1.00 per million blended ops. At 10 B ops/month that's $10K. Most VeltrixDB customers land between 3–5× lower on the same workload. The cost calculator is the fastest way to see the gap.
- Tail latency · DynamoDB's published P99 is ~10 ms for single-key reads. Real-world it's closer to 12–18 ms with throttling and burst-bucket effects. VeltrixDB is 4.9 ms sustained, with no throttling tier.
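The cost figure above is plain arithmetic, and it can help to redo it with your own traffic numbers before the budget conversation. A minimal sketch, assuming a flat blended price per million ops; the function name `monthlyCostUSD` is made up for illustration, and the 3–5× range is applied as a simple divisor rather than taken from any pricing API:

```go
package main

import "fmt"

// monthlyCostUSD returns the bill for one month of traffic at a flat
// price per million operations. Real bills also include storage and
// data transfer, which this sketch ignores.
func monthlyCostUSD(opsPerMonth, usdPerMillionOps float64) float64 {
	return opsPerMonth / 1e6 * usdPerMillionOps
}

func main() {
	// 10 B ops/month at roughly $1.00 per million blended ops.
	ddb := monthlyCostUSD(10e9, 1.00)
	fmt.Printf("DynamoDB on-demand: $%.0f/month\n", ddb)

	// The 3-5x range quoted above, applied as a divisor.
	fmt.Printf("At 3x lower:        $%.0f/month\n", ddb/3)
	fmt.Printf("At 5x lower:        $%.0f/month\n", ddb/5)
}
```

Swap in your own monthly op count and blended rate; the cost calculator remains the better source for a real quote.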
The rest of this guide is the actual playbook — the same one our solutions architects walk Enterprise customers through. Two sprints. No big-bang cutover. Rollback at every stage.
Confirm you're in scope — before you book the project.
VeltrixDB is a scalpel, not a Swiss Army knife. Spend an hour validating fit before you spend two sprints validating performance.
Before you start, run the table through this checklist. If any row is a "no," talk to us before booking the migration — there's a good chance VeltrixDB is the wrong tool for that workload.
- Your hot path is point lookups: GET / SET / DEL / HGET-style. No Query calls with sort-key range conditions or Scan.
- No global secondary indexes (GSIs) on the critical path. Or you can model the GSI access patterns as separate keys.
- You have a partition key in DynamoDB that can map 1:1 to a VeltrixDB key. Composite (PK + SK) keys map to pk:sk string concatenations.
- Values are ≤ 400 KB (DynamoDB's own limit). VeltrixDB supports up to 64 MB, but smaller values keep your RAM budget honest.
- You can run on bare-metal NVMe or local SSD. EBS, Azure managed disks, and any network-attached storage are not supported.
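The composite-key rule in the checklist can be sketched as a small helper. This is an illustration only: `veltrixKey` is a hypothetical name, and rejecting partition keys that contain ":" is an assumption added here so the mapping stays reversible, not something the checklist specifies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sep is the separator from the "pk:sk" convention.
const sep = ":"

// veltrixKey maps a DynamoDB primary key to a single string key.
// An empty sk means the table has no sort key.
// Assumption (not from the guide): partition keys containing the
// separator are rejected, so a key can always be split back into
// (pk, sk) at the first ":".
func veltrixKey(pk, sk string) (string, error) {
	if pk == "" {
		return "", errors.New("partition key must not be empty")
	}
	if strings.Contains(pk, sep) {
		return "", fmt.Errorf("partition key %q contains separator %q", pk, sep)
	}
	if sk == "" {
		return pk, nil
	}
	return pk + sep + sk, nil
}

func main() {
	k, err := veltrixKey("user#42", "session#2024-01-01")
	fmt.Println(k, err)

	_, err = veltrixKey("tenant:7", "order#1")
	fmt.Println(err)
}
```

If your partition keys can legitimately contain ":", pick a different separator or escape it; the point is that two distinct (PK, SK) pairs must never collapse to the same string.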
If you're using DynamoDB Streams · we ship a sidecar that consumes a CDC stream from your existing table during the migration window — you keep the rest of your event-driven architecture intact. Drop us a note before sprint 1 and we'll provision the right consumer for your event volume.
Two sprints. Five stages.
No big-bang cutover. Every stage runs DynamoDB in parallel until the very last hour. Rollback is one config flip away at every checkpoint.
ExportTableToPointInTime to S3, then run our bulk-importer. ~3M writes/sec sustained; a 1B-key table fills in about 5.5 minutes. Validate row count and a sample of values match.
Terraform you can paste into your repo.
Three resources, eight inputs. Drop into your existing EKS module, change the instance type to a local-NVMe SKU, run terraform apply.
This snippet provisions a 3-node EKS-hosted VeltrixDB cluster on i3en.6xlarge instances (the AWS equivalent of our reference GCP rig). Adjust node count, instance class, or replace the EKS bits with GKE/AKS modules as needed.
# Helm release wrapping the VeltrixDB chart
resource "helm_release" "veltrixdb" {
  name             = "veltrixdb"
  repository       = "https://charts.veltrixdb.com"
  chart            = "veltrixdb"
  version          = "0.9.4"
  namespace        = "veltrixdb"
  create_namespace = true

  values = [templatefile("./values.yaml.tpl", {
    nodes         = 3
    cache_gb      = 256
    nvme_gb       = 3000
    storage_class = "veltrixdb-nvme"
    region        = "us-east-1"
  })]
}

# Local NVMe StorageClass (i3en family)
resource "kubernetes_storage_class" "nvme" {
  metadata {
    name = "veltrixdb-nvme"
  }

  storage_provisioner = "kubernetes.io/no-provisioner"
  volume_binding_mode = "WaitForFirstConsumer"

  parameters = {
    type   = "local-ssd"
    fsType = "ext4"
  }
}

# Node group: i3en.6xlarge, 24 vCPU, 192 GiB RAM, 2 × 7.5 TB local NVMe
module "veltrix_nodes" {
  source = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"

  cluster_name   = var.eks_cluster_name
  name           = "veltrixdb-nodes"
  instance_types = ["i3en.6xlarge"]

  desired_size = 3
  min_size     = 3
  max_size     = 6

  labels = { workload = "veltrixdb" }
  taints = [{ key = "workload", value = "veltrixdb", effect = "NO_SCHEDULE" }]
}
Bulk-import from DynamoDB
The import runs as a Kubernetes Job that streams from your DynamoDB S3 export directly into the new VeltrixDB cluster. It's idempotent — safe to re-run if it crashes partway through.
$ aws dynamodb export-table-to-point-in-time \
    --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/sessions \
    --s3-bucket veltrix-migration-staging \
    --export-format DYNAMODB_JSON

$ kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata: { name: veltrix-import, namespace: veltrixdb }
spec:
  template:
    spec:
      containers:
        - name: importer
          image: ghcr.io/veltrixdb/ddb-importer:0.9.4
          args: ["--s3-prefix", "s3://veltrix-migration-staging/AWSDynamoDB/.../"]
      restartPolicy: OnFailure
EOF

$ kubectl logs -f -n veltrixdb job/veltrix-import
imported 247,308,124 rows · 5m23s · 0 mismatches · ✓
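For planning the import window, the sustained-throughput figure quoted for the bulk-importer (~3M writes/sec) gives a rough lower bound on duration. A minimal sketch of that arithmetic; `estimateImportSeconds` is a hypothetical helper, and it ignores S3 read speed, job startup, and retries, so treat the result as a floor rather than a forecast:

```go
package main

import (
	"fmt"
	"time"
)

// estimateImportSeconds returns the time needed to write `keys` items at
// a sustained rate of `writesPerSec`. It models nothing else.
func estimateImportSeconds(keys, writesPerSec float64) float64 {
	return keys / writesPerSec
}

func main() {
	// 1 B keys at ~3 M writes/sec, the figures quoted for the bulk-importer.
	secs := estimateImportSeconds(1e9, 3e6)
	d := time.Duration(secs * float64(time.Second)).Round(time.Second)
	fmt.Printf("estimated import time: %s (%.1f minutes)\n", d, secs/60)
}
```

For a 1B-key table this comes out at roughly five and a half minutes, which matches the figure given for the import stage.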
Fan out writes, keep DynamoDB as the source of truth.
During sprint 1, every write goes to both databases synchronously. If divergence climbs above 0.1%, an alarm fires and you fix it before you ever flip reads.
The dual-write step is where most botched migrations go wrong. The temptation is to fan out writes asynchronously to keep latency down — don't. Async fan-out hides bugs that only surface during cutover. Run the writes synchronously, in parallel, and accept the latency cost during the migration window.
Reference implementation
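The pattern, sketched in Go, looks like this. The key invariants: write to DynamoDB first (still source of truth), then write to VeltrixDB in the same request, then log any inconsistencies. Never roll back a successful DynamoDB write because the VeltrixDB write failed.
// Requires "context" and "time". r.ddb, r.veltrix, metrics, and log are
// your own client wrappers, metrics registry, and logger.
func (r *Repo) Put(ctx context.Context, k string, v []byte) error {
	// 1. Source of truth still wins: a DynamoDB failure fails the request.
	if err := r.ddb.Put(ctx, k, v); err != nil {
		return err
	}

	// 2. Mirror to VeltrixDB synchronously, bounded by a short timeout.
	//    No goroutine here: an async mirror would outlive the request
	//    context and hide failures until cutover.
	cctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()
	if err := r.veltrix.Put(cctx, k, v); err != nil {
		// 3. Count and log the divergence, but never fail or roll back
		//    the DynamoDB write because the mirror failed.
		metrics.DualWriteFail.WithLabelValues("veltrix").Inc()
		log.Warn("veltrix mirror failed", "key", k, "err", err)
	}
	return nil
}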
Watch the divergence metric. A healthy dual-write phase should see <0.01% mirror failures, almost entirely network blips. If you see >0.1%, something is wrong — most often a payload that exceeds 400 KB in DynamoDB but fits in VeltrixDB's 64 MB envelope and gets corrupted during your encoder's roundtrip. Fix before proceeding.
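The two thresholds above translate directly into an alerting rule. A minimal sketch, assuming you already count mirror failures and total dual writes; `classifyDivergence` and the "watch" label for the band between 0.01% and 0.1% are illustrative choices, not part of any VeltrixDB tooling:

```go
package main

import "fmt"

// classifyDivergence buckets the mirror-failure rate using the two
// thresholds from the text: below 0.01% is healthy, above 0.1% is an
// alarm. The band in between is labelled "watch" (an assumption of this
// sketch). Integer math avoids floating-point edge cases at the limits.
func classifyDivergence(failed, total uint64) string {
	switch {
	case total == 0:
		return "no-data"
	case failed*1000 > total: // failure rate above 0.1%
		return "alarm"
	case failed*10000 < total: // failure rate below 0.01%
		return "healthy"
	default:
		return "watch"
	}
}

func main() {
	fmt.Println(classifyDivergence(5, 100000))   // 0.005%
	fmt.Println(classifyDivergence(50, 100000))  // 0.05%
	fmt.Println(classifyDivergence(200, 100000)) // 0.2%
}
```

Evaluate it over a rolling window rather than since process start, so that an old burst of failures does not mask or exaggerate the current rate.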
Read both. Trust neither. Compare everything.
Before flipping the read path, you spend 48 hours reading from both databases and comparing the answers byte-for-byte. This is where you catch the bugs that bulk-import didn't.
The shadow-read phase runs on the same dual-write pattern, inverted. For every read your application makes, fire a second read against VeltrixDB, compare, log mismatches, and surface the side-by-side P99 chart in Grafana. Your users still see DynamoDB's answer — VeltrixDB is on probation.
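The shadow-read pattern described above can be sketched as a thin wrapper around two stores. Everything named here is hypothetical: `Store` is a made-up interface rather than a DynamoDB or VeltrixDB client type, and `mapStore` is an in-memory stand-in so the example runs on its own.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"sync/atomic"
)

// Store is a hypothetical read interface, not a real SDK type.
type Store interface {
	Get(ctx context.Context, key string) ([]byte, error)
}

// mapStore is an in-memory stand-in for either database.
type mapStore map[string][]byte

func (m mapStore) Get(_ context.Context, key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// ShadowReader serves reads from Primary and compares each answer
// byte-for-byte against Shadow.
type ShadowReader struct {
	reads, mismatches int64 // updated atomically
	Primary, Shadow   Store
}

// Get always returns the primary's answer; the shadow only affects counters.
func (s *ShadowReader) Get(ctx context.Context, key string) ([]byte, error) {
	want, err := s.Primary.Get(ctx, key)
	if err != nil {
		return nil, err
	}
	atomic.AddInt64(&s.reads, 1)
	got, shadowErr := s.Shadow.Get(ctx, key)
	if shadowErr != nil || !bytes.Equal(want, got) {
		atomic.AddInt64(&s.mismatches, 1)
		fmt.Printf("shadow mismatch key=%q shadowErr=%v\n", key, shadowErr)
	}
	return want, nil
}

// MismatchRate returns mismatches divided by compared reads.
func (s *ShadowReader) MismatchRate() float64 {
	r := atomic.LoadInt64(&s.reads)
	if r == 0 {
		return 0
	}
	return float64(atomic.LoadInt64(&s.mismatches)) / float64(r)
}

// demo reads two keys, one of which differs in the shadow store.
func demo() (string, float64) {
	sr := &ShadowReader{
		Primary: mapStore{"a": []byte("1"), "b": []byte("2")},
		Shadow:  mapStore{"a": []byte("1"), "b": []byte("two")},
	}
	ctx := context.Background()
	if _, err := sr.Get(ctx, "a"); err != nil {
		return "", 0
	}
	v, err := sr.Get(ctx, "b")
	if err != nil {
		return "", 0
	}
	return string(v), sr.MismatchRate()
}

func main() {
	v, rate := demo()
	fmt.Printf("served=%q mismatch rate=%.2f\n", v, rate)
}
```

This version issues the shadow read inline, which adds its latency to every request. If that is too costly under production load, the comparison can be moved off the request path or sampled, as long as the mismatch counter still sees enough traffic to be meaningful.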
- Mismatch rate < 0.001% over a full weekend cycle (catches any time-zone or weekend-batch effects)
- VeltrixDB P99 read latency < 8 ms for 4 hours straight under peak prod load
- Cache hit rate > 90% in steady state; the LIRS cache should be warm by now
- Zero 5xx errors on the VeltrixDB endpoint during the window
- An incident runbook merged that documents the rollback procedure for the next stage
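The exit criteria above are easy to encode as a gate that a release script or dashboard can evaluate. A minimal sketch, assuming the metrics have already been aggregated over the right windows (the weekend cycle and the 4-hour peak period); the `ShadowWindow` struct and its field names are hypothetical:

```go
package main

import "fmt"

// ShadowWindow holds metrics aggregated over the shadow-read window.
// How they are collected is outside the scope of this sketch.
type ShadowWindow struct {
	MismatchRate  float64 // fraction of compared reads that differed
	P99ReadMillis float64 // VeltrixDB P99 read latency under peak load
	CacheHitRate  float64 // fraction, steady state
	Errors5xx     int     // 5xx responses from the VeltrixDB endpoint
	RunbookMerged bool    // rollback runbook for the next stage is merged
}

// failedCriteria returns the exit criteria that are not yet met.
// An empty result means the cutover stage can start.
func failedCriteria(w ShadowWindow) []string {
	var failed []string
	if w.MismatchRate >= 0.00001 { // 0.001%
		failed = append(failed, "mismatch rate must be < 0.001%")
	}
	if w.P99ReadMillis >= 8 {
		failed = append(failed, "P99 read latency must be < 8 ms")
	}
	if w.CacheHitRate <= 0.90 {
		failed = append(failed, "cache hit rate must be > 90%")
	}
	if w.Errors5xx != 0 {
		failed = append(failed, "5xx count must be zero")
	}
	if !w.RunbookMerged {
		failed = append(failed, "rollback runbook must be merged")
	}
	return failed
}

func main() {
	w := ShadowWindow{
		MismatchRate:  0.000002,
		P99ReadMillis: 9.1,
		CacheHitRate:  0.94,
		Errors5xx:     0,
		RunbookMerged: true,
	}
	for _, f := range failedCriteria(w) {
		fmt.Println("blocked:", f)
	}
}
```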
5 → 25 → 50 → 100. No big-bang flips.
Move the read traffic in four feature-flag controlled steps. Each step must be stable for four hours before the next. Rollback at any step is a single config change.
We've never seen a migration fail at this stage if shadow-reads were clean — but we still walk through the cohorts. Discipline here is what makes the postmortem boring.
read_source=veltrix applied to a 5% user-id hash range. Watch P99, error rate, customer-support tickets. No deploys during this window.
Rollback
If anything goes wrong at any step, the rollback is a single feature-flag flip: read_source=dynamodb. Because writes are still going to both databases, DynamoDB never went stale. You lose nothing, and you debug at leisure.
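One way to implement the user-id hash range behind the read_source flag is a stable hash bucketed into 100 slots. This is a sketch under assumptions: the choice of FNV-1a, the bucket count, and the function name `readSource` are illustrative, and your feature-flag system may already provide percentage rollouts.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// readSource picks the read backend for a user. percent is the rollout
// step (5, 25, 50, 100); setting it to 0 is the rollback. The hash is
// stable, so a user who moved to VeltrixDB at 5% stays there at 25%.
func readSource(userID string, percent uint32) string {
	h := fnv.New32a()
	h.Write([]byte(userID)) // hash.Hash.Write never returns an error
	if h.Sum32()%100 < percent {
		return "veltrix"
	}
	return "dynamodb"
}

func main() {
	for _, pct := range []uint32{0, 5, 25, 50, 100} {
		fmt.Printf("rollout %3d%%: user-1842 reads from %s\n", pct, readSource("user-1842", pct))
	}
}
```

Because each user's bucket is fixed, raising the percentage only adds users to the VeltrixDB cohort, and dropping it to 0 sends every read back to DynamoDB in one change.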
In four migrations to date, zero have needed a rollback at the cutover stage. The two times shadow-reads surfaced a problem (a serialization quirk and a Unicode normalization edge case), the team caught it during the 48-hour shadow window, fixed it, and proceeded clean. That's the entire point of running shadow-reads.
Decommission, save the receipts.
A week after cutover, drop the dual-write fan-out and put the DynamoDB table into read-only mode. Keep it around for a month as a cold fallback. Then archive it.
The decommission step is procedural — no surprises. Drop the dual_writer.go path from your repo, point the DynamoDB IAM role at a read-only policy, and schedule a calendar reminder for 30 days out to delete the table. Most teams keep the table around longer than they need to — that's fine, and the cost is trivial.
What's not trivial: save the DynamoDB invoice from the month before cutover. Compare it to the VeltrixDB invoice three months later. That delta is the receipt you bring to the next budget review. We've seen customers cut their annual database spend by $340K to $1.2M, and that's the number your CFO will want to see in writing.