ElastiCache Graviton vs Intel: m6g.large vs m5.large Valkey Benchmark

Valkey 9.0 and 7.2 on Amazon ElastiCache, single-node non-cluster, one-hour memtier_benchmark runs, us-east-1 on-demand pricing. Last updated June 2026.

AWS has been steering more ElastiCache users toward Graviton-backed nodes, and the pitch is familiar: better price/performance. That makes cache.m6g.large vs cache.m5.large a practical architecture question, not a silicon preference. If you are choosing a new Redis OSS or Valkey node family, do you get the Graviton discount without giving up throughput?

This benchmark compares Graviton2-backed cache.m6g.large and Intel-backed cache.m5.large on Amazon ElastiCache running Valkey 9.0 and Valkey 7.2. The load generator is memtier_benchmark, running from ECS Fargate against a single ElastiCache replication group.

The short version: Graviton m6g.large is the cheaper node, about 4.5% lower than m5.large in these us-east-1 on-demand runs. On a single saturated Valkey 9.0 run it also looked slower, with Intel m5.large posting 18.6% more throughput. But after running that pair three times on each node family, the lead did not hold up cleanly: it ranged from a 6% Graviton win to a 25% Intel win (median 18.6% Intel). What did hold up is that Graviton was cheaper and more predictable overall: its run-to-run throughput range was tighter, while Intel's within-run coefficient of variation reached nearly 12% in two repeats. On Valkey 7.2 the two were tied.

Bottom line: for ElastiCache, Graviton is a strong default. It is consistently cheaper, latency is excellent, and it was the more predictable performer overall. Intel m5.large can deliver more peak throughput, and the median here favored Intel, but the direction was not stable enough to treat raw speed as guaranteed either way. For hot, saturated nodes, benchmark your own workload before assuming either architecture wins.

The Latest Runs At A Glance

The runs below cover two engine versions and two load levels. Both node types are two-vCPU "large" instances with roughly 6.4 GB of cache memory, so this is a like-for-like core-count comparison. The 9-worker runs were calibrated to reach maxmemory and sustain evictions; see Designing for the Cliff for the calibration background.

Run | Node | Engine | Load | Avg ops/s | Avg latency | $/hr
m6g, 9.0, 9 workers | m6g.large | Valkey 9.0 | Saturated | 45,239 | 0.40 ms | $0.149
m5, 9.0, 9 workers | m5.large | Valkey 9.0 | Saturated | 53,641 | 0.33 ms | $0.156
m6g, 7.2, 9 workers | m6g.large | Valkey 7.2 | Saturated | 49,702 | 0.36 ms | $0.149
m5, 7.2, 9 workers | m5.large | Valkey 7.2 | Saturated | 49,105 | 0.36 ms | $0.156
m6g, 9.0, 3 workers | m6g.large | Valkey 9.0 | Moderate | 18,919 | 0.31 ms | $0.149
m5, 9.0, 3 workers | m5.large | Valkey 9.0 | Moderate | 13,630 | 0.44 ms | $0.156

The saturated Valkey 9.0 figures are medians of three runs per node; the run-to-run spread, including one run where Graviton came out ahead, is in the next section. The moderate-load pair is directional only: the hit rates and memory pressure were not closely matched, and the m5 run had a 10.0% throughput coefficient of variation. The cleanest apples-to-apples comparison is the 9-worker Valkey 9.0 pair, run three times.

m6g.large Vs m5.large: Valkey 9.0 Saturated Benchmark

This is the comparison I trust most, so I ran it more than once. Same engine version, same 9-worker load profile, both nodes driven to maxmemory, both running for a full hour, and the whole pair provisioned and run three times (r1-r3) on fresh ElastiCache hosts to see whether a single-run result would survive replication. Within each run-pair the two nodes saw matched load; across the set the cache pressure varied (hit rate ranged from about 5% to 54%), which is part of what the repeats exposed.

It did not survive. A single run had Intel ahead by 18.6%; three runs tell a different story:

Run | Graviton m6g ops/s | Intel m5 ops/s | Faster node
r1 (heavy eviction, ~5-6% hit) | 45,239 | 53,641 | Intel +18.6%
r2 (~50% hit) | 43,917 | 54,962 | Intel +25.1%
r3 (~50% hit, best-matched) | 48,838 | 46,123 | Graviton +5.9%
Median | 45,239 | 53,641 | Intel +18.6% (range: Graviton +6% to Intel +25%)

The direction is not stable. Intel led twice and Graviton led once, and the cleanest-matched pair of the three (r3, ~50% hit on both nodes) went to Graviton. The headline metrics, read as medians across the three runs:

Metric (median of 3 runs) | Graviton m6g.large | Intel m5.large | Reading
Median throughput | 45,239 ops/s | 53,641 ops/s | Intel at the median, not every run
Run-to-run throughput range | 43,917 - 48,838 ops/s | 46,123 - 54,962 ops/s | Graviton tighter
Within-run stability (CV) | 0.6 - 1.3% | 0.2 - 11.8% | Graviton steadier overall, Intel steadier in r1
Median avg latency | 0.395 ms | 0.333 ms | Intel, but close
Median client p99 latency | 0.43 ms | 0.39 ms | Intel, but close
On-demand list price | $0.149/hr | $0.156/hr | Graviton (-4.5%)

The single run said Intel won outright. Three independent runs say something more useful and more honest: the throughput lead is not reproducible as a rule. It swung from a 6% Graviton win to a 25% Intel win. Graviton had the tighter run-to-run throughput range and held its coefficient of variation at or below 1.3% in all three runs. Intel was steadier in r1, but its CV reached nearly 12% in r2 and r3. Latency stayed close throughout, a few tens of microseconds apart.
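The per-run leads and medians quoted above are simple arithmetic on the r1-r3 throughput figures. As a minimal sketch, assuming the lead is expressed as a percentage of the slower node's throughput (the `RUNS` layout and the `lead` helper are illustrative names, not part of the test harness):

```python
from statistics import median

# Average ops/s per run, copied from the r1-r3 table above.
RUNS = {
    "r1": {"m6g": 45_239, "m5": 53_641},
    "r2": {"m6g": 43_917, "m5": 54_962},
    "r3": {"m6g": 48_838, "m5": 46_123},
}


def lead(m6g_ops: float, m5_ops: float) -> tuple[str, float]:
    """Return the faster node and its lead as a percentage of the slower node."""
    if m5_ops >= m6g_ops:
        return "Intel", (m5_ops / m6g_ops - 1) * 100
    return "Graviton", (m6g_ops / m5_ops - 1) * 100


for name, ops in RUNS.items():
    winner, pct = lead(ops["m6g"], ops["m5"])
    print(f"{name}: {winner} +{pct:.1f}%")

# Medians are taken per node family across the three runs.
m6g_median = median(r["m6g"] for r in RUNS.values())
m5_median = median(r["m5"] for r in RUNS.values())
winner, pct = lead(m6g_median, m5_median)
print(f"median: m6g {m6g_median:,} vs m5 {m5_median:,} ops/s -> {winner} +{pct:.1f}%")
```

With only three runs, each node's median is simply its middle run, which is why the median row matches r1 exactly. That is worth keeping in mind when reading the "median 18.6% Intel" figure.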

The verdict, now that it is backed by repeated runs rather than one: at this node size, neither architecture has a dependable raw-throughput edge on saturated Valkey 9.0. The median favored Intel, but one well-matched repeat flipped to Graviton, so do not treat raw throughput as a guaranteed m5 advantage. The two results that were most useful for decisions are the consistent ones: Graviton is about 4.5% cheaper, and Graviton was more predictable overall. That is the generalizable takeaway, and it is more useful than the first single run implied.

Valkey 7.2 At Saturation: Throughput Tie, Different Cache Profile

On Valkey 7.2, the raw throughput gap almost disappears: 49,702 ops/s on m6g.large vs 49,105 ops/s on m5.large, with essentially identical average latency. That looks like a dead heat from a throughput standpoint.

The caveat is cache profile. Both runs reached maxmemory and sustained evictions, but the m6g run reported a much higher average cache-hit rate than the m5 run. I would not use this pair to claim one architecture is universally better. I would use it to say that the Valkey 9.0 Intel advantage did not repeat cleanly on Valkey 7.2, and that engine version plus workload shape can matter more than CPU family.

That lines up with the broader engine-version pattern in Redis 7.1 vs Valkey 7.2 / 8.2 / 9.0 on ElastiCache: the engine and workload can move the result as much as the instance family.

Price And Price/Performance

Graviton's structural advantage is the hourly bill. In these us-east-1 on-demand runs, Amazon ElastiCache pricing put cache.m6g.large at $0.149/node-hour and cache.m5.large at $0.156/node-hour. That is a small but real discount:

Node | $/hr | $/month (730h) | $/year
m6g.large (Graviton) | $0.149 | $108.77 | $1,305
m5.large (Intel) | $0.156 | $113.88 | $1,367
Difference | -$0.007 | -$5.11 | -$61 / node

Single-node, on-demand, us-east-1. A production HA pair roughly doubles both the spend and the saving. Reserved nodes, Database Savings Plans, and other regions change the absolute numbers.
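The table above follows directly from the two hourly prices. A minimal sketch of the arithmetic, assuming a 730-hour month (as in the table header) and an 8,760-hour year; the `costs` helper and its `nodes` parameter are illustrative:

```python
HOURS_PER_MONTH = 730    # convention used in the table above
HOURS_PER_YEAR = 8_760   # 365 days * 24 hours (assumed for the $/year column)

# us-east-1 on-demand prices quoted in this article, USD per node-hour.
PRICES = {"m6g.large": 0.149, "m5.large": 0.156}


def costs(hourly: float, nodes: int = 1) -> dict[str, float]:
    """Scale an hourly node price to monthly and yearly spend for `nodes` nodes."""
    return {
        "hourly": hourly * nodes,
        "monthly": hourly * HOURS_PER_MONTH * nodes,
        "yearly": hourly * HOURS_PER_YEAR * nodes,
    }


m6g = costs(PRICES["m6g.large"])
m5 = costs(PRICES["m5.large"])
discount_pct = (1 - PRICES["m6g.large"] / PRICES["m5.large"]) * 100

print(f"m6g.large: ${m6g['monthly']:.2f}/month, ${m6g['yearly']:,.0f}/year")
print(f"m5.large:  ${m5['monthly']:.2f}/month, ${m5['yearly']:,.0f}/year")
print(f"Graviton discount: {discount_pct:.1f}%")
print(f"Saving per node-year: ${m5['yearly'] - m6g['yearly']:.0f}")

# A two-node HA pair roughly doubles both the spend and the saving.
pair_saving = costs(PRICES["m5.large"], nodes=2)["yearly"] - costs(PRICES["m6g.large"], nodes=2)["yearly"]
print(f"Saving per HA pair-year: ${pair_saving:.0f}")
```

Swapping in reserved-node or other-region prices only requires changing `PRICES`; the percentage discount, not the absolute dollar figure, is the part to compare across buying models.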

Hourly price is only half of price/performance. When normalized to estimated cost per billion operations, the winner depends on the run:

Workload | Graviton $/B ops | Intel $/B ops | Cheaper per op
Valkey 9.0, saturated (9 workers) | $0.92 | $0.81 | Intel at median (flipped to Graviton in 1 of 3 runs)
Valkey 7.2, saturated (9 workers) | $0.83 | $0.88 | Graviton
Valkey 9.0, moderate (3 workers) | $2.19 | $3.18 | Graviton

Moderate-load pair not fully workload-matched; directional only.

So the per-operation winner depends on the run, not just the engine version. On Valkey 9.0 the median favored Intel, but the run where Graviton outperformed Intel also made Graviton cheaper per operation, so even this "Intel wins on cost per op" line is a median, not a rule. The dependable part is the 4.5% lower hourly price; the throughput that sits on top of it is too noisy to bank on. AWS's own ElastiCache cost optimization guide is worth reading alongside benchmark results, because the biggest savings usually come from right-sizing and buying model, not CPU family.
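Cost per billion operations can be estimated from just the hourly price and the average throughput. The sketch below assumes the node sustains its average ops/s for a full hour; the function name and the `WORKLOADS` layout are illustrative. With the rounded averages from this article it lands within about a cent of the table, which presumably reflects rounding or a slightly different measurement window in the original estimate.

```python
def cost_per_billion_ops(price_per_hour: float, ops_per_sec: float) -> float:
    """Estimated USD per 1e9 operations at a sustained average throughput."""
    ops_per_hour = ops_per_sec * 3600
    return price_per_hour / ops_per_hour * 1e9


M6G_PRICE, M5_PRICE = 0.149, 0.156  # USD per node-hour, from the pricing table

# (workload, m6g avg ops/s, m5 avg ops/s), taken from the tables above.
WORKLOADS = [
    ("Valkey 9.0, saturated (median)", 45_239, 53_641),
    ("Valkey 9.0, saturated (r3 only)", 48_838, 46_123),
    ("Valkey 7.2, saturated", 49_702, 49_105),
    ("Valkey 9.0, moderate", 18_919, 13_630),
]

for name, m6g_ops, m5_ops in WORKLOADS:
    graviton = cost_per_billion_ops(M6G_PRICE, m6g_ops)
    intel = cost_per_billion_ops(M5_PRICE, m5_ops)
    cheaper = "Graviton" if graviton < intel else "Intel"
    print(f"{name}: Graviton ${graviton:.2f} vs Intel ${intel:.2f} -> {cheaper}")
```

The r3 row is included to show the flip: on the one repeat where Graviton was faster, it was also cheaper per operation.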

What About T3 Vs T4g?

The same Graviton-vs-Intel question exists one tier down: t4g is Graviton burstable, while t3 is Intel burstable. In my T3 vs T4g across micro, small, and medium Valkey 9.0 runs, the pattern broadly repeated: Graviton was not automatically faster, and Intel t3 often led raw throughput. I would still keep that comparison separate from this page's main m6g-vs-m5 tables, because burstable nodes bring CPU-credit effects and much noisier throughput variation. It reinforces the thesis, but it is not as clean as the large-node saturation data.

Latency: Both Nodes Are Fast Enough For Most Applications

Latency should not be the deciding factor between these two node families. Across the runs here, average client-side latency stayed between 0.31 ms and 0.44 ms, with client p99 at or near half a millisecond in the clean saturated comparisons. The differences are tens of microseconds, and most real applications will see more variation from network path, client behavior, pipelining, and request mix than from m6g vs m5 alone.

How I Would Choose

If you care most about | Choose
Lower hourly price | m6g.large
More predictable repeated runs | m6g.large
Highest median throughput in this test | m5.large
Hot production workload | Benchmark your own workload

  • Default to Graviton for new moderate-load clusters. The hourly discount is real, latency is excellent, and the platform direction is clear.
  • Benchmark before assuming either way for hot nodes. A single saturated Valkey 9.0 run had Intel m5.large ahead, but across three runs the lead swung from a Graviton win to a 25% Intel win, so neither family is a safe bet for peak throughput without measuring your own workload.
  • Do not choose between m6g and m5 on latency alone. Both stayed comfortably below 1 ms in these runs.
  • Right-size before tuning instance family. Picking the right node size matters more than picking between two similarly sized families. For smaller nodes, see T3 vs T4g across micro, small and medium; for AWS's general guidance, see ElastiCache supported node types.

FAQ

Is Graviton faster than Intel on ElastiCache?

Not reliably, and "reliably" is the key word. A single saturated Valkey 9.0 run had Intel m5.large 18.6% ahead, but when that pair was repeated three times on each node family, the lead ranged from a 6% Graviton win to a 25% Intel win (median 18.6% Intel). The median still favored Intel, but the direction was not stable enough to treat m5 as a guaranteed throughput winner. Graviton is cheaper and was more predictable overall, so decide on price, predictability, and your own workload. On Valkey 7.2 the two were tied.

How much cheaper is m6g.large than m5.large?

In these us-east-1 on-demand runs, cache.m6g.large was $0.149 per node-hour and cache.m5.large was $0.156 per node-hour, which makes m6g.large about 4.5% cheaper. That is about $61 per year for one always-on node before reserved pricing or savings plans.

Should I migrate existing ElastiCache nodes from m5 to m6g?

For moderate-load clusters, m6g is a reasonable default because it lowers the hourly bill, keeps latency low, and was more predictable overall in these repeats. For nodes that run hot, benchmark first. A single saturated Valkey 9.0 run favored m5, but that lead did not survive three repeats as a dependable rule, so for most clusters the Graviton discount and steadier overall behavior make m6g the safer default.

Does instance family matter more than node size?

No. Node size is usually the bigger decision. Treat m6g vs m5 as a second-order choice after memory capacity, throughput headroom, high availability shape, and cost model.

Methodology

Each result comes from a single-node, non-cluster Amazon ElastiCache replication group under about one hour of memtier_benchmark load from ECS Fargate. Metrics were exported from CloudWatch and memtier logs after the run. Saturated 9-worker runs were calibrated to reach maxmemory and sustain evictions; moderate 3-worker runs stayed below that cliff. The headline saturated Valkey 9.0 pair was provisioned and run three times per node family (r1-r3) on fresh ElastiCache hosts to test reproducibility across independent placements. The saturated 9.0 figures above are medians, with the run-to-run range called out in the head-to-head section. Throughput and latency are averages over the absolute memtier log window, and the reported prices are us-east-1 on-demand node prices captured with the run metadata or applied consistently for the same node type. The numbers here are measurements from this test harness, not vendor claims; your result will vary with value size, data structure mix, pipelining, connection count, region, and cache-hit behavior.
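For readers reproducing the within-run stability figures, the coefficient of variation is the standard deviation of the throughput samples divided by their mean. A minimal sketch, assuming per-interval ops/s samples and the population standard deviation (the exact sampling interval and estimator used for this article's CV figures are not stated, so both are assumptions, and the sample lists are hypothetical, not data from these runs):

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples: list[float]) -> float:
    """Population standard deviation divided by the mean, as a percentage."""
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean throughput is zero; CV is undefined")
    return pstdev(samples) / avg * 100


# Hypothetical per-interval ops/s samples, NOT measurements from this benchmark.
steady_run = [45_100, 45_300, 45_250, 45_200, 45_350]
noisy_run = [53_000, 47_000, 58_000, 44_000, 56_000]

print(f"steady run CV: {coefficient_of_variation(steady_run):.1f}%")
print(f"noisy run CV:  {coefficient_of_variation(noisy_run):.1f}%")
```

A low CV means throughput stayed near its average for the whole hour. A high CV means the average hides large swings, which is why a run with a higher mean can still be the less predictable one.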