ElastiCache Graviton vs Intel: m6g.large vs m5.large Valkey Benchmark
Valkey 9.0 and 7.2 on Amazon ElastiCache, single-node non-cluster, one-hour memtier_benchmark runs, us-east-1 on-demand pricing. Last updated June 2026.
AWS has been steering more ElastiCache users toward Graviton-backed nodes, and the pitch is familiar: better price/performance. That makes cache.m6g.large vs cache.m5.large a practical architecture question, not a silicon preference. If you are choosing a new Redis OSS or Valkey node family, do you get the Graviton discount without giving up throughput?
This benchmark compares Graviton2-backed cache.m6g.large and Intel-backed cache.m5.large on Amazon ElastiCache running Valkey 9.0 and Valkey 7.2. The load generator is memtier_benchmark, running from ECS Fargate against a single ElastiCache replication group.
The short version: Graviton m6g.large is the cheaper node, about 4.5% lower than m5.large in these us-east-1 on-demand runs. On a single saturated Valkey 9.0 run it also looked slower, with Intel m5.large posting 18.6% more throughput. But after running that pair three times on each node family, the lead did not hold up cleanly: it ranged from a 6% Graviton win to a 25% Intel win (median 18.6% Intel). What did hold up is that Graviton was cheaper and more predictable overall: its run-to-run throughput range was tighter, while Intel's within-run coefficient of variation reached nearly 12% in two repeats. On Valkey 7.2 the two were tied.
Bottom line: for ElastiCache, Graviton is a strong default. It is consistently cheaper, latency is excellent, and it was the more predictable performer overall. Intel m5.large can deliver more peak throughput, and the median here favored Intel, but the direction was not stable enough to treat raw speed as guaranteed either way. For hot, saturated nodes, benchmark your own workload before assuming either architecture wins.
The Latest Runs At A Glance
The runs below cover two engine versions and two load levels. Both node types are two-vCPU "large" instances with roughly 6.4 GB of cache memory, so this is a like-for-like core-count comparison. The 9-worker runs were calibrated to reach maxmemory and sustain evictions; see Designing for the Cliff for the calibration background.
The saturated Valkey 9.0 figures are medians of three runs per node; the run-to-run spread, including one run where Graviton came out ahead, is in the next section. The moderate-load pair is directional only: the hit rates and memory pressure were not closely matched, and the m5 run had a 10.0% throughput coefficient of variation. The cleanest apples-to-apples comparison is the 9-worker Valkey 9.0 pair, run three times.
m6g.large Vs m5.large: Valkey 9.0 Saturated Benchmark
This is the comparison I trust most, so I ran it more than once. Same engine version, same 9-worker load profile, both nodes driven to maxmemory, both running for a full hour, and the whole pair provisioned and run three times (r1-r3) on fresh ElastiCache hosts to see whether a single-run result would survive replication. Within each run-pair the two nodes saw matched load; across the set, cache pressure varied (hit rate ranged from about 5% to 54%), which is part of what the repeats exposed.
It did not survive. A single run had Intel ahead by 18.6%, but three runs tell a different story:
The direction is not stable. Intel led twice and Graviton led once, and the cleanest-matched pair of the three (r3, ~50% hit on both nodes) went to Graviton. The headline metrics, read as medians across the three runs:
The single run said Intel won outright. Three independent runs say something more useful and more honest: the throughput lead is not reproducible as a rule. It swung from a 6% Graviton win to a 25% Intel win. Graviton had the tighter run-to-run throughput range and held its coefficient of variation under 1.3% in all three runs. Intel was steadier in r1, but its CV reached nearly 12% in r2 and r3. Latency stayed close throughout, a few tens of microseconds apart.
The verdict, now that it is backed by repeated runs rather than one: at this node size, neither architecture has a dependable raw-throughput edge on saturated Valkey 9.0. The median favored Intel, but one well-matched repeat flipped to Graviton, so do not treat raw throughput as a guaranteed m5 advantage. The two results most useful for decisions are the consistent ones: Graviton is about 4.5% cheaper, and Graviton was more predictable overall. That is the generalizable takeaway, and it is more useful than the first single run implied.
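The two statistics this section leans on, within-run coefficient of variation and the run-to-run throughput lead, are simple to compute. The sketch below is illustrative only: the sample values are hypothetical, not the measured data, and the choice of population standard deviation and of expressing the lead relative to the slower node are assumptions rather than the harness's documented method.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(samples):
    """Within-run CV: standard deviation over mean, as a percentage."""
    return pstdev(samples) / mean(samples) * 100.0


def throughput_lead(m6g_ops, m5_ops):
    """Return (winner, lead %) with the lead relative to the slower node."""
    if m5_ops >= m6g_ops:
        return "m5", (m5_ops / m6g_ops - 1.0) * 100.0
    return "m6g", (m6g_ops / m5_ops - 1.0) * 100.0


# Hypothetical per-interval throughput samples (ops/s) for two runs.
steady_run = [50_000, 50_400, 49_700, 50_100, 49_900]
noisy_run = [42_000, 55_000, 47_000, 58_000, 44_000]
steady_cv = coefficient_of_variation(steady_run)
noisy_cv = coefficient_of_variation(noisy_run)

# Hypothetical mean throughput per run-pair: (m6g ops/s, m5 ops/s).
pairs = [(50_000, 55_000), (52_000, 50_000), (40_000, 50_000)]
leads = [throughput_lead(m6g, m5) for m6g, m5 in pairs]

# Signed view (positive = m5 ahead) so a median across runs is meaningful.
signed = [pct if winner == "m5" else -pct for winner, pct in leads]
median_lead = median(signed)

print(f"steady CV {steady_cv:.2f}%, noisy CV {noisy_cv:.2f}%")
for winner, pct in leads:
    print(f"{winner} ahead by {pct:.1f}%")
print(f"median signed lead (m5 positive): {median_lead:.1f}%")
```

With three runs, the median is just the middle run, which is why a single run and the median can report the same number while the range around it stays wide.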
Valkey 7.2 At Saturation: Throughput Tie, Different Cache Profile
On Valkey 7.2, the raw throughput gap almost disappears: 49,702 ops/s on m6g.large vs 49,105 ops/s on m5.large, with essentially identical average latency. That looks like a dead heat from a throughput standpoint.
The caveat is cache profile. Both runs reached maxmemory and sustained evictions, but the m6g run reported a much higher average cache-hit rate than the m5 run. I would not use this pair to claim one architecture is universally better. I would use it to say that the Valkey 9.0 Intel advantage did not repeat cleanly on Valkey 7.2, and that engine version plus workload shape can matter more than CPU family.
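For readers checking whether their own pair of runs is matched, hit rate can be derived from hit and miss counts. This is a minimal sketch: CacheHits and CacheMisses are metrics ElastiCache publishes to CloudWatch, but whether this benchmark derived hit rate from them is an assumption, and the counts and the 5-point tolerance below are hypothetical.

```python
def hit_rate_pct(cache_hits, cache_misses):
    """Hit rate as a percentage of all keyspace lookups in the window."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0  # no lookups recorded in the window
    return cache_hits / total * 100.0


def is_matched(rate_a, rate_b, tolerance_pts=5.0):
    """Treat two runs as comparable if hit rates differ by at most tolerance_pts."""
    return abs(rate_a - rate_b) <= tolerance_pts


# Hypothetical one-hour sums for two runs, not measured values.
run_a = hit_rate_pct(cache_hits=90_000_000, cache_misses=90_000_000)
run_b = hit_rate_pct(cache_hits=36_000_000, cache_misses=144_000_000)

print(f"run A hit rate: {run_a:.1f}%")
print(f"run B hit rate: {run_b:.1f}%")
print("matched" if is_matched(run_a, run_b) else "not matched: compare with caution")
```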
That lines up with the broader engine-version pattern in Redis 7.1 vs Valkey 7.2 / 8.2 / 9.0 on ElastiCache: the engine and workload can move the result as much as the instance family.
Price And Price/Performance
Graviton's structural advantage is the hourly bill. In these us-east-1 on-demand runs, Amazon ElastiCache pricing put cache.m6g.large at $0.149/node-hour and cache.m5.large at $0.156/node-hour. That is a small but real discount:
Single-node, on-demand, us-east-1. A production HA pair roughly doubles both the spend and the saving. Reserved nodes, Database Savings Plans, and other regions change the absolute numbers.
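The discount arithmetic can be reproduced in a few lines. The two hourly prices are the ones quoted above; the 8,760-hour year and the `nodes` parameter are illustrative assumptions.

```python
M6G_HOURLY = 0.149  # USD per node-hour, cache.m6g.large, as quoted above
M5_HOURLY = 0.156   # USD per node-hour, cache.m5.large, as quoted above
HOURS_PER_YEAR = 24 * 365  # assumes an always-on node and a 365-day year


def discount_pct(cheaper, pricier):
    """Percentage saved by choosing the cheaper hourly price."""
    return (pricier - cheaper) / pricier * 100.0


def annual_saving(cheaper, pricier, nodes=1):
    """Yearly on-demand saving in USD for a given node count."""
    return (pricier - cheaper) * HOURS_PER_YEAR * nodes


pct = discount_pct(M6G_HOURLY, M5_HOURLY)
single = annual_saving(M6G_HOURLY, M5_HOURLY)
ha_pair = annual_saving(M6G_HOURLY, M5_HOURLY, nodes=2)

print(f"discount: {pct:.1f}%")                 # discount: 4.5%
print(f"one node, per year: ${single:.0f}")    # one node, per year: $61
print(f"HA pair, per year: ${ha_pair:.0f}")    # HA pair, per year: $123
```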
Hourly price is only half of price/performance. When normalized to estimated cost per billion operations, the winner depends on the run:
Moderate-load pair not fully workload-matched, directional only.
So the per-operation winner depends on the run, not just the engine version. On Valkey 9.0 the median favored Intel, but the run where Graviton outperformed Intel also made Graviton cheaper per operation, so even this "Intel wins on cost per op" line is a median, not a rule. The dependable part is the 4.5% lower hourly price; the throughput that sits on top of it is too noisy to bank on. AWS's own ElastiCache cost optimization guide is worth reading alongside benchmark results, because the biggest savings usually come from right-sizing and buying model, not CPU family.
What About T3 Vs T4g?
The same Graviton-vs-Intel question exists one tier down: t4g is Graviton burstable, while t3 is Intel burstable. In my T3 vs T4g across micro, small, and medium Valkey 9.0 runs, the pattern broadly repeated: Graviton was not automatically faster, and Intel t3 often led raw throughput. I would still keep that comparison separate from this page's main m6g-vs-m5 tables, because burstable nodes bring CPU-credit effects and much noisier throughput variation. It reinforces the thesis, but it is not as clean as the large-node saturation data.
Latency: Both Nodes Are Fast Enough For Most Applications
Latency should not be the deciding factor between these two node families. Across the runs here, average client-side latency stayed between 0.31 ms and 0.44 ms, with client p99 at or near half a millisecond in the clean saturated comparisons. The differences are tens of microseconds, and most real applications will see more variation from network path, client behavior, pipelining, and request mix than from m6g vs m5 alone.
How I Would Choose
- Default to Graviton for new moderate-load clusters. The hourly discount is real, latency is excellent, and the platform direction is clear.
- Benchmark before assuming either way for hot nodes. A single saturated Valkey 9.0 run had Intel m5.large ahead, but across three runs the lead swung from a Graviton win to a 25% Intel win, so neither family is a safe bet for peak throughput without measuring your own workload.
- Do not choose between m6g and m5 on latency alone. Both stayed comfortably below 1 ms in these runs.
- Right-size before tuning instance family. Picking the right node size matters more than picking between two similarly sized families. For smaller nodes, see T3 vs T4g across micro, small and medium; for AWS's general guidance, see ElastiCache supported node types.
FAQ
Is Graviton faster than Intel on ElastiCache?
Not reliably, and "reliably" is the key word. A single saturated Valkey 9.0 run had Intel m5.large 18.6% ahead, but when that pair was repeated three times on each node family, the lead ranged from a 6% Graviton win to a 25% Intel win (median 18.6% Intel). The median still favored Intel, but the direction was not stable enough to treat m5 as a guaranteed throughput winner. Graviton is cheaper and was more predictable overall, so decide on price, predictability, and your own workload. On Valkey 7.2 the two were tied.
How much cheaper is m6g.large than m5.large?
In these us-east-1 on-demand runs, cache.m6g.large was $0.149 per node-hour and cache.m5.large was $0.156 per node-hour, which makes m6g about 4.5% cheaper. That works out to about $61 per year for one always-on node before reserved pricing or savings plans.
Should I migrate existing ElastiCache nodes from m5 to m6g?
For moderate-load clusters, m6g is a reasonable default because it lowers the hourly bill, keeps latency low, and was more predictable overall in these repeats. For nodes that run hot, benchmark first. A single saturated Valkey 9.0 run favored m5, but that lead did not survive three repeats as a dependable rule, so for most clusters the Graviton discount and steadier overall behavior make m6g the safer default.
Does instance family matter more than node size?
No. Node size is usually the bigger decision. Treat m6g vs m5 as a second-order choice after memory capacity, throughput headroom, high availability shape, and cost model.
Methodology
Each result comes from a single-node, non-cluster Amazon ElastiCache replication group under about one hour of memtier_benchmark load from ECS Fargate. Metrics were exported from CloudWatch and memtier logs after the run. Saturated 9-worker runs were calibrated to reach maxmemory and sustain evictions; moderate 3-worker runs stayed below that cliff. The headline saturated Valkey 9.0 pair was provisioned and run three times per node family (r1-r3) on fresh ElastiCache hosts to test reproducibility across independent placements. The saturated 9.0 figures above are medians, with the run-to-run range called out in the head-to-head section. Throughput and latency are averages over the absolute memtier log window, and the reported prices are us-east-1 on-demand node prices captured with the run metadata or applied consistently for the same node type. The numbers here are measurements from this test harness, not vendor claims. Your result will vary with value size, data structure mix, pipelining, connection count, region, and cache-hit behavior.
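For anyone reproducing a run of this shape, the sketch below assembles a one-hour memtier_benchmark command line. The flags are standard memtier_benchmark options, but every value here (endpoint, thread and client counts, read/write ratio, value size) is a hypothetical placeholder rather than the configuration used for these results, and the assumption that each worker runs one such command is mine.

```python
import shlex


def memtier_command(host, port=6379, threads=2, clients=25,
                    seconds=3600, ratio="1:10", data_size=1024):
    """Build an argv list for one memtier_benchmark worker (placeholder values)."""
    return [
        "memtier_benchmark",
        f"--server={host}",
        f"--port={port}",
        "--protocol=redis",           # Valkey speaks the Redis protocol
        f"--threads={threads}",
        f"--clients={clients}",       # connections per thread
        f"--test-time={seconds}",     # run for a fixed duration
        f"--ratio={ratio}",           # SET:GET ratio
        f"--data-size={data_size}",   # value size in bytes
        "--hide-histogram",
    ]


# Hypothetical endpoint; substitute your replication group's primary endpoint.
cmd = memtier_command("my-valkey.example.cache.amazonaws.com")
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps it safe to hand to `subprocess.run` without shell quoting issues; `shlex.join` is used here only to print it.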