Horizontal vs Vertical Scaling: When to Grow Out, When to Grow Up

Every team debates this eventually. Here's the math, the trade-offs, and why 'just scale out' is often the wrong first answer.

April 09, 2025 · 7 min read · Scaling, Architecture

Two ways to handle more load: make one machine bigger (vertical) or add more machines (horizontal). Most teams assume horizontal is the "real" answer. Most of the time that assumption is wrong — at least to start.

Vertical scaling

Bigger CPU, more RAM, faster disks. The app stays simple: one process, one database, one cache, one copy of state.

Pros:

  • No distributed-system problems. No eventual consistency, no split-brain, no consensus.
  • Code ports 1:1 from laptop to prod.
  • Modern single machines are huge: roughly $2k/month buys a cloud VM with 96 vCPU / 384 GiB RAM. That handles an enormous amount of traffic.

Cons:

  • A single point of failure.
  • Hard ceiling: once you're on the biggest box available, there is nowhere left to go.
  • Upgrades require downtime or a careful failover.

Horizontal scaling

More replicas behind a load balancer; state in shared systems (Postgres, Redis, S3). Autoscale based on CPU or request rate.
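
A minimal sketch of what "autoscale based on CPU" can look like. It uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)). The function name, the replica bounds, and the 10% tolerance band are assumptions made for illustration, not settings this post recommends.

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=50, tolerance=0.1):
    """Proportional autoscaling: scale replica count by observed/target load.

    All parameter names and defaults here are hypothetical.
    """
    if current_replicas < 1 or target_cpu_pct <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_cpu_pct / target_cpu_pct
    # Inside the tolerance band, do nothing. This avoids flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(4, 90, 60))   # load is 1.5x the target, so scale out
print(desired_replicas(10, 30, 60))  # load is half the target, so scale in
```

The same rule works with request rate in place of CPU, as long as the metric grows roughly in proportion to load per replica.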

Pros:

  • Linear-ish scaling of stateless workloads (APIs, renderers).
  • Natural high availability — a dead node loses 1/N of capacity, not 100%.
  • Rolling deploys with zero downtime.
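
The "1/N of capacity" point is simple arithmetic, sketched below. The function names and the example traffic figures are made up for illustration. The sizing rule (enough nodes for peak, plus spares for the failures you want to survive) is one common convention, not the only one.

```python
import math

def surviving_capacity_fraction(nodes, failed=1):
    """Fraction of total capacity left after `failed` nodes die."""
    if nodes < 1 or not 0 <= failed <= nodes:
        raise ValueError("invalid node counts")
    return (nodes - failed) / nodes

def nodes_needed(peak_rps, per_node_rps, tolerate_failures=1):
    """Nodes required to serve peak load even with some nodes down."""
    base = math.ceil(peak_rps / per_node_rps)
    return base + tolerate_failures

print(surviving_capacity_fraction(1))  # single box: nothing left
print(surviving_capacity_fraction(4))  # four nodes: three quarters left
print(nodes_needed(9000, 2000))        # hypothetical peak and per-node throughput
```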

Cons:

  • State becomes someone else's problem (the DB, the cache) — and that "someone else" is now the bottleneck.
  • Session stickiness, cache coherence, idempotency, and distributed tracing all cost engineering time.
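
Of the items in that list, idempotency is the easiest to show in a few lines. The sketch below is one possible shape, not a prescribed design: the class name, the `charge` handler, and the key format are all hypothetical. The in-memory dict stands in for a shared store, since with several replicas the keys would have to live somewhere every replica can see, such as the database or cache.

```python
import threading

class IdempotentHandler:
    """Replays the stored result when a request with a known key is retried."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}             # key -> stored response (stand-in for a shared store)
        self._lock = threading.Lock()  # coarse lock: simple, but serialises all requests

    def handle(self, idempotency_key, payload):
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request we already processed: no side effects.
                return self._results[idempotency_key]
            result = self._handler(payload)
            self._results[idempotency_key] = result
            return result

charges = []

def charge(payload):
    # Hypothetical side effect that must not happen twice.
    charges.append(payload["amount"])
    return {"charge_id": len(charges), "amount": payload["amount"]}

api = IdempotentHandler(charge)
first = api.handle("key-123", {"amount": 500})
retry = api.handle("key-123", {"amount": 500})  # e.g. client timed out and resent
print(first == retry, charges)
```

The retry returns the original response and the charge is recorded once. A real deployment would also need an expiry policy for old keys.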

The honest order of operations

  1. Profile and optimise the single-node path first. A 3× speedup in code is a 3× capacity increase for free.
  2. Scale vertical until it becomes awkward. Cheaper and simpler than making the app cluster-safe.
  3. Scale the stateless tier horizontally. Put the API behind a load balancer; multiple pods, one DB.
  4. Scale the stateful tier last. Read replicas → sharding → ultimately multi-region. Each step is a project.
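
A back-of-the-envelope version of steps 1 and 2, assuming a purely CPU-bound service whose throughput scales linearly with cores. That assumption is optimistic, and the core counts, per-request CPU times, and 70% utilisation target are illustrative numbers only.

```python
def max_rps(cores, cpu_ms_per_request, target_utilization=0.7):
    """Rough CPU-bound ceiling: each core serves 1000 / cpu_ms requests per second."""
    if cores <= 0 or cpu_ms_per_request <= 0:
        raise ValueError("cores and cpu_ms_per_request must be positive")
    per_core_rps = 1000.0 / cpu_ms_per_request
    return cores * per_core_rps * target_utilization

baseline = max_rps(cores=16, cpu_ms_per_request=30)
optimised = max_rps(cores=16, cpu_ms_per_request=10)   # step 1: 3x faster code, same box
bigger_box = max_rps(cores=96, cpu_ms_per_request=10)  # step 2: same code, bigger box

print(round(optimised / baseline, 2))   # capacity gain from the code speedup
print(round(bigger_box / optimised, 2)) # capacity gain from the larger machine
```

Under these assumptions the two gains multiply, and neither step needed a load balancer.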

The database is always the bottleneck

When people say "we're scaling horizontally," they usually mean the app tier. The database is still a single instance, and eventually it outgrows that instance. That's when you pay the complexity tax: read replicas, sharding, or a move to a distributed database (Spanner, CockroachDB, Yugabyte). Plan for it, but don't take it on early.
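
To make the sharding part of that tax concrete, here is a deliberately naive sketch of hash-based shard routing. The function name and key format are hypothetical. The modulo scheme is chosen for brevity, and it shows why resharding is a project: changing the shard count moves most keys.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

keys = [f"customer-{i}" for i in range(1000)]

# Going from 4 to 5 shards: count how many keys would land on a different shard.
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(f"{moved} of {len(keys)} keys move when growing from 4 to 5 shards")
```

Schemes such as consistent hashing or a lookup directory reduce that movement, at the cost of more moving parts.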

Reader Discussion

  1. Highlighted by author
    Evan Whitfield · Eng Director · Story

    "scale vertical until it becomes awkward": yes. We sank SIX MONTHS into a sharding project that a $4k/month bigger box would have made unnecessary for two more years. Premature distribution is the bourgeois cousin of premature optimisation.

    Apr 11, 2025 · 2 days later
  2. Quốc Anh 🇸🇬 SG · Cloud Architect · Agrees

    PACELC > CAP. Once you accept that partitions aren't the only trade-off (latency-vs-consistency happens 24/7, partitions are a side quest), the whole topology debate gets clearer. Wish PACELC had CAP's branding budget.

    Apr 13, 2025 · 4 days later
  3. Grace Liu · Senior Engineer · From experience

    idempotency keys are such a small API change for such a huge reliability win. ship them on every new POST endpoint as a default. the day you need them and don't have them is one of the worst days of your career.

    Apr 12, 2025 · 3 days later
  4. Kofi Mensah · Infra Engineer · Asks

    Q: idempotency-key table — keep on primary or shard it? we hit ~120M rows/month and the index is bigger than the data. partition by month + retention helps but I'm curious what others do

    Apr 14, 2025 · 5 days later
  5. Monique Laurent · Principal Engineer · From experience

    cell-based architecture is how you sleep at night past a certain scale. blast-radius math is so much friendlier when one cell going down hurts 1/N customers instead of all of them. it's also 2x ops cost — pick your poison.

    Apr 15, 2025 · 6 days later
  6. Rachel Gold · Staff SRE · Agrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    Apr 12, 2025 · 3 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.