S3 Patterns: Multipart, Versioning, Lifecycle Done Right
S3 is the most boring service that breaks the most production systems. Get these six patterns right and it disappears.
S3 is the closest thing to infinite, durable storage we have. It is also the closest thing to a footgun-shaped bucket of options. Six patterns separate teams that pay S3 attention quarterly from teams that pay it attention at 3am.
1. Multipart upload — for everything over 100 MiB
A single PUT is capped at 5 GiB by spec, and above 100 MiB it is brittle. Multipart splits the object into up to 10,000 parts (5 MiB to 5 GiB each; only the last part may be smaller than 5 MiB), uploads them in parallel, and assembles them server-side. The benefits are not just speed:
- Parts retry independently. A flaky connection retries one 8 MiB part, not your 4 GiB file.
- Throughput scales with concurrency. 30 parallel parts on a fast link can saturate the pipe.
- You can resume an interrupted upload: as long as you still have its upload ID, ListParts tells you which parts already landed, and you upload only the rest.
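The part-size arithmetic and the resume step can be sketched in plain Python. This is a minimal sketch, not SDK code: the 8 MiB preferred size is an assumption (it mirrors the CLI's default chunk size), the function names are hypothetical, and uploaded is assumed to look like the Parts list of a ListParts response (dicts with a PartNumber key). Nothing here talks to S3.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
MIN_PART = 5 * MIB     # S3 minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB     # S3 maximum part size
MAX_PARTS = 10_000     # S3 maximum number of parts per upload


def plan_part_size(object_size: int, preferred: int = 8 * MIB) -> int:
    """Part size of at least `preferred` (and 5 MiB) that fits in 10,000 parts."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    needed = -(-object_size // MAX_PARTS)  # ceiling division
    size = max(preferred, MIN_PART, needed)
    if size > MAX_PART:
        raise ValueError("object does not fit in 10,000 parts of at most 5 GiB")
    return size


def part_ranges(object_size: int, part_size: int) -> dict[int, tuple[int, int]]:
    """Map each 1-based part number to its [start, end) byte range."""
    return {
        number: (start, min(start + part_size, object_size))
        for number, start in enumerate(range(0, object_size, part_size), start=1)
    }


def parts_to_resume(object_size: int, part_size: int, uploaded: list[dict]) -> list[int]:
    """Part numbers missing from a ListParts-style list of {"PartNumber": n, ...}."""
    done = {part["PartNumber"] for part in uploaded}
    return [n for n in part_ranges(object_size, part_size) if n not in done]
```

For an 8 GiB file the 8 MiB default gives 1,024 parts; for 100 GiB the 10,000-part ceiling forces parts of about 10.24 MiB. To finish a resumed upload you still need every part's ETag for CompleteMultipartUpload.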
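# aws s3 cp switches to multipart above multipart_threshold (default 8MB)
aws configure set default.s3.multipart_chunksize 64MB
aws configure set default.s3.max_concurrent_requests 30
aws s3 cp ./backup.tar.gz s3://bucket/key
# Streaming from stdin: --expected-size (bytes) is needed above 50 GB
tar czf - ./data | aws s3 cp - s3://bucket/key --expected-size 8589934592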
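The AWS CLI and the SDK transfer utilities (boto3's upload_file, for example) switch to multipart automatically above a configurable threshold. The gotcha: incomplete multipart uploads still cost money. If your job dies mid-upload, the uploaded parts stay in the bucket, invisible to normal listings, and are billed as storage until the upload is completed or aborted. Add a lifecycle rule to abort them.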
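Until that rule is in place, it helps to see what is already orphaned. A minimal sketch, assuming response has the shape of one page of a ListMultipartUploads response (an Uploads list whose entries carry Key, UploadId, and a timezone-aware Initiated datetime); stale_uploads is a hypothetical helper, and pagination is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_uploads(response: dict, max_age_days: int = 7,
                  now: Optional[datetime] = None) -> list[tuple[str, str]]:
    """(Key, UploadId) pairs for uploads initiated more than max_age_days ago.

    `response` is assumed to look like one page of a ListMultipartUploads response:
    {"Uploads": [{"Key": ..., "UploadId": ..., "Initiated": <aware datetime>}, ...]}.
    A page with no in-progress uploads may have no "Uploads" key, hence the .get().
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        (upload["Key"], upload["UploadId"])
        for upload in response.get("Uploads", [])
        if upload["Initiated"] < cutoff
    ]
```

Each returned pair, plus the bucket name, is what boto3's abort_multipart_upload(Bucket=..., Key=..., UploadId=...) expects.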
2. Lifecycle policies — set them on day one
Every bucket needs at minimum:
- Abort incomplete multipart uploads after 7 days. This cleans up the orphaned parts left behind by dead upload jobs.
- Transition old objects to a cheaper tier. For logs and backups: S3 Standard → Standard-IA at 30 days, Standard-IA → Glacier Instant Retrieval at 90 days.
- Expire ancient objects if regulation allows. Otherwise, transition them to Glacier Deep Archive at a year.
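Configuring lifecycle rules is free, and so are expirations (apart from early-deletion charges in the colder tiers). Transitions are billed per request, and the colder tiers carry minimum storage durations (30 days for Standard-IA, 90 for Glacier Instant Retrieval, 180 for Deep Archive) plus a 128 KB minimum billable size in Standard-IA and Glacier Instant Retrieval, so transition large, cold objects rather than millions of tiny ones. Skipping lifecycle entirely costs real money over years.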
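The day-one rules can live in a single lifecycle rule. A minimal sketch follows; the rule ID, function name, and default day counts are assumptions taken from the list above, and the dict is shaped for boto3's put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...). Building everything in one place matters because that call replaces the bucket's whole lifecycle configuration.

```python
import json
from typing import Optional


def day_one_lifecycle(ia_days: int = 30, glacier_ir_days: int = 90,
                      expire_days: Optional[int] = None,
                      deep_archive_days: int = 365,
                      abort_mpu_days: int = 7,
                      noncurrent_days: Optional[int] = None) -> dict:
    """One lifecycle rule covering every object in the bucket (hypothetical defaults)."""
    if ia_days < 30:
        raise ValueError("S3 rejects transitions to STANDARD_IA before 30 days")
    if glacier_ir_days <= ia_days:
        raise ValueError("GLACIER_IR transition must come after STANDARD_IA")
    transitions = [
        {"Days": ia_days, "StorageClass": "STANDARD_IA"},
        {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
    ]
    rule = {
        "ID": "day-one-defaults",
        "Filter": {},  # empty filter: the rule applies to every object
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_mpu_days},
        "Transitions": transitions,
    }
    if expire_days is not None:
        # Delete objects outright instead of archiving them.
        if expire_days <= glacier_ir_days:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_days}
    else:
        # Keep everything, but move year-old objects to the cheapest tier.
        if deep_archive_days <= glacier_ir_days:
            raise ValueError("DEEP_ARCHIVE transition must come after GLACIER_IR")
        transitions.append({"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"})
    if noncurrent_days is not None:
        # For versioned buckets: also drop old versions after this many days.
        rule["NoncurrentVersionExpiration"] = {"NoncurrentDays": noncurrent_days}
    return {"Rules": [rule]}


if __name__ == "__main__":
    print(json.dumps(day_one_lifecycle(), indent=2))
```

Saved to a file, the printed JSON can also be passed to aws s3api put-bucket-lifecycle-configuration with --lifecycle-configuration file://lifecycle.json.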
3. Versioning — useful, expensive if mishandled
With versioning on, deleting an object writes a delete marker and overwriting it keeps the old copy; in both cases the previous versions are retained. Great for accidental-delete recovery. Without a lifecycle policy to expire old versions, you keep paying for every version of every object that has ever existed.
{
"Rules": [{
"ID": "expire-old-versions",
"Filter": {},
"Status": "Enabled",
"NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
}]
}
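In my experience this is the most common cause of "why is my S3 bill 5× last quarter's". Versioned bucket, no expiry, and somebody loops a job that rewrites a million objects daily: every one of those rewrites silently retains the previous version forever. One caution when applying the rule: put-bucket-lifecycle-configuration replaces the bucket's entire lifecycle configuration, so merge this rule with the ones from section 2 instead of uploading it on its own.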
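The growth is easy to estimate. This is a simplified model, assumed rather than measured: every object is rewritten exactly once a day, all versions are the same size, and expiry takes effect exactly NoncurrentDays after a version becomes noncurrent (in practice S3 rounds to the next midnight UTC and removes versions asynchronously). noncurrent_gib is a hypothetical helper.

```python
from typing import Optional


def noncurrent_gib(objects: int, object_mib: float, days: int,
                   noncurrent_days: Optional[int] = None) -> float:
    """Approximate GiB held in noncurrent versions after `days` of daily rewrites.

    Simplified model (an assumption, not S3's exact accounting): each object is
    overwritten once per day, so each day adds one noncurrent version per object.
    With NoncurrentVersionExpiration, a version is dropped `noncurrent_days` after it
    becomes noncurrent, so at most that many noncurrent versions per object remain.
    """
    kept = days if noncurrent_days is None else min(days, noncurrent_days)
    return objects * kept * object_mib / 1024


current_gib = 1_000_000 * 1 / 1024                  # live data: 1M objects of 1 MiB
no_expiry = noncurrent_gib(1_000_000, 1, days=90)
with_expiry = noncurrent_gib(1_000_000, 1, days=90, noncurrent_days=30)
```

With a million 1 MiB objects (about 977 GiB of live data), 90 days of daily rewrites leave about 87,891 GiB of noncurrent versions without expiry and about 29,297 GiB with the 30-day rule. The rule caps the growth; a shorter NoncurrentDays, or not rewriting unchanged objects, lowers the cap.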
4. Strong consistency is real now (since 2020)
S3 used to be eventually consistent on overwrites. Since December 2020 it is strongly read-after-write consistent for new objects, overwrites, deletes, and LIST operations. You no longer need DynamoDB-backed consistency layers such as S3Guard or EMRFS consistent view to answer "did my write land yet": once a PUT returns success, every subsequent GET, HEAD, or LIST sees it.
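What is still not strongly consistent: cross-region replication and bucket configuration. Replication is asynchronous; most objects arrive within seconds, but some can take minutes or longer (S3 Replication Time Control adds an SLA of 99.99% of objects within 15 minutes), so treat the destination bucket as a follower with variable lag. Bucket-level changes such as enabling versioning also take time to propagate.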
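One way to treat the replica as a follower in code: read from it with a short exponential backoff, then fall back to the source bucket. A minimal sketch with hypothetical names; replica and primary are assumed callables that return the object's bytes or None when the key is absent (wrapping, say, a GET that catches NoSuchKey). It only covers keys that have not replicated yet. An overwritten key can still return the previous version from the replica, so compare a version or checksum recorded at write time if that matters.

```python
import time
from typing import Callable, Optional

# A fetcher returns the object's bytes, or None if the key is not there (yet).
Fetch = Callable[[str], Optional[bytes]]


def read_from_replica(key: str, replica: Fetch, primary: Fetch, attempts: int = 3,
                      base_delay: float = 0.2,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Read `key` from the replica, retrying with exponential backoff, then the primary."""
    for attempt in range(attempts):
        data = replica(key)
        if data is not None:
            return data
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.2 s, 0.4 s, ... between replica tries
    # Replication has not caught up: the source bucket is strongly consistent.
    data = primary(key)
    if data is None:
        raise KeyError(key)
    return data
```

Injecting sleep keeps the function testable; in production the default time.sleep applies.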
5. Access patterns — the one that actually matters
S3 charges for three main things: storage, requests, and egress. Requests are cheap individually and expensive in aggregate; a million HEAD calls a day adds up.
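The aggregate is simple arithmetic. The prices below are assumptions (S3 Standard list prices in us-east-1 at the time of writing, in USD per 1,000 requests); substitute your region's numbers. Decimal keeps the money math exact, and monthly_request_cost is a hypothetical helper.

```python
from decimal import Decimal

# ASSUMED prices (USD per 1,000 requests): S3 Standard, us-east-1, at the time of
# writing. Check the S3 pricing page for your region and storage class.
PRICE_PER_1K = {
    "GET": Decimal("0.0004"),  # GET, HEAD and other GET-class requests
    "PUT": Decimal("0.005"),   # PUT, COPY, POST and LIST requests
}


def monthly_request_cost(per_day: dict[str, int], days: int = 30) -> Decimal:
    """Request charges for `days` days, given daily request counts per price tier."""
    total = Decimal(0)
    for tier, count in per_day.items():
        total += PRICE_PER_1K[tier] * count * days / 1000
    return total
```

At these assumed prices, a million HEAD calls a day comes to $12 a month, and a million LIST calls a day to $150.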
Mistakes I have paid for:
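- Listing a prefix on every request to "see if a file exists": LIST is billed at the PUT-class rate, roughly 12× a GET or HEAD at us-east-1 Standard prices. Check the exact key with HEAD, or keep the index in a database if you check at high volume.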
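- Tiny objects in their millions: each one costs a PUT to write and a GET to read, and once they move to Standard-IA or Glacier Instant Retrieval each is billed as at least 128 KB (Glacier Flexible Retrieval and Deep Archive add about 40 KB of per-object overhead). Pack small files into larger objects (Parquet, gzipped JSONL).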
- Cross-region reads in a hot path: you pay inter-region transfer on every read. Replicate the bucket or move the compute.
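Packing is mostly a serialization choice. A minimal sketch using only the standard library: many small records become one gzipped JSON Lines blob, so writing them costs one PUT instead of one per record. The function names are hypothetical, and uploading the blob (for example with boto3's put_object) is left out.

```python
import gzip
import io
import json
from typing import Iterable, Iterator


def pack_jsonl_gz(records: Iterable[dict]) -> bytes:
    """Serialize many small records into one gzipped JSON Lines blob (one S3 object)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            # Compact separators: one record per line, no extra whitespace.
            gz.write(json.dumps(rec, separators=(",", ":")).encode("utf-8") + b"\n")
    return buf.getvalue()


def unpack_jsonl_gz(blob: bytes) -> Iterator[dict]:
    """Yield the records back out of a blob produced by pack_jsonl_gz."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        for line in gz:
            if line.strip():
                yield json.loads(line)
```

Readers then fetch one object and stream through it, which also turns thousands of GETs into one.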
6. Security defaults to set on creation
- Keep Block Public Access on at the account and bucket level. New buckets have it enabled by default, and that default is correct; never relax it broadly.
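- Default encryption with SSE-S3 has applied to every new object automatically since January 2023. Switch the bucket default to SSE-KMS (with an S3 Bucket Key to cut KMS request costs) if compliance requires control over the keys.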
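- Enable Object Lock in Compliance mode for backup buckets. Until the retention period ends, no one (a compromised IAM user included, and even the root user) can delete the locked object versions or shorten their retention, so choose the period carefully.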
- Turn on server access logging to a separate bucket. Log delivery is free and you pay only to store the logs; it pays for itself at the first audit.
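These settings map onto a handful of S3 API calls. A minimal sketch that builds the request parameters for a boto3-style client without sending anything; the function names, the TargetPrefix layout, and the choice to enable versioning here are assumptions. Two caveats as we understand them: the log bucket needs a bucket policy that lets the S3 logging service (logging.s3.amazonaws.com) write to it, and Object Lock requires versioning, which is why the sketch turns it on before the lock.

```python
from typing import Any, Optional

Call = tuple[str, dict[str, Any]]  # (boto3 S3 client method name, keyword arguments)


def security_defaults(bucket: str, log_bucket: str, kms_key_id: Optional[str] = None,
                      lock_days: Optional[int] = None) -> list[Call]:
    """Build, but do not send, the calls that apply the security defaults."""
    calls: list[Call] = [
        ("put_public_access_block", {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True, "IgnorePublicAcls": True,
                "BlockPublicPolicy": True, "RestrictPublicBuckets": True,
            },
        }),
        # Object Lock needs versioning; versioning is on the checklist anyway.
        ("put_bucket_versioning", {
            "Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"},
        }),
        ("put_bucket_logging", {
            "Bucket": bucket,
            "BucketLoggingStatus": {
                "LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": f"{bucket}/"},
            },
        }),
    ]
    if kms_key_id:  # otherwise keep the automatic SSE-S3 default
        calls.append(("put_bucket_encryption", {
            "Bucket": bucket,
            "ServerSideEncryptionConfiguration": {"Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id,
                },
                "BucketKeyEnabled": True,  # fewer KMS requests, lower KMS cost
            }]},
        }))
    if lock_days:
        calls.append(("put_object_lock_configuration", {
            "Bucket": bucket,
            "ObjectLockConfiguration": {
                "ObjectLockEnabled": "Enabled",
                "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": lock_days}},
            },
        }))
    return calls


def apply_calls(client: Any, calls: list[Call]) -> None:
    """Send each call through any object with boto3-style S3 methods."""
    for method, kwargs in calls:
        getattr(client, method)(**kwargs)
```

Separating "build" from "send" lets the plan be reviewed or diffed before it touches a real bucket, which matters most for the Compliance-mode lock.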
The rules of thumb
- Multipart everything over 100 MiB.
- Lifecycle policies on every bucket, day one.
- Versioning on, with expiry of non-current versions.
- Block Public Access stays on.
- If S3 is part of a hot path, measure requests/sec — the surprise is usually request cost, not storage cost.