Database Indexing: B-Tree, Hash, and When Each Actually Helps
Why a single missing index can turn your 10ms endpoint into a 10-second endpoint — and why adding the wrong one makes writes slower.
An index is a sorted (or hashed) copy of one or more columns, pointing back to the row. It makes reads faster and writes slower. Every index is a trade-off you sign up for.
B-Tree — the workhorse
The default for every major database. Supports equality (=), range (<, >, BETWEEN), prefix LIKE ('abc%'), and ordered scans. The leaves form a linked list so range scans are cheap.
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);
-- Uses the index:
SELECT * FROM orders WHERE customer_id = 42 AND created_at > NOW() - INTERVAL '7 days';
-- Also uses it (prefix match on first column):
SELECT * FROM orders WHERE customer_id = 42;
-- Does NOT use it:
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';
Rule: a composite index can be used for any prefix of its columns, never a suffix alone. Order matters. Put the most-filtered column first.
Hash — equality only, blazing fast
Unordered, no range support. In Postgres, Hash indexes only make sense for =-heavy columns you never range-scan. Most teams never need them — B-Tree on the same column is already ~O(log n) with better versatility.
Covering index (INCLUDE)
An index that contains every column the query needs can return results without touching the table at all. This is the single biggest optimisation for hot read paths.
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at) INCLUDE (total, status);
Partial index
An index over a subset of rows. Useful when most rows share one value and the "interesting" minority are queried a lot.
CREATE INDEX idx_pending_orders
ON orders (created_at) WHERE status = 'pending';
Costs
- Every INSERT, UPDATE (on indexed columns), DELETE updates every relevant index.
- Indexes consume disk and buffer-pool memory.
- Bloated or unused indexes hurt write throughput and waste RAM.
How to actually decide
- Run
EXPLAIN ANALYZEon the slow query. - If it's a sequential scan over a large table, you probably need an index.
- If the query filters on multiple columns, look at selectivity — the most selective column goes first.
- Check
pg_stat_user_indexes(or equivalent) periodically and drop indexes withidx_scan = 0.
Adding indexes is the first-year move. Removing indexes is the third-year move.