Database Indexing: B-Tree, Hash, and When Each Actually Helps

Why a single missing index can turn your 10ms endpoint into a 10-second endpoint — and why adding the wrong one makes writes slower.

July 14, 20259 min readDatabasePerformance

An index is a sorted (or hashed) copy of one or more columns, pointing back to the row. It makes reads faster and writes slower. Every index is a trade-off you sign up for.

B-Tree — the workhorse

The default for every major database. Supports equality (=), range (<, >, BETWEEN), prefix LIKE ('abc%'), and ordered scans. The leaves form a linked list so range scans are cheap.

CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);

-- Uses the index:
SELECT * FROM orders WHERE customer_id = 42 AND created_at > NOW() - INTERVAL '7 days';
-- Also uses it (prefix match on first column):
SELECT * FROM orders WHERE customer_id = 42;
-- Does NOT use it:
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';

Rule: a composite index can be used for any prefix of its columns, never a suffix alone. Order matters. Put the most-filtered column first.

Hash — equality only, blazing fast

Unordered, no range support. In Postgres, Hash indexes only make sense for =-heavy columns you never range-scan. Most teams never need them — B-Tree on the same column is already ~O(log n) with better versatility.

Covering index (INCLUDE)

An index that contains every column the query needs can return results without touching the table at all. This is the single biggest optimisation for hot read paths.

CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at) INCLUDE (total, status);

Partial index

An index over a subset of rows. Useful when most rows share one value and the "interesting" minority are queried a lot.

CREATE INDEX idx_pending_orders
ON orders (created_at) WHERE status = 'pending';

Costs

Every INSERT, UPDATE (on indexed columns), DELETE updates every relevant index.
Indexes consume disk and buffer-pool memory.
Bloated or unused indexes hurt write throughput and waste RAM.

How to actually decide

Run EXPLAIN ANALYZE on the slow query.
If it's a sequential scan over a large table, you probably need an index.
If the query filters on multiple columns, look at selectivity — the most selective column goes first.
Check pg_stat_user_indexes (or equivalent) periodically and drop indexes with idx_scan = 0.

Adding indexes is the first-year move. Removing indexes is the third-year move.

Database Indexing: B-Tree, Hash, and When Each Actually Helps

B-Tree — the workhorse

Hash — equality only, blazing fast

Covering index (INCLUDE)

Partial index

Costs

How to actually decide

9 replies// weighed in

More from this topic

ACID, Demystified: What Each Letter Actually Costs

The N+1 Query Problem: Spot It, Kill It, Prevent It

Pessimistic vs Optimistic Locking: Which One You Actually Need