A PostgreSQL incident is almost always one of: a query missing an index, the connection pool exhausted, lock contention, table/index bloat from MVCC + lazy autovacuum, replication lag, or cache/IO pressure. This guide is Postgres-specific — the catalog views, the MVCC quirks, and the knobs that actually move the needle.
First: how MVCC shapes everything
Postgres uses MVCC (Multi-Version Concurrency Control). An UPDATE
doesn't overwrite a row — it writes a new version (tuple) and marks the old one dead.
A DELETE just marks dead. Readers never block writers because each transaction sees
a snapshot. Two consequences drive most Postgres pathology:
- Dead tuples accumulate → tables and indexes bloat until
VACUUMreclaims them. Bloat = wasted disk + slower scans. - Every row carries transaction IDs (xmin/xmax). XIDs are 32-bit and wrap →
VACUUMmust "freeze" old rows or the DB shuts down to protect data.
So "my DB got slow over weeks" is usually bloat/autovacuum, not the query. Keep MVCC in mind.
Slow queries — find them
Two angles: what's slow right now, and what's slow cumulatively.
-- running now, longest first
SELECT pid, now() - query_start AS dur, state, wait_event_type, wait_event,
left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY dur DESC;
-- cumulative worst offenders (needs pg_stat_statements extension)
SELECT calls, round(total_exec_time::numeric, 1) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms, rows,
left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
pg_stat_statements is the single most useful thing to enable. Sort by
total_exec_time (frequent + cheap can outweigh rare + slow). Also log slow ones:
log_min_duration_statement = 500ms. Note wait_event — it tells you
why it's slow (Lock, IO, BufferPin, etc.) without guessing.
Read the plan — EXPLAIN deeply
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;
What to look for, in order:
| Signal | Meaning / action |
|---|---|
| Seq Scan on a big table | No usable index for the filter — add one (if selective). |
| rows estimated ≫ or ≪ actual | Stale stats — run ANALYZE; the planner is flying blind. |
| Nested Loop with huge row count | Planner expected few rows, got many — bad estimate → wrong join. |
| Rows Removed by Filter high | Reading rows then throwing them away — index the filter. |
| Heap Fetches high on Index Only Scan | Visibility map stale — VACUUM the table. |
| external merge Disk in sort | Sort spilled to disk — raise work_mem for this query. |
EXPLAIN ANALYZE shows
estimated 10 / actual 2,000,000, the stats are stale or skewed — ANALYZE the table,
or raise the column's statistics target. Fix the estimate and the plan fixes itself.Indexes — missing, unused, wrong, bloated
Find missing indexes (tables taking lots of sequential scans):
SELECT relname, seq_scan, idx_scan, seq_tup_read FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC LIMIT 20; -- high seq_scan + seq_tup_read on a big table = candidate
Find unused indexes (dead weight — they slow writes + waste space):
SELECT relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC;
- Index type matters: B-tree (default, equality/range), GIN (jsonb, arrays, full-text), GiST (geo/ranges), BRIN (huge append-only tables), Hash (equality only).
- Composite order matters: index
(a,b)serves filters onaanda,b— notbalone. - Partial / covering indexes:
WHERE status='active', orINCLUDE (col)for index-only scans. - Build without locking writes:
CREATE INDEX CONCURRENTLY.
WHERE lower(email)=... won't use an index
on email — index the expression, or store normalized. Same for leading wildcards
LIKE '%foo' and implicit type casts.Bloat & autovacuum (the silent killer)
Symptom. Table/index much bigger than its live rows; scans slow down over weeks; disk creeps; a table that "should be small" is gigabytes.
-- dead tuples + last (auto)vacuum per table
SELECT relname, n_live_tup, n_dead_tup,
round(n_dead_tup*100.0/nullif(n_live_tup,0),1) AS dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
Causes. Autovacuum can't keep up with churn (high update/delete rate); a long-running transaction or idle-in-transaction holds the xmin horizon so vacuum can't remove dead tuples; autovacuum throttled too conservatively.
Fix.
- Kill long/idle-in-transaction sessions (they pin vacuum — see below).
- Tune autovacuum: lower
autovacuum_vacuum_scale_factorfor hot tables, raiseautovacuum_max_workers/ cost limits so it keeps up. - Reclaim existing bloat:
VACUUM (ANALYZE)for stats + visibility;REINDEX CONCURRENTLYfor index bloat;pg_repackto compact a table without an exclusive lock (plainVACUUM FULLtakes anACCESS EXCLUSIVElock — blocks everything).
VACUUM reclaims space for reuse (online). VACUUM FULL rewrites the
table and takes an exclusive lock — never run it on a live hot table. Use pg_repack.Transaction ID wraparound
Symptom. Warnings "database must be vacuumed within N transactions"; worst case Postgres refuses writes to protect data.
SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY xid_age DESC; -- approaching ~2 billion = danger
Fix. This means autovacuum's freeze work fell behind (usually the same
root cause: long transactions, or autovacuum disabled/starved). Let autovacuum catch up, or run a
manual VACUUM (FREEZE). Don't disable autovacuum — this is what it prevents.
Connection states & idle-in-transaction
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
| state | meaning |
|---|---|
active | running a query right now |
idle | connected, no transaction — fine |
idle in transaction | danger — open transaction doing nothing; holds locks + pins vacuum |
idle in transaction (aborted) | failed transaction not rolled back |
BEGINs then waits (slow client, missing commit/rollback) holds its
snapshot — blocking vacuum cluster-wide and possibly holding row locks. Set
idle_in_transaction_session_timeout to auto-kill them.Connection pool exhaustion
Symptom. "FATAL: sorry, too many clients already"; "remaining connection slots are reserved"; latency cliff under load while DB CPU is low.
SHOW max_connections; SELECT count(*) FROM pg_stat_activity; SELECT count(*), state, wait_event FROM pg_stat_activity GROUP BY state, wait_event;
Causes & fix.
- App-instances × per-instance pool size exceeds
max_connections. Each Postgres connection is a process with real memory cost — you don't want thousands. - Use a pooler: PgBouncer in
transactionmode lets hundreds of app connections share a small server pool. This is the standard fix, not raisingmax_connectionsto 5000. - Leaked connections — ensure release in
finally; cap pool size; setidle_in_transaction_session_timeout.
Locks, blocking & deadlocks
Find who blocks whom:
SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query,
blocking.pid AS blocking_pid, blocking.query AS blocking_query,
blocked.wait_event
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
Lock modes that bite: plain SELECT takes
ACCESS SHARE (conflicts with almost nothing). UPDATE/DELETE take a
ROW EXCLUSIVE + per-row locks. Schema changes (ALTER TABLE,
VACUUM FULL, CREATE INDEX non-concurrent) take ACCESS EXCLUSIVE
— they block everything, including reads.
Deadlocks. Two transactions grab locks in opposite order. Postgres detects it,
kills one with "deadlock detected" (SQLSTATE 40P01). Fix: acquire locks/rows in a
consistent order; keep transactions short; SELECT ... FOR UPDATE carefully;
retry the loser (it's expected and transient).
ALTER TABLE waits behind one slow query for its ACCESS EXCLUSIVE
lock — and meanwhile queues every new query behind it, freezing the table. Always set
lock_timeout (and statement_timeout) before migrations.Cache hit ratio & I/O
Symptom. Queries slow with high I/O wait; wait_event shows
DataFileRead. Working set doesn't fit in memory.
SELECT sum(heap_blks_hit) / nullif(sum(heap_blks_hit)+sum(heap_blks_read),0) AS cache_hit_ratio FROM pg_statio_user_tables; -- want > 0.99 for OLTP
Fix. Raise shared_buffers (~25% of RAM is a common start); ensure
the OS page cache has room; faster disks (SSD/NVMe, more IOPS); reduce the data scanned (indexes,
partitioning). effective_cache_size should reflect total cache so the planner prefers
index scans.
Key config knobs
| Setting | What it controls |
|---|---|
shared_buffers | Postgres's own page cache (~25% RAM). |
work_mem | Per-sort/hash memory before spilling to disk. Per operation — careful × connections. |
effective_cache_size | Planner's estimate of total cache → index-scan bias. |
max_connections | Hard cap (use a pooler instead of cranking this). |
random_page_cost | Lower (e.g. 1.1) on SSD so the planner uses indexes. |
statement_timeout / lock_timeout / idle_in_transaction_session_timeout | Guardrails against runaway queries, lock waits, and zombie transactions. |
Replication lag / stale reads
Symptom. Read replica returns data the primary already has; "row not found" right after a write that was routed to a replica; failover risks losing recent writes.
-- on the primary: per-replica lag in bytes
SELECT client_addr, state, sent_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- on a replica: time lag
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
Causes & fix. Replica underpowered (CPU/disk) or network-saturated; a big
write burst; long queries on the replica pausing replay (max_standby_streaming_delay).
Fix: route read-after-write to the primary; size the replica; for failover safety consider
synchronous replication on critical writes (trades latency for zero data loss).
The monitoring views to know
| View / function | Tells you |
|---|---|
pg_stat_activity | Live sessions: state, query, wait_event, age. |
pg_stat_statements | Cumulative query cost ranking. |
pg_stat_user_tables | Seq vs index scans, dead tuples, last vacuum. |
pg_stat_user_indexes | Index usage (find unused). |
pg_statio_user_tables | Cache hit vs disk reads. |
pg_stat_replication | Replica state + lag. |
pg_locks + pg_blocking_pids() | Who holds/waits on locks. |
Triage flow
- What's happening now?
pg_stat_activity— long/active queries, theirwait_event, idle-in-transaction count. - Blocked?
pg_blocking_pids()— find the blocker; decide to wait orpg_terminate_backend()it. - Slow query?
EXPLAIN (ANALYZE, BUFFERS)— estimate vs actual, Seq Scan, sort spill. Add index / ANALYZE / raise work_mem. - Slow over time? Bloat + autovacuum (
n_dead_tup, last_autovacuum), cache hit ratio. - Too many connections? Pooler + idle-in-transaction timeout.
Quick reference
-- live activity + waits SELECT pid, now()-query_start AS dur, state, wait_event, left(query,60) FROM pg_stat_activity WHERE state<>'idle' ORDER BY dur DESC; -- worst cumulative queries SELECT calls, total_exec_time, mean_exec_time, query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20; -- blocking tree SELECT pg_blocking_pids(pid), pid, query FROM pg_stat_activity WHERE wait_event_type='Lock'; -- bloat / vacuum SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 20; -- cache hit ratio SELECT sum(heap_blks_hit)/nullif(sum(heap_blks_hit)+sum(heap_blks_read),0) FROM pg_statio_user_tables; -- plan EXPLAIN (ANALYZE, BUFFERS); -- terminate / cancel SELECT pg_cancel_backend(pid); -- cancel current query (gentle) SELECT pg_terminate_backend(pid); -- kill the connection
Rapid-fire Q&A
- Why does Postgres bloat?MVCC: updates/deletes leave dead tuples. VACUUM reclaims them; if autovacuum can't keep up (or a long transaction pins the xmin horizon), tables/indexes bloat.
- VACUUM vs VACUUM FULL?VACUUM reclaims space for reuse online. VACUUM FULL rewrites the table under an ACCESS EXCLUSIVE lock — blocks everything. Use pg_repack instead.
- Why is "idle in transaction" dangerous?It holds a snapshot + possibly locks, blocking vacuum and other writers. Set idle_in_transaction_session_timeout.
- How do you fix too-many-connections?A pooler (PgBouncer transaction mode), not a huge max_connections — each connection is a process with memory cost.
- First thing on a slow query?EXPLAIN (ANALYZE, BUFFERS); compare estimated vs actual rows; look for Seq Scan / sort spill. Stale stats → ANALYZE.
- How are deadlocks handled?Postgres auto-detects and kills one transaction (40P01). Lock in consistent order, keep txns short, retry the loser.
- What is transaction ID wraparound?XIDs are 32-bit; VACUUM must freeze old rows. If freezing falls behind, Postgres stops writes to prevent data loss. Keep autovacuum healthy.
- Multi-AZ/standby vs read replica?Sync standby = HA failover (durability). Async read replica = scale reads but can serve stale data (replay lag).
- cancel vs terminate backend?pg_cancel_backend cancels the running query; pg_terminate_backend kills the whole connection.