
DEBUG GUIDE · NETWORK · SRE PLAYBOOK

Debugging Connection Timeouts & Refused.

network connectivity sre tcp
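Three failures, three meanings. Refused = the host answered the SYN with a reset: nothing is listening on that port (or a firewall REJECT rule answered for it). Timeout = no reply at all (firewall DROP, wrong host, black-holed route). Reset = an established connection was actively torn down. The error word tells you where to look.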

Decode the error

Error              | Means                  | Look at
Connection refused | Host up, no listener   | Service up? Right port? Bound to 0.0.0.0?
Timed out          | No reply               | Firewall/SG, wrong IP, routing
Reset (RST)        | Peer closed mid-stream | App crash, LB idle timeout, proxy
No route to host   | No path                | Routing / subnet / NAT
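A minimal sketch of the table as a lookup, assuming you already have the error text from curl or nc. `classify_conn_error` is a hypothetical helper, not part of either tool, and it only matches the common Linux wording of each error.

```sh
# classify_conn_error MSG: hypothetical helper (not part of curl or nc).
# Prints one label per row of the decode table; patterns match common Linux wording.
classify_conn_error() {
  case $1 in
    *"Connection refused"*) echo refused ;;   # look at: service, port, bind address
    *"timed out"*)          echo timeout ;;   # look at: firewall/SG, IP, routing
    *"reset by peer"*)      echo reset ;;     # look at: app crash, LB idle timeout, proxy
    *"No route to host"*)   echo no-route ;;  # look at: routing, subnet, NAT
    *)                      echo unknown ;;
  esac
}
```

Usage: `classify_conn_error "$(curl -sS -o /dev/null http://host:port/ 2>&1)"`.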

Connection refused

curl -v http://host:port/ ; nc -vz host port   # reproduce from the client side
ss -ltnp | grep :port                           # on the server: listening? on which address?

Fix. Start the service; bind to 0.0.0.0 not 127.0.0.1; correct the port.

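127.0.0.1 vs 0.0.0.0. An app bound to 127.0.0.1 inside a container is unreachable from outside, even with -p: the published port forwards to the container's own network interface, not its loopback. Bind to 0.0.0.0.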
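To check the bind address without eyeballing `ss` output, a small filter can read `ss -ltn` on stdin and report the scope of the listener on one port. `bind_scope` is a hypothetical name; the sketch assumes the iproute2 `ss` column layout, where the fourth field is `Local Address:Port`. Run it inside the container or network namespace the app lives in.

```sh
# bind_scope PORT: hypothetical helper; reads `ss -ltn` output on stdin.
bind_scope() {
  awk -v port="$1" '
    $1 != "LISTEN" { next }                  # skip the header and non-listening rows
    {
      p = $4; sub(/.*:/, "", p)              # port: text after the last colon
      a = $4; sub(/:[^:]*$/, "", a)          # address: everything before it
      sub(/%.*/, "", a)                      # drop interface suffix, e.g. 127.0.0.53%lo
      if (p != port) next
      found = 1
      if (a == "0.0.0.0" || a == "[::]" || a == "*") wild = 1
      else if (a !~ /^127\./ && a != "[::1]") other = 1   # not a loopback address
    }
    END {
      if (!found)     print "not-listening"
      else if (wild)  print "all-interfaces"
      else if (other) print "specific-address"
      else            print "loopback-only"
    }'
}
```

Usage: `ss -ltn | bind_scope 8080`. A result of `loopback-only` inside a container is the -p trap described above.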

Timeouts

ping host ; traceroute host     # ICMP is often blocked; a failed ping alone proves little
nc -vz -w 5 host port           # hangs until -w expires = filtered = timeout
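The refused/timeout distinction can be scripted with bash's `/dev/tcp` redirection wrapped in `timeout`: a refused connect fails fast with an error, while a filtered port leaves the connect hanging until `timeout` kills it (exit status 124). A sketch, assuming bash built with network redirections and GNU coreutils `timeout`; `probe_tcp` is a hypothetical helper.

```sh
# probe_tcp HOST PORT [SECS]: hypothetical helper; needs bash (for /dev/tcp)
# and GNU coreutils `timeout`.
probe_tcp() {
  local host=$1 port=$2 secs=${3:-3}
  local err rc
  # The inner bash opens fd 3 to HOST:PORT; there $0 is HOST and $1 is PORT.
  err=$(timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo open
  elif [ "$rc" -eq 124 ]; then
    echo timeout            # no SYN-ACK and no RST: dropped or black-holed
  else
    case $err in
      *"Connection refused"*) echo refused ;;   # RST (or REJECT) came back: nothing listening
      *"No route to host"*)   echo no-route ;;
      *)                      echo "error: $err" ;;
    esac
  fi
}
```

Usage: `probe_tcp 10.0.0.9 5432 5`. Keep SECS short; a dropped SYN would otherwise be retried by the kernel for much longer.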

Fix. Open firewall/SG/NetworkPolicy; verify route/NAT/IP. K8s: Service selector matches pods? NetworkPolicy dropping?
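For the Kubernetes selector check, an empty Endpoints object is the usual tell: the Service selector matches no pods, or the matching pods are not Ready (not-ready pods are listed under `notReadyAddresses`, not `addresses`). A sketch, with `check_svc_endpoints` a hypothetical wrapper around `kubectl get endpoints`:

```sh
# check_svc_endpoints SVC [NAMESPACE]: hypothetical wrapper around kubectl.
check_svc_endpoints() {
  local svc=$1 ns=${2:-default} ips
  # Ready pod IPs only; not-ready pods appear under notReadyAddresses instead.
  ips=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}') || return 2
  if [ -z "$ips" ]; then
    echo no-ready-endpoints   # selector matches nothing, or no matching pod is Ready
    return 1
  fi
  echo "endpoints: $ips"
}
```

If it reports `no-ready-endpoints`, compare the Service's selector with the pod labels before looking at NetworkPolicy.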

Resets & idle drops

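Causes: the app crashed mid-request; the LB's idle timeout is shorter than the client's keepalive, so the LB drops a pooled connection the client still thinks is open and the next request on it gets reset; MTU/MSS mismatch on tunnels, where small requests work but large packets are silently dropped.

Fix. Align idle timeouts (LB idle timeout ≥ client keepalive); fix the crashing upstream; lower (clamp) the MSS if big packets fail.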
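To tell whether a tunnel is eating large packets, one approach is to binary-search the largest ping payload that gets through with Don't Fragment set, then derive the TCP MSS from it. A sketch assuming Linux iputils `ping` (`-M do` sets DF; other platforms use different flags), a 1500-byte starting MTU, and IPv4 with no IP or TCP options: path MTU = payload + 28 (20 IP + 8 ICMP), MSS = MTU - 40 (20 IP + 20 TCP). `find_path_mtu` is a hypothetical helper, and networks that drop ICMP will make it report unreachable.

```sh
# find_path_mtu HOST: hypothetical helper; Linux iputils ping, IPv4, no IP/TCP options.
find_path_mtu() {
  local host=$1 low=0 high=1472 mid mtu   # 1472 = 1500 - 20 (IPv4) - 8 (ICMP)
  # Binary search for the largest payload that passes with DF set.
  while [ "$low" -lt "$high" ]; do
    mid=$(( (low + high + 1) / 2 ))
    # -M do forbids fragmentation, so oversized probes fail instead of splitting.
    if ping -c 1 -W 1 -M do -s "$mid" "$host" >/dev/null 2>&1; then
      low=$mid
    else
      high=$(( mid - 1 ))
    fi
  done
  if [ "$low" -eq 0 ]; then
    echo unreachable          # nothing got through (host down or ICMP filtered)
    return 1
  fi
  mtu=$(( low + 28 ))         # payload + 8 ICMP + 20 IPv4
  echo "mtu=$mtu mss=$(( mtu - 40 ))"   # MSS = MTU - 20 IPv4 - 20 TCP
}
```

If the result is below 1500, an MSS at or under the printed value is a reasonable starting point for clamping on the tunnel.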

Port / conntrack exhaustion

ss -s                                                                     # socket totals, incl. timewait
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max   # count near max = table full
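Reading the two counters as a percentage makes the conntrack check easier to alert on. When the table fills, the kernel drops new flows and typically logs `nf_conntrack: table full, dropping packet`. A sketch that reads the same values from /proc/sys; `conntrack_usage` and its 80% warning threshold are assumptions, not a standard:

```sh
# conntrack_usage [COUNT_FILE] [MAX_FILE]: hypothetical helper; 80% threshold is arbitrary.
conntrack_usage() {
  local count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
  local max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
  local count max pct
  count=$(cat "$count_file") || return 2   # fails if the nf_conntrack module is not loaded
  max=$(cat "$max_file") || return 2
  [ "$max" -gt 0 ] || return 2
  pct=$(( count * 100 / max ))             # integer percent, rounded down
  if [ "$pct" -ge 80 ]; then
    echo "WARN conntrack ${pct}% ($count/$max)"
  else
    echo "ok conntrack ${pct}% ($count/$max)"
  fi
}
```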

Fix. Pool and reuse connections; widen the ephemeral port range (net.ipv4.ip_local_port_range); raise net.netfilter.nf_conntrack_max.
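How much headroom the ephemeral range gives is simple arithmetic: from one source IP to one destination IP:port, each open or TIME-WAIT connection holds a distinct local port, so the range size caps them. On recent Linux kernels the default is 32768-60999, i.e. 28,232 ports. A sketch (`ephemeral_span` is a hypothetical helper):

```sh
# ephemeral_span [RANGE_FILE]: hypothetical helper; prints how many local ports the range allows.
ephemeral_span() {
  local file=${1:-/proc/sys/net/ipv4/ip_local_port_range} low high
  read -r low high < "$file" || return 2    # file holds "LOW<tab>HIGH"
  echo $(( high - low + 1 ))
}

# Compare against sockets currently parked in TIME-WAIT (output includes one header line):
#   ss -tan state time-wait | wc -l
# Widening the range needs root; the values below are only an example:
#   sysctl -w net.ipv4.ip_local_port_range="15000 65000"
```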

Quick reference

curl -v URL ; nc -vz host port ; ss -ltnp
ping host ; traceroute host ; mtr host
kubectl get endpoints <svc> ; kubectl get networkpolicy -n <ns>
© cvam — written in plaintext, served warm