Network Engineer Interview Questions

Network-engineer questions covering the OSI/TCP-IP stack, routing/switching, DNS, TLS, and real-world troubleshooting, graded easy → hard with full answers. Pair with the Networking cheatsheet.

Easy — fundamentals

Walk through the OSI model layers. easy

Seven layers, bottom-up: 1 Physical (bits on wire/radio), 2 Data Link (frames, MAC, switches, Ethernet/ARP), 3 Network (packets, IP, routing), 4 Transport (segments, TCP/UDP, ports), 5 Session, 6 Presentation (encoding/encryption/TLS), 7 Application (HTTP, DNS, SMTP). In practice the TCP/IP model collapses it to Link, Internet, Transport, Application. The value is as a troubleshooting framework: isolate the problem layer by layer (cable → switch → IP/route → port → app).

TCP vs UDP — when use each? easy

TCP is connection-oriented and reliable: 3-way handshake, ordered delivery, retransmission, flow control, and congestion control — used where correctness matters (HTTP, SSH, DB). UDP is connectionless and best-effort: no handshake, no ordering/retransmit, lower overhead and latency — used for DNS, real-time voice/video, gaming, and as the base for QUIC/HTTP3. Rule of thumb: TCP when you can't lose data, UDP when speed/latency beats guaranteed delivery (or you handle reliability yourself).

What is a subnet / CIDR notation? easy

A subnet splits an IP network into smaller segments. CIDR (e.g. 10.0.0.0/24) puts a prefix length after the address: /24 means the first 24 bits are the network portion, leaving 8 bits for hosts → 256 addresses (254 usable; the first is the network address, the last is broadcast). Smaller prefix = bigger network (/16 = 65,536 addresses); larger prefix = smaller network (/30 = 4 addresses, 2 usable, typical for point-to-point links). Subnetting controls broadcast domains, routing, and IP allocation.
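A quick way to check subnet arithmetic is Python's standard `ipaddress` module. This sketch assumes IPv4 and treats /31 (RFC 3021 point-to-point) and /32 as having no separate network/broadcast address; `describe_subnet` is a hypothetical helper name.

```python
import ipaddress

def describe_subnet(cidr: str) -> dict:
    """Summarize an IPv4 subnet: size, usable hosts, network/broadcast addresses."""
    net = ipaddress.ip_network(cidr, strict=True)
    total = net.num_addresses
    # /31 and /32 have no separate network/broadcast address, so every address counts.
    usable = total if net.prefixlen >= 31 else total - 2
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "total": total,
        "usable": usable,
    }

for cidr in ("10.0.0.0/24", "10.0.0.0/16", "10.0.0.0/30"):
    print(cidr, describe_subnet(cidr))
```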

What happens when you type a URL and hit enter? easy

Classic question. (1) DNS resolution — resolve the hostname to an IP (browser/OS cache → resolver → root → TLD → authoritative). (2) TCP handshake to the IP on port 443 (SYN/SYN-ACK/ACK). (3) TLS handshake — negotiate cipher, validate the certificate, derive session keys. (4) HTTP request sent; server responds. (5) Browser parses HTML, fetches sub-resources (more DNS/TCP/TLS or reused connections), renders. Routing (IP, NAT, BGP between networks) and ARP (MAC resolution on the local link) happen under the transport steps.

What is DNS and what are the common record types? easy

DNS maps human-readable names to addresses and other data. Records: A (name → IPv4), AAAA (→ IPv6), CNAME (alias to another name), MX (mail servers), TXT (arbitrary text: SPF/DKIM/verification), NS (delegated nameservers), SOA (zone authority), PTR (reverse lookup). TTL controls caching duration. Resolution is hierarchical: the client sends a recursive query to a resolver, which (on a cache miss) iteratively queries root → TLD → authoritative servers.
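To show where the record type lives on the wire, here is a minimal sketch that builds (but does not send) a DNS query in the RFC 1035 message format: a 12-byte header followed by one question whose QTYPE selects the record type. `build_query` and the fixed query ID are illustrative assumptions; real clients randomize the ID.

```python
import struct

# QTYPE codes from RFC 1035 (AAAA from RFC 3596).
RECORD_TYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
                "MX": 15, "TXT": 16, "AAAA": 28}

def encode_name(name: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, rtype: str, query_id: int = 0x1234) -> bytes:
    """Header (ID, flags, QD/AN/NS/AR counts) plus one question of class IN (1)."""
    flags = 0x0100  # standard query with RD (recursion desired) set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", RECORD_TYPES[rtype], 1)
    return header + question

print(build_query("example.com", "AAAA").hex())
```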

OSI vs TCP/IP model? easy

OSI has 7 layers (physical→application); TCP/IP collapses to 4 (link, internet, transport, application). OSI teaches, TCP/IP is implemented.

TCP vs UDP? easy

TCP: connection-oriented, reliable, ordered, flow/congestion control. UDP: connectionless, best-effort, low overhead — DNS, streaming, QUIC.

Explain the TCP three-way handshake. easy

SYN → SYN-ACK → ACK establishes a connection and syncs sequence numbers before data flows.

What is CIDR / a subnet mask? easy

Splits an IP address into network and host bits; /24 means 24 network bits (mask 255.255.255.0), defining the subnet's address range.

What is NAT? easy

Maps private addresses to a public one (and back), letting many hosts share an IP and hiding internal addressing.

DNS main record types? easy

A/AAAA (IP), CNAME (alias), MX (mail), TXT (verification), NS (delegation); TTL controls caching.

What is a default gateway? easy

The router a host sends packets to when the destination is outside its local subnet.

L4 vs L7 load balancer? easy

L4 routes on IP/port (fast, protocol-agnostic); L7 understands HTTP for path/host routing, TLS termination, header rules.

What are MTU and MSS? easy

MTU is the largest L2 payload (1500 bytes on Ethernet); MSS is the largest TCP payload per segment (MTU minus IP and TCP headers, e.g. 1460 for IPv4). A mismatch causes fragmentation or black holes.

What is ARP? easy

Maps an IP to a MAC on the local network so L2 frames can be delivered.

Medium — applied

Explain the TCP three-way handshake and connection teardown. medium

Open: client sends SYN (with its initial sequence number) → server replies SYN-ACK (its own ISN, acknowledging the client's ISN + 1) → client sends ACK. Now both sides have synchronized sequence numbers and the connection is established. Close: the four-way FIN/ACK exchange, where each side sends a FIN and gets an ACK (connections are full-duplex, so each direction closes independently). The side that closes first (the active closer) enters TIME_WAIT (~2×MSL) so delayed packets from the old connection expire and the final ACK can be resent if it was lost; many TIME_WAIT sockets on a busy server are normal but can exhaust ephemeral ports.

Connection refused vs connection timeout vs reset — what does each tell you? medium

Connection refused: you reached the host but nothing is listening on that port (or it actively rejected) — the host sent a TCP RST. App down / wrong port / bound to 127.0.0.1 not 0.0.0.0. Timeout: no response at all — packets dropped silently, usually a firewall/security-group, wrong route, or the host is down/unreachable (SYN gets no SYN-ACK). Connection reset (RST mid-stream): the connection was established then abruptly killed — app crash, idle timeout on a LB/firewall, or a proxy closing it. The distinction localizes the fault: refused = reachable but no listener; timeout = network/firewall path; reset = something killed an existing connection.
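A minimal sketch of telling these outcomes apart from code, using only Python's `socket` module. The `probe` helper is hypothetical; it maps `ConnectionRefusedError` to refused and a connect timeout to timeout. Mid-stream resets normally surface later as `ConnectionResetError` from `recv()`/`send()`, not during `connect()`.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The SYN was answered with RST: host reachable, nothing listening.
        return "refused"
    except socket.timeout:
        # No SYN-ACK within the timeout: silent drop, bad route, or host down.
        return "timeout"
    except ConnectionResetError:
        # Rare during connect; usually seen on recv()/send() after setup.
        return "reset"
    except OSError as exc:
        # e.g. "No route to host" or "Network is unreachable".
        return f"other: {exc}"

if __name__ == "__main__":
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]
    print(probe("127.0.0.1", port))   # a listener exists
    listener.close()
    print(probe("127.0.0.1", port))   # same port, nobody listening now
```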

What is NAT and why does it exist? medium

Network Address Translation rewrites IP addresses (and ports, for PAT) as packets cross a boundary — typically mapping many private addresses (RFC1918, e.g. 10.x, 192.168.x) to one public IP. It exists primarily to conserve scarce IPv4 addresses and to hide internal topology. Source NAT (outbound, masquerade) lets private hosts reach the internet via a shared public IP; destination NAT (port forwarding) exposes an internal service. It breaks end-to-end addressing (hence needing port forwarding / hole punching for inbound), which IPv6 largely removes.
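A toy model of source NAT with port translation (PAT) makes the "breaks end-to-end addressing" point concrete: outbound flows create mappings, and an inbound packet with no mapping has nowhere to go. This is a hypothetical sketch; real NAT devices key mappings on the full 5-tuple (protocol, addresses, ports), expire idle entries, and differ in mapping behavior. 203.0.113.7 is a documentation address.

```python
import itertools
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)

class PatTable:
    """Toy source NAT / PAT: many private endpoints share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._out: Dict[Endpoint, int] = {}   # private (ip, port) -> public port
        self._in: Dict[int, Endpoint] = {}    # public port -> private (ip, port)

    def outbound(self, private_src: Endpoint) -> Endpoint:
        """Rewrite a private source to (public_ip, translated port), creating a mapping."""
        if private_src not in self._out:
            port = next(self._next_port)
            self._out[private_src] = port
            self._in[port] = private_src
        return (self.public_ip, self._out[private_src])

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Translate a reply back; None means no mapping, so the packet is dropped."""
        return self._in.get(public_port)

nat = PatTable("203.0.113.7")
print(nat.outbound(("10.0.0.5", 51000)))
print(nat.outbound(("10.0.0.6", 51000)))  # same private port, different host
print(nat.inbound(40000), nat.inbound(40999))
```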

How does TLS establish a secure connection? medium

TLS provides confidentiality, integrity, and authentication. Handshake (TLS 1.3, simplified): client sends ClientHello with supported ciphers + a key-share; server responds with its key-share, certificate, and a signature; both derive a shared session key via (EC)DHE — giving forward secrecy. The client verifies the server's certificate chains to a trusted CA, matches the hostname, and isn't expired/revoked. Then symmetric encryption (AES-GCM/ChaCha20) protects the data. TLS 1.3 cut round-trips (1-RTT, optional 0-RTT) and removed weak ciphers. mTLS adds a client certificate so both sides authenticate.
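The key-exchange step can be illustrated with a toy finite-field Diffie-Hellman exchange. This is an insecure teaching sketch with a 64-bit modulus, showing only the core idea: each side sends a public key share and both compute the same secret. TLS 1.3 actually uses standardized groups such as X25519 or P-256 and feeds the shared secret into an HKDF-based key schedule.

```python
import secrets

# Toy parameters (INSECURE): 2**64 - 59 is the largest prime below 2**64.
P = 2**64 - 59
G = 5

def keypair() -> tuple:
    """Ephemeral private exponent and the public share G**x mod P."""
    private = secrets.randbelow(P - 2) + 1   # 1 <= private <= P - 2
    return private, pow(G, private, P)

def shared_secret(my_private: int, their_public: int) -> int:
    """(G**b)**a == (G**a)**b mod P, so both sides compute the same value."""
    return pow(their_public, my_private, P)

client_priv, client_share = keypair()   # share would travel in ClientHello
server_priv, server_share = keypair()   # share would travel in ServerHello
assert shared_secret(client_priv, server_share) == shared_secret(server_priv, client_share)
# Forward secrecy comes from discarding the ephemeral private values after use.
```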

What is the difference between a switch, a router, and a load balancer? medium

A switch operates at L2 — forwards Ethernet frames within a LAN using MAC addresses (one broadcast domain per VLAN). A router operates at L3 — forwards IP packets between networks, making path decisions via routing tables/protocols (and often NAT/firewalling). A load balancer distributes connections/requests across backend servers: L4 (TCP/UDP, by IP:port) or L7 (HTTP-aware — routes by path/host, terminates TLS, does health checks and sticky sessions). Switch = within a network, router = between networks, LB = spread load across servers.

What is BGP and why does it matter? medium

The internet's inter-domain routing protocol exchanging reachability between ASes; misconfigs/leaks cause big outages and hijacks.

Explain TCP congestion control. medium

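Reno and CUBIC grow the congestion window through slow start and congestion avoidance and cut it on packet loss; BBR instead paces sending from measured bottleneck bandwidth and RTT. All aim to use available capacity without overwhelming the network.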
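A simplified, per-RTT model of Reno-style AIMD shows the shape of the congestion window: exponential growth in slow start, linear growth in congestion avoidance, and a halving on loss. This is a hypothetical sketch that counts whole segments per RTT and models only fast-retransmit-style loss; real stacks track bytes and ACKs, fall back to a small window on a retransmission timeout, and CUBIC/BBR follow different rules.

```python
from typing import Iterable, List

def reno_cwnd(rtts: int, loss_at: Iterable[int] = (), ssthresh: float = 64.0) -> List[float]:
    """Congestion window (in segments) at the start of each RTT."""
    losses = set(loss_at)
    cwnd = 1.0
    history: List[float] = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in losses:
            ssthresh = max(cwnd / 2, 2.0)    # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 segment per RTT
    return history

print(reno_cwnd(10, loss_at=[7], ssthresh=16))
```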

What causes a Path MTU blackhole and how to fix? medium

A smaller-MTU link plus blocked ICMP 'fragmentation needed' silently drops large packets; fix by allowing ICMP, MSS clamping, or lowering MTU.

What is anycast and where is it used? medium

One IP advertised from many locations; routing picks the nearest — used by DNS roots, CDNs, DDoS mitigation.

Symmetric vs asymmetric routing — why care? medium

Asymmetric paths can break stateful firewalls/NAT that expect both directions, dropping connections.

How does a CDN reduce latency? medium

Caches content at edge PoPs near users, terminates TLS close by, offloads origin — controlled via cache headers/TTLs.

What is QUIC / HTTP3? medium

UDP-based transport with built-in TLS 1.3, multiplexed streams without transport-level head-of-line blocking, and 1-RTT connection setup (0-RTT on resumption), so connections typically start faster than with TCP+TLS.

How does a stateful firewall / security group work? medium

Tracks connection state so return traffic for an allowed outbound flow is auto-permitted; stateless ACLs need both directions + ephemeral ports.
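A minimal sketch of connection tracking: allowing an outbound flow records it, and an inbound packet is admitted only if it is the reverse of a tracked flow. `Flow` and `StatefulFirewall` are hypothetical names; real conntrack also tracks TCP state and timeouts. The same model explains the asymmetric-routing failure above: if replies come back through a different firewall, its table is empty and they are dropped.

```python
from typing import NamedTuple, Set

class Flow(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "Flow":
        """The same connection seen from the other direction."""
        return Flow(self.proto, self.dst, self.dport, self.src, self.sport)

class StatefulFirewall:
    def __init__(self, allowed_outbound_ports: Set[int]) -> None:
        self.allowed_outbound_ports = allowed_outbound_ports
        self.conntrack: Set[Flow] = set()

    def outbound(self, flow: Flow) -> bool:
        if flow.dport in self.allowed_outbound_ports:
            self.conntrack.add(flow)   # remember it so the replies are admitted
            return True
        return False

    def inbound(self, flow: Flow) -> bool:
        # No inbound rules at all: only replies to tracked flows get through.
        return flow.reversed() in self.conntrack
```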

What is VLAN segmentation? medium

Logically partitions one physical switch into isolated broadcast domains, separating traffic/tenants without separate hardware.
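VLAN membership is carried in the 802.1Q tag, a 4-byte field inserted into the Ethernet frame after the source MAC. This sketch packs and unpacks that tag: a 0x8100 TPID followed by the TCI (3-bit priority, 1-bit DEI, 12-bit VLAN ID). The function names are hypothetical.

```python
import struct
from typing import Tuple

TPID_8021Q = 0x8100

def vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build an 802.1Q tag: TPID, then TCI = PCP(3 bits) | DEI(1) | VID(12)."""
    if not 1 <= vid <= 4094:   # VIDs 0 and 4095 are reserved
        raise ValueError("VLAN ID must be 1-4094")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("bad PCP/DEI")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_vlan_tag(tag: bytes) -> Tuple[int, int, int]:
    """Return (pcp, dei, vid) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF

print(vlan_tag(100, pcp=5).hex())
```

A switch forwards a tagged frame only to ports that are members of that VID, which is what keeps the broadcast domains separate.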

Hard — senior & troubleshooting

"The site is slow" — how do you methodically diagnose a network performance problem? hard

Localize layer by layer and separate latency from bandwidth from loss. (1) Is it DNS? Time resolution (dig); slow DNS adds delay to every uncached lookup. (2) Path/latency: ping (RTT, loss) and mtr/traceroute to find where latency or loss spikes along the hops. (3) TCP behavior: ss -ti or a packet capture (tcpdump/Wireshark); look for retransmissions (loss), zero-window (receiver slow), high RTT, or a small congestion window, and for an MTU/MSS mismatch causing fragmentation or PMTUD black-holing. (4) TLS: handshake time, no session resumption. (5) App/server: is the server slow (TTFB) rather than the network? Distinguish: high RTT everywhere = distance/routing; loss = retransmissions and erratic throughput; bandwidth cap = throughput plateaus; intermittent = one bad hop or overloaded device. Always compare against a known-good path/client.
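A first-pass split of "slow" into name resolution versus TCP connect time can be scripted with the standard library. This sketch (hypothetical `time_phases` helper) only times `getaddrinfo` and the TCP handshake to the first returned address; TLS handshake time and TTFB need separate measurement, e.g. with curl's `-w` timing variables.

```python
import socket
import time
from typing import Dict

def time_phases(host: str, port: int, timeout: float = 5.0) -> Dict[str, float]:
    """Return DNS and TCP-connect durations in milliseconds."""
    t0 = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()
    family, socktype, proto, _canonname, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)   # returns once the three-way handshake completes
        t2 = time.perf_counter()
    return {"dns_ms": (t1 - t0) * 1000, "connect_ms": (t2 - t1) * 1000}
```

Connect time is roughly one network RTT, so a large `connect_ms` with a small `dns_ms` points at the path rather than name resolution.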

What is BGP and why is it critical (and fragile) for the internet? hard

BGP (Border Gateway Protocol) is the routing protocol between autonomous systems (ASes) — it's how networks announce which IP prefixes they can reach and choose paths across the internet. It's policy-based (not shortest-path): operators prefer routes by business relationships, AS-path length, local preference, etc. It's fragile because it historically trusts announcements: a misconfigured or malicious AS can announce prefixes it doesn't own (route hijack) or leak routes, blackholing or intercepting traffic — famous outages came from fat-fingered BGP. Mitigations: RPKI (cryptographically validate origin), route filtering, prefix limits, IRR. Internally, iBGP distributes external routes within an AS. It's "the protocol that holds the internet together," and also the one that periodically breaks it.

Two pods/hosts can't talk and there's no obvious error. How do you isolate the break? hard

Work the layers from local outward. (1) Name vs IP: does it fail by hostname only? → DNS (dig, resolver config, search domains, in k8s CoreDNS/kube-dns). (2) L3 reachability: ping the IP; check routing (ip route) and that the IP/subnet is correct. (3) L4: nc -vz host port / ss -ltn on the target — is the service listening, and on 0.0.0.0 vs 127.0.0.1? (4) Firewall/policy: security groups/NACLs (cloud), iptables/nftables, k8s NetworkPolicy default-deny, conntrack. (5) Path: traceroute/tcpdump on both ends to see where packets vanish (sent but not received = drop in between; SYN with no SYN-ACK = filtered). (6) MTU: large packets fail but small succeed → MTU/PMTUD black hole (common with VPN/overlay/tunnels). Capturing on both sides and seeing which direction's packets are missing is the fastest way to pin the responsible hop.

Explain MTU, MSS, and how a mismatch causes "works for small requests, hangs on large ones." hard

MTU is the largest L2 frame payload (typically 1500 bytes on Ethernet; less over tunnels/VPN/overlays due to encapsulation overhead). MSS is the max TCP segment payload; each side advertises its own MSS in its SYN, derived from its interface MTU (MTU minus 40 bytes for IPv4 without options). If a path has a smaller MTU than the endpoints assume, large packets must be fragmented or, if the Don't-Fragment bit is set, the router drops them and sends an ICMP "fragmentation needed" so the sender lowers its packet size (Path MTU Discovery). When that ICMP is blocked by a firewall (common), the sender never learns: small packets (handshake, short requests) get through, but large packets (a big POST/response) silently vanish and the connection hangs. Fixes: allow ICMP type 3 code 4 (ICMPv6 "Packet Too Big" for IPv6), or MSS clamping on the gateway to advertise a safe segment size. This is a classic overlay/VPN/Kubernetes-CNI gotcha.
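The arithmetic behind "works small, hangs large" is header subtraction. This sketch assumes option-free IPv4 (20 bytes), IPv6 (40 bytes), and TCP (20 bytes) headers, and uses a 50-byte VXLAN overlay overhead as an example (which is why 1450 is a common VXLAN pod MTU). `mss_for` and `clamp_mss` are hypothetical helpers; real clamping on a gateway rewrites the MSS option in forwarded SYNs.

```python
IPV4_HEADER = 20   # bytes, no options
IPV6_HEADER = 40
TCP_HEADER = 20    # bytes, no options (options such as timestamps take more)

def mss_for(mtu: int, ipv6: bool = False, encap_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - encap_overhead - ip_header - TCP_HEADER

def clamp_mss(advertised_mss: int, path_mtu: int) -> int:
    """What an MSS-clamping gateway would write into a forwarded IPv4 SYN."""
    return min(advertised_mss, mss_for(path_mtu))

print(mss_for(1500))                      # 1460: plain Ethernet, IPv4
print(mss_for(1500, ipv6=True))           # 1440
print(mss_for(1500, encap_overhead=50))   # 1410: inside a VXLAN overlay
print(clamp_mss(1460, path_mtu=1450))     # 1410
```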

How does Kubernetes networking work (pod-to-pod, service, ingress)? hard

The k8s model: every Pod gets its own routable IP and all pods can reach each other without NAT (the CNI plugin, e.g. Calico or Cilium, implements this via an overlay such as VXLAN/Geneve or via native routing). Services give a stable virtual IP (ClusterIP) in front of a changing set of pod IPs; kube-proxy (iptables/IPVS) or eBPF (Cilium) load-balances connections to the service across backend pod endpoints. DNS (CoreDNS) resolves <service>.<namespace>.svc.cluster.local to the ClusterIP. External access: NodePort (port on every node), LoadBalancer (cloud LB → NodePort), or an Ingress/Gateway controller (L7 routing by host/path, TLS termination) fronting Services. NetworkPolicies add L3/L4 firewalling between pods (default is allow-all until a policy selects them). Debugging usually comes down to: DNS, whether Endpoints are populated, kube-proxy rules, NetworkPolicy, and MTU on the overlay.

Design a network for an HA, low-latency global app. hard

Anycast/GeoDNS to nearest region, CDN at edge, multi-AZ LBs with health checks, redundant paths, sane MTU/MSS, DDoS protection, segmented tiers with firewalls.

How do you mitigate a volumetric DDoS? hard

Upstream scrubbing/anycast absorption (CDN/DDoS service), edge rate limiting, drop malformed traffic, SYN cookies, capacity headroom — can't absorb volumetric at origin alone.

What is buffer bloat and how do you fix it? hard

Oversized network buffers queue packets, inflating latency under load; fix with AQM (CoDel/FQ-CoDel) and right-sized buffers/BBR.
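The latency cost of an oversized buffer is drain time: queued bytes × 8 ÷ link rate. This worked sketch (hypothetical helper name) compares the same 1 MB of queued data on a fast and a slow link; the slow-link case is why AQMs like CoDel aim to keep standing queue delay to a few milliseconds instead of letting buffers fill.

```python
def queueing_delay_ms(queued_bytes: int, link_bps: float) -> float:
    """Time for the last queued packet to drain from a FIFO queue, in ms."""
    return queued_bytes * 8 * 1000 / link_bps

one_mb = 1_000_000
print(queueing_delay_ms(one_mb, 1e9))    # 1 Gbps link
print(queueing_delay_ms(one_mb, 10e6))   # 10 Mbps uplink: same buffer, 100x the delay
```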

How does ECMP load balancing work? hard

Equal-Cost Multi-Path hashes flows across multiple equal-cost routes (per-flow to preserve ordering), spreading load and adding path redundancy.
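Per-flow ECMP can be sketched as hashing the 5-tuple and taking the result modulo the number of next hops, so every packet of a flow follows the same path. SHA-256 here is an illustrative stand-in; routers use hardware hash functions with vendor-specific inputs and seeds, and the next-hop names are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple

FiveTuple = Tuple[str, str, int, str, int]  # (proto, src_ip, src_port, dst_ip, dst_port)

def ecmp_next_hop(flow: FiveTuple, next_hops: Sequence[str]) -> str:
    """Pick a next hop deterministically from the flow's 5-tuple."""
    key = "|".join(map(str, flow)).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]

hops = ["spine1", "spine2", "spine3", "spine4"]
flow = ("tcp", "10.0.0.5", 51000, "10.1.0.9", 443)
assert ecmp_next_hop(flow, hops) == ecmp_next_hop(flow, hops)  # same flow, same path
print(ecmp_next_hop(flow, hops))
```

A plain modulo reshuffles many flows when a path is added or removed; resilient or consistent hashing reduces that churn.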

Explain TCP window scaling and BDP. hard

Bandwidth-Delay Product = bandwidth × RTT is the in-flight data needed to fill a link; window scaling raises the max window beyond 64KB so long-fat links aren't throughput-capped.
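A worked example of the BDP reasoning, assuming a 1 Gbps path with 100 ms RTT. Helper names are hypothetical; the 65,535-byte limit and the maximum shift of 14 come from TCP's 16-bit window field and the window scale option (RFC 7323).

```python
MAX_UNSCALED_WINDOW = 65_535   # bytes; 16-bit window field
MAX_WINDOW_SHIFT = 14          # RFC 7323 limit

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8

def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """At most one window can be sent per round trip."""
    return window_bytes * 8 / rtt_s

def window_shift_needed(bdp: float) -> int:
    """Smallest scale shift with 65535 << shift >= BDP."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < bdp:
        shift += 1
        if shift > MAX_WINDOW_SHIFT:
            raise ValueError("BDP exceeds the maximum scaled window")
    return shift

bdp = bdp_bytes(1e9, 0.100)                                       # 12.5 MB
cap = window_limited_throughput_bps(MAX_UNSCALED_WINDOW, 0.100)   # about 5.2 Mbps
print(f"BDP {bdp:,.0f} B, unscaled cap {cap / 1e6:.2f} Mbps, shift {window_shift_needed(bdp)}")
```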

How do you design DNS for HA and fast failover? hard

Multiple authoritative NS across providers, health-checked failover/weighted/latency routing, sane TTLs (low enough for failover, high enough to cache), and anycast.

What is segment routing / MPLS used for? hard

Label-based forwarding for traffic engineering — steering flows along chosen paths for QoS, fast reroute, and predictable latency in carrier/large networks.

How do you secure traffic in a zero-trust network? hard

mTLS everywhere, per-request authn/authz, microsegmentation/network policies, no implicit trust by location, and continuous verification + logging.

How does NAT traversal (STUN/TURN/ICE) work? hard

STUN discovers your public mapping, TURN relays when direct fails, ICE tries candidate pairs to find a working path — used for P2P/VoIP through NATs.
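The STUN step can be sketched from the RFC 5389 message format: a Binding request is a 20-byte header (type 0x0001, zero body length, the fixed magic cookie 0x2112A442, a random 96-bit transaction ID), and the server's XOR-MAPPED-ADDRESS attribute reports the public address XORed with the cookie. This sketch only builds and decodes bytes (it does not contact a STUN server); the function names are hypothetical.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001

def stun_binding_request(txn_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header with no attributes."""
    txn_id = os.urandom(12) if txn_id is None else txn_id
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 96 bits")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id

def decode_xor_mapped_ipv4(x_port: int, x_addr: int) -> Tuple[str, int]:
    """Undo the XOR: port with the cookie's top 16 bits, IPv4 address with the whole cookie."""
    port = x_port ^ (MAGIC_COOKIE >> 16)
    addr = ipaddress.IPv4Address(x_addr ^ MAGIC_COOKIE)
    return str(addr), port
```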

How do you troubleshoot asymmetric latency (one direction slow)? hard

Compare forward/return paths (mtr both ways), check per-direction congestion/queueing, routing asymmetry, and one-way buffer bloat — RTT averages hide direction-specific issues.

Scenario-based

A service is intermittently unreachable. How do you troubleshoot layer by layer? hard

Work bottom-up through the stack. L3: ping the host (reachability, loss, RTT). Path: traceroute to see where it breaks. DNS: does the name resolve to the right IP, or is it flapping between good and bad endpoints? L4: telnet/nc the port: is it open? LB: are backends passing health checks (intermittent = one bad backend in rotation)? MTU: are large packets dropped (path MTU / fragmentation)? Capture with tcpdump at both ends to localize. "Intermittent" usually means one unhealthy member or a flapping route/DNS record.

Latency between two data centers is high. How do you diagnose? medium

traceroute/mtr to find which hop adds latency or loss. Check for MTU/fragmentation (path MTU discovery blackholes), congestion (latency rises under load), routing (suboptimal/asymmetric path, recent route change), and TCP window / BDP (small window caps throughput on long-fat links — tune window scaling). Confirm propagation delay vs queuing delay. Distinguish a physics floor (distance) from a fixable bottleneck.

DNS resolves correctly but the connection still fails. What's wrong? medium

Name→IP works, so the problem is L3/L4+. Check a firewall / security group / NACL blocking the port, the service down or not listening on that port/interface, routing (no path/return path), or the resolved IP being stale/wrong (old record, wrong record type). Test with nc -vz host port and tcpdump to see if SYNs leave and whether anything comes back (silent drop = firewall; RST = nothing listening).

You see intermittent packet loss. How do you find the cause? hard

mtr/ping over time to quantify and locate loss per hop. Check interface errors/discards (ifconfig/ethtool -S, switch counters) — CRC/errors point to bad cable/NIC/duplex mismatch. Look at congestion (loss under load = queue drops), MTU issues (only large packets lost), and buffer/rate limits. Capture with tcpdump to see retransmits. Duplex mismatch and a flaky cable/optic are classic intermittent-loss culprits.

Design the network for a highly available web app. What do you include? hard

Multi-AZ redundancy with a load balancer fronting backends in each zone, health checks to route around failures, and redundant paths (no single switch/router/link SPOF). DNS with health-checked failover (and possibly anycast/GeoDNS) for region resilience. Segment tiers (public/private/data subnets), firewalls/SGs between them, and sane MTU. For internet edge: CDN + DDoS protection. Capacity headroom and graceful degradation. Match redundancy to the failure domain you're protecting against.

TLS handshakes are failing. How do you debug? medium

openssl s_client -connect host:443 -servername name shows the real failure. Check: certificate chain (missing intermediates), expiry, SNI (wrong cert served for the hostname), protocol/cipher mismatch (client requires TLS 1.2+ and the server is too old, or vice versa), and clock skew (cert "not yet valid"). Also a MITM/proxy intercepting, or a hostname not matching the SAN. The s_client output (verify code, served chain, negotiated version) pinpoints which.

Site is 'sometimes' down for users. Isolate it. hard

Check DNS (flapping/TTL), one bad LB backend, a flapping route, or MTU; reproduce from multiple locations, tcpdump both ends.

Throughput far below link speed. Diagnose. medium

Likely TCP window/BDP cap on a long-fat link (enable window scaling), loss-triggered backoff, or MTU; measure RTT/loss, tune window, check retransmits.

DNS changes not taking effect for some users. Why? medium

Cached records — TTL too high; lower TTL before changes, verify authoritative vs cached, and note some resolvers ignore low TTLs.
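Why old answers linger is easy to see with a toy resolver cache, assuming a record first published with a 3600 s TTL and then changed. `TtlCache`, `lookup`, and the addresses (from the 192.0.2.0/24 documentation range) are hypothetical; time is passed in explicitly so the behavior is deterministic.

```python
from typing import Dict, Optional, Tuple

class TtlCache:
    """Resolver-style cache: serve an answer until its TTL expires."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expires_at)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        return None

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        self._entries[name] = (address, now + ttl)

def lookup(cache: TtlCache, zone: Dict[str, Tuple[str, float]], name: str, now: float) -> str:
    """Answer from cache if fresh, otherwise ask the 'authoritative' zone and cache it."""
    cached = cache.get(name, now)
    if cached is not None:
        return cached
    address, ttl = zone[name]
    cache.put(name, address, ttl, now)
    return address

zone = {"www.example.com": ("192.0.2.10", 3600)}
cache = TtlCache()
print(lookup(cache, zone, "www.example.com", now=0))      # fetched, cached for 1 h
zone["www.example.com"] = ("192.0.2.20", 60)              # operator changes the record
print(lookup(cache, zone, "www.example.com", now=600))    # still the old address
print(lookup(cache, zone, "www.example.com", now=3600))   # TTL expired: new address
```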

Firewall 'allows' the port but traffic drops. Check? medium

Stateless ACL missing return/ephemeral ports, asymmetric routing breaking state, wrong direction, or another device; tcpdump to see SYNs/replies.

Intermittent packet loss only under load. Cause? hard

Congestion/queue drops or saturated link/NIC buffer; check interface errors/discards, duplex mismatch, QoS — congestion vs physical errors.

TLS works in browser but fails from a service. Debug. medium

Missing intermediate certs (services don't fetch them), wrong SNI, protocol/cipher mismatch, or untrusted CA; test with openssl s_client.

New microservice can't reach the DB. Walk layers. medium

DNS resolves? port reachable (nc)? SG/NACL ingress+egress? routing/subnet? DB listening + accepting from that source? Bisect bottom-up.

Latency spikes every few minutes on a stable link. Cause? hard

Periodic congestion (cron/backup), route flapping/BGP reconvergence, or buffer bloat; correlate timing with traffic, check queue depths.

Subnet ran out of IPs, new instances fail. Fix? medium

CIDR too small — add a secondary CIDR/larger subnet, reclaim unused IPs/ENIs short-term; plan non-overlapping ranges with headroom.
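Two checks help here: how many addresses a subnet really provides, and which larger range is still free. This sketch assumes AWS, which reserves 5 addresses in every subnet (other providers reserve different numbers), and uses hypothetical helper names with the standard `ipaddress` module.

```python
import ipaddress
from typing import Iterable, Optional

AWS_RESERVED_PER_SUBNET = 5  # assumption: AWS reserves the first four and the last address

def usable_ips(subnet_cidr: str, reserved: int = AWS_RESERVED_PER_SUBNET) -> int:
    """Addresses actually assignable to instances/ENIs."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - reserved

def first_free_subnet(vpc_cidr: str, existing: Iterable[str], new_prefix: int) -> Optional[str]:
    """First /new_prefix block inside the VPC range that overlaps no existing subnet."""
    vpc = ipaddress.ip_network(vpc_cidr)
    used = [ipaddress.ip_network(c) for c in existing]
    for candidate in vpc.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            return str(candidate)
    return None

print(usable_ips("10.0.1.0/24"))
print(first_free_subnet("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"], new_prefix=22))
```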

Users in one region see high latency, others fine. Diagnose. hard

Routing/peering issue to that region or no nearby PoP; check traceroute from that region, add anycast/CDN edge or a regional deployment, verify GeoDNS.

What industry actually asks

Network/infra and cloud-network loops mix fundamentals (OSI, TCP vs UDP, subnetting/CIDR math, DNS records, "what happens when you type a URL") with troubleshooting scenarios ("site is slow," "two hosts can't connect," "works small / hangs large" = MTU). Senior/cloud roles add BGP, TLS internals, load balancing, and cloud + Kubernetes networking (VPC, security groups vs NACLs, CNI, services, ingress). Subnetting math and a clean layer-by-layer debugging method are almost always tested — practice both.