
DEBUG GUIDE · LINUX · SRE PLAYBOOK

Debugging Ubuntu / Linux Server.

Server debugging walks the layers in order: can I reach it (SSH/network) → is the service up (systemd) → are resources OK (CPU/mem/disk) → what do the logs say (journalctl). For deep dives, see the CPU, Memory, and Disk guides.

Can't SSH in

# from your machine:
ping host ; nc -vz host 22         # reachable? port open (refused vs timeout)?
ssh -v user@host                   # verbose handshake — where it fails
# causes: firewall/SG, sshd down, wrong key or perms (private key must be 0600),
#         host out of memory or PIDs (sshd can't fork), full disk, fail2ban ban

If you have console access, run systemctl status ssh and journalctl -u ssh (the unit is named sshd on RHEL-family systems), then check /var/log/auth.log and free disk space with df -h.
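
The refused-vs-timeout distinction above can be scripted. The following is a minimal sketch, not a definitive tool: it assumes bash with /dev/tcp support and the coreutils timeout command, and the function names classify_connect and probe_port plus the 5 second cap are made up for illustration.

```shell
#!/usr/bin/env bash
# Map the exit status of a timed TCP connect attempt to a likely state.
classify_connect() {
  case "$1" in
    0)   echo "open" ;;      # something is listening; move on to ssh -v
    124) echo "timeout" ;;   # timeout(1) killed the attempt: packets dropped
    *)   echo "refused" ;;   # connect failed fast: refused, or name did not resolve
  esac
}

# probe_port HOST [PORT]: try a TCP connect with a 5 second cap (assumed value).
probe_port() {
  local host="$1" port="${2:-22}" status=0
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || status=$?
  classify_connect "$status"
}
```

Run it as probe_port host 22. A "timeout" result usually points at a firewall or security group dropping packets, while "refused" suggests the host is up but sshd is not listening.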

A service won't start

systemctl status myapp              # state + recent log lines
journalctl -u myapp -n 100 --no-pager
journalctl -u myapp -f              # follow
systemctl cat myapp                 # the unit file
systemctl daemon-reload             # after editing a unit

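Causes. Bad config (the service logs why); wrong ExecStart path; missing dependency or environment variable; port already in use; permission denied; or it crashed repeatedly and hit the systemd start limit (StartLimitBurst). Clear the start-limit state with systemctl reset-failed myapp, then start it again.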
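
For the "port already in use" case, a small filter over ss output shows who holds the port. This is a sketch under assumptions: it expects the default ss -ltn layout with a header line and the local address in the fourth column, and listeners_on_port and port 8080 are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Read `ss -ltn` output on stdin; print local addresses listening on the given port.
listeners_on_port() {
  awk -v p="$1" 'NR > 1 {
    n = split($4, a, ":")        # port is the last colon-separated field (works for [::]:8080 too)
    if (a[n] == p) print $4
  }'
}
```

Usage: ss -ltn | listeners_on_port 8080. If anything prints, rerun ss -ltnp as root to see the owning pid.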

Read the journal, not guesses. journalctl -u svc -n 100 almost always contains the exact failure line. Read it before changing anything.
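
When the journal output is long, a rough filter can surface the likely failure line first. This is only a sketch: failure_lines is a hypothetical helper and the pattern list is an assumption, so fall back to the full output if it prints nothing useful.

```shell
#!/usr/bin/env bash
# Filter journal text on stdin down to lines that usually name the failure.
# The pattern list is a guess at common wording, not an exhaustive set.
failure_lines() {
  grep -Ei 'error|fail|denied|address already in use|no such file|not found' || true
}
```

Usage: journalctl -u myapp -n 100 --no-pager | failure_lines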

Boot / won't come up

systemctl --failed                  # what failed at boot
journalctl -b -p err                # this boot, errors only
journalctl -b -1                    # previous boot (after a crash)
systemd-analyze blame               # slow units

Common causes: a bad /etc/fstab entry hangs boot (add nofail to non-critical mounts); a full / filesystem; a service stuck in a restart loop.
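
To find fstab entries that could hang boot, one option is to list mounts that lack nofail. A minimal sketch, assuming the standard six-column fstab layout; fstab_missing_nofail is a hypothetical helper, and whether a given mount should carry nofail remains a judgment call.

```shell
#!/usr/bin/env bash
# Print mount points from an fstab-format file that lack the 'nofail' option.
# Skips comments, blank lines, the root filesystem, and swap.
fstab_missing_nofail() {
  awk '$1 !~ /^#/ && NF >= 4 && $2 != "/" && $3 != "swap" &&
       ("," $4 ",") !~ /,nofail,/ { print $2 }' "${1:-/etc/fstab}"
}
```

Usage: fstab_missing_nofail /etc/fstab. On systems with a recent util-linux, findmnt --verify also checks fstab for syntax and device errors before you reboot.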

Resource pressure

uptime ; top ; htop               # load avg vs cores
free -m                            # 'available' is the real number
df -h ; df -i                      # disk space ; inodes
iostat -xz 1 ; iotop -o            # disk I/O
ss -ltnp                           # listening ports + pid
dmesg -T | tail                    # OOM kills, hardware, filesystem errors
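
The "load avg vs cores" and disk-space checks above can be reduced to two small helpers. This is a sketch with assumed names (load_per_core, full_filesystems) and an assumed 90% threshold; it parses df -P output and does not handle mount points containing spaces.

```shell
#!/usr/bin/env bash
# load_per_core LOAD CORES: print load average divided by core count.
# A sustained value above 1.00 suggests saturation (on Linux, load also counts I/O wait).
load_per_core() {
  awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# Read `df -P` output on stdin; print "mountpoint use%" for filesystems at or above THRESHOLD.
full_filesystems() {
  awk -v t="${1:-90}" 'NR > 1 {
    use = $5; sub(/%/, "", use)   # strip the percent sign from the Capacity column
    if (use + 0 >= t) print $6, $5
  }'
}
```

Usage: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" and df -P | full_filesystems 90. Run df -Pi through the same filter to catch inode exhaustion.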

Networking

ip a ; ip route                    # interfaces + routes
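cat /etc/netplan/*.yaml ; netplan try   # Ubuntu network config (try applies it, rolls back unless confirmed)
ss -tunap ; ufw status             # all sockets + pid ; firewall rules
dig name ; curl -v http://target   # DNS resolution ; HTTP reachability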

Packages & disk from apt

apt update && apt upgrade
dpkg --configure -a                # finish an interrupted dpkg run
apt --fix-broken install           # repair broken dependencies
journalctl --vacuum-time=7d        # shrink journal if /var full
apt clean ; du -sh /var/* | sort -rh | head

Where the logs are

Path / cmd                 What
journalctl                 systemd unit + kernel logs (primary)
/var/log/syslog            general system log
/var/log/auth.log          logins, sudo, sshd
dmesg                      kernel ring buffer (OOM, disk, hardware)
/var/log/cloud-init.log    cloud VM first-boot provisioning
© cvam — written in plaintext, served warm