Most AWS tickets are networking (security groups, routes, subnets) or IAM (not authorized). When something "can't connect" or is "denied", check those two first; as a rule of thumb they cause about 80% of it.
Can't SSH/connect to EC2
Checklist, in order:
- Security group: inbound rule for port 22 (or your app port) from your IP?
- Public IP / subnet: instance in a public subnet with a public IP, and the subnet routes 0.0.0.0/0 to an Internet Gateway?
- NACL: subnet NACL allows the port in both directions (stateless)?
- Key / user: right .pem (chmod 400), right user (ec2-user/ubuntu/admin)?
- Instance health: status checks passing? Or use SSM Session Manager (no SSH needed).
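The checklist above can be walked from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and credentials are configured; the function names (`ec2_net_summary`, `sg_inbound_rules`, `subnet_routes`, `ec2_status`) are hypothetical helpers invented for illustration, not part of the CLI.

```shell
# Hypothetical helpers wrapping real AWS CLI calls; the names are illustrative only.

# Subnet, public IP ("None" if absent) and first security group of an instance.
ec2_net_summary() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].[SubnetId,PublicIpAddress,SecurityGroups[0].GroupId]' \
    --output text
}

# Inbound rules of a security group: look for port 22 (or the app port) from your IP.
sg_inbound_rules() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[0].IpPermissions' --output json
}

# Routes of the route table explicitly associated with a subnet.
# An empty result usually means the subnet falls back to the VPC's main route table.
subnet_routes() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'RouteTables[].Routes[].[DestinationCidrBlock,GatewayId,NatGatewayId]' \
    --output text
}

# System and instance status checks (only running instances are returned by default).
ec2_status() {
  aws ec2 describe-instance-status --instance-ids "$1" \
    --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
    --output text
}
```

A typical pass is `ec2_net_summary <instance-id>`, then `sg_inbound_rules` and `subnet_routes` on the IDs it prints. For a public subnet, expect a `0.0.0.0/0` route whose gateway ID starts with `igw-`.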
SG stateful, NACL stateless
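Security groups are stateful: return traffic is allowed automatically. NACLs are stateless: replies need their own rule, so you must allow the ephemeral ports (1024–65535) outbound for replies to connections coming into the subnet. A connection that "works then hangs" often means the NACL is missing the return range.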
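To check for a missing return range, list the NACL entries in evaluation order and, if needed, add an outbound rule for the ephemeral ports. This is a sketch under assumptions: the helper names are hypothetical, and the default rule number `120` and the `0.0.0.0/0` destination are placeholders to adapt to your own rule set.

```shell
# Hypothetical helpers; rule number and CIDR below are assumptions, not recommendations.

# List a NACL's entries, lowest rule number first (the order in which they are evaluated).
nacl_entries() {
  aws ec2 describe-network-acls --network-acl-ids "$1" \
    --query 'sort_by(NetworkAcls[0].Entries, &RuleNumber)[].[RuleNumber,Egress,RuleAction,Protocol,PortRange.From,PortRange.To,CidrBlock]' \
    --output text
}

# Add an outbound allow rule for TCP replies to ephemeral ports.
# $1 = NACL id, $2 = rule number (defaults to 120, an arbitrary placeholder).
nacl_allow_ephemeral_egress() {
  aws ec2 create-network-acl-entry --network-acl-id "$1" \
    --egress --rule-number "${2:-120}" --protocol tcp \
    --port-range From=1024,To=65535 \
    --cidr-block 0.0.0.0/0 --rule-action allow
}
```

In the `nacl_entries` output, rows with `True` in the second column are outbound rules. If none of them covers 1024–65535 with `allow`, replies to inbound connections are dropped.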
"not authorized to perform" / AccessDenied
    aws sts get-caller-identity   # who am I, really?
    # decode the denial:
    aws iam simulate-principal-policy --policy-source-arn <arn> \
      --action-names s3:GetObject --resource-arns <arn>
Causes. Missing IAM permission; an explicit Deny (always wins);
S3 bucket policy / SCP / permission boundary overriding; wrong role assumed; resource in another
account. Read the error — it names the action + resource. Use CloudTrail to see the exact denied call.
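Two further lookups can help pin down a denial. The sketch below assumes the AWS CLI is configured; `decode_denial` and `recent_calls` are hypothetical helper names. Note that `lookup-events` only covers management events, so object-level calls such as `s3:GetObject` will not appear there unless data events are logged to a trail.

```shell
# Hypothetical helpers around real AWS CLI commands.

# Some services (EC2, for example) return "Encoded authorization failure message: <blob>".
# This decodes the blob; the caller needs sts:DecodeAuthorizationMessage.
decode_denial() {
  aws sts decode-authorization-message --encoded-message "$1" \
    --query DecodedMessage --output text
}

# Recent CloudTrail management events for one API call name.
# $1 = event name (e.g. RunInstances), $2 = how many to fetch (default 20, max 50).
recent_calls() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$1" \
    --max-results "${2:-20}" \
    --query 'Events[].CloudTrailEvent' --output json
}
```

For example, `recent_calls RunInstances | grep -E 'AccessDenied|UnauthorizedOperation'` narrows the output to denied calls, each of which carries the principal, action, and error message.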
No internet / can't reach a service
| Need | Requires |
|---|---|
| Public subnet → internet | route 0.0.0.0/0 → Internet Gateway + public IP |
| Private subnet → internet (outbound) | route 0.0.0.0/0 → NAT Gateway (in a public subnet) |
| Reach AWS APIs privately | VPC Endpoint (no NAT needed) |
| VPC ↔ VPC | peering / Transit Gateway + routes both sides |
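Each row of the table can be verified against what is actually deployed. The following is a sketch, assuming a configured AWS CLI; the three function names are hypothetical.

```shell
# Hypothetical helpers; each wraps one real AWS CLI describe call.

# Where does a subnet's default route point? Expect igw-... (public) or nat-... (private).
# Empty output: no explicit association (main route table in use) or no default route.
subnet_default_route() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0'][].[GatewayId,NatGatewayId,TransitGatewayId]" \
    --output text
}

# NAT gateways in a VPC with the subnet they live in and their state.
# Note: this command takes --filter (singular), unlike most ec2 describe calls.
vpc_nat_gateways() {
  aws ec2 describe-nat-gateways --filter "Name=vpc-id,Values=$1" \
    --query 'NatGateways[].[NatGatewayId,SubnetId,State]' --output text
}

# VPC endpoints in a VPC: type, service name and state.
vpc_endpoints() {
  aws ec2 describe-vpc-endpoints --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].[VpcEndpointId,VpcEndpointType,ServiceName,State]' \
    --output text
}
```

A NAT gateway that sits in a subnet whose own default route is not an Internet Gateway will not provide outbound access, so it is worth running `subnet_default_route` on the NAT gateway's subnet as well.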
ELB 5xx / unhealthy targets
    # Target group health = the usual culprit
    # check: health check path returns 200? SG allows LB → target port?
    # 503 from ALB = no healthy targets ; 504 = target too slow
Fix. Health-check path/port correct and returning 200; target SG allows the LB's SG on the app port; targets registered and passing.
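Target health and its failure reason can be read directly from the target group. A minimal sketch, assuming a configured AWS CLI; `target_health` and `alb_hint` are hypothetical names, and `alb_hint` only restates the two status codes discussed above.

```shell
# Hypothetical helpers.

# Per-target health state and reason for one target group.
target_health() {
  aws elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# Map an ALB status code to the first thing to check (pure shell, no AWS call).
alb_hint() {
  case "$1" in
    503) echo "no healthy targets: check target group health and registration" ;;
    504) echo "target too slow: check app latency and timeouts" ;;
    *)   echo "not covered here: check the load balancer access logs" ;;
  esac
}
```

A state of `unhealthy` with a timeout reason usually points at the security group or the port, while a response-code mismatch points at the health-check path.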
Can't connect to RDS
- RDS security group must allow your source SG/IP on the DB port (5432/3306).
- Same VPC or peered + routes; publicly accessible flag if connecting from outside.
- Hitting max_connections? Use RDS Proxy / a pooler.
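The points above can be checked in a few commands. This is a sketch under assumptions: a configured AWS CLI, an `nc` that supports `-z` and `-w` (most netcat builds do, but not all), and hypothetical helper names.

```shell
# Hypothetical helpers.

# Endpoint address, port, and the publicly-accessible flag of a DB instance.
rds_endpoint() {
  aws rds describe-db-instances --db-instance-identifier "$1" \
    --query 'DBInstances[0].[Endpoint.Address,Endpoint.Port,PubliclyAccessible]' \
    --output text
}

# Security groups attached to the DB instance (inspect their inbound rules next).
rds_security_groups() {
  aws rds describe-db-instances --db-instance-identifier "$1" \
    --query 'DBInstances[0].VpcSecurityGroups[].VpcSecurityGroupId' \
    --output text
}

# TCP probe with a 5-second timeout: $1 = host, $2 = port.
# Returns 0 if the port accepts a connection, non-zero otherwise.
db_port_open() {
  nc -z -w 5 "$1" "$2"
}
```

Run `db_port_open` from the machine that cannot connect. A probe that times out (rather than being refused) is typically a security group or routing problem, not a database one.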
Lambda errors / timeouts
    # logs are in CloudWatch Logs /aws/lambda/<fn>
    # Task timed out -> raise timeout or fix slow downstream
    # permission errors -> the function's execution ROLE lacks the action
    # in a VPC + needs internet -> route via NAT (Lambda in private subnets)
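The notes above translate into three commands. A minimal sketch, assuming AWS CLI v2 (`aws logs tail` is not available in v1); the helper names are hypothetical.

```shell
# Hypothetical helpers.

# Recent log lines for a function: $1 = function name, $2 = look-back window (default 15m).
lambda_logs() {
  aws logs tail "/aws/lambda/$1" --since "${2:-15m}"
}

# Timeout (seconds), memory, execution role, and VPC subnets (empty if not in a VPC).
lambda_config() {
  aws lambda get-function-configuration --function-name "$1" \
    --query '[Timeout,MemorySize,Role,VpcConfig.SubnetIds]' --output json
}

# Raise the timeout: $1 = function name, $2 = seconds (Lambda allows at most 900).
lambda_set_timeout() {
  aws lambda update-function-configuration --function-name "$1" --timeout "$2"
}
```

If `lambda_config` shows subnet IDs and the function needs the internet, those subnets need a default route to a NAT Gateway. The role ARN it prints is the one to inspect for permission errors.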
Where to look
- CloudWatch Logs/Metrics: app + service logs, alarms.
- CloudTrail: API activity (who did what, and why it was denied). Management events are recorded by default; data events such as S3 object reads must be enabled on a trail.
- VPC Flow Logs: accepted/rejected packets (prove a SG/NACL drop).
- VPC Reachability Analyzer: path test between two resources.
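The last two tools can also be driven from the CLI. The sketch below rests on assumptions: flow logs are delivered to a CloudWatch Logs group (they can also go to S3) in the default format, and the helper names are hypothetical.

```shell
# Hypothetical helpers.

# Rejected packets from a flow-log group: $1 = log group name, $2 = max events (default 20).
flow_log_rejects() {
  aws logs filter-log-events --log-group-name "$1" \
    --filter-pattern REJECT --max-items "${2:-20}" \
    --query 'events[].message' --output text
}

# Create a Reachability Analyzer path and start an analysis; prints the analysis id.
# $1 = source resource id, $2 = destination resource id, $3 = destination TCP port.
reachability_check() {
  path_id=$(aws ec2 create-network-insights-path \
    --source "$1" --destination "$2" --protocol tcp --destination-port "$3" \
    --query 'NetworkInsightsPath.NetworkInsightsPathId' --output text) || return 1
  aws ec2 start-network-insights-analysis --network-insights-path-id "$path_id" \
    --query 'NetworkInsightsAnalysis.NetworkInsightsAnalysisId' --output text
}

# Status and verdict of an analysis: $1 = analysis id.
reachability_result() {
  aws ec2 describe-network-insights-analyses --network-insights-analysis-ids "$1" \
    --query 'NetworkInsightsAnalyses[0].[Status,NetworkPathFound]' --output text
}
```

The analysis runs asynchronously, so call `reachability_result` with the id printed by `reachability_check` until the status is no longer `running`.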