EC2 Troubleshooting

Summary

Most EC2 issues fall into a handful of categories: launch failures, connectivity, performance, storage/network, cost, and status check failures.
Effective troubleshooting starts with a standard checklist (state, network/Security Group/route, logs, CloudWatch metrics) before making configuration changes.
Strong monitoring and logging (CloudWatch, VPC Flow Logs, Systems Manager) dramatically reduce mean time to resolution.

Troubleshooting flow

mermaid

flowchart TD
  A[EC2 issue] --> B{Category?}
  B --> C[Launch failure]
  B --> D[Cannot connect]
  B --> E[Performance problem]
  B --> F[Storage/Network issue]
  B --> G[Cost anomaly]

  C --> C1[Check limits, AZ capacity, instance type]
  D --> D1[Check SG/NACL/Route, IP, key/user]
  E --> E1[Check CPU/Mem/IO; right-size/ASG]
  F --> F1[Check EBS type, IOPS, network metrics]
  G --> G1[Use Cost Explorer; find idle resources & data transfer]

Best Practices

Always verify the basics first: instance state, correct Security Group rules, route tables, NACLs, DNS/IP, and correct SSH/RDP user and key.
Use CloudWatch metrics and logs to pinpoint root causes (high CPU, I/O bottlenecks, network throttling, status check failures) instead of guessing.
For status check failures:
- System status → underlying host issue → stop/start to move to a new host, check AWS Health.
- Instance status → OS/app/network issue inside the instance → inspect logs and configuration.
For performance issues, prioritize right‑sizing, code/DB optimization, appropriate EBS volume and instance type selection, and enhanced networking/placement groups where applicable.
For cost anomalies, look for always‑on instances, unattached EBS volumes, idle Elastic IPs, old snapshots, and unexpected data transfer patterns.
Document root causes and fixes, then improve monitoring/alerting or configuration to prevent recurrence.

Exam Notes

Many questions describe “cannot SSH/RDP” or “no internet”; the expected approach is to walk through the basic connectivity checklist (state, SG, NACL, route, IP, key/user).
Know the difference between System status checks (host issues, fixed by stop/start) and Instance status checks (OS/app/network issues, fixed by configuration changes).
Cost‑related troubleshooting often involves identifying idle resources (running instances, unattached EBS, unused Elastic IPs, old snapshots) and high data transfer patterns.
Key tools to remember: CloudWatch (metrics/logs), VPC Flow Logs, Systems Manager, EC2 Instance Connect, AWS Health Dashboard, Cost Explorer, AWS Budgets.

Amazon EC2

EC2 Troubleshooting

Summary

Troubleshooting flow

Best Practices

Exam Notes

AWS documentation

EC2 Troubleshooting ​

Summary ​

Troubleshooting flow ​

Best Practices ​

Exam Notes ​

AWS documentation ​

Related docs in this Hub ​

EC2 Troubleshooting

Summary

Troubleshooting flow

Best Practices

Exam Notes

AWS documentation

Related docs in this Hub