EC2 Lifecycle, Fleet & Recovery Diagnostic Troubleshooter
Use the interactive troubleshooter below to identify your EC2 lifecycle, fleet, or recovery blind spot by symptom, review the raw evidence, understand the root cause, and apply the recommended fix.
Quick Reference Table
| # | Scenario | Key Error Signal | Root Cause | The Fix |
|---|---|---|---|---|
| 1 | DLM Error states silently cause infinite snapshot retention | "state": "ERROR" | A Data Lifecycle Manager policy that enters an error state (for example, because its IAM role was deleted) silently suspends its age-based deletion schedule, so snapshots accumulate and costs grow unnoticed. | N/A |
| 2 | EC2 Fleet configuration errors block target fulfillment | "sub-type": "spotFleetRequestConfigurationInvalid" | A structurally invalid launch request (like unsupported instance types) blocks the fleet indefinitely without retrying or scaling. | N/A |
| 3 | Packets dropped silently due to Security Group conntrack exhaustion | conntrack_allowance_exceeded | The instance exceeds its maximum allowance for tracked connections, causing the network interface to drop new packets regardless of security group rules. | N/A |
| 4 | CloudWatch recovery failure notifications are suppressed | You might not receive recovery failure notifications | An ongoing AWS Service Health event can simultaneously block automated CloudWatch recovery actions and suppress the delivery of failure notifications to SNS. | N/A |
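
The first three error signals in the table can be checked mechanically once the raw evidence is in hand. The following is a minimal sketch, not a definitive implementation. The function names are hypothetical, and the input shapes are assumptions: DLM policy summaries as dicts with `PolicyId` and `State` keys, an EventBridge-style event dict whose `detail` carries a `sub-type` field, and the text output of `ethtool -S <interface>` on an instance using the ENA driver. Fetching that evidence (for example with an AWS SDK) is deliberately left out.

```python
import re

# Assumed shape: a list of policy summaries, each a dict with "PolicyId" and "State".
def dlm_policies_in_error(policies):
    """Return the IDs of policy summaries whose State is ERROR (scenario 1)."""
    return [p["PolicyId"] for p in policies if p.get("State") == "ERROR"]

# Assumed shape: an EventBridge-style event dict with a "detail" mapping.
def is_fleet_config_invalid(event):
    """True if the event carries the invalid-configuration sub-type (scenario 2)."""
    detail = event.get("detail", {})
    return detail.get("sub-type") == "spotFleetRequestConfigurationInvalid"

# Matches one counter line such as "     conntrack_allowance_exceeded: 17".
_CONNTRACK = re.compile(
    r"^\s*conntrack_allowance_exceeded:\s*(\d+)\s*$", re.MULTILINE
)

def conntrack_drops(ethtool_output):
    """Return the conntrack_allowance_exceeded counter, or None if absent (scenario 3)."""
    match = _CONNTRACK.search(ethtool_output)
    return int(match.group(1)) if match else None

# Illustrative sample data only; the IDs and counter values are made up.
sample_policies = [
    {"PolicyId": "policy-aaa", "State": "ENABLED"},
    {"PolicyId": "policy-bbb", "State": "ERROR"},
]
sample_event = {"detail": {"sub-type": "spotFleetRequestConfigurationInvalid"}}
sample_ethtool = """NIC statistics:
     bw_in_allowance_exceeded: 0
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 17
"""

if __name__ == "__main__":
    print(dlm_policies_in_error(sample_policies))
    print(is_fleet_config_invalid(sample_event))
    print(conntrack_drops(sample_ethtool))
```

Keeping the checks as pure functions over already-collected evidence means they can run in a scheduled job or a unit test without cloud credentials. A nonzero conntrack counter is cumulative for the interface, so comparing successive readings is more informative than a single value. Scenario 4 is omitted on purpose: a suppressed notification produces no signal to parse.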