EC2 Lifecycle, Fleet & Recovery Diagnostic Troubleshooter

Use the interactive troubleshooter below to identify your EC2 lifecycle, fleet, or recovery blind spot by symptom, review the raw evidence, understand the root cause, and apply the recommended fix.


Quick Reference Table

| # | Scenario | Key Error Signal | Root Cause | The Fix |
|---|----------|------------------|------------|---------|
| 1 | DLM Error states silently cause infinite snapshot retention | "state": "ERROR" | Data Lifecycle Manager policies entering an error state (e.g., deleted IAM roles) silently suspend age-based deletion schedules, creating a cost black hole. | N/A |
| 2 | EC2 Fleet configuration errors block target fulfillment | "sub-type": "spotFleetRequestConfigurationInvalid" | A structurally invalid launch request (like unsupported instance types) blocks the fleet indefinitely without retrying or scaling. | N/A |
| 3 | Packets dropped silently due to Security Group conntrack exhaustion | conntrack_allowance_exceeded | The instance exceeds its maximum allowance for tracked connections, causing the network interface to drop new packets regardless of security group rules. | N/A |
| 4 | CloudWatch recovery failure notifications are suppressed | You might not receive recovery failure notifications | An ongoing AWS Service Health event can simultaneously block automated CloudWatch recovery actions and suppress the delivery of failure notifications to SNS. | N/A |
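
The machine-readable signals in rows 1 to 3 can be matched programmatically. The following is a minimal sketch, not an official AWS integration. The function names `classify_event_detail` and `conntrack_drops` are hypothetical. It also assumes that the signal keys appear at the top level of an event's detail dictionary, and that the conntrack counter arrives as `name: value` text of the kind printed by network interface statistics tools. Row 4 is left out because its signal is a documentation note rather than a field that can be parsed.

```python
import re
from typing import Optional

# Signal values copied from the Quick Reference Table.
DLM_ERROR_STATE = "ERROR"
FLEET_INVALID_SUBTYPE = "spotFleetRequestConfigurationInvalid"
CONNTRACK_COUNTER = "conntrack_allowance_exceeded"


def classify_event_detail(detail: dict) -> Optional[str]:
    """Map an event detail payload to a table scenario, or None if no match.

    Assumption: the keys "state" and "sub-type" sit at the top level of
    the detail dictionary. The returned labels are illustrative only.
    """
    if detail.get("state") == DLM_ERROR_STATE:
        return "dlm-policy-error"        # row 1
    if detail.get("sub-type") == FLEET_INVALID_SUBTYPE:
        return "fleet-config-invalid"    # row 2
    return None


def conntrack_drops(stats_text: str) -> Optional[int]:
    """Extract the conntrack_allowance_exceeded counter (row 3).

    Assumption: stats_text holds one "name: value" pair per line.
    Returns None when the counter is absent from the text.
    """
    pattern = rf"^\s*{CONNTRACK_COUNTER}:\s*(\d+)\s*$"
    match = re.search(pattern, stats_text, re.MULTILINE)
    return int(match.group(1)) if match else None
```

A nonzero value from `conntrack_drops` would suggest the instance has exceeded its tracked-connection allowance at least once. Since the counter is cumulative, comparing successive readings is more informative than a single sample.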