Goals
- Validate that the pet site remains accessible during an AZ-wide outage
- Measure performance degradation and failover behavior under controlled conditions
Experiment Components
| Component | Description |
|---|---|
| Given | The pet site is deployed across multiple Availability Zones. |
| Hypothesis | A single AZ power failure should not render the pet site unavailable; minor latency spikes are acceptable. |
Limiting the Blast Radius
To keep the test focused, FIS targets only resources tagged with the following key-value pair:
AZ impairment power: ready
Resources without this tag are not affected by the experiment, which keeps the rest of your environment safe.
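The tag-based scoping described above could be expressed as one entry in the `targets` map of an FIS experiment template. The following is a minimal sketch, not the workshop's actual template. The helper name `build_fis_target` is hypothetical, and the tag spelling (`AzImpairmentPower` / `Ready`), the resource type, and the AZ name are assumptions to be replaced with the values your environment actually uses.

```python
import json


def build_fis_target(tag_key, tag_value, availability_zone,
                     resource_type="aws:ec2:instance"):
    """Build one entry for the `targets` map of an FIS experiment template.

    Hypothetical helper: it only assembles a dictionary and makes no AWS calls.
    """
    return {
        "resourceType": resource_type,
        # Only resources carrying this exact tag are eligible for selection.
        "resourceTags": {tag_key: tag_value},
        # Narrow the selection further to the AZ under test.
        "filters": [
            {"path": "Placement.AvailabilityZone", "values": [availability_zone]}
        ],
        # Select every matching resource rather than a sample.
        "selectionMode": "ALL",
    }


# Assumed values: the exact tag spelling and the AZ are placeholders.
target = build_fis_target("AzImpairmentPower", "Ready", "us-east-1a")
print(json.dumps(target, indent=2))
```

Combining the tag with an AZ filter means a resource must match both conditions to be selected, so an untagged instance in the target AZ, or a tagged instance in another AZ, stays outside the blast radius.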
Expected Behavior
- FIS triggers a simulated power loss in the targeted AZ.
- EC2 instances and containers in that AZ go offline.
- Application Load Balancers and Auto Scaling groups in the remaining AZs absorb the traffic.
- Pet site remains reachable, with possible latency increase during failover.
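To check the expected behavior against the hypothesis, one option is to poll the pet site while the experiment runs and summarize availability and latency afterwards. The sketch below uses only the Python standard library. The function names, the sampling interval, and the choice of a nearest-rank 95th percentile are assumptions for illustration, not part of the workshop.

```python
import http.client
import math
import time
import urllib.request


def probe(url, timeout=5.0):
    """Issue one GET request and return (ok, latency_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def summarize(samples):
    """Summarize a list of (ok, latency_seconds) samples.

    Returns the fraction of successful requests and the nearest-rank
    95th percentile latency of the successful ones (None if all failed).
    """
    if not samples:
        raise ValueError("no samples to summarize")
    ok_latencies = sorted(latency for ok, latency in samples if ok)
    availability = len(ok_latencies) / len(samples)
    p95 = None
    if ok_latencies:
        rank = math.ceil(0.95 * len(ok_latencies))
        p95 = ok_latencies[rank - 1]
    return {"availability": availability, "p95_latency_s": p95}


def run_probe(url, count=60, interval_s=1.0):
    """Probe `url` `count` times, pausing `interval_s` between requests."""
    samples = []
    for _ in range(count):
        samples.append(probe(url))
        time.sleep(interval_s)
    return summarize(samples)
```

Calling `run_probe` with your pet site URL once before the experiment and once during it gives a baseline to compare against, which makes it easier to judge whether a latency increase counts as minor.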
Always run FIS experiments in a non-production or staging environment first. Improper scoping can lead to real service disruptions.