Key | Value |
---|---|
Environment | LocalStack |
Services | API Gateway, Lambda, DynamoDB, SNS, SQS, Route53 |
Integrations | AWS CLI, Maven, pytest |
Categories | Chaos Engineering, Serverless, Multi-Region |
Level | Advanced |
Use Case | Chaos Engineering, Serverless, Multi-Region |
GitHub | Repository link |
This sample demonstrates how to test the resiliency of serverless applications using chaos engineering principles and LocalStack's Chaos API. The application is a multi-region product management system that handles service outages gracefully through automated failover and message queuing. To test it, we use the Chaos API to inject controlled failures into the infrastructure and validate that the application responds appropriately. We show how Route53 health checks automatically redirect traffic between regions during outages and how SNS/SQS messaging prevents data loss when services are unavailable.
Note
This sample demonstrates LocalStack's new Chaos API, which replaces the previous FIS (Fault Injection Simulator) functionality in this sample application. Chaos API provides more comprehensive local fault injection testing for cloud-native applications and is available in LocalStack Enterprise.
The following diagram shows the architecture that this sample application builds and deploys:
Note
The above architecture diagram is a simplified view of the application. The actual architecture is more complex and includes additional services and components, distributed across multiple regions.
Primary Region (us-east-1):
- API Gateway with product management and health check endpoints
- Lambda Functions for product CRUD operations and health monitoring
- DynamoDB table for product storage with streams enabled
- SNS Topic for publishing failed requests during outages
- SQS Queue for buffering requests when DynamoDB is unavailable
Secondary Region (us-west-1):
- Identical service stack for failover scenarios
- DynamoDB table synchronized via streams and Lambda replication
- Independent health check endpoint for Route53 monitoring
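The stream-based replication between the two tables can be pictured with a small sketch. It is an illustration only: the sample's replication Lambda is written in Java and may work differently. The function name `plan_replication` is hypothetical, the key attribute `id` is assumed from the request payloads used in this sample, and the stream is assumed to include new item images.

```python
from typing import Any, Dict, List, Tuple

Operation = Tuple[str, Dict[str, Any]]


def plan_replication(event: Dict[str, Any]) -> List[Operation]:
    """Map DynamoDB Streams records to operations for the replica table."""
    operations: List[Operation] = []
    for record in event.get("Records", []):
        change = record.get("dynamodb", {})
        event_name = record.get("eventName")
        if event_name in ("INSERT", "MODIFY") and "NewImage" in change:
            # Upsert the full new item into the replica table.
            operations.append(("put", change["NewImage"]))
        elif event_name == "REMOVE" and "Keys" in change:
            # REMOVE records carry no NewImage, so delete by primary key.
            operations.append(("delete", change["Keys"]))
    return operations


# Hypothetical event in the DynamoDB Streams record shape.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"id": {"S": "prod-2024"}},
                "NewImage": {"id": {"S": "prod-2024"}, "name": {"S": "Test Product"}},
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "prod-old"}}}},
    ]
}

for action, payload in plan_replication(sample_event):
    print(action, payload)
```

Keeping the planning step free of SDK calls makes it easy to unit test. A real handler would pass each operation to a DynamoDB client that points at the other region.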
Cross-Region Components:
- Route53 hosted zone with health checks and failover routing policies
- DNS-based traffic routing with automatic failover capabilities
- LOCALSTACK_AUTH_TOKEN
- Docker and Docker Compose
- AWS CLI with the awslocal wrapper
- Maven 3.8.5+ & Java 17
- Python 3.11+
- make (optional, but recommended for running the sample application)
- dig command-line DNS lookup utility
To run the sample application, you need to install the required dependencies.
First, clone the repository:
git clone https://github.com/localstack/sample-chaos-serverless-multi-region-failover.git
Then, navigate to the project directory:
cd sample-chaos-serverless-multi-region-failover
Install the project dependencies by running the following command:
make install
This will:
- Build the Java Lambda functions and package them into JAR files
- Install Python test dependencies for the integration test suite
Start LocalStack using Docker Compose with the LOCALSTACK_AUTH_TOKEN pre-configured:
LOCALSTACK_AUTH_TOKEN=<your-auth-token> docker compose up
The infrastructure will be automatically deployed using LocalStack's Initialization Hooks. The deployment creates:
- DynamoDB tables in both us-east-1 and us-west-1 regions
- Lambda functions for product management and health checks
- API Gateway endpoints with custom domain configurations
- SNS topics and SQS queues for message buffering
- DynamoDB streams with replication Lambda triggers
To deploy additional chaos engineering scenarios, run:
make deploy
This executes the solution scripts:
./solutions/dynamodb-outage.sh # Sets up DynamoDB outage handling
./solutions/route53-failover.sh # Configures Route53 DNS failover
The sample application provides comprehensive test coverage for both chaos engineering scenarios.
Execute the complete test suite:
make test
This runs:
- DynamoDB outage resilience tests
- Route53 DNS failover validation
- End-to-end integration scenarios
Test normal product operations:
curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \
--header 'Content-Type: application/json' \
--data '{
"id": "prod-2024",
"name": "Test Product",
"price": "29.99",
"description": "A product for testing chaos scenarios"
}'
Expected response: Product added/updated successfully.
Verify Route53 failover configuration:
dig @localhost test.hello-localstack.com CNAME
This should resolve to the primary API Gateway endpoint initially, then switch to the secondary during outages.
This sample demonstrates comprehensive chaos engineering practices by using LocalStack's Chaos API to inject controlled failures into your infrastructure. The chaos testing validates that your application can gracefully handle service outages without data loss.
The application includes sophisticated error handling for database outages. When DynamoDB becomes unavailable, the Lambda functions:
- Catch DynamoDbException errors from AWS SDK calls
- Return user-friendly error messages instead of failing completely
- Publish failed requests to SNS for later processing
- Use SQS dead letter queues and retry mechanisms
- Automatically process queued items when services recover
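The error-handling flow listed above can be sketched in a few lines. This is a simplified model, not the sample's code: the real handlers are Java Lambda functions, `StoreUnavailable` stands in for `DynamoDbException`, an in-memory deque stands in for the SNS/SQS path, and the names `add_product` and `drain` are hypothetical. Only the two response strings come from this sample.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict

Product = Dict[str, str]


class StoreUnavailable(Exception):
    """Stand-in for the SDK's DynamoDbException (assumption for this sketch)."""


def add_product(product: Product,
                save: Callable[[Product], None],
                enqueue: Callable[[str], None]) -> str:
    """Try the primary write; on a store error, buffer the request instead."""
    try:
        save(product)
        return "Product added/updated successfully."
    except StoreUnavailable:
        enqueue(json.dumps(product))
        return "A DynamoDB error occurred. Message sent to queue."


def drain(queue: Deque[str], save: Callable[[Product], None]) -> int:
    """Replay buffered requests; a message stays queued if its write fails."""
    processed = 0
    while queue:
        save(json.loads(queue[0]))  # raises while the store is still down
        queue.popleft()
        processed += 1
    return processed


table: Dict[str, Product] = {}
queue: Deque[str] = deque()


def save_ok(product: Product) -> None:
    table[product["id"]] = product


def save_down(product: Product) -> None:
    raise StoreUnavailable("simulated outage")


print(add_product({"id": "prod-outage", "name": "Outage Test"}, save_down, queue.append))
print(drain(queue, save_ok), "queued item(s) replayed after recovery")
```

Passing `save` and `enqueue` as callables keeps the fallback logic independent of any SDK, so the outage path can be exercised without a running backend.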
To simulate a DynamoDB outage:
curl -X POST 'http://localhost:4566/_localstack/chaos/faults' \
-H 'Content-Type: application/json' \
-d '[{"service": "dynamodb", "region": "us-east-1"}]'
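The same request can be issued from a Python test. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_fault_request` and `send` are hypothetical, and the endpoint URL assumes LocalStack is listening on the default port 4566.

```python
import json
import urllib.request
from typing import Dict, List

CHAOS_URL = "http://localhost:4566/_localstack/chaos/faults"


def build_fault_request(faults: List[Dict[str, str]],
                        method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a Chaos API request carrying the fault rules."""
    body = json.dumps(faults).encode("utf-8")
    return urllib.request.Request(
        CHAOS_URL,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


outage = build_fault_request([{"service": "dynamodb", "region": "us-east-1"}])
print(outage.get_method(), outage.full_url, outage.data.decode("utf-8"))
# send(outage)  # uncomment only when LocalStack is running
```

Separating request construction from sending lets you inspect or assert on the payload without a running LocalStack instance.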
During the outage, product creation requests are gracefully handled:
curl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \
--data '{"id": "prod-outage", "name": "Outage Test", "price": "19.99", "description": "Testing resilience"}'
Expected response: A DynamoDB error occurred. Message sent to queue.
The message is automatically processed when you clear the outage:
curl -X DELETE 'http://localhost:4566/_localstack/chaos/faults' \
-H 'Content-Type: application/json' \
-d '[]'
Query the DynamoDB table to see the product:
awslocal dynamodb scan --table-name Products
The key chaos engineering patterns used in this sample are:
- Using LocalStack Chaos API for controlled service failures
- Monitoring application behavior during failure scenarios
- Ensuring systems return to normal operation after faults clear
- Limiting failures to specific services and regions
- Validating resilience through repeatable test scenarios
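One way to keep such experiments repeatable is to wrap fault injection in a context manager that always clears the fault, even when the test body fails. The sketch below is an assumption about how a test could be organized, not code from the sample. `injected_faults` is a hypothetical name, and `send` is any callable that forwards a method and payload to the Chaos API; here it is replaced by a recorder so the example runs without LocalStack. The cleanup call mirrors the DELETE request with an empty list used in this sample.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

Fault = Dict[str, str]
Sender = Callable[[str, List[Fault]], None]


@contextmanager
def injected_faults(faults: List[Fault], send: Sender) -> Iterator[None]:
    """Activate faults for the duration of the with-block, then clear them."""
    send("POST", faults)
    try:
        yield
    finally:
        # Runs on success and on failure, so faults never leak into later tests.
        send("DELETE", [])


calls = []


def record(method: str, payload: List[Fault]) -> None:
    """Recorder standing in for a real HTTP call to the Chaos API."""
    calls.append((method, payload))


try:
    with injected_faults([{"service": "dynamodb", "region": "us-east-1"}], record):
        raise RuntimeError("assertion failed during outage")
except RuntimeError:
    pass

print(calls)
```

Scoping the fault to one service and one region in the payload matches the blast-radius pattern in the list above.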
The sample showcases advanced DNS failover capabilities using Route53 health checks and routing policies. This ensures high availability by automatically redirecting traffic from failed regions to healthy alternatives.
The Route53 setup includes:
- Monitoring primary region endpoints every 10 seconds
- Primary and secondary CNAME records with different priorities
- Services deployed across us-east-1 (primary) and us-west-1 (secondary)
- DNS resolution changes based on health check status
- Traffic automatically returns to primary when healthy
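The routing behavior described above can be modeled with a toy health check and a resolver. This is a deliberately simplified sketch, not Route53's actual algorithm: the failure threshold of 2 is an assumed value, a single successful probe is treated as recovery, and the class and function names are hypothetical. The two hostnames are the example endpoints used in this sample.

```python
from dataclasses import dataclass

PRIMARY = "12345.execute-api.localhost.localstack.cloud"
SECONDARY = "67890.execute-api.localhost.localstack.cloud"


@dataclass
class HealthCheck:
    """Counts consecutive probe failures against a threshold (assumed: 2)."""
    failure_threshold: int = 2
    consecutive_failures: int = 0

    def observe(self, probe_ok: bool) -> None:
        # Any successful probe resets the failure streak in this toy model.
        self.consecutive_failures = 0 if probe_ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def resolve(check: HealthCheck) -> str:
    """Failover policy: answer with primary while healthy, else secondary."""
    return PRIMARY if check.healthy else SECONDARY


check = HealthCheck()
for probe_ok in (True, False, False, True):
    check.observe(probe_ok)
    print(probe_ok, "->", resolve(check))
```

Because a single failed probe does not flip the record, failover lags the outage by at least the probe interval times the threshold, which is why the steps in this sample wait before checking DNS.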
Verify initial DNS resolution points to primary:
dig @localhost test.hello-localstack.com CNAME
# Expected: 12345.execute-api.localhost.localstack.cloud
Inject chaos into the primary region:
curl -X POST 'http://localhost:4566/_localstack/chaos/faults' \
-H 'Content-Type: application/json' \
-d '[
{"service": "apigateway", "region": "us-east-1"},
{"service": "lambda", "region": "us-east-1"}
]'
Wait for health check failures and verify the failover:
dig @localhost test.hello-localstack.com CNAME
# Expected: 67890.execute-api.localhost.localstack.cloud
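Rather than waiting a fixed time, a test can poll DNS until the expected record appears. The following is a minimal sketch under stated assumptions: `dig_cname` and `wait_for_cname` are hypothetical helpers, `dig` must be on the PATH, and the timeout and interval are arbitrary defaults rather than values from the sample.

```python
import subprocess
import time
from typing import Callable

PRIMARY = "12345.execute-api.localhost.localstack.cloud"
SECONDARY = "67890.execute-api.localhost.localstack.cloud"


def dig_cname(name: str = "test.hello-localstack.com", server: str = "localhost") -> str:
    """Return the CNAME target reported by dig, without the trailing dot."""
    result = subprocess.run(
        ["dig", f"@{server}", name, "CNAME", "+short"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().rstrip(".")


def wait_for_cname(expected: str,
                   lookup: Callable[[], str] = dig_cname,
                   timeout: float = 60.0,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll lookup() until it returns expected or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if lookup() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Offline demonstration with a scripted resolver instead of real DNS.
answers = iter([PRIMARY, PRIMARY, SECONDARY])
print(wait_for_cname(SECONDARY, lookup=lambda: next(answers),
                     timeout=5.0, interval=0.0, sleep=lambda _: None))
```

Against a running stack, `wait_for_cname(SECONDARY)` would use the real `dig` lookup; the injected `lookup` and `sleep` parameters exist only to keep the helper testable.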
Clear the chaos to test failback:
curl -X DELETE 'http://localhost:4566/_localstack/chaos/faults' \
-H 'Content-Type: application/json' \
-d '[]'
Issue | Resolution |
---|---|
DNS resolution returns NXDOMAIN | Ensure LocalStack is running with DNS enabled (port 53). Verify hosted zone exists with awslocal route53 list-hosted-zones |
Health checks always report unhealthy | Check that API Gateway endpoints respond with HTTP 200. Verify Lambda functions are deployed and working: awslocal lambda list-functions |
Failover not triggering after chaos injection | Wait at least 25 seconds for health check failure threshold. Check chaos faults are active: curl --location --request GET 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' |
Products not appearing in DynamoDB after recovery | Verify SQS queue processing with awslocal sqs receive-message. Check Lambda function logs for processing errors |