Published on

Complete Guide: Deploying Production Grade Spring Boot API on AWS ECS

Authors
  • avatar
    Name
    Motions Technologies
    Twitter

Deploying Spring Boot APIs on AWS ECS: A Production-Grade Guide

Introduction

In today's cloud-native landscape, deploying microservices effectively isn't just about getting your code to run in production—it's about building a reliable, scalable, and maintainable system that can evolve with your business needs. While there are numerous ways to deploy Spring Boot applications in the cloud, Amazon ECS (Elastic Container Service) offers a compelling sweet spot between complexity and control, especially for teams starting their cloud journey.

This guide emerged from real-world experience deploying and managing Spring Boot microservices in production environments. Whether you're a startup launching your first production service or an enterprise team modernizing your infrastructure, you'll find practical, battle-tested approaches here.

Why This Guide Matters

The path from a local Spring Boot application to a production-grade microservice involves numerous critical decisions and potential pitfalls. Teams often struggle with questions like:

  • How do we properly containerize our Spring Boot applications?
  • What's the right way to handle configuration and secrets in different environments?
  • How do we ensure our services are both reliable and cost-effective?
  • What monitoring and observability practices should we implement?

This guide addresses these challenges head-on, providing concrete solutions that work in real production environments.

Why AWS ECS?

While platforms like Kubernetes offer powerful features, ECS provides several advantages for teams starting with containerized deployments:

  1. Lower Operational Overhead: ECS abstracts away much of the container orchestration complexity while still providing essential features for production workloads.

  2. Cost-Effective Scaling: With Fargate, you pay only for the resources you use, and Fargate Spot can reduce costs by up to 70% for non-critical workloads.

  3. AWS Integration: Seamless integration with AWS services like CloudWatch, IAM, and AWS Secrets Manager simplifies many operational tasks.

  4. Production-Ready Features: Built-in service discovery, load balancing, and auto-scaling capabilities make it suitable for production workloads from day one.

Guide Overview

1. Application Fundamentals

  • Spring Boot application setup and best practices
  • Container optimization for cloud deployment
  • Configuration management and externalization
  • Health checks and monitoring endpoints

2. Infrastructure Foundation

  • Network architecture and security setup
  • Load balancer configuration
  • Security groups and IAM roles
  • VPC design and subnet organization

3. Container Registry and Deployment

  • ECR repository setup and management
  • Image lifecycle policies
  • Tagging strategies
  • Vulnerability scanning

4. ECS Configuration

  • Cluster architecture and capacity providers
  • Task definition optimization
  • Service configuration and scaling
  • Resource allocation and monitoring

5. Operational Excellence

  • Logging and monitoring setup
  • Alerting configuration
  • Performance optimization
  • Cost management

6. Production Considerations

  • High availability setup
  • Disaster recovery planning
  • Security best practices
  • Performance tuning

7. Maintenance and Updates

  • Rolling deployment strategies
  • Backup and restore procedures
  • Scaling guidelines
  • Troubleshooting practices

This guide takes a practical approach, focusing on real-world scenarios and solutions rather than theoretical concepts. Each section includes concrete examples, configuration snippets, and lessons learned from production deployments. By the end, you'll have a comprehensive understanding of how to deploy and maintain Spring Boot applications on AWS ECS in a production environment.

Let's begin with setting up your Spring Boot application for cloud deployment...

Why Choose AWS ECS for Spring Boot Applications?

Amazon ECS offers several compelling advantages for Spring Boot deployments:

  1. Container Orchestration Benefits

    • Automated container placement and scheduling
    • Built-in service discovery
    • Integrated load balancing
    • Automated container recovery
    • Easy rolling updates and rollbacks
  2. AWS Integration Features

    • Native CloudWatch integration for logs and metrics
    • AWS Secrets Manager for sensitive data
    • AWS Parameter Store for configuration
    • IAM roles for fine-grained security
    • VPC integration for network isolation
  3. Cost Optimization Capabilities

    • Pay-per-use pricing model
    • Spot instance support for cost savings
    • Right-sizing recommendations
    • Resource utilization tracking
    • Reserved capacity options
  4. Operational Excellence

    • Managed container infrastructure
    • Automated patch management
    • Built-in high availability
    • Multiple deployment strategies
    • Extensive monitoring capabilities

Prerequisites and Environment Setup

Before starting the deployment process, ensure you have:

  1. Development Environment
# Required software versions
java -version  # Java 17 or later
mvn -version   # Maven 3.8+ or Gradle 7+
docker --version  # Docker 20.10+
aws --version    # AWS CLI 2.0+
git --version    # Git 2.0+
  1. AWS Account Configuration
# Configure AWS CLI
aws configure
AWS Access Key ID: YOUR_ACCESS_KEY
AWS Secret Access Key: YOUR_SECRET_KEY
Default region name: us-east-1
Default output format: json

# Verify configuration
aws sts get-caller-identity
  1. Required AWS Permissions
{
    "Version": "2024-10-24",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecs:*",
                "ecr:*",
                "elasticloadbalancing:*",
                "cloudwatch:*",
                "logs:*",
                "ec2:*",
                "iam:PassRole"
            ],
            "Resource": "*"
        }
    ]
}

1. Spring Boot Application Setup

1.1 Application Configuration

Basic Spring Boot Setup

  1. Project Structure
spring-api/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/example/api/
│   │   │       ├── Application.java
│   │   │       ├── config/
│   │   │       ├── controller/
│   │   │       ├── service/
│   │   │       └── model/
│   │   └── resources/
│   │       ├── application.yml
│   │       ├── application-prod.yml
│   │       └── logback-spring.xml
├── Dockerfile
├── docker-compose.yml
└── pom.xml
  1. Essential Dependencies (pom.xml)
<dependencies>
    <!-- Spring Boot Starters -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    
    <!-- Monitoring -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    
    <!-- AWS SDK -->
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-secretsmanager</artifactId>
        <version>1.12.261</version>
    </dependency>
</dependencies>
  1. Application Properties (application.yml)
spring:
  application:
    name: spring-api
  profiles:
    active: ${SPRING_PROFILES_ACTIVE:local}
    
server:
  port: 8080
  shutdown: graceful
  
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always
      probes:
        enabled: true
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
    
logging:
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"
  level:
    root: INFO
    com.example.api: DEBUG
    
app:
  cors:
    allowed-origins: ${CORS_ALLOWED_ORIGINS:*}
  1. Production Properties (application-prod.yml)
spring:
  main:
    banner-mode: off
    
server:
  tomcat:
    max-threads: 200
    accept-count: 100
    
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus
  endpoint:
    health:
      show-details: never
      
logging:
  pattern:
    console: "%d{ISO8601} [%X{traceId}/%X{spanId}] %-5level [%t] %C{40}: %msg%n"

1.2 Containerization Setup

  1. Dockerfile (Multi-stage build)
# Build stage
FROM maven:3.8.4-openjdk-17-slim AS builder
WORKDIR /app
COPY pom.xml .
# Download dependencies first (cache layer)
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn clean package -DskipTests

# Runtime stage
FROM eclipse-temurin:17-jre-focal
WORKDIR /app

# Add non-root user
RUN addgroup --system --gid 1001 appuser && \
    adduser --system --uid 1001 --group appuser

# Install tools for debugging and health checks
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    jq \
    && rm -rf /var/lib/apt/lists/*

# Copy jar from builder stage
COPY --from=builder /app/target/*.jar app.jar

# Set permissions
RUN chown -R appuser:appuser /app
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8080/actuator/health || exit 1

# Environment variables
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"

# Expose port
EXPOSE 8080

# Run application
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
  1. Docker Compose (for local testing)
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=local
      - JAVA_OPTS=-Xmx512m -Xms256m
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 30s
      timeout: 3s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

1.3 Local Testing and Validation

  1. Build and Run Locally
# Build application
mvn clean package

# Build Docker image
docker build -t spring-api .

# Run container
docker run -p 8080:8080 spring-api

# Test endpoints
curl http://localhost:8080/actuator/health
curl http://localhost:8080/actuator/info
  1. Performance Testing
# Install Apache Benchmark
apt-get install apache2-utils

# Run load test
ab -n 1000 -c 10 http://localhost:8080/actuator/health

# Memory monitoring
docker stats spring-api
  1. Common Issues and Solutions
  • Memory Issues
# Check container logs
docker logs spring-api

# Adjust memory settings
docker run -p 8080:8080 -e JAVA_OPTS="-Xmx1g -Xms512m" spring-api
  • Connection Issues
# Check container networking
docker network inspect bridge

# Test container DNS
docker exec spring-api nslookup google.com
  1. Security Scanning
# Scan Docker image
docker scan spring-api

# Check for vulnerabilities in dependencies
mvn dependency-check:check

2. AWS Infrastructure Setup

2.1 Network Architecture Overview

graph TB I[Internet] --> IGW[Internet Gateway] IGW --> ALB[Application Load Balancer] subgraph "AZ-1" PUB1[Public Subnet] PRIV1[Private Subnet] PUB1 --> ALB PRIV1 --> ECS1[ECS Tasks] end subgraph "AZ-2" PUB2[Public Subnet] PRIV2[Private Subnet] PUB2 --> ALB PRIV2 --> ECS2[ECS Tasks] end ECS1 --> NAT1[NAT Gateway 1] ECS2 --> NAT2[NAT Gateway 2] NAT1 --> IGW NAT2 --> IGW

2.2 Network Component Setup

  1. VPC Creation
# Create VPC
VPC_ID=$(aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=spring-api-vpc}]' \
    --query 'Vpc.VpcId' \
    --output text)

# Enable DNS hostnames
aws ec2 modify-vpc-attribute \
    --vpc-id $VPC_ID \
    --enable-dns-hostnames

# Create Internet Gateway
IGW_ID=$(aws ec2 create-internet-gateway \
    --query 'InternetGateway.InternetGatewayId' \
    --output text)

# Attach Internet Gateway to VPC
aws ec2 attach-internet-gateway \
    --vpc-id $VPC_ID \
    --internet-gateway-id $IGW_ID

2.3 Subnet Architecture

  1. Create Subnets
# Public Subnets
PUBLIC_SUBNET_1=$(aws ec2 create-subnet \
    --vpc-id $VPC_ID \
    --cidr-block 10.0.1.0/24 \
    --availability-zone us-east-1a \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1a}]' \
    --query 'Subnet.SubnetId' \
    --output text)

PUBLIC_SUBNET_2=$(aws ec2 create-subnet \
    --vpc-id $VPC_ID \
    --cidr-block 10.0.3.0/24 \
    --availability-zone us-east-1b \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-1b}]' \
    --query 'Subnet.SubnetId' \
    --output text)

# Private Subnets
PRIVATE_SUBNET_1=$(aws ec2 create-subnet \
    --vpc-id $VPC_ID \
    --cidr-block 10.0.2.0/24 \
    --availability-zone us-east-1a \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1a}]' \
    --query 'Subnet.SubnetId' \
    --output text)

PRIVATE_SUBNET_2=$(aws ec2 create-subnet \
    --vpc-id $VPC_ID \
    --cidr-block 10.0.4.0/24 \
    --availability-zone us-east-1b \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-1b}]' \
    --query 'Subnet.SubnetId' \
    --output text)

2.4 Security Group Architecture

  1. Create Security Groups
# ALB Security Group
ALB_SG_ID=$(aws ec2 create-security-group \
    --group-name alb-sg \
    --description "ALB Security Group" \
    --vpc-id $VPC_ID \
    --query 'GroupId' \
    --output text)

# ECS Security Group
ECS_SG_ID=$(aws ec2 create-security-group \
    --group-name ecs-sg \
    --description "ECS Security Group" \
    --vpc-id $VPC_ID \
    --query 'GroupId' \
    --output text)

# Configure Security Group Rules
aws ec2 authorize-security-group-ingress \
    --group-id $ALB_SG_ID \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
    --group-id $ECS_SG_ID \
    --protocol tcp \
    --port 8080 \
    --source-group $ALB_SG_ID

2.5 Load Balancer Architecture

When deploying Spring Boot applications in production, the Application Load Balancer (ALB) serves as the primary entry point for all traffic. Understanding its architecture is crucial for a reliable deployment.

Understanding Load Balancer Components

  1. Listeners and Routing

    • Port 80 (HTTP): Used for redirection to HTTPS
    • Port 443 (HTTPS): Primary listener for secure traffic
    • Rules determine how traffic is routed to targets
  2. Target Groups

    • Groups ECS tasks that can receive traffic
    • Implements health checking
    • Manages task registration/deregistration
    • Handles load balancing algorithms
  3. Health Checks

    • Regular checks against /actuator/health endpoint
    • Determines task availability
    • Affects traffic routing decisions

Implementation Guide

[Rest of the implementation details...]

  1. Create Application Load Balancer
# Create ALB
ALB_ARN=$(aws elbv2 create-load-balancer \
    --name spring-api-alb \
    --subnets $PUBLIC_SUBNET_1 $PUBLIC_SUBNET_2 \
    --security-groups $ALB_SG_ID \
    --scheme internet-facing \
    --type application \
    --query 'LoadBalancers[0].LoadBalancerArn' \
    --output text)

# Create Target Group
TG_ARN=$(aws elbv2 create-target-group \
    --name spring-api-tg \
    --protocol HTTP \
    --port 8080 \
    --vpc-id $VPC_ID \
    --target-type ip \
    --health-check-path /actuator/health \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 3 \
    --query 'TargetGroups[0].TargetGroupArn' \
    --output text)

# Create Listener
aws elbv2 create-listener \
    --load-balancer-arn $ALB_ARN \
    --protocol HTTP \
    --port 80 \
    --default-actions Type=forward,TargetGroupArn=$TG_ARN

2.6 Infrastructure Validation

  1. Network Validation
# Test VPC
aws ec2 describe-vpcs --vpc-ids $VPC_ID

# Test Subnets
aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$VPC_ID" \
    --query 'Subnets[].{ID:SubnetId,CIDR:CidrBlock,AZ:AvailabilityZone}'

# Test Security Groups
aws ec2 describe-security-groups \
    --group-ids $ALB_SG_ID $ECS_SG_ID
  1. Load Balancer Validation
# Check ALB health
aws elbv2 describe-load-balancer-health \
    --load-balancer-arn $ALB_ARN

# Check target group health
aws elbv2 describe-target-health \
    --target-group-arn $TG_ARN

2.7 Common Infrastructure Issues and Solutions

  1. Connectivity Issues
# Check route tables
aws ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$VPC_ID"

# Verify NAT Gateway status
aws ec2 describe-nat-gateways \
    --filter "Name=vpc-id,Values=$VPC_ID"

# Check security group rules
aws ec2 describe-security-group-rules \
    --filters "Name=group-id,Values=$ECS_SG_ID"
  1. Load Balancer Issues
# Check ALB access logs
aws s3 ls s3://your-alb-logs-bucket/AWSLogs/

# Enable access logging
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn $ALB_ARN \
    --attributes Key=access_logs.s3.enabled,Value=true \
    Key=access_logs.s3.bucket,Value=your-alb-logs-bucket

3. Container Registry and ECS Setup

3.1 Understanding Container Registry (ECR)

Amazon Elastic Container Registry (ECR) serves as the backbone of your container deployment strategy. Think of it as a secure, managed container image library that seamlessly integrates with ECS.

Why ECR is Critical for Production

  1. Security:

    • Private repository for your container images
    • Integration with IAM for access control
    • Automatic vulnerability scanning
    • Encryption at rest using AWS KMS
  2. Performance:

    • Fast image pulls within AWS network
    • Reduced latency for ECS deployments
    • Regional replication for global deployments
  3. Cost Efficiency:

    • Pay only for storage and data transfer
    • No infrastructure to manage
    • Lifecycle policies for automatic cleanup

Here's how to set up your ECR repository with production-grade configurations:

# Create repository with security features enabled
REPO_URI=$(aws ecr create-repository \
    --repository-name spring-api \
    --image-scanning-configuration scanOnPush=true \
    --encryption-configuration encryptionType=KMS \
    --image-tag-mutability IMMUTABLE \
    --query 'repository.repositoryUri' \
    --output text)

What this configuration does:

  • Enables automatic vulnerability scanning
  • Uses KMS encryption for security
  • Makes tags immutable to prevent overwrites
  • Provides a unique URI for your repository

3.2 Image Lifecycle Management

In production environments, managing container images effectively is crucial. You need to balance between keeping necessary images and controlling storage costs.

Best Practices for Image Management:

  1. Tagging Strategy:

    # Example tagging scheme
    VERSION=$(git rev-parse --short HEAD)
    ENVIRONMENT="prod"
    BUILD_NUMBER=${CIRCLE_BUILD_NUM:-local}
    
    docker tag spring-api:${VERSION} ${REPO_URI}:${VERSION}
    docker tag spring-api:${VERSION} ${REPO_URI}:${ENVIRONMENT}-${BUILD_NUMBER}
    docker tag spring-api:${VERSION} ${REPO_URI}:latest
    

    Why this matters:

    • Git hash provides traceability
    • Environment tag helps in deployment management
    • Build number enables rollback capabilities
  2. Lifecycle Policies:

{
    "rules": [
        {
            "rulePriority": 1,
            "description": "Keep last 30 production images",
            "selection": {
                "tagStatus": "tagged",
                "tagPrefixList": ["prod"],
                "countType": "imageCountMoreThan",
                "countNumber": 30
            },
            "action": {
                "type": "expire"
            }
        }
    ]
}

This policy:

  • Retains important production images
  • Automatically cleans up old images
  • Reduces storage costs
  • Maintains compliance requirements

3.3 ECS Cluster Architecture

Your ECS cluster design significantly impacts scalability, reliability, and cost efficiency.

graph TB subgraph "ECS Cluster Design" subgraph "Production Workloads" PS[Production Service] --> PT1[Task 1] PS --> PT2[Task 2] PT1 --> PC1[Container] PT2 --> PC2[Container] end subgraph "Capacity Providers" CP1[Fargate] --> PS CP2[Fargate Spot] --> PS end subgraph "Auto Scaling" M1[CPU Metrics] M2[Memory Metrics] M3[Custom Metrics] end end

Understanding Capacity Providers

When setting up your ECS cluster, choosing the right capacity provider strategy is crucial:

aws ecs create-cluster \
    --cluster-name production \
    --capacity-providers FARGATE FARGATE_SPOT \
    --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1,base=1 \
    capacityProvider=FARGATE_SPOT,weight=3

This configuration:

  • Uses both Fargate and Fargate Spot instances
  • Maintains at least one Fargate task for stability
  • Leverages Spot instances for cost savings
  • Provides automatic scaling capabilities

Real-world tip: Many organizations use a mix of 30% Fargate and 70% Fargate Spot for production workloads, saving up to 70% on compute costs while maintaining stability.

3.4 Task Definition Deep Dive

Your task definition is the blueprint for how your application runs in ECS. Getting this right is crucial for production performance.

Memory and CPU Allocation

Understanding container memory patterns is crucial:

{
  "containerDefinitions": [
    {
      "memory": 2048,
      "memoryReservation": 1024,
      "cpu": 1024,
      "environment": [
        {
          "name": "JAVA_OPTS",
          "value": "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
        }
      ]
    }
  ]
}

Why these settings matter:

  • Memory reservation ensures resource availability
  • CPU units align with Java thread pool sizing
  • JVM settings optimize garbage collection
  • Container support enables proper resource detection

Health Check Configuration

Proper health checks are crucial for production reliability:

"healthCheck": {
  "command": [
    "CMD-SHELL",
    "curl -f http://localhost:8080/actuator/health || exit 1"
  ],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}

Real-world considerations:

  • Start period accommodates Java warm-up time
  • Interval balances responsiveness and overhead
  • Timeout prevents hanging health checks
  • Retries prevent false negatives