TL;DR: Migrate your over-provisioned B2B SaaS PostgreSQL databases to Amazon Aurora Serverless v2 to efficiently handle 9-to-5 traffic spikes without paying for 24/7 peak capacity. This guide covers how to validate workload variance using AWS CloudWatch CLI metrics and deploy a highly available cluster with Terraform. You will also learn how to configure optimal Aurora Capacity Unit (ACU) limits to prevent buffer pool thrashing while maximizing cost savings.
⚡ Key Takeaways
- Analyze 14-day CPUUtilization metrics via the AWS CLI to validate a high peak-to-trough variance before committing to a serverless migration.
- Calculate compute needs using Aurora Capacity Units (ACUs), where 1 ACU equates to approximately 2 GB of RAM.
- Avoid setting minimum ACUs to the 0.5 floor to prevent PostgreSQL from dropping shared_buffers and causing buffer pool thrashing during scale-down.
- Use Terraform's serverlessv2_scaling_configuration block to set precise boundaries, such as min_capacity = 2.0 for the overnight baseline and max_capacity = 32.0 for morning connection storms.
- Deploy at least one read replica (count = 2 in Terraform) in a separate Availability Zone to ensure rapid failover and high availability.
Running a B2B SaaS application often means your multi-tenant PostgreSQL database is bleeding cash. Because your users follow standard 9-to-5 business hours, the application experiences massive traffic spikes at 9:00 AM when thousands of users log in, run complex analytical queries, and load their dashboards. By 6:00 PM, traffic drops by 90%, leaving your database practically idle until the next morning.
To survive this 9:00 AM connection storm, you've likely severely over-provisioned your Amazon RDS instance. You are paying 24/7 for a massive db.r6g.8xlarge just to handle a few hours of peak load. If you try to downsize, the morning spike exhausts your memory, resulting in connection timeouts, out-of-memory (OOM) kills, and furious customers. Traditional RDS auto-scaling takes too long—adding read replicas takes minutes, and scaling instance classes requires a reboot and downtime.
Amazon Aurora Serverless v2 solves this infrastructure bottleneck natively. Unlike its predecessor (v1), which suffered from cold starts and scaling pauses, v2 scales compute capacity in place, in milliseconds. However, migrating a production multi-tenant workload isn't just about clicking "modify" in the AWS console. You need to manage connection storms, calculate your true baseline capacity, and execute the migration with minimal downtime.
Here is the engineering playbook for migrating a high-traffic SaaS PostgreSQL workload to Aurora Serverless v2.
Analyzing the Multi-Tenant Workload Profile
Before migrating, you must mathematically validate that your workload is "spiky" enough to benefit from a serverless architecture. Aurora Serverless v2 charges by the Aurora Capacity Unit (ACU). If your database runs at 80% CPU utilization 24/7, migrating to Serverless v2 will actually increase your AWS bill.
To determine if your B2B SaaS fits the serverless model, analyze your existing RDS CloudWatch metrics over a 14-day period to identify the delta between your peak and trough utilization.
You can pull this data programmatically using the AWS CLI to analyze your CPU variance:
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=your-prod-rds-instance \
--start-time 2026-04-01T00:00:00Z \
--end-time 2026-04-14T00:00:00Z \
--period 3600 \
--statistics Maximum Minimum Average \
--query 'Datapoints[*].[Timestamp, Maximum, Minimum, Average]' \
--output table
If the results show a peak CPU of 85% during business hours and an average of 10% overnight and on weekends, your workload is the perfect candidate. Traditional provisioned instances force you to pay for the 85% high-water mark indefinitely. Serverless v2 allows your infrastructure to perfectly hug that usage curve, scaling down to minimal ACUs when your tenants log off.
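If you saved the command's output as JSON (using --output json instead of --output table), a short script can turn the datapoints into a go/no-go signal. The sketch below is illustrative: the 4x peak-to-trough threshold is a rule of thumb for this article, not an AWS recommendation.

```python
def peak_to_trough_ratio(datapoints):
    """Ratio between the highest hourly peak and the quietest hourly average."""
    peak = max(p["Maximum"] for p in datapoints)
    trough = min(p["Average"] for p in datapoints)
    # Guard against divide-by-zero on a completely idle instance
    return peak / max(trough, 0.1)

def is_serverless_candidate(datapoints, threshold=4.0):
    """Illustrative rule of thumb: a spiky workload shows a large
    peak-to-trough spread; a flat 24/7 workload does not."""
    return peak_to_trough_ratio(datapoints) >= threshold

# Example: hourly datapoints shaped like the CLI's JSON output
sample = [
    {"Maximum": 85.0, "Minimum": 60.0, "Average": 72.0},  # 9 AM spike
    {"Maximum": 15.0, "Minimum": 5.0,  "Average": 10.0},  # overnight idle
]
print(is_serverless_candidate(sample))  # spiky profile -> True
```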
Aurora Serverless v2 Architecture: ACUs and Scaling
Aurora Serverless v2 capacity operates using ACUs. Each ACU represents roughly 2 GB of RAM, along with corresponding CPU and networking capacity. The cluster scales seamlessly within configurable bounds, from as little as 0.5 ACUs up to 128 ACUs (newer engine versions support even wider ranges).
When defining your cluster, the most critical architectural decision is setting your minimum and maximum ACU limits. Setting the minimum too low (e.g., 0.5 ACUs) can lead to buffer pool thrashing when scaling up under sudden, heavy read loads, because PostgreSQL drops cached data from shared_buffers when scaling down memory.
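You can sanity-check a candidate min_capacity against the size of your hot working set. This is a rough sketch: the 1 ACU ≈ 2 GB figure comes from the section above, while the assumption that about half of instance memory backs the buffer cache is an illustrative simplification (Aurora manages shared_buffers internally).

```python
import math

ACU_RAM_GB = 2.0             # 1 ACU ~= 2 GB RAM, per the Aurora docs
BUFFER_CACHE_FRACTION = 0.5  # illustrative assumption; Aurora tunes this itself

def suggest_min_capacity(hot_working_set_gb, floor=0.5, ceiling=128.0):
    """Pick a min_capacity whose buffer cache can hold the hot working set,
    so overnight scale-down does not evict the data the 9 AM spike needs."""
    needed_ram_gb = hot_working_set_gb / BUFFER_CACHE_FRACTION
    acus = needed_ram_gb / ACU_RAM_GB
    # Aurora accepts capacity in 0.5 ACU increments
    acus = math.ceil(acus * 2) / 2
    return min(max(acus, floor), ceiling)

print(suggest_min_capacity(2.0))  # ~2 GB of hot data -> 2.0 ACUs
```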
Here is how you provision an Aurora Serverless v2 cluster using Terraform, optimized for a medium-scale SaaS application:
resource "aws_rds_cluster" "saas_aurora_cluster" {
cluster_identifier = "prod-multi-tenant-db"
engine = "aurora-postgresql"
engine_version = "15.4"
database_name = "saas_production"
master_username = "dbadmin"
master_password = var.db_password
backup_retention_period = 7
preferred_backup_window = "02:00-04:00"
skip_final_snapshot = false
serverlessv2_scaling_configuration {
# 2 ACUs = ~4GB RAM baseline. Keeps hot data in memory overnight.
min_capacity = 2.0
# 32 ACUs = ~64GB RAM max capacity for 9 AM spikes.
max_capacity = 32.0
}
}
resource "aws_rds_cluster_instance" "saas_aurora_instances" {
count = 2 # One writer, one reader for high availability
identifier = "prod-aurora-instance-${count.index}"
cluster_identifier = aws_rds_cluster.saas_aurora_cluster.id
instance_class = "db.serverless"
engine = aws_rds_cluster.saas_aurora_cluster.engine
engine_version = aws_rds_cluster.saas_aurora_cluster.engine_version
publicly_accessible = false
}
Production Note: Always deploy at least one Serverless v2 read replica in a different Availability Zone. In Aurora, read replicas share the underlying distributed storage volume, so if the primary writer fails, Aurora promotes the reader to writer in seconds. Keep that replica in promotion tier 0 or 1: Aurora sizes readers in those tiers alongside the writer, so the promoted instance is already scaled and failover causes no performance degradation.
Tackling Connection Storms with PgBouncer
Aurora Serverless v2 scales CPU and memory in milliseconds, but it cannot bend the rules of PostgreSQL architecture. PostgreSQL follows a process-per-connection model. Every new connection forks a new OS process, consuming roughly 10MB of memory just to establish the connection, regardless of whether a query is executing.
If your multi-tenant application scales out its application servers (e.g., Node.js or Go pods in EKS) during a traffic spike, thousands of simultaneous connection attempts will hit the database. Even if Aurora detects the load and scales up ACUs, the sheer volume of connections can exhaust the memory limit of the current ACU tier before the scale-up completes, triggering an immediate database restart due to OOM errors.
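Some quick arithmetic shows why this fails. Using the rough 10 MB-per-connection figure above and 2 GB of RAM per ACU, connection overhead alone can dwarf the memory of a small ACU tier; the numbers below are hypothetical.

```python
CONNECTION_OVERHEAD_MB = 10  # rough per-backend cost cited above
ACU_RAM_MB = 2048            # 1 ACU ~= 2 GB of RAM

def connection_overhead_fraction(n_connections, current_acus):
    """Fraction of the current ACU tier's RAM consumed purely by
    connection overhead, before any query memory is counted."""
    overhead_mb = n_connections * CONNECTION_OVERHEAD_MB
    return overhead_mb / (current_acus * ACU_RAM_MB)

# A 2,000-connection storm hitting a cluster idling at 2 ACUs:
print(f"{connection_overhead_fraction(2000, 2):.0%}")  # 488%: overhead alone exceeds the tier's RAM
```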
To survive connection storms on a serverless database, you must implement an external connection pooler like PgBouncer (or Amazon RDS Proxy) in transaction mode.
Below is a production-ready pgbouncer.ini configured for a multi-tenant environment. This limits the actual connections hitting Aurora to a safe maximum, while allowing thousands of logical client connections from your app servers:
[databases]
# Route tenant traffic to the Aurora cluster endpoint
saas_production = host=prod-multi-tenant-db.cluster-xxx.eu-west-1.rds.amazonaws.com port=5432 dbname=saas_production pool_size=150
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Transaction pooling is critical for serverless.
# It assigns a server connection to a client only during an active transaction.
pool_mode = transaction
# Allow the application to open up to 5000 logical connections
max_client_conn = 5000
# Keep the physical connections to Aurora manageable
default_pool_size = 100
max_db_connections = 200
# Aggressively close idle server connections so Aurora can scale down ACUs
server_idle_timeout = 60
By enforcing max_db_connections = 200, you guarantee that your Aurora cluster will never face more than 200 simultaneous physical connections. This prevents memory exhaustion while the ACUs dynamically scale to provide more CPU throughput for your active transactions.
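The same arithmetic shows what the pooler buys you. With the rough 10 MB-per-backend figure from above, capping physical connections bounds the worst-case connection overhead regardless of how many logical clients connect:

```python
CONNECTION_OVERHEAD_MB = 10  # rough per-backend figure cited earlier

def pooled_overhead_mb(max_db_connections):
    """Worst-case memory spent on physical backends once PgBouncer caps them."""
    return max_db_connections * CONNECTION_OVERHEAD_MB

# With max_db_connections = 200, overhead is bounded at ~2 GB (about one
# ACU's worth of RAM), even if all 5,000 logical clients connect at once.
print(pooled_overhead_mb(200))  # 2000
```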
The Migration Playbook: Provisioned RDS to Aurora Serverless v2
Migrating a terabyte-scale SaaS database from provisioned RDS PostgreSQL to Aurora PostgreSQL requires a minimal-downtime strategy. Because Aurora uses a fundamentally different storage engine (a purpose-built distributed log-structured system), you cannot simply change the instance type.
The safest approach leverages an Aurora Read Replica. AWS allows you to create an Aurora cluster that acts as a physical read replica of your existing provisioned RDS instance. Once replication catches up, you stop application writes, promote the Aurora cluster to be a standalone primary, and redirect your PgBouncer endpoints.
Here is the AWS CLI sequence to initiate the migration:
# Step 1: Create an Aurora Serverless v2 cluster as a replica of your RDS instance
aws rds create-db-cluster \
--db-cluster-identifier prod-aurora-migration \
--engine aurora-postgresql \
--engine-version 15.4 \
--replication-source-identifier arn:aws:rds:eu-west-1:123456789012:db:your-prod-rds-instance \
--serverless-v2-scaling-configuration MinCapacity=2,MaxCapacity=32
# Step 2: Add a serverless instance to the new cluster to handle replication processing
aws rds create-db-instance \
--db-instance-identifier prod-aurora-migration-instance \
--db-cluster-identifier prod-aurora-migration \
--engine aurora-postgresql \
--instance-class db.serverless
# Wait for replication lag to reach zero (monitor via CloudWatch).
# Step 3: During your maintenance window, stop writes to your app, then promote:
aws rds promote-read-replica-db-cluster \
--db-cluster-identifier prod-aurora-migration
When our DevOps team handles database migrations through our DevOps and cloud deployment services, we automate this cutover process. Once the promotion completes, we import the new cluster into Terraform state and safely decommission the legacy provisioned instance.
Cost-Benefit Analysis: When Does Serverless v2 Save Money?
Switching to Aurora Serverless v2 is an architectural upgrade, but it is also a financial decision that requires a rigorous cost-benefit analysis.
Currently, Serverless v2 costs approximately $0.12 per ACU hour in the us-east-1 region.
Compare this to a provisioned db.r6g.4xlarge (16 vCPU, 128 GiB RAM), which costs roughly $1.18 per hour (~$850/month).
If you provision Serverless v2 to run constantly at 16 ACUs (roughly equivalent memory), you will pay 16 * $0.12 = $1.92 per hour (~$1,380/month). If your workload never scales down, Serverless v2 is nearly 60% more expensive.
The savings only materialize if your database scales down significantly during off-peak hours. If your SaaS app runs at an average of 4 ACUs for 16 hours a day, and 16 ACUs for 8 hours a day, your blended cost is ((4 * 0.12 * 16) + (16 * 0.12 * 8)) * 30 = $691/month. In this scenario, you save roughly 20% on your AWS bill while gaining the ability to burst to much higher capacities (e.g., 32 ACUs) during unexpected spikes without manual intervention. Balancing these trade-offs is key to achieving predictable infrastructure pricing in the cloud.
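The blended calculation above generalizes to any daily usage profile. A small sketch, using the approximate $0.12 per ACU-hour price quoted earlier:

```python
ACU_HOURLY_USD = 0.12  # approximate us-east-1 price quoted above

def monthly_blended_cost(profile, days=30):
    """profile: list of (acus, hours_per_day) segments covering a 24h cycle."""
    daily = sum(acus * ACU_HOURLY_USD * hours for acus, hours in profile)
    return daily * days

# 4 ACUs for 16 overnight hours, 16 ACUs for 8 business hours:
cost = monthly_blended_cost([(4, 16), (16, 8)])
print(f"${cost:.2f}")  # $691.20, vs ~$850/month for the provisioned instance
```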
To accurately model this before migrating, run this SQL query on your existing Postgres instance to profile your active session load. This helps you map your current usage to equivalent ACUs:
-- Profile active, idle, and waiting connections over time
SELECT
state,
wait_event_type,
count(*) as total_connections
FROM
pg_stat_activity
WHERE
pid <> pg_backend_pid()
AND datname = 'saas_production'
GROUP BY
state, wait_event_type
ORDER BY
total_connections DESC;
Coupling this internal Postgres connection profile with your AWS CPU metrics allows you to set the optimal min_capacity and max_capacity boundaries, preventing expensive over-scaling.
Monitoring and Fine-Tuning Scale Targets
Once you are in production on Aurora Serverless v2, your operational focus shifts from managing instances to monitoring the ACU scaling behavior.
If your minimum ACU is set too low, you might observe a drop in your BufferCacheHitRatio and elevated ReadIOPS. This happens because aggressive downscaling purges PostgreSQL's shared_buffers. When the next spike hits, the database has to read from disk instead of memory, causing latency spikes even as ACUs scale up.
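To interpret BufferCacheHitRatio, it helps to know what it measures: the share of block reads served from memory, which you can also derive yourself from the blks_hit and blks_read counters in PostgreSQL's pg_stat_activity sibling view, pg_stat_database. The ~0.99 healthy threshold in the comment below is a common rule of thumb, not an AWS limit.

```python
def buffer_cache_hit_ratio(blks_hit, blks_read):
    """Cache hit ratio from pg_stat_database counters:
    hits / (hits + disk reads). Healthy OLTP workloads
    typically sit above ~0.99."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

# After an aggressive overnight scale-down has purged shared_buffers,
# disk reads climb and the ratio sags:
print(buffer_cache_hit_ratio(blks_hit=90_000, blks_read=10_000))  # 0.9
```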
You must set up CloudWatch alarms to track your ACU utilization metrics. Here is a Terraform snippet to alert your engineering team via SNS if your cluster consistently hits its maximum capacity, indicating that you need to raise the ceiling:
resource "aws_cloudwatch_metric_alarm" "acu_max_utilization" {
alarm_name = "aurora-serverless-max-capacity-reached"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "2"
metric_name = "ServerlessDatabaseCapacity"
namespace = "AWS/RDS"
period = "300"
statistic = "Maximum"
# Trigger if we hit our max_capacity of 32 ACUs
threshold = "32.0"
alarm_description = "Aurora cluster has maxed out its ACU limits. App may degrade."
dimensions = {
DBClusterIdentifier = aws_rds_cluster.saas_aurora_cluster.id
}
alarm_actions = [aws_sns_topic.db_alerts.arn]
}
Aurora Serverless v2 is a game-changer for B2B SaaS workloads. By eliminating the necessity to pay for peak capacity 24/7 and removing the operational burden of manual instance scaling, your engineering team can focus on shipping features rather than babysitting the database. The combination of PgBouncer for traffic control, well-calibrated ACU boundaries, and proactive CloudWatch monitoring ensures a resilient, cost-effective data tier.
Need help building this in production?
SoftwareCrafting is a full-stack dev agency — we ship fast, scalable React, Next.js, Node.js, React Native & Flutter apps for global clients.
Get a Free Consultation
Frequently Asked Questions
How do I know if my PostgreSQL workload will actually save money on Aurora Serverless v2?
Aurora Serverless v2 is cost-effective primarily for "spiky" workloads, such as B2B SaaS applications with distinct 9-to-5 traffic peaks. If your database runs at a consistently high CPU utilization (e.g., 80%) 24/7, migrating to Serverless v2 will likely increase your AWS bill. You should analyze your CloudWatch metrics to ensure there is a significant delta between peak and trough usage before migrating.
Does Aurora Serverless v2 suffer from cold starts or scaling downtime?
Unlike its v1 predecessor, Aurora Serverless v2 scales compute capacity in place in milliseconds. It does not suffer from cold starts or scaling pauses, making it ideal for handling sudden morning connection storms without dropping tenant requests or requiring a reboot.
Why shouldn't I set my minimum Aurora Capacity Units (ACUs) to 0.5 to maximize savings?
Setting your minimum ACU too low can lead to buffer pool thrashing when your database scales up under sudden, heavy read loads. Because PostgreSQL drops cached data from shared_buffers when scaling down memory, it is safer to set a baseline (like 2 ACUs) that keeps your hot data in memory overnight.
What if I need help migrating my production PostgreSQL database without disrupting tenants?
Migrating a high-traffic multi-tenant database requires careful planning around connection pooling, capacity baselines, and failover states. The cloud architecture team at SoftwareCrafting offers specialized DevOps services to execute your Aurora Serverless v2 migration safely, ensuring your traffic spikes are handled without downtime.
How do I ensure high availability when using Aurora Serverless v2?
To maintain high availability, you should deploy at least one Serverless v2 read replica in a different Availability Zone. Because Aurora read replicas share the underlying distributed storage volume with the primary instance, the replica can be promoted to the primary writer in seconds if a failure occurs.
Can SoftwareCrafting help optimize our Terraform configurations for Aurora Serverless?
Yes, SoftwareCrafting provides expert Infrastructure as Code consulting to ensure your Aurora Serverless v2 clusters are provisioned correctly. We can help you define the optimal minimum and maximum ACU limits in your Terraform scripts to maximize cost savings while preventing performance degradation.
📎 Full Code on GitHub Gist: The complete commands-1.sh from this post is available as a standalone GitHub Gist — copy, fork, or embed it directly.
