TL;DR: Replace the legacy Kubernetes Cluster Autoscaler with Karpenter to dynamically provision highly discounted AWS Spot Instances for memory-heavy Node.js microservices. By configuring multi-architecture NodePools and utilizing SQS for 2-minute Spot Interruption Warnings, you can achieve a 60% reduction in EKS compute costs with zero downtime.
⚡ Key Takeaways
- Query your cluster with `kubectl get nodes -l eks.amazonaws.com/capacityType=ON_DEMAND` to identify wasted compute and poor bin-packing in legacy static ASGs.
- Deploy Karpenter `v0.32.1` via Helm and set `settings.interruptionQueueName` to monitor an SQS queue for AWS EventBridge Spot Interruption Warnings, granting a 2-minute window to safely drain pods.
- Define a `NodePool` resource with `karpenter.sh/capacity-type: ["spot"]` to force just-in-time provisioning of discounted Spot instances instead of On-Demand compute.
- Maximize Spot capacity pool availability by allowing both `amd64` and `arm64` architectures alongside a broad mix of instance families (e.g., `c5`, `m6g`, `t4g`).
- Enable automated cost savings by setting `consolidationPolicy: WhenUnderutilized` in your NodePool's `disruption` block so Karpenter aggressively terminates and bin-packs underutilized nodes.
If your AWS EKS bill is scaling linearly with your traffic, you are likely bleeding capital on idle compute. You have dozens of Node.js microservices deployed, but to handle traffic spikes and ensure high availability, your engineering team has consistently over-provisioned On-Demand EC2 instances.
The traditional Kubernetes Cluster Autoscaler (CAS) is failing you. It operates by adjusting the desired capacity of AWS Auto Scaling Groups (ASGs). Because ASGs are statically configured to specific instance families, CAS cannot dynamically provision the cheapest or best-fitting compute for your workloads. If a single Node.js pod requires 500m of CPU and the only available ASG spins up m5.xlarge instances, you are paying for an entire instance just to host a tiny fraction of a workload.
We fix this by removing the Cluster Autoscaler, bypassing ASGs, and introducing Karpenter alongside Spot Instances and Pod Disruption Budgets (PDB).
This architecture allows you to run production Node.js workloads on highly volatile, 70% discounted Spot instances with strictly zero downtime. Here is the exact technical blueprint we use when engineering high-performance AWS DevOps and Kubernetes deployment architectures for enterprise clients.
The Bottleneck of Legacy Cluster Autoscaling
The legacy Cluster Autoscaler evaluates pending pods and blindly increments your ASG size. It has zero awareness of real-time AWS EC2 pricing, Spot capacity pools, or optimal bin-packing strategies.
You can instantly identify how much waste exists in your current infrastructure by querying your rigid On-Demand EKS nodes and comparing their resource allocation to their age.
# Check the age and instance type of your On-Demand nodes
kubectl get nodes -l eks.amazonaws.com/capacityType=ON_DEMAND \
  -o custom-columns=NAME:.metadata.name,INSTANCE:.metadata.labels."node\.kubernetes\.io/instance-type",AGE:.metadata.creationTimestamp
# View allocated resources versus total node capacity
kubectl describe nodes | grep -A 2 -e "^Allocated resources"
If your nodes have been running for weeks and are sitting at 30% memory utilization, you are throwing money away. Node.js microservices are naturally memory-heavy and single-threaded. When you horizontally scale them, you need tightly packed, dynamically provisioned compute—not static ASGs.
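For a live view of the same waste, `kubectl top nodes` prints current CPU and memory utilization per node. This assumes the metrics-server addon is installed in your cluster:

```shell
# Live CPU/memory utilization per node (requires metrics-server)
kubectl top nodes

# Compare against what pods have actually requested on each node
kubectl describe nodes | grep -A 2 -e "^Allocated resources"
```

Nodes showing single-digit CPU percentages week after week are the first candidates for consolidation.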
Deploying Karpenter for Just-in-Time Compute
Karpenter is an open-source, high-performance node provisioning project that observes unschedulable pods, directly evaluates the AWS EC2 Fleet API, and provisions the exact instance type needed, down to the specific CPU and memory requirements. Scheduling decisions take milliseconds, and new nodes typically join the cluster in well under a minute.
To install Karpenter, you first establish OIDC trust and create an IAM Role allowing Karpenter to make EC2 API calls. Once the IAM roles are mapped, deploy Karpenter via Helm:
# Upgrade or install Karpenter using the AWS OCI registry
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version v0.32.1 \
  --namespace karpenter \
  --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueueName=${CLUSTER_NAME} \
  --wait
Production Note: The `interruptionQueueName` setting is critical for running Spot instances safely. It instructs Karpenter to monitor an AWS SQS queue subscribed to AWS EventBridge rules for Spot Interruption Warnings, giving your cluster a 2-minute head start to cordon the node and drain pods before AWS forcibly terminates the instance.
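Karpenter's official CloudFormation template normally provisions this queue and its EventBridge wiring for you. If you build it by hand, the hedged sketch below shows the core pieces; the rule name and the `AWS_REGION`/`AWS_ACCOUNT_ID` variables are assumptions, and the queue additionally needs an access policy allowing `events.amazonaws.com` to send messages:

```shell
# Queue name must match settings.interruptionQueueName (here: the cluster name)
aws sqs create-queue \
  --queue-name "${CLUSTER_NAME}" \
  --attributes '{"MessageRetentionPeriod":"300"}'

# Route 2-minute Spot Interruption Warnings from EventBridge into the queue
aws events put-rule \
  --name "${CLUSTER_NAME}-spot-interruption" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'

aws events put-targets \
  --rule "${CLUSTER_NAME}-spot-interruption" \
  --targets "Id"="1","Arn"="arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:${CLUSTER_NAME}"
```

The full template also forwards rebalance recommendations and instance state-change events, which are worth wiring up for the same reason.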
Architecting Spot-First NodePools
With Karpenter installed, we configure a NodePool (formerly Provisioner). A NodePool defines the constraints and rules for the compute Karpenter is allowed to purchase.
To slash costs by 60%, we configure a Spot-first NodePool. By providing a broad array of instance families (e.g., c5, m5, r5, and their ARM64 Graviton equivalents), Karpenter can query AWS for the deepest, cheapest Spot capacity pools available at any given second.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-compute
spec:
  template:
    spec:
      requirements:
        # Heavily prefer Spot instances
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Allow multi-architecture flexibility for maximum availability
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # Provide a broad list of acceptable instance families
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5", "c6g", "m5", "m6g", "r5", "r6g", "t3", "t4g"]
      # Reference the EC2NodeClass that defines AMI, subnets, and security groups
      nodeClassRef:
        name: default
  # Disrupt underutilized nodes to continuously consolidate workloads
  disruption:
    consolidationPolicy: WhenUnderutilized
Supplying both amd64 and arm64 architectures drastically reduces the risk of an EC2 Spot stockout. If m5 Spot instances run out in us-east-1a, Karpenter instantly pivots and provisions m6g (Graviton) instances instead.
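One dependency worth calling out: in the v1beta1 API, every NodePool must point at an `EC2NodeClass` via `spec.template.spec.nodeClassRef`, which tells Karpenter which AMI family, subnets, and security groups to launch into. Here is a minimal sketch, assuming your VPC resources are tagged with the common `karpenter.sh/discovery` convention and your node IAM role follows the default naming from the Karpenter getting-started guide:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  # Assumed role name — match whatever node role your IAM setup created
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
```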
Tip: Ensure your Docker images are built using multi-architecture manifests (`docker buildx`) so your Node.js apps can run natively on both Intel and Graviton processors without crashing.
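A hedged example of such a build, assuming a Dockerfile in the current directory and reusing the illustrative registry and tag from the Deployment manifest later in this post:

```shell
# Create (or reuse) a buildx builder that can target multiple platforms
docker buildx create --name multiarch --use 2>/dev/null || docker buildx use multiarch

# Build and push a single manifest covering both Intel and Graviton nodes
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t my-registry/auth-service:v1.2.3 \
  --push .
```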
Graceful Termination: Handling Spot Interruptions in Node.js
Because we are running on Spot instances, AWS will eventually reclaim our nodes. When that happens, AWS fires a 2-minute interruption notice. Karpenter intercepts the notice, cordons the node, and begins evicting pods; the kubelet then sends a SIGTERM signal to each running container.
If your Node.js microservice ignores SIGTERM, Kubernetes will forcefully kill it after the termination grace period expires. This results in dropped database transactions and 502 Bad Gateway errors for your users.
Your Node.js Express server must intercept the signal, stop accepting new requests, finish processing in-flight requests, and cleanly close database pools.
import express from 'express';
import { dbPool } from './database.js';

const app = express();
const PORT = process.env.PORT || 3000;

app.get('/health', (req, res) => res.status(200).send('OK'));

const server = app.listen(PORT, () => {
  console.log(`Node.js microservice listening on port ${PORT}`);
});

// Graceful shutdown sequence
const shutdown = async (signal) => {
  console.log(`\n${signal} received. Initiating graceful shutdown...`);
  server.close(async (err) => {
    if (err) {
      console.error('Error closing HTTP server:', err);
      process.exit(1);
    }
    console.log('HTTP server closed. No longer accepting new connections.');
    try {
      await dbPool.end(); // Cleanly close PostgreSQL/MySQL connections
      console.log('Database connections closed.');
      process.exit(0);
    } catch (dbErr) {
      console.error('Error closing database connections:', dbErr);
      process.exit(1);
    }
  });
};

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
For your Node.js application to receive the SIGTERM signal at all, the signal must actually reach the node process. If you start your app with npm start in your Dockerfile, npm sits between the container runtime and your code and does not reliably forward SIGTERM, so your app is killed abruptly when the grace period expires.

Run a lightweight init system like dumb-init as PID 1 in your Dockerfile to properly forward OS signals:
FROM node:20-alpine
RUN apk add --no-cache dumb-init
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# dumb-init forwards signals directly to the Node process
CMD ["dumb-init", "node", "server.js"]
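You can verify the signal path locally before shipping. Build and run the container, stop it (which sends SIGTERM and waits before escalating to SIGKILL), and confirm the graceful-shutdown log lines appear; the image and container names here are illustrative:

```shell
docker build -t auth-service:test .
docker run -d --name sigterm-test auth-service:test

# docker stop sends SIGTERM, then SIGKILL after a 10s default timeout
docker stop sigterm-test

# Expect the "graceful shutdown" and "connections closed" messages
docker logs sigterm-test | tail -n 5
docker rm sigterm-test
```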
Protecting Uptime with Pod Disruption Budgets
A graceful shutdown handles in-flight requests, but what if Karpenter provisions a massive r5.4xlarge Spot instance, and the Kubernetes scheduler packs all 10 replicas of your authentication service onto that single node?
If that node gets interrupted, your entire authentication layer goes offline simultaneously.
To prevent this, you must explicitly enforce a Pod Disruption Budget (PDB). A PDB guarantees that a minimum number (or percentage) of pods remains available during voluntary disruptions, such as Karpenter draining a node.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: auth-service-pdb
  namespace: microservices
spec:
  # Guarantee that at least 50% of replicas are always alive
  minAvailable: 50%
  selector:
    matchLabels:
      app: auth-service
When Karpenter attempts to drain a node following a Spot Interruption warning, the Kubernetes eviction API checks the PDB. If draining a pod violates the PDB (e.g., dropping available replicas below 50%), the API temporarily rejects the drain. This buys Karpenter the time needed to spin up a new Spot instance and migrate pods over systematically, ensuring zero downtime.
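It is worth verifying the budget before trusting it in production; an `ALLOWED DISRUPTIONS` value of 0 means every voluntary drain will stall until more replicas become healthy:

```shell
# Inspect the live state of the budget
kubectl get pdb auth-service-pdb -n microservices

# List the pods a drain would evict, without actually evicting anything
kubectl drain <node-name> --dry-run=client --ignore-daemonsets
```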
Enforcing High Availability with Topology Spread Constraints
To solve the underlying issue of all pods landing on a single node in the first place, we must force the Kubernetes scheduler to distribute pods across different nodes and Availability Zones.
This is achieved using Topology Spread Constraints. By embedding these rules into your Node.js Deployment manifest, you force Karpenter to provision multiple smaller Spot instances distributed across different zones, rather than one monolithic instance.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  replicas: 6
  selector:
    matchLabels:
      app: auth-service
  template:
    metadata:
      labels:
        app: auth-service
    spec:
      topologySpreadConstraints:
        # Do not allow more than 1 pod difference across nodes
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: auth-service
        # Spread evenly across Availability Zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: auth-service
      containers:
        - name: auth-node
          image: my-registry/auth-service:v1.2.3
With this configuration, if us-east-1a loses its Spot capacity, your instances in us-east-1b and us-east-1c seamlessly absorb the traffic while Karpenter replaces the lost node in the background.
Right-Sizing Node.js Resource Requests
Karpenter's superpower is exact bin-packing. It aggregates the CPU and Memory requests of all pending pods to calculate the precise EC2 instance needed. If your developers omit resource requests or copy-paste arbitrary limits, Karpenter's logic breaks down, and you will end up provisioning the wrong instances.
Node.js is single-threaded. Assigning a CPU limit can lead to catastrophic CPU throttling during V8 garbage collection spikes. The best practice for FinOps in EKS is to set precise memory requests/limits and CPU requests, but omit CPU limits entirely.
resources:
  requests:
    # Tell Karpenter exactly what we need to schedule
    cpu: "250m"
    memory: "512Mi"
  limits:
    # Prevent memory leaks from taking down the node
    memory: "1Gi"
    # NO CPU LIMIT - prevents artificial CFS throttling
When you define exactly 250m of CPU, Karpenter knows it can safely pack roughly 15 of these Node.js microservices onto a single t3.xlarge instance (which has 4 vCPUs), after accounting for system-reserved capacity.
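The arithmetic behind that packing claim is worth making explicit. A minimal sketch — the 250m system-reserved figure is an illustrative assumption, since actual kubelet and OS reservations vary by AMI and instance size:

```javascript
// How many 250m Node.js pods fit on a t3.xlarge (4 vCPUs = 4000m)?
const nodeMillicores = 4 * 1000;  // t3.xlarge vCPU capacity in millicores
const systemReserved = 250;       // assumed kubelet/OS overhead (varies per node)
const podCpuRequest = 250;        // the CPU request we set per replica

const allocatable = nodeMillicores - systemReserved;          // 3750m
const podsPerNode = Math.floor(allocatable / podCpuRequest);
console.log(podsPerNode); // 15
```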
Measuring the FinOps Impact
When clients evaluate the cost to re-architect their cloud infrastructure, we point to the immediate, measurable ROI of this exact Karpenter and Spot instance stack.
By tearing down static ASGs and transitioning from On-Demand m5.large instances to aggressively bin-packed c6g and t4g Spot instances, you will typically observe a 60% drop in your EC2 compute spend within the first 72 hours.
You can monitor Karpenter's active consolidation via its native logs:
# Watch Karpenter actively terminate underutilized nodes to save money
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter | grep -i "consolidat"
You will see real-time logs where Karpenter identifies two nodes running at 30% utilization, provisions a single smaller Spot instance, migrates the pods, and terminates the two expensive nodes. This is continuous, automated FinOps in action.
Migrating to Spot instances on Kubernetes doesn't have to mean sacrificing reliability. By layering Karpenter's intelligent provisioning, Node.js graceful termination, and rigorous Pod Disruption Budgets, you can run enterprise-grade APIs at startup prices. If you want to stop burning runway on AWS compute and need engineers who have successfully deployed this architecture at scale, book a free architecture review with our backend and DevOps experts.
Work With Us
Need help building this in production? SoftwareCrafting is a full-stack dev agency — we ship React, Next.js, Node.js, React Native, and Flutter apps for global clients.
Frequently Asked Questions
What is the main advantage of migrating from a monolith to microservices?
Microservices allow individual components of an application to be scaled, deployed, and updated independently. This reduces the risk of system-wide failures and enables different teams to work on separate services simultaneously using the best technology stack for each task.
How do microservices communicate with each other securely?
Microservices typically communicate over network protocols like HTTP/REST or gRPC, often utilizing an API Gateway to route external requests. For enhanced internal security, developers implement mutual TLS (mTLS) and use service meshes to encrypt traffic and manage authentication between internal services.
How can SoftwareCrafting help my team transition to a microservices architecture?
SoftwareCrafting provides expert architectural consulting and hands-on engineering to guide your team through the complexities of decoupling a legacy monolith. Our specialists help design scalable service boundaries, implement robust CI/CD pipelines, and ensure a seamless migration with minimal downtime.
What are the biggest challenges when managing distributed data in microservices?
Maintaining data consistency across multiple services is challenging because each microservice should ideally manage its own isolated database. Developers must often rely on eventual consistency models, distributed transaction patterns like Saga, and event-driven architectures to keep data synchronized.
How do developers monitor and debug errors across multiple independent services?
Effective debugging in a distributed system requires implementing centralized logging and distributed tracing tools like Jaeger or OpenTelemetry. These tools inject unique correlation IDs into requests, allowing developers to track a single transaction as it flows through various microservices to pinpoint failures.
Does SoftwareCrafting offer ongoing support for managing containerized microservices?
Yes, SoftwareCrafting offers comprehensive DevOps and managed infrastructure services tailored to your specific distributed architecture. We assist with Kubernetes orchestration, automated performance monitoring, and continuous optimization to ensure your microservices remain highly available and cost-effective.