Part I of the saga: how we slashed 80% of our cloud costs by dropping serverless, migrating our DBs, and fully embracing k8s
# Humble Beginnings: From $5 Droplets to $10k Bills
Startups are scrappy by nature. Ours was no different. A few years back we were in scaling mode: money in the bank, shipping features at breakneck speed, and still running everything on good old $5 DigitalOcean droplets.
Life was simple. Life was good.
Until it wasn’t.
## Hitting the Ceiling
As traffic grew, our humble droplets started to choke. Our duct-taped Docker Swarm stack had many problems. Infrastructure became the bottleneck. With AWS credits in hand thanks to our wonderful VCs, the “obvious” choice was clear:
- No more database babysitting
- Serverless magic that “scales forever”
- Kubernetes for everything else
We moved to AWS! And for a while? It worked. Beautifully.
- 100 requests/sec hitting our API → ~8.6M per day
- 4M jobs daily running through Amazon MQ → later migrated to SQS
- RDS cluster quietly powering through 200–300 concurrent connections on 20GB of RAM
Beautiful. Just beautiful. Well, until the first bill showed up.
## The Day the Credits Ran Out
AWS credits: gone. No more safety net. We are big boys now.
Luckily, we had been preparing for this day for some time:
- Staging? Nuked.
- Logging? Self-hosted.
- MQ? Swapped for “cheaper” SQS.
- Databases? Moved to Aurora.
And yet, even after all that slash-and-burn, the invoice still mocked us: $10,000 every month.
That noose around our neck kept tightening as our runway shrank and our traffic grew.
## The Temptation of Other Clouds
The first thought was obvious: maybe AWS was the problem. I scoped out GCP, Azure… even flirted with some of the smaller clouds. On paper:
Pros (+)
- Slightly lower bills
- Kubernetes looked nicer (especially on GCP)
- Service parity… more or less
Cons (−)
- Serverless was weaker everywhere else
- The “almost the same” services were different enough to eat hours of engineering time
For a few percentage points in savings, it wasn’t worth the detour.
Maybe BigCloud just isn’t for us.
## Following the Money
We broke down the monthly AWS bill:
- RDS (Aurora): ~40%
- Serverless: ~20%
- Kubernetes (ECR, volumes, etc): ~20%
- VPC (NAT/egress): ~15%
- Other (S3, SQS, etc): ~5%
The biggest line item was staring us in the face: databases.
So what if we just… moved that?
## Dead Ends
We explored every angle:
- PlanetScale? Nice, but missing critical features.
- Self-hosting MySQL? Madness. Setting up streaming backups alone is a nightmare.
- Other “managed MySQL” vendors? Almost as pricey as RDS, plus nasty gotchas.
It was a maze of dead ends.
Until I, half-jokingly, typed “DigitalOcean managed databases” into Google.
## Back to the Start
And there it was.
Since the scrappy $5 droplet days, DO had quietly leveled up:
- Managed databases (MySQL, Redis, Postgres)
- Managed Kubernetes
- Even serverless (barebones, but still)
Suddenly the idea of running production on DO didn’t sound so ridiculous.
## The Numbers That Made My Jaw Drop
Running the math felt like stepping into an alternate universe:
- MySQL: ~$1k on DO vs ~$4k on AWS
- Traffic: basically free on DO vs ~$1.5k on AWS
- Droplets: comparable with EC2 costs (lower in some cases for the same performance)
- Serverless: cheaper, but not powerful enough for our workload
Just the database + traffic savings would cut our bill by 55%.
Fifty. Five. Percent.
For the first time in months, we had some hope. Gears were grinding. A plan was forming.
Serverless is weak on DO, definitely unusable for our type of workload. But a 55% cost drop is really enticing. Maybe we could replace AWS Lambda completely?
## Replacing AWS Lambda
This stumped me for a while. We were using AWS Lambda to process SQS messages. Lots of them. The instant scaling was a godsend. How could we replace that?
But we did notice some things during our time with AWS Lambda, partly because we used it in a slightly weird way. We ran custom Docker images of RoadRunner + PHP so we could take full advantage of the extra worker processes we could jam into each Lambda as we scaled its RAM (which also adds CPU cores). Definitely not a normal thing to do, but it helped us process more messages with the same resources.
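To make the “more RAM buys more workers” idea concrete, here is a minimal sketch of the kind of RoadRunner pool sizing involved. Treat it as an illustration rather than our actual config: `worker.php` and the pool size are placeholders, the jobs plugin is used here purely as an example, and the exact keys vary between RoadRunner versions.

```yaml
# .rr.yaml (illustrative sketch only, not a production config)
version: "3"

server:
  # the PHP script RoadRunner forks into a pool of long-lived workers
  command: "php worker.php"

jobs:
  pool:
    # more Lambda memory => more vCPUs => a bigger pool of PHP workers
    # handling messages in parallel with the same function footprint
    num_workers: 6
```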
Here are the things we noticed with our setup:
- A Lambda would process messages beautifully, and after 15 minutes it would just stop. Mind you, not after 15 minutes of processing the same message: 15 minutes from when it first started up to process messages (which lines up with Lambda’s 15-minute maximum invocation duration).
- Since we used custom Lambda Docker images, we basically always had cold starts. Not a huge problem for us, but still annoying.
- We were starting to hit our account’s concurrent Lambda limit and had to request increases.
- For some reason we couldn’t use UDP logging. To this day I don’t know why we couldn’t send UDP logs to our Logstash instance. TCP worked, UDP didn’t. The setup looked correct. It kinda slowed things down a bit.
Kind of annoying problems to have, but not big issues.
Could we maybe use the Kubernetes features of horizontal pod autoscaling + node autoscaling, which were kinda wonky back when we first moved to AWS?
Yes. We could move the Lambda processing completely to Kubernetes. More on that in Part III.
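As a teaser, here is a minimal sketch of the shape that takes, assuming a hypothetical `sqs-worker` Deployment running the same RoadRunner + PHP image: a standard `autoscaling/v2` HorizontalPodAutoscaler scales the pods on CPU, and the cluster autoscaler adds nodes when the pods no longer fit. The names and numbers are placeholders, not our production manifests.

```yaml
# hpa.yaml (minimal sketch; "sqs-worker" is a hypothetical Deployment name)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sqs-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sqs-worker        # the Deployment consuming SQS messages
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when the workers get CPU-bound
```

Scaling on CPU is the simplest option for CPU-bound message processing; scaling on queue depth via external metrics is another common route. What we actually ran is covered in Part III.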
## What’s Next
Of course, moving databases is no small feat. And replacing AWS Lambda? Even harder.
But the cost savings changed the equation: we had to do it.
This is Part I of our saga leaving AWS behind.
- Part II: the blood, sweat and caffeine of migrating from RDS Aurora to DO managed MySQL.
- Part III: how we replaced serverless without burning everything down.
Stay tuned.