Hi, here's how the story goes.
We started migrating a monolithic, single-machine e-commerce application (Apache/PHP) to cloud infrastructure. Originally, the application and the database (MySQL) were on the same machine.
We decided to move to AWS. As the first step of the migration, we split the database from the application: the application is hosted on a c4.xlarge EC2 instance, and the database on RDS Aurora MySQL on a db.r5.large instance with default options.
This setup performed well. Database performance in particular improved considerably.
Unfortunately, when traffic spiked, we started experiencing long response times. It looked like RDS, although really fast at executing queries, wasn't returning results fast enough over the network to the EC2 machine.
That was our conclusion after an in-depth analysis of the setup, including the Apache/MySQL/PHP tuning parameters. The delayed response time was definitely due to network latency between the EC2 and RDS/Aurora machines, even though both are in the same region.
Before adding additional resources (e.g. ElastiCache), we'd first like to look into any default configuration we can adjust to solve this problem.
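For anyone who wants to put a number on that latency, here is a minimal sketch that times TCP connects from the application host to the database port. Each connect costs roughly one network round trip, so the minimum and median give a rough idea of the EC2-to-RDS round-trip time. The function names and the endpoint in the commented usage line are hypothetical placeholders, not anything from our setup.

```python
import socket
import statistics
import time


def measure_connect_rtt(host, port, samples=5, timeout=2.0):
    """Return a list of TCP connect times in milliseconds."""
    # Resolve the name once up front so DNS lookup time is not counted.
    address = socket.gethostbyname(host)
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        # The TCP handshake takes about one network round trip.
        with socket.create_connection((address, port), timeout=timeout):
            pass
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Reduce raw timings to min / median / max in milliseconds."""
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


# Hypothetical usage, run from the EC2 instance (placeholder endpoint):
# print(summarize(measure_connect_rtt("mydb.cluster-example.rds.amazonaws.com", 3306)))
```

This only measures connection setup, not query execution, so it is a lower bound on what each query pays in network time.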
What do you think we missed there?
If you are using the Aurora Serverless option without enough initial compute capacity, it could end up constantly scaling up and down, which would cause latency issues. I dealt with a client in that situation, and our solution was to move away from serverless to an EC2-based implementation with fixed resources adequate for the required load.
I've handled absolutely incredible burst traffic with RDS/EC2. I have two questions:
1. Have you enabled the RDS slow query log and the logging of queries that don't use indexes, to spot problematic queries in your design?
2.1 Are your RDS and EC2 instances in the same availability zone?
2.2 In the same VPC?
If you're convinced it's network related, points 2.1 and 2.2 are crucial: for optimal performance you MUST have your EC2 instance and your RDS instance in the same VPC, connecting over internal connections.
Hi Dleblanc. Yes, they're in the same VPC. But I'm not familiar with what you call an internal connection. Do you mean AWS PrivateLink? The RDS security group is already configured to block all outside traffic except from the EC2 instance. The RDS endpoint I am using is the one provided in the RDS console, the DNS-based hostname. That's why I suspect the connection isn't private and local. Can you suggest further reading about the internal connection?
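One way to check this is to resolve the endpoint hostname from the EC2 instance and see whether it returns a private address. As far as I understand, an RDS endpoint resolved from inside the same VPC normally returns the instance's private IP, in which case traffic stays inside the VPC and PrivateLink isn't needed. A minimal sketch, where the helper names and the hostname in the commented usage line are hypothetical:

```python
import ipaddress
import socket


def is_private_address(ip_text):
    """True if the address is in a private range (e.g. 10.0.0.0/8, 172.16.0.0/12)."""
    return ipaddress.ip_address(ip_text).is_private


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def endpoint_resolves_privately(hostname):
    """True if every resolved address is private."""
    addresses = resolve_ipv4(hostname)
    return bool(addresses) and all(is_private_address(a) for a in addresses)


# Hypothetical usage, run from the EC2 instance (placeholder endpoint):
# print(endpoint_resolves_privately("mydb.cluster-example.rds.amazonaws.com"))
```

If this returns False from inside the VPC, that would suggest the traffic is leaving over a public address, which is worth fixing before anything else.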
Thanks Alexandre. Yes to all of these points. The difference in latency is most noticeable with small queries run in large numbers. I know I could optimize with a better query approach, but the question here is the network latency.
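To illustrate why many small queries are hit hardest, here is a rough back-of-the-envelope model. It assumes queries run one after another and each one pays a full round trip plus its execution time. All the numbers are made up for illustration, not measurements from our system.

```python
def total_wait_us(query_count, round_trip_us, execution_us):
    """Total time in microseconds for sequential queries, each paying one round trip."""
    return query_count * (round_trip_us + execution_us)


# Assumed figures: 200 small queries per page, 200 us execution time each.
QUERIES_PER_PAGE = 200
EXECUTION_US = 200

# Database on the same machine: assume about 50 us per round trip.
same_machine_us = total_wait_us(QUERIES_PER_PAGE, 50, EXECUTION_US)

# Database over the network: assume about 500 us per round trip.
over_network_us = total_wait_us(QUERIES_PER_PAGE, 500, EXECUTION_US)

print(same_machine_us / 1000, "ms on one machine")
print(over_network_us / 1000, "ms over the network")
```

Under these assumptions the page spends 50 ms waiting on the database locally and 140 ms over the network, even though the queries themselves are no slower. The extra time scales with the number of queries, not with how heavy they are.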