Database Scaling Strategies for Startups

Captions
Most people call me Hoots. At Starbucks, for some reason, they decided to call me Steve — I guess that's what you get for ordering pumpkin spice lattes.

First of all, what is Felix Gray? Some people don't know what that is. We're essentially an eyewear company that sells glasses that mitigate the negative effects of looking at your screen. We filter the blue light that comes from your devices, which makes it easier to work on a screen all day — no blurry vision, headaches, or sleep disturbances — and they're actually really cool-looking to wear.

I'm going to talk to you today about why we decided to move to Aurora: the story behind how that all happened, the troubles we went through, the lessons we learned handling failovers, figuring out what type of instances you need, and finally optimizing and making our infrastructure more reliable.

We decided to go to Aurora because it was, by default, much faster than MySQL. As we started to scale up, we started to see our queries slow down, so we looked at other solutions that would mitigate that. We looked at going to NoSQL, but our data wasn't exactly lookup-oriented — it was very structured — so Aurora seemed like the next best option. It's also very nice with data sizes: you can provision pretty much any size you want, and the database storage grows in 10-gigabyte increments.

My favorite feature right now is Backtrack. If you're doing testing, you can just roll the database back by X number of hours, to the state it was originally in — run your tests, roll back, and you're back where you were. You don't have to go to a snapshot, reload it, and then wait another 40 minutes for that instance to spin up. Automatic snapshots — RDS has that by default, but it's still really cool. Zero-downtime patching is also really cool; the asterisk there is because it's not technically zero-downtime patching, as I learned. It does go down for a few seconds, depending on what it's doing: it tries to find a window where it can patch the system, takes the system down for about two to three seconds, brings it back up, and holds your queries while that's happening.

It's also extremely cost-effective, especially if you're using reserved instances. That's my main point: once you figure out the size you need, make sure you use reserved instances, because that saves you about 45% on your AWS costs.

Two new things just came out. Serverless is really cool if you're trying to save money and you have an application that doesn't need to run 24 hours a day. Our website needs to run every second, so Serverless doesn't make much sense for us, but if you're running an API that people hit every 10 minutes or so, it probably makes sense to try a Serverless configuration. Parallel Query is really cool too: if you're doing a lot of data analysis, it can help with long-running queries, so you don't need a Redshift instance — it makes it a lot easier to just spin up a read replica and run parallel queries.

So, migrating to Aurora — step one. I'm going to go through and show you how easy this is. It only takes about six steps and about five minutes, normally. Except that when I did this about a year ago, Amazon decided to change their user interface at the exact point of my migration, and I was doing this at 3:00 a.m.
When I got up to this step and clicked "create instance," nothing happened. I waited about 10 minutes, pressed refresh, and realized nothing was still happening. So I called up AWS support, and something interesting happens when you call AWS support at 3:00 a.m.: someone in Australia picks up. The dude was like, "G'day, mate," and I was like, no, it is not a good day, I'm talking to you right now. We went through it, and he told me I'd have to go through the terminal to actually do the migration. Unlike most developers — I know this is different for most people — I hate the terminal. I'm probably one of the only people who uses GitHub Desktop, and my employees give me a lot of flak for that.

Once you get through that, you'll see it start creating the instance for you, and now we wait — it takes about 30 to 35 minutes. The good part is that while that's happening, your master is still up, so your application can keep running; you don't have to worry about taking your application down, or about downtime. The key factor here: make sure the replica lag is zero. You'll see it if you look in the CloudWatch section — when it hits zero, your replica has caught up to the master, with the same data, the same rows.

Step four: select your master instance, and now you can stop it. Step five: once it's stopped, promote the read replica — the button right over here — and that will make it the master. Now, a few caveats. Make sure you go into your application and change the endpoint to the Aurora cluster's writer endpoint, not an individual Aurora instance.
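To make that concrete, here is a tiny sketch, with made-up hostnames, of the difference between the two endpoint types (the real values come from the RDS console):

```ruby
# Hypothetical endpoint names for illustration only.
# The cluster writer endpoint is a CNAME that Aurora repoints at the new
# writer after a failover; an instance endpoint stays pinned to one box.
CLUSTER_WRITER_ENDPOINT = "mydb.cluster-abc123xyz.us-east-1.rds.amazonaws.com"
INSTANCE_ENDPOINT       = "mydb-instance-1.abc123xyz.us-east-1.rds.amazonaws.com"

# The host your application should be configured with:
def db_host
  CLUSTER_WRITER_ENDPOINT
end
```

In a Rails app, this would be the `host:` value in `config/database.yml`.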
You don't want to use the individual instance endpoints, because if there's a failover — as I learned the hard way — the endpoint will not change for you. Use the writer's cluster endpoint; you can see it in the console menus.

On handling failovers: the most important thing is to put your instances in multiple Availability Zones and have at least one or two read replicas in each zone. You can actually go multi-region now, so it's a good idea to have another region if your budget allows. And test this: promote one of your read replicas to master and see what happens, so you're ready for a real failover. Another thing I learned the hard way is that most drivers, by default, don't support the Aurora failover. The reason is that the CNAME of the database is still pointing to the old master instance. If you're doing Rails, for example, there's a patch you can use that checks the connections and figures out what to do next; you'll get about five seconds of downtime, but you'll definitely be back up and running. If you're using Go, I believe they actually fixed the issue, so you're all set. Node.js — I'm not sure they've fixed it yet; that's still an issue there as well.

Next, you want to figure out the instance size you're going to use, because that's the best way of figuring out what everything will cost. Think about the number of connections your database gets from your application. Most applications in production environments have a connection pool, so figure out your connection pool size. For example, if you have four application servers, each with a connection pool size of 15, how many connections do you think you'd have? The answer is simple math: 60. So here's the AWS formula for figuring out the instance size based on the number of
connections you need — just kidding. They actually have a nice table; you can Google it. It takes about 30 minutes of Googling to find, but it's there. You'll see that t2s only support a very limited number of connections. You can go into the parameter groups and change the max connections if you really need to, but I really don't recommend it — those defaults are actually good numbers for the load the database can support.

That brings me to my next point: always use R-class instances in production. They cost a little more, but they'll definitely hold up if your application starts to scale rapidly. I was being very frugal, let's say, and decided to stick with t2 in production for a while. What happened is that as new instances started to auto-scale, they started to fail. Before I even got the CloudWatch alerts, customer service started yelling and screaming that the site was down and that they could go home now — so I'd already had two heart attacks and a stroke by the time the CloudWatch alerts came in. What actually happened is that our application auto-scaled to ten EC2 instances, and the connection pool of each instance was about ten. A t2 just doesn't support that, as you saw earlier. The other problem was that some of our pages took about twenty to thirty SQL queries just to load. That is not a good way to run a production-scale application. It's something I wasn't really paying attention to — I'd sort of seen it, but it was "all right, we'll deal with it later," until later came along and we had about two to three thousand people on the site at one point.

Finally, performance bottlenecks. You're going to need to use a tool, and New Relic is a really great one: it goes through and shows you the controllers and application areas that are lagging. You'll see that for us, it lags right now in the products controller — that's the
page that loads all the glasses, and you'll see they have different colors. So how can we fix that? If you click into it, you'll see a bunch of queries that happen that are really slow. Now we have two options. If you know a query doesn't change, you should probably cache it: if you're doing something like loading a page with the same products every time, cache that query. But if you're loading a cart that someone is adding and removing things from, don't cache that query — that's a bad idea; those are queries that change. There's a little code example in Rails where we cache each individual product value, so that when the page loads, it's just a quick query to our memcached servers — it doesn't even hit our RDS instance.

The next step, if you can't cache something, is to simplify it. Here are some basic rules. First of all, every time I see a SELECT *, I get very nervous: don't use SELECT *; always restrict to the columns you really need. Then use a WHERE clause whenever possible to restrict the number of rows returned. Sometimes it's easier to get things done in multiple queries, especially if you can cache one of them: with the cart, for example, we can't cache the whole query, but we can cache the product section and do a separate query for the rest, which offloads the database. Don't use temporary tables — some people say you can use them with special caveats; I just say don't do it, it's a bad idea. And finally, create indexes on the columns you hit the most: if you have queries with ORDER BY, or any complex queries where you're constantly doing lookups, make sure to index those columns.

Beyond that, you can use a bunch of other AWS services to optimize. I just talked about caching — ElastiCache is a really great way to do quick lookups and cache your SQL queries.
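The Rails snippet from the slide isn't captured in the transcript, but the pattern it describes — fetch from the cache, fall back to the database only on a miss — can be sketched in plain Ruby. The names here are invented for illustration; in Rails this is roughly `Rails.cache.fetch` backed by memcached:

```ruby
# A toy in-memory stand-in for a memcached server.
CACHE = {}

# Pretend database call -- counts how often we actually hit the database.
DB_HITS = Hash.new(0)
def load_product_from_db(id)
  DB_HITS[id] += 1
  { id: id, name: "Product #{id}" }
end

# Fetch-or-compute: queries that don't change (product pages) go through
# the cache; queries that do change (a shopping cart) should NOT use this.
def fetch_product(id)
  CACHE.fetch(id) { CACHE[id] = load_product_from_db(id) }
end

fetch_product(1)  # first call hits the database and populates the cache
fetch_product(1)  # second call is served from the cache
```

The point of the pattern: repeated loads of an unchanging product page cost one database query total, not one per request.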
Elasticsearch: if you have, say, a search bar on your application and you're doing a lot of searching, have Elasticsearch take the information in your database and store it there — it's a faster lookup, and it doesn't need to touch your database. And if you're doing analytics stuff — if you have a data team looking through queries that run for a long, long period of time — I'd recommend using Redshift. Then, obviously, if you're storing images, don't put them in your database; that's really not the best place for them. Put them in S3 and store a link to S3 in your database.

If you were to put our infrastructure into a simple model, this is roughly how it looks: you have your EC2 instances; Elasticsearch does all the searching and grabs its data from one of the read replicas; ElastiCache sits in front of Aurora and serves all the queries that don't change over time; we store all of our images in S3 and serve them through CloudFront; and Redshift is for database analytics.

Another way you can optimize is at the edge of your application. If you're doing web stuff, or an API, there's CloudFront, and there's Cloudflare — not to be mistaken for CloudFront; it's a third-party service. It's technically free for, I don't know exactly how much, some initial amount of usage, and it doesn't cost that much to run in a production environment either — I think about $100 or $200 a month — and it manages all the security optimizations and DNS for you. It's really much easier to use than CloudFront. CloudFront is really cool too, but there's a lot of overhead in managing it and figuring out what to do and what to use.

After going through all that, you'll see that our average response time from the
beginning — when we first started looking at all these database issues — started to slowly trickle down, and now we're at about 2 to 2.5 seconds. So you can see there's a huge improvement as you start using other AWS services to offload your workload and figure out what instance size you need.

Overall, the conclusion: migrate to Aurora whenever possible; always use R-class instances in production and make sure your connections won't be limited — I know this can be a little costly at first, but it'll definitely save you a lot of heartache; and profile your application to find any bottlenecks and use AWS services to offload load from your database. Thank you. Special thanks to the AWS Loft team for having me here; the migration blog post is here, and here's a 10% discount. If anyone has any questions, now would be the time.

[Q: Do you use memcached or Redis?] We use both: memcached for the query side of things, and Redis for jobs and background processing. But it depends what you're doing. If you're doing something that's HIPAA-compliant, or you need to store sensitive information, I'd recommend Redis, because Redis you can encrypt and memcached you can't. But memcached is definitely quicker and easier to set up — it's just a few menu options — where Redis is a little more involved, especially if you're doing encryption and things like that.

[Q: How do you auto-scale the EC2 instances?] It's a combination of different things: it's based on the network traffic we get, the number of hits per second, and a third thing I don't remember exactly. If any of those trigger, we scale up the instances.

[Q: Did you guys think about using Beanstalk?] Yeah, we do use Beanstalk, actually — it's a Beanstalk config. We're actually moving to containers soon.
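The memcached-versus-Redis rule of thumb from the Q&A can be written out as a tiny helper — the method name and symbols here are mine, purely illustrative, not any library's API:

```ruby
# Redis supports encryption; memcached doesn't, so sensitive or
# HIPAA-covered data pushes you toward Redis. Otherwise memcached is
# the quicker, simpler setup for plain query caching.
def cache_store_for(sensitive_data:)
  sensitive_data ? :redis : :memcached
end

cache_store_for(sensitive_data: true)   # => :redis
cache_store_for(sensitive_data: false)  # => :memcached
```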
If you're a startup just trying to get things going, I'd recommend starting small: if you're doing a very small application, probably Lightsail; then graduate to Elastic Beanstalk; and then finally ECS. I think that's the natural progression of things, because Lightsail is really simple to set up — it takes like two seconds — EB is a little more complicated, and ECS, don't even get me started.

[Q: What made you decide to move to containers?] A few things. One is that we need a little more configurability in our setup. The second part is that EB doesn't handle all the things we need in our current infrastructure — it's hard to describe without going through our entire infrastructure. We also want to do builds and deploys easily without having to create EB config files that set up each instance every time a new instance deploys. If you start using Elastic Beanstalk, you'll realize that every time it auto-scales and adds a new instance, it's essentially rebuilding a new system for you, so you have to tell it what to do and what items it needs — it turns into a configuration nightmare.

[Q: How do you sync the data between MySQL or the Aurora cluster and Redshift?] That's a great question. Long story short, there's no easy way of doing it. Actually, they just released a tool — I think it's called AWS Glue — which goes through and takes all your data, puts it in S3 for you, then loads it into Redshift. There are a bunch of third-party services that do this for you for a small fee, but there's no direct way of taking data from RDS and putting it into Redshift. That's why I was really looking into Aurora Parallel Query: it's kind of a Redshift-type thing, in that it lets you do a lot of the things Redshift is supposed to do for you.

[Q: How do you deal with security?] Well, I encrypt everything — that's how I deal with it, by default.
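"Encrypt everything" can mean several layers. As one illustration, here's application-level encryption of a single sensitive value with Ruby's standard OpenSSL library before it would ever be written to a database column — a sketch only (not Rails' built-in encryption), with the value made up and key management omitted:

```ruby
require "openssl"

# AES-256-GCM: authenticated encryption of one column value.
# In production the key comes from a secrets manager, not generated inline.
cipher = OpenSSL::Cipher.new("aes-256-gcm")
cipher.encrypt
key = cipher.random_key
iv  = cipher.random_iv
ciphertext = cipher.update("patient@example.com") + cipher.final  # made-up sensitive value
tag = cipher.auth_tag  # store alongside the ciphertext

# Decrypt (e.g., when reading the row back):
decipher = OpenSSL::Cipher.new("aes-256-gcm")
decipher.decrypt
decipher.key = key
decipher.iv  = iv
decipher.auth_tag = tag
plaintext = decipher.update(ciphertext) + decipher.final
```

GCM mode also authenticates the data: a tampered ciphertext or wrong tag raises on `final` instead of silently returning garbage.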
You don't have to encrypt everything by default, but I highly recommend it. And make sure the connection from your application to RDS is encrypted and uses the AWS certificates.

[AWS host] I just want to jump in here for a second. If you need to get data from an RDS instance or Aurora into Redshift, there is an AWS service you can use if it fits your circumstances: the Database Migration Service will actually pull data out of RDS instances, and Redshift is an actual target of the service. [Speaker] You learn something new every day.

[Q: How do you choose between ElastiCache and the database's built-in caching?] Well, we chose ElastiCache because it was the easiest thing to implement with Rails — you can just load a library and it does everything for you. You can use stored procedures and a bunch of other things if your database supports it, but that takes a lot of time and effort; ElastiCache is very easy to set up and easy to use.

All right, I'll get everyone back to drinks and food. Thanks, everyone!
Info
Channel: Amazon Web Services
Views: 2,449
Rating: 4.4736843 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, startup, felixgray, database, developer, cto, technology
Id: vn38PIClYj4
Length: 21min 15sec (1275 seconds)
Published: Wed May 15 2019