Shopify's Architecture to handle 80K RPS Celebrity Sales • Simon Eskildsen • GOTO 2017

Video Statistics and Information

Captions
[Music] Thank you, Kevin. My name is Simon, and as I said, I work as an infrastructure lead at a company called Shopify, based out of Canada. By way of introduction, Shopify powers a fairly large amount of online commerce, both here in Scandinavia and in North America, so if you've bought something that wasn't on Amazon, there's a good probability you may have gone through a Shopify store and used some of the infrastructure we're going to talk about today.

The talk today is going to be fairly high level. I'm going to be talking about the architecture we built at Shopify to support the very, very large sales we see from some of the biggest American celebrities and some of the largest brands in both Europe and North America. We're going to talk about the architecture we've evolved over the past five years. We won't go super deep into any particular point, but rather give a comprehensive overview of all the different components, how they fit together, and hopefully inspire something that can be leveraged at a lot of other companies, especially SaaS companies, which can have an architecture similar to ours. (The clicker is having trouble.)

Shopify is a company that helps people sell online and in many other places, and about five years ago we faced a bit of a fork in the road. We were starting to see customers that were online first. When retailers went online about seven, eight, ten years ago, they'd often had a physical presence first and then moved online, but we were starting to see a lot more customers that started with an online presence, and they also started to require new ways of selling. Social media had become very big over the past decade; especially five years ago, smartphones were in everyone's hands and Instagram, Facebook, Twitter and so on had become quite large, and this enabled new ways of selling online that people hadn't quite been able to do in the past.

We call this a flash sale. The idea of a flash sale is that you announce, sometimes minutes in advance, sometimes hours or days in advance, that you're going to release some kind of exclusive product at a given day and time. As you can imagine, that can drive an astonishing amount of traffic to one site from one minute to the next: it's one minute before noon and everything is good, but at noon you have hundreds of thousands of people coming in trying to buy some of the same products at the same time, placing an enormous strain on the system. You can imagine this for the Super Bowl, or for Kylie Jenner — famous, if you know her, for being famous, and part of the Kardashian family — selling lipstick; she can drive an enormous amount of traffic, and Kanye can do that as well.

About five years ago it wasn't these very large Super Bowl-scale events that were starting to sell this way, but rather smaller customers that could still drive an enormous amount of traffic, and we were faced with a fork in the road: do we become a company that tries to support these sales, or do we tell them to go and find another platform — a platform that actually didn't exist? It's a very hard thing to be profitable at; it takes thousands of engineering hours to be able to scale to these sales when you haven't architected for it from day one. We chose the path of trying to support them. We chose to see these sales as a way to identify bottlenecks — a bit of a canary in a coal mine for what's going to break next.
The flash sales of today tell us something about what traffic is going to look like one or two years from now. (Wrong direction on the clicker.) This was such a powerful decision at the time, and so important to our CEO, who is very technical, that he wrote an internal essay on why we support flash sales. It's really important to our company philosophy that in the face of adversity we become stronger, not weaker — that we use it as an opportunity to grow. When something bad happens, we want to come out stronger, and he wrote an essay about why flash sales were such a case. We have a very strong culture internally of trying to inject chaos into various parts of the organization and come out stronger as a result. If you drop a piece of glass, it breaks; if you drop a rock, it's indifferent; but what can you drop that becomes stronger as a result? If you go down to the gym and run on a treadmill, you come back a couple of days later and you can run further and faster. How can you build software in that way? Flash sales were such a thing for our organization — a way for our infrastructure to become stronger.

Before we get into the meat of the talk, and with this preface of how important flash sales have been to the history of our application, here are some numbers to keep in the back of your head for the scale we're running at. We support about half a million paying merchants all over the world. We processed almost six billion dollars in the last quarter. We run up to 80,000 requests per second to our Ruby on Rails processes. We do about 40 deploys a day to this, and we have over 2,000 employees — a large fraction of them engineers — deploying all the time.

The mental model I want you to keep for this talk is that there are three different tiers we're going to talk about. There's the traffic tier, with all the traffic coming in, which is responsible for getting your request from your home network all the way to our infrastructure and serving back the page you're requesting. We're going to talk about the application and data tiers that work together to serve those pages. And we're going to talk a bit about the arrows between them: how do we fail over between regions with zero downtime, how do we shard our application, how do we balance those shards as some shops grow bigger — so that Kanye and Kylie are not on the same shard but rather spread out — and how do we get from the traffic layer into the application layer and fail over regions? How does all of this work together? You should have a pretty good idea of how that works for us by the end of this talk.

We're going to start with the traffic tier. As I mentioned, the traffic tier to me is the layer responsible for getting the request from A to B — in this case from your home network to our network. We're going to talk about how we do global routing, because this is something we've debated a lot internally. We're going to talk about a technology called OpenResty, which is absolutely incredible but quite underused. We're going to talk about how we protect ourselves from bots during sales, how we serve cache hits from the load balancers instead of the application tier — orders of magnitude faster — and how we throttle the checkouts during very, very large sales.

To understand how your request gets from the customer to our data centers — or regions, as we call them — we need to understand a little bit about how the internet works.
This is a very high-level overview of how the internet works and how routes propagate. If you are Facebook or Google and you have one domain, rather than a bunch of customers pointing their DNS at your domain, you can do a lot of really complex traffic engineering at the DNS layer. But when you have about a million domains from half a million different customers pointing to your IPs, you have very limited control at the traffic level: you only have the IPs the customers are pointing at, and you can't control anything at the DNS level. So we did something that hasn't had a lot of use in the industry — for those of you who know about networks, it's called BGP anycast (TCP anycast) — and I'll explain how it works.

How does traffic get from the customer to our regions with this anycast setup? Shopify owns IPs — as you can see here, we have this /24, a block of 256 IPs — and our regions announce to the internet that these IPs can be routed to by these regions. So if an ISP sees one of these IPs, it should be routed to a Shopify region. What Shopify does is connect its routers to its adjacent ISPs and say: hey, we own these IPs. We may have, say, two ISPs that we connect to directly from our regions, and we tell them: if you encounter these IPs, send them to us, because that is Shopify traffic for some Shopify store. Those ISPs then tell their adjacent neighbours — the ISPs they're plugged into — that they know how to route those IPs, and like that it propagates through the whole graph. You can think of this as a bit of a gossip algorithm, where every ISP is constantly telling its neighbouring ISPs which IPs it knows how to route. Propagating these IPs across the entire internet — ISPs ranging from Asia to Europe to North and South America — takes about seven minutes, which is either amazing or very slow depending on where you're coming from.

So now we have this connected web; how does a customer get their traffic to the region? They make a request — in this case to a walrus store — and the domain points to a Shopify-owned IP that is part of the block the regions are announcing. The request goes to their ISP, and then it either needs to turn left or right depending on which region is closest to that ISP; in this case we're going right, at least from where I'm standing, and the request is routed to the region. This works really well when all these customer domains point directly at IPs and you don't have much control over them.

So now we know how the traffic gets to Shopify. Next, I want to talk a little bit about what happens when that request enters our network. We use a technology called OpenResty. OpenResty is perhaps one of the best additions we've made to our technology stack in the past four years. It enables you to script your load balancers — your nginx load balancers — with Lua. You can do pretty much anything you desire: you can customize your load-balancing algorithm, you can redirect to other data centers, you can change headers, you can inspect cookies, you can serve different content based on requests and cookies, you can make external network calls — all of this in the comfort of nginx, which is extremely fast and extremely stable. I think OpenResty is quite underrated, and I don't hear nearly enough talk about it, because it is so stable, so fast, and so scalable. I want to talk about a couple of modules we built with it; it has really become our hammer for a lot of traffic-related issues, and we've gotten away with fending off a lot of DDoSes and DoSes with simple Lua modules. It's really powerful to be able to script your load-balancing tier. Here's a very quick example of what it looks like: a very bare-bones nginx config that is just listening on a port and, in this case, serving some content with Lua.
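The slide itself isn't captured in the captions, but a minimal OpenResty config in that spirit might look something like the sketch below. The directives are standard OpenResty; the port and the response text are just placeholders, not Shopify's actual configuration.

    # Minimal OpenResty sketch: nginx listening on a port and serving content from Lua.
    worker_processes 1;
    events { worker_connections 1024; }

    http {
        server {
            listen 8080;

            location / {
                # This Lua block runs inside the nginx worker, so it is fast and
                # non-blocking; custom load balancing, header rewriting, or even
                # network I/O could happen here.
                content_by_lua_block {
                    ngx.say("hello from Lua inside nginx")
                }
            }
        }
    }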
You can do all kinds of things in that block: you can imagine custom load-balancing algorithms, you can imagine doing network I/O — you can do pretty much anything.

One of the modules I want to explain addresses a big problem we have with bots. There is, surprisingly — as we learned — a very large secondary market for sneakers. There are a lot of sneakerheads out there who collect sneakers, and to get these sneakers you need to be there when the sale happens and you need to be really lucky, because there's limited inventory — maybe two hundred thousand people competing for 5,000 pairs of shoes. When that happens there's an economy where you can buy these shoes, not wear them, and sell them on eBay for maybe two, three, four times the price. Of course merchants don't want that — they don't want a secondhand market charging more than they do and not controlling the customer experience — so they asked us to come up with a way to detect these bots and ban them. So we took out our hammer, OpenResty, and built some modules that helped solve this problem.

The Kafka logging module is one we've had for a long time: every single request that comes into Shopify is logged into Kafka, a distributed message bus, and all the messages there are relayed into our data warehouse. But we can also consume these messages in real time and make decisions about them. So you can imagine there's a Kafka cluster here, and for a checkout we log that checkout into Kafka. Then there's a stream aggregator — the bot squasher — which listens on Kafka and tries, across all of these different events, to see whether there's a suspicious pattern. Is there something that doesn't look like a human? Is it refreshing a page very rapidly, in a way that a human in a Chrome browser would never do? Or is something else at play? It's a fairly sophisticated algorithm trying to figure out what is going on. It is not an OpenResty module — it's just a simple consumer, written in Go in this case — and when it finds something that looks like a bot, an IP or a user agent or something else, it can tell the rule banner, which is written in nginx — just a simple DSL that allows you to ban very simple patterns like user agents and IPs. So the bot squasher, based on the conclusions it draws from the Kafka stream, can ban an IP or something else with the rule banner on all the load balancers. Now we can reject these bots in a few microseconds, and people can write very sophisticated algorithms at the bot-squashing layer to ban these bots very rapidly. When we deployed this we saw a fairly sizable decrease in the amount of requests coming to Shopify — we actually had to turn it off again at first because we were so flustered, it seemed like way too much — but it turns out that a very, very large fraction of our traffic was actually just bots.
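As a rough illustration of the load-balancer side of this — not Shopify's actual rule banner, whose DSL and data flow aren't shown in the talk — an OpenResty access-phase check against a shared dictionary of banned IPs might look like the following. The dictionary name, the internal ban endpoint, and the idea that a bot-squasher-like process writes into it are all assumptions made for the sketch.

    # Edge ban sketch: reject flagged IPs in microseconds, before they ever reach Rails.
    worker_processes 1;
    events { worker_connections 1024; }

    http {
        # Shared memory zone the (hypothetical) ban-update endpoint writes into.
        lua_shared_dict banned_ips 10m;

        server {
            listen 8080;

            # Internal endpoint a bot-squasher-like process could call to add a ban.
            location = /_ban {
                allow 127.0.0.1;
                deny all;
                content_by_lua_block {
                    local ip = ngx.var.arg_ip
                    if ip then
                        ngx.shared.banned_ips:set(ip, true, 3600)  -- ban for one hour
                    end
                    ngx.say("ok")
                }
            }

            location / {
                access_by_lua_block {
                    -- a shared-dict lookup: a few microseconds per request
                    if ngx.shared.banned_ips:get(ngx.var.remote_addr) then
                        return ngx.exit(ngx.HTTP_FORBIDDEN)
                    end
                }
                content_by_lua_block {
                    ngx.say("store front")
                }
            }
        }
    }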
Another thing we do with OpenResty is serve cache hits. If you go and browse a Shopify product, there's probably someone else browsing the same page and the same product a bit later, so we can cache it. If nothing changes, we can go to the application tier and serve that cache hit, which is stored in memcached — but we can do this orders of magnitude faster at the nginx layer, simply because of the overhead of the Ruby VM and the whole stack we have there, which is very difficult to optimize at this point. So imagine a request coming in — fetching some collection page full of walrus products — and you have a cache miss, so you go to the web application tier. While fulfilling that request, getting all the data from the data stores and rendering the response back to the user, we also fill the response into the cache for someone else to be served from. Really simple stuff, nothing novel. On the next hit, though, edge cache — which is an OpenResty module — checks the cache directly at the load-balancing tier. Now we can serve it straight from the cache without even touching the application tier, making us able to handle orders of magnitude more requests per second with a fairly small addition to the stack. This is more complex than it looks, at least for our stack, because generating cache keys and so on has turned out to be quite complex, but for smaller applications it might be quite simple. We don't have this turned on for everyone — we do have it turned on for some larger merchants — and the reason it's hard for us to turn on broadly is that people on Shopify can customize their themes completely, and getting that to play well with caching has turned out to be more difficult than we thought.

The last one I want to talk about is the checkout throttle. Some merchants can drive so much traffic that we can't handle the number of writes they generate on a checkout, and you can't cache a write, so you need to do something. At some point there are so many writes going into the database — adding to the checkout, running scripts that modify the cart, all the very complex operations that happen when you're checking out, getting shipping rates, processing payments — that we have to throttle. Very few merchants ever hit this; I would say probably fewer than 10 or 20 unique merchants in the lifetime of Shopify have ever hit this throttle. But when it does happen, it's very important that all the shops on the same shard as you do not suffer under your sale. So when a request comes in for a very busy store's checkout, we put that request into a queue — again, at the load-balancing tier. While you're in the queue you're redirected to a wait area, which is completely customizable by the merchant. We strive as hard as possible to never put people in this at all, but we need a safeguard in case it does happen. We control the rate at which we leak from the queue with a throttle, and when your turn comes, we redirect you back to the checkout.
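The talk doesn't show the throttle's implementation; purely as a simplified sketch, a leaky-bucket rate limit at the load balancer that sends overflow to a wait page could be built with the open-source lua-resty-limit-traffic library as below. The rates, the wait-page location, and identifying the shop by Host header are assumptions — the real system uses a proper queue and a merchant-customizable waiting area.

    # Checkout throttle sketch: leak checkout requests through at a fixed rate and
    # send the overflow to a wait page. A crude approximation of the real queue.
    worker_processes 1;
    events { worker_connections 1024; }

    http {
        lua_shared_dict checkout_limit 10m;
        upstream app { server 127.0.0.1:3000; }

        server {
            listen 8080;

            location /checkout {
                access_by_lua_block {
                    local limit_req = require "resty.limit.req"
                    -- allow ~50 checkouts/second with a burst of 100 (numbers invented)
                    local lim, err = limit_req.new("checkout_limit", 50, 100)
                    if not lim then
                        ngx.log(ngx.ERR, "failed to create limiter: ", err)
                        return
                    end

                    -- throttle per shop; here the shop is identified by its Host header
                    local delay, err = lim:incoming(ngx.var.host, true)
                    if not delay then
                        if err == "rejected" then
                            -- over the burst: send the customer to the waiting area
                            return ngx.redirect("/checkout/wait")
                        end
                        ngx.log(ngx.ERR, "limiter error: ", err)
                        return
                    end
                    if delay > 0 then
                        ngx.sleep(delay)  -- leak from the "queue" at the configured rate
                    end
                }
                proxy_pass http://app;
            }

            location = /checkout/wait {
                content_by_lua_block {
                    ngx.say("You're in line - the page will retry shortly.")
                }
            }
        }
    }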
Now we've talked a little bit about the traffic tier, some of the OpenResty modules we've built, and how traffic gets all the way to Shopify. I want to dig into how our application layer and our data tier work — I'm going to talk about both at the same time because they're quite intermingled.

A very important concept for us is what we call a pod. Some companies call this a shard, but a shard to us means a MySQL shard — a relational shard. A pod encompasses more: a pod to us is a small, isolated Shopify that can run anywhere in the world — in a cloud, in a data center, anywhere. So a pod is an isolated unit of Shopify with one or more shops. You can look at it like this: if we have, say, pod 7, pod 2, and pod 14, they all have a bunch of shops on them. These shops don't talk to each other, and the pods don't talk to each other; they're completely sharded and isolated from one another, giving us some very nice scaling properties. This is a fundamental isolation principle that we call the shop isolation principle: all shops must be isolated from each other. If you want to do something cross-shop, you have to use the data warehouse or some other mechanism, because doing things across hundreds of shops and hundreds of pods in multiple regions is not something that's going to scale, at least not in real time. Another thing to note here is that these pods are not all the same size — some have more shops than others — and we're going to address that problem later.

First, though, let's look at what's in a pod. A pod is one or more stores — what does that mean? It means that all the stateful layers of that pod are isolated. A pod has a MySQL instance, a memcached instance, Redis instances, and cron workers doing periodic work. All the stateful data of these shops is isolated from all the other pods and all the other shops. There is a tier that is shared, though, and that's the workers: when a web request or a job is handled, that capacity is shared between the pods. Why would that tier be shared? Because of the sales. Remember, the premise is that we have to architect for these massive sales, and if pod 2 is experiencing a massive sale — some kind of launch — it may need perhaps sixty to seventy percent of all the worker capacity to handle it. So the stateless layer is shared and the stateful layers are isolated per pod.

Now, why wouldn't you use an autoscaler for this? It sounds like the canonical thing to solve with a cloud autoscaler, right — when you have a sale, you just start spinning up more instances. The problem is that spinning up those instances takes tens of seconds, and during that time you're going to be serving a lot of errors and giving a very bad experience to all the customers coming in. So we don't do that. Maybe one day we can — maybe we can do some really crazy things so that we can spin up these workers fast enough with an autoscaler — but today that's not the case. Another stateless thing that is shared is the traffic tier we talked about before, where all the OpenResty modules run and the load balancing happens.

Another tool I want to introduce while we're here, for testing the stateless tier, is a load-testing tool called Genghis. Genghis runs Lua scripts that define a flow through our application and lets us test with an astonishing amount of load using cloud instances — what happens when we send, for example, 10,000 people per minute through the checkout? You can customize this as much as you want with Lua scripts that let you say things like: this customer is going to go to the product page, browse around a bit, be a little indecisive, add some products, remove some products from the cart, check out the entire thing and pay for it — and then you multiply that by maybe 10,000 or 6,000. The interface to this is really simple: we have a Slack bot which responds when you tell it to enqueue some Lua script. The JSON file you enqueue describes which Lua script to use, which store to target, how long to run, and at what rate — how many of these executions you want per minute.
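Genghis is an internal tool, so its real scripting API isn't public. Purely as an illustration of what a Lua flow script of this kind could look like, here is a hypothetical sketch — run_flow, visit, add_to_cart, and checkout are invented names, not Genghis's actual interface.

    -- Hypothetical load-test flow in the spirit of Genghis (invented helper API).
    -- A runner would call run_flow() once per simulated customer, thousands of times per minute.

    local function visit(session, path)
      -- stand-in for an HTTP GET against the target store
      print(("GET  %s%s"):format(session.store, path))
    end

    local function add_to_cart(session, product)
      print(("POST %s/cart/add?product=%s"):format(session.store, product))
    end

    local function checkout(session)
      print(("POST %s/checkout"):format(session.store))
    end

    local function run_flow(store)
      local session = { store = store }

      visit(session, "/products/walrus-plush")  -- browse a product page
      visit(session, "/collections/all")        -- be a little indecisive
      add_to_cart(session, "walrus-plush")
      add_to_cart(session, "walrus-mug")
      checkout(session)                         -- check out and pay for the whole thing
    end

    -- e.g. the runner would multiply this by 10,000 executions per minute
    run_flow("https://sneakershop.example.com")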
So remember that problem from before, of all these pods being of different sizes? That's a problem we've been trying to address. In the past, when we needed to provision a new pod because we had too many shops, we replicated everything in the old pod that we wanted to split in half over to a new pod. Then, for a short duration of time, we changed the routing layer so that half the shops went to the new pod and the other half to the old pod, and then we started cleaning up the old pod. It was a bit of a mess and a very manual, operational amount of work — I think the checklist for doing this was about a hundred items. So we wanted to make this better and less risky, with fewer big hand-wavy movements to move all of these shops between pods, and with less downtime. We wanted to be really, really good at moving shops between pods, with the smallest amount of downtime possible. We shouldn't punish large merchants: a large Shopify store might take hours to move, and giving them hours of downtime, even if they can schedule it, is a terrible experience for a merchant that maybe has an opportunity cost of several million dollars during that transition.

What the pod balancer does is look at these pods and their shops and make decisions about how to balance them. You can imagine, in this case, it sees that pod 2 is a bit bigger, so it moves a shop to pod 7; but in the interim pod 14 grew — one of its shops is now occupying three units. That could be a shop with large sales, a shop that is simply very large, or maybe a shop that is replatforming from another commerce solution and has grown tremendously in size. Some of these shops are bigger than others — in size, scale, sales, whatever they do — and the pod balancer just keeps working: it moves some other shops from pod 14 over to other pods. The complexity increases, because you also have sign-ups, new shops coming in, and other shops growing, and the pod balancer keeps working at it, trying to fill all these pods to equal size. Before we had this, some pods were maybe twice as big as others, giving us a bit of a noisy-neighbour problem where some shards experienced more incidents than others. So what happens when they're all filled up? You create a new pod — we're working this year on making that really easy — so that the pod balancer simply decides to create a new pod and then moves shops over. We want this to be always on, something we never have to touch, that just creates pods continuously. It's not perfect yet, and there's still some manual labour involved, but this is where we're going, and it's going quite well so far.
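The talk doesn't spell out the balancing algorithm, but the idea of continuously nudging pods toward equal size can be sketched as a simple greedy loop. The weights, the threshold, and the function names below are invented for illustration and are certainly cruder than the real pod balancer's decisions.

    -- Toy pod balancer: repeatedly move a small shop from the fullest pod to the
    -- emptiest pod until the pods are roughly equal. Illustrative only.

    local function load_of(pod)
      local total = 0
      for _, units in pairs(pod.shops) do total = total + units end
      return total
    end

    local function rebalance_step(pods)
      table.sort(pods, function(a, b) return load_of(a) > load_of(b) end)
      local fullest, emptiest = pods[1], pods[#pods]
      local diff = load_of(fullest) - load_of(emptiest)
      if diff <= 1 then return false end  -- balanced enough

      -- pick the smallest shop on the fullest pod so one move never overshoots
      local best_shop, best_units
      for shop, units in pairs(fullest.shops) do
        if not best_units or units < best_units then best_shop, best_units = shop, units end
      end

      -- only move if it actually reduces the imbalance
      if not best_shop or best_units >= diff then return false end

      print(("move %s (%d units): %s -> %s"):format(best_shop, best_units, fullest.name, emptiest.name))
      fullest.shops[best_shop] = nil
      emptiest.shops[best_shop] = best_units
      return true
    end

    local pods = {
      { name = "pod 2",  shops = { kanye = 3, small_a = 1, small_b = 1 } },
      { name = "pod 7",  shops = { small_c = 1 } },
      { name = "pod 14", shops = { kylie = 3, small_d = 1 } },
    }

    while rebalance_step(pods) do end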
I want to talk a little bit about how we move those shops from shard to shard. How can we do that with minimal downtime, and how can we avoid punishing shops that are very large? If we just look at two of the data stores here, Redis and MySQL: imagine we want to move a shop from pod 9 to pod 23. The very naive way of doing that is to copy the shop by going through every single table and finding all the records that belong to the shop, with simple queries like: give me all the orders where the shop ID is this, give me all the products where the shop ID is this — and put them into the target shard. While that is happening you can't accept any writes, because if the shop inserts a checkout while you're copying it, and you've already copied the checkout table, that checkout is going to be lost. So you basically need to lock the shop while you're copying it. For a small shop that's not a problem — it might take a minute, maybe less, maybe a couple of seconds — but for a large shop this can take a very long time if you don't allow any writes while copying.

What we're working on now is not doing this lock-move-unlock, but instead looking at the MySQL binlog and streaming those events while we're copying the shop, so we avoid locking the shop and having downtime during the copy. The binlog, in MySQL vocabulary, is where every modification to the database is logged: if you change a row or insert a new checkout, it is registered in the binlog as an event — insert a new checkout for this shop, insert a new product for this shop. So what the pod balancer is going to do — we're starting to work on this now — is stream the binlog and watch for new events that haven't been copied yet. The copier works through all the old data from the bottom, and the binlog-tailer component of the pod balancer replicates the new events at the top: old at the bottom, new at the top. This means we don't have to lock the shop for an extended duration of time.

When that's done — and we can imagine a discrete moment in time where the shop is completely replicated — we need to move it to the new pod by updating its routing information. First we lock the shop: no writes can come in, and we interrupt all jobs and web requests for that shop for a brief couple of seconds. Then we go into the routing data store — the data store that holds the mapping from shop to pod and from pod to region — and update the shop's pod ID there. Immediately this is reflected, and traffic starts going to the new pod. We unlock the shop, and the shop has been moved. Then we remove it from the old MySQL shard, and we move the jobs and other things that are in Redis over as well, asynchronously, and the shop has been completely moved. Multiply this by running a thousand of these in parallel and suddenly you can move a lot of shops to a lot of pods very rapidly — and on top of that you make really good decisions about which shops to move at what time, to keep all those pods balanced.

So now we've talked a bit about traffic, and a bit about how the application tier works with these pods. Now I want to talk briefly about how we connect the two: how does the traffic layer know how to route a request to the correct pod? For this we have a very well-named component called Sorting Hat.
What Sorting Hat does is look at requests coming into the Shopify network and figure out which pod each request belongs to. The request needs to go to the region where that pod is active, and Sorting Hat needs to work with things like failovers and shop moves and all of these different components. This is a very core part of the pod architecture and one of the first things we worked on. So again, we have the traffic layer, and we have another OpenResty Lua module called Sorting Hat. Sorting Hat knows which pods are active and which are inactive: in a region, a pod is either active or inactive, and there's always a hot replica pair, so that pod 2 can become active in region A if there's a catastrophe or we need to do maintenance in the other region — and of course the other way around too. They're always set up in pairs. Then we have the routing database from before, the one we use when we move shops around; Sorting Hat is its consumer — the only thing that queries it and asks where shops live. So if a customer comes in to get products from sneakershop.com, Sorting Hat asks the routing layer how to route sneakershop.com; the routing layer returns a shop ID — shop 238 — which is on pod 2, and we know pod 2 is active in region B, so Sorting Hat routes the traffic there. This is very simple — I think it's a couple of hundred lines of Lua querying this data store thousands and thousands of times a second — but it's what ties the whole pod architecture together and marries the traffic layer to the data and application layers.
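The routing database and its schema aren't shown in the talk, so as a stand-in the sketch below keeps a tiny host-to-region mapping in an nginx shared dictionary. The shared dict, the hard-coded entries, and the upstream names are assumptions, but the overall shape — an OpenResty module that resolves the shop behind the Host header and proxies to the region where its pod is active — is the idea Sorting Hat implements.

    # Sorting-Hat-style routing sketch: decide per request which region should serve it.
    # Illustrative; the real module queries a routing database.
    worker_processes 1;
    events { worker_connections 1024; }

    http {
        lua_shared_dict routing 10m;

        upstream region_a { server 10.0.1.10:8080; }   # placeholder addresses
        upstream region_b { server 10.0.2.10:8080; }

        init_by_lua_block {
            -- pretend these rows were loaded from the routing data store:
            -- sneakershop.com -> shop 238 -> pod 2 -> active in region B
            ngx.shared.routing:set("sneakershop.com", "region_b")
            ngx.shared.routing:set("walrusstore.com", "region_a")
        }

        server {
            listen 80;

            location / {
                set $target "";
                rewrite_by_lua_block {
                    local region = ngx.shared.routing:get(ngx.var.host)
                    if not region then
                        return ngx.exit(ngx.HTTP_NOT_FOUND)  -- unknown shop domain
                    end
                    ngx.var.target = region
                }
                proxy_pass http://$target;
            }
        }
    }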
The last thing I want to talk about is how we fail these regions over. We have the active pod and the inactive pod — how do we flip them, and how do we do that with minimal disruption? For this we have the pod mover. The pod mover moves pods between regions with minimal downtime, and we're now at a point where we're very close to nearly zero-downtime regional failovers. This is really important for us, because it's such a core thing that we want to be able to do it all the time: if someone wants to do maintenance in a cloud region without any repercussions, they should be able to move all the pods to another region, do their maintenance, and move them back without any risk. This avoids the problem of having very expensive staging environments that emulate an entire data center — instead you just get really, really good at moving pods between data centers. So if region B is having some kind of incident — it may be out of power, we may be doing maintenance, it may be hit by a hurricane or flooding — whatever happens, we need to be able to move that region over. That is the pod mover's job, and it can do it very rapidly.

The pod mover works by going through a series of steps to fail over a region. The first thing it does is disable cron in both regions, so we don't start new periodic jobs. The second thing it does is go back to that routing data store — the one that keeps the shop-to-pod and pod-to-region mappings, where the pod balancer modifies the shop-to-pod mapping; the pod mover is responsible for the pod-to-region mapping — and update it. What happens now is that Sorting Hat — the thing that looks at requests and routes them to the appropriate regions — routes the requests to the new target region. We then fail over MySQL to the target region (that little animation from before), we re-enable cron, and we transfer the jobs that have not yet been executed in the source region over to the target to be executed later. It's fairly simple at a high level — the devil is in the details, of course — but this is, at a high level, what happens when we do a pod move.

So what about the errors? I mentioned that we wanted to minimize them. When Sorting Hat starts routing to the previously inactive region, the database is not yet writable there. If someone is doing a checkout, they're going to have that dreaded experience everyone in this room has probably had: you're checking out, you've just entered your credit card details, and you get an error back — now you have to call someone and ask what's going on. We don't want that scenario, so we sat down and thought about how to fix it. Again we took out our hammer, OpenResty, and solved this problem. If you have MySQL replicating to two data centers, they can't both be writable at the same time — one will be writable and one will be readable — so there will be a couple of seconds where people get these terrible experiences. But what if you can pause requests at the traffic tier? If you can pause the requests that can't be served in the new region yet, because you know they're going to perform a write, you can do basically zero-downtime failovers. So the pauser module pauses requests in the middle of failovers to avoid serving errors. A customer comes in trying to create a new checkout during a failover, during those critical seconds where the database is not yet writable in the new region, so we add them to a queue. We drain that queue with a throttle, to avoid too many people coming through at the same time, and the throttle is of course closed while we're failing over the region and MySQL has not yet failed over. Then the pauser finally gives you a 200 response, instead of a 500, a little later. So instead of the dreaded error page after you've entered your credit card details, you're just waiting five seconds. This works now and is in production for us.

This changes the pod mover flow a little bit. Everything else is the same, but when Sorting Hat routes a request to the target region, it inspects the request and makes a judgment call on whether it would perform a write. If it would, it holds the request back at the load-balancing tier; then we fail over MySQL to the new region, resume the requests, enable cron, and transfer the jobs. Really simple, and this has gotten us down to very, very few errors during failovers. We want to be so confident in these failovers that people can just issue a command in Slack and fail over a region. We already have this working, but we're not quite confident enough yet to do it on production pods — it will be just weeks before we're there. You type in a command and you fail over a pod; you don't have to care about customer disruption, because there should be none, and we work as hard as possible to make that true. This is a core primitive of the pod architecture: if there's an incident, someone should be able to come in in the middle of the night, fail over all the pods, and go back to bed. Maybe some day this will be automated, but even this is a tremendous step from where we were years ago.

The last thing I want to note is another one of these second-order benefits of the pod architecture.
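A stripped-down version of the pauser idea — hold write requests at the load balancer until the failover flag clears, then let them through — might look like the sketch below. The shared-dict flag, the hard-coded pod name, the ten-second cap, and the upstream are invented details; the real module also has the queueing and throttling described above.

    # Pauser sketch: during a pod failover, briefly hold requests that would write,
    # instead of letting them hit a database that isn't writable yet. Illustrative only.
    worker_processes 1;
    events { worker_connections 1024; }

    http {
        lua_shared_dict failover_state 1m;     # e.g. set("pod_2", true) while failing over
        upstream app { server 127.0.0.1:3000; }

        server {
            listen 8080;

            location / {
                access_by_lua_block {
                    -- reads can keep flowing; only writes need the database to be writable
                    if ngx.req.get_method() == "GET" then
                        return
                    end

                    local state  = ngx.shared.failover_state
                    local waited = 0
                    -- poll for up to ~10 seconds; ngx.sleep yields, so workers aren't blocked
                    while state:get("pod_2") and waited < 10 do
                        ngx.sleep(0.5)
                        waited = waited + 0.5
                    end
                }
                proxy_pass http://app;
            }
        }
    }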
Right now we're starting to experiment more with the cloud, and with this architecture it's just a matter of putting a couple of pods in the cloud. Of course there's a lot of work going on to get all the stateful and stateless tiers running in the cloud, but once we have that, we can gradually move shops there without disrupting the rest of the platform. We have a little gas pedal: we slowly move shops to the cloud, and as we gain operational expertise with the cloud, we can move all of Shopify over there.

That really marks the end of my talk. I hope you have a pretty good idea now of how Shopify works at a high level, and maybe some of this you can take home and think about — especially if you're a SaaS company, a lot of this should be applicable. I've talked about all these components: the pods, the pod balancer, the pod mover, and Sorting Hat. Thank you so much.

Really cool talk — thank you, Simon. There are also some really good questions, and one of them is: I understand how you move pods, but how do you move them if the region is down or unreachable?

Good question. Let me go back to this slide, because it's quite close to this. The question is: what happens when region B here is on fire and the region is unreachable — how do you fail it over? Well, there are certain things you can't do, but you can do checks. One of the problems would be that the database is not completely caught up in the target region, so you have to make a judgment call. At this point that process is still manual, and it will tell you: hey, you're going to lose this much data if you perform this failover — are you sure you want to do it? So the pod mover is as resilient as possible: in the best case it will do a pod move with all the jobs and all the data moved, but in the cases where it can't, it will ask you questions — are you okay with losing this much data, are you okay with not transferring jobs — and it will still do the best-effort thing we can do.

Cool. There's a question about the pod mover running in multiple instances: how do you avoid race conditions between them, and how do you avoid small shops being moved all the time?

I think this slide is appropriate for that. When we designed the pod mover, what we initially wanted was that when you update the routing information, there's a bunch of subscribers — one responsible for jobs, one for stopping cron, one for failing over, one for Sorting Hat — and they all subscribe to a kind of pub/sub system. But distributed coordination is very difficult, so we didn't want to do it; it's too complex and the ROI is just not high enough. So this is literally one script running on a machine somewhere — and which machine it runs on is of course resilient to which regions are down and things like that — but it's one machine doing a bunch of sequential steps. It's resilient and it's very, very simple, and that's how we avoid that.

Another question here: is the customization of nginx a task for feature developers or for operations — the nginx and OpenResty tier?

Oh yeah, okay.
It's primarily been a place where infrastructure engineers have made modifications to the OpenResty layer, but we have had some application engineers make changes as well — for example to redirect fractions of traffic and run tests and things like that. So I would say it's probably about 90% infrastructure engineering at this tier and 10% product engineering.

There's a question about how you implement multi-tenancy within a pod. Okay — this is almost a talk in itself, but essentially we shard by a key, the shop ID, and every time you do a query you are always in the context of a shop. If you're handling a web request or a job, you can only query data that belongs to that shop, because every query is appended with the shop ID — it does something like select all from orders where shop ID is this, or select product where ID is this and shop ID is this. They're all in the same schema; we don't run hundreds or thousands of schemas, because that's not a very normal workload for MySQL and we want the workload to be as normal as possible. That's the very high-level overview. One other thing I'll note is that we do this sharding at the application level — we don't have a proxy or anything like that; that's not something we were comfortable with when we implemented sharding. So it's application-level sharding, sharded by the shop ID key; the schema is shared between the shops on the pod, and every single table has a shop ID column embedded in it so that we can do these queries efficiently.

Really cool. I think one of the last questions is about Kafka: is it something that could replace the MySQL binlog for moving shops?

I'm not entirely sure how to interpret that question, but I'll give it my best. If the question is whether we could put the entire MySQL binlog into Kafka and use that instead, the answer is yes, we could — we currently don't relay the binlog into Kafka. I'm not sure exactly why; it's just not something we've really found a use for. Tailing the binlog works really, really well for us right now. Moving to Kafka is maybe something we would do if there were other use cases for it, but right now I don't see the ROI on putting the binlog into Kafka as very high.

And I think the last question: somebody asked, are you looking into Kubernetes? Yes — those little cloud regions there on the right, that's Kubernetes. In production? Yes.

Okay, I think that deserves a really big hand for Simon — and remember to rate the talk in the app. Also check out Simon's blog; he has a really cool blog where he writes about all of this. Google his name and you will find it — it's really good. Give him a hand. Thank you. [Applause]
Info
Channel: GOTO Conferences
Views: 20,526
Rating: 4.9407406 out of 5
Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, GOTOcph, GOTO Copenhagen, Shopify, Simon Eskildsen, Cloud Complex, Cloud, Architecture, infrastructure
Id: N8NWDHgWA28
Length: 40min 18sec (2418 seconds)
Published: Wed Oct 18 2017