Andrew Godwin - Scaling Django With Distributed Systems

Captions
This is a talk about scaling Django, but more generally about scaling any kind of web-based system. First of all, who am I? I'm Andrew Godwin. I'm a Django core developer, and have been for too long now, one might say. I also work at Eventbrite, doing ticketing and back-end work there, and I'm perhaps best known for Django's migrations system and now Channels.

What I'm here to talk about is distributed systems, and first I want to cover why they even exist. Why do we have systems that are distributed? Why not have synchronous systems that work purely in lockstep, and have none of these problems?

Well, it starts with the speed of light. For those who aren't aware, this is how fast light travels, and it is against the laws of physics for any data to travel faster. In reality, electronic signals in your computer travel at about two thirds of this speed, around 200 million metres per second. This matters because computers work in clock cycles: every clock cycle, the logic propagates one step. If you're cycling at five megahertz, five million times a second, a signal can make it approximately sixty metres between ticks, which is quite far; about as long as this hall. But modern CPUs run at three gigahertz, and there you only have about ten centimetres, and that's the literal limit of the universe, to send data between clock ticks. In reality, this combined with our engineering means we just can't write fast systems that are synchronous. We have to have distributed ones.
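A quick back-of-the-envelope check of those numbers:

```python
# How far can a signal travel in one clock tick?
C = 3e8             # speed of light, metres/second
SIGNAL = 2 * C / 3  # ~2/3 c, typical for electrical signals

for hz, label in [(5e6, "5 MHz"), (3e9, "3 GHz")]:
    print(f"{label}: {C / hz:.3f} m at light speed, "
          f"{SIGNAL / hz:.3f} m for a real signal")

# 5 MHz: 60.000 m at light speed, 40.000 m for a real signal
# 3 GHz: 0.100 m at light speed, 0.067 m for a real signal
```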
Even inside your computer you have distributed components. Multi-core processors are literally this: two or four or eight cores on the same die, often running independently at different clock speeds, so you already have a mini distributed system passing messages around. The key thing is that these systems are made of independent components, which means you can reason about each part of the system independently, treat it as its own thing, and test it as its own thing.

Now you might ask: why isn't everything distributed? The answer is that distributed systems are not faster; they're slower, and more difficult to write. If any of you have tried to write threaded code or networking code, you'll understand how difficult it is because of the complexity involved. It would be much easier for us as programmers to write simple synchronous code everywhere. So why write distributed systems at all? Because they can be scaled up much further: we can take them beyond that ten-centimetre limit, to a whole data centre, a whole country, or the whole globe. That's what I want to cover here, in a small way.

But first we have to talk about trade-offs. One of the things you learn with distributed systems is that there is never a perfect solution; there is no way to have everything you want with no downsides. If any software claims to give you a perfect solution, don't trust it; it is lying to you. My favourite example is the product triangle: fast, cheap, and good, and you can have any two; all three is impossible. This comes from project management: you can have it fast and cheap, but it won't be very good, or good and cheap, but it will happen very slowly. That same shape turns up in a lot of programming trade-offs. It isn't always three items, but it's often the same pattern: you'd love all of the things, but you have to choose one to drop. In a startup you usually drop "good", because you don't have any money or any time, so you write fast, cheap, not-great software and spend the rest of your life paying for the not-great part. Big companies often want it good and cheap, so it happens very slowly.

So how does this relate to Django? Django is pretty standard, and "standard" is the wrong word; it's almost the canonical example of a Python web framework, and a lot of people know how to scale it in the basic way. If you don't, here's a quick sketch. Django starts off with WSGI workers: Python processes on a machine, run under uWSGI or Gunicorn or mod_wsgi or whatever you like. The first step, when one box has too much traffic, is to add multiple boxes and put a load balancer in front of them; for this, please use HAProxy, it is highly recommended. This is simple enough, and the reason we can do it is that HTTP is shared-nothing by design: every request contains all the information needed to serve it. The cookies are there, the Host header is there, the whole request body is there, so you don't need to share anything between those servers.

So what happens when you do share something? Take a cache: say there's a really expensive result we want to compute once. How do we scale that cache? It's fine for three workers, but what if we have 300? We can add hundreds of workers without a problem, HAProxy will happily deal with that, but the cache itself won't scale. The key realisation is that a cache is actually also shared-nothing: you can split it into one cache per machine, or per group of machines, and it will work pretty much the same. As long as your requests are distributed evenly, you'll get about as good a hit rate as you'd want, because if a slow request keeps coming in, it will hit every server and fill the cache on each of them. It's not quite as good; that's already a trade-off, but in practice it's a pretty good start.
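A minimal sketch of that per-machine cache idea in Django settings; the backend path varies by Django version, and the point is simply that every box talks only to a cache running on itself:

```python
# settings.py -- shared-nothing caching: every machine runs its own
# memcached on localhost, so workers on a box share a cache with each
# other but share nothing with other boxes. A slow request warms the
# cache on whichever machine it lands on.
CACHES = {
    "default": {
        # Backend path is for recent Django versions (pymemcache client);
        # older releases used MemcachedCache instead.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # always the local box
        "TIMEOUT": 300,
    }
}
```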
But then think about databases. You can't do the same trick there, because if you had one database per worker, somebody could save a page, load it again, and find it had apparently vanished, because they hit a different database. We need to take our single database and scale it, and this is the essence of what's difficult about scaling Django and web applications. The problem with scaling databases is described by something called the CAP theorem. I will point out that this theorem is more of a guideline than hard science, but it's a good way of reasoning about the problem.

So let's bring back the triangle. The CAP theorem says you can have at most two of these three properties. You can be partition tolerant, which means you can deal with a network that's latent or down. You can be available, which means you will always answer a request, and crucially, answer without an error. And you can be consistent, which means you read the same value from everywhere in the cluster. Databases can be at most two of these; that's what the theorem says. If a database claims all three, again, it's lying to you; don't trust it. I won't name the certain databases that claim this.

Let's think about each of them. Partition tolerance basically means "can you work over a network". Most modern databases are partition tolerant, because otherwise they just wouldn't work: if you can't deal with latency, you quickly decay into a pile of unusable data. So the real choice for us is between availability and consistency. Availability means you can always answer requests, at the expense of them being correct. If a database is available but not consistent, I can always write to it, but when I read back I might get an old value. If you're building a shopping cart, that means someone might add an item to the cart, check out and see it vanish from the cart, finish checking out and see it reappear, and you never charge them any money for it, which is not great, obviously. Or you can drop availability for consistency: the database won't always be there, but when it is, you get perfect consistency in your transactions. This is what Postgres and MySQL and most relational databases do. They are roughly CP, consistent and partition tolerant: if they're available and you can write to them, they will always give you an answer that's consistent across the entire set, and you can trust that what's in there is good. There are extensions of this into transactions and ACID and so on, but that's the basic summary.

As a contrast, we have Cassandra. Cassandra is AP: it is always available. You can write to any node in a Cassandra cluster and it will always take your write, but it won't necessarily give you back the same answer. If you write to node 1 and then read from node 5 before there has been enough time to propagate around, you will get the old value, not the one you wrote.

Which of these you choose comes down to what you're designing. It can be very hard to design a product around inconsistency, and for this reason most small companies opt for a consistent database like Postgres or MySQL: you can reason about it easily, you understand what's happening, and you don't have to design pages around "the user might see a thing, then on the next reload it has vanished, then on the next reload it has reappeared", which is what happens with an inconsistent database. So most people don't choose something like Cassandra or Riak, another example of the same family. But those databases make it very easy to scale, precisely because of that consistency trade-off: you can add a few thousand nodes to the cluster and it will pretty much take the load for you. Of course, if you're building a shopping cart, or a ticketing system like I do at work, you really want consistency: you don't want to sell people tickets and have them vanish afterwards.
So how do we take a relational database and find other solutions? The most obvious first one, and the one I'd recommend if you go down this route, is read replicas; this is often called master-slave replication as well. The idea is that you keep a main database, which is again just Postgres or MySQL, and add replicas of it, often called slaves. These stream data from the main database into their own copies, and you can query them, but only query: they are read-only, and you can't write to them.

This is really good for scaling reads. If you're, say, Wikipedia, with a very high read load compared to your write load, this scales perfectly: since pretty much everything is a read, you can add replicas almost forever. We have tens of them at work serving our sites, and it works pretty well. But if your write load is heavy, this doesn't work: you can still only write to that one middle server. As soon as you pass a critical number of writes, and in particular as soon as you run out of Amazon boxes big enough to hold your database (the cheap fix is to keep increasing the size of the database server until that stops working), you need something else.

There's also a second caveat: by building this system, you've made it very slightly inconsistent. When you write to the main database, there's a small lag, a few seconds at most, between the write landing in the main database and appearing in the replicas. If, in that window, again in a shopping cart, you save the cart (which writes to the main database) and then display the cart (which reads from a replica), you can actually show an old version of the data. For this reason, when a request writes to a table, you have to do what's called pinning: all reads of that table go to the master database for the rest of that request, so the user appears to have consistent data. And this is itself a load problem, because now not only the writes but any reads close to writes hit the main database, and you quickly start running out of not just capacity but things like TCP connections, which can be a real limit in these situations.
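A sketch of what replica routing with pinning can look like using Django's database-router hook. The pinning here is deliberately crude (a thread-local flag that a real setup would reset per request via middleware), and the database names are made up:

```python
# settings.py
DATABASES = {
    "default": {"ENGINE": "django.db.backends.postgresql", "NAME": "app",
                "HOST": "primary.internal"},
    "replica1": {"ENGINE": "django.db.backends.postgresql", "NAME": "app",
                 "HOST": "replica1.internal"},
    "replica2": {"ENGINE": "django.db.backends.postgresql", "NAME": "app",
                 "HOST": "replica2.internal"},
}
DATABASE_ROUTERS = ["routers.ReplicaRouter"]

# routers.py
import random
import threading

_state = threading.local()

class ReplicaRouter:
    """Send reads to replicas until this thread writes, then pin reads
    to the primary so the user never observes replication lag on their
    own writes. Clear the pin at the end of each request (middleware)."""

    def db_for_read(self, model, **hints):
        if getattr(_state, "pinned", False):
            return "default"
        return random.choice(["replica1", "replica2"])

    def db_for_write(self, model, **hints):
        _state.pinned = True  # any write pins the rest of this request
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        return True  # primary and replicas hold the same data
```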
When you get to that point, you have to start thinking about sharding. Sharding is a very generic term that can mean a lot of different things, so I'll cover a couple of versions.

The easiest is what's called vertical sharding, which just means taking your different tables and putting them in different databases. This works very well if you have a couple of very big tables that are roughly equally sharing the main database: take the big ones, put them on their own servers, and you're fine. The problem is they're now on different servers, which means you can't join between them. If your site relies heavily on joins, you'll quickly find you can't join the users table against the events table because they're at different ends of your data centre; it's just not possible. Already you've lost some of the attraction of SQL: you end up doing one query to get all the users, bringing it into Python, another query to get all the events, bringing that into Python, and a dictionary comprehension to join them together. That doesn't seem great.

The other option is horizontal sharding, and in particular something called consistent-hash sharding. Here you have lots of databases that all have all of the tables, but each holds only a subset of the data. The first decision, and the one people most often get wrong, is how to divide the data. The best way is usually by user, or by some other top-level object. Say we divide by user: we take the user ID and apply a hash to it, so we get even entropy, and use the hash to pick a shard. The point of hashing is that IDs are skewed (the numbers all cluster towards the beginning: one, two, three, four, five), but if we chose our hash function correctly they map to an even distribution of hashes, which then map evenly onto the shards. This gives roughly balanced databases with the right data in each of them.

You still have a problem with joins. Each database has all of the tables, but you can't join between databases. As long as we're only querying things owned by one user, we know all of that user's tickets and all of that user's events live in the same shard, so we can join as normal. But if we query across users, say a summary of ten users' tickets, we have to issue queries to all of the shards, take the results, and manually combine them in Python to get a single valid list. This usually ends up fine, because you can usually split your product along something like users or companies.

But there are problems, in particular unbalanced shards. Imagine you're Instagram and you have lots of users with a few hundred followers each, which is pretty normal, and then Justin Timberlake joins with ten million followers. Now you have a small shard, a small shard, a small shard, and one giant shard wherever that person happened to land. The same thing happens in other areas: say you're doing ticketing and one company sells a million tickets; where do you put them? One of the difficulties with consistent-hash sharding specifically is that you don't get to choose which server things land on. The trick is, rather than hashing straight into your, say, sixteen servers, you hash the ID down into around 4096 buckets (a good number), giving you 4096 logical shards, and then map those onto physical shards: buckets 1 through 1000 live on this server, and so on. Then if one bucket gets really big, you can manually say "bucket 300 is actually over on this other server". That's a good way of staying flexible.
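A minimal sketch of that bucket scheme; the hash choice, the ranges, and the database names are all illustrative:

```python
# Consistent-hash sharding with 4096 logical buckets mapped onto
# physical databases. Rebalancing a hot bucket is just an edit to
# BUCKET_OVERRIDES; nothing else about the scheme changes.
import hashlib

N_BUCKETS = 4096

# Contiguous ranges of logical buckets assigned to physical databases.
RANGES = [
    (0, 1024, "db1"),
    (1024, 2048, "db2"),
    (2048, 3072, "db3"),
    (3072, 4096, "db4"),
]
# Manual overrides for unbalanced buckets (the Justin Timberlake case).
BUCKET_OVERRIDES = {300: "db9"}

def bucket_for(user_id: int) -> int:
    # Hash for even entropy: sequential IDs spread across all buckets.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % N_BUCKETS

def shard_for(user_id: int) -> str:
    bucket = bucket_for(user_id)
    if bucket in BUCKET_OVERRIDES:
        return BUCKET_OVERRIDES[bucket]
    for start, end, db in RANGES:
        if start <= bucket < end:
            return db
    raise ValueError("bucket out of range")
```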
Of course, even that only gets you so far, and eventually you need both: vertical shards of your tables, which themselves have horizontal shards of their data. You quickly descend into a nightmare of "which of my hundred databases do I contact, and how do I understand all of this?", and of course, to be more efficient you want a cache, so add a cache on the end of it all too. You start to see the problem you have at scale, and it's not just a problem of programming complexity; it's a problem of understanding. Remember that with a small project, a small piece of code, you as a programmer, as an engineer, can understand the entire world of that code. When it gets bigger and bigger, you have no chance: at work we have a codebase so big that no one person can understand all of it, and that's even more true at, say, Google or Facebook. So you have a problem of scaling not just the database but the logic across your teams. Nobody in a big company, or on a big product, should have to understand everything. There should be somebody, say a software architect, who understands the general layout and how things talk to each other, without knowing every detail.

So what do you do when you have twenty, a hundred, a thousand databases? This is where you bring in the concept often called services, or microservices. I'm not particularly a fan of the "micro" prefix; it just means they're smaller, but smaller than what? Smaller than code? Sure. Services are a way of taking both your data and your business logic, your Python code, and splitting them out into separate silos. In the example we had, we could take our databases, plus the code that talks to them, that understands the sharding, that knows things like "when I change a password, I set the hash", and wrap it all in what's called a service. That lets us treat those pieces as black boxes, opaque boxes: this team understands the user service, they understand how to talk to it and how it works, and the rest of us can ignore all of that. All we need to know is that we can say "hey, make a user", "hey, delete a user", "hey, change a password". As somebody outside that system, we can reason about it much more easily: we don't care how it implements that stuff, only that it fulfils a contract we can write against. Each service becomes its own smaller project. In effect, you've made your own specialised data store: you've taken databases or Redis or similar, encapsulated them with extra code, and produced a new kind of data store tailored to your business logic. And it doesn't stop at databases: it can be queuing, a firehose of all your events, locking and unlocking of inventory. You get a lot of flexibility in how you do this.
But now you have a problem. You have services, and they probably run on different clusters; how do you talk between them? How do you say "when I make a user, give them a password", or "when I make an event, make tickets for that event", when those live in separate stores? It's the join problem from earlier, writ even larger: not only can we not join, we can't even talk between databases. We have to stand back and talk at the higher level of services.

The obvious option is for services to talk directly to each other: each service knows where the others are and just sends a request, probably over HTTP, to that service. Seems great: three services, three connections, simple enough, what could go wrong? Well, let's have five services: oh dear, there are now ten connections. Let's grow the company to twenty: oh dear. This is a quadratically increasing number of connections. With 20 services you have 190 connections; with a hundred services I haven't even calculated it, but if you want to, in Python you can do sum(range(n)) and it will tell you (see the snippet below). I have heard of companies with over a thousand services; you can imagine that a full mesh is impossible at that size.
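For reference, the full-mesh connection count for n services is n(n-1)/2, which is exactly what that sum gives:

```python
>>> sum(range(20))      # 20 services -> 190 connections
190
>>> sum(range(100))     # 100 services
4950
>>> sum(range(1000))    # 1000 services
499500
```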
So how do you solve this problem of routing all these services to each other? The first obvious thing is to have a central place they all talk to, a router: you say "I have a message for this service", and it opens a TCP connection and, a bit like a proxy, routes it through to the right place. That works pretty well, but you still have connection problems; in particular, if you try to connect while a service isn't there or is being restarted, you lose your message. The slightly better version of this is the message bus. It's the same idea, still a central point that everyone talks to, but that point also gains queuing, routing, and a bit more logic. A service can just write to the bus: "hello, I have a request for service 2; the request is to make a user, and when you've finished, send the result back to service 1 at this endpoint".

This, of course, creates something every programmer fears: a single point of failure. If you're like me, you had it drilled into you that single points of failure are the work of the devil, they're terrible, they should never happen. That's not entirely true: a single point of failure is sometimes bad, not always bad, and you have to weigh it up. Think about the mesh network again. It seems good, but as we saw, it actually has twenty or thirty different points of failure. I can guarantee that if you have one central component, you will monitor it incessantly: you will understand it, know how it works and what its limitations are, and have a team of five people watching it all day and all night. If you have twenty of them, you don't have twenty teams watching each one independently, and you won't notice problems as quickly. So a single point of failure that's really well understood and really well monitored is often better. You can run one of these, plus a hot standby, so if it fails you can switch over within a few seconds. And the bus isn't just a place to route things through: as the developer of a service, all you need to know is one thing, the address of the message bus. You don't need to know what services exist, and you don't need discovery; you send things over here, you receive things from here, that's it. It's great for scaling out too: you can boot up an image of a service, point it at the bus, and off you go, with no router or proxy to update.

Obviously I'm a fan of this; that's why I'm talking about it, and it's what Django Channels is based on. A lot of people think Channels is only about WebSockets, and that is indeed where it started, but Channels is actually a set of things. It does handle WebSockets, but underneath it has a message bus, and the way it makes Django work asynchronously is by rewriting Django to run on that bus. In particular, we have an HTTP server as one thing on the bus and Django workers as other things on the bus. The server part, which terminates HTTP and WebSockets, is written in Twisted; it's fully asynchronous and can handle thousands of connections by itself without any threading. The Django workers are synchronous: they're plain old Django, one request at a time. But because all the asynchrony is handled on the bus, we get what looks like WebSocket support, handled synchronously in Django, without rewriting all of Django, which, in case you're not aware, would probably be a four-or-five-year-long project, so we don't want to do that.

This is how people usually think of Channels: Django at the top, the channels library below it, and then a part you may not know about. Channels is the library that links to Django and understands how to decode things into Django requests and responses. Underneath it is a separate abstraction called ASGI, which is definitely not based on WSGI, for the record. ASGI is an abstraction of message buses: it has send, it has receive, and a few other things, and that's about it. It backs onto something behind it: the most common backing is Redis, there's also a RabbitMQ one, and a few others for testing. And you can take Django away entirely: just run pure Python on top of it, with pure Python message buses. That's what we're doing at work. Our services are pure Python, occasionally using the Django ORM (because we like the ORM), directly on top of the channel layer. We have these pipes and processes that run and talk to each other over this thing, and it's maintained as part of the Django Channels project: maintained, tested, with security releases. Sure, I do most of that, but the guarantee is there, so it's a really good way of writing it.
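A minimal sketch of that send/receive idea on a channel layer. The API shown is the later channels 2.x-style interface, using the in-memory layer so it runs standalone; the version discussed in the talk differs in detail, but the shape is the same, and the channel names and message fields are illustrative:

```python
import asyncio
from channels.layers import InMemoryChannelLayer

async def main():
    layer = InMemoryChannelLayer()

    # A "service" writes a request onto the bus. Shared-nothing: the
    # message carries everything needed, including where to reply.
    await layer.send("users", {
        "type": "user.create",
        "username": "andrew",
        "reply_channel": "service1.replies",
    })

    # The users service pulls the next message off its channel...
    request = await layer.receive("users")

    # ...does its work, then replies on the channel named in the message.
    await layer.send(request["reply_channel"],
                     {"type": "user.created", "username": request["username"]})

    print(await layer.receive("service1.replies"))

asyncio.run(main())
```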
But let's talk about those trade-offs, because when you pick a message bus, like any distributed-systems solution, you must pick some trade-offs. What I've tried to do with Channels is make a message bus and a Python interface that captures what I believe is the best set of trade-offs you can pick. You may disagree; nobody's needs are the same. My goal, as with Django itself, is a solution that's good for about 80% of people, not 100%.

First and most important: how you fail. Again, if someone claims to deliver messages exactly once on a queue, they're lying to you; it isn't possible. Your choice, when a queue fails to deliver, is how it fails. The obvious one is "at most once": if delivery fails, the message vanishes, never to be seen again. This is how a simple queue works. Say you used a database as a queue: you write a row, and your worker reads a row and deletes it as it reads. If it reads a row, deletes it, and then the process crashes, you've lost a message. The other option is called "at least once", which means that on failure you redeliver, you try again, and that means you can get duplication. Kafka does this, Celery in some modes, others too. There, you read a row, process it, and delete it only once you've finished processing. If you read a row, process it, and die just before deleting it, another worker comes along, sees it hasn't been done yet, runs it again, and then deletes it: you've run the thing twice.

I believe in writing software that's designed to deal with loss and crashes, and given that, I personally believe at most once is easier to write for: you should already be able to deal with things disappearing, so why not have your message queue fail the same way? If you choose at least once, you have to write code that deals with deduplication. Say you're charging credit cards or taking payments: with at-least-once delivery you might charge somebody twice, and I would rather not charge somebody at all than charge them twice.
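A sketch of the deduplication you're forced into under at-least-once delivery, here using a Redis set-if-absent as the idempotency check. The message shape and charge_card are illustrative, not any real payment API:

```python
import redis

r = redis.Redis()

def charge_card(token, amount):
    print(f"charging {amount}")  # stand-in for the real payment call

def handle_payment(message: dict):
    # SET ... NX succeeds only if the key is new, atomically, so a
    # redelivered message with the same ID is recognised and skipped.
    first_delivery = r.set(f"seen:{message['id']}", 1, nx=True, ex=86400)
    if not first_delivery:
        return  # duplicate delivery: already charged once, do nothing
    charge_card(message["card_token"], message["amount"])
```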
The next thing is guarantees. Channels is designed as a very low-latency queue; the end-to-end response time is around one or two milliseconds, which is very reasonable for a network. The trade-off is that being that low-latency sacrifices some reliability, some persistence: it's very quick, but a slightly higher percentage of messages go missing, something like 99.99% delivered. Something like Celery, by contrast, has a very low loss rate; it does a lot more negotiation and handshaking, and the latency is higher because of it, probably a tenth to half a second, maybe even a full second or two. Because Channels is a message bus that powers sockets and WebSockets, we need the low latency, so it's designed specifically for fast communication rather than as a slow task queue.

Another trade-off is queuing type. The obvious behaviour for a queue, what a queue is, is first in, first out: the head of the queue has been there longest, and when you fetch, you get the oldest item. This is good for consistency: everyone in the queue gets the same experience; everyone is equal, basically. But it can have problems during big backlogs. If the queue gets 100 or 200 items long and takes a couple of seconds to drain, everyone on your site gets a couple of seconds of latency; it's equally bad for everyone. The opposite is first in, last out: a stack. The oldest thing in is the very last thing pulled out. It may seem bad at first, but it ends up giving good responsiveness to everyone newly arriving at the site: during a backlog the stack grows hugely downwards, but everyone who turns up after that still gets served quickly, and you time out and drop the people stuck at the bottom. What happens is that you give a totally terrible experience to a small number of users but keep a great experience for everybody else. Personally, I prefer the predictability and simplicity of first in, first out, especially as long as your queue is turning over quickly; FIFO is much better at giving you predictable, understandable behaviour. But be aware that if you're right on the edge of how fast you can process, FIFO can make, say, one page in ten simply fail to render in time.

The last thing here is queue sizing. As we heard yesterday, queues should definitely have a size; for reference, if you don't give your queues a size, they're infinite. An infinite queue means you never have to deal with failure to send: you can always shove things onto the queue and carry on. But it also means that if you get a traffic spike, or an error, or something goes down, the queue gets longer and longer and longer, everyone who turns up gets served slower and slower, and everybody ends up having a really bad time. If you really want to, you can combine infinite queues with first-in-last-out, the stack; that actually works quite well, as long as you time out things past a certain age. But given we chose FIFO, finite queues are the right choice, and what Channels has is a capacity on each channel you send on, about one hundred by default. If a message doesn't fit on the channel, you get an error saying the channel is full, and your application deals with that. Say you're accepting an HTTP connection into Django through Channels, and the channel is full: that means the workers are completely busy, with no room left to process, so we return a 503 Service Unavailable, the same thing HAProxy would return if its queues got too big. Whereas if we're trying to process a payment and we see the channel is full, we don't care about speed, we just want it to happen, so we sit there and retry until the channel opens up, we put the message on, and we're good.
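A sketch of those two policies using the channels primitives; the capacity, channel names, and message contents are illustrative:

```python
import asyncio
from channels.exceptions import ChannelFull
from channels.layers import InMemoryChannelLayer

async def demo():
    layer = InMemoryChannelLayer(capacity=100)  # finite queues, as above

    # HTTP-style: fail fast. If the workers' channel is full, the
    # client gets a 503 now rather than a slow page later.
    try:
        await layer.send("http.request", {"type": "request"})
    except ChannelFull:
        print("503 Service Unavailable")

    # Payment-style: we care that it happens, not when. Retry until
    # the queue drains rather than ever dropping the message.
    while True:
        try:
            await layer.send("payments", {"type": "charge"})
            break
        except ChannelFull:
            await asyncio.sleep(0.1)  # back off until room opens up

asyncio.run(demo())
```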
The common theme here is that you've got to understand what you're making. This is surprisingly uncommon: most engineers don't know what they're making in its full extent, which is somewhat normal, especially at startups and new companies, where you don't entirely know what you're building yet. But you should at least sit down and sketch out the rough set of failure cases and how things should work; once you understand that, you can make informed choices about which trade-offs to pick. I've shown you some here, and there are whole loads of others; distributed systems is a huge field with many different problems. The first thing to settle in those meetings about trade-offs is your priority: is it a perfect system, really well built but slow to arrive (remember the fast-cheap-good triangle), or do you want the thing that ships and works?

If you want the second, and you probably do, here are some tips. First, always try to design as much as possible around shared-nothing: try to make all the information you need to process a request, or any kind of message, live in that message. A good example is Django's signed cookies. By default, Django puts a session ID in a cookie; on each request, Django sees the session ID, queries a database, pulls out JSON, decodes it, and that's your session. But what you can do instead is put all the session data in the cookie, as long as it's small enough, and then when a request arrives, the data is already there: no database to hit. There is, of course, a security issue: if the session just lives in the cookie, the end user could edit their own session. That's why you sign it. You add a cryptographic signature that says "we gave this cookie to you"; if the user edits it, the signature is invalid, so anything with a valid signature can be trusted.
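This cookie-backed session is built into Django; switching it on is one setting. Note the data is signed, not encrypted: users can read their session, they just can't edit it.

```python
# settings.py -- store the whole session in a signed cookie instead of
# the database: no session query per request, fully shared-nothing.
SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"
```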
The same is true of other things: the per-machine caches I showed you before are a perfect example of shared-nothing caching as well. Another top trick with caches: say you have a site that gets really large load spikes on a couple of pages. A one-second cache is incredibly effective at that kind of thing. If your job is to keep the page looking as fresh as possible but stop huge spikes, just a second or two of caching takes you from a hundred requests a second down to one request a second on your back end, which can make all the difference, and the page still looks live, updating all the time.
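In Django, that trick is one decorator; a sketch (render_expensive_report is a stand-in for your slow view logic):

```python
from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(1)  # timeout in seconds: a one-second cache
def busy_page(request):
    # 100 requests/second now cost ~1 backend render per second,
    # and the page is never more than a second stale.
    return HttpResponse(render_expensive_report())
```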
Another example is thumbnailing. Often, when someone uploads a media file, you take that file and send it off to a Celery job or similar to be thumbnailed. If that fails, you've got a big shared resource problem: what do you do, show the user an empty page? The alternative is one of a couple of Python services, or commercial services, that thumbnail on demand: when you request the image, the URL itself says "I want this image at this size, cropped in the centre", it's all in one place, and the thumbnail is generated on the fly (and cached, obviously) and served through. By doing this, you're taking the shared piece and splitting it out. If you see something in your architecture where all of your workers are talking to one shared thing, try to tease it apart into separate pieces. If it still has to be shared, then you have to start sharding it, because you can't have a single shared resource in a big system; it's just not possible. Sharding itself is probably a few hours of talks on its own, and there's plenty of material online to read up on, but the two approaches I showed are the general idea of what to aim for.

Amidst all this, people think, "I'm changing my site too much; I'm taking Django and ripping pieces out." As a core developer, I want to reassure you: Django's job is to be deleted, piece by piece. Nobody at scale runs Django exactly as we give it to you out of the box. Instagram does not run normal Django; what you do is take Django and slowly replace each piece with something specialised to you. As I said before, Django is designed for the 80%, the general body of engineers, and as you specialise, our job is to fall away and let you do what you think is best. You can end up with something that has no Django code at all but still looks like Django, because it started off as Django and you replaced every piece; much like the broom where you replace the handle and then replace the head: it's still a broom, it's still your broom. So don't be worried about it. You need a different caching layer? That's fine. A different session store? That's okay. People replace the ORM: if you listen to Carl's talk about Instagram from a previous DjangoCon, he'll tell you how they entirely removed the Django ORM from Instagram and moved to a custom Facebook data store, but it's still Django; it still has URL routing in there. This is fine, and it applies to other frameworks too. The key thing when you do this is to make sure you understand what you're replacing. I've seen many people take a piece of Django and replace it with something else, and get it mostly right but not a hundred percent right, and we have good documentation about what each piece is meant to do. So if you do replace pieces, please try to understand what, say, the caching layer is supposed to look like. I've seen cases of "we just used this thing wrongly, and now when I write to the cache, everything else in the same thread gets the same response back, and I start getting pictures in my header and such".

My final piece of advice is very important: don't try this too early. This talk comes from the vantage point of Eventbrite being a reasonably large company now, with hundreds of engineers. You can't do this while you're small: you'll not only struggle with the number of people you have, you'll probably pick the wrong trade-offs, because you don't know what your problems are until you see them. What you should do is see them coming ("ah, our database is getting very busy") and handle them in advance, rather than "oh no, it crashed, how do we fix it?"; but you won't know which problems you have until you get at least near them. What you can do now is take these ideas of keeping things separable, building them as service-like pieces, and apply them to a new project. If you're writing a new Django project with a couple of apps, say a users app and an events app, try to keep them self-contained; try not to have them import each other. If you keep them like that, then when the time comes to separate them, you can just lift the module out, hook it in remotely, and you're done. Otherwise you'll spend weeks or even months going through the code removing imports and adding different connections, trying to fix it up, and trust me, that's painful. So consider it, think about it, but don't go too far down the road: don't shard from day one; you won't need it. The best advice for a small company scaling a database is to buy a bigger server; it's very cost-effective, trust me. Thank you.

[Applause]

Q: Thank you for the talk. Can you say something about how big Eventbrite is, in terms of users or requests per second? When do you need this; how much load do you have before you need to scale this way?

A: I can't give specific numbers, unfortunately; that's one thing we don't share. But my general advice is: at around ten servers. In particular, I used to run a site, the Minecraft Forums in fact, that had a huge amount of load but was very read-heavy, and that was just three or four servers: three PHP servers and one database, and we just kept increasing the size of the database machine. The vendor would say, "oh, we have a really, really big machine with four processors and loads of RAM, it's yours if you pay us the money", and that worked fine. So I'd say around four to ten servers is the point where you start thinking about it, a few tens to hundreds of requests a second. It really depends on what you're making: if it's shopping, the point comes much earlier; if it's a wiki, much later. So unfortunately I can't give super-precise advice on that.

Q: Thanks for an amazing talk. I wanted to ask about the bus, and specifically about Django Channels: is there a way to have sticky sessions or routing in Channels, and how does that affect the complexity of the bus itself?

A: That's a very good question, and it comes back to what I said about shared-nothing. Your first instinct would be, "what if everything from one socket, the stickiness, goes to the same server, and we store things in that server's memory?" But that server could crash; we shouldn't store things there. What Channels does is use the Django sessions framework: it takes each incoming socket and gives it a unique identifier on its reply channel, which works like a session ID. Then in Django you have a decorator, channel_session it's called (sketched below), which looks that identifier up in the session table and gives you a session. You write code as normal, storing things in the session, and the next message, which might be handled by a server at the other end of the rack, can just read it from the session and carry on. That's how we solve it: reusing as much of Django as we can. And again, if it doesn't work for you, you can replace it; it's an optional part anyway. Thank you very much.
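For reference, the decorator pattern from that last answer looks roughly like this in the channels 1.x-era API (the version current when this talk was given); the handler contents are illustrative:

```python
# consumers.py -- sticky-session-free WebSocket handling in channels 1.x.
# @channel_session looks up a session keyed on the socket's reply
# channel, so any worker on any machine can pick up the next message
# and see the same session data.
from channels.sessions import channel_session

@channel_session
def ws_message(message):
    # Illustrative: count messages per socket, across workers.
    count = message.channel_session.get("count", 0) + 1
    message.channel_session["count"] = count
    message.reply_channel.send({"text": f"message number {count}"})
```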
Info
Channel: PyCon Ukraine
Views: 1,822
Rating: 5 out of 5
Keywords: python
Id: Bx-SgneXggs
Length: 45min 6sec (2706 seconds)
Published: Tue Sep 05 2017