An Overview of Performance & Scalability

Captions
Thanks, Matt. So I'm going to do my best to be vocal, but the cadre party last night really kind of took my voice away. I'm normally a very enthusiastic speaker, but I may be a little bit monotone because I have a very small amount of voice to use with you guys. I'll do my best.

When talking about performance and scalability, those are two different topics. Performance basically means making your site fast; scalability means your site can handle a large amount of traffic. This talk is mostly going to focus on installing software, things that you can do on your individual server. So hopefully that means that you have a root account on your server, or you have a server administrator who has the ability to install software on it. That's going to be the criterion for a lot of the software that we're going to talk about.

But I also want to say that there's a good chance that before you even get into that, you should start by figuring out what's wrong with your site. If just the most basic Drupal site is causing you trouble, you probably shouldn't be looking at installing new software to fix your site, or to cover up the problem, if you will. Most of the time it's really just a matter of finding what's wrong with your current site: finding out why Drupal is slow, why the site is crashing under some small amount of load.

There are at least a couple of frequent performance culprits that you should look at in depth if you have them installed on your site. Any module has the potential to do absolutely terrible performance things to your site, and a couple of these modules are even included in core.

The Statistics module is one of those frequent culprits. People go to the modules page for the first time and they see Statistics: "Oh, I would like statistics on my site, yes please." They turn on the Statistics module and never actually look at the
statistics pages, because the statistics generated by the Statistics module are actually not that helpful. So if you're using the Statistics module, or if you've turned it on, turn it off, because it seriously is one of the most terrible modules out there in terms of generating a lot of SQL queries. It does an insert on every single page view to track that somebody visited your page. What you end up with eventually is that you go and look at your statistics one day and you find out that Googlebot has hit your site 380,000 times since you launched. Oh great, that's actually 380,000 entries in the statistics table. I've seen a statistics table of 800 million rows. It's just ridiculous. Anyway, instead of Statistics, try to opt for something like Google Analytics, something that is really meant to store a lot of information.

Tracker module. Tracker essentially makes it so that every time you view an individual post, it records the user and post combination, so you can see the last time people viewed a particular post or the last time they participated. It's also just an absolute performance killer. Try to avoid that one if you can. If you need Tracker's functionality, use Tracker 2 from contrib, a better, more scalable version, and that's actually what's on drupal.org.

XML sitemap. Oh man. XML sitemap, way back in the day, used to be the number one killer of sites. If you installed it, it would almost immediately just destroy your site. The newer versions of it are totally fine; it's only a problem if you're running an older version, the Drupal 6 1.x branch.

Another problem, sort of in the same SEO vein, is the Nodewords module. Nodewords even today is a real performance killer, and it really is kind of harsh for the people who have an SEO consultant come in and say, "You need to install XML sitemap and Nodewords," and pretty soon your site
starts going really slow. Nodewords has been in a tough place for a long, long time, and I'm sorry to say that right now you really just can't install Nodewords. If you need meta tags of some kind, consider doing it through theming or through hook_page_alter(), something you can use to avoid that module for the time being.

User Relationships is a module that just has a lot of complexity to it, and it is a frequent performance culprit. And any kind of access control module (OG, Taxonomy Access Control, TAC Lite) is going to be a pretty crazy performance killer. A lot of times you can't avoid access control modules, but if you are using them, try to use just one and not several. The more access control systems you have in place, the crazier the queries become and the worse the performance implications.

So these and many others. These are just really popular modules that can cause a lot of problems, but there are a lot of modules out there that I don't even know of that I'm sure are equally big performance killers. So if you don't know what on your site is causing problems, there are some tools to help you find out.

The easiest one is a tool that everybody should have installed: the Devel module. Devel is a drupal.org project (you can see the URL right here), and it includes two tools. At least the Drupal 7 branch has this bottom option for XHProf, but every version of Devel, all the way back in time, has had the ability to show a query log. The query log is absolutely fantastic in terms of finding out which SQL queries on your site may be running slowly. This is an example from a site that is having some issues. You can see here that it executed 300 queries in 3,000 milliseconds, so it's actually three seconds that it was just
doing SQL queries. SQL queries are the number one slowdown on any Drupal site. You can see here the queries longer than five milliseconds or executed more than once. You should never, ever need to execute the same SQL query on a single page more than once. If you are executing the same query more than once, you should be able to rewrite that function so that it remembers that it already has the result and stores it in memory instead of executing the query multiple times.

You can see right here, on this particular site, the number one SQL query is statistics_exit(). This is the Statistics module doing its inserts into the access log table, like what I was talking about: Statistics is just not a really great module. You can have the log sort by the amount of time each query took to execute, so the longest-running queries are at the top, and you can see here that an insert is a rather expensive operation in comparison to something like a select.

So, XHProf. XHProf is a completely different approach to profiling. SQL queries are a great thing to look at, but XHProf and other tools like it, such as the Cachegrind-style PHP profilers, go through and actually analyze your code, break it down by function, say "this function took the most time to execute," and then sort it, so you can see where all the processing time is being spent within any individual Drupal execution.

Both of these just show up at the bottom of the page. In the case of XHProf it's a link; in the case of the query log it's just a list right down there at the very bottom of the page. So you just browse around your site, and on any particular page where you notice it's slow, you go and look at the analysis of where the time is being spent. This is particularly useful if you're not seeing any expensive SQL queries. A lot of times a slowdown is caused by an external HTTP request. That is a very
very common symptom, especially in the age of Twitter and Facebook. A lot of people are querying external web services during individual page loads, and this will bring that to the forefront, because you'll see that a bunch of time was being spent in a Twitter API call or something like that, a function that is executing an external HTTP request. So not only does your page have to load, but your server has to query another server, that page has to load, the result has to come back to you, and then you return it to the user. XHProf will bring those sorts of problems to the front.

So step one was finding out what's wrong. Step two (and this might actually be step one): turn on the frickin' page cache. This is the number one thing you can do to speed up your site, and the difference between a page that is not cached and a page that is cached by the Drupal page cache is crazy. We'll get into numbers at the end of this presentation.

Turning on the page cache is done on the Performance page of Drupal. In Drupal 7 it looks like this. You should set a minimum cache lifetime and the expiration of cached pages to something reasonable, like I have it set here, to five minutes. I would never recommend that you set it higher than five minutes, though that drop-down goes all the way up to something crazy like a week. It is absolutely silly. The advantage you get from a five-minute cache, or even a one-minute cache, is crazy. Say 10,000 users visit your site in one minute. If you have the page cache turned on, that turns 10,000 times that Drupal had to start up, read stuff from the database, and generate the page into one time. And if you turn it up to five minutes, then 50,000 users generated the page one time. It's not actually that big of a difference that you
had to generate the page five times in five minutes versus one time in five minutes, right, when you look at the grand scheme of 50,000 users coming to visit your site. So set it really low; you're not getting much benefit by setting it to something really high. I usually say one minute, or five minutes at the maximum. In Drupal 6 it's similar; it looks like this, only it's a set of radio buttons instead of a checkbox. This particular screenshot is actually not from Drupal; it's from Pressflow, which I'll talk about in a little bit.

OK, so once again, before you actually get into installing new software, what you want to do is look at your existing stack and figure out: how can I optimize the software I have today?

Pressflow, like I just mentioned. If you're running Drupal 6 and you need to install performance software on top of it, you shouldn't actually be using Drupal core. This sounds really crazy, but Pressflow is essentially a complete replacement for Drupal core. It's 100 percent API compatible, so you can basically remove your core, replace it with Pressflow, and it is Drupal. It's like a fork of Drupal, in a way, but a fork done right: a version of Drupal that is 100 percent API compatible and can use all of your existing Drupal modules. It's different in that it allows you to use things like slave databases, content delivery networks, and the reverse proxy systems that we're going to talk about. Basically it's just a more performant version of Drupal core.

Some things that they've done in it narrow down your use cases a little bit. You can only use it with MySQL: they've stripped out the database abstraction layer and said MySQL only, and that makes it more performant because it got rid of a lot of the abstraction that Drupal core includes. So you only need it in Drupal 6. Drupal 7,
functionality-wise, includes everything: the ability to use a content delivery network, reverse proxy caches, and slave database servers is all built into Drupal 7 now. A Drupal 7 version of Pressflow may exist in the future, but it will likely focus on things that Drupal core doesn't yet have, such as the ability to use Edge Side Includes, for example, to generate parts of the page through Varnish. Really fancy high-performance stuff. I think they're also really big on trying to make it work with HipHop PHP. HipHop PHP is basically what powers Facebook, and it's crazy fast; that's all I can say about it. The nice thing about Pressflow is that it also does not create sessions for anonymous users. That's built into Drupal 7 also, and we'll talk about why that's important when we get into reverse proxy caches.

So install Pressflow if you're using Drupal 6. It's just a good thing to do, and it's really simple. It's just like upgrading Drupal core: take out the normal Drupal core files and replace them with the Pressflow equivalents. And Pressflow is really good about updating. Every time there's a security release or an update of Drupal core, the corresponding version of Pressflow will follow in a matter of hours, or at worst a couple of days. They're a really good team.

So, PHP. Obviously everybody's running it if they're running Drupal. The absolute number one thing you can do is install APC. APC is the Alternative PHP Cache, which reduces disk access, and this is particularly important if you're running a VM or a cloud server, because disk access on cloud servers is catastrophic. That is the number one reason a cloud server is not nearly as fast as a dedicated server, even with the same specs: disk access on cloud servers is just absolutely terrible. APC can alleviate a lot of that problem. Whenever you request a page, the very first
thing that Apache does is start up PHP. PHP says, "Oh, I need to render this page." index.php gets pulled up, and then it has to load almost all of the files that you see within Drupal: all of the .module files, everything in the includes directory. It literally needs to read all those files off disk, parse them, compile them into opcodes, and then actually execute them and return the result. And it does that on every single page load. If you install APC, it skips the process of reading from disk and it skips the process of compiling, because it already has the compiled version ready. That skips a huge chunk of processing and a lot of disk access, so the code can immediately be executed and the result returned to the user.

So APC is incredibly useful, and the great thing about it is that it's invisible. You just install it, and it takes absolutely no work whatsoever. You don't need to clear it; you don't even need to know that it's there. It basically just makes the process of building a Drupal page roughly twice as fast. It's really great.

One thing you do need to worry about is making sure that all of Drupal's files can fit in memory; you can roughly think of it as the file size of all of the code. The default size of APC's cache is something like 8 megabytes, and 8 megabytes honestly is just not enough space for all of the Drupal code. So what we commonly do is set the size here to 128 megabytes. That's about as high as you can go with APC on a lot of systems, but 128 megabytes is plenty of room for Drupal to fit, and if you're running other PHP applications on the same server, they'll all fit within that memory allotment also. So you don't need to set it to 128 megabytes, but that's a good number if you have it
available.

If you want to check how much space you actually have available, APC comes with this really nice script, just called apc.php, a file that gives you statistics on APC and how well it's running. What you always want to see in APC is a hit rate of 99 percent or higher. That means PHP hardly ever needs to load any of the files from disk. This is about your typical screen here, where you've got a 99.7 percent hit rate. That means that 99.7 percent of the time you don't need to read from disk, because APC already has all of that information in the cache. The other 0.3 percent is really the very first time after you restart APC, when it needed to load all of the files from disk, and then it will never need to do that again. So you should always have a super high hit rate.

If you're getting a lower hit rate, you'll usually see it in combination with something else. The hit rate usually doesn't just start to drop; it's like a cliff, where it drops all the way down to something like five percent, and the green pie slice over here is completely gone. It's all orange, meaning the cache is full. If the cache is full, APC is clearing stuff out of the cache before it actually gets used. So if the pie is completely orange, APC doesn't have enough space to function properly. Increase the size, the hit rate will shoot up like crazy, and then APC is working like it should again.

Let's take a break. I know I'm fire-hosing information at you guys. Do we have any questions so far? Anybody in the room? Yeah? Could you repeat that? I'm sorry, I did not hear your question. Should you enable internal debugging? Good question. So, internal debugging, you mean like something
like Xdebug? Are you talking about APC? So, internal debugging with APC, should that be installed? OK, so yeah, there is a debugging option that comes with APC. You can install it on your server if you want. I'm not actually sure if that's Xdebug; I don't think it is. I think APC comes with some kind of debugging of its own. I've never actually used it, so I don't think it's necessary, no. Any further questions?

Oh, the question was: you mentioned that access control modules can really cause you performance nightmares; what if you really need one of those? We use access control modules even on our own sites that are super high performance sites. We actually use the Domain module, which applies access control so that you see content on one particular domain and not on another. It allows you to run multiple, completely different sites and domains off of one Drupal database. Those kinds of queries are expensive, and if you turn on query logging you'll probably see that listings of nodes are the common thing that gets affected. You'll see those queries start to float to the top of the list of expensive queries.

So what can you do if you need those things? This is actually a good lead-in to MySQL, because what access control modules do is generate queries that are complicated. Whenever you have a listing of content, the query needs to join against the node access table, and the node access table then does another join onto whatever table that module uses, and all of those joins form an expensive query. And you can mitigate
the problem, essentially, by optimizing MySQL and, as I'll talk about in a little bit, by using slave servers so that those queries are not hitting your main database server.

So let's talk about MySQL. Like I just said, in multi-server setups you can utilize slave databases for all paged queries. So any time you have a query that has multiple pages, like the river of news that you get on the home page of Drupal sites by default, that would be an example of a paged query, or any query that has a limit (say you've built a view and you say "only show 10 items"), that query will hit the slave database instead of the master. This only applies if you're running Pressflow, which is why I've been telling people to use it if they're on Drupal 6. In Drupal 7, Views has a very nice checkbox underneath the query settings that just says "use slave server." So in Drupal 7 it's much more straightforward: you just say "I want this query to hit the slave." And if you have a slave database server in Drupal 7, you should pretty much do that for every single view you build. Make everything hit the slave.

The idea is to protect the master database from queries, because when it comes to scaling out, throwing hardware at the problem, the master database is the one part of your infrastructure that you can't scale out. You can't just add another machine. You can upgrade your master database to be a really fast, really big machine, like a 16-core, 24-gig monster, but you can't add more machines at the database layer, at least in the sense that there's always one database system that is receiving all of the inserts.

I'm going to talk more about master and slave databases later, but configuration-wise there are two things that you can do that
are always huge wins on the MySQL layer. The first of them is turning on query caching. The query cache is basically a small cache, usually around 32 megabytes, that just remembers the answers to particular queries. Drupal, when it generates pages, ends up doing a lot of the same queries very frequently, and turning on the query cache has MySQL store the results of frequently run queries in memory instead of needing to fetch them from disk. And in Drupal 7, InnoDB is the default storage engine, and InnoDB has its own caches that need to be set to a pretty high number. I'm going to talk about them in a second.

If you need help configuring your MySQL parameters (there are a bunch of other things besides query caching and the InnoDB buffers, but those are the two biggest ones), there are two extremely helpful scripts that you can run. You guys don't have to write down all these URLs; I posted the slides on the DrupalCon site. If you go to this presentation's session page, you'll see a link to download the slides. These two scripts are basically shell scripts, written in Perl. You download them onto your server, and then as the root user you simply execute them, and they'll tell you everything that's wrong with your MySQL configuration, which is really pretty awesome. They're not a substitute for knowing what you're doing, but they're a really good learning tool in terms of finding out what's wrong with your database and setting off little red flags for things that you can fix on your server. Oftentimes what they'll do is tell you to turn on query caching and set the InnoDB buffers.

So here's what this actually looks like. This is in your my.cnf file, the configuration file for MySQL, and this is what it looks like to turn on the
query cache: the query cache first, up here, and then the InnoDB buffer down below. You can see I've got the query cache size set to 32 megabytes, which is going to be pretty reasonable for most servers and sites. The query cache limit says that I will store at most one megabyte for an individual query. The query cache type you can look up the details of; it's technical, so just set it to one.

Down below, this is setting the InnoDB buffer pool size to 128 megabytes and the flush method to O_DIRECT. Up here at the top I've got links to two articles on why you set these to what they are. The flush method basically prevents the operating system from doing caching on top of InnoDB doing its own caching, so it removes a redundant cache; there's no point in having two layers of caches doing the same thing.

The InnoDB buffer pool size is a really interesting parameter, because just about everybody will tell you that if you have a dedicated MySQL server, you set the InnoDB buffer pool size to just ridiculous numbers. A lot of people will say two-thirds of the available memory on your server. On a dedicated MySQL box this can go all the way up to something like 10 gigabytes, because as far as InnoDB goes, the more you give that cache, the better MySQL runs if you're using InnoDB tables. There's basically no limit; bigger is always better when it comes to this particular setting. I've only set it to 128 megabytes here, which would be a typical setup if you're running a virtual machine of some kind, or a single server that has MySQL and Apache and the whole stack all on one machine. Then this might be a reasonable start.

"InnoDB is your preferred engine?" Yeah, yeah. So, talking about database engines, there are two really big database
engines out there. There's MyISAM, which is typically the default in Drupal 6 and before; MyISAM was the default database engine for MySQL databases. InnoDB, on the other hand, is the default for Drupal 7, so all tables are InnoDB by default. Basically, the difference between the two is that MyISAM is highly performant at read queries. For a SELECT * or a COUNT, or any queries that need to do calculations on tables, MyISAM is more performant. InnoDB, on the other hand, is only slightly less performant on the select side but ridiculously more performant when it comes to insert and update queries. What we've found with Drupal 7 is that using InnoDB across the board is way more beneficial than using MyISAM.

If you want to be really picky about it, you can have some database tables use the MyISAM engine and some use InnoDB. The tables that you want to be InnoDB are the ones that receive a lot of write requests: things like the node table, the users table, the access log, the sessions table, watchdog. Any of those tables that receive a lot of writes you might set to InnoDB, and then set everything else, which is mostly read-only, to MyISAM. It's getting pretty heady, though, comparing MySQL engines to each other. I typically just say use InnoDB across the board, because it's the better decision.

Question in the back? How hard is it to convert an existing database? You're asking about changing engines on your tables? It's actually as simple as a query that you execute against MySQL, and it will convert a table from one engine to another. It may take a second or so to do the conversion, so if you do all of your database tables at once you may be looking at something like a 30-second conversion time. There are actually some modules out there that can
help you with this. I think it's called DB Tools... DB Tuner? Yeah, DB Tuner. It will tell you what engine each of your tables uses and allow you to convert from one to the other. Great question. Any other MySQL questions?

You're asking about the MySQL cache. The question was: is there a limit to the cache, and will there be issues and delays when that cache gets cleared? MySQL will always take care of clearing its own caches. You may be thinking of the Drupal cache, when you hit the "flush all caches" button. That has no effect on MySQL. The MySQL cache is self-managed; Drupal doesn't even know that there is a cache in place. MySQL is managing all of its own caches at a layer that Drupal is not even aware of. So there isn't any flushing of the cache like we commonly see in Drupal, because MySQL is smart enough to manage the cache for you all the time. You'll never see a flush unless you restart MySQL, which will clear everything out as it turns off and back on.

And there was some other question about a limit, right? I can't remember what that was. Yeah, actually a really good question, which amazingly leads into my next slide again: what kind of limits can you set on these things? Because you're right that there's only so much memory available in a particular machine, and memory is a huge thing to take into consideration when it comes to how big all of these things are. Those scripts that I was talking about will tell you the maximum amount of memory that MySQL will take up. When you run them, they'll say "MySQL is currently configured to use a maximum of 2.3 gigabytes of memory," which is really awesome. It basically takes the number of max connections that you have,
the amount of memory it takes per connection, and the size of the caches you have set, and adds them all up to calculate a total: MySQL will take this much memory at maximum. That's super helpful.

For Apache, on the other hand, I haven't yet seen a script that does this. Maybe it's so simple it doesn't need a script, but Apache is where most people go wrong in terms of the amount of memory they allot on their box. There are a lot of things you can talk about in the world of Apache. There are two popular engines out there for Apache, prefork and worker; with worker you're normally using FastCGI, and with prefork you're usually using mod_php. So there's FastCGI versus mod_php; using local .htaccess files versus moving all of those .htaccess rules into virtual hosts; whether you have KeepAlive on or off; and disabling unneeded modules, since Apache comes with a lot of modules that you don't actually need and can turn off. All of these things, while important, are not nearly as important as calculating the amount of memory that Apache will actually consume. You can argue each of them one way or the other, and in some situations it makes sense to use something that's not the default; you'll need to decide that for each site.

The defaults of Apache are usually pretty respectable. By default you'll almost always be using prefork and mod_php, which typically has a performance advantage over FastCGI and worker. With FastCGI and worker you may be able to squeeze out more connections with less memory, but they're not quite as fast. They're just minor trade-offs between the two, and honestly mod_php and prefork are a lot simpler to work with and more stable, so most people stick with them. So all of
these things you can configure in various ways, but the one important thing that you really want to pay attention to is MaxClients. MaxClients essentially is the number of connections that Apache can receive or handle. This particular setup is the Apache defaults. The Apache defaults have MaxClients at 256, meaning that 256 connections can be handled by Apache at any point in time, so it'll try to handle 256 people simultaneously. 256 is a really high number, actually, especially if you're running Drupal and mod_php. If you use top or some other utility to check the memory usage on your machine, you'll see that a typical Drupal install will probably take up anywhere between 40 and 80 megabytes in every single Apache instance. So take MaxClients and multiply it by the average. Just looking at top, you see, oh, this particular httpd on my server is taking up 40 megabytes per instance (80 megabytes is not uncommon, depending on the number of modules and things that you have installed on your site). 256 times 40 megabytes is 10 gigabytes of memory. And Apache will just keep sucking up memory as if it was candy. Even if it doesn't exist on your server, it will just continue expanding and take more and more memory. What your server will try to do in this situation is allot memory even though it doesn't exist, and that's when it goes into swap. Swap is essentially disk-based memory: when the server runs out of memory it says, oh, I'm out of memory, so I'm going to start writing memory to disk. Especially on cloud servers, that is an absolute catastrophe. Touching the disk is always the slowest operation you can possibly do, and you want to avoid it at all costs. So whether you've configured Apache or left it at the default, setting MaxClients to too high a number is absolutely catastrophic. If the server runs out of memory like that, or rather continues to allocate
memory even after it's run out, it goes into a really nasty spiral of death. You won't even be able to SSH into it. The server literally will go down, and just about the only thing that can fix it is a reboot. It's really catastrophic when this happens. So this is absolutely the most important thing to do: calculate all of the other services that you have on your server. How many caches you've set for MySQL (those scripts will tell you MySQL will take up this max amount), and then other services, which I'm about to talk about. Add up all of their memory allotments, and then only give Apache basically what's left. That will tell you how many people you can serve at the same time. If Apache runs out of connections, it's much better to have the server just return back, I'm sorry, I can't serve your request right now. It'll just time out, essentially. Timing out for some users, where some requests will be fulfilled and some of them won't, is a much better situation than all requests being shut down because the entire server is down, which is what will happen if you set this too high. So this is basically the end of configuring the software that you've got. We've got PHP, MySQL, and Apache: the typical configurations that you have and the things you can do to help them. Do we have any questions on this segment? Question right there. Oh, the size, the actual memory allotment of httpd: can you bring it down? Basically the size that it requires depends on the amount of memory that it actually takes to build and execute Drupal. What you actually see inside of the httpd size is almost entirely PHP and the things that PHP needs to execute. So generally, the more modules you have installed on your site, and the more code that it needs to execute, the bigger httpd gets. So the way to improve it or make it smaller, essentially, is to use fewer modules. It sounds kind of crazy, but
on the application layer, the more complex your application is, the more memory PHP requires to actually execute that code. It's something you don't hear a lot in the Drupal world, but sometimes you really need to look at the number of modules and the amount of code that you're executing, because it's a big application. Things like Views and Panels are big applications, and then there are things that you might add on top of those that can really balloon the size of Drupal itself, purely in the amount of code. So there's no magic bullet for shrinking the memory size of individual httpd processes. Or if you're running FastCGI, PHP actually runs as its own service, and it will then inflate with the amount of code that needs to execute. So the only thing you can really do is optimize your application and use only code that you actually need. The question was, how do you measure that value? So if you've connected to the server, usually through a terminal of some kind like SSH... I said "from top" here, and I made a poor assumption that everybody knows what top is. top is basically like a process manager that lets you know what processes are currently running on your machine. If you type top on a command line, by default it will show you things that are using a lot of CPU, but it'll also show you that each individual process is consuming this amount of memory and this amount of CPU, and you can just look at all of them. You can actually sort by memory usage by pressing the letter M. Then just take an average of what Apache is currently using right now and calculate the total from that. One question in the back as well. [Audience question, partly inaudible, about turning off apc.stat:] It's caching everything, but it's still checking whether files have been modified. Turn that off and then it never goes to disk, but any time you change your code you have to clear the cache. But I guess my question
is, have you used it, and [inaudible]? The question was regarding the apc.stat setting within APC, and whether mucking with that particular setting is going to give you any performance benefit. Yeah, you're right, and you described it quite adequately. It basically is a feature that makes it so that every time you update a PHP file, APC is keeping track of whether or not that file has been changed. That's why I was saying APC is so nice: it's pretty invisible. You just install it and never have to think about it again, because every time you update your code, if you deploy new changes to the site, APC automatically figures out whether that code has been modified, flushes its caches appropriately, and rereads it from disk. And the apc.stat setting is a way of making it so that it doesn't do that check. I've never actually implemented it, because I find APC so convenient the way that it works by default. The convenience there is really high, and the actual performance advantage, unfortunately, I haven't benchmarked, so I don't know what kind of benefit it's actually giving you. But yeah, thanks for bringing that up, that's a really good question.
Okay, so now we've talked about finding the problems with your site and solving them just by doing a little bit of regular benchmarking. We've talked about turning on the page cache, that was step two. And then we've talked about configuring the existing software that you have, and I made the assumption that people are running MySQL, Apache, and PHP. After you've gotten to this point, now is the time that you can actually start looking at adding additional software to your server, and this is really where the performance numbers can just go absolutely crazy, in the things that you can do. So the first thing I'm going to talk about is memcached. With memcached, the "d" at the end I guess is for daemon,
so some people say memcache, and memcached for the actual daemon itself when it's running. Memcache is essentially a way of alleviating load on your database server. Instead of doing MySQL queries to store stuff in the database, memcache is an object store, or a key-value store, that allows you to store things directly in memory. So, common things that may be stored in memory: if you turn on the page cache, the page cache is literally the beginning of the HTML file to the bottom of the HTML file. Drupal essentially takes that and shoves it into the database, and the next time that page is requested Drupal just says, oh, for this URL get the page cache, and it will load it from the database and return it back to the user. If you're using memcache, memcache will replace every single table in the database that starts with the word "cache": the actual cache table, cache_page, cache_filter, every individual cache table. Instead of being stored in the database it's now stored directly in memory. It makes it so that when an anonymous user comes to your page, Drupal starts up (hopefully you're using APC, so all the files are read from memory), and the very first thing it does with the page cache turned on is check to see if that page has already been cached in the last couple of minutes. If you're using memcache, that's from memory too. So you'd have Drupal load the page cache, pulled from memcache, and ultimately what you're striving for is a disk-free page serve. You want to be able to serve that page without touching the disk at all, and memcache is a really good tool for furthering that goal. The great thing about memcache also is that it has a huge effect for authenticated users, because there are a lot of caches that Drupal has that are individual parts of the page. If you use the block cache, or the Views module, which allows you to cache the
actual rendering of any listing, whenever you use any of those caches they all go into whatever cache system you're currently using. Memcache basically replaces all of the existing caches there, and instead of storing all those caches in the database they're all stored in memory, which is really, really fast. When it comes to memcache, you used to have to do this terrible thing: in older versions of the memcache module you needed to start an instance of memcache for every individual cache table you had on your site. That was a crazy pain. Newer versions of the memcache module do it correctly and simply use one big bin, essentially one daemon that is started up on the server, and all of the cache tables are stored in that one individual bin, which is much more convenient. So 128 megabytes is a good start. When you turn on memcache you get to define how much space it will take up, and 128 megabytes is a good start for a Drupal site. Just like the APC statistics, memcache comes with a file, also called memcache.php. I didn't mention it in my last section, but almost all servers have that apc.php file and this memcache.php file. If you've installed memcache you probably have that file somewhere on your server, but where it is depends on the architecture and the way you decided to install memcache and APC. So I always just tell people, run a find command on your entire server and you'll find it somewhere. It's there. It's crazy: it's on everybody's server but nobody knows it's there, and it's always in weird locations. So you can find that memcache file, copy it to your web root, load it up in the browser, and it will tell you information like this. Or if you don't have it on your server, or you just don't want to wait for the find command to search your entire machine, you can use wget, which is essentially just a download command, to pull that file directly from the memcache project or the APC
project. So this will actually just pull it down. Put this file into your web directory and then go look at it in a browser. This is the same thing, where you can look to see how much memory memcache currently has available, and you always want memcache to have a slice of green space available. If the cache is entirely full, that means it's disposing of things that are still useful to make room for other things, and that's when you'll see your hit rate absolutely plummet. Unlike APC, though, the new memcache module doesn't have a super high hit rate, so seeing something like this, even if you were to get all the way down to a fifty percent hit rate, is nothing to really worry about. That's just the way the module works. But what you do always want to make sure of is that memcache still has a big chunk of free memory available. If it's filled all the way up, allot memcache more memory and then restart the service, so that it can fill up and still have space available: room to grow, and not throwing away stuff that's still useful. So memcache is absolutely great, and when it comes to performance it's generally the first piece of software that we add to sites, mostly because it's very versatile. It works for both authenticated users and anonymous users, and it gives big boosts to both of them.
So, search. Search is one of those things on a Drupal site, especially using just the search module that comes with core, that is really CPU intensive. It's a really expensive operation to do a search, and if you're using the search module that comes with Drupal core you can get yourself into a nasty situation: if users are executing a lot of searches, it's taking a lot of CPU cycles and a lot of database connections to execute those searches. Because users are searching so much, and the database and the web server are all the same thing, all of those searches actually end up taking your server down. So people executing long
searches is a really likely way for your server to come down. Apache Solr is a separate project. It's important to know that it's not PHP based. It's not even Apache httpd based. It's a Java application, essentially, and that means that in order to run it you need to install Tomcat or some other Java server. So Solr unfortunately has a little bit of a barrier to install, but it's really great in terms of what it can do. It's wildly more efficient than Drupal's database-based searches, and my favorite thing about it is that you can put it on a completely separate box. I really recommend that if you've got multiple machines available, or if you're using cloud servers, you put Apache Solr on a separate box from your web server. That makes it so that if people are executing a lot of searches, and that's a really popular feature on your site, the worst thing that can happen is that search stops working: people are executing so many searches that the search server goes down, but the rest of the website stays up. Having that decoupling between your search and the rest of your site is a really good thing. There are two ways that you can use Apache Solr once you have it installed on your server. There are two projects out there, the Apache Solr project and the Search API project, and both of them are excellent. It's really great that we've got two systems that are really fantastic. Apache Solr is more like an out-of-the-box, everything-just-kind-of-works module, and Search API is basically like a starter kit that you can install a whole bunch more modules on. It's more versatile and it has more capabilities, so if Apache Solr doesn't have the features you need, consider looking at Search API. Another really cool thing about Apache Solr is that it doesn't even need to be on your local infrastructure. Doing a search across the internet can oftentimes be faster than if you were doing a MySQL
based search in Drupal on a local database. Acquia actually has a search product that you can purchase independently of everything else, and basically just purchase bandwidth from Acquia's cluster of Apache Solr boxes. That's really great, because it has a really high capacity: considering they serve a huge number of people, they have a very nice cluster of servers that operate all of that together for you. So if you don't want to install it, consider a subscription. It's a good idea.
So, Varnish. I love Varnish. Varnish is an application that you stick in front of Apache, essentially. What it's capable of doing is sitting in front of Apache. You have Apache on your server (most people are using it), or you have something else; you can actually stick Varnish in front of anything, it doesn't matter. You make Varnish sit in front of that application and you have it accept the HTTP requests. So you put it on port 80, and then you put Apache, or whatever you're using as your web server, on the back end on a different port. Varnish essentially acts as an intermediary. It takes the requests, and if it has a version in cache it'll just return that version back immediately. If it doesn't have it, it will ask the web server on the other port for a copy of that page, and then it will remember it from then on out. So Varnish is particularly powerful for anonymous users. Most sites have this crazy imbalance of registered versus anonymous users: something like ninety-nine percent of all requests are almost always from anonymous users. Even if you think you have a very social site, like drupal.org, actually drupal.org still doesn't push even twenty percent authenticated. It's still mostly anonymous traffic, even on a site that has a lot of user-generated activity. Varnish essentially is a tool that makes it so that the page cache that you may use in Drupal is basically stored outside of Drupal. It's stored in an application whose entire
purpose is storing stuff in memory and saying, hey, do I already have this page in cache? And if so, return it back immediately. So it's basically a dedicated application for serving up cached pages, and it is ridiculously fast. The way that you need to make it work with Drupal, though, is that you typically configure it so that if the user has a session or cookie, the request will pass through Varnish, go to Apache, and just load the page normally. There's a lot of configuration that can be done to set this up, and Lullabot has written several articles on how to configure Varnish for Drupal. I suggest just doing a Google search for "Lullabot Varnish" and you'll find the articles, to make it so that if a user has a session they pass through to Apache normally and everything continues functioning. So it's really fantastic. It's basically just a piece of software that you put in front of everything else, and it handles pretty much all of your anonymous requests. Even for authenticated users it can free up a lot of activity, because it still serves up all of your CSS, all of your JavaScript, and all of your images; those can all be stored in and served from Varnish. You specifically only allow Apache, or your web server, to generate things that need to be dynamic. So it really takes a huge amount of work off of Apache. The only thing Apache, or your web server, needs to work on is generating PHP pages, essentially, and Varnish takes everything else. Once you install this you'll see the traffic to your web server just absolutely plummet, because it's not worrying about serving up all of these other things. The only thing it's responsible for now is the HTML, which is just great. So I was going to talk about... oops. So I also mentioned here that we have cookie-based skipping of whether or not Varnish is responsible for the page. Pressflow comes with the module. Drupal core in Drupal 7 doesn't have a module at all, and I haven't actually yet seen
a module that does this, so I threw it up in a sandbox yesterday for everybody to use in the class. This is our cache buster module that we install on all of our sites that run Varnish. Essentially what it does is, if a user submits a form, it sets a cookie called NO_CACHE, and we usually configure Varnish so that if the NO_CACHE cookie exists, Varnish will not cache the page. The general idea here is that any time a user submits a form they become sort of a special anonymous user. If they log in, it sets the NO_CACHE cookie, and it makes it so that all of their pages are now being generated by Apache. But for an anonymous user, this basically grants you a special privilege as an anonymous user who has interacted with the site. For example, if you're an anonymous user that posted a comment, or an anonymous user that submitted a poll, you may want that person's comment to be visible immediately to that user, but it doesn't really matter that everybody else on the internet doesn't get to see that user's comment for the next two minutes. The user wants to see their comment immediately, but the fact that it will take a minute or two for everybody else on the site to see their comment usually isn't a big deal. So this sets a NO_CACHE cookie that makes it so that Varnish basically says, oh, for the next minute I'm going to pass your requests through to Apache, and that way you can see your own comment. Then after a minute the Varnish cache will expire and everybody will be getting the anonymous version again, including you; after that minute you're getting pages from Varnish instead of Apache again. So it's a little bit complicated, but if you're having trouble with anonymous users not seeing their own changes, it's because Varnish is serving up the anonymous version of that page without including their change, and then you have to wait for it to expire before you see it. So let's look at the
whole stack. This is basically in order of who's going to be taking the requests. If you have a purely anonymous user, almost always they'll only get to Varnish and the entire rest of your stack will never be touched, which is great. This is what I was talking about, a no-disk-access-required request, which is what you're always striving for. Have Varnish store everything in memory; a request comes in, Varnish takes the request and immediately returns it back. If you have that NO_CACHE cookie, with that special module enabled, some of your requests may pass through to the Apache, PHP, and memcache layers. Less of your information will get to the slave, which will do all the read requests, and even fewer of your queries will actually get to the master server. This is what I was saying: the master server essentially is the one thing that you want to protect in your entire infrastructure, because it's the only thing that you can't just add another machine to. If you're in a situation where you're reaching CPU capacity on your web servers, you just add more web servers. It's great. But you can't do that with MySQL, with the master server, at least with Drupal, so what you want to do is try to defend that server as much as possible. Hmm, we're running short on time and there's just so much more to talk about, so I'm going to have to breeze through this next section, which is too bad because I love talking about this stuff.
Nginx is totally awesome, and it's like the new hotness in server software. Nginx is basically a combination of Apache and Varnish put together, where it is your web server and your reverse proxy cache all together. The greatest thing about Varnish is that you just take your existing infrastructure and slap Varnish in front, and you don't have to change anything. Nginx, if you're starting from scratch, is a really good option for providing all the capability and speed of Varnish and the ability to
serve up HTTP pages like Apache, all together in one piece of software. So it's really pretty awesome. If you're starting from scratch you may consider nginx, but really, in terms of speed, Varnish still has a slight advantage when it comes to anonymous users.
So let's get to the smackdown, which is what I spent like all week putting together, and it's so sad that it's at the very end of my presentation and I almost don't even get to talk about it. So, Apache. Apache straight up: this is with no configuration on Apache. This is typically what you get on a shared host: no APC, no memcache, nothing. If you're having Apache generate Drupal pages you'll probably get something around this level of performance, 3.4 requests per second. Really kind of pathetic. These numbers are all generated with ab, ApacheBench, so your numbers may vary depending on the actual setup of your server, but as far as comparing them against each other, the numbers are going to be pretty accurate in terms of the ratio of performance improvement you see. So out of the box you get about three and a half requests per second for Apache. Turning on APC, like I mentioned, basically gives you a rough doubling of the speed, because it's not reading everything from disk on every request. So with APC, a nice doubling here. If you install memcache, roughly a doubling also, and this really depends on how you've configured Drupal; it may not be this drastic. This is without the page cache turned on. When you turn on the page cache and you have memcache in front of it, you get 250 requests per second. That's actually really respectable; you can typically run any site on that kind of number.
So let's talk about nginx versus Apache. This is for an authenticated user. On the last slide, I talked about having memcache enabled, so 12 requests per second, essentially, without the page cache. Once
again, here on this slide, that same 12 requests per second. This is an authenticated user without the page cache turned on, Apache vs. nginx. Nginx has a slight advantage in the smackdown here, in that it can outperform Apache slightly. So if you have the ability to install nginx, nginx has really got the advantage just about all around. The real advantage here is in the land of HTTPS, or secure connections for logged-in users. In this day and age, with things like Firesheep out there that let people immediately hijack other people's sessions if they're on a Wi-Fi network, you really should consider, any time a user logs into your site, no matter what, putting them on HTTPS. You really should be looking at that in this current day, where there are so many tools that allow people to hijack sessions. Nginx has a major advantage here in terms of how much performance loss you have from doing all of that encryption. The majority of the work when you're doing an HTTPS transaction is the initial handshake, where the browser sends over a key, the server generates a key and sends it back to the user, and then they use those keys to communicate with each other. That initial handshake process is the majority of the slowdown that you see with HTTPS, and nginx has a pretty major advantage here in that its handshaking process is just a lot faster than Apache's. So nginx is really pretty awesome, especially for authenticated users and encrypted traffic.
So now let's talk about if you have Varnish or nginx in front of you. Or actually, I haven't even mentioned this, and this is a really good thing for people that are on shared hosts, or in a situation in which they can't install their own real reverse proxy server like Varnish: the Boost module is pretty awesome. It's really gimmicky, and performance geeks like me like to make fun of it because it's not a real reverse proxy cache. But all of the numbers we've been talking about so
far, we got, what, like 250 requests per second as our best number. Starting with the Boost module: if you just have Apache serving up static HTML files, which is what the Boost module does (it takes pages, turns them into static HTML files, and puts them on your server), you're getting 1,500 requests per second. Crazy fast. Apache's actually not that bad if it's just serving up static HTML; it's pretty good at that, really. 1,500 requests per second is definitely nothing to sneeze at, but compared to real performance software like nginx, you're talking crazy fast: 6,500 requests per second. That is just insanely fast. Or Varnish: eight thousand requests per second. It's just ridiculous. I mean, at this level you're really talking about saturating your network connection. It depends on the amount of data you're actually transferring, or the size of the files you're transferring, but somewhere around the 2,000 to 3,000 mark you'll saturate your network connection anyway. These things are just so fast that you should always saturate your network connection first. So whether that's like 12 megabytes per second on a 100 megabit connection, or 72 megabytes per second on a one gigabit Ethernet connection, you should be able to saturate your network connection before your server starts having high CPU load. If you're only serving anonymous users and you're seeing your server go down because of high traffic, that should never happen. For anonymous users you should always saturate your network connection, and the worst thing that should happen is that the site loads slowly as network connections can't get through. If you're seeing any kind of situation where you're bottlenecked on CPU or memory, there are things you can do, like any of the solutions here for your anonymous users, that make it so that you should saturate that network connection way before you have any CPU or memory problems. So, totally just about entirely out of
time, but I want to talk about scaling out. Like I mentioned, a great thing about web servers is that you can always just add more of them, but as soon as you have more than one web server you need to start load balancing between them, distributing all the requests between them. Varnish actually has a really fantastic ability to load balance also, so you can stick a Varnish box in front and have it load balance between multiple Apache servers. Typically, if you split out your web servers, that also means that you need to split out the database, so the database lives on its own box. We also like to put memcache on the database server, just to have it be a central store of information. And something that a lot of people don't think about, but it's absolutely critical, is that all your files need to live in the same place. Say a user uploads a file: they upload it onto server one, then they save the node, and then they get server two, which doesn't have that file on it. So typically what you'll do is use some kind of network attached storage, a NAS or SAN, to share all your files between all of your servers. They're usually connected via an NFS mount. You may store all of Drupal on the NFS mount, or you may just store the files users upload on the NFS mount. And then hopefully you've been able to separate out your search onto its own server. Then, if you're working in a high-availability setup, like Lullabot oftentimes is facing, you need to add backups on every single layer. So you've got backups for Solr, you've got backups for MySQL. A nice thing is you can often turn your backup MySQL into a slave and run slave queries against it. And have a load balancer backup. The web servers themselves are backed up because they're already redundant; you've already got multiple servers there to serve out the load. So, I've got more slides yet, but I just don't have the time to talk about them. I'm just going to show you: if
you have a slave server, I mentioned SELECT queries: you can make all SELECT and range queries go against the slave. This particular slide just shows you that any time you have a view, that view can be a listing, and if you look at the entire page, the amount of content that you can pull from the slave instead of the master server is pretty significant. All of these parts of the page can be queried against the slave server instead of the master. So if you're running up against MySQL issues, slave servers can really aid in fixing that.
Some links for you guys. lullabot.com: for all of the software that I mentioned, we've written articles on how to install it on multiple architectures, on Mac OS, on CentOS, on Debian. 2bits is a really great website from a friend of ours who is really good at squeaking performance out of really small boxes. It's like the ultimate site for "I only have one machine and it only has 512 megabytes of memory, how can I get the most out of it?" 2bits has got a lot of fantastic articles. The MySQL Performance Blog: a couple of the links earlier that were shortened are from there. And then these two down here at the bottom: if all of this sounds like it's way too much work for you, and you're like, oh man, all of this software sounds really hard and it's tough to install, our friends at Pantheon and cadre both have managed support that just has all of the software installed for you out of the box. You go there, and they're Drupal people, and they have stacks special-purpose built with all of the software I've talked about, all together on one box for you. So all you have to do is basically pay your monthly fee and throw your Drupal site up there, and it will be crazy fast. So that's an option for hosting instead of doing it yourself. So am I all totally out
of time? Well, no time for questions, but corner me and we can talk. Thank you guys very much.
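The MaxClients budgeting described in the talk (add up what the other services may use, then give Apache only what is left) can be sketched as a short calculation. This is a minimal illustration, not something shown in the talk: the function name, the 4 GB server size, and the 256 MB operating-system reserve are assumptions. The 2.3 GB MySQL ceiling, the 128 MB memcache allotment, the default of 256 clients, and the 40 to 80 MB per Apache process are the figures quoted by the speaker.

```python
def apache_max_clients(total_mb, other_services_mb, per_process_mb):
    """Return how many Apache processes fit in the memory other services leave over."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    left_over = total_mb - sum(other_services_mb.values())
    # Never return a negative count if the other services already exceed RAM.
    return max(left_over // per_process_mb, 0)


# Figures quoted in the talk: the default MaxClients at 40 MB per process.
DEFAULT_MAX_CLIENTS = 256
worst_case_mb = DEFAULT_MAX_CLIENTS * 40  # 10240 MB, roughly 10 GB

# Hypothetical 4 GB server (an assumption for illustration).
TOTAL_MB = 4096
reserved = {
    "mysql": 2355,    # about 2.3 GB, the maximum reported by the tuning scripts
    "memcache": 128,  # the suggested starting size
    "os": 256,        # assumed headroom for the operating system
}

for per_process in (40, 80):
    limit = apache_max_clients(TOTAL_MB, reserved, per_process)
    print(f"{per_process} MB per process -> MaxClients {limit}")
```

Under these assumed numbers the budget allows 33 processes at 40 MB each and 16 at 80 MB each, far below the default of 256, which is the point the speaker makes about the default being too high for a Drupal and mod_php server.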
Info
Channel: Drupalize.Me
Views: 5,912
Rating: 4.8431373 out of 5
Keywords: drupal, drush, performance, scalability, Nate Haug, quicksketch, DIWD, Drupalize.Me, high traffic sites, Do it with Drupal, Optimizating, optimizing a drupal site, how to optimize, configuring apache, configuring PHP, MySQL configuration, memcache, increasing site speed, using Varnish, nginx, Drupal tutorial, Drupal learning, learn drupal, Drupal tips, Drupal step by step, Drupal videos, Drupal video, Lullabot
Id: o6PoZi-HEPw
Length: 74min 38sec (4478 seconds)
Published: Thu Dec 15 2011