Slow Web Server - Troubleshooting and Debugging Techniques

Captions
[Music] A user has alerted us that one of the web servers in our company is being slow, and we need to figure out what's going on. Let's start by navigating to the website and loading the page. Okay, we see that the page loads. It seems to be a little slow, but it's hard to measure this on our own. Let's use a tool called ab, which stands for Apache Benchmark tool, to figure out how slow it is. We'll run ab -n 500 to get the average timing of 500 requests, and then pass our site.example.com for the measurement.

This tool is super useful for checking if a website is behaving as expected or not. It will make a bunch of requests and summarize the results once it's done. Here, we're asking it to do 500 requests to our website. There are a lot more options that we could pass, like how many requests we want the program to do at the same time, or if the test should finish after a timeout even if not all requests completed. We're making 500 requests so that we can get an average of how long things are taking. Once the test finishes, we can look at the data and decide if it's actually slow or not.

Alright, the tool has finished running the 500 requests, and we see that the mean time per request was 155 milliseconds. While this is not a super huge number, it's definitely more than what we'd expect for such a simple website. It seems that something is going on with the web server and we need to investigate further.

Let's connect to the web server and check out what's going on. We'll start by looking at the output of top and see if there's anything suspicious there. Hmm, we see that there's a bunch of ffmpeg processes running, which are basically using all the available CPU. See those load numbers? 30 is definitely not normal. Remember that the load average on Linux shows how much time a processor is busy in a given minute, with one meaning it was busy for the whole minute. This computer has two processors, so any number above 2 means that it's overloaded: during each minute, there were more processes waiting for processor time than the processor had to give.

This ffmpeg program is used for video transcoding, which means converting files from one video format to another. This is a CPU-intensive process and seems like the likely culprit for our server being overloaded. So what can we do? One thing we can try is to change the processes' priorities so that the web server takes precedence. The process priorities in Linux are so that the lower the number, the higher the priority. Typical numbers go from 0 to 19. By default, processes start with a priority of zero, but we can change that using the nice and renice commands. We'll use nice for starting a process with a different priority and renice for changing the priority of a process that's already running.

Okay, let's exit top with q and change the priorities. We want to run renice for all the ffmpeg processes that are running right now. We could do this one by one, but it would be manual, error-prone, and super boring. Instead, we can use a quick line of shell script to do this for us. For that, we'll use the pidof command, which receives the process name and returns all the process IDs that have that name. We'll iterate over the output of the pidof command with a for loop and then call renice for each of the process IDs. renice takes the new priority as the first argument and the process ID to change as the second one. In our case, we'll want the lowest possible priority, which is 19, so we'll call: for pid in $(pidof ffmpeg); do renice 19 $pid; done

Alright, we see that the priorities for those processes were updated. Let's run our benchmarking software again and check out if it made any difference. Okay, it's running. Once again, we'll need to wait until the 500 requests are done and check out the new mean time per request value. This time the mean time is 153 milliseconds. It doesn't seem like our renice helped. Apparently the OS is still giving these ffmpeg processes way too much processor time, and our website is still slow.

What else can we do? These transcoding processes are CPU intensive, and running them in parallel is overloading the computer. So one thing we could do is modify whatever's triggering them to run them one after the other instead of all at the same time. To do that, we'll need to find out how these processes got started. First, we'll look at the output of the ps command to get some more information about the processes. We'll call ps ax, which shows us all the running processes on the computer, and we'll connect the output of the command to less to be able to scroll through it. Now we'll look for the ffmpeg process using slash, which is the search key when using less.

Okay, we see that there are a bunch of ffmpeg processes that are converting videos from the WebM format to the MP4 format. We don't know where these videos are on the hard drive. We can try using the locate command to see if we can find them. We'll first exit the less interface with q and then call locate static/001.webm. We see that the static directory is located in the server deploy videos directory. Let's change into that directory and see what we find.

There's a bunch of files here. We could check them all one by one to see if one of them contains a call to ffmpeg, but that sounds like a lot of manual work. Instead, let's use grep to check if any of these files contains a call to ffmpeg. So we see that there's a couple of mentions in the deploy.sh file. Let's take a look at that one. Since we're connecting to the server remotely, we can't open the file using a graphical editor. We need to use a command-line editor instead. We'll use vim in this case.

We see that this script is starting the ffmpeg processes in parallel using a tool called daemonize, which runs each program separately as if it were a daemon. This might be okay if we only need to convert a couple of videos, but launching one separate process for each of the videos in the static directory is overloading our server. So we want to change this to run only one video conversion process at a time. We'll do that by simply deleting the daemonize part and keeping the part that calls ffmpeg, then save and exit.

Alright, we've modified the file, but this won't change the processes that are already running. We want to stop these processes but not cancel them completely, as doing so would mean that the videos being converted right now will be incomplete. So we'll use the killall command with the -STOP flag, which sends a stop signal but doesn't kill the processes completely.

We now want to run these processes one at a time. How can we do that? We could send the CONT signal to one of them, wait till it's done, and then send it to the next one, but that's a lot of manual work. Can we automate it? Yes, but it's a little tricky, so pay close attention. We can iterate through the list of processes using the same for loop with the pidof command that we used earlier. Inside the for loop, we want to send the CONT signal and then wait until the process is done. Unfortunately, there's no command to wait until the process finishes, but we can create a while loop that sends the CONT signal to the process. This will succeed as long as the process exists and fail once the process goes away. Inside this while loop, we'll simply add a call to sleep 1 to wait one second until the next check.

Okay, now our server is running one ffmpeg process at a time. Let's try our benchmark once more. The mean time is now 33 milliseconds. That's much lower than before. We've managed to get our web server to reply promptly to the requests again.

We've mentioned a few different approaches that we can take when we can't fix the code, like renicing the processes or running them one after the other when that doesn't help. In our next few videos, we'll talk about how to improve performance by fixing your code. But before that, there's a reading to put all the resources we mentioned in one place, and then a quick quiz to check if everything is making sense.
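The captions describe two shell loops without the second one ever being spelled out in full, so here is a minimal sketch of both. The function names renice_all and resume_serially are made up for illustration and are not from the video; the commands inside them (renice, kill -CONT, sleep) are the ones named in the captions. The sketch assumes the PIDs are passed in as arguments, for example from pidof ffmpeg, and that the caller has permission to signal and renice those processes.

```shell
#!/usr/bin/env bash
# Sketch of the two loops described in the captions.
# Function names are hypothetical; PIDs are passed in as arguments.

# Set every given process to niceness 19 (the lowest priority).
renice_all() {
    local pid
    for pid in "$@"; do
        renice 19 "$pid"
    done
}

# Resume stopped processes one at a time, waiting for each to exit
# before resuming the next one.
resume_serially() {
    local pid
    for pid in "$@"; do
        # kill -CONT succeeds while the process still exists and fails
        # once it is gone, so it doubles as the "is it done yet?" check.
        while kill -CONT "$pid" 2>/dev/null; do
            sleep 1
        done
    done
}

# Usage in the scenario from the video (assumes ffmpeg processes exist):
#   renice_all $(pidof ffmpeg)
#   killall -STOP ffmpeg
#   resume_serially $(pidof ffmpeg)
```

Sending CONT to a process that is already running is harmless, which is why the same signal can be sent on every pass of the inner loop. One caveat of this approach: if the operating system reuses a PID for an unrelated process between checks, the loop would keep waiting on the wrong process, so it is best suited to a short, supervised cleanup like the one in the video.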
Info
Channel: Huynh Son Ca
Views: 1,633
Keywords: troubleshooting and debugging techniques!, troubleshooting & debugging techniques, troubleshooting, tracing and troubleshooting nginx, debugging, sql server troubleshooting, troubleshooting application server, websphere application server troubleshooting, application server troubleshooting explained
Id: qaVga8JNg2o
Length: 10min 30sec (630 seconds)
Published: Tue Nov 10 2020