Henry2: Basic HPC Workshop: Running Jobs

Captions
Welcome to part three of our introduction to basic HPC. We're already very comfortable with basic Linux, we understand the acceptable use policy, and we've logged into the cluster. Additionally, we've learned about storage options on Henry2 and how to transfer files back and forth between Henry2 and our local machines. In this tutorial we are using R as the sample application, and HPC has provided a sample source file called weather.R.

Now that we have our source code ready, how do we run the application? Remember, when you log into the cluster you use SSH, and after logging in you'll be on a login node. You can't run applications on the login node. You have to run them on a compute node, and to do that you need to use LSF. LSF stands for Load Sharing Facility, and it is a specific job scheduling software. A job scheduler does just that: it schedules jobs. Your job consists of the program you want to run plus information on what kind of hardware or software you want to use to run it.

LSF is kind of like a waiter. First it takes your order, then the order goes to the kitchen, where the chefs have everything pretty well organized. There's a kitchen crew, and they are cooking for the whole restaurant. Your order can be very simple, something directly from the menu, or you could give your waiter a long list of substitutions and food allergies. This time around, we'll try to keep the order as standard as possible.

This basic LSF batch script is all you need to start running applications. Much of this template can be used as is. You'll need to modify the highlighted items, and we'll go over these now. -n is the number of cores required by the application; for this next exercise we're going to keep it very simple and choose one core. -W stands for wall clock time. It should be set to the maximum time your code might take to run, and here we chose ten minutes. If the code takes more than ten minutes, the job will be killed by LSF. In the next two lines you will set the environment for your application, and then you will run that application. In this tutorial we will show you how to run an R script, but you can just as easily try a MATLAB or a Python script here instead.

How do we set the environment? Before trying to run anything, let's go back to Henry2 and revisit environment variables. Here we are back on Henry2. Type echo $HOME. HOME is the environment variable that is defined as your home directory. If you just logged in and type pwd, you should be in your home directory. pwd is an application: it's an executable, a program that prints the working directory. R is our application, but if we type R, it says command not found. Most of the applications that users run on Henry2 are not defined in the default environment, so their commands will not be found. When you use these applications, you have to set the environment.

The preferred method of setting the environment is module load. So I'm going to type module load R, and now if I type R, it knows where R is. To find the available modules on Henry2, type module avail (for "available"). To find the modules you've loaded during this session, type module list. If you don't want those in your environment anymore, type module purge. Once we do module purge, it can no longer find R.

So why can it sometimes find R and sometimes not? That has to do with another environment variable called PATH. The PATH is a list of file paths where the computer looks for executables. If you type in a command, it goes through all of these paths to see whether an executable by that name exists. For R there isn't one, but for pwd it does find something. If we type which pwd, it gives the full path to the executable, which lives in /usr/bin, so /usr/bin must be in our PATH, and it's right there. There's no R, though: if we type which R, we get command not found. So what happens when we do module load R and then type which R? Now we get the path to the R executable. If it knows where the R executable is now, it must have changed my PATH. So let's do echo $PATH, and here it is: the module load command adds the path to the executable.

In this exercise you'll submit your first batch job. Remember, always run from the scratch directory. Copy your code from the home directory to the scratch directory, then create a batch script that will run weather.R. It should request one core and a wall clock time of ten minutes. Then look at the output. The code should produce a PDF file called weather.pdf, the LSF error file should be empty, and the LSF output file should contain some temperatures and an indication that LSF ran properly. If you did get an error, modify the script and resubmit. Please pause the video now to complete the exercise.

Let's go ahead and submit our R job. I'm in the home directory, but I need to run from the scratch directory, so cd /share/group/user. We'll copy the guide directory, /home/user/guide, here. ls -lrt lists files in reverse time order, and it shows me that the last thing modified was guide. cd guide, and there is weather.R. I'm going to clear the screen.

We need to make a batch submission script. To create a file, use nano: nano submit.csh. Let's look for a sample batch script. Go to the HPC website, click Resources, then Software Packages, and scroll down to see if they have example R files. They do, so scroll down to the batch scripts. There's the batch script. I'm going to copy it and paste it in here. You need to look at the file before you run it, not just cut and paste. The first line is fine. The instructions say we should run it for 10 minutes, and W is for the wall clock, so we need 10 minutes and one core. The script says to use exclusive if the job is memory intensive. This job is not memory intensive, so I'm going to comment that out, because I don't need exclusive. The output and error files are fine. Then the script loads R and runs Rscript myprogram.R. We don't want to run myprogram.R; we want to run weather.R, so I change that. Now this looks good. I'm going to press Ctrl+X, Y, and Enter to save.

Let's go and submit it. One more time, check the website to see how the job can be submitted. Here it is. I'm going to copy that, paste it, and press Enter. Type bjobs to see if your job was submitted. My job is in the pending state. If I type bjobs again, it's finished. Type ls -lrt to list in reverse time order, and we have three new files. The error file is empty, there's a PDF file, and there's an output file. Let's look at the output file: less out.22, and I'm going to tab-complete the rest of the name. It has the temperatures, it has the LSF information, and it appears to have completed successfully. Type q to quit. The last step is to check that weather.pdf is a valid PDF file, so type file weather.pdf. file tells you what kind of file something is, and it recognizes that this is a valid PDF document. Looks like we did it.

Now suppose that after waiting in the queue for hours, your job finally runs and crashes because of a typo. Running applications on a login node, even for only a couple of seconds to test a script, is prohibited. For this type of problem, you can reserve a short debugging session on a compute node in interactive mode. To run R interactively from a compute node, first open a terminal on your local desktop so that you can look at weather.R. Back on Henry2, request an interactive session on a compute node with one core and ten minutes of time. Make sure you are not on a login node, and check the name of your node. Rename the previous PDF output file and start R. Paste the contents of weather.R at the R prompt, then exit R after making sure a new PDF was created. Exit the interactive session. Before moving on, check how many cores there were on the compute node compared to how many there are on the login node. Please pause the video to complete this exercise.

Here I am back on Henry2, where I left off from the previous exercise. That means I'm in the scratch directory, in the guide directory. If I do ls, I see the source code, the batch submission script, the output PDF, and the error and output files from LSF. On my local machine I am in the HPC demo folder, and here is weather.R, with its contents right here. Back on Henry2, I'm going to request an interactive session on a compute node with bsub -Is -n 1 -W 10 tcsh. That is -Is for an interactive session, -n 1 for one core, a wall clock time of 10 minutes, and tcsh as the shell. I've requested the session, and now I'm on a compute node. To get the name of my compute node, I type echo $HOSTNAME, which prints the name of my node.

To rename weather.pdf, we're just going to move it to weather_batch.pdf, since it's the PDF we made when we were working in batch mode. Do ls -lrt, and there is weather_batch.pdf. Let's start R. Oh: R, command not found. What do we have to do? module load R. Now we can open R. There is R, and we're going to copy and paste the contents of the source code, which is the same as typing it in. Copy, paste, and then, to exit R, type q() (q, open parenthesis, close parenthesis). Don't save the workspace, and there we are. Do ls -lrt, and the new weather.pdf was created next to weather_batch.pdf, the one we saved. They have the same size, and if we do file weather.pdf, it is a valid PDF. To exit the interactive session, type exit.

Now, this was our node name, so I'm going to copy it. Let me show you lshosts: it shows the name of every node on the cluster, and there we go, these are all names of nodes. Let's find the one that we used with lshosts | grep, pasting in the node name. The question was how many cores are on this node. Here's the name of this node, it's running Linux, this is the model number, and that's the number of cores on the node. Now we're on login03, and that's a node too, so what if we do lshosts | grep login03? This login node has 16 cores.

For more information about running jobs on the HPC, click Resources and then Software Packages, and scroll down to the software package you are using. Once you have a sample batch script, check the documentation on running jobs to customize it; that page shows several examples and use cases. Scroll up again and click the generic template for batch scripts. This generic template contains a dictionary of the LSF options, and you can click on an option for more information. If you need further assistance, click Support and then Contact Us. Thanks for watching, and I'll see you in the next section.
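The batch script walked through in the video is short enough to generate from a few parameters. The Python sketch below is not part of the workshop materials. It uses the options the video describes: -n for cores, -W for wall clock time, the optional exclusive flag, and output and error files. It also assumes the file names out.%J and err.%J and an Rscript invocation, so compare the result against the sample R batch script on the HPC website before submitting it. The function name make_lsf_script is hypothetical, and it only emits -W in plain minutes.

```python
def make_lsf_script(r_script: str, cores: int = 1, walltime_min: int = 10,
                    exclusive: bool = False) -> str:
    """Return the text of a basic tcsh LSF batch script that runs an R file.

    Hypothetical helper; the directive layout is an assumption modeled on the
    options described in the video, not the official Henry2 template.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if walltime_min < 1:
        raise ValueError("walltime_min must be at least 1 minute")
    lines = [
        "#!/bin/tcsh",
        f"#BSUB -n {cores}",          # number of cores
        f"#BSUB -W {walltime_min}",   # wall clock limit in minutes; LSF kills the job past it
    ]
    if exclusive:
        lines.append("#BSUB -x")      # exclusive node use; only for memory-intensive jobs
    lines += [
        "#BSUB -o out.%J",            # standard output file (%J becomes the job ID)
        "#BSUB -e err.%J",            # standard error file
        "module load R",              # set the environment (puts R on the PATH)
        f"Rscript {r_script}",        # run the application
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; redirect the output to submit.csh to save it.
    print(make_lsf_script("weather.R"), end="")
```

Calling it with the exercise's settings (one core, ten minutes, no exclusive flag) gives the same directives that the video edits by hand in nano, under the assumptions above.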
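The PATH explanation in the video can be made concrete with a small Python model of what which does. It walks the PATH directories in order and returns the first executable file with the requested name. The prepend_to_path helper is a hypothetical, simplified stand-in for module load. Real module systems may change other variables too, so treat it as an illustration of the PATH effect only.

```python
import os
import tempfile
from typing import Optional


def which(name: str, path: str) -> Optional[str]:
    """Mimic the shell's `which`: first executable named `name` on a PATH-style string."""
    for directory in path.split(os.pathsep):
        if not directory:          # skip empty entries such as a trailing ':'
            continue
        candidate = os.path.join(directory, name)
        # Must be a regular file that the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def prepend_to_path(path: str, directory: str) -> str:
    """Simplified model of `module load`: put the package's bin directory first."""
    return directory + os.pathsep + path if path else directory


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("which pwd ->", which("pwd", current))
    # A fake "R" in a temporary directory shows how prepending changes the lookup.
    with tempfile.TemporaryDirectory() as fake_bin:
        fake_r = os.path.join(fake_bin, "R")
        with open(fake_r, "w") as fh:
            fh.write("#!/bin/sh\necho fake R\n")
        os.chmod(fake_r, 0o755)
        print("after prepend ->", which("R", prepend_to_path(current, fake_bin)))
```

Because the loop stops at the first match, the order of PATH entries matters. Putting the module's directory first is why the newly loaded R is the one found.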
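In LSF, the traditional way to make sure the #BSUB lines in a script are honored is to feed the script to bsub on standard input, written as bsub < submit.csh. Check the HPC website for the exact form recommended on Henry2. The sketch below builds that command and the interactive request used later in the video (-Is, -n 1, -W 10, tcsh). The function names are hypothetical. submit only calls bsub when dry_run is False, so it can be tried on a machine without LSF.

```python
import shlex
import subprocess
from typing import List


def interactive_cmd(cores: int = 1, minutes: int = 10, shell: str = "tcsh") -> List[str]:
    """Argument list for an interactive LSF session: bsub -Is -n <cores> -W <minutes> <shell>."""
    return ["bsub", "-Is", "-n", str(cores), "-W", str(minutes), shell]


def submit(script_path: str, dry_run: bool = True) -> str:
    """Submit a batch script as `bsub < script_path`; with dry_run, only return that command."""
    if dry_run:
        return "bsub < " + shlex.quote(script_path)
    # Feed the script on standard input so LSF reads its #BSUB directives.
    with open(script_path) as fh:
        result = subprocess.run(["bsub"], stdin=fh, capture_output=True,
                                text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(submit("submit.csh"))
    print(shlex.join(interactive_cmd()))
```

After a real submission, bjobs reports whether the job is still pending or running, as shown in the video.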
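The checks the video does by hand after the batch job can also be scripted: the LSF error file should be empty, and file should recognize weather.pdf as a PDF. This sketch assumes the err.<jobid> naming from the sample script. It treats a file as a PDF if it begins with the %PDF- header, the signature file looks for, so it is a quick sanity check rather than full PDF validation. The helper newest plays the role of ls -lrt by picking the most recently modified match; all names here are hypothetical.

```python
import glob
import os
from typing import List, Optional

PDF_HEADER = b"%PDF-"  # PDF files normally begin with this header


def newest(pattern: str) -> Optional[str]:
    """Most recently modified path matching `pattern` (like the last line of ls -lrt)."""
    matches = glob.glob(pattern)
    return max(matches, key=os.path.getmtime) if matches else None


def check_job_outputs(err_file: str, pdf_file: str) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not os.path.isfile(err_file):
        problems.append(f"missing error file {err_file}")
    elif os.path.getsize(err_file) > 0:
        problems.append(f"{err_file} is not empty; read it for error messages")
    if not os.path.isfile(pdf_file):
        problems.append(f"missing {pdf_file}")
    else:
        with open(pdf_file, "rb") as fh:
            if fh.read(len(PDF_HEADER)) != PDF_HEADER:
                problems.append(f"{pdf_file} does not start with a PDF header")
    return problems


if __name__ == "__main__":
    err = newest("err.*")
    if err is None:
        print("no err.* file found; has the job finished?")
    else:
        issues = check_job_outputs(err, "weather.pdf")
        print("all checks passed" if not issues else "\n".join(issues))
```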
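The last exercise compares core counts using lshosts. As a rough cross-check from inside the interactive session, Python's standard library can report two things. The first is the host name, which in most setups matches what echo $HOSTNAME prints. The second is the CPU count the operating system sees. This is an illustrative sketch only. os.cpu_count() counts logical CPUs, which can differ from the ncpus value lshosts reports. os.sched_getaffinity (Linux only) shows which CPUs this process may use, and that set can be smaller than the node's total. lshosts remains the authoritative answer on Henry2.

```python
import os
import socket
from typing import Dict


def node_summary() -> Dict[str, object]:
    """Host name plus the CPU counts visible to this process."""
    info: Dict[str, object] = {
        "hostname": socket.gethostname(),
        "logical_cpus": os.cpu_count(),  # may be None if the count is unknown
    }
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    return info


if __name__ == "__main__":
    for key, value in node_summary().items():
        print(f"{key}: {value}")
```

Run it only inside the bsub -Is session on the compute node. For the login node, use lshosts | grep login03 as in the video, since the video notes that running applications on login nodes is prohibited.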
Info
Channel: OIT HPC
Views: 2,038
Rating: 5 out of 5
Keywords: henry2, hpc, ncsu, workshop, tutorial, LSF, running jobs
Id: lnFvjnE1m5w
Length: 15min 40sec (940 seconds)
Published: Fri Mar 20 2020