Install RAPIDS into Windows WSL for Amazing Pandas Performance

Video Statistics and Information

Captions
NVIDIA today released two Jupyter workbooks that really demonstrate some of the amazing power that you can get from RAPIDS cuDF, the replacement for pandas. We're going to go ahead and install RAPIDS on a Windows workstation real quick, and then we will jump right into looking at what the capabilities are. After this video, I'm going to do two additional videos where I go through each of these two workbooks and show some of the things that you can really specifically make use of RAPIDS for.

So, without further ado, let's go right to the web page here. We're going to click the install link, and then we're going to go down to here. For the operating system, we're going to choose WSL2. Note the limitations; these are more limitations of WSL2: only one GPU, and GPUDirect Storage is not supported. So we're going to go through and do this the preferred method, with conda.

Now, I've got a video that shows you how to do up through these first steps here, and I have a link to it. It goes a step further even and installs TensorFlow, but you don't need to install TensorFlow unless you want TensorFlow. So I'm going to assume that you've already installed the latest NVIDIA driver, that you've installed WSL2, and that you have installed Miniconda into WSL2. Like I said, refer to that other video if you need help installing those. It's a surprisingly straightforward process; they've improved this tremendously. I'm beginning to get more and more of a fan of WSL2, so go Microsoft.

Let's pop open our PowerShell window here, and I'm just going to go right into WSL2. You should have Python already installed with Miniconda. Okay, Python's installed; this is good. We're going to use the conda method; it's going to create a conda environment for you to make use of. I like using the release selector here; it's a lot more straightforward. This reminds me a lot of how you install PyTorch. I am going to accept all the defaults. I'm going to also install JupyterLab, and heck, you could even throw TensorFlow in if you so
desired. So I am going to add it; I might use it in a future video. Let's go ahead, and I'm going to take this, copy and paste it, and then we're going to go into WSL2. It did, wonderful. It did grab a little bit of extra stuff here, so make sure you just get where it starts at "conda". There we go. Now we've got exactly here what we had here, so let's run this and see what it does. We will fast forward through this. This does take a while; there's a number of things getting installed here.

While it's downloading, let's have a look at some of the NVIDIA benchmarks. Just looking at some of the common operations that you're performing in data pre-processing, you can see the tremendous speedup over CPU, merge in particular. I mean, you're searching all the time, doing that sort of thing, and these notebooks provided by NVIDIA demonstrate this quite well. Some of the key use cases are real-time exploratory data analysis and time series processing. I know there's a project that I've done some research on that I'm really looking forward to trying this out on, because it's time series in the medical domain, and we'll take a look and see what that looks like in a future video. It's really super easy to throw this in where you already have pandas. Many of the pandas replacements require you to recode from scratch. Here you can see that you had pandas in previously, and you simply replace it with cuDF.

All right, I bet it's installed by now, and we're up and running. First, I'm going to move into my downloads directory, where I pulled those two workbooks into. We'll activate it. Okay, launching JupyterLab, and then you just control-click one of these, and we are in JupyterLab. Here we can see the two. I'm just going to go to the time series one first. I'm going to do another entire video on each of these two notebooks that NVIDIA put out; they're really pretty neat. You can see the GPU time, CPU time, and the speedup; it's quite substantial. I don't know that I'm going to necessarily
get the same results here. I am not at all setting this up for a formal benchmark. Heck, we've got screen recording software going on, we've got Windows going on, and I'm sure these are probably Linux times. This is also using an earlier-generation GPU than mine as well, so who knows, but this is not a formal benchmark.

We're going to run this part. It's going to download these files. This takes a little bit of time; these are big files, so we'll fast forward through this. Okay, that is done. Let's go ahead and untar them. You can see these are pretty big files, nearly a gig compressed. Okay, and they're unzipped. Let's list them. This should look basically the same, except it's me that owns them now, and you'll see they're 3.2-ish gigabytes each, so a little shy of 10 gigabytes' worth of data, definitely enough to blow out my 12 gigabyte laptop GPU.

Let's run this. Okay, the imports are done. This is the initial warm-up that they do for their benchmark. We'll go ahead and run through this. All right, that's completed. Notice we have 28 gig of my GPU used. This GPU is also running the video display too, I might add. Let's run this part here just to give us the data, so that we can see that it's present, and we're going to save the concatenated file to CSV. That'll probably take a bit, but now we have one big file that has it all there.

Okay, now we've got one big file, and we are going to reset the kernel. That'll release all the GPU memory. Again, they set this up for a benchmark, so we'll do that. We're going to use nvidia-smi. Yeah, kernel's restarting; that's what I told you to do. So we're going to do nvidia-smi, and this is what they got: their GPU was around 30, idle, uh, six megabytes, so not being used tremendously, and 48 gig, which is for an A6000.
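The memory check described here can also be scripted. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH inside WSL2; the helper names `parse_smi_line` and `gpu_status` are hypothetical, and the sample numbers in the comments are made up rather than taken from the video. It uses the `--query-gpu` and `--format` options of `nvidia-smi` and returns an empty list when the tool is missing.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "memory.used,memory.total,utilization.gpu"


def _to_int(text):
    """Convert a field to int, or None if nvidia-smi reported something like [N/A]."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def parse_smi_line(line):
    """Parse one CSV line such as '2800, 12288, 7' (made-up values) into a dict."""
    used, total, util = (_to_int(part) for part in line.split(","))
    return {"memory_used_mib": used, "memory_total_mib": total, "utilization_pct": util}


def gpu_status():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [parse_smi_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, gpu in enumerate(gpu_status()):
        print(index, gpu)
```

Running this before and after restarting the kernel would show whether the GPU memory was actually released.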
So let's run SMI. A little bit more memory is being used by mine, but I am running the GUI on this as well, and that probably accounts for the additional seven percent utilization of the GPU. We'll bring RAPIDS online, and this is reporting about the same thing in the GUI. We're going to read the data, and you can see the GPU getting its memory utilized. Okay, it's loaded back in.

We need to convert the date column; this is very common. That was quite quick, and we can see that the date column has been converted. This is the shape of the data set. It's quite big: 127 million rows, eight columns. It helps to run the cells in order. Then we run this, showing you the total range of dates that it covered. Run that, get the tail, and here we're doing our first selection. We're doing a start time and an end time, and notice we are using CuPy. So we'll run this. This is our first real query, and it's instant. I won't do any fast forwards for this; you really don't need them with RAPIDS, hence the name. There's the values, and they note that the indices are not continuous, and that's because we've done a selection; not everything's there anymore. And we can do kind of a count of missing columns.

Let's do something a little more complicated. We're going to resample and group time series data. This can be quite slow in traditional pandas. We're going to make the date the index, and we're going to resample these to days and perform a group by. So we're running this part. Instant. That's really cool. Then run the head and sum. So you can see, by using RAPIDS, I mean, that I would expect to be pretty quick, but it goes by quite quickly. Other types of filtering and other things also go pretty quick.

Now, this part went slow; that's the import, and that doesn't count. Then we run this, and it produced the dashboard of the maximum temperature per day. Really very, very quick. There again, no fast forwarding occurred. We perform a similar one on the mean temperature,
very, very fast. This is great for exploratory time series data analysis. I'm going to take a closer look at this workbook and look specifically at what it's doing, especially some of the functions that they're adding beyond pandas, which are really not many. This is really very close to orthodox pandas code.

Thank you for watching, and make sure you subscribe to see more with the amazing NVIDIA RTX 6000 Ada and also RAPIDS.
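The time series steps walked through in the notebook (convert the date column, select a date range, resample to days, group by) can be sketched roughly as follows. This is not the NVIDIA notebook's code: the tiny data set, the column names `date`, `station`, and `temp`, and the alias `xdf` are all made up for illustration. The sketch imports cuDF when it is available and falls back to pandas otherwise, which also illustrates how close the two APIs are.

```python
import numpy as np

try:
    import cudf as xdf  # GPU DataFrame library from RAPIDS
except Exception:
    import pandas as xdf  # CPU fallback so the sketch runs without RAPIDS or a GPU

# Made-up sample data; the real notebook works on a much larger weather data set.
df = xdf.DataFrame(
    {
        "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02", "2023-01-03"],
        "station": ["A", "B", "A", "B", "A"],
        "temp": [10.0, 12.0, 11.0, 15.0, 9.0],
    }
)

# Convert the string column to real datetimes.
df["date"] = xdf.to_datetime(df["date"], format="%Y-%m-%d")

# Select rows between a start and an end time with a boolean mask.
start = np.datetime64("2023-01-02")
end = np.datetime64("2023-01-03")
selected = df[(df["date"] >= start) & (df["date"] <= end)]

# Make the date the index and resample to days, taking the daily maximum.
daily_max = df[["date", "temp"]].set_index("date").resample("D").max()

# A plain group by on another column, here the mean temperature per station.
station_mean = df.groupby("station")["temp"].mean()

print(selected)
print(daily_max)
print(station_mean)
```

After the selection, the remaining index labels are no longer continuous, which matches the note about indices in the video.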
Info
Channel: Jeff Heaton
Views: 4,773
Id: _9s14cKIpNE
Length: 11min 11sec (671 seconds)
Published: Tue Mar 14 2023