Spark Installation | PySpark Installation | Windows 10 / 11 | Step by Step |#spark #interview

Captions
Welcome back to the channel. In this video we are not solving any SQL, PySpark, pandas, or DSA problem; instead we will set up PySpark on our local machine so that we can start practicing PySpark or building projects locally. For PySpark we need five things: Java, Python, the Spark binaries, a winutils file so Spark can run on Windows, and PyCharm for writing code. You can find the download links for all of these in the description section.

Let's start one by one. For Java, click the link in the description and you will be redirected to the JDK download page. Spark works with any of three Java JDKs, 8, 11, or 17; here we are using 17. Pick the installer that matches your system, whether Linux, macOS, or Windows. Since my laptop runs Windows, I click the Windows installer, the download starts, and I just follow the usual installation steps.

The second thing, once Java is downloaded, is Python. Click the Python link in the description and you will be redirected to the official Python website.
The current Python version there is 3.12.1, but for stability we are using 3.11.4. Pick the 32-bit or 64-bit installer depending on your system, download it, and run it. While installing, make sure to tick the checkbox in the installer popup that adds Python to the PATH variable; if you check that box, the path is added to the environment variables automatically and you don't have to do it manually.

Those two are done; the third thing we need is the Spark folder. Click the Spark link in the description, you will be redirected to the official Spark website, and click the first download link to start the download. Once it finishes you will see a zipped archive. Before doing anything with it, go into your C: drive and create a folder named spark. Then go back to the download, extract the archive, cut (Ctrl+X) the extracted folder, the plain one, not the zipped one, and move it into the C:\spark folder, just like I have done.

Your Spark is also done, so what's next? The next step is winutils. Click the GitHub link I have provided and you will be redirected to a page with the bin files for all Hadoop versions. Here we are using Hadoop 3.3.5, and from it we need the winutils.exe file, but you cannot download that single file directly, so you have to download the whole repository as a zip.
Once you click the download button you get a winutils-master archive. Unzip it and you will have folders for all the versions. Go to the version you need, 3.3.5 in my case, open its bin folder, and copy the winutils.exe file. Then come back to your C:\spark folder and create a folder named hadoop; it won't be there already, you have to create it. Inside hadoop create a bin folder, and paste the winutils.exe file into that bin folder. If you complete all these steps, you are good to go with Spark.

We have completed these four steps; the next one is downloading PyCharm so that we can start coding in the PyCharm IDE on the local system. Click the link in the description and you will be redirected to the official JetBrains website. The Professional edition only comes with a 30-day trial, and we want a free one, so click the download button for the Community Edition and the download starts by itself. Follow the same conventional next-next-next install and you will be good to go with PyCharm as well.

So what's next? We have successfully downloaded Java and Python, and we still need to set up the environment variables so Spark can find them. Before that, let's check the Java and Python versions: you just type python --version and java --version. I will put all these commands in the description section. Let me do it on my system: open CMD and run java --version.
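The winutils copy into C:\spark\hadoop\bin described above can be sketched in a few lines of Python (a sketch only; the extracted folder location and the hadoop-3.3.5 directory name are assumptions matching this walkthrough, so adjust them for your paths and version):

```python
import os
import shutil

def install_winutils(extracted_dir: str, hadoop_home: str) -> str:
    """Copy winutils.exe from an extracted winutils-master tree into
    the bin folder under the given Hadoop home, creating it if needed."""
    src = os.path.join(extracted_dir, "hadoop-3.3.5", "bin", "winutils.exe")
    dst_dir = os.path.join(hadoop_home, "bin")
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, dst_dir)

# Example call matching the folders created above (paths are illustrative):
# install_winutils(r"C:\Users\me\Downloads\winutils-master", r"C:\spark\hadoop")
```

Doing this by hand in Explorer, as in the video, works just as well; the point is simply that winutils.exe must end up directly inside a bin folder under the hadoop folder.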
It prints 17.0.x, and similarly python --version prints the Python version, so both are present on my system.

The next step is setting the environment variables so that Java and Python are connected to Spark. We have to set three of them: JAVA_HOME, HADOOP_HOME, and SPARK_HOME; here is a screenshot of how the environment variables look on my system. Let's go one by one and create JAVA_HOME first. For JAVA_HOME we have to reach the path where Java is stored: open This PC, go into Program Files, and you will see a folder named Java; open it, and the JDK folder inside is the Java home. How do you identify which folder is the home? Wherever you see a bin folder: whichever folder directly contains bin, that path is the home path. Copy that path, type "environment variables" in the search bar to open the dialog, click New, create a variable named JAVA_HOME, and paste the path as its value.

We are done with JAVA_HOME. Before HADOOP_HOME, let's do SPARK_HOME. The variable name is SPARK_HOME and the value is the path of the Spark folder that contains bin. Go to the extracted folder under C:\spark, check that bin is there, copy that path, come back to the dialog, click New, name the variable SPARK_HOME, and paste the path as the value.

That is also done; the next thing we need is HADOOP_HOME. The variable name is HADOOP_HOME, written exactly like that, and the variable value is the hadoop folder we created earlier.
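The "wherever you see a bin folder" rule can be expressed as a tiny helper (a sketch; the jdk-17 path in the comment is just an example of where a Windows JDK install typically lands):

```python
import os

def looks_like_home(path: str) -> bool:
    """A JAVA_HOME / SPARK_HOME / HADOOP_HOME candidate is the folder
    that directly contains a bin subdirectory."""
    return os.path.isdir(os.path.join(path, "bin"))

# e.g. looks_like_home(r"C:\Program Files\Java\jdk-17") should be True
# after installing the JDK, while the parent Java folder is not a home.
```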
Inside that hadoop folder we created bin, and in bin we placed the winutils.exe file, so that is the path we need. Go into C:\spark, open the hadoop folder, and copy its path. Back in the environment variables dialog, click New, create the HADOOP_HOME variable, and paste that value.

We have created the three home variables; the next thing is adding them to the Path variable. If you look at my Path, it contains %JAVA_HOME%\bin, %HADOOP_HOME%\bin, and %SPARK_HOME%\bin, with %PATH% at the end. Do the same: the Path variable will already exist, so you just need to add these entries to it, and your environment variables should end up looking like mine.

Once you have completed everything up to this point, you can run Spark locally. How do you check that everything is good? Come back to CMD, clear it, and run spark-shell to check whether Spark has been successfully installed and set up on your system; let it run. A lot of videos on YouTube teach only up to this point, how to set up Spark locally, and stop there. You can see the Spark version printed here, which means your Spark is set up locally. The next step is how to write PySpark code in PyCharm, and that is what I will show you next.
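As a quick sanity check alongside spark-shell, you can confirm the three variables above are actually visible from Python (a small sketch; it only reports which of the names set in this section are missing):

```python
import os

# The three variables created in this walkthrough.
REQUIRED = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")

def missing_vars(env=os.environ):
    """Return the names of required Spark-related variables that are
    not present in the given environment mapping."""
    return [name for name in REQUIRED if name not in env]

if __name__ == "__main__":
    missing = missing_vars()
    print("All set!" if not missing else "Missing: " + ", ".join(missing))
```

Remember that environment variable changes only show up in a freshly opened CMD or PyCharm window.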
We have already downloaded PyCharm, and in PyCharm we create virtual environments. If we want to run PySpark every time without installing it into each new virtual environment, we have to follow a few more steps. For PySpark we need to create one more variable named PYTHONPATH, and it takes three values. Let me demonstrate: open the environment variables dialog again, click New, and create a variable named PYTHONPATH. In the value, the entries are separated by semicolons; a semicolon acts as a termination point between multiple paths.

The first entry is %SPARK_HOME%\python: SPARK_HOME takes you to the Spark folder, and we need the python folder inside it, which is why we append \python. The next entry is the py4j archive; py4j, as we know, helps connect Java and Python. Again start with %SPARK_HOME%, and the rest of the path you can copy from the folder itself: double-click into python, then lib, and you will find the py4j-...-src.zip file there; copy that path. Why build it on %SPARK_HOME%? Because the path already comes from the Spark folder. At the end, write %PYTHONPATH% as the final entry. Once this is done, click OK, OK, and OK, and you are good to go with PySpark.
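The entries above can also be derived programmatically from SPARK_HOME (a sketch; the exact py4j zip name depends on your Spark build, so the code looks it up with a wildcard instead of hard-coding a version):

```python
import glob
import os

def pythonpath_entries(spark_home: str):
    """Build the entries PySpark needs on PYTHONPATH: the python folder
    under SPARK_HOME plus the bundled py4j source zip from python\\lib."""
    py_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(py_dir, "lib", "py4j-*-src.zip"))
    return [py_dir] + py4j_zips

# On Windows the final variable value also keeps any existing PYTHONPATH, i.e.
# %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-...-src.zip;%PYTHONPATH%
```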
PySpark is also done, so let's create a PySpark program on our system. Until you reach this point you can get a lot of errors, and I will demonstrate them. Let's create a folder for this PySpark work, call it big data, double-click it, and open it in PyCharm.

One issue you can get while running is a message saying Python is not available, or something like that. For this, type "manage app execution aliases" in the search bar, open it, and uncheck the two python entries there; that is one error you can hit, and you can see it here.

For the next error, let's create a new virtual environment by creating a project. I will name the project pyspark-demo; that name already exists for me, so I use pyspark-new, and I give the virtual environment a name, say pyspark-youtube. Click Create and let the window open. Once it opens, create a main file; main is done. Next, let's bring in some sample code: we create a session, then copy-paste the rest of the sample; we don't need the pandas part, so cut (Ctrl+X) those lines and keep the rest here. If you look, we are able to import pyspark directly. If you had not done the PYTHONPATH step, you would have to download PySpark every time with pip install pyspark.
So that step is very important. Let's check which packages are already installed: you can see pyspark is already there, and its version matches the Spark version, which is exactly why that step mattered; otherwise, for each and every virtual environment you would have to install PySpark again. With that step done, whenever you need a fresh setup you just create a new virtual environment and that's it.

Now, if I execute this we will get one error, because one thing is not set yet. Our environment is pyspark-new, and as I said, you will get an error like "Python 3 ... cannot be found" on the system. To fix it, we need to copy-paste two lines before creating the Spark session, and for those two lines we need to import os and import sys; if you add them, you will never get this error again. What do they do? They point Spark at sys.executable, which returns the path of the current python.exe.
Let's also print it, then run again; now we should not get the error. You can see the printed path here: it is taken from the virtual environment's python.exe. Now you are good to go; you can start building your projects or writing transformations and everything in Spark. Let it run, and yes, you can see the DataFrame. We have successfully installed PySpark on our system and can start practicing questions.

If you like this content, please do like, subscribe, and share the channel. You can find the SQL, PySpark, pandas, and DSA problems for data engineers in the description section; we have a lot of stuff on this YouTube channel, so you can explore that as well. Thank you so much.
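Putting the whole demo together, a minimal main.py might look like this (a sketch under the setup above; the appName and the sample rows are placeholders, and the two os/sys lines are the fix for the "Python 3 cannot be found" error):

```python
import os
import sys

# Point Spark's Python workers at the interpreter of the current virtual
# environment; without this, Spark may fail to find a python executable.
os.environ["PYSPARK_PYTHON"] = sys.executable
print(sys.executable)  # should show the venv's python.exe

def main():
    # Imported inside main so pyspark is only needed when the script runs.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["letter", "number"])
    df.show()
    spark.stop()

if __name__ == "__main__":
    main()
```

Setting the environment variable before the session is created is what matters; once the session is up, df.show() printing the DataFrame confirms the whole chain (Java, winutils, Spark, PySpark) works.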
Info
Channel: DEwithDhairy
Views: 2,084
Id: jO9wZGEsPRo
Length: 21min 38sec (1298 seconds)
Published: Sun Dec 24 2023