How To Install Spark / PySpark on Windows 11/10 Locally

Captions
Hello everyone, welcome to my channel, TheCloudBox. This is a new video in the data engineering series, and in it I'm going to discuss a very important topic: how to install Spark on our local system. Why does this matter? Say you're working with Databricks but you don't want to pay for it, or you don't have a Databricks Community Edition account, yet you still want to learn how to work with Spark. We can set all of that up on the local system, on Windows or Linux.

Before we can start working with Spark we need to set up a few things: install Java, install Python, install Spark itself, and install one Hadoop file (winutils.exe). If you want to work with an editor, we'll also install VS Code, and then we need to set up the environment variables. Let's go one by one.

First, Java. If you go to the Java download page, there are a couple of Java versions: Java 21, Java 17, Java 11, and Java 8. Spark works with Java 8, Java 11, and Java 17, so choose one of those; on my system I have Java 11 installed. Simply click the version you want, choose Windows, and run the installer. It will ask you to sign in to an Oracle account (I was already signed in), after which you can download the Java installer.

In a similar fashion, open the Python page and install the latest version: go to Downloads, click the latest release, and run the installer. While installing, one option will pop up, "Add Python to PATH" -- make sure you tick it. Next, we need to install Spark itself.
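Once Java and Python are installed, a small Python sketch (my addition, not from the video) can confirm both are actually visible on PATH before you go any further:

```python
import shutil

# Tools that should be on PATH after the installs above.
# spark-shell and pyspark come later, via SPARK_HOME.
REQUIRED_TOOLS = ["java", "python"]

def missing_tools(tools):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Java and Python are both visible on PATH.")
```

If anything is reported missing, re-run the installer (for Python, with "Add to PATH" ticked) before moving on.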
For Spark, go to the Spark downloads page. You can see the latest version is 3.5.1, released Feb 23, 2024. You can choose any of the versions; whichever you pick will be pre-built for Apache Hadoop 3.3 and later. Click the compressed file link -- the .tgz -- to download it. I'm choosing an older release, 3.4.2, and I've already downloaded it, so I'll show you from there.

Next, go to your C: drive and create one folder -- any name works, but I'm calling it spark. Copy the downloaded .tgz into C:\spark and extract it right there, then delete the archive. To keep things tidy I move all the extracted files up so they sit directly inside C:\spark; you can keep them in a sub-folder instead, as long as you point the environment variables at it later.

So the first step is clear: download the .tgz, create a spark folder, and place all the files inside it. Now we need one more file, winutils.exe. We downloaded Spark 3.4.x, so we need a winutils build for that Hadoop version, or the closest one below it. I'll put all these URLs in the description so you have a better picture.
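The download-and-extract step can also be scripted instead of done by hand. A minimal sketch (the archive file name and destination are assumptions matching the walkthrough):

```python
import tarfile
from pathlib import Path

def extract_spark(archive_path, dest="C:/spark"):
    """Extract a spark-*.tgz archive so its contents land inside dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tgz:
        tgz.extractall(dest)
    return dest

# Example (path is illustrative -- use wherever you saved the file):
# extract_spark(r"C:\Users\me\Downloads\spark-3.4.2-bin-hadoop3.tgz")
```

Note that this extracts into a spark-3.4.2-bin-hadoop3 sub-folder; either move the contents up as in the video, or point SPARK_HOME at the sub-folder later.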
On the winutils page (a GitHub repository), click Code and download the whole thing as a ZIP, then unzip it; you only need to copy one file. Go into winutils-master, then into the hadoop-3.3.5 folder -- we downloaded Spark 3.4.2, so this winutils build is the closest one at or below it -- then into bin, and copy winutils.exe. Alternatively, instead of downloading the whole ZIP, you can navigate to that same bin folder on GitHub and download just winutils.exe directly.

Back on the C: drive, next to the spark folder we already created, create a folder called hadoop, and inside it a folder called bin. Paste winutils.exe there. So step two is clear: get winutils.exe and place it at C:\hadoop\bin\winutils.exe.

So far we have installed Java, installed Python, installed Spark, and put winutils.exe in place. We'll do the Visual Studio Code setup later; now we need to set up the environment variables. One thing to note first: the Python installation lands in a different folder, under your user profile. Go to C:\Users\<your name>\AppData -- AppData is hidden by default, so in Explorer open View and tick "Hidden items" to see it.
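Before touching the environment variables it's worth sanity-checking the folder layout built so far. This helper is my own sketch; the default paths assume the C:\spark and C:\hadoop layout from above:

```python
from pathlib import Path

def check_layout(spark_home="C:/spark", hadoop_home="C:/hadoop"):
    """Return a list of problems with the expected Spark/Hadoop layout."""
    problems = []
    if not Path(spark_home, "bin").is_dir():
        problems.append(f"missing {spark_home}/bin (Spark not extracted here?)")
    if not Path(hadoop_home, "bin", "winutils.exe").is_file():
        problems.append(f"missing {hadoop_home}/bin/winutils.exe")
    return problems

if __name__ == "__main__":
    print(check_layout() or "layout looks good")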
Inside AppData, go to Local, then Programs, and you can see the Python installation there; I have two installed, Python 3.10 and Python 3.12. Now for the environment variables: in the Windows search box, search for "environment variables" and open it. You'll see almost nothing is added yet -- only Python, because we ticked "Add to PATH" during its install.

Click New; we need to set up three different variables. First, JAVA_HOME: browse to where Java is installed -- This PC > C: > Program Files > Java -- and select the jdk-11 folder (if you installed Java 8 it will be named something like jdk1.8; for Java 17, jdk-17). In a similar fashion create SPARK_HOME pointing at C:\spark, and HADOOP_HOME pointing at C:\hadoop, the folder that contains bin.

One more thing: select the Path variable, click Edit, then New, and add %JAVA_HOME%\bin; do the same for %SPARK_HOME%\bin and %HADOOP_HOME%\bin. Click OK everywhere and close. Now open a Command Prompt and check whether Java is installed with java -version -- you can see it is -- and you can check Python the same way.
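To keep the three variables and the Path additions straight, here's a small Python summary of the configuration (the jdk folder name is an assumption -- adjust it to whichever Java you installed):

```python
# Values mirror the walkthrough; change the jdk folder name to match
# your install (jdk1.8, jdk-11, jdk-17, ...).
ENV_VARS = {
    "JAVA_HOME": r"C:\Program Files\Java\jdk-11",
    "SPARK_HOME": r"C:\spark",
    "HADOOP_HOME": r"C:\hadoop",
}

def path_additions(env_vars):
    """Entries to append to the Windows Path variable."""
    return ["%{}%\\bin".format(name) for name in env_vars]

if __name__ == "__main__":
    for name, value in ENV_VARS.items():
        print(f"{name} = {value}")
    print("Add to Path:", "; ".join(path_additions(ENV_VARS)))
```

Each variable points at a folder that directly contains bin; the Path entries then expose the executables (java, spark-shell, winutils) to every new terminal.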
Next, type spark-shell. What spark-shell does is start the Spark engine; you'll see it setting the default log level and so on -- and then you may see an error. I hit this error many times and searched YouTube and everywhere else; it seems to be quite rare, and I haven't seen it affect many people, but unfortunately it kept happening for me, so it's important to note it here and make things easy for you.

To fix it, go back to the environment variables and add one more variable: SPARK_LOCAL_HOSTNAME, with the value localhost. Click OK, exit, and launch the terminal again. This time spark-shell should start without that error: it creates a Spark session, and we've successfully installed Spark -- the version shown is 3.4.2. Note that we're in the Scala shell at this point; if you're familiar with Scala you can write your code there, but we're not going with Scala, so I'll close the terminal.

I'll open the terminal again, and this time simply type pyspark. It shows the Python version, checks everything, and again we can see Spark 3.4.x.
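If you'd rather not add a permanent variable, the same hostname fix can be applied per-process from Python, as long as it happens before any Spark session is created (this is my suggested alternative, not from the video):

```python
import os

# Equivalent of setting SPARK_LOCAL_HOSTNAME=localhost in the Windows
# environment-variable dialog, but scoped to this Python process only.
# Must run before SparkSession/spark-shell starts; overrides any existing value.
os.environ["SPARK_LOCAL_HOSTNAME"] = "localhost"

print("SPARK_LOCAL_HOSTNAME =", os.environ["SPARK_LOCAL_HOSTNAME"])
```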
Let me type a line or two -- you can see we're able to run Python statements here. So that part is done; I guess you'd now be able to install Spark with everything working fine. The next thing is to set up an IDE. Follow the VS Code link (also in the description), choose Windows, Linux, or whatever you use, and install it. Inside VS Code you get a Jupyter-style notebook environment, or you can simply create a new file and choose Python as the language; either way works.

Before writing code we need to install the PySpark package, so run pip install pyspark. In my case it shows "Requirement already satisfied" because I already have it. Once PySpark is installed in your system, import pyspark and from pyspark.sql import SparkSession. SparkSession is a class, so we create an object of it using its builder.
On the builder you can also chain .master(...), though that part is optional -- remove it and it still works with the defaults. If you run this, you can see the SparkSession details: in-memory, the Spark context Web UI, the Spark version, the master (local), and the app name. I took the example from sparkbyexamples.com, so the app name reflects that, and I guess that's fine.

Now let me open a file so I can show reading data as well. I have a SalesLT Product CSV; I'll put it inside a folder here and copy its path. Let's create a DataFrame: spark.read.format("csv"), then -- since the file has a header row -- .option("header", True), and pass the path to .load(). Let me check whether I'm able to read it... fine, we can successfully read the data using the PySpark library. One note: in Databricks we used to write display(df), but that won't work here; and if you just evaluate df on its own it shows you the schema, not the data, so use df.show() to see rows.

I guess this video is really helpful for installing Spark and everything locally, so you can do your coding in VS Code. If you really like my videos, please do me a favor: go ahead and subscribe to my channel. I upload videos related to data engineering -- Data Factory videos, Databricks videos, even videos on how to do the certifications -- so it will be really helpful for you, and if you share this video with other people, they can learn a lot too. Thank you so much for watching; bye, have a wonderful day. Thank you so much.
Info
Channel: TheCloudBox
Views: 5,558
Id: 49yQ-bdj4Ww
Length: 17min 37sec (1057 seconds)
Published: Mon Mar 04 2024