Apache Spark: How to install Spark 3.4.1 on Windows 11

Video Statistics and Information

Captions
Hello. In this video I am going to show the steps to install Apache Spark version 3.4.1, which is the latest version available as of July 2023. I'm going to install this version on a Windows 11 laptop. Below are the steps; let's go through them one by one.

The first one is the prerequisite: download and install 7-Zip. If you already have 7-Zip installed on your laptop you can skip this step. We need 7-Zip to extract the downloaded Spark files, which come in .tgz and .tar format. I will quickly show you the download link for 7-Zip: just go to the link and click on the download. In my case I have a 64-bit Windows laptop, so I downloaded the exe file and followed the standard installation steps. I will provide all the download links in the description section for 7-Zip, Python, Java, Spark and also the Hadoop winutils.

The first step is to download and install Python. Again, I will quickly show the Python download link. From there you can download Python; I have installed the latest version as of today, which is 3.11 for Windows. Once you click on the download and the file is downloaded, just follow the standard installation steps.

The next one is to download and install Java JDK version 17. Go to the Java downloads on oracle.com, where you will find the download link for JDK 17. Select Windows and then the x64 installer. Once you click on that link it will get downloaded, and again you can follow the standard installation steps.

There is a reason for selecting Python 3.11 and Java JDK 17: if you go to the Spark 3.4.1 documentation overview page, you can see that Spark runs on Java 8, 11 or 17, and it also requires Python 3.7+ (3.7 is deprecated as of Spark 3.4.0). This is why I have chosen these two specific versions of Python and Java.
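For reference, here is a minimal sketch of how you can confirm the installed versions from a new Command Prompt before moving on. It assumes the Python installer's "Add python.exe to PATH" option was selected; the expected outputs in the comments follow from the versions chosen above:

    rem Check the Python version (expect something like Python 3.11.x)
    python --version

    rem Check the Java version (expect a 17.x JDK build)
    java -version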
Once this is done, you have to download Spark. You can download it from the documentation overview page itself: in the downloads section, click the download page link, which takes you to the download page. Here you have to be careful: as of today, 3.4.1 (released in June 2023) is the latest release, so I have selected that, and among the package options I have selected "Pre-built for Apache Hadoop 3.3 and later". As mentioned there, Spark uses Hadoop client libraries; I'll explain later why that is the reason we are also going to download the Hadoop winutils from a GitHub site. In this step we are just going to download Spark 3.4.1: go back to the page, select 3.4.1, select the pre-built option, and click the download link, which takes you to a mirror page; click the link there and the file will be downloaded. You will get the file in .tgz format in your Downloads folder. Using 7-Zip, extract it; the result is a .tar file, and if you run the same extract process again you will get the files extracted into a folder. This is what I had mentioned in step 4: download and extract the files.

The next one is downloading the Hadoop winutils from a GitHub site. I will provide the download link in the description section. As I was mentioning, Spark uses the Hadoop client libraries for HDFS. There are also Hadoop-free binaries, for which we would have to make some Spark configuration changes, but in our case we are going to go with Hadoop winutils 3.3 or later. On the GitHub page, go to the hadoop 3.3 bin folder and download winutils.exe alone; I will show it in my Downloads folder.

Once this is done, we need to create a folder structure on the C drive: create a folder called spark, and again on the C drive create a folder called hadoop with a folder called bin inside it. Then we copy the downloaded Spark files and the Hadoop winutils into these respective folders. I will quickly show you: in C: I have created a folder called spark and copied everything from the extracted Spark folder in Downloads into it, and similarly into hadoop\bin I have copied winutils.exe from the Downloads folder. Once you copy these, we are done up to step 7.
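For reference, here is a minimal Command Prompt sketch of the folder setup and copy steps described above. The Downloads paths and the extracted folder name spark-3.4.1-bin-hadoop3 are assumptions and may differ on your machine:

    rem Create the folder structure on the C drive
    mkdir C:\spark
    mkdir C:\hadoop\bin

    rem Copy everything from the extracted Spark folder into C:\spark
    rem (the extracted folder name is an assumption; use whatever 7-Zip produced)
    xcopy /E /I "%USERPROFILE%\Downloads\spark-3.4.1-bin-hadoop3" C:\spark

    rem Copy winutils.exe into the Hadoop bin folder
    copy "%USERPROFILE%\Downloads\winutils.exe" C:\hadoop\bin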
The next one is step 8: setting the environment variables and the Path for Java, Spark and Hadoop. Type "environment variables" in the search bar, select the system properties result and go to Environment Variables. What I have done is click New, type HADOOP_HOME, and provide the folder path as the variable value; similarly I did it for JAVA_HOME and for SPARK_HOME. Once you do this you will have the variables for Hadoop, Java and Spark. Then we have to set the Path as well: edit the Path variable, click New, and paste the bin path for each of them. Likewise I have done the Path entries for JAVA_HOME, HADOOP_HOME and SPARK_HOME (a command-line sketch of these settings follows at the end of the captions). Once all of this is done, I validated the versions of Java and Python.

After that, go to the Spark bin folder on the C drive and type cmd in the address bar; this opens a command prompt at that path. Once you are there (I had already opened this window), just type the spark-shell command. It will take a minute or so and then this message pops up: as you can see, it shows Spark version 3.4.1, and there is a Web UI link available. If you open that link you can see the Spark web UI as well.

If you like this video, please subscribe to this channel and share it with others. I'll come up with more data engineering related topics in the future. Thank you for watching this video.
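For reference, here is a minimal command-line sketch of the step 8 settings described above. The video sets these through the System Properties dialog; the setx commands below are a rough equivalent, and the JDK install path is an assumption, so check your actual folder under C:\Program Files\Java:

    rem Environment variables (values assume the C:\spark and C:\hadoop folders created earlier)
    setx JAVA_HOME "C:\Program Files\Java\jdk-17"
    setx HADOOP_HOME "C:\hadoop"
    setx SPARK_HOME "C:\spark"

    rem Path entries added in the video (via Environment Variables > Path > Edit > New):
    rem   %JAVA_HOME%\bin, %SPARK_HOME%\bin, %HADOOP_HOME%\bin

    rem Verify: open a new Command Prompt, go to the Spark bin folder and start the shell
    cd /d C:\spark\bin
    spark-shell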
Info
Channel: Data Engineer Topics (DET)
Views: 19,701
Keywords: Apache spark installation, Spark latest version installation, Spark installation on windows 11
Id: xyEd_yvZwBU
Length: 10min 15sec (615 seconds)
Published: Mon Jul 03 2023