Windows 11 Apache Spark Installation Made Simple - Your PySpark Kickstart! - Install Spark in Windows

Captions
Welcome to the channel. Today I will show you how to install PySpark on Windows 11 along with Apache Spark itself. If you want an end-to-end installation plus sample PySpark code, this is the right video; for PySpark I will use the Jupyter Notebook IDE. So let us start. First of all, go to google.com; you need to install two things, the first being Apache Spark. I type "Apache Spark download" and click the first link, then choose the option for Spark 3.4.1, since at the time of recording this video 3.4.1 is the latest Spark version. Click the download button, then the very first link; it will take some time to download. I have already downloaded it, so I will go to the Downloads section and unzip the file. I have downloaded Apache Spark 3.4.1 for Windows, so I will copy the extracted files and go to my local drive. In the C: drive I have created a folder called spark, and under it another folder called spark3, and I copied all the files into that folder. You should follow the same structure; just make sure your folders are named spark and spark3. If you want different names, you only need to change the names consistently, that's all. Once you go to the C: drive, paste in all the required files. That is the first task completed. Second, you need to create a Hadoop folder. Inside that Hadoop folder, create a bin folder, and into the bin folder you need to download winutils.exe for Windows (if you are using Linux, you do not need winutils). Where do you get it? I will put the download link in the description section; otherwise, search for "winutils.exe download" and you will get a GitHub repository as the first result. Open that repository, go into any Hadoop version folder, and look for winutils.exe; if you scroll down you will find it. Your task is just to click it and download it; it is only a KB-sized file. Copy it into the bin folder. So two tasks are done: you created the spark folder and copied Spark into it, and you set up the Hadoop folder with winutils. Now you need to go to the environment variables. In Windows 11, click the search button and type "env"; the first option will be "Edit the system environment variables". Click that, and then create the environment variables. The first one is for Hadoop: click the New button, and in the window that pops up, give the name HADOOP_HOME and the path of the Hadoop folder. I will show you how: go to the C: drive where you copied the Hadoop folder, copy that path, paste it in, and hit OK.
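As a quick sanity check before moving on, here is a minimal Python sketch that confirms the folder layout described above; the paths C:\spark\spark3 and C:\hadoop\bin\winutils.exe are the ones used in this walkthrough, so adjust them if you extracted the files elsewhere.

import os

# Paths used in this walkthrough; change them if your layout differs.
paths = [r"C:\spark\spark3", r"C:\hadoop\bin\winutils.exe"]
for p in paths:
    print(p, "-> OK" if os.path.exists(p) else "-> MISSING")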
In the same way you need to create another variable called SPARK_HOME: click the New button, it will open the edit-variable dialog, type SPARK_HOME, and use the spark3 location. Make sure you give the path of the spark3 folder, the one where all the Spark files are. In my case that is the spark\spark3 folder, so I copy the spark3 location, paste it in, and hit OK. So those two tasks are done. The third one is installing Java. Search for "download Java 8 for Windows" and open the Oracle website result for the Java SE 8 JDK download. You will be redirected to the Oracle downloads page; choose Windows x64, since I am using 64-bit Windows 11. It will ask for an email ID and password, so sign in and download it. Once downloaded, install Java; you just need to hit Next through the installer. After installing, you need to set the Java path. I will show you where mine was installed: under C:\Program Files\Java in a jdk1.8 directory. Copy that path, go back to the system environment variables, create another variable called JAVA_HOME, and paste in the JDK 1.8 installation directory path. That task is done. Next, you need to go to the Path variable and add the bin folders of Hadoop and Spark. Where do you find them? In the C: drive, open the spark3 folder you copied, copy the path of its bin folder, go to the system variables, and add it to Path. Do the same for Hadoop: we created a bin folder inside the Hadoop folder, so copy that bin path and paste it in as well. Done; your system is now configured with Apache Spark on Windows 11.
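Once the variables are set, any newly opened terminal or Python session should be able to see them. A minimal check, assuming the three variable names created above:

import os

# These names must match what you typed in the environment variable dialog.
for name in ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME"):
    print(name, "=", os.environ.get(name, "NOT SET"))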
Now your next task is to open a Python IDE, but for simplicity I will first verify whether our Spark is working or not. I open the CMD terminal in Windows 11 and type spark-shell. It will take a couple of seconds to start; once the shell comes up, that means your system has successfully installed Apache Spark. It takes a little time because Spark is, I would say, a heavily used environment for big data technology. For example, a task that would take Hadoop MapReduce two to three hours, Spark can do within a couple of minutes; that is the power of Spark. Here we get the Spark shell in the terminal showing version 3.4.1, so our Spark is ready on Windows 11. For your information, Spark is written in the Scala language, which is a highly scalable functional programming language. Now that Spark is installed, my next task is to show how we can use PySpark. To use PySpark you need to install it in your Python installation; for example, I use Anaconda, so I will go to the Anaconda Prompt and type pip install pyspark (with an exclamation mark in front, !pip install pyspark, if you run it inside a notebook cell). Here you can see "base" shown in the prompt, which means I will use PySpark in this conda environment. Since I have already installed PySpark in my Anaconda environment, it will only tell me that PySpark is already installed; if you have not installed it yet, it will take a couple of minutes, so have patience. Now I open the Jupyter Notebook, which I have already installed, and we will start the PySpark session. Our first target is to read a text file and perform basic operations on it using PySpark. First of all I will import pyspark, then from pyspark.sql import SparkSession. Then I will create a Spark session: assign it to a variable, spark = SparkSession.builder.appName(...), where in the brackets you give the name of the app, followed by .getOrCreate(). What does this mean? PySpark is a highly scalable Python API for Apache Spark. Spark itself is written in Scala, a functional programming language that runs on top of the JVM; in PySpark we do not need to compile anything ourselves, whereas in Java you write code and compile it with a compiler. What does pyspark.sql do? It automatically does the back-end work of integrating with Apache Spark, so this SparkSession is very, very important. Also, Spark uses lazy evaluation: lazy in the sense that it records the steps and instructions to perform, and unless you invoke an action it will not act; that is a beauty of Spark. So we use SparkSession.builder.appName and give any name; I will give "Analyzing the vocabulary of Pride and Prejudice", followed by .getOrCreate(). If a Spark session already exists it will get the same one, otherwise it will create a new session. I hit Enter; it will take a couple of seconds and our Spark session will be ready, so let us wait a few seconds.
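Collected into one notebook cell, the session setup described above looks like this; the app name is the one used in the video, but any string works.

from pyspark.sql import SparkSession

# Reuses a running session if one exists, otherwise creates a new one
# (the "get or create" behavior described above).
spark = (
    SparkSession.builder
    .appName("Analyzing the vocabulary of Pride and Prejudice")
    .getOrCreate()
)
print(spark.version)  # e.g. 3.4.1, whichever version you installed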
So basically, what we need to do in PySpark right now is read input data from our local system. There are four or five steps overall, but here I will only perform reading the data. I have copied the path of a sample text file, 1342-0.txt, from my local drive. Now I will create a new variable called book and use spark.read.text to read this file through Spark. Keep in mind that this file is very, very small, but imagine your dataset is 1 TB or 1 PB in size; that is not possible on a single computer. In that case a multi-node cluster needs to be set up, and that is where PySpark, or Apache Spark in general, is highly useful. Now when I type spark.read, it only shows me the available functions, because Spark is only preparing to read; no action is performed yet. We can check what is available: you can see there are different classes under spark.read. We will not go through each and every one, because that is not in scope here; we will only use one. Let us see what our book variable looks like: it shows a DataFrame which contains value as a string. Let us print the schema of our book variable: the schema has a single column named value of type string. Now let us check the data types of our variables: for the book variable the data type is string. Until now, no actions have been performed; you are only getting the schema and data types. Now we will try book.show(10, truncate=50). Initially you can also just say book.show(), but I will show you the difference. book.show() displays the first 20 rows, and here you can see the right-hand side of each line is hidden; if we want to expand it, we pass truncate=50 so it shows up to 50 characters. Here it shows 10 rows and 50 characters, because we specifically asked for 10. So this is the beauty of PySpark. I hope you liked this video; if you have any query, feel free to ask. Thank you very much for watching.
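For reference, here are the reading steps from the demo gathered into one sketch; the file name 1342-0.txt comes from the video, so substitute the full path to your own copy.

# Reading a text file yields a DataFrame with one string column named "value".
book = spark.read.text("1342-0.txt")  # use the full path to your file

book.printSchema()  # root |-- value: string (nullable = true)
print(book.dtypes)  # [('value', 'string')]

book.show()                 # first 20 rows, long lines truncated by default
book.show(10, truncate=50)  # 10 rows, up to 50 characters per line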
Info
Channel: Technical Surani
Views: 3,537
Keywords: Apache Spark Installation in Windows 11, Install spark in windows 11, end to end apache spark installation, apache spark in windows 11, pyspark installation, end to end python spark installation, pyspark tutorials, end to end apache spark tutorials
Id: XVxl6c9lhGQ
Length: 17min 15sec (1035 seconds)
Published: Tue Aug 22 2023