Spark Installation on Windows 10 and Mac | PySpark Tutorial for Beginners

Captions
Welcome to our PySpark tutorial series. Today we'll guide you through the process of installing and setting up PySpark with JupyterLab on both macOS and Windows 10. Let's get started.

First, we need to install the Java Development Kit, or JDK. This is essential for running PySpark. Spark runs on Java 8, 11, or 17. Here's what you need to do: visit the Oracle website (oracle.com/java/technologies/downloads) and download the latest version of JDK 17 for macOS. Make sure to select the right version for your Mac: the ARM version is for Macs with Apple silicon chips, while the x64 version is for Macs with Intel chips. Once the download is complete, run the installer and follow the instructions to complete the installation. To verify the Java installation, we can simply run the command java -version, and the Java version info should be returned.

Step 2: Install Apache Spark. Next, we'll install Apache Spark, the framework that powers PySpark. Visit the Apache Spark downloads page at spark.apache.org/downloads.html and select the latest stable version of Spark. Choose a pre-built package for Apache Spark, specifically the one labeled "Pre-built for Apache Hadoop 3.3 and later". Download it and extract the package to a directory of your choice. Here we move it to the directory /Users/coder2j/apps and rename the extracted Spark folder to spark.

Step 3: Install Python. Now let's install Python. Spark runs on Python 3.7+, so make sure you have a recent Python version, for example Python 3.10. If you don't know how to install Python, check out this video. Once you have it, create a Python virtual environment using the command python -m venv pyspark-env and activate it. We can see the env name at the beginning of the prompt.

Step 4.
Install PySpark and JupyterLab. We're almost there. Let's install PySpark, findspark, and JupyterLab, the popular notebook interface for Python.

Step 5: Launch JupyterLab and use PySpark. It's time to launch JupyterLab and start using PySpark. In the terminal, enter the following command: jupyter lab. This will launch JupyterLab in your default web browser. Here we can create a new notebook and rename it PySpark-Get-Started.

First, we need to set up the environment variables required for PySpark. We set the SPARK_HOME variable to the path where Spark is installed, in this case /Users/coder2j/apps/spark; change it to match your extracted Spark directory. Let's set PYSPARK_DRIVER_PYTHON to jupyter and PYSPARK_DRIVER_PYTHON_OPTS to lab, to specify that Jupyter should be used as the driver for PySpark and that the JupyterLab interface should be used. Finally, we specify the Python executable to be used with PySpark by setting PYSPARK_PYTHON to python.

Then we import the SparkSession class from the pyspark.sql module. The SparkSession class is the entry point for working with structured data using Spark SQL. Next, we create a SparkSession object named spark using the builder pattern. The appName call sets the name of the Spark application, let's say PySpark-Get-Started. The getOrCreate method either retrieves an existing Spark session or creates a new one if none exists.

In the end, we create a simple DataFrame using the createDataFrame method. The DataFrame is created from a list of tuples, data, where each tuple represents a row with two columns, name and age. Finally, the show method is called to display the contents of the DataFrame. After execution, once you see the DataFrame values in the output, it means you've successfully installed and set up PySpark with JupyterLab on macOS.

Let's get started on installing and setting up PySpark with JupyterLab on Windows 10.

Step 1.
Install the Java Development Kit (JDK). To begin, we need to install the Java Development Kit. Don't worry, it's a straightforward process. Spark runs on Java 8, 11, or 17. Visit the Oracle website (oracle.com/java/technologies/downloads) and download the latest version of JDK 17 for Windows. Once the download is complete, run the installer and follow the instructions to complete the installation. To verify the Java installation, we can simply start the Command Prompt and run the command java -version, and the Java version info should be returned.

Step 2: Install Apache Spark. Next, we'll install Apache Spark, the framework that powers PySpark. Visit the Apache Spark downloads page at spark.apache.org/downloads.html and select the latest stable version of Spark. Choose a pre-built package for Apache Spark, specifically the one labeled "Pre-built for Apache Hadoop 3.3 and later". Download it and extract the package to a directory of your choice. To extract it you need the 7-Zip application, so make sure you have it installed. Now we use 7-Zip to extract the file to the Downloads folder and rename the extracted Spark folder to spark. Here we move it to the directory C:\Users\coder2j\Documents\apps.

Step 3: Install Python. Now let's install Python. Spark runs on Python 3.7+, so make sure you have a recent Python version, for example Python 3.10. If you don't know how to install Python, check out this video. Once you have it, create a Python virtual environment using the command python -m venv pyspark-env and activate it. We can see the env name at the beginning of the prompt.

Step 4.
Install PySpark and JupyterLab. We're almost there. Let's install PySpark, findspark, and JupyterLab, the popular notebook interface for Python.

Step 5: Launch JupyterLab and use PySpark. It's time to launch JupyterLab and start using PySpark. In the terminal, enter the following command: jupyter lab. This will launch JupyterLab in your default web browser. Here we can create a new notebook and rename it PySpark-Get-Started.

First, we need to set up the environment variables required for PySpark. We set the SPARK_HOME variable to the path where Spark is installed, in this case C:\Users\coder2j\Documents\apps\spark; change it to match your extracted Spark directory. Let's set PYSPARK_DRIVER_PYTHON to jupyter and PYSPARK_DRIVER_PYTHON_OPTS to lab, to specify that Jupyter should be used as the driver for PySpark and that the JupyterLab interface should be used. Finally, we specify the Python executable to be used with PySpark by setting PYSPARK_PYTHON to python.

Then we import the SparkSession class from the pyspark.sql module. The SparkSession class is the entry point for working with structured data using Spark SQL. Next, we create a SparkSession object named spark using the builder pattern. The appName call sets the name of the Spark application, let's say PySpark-Get-Started. The getOrCreate method either retrieves an existing Spark session or creates a new one if none exists.

In the end, we create a simple DataFrame using the createDataFrame method. The DataFrame is created from a list of tuples, data, where each tuple represents a row with two columns, name and age. The resulting DataFrame is assigned to the variable df. Finally, the show method is called to display the contents of the DataFrame. After execution, once you see the DataFrame values in the output, it means you've successfully installed and set up PySpark with JupyterLab on Windows 10.
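The notebook steps narrated in both walkthroughs can be sketched roughly as follows. This is a minimal sketch, not the exact code shown on screen: the helper names configure_pyspark_env and show_demo_dataframe are hypothetical, the sample rows are made up for illustration, and the app name spelling is assumed from the narration. The SPARK_HOME value is the macOS path from the walkthrough; on Windows, pass the extracted directory as a raw string instead, for example r"C:\Users\coder2j\Documents\apps\spark".

```python
import os


def configure_pyspark_env(spark_home):
    """Set the four environment variables named in the walkthrough."""
    # Path where the Spark package was extracted (change to match your machine).
    os.environ["SPARK_HOME"] = spark_home
    # Use Jupyter as the PySpark driver, with the JupyterLab interface.
    os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "lab"
    # Python executable used by PySpark.
    os.environ["PYSPARK_PYTHON"] = "python"


def show_demo_dataframe():
    """Create a SparkSession, build a two-column DataFrame, and display it."""
    # Imported here so the environment variables above are set first.
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an existing session or starts a new one.
    spark = SparkSession.builder.appName("PySpark-Get-Started").getOrCreate()

    # Each tuple is one row; the values are illustrative assumptions.
    data = [("Alice", 25), ("Bob", 30)]
    df = spark.createDataFrame(data, ["name", "age"])
    df.show()
    return df


# macOS path from the walkthrough; replace with your own Spark directory.
configure_pyspark_env("/Users/coder2j/apps/spark")
```

In a notebook cell, calling show_demo_dataframe() after this should print a small table with name and age columns, provided Java, Spark, and PySpark are installed as described.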
Congratulations! You have successfully installed and set up PySpark on your macOS and Windows 10 machines. In the video description below, you'll find the links to the Python, Oracle, and Apache Spark websites for downloads, as well as additional resources for reference. Don't forget to subscribe to our channel and hit the notification bell to stay updated on our PySpark tutorial series. In our next video, we'll dive into the basics of PySpark and start exploring its powerful features. Thank you for watching, and see you in the next video.
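If the final notebook step fails, one way to double-check the prerequisites named in the walkthrough (Java 8, 11, or 17 and Python 3.7+) is a small script like the one below. This is a sketch under assumptions: the function names parse_java_major and check_prerequisites are hypothetical, and the parsing relies on the usual `version "..."` string that java -version prints.

```python
import re
import shutil
import subprocess
import sys

# Java versions the walkthrough says Spark runs on.
SUPPORTED_JAVA = (8, 11, 17)


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_372).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_prerequisites():
    """Return a dict describing whether Python and Java meet the requirements."""
    report = {"python_ok": sys.version_info >= (3, 7)}
    java_path = shutil.which("java")
    if java_path is None:
        report["java_major"] = None
    else:
        # `java -version` normally writes to stderr, so both streams are checked.
        result = subprocess.run(
            [java_path, "-version"], capture_output=True, text=True
        )
        report["java_major"] = parse_java_major(result.stderr + result.stdout)
    report["java_ok"] = report["java_major"] in SUPPORTED_JAVA
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```

Run it inside the activated virtual environment so that the Python version it reports is the one PySpark will use.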
Info
Channel: coder2j
Views: 6,673
Keywords: pyspark, pyspark tutorial, pyspark tutorial for beginners, apache spark, apache spark tutorial, apache spark tutorial for beginners, spark, spark tutorial, spark tutorial for beginners, data engineer, data engineering, data science, data scientist, data analytics, distributed systems, python, spark install mac, spark installation on windows 10, spark setup, spark setup in windows 10, spark setup on mac, pyspark installation, pyspark setup, pyspark setup in windows 10
Id: WxWRXNna1Qw
Length: 9min 30sec (570 seconds)
Published: Sun Jul 02 2023