How I Use Python as a Data Engineer

Video Statistics and Information

Captions
Do you know who uses Python in their work every day? I do. In this video, I will share some of the tips and best practices I've learned along the way, and how I use Python as a data engineer.

One of the main tasks I do with Python is data extraction and manipulation. I can write a few lines of code to extract data from different APIs or datasets, then start manipulating that data: drop some columns and apply some business logic on top of it. Python is among the top five skills you need to know as a data engineer. It has a very simple syntax, so even if you have never coded in your life, you can write Python code within a few weeks. It also has a huge ecosystem of frameworks and libraries that you can use for data engineering work, such as pandas, NumPy, and Airflow.

Now let's understand the entire process by building a small project. In this demo, I'm going to show you how I would extract data from an API, write a basic transformation job, deploy the data pipeline, and load the data into a target location.

Our first task is to extract data from different sources. In the real world, you might have data coming from multiple sources such as web applications, mobile applications, APIs, RDBMS, and many more. But to build our own project, we have a few options available. One is using a static dataset: we can go to websites like kaggle.com, Google Dataset Search, or Amazon Open Data, pick any dataset we like, and start working on it. But we want something real-time, data that keeps changing so we get new data every hour or every minute, and for that we will be using APIs (Application Programming Interfaces).

You can use basic code to connect to an API and start extracting data. There are many different APIs available, such as the Spotify API, the Twitter API, and stock market APIs, or you can browse around and find a bunch of other APIs to connect to. Once you find the API you want to work with, you can go to its website and request API keys. After you have access, you can start writing the code to extract data from it.

Let's say in this case we decided to go with the Twitter API. For the time being, we want to track what Elon Musk is tweeting each and every day, and we want to build a final dataset that we can analyze in the future, so we start writing the code to extract data from Elon Musk's timeline.

All of this data comes in JSON (JavaScript Object Notation) format. It has keys and values, so if you want to access a value, you can just use its key name. When we extract data from Twitter, it returns the output as JSON. This data is completely raw, and we want to transform it into a more readable, more understandable form. For that, we will write a basic transformation job: reading the raw JSON data and converting it into a more readable format, that is, a row-and-column format. For this, we have a package available in Python called pandas, with which we can convert our raw data into a DataFrame. A DataFrame is a data structure that organizes data into a two-dimensional table of rows and columns, just like an Excel sheet or any relational table you might know. Once we have our transformation job written, we can spend some time analyzing this data and think about how to load it onto a target location.
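As a rough sketch of what this extract-and-transform step could look like: the bearer token, user ID, and dropped column below are placeholders rather than values from the video, and the Twitter v2 API requires registered credentials.

```python
import requests
import pandas as pd

# Placeholder credentials -- you get a real bearer token after requesting API access.
BEARER_TOKEN = "YOUR_BEARER_TOKEN"
USER_ID = "44196397"  # hypothetical numeric ID of the timeline we want to track


def extract_tweets(user_id: str) -> list[dict]:
    """Extract recent tweets from a user's timeline as raw JSON.

    The API returns JSON shaped roughly like:
    {"data": [{"id": "1601...", "text": "some tweet text"}, ...]}
    """
    url = f"https://api.twitter.com/2/users/{user_id}/tweets"
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    response = requests.get(url, headers=headers, params={"max_results": 100})
    response.raise_for_status()
    return response.json()["data"]


def transform(raw_tweets: list[dict]) -> pd.DataFrame:
    """Convert the raw JSON into a two-dimensional table of rows and columns."""
    df = pd.DataFrame(raw_tweets)  # each JSON key becomes a column
    # Drop a column we don't need (hypothetical field name).
    df = df.drop(columns=["edit_history_tweet_ids"], errors="ignore")
    return df


if __name__ == "__main__":
    raw = extract_tweets(USER_ID)
    print(transform(raw).head())
```

pd.DataFrame turns each JSON key into a column and each object into a row, which is exactly the Excel-sheet-like view described above.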
Everything we are doing is called ETL: Extract, Transform, Load. We extract data from multiple sources (in this case, the Twitter API), we write a transformation job to convert the raw data into a more readable format, and then we load the data onto some target location, which can be a data warehouse or an object storage location.

You can store your data in multiple places. One is object storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. Or we can use a data warehouse. Here we basically have two types of systems: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). OLTP systems are mainly designed for CRUD operations: Create, Read, Update, Delete. When you order something from Amazon, purchase something, or make a payment online, all of these things should happen in real time and a record of your transaction should be created. OLTP systems are designed for this type of work, with fast reads, writes, and updates; examples are MySQL, PostgreSQL, etc. On the other hand, we have OLAP. If you want to answer questions such as "What was our revenue over the last five years?" or "How many products did we sell this year compared to last year?", all of these can be easily answered using an OLAP system, that is, a data warehouse. Some examples of data warehouses are Snowflake, Google BigQuery, Amazon Redshift, and many more.

In this case, we will keep it simple and store our data on object storage, that is, Amazon S3 (Simple Storage Service). Amazon S3 is an object store where we can keep different types of files: audio, video, text, or any other type of file you want to store. So we can store our transformed data on S3, and in the future, if we want to do any analysis work, we can load that data into a data warehouse, or we can directly start analyzing the data and build dashboards out of it.

But our work does not end here. What we just did was create a simple data pipeline: we extracted the data, we transformed that data, and we loaded it onto a target location, in this case Amazon S3. Now we want to automate this entire process, so that our code runs every day at a particular time interval that we set.

To automate this, we have multiple options available. One option is to write a simple Python script, deploy it on some virtual machine, and set up a cron job. Cron is a Linux utility used for scheduling tasks to be executed at some time in the future (for example, a crontab entry like "0 9 * * * python3 /home/ubuntu/pipeline.py" would run a script every day at 9 AM). But as we go forward, we might have more data sources coming in, and we might have to write multiple scripts, and that becomes very difficult to manage with cron jobs. That is the reason we will be using a workflow orchestration tool. There are many different tools available in the market, such as Airflow, Mage, Prefect, and many more. In this case, we'll go with Apache Airflow.

Let me give you the basics of Airflow. In Airflow, we have something called a DAG, a Directed Acyclic Graph. It is basically a sequence of different tasks: we can have multiple tasks, one for extracting data, one for transformation, one for loading the data onto the target location. This entire thing is called a DAG. And to create all of these different tasks, we have something called operators. There are many different operators available: one if you want to run a bash command, one if you want to execute a Python script, and many more. In this case, we want to run Python code, so we will go with the PythonOperator.
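Here is a minimal sketch of what such a DAG could look like, reusing the extract and transform functions sketched earlier. The module name etl_job, the bucket name, and the daily schedule are assumptions for illustration, not details from the video.

```python
import boto3
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module holding the extract/transform code sketched earlier.
from etl_job import extract_tweets, transform, USER_ID


def run_pipeline():
    """Extract from the API, transform with pandas, and load the result to S3."""
    df = transform(extract_tweets(USER_ID))
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-tweet-pipeline-bucket",       # assumed bucket name
        Key="tweets/elon_musk_tweets.csv",
        Body=df.to_csv(index=False),
    )


with DAG(
    dag_id="twitter_etl",
    start_date=pendulum.datetime(2023, 3, 1, tz="UTC"),
    schedule_interval="@daily",  # run once every day
    catchup=False,
) as dag:
    etl_task = PythonOperator(
        task_id="run_twitter_etl",
        python_callable=run_pipeline,
    )
```

For simplicity this uses a single PythonOperator task for the whole pipeline; in a larger pipeline you would typically split extract, transform, and load into separate tasks so each shows up individually in the Airflow UI.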
Once we have our DAG ready, we can just copy our code and deploy it onto Airflow. You can install Airflow on your local machine, use managed services, or install it on any virtual machine; we will install Airflow on an EC2 machine and then copy our code there. Airflow's UI uses color coding to show the status of each job: when it is green, that means our code ran successfully, and we can see our final output in the S3 bucket.

Once we have our transformed data in the storage location, we can add even more transformation jobs, or load our data into a data warehouse. Then a data scientist or machine learning engineer can come along and build machine learning models, or build a dashboard to find insights. So this was one of the ways that I use Python as a data engineer. There are multiple use cases of Python in data engineering, and each company uses Python in its own way.

Now the question is how to learn all of these things. There are basically two ways I can give you. One way: if you have been following me, you will know I have a Python for Data Engineering course, dedicated to data engineering only. This course will take you from a very basic level to an advanced level and make you a Python-ready data engineer, so in one place you will learn everything about Python and how to use it for data engineering. But if you want to go down the path of self-learning and explore things by yourself, you can also learn these things for free. Here's the three-step approach that I suggest.

First, learn the basic fundamentals. These fundamental concepts are common across all programming languages: variables, operators, loops, conditional statements. Second, once you learn these basics, move on to some more advanced topics and start doing hands-on practice. This includes learning object-oriented programming, exception handling, working with different packages, functions, lambda functions; and you can practice your skills on websites such as LeetCode or HackerRank.

The third step is picking a niche. You can use Python for many different applications: game development, web development, data science, and many more. Each of these niches requires learning its own packages and frameworks. Let's say you want to learn Python for web development; you might have to learn about Flask or Django. But if you want to learn Python for data engineering, here is what you need to focus on. First, understand how to work with different types of file formats: CSV, JSON, Avro, Parquet, etc. Second, learn how to connect to and query databases using code, for example with SQLAlchemy, PyMySQL, or psycopg2. Third, learn about different datetime formats and time zones. Fourth, learn how to write data transformation jobs: use pandas and learn how to manipulate data, how to drop columns, how to combine multiple DataFrames, and how to apply some logic on top of it. Fifth, learn how to automate the entire thing. And sixth, learn how to read documentation and connect with different tools. A small sketch touching several of these skills follows below.
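To make those focus areas concrete, here is a short, illustrative sketch; the file names, connection string, table, and column names are all made up for the example.

```python
from datetime import datetime, timezone

import pandas as pd
from sqlalchemy import create_engine

# 1. Working with different file formats (hypothetical input files).
orders = pd.read_csv("orders.csv")
events = pd.read_json("events.json")
users = pd.read_parquet("users.parquet")  # needs pyarrow or fastparquet installed

# 2. Connecting to and querying a database with SQLAlchemy (made-up DSN and table).
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/shop")
products = pd.read_sql("SELECT * FROM products", engine)

# 3. Handling datetime formats and time zones.
orders["created_at"] = pd.to_datetime(orders["created_at"], utc=True)
now_utc = datetime.now(timezone.utc)

# 4. Transformations with pandas: drop columns, join DataFrames, apply logic.
orders = orders.drop(columns=["internal_notes"], errors="ignore")
enriched = orders.merge(users, on="user_id", how="left")
enriched["is_recent"] = enriched["created_at"] > now_utc - pd.Timedelta(days=7)

enriched.to_parquet("enriched_orders.parquet", index=False)
```

The fifth point, automation, is what the Airflow sketch earlier covers, and the sixth comes from habitually reading the documentation of each library and tool you connect.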
So there you have it: how I use Python as a data engineer, and how to learn Python for data engineering from scratch. I have a course; if you want, you can enroll in it. If you don't want to enroll, you can follow the self-learning path that I gave you. Whichever way you pick, you will arrive at the same destination. That is all for this video. If you learned something and found this video insightful, don't forget to hit the like button and subscribe to the channel. I'll see you in the next video.
Info
Channel: Darshil Parmar
Views: 33,320
Keywords: darshil parmar, how i used python as a data engineer, how to learn python for free, python for data engineering, learn python for free fast, how to learn code for data engineer, big data using python, python, pandas, airflow, how to build data pipeline, data pipeline using python, learn airflow for data engineering, data engineering project, learn data engineering free, data engineering courses for free, python for data engineering course
Id: saM0zSdd158
Length: 9min 8sec (548 seconds)
Published: Sat Mar 18 2023