Complete Roadmap to become Azure Data Engineer. - Arun Kumar

Captions
Hello guys, good morning. I hope you are all doing well. I am Arun Kumar, again in front of you to discuss another very important topic: what is the roadmap to become an Azure data engineer? Many students have been asking me this question for a very long time. Rather than answering them individually, let me create this video. These are very common questions that people ask, and I think this video will answer most of them. So let's start: what will be the roadmap to become an Azure data engineer?

First of all, we need a few programming languages. SQL is the most important thing required to become an Azure data engineer. This is a must; without it you can't proceed. This language has been in use in the market for decades, and its demand has still not decreased. It is such a powerful language for processing data, whether the data is small or very large.

Apart from this, we need a scripting language for automating our data pipelines, since data engineers are responsible for creating data pipelines. That scripting language is Python, the most in-demand language. SQL is a must and Python is not, but if you know Python you will always have an extra edge over candidates who don't. If you know it at a basic level, that is good; at a medium level, even better; and if you are at an expert level in this language, then nothing can beat you.

Apart from Python, there are other languages on the market that are
used for data engineering. One of the other languages is R. Apart from this, Java is also used, but Java is not used on the latest platform, that is Azure Databricks. So I suggest not going for Java, I suggest not going for R, and I also suggest not going for Scala. Why? Because Python is a multi-purpose language. It can be used for most purposes, and most of the industry is trying to adopt it. Earlier, big data was done in Scala in most places, but now industries have moved heavily towards Python; they have already adopted it or are slowly adopting it for their big data operations.

Now let's come to another technology which is very useful and very much in demand: Spark. This is the technology that handles data of a very large size. A very common question, where people get confused, is whether this is a programming language. The clear answer is no, this is not a programming language. It is a compute engine on which the code that you have written in SQL and Python runs. Apart from this, we use the Python API over Spark, and that is called PySpark. Since we are planning our roadmap of Azure data engineering with Python as the language, we will be learning PySpark; you should not choose anything else. If we write code over Spark with R, that is called SparkR. I suggest not going for SparkR unless there is a specific requirement from a company. This
PySpark is the most in demand in the industry because it has Python APIs, and code written against those Python APIs can run on Spark to process the data.

Now, apart from these, there are a few concepts, data warehousing and ETL, that I think you should know. Without these concepts you won't be able to understand what big data actually does. They give you an architecture kind of diagram in your mind: data is coming from here, data is being cleaned here, and data is moving there. There are many concepts inside it, like OLTP and OLAP systems, star schema and snowflake schema. I think you should know them, and they are heavily asked about in interviews.

So this was one point. Now let's talk about the Azure services that you should know to become a data engineer. The first service is Azure Databricks. This is the platform on which code is written in SQL and Python; PySpark code is also written on this platform, and it is the service behind which Spark actually runs. This is one of the very important services. Databricks is a separate company, and Azure is a product of Microsoft. There is a tie-up between Microsoft and Databricks, and after that tie-up the name of the service became Azure Databricks. Databricks was founded by the founders of Spark, and it is a very good company for big data in the market right now. If you want, you can try to get into
this company.

Now let me talk about Azure Data Factory. This is a service which is used for orchestration of pipelines as well as data processing. Orchestration of pipelines means how the pipeline will run: what should happen first, what should happen next, and what should happen after that. Azure Data Factory also has different kinds of triggers, based on which your pipeline will be triggered. Suppose you want to run your pipeline daily at 9 pm; you can schedule that kind of thing in Data Factory. There are multiple other kinds of triggers, but that is altogether a separate topic, so we will not go into it. Apart from this, it is also a service on which data processing is done. There is something called data flow in Azure Data Factory, behind which Spark actually runs to process the data. So these are the two services in which Spark runs. Guys, this is a very important topic that I am discussing.

Now let me take you to the storage locations. ADLS Gen2 and Azure SQL Server are both storage locations. ADLS Gen2 is Azure Data Lake Storage Generation 2.
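The "daily at 9 pm" schedule mentioned for Data Factory can be pictured with a small Python sketch. This is not Data Factory's trigger API; the helper name `next_daily_run` is hypothetical and only illustrates how a once-a-day schedule decides when it should fire next.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time = time(21, 0)) -> datetime:
    """Return the next moment a once-a-day schedule should fire.

    Hypothetical helper for illustration only; Azure Data Factory
    evaluates its own schedule triggers.
    """
    candidate = datetime.combine(now.date(), run_at)
    # If today's 9 pm slot has already passed (or is exactly now),
    # the next run is tomorrow's slot.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2023, 9, 16, 18, 30)))
    print(next_daily_run(datetime(2023, 9, 16, 22, 15)))
```

Called at 6:30 pm it returns the same evening's 9 pm slot; called after 9 pm it rolls over to the next day.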
Actually, you will not find any service on the Azure platform with this name. The actual service available in Azure for provisioning ADLS Gen2 is a storage account, and you need to enable the hierarchical namespace in the storage account to make it ADLS Gen2. If you don't enable the hierarchical namespace, the service that is provisioned is Blob Storage. There is one more service, called ADLS Gen1, but it is going to be retired on 29th of Feb 2024. It means Microsoft is going to decommission it and it will no longer be in use. If you want to provision ADLS Gen1 now, you won't be able to, because Microsoft has stopped provisioning it. So these are the three types of data lake storage available on the Azure platform.

Then there is Azure SQL Server, the SQL Server that is available on the Azure platform, and Azure Key Vault, which is used for security purposes, for storing your secrets.

So these are the languages and concepts that I think you should learn, and these are the Azure services you should learn if you are targeting the Azure data engineer role. Now let's talk about deployment. After creation, pipelines are moved from the development environment to the UAT environment, and then to the production environment. So basically there are three kinds of environments in the industry: development, UAT and production. Development is
the environment in which you develop the code, and UAT is the environment in which that code is tested. Testing also happens in development, but that testing is at the unit level. In UAT the testing happens at the UAT level, and this environment is a duplicate of the production environment; both environments are of the same kind. Only if testing passes in the UAT environment do we move our pipeline to the production environment, and that is where your actual pipeline runs in reality. The other two environments are there for development and testing.

So I am developing my pipeline in development, and then it moves from development to UAT and from UAT to production. This kind of movement is called deployment. There is another service in Azure that is used for deployment, and its name is Azure DevOps. With this we can create the CI/CD pipelines that are used to deploy the pipelines I have developed. This is also a very important aspect.

Apart from this, I think you should do a minimum of two projects after learning all these things; you can do more projects as well. Then you should start giving interviews.

Now the next question students ask is: how much time will it take to learn all these things? It really depends on what stage you are at. Many people already know SQL, many people already know Python, and many people already know both. So, based on the different permutations and combinations, I suggest that it will take three to six months to get your hands strong on all these things. It really depends on many factors,
such as which technologies you already know. As I said, someone knows SQL, someone knows Python, and someone knows both. If someone already knows both, they can go straight to learning the rest, the Azure services and Azure DevOps, and then they are good to go. It also depends on how much time you are putting in on a daily basis; this is another important factor. Someone spends only two hours per day, someone four hours per day, someone six hours per day. Of course, those who spend six hours per day will learn faster.

So these were the things that I wanted to tell you guys. I have listed down all the important things here. Now let me highlight a few important points.

The first thing is that, in the process of learning, you have to keep revising the concepts you are learning. The revision should take place at four levels. If you read something today, it should be revised by tomorrow. This process goes on for six days; then on the seventh day you should revise everything you have studied in the whole week. These cycles continue for almost one month, and at the end of the first month you should revise everything you have read in the past month. Then another cycle of revision takes place at the quarterly level: once three months have passed, you do another round of revision. So see how many revision cycles happened. One, immediately: whatever you read today, you revise tomorrow. Two, at the end of six days, on the seventh day, you revise again whatever you read on the previous days. Then the third
revision happens at the end of the month, and the fourth revision happens at the end of the quarter. This cycle is very much necessary if you want to retain information in your mind for the long term. If you keep revising in this way, you will also keep refining your notes. If the notes are initially 20 pages, with this revision cycle they will come down to one or two pages; much of the information will be fed into your mind, and your notes will keep getting shorter.

So suppose you get a call for an interview. You can confidently say, yes, you can schedule it; you can just look at those short notes and sit in the interview. Otherwise people generally panic: I have not revised the things, and this much is there to revise. No, this is not the right way; you will panic. You have to make short notes, you have to keep revising, and you have to remember whatever you are studying. All these things should be there with you; only then will you be able to land a good job. There should be a proper strategy, there should be a proper roadmap, and this is the roadmap I just explained.

Also, I am creating a self-study plan for those who are planning to become an Azure data engineer. That self-study plan will be broken down on a day-by-day basis, so that you know: today is day one, I need to read this; today is day two, I need to read this and practice this; today is day three, I need to read this and practice this. I am creating that kind of plan at a very granular level. It is in process, and once it is ready I will post it on my LinkedIn account. You can go to my LinkedIn account and download it from there. I'm sure it
will be very helpful for you to get an idea of which topics you need to learn in these particular services and languages. People are generally confused about what they should learn, so I'll be helping them by giving a self-study plan. They can go there, download it, make that self-study plan their reference, and start building on it.

So this was all I had for you in this video. I hope you liked it, and if you really liked it, please like, subscribe and share my video. Thank you so much, bye.
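The four-level revision cycle described in the video (next day, seventh day, end of month, end of quarter) can be sketched as a small day-by-day helper. This is only an illustration under stated assumptions: the function name `revisions_due` is hypothetical, and a month is taken as 30 study days and a quarter as 90, which the video does not specify.

```python
def revisions_due(day_number: int) -> list[str]:
    """List what should be revised on a given day of a study plan.

    Assumptions for illustration (not from the video): days are counted
    from 1, a "month" is 30 days and a "quarter" is 90 days.
    """
    if day_number < 1:
        raise ValueError("day_number starts at 1")

    due = []
    if day_number > 1:
        # Level 1: whatever was read yesterday is revised today.
        due.append("yesterday's topics")
    if day_number % 7 == 0:
        # Level 2: every seventh day, revise the whole week.
        due.append("the whole week")
    if day_number % 30 == 0:
        # Level 3: at the end of each month, revise the whole month.
        due.append("the whole month")
    if day_number % 90 == 0:
        # Level 4: at the end of each quarter, revise the whole quarter.
        due.append("the whole quarter")
    return due


if __name__ == "__main__":
    for day in (1, 2, 7, 30, 90):
        print(day, revisions_due(day))
```

A day-wise study plan could call this for each day to show the revision work alongside the new topics for that day.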
Info
Channel: Arun Kumar
Views: 10,720
Keywords: college, career, azuredataengineer, arunkumar, forumde, doubt, collegestudents, mentor
Id: QGm2ENy9wIo
Length: 18min 54sec (1134 seconds)
Published: Sat Sep 16 2023