AZ-900 Episode 15 | Azure Big Data & Analytics Services | Synapse, HDInsight, Databricks

Reddit Comments

Episode 15 focuses on the services available for big data and analytics in Azure. Join me and learn what is considered big data and which services are typical for big data in Azure, such as Synapse Analytics, HDInsight, and Databricks. ☁🎥📺

📺 Video: https://youtu.be/JUQXx0R0RfE

🌐 Site: http://marczak.io/az-900/#ep15

💬 Practice Test: https://marczak.io/az-900/episode-15/practice-test/

Enjoy!

👍︎︎ 6 👤︎︎ u/AdamMarczakIO 📅︎︎ Sep 16 2020 🗫︎ replies
Captions
Hello guys, welcome back, it's Adam. In this episode we'll be focusing on what is considered big data and which services in Azure help us process and analyze those kinds of data sets. Stay tuned.

[Music]

As part of episode 15 we will learn about three Azure services. This time it's going to be Azure Synapse Analytics, HDInsight, and Databricks. But before we move to those services, let's backtrack a little bit and talk about what is considered big data.

Big data is a field of technology that helps us solve typical challenges around extraction, processing, and analysis of our data sets. But typically, in order to call something big data, certain characteristics have to be met. The first one is velocity. Velocity means how fast our data is arriving and how often we need to process it: are we processing that data in batches, or maybe in real time? The second characteristic is volume: are we talking about megabytes, gigabytes, terabytes, or even petabytes of data? And the third one is variety. Variety means how structured our data is: are we talking about tables or databases, or maybe something very complex like video or social media information?

Based on these three characteristics we can define whether data is considered to be big data or not. As soon as we go high on one of those vectors, one of those characteristics, traditional software will not be able to process these kinds of data sets. This is how big data technologies came to be: they are software specifically designed to help us with those kinds of challenges.

Which brings us to our first service, Azure Synapse Analytics. But in order to talk about Synapse Analytics and its benefits, we need to talk about what a typical process looks like when it comes to transformation and analysis of our data. Most data engineers will start their process by identifying where their data is, whether those are flat files, some web services, or databases. From there a typical development process starts, so developers will
first need to ingest their data from their sources to the cloud. Then they will need to transform those data sets and store them somewhere, and after storing the data, expose it to other tools like reporting tools, so that business users can take insights out of their data and make good business decisions.

Azure Synapse Analytics helps with all of those steps. First of all, by providing a feature called Synapse Pipelines. This tool helps developers ingest and transform their data using visual workflows. Additionally, Synapse Analytics comes with embedded Apache Spark, a leading technology for big data analytics and transformation. In addition there is Synapse SQL, massively parallel processing database clusters based on the popular SQL Server. This feature helps with transformation using typical SQL queries and with storing your data, but also with serving it to your reporting clients. All of that is baked into something called Synapse Studio, which is a unified experience to manage all of those tools and features and perform all of your data transformation in a single place. And all of that is nicely integrated with another Azure service called Data Lake, so that ingestion, transformation, and storage of our data can also be done directly on the data lake.

To summarize, Azure Synapse Analytics is first and foremost a big data analytics platform. It's a platform as a service offering in Azure, allowing users, data engineers, and data scientists to perform data analysis and data transformation over very large data sets. It has multiple tools baked in: Apache Spark for efficient big data transformations; Synapse SQL, which allows us to use familiar SQL Server tools with a massively parallel processing design and dedicated or [unclear] capacity; and Synapse Pipelines, which allow you to visually build your ingestion and data transformation workflows. All of that is combined into a single studio experience, a unified experience for your data transformation needs.

Next on our
list is Azure HDInsight. As we were talking about the typical development process, HDInsight can also support pretty much every stage of that process by providing so-called big data clusters. When it comes to HDInsight, there are many clusters available: Hadoop, Spark, Kafka, HBase, Hive, Machine Learning Services, Apache Storm, and many others. In general, the idea of the service is to provide you with open source big data technologies from the market and allow you to provision clusters, so that Microsoft manages those clusters and you just grab the technology that you need to perform the specific tasks that you need. Each of these tools serves a different purpose, but you can use them in combination to support an end-to-end development lifecycle for your application.

So Azure HDInsight is a flexible, multi-purpose big data platform in Azure. It's another platform as a service offering, allowing you to choose from multiple open source technologies on the market, whether this is Hadoop, Spark, Kafka, or many of the other available technologies.

And lastly, we have Azure Databricks. Azure Databricks is quite similar to HDInsight, except the clusters that we create are based on Apache Spark and Apache Spark alone. The main purpose of this service is to help you with data transformation at large scale, because Apache Spark is one of the leaders when it comes to performance and data transformations for big data. But besides the data transformation, the creators of Databricks also wanted to provide this as a collaboration platform for data engineers and data analysts, so that they have a single place where they can manage their clusters and collaborate on their data solutions.

In the Azure portal I will no longer start by creating services; we've seen Azure Marketplace enough. So instead I will go to the Azure Databricks service that I created previously. In here I can use a button to launch a workspace, which will take me away from the Azure portal into a separate portal designed for collaboration
on Azure Databricks solutions, a so-called workspace. The first thing that we need to do inside of the new workspace is create a new cluster. By opening the new cluster panel we can specify a cluster name; in my case this will be demo. If I want, I can tweak some options, like changing the cluster type or the runtime version. I can also tweak the auto-scaling features and auto-terminate options, which is amazing from the cost perspective. If I'm happy with all my selections, I can simply hit Create Cluster and just wait. Creation of the cluster takes about 4 to 5 minutes.

As the cluster has been created and is in the running state, we can start working. But notice how easy it was: I was just clicking a few buttons, and right now I have a big data technology cluster based on Apache Spark running in the cloud and ready to be used.

Now we can create some scripts by going to the workspace on the left-hand side, where I have either my personal workspace in the user section or a shared workspace where I can share and collaborate with other users. I'll go to my personal workspace, open my catalog, and create a new notebook. Notebooks are simple scripts in Azure Databricks. I can select a language like Python, Scala, SQL, or R. In my case I have a demo using Python, and I will call this demo notebook.

Inside of my notebook you will be able to notice that there are some small text blocks, and I can use those text blocks to write my scripts. For now I will copy-paste the script that I prepared previously; we don't need to focus on the details of the script. What the script does is connect to Open Datasets from Microsoft with some sample data, and it literally takes seven lines of code to do that. It's very simple and very straightforward. Once you pull in the data, you can use the familiar SQL language, in this case Spark SQL, to review your data and analyze it. In here you can also download this data as a CSV, change the chart type to, say, a bar chart, and do all
sorts of data transformation and analysis based on your needs.

To summarize, Databricks is a big data collaboration platform. It's another offering from Azure in the platform as a service category, but it's really about providing this unified workspace where users can manage their notebooks, clusters, and data, manage access for other users, and collaborate with them, so that users can focus on their data solutions rather than on the management of their big data platforms. It is based on Apache Spark, a leader when it comes to big data transformations on the market. It integrates very well with common Azure data services by having out-of-the-box connectors, so it's very easy to pull data out of Azure services and output data back after our transformations are done.

So let's summarize this episode. Today we learned about Azure Synapse Analytics, a modern end-to-end approach to data warehousing and analytics over big data sets. We also learned about HDInsight, a fully managed open source analytics service with a lot of supported frameworks and tools, tools that are currently marked as leaders when it comes to processing of big data sets. And lastly we learned about Azure Databricks, an Apache Spark based collaboration platform in the cloud, which very easily allows us to process big data sets by abstracting the difficult topics when it comes to big data platform management.

Materials and the cheat sheets are available under episode 15 on my website, so check them out. For this episode we're done. The next one is about AI, so definitely stay tuned for that. If you like my work, support the channel by subscribing, liking, and commenting, and see you in the next episode.
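
The ingest, transform, store, and serve stages described in the captions can be sketched roughly as follows. This is not the script shown in the video and it calls no Azure API: plain Python and the standard-library sqlite3 module stand in for Spark and Spark SQL, and the sample rows, the function names, and the trips table are all hypothetical, chosen only to make the four stages visible.

```python
import sqlite3

# Hypothetical sample records standing in for a source system
# (a flat file, a web service, or a database).
RAW_ROWS = [
    {"city": "Seattle", "trip_km": "3.5"},
    {"city": "Seattle", "trip_km": "5.5"},
    {"city": "Warsaw", "trip_km": "12.0"},
    {"city": "Warsaw", "trip_km": "bad-value"},  # malformed, dropped in transform()
]


def ingest():
    """Stage 1: pull raw records from the source."""
    return list(RAW_ROWS)


def transform(rows):
    """Stage 2: parse distances as numbers and skip rows that do not parse."""
    clean = []
    for row in rows:
        try:
            clean.append((row["city"], float(row["trip_km"])))
        except ValueError:
            continue
    return clean


def store(rows):
    """Stage 3: persist the cleaned rows (an in-memory SQLite table here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (city TEXT, trip_km REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    conn.commit()
    return conn


def serve(conn):
    """Stage 4: expose the stored data to a reporting client via a SQL query."""
    query = (
        "SELECT city, COUNT(*), SUM(trip_km) "
        "FROM trips GROUP BY city ORDER BY city"
    )
    return conn.execute(query).fetchall()


report = serve(store(transform(ingest())))
print(report)
```

In a real Databricks notebook the storage and query steps would run on the Spark cluster instead of SQLite; the point of the sketch is only the order of the stages and the use of ordinary SQL at the end.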
Info
Channel: Adam Marczak - Azure for Everyone
Views: 82,842
Keywords: AZ-900, Microsoft Azure, Microsoft Azure Fundamentals, Azure Fundamentals, Full Course, Certification, Exam, az 900, synapse, synapse analytics, hdinsight, databricks, big data
Id: JUQXx0R0RfE
Length: 10min 24sec (624 seconds)
Published: Wed Sep 16 2020