Introduction to Azure Data Services [Full Course]

All right, ready? Three, two, one. Hello, and welcome to this three-hour Azure workshop by Pragmatic Works. My name is Mitchell Pearson, and I'm excited to be with you for the next three hours, in which we're going to dive into the world of Azure: what is Azure, and specifically, Azure data services. So let's jump over to our PowerPoint presentation and start the show.

In this workshop we're going to talk a lot about the different Azure services that are available, but before we do that, let me talk just a little bit about myself. I am the training manager here at Pragmatic Works; I've been with Pragmatic Works for just over eight years, almost eight and a half now, and my role here is delivering the live training classes that we do, one-on-one mentoring with clients who call us up and need that kind of assistance, building new classes for our on-demand learning platform, and of course doing free events like this one. I've been very fortunate to author quite a few books in the industry, a couple of them around Power BI and the Power Platform and one on an introduction to Microsoft SQL Server. I blog at mitchellpearson.com, so if you haven't checked that out, please feel free to connect with me there. I also do a lot of YouTube videos, shorter videos that are not three hours in length, about 10 to 20 minutes, with a lot of them on Azure Data Factory; if you want to follow me there, check me out at youtube.com/c/MitchellPearson, that's the short link. And whenever I'm not doing Power BI and Azure and all things business intelligence, I enjoy playing tabletop games with my friends and my family.

So let's jump right in and take a look at the agenda for today. We're going to be taking a look at what Azure is; before we can really dive in, we need to have a little discussion around what Azure is, how it helps me, why I want it, and why I use it. We're also going to dive specifically into Azure Storage; Azure SQL, and what my different options for Azure SQL are; and then Data Factory. Data Factory is going to be a big piece of what we're talking about today, especially as we get into the second half of the day. And then, if time permits, and I hope we have time, we're going to take a look at two other features I really want to highlight: Azure Key Vault, to keep your data secure and protected, and Azure Synapse Analytics, which just hit general availability in the last couple of weeks and is going to be a game changer for Azure, so I want to give a real quick preview of that as well if we have a moment.

Logistically speaking, there are a couple of things I want to mention. There will be a recording of this after we're done: we're streaming live right now, and as soon as we finish, the recording will be posted to the Pragmatic Works YouTube channel. So if you miss anything, if you have to step away, or if you can't stay for the entire three hours, I totally understand; it will be available for your viewing pleasure afterwards. We will also have a break of around 10 to 15 minutes at about the 12:25 Eastern time mark, so if you're holding off to grab something to drink or run to the bathroom, just hold on a little bit longer; we will have a break in there.
And then, questions: we were looking at the attendees, and we have people who have joined us from all over the world, and Azure is such a huge platform that we know there are going to be a lot of questions as we go through this. We'll try to answer those questions when it makes the most sense, to keep with the flow of this workshop, which will generally be after that first break, and then we'll try to do a follow-up blog as well to answer the most commonly asked questions.

All right, so obviously one of the reasons we're doing this is that we here at Pragmatic Works have a huge training platform right now, with over 60 classes on Power BI, Power Apps, Azure, and SQL Server. We're also fortunate to be having a huge sale right now, 65 percent off all of our on-demand learning classes, and Brian, my co-worker, is going to put that link in the chat window for you, so if you want to sign up right now and take advantage of that offer, please go ahead. This is recorded, so there will be some people watching this in January, February, or March of next year, and unfortunately that code will not be available then; it's something we're doing right now, so keep that in mind. Just to give you a quick glance at some of the classes we have online: we have classes on Azure SQL Database, an intro and an advanced class on Azure Data Factory, a course on Azure Logic Apps, data science with Azure ML, Analysis Services in Azure, and Azure Databricks. So we have about nine or ten classes right now that are focused only on Azure and that go very closely with the class you're seeing here today.

One last thing before we jump in: we have another one of these events coming at you in February, where we're going to be talking about Power Apps. I don't get to work with Power Apps as much as I would like, but it's probably one of my favorite tools in the Power Platform, because what you can do with Power Apps is so cool, so you definitely do not want to miss that presentation. It's coming on February 11th, and we have a sign-up link for it at pragmaticworks.com.

All right, that gets us through all of the introductions for today. We're going to jump right in and start talking about Microsoft Azure. So what is Microsoft Azure? We get that question a lot. In a quick snapshot, Microsoft Azure is a cloud computing platform plus infrastructure for building, testing, deploying, and managing your applications, your services, and your data. Historically speaking, and I say that even though I haven't been in this industry 30 years, if I go back even 10 years, all the clients I was working with were doing everything on-prem. We had a server on-prem; we had database administrators, network professionals, and infrastructure professionals who were responsible for buying the hardware, installing it, patching the operating system, installing all the software, and keeping everything running smoothly. But, and 2008 is when I think Azure was introduced, in the last six or seven years we've seen a huge transition to the cloud. What the cloud gives you is the ability to really focus on development: building your applications, building your network, building your framework around your data,
while you allow Microsoft professionals to manage your hardware, manage the infrastructure, keep your data secure, and keep backups of that data as well. Azure gives us a lot of flexibility there, and we store that data in Microsoft data centers that are all over the world; Microsoft has data centers internationally, and there has been tremendous growth in this area, so Azure is something you hear more and more about in everyday conversation now.

Azure is pretty overwhelming: when you look at Microsoft Azure, there are over 22 different categories and hundreds of services to choose from, so if you're jumping in for the very first time, it's a little intimidating. But one of the things I found, coming from an on-prem SQL BI background, is that Azure is actually easier in a lot of ways. Once you get into it, start provisioning resources, and see how quickly you can spin up an Azure SQL database or a virtual machine, you start to realize: wait a minute, this is pretty awesome; there's a lot we can do here. What we're going to do today is not focus on hundreds of services or 22 service categories; we're going to focus on just Azure data services.

Another thing I want to mention: if you're brand new to Azure and have never experienced it before, you can sign up for a free trial account, and you get a lot with that. You can test and deploy enterprise apps for free, create custom mobile experiences, and work through data analytics to gain insight from your data. It comes with a $200 credit from Microsoft for 30 days, during which you can go through and look at all of those different services. You also get 12 months of popular free services, and there are 25 services that are always free and available for you to explore and have fun with. I'll show you where that link is, and we also have a link we can drop in the chat window, so feel free to sign up. Today is not one of those classes where you'll follow along with the demos we do, but you can definitely sign up for that free trial, explore those options, and come back and watch this video and maybe duplicate some of the work we do here.

So let's start focusing on a more specific segment of Azure: Azure data services. Before we jump into Azure, let's talk a little bit about on-prem environments. When you were managing an on-prem environment for your applications, your data, and your services, you were responsible for everything as an organization. You had to go out and procure the hardware: you had to buy the disk space and the CPU, install all of it, and put it all together. You had to set up the network, and any virtualization you wanted needed to be done by you. You had to install the operating system and keep it up to date, so any time there was a new patch or update, you or somebody on your team had to take care of it. You then had to install all of your software and keep it on the latest version, including your apps and how you managed all of your data. There's a lot of work that goes into an on-prem environment,
and unfortunately, just being honest with you, as a consultant going to companies literally all over the world, I would see over and over again that a lot of that work did not get taken care of the way it was supposed to. But what if we could take that administration work, the hardware, and the infrastructure, hand it off to somebody else, and focus on development: developing our apps and developing our data? Well, that's what we get when we start to transition over to Azure.

The next thing we have here is what's called infrastructure as a service. Infrastructure as a service is where we give a little bit of that control over to Microsoft and their professionals. We say: look, you manage the hardware, you manage the infrastructure, the storage, the servers, you do the virtualization for me; and my team will take over the operating system, keep it patched, install the software we want, and handle the apps and the data. That's the essence of infrastructure as a service, and when you're going to Azure, most of the time you're really choosing between two things: infrastructure as a service or platform as a service. Infrastructure as a service is, for a lot of companies, their first attempt at testing the water with the cloud platform and giving over some of that control.

But there's another level to this, and that next level is what's called platform as a service. Platform as a service, which you see in blue here, is where the vendor, in this case Microsoft Azure, manages a lot more for you. Now they're not only managing the hardware, the infrastructure, and the virtualization; they're also keeping your operating system patched and up to date, and they manage your software and keep all of that current for you. You're focused on just app development and data development. And if your database crashes, guess what: they have a restore of the database ready to go, and you can restore it instantly. If a data center goes down, you can have it set up so that you immediately fail over to another data center. You don't have to worry about whether that's being done, because Microsoft guarantees it in the service level agreement contracts they sign with you.

The last thing we have here is software as a service, and software as a service is kind of like Office 365 or Dynamics 365.
That's your software as a service: they pretty much do everything for you, and you just write and read data from those platforms. So what we're going to look at, as we dive into the data services in Azure, are the infrastructure as a service and platform as a service solutions we have available.

Now, in this diagram I've created for you, you'll see on the left-hand side all of the different sources you can potentially connect to. You'll have sources coming from an on-prem environment and data sources coming from a cloud environment, and then you'll have structured data and unstructured data, all these different data types. The problem a lot of companies face today is that they need a tool that allows them to connect to these hybrid data sources, on-prem and in the cloud, and to all of these different types of data sources, ingest that data, clean it up, apply business rules and transformations to it, and load it into a kind of curated data store. So how do we do that today? A lot of us come from that SSIS enterprise BI background, and SSIS is no longer really built for this, because SSIS doesn't have a lot of cloud ETL capabilities; it can't connect to all of the new data sources and services coming out in the cloud. The solution, as you transition to Azure, is most of the time going to be Azure Data Factory. If your background is SSIS, Azure Data Factory is almost a clone of it in Azure: you have a nice graphical user interface experience, you can drag and drop, and you can perform those different activities.

We also have another great tool for extracting and transforming data in something called Azure Databricks; that's another one of the classes we have, by the way, on our on-demand platform. Azure Databricks gives you a lot of flexibility and functionality: you can write in R, Python, Spark SQL, or Scala. But doing those transformations in Databricks requires a lot of code, so the people who will generally be using Azure Databricks are your code-heavy data engineers and your data scientists, because they're already familiar with it and that's what they're used to. So some people will gravitate to Azure Data Factory and some to Azure Databricks, depending on their experience. And then we have Azure Synapse Analytics. Azure Synapse Analytics is very interesting; I want to get to a quick preview of it at the very end, but it's essentially going to be a combination of both of those ETL options we currently have available.

We got an interesting comment from one of our attendees today: "As a traditional DBA, this handover to Microsoft was very scary to me. I want to control my server, but after two years I realized that I have more time to focus on improving stuff." That's from Peter Smith, and I know Peter, so it's good to see you on the call today. I'm going to talk about that as well; it's a really big point: what do I do now as a DBA, now that you've taken away all of my responsibilities? You focus on performance and on improving app development.

All right, and then once we've curated that data, cleaned it up, and applied business rules and transformations, what do we do?
Well, we're going to load it into some storage layer, ideally in the cloud. We can dump it into an Azure Data Lake or a blob storage account, or into an Azure SQL database; or maybe we're doing big data analytics and we need massive parallel processing, in which case we'll take a look at SQL pools, formerly known as Azure SQL Data Warehouse. We also have the option of Azure Cosmos DB, which is the non-relational way of storing data. And of course there are lots of other ways to store data in Azure, but these are the main players you see when you're moving there. Then, if you want to report off that data, how do you connect to it? Well, we can just use Power BI; Power BI integrates easily here because these services are all built on the backbone of Azure. And up in the top right corner you'll see I also have an icon for Azure Key Vault; once again, that's a great way to keep your secrets, your passwords, and your information secure as you work in Azure.

All right, so those are the different data services we have available. Now that we've done a quick overview, we're going to jump in and start talking about Azure Storage. There are a couple of different storage options available in Azure: specifically blob storage, or general purpose, and then Azure Data Lake; we'll get to that in just a moment.

When you're working in Azure, we take all of the different resources we create and put them into something called a resource group, a logical container. Resource groups are a great way of taking all of the resources you're creating for a project, or a department, or something along those lines, and storing them together. A resource group is very similar to a directory location on your laptop, and the benefit is that you can deploy, update, and delete all of those resources as a group. In fact, in our advanced Azure Data Factory course we get into continuous integration and continuous deployment with Azure Data Factory: how do we build an enterprise-level deployment where we deploy from one environment to another? We do that in part through resource groups, by updating existing resource groups. So resource groups are a big part of Azure: every resource you provision will be associated with a resource group, and the resource group stores the metadata about the resources you're creating: what type of resource it is, what its name is, what resources are allocated to it, and so on. And after this class is over today, I'm going to go to the resource group we create and delete everything in it as one operation, so I can make sure everything is removed at one time. So those are resource groups.

Now, when we start talking about Azure Storage and the options available: Azure Storage, whether the general purpose account or the data lake, is kind of our cheapest storage option in Azure. And historically speaking, there's always been a lot of debate: should I have a general purpose account or a data lake?
But as of Data Lake Gen2, a lot of that debate has gone away, because what we recommend to people is: if you're going to create a storage account in Azure, create it as an Azure Data Lake Gen2, unless you need a couple of those features that are available in a general purpose account but don't exist in the data lake. There are a couple of them, and we'll get to those in just a little bit.

Let's talk a little about parity, the feature-functionality similarities between a general purpose account and an Azure Data Lake. Historically, if you go back to Gen1, the Gen1 data lake did not have hot and cold storage tiers, and you also didn't have the ability to have really redundant storage, so you were missing some key elements. But some of the huge benefits of the data lake are that you have Azure Active Directory integration and it's Hadoop file system compatible. The biggest thing around Azure Active Directory is that with the data lake you can apply security at the file level and at the folder level, whereas with general purpose accounts you cannot, so there's a lot more flexibility around security with Azure Data Lake than you get with general purpose accounts. Once again, just to reiterate: if you're starting from scratch, building a proof of concept, we recommend you start with the data lake unless you specifically need a general purpose account.

When we look at storage security, we know Azure Data Lake uses Azure Active Directory and has that integration. A blob storage account uses an access key, and that access key is a little bit dangerous: if somebody gets the name of your storage account and that access key, they can essentially get full access to your account. The access key should be regenerated every 15 or 30 days on a schedule, but it's something to keep in mind; with the Gen2 accounts, we can instead take advantage of the login ID and Azure Active Directory.

There are some compliance features we get with a general purpose account that we don't get with the data lake. We won't dive super deep into those today, but I want to mention them. One compliance feature we see a lot in finance is what's called WORM storage: write once, read many. Once the data has been written, it cannot be modified or changed; it is immutable. That's a big thing for companies that have to meet certain regulatory rules. Azure blob storage offers WORM storage; Azure Data Lake does not. Azure blob storage also offers soft delete, so if you accidentally delete a file through some erroneous process or some malicious activity, you can restore it; Azure Data Lake currently doesn't offer that functionality. So if you need those types of features, you might choose Azure blob storage, the general purpose account, over an Azure Data Lake; but in most other scenarios you're going to go with the data lake.

A little more about Azure Data Lake: it's designed specifically to store massive amounts of data for big data analytics. Once you store the data in your Azure Data Lake, you can retrieve it in a number of different ways: the compute you use to do analytics on it can be HDInsight, Hadoop, Cloudera, or Databricks, or you can use Azure SQL Data Warehouse, which is now known as SQL pools within Azure Synapse.
The compute you use to connect to that data and return analytics can vary, but the lake itself is designed specifically for big data analytics. Key features here: unlimited scalability; compatibility with Hadoop; support for access control lists, once again giving us that file structure and the ability to apply permissions at that level; an optimized driver (ABFS), designed once again for big data analytics; and of course redundant storage, so you can have zone-redundant and geo-redundant storage, and if one data center goes down you immediately fail over to another, your data is there, and your application is back up and running in no time. So those are the key features of Azure Data Lake.

There's a really awesome tool you might be interested in as well: Azure Storage Explorer. Azure Storage Explorer is an application you can open on your Windows machine, macOS, or Linux device, and it works with both general purpose accounts and Azure Data Lake. Once you open it, you can connect to your Azure account and see all of your different storage resources, whether general purpose or data lake, and you can interact with them: upload files, upload folders, change permissions, and do some security work. A lot of what you would do in the Azure service you can also do in Azure Storage Explorer. It's a great tool, I use it all the time, and I would highly recommend downloading it and taking a look.

All right, so what we're going to do now is jump over and take a look at the Azure portal: a quick introduction in case you've never seen it before, covering the navigation and the ease of navigation. There's a lot there: when you open the Azure portal for the first time, it can, without a doubt, be very intimidating. So we'll open it up, explore the features, and then create our first couple of resources: a resource group, and then an Azure storage account, specifically a data lake. All right, here we go.

I've already opened a web browser here, but I'm going to open a new one and go to azure.com. When I go to azure.com, it redirects us to en-us, English US, and now I'm sitting at azure.microsoft.com. There is a lot of information here, a wealth of knowledge that Microsoft has put out. One of my favorite sections on this entire website is the documentation page. It gives you a lot of great information about all of the different services: yes, there are hundreds of services in Azure, but Microsoft has documentation on all of them, and it gives you the fundamentals of what each service does and how to work with it.

The other thing I'll show you here is where you would go to sign up for a trial account. All the way over on the right you'll see the option in green text that says "create a free account"; that gives you $200 in Azure credit for 30 days, with 25-plus free services that are always available to you. So if you haven't done this yet, if your company hasn't quite made that migration, it's a great way of getting in there, exploring, and starting to learn Azure for yourself.
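By the way, everything we're about to click through in the portal can also be reached from code. As a minimal sketch, assuming you've installed the Az PowerShell module (the subscription name below is hypothetical), signing in and picking a subscription looks something like this:

    # Install the Az module once per machine (PowerShell 5.1+ or PowerShell 7)
    Install-Module -Name Az -Scope CurrentUser

    # Sign in interactively; this opens the same Azure AD login you use for the portal
    Connect-AzAccount

    # List the subscriptions your account can see, then pick the one to work in
    Get-AzSubscription
    Set-AzContext -Subscription "My Azure Sponsorship"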
Once you've created your account, looked through the documentation, and have an idea of what you want to do, what's next? We sign in to the Azure portal. Normally you would see a sign-in option right here, but I'm already signed in, so I'll just click on the portal, and that takes me directly into the Azure portal where I can start creating resources. I'll click right there at the top, that opens my Azure account, it has me choose which subscription I want to use, and this is what I see when I first log in. This right here is my home page; you can choose in your settings whether you land on your home page or on a dashboard.

Let's walk through the navigation real quick and get you familiar with it. Over here on the left-hand side we have our navigation pane; I can collapse and expand it to get it out of the way, and in the settings, in just a moment, you'll see there's some other stuff you can do with it. You will spend a lot of time in the first section, right here at the top. For example, if you wanted to create a brand new resource in Azure, a storage account, a SQL database, or a data factory so you could start building pipelines and moving data, you would go to "Create a resource", search for the resource you want, and walk through the steps; we're going to create quite a few resources in this workshop today. You can also go to your home screen, and as I mentioned, I'm currently on mine; it shows me some common Azure services and my recent resources.

Down here at the bottom you have your favorites section. My favorites will be different from yours, and yours will be different from everybody else's, because what you do in Azure is take the services you work with the most and add them to that favorites pane. In fact, if I hover over "Resource groups" right here and freeze the screen, you'll notice I can click that little star icon, and that adds it to, or removes it from, my favorites. You can also do a little drag and drop, taking these items and moving them around; if I want resource groups to fall a little lower on that list, I can just drag it up or down. All right, now it's trying to move everything at one time; there we go. So that's how you work with favorites, and you'll get very familiar with it when you're working in Azure.

The other thing I want to show you is some basic settings across the top. This tells me immediately which account I'm logged into; as you work with Azure it's very common to have multiple subscriptions and multiple logins, so you might be in the wrong account, but that's the account I'm currently logged in with. If you want to provide feedback to the Microsoft team and tell them how they're doing, you can do that right here: are you satisfied? If you are, you can tell them, give them some kudos, and pat them on the back; or you can tell them if something isn't quite working the way you would expect.
You'll also notice, right here under the question mark, there's a lot of support you can look at: what's new in Azure, Azure updates, a guided tour you can launch that walks you through the interface, and keyboard shortcuts. The help and support section up here at the top takes you to all of that documentation we talked about and also gives you the opportunity to submit a support ticket to Microsoft.

The other thing I want to dive into is the portal settings. Your experience might look a little different from mine; that might be because you've changed your settings, or maybe I've changed mine from the default. Under settings, you'll notice you can set how long you're active before you get signed out. You can also set your default view: do you land on a home page, like mine here, or on a specific dashboard you've created? One of the things you can do in Azure is create dashboards; we won't get into them today, but on those dashboards you can surface things like service health, what resources something is using, and whether it's running or paused, and that can be a big deal as far as saving you money. If you've created a dashboard you want to be the first thing you see every time you log into Azure, you can select it right there. You'll also notice that the navigation pane I showed you a moment ago, the portal menu: mine is docked, but you can change it to a flyout, and when you change it to a flyout, which from experience is what a lot of people's looks like, you can completely hide that menu; you come up here, click what you want, and whenever you click away it disappears, giving you more real estate to work with. That's totally up to you; I personally prefer leaving mine docked, so it's always available when I want it and I can minimize it when I want it out of the way. And the final thing here, which is really cool: you can change your theme when you're working in Azure. I can switch to a dark theme if I want, you can choose different themes, and you can change the contrast level. So there are a lot of options available right here in the settings of Azure.

You'll also have your notifications pane right here; that's the last thing I'll mention. When you're provisioning resources, it tells you whether they're still provisioning or have completed, plus any other notifications that pop up as you work in Azure; they show up right there in that section.

All right, so we've gone through an introduction to Azure and walked through the portal; most of you are probably familiar with that, and some of you are probably brand new. What now? I'm going to go ahead and create a couple of resources. Once again, do not feel obligated by any means to follow along; we have a lot of demos today, and there's no way I can fit everything in and have everybody follow along,
but I'm going to create a resource group, then we'll create a data lake account, upload a file to it just to see what that looks like, and take a look at it with Storage Explorer. That's our next section; let's jump right in.

Whenever I want to create a resource in Azure, let me point out that you can create it in a number of different ways: through PowerShell, through ARM templates, which are automated Azure resource scripts, or right here in the UI. For simplicity's sake, we'll do everything today through the UI, but keep in mind there are ways to automate this and set it up through code.

So, up in the very top left, in that navigation pane, I'll click on "Create a resource", and that brings us to the marketplace. Here we can see those 22 different categories and those hundreds of services available in Azure; once again, it can be pretty overwhelming. I'm going to search for "resource group" (this is me assuming we haven't yet created a resource group for this project or this webinar), anything matching that search criteria shows up, and what we want is this "Resource group" option right here. I could hit the like button on it if I wanted, and then we hit create. This view you're seeing is the view you'll get pretty much every time you create a resource through the UI in the Azure portal: it tells you what resource you're creating, you have a create button, and down at the bottom it gives you an overview of the service, the plans, and usage information and support. Very helpful stuff, but we're going to dive right in, click create, and that brings us to the resource creation page.

Now, this is where it can get intimidating, because there are a lot of options on the screen; however, you don't have to worry about all of them when you're first getting started with Azure. Most of the time you can give it a couple of parameters, a couple of properties, and you're good to go, and that's exactly what we'll do here: give our resource group a name, choose the location where it will be stored, and that's it. For the name, and I've already got a backup set up, so I'll give this a slight variation from what I created earlier in case we ran out of time or had internet connectivity issues: I'm going to create one called azure workshop, and since this is our Learn with the Nerds event, I'll add that on the end, and then I'll put my initials at the end. Then I need to choose my region. Choosing the region and location is actually very important, for latency, productivity, and performance, but also for cost; we'll talk more about cost as we dive into this. I'm going to grab East US 2 right there, and that's it.

Once you've done that, down at the bottom I can go to "Review + create" and create the resource. You'll notice a couple of other categories across the top, like tags. Real quick: tags are a way of helping to understand pricing and billing, what project a resource was associated with, who the owner was, who purchased it. They're very helpful, but you don't need them right out of the gate, so we won't worry about them today; we'll go down and do review and create. As this goes through the creation process, one thing worth mentioning: I want to keep all of my resources in the same region, ideally, to reduce cost, because if you move data from one region to another there's an egress charge for moving that data. So when you're creating your storage accounts, your databases, and your data factory, and you're moving data, keep it all in the same region and the same data center if you can.

Next: we should have that resource group created, and you can see that it is, right here. From there I'll go over to my resource groups. How do we get to that resource group? A quick way is to click "Go to resource group" right here in your notifications, but I can also go over to the far left, and I want to teach you this navigation: I'll just click on "Resource groups" in my favorites section, that opens our resource groups, and I'll find the one we just created, azure workshop learn with the nerds plus my initials, and open it.

Now, I won't go through this for every resource we create, but you'll notice that every resource looks essentially the same when you open it. What just happened is Microsoft opened something called a blade: a vertical pane, and they open up horizontally across your screen, so when I click on that resource group, we get this vertical pane; that's what's called a blade in Azure. If you keep clicking into new items, it keeps opening these vertical panes, which can get a little frustrating to navigate when you're brand new, because it's a new experience. Down across the bottom I can grab the scroll bar, if I have one, and move back across those blades, or up in the top left you can close a blade and go back to the previous one.

The other thing I want to point out: we are currently looking at the resource called azure workshop learning with the nerds MP, and just below the name it tells you what kind of resource you're in; this is a resource group. Directly below that are all of the administrative properties I can work with: things like resource costs, events, access control (who has control of this), deployment history, properties, all of that. Different resources will have different properties you can control, but you don't need to know all of this when you first get started with Azure.
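To tie that back to the PowerShell option I mentioned a moment ago: here's a minimal Az PowerShell sketch of the same two operations, creating the resource group, and, at the end of the day, deleting it and everything in it as one operation. The name is a flattened, hypothetical version of the demo name:

    # Create the logical container for everything we provision in this workshop
    New-AzResourceGroup `
        -Name "azureworkshoplearnwiththenerdsmp" `
        -Location "eastus2"

    # After the class: remove the group and every resource inside it in one shot
    Remove-AzResourceGroup -Name "azureworkshoplearnwiththenerdsmp" -Force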
All right. We've created a resource group; that's just a logical container. I now want to go ahead and create an Azure Data Lake, so let's do that. I'll expand our navigation pane on the left, click on "Create a resource", and search for, not "azure data lake", this is tricky, but "storage account". The way you create a general purpose storage account or an Azure Data Lake in Azure is through the same service, so we search for "storage account", it looks just like this, and the storage account we're about to create will become either a general purpose account or a data lake based on one button that we click.

Now I'll click create. The screen right there looked identical to the one for the resource group, and now we have all of these options across the top; once again, very intimidating when you're brand new, but you don't need to set all of that up, especially not from the very beginning. Most of those properties across the top can be configured after the fact, once the resource is provisioned.

The first thing we want to do is select our resource group, and we'll use the one we just created, azure workshop learn with the nerds plus my initials, so we click on that. Then we need to give this a storage account name, and there are a couple of requirements here: one, it must be all lowercase characters; two, it has to be globally unique across all of Azure, because when you create a storage account, you connect to it through the name itself. For the storage account here we'll do azure workshop learn with the nerds, and I'll put my initials on the end of this one as well. Once again, I want the location of this resource to match my resource group and my other resources, so I'm going to stay with East US. Down at the bottom you'll notice we're currently creating a general purpose v2 account; this is not where we choose between general purpose and data lake, this is just where we say we want v2. I'll show you where we change this to a data lake in just a moment.

The next thing we have to decide is what type of redundant storage we want. The cheapest option, and the one that's selected by default, is what's called locally redundant storage. The way Microsoft works is that your data lake or storage account keeps three copies of your data at any given time, so if a data rack goes down in the data center, you have at least two other copies of your data on other racks. This is one of the huge benefits of Azure: that fail-safe is built in, whereas if you were on-prem and your server crashed, what happens? Your server's down, and you've got to build a new server, provision the hardware, and install everything, or get the existing one fixed. With Azure, you're back up and running in seconds. Now, if that entire data center goes down, say somebody doing construction cuts the lines to it, then with locally redundant storage you're not going to have any data, because that's redundancy within the one data center. As you'll see when you click the dropdown, there are lots of other options: you can have redundancy within a geographical zone, more than 60 miles apart; you can have geographical redundancy; and there are more options as you move up. Additional redundancy does carry additional cost, so that's something to consider as well. For this example I'm going with the cheapest option possible: locally redundant storage.

All right, next, across the top we can go into networking, where you can control access through IP addresses. You have data protection; under data protection you can do things like soft delete for blobs, so if somebody deletes a file accidentally you can recover it; versioning for your blobs, so you can roll back to a previous version; and point-in-time restore for your container, so if a folder gets deleted you can restore the whole folder, or roll back to a previous version if bad data gets written to it. We're not going to touch any of that for now. What we are going to do is go to the advanced tab, where there's one thing I want to do. There are lots of options in here, but we'll zoom in, and right here where you see "hierarchical namespace", I'm going to enable that. This is the one button you click in Azure, when you're creating your storage account, to signify that it's going to be a data lake, and you must select it at creation time; you cannot go back and modify it after the fact. Once you create this as a plain storage account or as a data lake, that's what it is. So I'm going to turn this on to turn it into a data lake.

With that done, I'll go back to the previous page, and you'll notice that a lot of those options under data protection are now disabled. This is one of those things about Azure Data Lake: you lose point-in-time restore, you lose the ability to do soft deletes, you lose versioning for blobs, and you also lose that other thing we talked about in the slides, WORM storage, for those financial companies with regulatory requirements where once you write the data it cannot be touched. So there are those limitations with Azure Data Lake, and if you need those options, you have to go back and use a blob storage account. Now, here's the thing: if I look at my Azure subscription and all the projects I've created, I have lots of blob storage accounts and lots of data lakes; you're not tied to one data lake or one blob storage account. If you have some data that requires regulatory compliance with WORM storage, or you're worried about an application accidentally deleting files and you want that soft delete functionality, then for that project, for that application, maybe you go with a blob storage account; but for everything else you go with your data lake, which gives you that heightened level of security.
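For reference, here's what that same provisioning step might look like scripted, a sketch reusing the same hypothetical names; the -EnableHierarchicalNamespace switch plays the role of that one hierarchical namespace button:

    # Create an ADLS Gen2 account: kind StorageV2 with hierarchical namespace enabled.
    # The account name must be all lowercase and globally unique across Azure.
    # Standard_LRS = locally redundant storage, the cheapest redundancy tier.
    # The hierarchical namespace flag can only be set at creation time, like the portal button.
    New-AzStorageAccount `
        -ResourceGroupName "azureworkshoplearnwiththenerdsmp" `
        -Name "azureworkshoplwtnmp" `
        -Location "eastus2" `
        -SkuName "Standard_LRS" `
        -Kind "StorageV2" `
        -EnableHierarchicalNamespace $true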
So you're not tied to one or the other; you can always have a hybrid of different services in Azure, and that's a great part of the flexibility. We could go into the advanced tab and talk about all those other features, but once again, you don't need to know all of that for this three-hour webinar, and you don't even need it to get started with Azure; you can just jump in. Under tags, once again, I could come in here and type something like project: azure workshop, and then when I get my billing information, this resource is tagged and I can see exactly what my costs are; it's definitely a best practice to add tags to all your resources, we're just not worried about it today. Once we've gone through all of that, we click "Review + create", it brings us here to review our resource, and once everything is set up the way you want, we click create.

Now, here's the thing: let's talk a little bit about price so far. The resource group we created cost us zero dollars, no money. The storage account we just created isn't costing us anything right now either: the way you get billed for storage accounts is that you're billed for writing data to the account and reading data from it. So if I build the storage account and don't put anything in it, or I only work with it a small amount of the time with a small number of files, it's going to be pennies on the dollar, a very minimal cost, and you can monitor that cost over time inside Azure. So far I haven't created anything that's actually costing me money.

The storage account has now been created; we got a pop-up notification telling me right on the screen that I can go directly to the resource. I could, but I want to show you the navigation. So I'll go to "Resource groups" on the left, right there in my favorites, click on it, find the resource group we created in this class, azure workshop learn with the nerds, Mitchell Pearson, and click on that. As I create resources and associate them with this resource group, they will all show up right here on this screen. Now let's take a look at what this storage account looks like. As I click on it, Azure opens a new vertical blade to the right, and in here, once again, we get all of these different properties we can look at; these are the properties for the storage account. Down across the bottom is the horizontal bar; I can go back to any of the previous blades, so this takes me back to my resource group and this takes me back to all of my resources. It's a little tricky to get used to at first, but definitely important.

Now I'll go over here and click on containers. Any time you're uploading files to or removing files from your Azure storage account, you have to put them in a container, so we need to create one. In this Azure Data Lake I'll create a new container and simply give it a name. Let's find the mouse there... I think I lost it; I see it on the screen, but it's not on my screen. All right, I'll do it like this; here we go; that's weird. We'll zoom in and give this container a name: I'm just going to call this one employees. It does have to be all lowercase; it's really just a directory location we're building. I'll leave it with private access, and, there's my mouse, it came back, I'll click create. (It was on the recording stream but not on my screen, which was a little confusing.)

Once you create that container, we can upload files, download files, and work with it. A lot of the time you'll do this through an automated process, through an application you've developed, or from Azure Data Factory; for this demo I just want to show you some quick interaction. We're going to take a file and upload it to the container, just so you can say: I've never worked with Azure before, is it really that easy? How do I get data into Azure? Well, we've just created a storage account in a couple of minutes. We click on employees, and inside that folder I can click right at the top and upload files directly to this storage account. So I'll click upload, a new pane opens on the right, I click the little browse button, and I'll go to my C drive, where under the azure workshop folder I have a file called employees. I'll grab the employee file and upload it to my storage account: click open, click upload, and there we go; instantly that file exists in Azure, stored in a Microsoft data center.

So that right there is our employee file, and remember that any files we upload to our storage account have to go into a container. Think about containers the way you think about your on-prem environment when you open File Explorer: they're like the directory location, or folder, where you store your files. The interesting thing about containers is that you can set access policies and security at the container level, so one group of developers or users has access to one container but not another; they're great for logically grouping files, but also for managing security.

Now we have a file: how do we take a look at it and make sure it uploaded correctly? It's now stored in a Microsoft data center in the East US region. I can click on that file, which opens a new blade, and up across the top I'll click on edit, and I can look at the file and edit it live right here. So I click edit, and we see a very simple file, with text qualifiers and the pipe delimiter separating the first name column from the last name column. That is a file in Azure. Now, the other thing I want to show you real quick is that there are other ways to interact with this.
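As with everything else, the container and the upload can be scripted too. Here's a minimal sketch with the same hypothetical names (the local file path and extension are assumptions as well), using the Az.Storage cmdlets:

    # Grab the account's storage context so the cmdlets know which account to talk to
    $account = Get-AzStorageAccount `
        -ResourceGroupName "azureworkshoplearnwiththenerdsmp" `
        -Name "azureworkshoplwtnmp"
    $ctx = $account.Context

    # Create the container (name must be all lowercase); Off = private access
    New-AzStorageContainer -Name "employees" -Permission Off -Context $ctx

    # Upload the local employees file into that container
    Set-AzStorageBlobContent `
        -File "C:\azure workshop\employees.txt" `
        -Container "employees" `
        -Blob "employees.txt" `
        -Context $ctx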
Maybe you're thinking: look, I want to upload files, but I don't want to know anything about Azure; Azure is a lot, there's a lot of navigation in there, is there an easier way to do this? Well, yes, there is. There's an application you can use called Azure Storage Explorer, and it's great for uploading and downloading files, and also for setting up things like security and properties on your storage account. If I go back over here, I've already opened Azure Storage Explorer on my machine. On the left, I'm currently in the Explorer view, which lets me go through and look at all of my different storage accounts, but first you need to sign in to an Azure account. You can see right here that I'm currently signed in as mitchelljacks2003@outlook.com, that's our Azure sponsorship, and I'm currently displaying storage accounts from only that subscription. When I go back to the Explorer, I can expand this option and see all of my storage accounts, and you'll notice I have my ADLS Gen2 account, that's my Azure Data Lake Gen2, my regular general purpose storage accounts, and also some very plain, ordinary blob storage accounts.

So let's find the storage account we just created; it was azure workshop with my initials on the end. Let's refresh everything here... ah, I didn't refresh, that was my problem; it already had me logged in. We'll find our storage account under my Microsoft sponsorship, right here, there it is: azure workshop learn with the nerds MP. I'll expand that storage account, expand the blob containers, and there's the employees container and all of our employee files; we see the file right there. Now, if you want to upload files here without ever logging into the Azure portal, you can do that: I can upload an entire folder with thousands of files if I want, or I can hand-select, just like we did in the portal, exactly which files to upload. You can also grab a folder and download it. And there's a ton of other things you can do in Storage Explorer; we could spend an entire three hours on it, because it has really great capabilities. From a usability perspective, once your Azure Data Lake or storage account has been created, you can do all of your interaction with it without ever logging into the Azure portal, because you can upload, download, and do everything right here. Ideally, we're going to automate all of that through Data Factory and through line-of-business applications that write the files and do that work, but if you need to do some manual operations, you can do them right here.

All right, so in this section, what we did: we created a resource group, created a storage account, uploaded a file to that storage account, made sure it worked exactly the way we expected, and took a quick glance at Microsoft Azure Storage Explorer. That gets us started, but we want to get into some more exciting stuff, so we'll move into a section on Azure SQL and the different options available for it; then, when we come back from the break, we're going to dive into the world of Data Factory
That gets us started with storage, but we want to get into some more exciting stuff, so we're moving into a section on Azure SQL and the different options available there; then, when we come back from the break, we'll dive into the world of Data Factory and how we start moving data between all of our different services, cleaning it up, ingesting it, and applying business transformations. So let's jump back over to the PowerPoint and take a look at Azure SQL Database.

Azure SQL Database is a managed version of SQL Server in Azure: Microsoft manages the database, the backend, the infrastructure, your backups, everything. We'll dig deeper into that, but it's not the only option when you start moving to Azure. A lot of people working with SQL Server today are still on-prem: they've provisioned a server, they have their own database administrators, and they do all of that work on-premises. When companies start that migration to Azure, many go with infrastructure as a service, where you're still responsible for patching your operating system and updating your software, and run SQL Server on a virtual machine. That's a very cost-effective route, not nearly as expensive as some of the other options, and you get very close to one hundred percent feature parity with on-prem SQL Server; if you want to keep every piece of functionality you have on-prem, move it all to a VM and everything is there.

You can also go with Azure SQL Database. We can spin one up in less than a couple of minutes and pay as little as five dollars a month to build a proof of concept: a database hosted in a Microsoft-managed data center, up and running, with data in it that we can query and write to, for five dollars a month. How awesome is that? We don't have to spin up a server or provision anything; everything is done for us. But Azure SQL Database does have some limitations; it doesn't give you everything you get with SQL Server on-prem. So what do you do if you don't want a VM, because with a VM you're doing all the management, backups, and patching and you just want to focus on development, but SQL Database's limitations rule it out? The next option is another platform as a service, just like Azure SQL Database, called SQL Managed Instance: SQL Server running on a managed virtual machine, where you get back a lot of the server-level functionality you lose with Azure SQL Database. We'll cover those limitations in a moment.

One more option, mainly for big data analytics, data science, and machine learning where you need really powerful resources, is SQL pools, formerly known as Azure SQL Data Warehouse, a massively parallel processing (MPP) service.
This really shows the power of Azure, so let me stop for a moment and give you an example. If I go back five or six years, we used to do a lot of consulting business installing these SQL data warehouses for companies, and to get started on-prem they would have to spend a minimum of $250,000 just to procure the hardware, then pay a company like Pragmatic Works to come in and set everything up, which might cost a lot more on top of that. So the barrier to entry for a massively parallel processing platform could be hundreds of thousands, if not millions, of dollars, and you wouldn't even know whether it was going to work until after everything was set up and configured. With Azure, in just a few minutes we can spin up an Azure SQL Data Warehouse, massively parallel processing, pay a few dollars, run some tests, and see if it works. If it does, great, we move forward; if it doesn't, we burn it to the ground, no harm, no foul. That is such an incredible capability, and one we simply didn't have on-prem. This is a huge push by Microsoft right now: they've made a ton of improvements to SQL pools and related functionality in the Azure Synapse release. We're going to focus specifically on Azure SQL Database in this class, but know that those other options are available to you.

Now, what is Azure SQL Database? It's a platform as a service. What does that mean? Microsoft manages the hardware, the infrastructure, all of your backups, your data redundancy, patching the operating system, all of the hard work. Your responsibility is building your database, building your tables, loading data in, and reading and writing it. Unlike a software-as-a-service offering such as Dynamics or Office 365, you still build your databases, your tables, and all your programmability, but, as Peter said earlier in his comment, you get to focus on development and performance tuning instead of worrying about all of that administrative work in the background.

So Azure SQL Database is a relational database as a service. I know there's a big push right now, especially among app developers, toward non-relational databases, but this is your traditional relational database as a service, with near-zero administration, because Microsoft handles the scaling, the backups, disaster recovery, and patching of your operating system and software. You also get a pretty awesome financially backed service level agreement: Microsoft essentially guarantees that 99.99% of the time your Azure SQL database will be up and running, which works out to less than five minutes of downtime in a month, and if it goes down they back that agreement financially. On your next billing statement you'll see a line item crediting you money if it was down for a length of time; you don't have to fight them for it, they just give it to you. So Microsoft is very financially motivated to make sure that uptime happens for you.
It obviously accelerates development cycles, for all the reasons we talked about before: you don't have to go buy the hardware, install all the components, and set everything up. I've been at clients where a server crashed and it was a fire sale; it took everything we could do to get another server provisioned and set up, and it wasn't a matter of minutes or even hours, sometimes it took an entire day. In Azure, if something goes wrong, Microsoft has you covered; you just fail over to another copy of that database. And if you want to spin something up quickly for development, we can do it in a couple of minutes, which we will here in just a second.

The other great thing about Azure is you only pay for what you need. If you're purchasing a car, you have to buy the entire car, and then you're stuck with it unless you can sell it to somebody else; but if you hire an Uber or a Lyft, you use only what you need, pay for it, and you're done. In a lot of ways Azure SQL Database is like that. You get predictable price and performance: the database we're going to provision costs me five dollars per month, and I know exactly what price I'm paying and what performance level I've chosen. Another benefit is a very familiar SQL environment and tooling, which means we can use SQL Server Management Studio.

It's also worth mentioning that you get three built-in replicas, three copies of your data, for high availability. And one of the awesome things, I cannot stress this enough, is that you get point-in-time restore capability out of the box. Microsoft is constantly taking backups of your database, and if you ever need to restore to a very specific point, maybe an application inserted bad data or one of your processes went wrong, you can immediately restore back to that moment. The default lets you go back up to 35 days, and you don't have to set any of it up. It is awesome. There's geo-replication, too: your data can be replicated to multiple geographic regions, so, just as we discussed with storage accounts, if an entire data center goes down and your line-of-business or web applications depend on that database, you can fail over to another region and keep running there, with minimal downtime if you've set it up correctly.

The other huge benefit is that any time you delete a database in Azure, a backup is taken automatically, so if you accidentally delete a database in production, and we all know that happens more often than we'd like to admit, you can actually restore it. I was doing an Azure bootcamp about three weeks ago, and one of my students said they had accidentally deleted a production database; fortunately for them it was in Azure, so the administrator came along and restored the database, no harm, no foul.
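To make the restore story concrete: under Azure Resource Manager, a point-in-time restore is expressed as creating a *new* database from a source database and a timestamp. Here's a hedged sketch with the azure-mgmt-sql Python package; the create-mode string follows the documented ARM pattern, and every id, name, and timestamp below is a placeholder:

```python
# Hedged sketch: point-in-time restore = create a new database with a
# restore create mode pointing at the source. Placeholders throughout.
from datetime import datetime, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

source_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Sql/servers/<server>/databases/<database>"
)

sql_client.databases.begin_create_or_update(
    "<resource-group>", "<server>", "<database>-restored",
    Database(
        location="eastus2",
        create_mode="PointInTimeRestore",          # restore rather than create from scratch
        source_database_id=source_id,
        restore_point_in_time=datetime(2020, 12, 1, 14, 30, tzinfo=timezone.utc),
    ),
).result()  # long-running operation; blocks until the restored copy exists
```

The restored copy lands alongside the original, so you can inspect it before swapping anything over.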
That's an awesome safety net with Azure. You also get security features like transparent data encryption (TDE), which means your data at rest is always encrypted and protected: if by some chance somebody broke into a Microsoft data center, stole the hard drive, and ran out of there, your data would still be encrypted. You get row-level security, and your data is always encrypted. You also get advanced threat detection with Azure Defender for SQL, which monitors things like login activity and who's trying to authenticate to your SQL server, and you can set it up to send you alerts and keep you up to date on any malicious activity. The automatic database tuning you get in SQL Server is there, and of course there's Azure Active Directory integration. All of these great features are built into Azure SQL Database out of the box, with near-zero administration for you. And this is just a small part of it; I think our course on this is seven or eight hours long and goes into all of these features, but I wanted to show you why you might move to Azure SQL Database and what functionality you get.

Now, there are limitations with SQL Database, and it's when you run into them that you start asking whether you need a VM, or whether you can keep all that managed capability by looking at Managed Instance. What are some of the limitations? You're not going to work with .bak files: historically we took backups of our databases and restored them, but you don't have to do that anymore, so Microsoft essentially removed that capability. There's no Windows authentication; everything now goes through Azure Active Directory. There's no change data capture, no SQL Server Agent, and you can see the other items on the screen. The thing about SQL Database is that it was designed as an individual database: it isn't stored on a physical server the way an on-prem database is, and because it's an individual database, there's a lot of server-level functionality from an on-prem environment that you lose. That's exactly the gap SQL Managed Instance fills; if you say "we really need SQL Agent," you can go with Managed Instance.

However, keep in mind that when we talk about these limitations, we're not saying you can't schedule jobs. Of course you can: when we get into Data Factory later, we'll use Data Factory instead of SSIS to move and clean our data, and instead of SQL Agent we'll set up triggers inside Azure Data Factory that automatically kick off those events on a schedule or when an event occurs. So instead of SQL Agent you can use other tools to schedule your jobs; Azure SQL Database just doesn't have that out of the box.
Those are some of the limitations to keep in mind as you work through figuring out what you need in Azure to get your applications up and going. This goes back to Peter's point from earlier (sorry to keep picking on you, Peter): why use platform as a service for a database? Why Azure SQL Database, if all the points we've covered haven't already made the case? One reason is that less maintaining means more time optimizing: really focusing on performance tuning the application and the code, and reviewing your audit logs to see how much it's being adopted and used. Another place I've seen DBAs really thrive in Azure is building a strong, solid understanding of price: maybe one service is better than another, maybe tuning lets you scale back. I saw a tweet just the other day, it might have been this morning while I was flipping through Twitter, from somebody working in Azure who had done such a great job performance tuning all of their applications that they were able to scale back all of their databases, saving their customer a ton of money. If you focus your time on application optimization and your applications run better on fewer resources, you can save your company a lot of money. That's where a database administrator really thrives and brings value: optimizing, managing cost, and maybe writing automation scripts to handle a lot of it for you.

I mentioned this before, and you'll see it in a few moments when we create our database and jump into Management Studio: we get a very familiar SQL experience. We connect to our SQL database in the cloud the same way we connect to our on-prem environments, through Management Studio, so if that's something you're familiar with today, this will look very, very familiar. Those familiar tools are still available to us even when we're connecting to resources in Azure.

So, we've talked a lot about the database: the benefits, the highlights, and a couple of the limitations. Now I want to jump in and create a couple of new resources: one is our SQL server, and the other is our database. Let me open the web browser, jump back into the Azure portal, create those two resources, and then connect to them and show you what that looks like. The first resource we want to create is what's called a logical server. Unlike an on-prem environment where you provision hardware and install all your software, you're not buying an entire physical server at Microsoft; Microsoft virtualizes all of this out. It feels and looks like your own server, but it's not. What we create is a logical server, and that logical server, much like the logical container we talked about earlier, stores the metadata about any databases, elastic pools, or data warehouses attached to it. So this is really important: we need a SQL server first.
Let's create that real quick, and then we'll come back and create a database. In the top-left navigation pane, just like before, we click Create a resource, and I'll search for SQL Server. This one's a little tricky: I type "sql server," then start typing "logical server," and it pops up right there. I click it, and on the familiar screen that says "hey, you want to create a new resource," I click Create, which brings us into the UI-driven creation screen where we set up our basic and networking information.

First I choose a resource group; it actually incorrectly defaulted to an automatic resource group I have, so I'll go down and choose our azure-workshop-learn-with-the-nerds group with my initials on it. Then we need to give the server a name. By the way, when you're naming things in Azure, you'll notice some resources allow hyphens, some allow underscores, some allow neither, and sometimes you can use uppercase and sometimes only lowercase. One thing to keep in mind, just like the storage account name we did earlier, is that this server name must be one hundred percent unique across all of Azure: if anybody anywhere in the world using Azure has already taken that server name, you can't use it, because the server name is part of how we connect to the databases on that server. So I'll type pwlearnwiththenerds-azureworkshop with my initials on the very end, and it passes validation. Then I'll pick East US 2, and I need to give it a username and a password; this is a SQL Server account I'm creating that lets me authenticate to the server and see the databases within it. My username will simply be adminuser, and I'll type in a super secret password (whoops, let me make sure I don't mess that up again).

We could go into Networking and additional settings now, but we can do that after the fact, and we're going to come back and talk about networking in about fifteen minutes, so we'll go straight to Review + create: resource group, region, server name, and that's it. Once again, this server isn't going to cost us any money; it's just a logical grouping, like the container we used earlier. At the very bottom we click Create, and once again, in the magical world of Azure, a resource is being provisioned for us. It might take a moment to finish, though it shouldn't take too long, because it's a logical server, not a physical one, just some virtualization in the background. As soon as it's done, we'll go provision our database.
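For anyone who prefers scripting, here's a hedged sketch of creating the same logical server with the azure-mgmt-sql Python package; the resource group, server name, and password are placeholders for this workshop's values:

```python
# Hedged sketch: provision a logical SQL server programmatically.
# Placeholders throughout; the server name must be globally unique.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Server

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = sql_client.servers.begin_create_or_update(
    "azureworkshop-learnwiththenerds-mp",   # resource group
    "pwlwtn-azureworkshop-mp",              # server name, unique across all of Azure
    Server(
        location="eastus2",
        administrator_login="adminuser",
        administrator_login_password="<super-secret-password>",
    ),
)
server = poller.result()                     # long-running operation; waits for completion
print(server.fully_qualified_domain_name)    # e.g. <name>.database.windows.net
```

The fully qualified domain name it prints is exactly what you'll paste into Management Studio later.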
While that deploys, let me show you something else that's really cool. If I go over to Resource groups and back into the resource group we created, the administration page up here shows the number of deployments to this resource group, and we can see that right now a deployment is taking place. So if you ever leave the creation screen and aren't sure whether something is still deploying, you can always go to the resource group and see it right there. Also, in the Properties area there's a Deployments section with a lot of benefits: you can see what's currently deploying, what has been deployed, and if you wanted, you could take a deployment and redeploy it, which is a quick way of regenerating resources.

This should be done; let's check our notifications... not yet... oh, there it is, finally done. It does sometimes take a moment to show up in the resource group, so we'll click Refresh at the top, and there it is: our SQL server. We could click it to open a new blade, but we don't need to yet, because what we want to do now is create a SQL database attached to that server so we can start querying it and take a look. This is where stuff starts to get fun, and after the break we'll come back to Data Factory and connect to all these different resources.

Next I'll create another resource, and this time it's a SQL database. Let me search for that; it opens right up, and just like the other resources we've created, I click Create, which brings us into the edit screen. You'll notice the pattern: resource group, location, name; resource group, location, name. Those are the standard properties that are almost always required when you create a resource in Azure. I'll choose my resource group and give the database a name, something like pwlearnwiththenerds-adventureworks, because we'll be using sample data from AdventureWorks, and then select the server I created, the one with my initials on the end, so the database is attached to that server. Notice it doesn't even ask for a location, because databases are automatically placed in the same location as their server; that's why getting the server's location right is so important.

Down at the bottom there are a couple of important properties worth pausing on. The first is whether you want to use an elastic pool. Azure SQL databases are essentially individual databases with their own allocated resources. That's very different from an on-prem environment, where we provision one server, install all our databases on it, and they have to compete for and share those resources.
A big part of a DBA's job was managing that contention; in Azure it's the opposite, each database has its own resources. A great way to improve performance and save money on cost, though, is to create a pool of resources and attach your databases to it. That only works if the databases peak at different times: some peak in the morning, some in the afternoon, and maybe you have a bunch of databases that are rarely used but need good performance when they are. Those are good candidates; not every database is. Just keep in mind that if you want something similar to on-prem, where a pool of resources is shared, elastic pools give you that flexibility.

Now what we need to do is choose the level of compute, CPU, memory, and disk speed for reads and writes, and the amount of storage to configure for this database. We do that by clicking "Configure database," which opens a new tab, and this is where it can get very overwhelming, because there are two different pricing structures. One is called vCore, the new model, and one is called DTU, the legacy model. We're going to use DTU because it's the easier price of entry; the vCore model is easier to compare to an on-prem environment (on-prem we have 32 cores, so we go with 32 vCores, that kind of mapping). The DTU model is still there, both for backward compatibility and for new databases.

You'll notice we're brought straight to the vCore model; in fact, let me zoom in for you, right here it says vCores. Here's the thing: if I choose two vCores, the minimum, with only 32 GB of storage, the estimate on the right is about $380 per month. That is a lot of money if you're just trying to do basic development and build a proof of concept. And that's the minimum entry point for the vCore model; you can go with a little less storage and save a few bucks, but not really a whole lot. You'll also notice General Purpose is the lowest service tier: Hyperscale at two vCores is $692, and Business Critical is $1,021 per month. If you just want to build a proof of concept and test things out, click where it says "Basic/Standard/Premium" across the top, which takes us over to the DTU model. I'll click that, choose Basic out of Basic, Standard, and Premium, and go with the basic 5 DTUs. What is 5 DTUs? That's a great question: a DTU is something of a black-box blend of some amount of compute, CPU, memory, and speed on your hard drives, how fast you write and read your data.
Scaling should be roughly linear: going from 5 to 10 DTUs, you should expect about double the compute, double the CPU, double the memory. And here's the thing: if I go with Basic, let me scroll back over, that's 5 DTUs and 2 GB of storage for five dollars per month. That's a lot less than a thousand dollars per month; we can build a proof of concept and do a lot of work at very minimal cost. So that's what we'll do: click Apply, and now we can actually review and create.

The nice part about Azure SQL Database is that if you build this out and realize you need a higher service tier, you can absolutely change it: jump from 2 GB to 100 GB, or from 5 DTUs to a thousand DTUs (I don't actually know if a thousand DTUs is a real tier, I'm just throwing numbers out), and you can scale back down too. You can also change models, from DTU to vCore, if the vCore model performs better or scales higher for your needs; you can move a database between models as well as between service tiers. That's the great thing about Azure. Say I'm running a Standard database at 20 DTUs for about $30 per month, and then month-end reporting arrives and we have to run thousands of reports and process terabytes of data. What do I do? I take that $30-per-month database, scale it up to a higher level, and when we're done, scale it back down, paying only for what we used. That's one of the huge benefits of Azure SQL Database.

So let's go ahead and review and create this resource... oh, I messed up: I wanted to include sample data, so I'll have to do that one again. That's okay; we're going to take a break in about ten minutes, which moves us into the second section of the day, and I can recreate it then. For now I have a database that's already been created as a backup, attached to a server, and I want to show you how we connect to the server and the database using Management Studio. Whenever you connect to the server, the address is servername.database.windows.net, and you have to type all of that into Management Studio. So from the overview screen I'll copy out that server name, and then we'll head over to everybody's favorite tool for working with SQL, Management Studio.
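Before we connect, here's a hedged sketch of what we just did in the portal, creating the Basic database and scaling it later, using the azure-mgmt-sql Python package. The sku names match the portal tiers; everything else is a placeholder:

```python
# Hedged sketch: create the Basic (5 DTU, 2 GB) database, then scale it.
# Placeholders throughout; reuse of the server created earlier is assumed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database, Sku

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, server = "azureworkshop-learnwiththenerds-mp", "pwlwtn-azureworkshop-mp"

# Basic tier, roughly the $5/month proof-of-concept option from the demo
sql_client.databases.begin_create_or_update(
    rg, server, "pwlwtn-adventureworks",
    Database(location="eastus2", sku=Sku(name="Basic", tier="Basic")),
).result()

# Month-end crunch? Re-run with a bigger sku (S3 = 100 DTUs), then scale
# back down the same way once the heavy processing is done.
sql_client.databases.begin_create_or_update(
    rg, server, "pwlwtn-adventureworks",
    Database(location="eastus2", sku=Sku(name="S3", tier="Standard")),
).result()
```

Scaling is just a re-deploy with a different sku, which is why the scale-up-for-month-end pattern is so cheap to operate.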
In Management Studio I tell it I want to connect to my database engine, and instead of typing a local on-prem server, I paste in that full server name, something like pwlwtn-azureworkshop-mp.database.windows.net, then enter my login information and my super secret password (it would be funny if that were actually the password, but it's not) and click Connect. And it doesn't let me connect, because we haven't given this IP address access on that server. This is your first level of defense in protecting your data: setting up firewall rules. Because I'm an admin I can actually cheat, click right here, sign in, and it will automatically add my current IP address; but let me show you the portal route instead. From your SQL server, go to the properties pane, scroll down to Security, and under Security open "Firewalls and virtual networks." Up at the top I'll click "Add client IP," which adds my IP address for this location; I'll give it a name like work-office and click Save.

Now when I go back to Management Studio and try to connect to that server, I can. We'll do this demo and then take our break. I'll hit Cancel, try the login one more time, click Connect, and now it lets us in: the username worked, the password worked. You'll notice this experience is very similar to an on-prem environment, even though I'm connecting to Azure from an on-prem application. I can expand my databases here; the reason I went with this database instead of the one I just created is that we skipped the optional parameters and ended up with a blank database, whereas this one was created with sample data, so we have tables to work with. For example, if I want to query this data, I can right-click and Select Top 1000 Rows, just like with an on-prem database; we can select the data and write queries to retrieve information. Performance will be a little lacking on a five-dollar-per-month database, but you can insert records, delete records, and do all the stuff you do with SQL Server on-prem right here in Management Studio, aside from the limitations we talked about before.

That brings us to 12:22. We've covered a lot of information this morning about getting started with Azure and creating resources, so we're going to take a 15-minute break and come back at about 12:37-ish Eastern time; I'll put a timer on the screen, and when we return we'll jump into my favorite tool in Azure, Azure Data Factory, and talk about how we take data and move it between the different services we've created. See you back here in about fifteen minutes; thank you for joining us.
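While we're on break, for reference: the firewall rule and the connection we just made by hand can be scripted too. A minimal sketch, assuming the azure-mgmt-sql and pyodbc packages and placeholder names throughout (the IP, server, database, and password are not the real workshop values):

```python
# Hedged sketch: add a client-IP firewall rule, then connect the way SSMS does.
import pyodbc
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import FirewallRule

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Equivalent of the portal's "Add client IP": a single-address range named work-office
sql_client.firewall_rules.create_or_update(
    "azureworkshop-learnwiththenerds-mp", "pwlwtn-azureworkshop-mp",
    "work-office",
    FirewallRule(start_ip_address="<your-ip>", end_ip_address="<your-ip>"),
)

# Same connection SSMS makes: servername.database.windows.net plus SQL auth
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=tcp:pwlwtn-azureworkshop-mp.database.windows.net,1433;"
    "DATABASE=pwlwtn-adventureworks;UID=adminuser;PWD=<super-secret-password>;"
    "Encrypt=yes;"
)
# AdventureWorksLT sample schema; the "select top 1000 rows" of the demo, shortened
for row in conn.cursor().execute("SELECT TOP 10 Name, ListPrice FROM SalesLT.Product"):
    print(row.Name, row.ListPrice)
```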
Hello and welcome back! I hope you had a good break, got some food, ran to the restroom. Before we dive into the next part of the course, a quick reminder that we're running an end-of-year sale associated with this event: 65% off all of our on-demand learning classes at Pragmatic Works, which includes Power BI, SQL, Azure, data science, and more. Lots of good classes out there, so sign up today. We've also had a lot of questions; fortunately the people here at Pragmatic Works are answering many of them, and Peter is helping out a lot in the chat window as well.

One thing I mentioned right before the break, about Azure SQL Database, is worth repeating: there are two models, the DTU model and the vCore model, and I want to highlight that a lot of people, including Pragmatic Works, use the DTU model in production. The DTU model is absolutely great for production, and it gives you a low-cost entry into Azure. The vCore model scales a lot higher and gives you some additional capabilities if you need them, but don't assume you need vCore: definitely start with the DTU model, see how it works for you, and scale up as needed. The other thing to mention is that there are a couple of class files that go with this session, and we're putting them in the chat window for you right now: a PDF of the PowerPoint, which includes my email address, my contact information, and everything we went through today, and a file with all of today's links, like where to download Azure Storage Explorer. Please download and enjoy those.

Without further ado, let's jump into probably the most exciting part of today's presentation: Azure Data Factory. Let me pull that up in the PowerPoint. We'll look quickly at provisioning Azure Data Factory, some Data Factory navigation, and then development: how we build pipelines and data flows that move our data, help us clean it up, and ingest it into Azure. So what is Azure Data Factory? We've alluded to it a couple of times without really diving in. Azure Data Factory is an ETL tool based in the cloud: everything we do in it is executed using Azure compute resources, and the development is done in Azure, ideally in the portal through drag-and-drop interactivity, not in Visual Studio. That's one of the great things about it. Unlike Azure Databricks, where you're writing a lot of code and commands, Data Factory is very similar to SSIS on-prem: if you want to perform an action, you grab that activity, bring it into your pipeline, and execute it. We'll talk about the components in just a moment. We'll also write a little code today just to introduce you to it, and I'll do a follow-up YouTube video next week on the expression language in Data Factory, so check out the channel.
If you do want to write code in Data Factory, it's really just a pretty easy expression language: you go through the expression builder and write it there, so it's still not full programming. What do we use Data Factory for? Orchestration, and data transformation and cleansing. One of the great things about it is that it integrates easily with both your hybrid environment, meaning on-prem, and your cloud environment; you can use it for both. Provisioning is super easy: you give it a name, subscription, resource group, version (version 2 is really the only version; version 1 meant writing a bunch of code, version 2 is awesome), and a location, which once again should roughly match your storage account, SQL databases, and the other resources you're building in Azure.

We're going to walk through provisioning, but before that, let's talk a little about development and the resources you'll work with in Azure Data Factory. The first thing you'll come across, and probably the first thing you'll create, is a pipeline; if you come from an SSIS background, this is very relatable to what was called a package. A pipeline is essentially a collection of activities, a collection of actions that do work: download a file, copy the file, clean it up, and load it somewhere. We also have data flows; the data flow activity is where you do most of your data transformation, so if you need to clean up bad data, merge data together, remove characters, or add new columns, you do it inside a Data Factory data flow.

There are two types of data flows. Mapping data flows look very, very similar to the SSIS experience. Wrangling data flows, which we won't touch today, are a super cool feature (in fact, last week on my YouTube channel I did a video on them): they're derived from the Power Query editor in Power BI, or the Query Editor if you've used it in Excel. It's the UI-driven way of cleaning your data: right-click and remove these values, right-click and unpivot. All of that Power BI Desktop capability will hopefully one day be available in wrangling data flows; they're still in preview, so you don't have everything yet. Mapping data flows are what we'll look at today, and they work just like what you see on the screen: a source, transformations that clean and shape the data, and at the very end a destination where the data lands once we're done.

Whenever we work with pipelines or data flows we have sources and destinations, and to connect to them we need to tell Azure Data Factory the connection information: where the server is, and the credentials to access it. We do that through something called linked services.
Linked services are very similar to connection managers in the on-prem world: if you've worked with SSIS, SSRS, or SQL Server Analysis Services on-prem, connection managers are essentially the equivalent of linked services in Azure Data Factory. That's how you connect to that database, that storage account, that data lake, all the different connections you can make. Refining the connection further, we have datasets. A linked service defines the connection information so Data Factory can reach a data store; a dataset is a very specific named view of your data. Against a database, a dataset might be a connection to a specific table or a very specific query pulling data from a table; against a data lake account, it might be a file or a folder location. So a dataset is more narrowly defined than a linked service. These are the resources we'll work with in Azure Data Factory, and they correlate strongly with the equivalent resources in SSIS, so if you have that background, they'll feel very relatable.

Within a pipeline we then have activities, and an activity essentially defines an action to perform on your data. Maybe you want to run a stored procedure that inserts or deletes data; you can do that. Maybe you want to run a Databricks notebook: if you're familiar with Databricks and have a notebook that does a bunch of work, you can execute it directly from Data Factory through the Databricks activity. You can copy data with the Copy Data activity, and there are a ton of other activities, like the Web activity; I have a couple of YouTube videos on using Azure Logic Apps in conjunction with Data Factory to extend its capability even further. So activities define the actions you perform inside Azure Data Factory.

Because this is a three-hour presentation we don't have a lot of slides, so let's jump right into the demo, and I'll show you how we provision a data factory. I'll go back over to the web browser and create a new resource: click Create a resource in the very top left and search for "data factory." Once again, provisioning this resource doesn't cost any money by itself; you get charged for the pipelines you run and the resources those runs consume. So we can create this data factory at no provisioning cost. We click Create, the same exact experience we've seen time and time again, and then fill in the basic information: the resource group, which is our learn-with-the-nerds group with my initials, and then the region.
Of course I'm choosing East US 2. Once again, location matters for both performance and cost, because of latency: if your entire company is based in the eastern US, you don't want to put all your resources in the western US, or there will be latency and performance problems. And since I'm putting my storage account, my Azure SQL database, and everything else in East US 2, I want my data factory there as well; it improves performance and helps with cost. So we choose East US 2, then give the data factory a name. I'll try azure-workshop, let's see whether it allows dashes here, looks like it does, so azure-workshop-adf-learn-with-the-nerds. And we're going with version 2; once again, version 2 is really the only version, don't go with version 1, trust me.

There's one more requirement here, under Git configuration: I'll select "Configure Git later." That's the continuous integration and continuous delivery setup. I could set it up right now, but I don't have it prepared for this demo and it takes a little time. Two things about it, though. One, our advanced Azure Data Factory class shows how to set up and configure that CI/CD process. Two, it's actually pretty easy: you take the work you do in Data Factory and publish it to your code repository, and from the repository you build a release pipeline that publishes it out to a production data factory; you can set it up to move work between environments like dev, QA, and production. That's what the Git configuration is referencing, and it's pretty cool and easy to set up.

We'll go to Review + create; there are other categories we could talk about, but we can't dive that deep today, so I'm going to create this data factory. It will take a moment, and once created it's just a blank slate where we can start building connections to our different data sources in Azure, along with pipelines and data flows, and make things happen. We'll give it a moment to load, and as soon as it's done, the first thing we'll do is jump in and create a couple of connections to our data.
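As a hedged aside, the same provisioning step can be done with the azure-mgmt-datafactory Python package; the subscription id, resource group, and factory name below are placeholders for this workshop's values:

```python
# Hedged sketch: provision a v2 data factory programmatically.
# Placeholders throughout; keep the factory in the same region as the data.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

factory = adf.factories.create_or_update(
    "azureworkshop-learnwiththenerds-mp",   # resource group
    "azure-workshop-adf-lwtn-mp",           # factory name (dashes are allowed)
    Factory(location="eastus2"),
)
print(factory.provisioning_state)            # "Succeeded" once the blank factory exists
```

Like the portal, this creates an empty factory; pipelines, datasets, and linked services come afterward.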
And it looks like it's done. This time I was going to cheat instead of going back through resource groups, but actually no, I'll go to my resource groups, click on our resource group, and refresh; there we go, and you'll see the data factory is now in the list. There are a couple of ways to tell resources apart: the iconography on the left tells you the type of each resource, Azure Data Factory, storage account, SQL database, SQL server, and the Type column spells it out as well, so that's important to note.

Now I'll click on the data factory, which opens a new blade in Azure; this brings us to the administration page, but to do development we click Author & Monitor, right there in the middle of the screen, which opens a new tab in the browser where we'll build our connections, our datasets, our pipelines, and our data flows. So let's jump right in, because like I said, this part is pretty exciting. (By the way, if you go to my YouTube channel, mitchellpearson, which my coworkers will throw in the chat window, I have about ten videos on Data Factory covering quite a few different things, including some fairly advanced material that goes well beyond what we're covering here.)

When you log into Data Factory, the first view you see is the "Let's get started" section, where you can create a pipeline, create a data flow, copy data, or set up your code repository. I don't really do anything on this main home screen, because you can do all of this in the places designed for it, but you can do all of it from here if you like. There are also videos and tutorials; most are quite old, but they've aged pretty well (Data Factory is updated constantly) and they give you a good idea of the different things you can do. On the left we have a navigation pane, very similar to the Azure portal's, which we can collapse and expand. There's the Data Factory page (it used to be called Overview; they renamed it), the Author tab, where we'll build all of our pipelines and data flows and where the work gets done, the Monitor tab, with incredible built-in monitoring right out of the box, really a great job by Microsoft there, and the Manage tab.

We're actually going to start with the Manage tab today, because this is where we create our linked services. I'll click Manage at the very bottom, and in the window that opens you'll see, on the left, linked services, integration runtimes, the Git configuration, ARM templates for source control, and some other items like triggers. Triggers are the scheduled execution; remember our SQL Agent discussion, this is how you do scheduling in Data Factory. We won't get into all of those, but we are going to create a couple of linked services. At the top I click New, and a new window opens for creating a connection. I want two connections: one to our data lake account and one to our Azure SQL database, so that we can interact with the data sources inside them.
For the first new linked service, I'll pick Azure, because both of our data sources are in Azure, and the first one we're connecting to is our Azure data lake, so I grab Azure Data Lake Storage Gen2 right there (my head is blocking it a little bit, but there it is) and click Continue at the bottom. On the next screen we need to give it a name. I have my own made-up naming convention; use whatever works best for you. I put in where the source lives (Azure, as opposed to on-prem or somewhere else), then the type of connection (ADLS, so I know it's a data lake rather than a blob or general-purpose account), and then the name of the storage account, azureworkshoplearnwiththenerds. So I know it's in Azure, I know what type it is, and I know exactly which account.

Then I tell it we're connecting using the account key method. If I have time later, I want to talk about Azure Key Vault, because as a best practice we should not be connecting with the account key or storing it in our code; that also matters when you use something like Azure DevOps for your code repository, which won't let you store sensitive information. We'd rather keep our account keys, passwords, and other sensitive information in Key Vault and just reference the vault name from here, and I should have a minute at the end of today's presentation to show you that. For now, just to get started, we'll use the account key, pulled from our Azure subscription: I choose the Microsoft Azure Sponsorship subscription, and the account name is this one right here with my initials on the end, the one we created an hour or so ago. Then at the very bottom I click Test connection, it gives me the green light, connection succeeded, and I click Create. We've just created our very first connection in Azure Data Factory: our connection to our Azure data lake account.

Now I need a second one, a connection to our SQL Server database. At the top, under linked services, we click New again, I click Azure, and I find Azure SQL Database, a couple of rows down but right there, and click Continue. This one needs a name as well: same little naming convention, it's in Azure, it's a SQL connection, and the database is AdventureWorks. I might have multiple databases I want to connect to within that server, and this connection manager is specifically for that database. Then I choose my subscription, Microsoft Azure Sponsorship, and then the server name, and it's very important to pick the right one here.
Let's see which one... I have so many servers here... oh, here it is, the one with my initials. But I'm actually going to go with my backup server, because remember, when I created this database I didn't go into the additional properties and choose the sample data, so if I point at that new server there won't be any data to work with. Thank goodness we have a backup: I'll connect to the AdventureWorks database on the backup server, which has data we can use. Then I pass in my username, adminuser, and, once again, I could use Key Vault for my password here instead of typing it in manually.

Now, Test connection... and it failed. What happened? Why did my connection fail? It failed because Azure Data Factory is a service in Azure, and it cannot reach our Azure SQL server, so this is another time we have to go back to the server and look at the firewall rules. Let me show you what's going on: if I click "More" and read through (it's a lot of text), eventually you see that it "cannot open the server requested by the login," and it names an IP address, 20.41.3.129. That is not my IP address; you saw mine earlier, I shared it with the whole world. That address belongs to the virtual machine running in the background that Azure Data Factory uses to try to connect to the server.

So what does that mean? This is where we get to dive a little deeper into Azure SQL Server. I go back to the Azure portal and into my resource group, scrolling back over (remember, we can scroll horizontally), and click the server we created earlier... actually I need the other resource group, because we're connecting to the backup server, so let's make sure we pick the right one. On that server I go to Security, then "Firewall rules and virtual networks," and here's the IP I added earlier, the one starting with 45; the one Data Factory presented started with 20. Now, notice there's an option here to "Allow Azure services and resources to access this server." What does that mean? It means any service in Azure can try to authenticate to this server, and that's a bit of a concern from a security perspective for a lot of customers: with that option on, someone on an Azure VM at a completely different company across the world could try combinations of usernames and passwords against your server. So there's some risk there. But the IP address Data Factory uses is dynamic; it changes, so even if I added that VM's address and it worked temporarily, it more than likely wouldn't work next time. Rather than adding a temporary IP rule, I'm going to turn this option on so Data Factory can work with this server.
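As a hedged aside: under Azure Resource Manager, the portal's "Allow Azure services..." toggle corresponds to a firewall rule whose start and end addresses are both 0.0.0.0, a documented convention. A minimal sketch with azure-mgmt-sql, with placeholder names throughout:

```python
# Hedged sketch: the "Allow Azure services and resources to access this
# server" toggle, expressed as the conventional 0.0.0.0 firewall rule.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import FirewallRule

sql_client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

sql_client.firewall_rules.create_or_update(
    "azureworkshop-learnwiththenerds-mp",   # resource group (placeholder)
    "pwlwtn-azureworkshop-mp",              # logical server (placeholder)
    "AllowAllWindowsAzureIps",              # conventional name for this rule
    FirewallRule(start_ip_address="0.0.0.0", end_ip_address="0.0.0.0"),
)
```

Because this opens the door to any Azure-hosted caller, it's worth pairing with strong passwords or, better, Azure AD authentication, exactly the trade-off discussed above.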
And note that the Data Factory IP address you saw is dynamic — it changes — so even if adding it works once, it more than likely won't work the next time. To make this work, we're going to turn on Allow Azure services and click Save at the top to save the changes. Now when I go back to Azure Data Factory and test the connection again... it successfully connects. The problem was that Data Factory didn't have access to the server, but we gave it access. We click Create, and we now have the two connection managers we needed. That's step one. What I want to do in this example is take data that lives in my AdventureWorks database and copy it to my Azure Data Lake. The data I want to copy is just product data: I'm going to take my product data for today's date, the most up-to-date version, and copy it over to a container — remember, a logical container for grouping files together — in our data lake. The first thing I need is a dataset: one dataset in Azure Data Factory that points to my product table, and another dataset that points to the file I want to write in the data lake. We have our linked services; now, over on the left-hand side, I'll go to Author, and under Author we're going to create a couple of datasets. Before we jump in, notice there are a few kinds of resources here: pipelines, datasets, and data flows. We talked about these in the PowerPoint. The pipeline is the highest level; it's essentially a collection of the different activities that do the work, and it handles the orchestration as well. The dataset is separate from the pipeline, and so is the data flow. Interestingly, if you come from an SSIS background you'll notice they're separate, and that's because they're very reusable. There are a lot of benefits you get in Data Factory that just didn't exist in SSIS, and a lot of them come from the reusability of components, along with some other things we'll talk about — very exciting stuff. So I'll go into Datasets and create a new one: hover over it, click the little ellipsis (the three dots) that pops up, and choose New dataset. That opens a pane all the way over on the right-hand side, and there we'll search within Azure and create a dataset for Azure Data Lake Storage Gen2.
So I'll grab that right there, click Continue, and then out of the different format options available I'm going to grab delimited text — the CSV file type — and click Continue again. A quick word on these types: I'm seeing Parquet file types show up more and more, and there are a lot of benefits there. Parquet stores and saves the schema, and you get column-level compression within the files, which gives you great compression. SQL on demand in Azure Synapse gets something like 14 times better query performance with Parquet, and Databricks has done a lot of work around it as well. So from an organizational and planning perspective, if you're starting to make some kind of big migration, you might want to dive into the world of Parquet files. For this demo, though, we're going with CSV files, and we click Continue. Then I have to give this dataset a name — once again, my own little naming convention: Azure, it's an ADLS account, it's in my azure workshop storage account, and then this is my product list. So I have where it lives, what type of connection it is, the specific storage account, and the name of the file I'm writing. Under Linked service I tell it to use the connection manager we created earlier. Then I have to tell it where to store the file I'm creating. I want to put this in a container called products, but we don't have a products container — if I click the Browse button, the only one we have is employees, the one we created earlier. I don't want to have to go back to my storage account and create a new container over there, and the cool thing is I don't have to: if I just type "products" into the file system box right here, then when this runs, if the container doesn't exist, it will actually create it automatically. Under File I'll give it a name — we'll call it productlist.csv — and then I'll tell it to treat the first row as the header, so the first row becomes our column headers. Then Import schema: this one can get you; we need to choose None. We can't import a schema from the file because the file doesn't exist yet — we're going to create it — so it has to be None to make this work. Then down at the very bottom we click OK, and we've created our destination dataset, where we're going to load this data. You'll notice once it's created there are lots of options in here you can change: the compression type, row delimiter, column delimiter, escape characters — lots you can modify. The other thing I want to point out: I've just created our first resource in Data Factory other than a connection manager, and notice it shows up as a tab across the top. As you open and create new resources, they keep showing up across the top, so make sure you save your work often by clicking Publish all, and close the tabs when you're not using them.
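Here too, the dataset is stored as JSON behind the UI. A rough sketch of what this delimited text dataset might look like — the names follow my convention from above, and the schema is empty because we chose Import schema: None:

```json
{
    "name": "AZURE_ADLS_azureworkshop_ProductList",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AZURE_ADLS_azureworkshoplwtn",
            "type": "LinkedServiceReference"
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "products",
                "fileName": "productlist.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}
```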
Also — I won't cover it here, but I'm going to try to squeeze it into the YouTube video I post next week — parameters are a really great way of making this dataset reusable. Instead of a dataset hard-coded to the product list, it could be the employee list, the product list, the salesperson list, all of those different lists. That's one of the powerful capabilities you get in Azure Data Factory: you can parameterize this and make it very dynamic and very automated. We won't get to that today, but I could very easily make that file name dynamic through parameters and expressions, and I know that's something you're interested in if you're coming from a SQL enterprise BI background. All right, that's it for this one. We need to create another dataset, one that points to the product table in our SQL database. I'll click the ellipsis just like before — right where you see that little icon; hover over it and the ellipsis shows up — and click New dataset. The dataset we're creating points to our SQL database, so I can either search for it or just click Azure and scroll down a couple of screens to grab Azure SQL Database, then click Continue. Then we give it a name, just like before: it's in Azure, it's a SQL database, the database name is AdventureWorks, and then the name of the table — so I'll call this my product table. We connect to our linked service, the AdventureWorks database — you can see my naming convention at work here, but create your own if you like — and then I'll go grab SalesLT.Product. This time I actually do want to import the schema. One of the things about Data Factory — and I have a video where I talk about rule-based mapping and some very dynamic things you can do — one of the huge advantages it has over SSIS, other than working better with hybrid on-prem and cloud data sources, is what's called late-arriving schema. You don't have to know your schema up front; you don't have to know all the column names and data types; they can actually change, and Data Factory can handle it. If you remember SSIS, for those of you who came from that background, it was very metadata-rigid: if your source changed — a column added or removed, a data type changed — all your packages broke and you had to go back and fix everything. You can set up Data Factory to not worry about the schema at all, and it just becomes dynamic. But we're going to import it here, go down to the bottom, and click OK. That's it — we now have our second dataset. One thing I want to do right now is go to the top and click Publish all. When I click that button, it saves all the work I've done so far out to the Data Factory service — kind of "in production" — so I don't lose it. If I close my browser or my computer crashes, I lose any work that isn't saved; this is in the cloud, so I want to publish right now. There we go: it tells me I had two new items (I did that a little quick, but we'll do it a couple more times), I click Publish at the bottom, and it saves all that work.
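Going back to the parameters idea from a moment ago: just to give you a feel for it, here's a sketch of how that same data lake dataset could take the file name as a parameter. The FileName parameter is hypothetical — something I'm making up to show the @dataset() expression pattern:

```json
{
    "name": "AZURE_ADLS_azureworkshop_GenericList",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AZURE_ADLS_azureworkshoplwtn",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "products",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                }
            },
            "firstRowAsHeader": true
        }
    }
}
```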
Now, if you set up source control with Azure DevOps, you'd actually be saving to your code repository rather than publishing straight to Data Factory, but that's another topic for another day. So we have two datasets, and we want to move the data from our database over to our data lake. How do we do that? We create a pipeline. Up here I'll click the ellipsis and choose New pipeline, which opens a new pipeline. I want to give it a name, so in the properties window that opens all the way over on the right I'll call it something like "Copy Product Extract" or "Copy Product List," and then close the properties window. Now we need to start using activities: the way we do things in a pipeline is through the different types of activities that exist. One activity we can use, under Move and transform, is the Copy data activity. It doesn't transform data, it doesn't apply business rules or logic — it just copies data, which in this example is exactly what I want. So I grab the Copy data activity, drop it in our pipeline, and now we set it up. Before we jump too far: you'll notice there are a lot of other categories of activities, and if you want to explore them and have fun, absolutely go ahead. General is where a lot of the activities I use in Data Factory live: your basic activities, executing another pipeline, setting up and working with variables, and executing an SSIS package. That last one is interesting: it does not mean you can execute any SSIS package. It means that if you've set up Azure Data Factory so you can deploy your SSIS packages to this data factory, you can then use this activity to execute them — so you might have a series of activities, and once those are done, kick off the SSIS package. That only works if you're using the SSIS integration runtime within the data factory; it's part of what's called SSIS lift and shift. There's also Get Metadata, the Lookup activity, Stored procedure, and a few other really awesome options. Now let's set up this copy data activity. The way you configure things here is a little different from Visual Studio: down at the bottom is where we get all the properties for the activity. We can give it a name — naming is obviously important — so something like "Copy Data to ADLS" or "Copy Product Data." Then we have to set the Source and the Sink. Sink means destination — coming from an SSIS background I still say destination, I always do. And a couple of other things to point out: notice these tabs in red. That means there's a required property that isn't set yet; this isn't done until the red goes away. The source and the sink are required elements for this activity. So I'll click Source and choose one of the datasets we already created: the source is our product table in our Azure SQL database.
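For reference, the pipeline and its copy activity also serialize to JSON. A minimal sketch using my dataset names from earlier — the source and sink types shown are what I'd expect for SQL-to-delimited-text, but treat the details as illustrative:

```json
{
    "name": "Copy Product List",
    "properties": {
        "activities": [
            {
                "name": "Copy Product Data",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "AZURE_SQL_AdventureWorks_Product",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "AZURE_ADLS_azureworkshop_ProductList",
                        "type": "DatasetReference"
                    }
                ],
                "typeProperties": {
                    "source": { "type": "AzureSqlSource" },
                    "sink": { "type": "DelimitedTextSink" }
                }
            }
        ]
    }
}
```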
Let's look at this so I don't go too quickly: for the source we're just using a table, but we could write a query if we wanted, or execute a stored procedure. Our sink — our destination — is going to be that Azure Data Lake account. Copy behavior and the rest of the settings, I'm just looking over them real quick, we'll leave at the defaults; we're not changing anything. Then under Mapping, I could import the schema from the source and the destination and manually map the columns, or — one of the great things about Data Factory — we can just let it map automatically based on whatever schemas come from our datasets. That's late-arriving schema again: we let it map however it wants, which is an awesome, awesome feature, and that's what we're going to use. No importing, no manual mapping — it happens automatically. So what do we do now? Up at the top I'll run this in Debug, which is kind of testing mode for Azure Data Factory. When you run in debug, you're running the most recent copy of the pipeline — not what's been published out to Data Factory, but whatever changes have been made right here. So I run it in debug real quick, and you can see it kick off; it tells me it's running. Down at the bottom, as long as you don't have any activity selected, you'll see an Output window, and if I click refresh we see the copy activity run: when it started, the duration, and whether it succeeded. If it succeeded, we should be able to go look at that file in our storage account, and we can do that two ways: back through the Azure portal, or in Azure Storage Explorer. Let's use Storage Explorer, just to give you another look at that tool. This is what we had before: one blob container called employees with one file. After running Data Factory, when we refresh, we have a new blob container called products, and within it a file called product list. That's how easy it is to set up a very simple Data Factory pipeline just to move data. Now, we have quite a bit of time left, so I want to dive a little into the expression language — and like I said, I'll do a follow-up YouTube video that goes deeper than what I show here. What we can do is make this file naming dynamic. How, and where, do we set up a dynamic name in Data Factory? Remember, all the pieces are separate: the dataset, the connection manager, the data flow, the pipeline — they're all separate entities. So back in the browser, I'm going to publish everything again: when you click Publish at the top, it opens a window listing all your changes, anything new or edited, and at the bottom you click Publish, and this publishes to production. That's how you save your code — publish early and publish often. While that runs, I'll go over to our dataset called product list — remember, this is the dataset that connects to our Azure Data Lake account — and watch what I do here: with the dataset selected across the top, I go over to the File name field, which is currently product list, and I click in that box.
When you click in that box, something pops up just below it that says "Add dynamic content." This is where we can use the expression language to make this dynamic and automate it. Clicking it opens the expression experience within Azure Data Factory. You'll notice down across the bottom there are different categories of functions: collection functions for working with things like arrays, conversion functions for converting your data, date functions, logical functions that return a true or false value, and mathematical and string functions. What I want to do is combine "product list" with the date. How? First I'll go to the string functions and grab the concat operator — concat does a concatenation, combining multiple strings together. Then I bring in the first parameter: a hard-coded text value, product list plus an underscore, in single quotes, followed by a comma. The comma just says, okay, I'm ready for the next string to concatenate. Now, by default I don't have a good way of getting the date — I can't just type it in — so I'll come down to the date functions, where there's one called utcnow. I click on it, and it gets added to the expression up top. So now — let me hit Enter again so it's easier to see — I'm taking the word product list and combining it with the result of utcnow. Then I'll add one more piece and tell it we also want ".csv" on the end. So we're doing product list, then utcnow (the date), then combining csv at the very end. This is pretty cool. There's no real good way to test it here, but we can click Finish, go back to our pipeline, and run Debug again. So I went over to the pipeline, hit Debug, it's running again, and in about ten seconds it's done. We click refresh, then go back over to our favorite tool, Azure Storage Explorer, refresh the window, and what do we get? The product file as a CSV with the date and the timestamp. Now, I don't really want the timestamp on the end. I could build a lot more on this — you might notice it says 617, not 117, and there's a function to convert from UTC time to my time zone, which would be Eastern Standard Time; I could do that and some other things too — but we're going to keep it really simple and just take the first 10 characters, just the date. Let's do that real quick, and then we'll move on to the next demo. This gives you a little introduction to expressions.
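So the expression we just built looks like this — utcnow() returns an ISO 8601 timestamp, something like 2020-12-11T16:17:52Z:

```
@concat('ProductList_', utcnow(), '.csv')
```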
So, back in Azure Data Factory, I go back into the dataset we were making dynamic and back into that expression editor. What I want is to bring back only the left 10 characters of utcnow. There isn't a left function, but there is a substring function — and for all you SQL gurus out there, you'll be very familiar with it. Clicking substring adds it at the top, but it got rid of utcnow, so let's go grab that next: I'll go back down — I could type it in, but I'll do it through the UI — and there we go, we have substring around utcnow. What I want substring to do is return characters starting from ordinal position 0 through the first 10 characters. If you've done this in SQL, the ordinal position would be 1, so that's an important difference to know. Then I need to add a comma at the end to separate the date from the ".csv." All right, we'll click Finish, go back to the pipeline, run Debug again, and see if it works. Once again, we could do a lot more with this example: we could convert from UTC time, we could parameterize the file name so instead of product list it could be products or employees or whatever we pass in — that's where parameters make this even more robust. Now, a comment from the chat: "if you download the file it will fail due to the period in the file name" — oh, is there a colon in there? Sorry, I can't see it. Whoops — that's a great one, that's from Peter. I've never even tried to download it like that, Peter, because I always get rid of the colon that comes from the timestamp anyway. But if that's the case and it essentially creates an unusable file name, you'd go back to the expression language, kind of like what I just did, and remove or replace those characters. Great feedback there — and sorry this other screen is a little small. All right, let's go check it out: back over to Azure Storage Explorer, refresh it again, and here we go — now we get the product list with just the date. This is beautiful. We've been able to combine the simplicity of Azure Data Factory and its UI-driven activities with the expression language, which, as you saw, is very similar to the expression languages in other tools — really easy to pick up and work with. The only thing I wish were there — and maybe it is and I just don't know about it — is a way to test the expression right there in the expression window, instead of having to run it and see the result after the fact. That's just not something I've dug into, but still, really cool what we can do there.
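To recap, the finished file name expression ended up like this — note that substring starts at position 0 here, unlike SQL's 1, and takes the first 10 characters, the date portion of the timestamp. (And if you did want to keep the time portion, something like the replace() function could strip out the colons Peter mentioned.)

```
@concat('ProductList_', substring(utcnow(), 0, 10), '.csv')
```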
Now let's take a look at data flows. Data flows, back in Azure Data Factory, are where we do the bulk of our transformation and data cleansing. Over in Factory Resources we're going to create a data flow, and what I want it to do is take the products and the product categories and merge those tables together. Now, you know and I know that both of those tables are in the same SQL Server database, so we could just write a query to do this — I know, I know. The more effective way, and what I'd recommend, is writing queries in SQL to do the joins and all the work there. But I want to show you how to use mapping data flows, so we're going to pretend the product category is coming from some Excel file and the product table from the database, and we want to merge them without loading the data into SQL — we have to use our imaginations here. So that's what we'll do: take product category and product, combine those tables so our products carry the category name and category ID, and use a couple of transforms along the way to get you familiar with data flows. One thing I need to do right now — and should have done a couple of minutes ago — is turn on Data flow debug at the top; I'll explain it in a minute when we have a little dead time, which we will. Now I go over to Data flows, click the ellipsis, and choose a new mapping data flow. Mapping data flows are your consistency with SSIS on-prem, but we also have something here called wrangling data flows. That's in preview, and there are some missing options, but wrangling data flows are essentially the Power Query editor inside Azure Data Factory. If you love that experience in Power BI and you've done a lot of ETL with the Power Query editor, you can now do it at scale. What do I mean, at scale? When you run a data flow in Azure Data Factory — mapping or wrangling — Microsoft takes the work you've done in the UI, spins up a Databricks cluster in the background, converts your work to Scala, and runs it against that Databricks cluster, and those clusters can handle massive amounts of data. It scales out; it becomes very scalable. Whereas in Power BI's Power Query editor you're kind of limited to the memory on your machine or what you have in the service, here in Data Factory you get massive scalability, which is awesome. We won't do wrangling data flows today, but I posted a video recently on my YouTube channel if that's something you'd like to explore. So: I click Data flows, choose mapping data flow, and click OK. Since this is a brand-new data factory, it tries to give us hints and tooltips on what to do; I'll click the X there. What I need first is my data sources. We'll have two: one for product, which we already have, and one for product category, which we need. So before we go any further, let's create another dataset for product category: New dataset, over on the right, under Azure, down to Azure SQL Database — there we go — and Continue. Up at the top we give it a name: azure_sql, then of course the name of the database, AdventureWorks, and then product category, the table within that database, in the Azure SQL connection I'm using. Then I grab my linked service, Azure SQL AdventureWorks, choose my table name — SalesLT.ProductCategory — and once again I will go ahead and import the schema from the database.
That brings in the metadata — the column names and the data types — and then I click OK down at the very bottom, and that creates the dataset we need. Now I can go back to my data flow, and we're going to add two sources: one for product and one for product category. Product first: down at the bottom I'll give it a name, SourceProducts, and for the dataset I'll grab the product dataset we already created. That's all I'm going to do here, though I could turn off "Allow schema drift" if I want to be metadata-rigid — it should always be the same columns with the same data types. Leaving it on gives you more flexibility if the source ever changes, so it depends on what you want; I'm going to turn it off. Then I'll click on the other source and call this one SourceProductCategories, and for its dataset we connect to the product category dataset we just created, which brings in all the information from that table. Once again I'm going to turn off allow schema drift — I don't want it on for this example. So now we have two sources, and we want to join them together: a join if you're in SQL, a VLOOKUP if you're in Excel, a merge if you're in Power BI's query editor. The way I do that is to click this little plus button, which is where you add additional transforms. This is very different from Visual Studio, and different even from pipelines: you don't get a toolbox of transforms you can drag in; you add them inline like this. So I click the plus button and bring in a join. I wish we could dive into all these different transforms, because they're super cool — there's derived column, select, aggregate, surrogate key if you just want to create an index value — but we'll grab the join right there. What I'm telling it is to join products to product categories so we can get the category name and the category ID. Down at the bottom — let me pull this up a little — we choose the type of join. I won't get into what the different types mean, because of time, but I'm choosing left outer: keep everything from the product table even if it doesn't have a match in the category table. Then the join condition will be on the product category ID — I know this data, so I know that's the key. So we'll grab that... and that's kind of weird... oh, I have to choose my right stream, sorry: the left stream, the first input, is products; the second input is product categories. I missed that. Then we select our column, ProductCategoryID, and our join condition is done. Now, in SSIS, if you were doing a lookup, you'd do the join and then choose which columns you wanted. Here it's a little different — it's a separate operation. We've done the join and we have 22 columns; next I want a select operation that gets rid of all the columns I don't want.
So once again, zooming in, we click the plus icon, we get the list of all the available transforms, and the one we want is called Select — very simple. There are two things we can do with Select: get rid of the columns we don't want, and rename columns, giving them aliases, getting the data better prepared for the final destination. With the Select selected, I'll come down to the bottom, drag the properties window way up, and start deleting columns I want to get rid of. Down near the bottom we have things like ModifiedDate — I don't care when the product categories were last modified — and I don't care about the rowguid. I do care about the category name; we want to keep that one, so I'll alias it right here and call it CategoryName. I also want it near the beginning of the file, so I'll grab it, pull it all the way up to the very top, and move it up one more step. Then let's get rid of some other stuff we don't care about: I don't care about ParentProductCategoryID, and I don't care about ProductCategoryID from the categories side, because we already have it from product, so that goes too. This is what you do: come in here, keep the information you need, get rid of what you don't — and I do want ProductCategoryID, so I'll drag that to the top. So now we have about 10 or 12 columns, and that's what will show up in the final destination when we're done here in a little bit. That's what the Select transform does. You'll also notice we can preview our data here, which shows a preview of what we've done — but that only works if you've turned on data flow debug. I'll talk more about data flow debug in a minute, because we're going to have a pause where it makes sense, but you only get this option if you've spun up a Databricks cluster in the background for debugging purposes. So: we've brought in our sources, we've done the join, and we've gotten rid of the columns we don't want. Now I want to use a Derived Column — in SSIS this was probably the most common transform, and I'm sure it will be here as well. With the derived column, I want to check the category name we brought over from the product category table: if the join didn't return a value — if it's null, unknown — we want to replace it with something else. We'll write a very simple expression in the derived column transform to make that work. So let's dive back in: I click the plus icon once again and choose Derived Column, under Schema modifiers, which opens this window across the bottom. My incoming stream is the last one, Select1 — that's correct. Then notice I can either add a new column or replace an existing one. I'm going to replace the value of the existing column, CategoryName, so I'll click the dropdown.
I grab CategoryName — luckily I moved it to the top, so it's right there — and then, under the expression box, I'll open what's called the Expression Builder. This is a little different from what you saw in the pipeline. You're going to see there are a lot of functions in here — I mean a lot — that you can use to clean your data and really curate it. I'm going to use a very simple one that handles nulls, and it gives you a definition if you hover over it — let me make sure that shows up on screen. It checks whether the value is null: if it's not null, meaning it returned a value, it returns that value; if it is null, it returns my alternate result. So I grab it, and it's added to my expression. Next I want to check the category name, so I go over to Input schema, which lists all the columns from my table, and bring in CategoryName. So: check CategoryName to see if it's null; if it's not null, good, keep the name; if it is null, return something like "Not Applicable." And that's how you come in here and replace null values — very simple. You can do this for all your columns, do those data cleansing operations, and really clean the data up. We click Save and finish.
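If I remember the data flow function name right, the finished derived column expression looks something like this — "Not Applicable" is just the fallback text I chose:

```
iifNull(CategoryName, 'Not Applicable')
```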
Now we need to create our destination. Once again we want to load this into Azure Data Lake and create a file there, but there's a bit of a limitation in how a data flow works versus a pipeline. Remember what a data flow uses in the background: Azure Databricks, and Databricks is managed Spark. Managed Spark is designed to work with big data, so when it writes data, it writes in parallel operations, and when it reads, it prefers lots of small files — that's what performs best. So if I add a sink destination here, by clicking that button and grabbing Sink at the very bottom, the default behavior is that I can't give it a file name: it says, don't give me a file name, we're going to create random file names so we can write in parallel and improve performance. That's good for big data, but for small data you can give the file a name — you combine the data back together before writing, effectively making it a single-threaded operation, and you get your name. So let's do that. We need a new dataset, but instead of creating one from scratch, I'll take the product list dataset from before, click its ellipsis, and choose Clone to make a copy. The copy opens across the top — let me make sure it's the right one — and I'll give it a better name than "copy," because this is really going to be my category list. With that dataset created, I need to clear out the file name, because the sink will complain about it — I don't have time to show you the error message, but it will complain — so we're going to get rid of the name, which leaves it blank for this file. (And we're almost done with this demo; then we've got Azure Synapse, and we're going to jump in and talk about it because it's really cool.) Now, back in my data flow, in that sink at the bottom, we set it up and configure it: the dataset is the category list, and I'll turn off allow schema drift — or leave it on, it doesn't matter here. Over under Settings, though, I'm going to make this output to a single file. When I do that, it tells me you have to set this up as a single partition — it can't be multiple files written in parallel — which can affect performance. We'll go ahead and do that; I could also have gone to the Optimize tab and set it manually. Now let's give it a name real quick: productcategory.csv. Let's recap what we've done in this data flow, because we're ready to go: we brought in two separate data sources and merged them so we get the category ID and the category name; we got rid of all the columns we didn't want with a Select, renaming a couple along the way; we used the derived column transform to clean up the data and remove those nulls; and finally we used the sink destination to load the data into Azure Data Lake. Now, how do we run this? You can't run a data flow by itself — you run it from a pipeline. So we'll create another pipeline real quick: click the ellipsis over Pipelines, create a new pipeline, and onto it I'll bring the Data flow activity and drop it in the interface. It asks whether I want to use an existing data flow or create a new one; we've already created ours, so we use existing, and I choose my data flow — I didn't give it a name, I should have, but it'll work. So we say, hey, let's execute that data flow. Then up at the top I tell it to run in Debug — we've already got data flow debug running, so let's go. This will take a minute, probably about a minute and a half, and I'll click into the background so we can see the output window.
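Inside the pipeline JSON, by the way, that data flow activity is its own activity type. A minimal sketch — the data flow reference name here is made up, since I forgot to rename mine:

```json
{
    "name": "Run Product Category Flow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference"
        }
    }
}
```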
So we have a moment to pause while this runs. What's going on? When you run a data flow, it runs against a Databricks cluster in the background, and Microsoft manages literally all of that for you — you don't have to go to Databricks, create clusters, and keep them running all the time for your pipelines. At execution time, Microsoft spins up a Databricks cluster in the background, runs your code against it, and once it's done you have your results and the cluster shuts down, so you only get billed for what you use. In order to debug and test this in Data Factory, though, I had to turn on Data flow debug at the top: that button is me manually turning on a Databricks cluster for testing purposes, and it's been running in the background. The benefit of turning it on when I did is that if I hadn't caught it about seven minutes ago, it would not be on right now, and we'd have to wait five to seven minutes for it to provision. So make sure you turn that on. This run is going against that debug cluster — it's queued, it should be done in the next few seconds, but it does take a little longer because of all the work happening against Databricks in the background. If you have a lot of data, of course, this is where you get significant performance improvement, because it can scale out across multiple compute nodes within Databricks. All right, we'll refresh this a couple more times and then go look in Azure Storage Explorer, and then over the next 15 or so minutes we'll talk about Azure Synapse and how it fits into the picture — that's the big topic in the world of Azure right now, the release of Azure Synapse into general availability. And I think we have a couple of questions: "Can you reuse data flows across workflows?" and "Are you being billed?" You are being billed when you turn on data flow debug, because it's spinning up resources and you're charged for them. I've never seen it cost a lot — obviously Microsoft spins up the smallest cluster it needs so you're not charged a lot of money — but you are charged. And can you reuse data flows across workflows? You can use them across pipelines, absolutely, and that's phenomenal: when you combine the parameterization capability, the expression capability, and the ability to allow schema drift so you don't have to know the schema up front — I can pass products into the source on one run, employees on the next, sales on the one after — it can be massively dynamic. Back in SSIS it couldn't be: you had a package with a data flow in it, and it was only used there. We at Pragmatic Works actually developed some software around making those SSIS data flows more reusable, but Data Factory has it right out of the box, which is awesome. Now let's go check for our output. I'll go over to the storage account this way this time and look at the containers — I probably dropped it in the products container, so let's refresh... actually, let me check where we put it, because we created a new dataset, so we have to look at the category list dataset... I might be in the wrong storage account; that's the problem with having a backup. All right, over here, refresh — I think this is the one we're looking for — there it is. We have our product category file: the products now carry the category name, with the null values cleaned up and all the columns we didn't want removed. All right, that's Data Factory. There's way more to it — I have at least 10 or 11 videos on Data Factory and I'm releasing new ones every month (not every week, but every month) — so check out my YouTube channel, or even better, our on-demand training at Pragmatic Works, where you can go through our intro and advanced courses; it's a lot of material. Now I want to take a look at Azure Synapse, because it's awesome. I've already provisioned it in advance — it's super easy to provision — but I did it ahead of time because I knew roughly where we'd be on time.
And we're actually running exactly where I thought we'd be, so we won't get into Azure Key Vault, but we will talk about Azure Synapse Analytics. If I go back over to my resource groups, I have my backup resource group called azure workshop learn with the nerds, and in there I've already provisioned an Azure Synapse workspace. So what is Azure Synapse? If you've noticed anything from this three-hour presentation, it's that any time I wanted to look at my storage account, or my database, or my server, or create a pipeline in Data Factory, or go build Power BI reports, we had to go to all these different windows — over here and over here — and it's really, really easy to get lost in the navigation experience. Azure Synapse is what Microsoft calls a unified analytics platform: I can build my Power BI reports, interact with my data lakes, build all my Data Factory pipelines, and write notebooks for Apache Spark, all in a central location, without ever leaving that screen. It just makes the overall experience a lot better: your data analysts, your data engineers, your data scientists all use the same workspace, working in the same environment without bouncing between all these different tabs, and Microsoft is going to continue to build on top of it. Everything we've done today with the data lake and our Data Factory pipelines across all those different windows, we can do right there in Azure Synapse. Now, there's a lot more to Synapse than this — this is a quick introduction. At Pragmatic Works we deliver a lot of Analytics in a Day events for Microsoft, a full-day event on Azure Synapse Analytics where you get a demo environment you can explore, so take a look at those Microsoft events and maybe join one if you're really interested. I'm going to click on the workspace I provisioned, and just like Data Factory, once you open Azure Synapse, you then click Open Synapse Studio. It's right in the middle of the screen now, a little hidden — it used to be up across the top when it was in preview, but it's not there anymore — so we click Open down at the bottom, and that opens a new tab, just like Data Factory. In fact, you'll notice the experience looks very much like Data Factory. We'll let Synapse Studio load... and here we go. This is the introductory screen, kind of like the home page. Let's talk about what you see and what you get with Azure Synapse and do a quick introduction to the different pieces. Over on the left is the navigation pane, very similar to Data Factory's collapsible pane, and inside it notice we have Monitor and Manage exactly like Data Factory, and Integrate for building pipelines, exactly like Data Factory — but there are a couple of other hubs here as well. In this Synapse Studio I've already built a couple of items we can look at, but first let's go to the Data hub.
Under the Data tab is where you link your different resources, and you'll notice right now I have two different Azure Data Lake accounts linked to this Synapse environment — two, and I could link more: I could come up to the top and add additional data lakes. You'll also notice, interestingly enough, that in the Data section you can create datasets. These datasets are for use in Data Factory pipelines — this is where you'd create them if you wanted to do it in Synapse — and you can see I've created one for an Azure data lake (that's the great thing about my naming: I can tell it's an Azure data lake) pointing to my product file, and one for the SQL table with the products. So the two datasets are already created right there. Here's another really awesome thing about Azure Synapse: you can run queries on demand against files sitting in your data lake, so you can do data exploration and understand what those files are and what's in them. For example, let's see what we have in these data lakes: if I go over to products — hopefully I have something in this one — there we go, one file in this container. I go into the data lake, into that container, and there's a productlist.csv file. If I right-click on it and choose New SQL script, I can write a SQL script against that file and get a quick idea of what's in it, without having to leave — without going over to some other tab like Storage Explorer or some other window in Azure. I can run a select of the top 1000 rows, and I'm running it against the built-in SQL on demand instance. In Synapse you will get charged for that — for the compute in the background, the CPU, the memory, the resources that run this — but here it is: I can see that file, live and in action, right here inside Azure Synapse. That's pretty cool.
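The auto-generated script queries the file over its URL with OPENROWSET; it looks roughly like this — the storage account name here is illustrative, and the header option relies on the 2.0 parser:

```sql
SELECT TOP 1000 *
FROM OPENROWSET(
    BULK 'https://azureworkshoplwtn.dfs.core.windows.net/products/productlist.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result]
```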
It created a SQL script, and if I wanted to save it I could, but I'll close it with the X button and discard the changes. I could also right-click that product list and do something like New notebook: if you want an Apache Spark notebook directly over that file sitting right there, you can tell it to create a new notebook and load the file to a data frame, and start working with it — writing Python, C#, Spark SQL, or Scala. You can also create a new Spark table, and you can also create a data flow — yes, the same data flows we were just looking at in Data Factory, the same type that does the same thing — using this file as a data source. Or you can just say, hey, I want a dataset for my Data Factory pipeline: instead of going through all that work, you right-click, and the file becomes a dataset you can of course modify after the fact. Really awesome functionality that just lives there for the files in your data lake. So that was the data lake section, plus the couple of datasets I've already created; we'll come back to those. Under Develop is where you store all the SQL scripts you want to save in your Synapse Studio, all the notebooks running your Apache Spark jobs, the data flows for your Data Factory pipelines, and your Spark job definitions. You can also Browse gallery, which takes you out to a bunch of different templates — templates for SQL scripts, notebooks, data flows — and you can leverage them, which I always find to be a great thing: templates show you what other people have done, and you can either reuse them or dissect them and do some of it yourself. You get there by clicking the plus icon from the Develop tab. We won't do anything with Develop today, but that's where it is. By the way, here's something that wasn't in the preview but is there now: you can set up your code repository for Azure Synapse and save all the work and code you do directly to it, whether that's GitHub or Azure DevOps. Then over under Integrate, we can build all our pipelines, once again exactly like we did in Data Factory. Then there's Monitor, where we can look at all our pipeline runs, all our scheduled jobs, and our runtimes, to see how performance has been and what the resource utilization looks like — we didn't get into that today, but the monitoring is very easy to understand and very robust; very good job by Microsoft there. And then we go over to Manage. Manage is very familiar: it's the same Manage tab we had in Data Factory, with a couple of extra options because we're in Azure Synapse. Hopefully what you're seeing as we go through this — and I'll run a data flow or Data Factory pipeline in just a moment — what you're deriving from this is that Microsoft is taking a lot of functionality that already exists in Azure and giving it to you in a central location, a unified development experience, so you don't have to do what we've done for the last two and a half hours: go back to Azure, click a link, open a new tab, go back to your resource group, open another tab. In this Synapse workspace I have all these different hubs on the left-hand side — the Home hub, the Data hub, the Develop hub — and if I want to build a pipeline I go to Integrate; if I want to look at my data lakes I go to the Data hub; if I want to develop SQL scripts and notebooks I go to the Develop hub. We don't have to go all over the place. You can also build your Power BI in here. Now, I'm on a sponsored Microsoft account, not my Pragmatic Works account, so I can't quickly log in and do that; but under Manage, if I go over to Linked services, you'll notice I've already created a couple of linked services in preparation for this event.
If I click New, you'll notice right across the top it says you can connect to Power BI. I could connect to Power BI with my Pragmatic Works account if I were in that subscription, connect to a workspace, and build Power BI reports right here in the Azure Synapse workspace. Once again, there's obviously more to Synapse than what I'm covering in a 15-minute run-through demo at the end, but it ties back into everything we've done today, and then there are additional capabilities on top of it. This is really cool, really exciting, and it's a big release from Microsoft that just hit general availability this month, in December — I think it was December; maybe it was the end of last month, but I think it was this month. So what are we going to do? I'm going to show you how to run a pipeline real quick, show you the work I've done, and then if there are any common questions we need to answer, we'll answer them; if not, we'll wrap up this event. Over on the left, under Manage, what I've done is click New and create two connection managers: one to my Azure Data Lake, learning with the nerds, and one to my SQL database. Those are the two linked services I created — if you remember back to Data Factory, I went through the same process there. I see a question from Brian; I'll take a look at it in just a moment, once our pipeline runs. Then we have a couple of datasets, and datasets do exactly what they did in Data Factory: I click the ellipsis, choose New integration dataset, and go through the same process we did in Data Factory — the experience is identical. So we've created our connection managers and our datasets; now we go over to Integrate, and under that hub we go into Pipelines and create a new pipeline. Let's see — Ricardo asks, can you execute ML here? What you can do with ML is going to be through the Develop tab: there you create your notebooks, and — I think you don't have R as a primary language right now, but you do have Python — so for any of the libraries out there, you can install them and run that code. And the other question: can you connect Azure Synapse or Azure Data Factory to an on-prem GitHub repo? I don't know about that one; I haven't tried on-prem GitHub. Actually, I barely use GitHub at all — I've only done a couple of simple examples, I always use Azure DevOps — so unfortunately I'm the wrong guy to ask about the on-prem version of a GitHub repository. I know you can do it with the cloud version; I'm just not sure about on-prem, so I apologize for that one. So, let's create a new pipeline real quick — and you'll notice I've already got one that will run here in just a moment. When I create the new pipeline, I can come over here and say, hey, I want a copy activity, connect it to my datasets, and we're done. The one I built already does all the work, but let's change something first: I'll go back over to Data, into our Azure data lake product dataset, and give the file a new name — I'll call it "live event," since this is our live one.
All right: are you charged for the pieces, or for the complete package? If you're talking about Azure Synapse, they're all billed separately, the same way they're billed back in standalone Azure. For example, for a Data Factory pipeline execution you're charged for the data integration units used by that run, and even here that's still broken out: on your bill it's a separate line item for Data Factory data integration units. Your Azure Storage account, your data lake account, has its own line items too: one for egress charges when data moves between regions, and separate line items for writing and reading data. So you are absolutely billed and charged for the different pieces individually. That's simply the way the services are set up, and Microsoft can't really undo all of that and roll it together here. Great question.

So that wraps up everything we covered today. Real quick, I'll put my contact information on the screen; it's in the class files for you as well. Brian, if there are any common questions we've seen over and over again, let me know and we'll do our best to answer them. Anything we can't answer, we'll try to cover in a follow-up blog in the next week or so. We'll try to get it out quickly, but I know my schedule: I just came off a ten-day vacation, so I'm catching up on a lot of things and have a lot coming up in the next week or two. Still, I'll try to put together a question-and-answer style blog covering the questions we couldn't get to today, because the chat has been blowing up throughout this entire presentation, and Azure is a huge platform with hundreds of services. How does this affect me, my job, my company? All of those are things we'll talk about there.

Can you do auto scaling? You can for some services; it depends on the service you're using. For example, with Azure Databricks you can set the minimum number of compute nodes to two and the maximum number of workers to eight, and it will scale automatically as necessary: if you're not doing anything it sits at two workers, and it scales up to eight when needed. So can you do auto scaling in Azure? Absolutely, but it depends on the service; different services offer different capabilities there. Great question.
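To make that concrete, here is a hedged sketch of what that minimum-two/maximum-eight setting looks like if you create the cluster through the Databricks Clusters REST API (POST /api/2.0/clusters/create) instead of the portal. The workspace URL, access token, runtime version, and VM size are all placeholders you would replace with values from your own workspace.

```python
# Creating an autoscaling Azure Databricks cluster via the Clusters REST API.
# Workspace URL, token, runtime, and node type are placeholders.
import requests

WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                  # placeholder

cluster_spec = {
    "cluster_name": "autoscale-demo",
    "spark_version": "7.3.x-scala2.12",   # example runtime version
    "node_type_id": "Standard_DS3_v2",    # example Azure VM size
    # The autoscale block is the setting discussed above: Databricks keeps
    # the cluster at 2 workers when idle and scales up to 8 under load.
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

resp = requests.post(
    f"{WORKSPACE}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```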
Sam asked: are these batch processes, and do we schedule them somewhere? Great question; let me jump back over real quick. Yes, they run in batches, and the way you schedule them, especially in Azure Data Factory, is under Manage. You can do this back in Data Factory or in Azure Synapse; in fact, let me do it in Data Factory, because Synapse might be a little different there and I want to make sure I can get this done in the one minute we have. Under Manage I go down to Triggers and create what's called a trigger, which is a schedule. When I click New, a window opens up, and I'll name it something like "daily schedule", tell it to run on a schedule, give it a start date, and then tell it how often to recur. I want this to occur every day, so it's going to run daily, and then I tell it what hour: every day at 7 a.m., with no end date. And now I'm done; I've created a trigger. You saw how easy that was, super simple.

That's step one: create your scheduled trigger. Step two is to go back to the pipeline and attach the pipeline to that schedule. I go back over to the Author tab, find a pipeline like "copy product list", and across the top I tell it I want to add it to a trigger: I click Add trigger, then New/Edit, and choose the schedule I want to attach. It shows you all the properties again in case you want to change them; I don't want to change the original schedule, so I click OK, and OK again, and that's it. You'll see up here, highlighted on the screen, that the pipeline is now part of a schedule, so it will run every day at 7:00 a.m. So first you create the schedule, then you come back to the pipeline and add the trigger; that's how you schedule them in Azure Data Factory. Great question; thank you, Sam.
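Those same two steps can also be scripted. Below is a hedged sketch using the azure-mgmt-datafactory models, reusing the `client`, `RG`, and `FACTORY` placeholders from the earlier sketch; the trigger and pipeline names are likewise placeholders, and the pipeline attachment from "step two" happens through the trigger's `pipelines` property.

```python
# Step 1: define a daily 7:00 AM schedule trigger.
# Step 2: attach it to a pipeline via TriggerPipelineReference.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    ScheduleTrigger, ScheduleTriggerRecurrence, RecurrenceSchedule,
    TriggerPipelineReference, PipelineReference, TriggerResource,
)

recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,                          # every day
    start_time=datetime.utcnow() + timedelta(minutes=1),  # start date
    time_zone="UTC",
    schedule=RecurrenceSchedule(hours=[7], minutes=[0]),  # at 7:00 AM
)

trigger = ScheduleTrigger(
    recurrence=recurrence,
    # Step two from the walkthrough: associate the pipeline with the trigger.
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="CopyProductList"),
        parameters={})],
)

client.triggers.create_or_update(RG, FACTORY, "DailySchedule",
                                 TriggerResource(properties=trigger))
# Triggers are created stopped; start it so the schedule takes effect
# (older SDK versions expose this as triggers.start instead).
client.triggers.begin_start(RG, FACTORY, "DailySchedule").result()
```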
All right, that's going to wrap it up. Thank you for all your questions, and thank you for attending this event by Pragmatic Works. We love doing these events; they're always fun, and we love giving back to the community, so let us know what you thought. Remember, we have an event coming up in just two months with Brian Knight, where we'll be doing a three-hour session on Power Apps, and I don't know if there's anything cooler than Power Apps, so make sure you sign up for that. Until next time, enjoy!

Info
Channel: Pragmatic Works
Views: 139,830
Keywords: azure data services, microsoft azure, azure services, what is azure, iaas, paas, saas, what is azure data services, azure components, azure storage, blob storage, data lake store gen 2, azure sql database, azure sql, azure data factory, azure synapse, microsoft azure cloud services, azure full course, new to azure, intro to azure, azure database, azure data services and storage, azure data services overview, azure data services architecure, azure overview, azure demo
Id: 6bM8LPsVoic
Length: 179min 5sec (10745 seconds)
Published: Thu Dec 10 2020