Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory

Captions
[Music] Hello and welcome back. My name is Mitchell Pearson. In this video we're going to take a look at creating secure connections in Azure with Azure Data Factory and Azure Synapse: specifically, creating secure connections over to our Azure SQL Database and over to our Azure Blob Storage accounts. If you like these types of videos on Azure Data Factory, DAX, Power BI, and Power Apps, take this moment to hit that like and subscribe button down below. And if there are specific videos that you're interested in, let me know in the comments. I'm always looking for some inspiration for future YouTube videos as well.

So today we're going to be looking specifically at managed virtual networks inside of Azure Data Factory, and then private endpoints. What this does, as a very quick high-level understanding, is give you a way of creating a secure connection between Data Factory and the different data stores that you connect to. The inspiration behind this video is that recently I was doing a three-hour introduction to Azure video on the Pragmatic Works YouTube channel, so you can go back and take a look at that. One of the questions I got from the thousands of people who were on that live stream was: how can I securely connect from Azure Data Factory to my Azure SQL Database? There are other ways of creating a connection here: you can turn on Allow Azure services, or you can try to add the IP address that is being used to connect. But those are really not going to solve the problem of creating a secure connection, where all of that data is transferred on the Microsoft backbone and moved securely.

Full transparency here: I am not a networking expert, I'm not a networking guru. I've used virtual networks in Azure quite a bit between my virtual machines and my different
resources to lock those down, and there's a lot of work that goes into that. So what we're talking about today is virtual networks, yes, but specifically managed virtual networks within Azure Data Factory. These managed virtual networks take away a lot of the decision making and a lot of the administrative overhead that you would have if you were managing the virtual networks yourself. This is really cool.

We're also going to take a look at Synapse Analytics, because the original setup and configuration is a little bit different than it is here. Once you've got it set up you're good to go, but there is something that we need to take a look at in Azure Synapse. In fact, in many of my videos going forward on Data Factory pipelines, datasets, and data flows, I'll probably be doing those in the Synapse interface instead of the Data Factory one, because I think long term that's where everything is going.

So let's take a look at the problem that we're talking about here. I am inside of Azure Data Factory right now. If I go over to the Manage ribbon and open up one of my SQL connections, this one right here, Azure SQL to an AdventureWorks database that I have, you'll notice that I am creating a connection to my pwmitchellpearson.database.windows.net server and to my AdventureWorks database. I put in my username and password, I come down to the bottom and click Test connection, and I get a failure. This is very common; you run into this a lot when you first start with Data Factory. If you look at the details, the error message essentially says: we cannot connect to that server, either because your username is incorrect or, down here, "cannot open the server". It's essentially a firewall issue, and this is your first line of defense
when you're working in Azure. We want to put firewall rules in place so that if you're at some coffee shop, say you're at Starbucks drinking coffee and you accidentally leave your computer open, somebody can't just log in or try to authenticate to our Azure service, because we don't allow those random IP addresses to connect to our Azure SQL Database. That's what this is: we're trying to create a connection to Azure SQL Database from Azure Data Factory, and Azure SQL Database says no, we're not going to have any of that. So I cannot connect.

Now, how do we fix this without managed virtual networks and without private endpoints, in a way that's a little less secure? Either (a) we can take this IP address and add it to the firewall rules of that SQL server, or (b) we can turn on Allow Azure services across all of Azure. Let's go take a look at that real quick. I'm going to talk briefly about both of those and why we won't do them, and then we're going to jump straight into setting up the managed virtual network. I want to make sure I'm explaining the background a little bit, because for most people watching this is probably brand new, and you need to understand the why: why am I going to use a virtual network that's managed by Microsoft?

So we're going to go back over to my YouTube resource group and open up our server, which is right there. In our server we're going to take a look at Firewalls and virtual networks. I'm in my pwmitchellpearson SQL server (pw stands for Pragmatic Works), and down here I have my firewalls and my virtual networks. What we're going to do here, making sure I'm not in the way on the screen, is we can come in here and we
can add an IP address. We can create a rule that says the IP address that's trying to authenticate from Azure Data Factory is allowed, so I could put in that IP address we saw just a moment ago. The problem is that it is a dynamic IP address that can change every time Azure Data Factory tries to authenticate to my SQL server. So if I put it in right now, yeah, it might work for that one test connection, but later on it's not going to work. That's not a good option here.

The better option, if you want to make sure that connection always works, is to come up here and turn on "Allow Azure services and resources to access this server". That will work, and that's what a lot of people do: they turn that on and think everything is right with the world. However, what does this mean? It means that any Azure resource anywhere in the world, in any tenant, in any organization, can reach your SQL database. They can get through that first line of defense and try to authenticate. So if you have this enabled on your server, and I'm at some other company, company XYZ, on a VM within the Azure network, I could open up Management Studio and try to connect to your SQL database. I could just start typing in usernames and passwords and hope that I get lucky. Obviously more sophisticated hackers have a better program than that, but that is what this means. So a lot of companies say: look, we cannot open this server up to all Azure services; that's not a good option.

So adding the IP address is not a good option, and turning on Allow Azure services is not a good option in production, where we have federal regulations, security risks, and lawsuits that can be brought against us if our data gets leaked. Then what do we do? Well, short of letting somebody (not me) who really knows
networking backward and forward set up all this stuff and manage it, we can take advantage of managed virtual networks and lock all of this down. This is awesome, and it's really easy to set up and configure. So that's the background on what we're doing; let's jump in now and take a look.

We're going to go back over to Azure Data Factory. You notice right there that the connection doesn't work; it's going to work here in a little bit. There are a few steps we need to do. One, we need to create an integration runtime in Azure Data Factory that runs inside a managed virtual network. Two, we need to create a managed private endpoint for that virtual network, so we essentially create a private IP address. Three, we need to go over and approve it: we go to the Private Link Center and approve that, because it will be sitting in a pending state. Once it's approved, we can create connections using it, and it's going to be awesome. So let's walk through this process together. When we're done, we're going to transition over to Azure Synapse and I'll show you a slight difference there as well.

The first thing we need to do is create an integration runtime. If you've worked with Data Factory before, you probably know what an integration runtime is: it is essentially the compute, the resources in the background used to move the data between different data sources. If you're moving data from a Blob Storage account over to Azure SQL Database, there are resources that need to be used (memory, CPU, and so forth), and your integration runtime is what moves that data. This is primarily how you get billed in Data Factory. So we need to create an integration runtime where that compute is running in a managed virtual network, so we're
going to create a new integration runtime here, and then we're going to choose "Azure, Self-Hosted", because it's going to be one of those; it's not an SSIS lift-and-shift scenario. We click Continue, then we pick Azure and click Continue again (the Continue button, by the way, keeps being blocked by me on screen, so we'll get rid of that). Then we just need to give this integration runtime a name. We'll call this our Azure IR, and then I'm going to call it managed VNet. I give it a name that makes sense, so that when I'm creating my connections I make sure I'm using this integration runtime and not the default. There's another thing I want to talk about around this default; I'm going to hold that until later in the video.

So for now we're going to choose the managed VNet. How do we turn this on? How do we create this within a managed VNet? That is right here: where it says Virtual network configuration, we're going to set this to Enable. By clicking that button, we're essentially taking this integration runtime and putting it into a managed virtual network that Microsoft manages. We don't have to worry about setting up the subnets or turning on certain service endpoints or any of that; this is going to be managed for us automatically.

So we enable that, and then we turn on this option right here, which says "Enable interactive authoring capability after creation". What does this mean? It means exactly what you see right there: it applies whenever you are testing a connection to your linked services, browsing a folder, setting up a dataset, previewing data, importing a parameter, or doing any of those things. Because we're using an integration runtime within a managed virtual network, there's compute
that's assigned to that. What this option says is: if you want that to be more efficient and happen quicker, go ahead and enable interactive authoring. Microsoft will essentially provision compute for you, so that when you're developing and you click on the preview button, or you browse, or you test a connection, it happens a lot faster than waiting for those resources to be provisioned.

It's very similar to data flow debug. Remember, when you're working in data flows, and I've done some videos on this, you enable the data flow debug capability, and what data flow debug does is spin up a Spark cluster in the background, so that when you're doing things like previewing data in data flows you don't have to wait five or six minutes for a cluster to be provisioned. That's kind of what this is doing here. It has a time to live, or automatic termination, of about 60 minutes. So once you get in here and start doing work, it will automatically provision some compute in the background that you can work with, making development a lot faster, cleaner, and easier, and then after about 60 minutes it will automatically de-provision that and get rid of it, so you don't have to worry about it. That's what that means.

The next thing I want to do is choose a region. Everything I do is always in East US 2, so it keeps it very simple. There is another option here around the data flow runtime. I have done a YouTube video on this, really on optimizing debugging: when you're in development, how can you reduce cost? I showed how to create an integration runtime specifically with limited resources for the data flow runtime. I'm not going to worry about that here; that's a topic for another day.

Now, this is it. Once we have done this and we click Create, what's going to happen is it's going
to go through and create that integration runtime, and it's going to create it with that managed virtual network, as you see right here. If we zoom in, you'll notice that we have a managed virtual network; that's what this IR uses, which is awesome. So the IR has been created, and it looks like it is now running, which is perfect. But we're not done.

Once you've created that managed virtual network through Azure Data Factory, which is really easy, the next thing we need to do is go over to Managed private endpoints and create a private endpoint that allows us to connect Azure Data Factory to our Azure SQL server. In here we're going to create a new one. That opens a pane over here on the right, and from that list we choose Azure SQL Database, so we're creating a private endpoint, a private IP connection, to Azure SQL Database.

Then we need to give this a name. [Music] If you've ever seen me name a resource in Azure Data Factory before, I like to say what kind of data store it is: is it on-prem or is it Azure, it is SQL, and then the name of the database we'll connect to, which here will be AdventureWorksDW. That's fine. Then we need to connect to the actual database, so we choose our server, and my server for this one is pwmitchellpearson. Once we've done that, we go down to the very bottom and click Create.

After creation, a private endpoint request will be generated. This is where a lot of people get tripped up: a request will be generated that must get approved, and there are two different places you can go to approve it. I'm going to show you both of them. So we're going to go ahead and click Create. So, creating the managed
private endpoint here, remember, was step two. Step one: create a new integration runtime within a managed virtual network. Step two: create the managed private endpoint. Step three: once it's provisioned, we have to approve it. We're going to let that finish up; this could take a few seconds, and I'll come back in a moment.

All right, I'm back. You notice that it says the approval state is Pending, which means it needs to be approved. We can approve this in a couple of different places. Obviously you need to have the right level of permission within Azure to do this, so it might have to be somebody else who approves these private endpoints for you. Back in Azure, we can either go to the Private Link Center, because we're essentially creating a private link between these two resources, or we can go directly to the server that we just connected to.

Let's go to the Private Link Center first. I'll go to All services and search for Private Link. We go into the Private Link Center, and under Active connections... is it approved already? It shouldn't be. I'm thinking there's some residual from when I did this before. Let's go back into the Private Link Center. There it is: Pending connections. Sorry about that, I went directly to Active connections. Pending connections is where you can select it and then either approve or reject it. So this is where you can approve it. Doing it in the Private Link Center is great because you can see all of your private endpoints across all of your resources in one place. However, I'm not going to approve or reject it here. Optionally, what we
could have done is go back to our YouTube resource group right here, go over to our server, and on the server go to Private endpoint connections. Right here I can select it and approve it, so that's what we're going to do. We'll approve that connection real quick; I don't want to give it any kind of response message. That might take a moment, so thanks to the magic of video editing we're going to pause, and I'll pop back in here in just a second.

All right, I'm back. It says that the connection has now been successfully approved, so we can go back over to Azure Data Factory and try to refresh this. If I remember correctly, it might take up to a minute to show up here, but let's go ahead and refresh. It is going to take a moment.

While we wait, I want to create another connection. This shows you how to create a private endpoint to Azure SQL Database; what about creating a private endpoint to our Blob Storage account? So we're going to create one more very quickly, over to Azure Blob Storage. I'll click Continue and give this a name: this will be Azure Blob, and then the name of my storage account, which is just mitchellpearson. Then I'll grab my mitchellpearson storage account and click Create. We'll click Refresh here to see if the first one is approved yet, and it's not.

So we're now creating another private endpoint, and here's the thing to keep in mind with these private endpoints: if you have 10 different services you're connecting to (Azure Cosmos DB, Azure SQL Database, Azure Blob Storage), you're creating an endpoint for each one of those, and each needs to be approved. But once both of these have been approved, from the Private Link Center or from the individual resources, we will be able to create
linked services in Data Factory (essentially connection managers) between those resources, and everything is done on the Microsoft backbone using private IP addresses, locked down and secure. That's what we're going for here.

We'll refresh it. It looks like this one's still pending and that one is still provisioning. Before I pause the video, I want to go over to the Blob Storage account. In your storage account you can also approve those private endpoints, just like we did with the SQL server. Where we go for that is Networking, right over here on the left, and then we look at Private endpoint connections. When I go in there, you'll notice that we have one right here that has been provisioned and it says it's pending. We click Approve and then Yes.

This is a really good point for me to pause the video and let all the approvals go through. Then we'll come back and take a look at what those connections look like, because we're not quite done: we have to create the connections, and when you create them you have to remember to use the right integration runtime. We'll be back in just a second.

So we're back. Both of our connections have been approved, and we are almost done with this video. We now need to create our linked services, and they have to use the integration runtime that's tied to the virtual network that's tied to these private endpoints. There's a lot there, right? Let's do that real quick. We'll go over to our linked services. Remember, before, we were trying to create a connection. If we look at one that already exists, we had a connection here that we were using before, right here, and remember,
we couldn't connect. So I've done all the hard work, right? If I go in there, open that up, come down here and test the connection, it's going to work, right? Of course it's going to work, because we just did all that work. And then you look and say, wait a minute, Mitchell, that's not working; we're getting the exact same error message that we were getting before. So what's going on?

The problem is not that our virtual network isn't working; it's that we're not using the correct integration runtime. We're using the automatically created integration runtime that was created when we first created this Azure Data Factory, and that's not what we want. So what do we do? There are a couple of options here, and this is what I said we'd talk about later. One is to just switch this over to the integration runtime we just created in the managed virtual network. It says interactive authoring is enabled (you can look more into that later), and everything here still looks like it's configured correctly. Now we come down to the bottom, click Test connection, and boom, it works. This is awesome.

So there are a couple of things here. We've got to make sure we use the right integration runtime. But here's the thing, and I'm going to cancel out of here: if I go back over to Integration runtimes, it's going to be very easy for a developer, especially a new developer, to accidentally keep grabbing this default one, and then you're wondering what happened. Fortunately, when you provision Azure Synapse or Azure Data Factory today, you can set it up so that your default integration runtime is in a managed virtual network. This is an older data factory that I've had sitting around for months and months, so I'm kind of stuck with the integration runtime that's there; you'll notice that even if I hover over it, you can't delete it. So on new data factories, when you provision
them, when you create them, you can put the default integration runtime into a managed virtual network, and then you only have one option here. That's pretty cool, and you'll see that option under Networking; it looks exactly like it looked here. This was pretty cool because this was a data factory that started off without a managed virtual network; we created one, and you saw how it worked.

Now let's look at another linked service, because we do have some time and I want to talk about it: how do we create a secure connection to our Blob Storage account? If we go over to our storage account, go into Networking, and look at Firewalls and virtual networks, most storage accounts by default are open to all networks, meaning all IP addresses. That is once again a security risk, because it could be easy for someone to try to authenticate to your storage account and get access to your data. What you can do is turn this off and set it to Selected networks only, so that it is not all networks. Then, instead of adding any kind of firewall rules here, we just use the private endpoints to connect to the Blob Storage account.

By the way, this is a really good time to mention that it is recommended that you use private endpoints to create those secure connections between all of your different resources, so that you have the best security you can have: not opening up all the IP addresses, not allowing all Azure resources, but using those private endpoints. That way the first line of defense is really strong, and it's hard for somebody to even get to the point of trying to authenticate. I wanted to show you what that looks like here, because it's easy for me to forget that I can manage those IP addresses at the Blob Storage account level,
but we can.

Now, we've taken a look at Data Factory, and I promised you we would go look at Azure Synapse, so that's the next part. Back in my YouTube resource group, we're going to take a look at a Synapse workspace. I provisioned this Synapse workspace right before this video, and this is where you're going to see something that's different with Azure Synapse than with Data Factory, and it's very important. Let's launch Azure Synapse Studio for our workspace. When we get in there, I'm going to show you that, the way it is currently set up and configured, and I just created it today, I cannot create an integration runtime that is in a virtual network. I'm going to explain why, because this is something you have to decide at the time you are provisioning.

If I go down to Manage, just like we did in Data Factory, and then over to Integration runtimes, you'll notice that this one has a Public sub-type; it's not in a managed virtual network. If I say, well, let's create a new integration runtime to lock this down and make it secure for all of the pipelines I'm building in Synapse, I can click New, click on Azure, click Continue, and then over here you'll see that I don't have the ability to enable Virtual network configuration. It is disabled. So what's going on? Why can't I do it? Is it something I can do in Azure Synapse, or is there something that's wrong? Unfortunately, the reason we can't do this here is that something is wrong with how the workspace was provisioned.

Let's create a new resource and walk through the basics real quick so I can show this to you. When you are provisioning an Azure Synapse workspace, and we're going to go into Create here and skip most everything, we'll go to Networking. Under
Networking you have the ability, right here, to enable a managed virtual network. If you read all the text here, you might think: well, if I don't enable the managed virtual network when I create the workspace, that's okay, I can do it later. In fact, if you read right here, it says to allow connections from all IP addresses to your workspace endpoints, and that you can restrict this to just Azure datacenter IP addresses and/or specific ranges after creating the workspace. But that has to do with just IP addresses. Then it says to choose whether you want a Synapse managed virtual network dedicated for your Azure workspace. Once again, there's nothing here that really says whether you can change it afterwards or not, so you might assume you can.

But if you click on Learn more, we go to the incredible Microsoft documentation. I say that with all seriousness: when it comes to Azure and Power BI, I give Microsoft an incredible amount of kudos. They do a great job keeping this documentation up to date. If you come in here and scroll down just a little bit, you'll notice that you cannot change this workspace configuration after the workspace is created. For example, you cannot reconfigure a workspace that does not have a managed workspace virtual network associated with it. So if you create it without a managed virtual network, you cannot add one to that workspace after the fact. Likewise, you cannot reconfigure a workspace that has a managed workspace virtual network associated with it and disassociate it. So if you create it with the virtual network, it's there, it exists, it lives; that's just the way it is, and you can't get rid of it. And if you create it without it, you can't add it.

I don't know if this is going to change in the future. But what I do know from reading this right here is that, when you provision your Azure Synapse workspace, you want to make sure you enable the managed virtual network at creation time if you're going to be locking this down, which is recommended. And as for managing it yourself:
remember, one of the things that we should point out here... This was a rather lengthy video for me, probably going to be closer to that 30-minute time frame; I do a lot of 10-minute videos. One of the things to keep in mind is that you can do a lot of this virtual network stuff yourself. You can manage it yourself: you can create the private endpoints yourself, you can create the subnets, you can do all of that. There's just a larger learning curve involved. That is not an area I personally feel super comfortable with, even though I have done it, because that's not my background. It's not something I get excited about. I get excited about Power BI, DAX, Power Apps, Azure Data Factory, building and developing stuff. I love it; that's why I come up here and work on the weekends or before work, because I love learning that stuff. I do love learning about the networking stuff too, but I don't get fired up about it. It is something I'm going to be spending more time learning, though, because I feel like that's a bit of a weak point in my armor, so I want to learn it.

I hope you enjoyed this video. Give me some comments; if you liked the video, hit that like button, let me know what you thought, and let me know how I can help you with other videos going forward. Like I said before, I'm always looking for inspiration for these videos. Thank you, and have a good day.
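
The video does everything through the portal UI, but the three pieces it walks through (an integration runtime in a managed virtual network, a managed private endpoint, and a linked service pinned to that runtime) can also be written down as resource definitions. The sketch below is not from the video. It builds the `properties` payloads as plain Python dictionaries, following the ARM JSON shape for Data Factory as best understood; the field names should be checked against the current Microsoft.DataFactory reference before use. The runtime name, resource ID, and connection string are hypothetical placeholders, and the managed virtual network name `default` is an assumption.

```python
import json

# Assumption: the factory's managed virtual network is named "default".
MANAGED_VNET = "default"


def azure_ir_in_managed_vnet(region: str) -> dict:
    """Step 1: an Azure integration runtime placed in the managed virtual network."""
    return {
        "type": "Managed",
        "typeProperties": {"computeProperties": {"location": region}},
        "managedVirtualNetwork": {
            "type": "ManagedVirtualNetworkReference",
            "referenceName": MANAGED_VNET,
        },
    }


def managed_private_endpoint(target_resource_id: str, group_id: str) -> dict:
    """Step 2: a managed private endpoint to one target resource.

    It stays in a pending state until someone with permission on the
    target approves it (step 3 in the video).
    """
    return {"privateLinkResourceId": target_resource_id, "groupId": group_id}


def azure_sql_linked_service(connection_string: str, ir_name: str) -> dict:
    """Last step: a linked service that explicitly references the managed-VNet runtime."""
    return {
        "type": "AzureSqlDatabase",
        "typeProperties": {"connectionString": connection_string},
        "connectVia": {
            "type": "IntegrationRuntimeReference",
            "referenceName": ir_name,
        },
    }


def uses_runtime(linked_service: dict, ir_name: str) -> bool:
    """True if the linked service is pinned to the named integration runtime."""
    via = linked_service.get("connectVia") or {}
    return via.get("referenceName") == ir_name


if __name__ == "__main__":
    ir_name = "AzureIR-ManagedVNet"  # hypothetical name
    sql_server_id = (  # placeholder resource ID, fill in your own values
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Sql/servers/<server-name>"
    )
    payloads = {
        "integrationRuntime": azure_ir_in_managed_vnet("East US 2"),
        "managedPrivateEndpoint": managed_private_endpoint(sql_server_id, "sqlServer"),
        "linkedService": azure_sql_linked_service("<connection-string>", ir_name),
    }
    print(json.dumps(payloads, indent=2))
```

The `uses_runtime` helper reflects the failure shown near the end of the video: a linked service that does not reference the managed-VNet runtime falls back to the factory's default runtime and hits the same firewall error, so a check like this could flag linked services that were left on the default by accident.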
Info
Channel: MitchellPearson
Views: 7,237
Keywords: Azure Synapse, Azure Data Factory, Synapse Analytics, Data Factory, Virtual Networks, Private Endpoints, Azure Security, Private Link Center, Azure Links, Azure Private Links, Mitchell Pearson, MitchellSQL
Id: TkLT4HWd558
Length: 30min 4sec (1804 seconds)
Published: Fri Jan 08 2021