Azure Purview - Understanding the Elastic Data Map

Captions
Hello, hello, and thanks for joining me on another Purview day. Since the preview of Azure Purview came out, there's been loads of interest, loads of excitement, people trying to do interesting stuff, but the one big question hanging over it has always been price. People were saying, "Oh, four capacity units turned on permanently" (and we'll get on to what a capacity unit is); it's been a little bit expensive. There's basically been a bit of sticker shock, with people looking at it and going, "I just want to try it out. I just want to get started and see if it's right for me. I want to use a little bit of it, and then if I like it, I'll use a lot more of it." But it didn't really have that ability to scale when it was first released.

Now, a couple of weeks ago, back in August, there was a big announcement saying they've introduced elastic scale for Purview, which is fantastic. Essentially, rather than defaulting to four of these capacity units, it will default to one and then scale as you use it. If you created your Purview account after that point, so recently, you don't need to worry. If you created your Purview account earlier (the 19th of August, I think it was), then you're going to need to either sit tight and wait, and at some point over September or October you'll be upgraded to the new version, or, if you are impatient like me, delete it and recreate it. I deleted my Purview account and recreated it so I can demo it to you today, because it's not just pricing. There have been some changes to pricing and scale, there are changes to some of the UI and where things are kept, and there are some changes around how we can do some security stuff. There are also one or two things that snuck in that let us get a deeper look at what Purview is actually doing behind the scenes.

So we've done that. I've got my nice fresh Purview account. I then realized that by deleting my Purview account I've lost all my previous demos, but that'll be fine, we can deal with that. So yeah, we're going to have a look at the new
pricing concepts, make sure we understand how it's going to charge us, and then take a look at the new UI bits and pieces. If it's your first time here, don't forget to like and subscribe. Stay awhile, listen, and pop something in the comments to let me know: is this useful? Are you going to use Purview now the pricing has changed? Is that the deciding factor that's made you re-evaluate it? Or were you happy with the price, because you've been paying insane amounts for other enterprise-scale governance tools and it doesn't actually seem that bad? It's really interesting to see what people are actually thinking of this change, so let me know down in the comments. Otherwise, let's go take a look.

Okay, so I've got my "advancing per2" here, so this is my recreated Purview account, and the main front thing that we can see here is platform size: one capacity unit. Previously, when we created Purview, we had to choose between 4 and 16 capacity units, which are essentially kind of like servers: how many machines you want grinding away in the background to get to where you need to be. And it's like, "Uh, four, I guess? How much is that going to charge me? I don't know." So let's dig into that. Let's talk about capacity units and what they actually mean.

There's a load of info about it. When they came out with the elastic data scale, they built this whole page that goes through and explains what a capacity unit is. One capacity unit is 25 operations a second or 2 GB of internal storage; actually, it's the combination of those. That's what you get with one CU: 25 operations a second and 2 GB of storage for all of your assets. So when we're talking about Purview and we talk about an asset, that's all bits of JSON, right? It's the assets that you've scanned, it's the relationships between those and other assets, it's the metadata that hangs underneath those things, like the schema objects, and it's all the lineage objects. If you think, "Maybe I've scanned 100 tables," well, that's 100 objects, and then there's each of the column values inside
there, it's the lineage between that column and that column as part of a data transfer, each of the governance objects that you associate to it, each of the classification objects that you associate to it. There are actually a lot of these different object types in our data map, but the 2 GB is still quite a lot. There's quite a lot of data that you can still fit beneath that 2 GB layer.

Now if you think, "I'm making a lot of changes at once, I'm running a massive update and updating every single file in my entire data estate," well, then I'm going to get more than 25 operations a second. The nice thing is that they think about those two concepts separately: how much storage have I got, and how many operations do I need, how much interaction, how many transactions am I making with it.

They've got some examples, if we scroll down a little bit, that talk about the different billing amounts and how you're using things, but this is the interesting one, that billing idea. You've got the idea of your storage amount, which is how much you have to fit, and the moment you go over 2 GB, the moment I've got more than 2 GB of assets across all those different JSON files, it has to ramp up to two capacity units because it needs that storage. Whereas on the flip side, if I've only got 1 GB, say, of data assets stored and I have a massive spike in transactions, it'll scale up but then scale back down again. So you've got the flex of the ceiling when it's doing lots of heavy workloads and heavy transactions, and then you've got this kind of ratchet effect that'll slowly scale up if you put enough data into it, if you have enough assets scanned, if you put lots and lots of stuff in there. So have a look through this stuff; it's really useful for understanding how to think about how many of these compute units, or rather capacity units, we're actually using.

Cool, so that's the idea of a capacity
unit. We saw that mine's currently at one, and I'm not going to try to force it to go up by putting a load of data in, but you can go and have a look at that and see what it currently scales at.

The bigger question is, "Well, okay, I've got this arbitrary idea of a CU; what does that actually mean in terms of cost?" It's not in the pricing calculator yet, but if you go to the Azure Purview pricing page, now that we understand what an elastic data map is and can talk about these capacity units, we see we've got that price of 0.307 pounds per capacity unit per hour.

In terms of what that actually means, we'll just go for a cheeky little manual Azure pricing calculator. In the usual pricing calculator, if we say something's turned on 24/7, all day every day for a given month, the base unit of estimation they use is 730 hours. That is "I'm just not going to switch it off; it's turned on for a month." Yes, months have different numbers of days and there is some flex, but as an estimate of roughly how much an average month is going to cost, 730 hours is our baseline. So if we're saying I've got one capacity unit and I'm not doing huge amounts, so I'm not going to scale above that, our baseline cost is 224 pounds per month to turn on Purview and have it with a small amount of data in there and not under massive loads.

That's far more reasonable. When we first looked at the preview and the baseline was four, we were saying it's 900 pounds a month, and that's just a bit much. It's like, "Oh, that's a lot. If I've got a little demo environment, I can build that for about the same. My whole environment, and then the thing to govern it costs the same? That doesn't seem right." So it's absolutely much, much cheaper now that the baseline is that 225-ish pounds per month. And yeah, it's not as cheap as, say, a five-pound-a-month basic Azure SQL database. It is expensive
because it's an enterprise governance tool, so it's going to have a price associated with it. But now at least the price is a lot more representative of the kind of workloads people are doing, and it allows you to do the small little things. Basically, it opens it up to SMEs: small and medium enterprises, smaller companies, have data management and quality problems too, and that means they can use it at these lower levels, and it just makes sense.

So that's where we're at with pricing. Pricing is a lot more reasonable. Obviously you can go and check out pricing in your local currency, whatever it is. But yeah, it has gotten better. It's a lot more relatable, it's a lot more approachable. You still can't turn it off, though, and you can't pause it; you have to delete it and then lose all your demos. Anyway.

Cool, so pricing, that's what we've got. Either, if you had the preview beforehand, it'll switch over at some point in September or October, or you can go the brute-force way of deleting it and recreating it. Seeing this "one capacity unit" means I'm on the new format of it.

As for the other new bits that you'll see when you switch over to this new version, there are a couple of bits out here. You can see things like managed resources. When we're using things like Databricks, when you spin up the Databricks workspace it creates a separate managed resource group, and when you create clusters and things you'll see it creating VMs and deleting VMs. We've got the same idea here for Purview now: it automatically creates a storage account where it keeps your data, and it automatically creates an event hub. So if you're trying to get an idea of what this thing is and where it is storing your data, you've actually got a storage account, and you can go and see what's going on. Now, like other managed resource groups, this is all locked down. You probably shouldn't even look at this, but you can go and see
all the files it's creating, which is interesting. The files themselves are locked down, so you can't go in and see what's inside a file. I mean, technically you could brute-force your way into it and get hold of those things, but it's a managed resource group; it's not supported to go and have a look inside there. But it is interesting, because it means you can keep an eye on the size of those things, and you can have an understanding of how much throughput is going through both your event hub and your storage account. So yeah, interesting that we've now got visibility of that stuff where we didn't have it previously.

Okay, so let's go and dive into the UI. The UI kind of looks the same, right? No real changes there. We've still got the same landing page. You can see I've given it a nice name, "advancing catalog" rather than "advancing per2", so we've got some ability to tailor it slightly for our users, and then we've got the normal bits that we've got here from the catalog.

Now, the main changes are over here. You can see we no longer have "Data sources"; we've now got "Data map". They've leaned heavily into, "You know what, let's just show users it's called the data map. Let's not call it data sources there and then call it the data map for the technical people as soon as we talk about the actual system." Everyone knows it's the data map: the map of all of the data in our estate. So when we're talking about adding data sources, registering scans, registering a lake and all that kind of stuff, that still happens in this data map area.

This again looks fairly similar. We've still got this idea of collections, and inside a collection we've got the various data sources we've registered. Obviously I lost all mine. But the whole way that we think about these collections has changed. A collection used to just be an arbitrary box: "Yeah, let's put those in that collection, those in that collection." It didn't really mean anything;
it was just kind of like drawing a little line around some data sources so I can group them nicely in a diagram. Now, for the updated, refreshed version of Purview, we've actually got this whole collections management piece where we can see a hierarchy of our collections and see what's inside them, but most importantly, we can manage security here. Rather than having security in the management layer as it was previously, where I used to have to go out into the management portal and assign roles and say, "You're allowed to read all the data, you're allowed to curate all the data," we now do that on a per-collection basis, which is far more fine-grained. You can say, "Well, actually, you're a data curator, but only of the data that you have access to." So we can make this a nice federated, distributed setup where people manage their own data. We're no longer saying that if you curate some data, you can do it for all the data we possibly have in the organization, which just makes so much more sense.

Under a given collection we can have admins, data source admins, data curators and data readers. For example: you can read all the data assets we've got for that nice semantic layer, but you can't go and read all the data assets that we've got in the lake, because you just wouldn't have a clue what you're looking at; there are folders and Spark things and a load of stuff there. It means we can tidy it up: we can have personas defined across our data estate and map them nicely onto these collections. We can do that on each of the collections, so we can have a real hierarchy of all these different collections of data sources, and then we can set limits there. Super nice to see; that's a really, really nice improvement, that we've got this granularity of collections.

You'll also notice that some of the other pieces that used to be in the management area are now over here as well. We're talking about scan rule sets, we're talking about the
integration runtimes to go and get data, and we're talking about classifications, that kind of automatic "that's a bank account, that's a person's name, that's an email address" recognition of the type of data that we're seeing. These are all now living in our data map; they are part of that estate. Again, it's kind of a security thing: you're allowed to go and manage the data set, but you're not allowed to manage the whole admin for Purview, and they're separating out those two main areas. So management's a lot simpler than it used to be. There's a lot less stuff here because the core Purview admin has been separated out.

And that's the main change. The big thing is it's changed how we think about collections, changed how we think about security, and changed where we do that kind of stuff.

The other interesting thing is we no longer have the glossary knocking around in this side menu. If you need to get to the glossary items, you need to go through the main catalog. It's the same with browsing assets: if you want to go and have a look at your assets, you have to go to this data catalog, and then you can drill down into assets, browse them and find some stuff. Same with the glossary: go into the glossary terms and you can see the info, create your term templates, your terms, all that kind of stuff. That all works the way it used to work; it's just a slight change in the UI as to where you go to find things and how you go and set it up.

And I think that's mainly all the things I wanted to go through. So the big, big thing to be aware of is that you want to make sure that you are on this one capacity unit, that you are on the new elastic scaling. Have an awareness of that elastic scale in terms of when you are using it and the workloads that you've already got in Purview: how many capacity units does that need to be? Because I had, what, 5,000 or so assets that I'd registered. Yeah, 5,800.
That still fits really easily inside a 2 GB storage limit, because it's just bits of JSON; it's not a huge amount of data. So it'll be interesting, when people are starting to map out their bigger estates, to see where that lands in terms of those capacity units. Are you scanning everything, you've got millions of assets, and you're like, "Oh no, I'm actually automatically already at four capacity units anyway," in which case the pricing was about right for what you're trying to map? Or are you doing it like me and going, "Okay, well, this is tiny, it doesn't even fill up one capacity unit. Cool, I'll benefit from that pricing"? It'll be interesting to see where people land on that scale of scaling.

Yeah, cool. So, as I said, that is all I've got time for today: a run through a little bit of the Purview changes and some interesting tidbits inside there. Do take a look, and do let me know how you get on with it. Let me know if you like the new security model, if you're going to be using it, and whether this has changed your mind about whether or not you're going to use Purview. I would love to hear, so let me know down in the comments. As always, don't forget to like and subscribe, and I'll catch you next time.
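
The capacity-unit and pricing arithmetic described in the captions can be sketched in a few lines of Python. This is a minimal sketch, not Purview's actual billing logic: the function names are hypothetical, the figures (25 operations per second and 2 GB per capacity unit, 0.307 pounds per capacity unit per hour, 730 hours per month) are the ones quoted in the video, and taking the larger of the storage-driven and throughput-driven requirement is an assumption based on the "ratchet" and "flex" description above.

```python
import math

# Figures as quoted in the video (September 2021); treat them as assumptions
# and check the current pricing page before relying on them.
OPS_PER_CU = 25                # operations per second covered by one capacity unit
STORAGE_GB_PER_CU = 2          # GB of metadata storage covered by one capacity unit
PRICE_GBP_PER_CU_HOUR = 0.307  # pounds per capacity unit per hour
HOURS_PER_MONTH = 730          # month length used by the Azure pricing calculator


def capacity_units(storage_gb: float, ops_per_sec: float) -> int:
    """Hypothetical helper: capacity units needed for a given load.

    Assumes the requirement is the larger of the storage-driven and
    throughput-driven values, with a floor of one capacity unit.
    """
    by_storage = math.ceil(storage_gb / STORAGE_GB_PER_CU)
    by_ops = math.ceil(ops_per_sec / OPS_PER_CU)
    return max(1, by_storage, by_ops)


def monthly_cost_gbp(units: int, hours: float = HOURS_PER_MONTH) -> float:
    """Hypothetical helper: cost of running `units` capacity units for `hours`."""
    return round(units * PRICE_GBP_PER_CU_HOUR * hours, 2)


if __name__ == "__main__":
    # Small demo catalog: about 1 GB of assets and light traffic.
    print(capacity_units(storage_gb=1, ops_per_sec=10), monthly_cost_gbp(1))
    # The old fixed baseline of four capacity units.
    print(monthly_cost_gbp(4))
```

With these numbers, one capacity unit left on for a 730-hour month comes to roughly 224 pounds, and the old four-unit baseline to roughly 900 pounds, matching the figures mentioned in the video. A throughput spike only raises the unit count for the hours in which it occurs, so a real bill would sum the hourly values rather than multiply one figure by 730.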
Info
Channel: Advancing Analytics
Views: 1,214
Id: J1BUEU07pGQ
Length: 15min 26sec (926 seconds)
Published: Fri Sep 10 2021