Azure Synapse Analytics - The first 20 minutes!

Video Statistics and Information

Captions
Hello and welcome! So, it's been an interesting week this week — we've had Microsoft Build. Now, that's usually an unusual one for data people, because it's this big developer conference and most of the time there isn't really that much for us. This time around they announced a lot of things: cognitive services, lots of AI things to do with fairness, bias and responsible AI. But the big thing on our side is this whole thing called Azure Synapse. Over roughly the last year you might have heard of Azure Synapse Analytics being rolled out, and it started as essentially a rebranding of Azure SQL Data Warehouse — but there's a whole lot more to it than that. It's actually a giant workspace with a bit of SQL Data Warehouse, a bit of Spark, a bit of SQL on-demand, a bit of Data Factory, and various other things plumbed in, with the idea that it's a single one-stop platform. That's the plan. Now, for the first time, it's in public preview, so all of those other bits — the Spark bit, the SQL on-demand bit, the ADF bit — you can now actually play with. The SQL Data Warehouse piece has been GA for ages; the new bit is all of the Spark and SQL on-demand stuff. So I thought I'd have a bit of a play — only this morning, actually, for the first time in my own subscription — trying to spin up a workspace and have a go. I thought you could join me on the journey: can I get it working? Can I hook it up to my existing warehouse? Can I do the stuff I'm currently doing in Databricks inside Synapse Analytics? Let's have a look.

Okay, so I've got my portal set up. I've already made a workspace this morning — there's nothing in it currently — so I'm going to quickly show you how we create one. Let's have a quick go. I assume I can just find it under "Synapse"... okay, so we've got two options there: the workspace (preview), and the "formerly SQL Data Warehouse" option. So you can create the normal warehouse on its own, or you can say "I want this new thing", which is still in preview — not fully
available, not fully supported yet — and see what it looks like. Okay, so creating a Synapse workspace: what do we need? I created a quick "synapse" resource group — normal stuff — and then you give it a name. Now interestingly, you can't use capitals or anything like that: it has to be lowercase, following the blob-storage naming convention, so my nicer formatting just fails. Previews like this usually land in one or two little regions first and then slowly spread out to the major ones before eventually reaching the tertiary regions, but my region seems to be available in my subscription, which is great. Then I need to give it a lake: I need to point it at an ADLS Gen2 account, where it's going to create (or use) a filesystem for temporary stuff and saved data — that's its root directory. I do already have a lake, obviously, so you can reuse one. A few more questions: we can give it an admin user — that's for the SQL Server part of it, so when I'm actually working with the SQL pools I can connect as an admin, use Management Studio and all that kind of stuff. It's got a managed identity, so it's going to get certain permissions set up by default — if I tick the box, it'll go in and grant itself some permissions. Which is interesting: you could do that networking and security setup yourself, but letting it do it means the bits it sets up are slightly more secure, while plumbing it into other parts of your subscription might be a little tricky. So let's allow it the managed identity permissions. I'm not going to tag it, even though I genuinely reckon you should tag your resources. Okay, so that's ready to create. I'm not going to hit go — I've called it something very similar to my previous one — so instead let's go back and have a look at what got created earlier. So notice my Azure Synapse workspace tells me a few things about different endpoints, so I'm
connecting either to the actual data warehouse — the provisioned service, where you leave it turned on with a set number of compute nodes — or to the on-demand endpoint, where I want to write a query, have it spin up some compute, run the query, give me the answer, turn things off again, and then charge me for the data it passes through. So I've got two different SQL services depending on what I'm trying to do, and a few different endpoints into my workspace — some interesting bits and pieces to play around with. But honestly, most of it isn't here: what we have is this button that launches Synapse Studio, and that's the big flagship thing they're doing. Everything is held inside this new studio — same as Data Factory, same as Databricks, same as all the modern Azure tools really: they come with their own browser UI. This is one of the big arguments Microsoft is making about Synapse: it's an all-in-one studio. You can go in there and do a bit of Data Factory and build your ETL; you can do all your Sparky things and actually make something scalable, quick and robust, working with some data science and some interesting types of data; and you can have your SQL. Now, I've spoken previously about data lakehouses — the question of whether you can have a single solution that's a little bit lake and a little bit warehouse, coming together in the middle — and this is the attempt; this is kind of the answer to that question. Does it work? So, what have we got? A few different pieces: we have a Data hub with our databases, a Develop area, and a pipelines area for doing orchestration and stuff. I'm really interested in the Spark stuff, so I'm mainly going to be digging around there. Okay — under SQL pools there's the provisioned kind, and there's a SQL on-demand one: by default, the one thing we definitely get is SQL on-demand, and we only pay for
that if we use it and push some data through it. No Spark yet, though, so let's make a Spark pool. Okay, I'm going to call this "advancingspark", because I know a thing or two about branding. I don't want it to auto-scale for now. Node size — that's the size of the nodes in my cluster, so each of my workers, my executors: what size should they be? I think fairly small — 4 cores and 32GB; you can do a lot of things on that. Then the cluster-size slider — leave it at three nodes, nice and easy. Not much else to do there. Additional settings: auto-pause? Yes, definitely — after 15 minutes seems reasonable. Spark version: one option, 2.4 — hopefully Spark 3.0 is coming soon; that's currently in preview and has good stuff for predicate pushdown, partition pruning and all that kind of stuff, so it'd be good to see it here. There's the Python version and the Java version listed, and then .NET — that's the interesting one, because it's not in Databricks currently. There's going to be a lot of .NET integration in Synapse, and the fact it's in straight away is pretty cool. And Delta Lake 0.4 — that's interesting, because they announced 0.6 during Build and said it's now going to be rolled in, so I'm assuming at some point very soon you'll see Spark pools suddenly updating to say they do 0.6. There's a decent amount of difference between 0.4 and 0.6 — a few things in terms of how it handles stuff, and some other push-down things. Interesting stuff. Okay, so now hit create — yes please, go and do it for me. That's going to go and provision, which is promising. Good. So what can we do in the meantime? Let's go to Develop: I can create new things — a new Spark job definition, I guess, but notebooks are the way I normally work, so let's see what I can do there. It's picked up that I now have a Spark cluster, which is great — it attached to it automatically; I'm assuming I'll have to turn something on, but we'll see. Let's just say this is going to be a
"delta test". The lake I built Synapse on already had some Delta tables — it's the lake I normally use for a lot of my experimentation, building frameworks and showing how some weird Sparky things work — so hopefully I can just use the same data and it'll just work. Okay, so I've got my languages: PySpark (Python), Scala, C#, Spark SQL... no R — that's the one that's missing. All right then. Let's add some code. So I've got my stuff in here, and I like the fact that I can go and browse the lake without leaving the notebook — one of the biggest pains today is having to flip between lots of different windows. Under "Linked" I can see the linked storage account — that's the one where I know I've got a load of Delta tables, so I'd like to be able to go and read some stuff from there. Okay, I can see my folders, including the container Synapse created when provisioning — which wasn't empty, it turns out. In my base layer I've got some data, categorised nicely — my AdventureWorks stuff — so let's grab the path for that. Inside that folder, just to give you a sense, it's Delta — the normal kind of stuff: I've got a load of parquet files created by Spark, with no naming convention. I don't care what those files look like; I just query the directory and assume everything is taken care of inside it. And I've got my _delta_log folder with the transaction log. This folder — this address — is actually the one I want to push back and say "can you please run a query on this?" So let's see if we can get that working. How do I read a DataFrame in Spark? spark.read.format("delta"), and then .load() with the path from the root. Okay — I don't know if that's going to work.
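That folder layout — a pile of anonymously named parquet part-files alongside a `_delta_log` directory holding the transaction log — is what marks a directory as a Delta table. A minimal sketch (my own illustration, not code from the video) of spotting that marker before pointing a reader at a folder:

```python
from pathlib import Path

def is_delta_table(folder: str) -> bool:
    """A directory is a Delta table if it carries a _delta_log
    subdirectory containing the table's transaction log."""
    return (Path(folder) / "_delta_log").is_dir()

# In a Synapse notebook you would then read the folder with:
#   df = spark.read.format("delta").load(folder)
```

The point being made in the video is exactly this: you query the directory, not the individual files, and the `_delta_log` takes care of which files count.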
It depends whether that's the native place things are rooted — if that's its root folder it should pick it up; otherwise I might need to do some config to get it to recognise the path. Oh — okay, it's running; that's presumably the Spark session starting up to try and make that work, so it looks promising. I'm curious, though — I thought it would automatically have some kind of default Hive metastore, but it doesn't appear to show any Spark databases yet, so maybe I need to start using it first. We'll see once it's managed to read this into a DataFrame. In the meantime: I added a cell and got text, which is interesting — it's just doing a markdown cell. In Databricks, if we were doing that, we'd have a separate cell and use the %md magic; here it doesn't recognise me trying to do a magic command for markdown. Instead of a magic command, you pick a different type of cell — a code cell or a text cell. That's fine, just a little different. For languages, where in Databricks you'd use %sql or %python to hop between languages per cell, in Synapse we've got double percent signs — %%sql and so on. So I think that's good: it still allows us that mixing of languages, it's just a slightly different syntax, and we can learn different syntaxes. I've just been waiting for the Spark session to start — I'm assuming it's now provisioning the cluster. I gave it things like the auto-pause setting earlier, so I guess what I did then deployed the configuration for the cluster, but it didn't actually start it until I hit go on some code — so we're in for a minute or two of waiting.
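For reference, the cell-magic difference being described — as I understand the two products at the time, so treat this as a hedged sketch rather than gospel — looks roughly like this:

```
Databricks cell:              Synapse notebook cell:
  %sql                          %%sql
  SELECT * FROM t               SELECT * FROM t

  %md Some notes                (use a separate "text" cell instead)
```

Same per-cell language mixing either way; Synapse uses double-percent cell magics and a dedicated text cell type for markdown.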
While it starts up, I guess we can have a bit of a mooch around. Let's see what else there is to explore — Ingest is going to be the main interesting thing. I want to try a lot of the ways we normally work with Data Factory: getting Data Factory to look at some metadata, then loop over that metadata and do a thing for each item in it. So we're trying to say "load from this database and bring all of its tables into my lake": one pipeline that lists all my tables and, for each table, does a copy activity that puts it in the right place in the lake. When you're building lakes, each folder should be a separate entity — a lot of the documentation we see around Synapse has one folder holding loads of different tables' worth of data, just as files, and that is super bad practice in terms of building lakes. So building out a nice metadata-driven pipeline is going to be interesting. Okay — this looks like it should be straightforward Data Factory, except I've got some Synapse activities in there; it'll be interesting whether normal Data Factory gets the Synapse activities too. The notebook activity looks very similar to the jobs activity we have in Databricks, except this targets a Spark pool — so you point it at a notebook that it can run, or a stored procedure, depending on whether we're going to the Spark pools or the SQL pools. Kind of makes sense. Okay — I'm not writing pipelines right now. Over in monitoring: my Spark application is in progress, so it knows my Spark application is turning on — that's good; nothing on the SQL side. It's nice that we've got the traditional Data Factory monitoring bits plus some bits to look after our Spark cluster. As it spins things up it's very much a Data Lake Analytics-style execution view — and there's the result of the job. So let's go back to our notebook, which is up here — these tabs along the top are going to take some getting used to.
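The metadata-driven shape being described — list the tables, then do one copy per table, with each table landing in its own folder — can be sketched in plain Python. The table names and the `/raw` root below are my own examples, not from the video:

```python
def plan_copies(tables, lake_root="/raw"):
    """Metadata-driven plan: one (source_table, destination_folder)
    pair per table, so each table lands in its own lake folder."""
    return [(t, f"{lake_root}/{t}") for t in tables]

plan = plan_copies(["Product", "SalesOrderDetail"])
# → [("Product", "/raw/Product"), ("SalesOrderDetail", "/raw/SalesOrderDetail")]
```

In Data Factory the same idea is a Lookup (or Get Metadata) activity feeding a ForEach that wraps a Copy activity; the point is that the destination folder is derived per table, never one shared dumping-ground folder.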
Okay, get rid of the pipeline tab... and, to my surprise, it ran: we've got a DataFrame that was reading a Delta table that already existed, and it loaded. We found it on that linked data lake — the lake I created Synapse against — and it was able to query the data. Let's just make sure it's not telling porkies: can we actually display that DataFrame and see what's in there? Okay — what's the job doing now? I can see it's actually doing some things: three jobs. Normally, if you're reading a Delta table, there'll be one job to read the Delta transaction log and a second one to actually go and read the data; this is doing a few different things, so it'll be interesting to figure out what it's actually up to — that's definitely more jobs than you'd normally expect for reading a table, and it may be doing some of the Delta transaction log updates, that kind of thing. Okay — so we have just a straight table, plus the charting stuff, normal Jupyter-notebook kind of things, and I can see my data in there. Great. So Delta is compatible out of the box, which was one of the worries, and that's not too shabby at all. Okay, let's try something a little bit fancier. First, let's try doing this in SQL. So in Python: df.createOrReplaceTempView(...) — nice and easy, less buggy stuff — and we'll call this view "mytable". Can I then, in a %%sql cell, go and call it with a select * from mytable? It hasn't picked up IntelliSense for temporary objects, which makes sense — I wouldn't expect it to. Interestingly, it should be doing the exact same thing, just displaying via the view instead of my DataFrame... and instead of showing my data — oh, okay: a java.lang error saying perhaps the directory /tmp/hive on
HDFS should be writable. Okay, so it's trying to write something — I guess this is it trying to actually create the temp view in Hive, which I'm surprised it didn't do in the earlier step, but I guess that's a permissions thing. From what I've seen, it seems to run as the managed identity, so it should have access to some parts of my lake — let's go and have a bit of a mooch. So up here we've got our folders... we've not got /tmp, and I think that's what it was saying it didn't have access to. Okay, so something's missing — let's see if we can fix that quickly; always good to do a bit of live debugging. That storage account is my "advancing" lake, so let's see: one of the warnings it gave me when creating Synapse was that it was going to add some role assignments for me. And yes, it has created an identity — I can see it (I spelt it wrong, so you can tell this is definitely me) — so I have that as an actual Azure AD entity; that does exist. So let's see what it's done over on the storage side. On my root container I can go to Manage Access at the base level... and it doesn't have it: the workspace I provisioned correctly has proper access, but the one that was created here — the temp one — hasn't. So if I add the identity and grant it access — and give it "default" permissions too: when you're dealing with a hierarchical namespace, if you grant "default", any new objects created in there will inherit that same access, but anything already in there won't. (I'm still waiting for the recursive ACL tooling — if there were two million files in there, we wouldn't easily be able to fix it. Let's see... nothing in it. Okay, cool.) All right — so now I should have access to my /tmp area. Maybe I wouldn't have hit this if I'd set it up with a
brand-new lake and an entirely new environment — maybe it's the fact I'm reusing an existing lake. Okay, so re-running that: will it actually create the view for me, or will it just tell me that something else is broken? It's doing a Spark job — and I don't think it did a Spark job earlier, so that's good news. While it runs, don't mind me — I can see the number of tasks: when I'm doing Spark debugging I'll have a look at the number of tasks and say "oh, that's partitioned badly, I need to run a repartition or a coalesce to change the number", and there are RDD blocks too — looks like I've got some useful stuff there, and I do love that data. So: if you've been using a lake already and you want to set Synapse up on it, it looks like you need to do a bit of permission wrangling when you first set it up, and then it seems to work. Okay, so we can use Hive correctly. Let's try it another way. I was doing most of it here... yes, as I thought: even creating a temporary view has to go through a Hive metastore, and once that was happy it actually worked — so now I've got this "default" database. There's nothing actually in it — I'm not saving any permanent tables — but it needs that basic database for anything to go through, so that's good. Can we do it the other way? (I'm not liking how deep the bottom of this screen the cell sits, by the way.) So in SQL: can I do a CREATE DATABASE — and I'd like it created at a location in a certain zone of my lake, so tables can start landing there — and if that works, I should then be able to see that database sitting alongside "default". We had nothing... and now we have the database, though no data in it yet. Sounds like it succeeded. Okay, that's good. So now let's do SQL from Python — I'm writing Python, but I'm using Python to actually execute SQL, because I can. And it's printed, it's working — interesting. This is how I normally work when building out a framework.
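Driving that DDL from Python is just string-building plus a `spark.sql()` call. A sketch of the statement assembly — the database name and lake path here are placeholders of mine, not from the video:

```python
def create_database_sql(db_name: str, location: str) -> str:
    """DDL for a Hive database whose default storage is a lake folder,
    so tables created in it land in that zone of the lake."""
    return f"CREATE DATABASE IF NOT EXISTS {db_name} LOCATION '{location}'"

ddl = create_database_sql("base", "/data/base")
# In a Synapse notebook you would then run: spark.sql(ddl)
print(ddl)
```

Keeping the DDL as a string first (and printing it) is handy for exactly the kind of live debugging shown here — you can eyeball the statement before handing it to the Spark engine.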
The idea: get a lot of data into the lake, do some automatic cleansing or transformations or whatever it happens to be, and then, when it's done, register the result with Hive. With Hive, it appears as a SQL table, and people don't need to know it's at this location in the lake, or that they need to use a parquet reader or a Delta reader — they can just go "you know what, select * from mytable", and that gives a layer of abstraction away from all the other stuff going on underneath. So I'm going to grab the function I normally use to wrap all this up — I guess this is what I would normally do: make a little Python function, pass in the name, pass in the zone of my lake, pass in the file path, and have it do something really easy and register it. So: CREATE TABLE IF NOT EXISTS... USING DELTA LOCATION, and then it can pop a path in. Okay, let's see if that works — grab my path from up here; this is the path I want to try and use. Now, this isn't going to move any data — it's not going to make a separate copy of my data somewhere in the lake or any of that kind of stuff, which the "save as table" stuff does; this is just going to create some metadata that points at my table. (Let me tweak the notebook properties so you can see that better.) So: create table if not exists, my database, the name of the thing, USING DELTA LOCATION, and then this nice little address; close it off in spark.sql(); fingers crossed... and it came back and said it's done some things. We should now be able to refresh — oh yes, nice. It hasn't shown the array of columns, but at least I now have a table permanently registered with Hive.
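A sketch of that registration helper and the loop that comes next — this mirrors the pattern described in the video, but the database, table names and paths are my own examples:

```python
def register_table_sql(database: str, table: str, path: str) -> str:
    """DDL that registers an existing Delta folder as a Hive table.
    No data moves: it is pure metadata pointing at the lake path."""
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} "
        f"USING DELTA LOCATION '{path}'"
    )

tables = ["product", "productcategory", "salesorderdetail"]
statements = [register_table_sql("base", t, f"/base/{t}") for t in tables]
# In the notebook: for s in statements: spark.sql(s)
```

Because the helper only returns a string, you can print the statements first to check them, then swap the print for `spark.sql()` — exactly the workflow shown.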
So again, in my little SQL cell over here, I can go select * from base.product and see if that's going to work. It's kicking off Spark jobs — definitely, again, probably doing some Delta manipulation — pull it out, show me... okay, all right! All the traditional stuff I would normally do: registering tables with Hive, putting them into separate databases so I've got some kind of control over it, allowing analysts who maybe aren't that Spark-friendly to come along and just write straight SQL. Now again, all the SQL I'm doing here I'm still doing through the Spark engine — I'm not using the SQL pools or the SQL on-demand; I've not even looked at either of those sides yet — but from an initial first pass, the Spark part works, so that's nice. So, just to cap that off, the next thing I'd do: I've got a lot of tables I want to register, so I'd do something along the lines of tables = [...] — I've got "product" in there, I've got "category", and what else have I got in my little storage account? (I like being able to refer back to the lake from here — that's good.) Okay: base... the AdventureWorks sales stuff... so that's product, product category, and sales order detail — always a beautiful one. Let's get those in. So, a quick little function for making some tables, and I want to be able to loop through it: for table in tables — a nice simple for-each loop — iterate through my list of stuff, do the thing for each. Let's just make sure that's working — good, print does a nice string. An f-string is a fancy Python string where we can just insert something into the middle, so I'm going to take this whole CREATE statement I used for one table and generalise it: a little pair of curly brackets where I'm going to put my table name, and then down in the location I delete the fixed name and put curly brackets there as well. So it's going to make me a string:
"CREATE TABLE IF NOT EXISTS base." plus the name of my thing, "USING DELTA LOCATION" blah-blah, and then the name again at the end of the path. That should work. Let's just print it first... okay, I've generated a little list of Hive registrations. So now I can switch that round — it's the same thing I used earlier — and just say spark.sql(): go and register these things with Hive. (I'd quite like a "register my whole lake with Hive" button, you know — that would be nice — but until then, this way.) Okay, something's executing, it's going through... and that looks happy. So now I'm starting to build up a Hive representation of my data lake that I can play with: I can write normal Hive stuff, write some Python functions and loop around some things. Yeah — it seems pretty much like I can work with this for now. So I'm going to do some more digging. In another session I'm going to go into some of the other Delta functions: can we do time travel? Can we do OPTIMIZE — because that's a Databricks-only thing? Can we do things like VACUUM and tidying up the transaction log? But yeah — so far so good; it all seems quite nice. So thanks for joining today, I hope that was useful, and I hope you'll join me on my little journey into the depths of Synapse, seeing whether we can actually use it in anger. Now obviously it's still in preview — there are going to be some bits that don't work, some bits that change, some bits that evolve as we go — but yeah, exciting stuff. Thanks for joining!
Info
Channel: Advancing Analytics
Views: 22,350
Id: HvEi13pWNps
Length: 30min 12sec (1812 seconds)
Published: Thu May 21 2020