Azure Synapse Analytics - Getting Data & Orchestration

Video Statistics and Information

Captions
Hello and welcome back to the Synapse sessions, where you join me exploring the wacky world of Azure Synapse Analytics and putting it through its paces. It's a big, exciting week: Spark 3.0 has been released, Databricks Runtime 7 is out, and there's all sorts of exciting stuff happening, but that's coming later; we'll have videos talking about all the new releases later in the week. For now I want to talk orchestration. It's a question I've had a lot since I started doing these videos: it's great that you can do Spark and so on, but where does the data actually come from? And I'll admit I cheated in the first video, because I had a lake that already had data in it. So we'll have a look at the Data Factory integration. It is just Data Factory, it looks like Data Factory, you've got pipelines and datasets and linked services, but it's a little confusing, because things are in different places from where they normally are. So we'll have a bit of a look at how we do that. Don't forget to like and subscribe.

All righty, so I've got my Synapse workspace, and the eagle-eyed will notice it's a brand spanking new one, in that I've spelled "synapse" right this time. I've got nothing set up here: I've got a lake, but it's just a blank lake with one folder in it, nothing in Develop, nothing in Orchestrate. Essentially I'm starting from scratch in a brand new state, so let's see what we can do. Orchestrate is the obvious place to start. I've got my pipelines, and I can go in here and click New Pipeline, but I don't have datasets, and that's super confusing, because normally in Data Factory you expect to see "here are all my pipelines, and here are the datasets supporting them", and down at the bottom we had Connections and Triggers, and they're gone as well.
It's all change if you're used to doing Data Factory. So I create a pipeline and I get this wacky little window coming in from the side asking what I want to call it. I'll call it "Copy Table", real simple, because what we're trying to get working is a copy activity: the most basic of basic ADF examples, saying I've got a database, copy it over here. We will get a little fancier and make it parameter-driven, but that's literally all we're doing for now: each time you run, I'll give you a schema name and a table name, and you copy that table from the database into my lake.

There are a few things we need to get that working. I've got Source and Sink (oh, and go away, Properties; I don't like that side window that keeps popping in occasionally, but it's there now anyway). Source and sink: where am I getting my data from, and where am I putting my data? I've got nothing set up, no datasets, and my Datasets area has disappeared, so I can't do much there. I could do it the easy way, which is just to do it inside my activity: on my source I can say I want a new dataset, that pops up and gives me a load of options, and I can create it the normal, traditional way. But first, let's look at linked services, because we're going to need a new linked service for this source. Where do I do that if I don't have a Connections tab any more? Well, that's what this Manage tab is about. Manage is the place where I configure my Apache Spark pools, my SQL pools, all of my general Synapse config, and it's also now where linked services live: linked services are now for the whole of Synapse, not just for Data Factory. So I can go in and create a new one. I've got a database, so I look under the Database category, but Azure SQL DB is not classed as a database for some mad reason; instead, under the Azure category, there's Azure SQL Database, so I'll go from there.
I want to call this linked service "AdventureWorks", and then there's this all-important new toggle: interactive authoring. This is the thing that asks, as you go through, do you want to be able to test your connections, see which tables you're connected to, and preview data, essentially the way Data Factory normally behaves. You have to have this thing turned on. With it turned on, I'll show you what it looks like, and then we'll turn it off. So I grab some basic server details: my subscription, my server, my database name. It's just test data, so I'll be lazy and do SQL auth. Then I can hit Test Connection, it goes off to connect to my data, and after a little bit: test connection successful. That's fine. If I turn this interactive authoring thing off, I get a warning, and I no longer have that ability; I can't test my connection, which is just a little bit mad. I don't know why you'd give people the option to make development really, really painful. And now it's going to take a little while to turn back on again. So I hit Create, that creates my linked service, and then we can look at creating datasets on the back of it.

Okay, the linked service is good. So where are datasets? They're in the Data tab. I mean, that makes sense; it's just really hard to get used to as a Data Factory person. I spend a lot of time in Data Factory, so this is not second nature to me; it's super confusing. In my Data tab I can either connect to external data, which is more about creating additional lakes and attaching things, or I can create a dataset, which is a Data Factory dataset. So I'll make a new one based on Azure SQL Database, give it a sensible name (a dataset for AdventureWorks), and it's picked up my new linked service. Now, interactive authoring is still turning back on from when I toggled it off and on a minute ago,
so I can't see the table names; I can't look at what I'm connecting to. It's making me turn this thing on to be able to do most things, and I don't yet know: does that cost me money? If I'm developing in Data Factory, am I paying for the privilege? It's a similar model to what we've got in data flows: with Data Factory mapping data flows you have to turn on debug mode, and that has a cost because there's a Spark cluster behind it. This isn't a Spark cluster, but I still need to turn interactive authoring on. We'll see what that looks like when we leave preview.

Okay, I don't want to point this at a specific table anyway, so I can hit Edit, which breaks the table picker into two text boxes, essentially, and I just leave them blank. Really straightforward. So I've now got a dataset. I still can't get my table names, but I'm going to parameterise it anyway: I'll say, you tell me the schema name and the table name, so two parameters, SchemaName and TableName, and the dataset is now expecting those. Then over in the Connection tab I can say: don't try to read the table (you're not going to be able to until interactive authoring is on); instead, take our two parameters, SchemaName passed in as a parameter, and TableName likewise. And there we go, interactive authoring has finally turned back on again. Now that I've got that, I should be able to preview my data, and it says: well, you've just parameterised it, tell me what your values are. So I put those two in to make the preview work ("SalesLT", the AdventureWorks schema, and one of its tables), hit Preview, it goes off to grab my data, and there we go: I've got some test data back, so I know that's working. I can't do any of that unless interactive authoring is turned on, which is just a little frustrating.

All right, now a new dataset on the other side. I've got my database connection; I want to land the data into my lake, so I'm creating an ADLS Gen2 connection, and I want it to write to Parquet.
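As an aside on the source side: the parameterised SQL dataset effectively resolves to a two-part table name that the copy activity reads with a SELECT *. A minimal Python sketch of that resolution, assuming bracket-quoted T-SQL identifiers (`quote_ident` and `source_query` are illustrative helpers, not anything ADF exposes):

```python
def quote_ident(name: str) -> str:
    # T-SQL bracket quoting: escape any closing bracket by doubling it
    return "[" + name.replace("]", "]]") + "]"

def source_query(schema_name: str, table_name: str) -> str:
    """What the parameterised source dataset amounts to: the copy
    activity reads the full table as SELECT * FROM [schema].[table]."""
    return f"SELECT * FROM {quote_ident(schema_name)}.{quote_ident(table_name)}"

print(source_query("SalesLT", "Product"))  # SELECT * FROM [SalesLT].[Product]
```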
For me, that's pretty much it. This is going to be my AdventureWorks landing dataset, nice and generic, and I point it at my local linked service. I should still be able to browse the lake; yes, okay, I can see the root and what's in there. I don't need anything particularly fancy, so I'm going to put it in "raw", but I want to parameterise the directory, because we're going to run this for lots of different tables, and I don't want one folder with just all my different data dumped in it. Nobody does that, and I keep seeing demos from people doing exactly that; that's not how people build lakes.

One thing to note: because I've hard-coded the file system (the hierarchical namespace container), it's switched Import Schema over to "from connection". Because I've given it a bit of the file path, it's assumed there's a file there that it can go and query, to bring back and show me the schema. So I need to say: no, there's no file there, it doesn't exist yet; the file won't exist until I've run this particular job, so don't try to look at the schema. Go and create that. Again my little Properties window comes in, and I'll parameterise this one as well: SchemaName then TableName, not the other way around, which would just make it really awkward when I try to remember it later.

So, my directory: I want it to be "raw", plus the schema name, plus the table name. You can do it as one big concat expression, but I actually prefer this syntax: you put each of the individual parts inside its own little @{} braces, so you don't have one big concatenated string. So it's the literal "raw/", then inject the schema name, then a slash back in the literal string, and then inject the table name. It's a bit like doing an f-string in Python.
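That f-string analogy can be sketched directly: literal text stays as-is, and each @{...} block injects one value (the parameter names here match the ones used in the video):

```python
def raw_directory(schema_name: str, table_name: str) -> str:
    # ADF expression: raw/@{dataset().SchemaName}/@{dataset().TableName}
    # Python equivalent: literals stay literal, each braced block injects a value
    return f"raw/{schema_name}/{table_name}"

print(raw_directory("SalesLT", "Product"))  # raw/SalesLT/Product
```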
The difference is that instead of ${} you've got @ plus curly brackets, wrapping a pointer to some piece of functionality. So that's it, real basic: that's my directory name, and I'm not going to tell it the file name; it can work the file name out itself. Okay, that should keep it happy.

So we go back to my pipeline. I can rename the activity ("Copy table to lake") and start hooking things up. My source dataset is going to be my AdventureWorks database dataset. It knows it's parameterised, so it automatically brought in my two parameters, but I don't have anything to fill them with; I don't know what to pass in. So we need some parameters on the actual pipeline, so I can give the whole pipeline two inputs and then pass those two inputs down to each of my datasets. Clicking back on the pipeline, right down in the bottom bar, which is again playing its little hiding game, there's Parameters, and I can give it my two parameters: a SchemaName and a TableName. Now my whole pipeline is expecting two parameters, and I can go and hook those up to my two different datasets. TableName and SchemaName are in here, so: Add Dynamic Content, right down the bottom, click TableName and it writes the expression for me, which is great; then SchemaName below it, Add Dynamic Content, bring in SchemaName. There we go. That's my source hooked up: it's pointing at my SQL dataset and passing my two parameters in. On my sink, same thing: I point at my Parquet lake dataset, it says aha, that's parameterised, so I need to add the SchemaName and TableName in there too. That should be everything we need.

Now, I have done this once before, so I'm going to clean up that data first, just to make sure there's nothing there. In my Data tab I can see the root and go into my "raw" folder. Okay, there's the one thing I've done before; let's delete that and have a clean lake, nothing in there, ready to go.
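The wiring described above, pipeline parameters flowing down into each dataset's parameters via dynamic content, can be sketched like this (a pure-Python stand-in, not the real ADF JSON; the parameter names match the video):

```python
def resolve_dataset_params(pipeline_params: dict, bindings: dict) -> dict:
    """Resolve each dataset parameter from the pipeline-level parameters,
    the way Add Dynamic Content wires up @pipeline().parameters.X."""
    return {ds_param: pipeline_params[pipe_param]
            for ds_param, pipe_param in bindings.items()}

run_params = {"SchemaName": "SalesLT", "TableName": "Product"}
# Source and sink datasets both bind their parameters to the same pipeline inputs
source = resolve_dataset_params(
    run_params, {"SchemaName": "SchemaName", "TableName": "TableName"})
print(source)  # {'SchemaName': 'SalesLT', 'TableName': 'Product'}
```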
Now, one caveat: earlier I tried to do this in debug mode, and when it didn't have the datasets deployed it just returned a null pointer error, basically "I don't know what you're talking about". It only worked for me once I'd published it, so I don't know whether that's down to the parameterised datasets, or the linked service needing to be deployed, or what it was. Anyway, I've now published it, so it's actually live in my dev area and I can trigger it and tell it to run. I'll pass in "SalesLT" for the schema and "Product" for the table, and that will go off and hopefully do a run. Okay, that's off and running, and, same as Data Factory, over on the side I've got my Monitor tab. If I'm not sure it's running properly, I can go and see, and there we go: there's a pipeline in progress, and it's been running for eight seconds. I can check my parameters in this little side panel, so I can see what was passed in when it ran; I know I'm running it for SalesLT, Product. I can dig in further to see what's going on inside, and I've got a few bits here: if you had a big chain of activities, you'd have a line here for each of them, and you can click on each one to see the JSON input, i.e. what you told it to do; essentially you've got a timeout of two hours and it's pointing at my datasets, all as we'd expect. And then there are these two little glasses, which open the details pane: if you're doing a copy activity, or a Databricks activity, lots of the Data Factory activities have a separate monitoring pane that you can open up. Now, I found this doesn't work straight away. Once the thing has actually kicked off, it works fine, but while it's in the Queued state, nothing. I'm assuming that with a big long copy, as soon as the copy is actually working you can open it up and see what's going on, but while the copy hasn't yet got to the point where it's executing, I can't do anything; I just have to sit here looking at "Queued", and for me that's been a minute 19
so far, queuing up, waiting to get some Data Factory resource to do that copy, just to copy a couple of hundred rows. That's not particularly great, but again, we're dealing with preview. It looks like it currently takes a while to stand up any kind of Data Factory resource: when you say "just run the thing", there's a minute or two it spends sitting and waiting to spin up. The one I ran earlier took about two minutes before it actually kicked anything off. So we'll wait and see if that works, and then we should see the copy details coming through. Ooh, there we go, that's now In Progress, and look, I've already got a success come up. Let's see what's going on; one more refresh. Okay, a succeeded activity, and now I've got the output, so I can see what happened: all my details, like rows read, rows copied, all of that stuff. And the glasses icon now actually gives me the monitoring pane. There we go: total duration one minute 52, actual time in progress four seconds. So it was queuing for nearly two minutes, and it took four seconds to actually run. Again, it's preview; normal Data Factory is nowhere near that slow, so I'm assuming that's just a preview thing. The important thing is that it succeeded, which means we should be able to go over here, back to Data, have a look in my storage account, and go back to my lake. Where before it had nothing, it now says SalesLT, then Product, and then some Parquet. So it's all nicely hooked together; it's actually worked. Now I can run this whenever I want, give it the parameters for any of my different tables, and it'll go through, do a SELECT * from AdventureWorks, and land that into my lake. To do more advanced things on top of that, we'd normally build parent wrappers: get a list of things, and for each one in the list, go and dump it into my lake.
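That parent-wrapper pattern, a lookup producing a table list and a loop firing the parameterised copy pipeline once per table, could be sketched like this in Python (`trigger_copy` is a hypothetical stand-in for actually triggering the pipeline; the real thing would be a Lookup plus ForEach activity in ADF, or calls to the Synapse pipeline REST API):

```python
def load_tables(tables, trigger_copy):
    """Parent wrapper: for each (schema, table) pair, fire the
    parameterised copy pipeline once, like an ADF ForEach activity."""
    return [trigger_copy({"SchemaName": s, "TableName": t}) for s, t in tables]

# Stand-in trigger: just record where each table's data would land in the lake
fake_trigger = lambda p: f"raw/{p['SchemaName']}/{p['TableName']}"
runs = load_tables([("SalesLT", "Product"), ("SalesLT", "Customer")], fake_trigger)
print(runs)  # ['raw/SalesLT/Product', 'raw/SalesLT/Customer']
```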
We'll probably do another video building that extra bit and making it a little fancier, but for now, as a basic: database into your lake, copy activity, parameters, run that thing lots of times; job pretty much done.

Now, the other side of orchestration: you've got Copy Data to get data written in, but what do you do once it's in? How do you orchestrate stuff? And that seems to be not quite there yet. We do have a Synapse section, this whole new area of activities, and, similar to Databricks, we have a Notebook activity. So if I had some Spark SQL jobs, or the notebooks we built previously, I could execute them on a given schedule. That's currently in there, and you can select a given notebook. I don't have any notebooks created, but you might, so let me quickly create a notebook, give it a name ("my notebook"), and publish it. Ah, okay, I've hit the normal point where I can't publish just this one thing; I need to publish everything to push it out. That pushes out my notebook, so my notebook is now deployed, and if we go back and do it again in Orchestrate: new pipeline, under Synapse, Notebook activity, there's my notebook. And notice that didn't need a linked service, didn't need datasets, didn't need anything setting up, because it's part of the same workspace; it's effectively got its own internal linked service. So you can do that kind of thing, but no, I don't have anything else I can configure there. Databricks has a lot more in its notebook tasks: I pick a linked service, I pick my notebook, I get parameters, I can do dynamic stuff. So this is early days; just "fire the notebook" is not a finished product. Hopefully by the time it goes GA we'll see notebook parameterisation and ways of working with it more dynamically, but for now you can just kick off a notebook. We'll try that in another video and see if it actually works.
Similarly, there are Spark job definitions, for when I want to execute a JAR file, that kind of thing. And Stored Procedure, which I'm assuming is actually a SQL pools thing... yes, it wants a SQL Analytics pool, and it doesn't give me the option of my SQL on-demand. So the stored proc activity, as it currently stands, is specifically for the SQL data warehouse side of things, the MPP cluster, the SQL pools. If you like, you can do that in here, but I'd also assume you could probably do it via the generic Stored Procedure activity instead, so I don't know that there's too much difference between those two; maybe it's just the difference between having to have a linked service for it or not. In future videos, when we look at SQL pools, we'll have a deeper look at that and how we actually run things against it, given it's part of this integrated workspace. That's the kind of thing we'd normally expect: you have a set of pipelines at the start to get data into your lake, and then a set of pipelines saying now run a load of jobs, do some processing, enrich my data, curate my data, get it closer and closer to my actual business users. There are loads of things we can do in there, and that about wraps it up.

Okay, so that was a fairly quick overview, lightning speed, of how you very, very quickly grab data, bring it into a lake, and parameterise a pipeline, so you can build one pipeline, put different parameters into it, and slowly, slowly get data into your lake. Obviously it's not fully finished yet; it still has some preview teething problems, and the whole interactive authoring thing is a mystery to me, but we'll see it getting better, and we'll dig further into it in future videos. So again, don't forget to like and subscribe if you haven't already. There'll be some videos popping up about some of the
previous sessions I've recorded, but otherwise catch us later in the week when we're talking Spark 3.0 and all the new announcements and all the cool stuff we're seeing from there. All right, cheerio!
Info
Channel: Advancing Analytics
Views: 7,410
Keywords: ADF, Data Factory, Azure, ETL
Id: J7eGkAJUc10
Length: 20min 12sec (1212 seconds)
Published: Tue Jun 23 2020