Understanding OneLake within Microsoft Fabric

Captions
Yo, what's up? I'm Adam. I'm Josh. And Josh is here to educate us on what OneLake is and what the possibilities are with it. So Josh, thank you so much for being here.

Yeah, thank you for having me. Always a pleasure.

Can you let folks know who you are and what you do?

So yeah, I'm Josh Caplan. I lead product management for OneLake at Microsoft. We've been working on some pretty exciting new stuff that we're real happy to show you right now.

I bet you've been eager to tell people, after keeping it under wraps for a long time. All right, so let's just start off: what is OneLake? What's the purpose behind it?

We're trying to make this into the OneDrive of data. Tagline aside, think about what OneDrive and services like it have done for file sharing. Back in the days before we had them, we would rack servers, set up network file shares and sometimes FTP sites, and you would put files in there. You'd use those folders or servers to share files, and it works, it shares files, but you had to build that solution yourself. Compare that with today's dedicated SaaS services like OneDrive, which have enabled file sharing much more directly, and the type of collaboration you can now do. We want to bring that to data lakes.

Okay.

Today you buy storage, you don't buy a lake, and then you implement a data lake pattern. When we talk to a lot of customers, they have these visions of a very pristine single data lake for their entire organization, and just by having one of these they would enable so many things. You just easily land all types of data in one place. Because it's in one place, you can easily blend it together or transform it together. There are fewer things to secure, fewer things to govern, fewer things to manage, fewer things to discover, which should make it easier to get data into the hands of users and applications. But there are obstacles: sometimes technical, but most often it's organizational
challenges, people challenges. It's hard to coordinate and drive everything into one lake. It actually turns out to be easier to create multiple, so you end up with lots of siloed lakes. To really get to that value, you start building solutions on top to break down those silos and move data around. These are expensive and complicated solutions that you have to build and maintain. With OneLake, we're going to give you one lake for the entire organization, just like you get one OneDrive. You don't think about these things with OneDrive; it's there. You'll now have a data lake as a service, that whole solution built out of the box for you, and you can just start putting your data in there, collaborating over it, and using it.

So for my tenant there's just one OneLake?

There is one OneLake. Never two lakes or zero lakes. And you didn't have to set it up, you didn't have to provision it. It'll be there, and no matter what you do, when you start loading data into Fabric it'll be going into OneLake.

Nice. Enough of all this talking. You know how we like to do it here at Guy in a Cube: let's head over to your machine. What are we looking at right now? Are we looking at a workspace?

We are looking at a workspace, and this is what OneLake really looks like from the UI. From the perspective of Fabric, it looks like a workspace, and you can have many workspaces in Fabric. They're very lightweight, very easy to create. I can browse through them all here in the UI experience.

But only one OneLake, multiple places?

Yeah, and the workspaces actually allow multiple teams to collaborate over the same data lake.

Is it fair to think of a workspace as kind of like a folder structure within OneLake?

That's what it's ultimately going to translate to: it's a place in OneLake. You see, in the workspace I'm in, I have a few of these Fabric data items. I have a data warehouse, I have a lakehouse. Let's actually open up one of these warehouses real
quick. Inside the warehouse you'll see I have a schema in here and I have a table in here. This is the UI view of the world. If I flip over to the OneLake view, you're not going to see workspaces and data items. You're going to see files, you're going to see folders. Because we're the OneDrive for data, and with OneDrive you can explore your files right from Windows, you'll be able to do that with OneLake as well. So I can just open up File Explorer, and in File Explorer you'll see a OneLake option here, and I see all my workspaces, the same workspaces you saw in the UI, but here they're folders.

Yeah.

Going into that workspace, you'll see those two data items that you saw in the UI, the warehouse and the lakehouse. Since we were in the warehouse before, I'll go into the warehouse. I'll see a folder for tables; under that folder I'll see a schema, same as before, and the one table we saw before.

Wow, I mean, that looks exactly like OneDrive. It makes it a lot more approachable.

There's nothing extra to install other than the OneLake client here for Windows, and then you're just naturally interacting with it like you would.

So that means even from Windows Explorer I can just drag and drop files, like if I've got some Excel files or other things?

You can, but each of these data items has a different way of bringing data into OneLake. We're in the warehouse right now, and a data warehouse is going to be fully transactional, and you're going to work with it through SQL. So if I want to bring data into a warehouse, I'm going to do it through SQL. We have our one table in our small warehouse at the moment. Let's create a second table.

Okay.

And we'll use this one to track the Guy in a Cube merchandise sales.

Yeah, there we go.

So I'll create a table here, all T-SQL. Let's insert one row. I know those banana shirts are very popular.

So popular.

And I buy 50 of them.

Oh, I'm gonna start that one right.

All right, create a table,
create the one row. So I loaded it through T-SQL. At this point I don't really see a data lake, right? If I'm coming in and I'm used to working with a data warehouse, this is the data warehouse I know today.

Yeah, the same T-SQL experience I'm used to, a T-SQL editor in here.

We come back to our lake view, and let's refresh this screen. Not only do we see the table; when I open it up, we see a Delta log. This data is not stored in some internal proprietary format, which is what you would typically get when you do something with SQL. It's stored in Delta Lake format. That means it's open source, an open format, and you can use it anywhere within Fabric, and not just Fabric. This is an open data lake, so any application that knows how to talk to ADLS Gen2 can talk to OneLake and work with it. Typically with a data lake you can put any kind of data in there, not just structured data, and you don't necessarily have to do it through SQL. If we go back to the UI for one second, you'll see the lakehouse we had in here. Let's ignore tables for a moment and look at the Files section. The Files section lets you put anything you want in it, so let's get some data in there. I can go back and browse my workspace in File Explorer here, CR Lakehouse, and we see the same folder structure, including the Files section. I'll take some images and just copy and paste them directly in here: a bunch of different folders, a bunch of different files, and company logo images right here.

Nice.

If I refresh here, that data is already there. So unstructured data, any file format, is welcome in OneLake. Ultimately it's a data lake; everything's a file.

Yeah.

But we can do special things with tables. If you store your data in Delta Lake format in the Tables section, we know it's a table, and we can make it work automatically with any engine in Fabric. So I can write SQL on top of this, I can run Spark on top of this, I can start building Power BI reports directly on it. But how do we get this in here?
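The Delta log mentioned above is an ordinary folder named `_delta_log` inside the table folder, holding JSON commit files whose names are zero-padded 20-digit version numbers. The following is a minimal sketch, not something shown in the video, of how one might check whether a locally browsable folder (for example one surfaced through the OneLake File Explorer client) looks like a Delta table. The function name `delta_versions` is hypothetical and introduced only for illustration.

```python
import re
from pathlib import Path

# Delta Lake commit files are named with a zero-padded 20-digit version,
# for example 00000000000000000000.json for the first commit.
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def delta_versions(table_dir):
    """Return the sorted commit versions found in <table_dir>/_delta_log.

    Returns an empty list when the folder has no _delta_log directory,
    i.e. when it does not look like a Delta table. (Hypothetical helper.)
    """
    log_dir = Path(table_dir) / "_delta_log"
    if not log_dir.is_dir():
        return []
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_FILE.match(entry.name)
        if match:
            # Checkpoints and other files in the log folder are ignored.
            versions.append(int(match.group(1)))
    return sorted(versions)
```

Pointing this at a table folder that has had a create and one insert would be expected to return at least two versions, though the exact number of commits an engine writes is not something the video states.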
I didn't go and upload every table from Windows. Data got in here a few different ways. These tables here, we were building through Databricks. We actually switched Databricks to use OneLake, because behind the scenes, like I said, these are all files, and to access these files we support the same ADLS Gen2 APIs. If you right-click on any of these locations and bring up the properties, you'll see the path to those files.

Yep.

Databricks uses the ABFS driver to connect, so you can take the ABFS path directly from here and copy it. Let me flip over to Databricks real quick.

Oh man, that's easy. The thing I like from the Databricks side is if you already have some implementation in Databricks but you want to switch it over.

Yep, just change the location. All you gotta do is change the path. I mean, this one was reading from a bunch of locations. It could have been reading from OneLake; in this case it was actually reading from another storage account. It does a few transformations and then writes it back, and this was originally pointing to a different ADLS storage location. All I did was change this one line and point it to the lakehouse we had there. If you look at the URL here, there's always one OneLake, so the storage account is always the same. The container, if you're used to ADLS terminology, is just the workspace.

Nice.

And then this is the lakehouse and the type of data item, so "lakehouse". You know, we were looking at the name; that's it, it just shows up there. I can read, I can write directly from Databricks or any application that's compatible with ADLS.

That's bananas. And that's why he bought the shirts.

Well, if you think that's crazy, ask me how this table got here. You see anything special about it?

It looks like a table.

Again, if I right-click on it, it looks like files. What if I told you this data is not actually in OneLake?

Oh, okay.

It's actually not even in Microsoft. This data is actually
sitting in Amazon S3.

Okay.

And we created what's called a shortcut to it.

Yep.

A shortcut is just a pointer to the data. Whether it's in OneLake, in Azure, or in Amazon S3, it'll look like it's physically here, and to any application or engine that accesses it, it will also appear to be physically here. They don't need to know anything about S3; they can just use this data as if you had copied it into OneLake.

Yeah, but you didn't copy it into OneLake.

There's only one copy, and in this case it's living in S3. Now, we can do the same with that table we created earlier. Normally, if you want to use warehouse data with your lakehouse data, you're jumping through some hoops to get it in one place, and a lot of times that will entail some copying.

Okay, yeah, and there are boundaries that you've got to fight through.

There are boundaries, and typically, if you follow the data mesh pattern a little bit, you'll have different parts of your organization own different parts of the data.

Yeah.

When you report on your data, you report on your business, but more importantly you're taking data from lots of places and pulling it together. So today, with data duplication: did I take it from the right source? Did I take it from a copy of the source, and is it up to date? Shortcuts can start to remove that mystery. They not only spare you from copying the data, they keep you connected to the actual source of the data.

Well, so that means I could have data in different workspaces within Fabric and reference it through a shortcut, or it could be in Amazon?

Yeah. If I right-click on the table here, I'll say "New shortcut". You can do this for tables or for any folders that you have; it doesn't have to be structured data, but we'll stick to structured data for a moment. I can take them from within OneLake already, or they can come from ADLS Gen2.
We can get them from Amazon S3, and there's more in the works.

But it essentially just virtualizes it into OneLake. And if I'm referencing it from another workspace or somewhere else in OneLake, is it still going to be mindful of security? You can only access what you have access to?

Yeah, absolutely.

Okay.

So let's go and find that table where we sold your banana shirts. That was in Business KPIs. This is a fully transactional warehouse that we're now going to combine with a fully non-transactional lakehouse. I can explore it here right from the OneLake data hub. We see the schema, we see our two tables, we see our sales. Create, and it just creates that reference to it. Click on sales, and we can immediately start exploring the data.

Wow.

Should we sell anything else while we're at it?

We've got some amazing socks. We'll call them Patrick socks. I think they sold about 10 of those.

And because we didn't copy the data, I don't have to manage another ETL into another location. Whoever loaded the data is responsible for keeping it up to date, and anywhere you reference it, it's just there. I can load this into Spark, I can start building Power BI reports directly on this. The cool thing about this too: sometimes when you're designing a data strategy, you're looking at the skill sets your teams have. You have some teams coming from the Spark side of the world, others from pure data engineering who work with SQL. Let them work with their engine of choice. They might not even be coming from within Fabric. Everybody here builds the same data lake.

Yeah, and that data resides in OneLake and can be referenced in other spots.

Yep, including Power BI.

All right, Josh, thank you so much for walking us through that. OneLake is amazing, and I'm sure there's a lot more that we can cover in other videos. Let us know in the comments below what you want to know about OneLake or what
questions you have, and we'll get those answered or spin up some more videos on it. If you want to continue your journey learning about Microsoft Fabric, check out the playlist up above. As always, thank you so much for being here. Keep being awesome, and we'll see you in the next video.
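The path layout described in the Databricks part of the walkthrough (one fixed OneLake account, the workspace in the container position, then the item name and its type) can be sketched as a small helper. This is an illustration under stated assumptions rather than anything shown on screen: the host name `onelake.dfs.fabric.microsoft.com` comes from Microsoft's public OneLake documentation, not from the video, and the function name plus the workspace and lakehouse names used with it are hypothetical.

```python
# Assumption: OneLake's ADLS Gen2-compatible endpoint, as given in
# Microsoft's public documentation (the video does not spell it out).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfs_path(workspace, item, item_type, relative_path=""):
    """Build an ABFS URI for a location inside a OneLake data item.

    workspace      sits where an ADLS Gen2 container name would go
    item           name of the data item, e.g. a lakehouse
    item_type      type suffix of the item, e.g. "Lakehouse"
    relative_path  optional path below the item, e.g. "Tables/sales"

    Hypothetical helper: it does no escaping, so names containing
    special characters would need extra handling.
    """
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    relative_path = relative_path.strip("/")
    return f"{base}/{relative_path}" if relative_path else base
```

With made-up names, `onelake_abfs_path("Sales", "CRLakehouse", "Lakehouse", "Tables/orders")` yields `abfss://Sales@onelake.dfs.fabric.microsoft.com/CRLakehouse.Lakehouse/Tables/orders`. In a Spark or Databricks notebook, a string of this shape is what would replace the old ADLS location in the single line Josh describes changing.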
Info
Channel: Guy in a Cube
Views: 26,189
Keywords: data analytics, data engineering, data integration, data lake, data lakehouse, data science, data warehouse, fabric onelake, microsoft fabric, microsoft fabric onelake, onelake, onelake explorer, onelake shortcuts, power bi, synapse, synapse data warehouse
Id: wEcRTSNhtLg
Length: 10min 28sec (628 seconds)
Published: Tue Jun 06 2023