Spark SQL for Data Engineering 6: Difference between Managed Table and External Table #sparksql

Captions
Hi friends, welcome to TechLake video tutorials. This is Ravi, and today I'm going to explain Spark SQL table creation DDL operations. In the previous session we went through how to create a database, and how to create a database in an external location; today we will focus primarily on table creation. This is very important when it comes to Databricks SQL: whether you are creating a managed table or an external table in Databricks. So let's start the session, and you'll understand in more detail how to create a table, how to create a managed table, how to create an external table, the difference between an internal table (you can also call it a managed table) and an external table, and their advantages and disadvantages.

CREATE TABLE is a common syntax in every database, and it is the same here. You can even create a table from PySpark using the Spark read/write API, but first we'll create a table with plain SQL. I'll create a table called customers with three columns and their data types: a column id of type integer, a column name of type string, and a column location of type string. Whenever you create this table, it will be available under Data, in the default database. If you don't specify any format with the USING clause, by default it creates the table in Delta
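The create statement described above can be sketched as follows. The column names are the ones used in the video; note that in a fresh workspace this creates a managed Delta table, whereas in the video it initially fails because the default location already holds data from the previous session:

```sql
-- Creates a managed table in the default database.
-- No USING clause, so Databricks defaults to Delta format.
CREATE TABLE customers (
  id       INT,
  name     STRING,
  location STRING
);
```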
format. Now, when I run this create, it throws an exception saying the location is already in use: if the target location is not empty, you cannot create a managed table there directly, because a plain CREATE TABLE creates a managed table. We'll understand what a managed table is in a moment.

In this case the data is already available at that location from the previous session; it is a Delta table, so you can read it by path in SQL with SELECT * FROM delta followed by the backquoted path. Only one record is available, the one we inserted last time. You can also create a table directly on that existing location by specifying the LOCATION clause; a table whose data lives at a location like that is an external table.

Now remember a few useful commands. SHOW DATABASES lists the available databases; here only the default one exists. SHOW TABLES displays the tables registered in the metastore, and for the table we just created it shows isTemporary = false, so it is not a temporary table. SHOW CREATE TABLE followed by the table name lets you verify the DDL (data definition language) of the table. What happened here is that the data was already available at that location as Delta, and we created a table on that particular
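Those steps look roughly like this; the path and the table name are illustrative placeholders, not the exact ones from the video:

```sql
-- Read an existing Delta directory directly by path (no table needed).
SELECT * FROM delta.`/mnt/raw/customers_delta`;

-- Register an external table on top of that existing location.
CREATE TABLE customers_ext
LOCATION '/mnt/raw/customers_delta';

-- Inspect what is registered in the metastore.
SHOW DATABASES;
SHOW TABLES;
SHOW CREATE TABLE customers_ext;
```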
location. When you create a table on a particular location, that is, when you specify the LOCATION keyword, it is created with table type EXTERNAL. It is a Delta table, so the minimum reader version and writer version properties also appear in the DDL. The catalog shown is spark_catalog: Databricks does not have its own catalog, so it uses the Hive catalog, which is referred to here as spark_catalog. It stores the metadata: all the tables, columns, data types, everything. Then comes the default database (schema), which is default, and then the table name.

Next, I'll create another table, customer_dim, without specifying any USING type and without specifying any LOCATION. Then verify its DDL with SHOW CREATE TABLE and compare the two: the first one is an external table, and this one is an internal, or managed, table. In its DDL the table type is MANAGED and you do not see the LOCATION keyword. So whenever you create a table without a location, it is created as a managed (internal) table by default, and its data lands in the default warehouse location: if you run %fs ls /user/hive/warehouse, you will find a subfolder with the table name inside the warehouse directory. Your table is stored there.

Why does it go there? Because Databricks uses the Hive catalog. Databricks is a company, not open source, while Spark and Hive are both open source; so instead of developing their own catalog, they use the Hive
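The managed-table comparison above can be sketched like this, using the customer_dim name from the video:

```sql
-- Managed table: no LOCATION clause, so the data is stored
-- under the default warehouse directory (/user/hive/warehouse).
CREATE TABLE customer_dim (
  id       INT,
  name     STRING,
  location STRING
);

-- The generated DDL has no LOCATION clause, and the catalog
-- records the table type as MANAGED rather than EXTERNAL.
SHOW CREATE TABLE customer_dim;
```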
catalog. That is why, if you check the parameter value in the Spark configuration with spark.conf.get("spark.sql.warehouse.dir"), its default value is that warehouse path; whenever you create a table without specifying a location, it is created there, in the default warehouse location.

So what is the difference between an external table and a managed table? Don't get confused by the naming: managed table and internal table mean the same thing, and the other kind is the external table. External means an external location that you specify; internal means the default location, used when you create the table without the LOCATION keyword: /user/hive/warehouse, then the database name, then a subfolder with the table name.

Let's understand the difference with some data. I'll insert one record into each table to explain the difference between a managed table and an external table. Now both tables have data; you can query them and verify one record in each. Just observe the difference: the external table sits at the location we specified, while the managed table sits at its internal, default location. I already showed you the DDL of the external table and the DDL of the managed one. Now I'll try dropping both tables with DROP TABLE, and I'll
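The setup described above might look like this; the table names and the inserted values are placeholders for illustration:

```sql
-- Show where managed tables are stored by default
-- (returns the current value of the config key).
SET spark.sql.warehouse.dir;

-- Put one record in each table so we can compare drop behaviour.
INSERT INTO customer_dim   VALUES (1, 'Ravi', 'Hyderabad');
INSERT INTO customers_ext  VALUES (1, 'Ravi', 'Hyderabad');

SELECT * FROM customer_dim;
SELECT * FROM customers_ext;
```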
remove the other one as well. Now both tables are gone. If you query them you will get an error saying the table or view does not exist; customer_dim is not available either. And for the managed table, the internal location where it was storing the data has also been removed. So: if you drop an internal (managed) table, it removes both the data and the metadata, the metadata from the Spark catalog and the data from the physical location where the table was created. If you drop an external table, it removes only the metadata from the Spark catalog; the data stays at that location. That is the advantage of an external table: even after you drop it, the data files are still there, and you can simply create the table again on that location.

By default, without specifying anything, the table is created as Delta. If you want another type, such as CSV, you need to specify USING CSV; then whenever you insert data into that table, the backend stores it as CSV files. Let's insert one record and verify: the record is visible in the table, and the file lands in the table's default location under /user/hive/warehouse, where you can see the CSV file created in the backend. You can even read that file directly. I inserted this record, and the table is available here.
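A sketch of the CSV-backed managed table described above; the table name and values are illustrative:

```sql
-- Managed table stored as CSV instead of the Delta default.
CREATE TABLE customers_csv (
  id       INT,
  name     STRING,
  location STRING
)
USING CSV;

-- The row below ends up as a CSV file under
-- /user/hive/warehouse/customers_csv/.
INSERT INTO customers_csv VALUES (1, 'Ravi', 'Hyderabad');
```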
The table type from the USING clause controls the backend storage type, and here it is CSV; SHOW CREATE TABLE shows the same thing in the DDL. You can also create a CSV table in an external location by using the LOCATION keyword, for example pointing at a customers CSV folder. Now if I insert a record, you can go and verify that location and find the CSV file there: the same behaviour, just a different location, and you can read the data as well. If you look at the table's DDL with SHOW CREATE TABLE, you will see the location we specified.

Now if I drop this table with DROP TABLE, the table disappears from the metastore; you no longer see the external table there. But when you go to the location, the existing file is still available; even listing the location without a file name, the file is still there. So when you create a table with the LOCATION keyword, by default it is treated as an external table. There is also an explicit option to specify the table type: CREATE EXTERNAL TABLE on a location. Say I create another folder, ext; you can use the EXTERNAL keyword syntax, but whenever you use the LOCATION keyword it is external by default anyway. The data is stored at that location, and if I drop the table, only the metadata is removed from the Spark catalog, not the data. This table behaves the same.
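A sketch of the explicit EXTERNAL syntax mentioned above, as Databricks accepts it; the path and name are placeholders, and the LOCATION clause alone would already make the table external:

```sql
-- Explicit EXTERNAL keyword; LOCATION by itself has the same effect.
CREATE EXTERNAL TABLE customers_ext_csv (
  id       INT,
  name     STRING,
  location STRING
)
USING CSV
LOCATION '/mnt/raw/customers_csv_ext';

-- Dropping it removes only the metastore entry;
-- the CSV files stay at the location.
DROP TABLE customers_ext_csv;
```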
Now if I insert data, the files will be available at that particular location, and even if we drop the table the files remain. And if you check the generated DDL with SHOW CREATE TABLE: we wrote the statement as CREATE EXTERNAL TABLE, but the DDL that comes back does not echo the EXTERNAL keyword; it only shows the LOCATION.

You can also create a table with constraints. I'll drop this one and recreate it with NOT NULL constraints: INT NOT NULL, STRING NOT NULL. But if you then check SHOW CREATE TABLE, you do not see the constraints either. So a few things to note about SHOW CREATE TABLE: the EXTERNAL keyword is not shown, though the LOCATION is, and whatever constraints we used are not shown here. Databricks Delta Lake does support constraints, but this table is not Delta, it is a CSV file; I'm just showing the example. In SHOW CREATE TABLE output, even if the EXTERNAL keyword is absent, the presence of a LOCATION means it is treated as an external table; for Delta, though, it will show you the table type.

So that is the managed table versus the external table. A managed table means that if you drop the table, it removes the metadata from the Spark catalog and the data from the default location. An external table means dropping it removes only the metadata from the Spark catalog; the data files at the location remain available. That is all about Spark SQL CREATE TABLE, managed tables, and external tables. In another video we'll see a few more options, like how to create a view and the different types of views; that is a very important session, three types of views are available, and we will discuss them separately. The Databricks
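The constraints experiment above might be sketched like this; the table name is a placeholder:

```sql
-- Recreate the table with NOT NULL column constraints.
CREATE TABLE customers_csv_nn (
  id       INT    NOT NULL,
  name     STRING NOT NULL,
  location STRING NOT NULL
)
USING CSV;

-- For this non-Delta table the generated DDL does not
-- echo the NOT NULL constraints back.
SHOW CREATE TABLE customers_csv_nn;
```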
advantage here is that you can create a table in any format: CSV, JSON, XML, Excel, any kind of format. I have also given multiple examples with PARTITIONED BY; if you have time, you can go through them. We will look at partitioning and indexes later: a Delta "index" is really Delta OPTIMIZE with Z-ORDER, and bucketing and partitioning we will cover in a separate performance tuning session. That's all about CREATE TABLE, managed tables, and external tables. Thank you, see you in another video.
Info
Channel: TechLake
Views: 8,156
Keywords: sql server, sql query, sql database, database, what is sql, oracle sql, sql questions, mysql, join sql, sql interview questions, select, function in sql, join in sql, update sql, between in sql, delete in sql, sql download, online sql, w3schools sql, update in sql, sql developer, create table sql, leetcode sql, online sql compiler, window functions in sql, sql notes, sql live, sql practice, postgresql, hackerrank sql, sql practice online, my sql, sql practice questions
Id: 7o7P6JQYvKI
Length: 22min 45sec (1365 seconds)
Published: Fri Oct 07 2022