Spark SQL for Data Engineering 4: How to create a database in Databricks? #SparkSQL #databricksSQL

Captions
Hi friends, welcome to the Databricks video tutorials. This is Ravi. Today we start with Spark SQL database operations: how to create a database, where that database lives in the backend file system, and what a data lake is. In previous videos I covered the basics, such as how to create a Databricks Community Edition account and how to start a practical session in the cloud (two videos), plus another video on OLTP, OLAP, and traditional data warehousing, so please go through those first.

Today we focus primarily on database creation in Databricks and on the backend file system that is used whenever you create a database or a table. Let's start the practical session to get more clarity. Once you log into the Community Edition, first create a compute; mine is already available. If no compute exists you can create a new one, but only one, because the Community Edition is limited to a single compute, which is enough for learning purposes. Then go to the workspace and import the DBC file, which contains the complete Spark SQL tutorial.

We'll start with DDL, the Data Definition Language. DDL covers how to create a database, a table, and a view, and operations such as ALTER VIEW, DROP DATABASE, DROP TABLE, and DROP VIEW. First, databases. On the left side, if
you expand the menu, you will see a separate option called Data, where you can find the databases. As of now only the default database is available. Attach the cluster to the notebook; if the notebook already has state and cell output, use "Clear State and Cell Output" to start a fresh session. I have given the syntax for creating a database along with the available options. To create a database: CREATE DATABASE database_name. I'll create one named sparksql_db. Click "Run Cell" on the right side; the command executes and the database is created, which you can see in the Data pane.

To describe a database: DESCRIBE DATABASE database_name. It shows the database name, the catalog it was created in (spark_catalog), and the location where it is stored. Whenever you create a database or a table, Spark uses the Hive metastore, so the default warehouse location is /user/hive/warehouse, and inside that Spark creates a folder named database_name.db.

Now let's create a table inside this database. I'll create a table named customers with two columns, id INT and name STRING. To create it inside a specific database, qualify the table as database_name.table_name. If you click on Data you will see the database but no tables yet; once the command finishes, the table appears.
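The steps above can be sketched as notebook SQL cells (database and table names taken from the video; the warehouse path is the Databricks default and may differ in your workspace):

```sql
-- Create a database; its metadata goes into the Hive metastore.
CREATE DATABASE sparksql_db;

-- Inspect the database: name, catalog, and storage location.
-- With no LOCATION given, it lands under dbfs:/user/hive/warehouse/sparksql_db.db
DESCRIBE DATABASE sparksql_db;

-- Create a table inside that database by qualifying the table name.
-- In Databricks the default table format is Delta.
CREATE TABLE sparksql_db.customers (
  id   INT,
  name STRING
);
```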
You can see the new table in the UI under Data. It may still show as "creation in progress": whenever you create a table in Databricks, the default table type is Delta. Once it appears, you can click on it to see information about its two columns and when it was created, and the History tab as well. Because it is a Delta table, an audit log is also available; you will understand that in the Delta videos.

Now go back to the notebook in the workspace. If I insert a record, where will it be stored? We need to understand the backend physical location of your databases and tables. I'll comment out the previous command and insert a single record, INSERT INTO the table with VALUES for customer id 1 and name Ravi, just one record, to see where the data goes. Whenever you create a database or a table in Databricks, always remember the default location is the warehouse location. Verify it with the file-system magic: %fs ls lists files. Under /user/hive/warehouse your database is created as a folder named database_name.db (because it is a database), and under that is a subfolder with the table name, where you can find the data. Since we did not specify any location, the Delta table's data landed in the default warehouse location.
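The insert and the file-system check can be sketched as follows (again assuming the default warehouse path):

```sql
-- Insert a single record to see where the data files land.
INSERT INTO sparksql_db.customers VALUES (1, 'Ravi');

-- In a separate notebook cell, list the table's backing files with the %fs magic:
--   %fs ls dbfs:/user/hive/warehouse/sparksql_db.db/customers/
-- Expect a _delta_log/ folder plus Parquet data files.
```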
So how does Spark decide the default warehouse location? Let me show you a few more syntaxes; I have given multiple examples in this notebook. There is a Spark parameter called spark.sql.warehouse.dir. If you query it, you get the parameter's default value: the physical location where Spark stores databases and tables, that is, the data lake. The Databricks Community Edition is integrated with AWS S3 by default, so this path is backed by an S3 location; that is why the table was created there. Always remember this is the default warehouse location.

Earlier I qualified the table as database_name.table_name, so it was created under that database's folder. If I do not specify a database name, the table is created in the default database, and its folder sits directly under the warehouse directory. The backend is a file system: if you go to Data and open the default database, you can see the table and its physical location, directly under the warehouse. We have not inserted any record yet, so you do not see any data; once I insert one record, a data file will appear in that location. The %fs magic is the file-system option that lets you query this storage; the backend is a distributed file system, the data lake. Whenever you create a database, it goes to the default warehouse location because Spark does not have its own metastore; Spark uses the Hive metastore, and this path is the Hive metastore's default warehouse location, which is why you find the parameter with that value.
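A rough sketch of checking the parameter and of an unqualified table (the table name customers2 is hypothetical, not from the video):

```sql
-- Show the current warehouse directory; on Databricks the default is
-- spark.sql.warehouse.dir = dbfs:/user/hive/warehouse
SET spark.sql.warehouse.dir;

-- Without a database qualifier, the table goes into the default
-- database, directly under the warehouse directory.
CREATE TABLE customers2 (
  id   INT,
  name STRING
);
-- Folder: dbfs:/user/hive/warehouse/customers2/
```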
The parameter is also available programmatically: spark.conf.get("spark.sql.warehouse.dir") returns the same default location. The path starts with dbfs:/, which stands for Databricks File System; its backend is the data lake, an AWS S3 bucket, and DBFS is how you communicate with that distributed backend storage.

Now, if I want to create a database in a different location, that is also possible: provide the LOCATION keyword followed by the path. I want to create a folder called sparksql/databases and create a new database named ravi_db there: CREATE DATABASE database_name LOCATION '<path>'. Whenever you create a database at a specific location, it is not stored in the default warehouse; it lives at the location you gave. DESCRIBE DATABASE now shows that custom database location. If you run the same command again, it raises an exception saying the database already exists. If you do not want the exception and simply want to ignore it, use IF NOT EXISTS, as with other databases. Always remember SQL is not a case-sensitive language, so lowercase, uppercase, or mixed case does not matter, but paths and Python are case-sensitive. CREATE DATABASE IF NOT EXISTS means: if the database already exists, ignore the command. This is how we create databases in Databricks: one in the default warehouse location and the other in a different, external location.
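The custom-location database can be sketched as below (the DBFS path is illustrative; any writable path works):

```sql
-- Create a database at an explicit location instead of the warehouse.
-- IF NOT EXISTS suppresses the "database already exists" error on re-runs.
CREATE DATABASE IF NOT EXISTS ravi_db
LOCATION 'dbfs:/FileStore/sparksql/databases/ravi_db';

-- DESCRIBE now reports the custom location, not the warehouse path.
DESCRIBE DATABASE ravi_db;
```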
Whenever you create a table inside a database that has a custom location, the table is created in that same location. Let's confirm: I'll create a table named emp with id INT and name STRING under the ravi_db database. If you verify the file system at that location, you can find a separate folder for this table, emp; that is the physical location of your table, because tables are created wherever their database lives. And if you go to Data, you can see the database and, inside it, the table.

That is how we create databases in Databricks using Spark SQL, SQL being one of the most common languages. As for the backend metastore: Spark's catalog is nothing but the Hive catalog. Hive is an open-source warehouse and Spark is also open source, so instead of developing its own metadata catalog, Spark integrates the Hive catalog. Databricks has also developed another catalog, Unity Catalog, which is available only in the Enterprise Edition; recently it went into public preview, and it is not available in the Community Edition, where the default remains the Hive catalog. You can check the Databricks Unity Catalog documentation; it is a new product aimed mainly at lakehouse architecture and data mesh. With the existing Hive and Spark catalogs you cannot use primary key and foreign key constraints, but Unity Catalog supports those constraints. So that is Unity Catalog, a new service from Databricks available only in the Enterprise Edition; in the Community Edition
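The table in the custom-location database can be sketched as below, assuming ravi_db was created with a custom LOCATION such as dbfs:/FileStore/sparksql/databases/ravi_db (path illustrative):

```sql
-- Tables created in ravi_db inherit the database's custom location,
-- not the default warehouse directory.
CREATE TABLE ravi_db.emp (
  id   INT,
  name STRING
);

-- Verify in a separate cell:
--   %fs ls dbfs:/FileStore/sparksql/databases/ravi_db/emp/
```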
it is not available. So that is all about creating a database in Databricks. Next we will look at creating a table, creating a view, and dropping a database, a table, and a view. See you in the next video. Thank you.
Info
Channel: TechLake
Views: 11,483
Keywords: sql server, sql query, sql database, database, what is sql, oracle sql, sql questions, mysql, join sql, sql interview questions, select, function in sql, join in sql, update sql, between in sql, delete in sql, sql download, online sql, w3schools sql, update in sql, sql developer, create table sql, leetcode sql, online sql compiler, window functions in sql, sql notes, sql live, sql practice, postgresql, hackerrank sql, sql practice online, my sql, sql practice questions
Id: Lg_xjyXOFFc
Length: 16min 6sec (966 seconds)
Published: Wed Oct 05 2022