You use applications like Instagram, Facebook, and
Netflix. You even know that these companies collect your data. Everyone knows about it;
everyone talks about it. These companies collect the data because they want to make
better decisions, understand their customers, and improve the overall business process. But
do you know how all of this happens from a technological standpoint? In this video, I
will give you a complete understanding of it. The growth of data is massive and
exponential. As per a recent report, 90% of the world's data was generated in just the
last 2 years, and this report is itself 2 years old. If you look at the bigger picture, you use
your phones and computers on a daily basis. The amount of data that gets generated and
processed every day is massive. Now, this data generally gets stored in some
relational database like PostgreSQL or as text files on some file-based
storage system. But, in the end, all of this data gets stored in one place.
It is something called a data warehouse. Data warehouses are systems built specifically
for analytical workloads. So when you want to store a huge volume of data and read it in bulk,
you can easily do that using a data warehouse. You can also find answers to questions such as: How
much time did this user spend on this page today compared to yesterday? How did we do on this
particular product's sales this year compared to last year? All of these questions can be
answered within seconds using a data warehouse. So, when companies like Google
and Facebook started growing, they started collecting massive volumes of data,
and they wanted to process this data and find valuable insights from it. In the early days,
very few people had access to the internet, so the amount of data
that got generated was small, and it was easy to process
using traditional technologies. But as more and more people started
getting access to the internet, the data started growing at an
exponential rate. At this point, the traditional data warehouse technologies were
not capable of handling the huge volume and the speed at which the data was getting
generated. They had an architecture like this: This is the shared-disk architecture, where you have
one disk and multiple users trying to connect over some network. The other architecture was
like this, where you have shared databases, but to run a query, you have to distribute
it across different nodes. In a nutshell, these technologies
could not handle the speed and the size of the data that was getting generated at
that time. Traditional data warehouse systems started to struggle. They also required
significant time and resources to scale. Database performance was a big issue.
If you wanted to process a huge volume of data, it might take days or even weeks. And
on top of that, managing all of this was very expensive.
Businesses were losing out on valuable insights because they couldn't process
data on time or in an efficient manner. And the last thing is that these data
warehouse technologies supported only limited data types. So if you wanted to store data, you had
to convert it into a structured format first. So we have something called ETL (Extract, Transform,
Load), where you extract data from multiple places, do some transformation, and then load
the structured data into the data warehouse. So you have to do all of this processing before
you even load your data into the data warehouse. So all of these technologies
couldn't handle the new age of data, and we needed something modern. And this is where
the Snowflake database comes into the picture. Now, before we go forward, I just want to say
that I'm not sponsored by Snowflake at all. This is a modern database that is gaining
popularity, and you will understand why as we go forward in this video. Snowflake is a
new type of data warehouse that runs entirely in the cloud. The cloud
means you are using someone else's computer. So, with traditional data warehouses,
you had to buy your hardware, make sure everything scaled properly, and update the software.
Even if you rented the server from someone else, you had to manage most things
yourself. But the concept of the data cloud changed the entire game. Now, you don't
have to worry about any of these things. You just need to focus on the business side
and decide how you process your data. What's cool about Snowflake is how it processes
and stores your data. It keeps the data storage layer and the compute layer separate so that
businesses can store more and more data and also process it efficiently.
So, this is the architecture of Snowflake. To understand it better: at the bottom, we
have the data storage layer, where all the data actually gets stored. Then, we have the
compute layer, where we can allocate resources to process our data and run queries. And we
have the cloud services layer, where you can access features like authentication and
security and manage the overall infrastructure. So, for example, if you have multiple
teams working within the organization, you can create something called virtual
warehouses, where you allocate different amounts of CPU and RAM to different
teams. Based on their requirements, they will use only the resources allocated to them.
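To make the virtual warehouse idea concrete, here is a minimal Python sketch that builds one CREATE WAREHOUSE statement per team. The team names, warehouse names, sizes, and the 60-second auto-suspend are assumptions for illustration, not values from the video. Note that in Snowflake you pick a named size tier rather than exact CPU and RAM figures. The script only prints the SQL; running it against an account would need a Snowflake connection, which is left out here.

```python
# Hypothetical teams and the warehouse size each one gets (illustrative only).
TEAM_WAREHOUSE_SIZES = {
    "analytics": "SMALL",
    "data_science": "LARGE",
    "reporting": "XSMALL",
}

# A subset of the size tiers Snowflake offers.
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def create_warehouse_sql(team: str, size: str, auto_suspend_seconds: int = 60) -> str:
    """Build a Snowflake CREATE WAREHOUSE statement for one team."""
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    warehouse_name = f"{team.upper()}_WH"  # naming scheme is an assumption
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse_name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "  # seconds of idle time before suspending
        f"AUTO_RESUME = TRUE"  # wake up automatically when a query arrives
    )


statements = [
    create_warehouse_sql(team, size) for team, size in TEAM_WAREHOUSE_SIZES.items()
]

for statement in statements:
    print(statement + ";")
```

Each team then runs its queries on its own warehouse, so a heavy job from one team does not slow down another team's queries.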
This way, you will not face performance challenges. And if you need more
resources, Snowflake can scale the system. This makes Snowflake one of the most powerful
database technologies available in the market. And that is not all. Snowflake is adapting to support
the next wave of enterprise workloads. Snowflake has added many new things
based on the problems they see in the market. For example, if you want to store structured or
unstructured data, you can easily do that. If you want to query data that is stored at some
other location, you can create tables on top of it and start querying it. You don't
have to move your data from that location to yours. Also, you don't really need
the ETL step. You can load your data directly into Snowflake and then do the transformation
using Python code, SQL code, or even Spark code. One of my favorite features in Snowflake is
something called Snowpipe. It lets you create a data pipeline based on some event.
Let's say your data is landing in Amazon S3. Whenever a new file gets uploaded, Snowpipe
gets triggered and loads your data directly into a Snowflake table. So, this is
one of the most powerful features I have seen. What Snowflake does is work with
large enterprises and try to understand what the real problems are. Based on
that, they try to solve these problems by building the right features. And they have
hundreds of features that you can explore. So, let's look at an end-to-end example to
understand Snowflake in action and see what you can do with it. Now, before we move forward,
I just want to plug my course here. If you want to learn data warehouse technology,
which is one of the most important skills you need as a data engineer, I have created an
in-depth course where you will learn everything about data warehouse fundamentals. You will
do multiple projects, and you will understand the Snowflake database in depth. It took me
3 to 4 months to build this entire course, so I highly recommend that you at least check
it out. You will find the link in the description. Let's continue with our example.
Whenever you want to learn anything, the first step is not to find courses,
resources, or books. The first step is to go to the website and create your account.
Okay, just get started. All you have to do is search for "Snowflake" in your browser,
and you will land on the Snowflake website. Here, you can read about what
Snowflake is and everything about it. The first thing we want to do here is create
an account for free. They provide a free trial for 30 days, and you also get $400 worth
of credit. So, all you have to do is fill in your information over here. So, I'll do
that. Once you fill in your basic information, they will ask you to choose the
Snowflake edition. You have three options: Standard, Enterprise, and Business
Critical. We will go with the Business Critical edition because that
provides most of the features that we want. After that, you can choose your cloud
provider. So, if you're working in a company that has its existing
infrastructure on a particular cloud, you can choose that one here. I'll just go
with AWS. For the region, you can select the one nearest to your location. For me,
the nearest region is Singapore. Click "Get started" over here. You can fill in all of this
basic information, or you can just skip it. Once you do that, you will get an
activation email. So, you can go to your registered
email, and there you will find all of the information. Just click
the "Click Activate" button. Once you do that, you just have to fill in your username, and then
you will be redirected to the Snowflake UI. So, this is the basic tutorial. The
important thing here is to bookmark your URL. You can also follow the basic
tutorial, but for now, I'll just skip it. So, this is the UI of Snowflake. As you can see, you
get worksheets. A worksheet is basically where you write all of your SQL queries. Then
you have dashboards, so whatever dashboards you create will be displayed here. We
also have Streamlit if you want to create interactive visualizations and applications.
They also provide third-party applications that you may want to integrate. Then here you will
find all of the information about your data. So, all of the databases that you have
and whatever data sharing you do, the overall management of the data, can be done
over here. They also have the marketplace. So, if you want to get some data for trial purposes,
or if you want to practice SQL on reliable datasets, you can find such datasets already
available for free. So, you can play with the Snowflake UI and get a better understanding
of it. If you just play with it, you will get used to this UI, and you will
get more comfortable learning Snowflake. So, all you have to do is click on this plus
icon and then click on SQL worksheet. Then you will see, this is my Snowflake. So, this is
all of my data, and this is the sample data that you can use to write queries. So, this
is pretty simple. First, you make sure that you have selected the warehouse at the top.
Then, over here, you select your database. You just click on the database that you want
to use. You can also select a specific schema. In this case, I will go with this particular
schema, which has the tables and data that we want to query. So, the first query that you can
write is something like this, which is a select query. Over here, the query I'm writing is a
select star from this particular database, then the schema name, and then the table name.
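As a rough sketch of that query shape, the Python snippet below uses SQLite's in-memory database as a local stand-in, since running it on Snowflake needs an account. The names demo_schema and orders, the columns, and the 20 generated rows are all made up for illustration. SQLite only has a two-part schema.table name, so the database prefix that Snowflake puts in front appears only in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite has no database.schema.table hierarchy, so an attached in-memory
# database named "demo_schema" plays the role of the schema here.
conn.execute("ATTACH DATABASE ':memory:' AS demo_schema")
conn.execute(
    "CREATE TABLE demo_schema.orders "
    "(order_key INTEGER, customer_key INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_schema.orders VALUES (?, ?, ?)",
    [(i, i % 3, i * 2) for i in range(1, 21)],  # 20 invented rows
)

# On Snowflake the same query carries one more prefix for the database:
#   SELECT * FROM <database>.<schema>.<table>
all_rows = conn.execute("SELECT * FROM demo_schema.orders").fetchall()

# On a table with millions of rows, a LIMIT keeps the result small.
first_ten = conn.execute("SELECT * FROM demo_schema.orders LIMIT 10").fetchall()

print(len(all_rows), "rows in total,", len(first_ten), "rows with LIMIT 10")
```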
So, if I run this, you will see that it starts running this particular query on this
sample table, and you will get the output. So, this is scanning the entire table and giving
you the final output. So, as you can see, this is the table's data. You can see it over here.
Currently, it is displaying only 10,000 rows, but this table has millions of rows. So,
it has about 15 million rows. You can also learn more about the query, like
how much time it took. If you want to debug it, you can just click on the query ID, and
you will get information about everything that this query did in the backend.
It did the table scan and gave you the result. You can also limit the result set just
by putting the limit as 10. We can also run more queries. So, let's say you want to run
something like this, where I'm just summing up the quantity column as the total quantity sold
from this particular table. If I run this, we'll get the
total quantity sold. Now, I can do more analytics on top of it. It is pretty
simple. You just have to write the right query. If you have a basic understanding of SQL,
this should be pretty simple to understand. What I'm doing is querying
this particular table and joining it with another table
based on the customer key that is available in
both tables, the order table and the customer table, and then selecting these four
columns. It will take around 30 to 40 seconds to run this query, and once it finishes, you
will get the final result set. If you want to learn about
your compute warehouse resources, you can just go to the admin section. You will find
the warehouses over here. As you can see, we have the compute warehouses that are currently
running. If you want to create your own warehouse, you can just click on this plus, give
the warehouse a name such as "test," and here you have all of these options available. You
can just click on create warehouse. You also have the advanced options, so
you can set it to auto-resume or to suspend after some minutes. Snowflake has a lot of
other features: Time Travel, Snowpipe, and many more. I can't
teach all of these in just an overview video. As I already told you, I have a
detailed course on data warehouses for data engineers using Snowflake. So, I highly
recommend that you at least check it out. This was the complete overview of Snowflake.
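To recap the queries from the walkthrough, here is a minimal sketch of the limit, the total quantity aggregate, and the join on the customer key. It runs on SQLite in memory as a stand-in for Snowflake, and the table names, column names, and rows are invented for illustration; the actual sample tables and the four selected columns were only shown on screen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tiny invented tables that share a customer_key column.
conn.executescript(
    """
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_key INTEGER PRIMARY KEY,
        customer_key INTEGER,
        quantity INTEGER,
        order_date TEXT
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES
        (100, 1, 5, '2024-01-03'),
        (101, 2, 2, '2024-01-04'),
        (102, 1, 7, '2024-01-09');
    """
)

# 1. Limit the result set instead of returning every row.
limited = conn.execute(
    "SELECT * FROM orders ORDER BY order_key LIMIT 2"
).fetchall()

# 2. Aggregate the quantity column into a single total.
total_quantity_sold = conn.execute(
    "SELECT SUM(quantity) AS total_quantity_sold FROM orders"
).fetchone()[0]

# 3. Join orders to customers on the shared key and pick four columns.
joined = conn.execute(
    """
    SELECT c.name, o.order_key, o.quantity, o.order_date
    FROM orders AS o
    JOIN customer AS c ON o.customer_key = c.customer_key
    ORDER BY o.order_key
    """
).fetchall()

print(limited)
print(total_quantity_sold)
for row in joined:
    print(row)
```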
Again, the intention of this video was not to make you a master of Snowflake but to give you a
quick overview of this technology so that you get the confidence to learn
it. This is pretty easy. All you have to do is understand the UI and bits and pieces of it.
Once you do that, you will get the confidence. Okay, so this was all from this video.
I hope you gained clarity. If you did, then don't forget to hit the like button
and subscribe to the channel if you're new here. Thank you for watching.
I'll see you in the next video.