(upbeat music) - [Tutor] Databricks provides
a unified open platform for all your data. It empowers data
scientists, data engineers, and data analysts with a simple, collaborative environment to run interactive and scheduled data analysis workloads. Databricks is from the original creators of some of the world's most popular open source projects: Apache Spark, Delta
Lake, MLflow, and Koalas. It builds on these technologies to deliver a true Lakehouse architecture, combining the best of data
lakes and data warehouses for a fast, scalable, and reliable data platform. Built for the Cloud, your data is stored in low-cost Cloud object stores, such as AWS S3 and Azure Data Lake Storage, with performant access
enabled through caching, optimized data layout,
and other techniques. (upbeat music) To work with your data, you can launch clusters
with hundreds of machines, each with a mixture of CPUs and GPUs needed for your analysis. If you're on a large data team, policies can define limits on cluster sizes and configuration.
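As a rough illustration of what launching such a cluster can look like outside the UI, here is a minimal sketch using the Databricks REST API. The workspace URL, token, runtime version, and node type are placeholders, not the demo's actual setup.

```python
# Minimal sketch (not the exact demo setup): create an autoscaling cluster
# through the Databricks REST API. DATABRICKS_HOST, DATABRICKS_TOKEN, the
# runtime version, and the node type are placeholders for your own values.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

cluster_spec = {
    "cluster_name": "analysis-cluster",
    "spark_version": "13.3.x-scala2.12",   # a Databricks Runtime (an ML Runtime also works)
    "node_type_id": "i3.xlarge",            # CPU nodes; pick a GPU node type for deep learning
    "autoscale": {"min_workers": 2, "max_workers": 390},
    # "policy_id": "...",                   # optionally attach a cluster policy to enforce limits
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```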
There is a Databricks runtime for data engineers and data scientists, as well as a runtime optimized for machine learning workloads. (upbeat music) See how easy it is to create a cluster with up to 390 workers. In the data science workspace, you can create collaborative
notebooks using Python, SQL, Scala, or R. (upbeat music) Just like you can share your Google Docs with individual colleagues and groups of colleagues, you can also share these notebooks. Plus, built-in commenting tied to your code helps you exchange ideas and updates with your colleagues. (upbeat music) In addition to using notebooks for exploratory data
analysis as you see here, many Databricks users love
the powerful integration with machine learning
frameworks like MLflow. Here, we're training a
model and testing it. But we can also look up at the
top here and see the MLflow experiment tracking, which records the
previous experiment runs, and you can see important metrics like their accuracy. Now, MLflow is just one of the
integrations that Databricks provides with popular
frameworks for machine learning and data science. Databricks also supports
a variety of other open source libraries that are popular in the community.
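To give a sense of what that experiment tracking looks like in code, here is a minimal MLflow sketch. The dataset, model, and run name are hypothetical stand-ins for the demo's model; the point is that parameters and metrics logged this way appear as runs in the experiment UI.

```python
# Minimal sketch of MLflow experiment tracking on a hypothetical dataset.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="baseline-model"):
    n_estimators = 200
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Logged parameters and metrics show up in the MLflow experiment UI,
    # so each run's accuracy can be compared against previous runs.
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```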
Want to know more about what data your colleagues have shared with you? Take a look at the data tab, where you can see individual tables with their schema and sample data. Importantly, you also see the history of operations performed on each table, recorded in the transaction log. Now, why does history matter? Well, it's important for compliance and security audits in many industries, but it also lets you explore your data along another dimension: time.
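For a sense of how that table history surfaces in a notebook, here is a small sketch using Delta Lake's DESCRIBE HISTORY command. The table name is hypothetical, and `spark` is the SparkSession that Databricks notebooks provide automatically.

```python
# Minimal sketch: inspect a Delta table's operation history from a notebook.
# "loan_risk_scores" is a hypothetical table name; `spark` is the SparkSession
# already available in Databricks notebooks.
history_df = spark.sql("DESCRIBE HISTORY loan_risk_scores")

# Each row corresponds to a commit in the Delta transaction log: version number,
# timestamp, user, and operation (WRITE, MERGE, OPTIMIZE, ...), which is the
# information compliance and security audits rely on.
history_df.select("version", "timestamp", "userName", "operation").show(truncate=False)
```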
Let's see how by opening up this SQL analytics interface. The SQL analytics interface
gives us the ability to create visualizations and dashboards as well as query our Lakehouse
with performance comparable to, or exceeding, that of traditional data warehouses. We achieve this level of
performance, reliability, schema enforcement, and scale through advances in
Delta Lake and Delta Engine. Delta Lake is an open-format storage layer built on top of Parquet that adds ACID transactions to your Cloud data lake.
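To make that concrete, here is a minimal sketch of writing a DataFrame as a Delta table from a notebook. The table name and columns are hypothetical; the point is that every write becomes an ACID-committed version in the transaction log.

```python
# Minimal sketch: write a small DataFrame as a Delta table.
# The table name and columns are hypothetical; `spark` is the notebook SparkSession.
scores_v0 = spark.createDataFrame(
    [("CA", 62), ("TX", 55)],
    ["state", "risk_score"],
)

# Each write is an atomic commit recorded in the Delta transaction log,
# which is what later enables DESCRIBE HISTORY and time travel queries.
scores_v0.write.format("delta").mode("overwrite").saveAsTable("loan_risk_scores")
```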
Let's show you how the transaction log enables Delta Lake Time Travel. Here, we're looking at a series of loan risk scores based on where a property is located. When we originally created this dataset, in version zero, we didn't have any data for Iowa; we didn't have any loan applications there. But as time went on and we reached version 40, you can see that Iowa is populated with a loan risk score of eight, probably reflecting that the middle of the country sees a bit less in the way of natural disasters. Now let's show you the SQL
that powers these queries. Here you can see that we have simply added a version number to our SQL query to indicate which version of the data we're querying. This uses the Delta Lake Time Travel feature to find the data as it existed at a particular point in time.
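Here is a minimal sketch of what such queries can look like from a notebook, using Delta Lake's VERSION AS OF (and TIMESTAMP AS OF) syntax. The table name, columns, and timestamp are hypothetical placeholders for the demo's loan risk table.

```python
# Minimal sketch: query a Delta table at earlier versions with time travel.
# "loan_risk_scores" is a hypothetical table name; `spark` is the notebook SparkSession.

# Version 0: the table as originally created (no Iowa rows yet).
v0 = spark.sql("SELECT state, risk_score FROM loan_risk_scores VERSION AS OF 0")

# Version 40: a later snapshot, after more loan applications were loaded.
v40 = spark.sql("SELECT state, risk_score FROM loan_risk_scores VERSION AS OF 40")

# You can also travel by timestamp instead of version number.
snapshot = spark.sql("SELECT * FROM loan_risk_scores TIMESTAMP AS OF '2021-01-01'")

v0.show()
v40.show()
```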
Well, I hope you've seen how simple and powerful Databricks can be for your entire data team. Whether they're data analysts, data engineers, or data scientists, they can collaborate to do their data plus AI work on Databricks. Learn more at databricks.com.