(upbeat sounds) - Coming up, we're
joined by John Macintyre, Engineering Lead for Azure Synapse, Microsoft's limitless analytics platform, for a tour of the latest
updates now generally available and a first look at additional
capabilities coming soon. So John, welcome to the show. - Thanks, Jeremy, it's great to be back. - So we've been following
the momentum of Azure Synapse here in Microsoft Mechanics
closely over the past year. In fact, we recently chronicled several
early adopter customers. And if you're new to Azure Synapse, it's Microsoft's limitless
analytics platform that really brings together
enterprise data warehousing and big data processing into
a single managed environment with no systems integration required. So, John, I know you and the
team have been hard at work, but what's new in the service? - So as you know, Jeremy, Azure Synapse Analytics has
been available for customers for the past year. And we've had some really
great preview capabilities in the Synapse workspace that are now ready for
production workloads. Like Azure Synapse Link, which is the first
cloud-native HTAP solution. And that enables continuous analytics over operational data in Cosmos DB. That's done without interfering
with your operational or application workloads. Next, SQL Serverless is also
now generally available. And that gives you the horsepower you need at the exact moment you run a query. It runs completely serverless so you only pay for each query
and the data you process. And beyond that, for analytics with Spark, we've also built performance optimizations for our implementation of Apache Spark, including enhanced shuffle, which aligns data to
improve query performance. We've also implemented
dynamic partition pruning to eliminate that unnecessary
data during job execution. All of these things are working together to really speed up performance. And that Spark environment
that Synapse offers is fully managed. So when a job comes in, the service will provision
resources, scale resources, and manage those resources
as you need them. - Right, and this is
really great news I think for Synapse users and really highly
anticipated capabilities. But you've also added a
host of new capabilities that were recently in preview that are now also available and
fully supported as of today. - We have, and you know, Jeremy, these have been focused
in a number of areas. So first, to help you
really easily get started: the new knowledge center
gives you Pipeline templates to bring data in, sample scripts
for analytics, automation, and Notebooks to start
to analyze your data as well as access to data
within the Azure open datasets. Second, we're making it even easier to bring data into Synapse
for advanced analytics. And to enrich that data in code-free ways and apply your Azure
Machine Learning models. - Right, and those capabilities
will make data analysts and also data scientists really happy. But, what are some of the
things that we've added for our data admins? - One of the biggest things we've done is to make it easier for
you to connect to your data and storage securely through
managed private endpoints. As you provision your
Azure Synapse workspace, you can simply enable the
managed virtual network option. And with that, we automatically handle
all that configuration of virtual network and private endpoints so that you can immediately
start running SQL scripts or use Notebooks to analyze your data. You can also enable the
exfiltration protection for your workspace. And what this does is it ensures that all that outbound traffic goes through private endpoints and only to selected
resources that are approved in your Azure AD tenants. - All right, and this is nice because you no longer
have to manage subnets, worry about IP ranges,
configure private endpoints like you said. You don't need deep
networking knowledge or, you know, knowledge about data
movement or orchestration. Also the performance and
resiliency are then managed by Microsoft. - That's right. We're removing that complexity for you. And also as part of our
comprehensive approach to data protection, we've
added new role types to Synapse for role-based access controls. They really give you more granular control over both your resources and your data. - Right, and this is a
lot of popular updates I think a lot of people
have been waiting for. But this is Mechanics, so why don't we make
this real for everybody? - Yeah, you know, this is
what I've been waiting for. Demos are my favorite part
of coming on Mechanics. So I'll start here in my
Azure Synapse workspace. And I want to walk you through how a grocery retailer
might use new capabilities in Synapse to plan their inventory levels. As you know, beyond just
monitoring operational data and sales data, we need to take into
account external factors that may impact sales and inventory. As we've seen in 2020, it's really the COVID-19 pandemic that is changing buying behavior. So for the most accurate forecast, we need to work with our
real-time operational sales data, but at the same time, correlate that with public COVID-19 data. So let's start in data. This provides a great view of all of your connected data sources. You can see it's easy
to keep my data unified and centralized, including
data managed in the workspace and data linked from sources
that sit outside the workspace. And from home, under useful links, I can get to our knowledge center. And this is so I can explore data sets that are available to me. In my case, I'll add the
Bing COVID-19 dataset, which provides daily confirmed cases as well as related data worldwide. All I need to do is click add dataset, and you'll see this shows
up in my Linked data tab. It's integrated into my Synapse
workspace automatically. And without worrying about schema details or the format of the data, I can start to explore
the COVID-19 dataset. If I click on actions and
select a new SQL script, select top 100 rows, Synapse
will generate T-SQL commands to analyze the data. And I can start to explore
that data using Serverless SQL. And using the same process,
if I create a new Notebook to process and visualize
the data with Python, Synapse gives me a head start with pre-populated PySpark
code, all ready to execute. And you can just attach that notebook to a serverless Spark
pool, run the Notebook and start analyzing that data. And this experience is
available for data at any scale, whether it's just a few
thousand rows of data or millions of rows of data,
like we just demonstrated. - What's great is that now you don't need to figure
out how to connect to the data and you can just start
your analysis right away. - That's right. We're removing that step for
you to make things easier. So in my case, the retail store data is
managed in Cosmos DB. And I want to see the
impact of the COVID-19 cases related to my operational
retail sales data. We can easily bring in new
Cosmos DB data. To do that, when I created the Cosmos DB container, I selected the analytical store option. And I can do this without worrying about how enabling it is going to impact the performance of my operational data workload. And all that data is there
in the Synapse workspace in near real time. Now I've done this in advance and already have a Cosmos DB container with Synapse Link enabled. And as you can see here in
my Linked data with Synapse, that Cosmos DB is available to me. And now we can easily query that data between our sales system as well as the COVID-19
data that we've pulled in. And in my case, I want
to see the correlation between COVID case counts
and the sales data. To do that, I've added COVID data and filtered by Texas and California, where many of our stores are located and where we know case counts are high. And in this case, we're
specifically looking at sales of household paper
products and cleaning supplies. I've run a Serverless SQL query and I'll display the
COVID case count data. And you will see that in March, the sales and demand spiked before case counts started to accelerate. But if you look at week 30 and beyond, the COVID case count is a good
predictor of sales and demand for these products. When case counts go up, we can see higher demand quickly follows. To put this into further context, let's compare this to our
historical 2019 sales data that resides in my Azure Data Lake. I'll use the same parameters and we can see that our
run rate for these items is a lot lower. And it isn't really even in the ballpark of the actual demand that we're seeing. So this isn't really going to help us much with future predictions and forecasts. - Right, and what we just saw
was how simple and fast it was to bring in the public data, and also the operational data
that you had in Cosmos DB, and analyze it at scale against your historical data that you brought in. And because
your querying was serverless, there also wasn't any setup
or servers to manage or any configuration required. But how were you able to
bring in that historical data that we saw in the last step? - Yeah, I'm glad you asked, Jeremy. We're making things a lot easier, not just for the administrators, but also for the data engineers. That historical data actually came from a legacy on-premises data warehouse. But let me show you how easy it is to bring in data like that with Synapse. I can either use a code-free pipeline, or I can simply load the
data into a SQL pool. To make data loading easier, we've added a new
experience for bulk loading. First you select the
folder, you right-click, then new SQL script, then bulk load. From there, you can
select a storage account, which I'll do, and I'll click continue. I'll keep the auto selected
properties, hit continue again. Then I'll pick a dedicated SQL pool where I want to load the data. I can create a new target
table or use an existing one. In my case, I'll use one I
created just for the show called MechLoad. The column mappings look good. Then I'll open the script, and beyond just a simple one-time import, what's really cool is that right here I can operationalize my data pipeline. Here I have a basic stored procedure. You can see it from the
section that is commented out. So I'll uncomment that. When I do that and run it, you'll see the bulk
load procedure is added to my stored procedure folder. I'll add this to a new
pipeline and that's it. It's operationalized. - Okay, so you've automated the pipeline to bring in the data that you need, but how do you take the next step then to perform predictive
analytics for sales forecasts? - So now that I have the data flowing in, from here I can go on
to predict purchasing and stay ahead of demand. And I can use that COVID-19 case data for my predictive analysis. And we just built our
pipeline to ingest the data into a dedicated SQL pool where I can actually run
all of my predictions. We now have native integration
with Azure Machine Learning. And to save time, I've already linked
my Synapse workspace with my Azure Machine Learning service. If I jump back to data, the
action for machine learning will appear against all my SQL tables. I just need to click into a table and select Machine Learning
and enrich with existing model. I'll see the list of all my
models from the ML registry that I have linked with
Azure Machine Learning. And this is the model registry that my data science team is using to develop their predictive models. And I can just choose one of these that corresponds to the selected table. Using this model, I can enrich the table. When I click continue, it's
going to analyze the table and the model, and it will automatically
map source column names with the model inputs to make sure everything works correctly. This next step will create
a stored procedure for me so that I can continue running this model with my latest data. I just need to give it a name. I'll load this model
into an existing table. Now I'll deploy it and
that'll take just a second. And from here, we can
execute our stored procedure to enrich the data from our table and we'll use our new PREDICT function to predict our inventory forecast. I'll run it. And note, these ML predictions
are being calculated in the engine, which means all my queries
are still really fast. The ML engine is scaling with my cluster and there's no additional
cost for making API calls from outside my data warehouse environment to some separate scoring service. In just a few seconds, it's
analyzed three million records and we can see the
predicted quantities we need for inventory categories
all without moving the data. - And now it's also part
of a stored procedure so it's operationalized and it's going to stay up to
date then with its prediction. So all of what we've just shown, though, is probably part of a pre-production
or a test environment, so how do we push something
like this then into production? - So we built CI/CD
options into Azure Synapse under Manage and source control. This means that your resource definitions, linked services, connection
strings, pipelines, and code artifacts can all be
version controlled using Git. And you can deploy Synapse artifacts through your DevOps release pipeline, making it easier for you to
maintain your development and production workspaces. - Right, and as we've shown many times, it's really easy to serve up
that data to business users. For example, using Power
BI directly from Synapse. Now, everything you've shown
is out of preview today and can be used right now
for production workloads. - That's right. And the great thing about
continuous innovation in the cloud is that we've also just
released more capabilities in preview. We're making it easier for
you to transform your data at scale, code-free, with
Power Query built directly into the Azure Synapse Studio experience. You can also build Machine
Learning models, code-free, with AutoML, without ever
leaving the Synapse environment. And we're really excited
about the native integration with Azure Purview, our new service for
discovering and mapping data across your complete data estate. With this integration, all that data is available for analytics within Azure Synapse. - So really a ton of progress
in the last couple of months. Thanks, John, for joining us today. But for people who want to get started and kick the tires on this,
what do you recommend people do? - If you're already using
Azure Synapse Analytics for data warehousing, you can attach a Synapse
workspace to it today to discover all this new functionality. If not, sign up for a trial or create your first Synapse
workspace at aka.ms/GetSynapse. - Amazing stuff. And now it's generally available for all your production
workloads with even more to try that's in preview. Of course, we're going
to continue to track this on Microsoft Mechanics so be
sure to keep checking back, subscribe to our channel
if you haven't already and thanks for watching. (upbeat music)