What is Azure Synapse Analytics? Generally Available Today.

Captions

(upbeat sounds)

- Coming up, we're joined by John Macintyre, Engineering Lead for Azure Synapse, Microsoft's limitless analytics platform, for a tour of the latest updates now generally available and a first look at additional capabilities coming soon. So John, welcome to the show.

- Thanks Jeremy, it's great to be back.

- So we've been following the momentum of Azure Synapse here in Microsoft Mechanics closely over the past year. In fact, we recently chronicled several early adopter customers, and if you're new to Azure Synapse, it's Microsoft's limitless analytics platform that really brings together enterprise data warehousing and big data processing into a single managed environment with no systems integration required. So, John, I know you and the team have been hard at work, but what's new in the service?

- So as you know Jeremy, Azure Synapse Analytics has been available for customers for the past year. And we've had some really great preview capabilities in the Synapse workspace that are now ready for production workloads. Like Azure Synapse Link, which is the first cloud-native HTAP solution. And that enables continuous analytics over operational data in Cosmos DB. That's done without interfering with your operational or application workloads. Next, SQL Serverless is also now generally available. And that gives you the horsepower you need at the exact moment you run a query. It runs completely serverless so you only pay for each query and the data you process. And beyond that, for analytics with Spark, we've also built performance optimizations for our implementation of Apache Spark, including enhanced shuffle, which aligns data to improve query performance. We've also implemented dynamic partition pruning to eliminate unnecessary data during job execution. All of these things are working together to really speed up performance. And that Spark environment that Synapse offers is fully managed.
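As a rough illustration of the dynamic partition pruning mentioned here, the following Python sketch shows the general idea only. It is not how Synapse or Apache Spark implements it; the table layout, the `store_id` partition key, and the function name `prune_and_scan` are all assumptions made for this example. The filter is evaluated on a small dimension table first, and any fact-table partition whose key cannot match is skipped instead of being read.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: a large fact table split into partitions by store_id,
# plus a small dimension table mapping store_id to state.
FactPartitions = Dict[int, List[Tuple[str, int]]]  # store_id -> [(product, qty)]


def prune_and_scan(
    fact_partitions: FactPartitions,
    stores: Dict[int, str],
    wanted_states: Set[str],
) -> Tuple[List[Tuple[int, str, int]], int]:
    """Return joined rows and the number of fact partitions actually read."""
    # Step 1: evaluate the filter on the small dimension table at run time.
    matching_ids = {sid for sid, state in stores.items() if state in wanted_states}

    rows: List[Tuple[int, str, int]] = []
    partitions_read = 0
    for store_id, partition in fact_partitions.items():
        # Step 2: skip any partition whose key cannot satisfy the join.
        if store_id not in matching_ids:
            continue
        partitions_read += 1
        rows.extend((store_id, product, qty) for product, qty in partition)
    return rows, partitions_read
```

The second return value makes the saving visible: with a filter that matches two of three stores, only two partitions are read.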
So when a job comes in, the service will provision resources, scale resources, and manage those resources as you need them.

- Right, and this is really great news I think for Synapse users and really highly anticipated capabilities. But you've also added a host of new capabilities that were recently in preview that are now also available and fully supported as of today.

- We have, and you know, Jeremy, these have been focused in a number of areas. So first, to help you really easily get started: the new knowledge center gives you Pipeline templates to bring data in, sample scripts for analytics, automation, and Notebooks to start to analyze your data as well as access to data within the Azure open datasets. Second, we're making it even easier to bring data into Synapse for advanced analytics, and to enrich that data in code-free ways and apply your Azure Machine Learning models.

- Right, and those capabilities will make data analysts and also data scientists really happy. But, what are some of the things that we've added for our data admins?

- One of the biggest things we've done is to make it easier for you to connect to your data and storage securely through managed private endpoints. As you provision your Azure Synapse workspace, you can simply enable the managed virtual network option. And with that, we automatically handle all that configuration of virtual network and private endpoints so that you can immediately start running SQL scripts or use Notebooks to analyze your data. You can also enable the exfiltration protection for your workspace. And what this does is it ensures that all that outbound traffic goes through private endpoints and only to selected resources that are approved in your Azure AD tenants.

- All right, and this is nice because you no longer have to manage subnets, worry about IP ranges, configure private endpoints like you said. You don't need deep networking knowledge or, you know, knowledge about data movement or orchestration.
Also the performance and resiliency is managed then by Microsoft.

- That's right. We're removing that complexity for you. And also as part of our comprehensive approach to data protection, we've added new role types to Synapse for role-based access controls. They really give you more granular control over both your resources and your data.

- Right, and this is a lot of popular updates I think a lot of people have been waiting for. But this is Mechanics, so why don't we make this real for everybody?

- Yeah, you know, this is what I've been waiting for. Demos are my favorite part of coming on Mechanics. So I'll start here in my Azure Synapse workspace. And I want to walk you through how a grocery retailer might use new capabilities in Synapse to plan their inventory levels. As you know, beyond just monitoring operational data and sales data, we need to take into account external factors that may impact sales and inventory. As we've seen in 2020, it's really the COVID-19 pandemic that is changing buying behavior. So for the most accurate forecast, we need to work with our real-time operational sales data, but at the same time, correlate that with public COVID-19 data. So let's start in data. This provides a great view of all of your connected data sources. You can see it's easy to keep my data unified and centralized, including data managed in the workspace and data linked from sources that sit outside the workspace. And from home, under useful links, I can get to our knowledge center. And this is so I can explore data sets that are available to me. In my case, I'll add the Bing COVID-19 dataset, which provides daily confirmed cases as well as related data worldwide. All I need to do is click add data set, and you'll see this shows up in my Linked data tab. It's integrated into my Synapse workspace automatically. And without worrying about schema details or the format of the data, I can start to explore the COVID-19 dataset.
If I click on actions and select a new SQL script, select top 100 rows, Synapse will generate T-SQL commands to analyze the data. And I can start to explore that data using Serverless SQL. And using the same process, if I create a new Notebook to process and visualize the data with Python, Synapse gives me a head start with pre-populated PySpark code, all ready to execute. And you can just attach that notebook to a serverless Spark pool, run the Notebook and start analyzing that data. And this experience is available for data at any scale, whether it's just a few thousand rows of data or millions of rows of data, like we just demonstrated.

- Now what's great is now you don't need to figure out how to connect to the data and you can just start your analysis right away.

- That's right. We're removing that step for you to make things easier. So in my case, the retail store data is managed in Cosmos DB. And I want to see the impact of the COVID-19 cases related to my operational retail sales data. We can easily bring in new Cosmos DB data and to do that, when I created the Cosmos DB container, I selected the analytical store option. And I can do this without worrying about how, when I enable this, it's going to impact the performance of my operational data workload. And all that data is there in the Synapse workspace in near real time. Now I've done this in advance and already have a Cosmos DB container with Synapse Link enabled. And as you can see here in my Linked data with Synapse, that Cosmos DB is available to me. And now we can easily query that data between our sales system as well as the COVID-19 data that we've pulled in. And in my case, I want to see the correlation between COVID case counts and the sales data. To do that, I've added COVID data and filtered by Texas and California, where many of our stores are located and where we know case counts are high.
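The comparison being set up here (filter both sources to a few states, line them up by week, then measure how closely they move together) can be sketched in plain Python. This is a minimal sketch, not the query shown in the demo, which the captions do not include. The record shape `(state, week, value)` and the function names are assumptions, and any values passed in would be the reader's own data.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, int, float]  # hypothetical shape: (state, week number, value)


def weekly_totals(records: Iterable[Record], states: Set[str]) -> Dict[int, float]:
    """Sum values per week, keeping only the selected states."""
    totals: Dict[int, float] = defaultdict(float)
    for state, week, value in records:
        if state in states:
            totals[week] += value
    return dict(totals)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def case_sales_correlation(
    cases: Iterable[Record], sales: Iterable[Record], states: Set[str]
) -> float:
    """Correlate weekly case counts with weekly sales for the given states."""
    weekly_cases = weekly_totals(cases, states)
    weekly_sales = weekly_totals(sales, states)
    # Only weeks present in both sources can be compared.
    weeks = sorted(weekly_cases.keys() & weekly_sales.keys())
    return pearson([weekly_cases[w] for w in weeks], [weekly_sales[w] for w in weeks])
```

A value near 1 would mean weekly sales rise and fall with case counts. It says nothing about which one leads, so a lagged comparison would be a separate step.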
And in this case, we're specifically looking at sales of household paper products and cleaning supplies. I've run a Serverless SQL query and I'll display the COVID case count data. And you will see that in March, the sales and demand spiked before case counts started to accelerate. But if you look at week 30 and beyond, the COVID case count is a good predictor of sales and demand for these products. When case counts go up, we can see higher demand quickly follows. To put this into further context, let's compare this to our historical 2019 sales data that resides in my Azure Data Lake. I'll use the same parameters and we can see that our run rate for these items is a lot lower. And it isn't really even in the ballpark of the actual demand that we're seeing. So this isn't really going to help us much with future predictions and forecasts.

- Right, and what we just saw was how simple and fast it was to bring in the public data, and also the operational data that you had in Cosmos DB, and analyze it at scale against your historical data that you brought in. Because your querying was serverless, there also wasn't a setup or servers to manage or any configuration required. But how were you able to bring in that historical data that we saw in the last step?

- Yeah, I'm glad you asked Jeremy. We're making things a lot easier, not just for the administrators, but also for the data engineers. That historical data actually came from a legacy on-premises data warehouse. But let me show you how easy it is to bring in data like that with Synapse. I can either use a code-free pipeline, or I can simply load the data into a SQL pool. To make data loading easier, we've added a new experience for bulk loading. First you select the folder, you right-click, then new SQL script, then bulk load. From there, you can select a storage account, which I'll do, and I'll click continue. I'll keep the auto selected properties, hit continue again.
Then I'll pick a dedicated SQL pool where I want to load the data. I can create a new target table or use an existing one. In my case, I'll use one I created just for the show called MechLoad. The column mappings look good. Then I'll open the script, and beyond just a simple one-time import, what's really cool is that right here I can operationalize my data pipeline. Here I have a basic stored procedure. You can see it from the section that is commented out. So I'll uncomment that. When I do that and run it, you'll see the bulk load procedure is added to my stored procedure folder. I'll add this to a new pipeline and that's it. It's operationalized.

- Okay, so you've automated the pipeline to bring in the data that you need, but how do you take the next step then to perform predictive analytics for sales forecast?

- So now that I have the data flowing in, from here I can go on to predict purchasing and stay ahead of demand. And I can use that COVID-19 case data for my predictive analysis. And we just built our pipeline to ingest the data into a dedicated SQL pool where I can actually run all of my predictions. We now have native integration with Azure Machine Learning. And to save time, I've already linked my Synapse workspace with my Azure Machine Learning service. If I jump back to data, the action for machine learning will appear against all my SQL tables. I just need to click into a table and select Machine Learning and enrich with existing model. I'll see the list of all my models from the ML registry that I have linked with Azure Machine Learning. And this is the model registry that my data science team is using to develop their predictive models. And I can just choose one of these that corresponds to the selected table. Using this model, I can enrich the table. When I click continue, it's going to analyze the table and the model, and it will automatically map source column names with the model inputs to make sure everything works correctly.
This next step will create a stored procedure for me so that I can continue running this model with my latest data. I just need to give it a name. I'll load this model into an existing table. Now I'll deploy it and that'll take just a second. And from here, we can execute our stored procedure to enrich the data from our table and we'll use our new predict function to predict our inventory forecast. I'll run it. And note, these ML predictions are being calculated in the engine, which means all my queries are still really fast. The ML engine is scaling with my cluster and there's no additional cost for making API calls from outside my data warehouse environment to some separate scoring service. In just a few seconds, it's analyzed three million records and we can see the predicted quantities we need for inventory categories all without moving the data.

- And now it's also part of a stored procedure so it's operationalized and it's going to stay up to date then with its prediction. So all of what we've just shown though, is probably part of a pre-production or a test environment, so how do we push something like this then into production?

- So we built CI/CD options into Azure Synapse under Manage and Source control. This means that your resource definitions, linked services, connection strings, pipelines, and code artifacts can all be version controlled using Git. And you can deploy Synapse artifacts through your DevOps release pipeline, making it easier for you to maintain your development and production workspaces.

- Right, and as we've shown many times, it's really easy to serve up that data to business users. For example, using Power BI directly from Synapse. Now, everything you've shown is out of preview today and can be used right now for production workloads.

- That's right. And the great thing about continuous innovation in the cloud is that we've also just released more capabilities in preview.
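For readers curious what in-engine scoring with the predict function might look like, here is a minimal Python sketch that assembles a T-SQL query using the `PREDICT` table function available in dedicated SQL pools. The captions do not show the generated stored procedure, so this is an assumption about its general shape, not a copy of it. The table names, the `model` and `id` columns, and the output column used in the usage note are hypothetical.

```python
import re

# Accept only plain or schema-qualified identifiers so that names passed in
# cannot inject arbitrary SQL into the generated statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_ALLOWED_TYPES = {"float", "real", "int", "bigint"}


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_predict_query(
    model_table: str,
    model_id: int,
    data_table: str,
    output_column: str,
    output_type: str = "float",
) -> str:
    """Build a T-SQL query that scores rows of data_table inside the engine."""
    model_table = _check_identifier(model_table)
    data_table = _check_identifier(data_table)
    output_column = _check_identifier(output_column)
    if output_type not in _ALLOWED_TYPES:
        raise ValueError(f"unsupported output type: {output_type!r}")
    # Assumes the model is stored as an ONNX binary in a column named "model",
    # selected by a numeric "id" column (both names are hypothetical).
    return (
        f"SELECT d.*, p.{output_column}\n"
        f"FROM PREDICT(MODEL = (SELECT model FROM {model_table} WHERE id = {int(model_id)}),\n"
        f"             DATA = {data_table} AS d, RUNTIME = ONNX)\n"
        f"WITH ({output_column} {output_type}) AS p;"
    )
```

Calling `build_predict_query("dbo.Models", 1, "dbo.SalesFeatures", "predicted_quantity")` returns a single statement that reads the model and the data from tables in the same pool. That matches the point made in this turn about scoring without moving the data or calling an external service.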
We're making it easier for you to transform your data at scale, code-free, with Power Query built directly into the Azure Synapse Studio experience. You can also build Machine Learning models, code-free, with AutoML, without ever leaving the Synapse environment. And we're really excited about the native integration with Azure Purview, our new service for discovering and mapping data across your complete data estate. With this integration, all that data is available for analytics within Azure Synapse.

- So really a ton of progress in the last couple of months. Thanks John for joining us today, but for people that want to get started and kick the tires of this, what do you recommend people do?

- If you're already using Azure Synapse Analytics for data warehousing, you can attach a Synapse workspace to it today to discover all this new functionality. If not, sign up for a trial or create your first Synapse workspace at aka.ms/GetSynapse.

- Amazing stuff. And now it's generally available for all your production workloads with even more to try that's in preview. Of course, we're going to continue to track this on Microsoft Mechanics so be sure to keep checking back, subscribe to our channel if you haven't already and thanks for watching.

(upbeat music)

Info

Channel: Microsoft Mechanics

Views: 34,939

Rating: 4.8984127 out of 5

Keywords: azure synapse, azure synapse vs snowflake, synapse, data warehouse, synapse download, azure data warehouse, azure sql data warehouse, sql data warehouse, synapse software, azure analytics, data warehouse software, data ware house, analytics db, Power BI, Azure Data Factory, Azure Cosmos DB, Azure Stream Analytics, Azure Analysis Services, Microsoft Power BI, SQL DW, enterprise data warehousing, Big Data analytics, query data, machine learning, azure, big data, microsoft

Id: dvP0JwchjfI

Length: 14min 57sec (897 seconds)

Published: Thu Dec 03 2020