Azure Synapse Analytics: Building end-to-end analytics solutions with a unified | OD333

Captions
[MUSIC] >> Hi, I'm Santosh Balasubramanian, and I'm here to talk to you about how you can build your end-to-end analytics solution using Azure Synapse Analytics. Today, we're going to talk about how you can use Azure Synapse Analytics to build your end-to-end AI-to-BI solutions, how you can extend your analytics solutions seamlessly over your operational data, how you can build real-time analytics solutions at cloud scale, and how you can do your data warehouse migration to Azure Synapse Analytics.

Our customers all over the world are trying to transform their business with actionable insights. Customers in healthcare, finance, and all the other domains are trying to learn more about their own customers: how they can get signals from a variety of sources, their own data and external data, to build a 360-degree view around their customers so they can engage with them more deeply. They're learning how they can optimize their supply chain and drive efficiencies into their operations. They're learning how they can reinvent their products in response to global and local changes, using signals from the usage of their products and from what their customers are saying about them. All of this is to enable their employees to make decisions through insights on data: data that they have, data that they are getting from different places, data which is the most important asset driving these insights.

One of the challenges they face while trying to get these insights is the number of stages the data has to move through. For example, I need to be able to ingest this data from a variety of different sources, I need to be able to explore this data, and I need to get everyone from my data science teams to my business analytics teams working over this data. Let me take two examples: data science and business analytics. Traditionally, data science is done by data scientists who are familiar with a certain set of tools and languages, say Python, Scala, and Spark. They're familiar with working over data in the Lake. This data is in a variety of different formats: structured, semi-structured, and unstructured data, pictures, PDF files, all the different formats that live in the Lake. These data scientists and data engineers want to be able to explore this data very easily and run experiments over it very easily. But when you want to serve this data to your business users through BI tools or applications, you need to go through a data warehousing system that provides capabilities such as dependable performance at scale, workload management, and proven security capabilities such as data masking and row-level security.

When you look at the multiple systems that are required, customers face challenges with collaboration between their data scientists and their BI developers, who are familiar with SQL. Beyond collaboration, while building end-to-end solutions you need to start thinking about the entire security story across the solution: How do I prevent data exfiltration? How do I make sure I have data encryption across the whole solution, and that all my compliance needs are met?
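The data masking and row-level security capabilities mentioned above are standard T-SQL features of the dedicated SQL pool. As a minimal sketch, assuming a hypothetical dbo.SurveyResponses table and column names that are not from the session, they look roughly like this:

    -- Dynamic data masking: hide most of a respondent's email from non-privileged users.
    ALTER TABLE dbo.SurveyResponses
    ALTER COLUMN RespondentEmail ADD MASKED WITH (FUNCTION = 'email()');
    GO

    -- Row-level security: analysts only see survey rows for their own region.
    CREATE FUNCTION dbo.fn_RegionPredicate (@Region AS nvarchar(50))
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN SELECT 1 AS fn_result WHERE @Region = USER_NAME() OR USER_NAME() = 'dbo';
    GO

    CREATE SECURITY POLICY RegionFilter
    ADD FILTER PREDICATE dbo.fn_RegionPredicate(Region) ON dbo.SurveyResponses
    WITH (STATE = ON);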
This is where we come to Azure Synapse Analytics. Azure Synapse Analytics brings the worlds of big data analytics and data warehousing together into a single service. What this enables you to do is really think about your data and your data estate, and how you need to reason over that data estate with the tools that are right for the job. Azure Synapse Analytics enables you to reason over any of your data: data in your Data Lake, data in your data warehouse, data in your operational stores such as Cosmos DB. You have the power of using SQL or Apache Spark to analyze the data, and you can do this in serverless or dedicated pool modes. Along with this, you have data integration capabilities that enable you to bring in data from 90-plus sources and orchestrate pipelines. Whether you're using the data integration capabilities, whether your data scientists are using Apache Spark, or whether your BI developers are using SQL, all of this is within the same management, monitoring, and security boundaries of the workspace.

An example of this: if you want to ensure that your entire solution, from data integration to Apache Spark to SQL, is within the same VNet boundary, all you need to do is select a couple of options, managed VNet and data exfiltration protection, and that's it. It doesn't matter whether your data scientists and data engineers are using Spark or SQL, or your BI developers are using SQL; all of it will be within that same VNet boundary. On top of this is our Synapse Studio. Synapse Studio enables all the different data developers to collaborate with each other on the artifacts that they produce, and they can also manage and monitor their entire solution from it.

Let's actually go through a demo and show how you can build your entire AI-to-BI solution with Azure Synapse Analytics. I'm going to start with an example, and this example really starts with how business questions are asked by people. Here, we have a couple of actors who are going to help solve the business questions. We have Josh. Josh is your BI developer. Josh is familiar with SQL, he knows about data warehousing, and he has been working with BI tools. And we have Nellie. Nellie is going to play the role of both your data scientist and your data engineer. Nellie is familiar with Python, Scala, and the Spark ecosystem, and she's familiar with working over data that is in the Lake.

Let's see how you can answer this question. The business user is asking, "I want to get insights over a customer survey that I sent out." The business user will ask Josh, and Josh first needs to go and find whether this survey data exists or not. He will look across the entire data estate that is there. Once he finds the data, he needs to explore it and see if the information he wants is in this data. If he's unable to find something, for example, sentiments over some of the survey comments, he needs to start working with Nellie, the data scientist and data engineer, in order to add those sentiments. Now let's see what Nellie needs to do. Nellie first has to find exactly the same file that Josh was looking at. Nellie then needs to decide whether she's going to build a model or reuse an existing one to add sentiments to these survey comments.
After she does that, and before she operationalizes it, she needs to confirm with Josh: "Is this really the right thing before I operationalize it in my end-to-end pipeline?" That is another hand-off to Josh. What Josh needs to do now is look at the analyzed data with the new insights Nellie has added and confirm it's the right thing. This is not just a one-way street. Generally, there is a feedback loop that keeps happening between your BI team, your data engineering team, and your business users, and only once they arrive at the right answers are they able to load the data into the warehouse, operationalize the pipelines, and create their deployment and CI/CD strategy. Then Josh can take this data, which is in the enterprise data warehouse, and build a BI report to give answers to his business users. This is a very complex process if Josh lives only in the world of business analytics solutions and Nellie lives only in the world of data science and Data Lake solutions, and it is further complicated by having to stitch together all the pieces that are necessary, from security to monitoring to management, because from a customer's standpoint, it's one end-to-end analytics solution.

Let me show you now in this demo how both Josh and Nellie can work in the same Azure Synapse Analytics workspace to answer this business question. The first thing I have done is create an Azure Synapse Analytics workspace. In this workspace, I have added a dedicated SQL pool, which is my data warehouse; I've called this EDW. I have my serverless SQL pools, and I also have my Apache Spark pool. Then I have added two users: as I said, Josh, my BI developer who is familiar with SQL, and Nellie, who is playing the role of my data scientist and data engineer and is familiar with Spark. Now, let's see the world through the eyes of Josh and Nellie working together to answer the business question.

I'm going to first show what Josh's experience is when he logs into the workspace. The first thing he needs to do, as we said, is find the data his business user mentioned: the customer survey that was sent out. Because of the integration of Azure Synapse Analytics with Azure Purview, Josh can start searching his entire data estate right here, from his Synapse Studio experience. Let's say Josh searches for "survey". Here, you can see that from Purview he finds a Data Lake file system called "surveyresults" and a CSV file called "feedbacksurvey". He clicks into this CSV file. Now, Josh doesn't know too much about the Lake; he doesn't know much about CSV formats, or Parquet formats, or the other formats that are in the Lake. He knows SQL. Here is what Josh can do. Josh can easily look at the information that is in the CSV file. He can see that it has votes, and names, and topics, and subjects, and comments. Looks interesting. It doesn't have sentiments, but let me look deeper. This is how easy it is for Josh to run serverless SQL queries directly over the Lake: Josh basically selects "Develop", then "Run", and that's it.
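To make that step concrete, here is a minimal sketch of the kind of ad hoc serverless SQL query Josh could run directly over the CSV file in the Lake. The storage account and exact file path are hypothetical placeholders; only the "surveyresults" file system and "feedbacksurvey" file name come from the demo.

    -- Serverless SQL pool: explore a CSV file in the Data Lake in place, no loading required.
    SELECT TOP 100 *
    FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/surveyresults/feedbacksurvey.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS survey;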
Now he is running SQL queries where he sees that there are different topics, such as shipping, and fulfillment, and praise, as well as the different subjects, the comments on top of them, and the votes that have been given by other users. But what he doesn't see here is the sentiments. Now Josh needs to start working with Nellie, our data scientist and data engineer. He tells Nellie, "Hey, this is the file that I found in the Lake."

Now, let me show you the world and the workspace through the eyes of Nellie. Nellie logs on to the same workspace. She searches Purview for "survey", and she's able to get to the exact same file that Josh got to. But Nellie is more familiar with Spark, so what she does is, in the Develop tab, open a new notebook. She can load this data into a DataFrame, or she can create a Spark table right over exactly the same file in the Lake that Josh was working with. Let's say she creates a Spark table. She's already created it, so let's go and look at this table. The table is in the default Spark database, and it is called "rawsurveytbl". Let's look at the columns of this table. It has exactly the same columns: subject, comments, votes, and so on.

Now, Nellie needs to add a sentiment analytics model in order to get sentiments from the comments field. She can do this in multiple ways: she can build a model, she can use something existing, or, because of the seamless integration of Azure Synapse Analytics with Cognitive Services, she can choose one of the two Cognitive Services models available here, anomaly detection or text analytics. She chooses "Text analytics" and goes through a very simple wizard. This low-code, no-code wizard generates a notebook, and this notebook is all she needs to run. She's already run it, so I'll walk you through it. She imports some libraries, runs sentiment analysis over the text comments, and then displays the results. Let's look at the results she sees. She's able to see the comments and their sentiments. You can see some of these comments have mixed sentiments. The reason they are mixed is that the Cognitive Services sentiment analytics model breaks each comment into sentences and determines which sentences are positive, negative, or neutral. Now she's able to say, "Wow, this does make sense. This is exactly what I need. I need to collaborate back with Josh, so I'm just going to write these results back into the Lake, and I'm going to write them as a Spark table." That's all Nellie does, a few lines of code. Once she does this, she has created a new Spark table with data in the Lake. This table has a few additional columns, and one of them is "Sentiments", which is what Josh was looking for. She tells Josh that there is a Spark table that she has created.

Now Josh, our BI developer familiar with SQL, logs into his workspace. In his workspace, he sees the Spark tables. He's able to open the Spark database, look at all the Spark tables, and he finds the sentiments table. Josh doesn't know Python, Josh doesn't know Scala, he only knows SQL, but he's able to right-click this table and query this Spark table with SQL.
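Here is a minimal sketch of what that cross-engine query could look like from Josh's side. Spark databases and tables are shared metadata that serverless SQL can read with plain T-SQL; the table name sentimenttbl is a hypothetical stand-in for the table Nellie created, while the "default" database and the comments and Sentiments columns come from the demo.

    -- Serverless SQL pool: query a table that was created from Spark, using plain T-SQL.
    SELECT TOP 10
        comments,
        Sentiments
    FROM [default].dbo.sentimenttbl;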
What Josh has just done is work with the same data and the same data definitions his data scientists are familiar with, and get the results. Here, as you can see, Josh is able to say, "Wow, I've got my comments and my sentiments, which is exactly what I need. I just need to operationalize this so I can create my BI report." That is the next hand-off to Nellie.

Nellie, now that she knows Josh has everything he needs in the Lake, has to load the data to the warehouse. For doing this, all Nellie has to do is go back to the same notebook and, using the built-in Spark to SQL connector, write a few lines of code to read data from the Spark table, write it to Synapse SQL, the dedicated SQL pool, which is the EDW, and create a table in there. Then, to operationalize it, it is as easy for her as clicking a button and saying, "I want to operationalize this in an existing pipeline." Once she has added it to an existing pipeline, she just has to say when her notebook should run in the pipeline. Then she can add triggers, either event-driven triggers or time-driven triggers. All of this is also backed by source control, using GitHub in this case, so she can set up her entire CI/CD pipeline, and that's it. That's how easy it is for Nellie to operationalize the data she found in the Lake, write it to the data warehouse, and create pipelines and CI/CD pipelines afterwards.

Now Josh is able to get this exact same data in his enterprise data warehouse. He's able to go and see what new tables Nellie added; in this case, it is the sentiments table. He can query this table, but what he really wants to do next is create a BI report. Because of the built-in integration of Synapse and Power BI, he can easily connect to the Power BI workspace, link it to the Synapse workspace, see all of the Power BI datasets, and start creating a BI report right here in the Synapse experience. For example, here what he's going to do is add his sentiment information to an existing report. Once he adds all of the sentiment information, all he needs to do is click "File Save", and it's back to the business user.

The business user can go to Power BI and see the end result of this end-to-end analytics that Josh and Nellie worked together to provide. Here's an example. The business user is able to see all the topics and the sentiments across the topics. Let's take a topic like shipping. He clicks on "Shipping" because he sees there are mixed reviews, and he is able to see why. You have a customer who's very disappointed in the way things were packed; this customer has always been satisfied with the service and would trust that you would make this right. Just by being able to learn about this so easily, your business user now knows what changes need to be made to the shipping system in order to retain the customer and make them happy. As you saw, what Azure Synapse Analytics enabled Josh and Nellie to do was make analytics a collaborative team sport. It enabled your BI developers, your data scientists, and your data engineers to work together. Not as an "or", but an "and": data science and business analytics working together, with the tools they are used to, over data in the Lake or data in the warehouse.
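As a sketch of the kind of query Josh, or the Power BI report behind the dashboard, could now run against the dedicated SQL pool, assuming a hypothetical dbo.SurveySentiments table with topic and sentiment columns:

    -- Dedicated SQL pool (EDW): summarize survey sentiment by topic for the BI report.
    SELECT
        topic,
        sentiment,
        COUNT(*) AS comment_count
    FROM dbo.SurveySentiments
    GROUP BY topic, sentiment
    ORDER BY topic, comment_count DESC;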
All of this is surrounded by the ability to be enterprise-ready across the entire solution, whether that is managed VNet capabilities, data encryption with customer-managed keys, or unified monitoring, management, deployment, and CI/CD across the entire solution. Now that I have walked you through how your data scientists and data engineers can use Azure Synapse Analytics, let me go on to the next topic: how to extend analytics over the Lake and the data warehouse to seamless analytics over your operational data. This is enabled through Azure Synapse Link for Cosmos DB.

Let me start with the challenge we have today. Today, tens of thousands of our customers use Azure Cosmos DB to run their applications. But they're starting to ask questions of the data that is coming from these applications: questions such as, I want to do some BI dashboarding; I want to do some predictive analytics on this data, whether it's retail recommendations, predictions of device failure, or alerting for fraud detection. To do this, they would need to run analytics over the data in the transactional store, which is Azure Cosmos DB.

Azure Synapse Link for Cosmos DB breaks down the barriers between your transactional processing and your analytical processing. With a few clicks, you will be able to run your data science or BI workloads using Apache Spark or SQL in Azure Synapse Analytics over data in Cosmos DB. While you're running these analytical workloads, there is no impact on your transactional workloads. Not only that, you also don't have to manage any pipelines, and you don't have to think about the latency of the data, because you can run these analytics over near real-time data. All of this is fully managed by Azure Synapse Link. Let's see how this works. Say you have a transactional store, and your operational data is being written to it. You enable Azure Synapse Link for Cosmos DB, and you specify the container for which you want an analytical store. We automatically sync and pull the near real-time data from your transactional store and write it to an analytical store in the right columnar format, which is what you need to run your analytical queries. You can then easily use Apache Spark or SQL in Synapse to query the data in your analytical store.

Let me show you through a demo how this works. Now I'm going to show you how you can set up Synapse Link for Cosmos DB and analyze the data in your analytical store easily with Azure Synapse Analytics. The first thing you will do is go to your Azure Cosmos DB account and turn on Azure Synapse Link. Once you do that, you will go to the new container that you are creating and enable the analytical store. After doing this, you don't have to do anything else; you will automatically be able to run Spark or SQL over it using Azure Synapse Analytics. For this demo, I have already set up the SynapseLinkIoTDemoDB database, which has a few collections writing to my analytical store. Now let me jump to Azure Synapse Analytics. In here, the first thing I will do is go to my Data tab, click the "Plus" button, and say I want to connect to external data.
In this external data, I want to connect to my Azure Cosmos DB SQL API account. Once I do that, I click "Continue", fill in some information, and what you see in my linked data is the exact same collection associated with the IoT Cosmos DB account that I had shown. Now I can select this and start running my SQL and Spark queries right over this collection. Here, I can select it and say I want to query it with serverless SQL. And it's not just that; I can do a lot more from here. I can set up my serverless SQL database. I can set up views over the data, in place, in my Cosmos DB analytical store. I can set up external tables over data that is in my Lake. Once I do this, I can use the BI tools I'm already using to query this data directly. Here are the external tables that I have created, and here are the views that I have created. Now let me show you some queries. I can run a variety of different queries, from schema inference, flattening complex structures, and aggregations, to even joining data in Cosmos DB with data in the Lake. This is the power of being able to run analytical queries over the data in your transactional stores without impacting your transactional applications or workloads. I could have also chosen to go to the same container and create a notebook. Here is a notebook that I've already created, just to walk you through it: if I want to do anomaly detection over the data coming from my IoT sources into Cosmos DB, I can do that right here in my Azure Synapse experience.

As you saw in the demo, it is now very easy to extend reasoning over data in the Lake and data in your warehouse to data in your operational stores. Today, we are proud to announce that Azure Synapse Link with serverless SQL pool is generally available. We have two to three times faster query execution than what you saw in preview. We have also extended the network isolation capabilities from the managed VNet of Azure Synapse to your analytical store using private endpoints in Cosmos DB, and the data encryption capabilities using your customer-managed keys are extended across your Cosmos DB transactional store and analytical store. As you can see, we have not only made it easy for you to do your data science and BI workloads over data in your Cosmos DB account, but we have also extended the enterprise promises of security across Azure Synapse and Cosmos DB.

Now that I have shown you this, I want to move to another section, which is really about scale: being able to do real-time analytics at cloud scale for your T-SQL developers. One of the things that we will soon be announcing in gated preview is T-SQL streaming in Azure Synapse. What this enables you to do is real-time analytics using the windowing functions and other complex event processing functions you may already be used to, over your incoming streams of data. You get DDL support for new streaming objects, and you can do in-memory processing over the streaming data with high throughput and low latency. I'm going to walk you through a demo to show this to you. I am getting data coming from a connected factory. This data comes in through an IoT Hub; it could also have been an Event Hub. The data arrives as an external stream, which is a new concept that I'm going to show you in the demo.
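Here is a minimal sketch of the kind of serverless SQL query you can run in place over the Cosmos DB analytical store. The account name, key, container name, and column schema are hypothetical placeholders; only the SynapseLinkIoTDemoDB database name comes from the demo.

    -- Serverless SQL pool: query the Cosmos DB analytical store in place, with no pipelines or ETL.
    SELECT TOP 100
        sensorId,
        temperature,
        humidity
    FROM OPENROWSET(
        'CosmosDB',
        'Account=<cosmos-account>;Database=SynapseLinkIoTDemoDB;Key=<account-key>',
        IoTSignals                        -- hypothetical container name
    )
    WITH (
        sensorId    varchar(50),
        temperature float,
        humidity    float
    ) AS signals;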
I'm going to run my T-SQL streaming query on top of this and write the data to an output stream, which is a Synapse SQL pool table, and then have my BI report run off that Synapse SQL pool table, which you will see in a BI dashboard. Let me jump into the demo. Now I'm going to show you how you create your external streams. What you see here is the BI dashboard I mentioned, which runs off the Synapse SQL pool table. To build this BI dashboard, I'm getting data from hundreds of thousands of sensors, running T-SQL streaming queries on top of it, and writing the results to the SQL table. You go and select New SQL Script, then new external streams, and here is where you can create your input or output external streams. Let me show you where you can read streaming data from: you can read streaming data from IoT Hub, from Event Hub, or from data that is in Blob Storage or ADLS Gen2. You can output the data to a variety of different destinations, for example, Event Hub, Blob Storage, ADLS Gen2, SQL Database, or, within the Synapse SQL database, to a Synapse SQL table, which is what I show in this demo. Let me select that. I write a name for the stream, click "Continue", select the dedicated Synapse SQL pool, write a name for the table, and continue. That's it; that's all I needed to do to create an external stream. For the purposes of this demo, I've already created three external streams: an input stream, a test input stream, and an output stream. The input streams are reading from IoT Hub and the output stream is writing to the Synapse SQL table.

Now let me show you how I create a streaming job. I select the same menu and click "New streaming job". Here, I have to select the existing streams that I want to use in this job, so I select all three streams and click "Continue". Now I need to give the streaming job a name; let me say, streamingJob. I specify how much compute I want to give it and click "Okay". With a very simple low-code, no-code experience, I get the SQL script that generates the streaming job. Because this is in test mode, I can test it against my test input streams and execute it. For example, here I'm running this particular streaming job in test mode.

I can also write more complex streaming logic to create my job. Here I have a query script I created where I am inserting into the output stream information from thousands of my sensors: the average temperature across thousands of sensors and the maximum humidity across all the sensors, grouped in a tumbling window of one second. With these queries, the job continuously runs over all the data coming from my IoT Hub or Event Hub and writes it to the SQL pool. Now let me monitor my streaming job. I click on the streaming job, "StreamingSynapse32". What you see here are the metrics across the number of events coming through the system and the job graph of what exactly your streaming job is doing, and I can select and add more metrics, for example, input event bytes, input sources received, and other things.
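T-SQL streaming was still in gated preview at the time of this session, so the exact DDL may differ, but based on the query described above, the core of such a streaming job could look roughly like the sketch below. The stream and column names (factoryInputStream, sqlOutputStream, temperature, humidity, eventTime) are hypothetical.

    -- Illustrative sketch only: aggregate sensor readings into one-second tumbling windows
    -- and insert the results into the output stream backed by a Synapse SQL pool table.
    INSERT INTO sqlOutputStream
    SELECT
        AVG(temperature)   AS avgTemperature,
        MAX(humidity)      AS maxHumidity,
        System.Timestamp() AS windowEnd
    FROM factoryInputStream TIMESTAMP BY eventTime
    GROUP BY TUMBLINGWINDOW(second, 1);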
What you are seeing is an entire solution: I can define and create jobs at scale that analyze data across millions of sensors, write the results at scale to my SQL pool table, and read them in my BI report. Now that you have seen how your data scientists and BI developers can collaborate on AI-to-BI scenarios using Azure Synapse Analytics, how you can extend reasoning and analytics over data that is in your Cosmos DB account, and how you can do real-time stream processing at scale with Azure Synapse Analytics, you might wonder, "How can I start using it faster?"

One of the things that we have been working on is enabling you to do your data warehouse migration to Azure Synapse Analytics. For those of you who are using either on-premises analytics systems or other cloud data warehouses, we understand how significant migrating to a new analytics platform is. A data warehouse migration can quickly become expensive and lengthy. For those of you considering migration, one of the critical blockers is translating SQL code written and optimized for your current system so that it is written and optimized for the new system. Organizations worldwide want to modernize their analytics platform. They want to enjoy both the total cost of ownership and the innovation benefits of modern analytics platforms. However, customers have invested thousands of working hours and millions of dollars, and have written hundreds of thousands of lines of code, for their existing data warehouse. To translate this critical SQL code, customers have had to either manually rewrite their existing SQL code or invest enormous amounts of their budget in outside practices to rewrite and convert their code.

That is why we have created and recently announced a new migration tool. With a point-and-click tool, the mandatory and critical migration processes are completed automatically in minutes. For example, scanning your source system, producing an inventory report that maps your database footprint across your organization, and translating existing code for your target system are accomplished in minutes, not weeks, not months. Hundreds of thousands of lines of SQL code translated in less than one hour, with no manual rewriting of code and no chance of human error. You can now de-risk your entire migration project and save on the massive cost of rewriting years of code. You can get started with this today by going to aka.ms/synapse migration.

In closing, what you saw today in our session is how Azure Synapse Analytics is your complete end-to-end cloud analytics solution. It enables your data science teams, your business analytics teams, and your data integration teams to work together. It enables you to build your entire solution using Synapse Studio and do seamless analytics over your operational data with Azure Synapse Link. It enables you to run complex event processing and stream processing with T-SQL streaming, and, to help you get started, we are announcing the data warehouse migration tool to bring your existing data warehouse into Azure Synapse Analytics. I hope you can reap the benefits of Azure Synapse Analytics and get insights for your organization with agility and better collaboration across your data development teams. I hope you enjoyed this session as much as I did. Thanks for all your time. See you soon. [MUSIC]
Info
Channel: Microsoft Ignite
Views: 5,171
Rating: 4.9069767 out of 5
Keywords: igfy21q3, ignite, ignite 2021, microsoft ignite 2021, microsoft ignite, microsoft, msft ignite 2021, msft ignite, ms ignite 2021, ms ignite, OD333, Azure Synapse Analytics: Building end-to-end analytics solutions with a unified | OD333, Azure, Session, Santosh balasubramanian, Maxine Coo, Marko Hotti
Id: sDOUu_LlliU
Length: 37min 55sec (2275 seconds)
Published: Thu Mar 04 2021