[MUSIC] >> Hi, I'm Santosh Balasubramanian. I'm here to talk to you
about how you can build your end-to-end analytic solution
using Azure Synapse Analytics. Today, we're going to talk
about how you can use Azure Synapse Analytics
in order to build your end-to-end AI to BI solutions. How you can extend your analytic solutions seamlessly
over your operational data. How you can build real-time
analytics solution with Cloud scale, and how you can do your data warehouse migration
to Azure Synapse Analytics. Our customers, all over
the world are trying to transform their business
with actionable insights. Customers in various domains, healthcare, finance, and many others, are trying to learn more about their own customers: how they can get signals from a variety of sources, their own data and external data, to build a 360-degree view of their customers so they can engage with them more deeply. They're learning how they can optimize their supply chain and drive efficiencies into their operations. They are learning how
they can go and reinvent their products due to
global and local changes, and how they can get signals
from usage of their products. The signals from what their
customers are speaking about their products to
go and reinvent them. All of this is done to enable their employees. They want to enable their employees to make decisions through insights on data: the data they have, the data they are getting from different places, the data that is the most important asset driving these insights. One of the challenges they face while trying to get these insights is the number of stages the data has to move through. For example, I need to be able to ingest this data from a
variety of different sources. I need to be able to
explore this data. I need to be able to get my data science teams
to work over this data all the way to my business analytics
teams to work over this data. Let me just take two examples. Let me take the example of data
science and business analytics. Traditionally, data science is done by data scientists who are familiar with a set of tools and languages, let's say Python, Scala, and Spark. They're familiar with working over data in the Lake. This data comes in a variety of different formats: structured, semi-structured, and unstructured data, pictures, PDF files. We hear about all the different formats which
is there in the Lake. These data scientists,
these data engineers want to be able to explore
this data very easily. They want to be able to run experiments over this
data very easily. But then when you want
to serve this data to your business users through
BI tools or applications, you will need to go through a
data warehousing system which provides you certain
capabilities such as dependable performance at scale, workload management, and proven security capabilities such as data masking and Row-Level Security.
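To make those protections concrete, here is a minimal T-SQL sketch of dynamic data masking and Row-Level Security; the table, column, and function names are hypothetical and only illustrate the pattern.

-- Dynamic data masking on a sensitive column (illustrative names):
ALTER TABLE dbo.CustomerSurvey
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Row-Level Security: a filter predicate plus a security policy (illustrative names):
CREATE FUNCTION dbo.fn_SecurityPredicate(@AnalystName AS nvarchar(128))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS allowed WHERE @AnalystName = USER_NAME();
GO

CREATE SECURITY POLICY dbo.SurveyFilter
    ADD FILTER PREDICATE dbo.fn_SecurityPredicate(AnalystName) ON dbo.CustomerSurvey
    WITH (STATE = ON);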
Now, when you look at these multiple systems that are required, there are challenges that customers face with collaboration, from your data scientists to your BI developers who are familiar with SQL. Apart from this collaboration, while building end-to-end solutions, you need to start
thinking about what is my entire security story
across the solution. How do I prevent data exfiltration? How do I make sure that I have data encryption across
this whole solution, as well as all my
compliance needs are met. This is where we come to
Azure Synapse Analytics. Azure Synapse Analytics
brings the world of big data analytics and data
warehousing all into a single service. What this enables you to do is really think about your
data and your data estate, and how you need to reason over this data estate with the tools
that are right for the job. Azure Synapse Analytics enables you to reason over any of your data. Data in your Data Lake, data in your Data Warehouse, data in your operational
stores such as Cosmos DB, you have the power of using SQL or Apache Spark to be able
to analyze the data. You can do this in serverless
or dedicated pool modes. Along with this, you have data
integration capabilities which enables you to bring data
from 90 plus sources, be able to orchestrate pipelines and whether you're using your
data integration capabilities, whether your data
scientists are using Apache Spark or your BI
developers are using SQL, all of this is within
the same management, monitoring and security
boundaries of the workspace. An example of this is if you want to ensure that your entire solution, data integration to
using Apache Spark to SQL is within the same VNet boundary. All you need to do is
select a couple of options, Managed VNet and data exfiltration
protection, and that's it. It doesn't matter whether your
data scientists are using Spark, your data engineers are
using Spark or SQL, your BI developers are using SQL. All of this will be within
that same VNet boundary. On top of this is our Synapse Studio. What the Synapse Studio
enables you to do is let all the different data developers collaborate with each other through
the artifacts that they produce. Apart from this, they can also manage and monitor their entire solution. Let's actually go through a
demo and show how you can build your entire AI to BI solution
with Azure Synapse Analytics. I'm going to start with an example. This example really starts with how business questions
are asked by people. Here, we have a couple
of actors who are going to help solve the business questions. We have Josh. Josh is
your BI developer. Josh is familiar with SQL. He knows about data warehousing. He has been working with BI
tools and you have Nellie. Nellie is going to play
the role of both your Data Scientist and your Data Engineer. Nellie is familiar with Python
and Scala and Spark ecosystem, she's familiar with working over
data which is there in the Lake. Let's just try to see how you
can answer this question. The business user asks, "I want to be able to get insights over a customer survey
that I sent out." The business user will ask Josh, and Josh first needs to go and find whether this data on
the survey exists or not. He will look across the entire
data estate that is there. Once he finds the data, he needs to explore the data, he needs to examine it and see if the information that he
wants is there in this data. If he's unable to find something, for example, sentiments over some of the survey responses, he needs to start working with Nellie, the data scientist and data engineer, in order to add these sentiments. Now let's go over and see what Nellie needs to do. Nellie will have to find exactly the same file
that Josh was looking at. Nellie then needs to
decide whether she's going to build or reuse some of the models that will
add sentiments to the comments in this particular survey. After she does that, before she operationalizes it, what she needs to do
is confirm with Josh, "Is this really the
right thing before I operationalize it in my
end-to-end pipeline?" Let's think about another
hand-off to Josh. What Josh needs to do now is look at the analyzed data with the new insights that have been added by Nellie, and confirm it's the right thing. This is not just a one-way street. Generally, there is a feedback loop between your BI team, your data engineering team, and your business users, and only once they arrive at the right answers are they able to load the data into the warehouse, operationalize the pipelines, and create a deployment
strategy and CI/CD strategy. Then Josh can take this data, which is there in this enterprise
data warehouse and build a BI report to give answers
to his business users. This is a very complex process if you start thinking about Josh living only in the world of business analytics solutions and Nellie living in the world of data science and Data Lake solutions. It is further complicated by having to stitch together all the pieces that are necessary, from security to monitoring to management, because from a customer's standpoint, it's one end-to-end analytic solution. Let me show you now in this demo
how both Josh and Nellie can work in the same Azure
Synapse Analytics workspace to be able to answer
this business question. The first thing I have done is I have created an Azure Synapse
Analytics workspace. In this workspace, I have
added a dedicated SQL pool, which is my Data Warehouse. I've called this EDW. I have my serverless SQL pools, and I also have my Apache Spark pool. Then I have gone and added two users. As I had said, Josh, who is my BI developer, who is familiar with SQL, and Nellie, who is playing the role
of my data scientist and data engineer who's
familiar with Spark. Now, let's see the world
through the eyes of Josh and Nellie working together to be able
to answer the business questions. I'm going to first show when Josh
logs into the same workspace, what is his experience? The first thing he needs
to do, as we had said, is find the data that his business user mentioned: a customer survey that was sent out. What Josh can do is, because of the integration of Azure Synapse Analytics
with Azure Purview, he can start searching his entire
data estate from right here, his Synapse Studio experience. Let us say Josh
searches for "survey". Here, you can see from Purview, he finds Data Lake file system, which says "surveyresults",
and some form of CSV file, which
says "feedbacksurvey". He clicks into this CSV file. Now, again, Josh doesn't know
too much about the Lake, Josh doesn't know too much
about any of the CSV formats, Parquet formats, or other formats that are in the Lake. He knows SQL. Here are
the things Josh can do. Josh can easily look at what is the information which is
there in the CSV file. He can see things,
such as it has votes, and name, and topics, and subjects, and comments. Looks interesting. It
doesn't have sentiments, but let me look deeper. This is how easy it is for Josh to run serverless SQL queries directly over the Lake. Josh basically selects "Develop" and "Run", and that's it. Now he is running SQL queries where he sees the different topics, such as shipping, fulfillment, and praise, the different subjects, the comments on top of them, and the awards that have been given by other users. But what he doesn't see here are the sentiments.
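As a rough sketch, the kind of serverless SQL exploration query Josh runs might look like this; the storage account, path, and file name are placeholders rather than the actual demo values.

-- Explore the survey CSV in the Lake with serverless SQL (illustrative path):
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/surveyresults/feedbacksurvey.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS feedback;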
Now Josh needs to start working with Nellie, the data scientist and data engineer. He tells Nellie, "Hey, this is the file that I found in the Lake." Now, let me show you the world and the workspace through
the eyes of Nellie. Nellie logs on to the same workspace. Then she searches on
Purview for "survey", and she's able to get to this
exact same file that Josh got to. But Nellie is more
familiar with Spark, so what she does is open a new notebook in the Develop tab. She can load this data into a DataFrame, or she can create a Spark table right over exactly the same file in the Lake that Josh was working with. Let's say she creates a Spark table.
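A minimal Spark SQL sketch of creating such a table over the file in the Lake could look like the following; the path and options are assumptions for illustration.

-- Register the raw survey CSV in the Lake as a Spark table (illustrative path):
CREATE TABLE IF NOT EXISTS rawsurveytbl
USING CSV
OPTIONS (
    path 'abfss://surveyresults@<storageaccount>.dfs.core.windows.net/feedbacksurvey.csv',
    header 'true',
    inferSchema 'true'
);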
She's already created this, so let's go and look at the table. It is in my default Spark database and is called "rawsurveytbl". Let's look at the columns of this table. It has exactly those same columns: subject, comments, votes, and others. Now, Nellie needs to add a sentiment analytics model in order to get sentiments
from the comments field. She can do this in multiple ways; she can build a model, she can use something existing, or because of the
seamless integration of Azure Synapse Analytics
with Cognitive Services, she can choose one of the two Cognitive Services models which are available here. She can do anomaly detection
or text analytics. She chooses "Text analytics" and goes through a very simple wizard. This low-code, no-code wizard
enables her to create a notebook. This notebook is all
she needs to run. She's already run this notebook, so I'll walk you through it. She's able to import some libraries, she's able to run sentiments
over the text comments, and then she's able to
display the results. Let's look at the
results which she sees. She's able to see what
are the comments, what are the sentiments. You can see some of these
comments have mixed sentiments. Why these have mixed sentiments
is because the Cognitive Services sentiment analytics model breaks the comments into sentences and sees whether each sentence is positive, negative, or neutral. Now, she's able to say, "Wow, this does make sense. This is exactly what I need. I need to now collaborate back with Josh, so I'm just going to write these results back into the Lake, and I'm going to write them as a Spark table." That's all Nellie does, a few lines of code. Once she does this, she has created a new Spark
table which has data in the Lake. This table has a few
additional columns. One of these columns is "Sentiments", and this is what Josh
was looking for. She tells Josh that there is a
Spark table that she has created. Now, Josh, who is our BI
developer familiar with SQL, logs into his workspace. In his workspace, what he sees are the Spark tables. He's able to open the Spark database, look at all the Spark tables, and he finds the sentiment table. Josh doesn't know Python, Josh doesn't know Scala, he only knows SQL, but he's able to right-click this table and query it with SQL.
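For illustration, the generated serverless SQL query over the shared Spark table might look roughly like this; the database, schema, table, and column names are assumptions based on the demo.

-- Query the Spark table from serverless SQL through the shared metadata (illustrative names):
SELECT TOP 100 comments, sentiment
FROM [default].[dbo].[sentimenttbl];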
What Josh has just done is work over the same data and the same data definition that his data scientists are familiar with, and get the results. Here, as you can see, Josh is able to say, "Wow, I've got my comments and my sentiments, which is exactly what I need. I just need to operationalize this
so I can create my BI report.'' That is the next hand-off to Nellie. Nellie, now that she knows that Josh has everything that
he needs in the Lake, has to load the data
to the warehouse. For doing this, all Nellie has to do is go back to the same Notebook, and using the built-in
Spark to SQL connector, write a few lines of code to
read data from the Spark table, and write data to Synapse SQL, the dedicated SQL pool, which is the EDW, and create a table in there. Now, for her to operationalize it, it is as easy for her as
clicking this button and saying, "I want to operationalize this in an existing pipeline." Once she does that, she just has to say when her notebook should run in the pipeline. Then she can add triggers, either event-driven triggers
or time-driven triggers. All of this is also backed by source control using
GitHub in this case. She can do her entire CI/CD
pipeline, and that's it. That's how easy it is for Nellie
to operationalize the data which she found in the Lake and write it
to the data warehouse, create pipelines, create
CI/CD pipelines after this. Now Josh is able to get this exact same data in his
enterprise data warehouse. He's able to go and see the new tables that Nellie added; in this case, it is the sentiments table. He can query this table directly in the warehouse.
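As a hedged illustration, a query Josh might run over that new table in the dedicated SQL pool could look like this; the table and column names are assumptions, not the actual demo schema.

-- Summarize sentiments by topic in the dedicated SQL pool (illustrative names):
SELECT topic,
       COUNT(*) AS comment_count,
       SUM(CASE WHEN sentiment = 'negative' THEN 1 ELSE 0 END) AS negative_comments
FROM dbo.SurveySentiments
GROUP BY topic
ORDER BY negative_comments DESC;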
What he really wants to do next is create a BI report. Because of the built-in integration of Synapse and Power BI, he can easily connect to the Power BI workspace, link it to the Synapse workspace, see all the Power BI datasets, and start creating this BI report right here in the Synapse experience. For example, what he's going to do is add his sentiment information to an existing report. Once he adds all of the
sentiment information, all he needs to do is
click on "File Save", and it's back to the business user. The business user can go to Power BI, and in Power BI is able to
see the end result of this end-to-end analytics that
Josh and Nellie worked together to provide to him. Here's an example. The business user is able to see all the topics and the sentiments across those topics. Let's take a topic like shipping. He's able to click on "Shipping" because he sees there are mixed reviews, and is able to see why
it is a mixed review. You have a customer who's very disappointed in the way
things were packed. This customer has always
been satisfied with the service and would trust
that you would make this right. Just by being able to learn
about this so easily, your business user now knows what changes need to be made to your shipping system in order to retain the customer
and make them happy. As you saw, what Azure Synapse
Analytics enabled Josh and Nellie to do was make analytics
a collaborative team sport. It enabled your BI developers and your data scientists and your
data engineers to work together. Not as an "or," but an "and": data science and business analytics working together, with the tools they are used to, over data whether it's in the Lake or in the warehouse. All of this is surrounded by the ability to be enterprise-ready across the entire solution, whether that is your managed VNet capabilities or data encryption
with customer-managed keys, your unified monitoring, management, deployment, and CI/CD
across the entire solution. Now that I have walked
you through how your data scientists,
your data engineers can use Azure Synapse Analytics, let me go on to the next topic. This is how do I extend doing
analytics over the Lake and the data warehouse to seamlessly doing analytics over your operational data, which is enabled through Azure
Synapse Link for Cosmos DB. Let me start with what is the
challenge that we have today. Today, we have tens of
thousands of our customers using Azure Cosmos DB in order
to run their applications. But then they're starting to ask questions on the data that is
coming from these applications. Questions such as I want to
do some BI dashboarding. I want to be able to do some predictive analytics
on the data which is there, whether it is retail recommendations, predicting device failures, or alerting on fraud. In order to do this, they would need to be able to run analytics over the data which is
there in the transactional store, which is an Azure Cosmos DB. Azure Synapse Link for Cosmos
DB breaks down barriers between your transactional processing and your analytical processing. With a few clicks of a button, you will be able to run your
data science or BI workloads using Apache Spark or SQL in Azure Synapse Analytics
over data in Cosmos DB. While you're running
these analytic workloads, you will not have any impact to
your transactional workloads. Not only that, you also don't
have to manage any pipelines, you also don't have to
worry about the latency of the data, because you can run these analytics over near real-time data. All of this is fully managed
by Azure Synapse Link. Let's see how this works. Let's say you have a
transactional store, and your operational data is being written to it. You enable Azure Synapse Link for Cosmos DB, and you say that this is the container for which you want to have an analytical store. We automatically do a sync: we pull the near real-time data from your transactional store and write it to an analytical store in the right columnar format, which is what you need to run your
analytical queries. You can easily use
Apache Spark or SQL and Synapse to be able to query the
data in your analytical store. Let me show you through
a demo how this works. Now I'm going to show
you how you can set up Synapse Link for
Cosmos DB and be able to analyze the data in your analytical store easily
with Azure Synapse Analytics. The first thing that
you will do is go to your Azure Cosmos DB account
and turn on Azure Synapse Link. Once you do that, you will go to the new container that you are creating and say that you want to enable the analytical store on it. After doing this, you don't have to do anything else. You will automatically be able to run your Spark or SQL using Azure Synapse Analytics. For this demo, I have already set up the database SynapseLinkIoTDemoDB. It has a few collections which are writing to my analytical store. Now let me jump to Azure
Synapse Analytics. In here, the first thing I
will do is go to my Data tab, click on this "Plus" button, and say, I want to connect to external data. In this external data, I want to be able to connect
to my Azure Cosmos DB SQL API. Once I do that, I click on "Continue" and fill in some information, and what you see under my linked data is this exact same collection, associated with the IoT Cosmos DB account I had shown. Now I can select this and start running my SQL and Spark
right over this collection. Here, I can select it and say I want to query it with serverless SQL. And it's not just that; I can do a lot more. I can set up my serverless SQL database. I can set up views over data that sits right in place in my Cosmos DB analytical store. I can set up external tables over data that is in my Lake.
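To make this concrete, here is a hedged sketch of a serverless SQL view over the Cosmos DB analytical store; the account, key, container, and column mappings are placeholders rather than the demo's actual values.

-- A view over the Cosmos DB analytical store using the CosmosDB OPENROWSET provider (illustrative names):
CREATE VIEW dbo.IoTSignals
AS
SELECT *
FROM OPENROWSET(
    'CosmosDB',
    'Account=<cosmos-account>;Database=SynapseLinkIoTDemoDB;Key=<account-key>',
    IoTSignals
) WITH (
    deviceId    varchar(64) '$.deviceId',
    temperature float       '$.temperature',
    eventTime   varchar(30) '$.dateTime'
) AS signals;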
Once I do this, I can use the power of the BI tools I am already using to query this data. Here are the external tables I have created, and here are the views I have created. Now let me show you some queries.
I can run a variety of different queries, from schema inference and flattening complex structures to aggregations, and even joining data in Cosmos DB with data in the Lake.
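Here is an illustrative example of that last kind of query, joining the analytical store with a file in the Lake; all account names, paths, and column names are assumptions.

-- Join Cosmos DB analytical-store data with reference data in the Lake (illustrative names):
SELECT signals.deviceId,
       devices.plantName,
       AVG(signals.temperature) AS avg_temperature
FROM OPENROWSET(
    'CosmosDB',
    'Account=<cosmos-account>;Database=SynapseLinkIoTDemoDB;Key=<account-key>',
    IoTSignals
) WITH (deviceId varchar(64) '$.deviceId', temperature float '$.temperature') AS signals
JOIN OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/reference/devices.parquet',
    FORMAT = 'PARQUET'
) AS devices
ON signals.deviceId = devices.deviceId
GROUP BY signals.deviceId, devices.plantName;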
This is the power of being able to run analytical queries over the data in your transactional stores without impacting your transactional applications or workloads. Now, I could have also chosen to go to the same container and be
able to create a notebook. Here is a notebook that I've already created just to walk
you through this. If I want to be able to do
anomaly detection over the data coming from my IoT sources into Cosmos DB, I can do that right here in my Azure Synapse experience. Today, as you saw in the demo, it is so easy to extend reasoning over
data in the Lake and data in your warehouse to data
in your operational stores. Today, we are proud to announce that Azure Synapse Link with
serverless SQL pool is in GA. We have 2-3 times faster
query execution times over what you saw in preview. We also have extended the network isolation capabilities
from the managed VNet of Azure Synapse to your analytical store using
private endpoints in Cosmos DB. Also the data encryption capabilities using your customer managed keys are extended across your Cosmos DB transactional
store and analytical store. As you can see, what we have done is not only made it easy for you to do your data science and
BI workloads over data in your Cosmos DB account, but we have also extended
the enterprise promises of security across Azure
Synapse and Cosmos DB. Now that I have shown you this, I want to move to another section, which is really talking about
scale and being able to do real-time analytics at Cloud
Scale for your T-SQL developers. One of the things that we are
going to be soon announcing in gated preview is T-SQL
Streaming in Azure Synapse. What this enables you to do is do real-time analytics using things
which you might be used to with windowing functions and other complex event
processing functions on your incoming streams of data. You will be able to get DDL
support for new streaming objects, and you can do in-memory
processing over the streaming data with high
throughput and low latency. I'm going to walk you through a demo in order to show this to you. I am getting some data which is
coming from a connected factory. This data is coming in through an IoT Hub; it could also have been an Event Hub. The data arrives as an external stream, which is a new concept that I'm going to show you in the demo. I'm going to run my T-SQL streaming query on top of this and write the data to an output stream, which is a Synapse SQL pool table, and then my BI report runs off that Synapse SQL pool table, which you will see in a BI dashboard. Let me jump into the demo right now. Now I'm going to show you how
you create your external streams. What you see here is the BI dashboard I mentioned, running off the Synapse SQL pool table. I built this dashboard by getting data from hundreds of thousands of sensors, running T-SQL streaming queries on top of it, and
writing it to the SQL table. You go and select
your "New SQL Script", then "New external streams", and here is where you can create your input or output external streams. Let me show you where you can read streaming data from. You can read streaming data from IoT Hub or Event Hub. You can have data which is in
Blob Storage or ADLS Gen2, you can output this data to a
variety of different sources. For example, your Event Hub, your Blob Storage, ADLS
Gen2, SQL Database, or within the Synapse SQL database
to the Synapse SQL table, which is what I showed in this demo. Let me select that. I write a name for the stream, click "Continue", select the Synapse SQL dedicated pool, write a name for the table, click continue, and that's it. That's all I needed to do to create an external stream.
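Since the feature is in gated preview, the exact DDL may differ, but an external stream definition along the lines the demo describes might look roughly like this; the stream name, data source, and location are assumptions.

-- Illustrative DDL for an input external stream backed by IoT Hub (names are placeholders):
CREATE EXTERNAL STREAM SensorInputStream
WITH (
    DATA_SOURCE = ConnectedFactoryIoTHub,  -- an external data source pointing at the IoT Hub
    LOCATION = 'messages/events'           -- the endpoint to read events from
);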
For the purposes of this demo, I've already created three external streams: an input stream, a test input stream, and an output stream. The input streams are reading from IoT Hub, and the output stream is writing to the Synapse SQL table. Now let me show you how I create a streaming job. I select the same thing and I
click on "New streaming job". Here, I have to select the existing streams that
I want to use in this job. I select all the three streams. Click on "Continue". Now I need to give a name
to the streaming job. Let me say streamingJob. I say how many resources I want to give to it and click "Okay". With a very simple low-code, no-code experience, I'm able to get the SQL script that generates the streaming job. Then, because this is in test mode, I can test it against my test input streams and
be able to execute this. For example, here I'm running this particular streaming
job in my test mode. One of the things that
I can do is also write more complex streaming logic
in order to create my job. Here I have a query script
which I have created, and here I am inserting into the output stream information from thousands of my sensors: the average temperature across the sensors and the maximum humidity across all the sensors, grouped in a one-second tumbling window.
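A hedged sketch of that streaming query, following the tumbling-window pattern the demo describes, might look like this; the stream and column names are illustrative, and the exact windowing syntax may differ in the gated preview.

-- Aggregate sensor readings into one-second tumbling windows and write to the output stream (illustrative names):
INSERT INTO SensorAggregatesOutput
SELECT
    AVG(Temperature) AS AvgTemperature,
    MAX(Humidity)    AS MaxHumidity
FROM SensorInputStream
GROUP BY TUMBLINGWINDOW(second, 1);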
Now that I have written these queries, the job continuously runs over all the data coming from my IoT Hub or Event Hub and writes it to the SQL pool. Here, I'm going to
monitor my streaming job. I'm clicking on the streaming
job, "StreamingSynapse32". What you see here is the metrics across the number of different event counts which
is coming through the system, the job graph of what exactly
you're streaming job is doing and I can select and
add more metrics to it. For example, I can
add metrics such as, what are my input event bytes, what is the input source
received and other things. What you are seeing is that this
is an entire solution that I am building: defining and creating my jobs at scale, analyzing data across millions of my sensors, writing it at scale to my SQL
pool table and being able to read that in my BI report. Now that you have seen how your data scientists
and BI developers can collaborate for the AI to BI scenarios using Azure
Synapse Analytics, how you can extend that reasoning and analytics over data in your Cosmos DB account, and how you can do real-time stream processing at scale with Azure Synapse Analytics, you might wonder, "How
can I use it faster?" One of the things that
we have been working on is also enabling you to do your data warehouse migration
to Azure Synapse Analytics. For those of you who are
either using on-premises analytics systems or other Cloud data warehouses, we understand how significant it is to migrate to a new
analytics platform. A data warehouse migration can quickly become expensive and lengthy. For those of you who are
considering migration, one of the critical blockers is translating the SQL code that was written and optimized for your current system into code written and optimized for the new system. Organizations worldwide want to
modernize their analytics platform. They want to enjoy both the
total cost of ownership and the innovation benefits of the
new modern analytics platforms. However, customers have invested
thousands of working hours, millions of dollars, and written hundreds of thousands of lines of code for their existing
data warehouse. To translate this critical SQL code, customers have to either manually rewrite their existing
SQL code or invest enormous amounts of their budget in an outside practice to rewrite
and convert their code. That is why we have created and recently announced
a new migration tool. With a point-and-click tool, the mandatory and critical
migration processes are automatically
completed in minutes. For example, scanning
your source system, producing an inventory
report that maps your database footprint
across your organization, and translating existing code for your target system are
accomplished in minutes, not weeks, not months. Hundreds of thousands of
lines of SQL code translated in less than one hour. No
manual rewriting of code, no chance of human errors. You can now de-risk your entire migration
project and save on massive cost of
rewriting years of code. You can get started
with this today by going to aka.ms/synapse migration. I hope in closing, what you saw today in
our session is how Azure Synapse Analytics is your complete end-to-end
Cloud analytic solution. It enables your data science teams, your business analytics teams, your data integration
teams to work together. It enables you to build
your entire solution using our Synapse Studio and do seamless analytics over
your operational data with Azure Synapse Link. It enables you to run complex event processing
and stream processing with T-SQL streaming and
also to get started, we are announcing the data
warehouse migration tool to get your existing data warehouse
into Azure Synapse Analytics. I hope you can reap the benefits of Azure Synapse Analytics and
be able to get insights for your organization
with agility and better collaboration with
your data development teams. I hope you enjoyed this
session as much as I did. Thanks for all your
time. See you soon. [MUSIC]