>> Today on the IoT Show, Andrew Liu will tell us
everything we need to know about Cosmos DB and how to use
it in an IoT application. >> This is going to be
a really great session. I'm really looking forward
to having you attend. [MUSIC] >> Hi, everyone, you're
watching the IoT Show. I'm Olivier, your host. Thanks for coming back. Today we have Andrew
from the Cosmos DB team. >> Hi, everyone. >> He's here to tell us
everything about Cosmos DB and why you should care
as an IoT developer. Andrew, thanks for
coming to the show. >> Yeah, thanks, it's awesome to be here. >> Yeah. >> Thank you so much. >> So can you tell us
a bit about yourself? Who are you? What is your
team doing at Microsoft? >> Sure. I'm a PM on the
Azure Cosmos DB team. I've been on the team for
about the last 5-6 years now, and I've worked on everything
from programmability to launching our service
from preview to GA to V2. >> Awesome. We were discussing, just before we started recording, that your audience, your customer, is the developer using Cosmos DB, right? >> Yes. >> Okay. So you're going
to give us a crash course on what Cosmos DB is. We are IoT guys. >> Yeah. >> So we really don't know
much about what is going on, and there are so many
storage solutions on Azure for different purposes that we really want to understand
what Cosmos DB is. When should we use Cosmos DB, especially in the context
of IoT scenarios? >> Oh yeah, absolutely. So yeah, let's dive right in. The way I like to think about this, we have this house diagram, and it's
like laying out the foundation and then looking at it
as a layer cake. But in a nutshell, Cosmos DB
is a distributed database. What I mean by distributed
database is that it runs on a cluster of machines as
opposed to a single machine. >> Okay. >> So scale-out is going to be a core
concept that you'll see, and really, in a nutshell, we scale out in two dimensions. First, we shard, or partition, the
database, which gives us the ability to ingest
really large amounts of data as well as data at a really
high rate. We also replicate the data, which
gives us a lot of redundancy, so that if a machine dies, the system fails over seamlessly and stays highly available. We also see people
use the replication to bring CDN-like capabilities to a database: rather
than relying on caching, we're doing active/active live
replication across regions, and we can even do it multi-master. >> Okay. >> So beyond availability, people are doing this from a
really low latency perspective. So this fleshes out the bottom,
core foundation. I like to make an analogy: think of Cosmos DB
almost like you think of .NET. .NET's got a backend. >> Yeah. >> That backend is going
to run on some bytecode; it's got a runtime, and
I can expose it through various different ways
of programming against it, whether it's C# or F#. >> Okay. >> Same thing in Cosmos DB: we have a backend that uses replication for
turnkey global distribution and partitioning for elastic scale-out
of storage and throughput. >> Okay. >> We do this as a fully
managed service with our own resource
governance to give you predictable low latency. From a
replication standpoint, one of the things that really makes the global distribution work
is the fact that we can do this in a multi-master way
with consistency guarantees. Then finally, as a
fully managed service, we have a set of
comprehensive SLAs, which actually leads into
why people are using this. So think of all of this as the backend.
Then we have a frontend that allows us to understand different
protocols, ranging from our own SQL API to open-source
APIs like the MongoDB, Cassandra, and Gremlin APIs. We can point existing Mongo
and Cassandra applications at Cosmos DB without really
having to rewrite any code. >> Okay. So ingesting lots of data super-fast
and [inaudible] the data: those resonate with me as
an IoT developer, right? >> Yeah. So we tend to see this
in IoT in two core scenarios. One is device telemetry. >> Yeah. >> We all know this from
a big data perspective; people like to joke,
oh, Facebook and Twitter have so many users, but honestly, "I'm not
that active on social media. I might tweet a handful
of times a day." >> Yeah. This IoT
Show episode, right? >> Oh, yeah. I'll definitely
be tweeting about this one. >> You'd better. >> I mean, look at our devices: they're going, here's my state, here's my state, here's my
state, here's my state. In 10 seconds, a device has produced more data points than I
can as a human all day. >> Yeah. >> And you have fleets of
tens of thousands of these. So, for example, one of
our users is ExxonMobil. >> Yeah. >> They're tracking 50,000
different oil wells all around the US, making sure that
the oil wells are healthy. They're tracking the flow rate and the temperature, making
sure everything is healthy. You basically have this problem: I have lots and lots of writes
per second coming in. >> Yeah. >> There are several
ways of solving that. Historically, if your database
couldn't cope with that, what you would do is put a
stream processor in front of it to take many data points, turn them into one data point,
and aggregate, right? >> Yeah. >> But the challenge with that
is the fidelity of the data. What happens if something
abnormal happens? >> Yeah. >> Like the max or the
average temperature goes up, and you're wondering, where's the raw telemetry? Can you actually get a fast,
indexed data store that can ingest this semi-structured
sensor data at a high rate and serve queries? So this is why we see device
telemetry land on Cosmos DB a lot; it's really the partitioning and automatic
indexing that make it pop. >> Yeah. >> The other side of it is we
see a lot in device catalogs, where people are using our replication. >> Okay. >> If you look at cloud services, we typically measure availability by
the number of nines. Some data stores have three nines, some have four nines,
some have five nines. >> Yeah. >> For Cosmos DB, we're actually at the very high end of
that: five nines. >> Okay. Nice. >> For example, we have big automotive manufacturers
where, on an assembly line, they have little
robots running around, and they're sending commands
to each one of these robots. To store that state, what they wanted was
something extremely highly available, so that the assembly
line never stops. From that catalog perspective, we typically see Cosmos
DB used mostly for its extremely high availability. >> Makes sense. From the
IoT developer's perspective, Cosmos DB will actually be
sitting in the backend: you would ingest the data from
the devices through IoT Hub, then you'd typically transform
the data using Functions or some analytics, and then put
that into Cosmos DB, right? >> Right. So here's an example architecture diagram, because
it's actually from ExxonMobil. >> Okay. >> Their oil wells,
their gas compressors, their pipelines, everything
comes into Azure IoT Hub. Typically, you might position
a hot store and a cold store; that way you get a blend of
both, with fast interactive queries on the data that's very hot, and Cosmos DB is typically
the serving layer there. >> Got it. >> Likewise, if you're
using something like Azure Time Series Insights, that's more for interactive use when you have a small dev team doing reporting and BI on the data, but not really for driving
a programmatic REST API to serve your mobile and web app. >> Okay. Got it. Yeah,
so multiple services, multiple uses of Cosmos DB, and then there's
the whole path with heavy analytics that's going
to happen behind it, right? >> Yes. >> Okay. Love it. >> So in terms of what every developer needs
to know coming into Cosmos DB, what I want to do is
separate out the things that are must-know from the things
that are nice to know. >> Okay. >> Must-know are things like
partitioning, request units, and, in the context of IoT, TTL and change feed. I'll explain what these are. Nice-to-know are things
like, for example, how you tune the index
policy, because ultimately the index
policy is something you can change after the fact; it's
very easy to do so. But some of these things,
like partitioning, are a bit harder to change if you get them wrong early on. So what I'll do is I'll
jump into an actual demo, show you what Cosmos DB looks like, and talk through some
of these concepts. >> Let's go. >> So here I'm in the Azure portal, on my own Cosmos DB account, and whenever you create
a Cosmos DB account, this is the first thing you'll
see in portal.azure.com. >> Okay. >> If I go to the Data Explorer, what I've done here is set up a device catalog and a
device telemetry store. Just to show you what
the system looks like, I'm ingesting a bunch of
telemetry into my Cosmos DB. >> Okay. >> I've set up
a VM to just pound the database. What it does is spin
up a bunch of threads, and each one of these threads
runs in a tight loop, pounding the database and writing lots and lots of data to it. >> As the PM, you're doing what
the devs don't like, which is actually using the service
and trying to hurt it. >> So let's go and run this
basic code real quick. Initially, what I have here is the telemetry store set
up at 10,000 RUs. Think of RUs, or request units,
as our compute unit. As we scale out, we can scale out to
a practically unlimited number of RUs to get you as much
write ingest as you need. So this is going to be the currency
you think in, both from a performance, especially a throughput, perspective and from
a cost perspective. >> Okay. >> Now, this is really
good to understand. Let's say I have this system where I'm doing
about 1,000 writes per second, and it uses about 10,000 RUs
per second right now. What if I find that, hey, the business needs to go and deploy 2X more sensors or 5X more sensors? One of the cool things about
Cosmos DB is that we can actually scale this out without any downtime. I think that's one of the
things that's a little bit unique here, and one of the things you'll
want to think about in terms of using RUs is that you want to manage them dynamically. For example, if you're
doing batch jobs with Apache Spark on top of Cosmos DB, at the beginning of the batch
job you might ramp up the RUs, do the batch job, and then at the
end of the job ramp them back down. So now that we've scaled this up, I believe it was 5X, to 50,000 RUs, what you'll see is
that this little app here, and I think the app is actually
the bottleneck here, is going to ramp
up and converge on 50,000 RUs as it spins up. >> Okay. >> Just like that. I mean, this is
how scaling works in Cosmos DB. >> So you've done that manually; is there a way to
automate that scaling? >> Yes, you can do it
through our programmatic APIs. In fact, if you're doing batch jobs, I would have a setup
and a cleanup step that call our APIs to do
this programmatically. >> Okay, because I can
see, once again, the IoT scenarios we were
discussing, where I have many devices,
and then I know I'm going to have a specific
event, and suddenly I'm going to have a huge number of
devices connecting, or more data actually coming in, in which
case you will need that
scaling to happen. Scaling is also about
optimizing cost, right? >> Yes, it's all about
optimizing cost and performance; both are deeply tied. Basically, I like to think of it as how many compute
units you are buying. The more compute units you buy, the more throughput you'll get, but the more the solution
will cost as well. >> Obviously. >> So, deeply tied concepts. What's neat about this is if
I flip over to the collection in Cosmos DB, not only are we ingesting things
at a very high rate, but let's say you have a
gen one device and you're about to roll out
your gen two device. Historically, if you've
built your own data store, you're thinking about how
to do schema management, but in this case, software moves a little bit faster than hardware. You're probably not
going to go back to all your gen one devices and
add that additional telemetry. So let's say here that I'm
going to add another field; in this case I might add, let's say, a flow rate
to this telemetry. Let's say, for this one, I
actually fill out a value. This was just a demo data set, but I have a flow
rate of, let's say, 50. Now that we've added a new field, we can use the automatic
indexing capabilities to query this dynamically, on the fly. So here I look at
the message schema, look at its fields, pick the flow rate, and look for something like, let's say,
a flow rate greater than 40. Right there, without any
ALTER TABLE or CREATE INDEX, I can go and add
properties on demand, query across them in real time, and
do so with very little latency. >> I can tell you that, as someone
coming from the non-data world, the IoT world, I'm always scared when I start
playing with databases, indexes, and queries, so
that reassures me. >> Yeah. >> I like that. >> It makes it super easy to use. So let me pause this workload real quick, and I'm going to show
you a second workload. I'm going to show you a demo here: when we talk about queries,
how fast are these queries? What I'm doing here is,
in a simple while-true loop, running a query with a stopwatch around it, to look at just how fast our
queries are in Cosmos DB. If I set this up, you'll notice that once the client starts up, loads the topology, and caches everything it needs to
initialize and connect to the database, I can run live queries and consistently get under
10 milliseconds of latency. So when we're talking
about real time, and everyone likes to say real time, it's all about how real time. Is it an hour, a minute, a second? In this case we're talking
about milliseconds. >> In addition to the
speed, real time is actually a notion that is
dear to my heart, because I used to develop real-time
embedded systems. It's not so much about the speed; it's about the notion that you can predict it's
going to be less than that. You can count on that
service to give you that answer in less
than 10 milliseconds. >> That's correct. >> You guys have SLAs
in place as well. So that means that for
customers there is that reassurance that by using Cosmos DB they're going
to get these answers that fast. >> That's correct. So
what we see are scenarios like, for example, vehicle telematics, where our users
build solutions like, hey, when an airbag gets deployed on a vehicle, immediately pick up
the phone and call that driver: "Hey man, are you okay?" These are things that
are very time sensitive. I don't want to call you an hour later; that's
the thing, right? If they answer, good, and then I can calm down. If they don't answer, then
you can go and live query: "Hey, where are the GPS coordinates? Let me get emergency services
on-site right there, right then." >> Definitely, hence
the notion of the hot path in your lambda architecture. >> Absolutely. So
things you'll want to know: in order to
achieve this scale-out, Cosmos DB does a lot of
the heavy lifting in terms of scaling out the workload
across a set of machines, but the main thing is
you're going to need to give Cosmos DB a logical hint about how it should distribute
that workload, and a lot of people tend
to get this part wrong. When you create a new
container, or collection, it's going to ask you for
something called a partition key, and that's what we're going
to partition the data on. Intuitively, some
people might think, oh, over time I'll have a pretty good distribution of data, so they might
choose the current time. But what they miss is that at
any given point in time, all of your sensors have the
exact same current time value. So all of the requests land on a single partition; that partition
is running blazing hot, while all the other
partitions are left idle. In IoT scenarios, the natural key is typically going
to be the device ID. Your workload is going to
grow as the devices grow, so this is a much
more natural starting point. So understand how to choose
a good partition key. Sometimes, if you have devices
whose data gets really large, you might even form a composite key by
concatenating the device ID with something else. Time might not be good
in isolation, but it's actually a very good candidate
for a composite key. So use the device ID plus the
current year or current month. You might also want to tune
the index policy later on. By default, we automatically index any new property. If you have really large telemetry
like the sensor payload, and you tune this, you'll find that the RU cost for each one of the writes is going to
be much friendlier. What's neat here is that instead of a whitelisting approach, you can do a blacklisting approach. So with the blacklisting approach,
you might index slash star, so everything is indexed
by default, but for more specific paths, like maybe I know I'm never going to query off of this ETag or
other properties, I can save some RUs by
blacklisting those few fields. >> Okay. >> Then another thing I want to
show you is Time to Live. In IoT, if you're using a hot store,
a lot of times what you're really going to be querying off of is the last
30 days of telemetry, or 90 days, or one year, so you can turn on
"Time to Live" and set how long you want to keep data. So for example, if I
want to keep 30 days, I'd convert 30 days
times 24 hours in a day, times 60 minutes in an hour, times 60 seconds in a minute, and plug in how many seconds I
want to keep that data for. This is actually one of
the biggest cost savings. In most telemetry
stores you're paying the IOPS for both the write
and the delete. >> Yes. >> In this case, we actually
do the TTL for free. >> Okay. >> So you only pay for the
writes; deletes come for free. >> For free, got it.
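The container-level settings discussed in this part of the conversation (device ID as the partition key, an indexing policy that indexes `/*` but excludes a few paths, and a default TTL computed as days x 24 x 60 x 60 seconds) can be written down as a container definition. This is a minimal sketch: the container id `telemetry` and the property names `deviceId` and `payload` are hypothetical, and the dictionary only mirrors the general shape of a Cosmos DB container definition; it does not call the service.

```python
"""Sketch of a telemetry container definition (illustrative, not the SDK)."""
import json

SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def ttl_seconds(days: int) -> int:
    """TTL is expressed in seconds: days x 24 h x 60 min x 60 s."""
    return days * HOURS_PER_DAY * MINUTES_PER_HOUR * SECONDS_PER_MINUTE


def telemetry_container_definition(retention_days: int = 30) -> dict:
    return {
        "id": "telemetry",  # hypothetical container name
        # Partition on device ID so the load grows with the device fleet.
        "partitionKey": {"paths": ["/deviceId"], "kind": "Hash"},
        "indexingPolicy": {
            "indexingMode": "consistent",
            "automatic": True,
            # Index every path by default ("slash star")...
            "includedPaths": [{"path": "/*"}],
            # ...but exclude paths that are never queried, to lower write RU cost.
            "excludedPaths": [
                {"path": "/payload/*"},  # hypothetical large sensor payload
                {"path": "/\"_etag\"/?"},
            ],
        },
        # Documents expire this many seconds after their last write.
        "defaultTtl": ttl_seconds(retention_days),
    }


if __name__ == "__main__":
    print(ttl_seconds(30))  # 2592000
    print(json.dumps(telemetry_container_definition(), indent=2))
```

If you use the `azure-cosmos` Python SDK (v4), these settings should map onto the `partition_key`, `indexing_policy`, and `default_ttl` arguments of `create_container_if_not_exists`; treat that mapping as something to confirm against the SDK reference rather than as a tested recipe.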
Yeah, so typically, in the scenario where you
have this IoT data and you want this
hot path with Cosmos DB, you'd also have to think, for the long-term storage, I'm
going to use something cheap. >> Yeah. >> Something where I'm just
going to dump things in there, and because it's going to be for
post-mortems anyway, I don't need super-fast access, so I can definitely go with
some cheap solutions. >> Yeah. So you do a
hot store and a cold store. The TTL gives you a way of managing retention on the hot store. Today, we tend to see
people use things like storage or Data Lake
for the cold storage. What's cool, coming
soon on our roadmap, is a Spark API with automatic
archival to our cold stores. So we'll actually have this
as an out-of-the-box feature. >> Okay. So in parallel, love it. >> Then the last thing
to know is change feed. If you want to do incremental
sourcing of events, let's say the telemetry coming in, we bring message-queue-like
semantics to the data store. It's not a message queue. It's still a database.
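The change-feed behavior described here, message-queue-like incremental reads over data that stays persisted and queryable, can be illustrated with a toy in-memory model. Everything in it (`ToyTelemetryStore`, its methods, the `alert_on_temperature` consumer, and the `deviceId`/`temperature` fields) is hypothetical; it is not the Cosmos DB SDK, and the integer checkpoint only stands in for the continuation state a real change-feed reader keeps.

```python
"""Toy model of change-feed semantics (hypothetical, not the Cosmos DB SDK)."""
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]


@dataclass
class ToyTelemetryStore:
    items: Dict[str, Doc] = field(default_factory=dict)  # current state, by id
    changes: List[Doc] = field(default_factory=list)     # ordered write history

    def upsert(self, doc: Doc) -> None:
        # One write updates both the queryable state and the change history,
        # so there is no separate queue that could fall out of sync.
        self.items[doc["id"]] = dict(doc)
        self.changes.append(dict(doc))

    def query(self, predicate: Callable[[Doc], bool]) -> List[Doc]:
        # Database-style read over the current state.
        return [d for d in self.items.values() if predicate(d)]

    def read_changes(self, checkpoint: int) -> Tuple[List[Doc], int]:
        # Queue-like incremental read: writes after `checkpoint`, plus the
        # checkpoint to resume from. Reading deletes nothing.
        return self.changes[checkpoint:], len(self.changes)


def alert_on_temperature(store: ToyTelemetryStore, checkpoint: int,
                         limit: float) -> Tuple[List[str], int]:
    """Hypothetical consumer: flag devices whose new readings exceed `limit`."""
    batch, next_checkpoint = store.read_changes(checkpoint)
    alerts = [d["deviceId"] for d in batch if d.get("temperature", 0) > limit]
    return alerts, next_checkpoint


if __name__ == "__main__":
    store = ToyTelemetryStore()
    checkpoint = 0
    store.upsert({"id": "r1", "deviceId": "well-1", "temperature": 70})
    store.upsert({"id": "r2", "deviceId": "well-2", "temperature": 95})
    alerts, checkpoint = alert_on_temperature(store, checkpoint, limit=90)
    print(alerts, checkpoint)  # ['well-2'] 2
    store.upsert({"id": "r3", "deviceId": "well-1", "temperature": 99})
    alerts, checkpoint = alert_on_temperature(store, checkpoint, limit=90)
    print(alerts, checkpoint)  # ['well-1'] 3
    print(len(store.query(lambda d: True)))  # 3: every reading is still stored
```

Because a single `upsert` updates both the queryable state and the change history, there is no second system whose write could fail independently. One simplification to keep in mind: in its default (latest-version) mode, the real change feed may surface only the most recent version of an item that changed several times between reads, whereas this toy replays every write.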
We persist everything and we index everything, but you
can get fast incremental reads. Typically, the advantage of
this approach is that if you need a single source of truth
for eventing, you want to avoid the
consistency problem of, "Hey, I successfully wrote to
data store 1 but not message queue 2, or vice versa." This gives you a single
source of truth for all of this. So these are the things I
think everyone should know. >> I think you covered
all of them, loved it. >> So let's talk about
how to get you started. To ease you into this, we've built an IoT solution based on a lot of how our
users are using the system. You have the ingest piece in the top-left corner, where you're coming through IoT Hub and shoving
the data into Cosmos DB. We're showing you how to do the hot store/cold store pattern and run an Apache Spark job to do
things like predictive maintenance in the bottom left, and shove
that into something that can do real-time scoring in AKS, with a web app at the top
powered off of it. We also show you how to do
that event sourcing in the bottom right, so if you want
to do some stream processing with Stream Analytics and a Power BI dashboard, or e-mail
alerts when something like "Hey, the temperature is off" happens,
we can use Logic Apps for that. >> Love it. >> So go to aka.ms/CosmosDBlabs, what was the URL again,
CosmosDB-IoT-Lab. >> Okay. >> What you'll see is that we
have a GitHub repository. If you go to the IoT
lab, we actually show you step by step how
to build the solution. >> Awesome. >> Yeah, so what
are you waiting for? >> Andrew, that was a very
fast dump of a lot of information but very valuable for
our IoT developers out there. So bring back the link real quick. >> Sure. >> So that's aka.ms/CosmosDB-IoT-Lab. That's your next step to learn
even more about Cosmos DB. Andrew, thanks a lot for
joining the IoT Show. >> Awesome. Thank you for having me. >> Hope to see you again. >> Thank you for watching. [MUSIC]
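Andrew's partition-key guidance from earlier in the episode (pick the device ID rather than the current time, optionally combined with the year or month into a composite key) can be illustrated with a small simulation. This is a sketch under stated assumptions: the partition count `NUM_PARTITIONS`, the `well-NNNN` device IDs, and the SHA-256-modulo routing are all hypothetical; Cosmos DB's own hashing and key-range routing differ, and only the distribution effect is being modeled.

```python
"""Why current time is a hot partition key and device ID is not (toy model)."""
import hashlib
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

NUM_PARTITIONS = 8  # hypothetical number of physical partitions


def partition_for(key: str) -> int:
    """Map a partition-key value to a partition via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def composite_key(device_id: str, ts: datetime) -> str:
    """Device ID plus year-month, e.g. 'well-0042-2019-06'."""
    return f"{device_id}-{ts:%Y-%m}"


def load_per_partition(keys: Iterable[str]) -> Counter:
    """Count how many writes each partition receives."""
    return Counter(partition_for(k) for k in keys)


def hottest_share(load: Counter) -> float:
    """Fraction of all writes handled by the busiest partition."""
    return max(load.values()) / sum(load.values())


if __name__ == "__main__":
    now = datetime(2019, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    devices = [f"well-{i:04d}" for i in range(1000)]

    # One reading per device, all arriving in the same second.
    by_time = load_per_partition(now.isoformat() for _ in devices)
    by_device = load_per_partition(devices)
    by_composite = load_per_partition(composite_key(d, now) for d in devices)

    print("current time:", hottest_share(by_time))  # 1.0: one hot partition
    print("device ID:   ", round(hottest_share(by_device), 3))
    print("composite:   ", round(hottest_share(by_composite), 3))
```

With the current time as the key, every simultaneous write hashes to the same value and therefore the same partition; with device IDs, the writes spread across all partitions, which is the "workload grows as the devices grow" property described in the episode.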