IoT Solutions and Azure Cosmos DB

Captions

>> Today on the IoT Show, Andrew Liu will tell us everything we need to know about Cosmos DB and how to use it in an IoT application. >> This is going to be a really great session. I'm really looking forward to having you attend. [MUSIC] >> Hi, everyone, you're watching the IoT Show. I'm Olivier, your host. Thanks for coming back. Today we have Andrew from the Cosmos DB team. >> Hi, everyone. >> Who is here to tell us everything about Cosmos DB and why you should care as an IoT developer. Andrew, thanks for coming to the show. >> Yeah, it's awesome to be here. >> Yeah. >> Thank you so much. >> So can you tell us a bit about yourself? Who are you? What is your team doing at Microsoft? >> Sure. I'm a PM on the Azure Cosmos DB team. I've been on the team for about the last 5-6 years now, and I've been working on everything from programmability to launching our service from preview to GA to V2. >> Awesome. We were discussing just before we started recording that your audience, your customer, is the developer using Cosmos DB, right? >> Yes. >> Okay. So you're going to give us a crash course on what Cosmos DB is. We are IoT guys. >> Yeah. >> So we really don't know much about what is going on, and there are so many storage solutions for different reasons on Azure that we really want to understand: what is Cosmos DB? When should we use Cosmos DB, especially in the context of IoT scenarios? >> Oh yeah, absolutely. So yeah, let's dive right in. So the way I like to think about this is we have this house diagram, and it's like laying out the foundation and then looking at it as a layer cake. But in a nutshell, Cosmos DB is a distributed database. What I mean by distributed database is it runs on a cluster of machines as opposed to a single machine. >> Okay.
>> So that's going to be a core concept that you see: scale-out. Really, in a nutshell, we scale out on two dimensions. We both shard and partition the database, which gives us the ability to ingest really large amounts of data as well as a really high rate of data. We also replicate the data, which gives us a lot of redundancy, so that if a machine dies, the system fails over seamlessly and is highly available. We also see people use the replication to bring CDN-like capabilities to a database where, rather than relying on caching, we're doing active/active live replication across regions; we can even turn on multi-master. >> Okay. >> So beyond availability, people are doing this from a really low-latency perspective. So this fleshes out the bottom core foundation. I like to make an analogy: think of Cosmos DB almost like you think of .NET. .NET's got a backend. >> Yeah. >> That backend is going to run on some bytecode, it's got a runtime, and I can expose it through various different ways of programming against it, whether it's C# or F#. >> Okay. >> Same thing in Cosmos DB: we have a backend that uses the replication for turnkey global distribution, and the partitioning for elastic scale-out of storage and throughput. >> Okay. >> We do this as a fully managed service with our own resource governance to give you predictable low latency. From a replication standpoint, one of the things that really makes the global distribution work is the fact that we can do this in a multi-master way with consistency guarantees. Then finally, as a fully managed service, we have a bunch of comprehensive SLAs, and that'll actually lead into why people are using this. But think of this as the backend. Then we have a frontend that allows us to understand different protocols, ranging from our own SQL API to open-source APIs like the MongoDB, Cassandra, and Gremlin APIs.
We can target existing Mongo and Cassandra applications to Cosmos DB without really having to rewrite any code. >> Okay. So ingest lots of data super fast and [inaudible] the data. Those resonate with me as an IoT developer, right? >> Yeah. So we tend to see this in IoT in two core scenarios. One is device telemetry. >> Yeah. >> We all know this from a big data perspective. People like to joke, like, oh, Facebook or Twitter has so many users, but honestly, I'm not that active on social media. I might tweet a handful of times a day. >> Yeah. This IoT Show episode, right? >> Oh, yeah. I'll definitely be tweeting about this one. >> You'd better. >> I mean, looking at our devices, they're going: here's my state, here's my state, here's my state, here's my state. In 10 seconds, a device has produced more data points than I can as a human all day. >> Yeah. >> And you have fleets of tens of thousands of these. So for example, one of our users is ExxonMobil. >> Yeah. >> They're tracking 50,000 different oil wells all around the US, making sure that the oil wells are healthy. They're tracking the flow rate, the temperature, making sure everything is healthy. You basically have this problem: I have lots and lots of writes per second coming in. >> Yeah. >> There are several ways of solving that. Historically, if your database couldn't cope with that, what you would do is put in a stream processor, take many data points, and aggregate them into one data point, right? >> Yeah. >> But the challenge with that is the fidelity of the data. What happens if something abnormal happens? >> Yeah. >> Like the max or the average temperature goes up and you're wondering, where's the raw telemetry? Can I actually get a fast, indexed data store that can ingest this semi-structured sensor data at a high rate and provide queries? So this is why we see device telemetry land on Cosmos DB a lot. It's really the partitioning and automatic indexing that make it pop.
>> Yeah. >> The other side of it is we see it a lot in device catalogs, where people are using our replication. >> Okay. >> If you look at cloud services, we typically look at availability by measuring the number of nines. So some data stores have three nines, some have four nines, some have five nines. >> Yeah. >> For Cosmos DB, we're actually at the very high end of that, the five nines. >> Okay. Nice. >> So, for example, we have big automotive manufacturers where, on an assembly line, they have little robots running around and they're driving commands to each one of these robots. To flush that state, what they wanted was something that was extremely highly available, so that the assembly line never stops. From that catalog perspective, we typically see Cosmos DB used mostly for its extremely high availability. >> That makes sense. From the IoT developer's perspective, Cosmos DB will actually be sitting in the backend. You would ingest the data from the devices through IoT Hub, then you would transform the data using functions, some analytics, and then put that into Cosmos DB, right? >> Right. So the example here on the architecture diagram is actually from ExxonMobil. >> Okay. >> Their oil wells, their gas compressors, their pipelines, everything comes into Azure IoT Hub. Typically, you might position a hot store and a cold store. That way you get a blend of both, with fast interactive queries on the data that's very hot, and Cosmos DB is typically the serving layer there. >> Got it. >> Likewise, if you're using something like Azure Time Series Insights, I mean, this is more for interactive use when you have a small dev team doing things like reporting and BI on it, but not really driving a programmatic REST API to serve your mobile and web app. >> Okay. Got it. Yeah, so multiple services, multiple usages, and Cosmos DB is definitely there for the hot path, with heavy analytics happening behind it, right? >> Yes. >> Okay.
Love it. >> So in terms of what every developer needs to know coming into Cosmos DB, what I want to do is separate out the things that are must-know from the things that are nice-to-know. >> Okay. >> Must-know are things like partitioning and request units, and, in the context of IoT, TTL and change feed. I'll explain what these are. Nice-to-know is things like, for example, how you tune the index policy, because ultimately you can change the index policy after the fact and it's very easy to do so, but with some of these things, like partitioning, if you get that wrong early on, it's a bit harder to change. So what I'll do is jump into an actual demo, show you what Cosmos DB looks like, and talk through some of these concepts. >> Let's go. >> So here I'm in the Azure portal. I'm on my own Cosmos DB account, and whenever you create a Cosmos DB account this is the first thing you'll see in portal.azure.com. >> Okay. >> If I go to the Data Explorer, what I've done here is I've set up a device catalog and a device telemetry store. Just to show you what the system looks like, I'm ingesting a bunch of telemetry into my Cosmos DB. >> Okay. >> What I've done is I've set up a VM to just pound the database. What it does is spin up a bunch of threads, and each one of these threads runs in a tight loop and just pounds the database, writing lots and lots of data to it. >> As the PM, you're doing what the devs don't like, which is actually using the service and trying to hurt it. >> So let's go and run this basic code real quick. Initially, what I have here is the telemetry store set up at 10,000 RUs. RUs: just think of these as our compute unit. As we scale out, we can scale out to an infinite number of RUs to get you as much write ingest as you need. So this is going to be the currency in which you think, both from a performance, especially a throughput, perspective as well as a cost perspective. >> Okay. >> Now, this is really good to understand.
One of the neat things about Cosmos DB is, let's say I have this system where I'm doing about 1,000 writes per second, and it uses about 10,000 RUs per second right now. What if I find that, hey, the business needs to go and deploy 2X more sensors or 5X more sensors? One of the cool things about Cosmos DB is we can actually scale this out without any downtime. That's, I think, one of the things that's a little bit unique to this, and one of the things you'll want to think about is that, in terms of using RUs, you want to do it dynamically. So for example, if you're doing batch jobs with Apache Spark on top of Cosmos DB, at the beginning of the batch job you might ramp up the RUs, do the batch job, then at the end of the job ramp down. So now that we've scaled this up, I believe it was 5X to 50,000 RUs, what you'll see is this little app here, I think the app is actually the bottleneck here, but it's going to ramp up and converge up to 50,000 RUs as it spools up. >> Okay. >> Just like that. I mean, this is how scaling works in Cosmos DB. >> So you've done that manually. Is there a way to automate that scaling? >> Yes, you can do it through our programmatic APIs. In fact, if you're doing batch jobs, I would have a setup and a cleanup step that call our APIs to do this programmatically. >> Okay, because I can see that. Once again, with Azure storage, we were talking about scenarios like: I have so many devices, and then I know I'm going to have a specific event and suddenly I'm going to have a huge number of devices connecting or more data incoming, in which case you will have to have that scaling happen. Scaling is also about optimizing cost, right? >> Yes, it's all about optimizing cost and performance; both are deeply tied. Basically, I just think of it as how many compute units you are buying. The more compute units you buy, the more throughput you'll get, but the more the solution will cost as well. >> Obviously. >> So deeply tied concepts.
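As a rough illustration of the RU arithmetic in the demo (about 1,000 writes per second consuming about 10,000 RUs per second, then a 5X scale-up), here is a minimal Python sketch. The function names, the rounding step of 100 RUs, and the `set_throughput` callable are assumptions made for illustration; they are not part of any Cosmos DB SDK, and a real application would wrap whatever throughput-update call its SDK provides.

```python
import math
from contextlib import contextmanager


def rus_per_write(observed_rus_per_sec, observed_writes_per_sec):
    """Average RU cost of one write, derived from observed numbers."""
    return observed_rus_per_sec / observed_writes_per_sec


def required_rus(target_writes_per_sec, ru_per_write, step=100):
    """RUs per second needed for a target write rate.

    Rounds up to a multiple of `step` (assumed provisioning increment).
    """
    raw = target_writes_per_sec * ru_per_write
    return math.ceil(raw / step) * step


@contextmanager
def temporary_throughput(set_throughput, batch_rus, steady_rus):
    """Setup/cleanup pattern for batch jobs: ramp up, run, ramp down.

    `set_throughput` is a hypothetical callable supplied by the caller.
    """
    set_throughput(batch_rus)
    try:
        yield
    finally:
        # Always ramp back down, even if the batch job fails.
        set_throughput(steady_rus)


# Numbers from the demo: ~1,000 writes/s at ~10,000 RU/s.
cost = rus_per_write(10_000, 1_000)
steady = required_rus(1_000, cost)
scaled = required_rus(5_000, cost)  # 5X more sensors

# Record throughput changes instead of calling a real service.
calls = []
with temporary_throughput(calls.append, scaled, steady):
    pass  # the batch job would run here

print(cost, steady, scaled, calls)
```

The `finally` clause matters for cost: if the batch job raises, the throughput still returns to the steady-state value rather than staying at the higher, more expensive setting.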
So what's neat about this is, if I flip over to the Cosmos DB collection, not only are we ingesting things at a very high rate; let's say you have a gen one device and you're about to do your gen two device. Historically, if you've started your own data store, you're thinking about how to do schema management, but in this case, software moves a little bit faster than hardware. You're probably not going to go back to all your gen one devices and add that additional telemetry. So let's say here that I'm going to add another field; in this case I might add, let's say, a flow rate to this telemetry. Let's say, for this, I actually fill out a value. This is just a demo data set. But I have a flow rate of, let's say, 50. So now that we've added a new field, we can use the automatic indexing capabilities and query this dynamically on the fly. So here, if I look at the message schema, look at its fields, and look at the flow rate, and look for something that has, let's say, a flow rate greater than 40. Right there, without any ALTER TABLE or CREATE INDEX, I can add properties on demand, query across them in real time, and do so with very little latency. >> I can tell you that as someone coming from the non-data world, the IoT world, I'm always scared when I start playing with databases and indexes and queries. That reassures me. >> Yeah. >> I like that. >> It makes it super easy to use. So let me pause this workload real quick, and I'm going to show you a second workload. So I'm going to show you a demo here: when we talk about queries, how fast are these queries? So what I am doing here is just a while-true loop. I'm going to run a query, set up a stopwatch, and look at just how fast our queries are in Cosmos DB.
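The two ideas in this part of the demo, querying a field that only newer devices send and timing a query in a loop with a stopwatch, can be sketched in plain Python. This is only an in-memory stand-in, not the demo's code: the documents, the `flowRate` field name, and the helper functions are hypothetical. Against Cosmos DB's SQL API, a comparable query might look like `SELECT * FROM c WHERE c.flowRate > 40`, assuming that field name.

```python
import time

# Hypothetical telemetry documents. The gen one device has no flow rate.
documents = [
    {"deviceId": "gen1-001", "temperature": 71.2},
    {"deviceId": "gen2-001", "temperature": 69.8, "flowRate": 50},
    {"deviceId": "gen2-002", "temperature": 70.4, "flowRate": 30},
]


def flow_rate_above(docs, threshold):
    """Return documents that have a flowRate above the threshold.

    Documents without the field are skipped, so no schema change is needed.
    """
    return [d for d in docs if "flowRate" in d and d["flowRate"] > threshold]


def time_query(run_query, iterations=5):
    """Run a query repeatedly and return each latency in milliseconds."""
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


matches = flow_rate_above(documents, 40)
latencies = time_query(lambda: flow_rate_above(documents, 40))

print(matches)
print([f"{ms:.3f} ms" for ms in latencies])
```

The timings printed here measure a local list scan, so they say nothing about Cosmos DB itself; the point is only the shape of the measurement loop.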
So if I set this up, you'll notice that once the client starts up and loads the topology, caches everything it needs to know to initialize and connect to the database, I can run live queries on it and consistently get under 10 milliseconds of latency. So when we're talking about real time, everyone likes to say real time, but it's all about how real time. Is it an hour, a minute, a second? In this case we're talking about milliseconds. >> In addition to the speed, real time is a notion that is dear to my heart, because I used to develop real-time embedded systems. It's not that much about the speed; it's about the notion that you can predict it's going to be less than a given bound. You can count on that service to give you that answer in less than 10 milliseconds. >> That's correct. >> You guys have SLAs in place as well. So that means that for the customers there is that reassurance that by using Cosmos DB you're going to get these answers that fast. >> That's correct. So what we see is scenarios like, for example, vehicle telematics, where we see our users building solutions like, hey, when an airbag gets deployed on a vehicle, immediately pick up the phone and call that driver: "Hey man, are you okay?" These are things that are very time sensitive. I don't want to call you an hour later, and that's the thing, right? If they answer, good, and then I can calm down. If they don't answer, then you can go and live query, "Hey, where are the GPS coordinates? Let me get emergency services on-site right there, right then." >> Definitely, hence the notion of the hot path in your lambda architecture. >> Absolutely. So, things you'll want to know in order to achieve this scale-out: Cosmos DB does a lot of the heavy lifting in terms of scaling out this workload across a set of machines, but the main thing is you're going to need to give Cosmos DB a logical hint about how it should distribute that workload, and a lot of people tend to get this part wrong.
When you create a new container or collection, it's going to ask you for this thing called a partition key, and that's what we're going to partition that data on. Intuitively, some people might think, oh, over time I'll have a pretty good distribution of data, so they might choose current time. But what they miss is that at any given point in time all of your sensors have the exact same current time value. So this loads all of the requests onto a single partition; that partition is running blazing hot, and meanwhile all the other partitions are left idle. In IoT scenarios, typically, the natural key is going to be the device ID. Your workload is going to grow as the devices grow. This is going to be a much more natural starting point. So understand how to choose a good partition key. Sometimes, if you have devices that get really large, you might even form a composite key by concatenating the device ID with something else. Time might not be good in isolation, but it's actually a very good candidate for a composite key. So do the device ID plus the current year or current month. You might also want to tune the index policy later on. With indexing, what we do is automatically index any new property. If you have really large telemetry, like the sensor payload, and you tune this, you'll find that the RU cost for each one of the writes is going to be much more friendly. What's neat here is that instead of a whitelisting approach you can do a blacklisting approach. So with a blacklisting approach you might index slash star, so index everything by default, but exclude more specific paths. Maybe I know I'm never going to query off of this ETag or various other properties, so I can save some RUs by blacklisting a few fields here. >> Okay. >> Then another thing I want to show you is this Time to Live.
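The partition key advice above can be illustrated with a small simulation. This is a simplified sketch, not how Cosmos DB assigns partitions internally: the partition count, the SHA-256 hash, the device ID format, and the timestamp are all assumptions chosen for illustration. It compares where 1,000 simultaneous writes land when keyed by current time, by device ID, and by a composite of device ID plus month.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of physical partitions


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key value to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_by_partition(keys):
    """Count how many writes land on each partition."""
    return Counter(partition_for(k) for k in keys)


# 1,000 hypothetical devices all reporting at the same instant.
device_ids = [f"device-{i:05d}" for i in range(1000)]
now = "2019-11-18T10:15:00Z"

# Keyed by current time: every write carries the same key value.
time_keyed = load_by_partition(now for _ in device_ids)

# Keyed by device ID: writes spread across partitions.
device_keyed = load_by_partition(device_ids)

# Composite key: device ID plus the current month.
composite_keyed = load_by_partition(f"{d}-2019-11" for d in device_ids)

print("time key:     ", dict(time_keyed))
print("device key:   ", dict(sorted(device_keyed.items())))
print("composite key:", dict(sorted(composite_keyed.items())))
```

With the time key, all 1,000 writes hit one partition while the rest sit idle. With the device ID or the composite key, the same writes are spread across the partitions, and the composite key additionally keeps any single device's data from growing without bound under one key value.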
In IoT, if you're using a hot store, a lot of times maybe it's the last 30 days of telemetry, 90 days, or one year that you're really going to be querying off of. You can turn on Time to Live and set how long you want to keep data. So for example, if I want to do 30 days, I'd convert: 30 days times 24 hours in a day, 60 minutes in an hour, 60 seconds in a minute, and plug in how many seconds I want to keep that data for. This is actually one of the number one cost savings. In most telemetry stores you're paying the IOPS for both the write and the delete. >> Yes. >> In this case we actually do the TTL for free. >> Okay. >> So you only pay for the writes; deletes come for free. >> For free, got it. Yeah, so typically in the scenario where you have this IoT data, if you want to have this hot path with Cosmos DB, you'd also have to think: for the long-term storage, I'm going to use something cheap. >> Yeah. >> That's actually just going to dump things in there, and because that's going to be post mortem anyway, I don't need the super fast access, so I can definitely go with some cheap solutions. >> Yeah. So you do a hot store and a cold store. The TTL will give you a way of managing retention on the hot store. Today, we tend to see people use things like storage or data lake for the cold storage. What's cool, coming soon on our roadmap, is a Spark API with automatic archival to our cold stores. So we'll actually have this as an out-of-the-box feature. >> Okay. So in parallel, love it. >> Then the last thing to know is change feed. If you want to do incremental sourcing of the events, let's say the telemetry coming in, we bring message-queue-like semantics to the data store. It's not a message queue. It's still a database. We persist everything, we index everything, but you can get fast incremental reads.
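The Time to Live conversion described earlier in this passage is simple arithmetic, since the value is entered in seconds. A minimal sketch follows; the helper name and the container id `telemetry` are hypothetical, and the `defaultTtl` property name reflects my understanding of the container setting rather than anything shown in the demo.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def ttl_seconds(days):
    """Convert a retention period in days to a TTL value in seconds."""
    return days * HOURS_PER_DAY * MINUTES_PER_HOUR * SECONDS_PER_MINUTE


# Retention periods mentioned in the episode: 30 days, 90 days, one year.
ttl_30_days = ttl_seconds(30)
ttl_90_days = ttl_seconds(90)
ttl_one_year = ttl_seconds(365)

# Hypothetical container settings using the 30-day retention.
container_settings = {"id": "telemetry", "defaultTtl": ttl_30_days}

print(ttl_30_days, ttl_90_days, ttl_one_year)
print(container_settings)
```

For 30 days this works out to 2,592,000 seconds, which is the number to plug into the Time to Live setting for a 30-day hot store.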
Typically the advantage of using this approach is, if you need a single source of truth for eventing off of that, you want to avoid the consistency problem of, "Hey, I successfully wrote to Datastore 1 but not Message Queue 2, or vice versa." This allows you to get a single source of truth on all of this. So these are the things I think everyone should know. >> I think you covered all of them, loved it. >> So let's talk about how to get you started. To ease you into this, we've built an IoT solution based on how a lot of our users are using the system. You have the ingest piece in the top left corner, where you're coming through IoT Hub and shoving the data into Cosmos DB. We're showing you how to do the hot store, cold store pattern and run an Apache Spark job to do things like predictive maintenance on the bottom left, and shove that into something that can do real-time scoring in AKS, with a web app at the top powered off of it. We also show you how to do that event sourcing in the bottom right, so if you want to do some stream processing with Stream Analytics and a Power BI dashboard, or send e-mail alerts for things like, "Hey, the temperature is off," we can use Logic Apps for that. >> Love it. >> So go to aka.ms/CosmosDBlabs, where was the URL again, CosmosDB-IoT-Lab. >> Okay. >> What you'll see is we have a GitHub repository. If you go to the IoT lab, we actually show you step by step how to build the solution. >> Awesome. >> Yeah, like, what are you waiting for? >> Andrew, that was a very fast dump of a lot of information, but very valuable for our IoT developers out there. So bring back the link real quick. >> Sure. >> So that's aka.ms/CosmosDB-IoT-Lab. That's your next step to learn even more about Cosmos DB. Andrew, thanks a lot for joining the IoT Show. >> Awesome. Thank you for having me. >> Hope to see you again. >> Thank you for watching. [MUSIC]

Info

Channel: Microsoft Developer

Views: 3,302

Keywords: Microsoft, Developer, Azure, Cosmos DB, Cloud storage, IoT, internet of things, azure iot, iot cosmos db, iot solutions

Id: QKdonfbV_0w

Length: 20min 22sec (1222 seconds)

Published: Mon Nov 18 2019