AWS re:Invent 2018: High Performance Data Streaming with Amazon Kinesis: Best Practices (ANT322-R1)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
all right well good afternoon Seifer noon we're going to talk about streaming data so my name is Alan Mukesh I am a Solutions Architect here at AWS and I work primarily in our streaming data space which primarily is Kinesis I work with a lot of customers who do other streaming data technologies as well such as kafka but for the purposes of today it's gonna be focused on on Kinesis with me is gabriel como from Comcast and he is going to talk to you all about how they have leveraged high-speed performance dating 4/4 performance data at Comcast and some of the learnings that they found in the in the platform that he's created so I'm gonna jump in some of the details and then we'll invite Gabriel to come up here in a few minutes to jump into the Comcast details so this is a quick view of what we'll be talking about in the next 45 to 50 minutes we'll hopefully have some time at the end for Q&A so let me get there there's a couple of mics feel free to get behind the mics and ask some questions of us but what we'll do is a streaming data overview I want to make sure just we're all on the same page for what we're talking about with regards to streaming data we'll do a very quick interactive demo we're gonna do that here at the beginning just to kind of highlight what it is that we're talking about with streaming data then we'll do it a very quick introduced introduction to Amazon Kinesis and the four services that make up the Kinesis set of services and then we'll start talking about consuming from Kinesis streams because there's two different ways you can do this and each them have their pros and cons so I want to make sure we jump into that and you understand which model might be best for you and then Chris will come up I'm sorry Gabriel will come up and he'll talk about Comcast and what they have done in the headwaters platform he's generally built a platform for streaming data across all of Comcast and he will dive into the details of considerations for scaling your data streams and then the impact that enhanced fan-out which is one of the two different consumer choices that you get when you use Kinesis stream what impacts that had on their architecture all right let's talk about the value of data and this is this is why we stream data but let's just quickly talk about the value of data and how that data loses value quickly over time on this chart and this comes from a paper produced by Forster the author's name is Mike gallery it's a good paper called perishable insights where he talks about the insights into your data is perishable meaning over time it decays or the value of what you can get out of that data diminishes over time so in this chart we look at on the left side the things called time critical decisions those blocks time critical decisions typically happen in real time so you know milliseconds after the data was created or in seconds or minutes this is all going to be very dependent on your application but in some cases seconds or minutes is fast enough for your use case but can still be considered a time critical data piece traditional batch processes some of the historical data that you might use there might have they generally lose value over time but still have a lot of value just not valued that you can generally use to make a time critical decision a very simple example and probably some of you have experienced this since you got to Las Vegas you swipe your card somewhere and you get almost an immediate text message or notification from your credit card provider right they've got some kind of real time analysis happening on all the transactions in their cards and they're looking for fraudulent activity they might say oh suddenly you're in Las Vegas and your card has been swiped at this ATM to get out a bunch of cash or you use it at this restaurant or whatever it might be and they've alerted you and you can reply yes it's good or you know that wasn't me so that's like a real-time decision that they want to know about because if it wasn't you obviously they want to take care of it if it was you then all is good but they still do batch processes right they still send you an invoice at the end of the month and you still have to pay whatever it was you charged on your car now they could have waited until the end of the month to say out of these hundred transactions three of them look anomalous and you can then kind of go through all the different transactions and decide which ones are good or bad but that's not good for them not good for you so it's much more important for them to make that time critical decision on that data as soon as it gets created so how is this happening through streaming those transactions get streamed streaming means ingesting the data as soon as it's get soon as it gets generated processing that data on the fly and then doing some kind of analysis doing anomaly detection on it some kind of machine learning doing alerts or actions as soon as you learn something from that data so this whole thing is what we're calling streaming data and there's different technologies to enable this but the concept is streaming data who can remember the day where you would probably put data into a database from your application and then you didn't do any learnings from it until a batch process ran that night or a batch process ran at the end of the week to move that data into some warehouse where you could then go run reports I'm pretty sure probably probably pretty much everybody could put their hand up to say yeah you used to do that or you still do it because you know like I said there's still plenty of use cases where that applies but to get those real-time insights it's important to ingest that data as soon as it gets generated and then analyze it immediately so common use cases of streaming these are just a few and I'm not going to talk about them all in detail but the two that we see very very popular our log analytics feeding into data lakes and then I'll say IOT analytics but I'm going to generalize IOT analytics it's not quite I or T devices like not doesn't necessarily have to be temperature sensors or pressure sensors but I will call an IOT device just a connected device like your phone a game that's running on your phone these are connected devices that are always sending data so between those three on the right they generate a ton of streaming real-time data that our customers are using to get real-time insights out of them and then the industrial automation space we have plenty of customers in the factories who are using streaming data to get insights into what's happening across their operations in near real-time imagine they want to know if one of their lines is down or is taking longer to process some product than it typically does they want to know that right away you have to go in and fix it so that's a good example there so how do we do this at Amazon we have four different services for streaming data this is probably old news to do maybe quickshow hands who are already using a Kinesis service in production today so it's about maybe a third of the room but that's 3 or 2 1/2 of the room so very good so I'm not going to get into then a ton of details about what these services do but I'm going to tell you some of the differences between and why you might choose to use one over the other I'll start on the left Amazon Kinesis video streams this is a relatively new service we've launched it at reinvent last year but it enables you to ingest and process binary encoded time series data so that could be video audio lidar radar it's intended we call the video because video is kind of the number one use case but there's other binary formats that can easily be ingested into video streams the other three services and what we're going to focus on mostly today is our data set of services so these are the types of services that are just receiving either unencoded or encoded data that's coming off of your systems could be log data it could be application event data being sent into Kinesis so you can get some of that real-time analysis that we just discussed so the three services on the right data streams data firehose and data analytics can all work together to get you those insights Kinesis data streams to compare and contrast it with firehose I get that question a lot Kinesis data streams I'll tell you the biggest question that you can ask yourself to determine should I use streams or should I use firehose is this Kinesis data streams what requires you to typically write custom code to get data from the stream so you have a data producer that puts data into the stream it's going to sit there in a buffer by default 24 hours you can extend it up to seven days and then your consumers one or more are going to take the data off the stream and do something with it so in that context the stream is just a buffer and you have to write the custom code to take the data off the stream and apply your business logic to it the stream doesn't care if the data was used or not by your consumer once that window expires of 24 hours which is the default the data's gone from the buffer so it's a little bit different than a queue like with a queue if you're using something sqs you can pop a message off the queue and it's no longer going to be consumed by any other queue consumers with a stream it's going to sit there until it expires now with case of streams you're also writing your own code to take the data off of the stream and that contrasts a little bit with firehose firehose yes the the front end how you get the data into the firehose is very similar it's going to be an agent perhaps that's running on your localhost that's taking log files and streaming the logs into the firehose or you're using the SDK to write messages directly into firehose but the biggest difference is firehose has a managed consumer to take the data off of the streams and put it into one of four different destinations you can put it into s3 put it into Splunk put it into redshift or put it into Amazon Elastic search service so when you ask yourself I want to stream data I don't really want to do much with it at least not yet I just want to put it into s3 so I have a durable store really relatively inexpensive spot to put my streaming data firehose firehose by far and away is going to be the easiest way to do it and it's probably should be the first consideration you make there's a lot more complexities there you can do things like data transformation with firehose using lambda so there's a lot of more complexities that I'm kind of glossing over but you know if you're asking yourself between streams and firehose and you ultimately say I really just want to stream the data and put it into elasticsearch so I can do log analytics and elasticsearch and i don't really have to transform my data in the middle consider firehose the last service on your right Kinesis data analytics this service gives you the capability to do real-time analysis on that data before it even persists in any destination so it's basically an application that can listen to the streaming data get some insights that you have to code you have to tell it what insights you want so we have two versions now Kinesis data analytics one is sequel base so you can write some sequel code to do things like tell me how many times the this event has occurred in the past two hours I'll give an example we work with an e-commerce customer not not Amazon a different one and they use Kinesis data analytics to find out what the top-selling products are in a 30 minute window on their website so they stream all the order data through Kinesis streams they run Kinesis data analytics which inspects that streaming data and they run an aggregation every 30 minutes and then they say that out to a dashboard so their business users can go in and look at the most popular ordered products over those windows and then they can make decisions about how they want to increase promotions on the website for other products that they want to perhaps get higher in the sales rank so that's just a quick overview of those three services and we'll move now into a bit more detail after we do a very fast demo to kind of put it all into perspective ok so I need you all this is an interactive demo I need your help I want you to pull out your phones and go to that URL don't do it on your computer you could it's just not going to be as valuable please use your phone I'm leaving it up for one second give you guys a chance to pull it up what you should see is a quadrant or the number of boxes or sorry quadrant with a ball you can move the ball around and I have a window somewhere there we go okay so this is all happening in real time this is a real demo I'd like everybody to try and move the ball into quadrant a move the ball into quadrant a and then hold it there and we should see within three to four seconds this should all show up in a yeah very cool we're also capturing some other numbers so about five most five hundred yeah over five hundred of you now have participated that's unique on the top and then it's numbered the raw data about how many art in each quadrant I said a but we have a couple of a couple of non conformists that's okay so that's all go to C let's all move it to see hold it there should be three or four seconds we should jump it there we go come on there we go okay so pretty cool so this example just highlight in kind this could be anything right this could be your business KPIs this could be log data this could be errors that are streaming from your application error logs but you can take that data use Kinesis analytics use Kinesis to ingest it get very real insights out of it and put this on a dashboard for your business users of course per one second might not be necessary for some of those types of things I just mentioned but you could of course change the window to be every you know every every few minutes every hour this is just a good one to show you for how fast we can do it last thing we're also capturing device data so iOS to Android not uncommon slightly higher in the iOS no Windows Phone users there is a point funny point on that because I did this at a previous talk not here at read invent but it was at a different session and it did show up there was a Windows Phone and I asked the person in the room to put up their hand to volunteer but they did not alright let's go back to here all right cool alright I'll figure this out here there we go okay this is the architecture of what you guys just did I don't have a whole lot of time to spend on this we have to kind of jump through and get to the next section but the nice point about that whole demo is it was all swivel 'us so there's no ec2 instances and mixed nothing to manage on the backend for servers there was all just Kinesis and lambda and DynamoDB and of course some static content in s3 for the JavaScript and HTML so a very simple application and the fact it can even be improved because since this demo was built there have been changes to the Kinesis analytics product I don't know if I have a laser pointer that will actually work anyway you can see here after the Kinesis analytics application and before the lambda function there's this extra stream this Kinesis aggregate stream that can be removed now in fact there's a feature that's been released that allows Kinesis analytics to invoke a lambda function with the results of its analysis so I could take that piece out and it's even a simpler architecture but what we're really here to talk about today is high performance data streaming and how you can best do that on Kinesis particularly with Kinesis data streams before we do that we talked we want to talk a little bit about producers and consumers because we're primarily going to discuss consumers today but I don't want to make sure we're all on the same page there's plenty of ways to produce data to Kinesis plenty of ways to consume I am definitely not going to go through each of these the point I want to make on the consumer side of it is the Kinesis client library and AWS lambda those are the two that can most benefit from some of the features that we're about to discuss the other ones that you see here what we're looking at here like Apache flank Apache storm pachi spark they can also benefit from some of the new features but it will require the light the open source libraries to use some of the new API is that we have created that I don't believe they have yet so talking about consumers I have a clicker here I should use it standard consumers this is what the vast majority of stream consumers on AWS are using right now and that's because it's been the only way to do it up until about August so what is a standard consumer actually before I get a jump into that take a look at the the the kind of a white rectangle in the middle that is the representation of a stream that is made up of shards if you're not familiar with shards a shard is essentially a unit of scale in a stream it can accept 1 Meg per second or a thousand records per second so when you do your analysis of a Kinesis stream and you have to determine how many shards are sorry how much data you want to ingest into that stream that determines how many shards you'll have to put into that stream so if your use case says that you're going to stream about 4,000 records per second you will need 4 shards in your stream so a shard is a unit of scale and a stream a consumer application has multiple threads each thread gets data from a shard via a get records call this is largely abstracted from you if you're using one of our libraries called the KCl the Kinesis client library or if you're using lambda to consume the get records operation is abstracted by those libraries or the service but it's important to understand what's going on so that when we if you consider the the KCl 2.0 which is the next version or the newest version of lambda so you can use enhanced fan out of Kinesis which gives you that lower latency so this is what's going on ket records is made and data is returned come on my clicker might be dead oh you have to press it a whole bunch times okay so the get record the get record operation can only be made five times per second on a single shard that's a limitation in the service and it's not something that can be changed and data can be returned at a rate of two megabytes per second per shard also that's a limit from the service not something that can be changed so with that the fastest that your consumer can get data from the stream is every 200 milliseconds this is a polling operation that your consumer application has to implement or the KCl will do it for you on your behalf but get records every 200 milliseconds because you have to stay below that five requests per second for the get records call if you go beyond that if you lowered it to one every 100 millisecond of a polling window that's ten times per and five of those would get throttled so the service that Kinesis service would return an exception and your consumer would have to have to swallow that exception do a little bit of a back off and retry so you know 200 milliseconds is not bad in many cases that's good enough for real-time like I said earlier all depends on your you and your company's definition of real-time but it's not bad but what if we do this what if you have five consuming applications all of which have you know different business logic perhaps different teams in your organization that want to do something with the data now each of them can only ask for can only get 400 kilobytes of data each because it's a two Meg per second egress limitation on that shard and have five so that means they can only get 400 KB and they can only ask for data once per second because with five of them they don't want to go beyond the five requests per second so they can all only ask for data now once per second so now your latency is going to go up from the previous use case with one consumer it could ask every 200 milliseconds these can now only pull every single second so your latency for how quickly your consumer can respond to data in the stream it goes up so now with enhanced fan-out we've changed that model quite considerably this is a new feature like I said launched in August and we've introduced some new API to take care of a lot of the issues that we just described so the few new API is the first point to make it's the sub title on here as consumers no longer pull messages are pushed to your consumers as they arrive so the server will push them over a long-running HTTP to request that has been made by the consumer over a subscribe to shard API call so the consumer initiates subscribe to shard that will remain open for up to five minutes it uses HTTP 2 so how many people here are familiar with WebSockets a lot of you that's great so this I mean this is not WebSockets but if you know what WebSockets does this is very very similar it's just managed by the protocol itself it's not a layer on top of HTTP 1 1 it's managed by the HTTP 2 protocol that allows the server in this case the Kinesis service to send data out of order to the client without having to be asked for each packet so come on there we go so this it's hard to see there's a little curly brace in between the data producer and the stream that's our piece of data that's coming in the shard will persist it and then it will send it to the consumer application right away I'm gonna have to go back to the keyboard and then it will keep doing it over and over and over again for as long as data is being sent to this to the stream so that means your data is now your story your consumer is now getting that data very very frequently on the order of 50 to 70 milliseconds active data was written was being delivered to your consumer contrast that with the best case and the get records operation was about 2 to 300 milliseconds we're now down to about 50 milliseconds latency that's much better that's only with one consumer using the get Records model when you had those five consumers using the get Records model right you're talking about one one request per second so much longer Layton sees the other big benefit is that each consumer gets its own dedicated two megabytes per second of egress so each consumer has to say register stream we create internally this pipe the fo pipe that gives us its own dedicated throughput and so now when it comes back and calls subscribe to shard we give it its own two Meg's per second egress limit just for that one consumer and so it's no longer having to contend for that two Meg's per second per shard limitation that previously existed with the old API I can now come in and add more and more consumers the data will be pushed to the consumers and they can get the full throughput from the stream to the consumer so no more contention for two Meg's per second no more contention for five reads per second so it really enables you to add more and more consumers to a stream and not have to worry about contending for resources where you did have to definitely consider that in the older model we're doing get records so when should you use standard consumers that was the first one that's calling get records if your total number of consumers is low and your consumers aren't particularly latency sensitive then consider using the older model our library the KCl we have two versions one point eight point X and 2.0 point x one point eight is the older one we've upgraded to two I would recommend you all use 2.0 because it supports both you just have to implement different interfaces for the consumption model that you want to implement but if your consumers aren't latency sensitive and you have a low number of consumers consider using the standard consumers and it's also going to be cheaper there are cost dimensions with the enhanced fan-out feature so we will charge you for those pipes these efo pipes on the previous slide and there's a per gigabyte data egress now in that model so it's cheaper to use the old consumers which is why if you have very few consumers with low latency requirements use the old way when do use enhanced fan-out multiple consumers let's say more than three and you have very low latency requirements for that if you have low latency requirements less than 70 milliseconds then enhanced fan out is how you're gonna get there and that does it for me so I'm gonna hand it over to Gabriel now and he's going to take you through what they did at Comcast sorry my apologies is my screen up here isn't different there we go thank you all right so 100% of the Canisius users that I have worked with so far or either overpaying for the kinases streams or facing technical difficulties because there's trim is under scaled so how do you scale a kinases stream and how does the new enhance fan-out consumer changes the pictures hi my name is Gabriel como and I lead streaming data platform at Comcast I'm really excited to be here I hope you guys are enjoying the AWS reinvent 2018 as much as I am and so I will split the rest of this session into three parts first I'll give you some background and some context about the streaming data platform that we've built at Comcast then we'll see five considerations to think about when considering scaling in kinesis dream and finally we'll see how the consumer fan-out the new feature changes this picture so before we start I've got a good news and bad news the bad news is that I left a lot of math on the slides the good news is that I will not be boring you through those instead I will focus on the principles and the logic to arrive to those formulas and I left them so that if you want to dig further into this topic after the talk you have the material to do so the second good news is that we actually implemented those formulas into a calculator that were recently open sourced and I'll come back to that towards the end of the demo of the session sorry all the time we came to call the concept of managing the stream data essentially a streaming platform Jai Creve J Krebs wrote that in a blog article in 2015 actually that first link that you see on the side and Joe Krebs and his team are the one who built Apache Kafka while they were working I at LinkedIn and I really like this quote because it marks to me a transition in the industry were streaming really to goth it's it's to me a paradigm shift because data is not a table in some database anymore data set is alive and changing and it has a whole new dimension when you consider streaming so I added two articles that are excellent nicing fundamental for an engineer working on on streaming I would highly recommend you to go through those if you haven't read them already so we called our streaming data platform at Comcast headwaters and there are three objectives that we try to accomplish the first one is that we want to decouple the data producer from its consumers so the teams that writes the applications that interact with our platform or from different divisions within Comcast but CB even from different companies and therefore it's important to sorry so because they are from different divisions and companies they have different objectives and different budgets so it's really important to us to have as as loose as a coupling as we can between those various applications also the platform acts as a buffer in case one of those application isn't available for some time it doesn't affect the other application or at least as little as possible and also it formalizes the data exchanged between those applications so not only from a technical point of view in terms of API and data sterilization but also in terms of processes such as data governance and data security the second group of objective is to assist the data team which their streaming needs so we help them scale the kinases or the data stream we help them manage the data retention and onboard consumers finally and really importantly we want to foster real-time data exchange within the company that's really important to me because I believe that data is one of the most valuable asset that a modern company can have so we want to lower the bar of entry as much as possible so that teams can share the data that they have so to that end we manage the metadata about the stream to answer questions such as who when how and also the the data set whether structured or unstructured or associated with a schema and that schema evolves over time so we manage that so that the consumer team can make sense of the bytes that flows through our platform finally we implemented what I call a fair cost model in which the teams that benefit from the data are the one that financed the infrastructure to power those those data exchange in proportion to what they use I don't have time to dig further into this but this is a fascinating topic actually to give you an idea of the scale of our platform we have hundreds of data streams and combine to generate millions of of events per second or gigabytes per second so this is a high-level overview of the architecture of headwaters it's composed of three main groups of component the first one on the upper left is the CICU pipeline created a that uses the AWS services that you typically use to build a cloud native application it creates the headwaters control plane which is essentially based on that pattern with API gateway Linda and dynamodb we use other services where it makes sense such as SSM or is our SES and also for troubleshooting and monitoring we use GLAAD trail cloud watch and x-ray now this control plan is the the one that creates the datastream so obviously the data streams are mostly focused around kinases but also we create an SNS topic that we associate with each one of the data stream so that the stakeholders of the stream have a communication channel to communicate about that particular stream so the stakeholders of the stream are us the platform the data producer and the data consumers so for example if we detect that there's no data in coming into the particular stream we may raise an alert on that topic or if the data producer is going is planning to perform some maintenance on the the producing application that may impact the availability of the data in a that team may use that topic to communicate that so in order to understand better how our platform works this is a flow diagram of how a data producer is going to create a stream so first the data producer requests the creation of the stream to Arlanda function we add way the nanda function is going to create the appropriate resources the kinases stream and the SNS topic and also is going to register the user to that SNS topic then it's going to use I am to grant the appropriate permission to the user and finally we will send a welcome email with further instructions on how to proceed at this point the data producer can write data to the kinases stream can monitor the streams using cat watch metrics and communicate with the other stakeholders so now let's see five considerations that I think you should keep in mind when scaling a kinases stream number one the data producer limit so and I just went through that do you remember what those two limits there are two limits hard limits that you cannot change about the data producer do you remember what those are the two limits yeah so the one megabyte per second per shard and 1,000 put requests per second Shawn so let's see how this translates into a real-life scenario a real-life use case Comcast users can watch TV live TV or on-demand movies through the Internet that's video over IP and there's a number of devices that they use to do that whether mobile devices such as iOS Android desktop application or IP set-top boxes all those devices reports analytics data back to our platform in a single stream and this is one of the most valuable streams that we have within Comcast there are literally a dozen different teams that are consuming from this data set and the range from business intelligence group wanted to report usage of the of the service to operations team who needs to monitor specific part of the system to data scientists to data engineer who may process the data as it streamed through so they may filter unread or session eyes the data and they usually send the data back onto our platform so this is what the actual usage of the stream looks like you see the bandwidth or throughput over four days so as you may expect you have the the usage going up and down as people our users watch more TV throughout the day the users increase and it Peaks around dinnertime and a little bit after that and then falls back down when users go to sleep so with a little bit of history you can easily compute the average and the maximum vent weighs what I would like to draw your attention to or those many ways search that you see and those happens because we have hundreds of cells and millions of devices reporting analytics datum and they may react to a particular event and generate a surge of traffic like that for example an es alert would trigger such an event or if you have popular shows say the the game of in the latest episode of Game of Thrones or the Superbowl might have search of traffic like this so when you think about the 1 megabyte per second per short limit you have a choice here what Ben was are you going to use and this drives to what I call this back pressure scale where you have to use at the minimum the average pathways and you can use all the way to the maximum and ways if you use the maximum ven ways you'll see it's going to be a little more expensive because you have more shards but if you used average bent ways you'll have more latency in your data now dealing with back pressure is actually pretty difficult and expensive so you have to consider the whole picture when you're looking at the cost and so I would strongly advise you to look at the maximum bandwidth for the period of time that you're dealing with if you whether you whatever Ben was you use you need to deal with back pressure because you can always have a new usage that goes over your historical maximum so you have to deal with that if you don't guess what happens you get exceptions right and you're most likely to lose data so this drives us to the first slide with the formulas because you have those two limitations you have two formulas when you look at those two together you see that if your messages on average are smaller than when one kilobyte then the throughput limitation is what's going to drive the number of shots so what you do in that case is it's very well documented in the Kinesis documentation you do record aggregation essentially what you do is you batch records together before sending them to two kinases and kpl does that for you automatically so that will drive you down to the first formula where the the Ben ways is the limiting factor and is driving the number of shards so what you do in that case to lower the number of shots that you see online is to compress the data so the providential part about this particular slide is that as I was writing this very slide a few weeks ago I was contacted by a team within Comcast who had a problem they were seeing exceptions on their stream and they keep scaling up and scaling up their stream until they reach 700 shots and they were concerned about the price of running that stream so I looked at their stream and sure enough they were generating a lot of small messages so I gave them the two advices the two pieces of advice that I just gave you aggregate the record compress the data and they were able to drive down the the need for the number of shards from 700 to 80 and more importantly the cost of running their stream they saved about two hundred thousand dollars per year so I hope that those advice this advice is gonna be as beneficial to you as a West for them now consideration number two limit on the consumer side so again anin talked about that do you guys remember what those are the two limits right so the first one is that the consumers as a whole as a group are going to share two megabytes per second per Shawn and similarly to the producer this gives us this formula now the second limit is that the consumers again as a group may not consume may not request more than five get requests per second and so this is problematic because you cannot solve that particular limit by increasing the number of shots that's because it's it's related to the frequency at which you queried those charts no matter how many they are so you have two groups of solutions to this problem the first one on one side is to try to solve this problem on the an application side so if you control all the consumers if you can dynamically adjust the frequency at which you request district the get record for the string and if your consumers are aware of the other consumers in the group then you can solve that problem so as you can see this is pretty challenging to do and also this is naive anyway because you can still have a consumer that would be a rogue consumer that hugs all those requests per second and basically performs a denial of service attack on your stream so the other categories of solution is to have a different architecture on the server side so one of the ways you solve that is to create a copy stream a slave string you do that by using lambda or kenosis analytics and you spread the consumers across those two strings the downside of the solution is that the reliability of the slave stream is lower than the master stream so you can think about other architectures that you can create in order to solve that problem now consideration number three is to look at how Canisius actually works on to scale with stream up and down so let me illustrate that with an example imagine that you have a stream that has three shorts and a data retention of wente Kinesis has only two operations that he can do it can either split a stream into two or merge two stream to short sorry into one so I imagine that on day three you split shot one into short three and four short one at this point is going to be closed so you won't you will not be charged for that particular shot however however the data that it had been receiving up until that point or still stored onto that shard and they're going to be stored for the duration of the retention period right so on only 24 hours after you you do that split operation the stream is going to actually remove short one and you're gonna end up with four short string so let's talk about something a little different here how does kinases know what shard to write the data to you guys know it uses the key right so when you send a piece of data to kinases it you send the data associated with a key and based on that key it will it will determine which shard the data is going to so there are two types of keys either your key is perfectly balancing the the data amongst all the shards or it's not if it's not what's going to happen is that some of the shards is going to receive more data than the others and this is going to create hot shot's in that case you want to split those particular shard the goal being that the each shard received the same amount of data as much as possible so this is a little difficult to manage actually and unless you have a good reason to do that I would recommend you to use a key that spreads the data evenly across the shards and if that's the case what you want to do is use that update short count operation that Kinesis provides and it's a convenience method that basically uses those split merge operations that I mentioned in order to create the number of shot the appropriate number of shots and maintain an equal portion of the key space for each one of those shots and so if you observe kinases scaling from 3 shards to 5 shards you see that it goes through a bunch of steps actually following this pattern of split split merge split split merge until it creates the shards that are needed so as you can see it creates a number of temporary chard and because those operation are synchronous you see that it may take some time especially if you have a lot more shorts than this so now let's see consideration number for the consumption speed so it's unfortunate but in my experience it happens from time to time that the consumer comes to you and says I cannot process the data fast enough you need to scale up the data stream so obviously the and that's because they have data processing on their side that is taking a lot longer than and so creating a bottleneck so the appropriate solution to this problem is to reorder TechEd the consumer so that you basically take that bottleneck into a second layer that can scale independently from the layer that is consuming from Kinesis so in the example that i havent decide each so you have a three short stream each shard is receiving a thousand message per second but consumer number two can only consume 500 shots 500 message per second and so because the stream has three shots the consumer is bounded by having three consumer processes so again the solution here is to really tag consumer to but the thing is that this takes time and so in the meantime you may have to scale the stream to comply with the request of that particular consumer but the good thing is that you can put a price tag on how much discussed and then you can charge this particular consumer as an incentive to do the work and so this drives you to this formula that basically you have to compute for every consumers look at how many shorts each one of them need so now the fifth and last consideration is what I call the acceptable data latency and there are different cases where this can happen the one that I that I illustrate here on this diagram are the surge many ways that I showed you at the beginning of the of the presentation the red line is the producer producing data and the blue line is the consumer consuming the data and because of this particular scenarios with those numbers here the consumers are bounded by a consumption throughput that is lower than what the producer can will produce and therefore the data the consumer is going to is going to start lagging the data is going to have more and more delay more and more latency so the difference between the end of the surge and when the consumer catches up is going to be the maximum latency that the data is going to have that the consumer is going to see and the point I'm trying to make here is that you can compute the number of shots that you need if the consumer have an SLA if the consumer have specific requirements on how much that data a latency can be at the maximum so in this slide I named some variables and that leads us to this particular formula and that loose complicated it's really not that you can use to plug the numbers to see how many shots you need to comply with requirements such as this there are two other contexts where there are similar to that if a consumer if a consumer is unavailable for some time and suddenly comes back up it will need time it will accumulate some lags and it will need time to catch up with the real time stream and again you can compute how long this is going to take based on various input variables and most importantly the number of shots the other cases where this happens that's similar to this is if a consumer is trying to reprocess pass the data and again you can compute the rate at which that consumer is going to consume data so for example it may need 10 it's to process an hour of data so with that let's see what the new enhanced valid consumer changes these pictures so as an unmentioned and explain a fan at consumer have basically a dedicated two megabytes per second per shark and essentially what that does is that it isolates this consumer from the other consumers so what this does not change is the fact that Kinesis chillin forces limits on the data producer that kinases still works the way it does by splitting and merging shards together and it doesn't change the consumption speed because the bottleneck is not in terms of the consumer reading the data from Kinesis but it's a little bit further down the stream with what this does change however is obviously the fact that the detect consumption limits are affected so the fan up consumers have seen no limits because they can consume twice as fast as the producer can produce and it changes also the other consumers the regular consumers because now they have a smaller pool of consumers that they have to share that two megabytes per second bandwidth with similarily it changes the maximum acceptable latency and for the same reason again the fan at consumers will see no lag and the regular consumer have a small pool of consumers to share the two megabytes per second with so this is the calculator that we implemented it is open source I would encourage you to go check it out basically what it is is it's a web form in which you input the numbers that describe your Akina stream and things such as the average message size the average food the maximum throughput etc and it's going to recommend the optimal number of shots that your stream should have it also is going to give you a cuss estimate for how much the stream is going to cost you and also it supports the new enhanced valid consumer and it will also provide you an estimate for how much running a specific and hence valid consumer will cast in this particular context so in conclusion I hope that this presentation has given you a framework to think through how to scale Akina stream again the considerations to keep in mind or the producer limit the consumer limits the way Kinesis scales a data stream the consumption spin and the maximum acceptable latency with that I thank you for attending this talk and if you have any questions for either Anna or I would be pleased to answer them thank you [Applause] yeah there's a microphone if you don't mind step into the microphone hello so in one of the previous sessions I was told that a shard should ideally be matching with one V CPU to make it efficient is that what you noticed also could you repeat that I'm sorry so shard when we are like building up - are the number of shards so ideally one shard should match to one C V CPU if you're concerned all heard that with the KCl if you're building a consuming applications using the KCl then essentially a thread will be lease will be obtained a shard lease so it makes the most sense for optimization of CPUs and shards so that makes a lot of sense it's not required but that would be the optimal way to do that so you may also consider having multiple hosts process the same stream so if you have a very large team for example with let's just say four hundred shards and you try to consume that with one single host you're not going to get a box with a 400 V CPU so you'll have to spread that across multiple multiple hosts multiple ec2 instances but the KCl will manage the distribution across the different hosts related to the topic of the spreading over multiple hosts with the enhanced fan-out is the to make per second limited to unique consumer or is it spread across the application that is load balancing across those hosts it is specific for the shard so typically the the the conceal consumer record processor so similar to that question where there is a thread that is leased to a given shard its specific to that to that individual that individual record processor for that singular consumer application so another consumer application that might come online would get it's another record processor and it would get its own 2 Meg's per second egress but off of a different shard now on the same chart on the same showcases every time out then every time a consuming application registers every shard in the stream gets another 2 Meg's per second specifically for that application even when you're using a shared application group in the load balancing method rather than the multiple unique consumers method well every time at application every time the consumer application calls the register stream consumer API which it has to do ok every consuming app consumer application has to make that API call because we give it an AR n we give it a unique ID once that's done we create we provision resources in the stream that gives that particular consuming application 2 Meg's per second specific to it which is independent of any of the older consumers who might also be calling the get records if you have older consumers who are calling the get records operation they're still going to be limited to the 2 Meg's per second across all consumers who are calling get records so only that 2 Meg's pipe if you want to call it that is dedicated to your new efo and hands fan-out consumer ok and for the enhanced fan-out where is the data persisted I know in their in the regular consumers the KCl uses dynamo to persist the position per Shire per unique consumer application that hasn't changed that's that's still promoting dynamo just looks slightly different yeah the checkpointing implementation is is exactly the same yep oh yeah I we use the kpl and get a lot of performance improvements for the record compaction as it puts it onto the queue if we go serverless with lambda is the kpl still gonna work that way there is a there is a library that you can use in your lambda function that will essentially D aggregate the records that were aggregated by the kpl right but that's on the consumer side I'm concerned about the producer side well you tell lambda for producer I'm sorry I misunderstood same thing the library can I also aggregate the the messages together before sending them yeah okay now you don't get it's the what it will do is create the same aggregation structure but it's still a little different than running the kpl locally on a host because the KPI locally on a host does a lot of buffering because it's running a separate process in the background like a c++ based process yeah which has a local buffer has a lot of the back off and retry capability already built in this library will allow you to use lambda to create the same structure but you lose some of the benefits that you might get from running like a long-running process that the kpl provides if you're running it natively okay thank you okay for for the efo case and the limits that you guys were talking about those numbers apply whether you're using the kpl and having aggregated messages or not right so it's like a thousand per second that's a thousand aggregated or individual messages if I wasn't using the kpl yeah the thousand records per second when you do aggregation with the kpl from the Kinesis streams per sec perspective it just sees even if you aggregate you know a hundred of your records into a single Kinesis record that appears as a single record in Kinesis and you could send a thousand of those over the wire before you'd be throttled by the Kinesis front end so Kinesis itself the service does not know that there's any aggregation going on this is just the buffer the only things that know about this aggregation are the producer and the consumer except from the Kinesis streams perspective it's just a record just happens to be made up of aggregated and smaller records right okay and then from the KCl side I know that there's a way you can set like max records fetched eat with each get that's again referring to like Kinesis messages yes which would be aggregated which may be made up of hundreds of your actual user records that the producer created right which depends on the load of course because it because the kpl doesn't have like a fixed aggregation factor right it's just sort of I think it's like five megabytes or every so often it will just ship them right so yeah the KPM well you had it's not going to a grenade on the maximum amount of records that can go into a single put which is a megabyte so there is a limit in what it does know where to stop aggregating and you do have some properties in the kpl that say i only want to aggregate 200 records max even though that's well below that one Meg you can you can still make that property setting okay I'm sorry one Meg compressed its if we don't care so it's one Meg of binary data so if that's two Meg's of your own data that you've compressed using gzip down to one Meg before you send it that's fine by us as long as your consumer knows how to unzip it yep so AWS me launched a managed Kafka service today that's that compared to Kinesis when might you want to use one I was waiting for that question so I hope you guys heard that so yeah today we announced Amazon managed streaming for Kafka so Kinesis is not it's it's owned and built by the same team that built Kinesis the Kinesis will continue to evolve independently the the reason we built that new service one of the main reasons was we have a lot of customers who are using Kafka today in the data center or E or even an AC ec2 and we've worked with them to consider moving to Kinesis but they're like we just have too much invested from a development standpoint a lot of code a lot of operations that we just know what we're doing and we would love for you to manage some of that for us but we're just not interested in a migration at this point because we're using some features that you don't support log compaction is a good example you can do in Kafka you can't in the Kinesis so in order to get those customers kind of a nice managed solution for their environment that was the genesis for managed Kafka yeah this is great what I would add to that is that kefka and Canisius are really two different ecosystems you have different tools and indeed teams have a lot of work that they that they put into their application in order to get to where they are so you know it's good to have those two options I think this is great and we just run out of time as we are getting hooked from the stage so thank you all very much
Info
Channel: Amazon Web Services
Views: 38,291
Rating: undefined out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Analytics, ANT322-R1, Amazon Kinesis, Amazon Kinesis Data Streams
Id: jKPlGznbfZ0
Channel Id: undefined
Length: 63min 7sec (3787 seconds)
Published: Fri Nov 30 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.