App 2025 Episode 5: Storing data with service integrations

Video Statistics and Information

Captions
Alright, here we go. Hey everybody, it's stream time for App 2025. I'm Rob, and today we're talking about storing data with service integrations. If you've joined me for my Step Functions series over on the AWS Twitch channel, we'll reference that a bit, but this is going to be a little different and more in-depth, so it's not a repeat; you'll want to stick around. We only focus on Step Functions for a minor part of this, as you can see on the agenda.

First I'll give a quick overview of what a service integration is, what we mean when we say that, and what it replaces. Then we'll look at a service that's new, not just new to the series: Amazon Kinesis Data Firehose. If you're not using it today, you should absolutely be looking at it for your applications. It's an extremely robust way of moving streams of data to different locations, and it's a particularly good choice when you need to both process transactions and set yourself up for analytics, which is what we're going to build with it. I'll show you how to get events off of the event bus and into Kinesis Data Firehose so they end up stored wherever you keep them; for us that will be S3. I'll show you how to use AWS Step Functions workflows to store, and by extension retrieve, data to and from DynamoDB tables without using any AWS Lambda functions, just a declarative transformation that you put in. Then we'll build a concurrent online transaction processing (OLTP) and online analytical processing (OLAP) pipeline: it takes an event that lands on our bus and dispatches it to two different targets. The same event comes in, but on one hand it goes to a workflow to handle, as we've shown previously with Express and Standard workflows, and on the other hand it goes to Kinesis Data Firehose to ultimately be stored in a relational model that we can query later. The whole purpose of this is that next week we're going to use Amazon Athena to show you how to query your data serverlessly, and today sets that data up, using AWS Glue, as schema-managed data in an S3 bucket that you can query pretty easily. So these are the last couple of pieces that we cover in App 2025.

And hey, there's my links bot; I love my links bot, glad to see it working today. The home page for Kinesis Data Firehose is there along with pricing. My good buddy EDJGeek, who's here in the channel moderating for me, did a really good episode of Sessions with SAM, linked here, that's focused on Kinesis Data Firehose and Kinesis Data Analytics, so it's different again from what we're going to build today; if you want to get even deeper into Kinesis Data Firehose, that's a good resource. The Step Functions service integration episode is listed there, service integrations in general, and some other links that I think will be helpful for you.

OK, let's go back to our architecture and talk about what service integrations are. If you remember, we have this event bus running, with time going left to right; events come onto the bus, and workflows in AWS Step Functions pick them up and handle them. That itself is a service integration: the EventBridge rule dispatches the event to another service, and that service is AWS Step Functions. It doesn't take the event, send it to Lambda, and have Lambda invoke the service; it's a direct service-to-service call.
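As a minimal sketch of that kind of rule in CloudFormation (the bus, state machine, and role names here are hypothetical, not the names from the episode's repo):

```yaml
# Hypothetical sketch: an EventBridge rule whose target is a Step Functions
# state machine directly -- no Lambda in between.
TransactionToWorkflowRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: !Ref AppEventBus               # assumed bus resource
    EventPattern:
      source:
        - com.example.transactions               # hypothetical event source
      detail-type:
        - Transaction Initiated
    Targets:
      - Id: transaction-workflow
        Arn: !Ref TransactionStateMachine        # Ref returns the ARN
        # Role that EventBridge assumes; it only needs states:StartExecution
        # on the state machine above.
        RoleArn: !GetAtt EventsInvokeWorkflowRole.Arn
```

The only permission EventBridge needs here is `states:StartExecution` on that one state machine, which is a much smaller surface than a Lambda function sitting in the middle with its own role, runtime, and dependencies.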
Whenever we say service integration, that's what we're talking about: passing data to, or invoking, other services directly, without intermediaries. One of the major advantages of doing this is price. If you were to make this same call with an AWS Lambda function in the middle, you'd pay for that execution time; if you were to write data out to your DynamoDB table using a Lambda function, you'd pay for that execution time too. It's deterministic to a point, but you also have jitter (that's my second reference to jitter already). The other reason is that when you use a service integration, it's declarative: you declare what the parameters are for the input and the parameters for the output, and we handle getting that data across, right down to the runtime level. So you're not looking at dependencies, libraries, and other areas that introduce vulnerabilities or liability, because as our friend Jeremy Daly says, a line of code is not an asset, it's a liability. You want to minimize the number of lines of code you're responsible for, and service integrations let you do that.

We've seen this pattern a couple of times, in our Express Workflows episode and in our long-running workflows episode using Standard Workflows, where some event comes onto the bus and dispatches a workflow. What we're going to do now, and let's see if I can draw fast enough to make this worth everybody's while, is label that branch OLTP, handled by our Step Functions workflow, and over here pull in Kinesis Data Firehose. There's my Firehose; look at that, y'all, modern technology. For this branch we don't need to put a Lambda function in: we're going to drop the data right into an S3 bucket, like so, and that makes this architecture even simpler. There are some cases where you may want to use a Lambda function here, and yes, FYI, you can use draw.io in VS Code. I know, I tweeted about that and I loved it; the problem is that this diagram doesn't have a background associated with it, so when I opened it there some of my fonts were the wrong color and got washed out, and for accessibility reasons I want to make sure that as many people as possible can see what I'm doing. But if you haven't seen the draw.io extension for Visual Studio Code, you definitely want to check it out; thanks for highlighting that.

So what we're looking at is a single event that we're going to dispatch to two targets (look at this, I'm getting better, I can be taught). Now, Kinesis Data Firehose allows you to perform transformations on records after they've entered the stream, on their way out, and it does this using AWS Lambda functions; if you want to see a really good example of that, check out EDJGeek's Sessions with SAM episodes that I've linked here, because that's what he does. But one of our principles is that we want to make as much use of the built-in functionality of the services as we can, and EventBridge allows us to do this with input transformers. We can take the format of the event that we received on the bus and, using a pattern that's really similar to Step Functions (in fact it's derived from JSON path), transform that event body into the format we want before we send it over to our destination, any destination. You can do that for Kinesis Data Firehose; we're going to do it for the Step Functions workflow this week. And I've shown you in previous Step Functions episodes how you can use the Pass state to transform data, or how you can use the Parameters object when you're making service integration calls, like to DynamoDB, to pick and choose the pieces of state you want and get your data into the right shape for the call, without having to put a Lambda function in the middle.
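As a quick illustration of that Pass/Parameters idea, here's a rough fragment in the YAML form of Amazon States Language, the way you'd embed it in a state machine definition; the state name, fields, and paths are hypothetical, not taken from the episode's workflow:

```yaml
# Hypothetical fragment: reshape the input with a Pass state's Parameters
# block instead of a "glue" Lambda function.
ShapeForDynamoDB:
  Type: Pass
  Parameters:
    transactionId.$: $.id                 # pull and rename fields via JSON path
    customerId.$: $.detail.customerId
    amount.$: $.detail.amount
  ResultPath: $.shaped                    # keep the original input alongside
  Next: StoreTransaction                  # assumed downstream state
```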
So always, before you go writing a glue Lambda function (I don't mean AWS Glue, I mean conceptual glue), stop and think: what do the service I'm using and the service I'm invoking already give me to accomplish this same task? That's a little habit that helps keep your invocations down, which helps keep your pricing down, and it can even reduce your latency, so it can bring your performance up. And again, long term, because it's declarative, you don't have to worry about it breaking as libraries underneath change.

Let me take a look at my agenda here... OK, yeah, destinations. Today our OLAP flow is going to store our data in an S3 bucket, so that next week we can query it serverlessly using Amazon Athena. There are a couple of other options. One: you can send records directly into an Amazon Redshift data warehouse, so if you're already running a Redshift cluster, Kinesis Data Firehose can take these records and deliver them straight into Redshift. And yeah, as EDJGeek says, don't use Lambda to transport, only to transform, really only to apply business rules; even your transformations you can often do without Lambda, so it's really once you need to apply a business rule that you should reach for it. Another good target, and let me add it to the diagram because you could conceivably do the same thing here, is Amazon Elasticsearch Service. While we send this original event over for analytics processing, we may also want to send it into an Amazon Elasticsearch Service cluster so that the data is searchable by our application, whether it's log data, text data, whatever you're searching through. And one stream can publish to multiple targets, I believe; EDJGeek, keep me honest on that one, I'm not 100% sure. He says right, so we're going to assume I'm right on this one.

Oh, important thing: Amazon Kinesis Data Firehose is a purely serverless service, which means it meets our four tenets. There's nothing to provision, there's no infrastructure to manage, it's highly available, it auto-scales, and you pay for value. You don't have to pre-define shards ahead of time like you do with Kinesis Data Streams; the number of records you push through determines the price you pay, and that's it, end of story. You can take a look at the pricing page; the only addition is transformations, because you pay for the compute behind them. But if you don't send any data through, you don't have shards sitting there idle; you only pay for the data that goes through, so that's another key point. And look at that, there's my bot, just in time, helping me out with the links.
The other target I want to talk about for Kinesis Data Firehose is Splunk. A lot of enterprises have big investments of time and money in Splunk for visibility into the data inside their networks, and Kinesis Data Firehose can publish directly into Splunk as well, so if you're a Splunk user, definitely go check that out. (I have a different mug today; I apologize if it makes more noise, and hopefully I don't spill it all over myself.)

You can also do data transformation within the Firehose, that's correct, and there are two types: one we're going to do and one we're not. For record transformation within the Firehose, I'd advise you to look at Sessions with SAM. What we're going to do is a format conversion: we're getting our data in as JSON and we're going to write it out as Apache Parquet, and that's another key point for Kinesis Data Firehose. We're also going to use a feature where we can archive the raw data, so I need to modify my diagram a little. What we're actually going to have is two buckets: one of them gets the raw data, and one of them gets the data that we transform, or "process" as it's called, into Parquet. EventBridge doesn't really know about any of this; it just knows it has a target. Inside the Firehose we're processing records into Parquet, optimized for search and querying later, and at the same time storing the raw original event. You can then apply lifecycle policies to these kinds of buckets to age the raw data out once it's no longer needed; keep it in test and dev if you want to use it as a sort of debug mechanism, but maybe not in production, whatever your use case is. The important thing to know is that, in addition to sending to multiple targets, you can also send to multiple buckets inside the S3 target, and of course we have the code for all of this.
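A minimal sketch of those two buckets, with an expiration rule on the raw one (the 30-day retention and the resource names are assumptions for illustration):

```yaml
# Hypothetical sketch: raw events age out automatically; the Parquet bucket
# is kept for querying with Athena.
RawDataBucket:
  Type: AWS::S3::Bucket
  Properties:
    LifecycleConfiguration:
      Rules:
        - Id: expire-raw-events
          Status: Enabled
          ExpirationInDays: 30            # assumed retention period

ProcessedDataBucket:
  Type: AWS::S3::Bucket                   # Parquet output, no expiry here
```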
Now, a very important point to note about Kinesis Data Firehose: it is not covered under the free tier. It's not a very expensive service; in fact, let's view that link, because it's priced per gigabyte. It differs by region, but per gigabyte it's less than three cents at the most expensive level, so it all depends on how much data you push through it. It's a calculable expense, but there is no free tier on it, so it's important to recognize that you will incur charges for using this. If you use it in dev or test just to explore, it's pretty easy to keep your inbound data under a gigabyte, and then you pay less than three cents to see whether it's going to work for you or not. So I'd still say give it a shot; just be aware. Alright, now that we know what we're doing here, let me add text labels so it's clear: raw and Parquet. (Do you all remember those commercials from, I think, the 80s, where there's a little tub of vegetable spread? "It's like butter." "Parkay!")

OK, an actual question, from Turingal: how are you dealing with the at-least-once delivery guarantees of EventBridge, always ensuring idempotency in the destination if needed, e.g. a booking? A couple of ways. If you look at the episode from last week where we talked about SQS FIFO queues, those can help you out, but ultimately, when you get down to your business rule inside an AWS Lambda function, you need to check for and handle that idempotency yourself, and there are a lot of good resources on it. How can I put this: it's going to be an edge case, but it will occur, so you should absolutely assume it's going to happen and that you're going to get duplicate deliveries, regardless of how low the frequency is. The general way to handle it is that every time you get an item, you attempt to retrieve that unique ID from DynamoDB to see whether you have one in flight or one that's already been processed. If you find it, and it's in a processing or completed state, you just discard the operation, because it's already been performed. Beyond that it's up to you: you may want to return an "operation in flight" error, or sleep and then check again, something like that. If you don't find anything, you write your record in and start processing. Idempotency is non-trivial; we have a lot of material dedicated specifically to it, but you do need to be considering it, so good point. For the way these types of streams work: if you're sending an event to a Step Functions workflow, the workflow itself is going to execute once if it's a Standard workflow; if it's an Express workflow, that's not guaranteed, so you'll need to consider that when you're storing to DynamoDB. From an OLTP perspective it's largely not an issue if you write the item twice, because PutItem in DynamoDB is both create-item and update-item: if you overwrite it with the exact same information, including the transaction ID, you haven't really caused anything new to occur, unless you're consuming that table downstream using DynamoDB Streams, and that's where you'd need to consider it. It's sort of the same for an S3 bucket: if you dump the event in multiple times for this OLAP flow, it doesn't matter, because you're really only overwriting the object, and you don't run your analytics in real time in this example, you run them in batches after the fact. So you're right, in some cases it matters, but make sure your application architecture really requires it; in a lot of cases it's OK. It's like eventual consistency: a lot of times it's fine to have eventual consistency, but if it's not, you need to know it, and you need to know how to manage it.

Let's pop over and look at the code. I've collapsed all of this for us, and I've gone ahead and deployed all of these stacks, because there's a lot going on; we're only going to be dealing with the infrastructure side today, but I wanted you to see that they're already there, so you don't start wondering why I'm not writing this in real time. It's complicated. I want to talk about EventBridge to Firehose first (and thanks for the great question, by the way). For EventBridge to Firehose, as you saw in the original episode, when we have EventBridge we need a rule, that rule needs a target, and generally an ARN for a role as well. That's exactly what we have here: a transaction rule that's going to listen for "transaction initiated" events, which we see here, and then send them to a couple of targets. Let me minimize the Step Functions target for now; this is all we need to get the event sent over to Kinesis Data Firehose, along with the right permissions, because you've always got to keep your IAM permissions tight.
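Roughly, that rule with its two targets, plus the small role EventBridge assumes to put records onto the Firehose, might look like this (resource names are illustrative, not the episode's actual template):

```yaml
# Hypothetical sketch: one rule, two targets (Firehose for OLAP, Step
# Functions for OLTP), and the events role scoped to PutRecord/PutRecordBatch.
TransactionInitiatedRule:
  Type: AWS::Events::Rule
  Properties:
    EventBusName: !Ref AppEventBus
    EventPattern:
      detail-type:
        - Transaction Initiated
    Targets:
      - Id: olap-firehose
        Arn: !GetAtt TransactionDeliveryStream.Arn
        RoleArn: !GetAtt EventsToFirehoseRole.Arn
      - Id: oltp-workflow
        Arn: !Ref TransactionStateMachine
        RoleArn: !GetAtt EventsInvokeWorkflowRole.Arn

EventsToFirehoseRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: put-records
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - firehose:PutRecord
                - firehose:PutRecordBatch
              Resource: !GetAtt TransactionDeliveryStream.Arn
```

The broader S3 and Glue permissions discussed next belong to the delivery stream's own role, not to this one.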
We use this transaction Firehose role, and all of this comes out of the docs, so don't be frightened by any of it. We need to be able to write our S3 objects into our buckets: that's both the raw data bucket and all the objects in it, and the processed data bucket and all the objects in it. Then, when you look at the Glue permissions, I've taken a bit of a shortcut here to avoid building the same thing for each of them, but you need to be able to call these three Glue operations. They're less relevant today, and we'll have a tightened-up policy for next week when we go to Athena, but I did want to include them so you can see there's another step in there: you have to have this Glue schema defined, and you'll see that when we define the pipeline itself. But to get from EventBridge... sorry, I just showed you the wrong role; I was wondering why that looked so complicated. To get from EventBridge to Firehose, just like what we used in the customers microservice before, you just need a role that can be assumed by events and that can call PutRecord and PutRecordBatch on your Firehose, and that's it. So that one is properly scoped, and it gets our records into the Firehose.

From there we need to get them back out, and that's where the Firehose declaration comes in. Real quick: don't name your S3 buckets, because it doesn't matter; in reality you'll probably get these as parameters from somewhere else in a production application. As we said, we have a raw data bucket and a processed data bucket, and after I show you the definition I'm going to come back and show you the difference in how these two get written out. Here we define a Glue database, and we name it, what else, app2025 (it needs to be all lowercase). You must specify a catalog ID, which is just your account. Then we define a Glue table; again, this code is going to be available for you and it's straight from the docs, so don't be alarmed by how busy it looks. This is essentially your DDL, your data definition language, for what's really becoming a SQL-queryable table, a table that's queried by Athena. You tell it how to partition the data, and you'll see that reflected when we do the transformation in Firehose; then you tell it the fields you're going to store, plus some input formats you have to put in, your serializer and deserializer. This is, again, non-trivial but documented, so definitely spend some time with the docs here; just half an hour of reading, especially with the examples, will prevent a lot of questions. And check this code in the repo once I post it after this episode; the Sessions with SAM link is in there as well for some more Kinesis Data Firehose code. But all we're doing with Glue is defining a table in a database, so we don't really need to dive into the details of that today. Just think of it as: there's a database out there, and this is a table in that database.
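As a rough sketch of what that database and table declaration can look like (the column names and partition keys here are assumptions based on the description above, not the episode's exact schema):

```yaml
# Hypothetical sketch of the Glue database and table ("the DDL") that the
# Firehose schema configuration will reference.
GlueDatabase:
  Type: AWS::Glue::Database
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseInput:
      Name: app2025                       # must be all lowercase

TransactionsGlueTable:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref GlueDatabase
    TableInput:
      Name: transactions
      TableType: EXTERNAL_TABLE
      PartitionKeys:                      # reflected in the Firehose prefix
        - { Name: year,  Type: string }
        - { Name: month, Type: string }
        - { Name: day,   Type: string }
        - { Name: hour,  Type: string }
      StorageDescriptor:
        Location: !Sub s3://${ProcessedDataBucket}/transactions/
        Columns:                          # assumed fields
          - { Name: transactionid, Type: string }
          - { Name: customerid,    Type: string }
          - { Name: amount,        Type: double }
        InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
        SerdeInfo:
          SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
```

Ref of the database resource returns the database name, which is why it can be dropped straight into the table's DatabaseName and, later, into the Firehose schema configuration.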
Then the Firehose itself is where it gets interesting. We have DirectPut, which is what allows EventBridge to put records into it, we have the bucket, and we have this extended S3 destination configuration. There's a plain S3 destination configuration that's pretty simple if all you're going to do is dump records into S3, but if you're going to transform them like we're doing here, you need to use the extended configuration, because it lets you choose serializers, data formats, compression, and several other things. What we're giving you here is a pretty thorough, production-ready example that will handle very large loads, because it's using Parquet and gzip compression.

One thing you need to focus on when you're doing the format conversion: you have to set your compression format to UNCOMPRESSED. That's because the serializer handles compression for you at the end, and if you send compressed data into the serializer, it doesn't know how to handle it. You'll see here we have this data format conversion block, and that's what's taking our format from JSON to Parquet; at the end, the Parquet serializer compresses it using gzip for us, so we save on storage and we get that good, fast, searchable, columnar Parquet format, which is exactly what we want. Then for the schema configuration, it's that same catalog ID, a reference to the database, the region, our own role with the permissions I showed you earlier, and the table name, and we've just locked it to LATEST here; you can do versioning and aliases, but we've locked onto the latest version so that we always get the latest version of the schema. I'm going to come back to the prefix and error output prefix last, because that's where you actually see the difference in the buckets.

The buffering hints are how long Kinesis Data Firehose allows records to accumulate before it attempts a write. If you notice, down here in the buffering hints for the backup configuration it's 60 and 1, because we're just writing raw events; up here they're 60 and 64, because when you do a Parquet data format transformation, your minimum size in megabytes is 64. Again, we're going to give you this code, it's in the docs, and Kinesis Data Firehose produces some of the best errors I've seen from CloudFormation: it tells you exactly what's wrong, including that the minimum size has to be 64 if you try to set it to 1, so really good stuff here. So that defines the buffering hints, how long it buffers before writing out. S3 backup mode is Enabled, and if you enable it you have to provide a configuration; we just give it the ARN of the bucket we created, we want to compress it there too, and we give it the same role, and that's that.

OK, so the prefix is for the transformed records that come in, and here we parse the timestamp for the four-digit year, month, day, and hour. That means that in our bucket the keys start with the table name, transactions, followed by all of that date partitioning. For the error output prefix, in this case we'll have "transactionserror", then the error output type, and then all of that date partitioning again. It's a way of preparing the bucket to look like a SQL table, via the Glue schema, so that Athena can read it later.
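Putting those pieces together, the delivery stream declaration can look roughly like this (bucket, role, and table names are illustrative; the buffering values follow what's described above):

```yaml
# Hypothetical sketch of the delivery stream described above: JSON in,
# Parquet out, raw copies backed up to a second bucket.
TransactionDeliveryStream:
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamType: DirectPut                  # EventBridge puts records in
    ExtendedS3DestinationConfiguration:
      BucketARN: !GetAtt ProcessedDataBucket.Arn
      RoleARN: !GetAtt FirehoseDeliveryRole.Arn    # assumed role with S3 + Glue access
      CompressionFormat: UNCOMPRESSED              # the Parquet serializer compresses
      BufferingHints:
        IntervalInSeconds: 60
        SizeInMBs: 64                              # Parquet conversion minimum
      Prefix: "transactions/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
      ErrorOutputPrefix: "transactionserror/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
      DataFormatConversionConfiguration:
        Enabled: true
        InputFormatConfiguration:
          Deserializer:
            OpenXJsonSerDe: {}
        OutputFormatConfiguration:
          Serializer:
            ParquetSerDe:
              Compression: GZIP
        SchemaConfiguration:
          CatalogId: !Ref AWS::AccountId
          DatabaseName: !Ref GlueDatabase
          TableName: !Ref TransactionsGlueTable
          Region: !Ref AWS::Region
          RoleARN: !GetAtt FirehoseDeliveryRole.Arn
          VersionId: LATEST
      S3BackupMode: Enabled
      S3BackupConfiguration:
        BucketARN: !GetAtt RawDataBucket.Arn
        RoleARN: !GetAtt FirehoseDeliveryRole.Arn
        CompressionFormat: GZIP
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
```

Note that the top-level CompressionFormat stays UNCOMPRESSED, as discussed, while the ParquetSerDe applies gzip on the way out and the raw backup copies are compressed separately.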
So let's take a look at what these look like. In S3, this is our raw data bucket, and I've sent one event through here; in fact, once we're in here I'm going to send another. You can see it's May the 14th (these are UTC times), the last one was sent in the 17:00 hour, we're into 18:00 now, and we had these records. Let me go back over and send one, because you do have to wait out that buffer interval. We get an event ID back that we can use to track everything later; that's actually going to be our transaction ID. You do have to wait a minute or so for the buffer to flush, because we're not going to reach a megabyte, let alone 64 megabytes, with this single event; it waits until the interval has elapsed and then writes the records out. So we'll kick that off, but meanwhile we can see we have these events in our raw data bucket, keyed just by date. The raw side hasn't done any transformation: it's not aware of a Glue schema, it's not aware of Parquet, it's not aware of anything; it just goes year, month, day, hour. This object will download and open; let me bring it in here, treat it as a JSON object, and we see we've got the same event, just in its raw format: detail, customer ID, and so on. I know it's a little small to see, so let's open it in Visual Studio Code and format the document. If we look at this side by side with the event that we passed in, we see we've got the same format: customer ID, initiated at, from account. This is just like the EventBridge events we've seen the whole time: there's a source, there's a detail type, some information around the record itself, and then the body that we provided. So we haven't done any kind of transformation to this at all. And if we go back now, we see that indeed our new event came through, and we can download and view that as well. So these are the raw events: bucket name, year, month, day, hour.

If we look in our processed bucket, the first prefix is the Glue table name that we gave it; let me go back and check... we gave it the table name "transactions", right. So, as expected, it did what we asked and prefixed the keys with the table name, and the next thing we should expect to see is year=2020, and we do. This will feed into WHERE clauses later: select from transactions where year equals 2020 and month equals 5 and day equals 14. It does this sort of query-friendly naming for us. And here's the same event, but you'll see this is a Parquet file, so there's really nothing for me to do with it here; I don't have a Parquet viewer, but I can open it in TextEdit and, yes, there are some binary characters, but you can see it defines the columns and then the data, so it has been transformed into the Parquet format for us. That's the difference between the raw bucket and the processed bucket, or as the template defines them, the S3 backup, where we put the raw data, and the bucket ARN for processed data.

OK, so at this point we've taken the data onto the bus and sent it out to Kinesis Data Firehose; Kinesis Data Firehose has saved an unmodified copy, and then it's transformed the data to Parquet and written it out according to an AWS Glue schema, so that we can search it next week with Amazon Athena. There's a lot going on here already, so are there any questions at this point? The next part is easier, so it's better to ask your questions now if you have them, and we can dig around in this. Alright, super, let's drive on. The next thing we're going to do, if we return to our diagram: we've done this side already, and we know we've got that table there, so now we need to send the event over to a workflow that does some processing, and this one in fact is going to store directly into DynamoDB. We're not going to put a Lambda function in between; we're just going to go direct to DynamoDB. Let me hide the Firehose side, because it's done its job, and let's go back and look at the rule as well; we'd hidden this before. We've still got the same rule, so it's going to capture the same event, and it's just sending it to two targets: one of them is Kinesis Data Firehose and one of them is the Step Functions workflow.
For this one I've given us an example of an input transformer. Everything up until here you've seen before; it's the same as the last target: you're just referencing a Step Functions workflow, giving it a human-readable name, and giving EventBridge an IAM role to assume to invoke that workflow. The one thing we've done differently is that we've given ourselves an input transformer. In EventBridge, before we invoke that workflow, before we pass it an object, we're going to transform that object, and we do that with the input transformer, which is an input paths map plus a template. If you're familiar with defining items in DynamoDB, where you have to define that something's a string and then reference it by another name, it's a very similar concept. The paths map uses JSON path notation, again similar to what we're used to with Step Functions, and it just means: out of that top-level object, pull these values. Let me split the screen so we can see the event we're actually using; that's the one I just sent over, and I keep sending it back (we can update it for fun just to make it different, but it doesn't really matter). So this is the event we're going to send, but remember, all of this gets wrapped in the EventBridge envelope, so our body becomes the detail; the main thing that happens is that an event ID gets added to it. We're going to call this transactionId; it's the unique identifier that's generated by EventBridge. Our customer ID is in this object at detail.customerId, and you can see we just extract the other fields the same way; the received date time we extract from the event object's time. Fields like the event ID and time you don't define yourself when you're putting events onto the bus; they're defined for you (in this case it's easy to see because it's a debug event), while the detail fields you do define. So we take all of these and give them referential names, and now that we've extracted them out into this map, we use the template. Here, by the way, it will fail if you put quotes around these variable fields, so don't do that; the documentation on this is great, and it will tell you that you can't have quotes around variables in your transformer. You use this angle-bracket notation to refer back to the paths you mapped. Ultimately what this means is that I'm going to pass our Step Functions workflow an input object that looks like this: a top-level key of transactionId, which works out to be the unique identifier assigned by EventBridge; a top-level key of customerId, which works out to be the detail's customer ID field from our event; a received time that works out to be the time of the event; and so on. So you can shape this object before passing it over.
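A sketch of what that target entry can look like inside the rule's Targets list; the mapped paths and field names follow the description above but are assumptions, not the episode's exact template:

```yaml
# Hypothetical fragment: the Step Functions target with an input transformer.
- Id: oltp-workflow
  Arn: !Ref TransactionStateMachine
  RoleArn: !GetAtt EventsInvokeWorkflowRole.Arn
  InputTransformer:
    InputPathsMap:
      transactionId: $.id                    # unique ID EventBridge assigns
      customerId: $.detail.customerId
      receivedDateTime: $.time
      sourceAccount: $.detail.fromAccount
      requestedDateTime: $.detail.initiatedAt
    InputTemplate: |
      {
        "transactionId": <transactionId>,
        "customerId": <customerId>,
        "receivedDateTime": <receivedDateTime>,
        "sourceAccount": <sourceAccount>,
        "requestedDateTime": <requestedDateTime>
      }
```

Per the note above, the `<variable>` placeholders in the template are left unquoted.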
Then, when we look at that workflow, it's a very standard workflow. We have two states here: one writes the record out to DynamoDB, and the second just publishes back onto the event bus. It's labeled "account normalized" here, but we're actually publishing it as "transaction processed", so let me update that for you; we're publishing a transaction processed event back onto the same bus and sending this detail information in the event. (This reminds someone a lot of what they see at Whole Foods. I don't know where to go with that, but alright, you're a Prime Now shopper; me too.) So the first state puts the item directly into DynamoDB, we take the returned object from DynamoDB and put it in our result path, and then we publish it back onto the event bus.
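In the YAML form of a state machine definition, that two-state workflow can be sketched roughly like this; the table name, bus name, and item attributes are assumptions for illustration, not the episode's actual code:

```yaml
# Hypothetical sketch: store the item with the DynamoDB service integration,
# then publish a "Transaction Processed" event back onto the bus.
StartAt: StoreTransaction
States:
  StoreTransaction:
    Type: Task
    Resource: arn:aws:states:::dynamodb:putItem    # direct service integration
    Parameters:
      TableName: transactions                      # hypothetical table name
      Item:
        transactionId:
          S.$: $.transactionId
        customerId:
          S.$: $.customerId
        requestedDateTime:
          S.$: $.requestedDateTime
    ResultPath: $.dynamodbResult                   # keep DynamoDB's response
    Next: PublishProcessedEvent
  PublishProcessedEvent:
    Type: Task
    Resource: arn:aws:states:::events:putEvents    # publish back to the bus
    Parameters:
      Entries:
        - EventBusName: app-event-bus              # hypothetical bus name
          Source: com.example.transactions
          DetailType: Transaction Processed
          Detail:
            transactionId.$: $.transactionId
            customerId.$: $.customerId
    End: true
```

Reading data back out is the same pattern: swap the Resource to `arn:aws:states:::dynamodb:getItem` and provide a `Key` instead of an `Item` in the Parameters.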
Someone says they were always curious how we use our databases, whether we use SQL or DynamoDB. OK, gotcha, tracking. The answer is that you probably use both, and the generic best practice, unless you know otherwise, is to use DynamoDB for OLTP, like we're doing here, and some sort of SQL database or data warehouse for your analytics, because DynamoDB excels at transaction processing but you don't want to be running a bunch of scans on it, while SQL is a solved problem and a well-known domain. So we're probably using both together; I'm not necessarily sure it was S3 and Amazon Athena, it might have been Redshift, it might have been something else. Alright, cool; you're welcome.

From here we've got this published, and we can go back to our Step Functions console and see what that looked like for our most recent execution. This is the one we ran in the middle of the show; you can see it's got that same event ID we had before, and we can look at the flow as it comes through. Now, this transaction object doesn't look like the information we passed in, right? If we go back to Visual Studio Code, the event we passed in had all of this information: an "initiated at" field, a "from account" field. But because of our input transformer, instead of fromAccount we have sourceAccount, and instead of initiatedAt we have requestedDateTime. This is a way that you can have a legacy workflow consume new, standardized events: you've standardized on a certain schema, but you don't have time to modify that workload, so you use an input transformer to take the new standardized schema and shape it the way your legacy workflow expects. You don't have to go through making breaking changes to your workflow; you can just leave it in place, even though other applications are now putting events onto the bus in the new format. This is a really powerful migration pattern, so always remember it if you're trying to bring a legacy workflow into an EventBridge-driven application. (When we send our stuff to Amazon Flex, are we probably using Dynamo? Probably, but I do not know how those applications are built internally.)

Then, again, we can check ourselves: this is the DynamoDB output that we get back, and we publish that event onto the bus. Once we publish it, this is the response from EventBridge, where we get a copy of the payload back and we get the event ID, so we'll know it has come through. We haven't actually set anything up to pick this event up, so we won't see it anywhere, but we get all the other information about it. And then we can come over here, and here's our happy little event, as described, with sourceAccount instead of fromAccount and requestedDateTime instead of initiatedAt.

Alright: "when there are transmissions, does data mismatch?" I mean, it depends on how you configure it; it's up to you to match the data that you expect and the data that you send. One powerful tool for this, again, is the schema registry in EventBridge. "I do find that sometimes items don't always match." That's right, and schemas can help you with this: if you haven't checked into schemas in EventBridge and the schema registry, that's another good tool to use when you build your applications. You also have schema discovery, which we recommend you enable in sub-production environments, because it lets you see when errant events start getting published onto the bus. If you start seeing a new event that you didn't expect or define, that's probably an error in that application, and you want to hit the pause button and investigate a little further to see what's going on; that will help you with this data back and forth.

So at this point we can see we've got data from a single event going to a couple of different places. Data from that event is making its way through our online transaction processing workflow in Step Functions and being stored into our DynamoDB table, and it's making its way through an OLAP prep flow where it's transformed and stored into S3 according to an AWS Glue schema, so that we can run queries against it later. And all of this has happened without any AWS Lambda functions, so we don't have code vulnerabilities, liabilities, and dependencies to maintain; we have an entirely declarative configuration that gets us from end to end. In fact, at this point in the entire application: we've done "EventBridge is the backbone of your app", we've done Express Workflows for short-running processes, we've done long-running workflows for human interaction, we've done service integrations... I feel like I'm missing one, and I may well be missing one of my own shows. Oh, we did "Simplifying your architecture with Amazon SNS and Amazon SQS". So this is the fifth episode, and across all of those episodes we've only had to write one Lambda function, and we only wrote that function to put data back onto the EventBridge bus for services that don't have a direct service integration with it, so it's a generic function we can reuse. (I'm trying to find my own link here and I can't; anyway.) There's a really powerful pattern here: you can pretty much take this entire repository and clone it, and the Lambda functions should be in there. This is not an anti-Lambda show; Lambda functions are important, they matter, but that's where your business logic goes. Your special sauce, your company's value-add, should be in those Lambda functions, not in making these services talk to one another. What I've tried to give you here is the setup for a fully distributed, fully decoupled, fully resilient, fully serverless back-end application that handles all of your needs, and now you need to plug your actual business processes into it. So hopefully this has been useful. Yeah, "four so far", says EDJGeek, but this is number five, and he's got the playlist there for you; thank you.

And back to the earlier question of whether I think it's human error or data transfer problems: data transfer is pretty reliable at this point, so I would say it's largely human error. The problem with machines is that they do exactly what we tell them to do, so if they're not doing what we want, it probably means we didn't communicate our expectation clearly enough to the machine. There's a little life lesson in there for some of us. But yeah, it's generally going to be human error, and again, that's why schema discovery in a sub-production environment, or even sampling in a production environment, is such a powerful tool: when one of those errant messages comes in, you discover it and you can take action on it.
So, let me see... someone asked whether we edited any of the data inside our state machine, with the Pass state. You can do that, and I've already done a Step Functions episode on it, the service integrations one, so I suggest you go back and check that out; I'll drop that link here. And let me make sure we built everything... yeah. So that's pretty much it for today's episode, subject to your questions. What we've done: we've learned that service integrations allow us to process and transform data without invoking Lambda functions whenever we're moving information between services. We've learned that Amazon Kinesis Data Firehose is a purely serverless service that allows us to stream data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, so that we can use one event for multiple types of workloads. We've learned that EventBridge can directly target Kinesis Data Firehose with its rules, a service integration that ultimately allows you to get data from EventBridge into any of those four services without writing Lambda functions. We've learned how to use the Step Functions DynamoDB service integration pattern to write data directly into a table; it's the same thing to read data back out, you just need to change the API call. And I've given you this pattern that we drew in our diagram for concurrent OLTP, OLAP, and even search if you need it, using Kinesis Data Firehose. I hope that's been interesting for you.

Next week we're going to tackle how to get all of this data into Amazon Athena and query against it. Between now and then I'm going to write a Lambda function that just sprays data non-stop into our S3 buckets, so that we have some good, big data sets to search and look through. And yes, I am crazy: I'm going to use your account to do it, EDJGeek, because I don't want that bill running up in mine. So next week, that'll be our topic; it'll be super cool, same time. Also, the final episode of the AWS Step Functions series is coming up next Tuesday at 2 p.m. on the AWS channel on Twitch; let me drop a link here for that too. That session is going to be on nested workflows, so Step Functions calling Step Functions, workflows all the way down. It's definitely an important topic, especially when you consider how you can use long-running workflows to consume short-running workflows, orchestrate them together, and eliminate some of the code you don't want to be writing. Please join me for that. Otherwise, I don't see any new questions coming up, so Turingal, thank you for joining, thanks to everybody, pescetarian, thanks for all the questions, and I hope to see you all next week. Have a good weekend, and go out and do some building. Alright, bye.
Info
Channel: Serverless Land
Views: 320
Id: _nseply4SPc
Length: 51min 41sec (3101 seconds)
Published: Thu May 14 2020