Build a Real-Time Data Streaming System with AWS Kinesis, Lambda Functions, and an S3 Bucket

Video Statistics and Information

Captions
What's up everyone, welcome back to another episode of AWS tutorials. Today I'll show you how to build a real-time data streaming system with AWS Kinesis, Lambda functions, and an S3 bucket.

This is how the system is going to work: whenever we upload a file to our S3 bucket, it triggers a Lambda function, the data producer, which reads the file content from the S3 bucket and pushes the data to Kinesis. We then have two consumers that read the data from the stream and can do different things with it. For example, consumer number one can read the data and send an email to all the clients with the information, or publish the data to social media platforms, while consumer number two can save the data to a database. There are many things you can do with the data, so if you have a specific interest in any of these topics, comment down below and I'll try to create a separate video for it. Now, without further ado, let's build the system together.

Right now I'm on the home page of the AWS console. The first thing we're going to do is create an IAM role for our Lambda functions to use, so that they have permission to access our S3 bucket and our Kinesis stream. Type in IAM, open it in a new tab, then click Roles, Create role. Choose Lambda, because the Lambda functions are going to use this role, and now we add some policies to it. The first one is CloudWatch, because we'll use that to look at the logs; for this demo I'm just going to give it full access. The next one is S3, full access as well, and lastly we give Kinesis full access. Hit Next: Review, give it a name — I'll just call it the data streaming system role — and create the role.

That's done, and now we can move on to step two, which is to create the three Lambda functions in the system. Type in Lambda, open it in a new tab, and create a function. Give it a name — I'll just call it producer — choose Node.js, and under permissions use the IAM role we just created, then create the function. Remember the architecture diagram I showed you earlier: that Lambda function is the producer, and now we'll create two more for the two consumers. Right-click on Functions, open it in a new tab, create a function, give it a name — I'll call it consumer1 — use Node.js for it as well and the same role we just created, then create the function. Now one more, consumer2: same settings, use the existing role, and create it. All right, that's done as well.
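The role setup above is done entirely in the console. For reference only, a rough scripted equivalent with the Node.js AWS SDK v2 might look like the sketch below; the three managed policies mirror the video, while the role name (IAM names can't contain spaces, so it's hyphenated here) and the standalone script form are just illustrative assumptions.

```javascript
// Sketch: recreate the Lambda execution role built in the IAM console.
// Assumes AWS SDK v2 and credentials configured locally; role name is hypothetical.
const AWS = require('aws-sdk');
const iam = new AWS.IAM();

const roleName = 'data-streaming-system-role';

// Trust policy so Lambda can assume the role
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: { Service: 'lambda.amazonaws.com' },
    Action: 'sts:AssumeRole'
  }]
};

// Same three full-access managed policies attached in the video
const managedPolicies = [
  'arn:aws:iam::aws:policy/CloudWatchFullAccess',
  'arn:aws:iam::aws:policy/AmazonS3FullAccess',
  'arn:aws:iam::aws:policy/AmazonKinesisFullAccess'
];

async function createRole() {
  await iam.createRole({
    RoleName: roleName,
    AssumeRolePolicyDocument: JSON.stringify(trustPolicy)
  }).promise();

  for (const arn of managedPolicies) {
    await iam.attachRolePolicy({ RoleName: roleName, PolicyArn: arn }).promise();
  }
}

createRole().catch(console.error);
```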
Now we can move on to the next step, which is to create an S3 bucket to host our data source. Go back to the console, type in S3, open it in a new tab, and create a bucket. Give it a name — I'll just call it gen-meister-data-source — leave it in the us-east-1 region, enable versioning (why not), enable encryption, and create the bucket. Next we go into the bucket and configure a trigger, so that whenever we upload a file it triggers our producer Lambda. Go to Properties, scroll down to Event notifications, and create an event notification. Give it a name — I'll just call it data-upload. We won't set a prefix; for the suffix we restrict it to only .txt files. Under event types, check all object creation events, which covers uploading a new file or updating an existing one. Then choose Lambda function as the destination, pick the producer function, and save changes.

The next step is to create our Kinesis stream. Go back to the home page, type in Kinesis, open it in a new tab, and create a data stream. Give it a name — I'll just call it gen-meister-data-stream. For the number of shards, since we have two consumers I'm going to give it two shards so that they can be executed simultaneously, then create the data stream. Once it's active, we change one configuration: click on Configuration, and under Encryption enable server-side encryption with an AWS-managed key, then save changes.

Now we're ready to write the Lambda code for the producer and the consumers. Let's do the producer first. Go to index.js and delete everything inside the handler function. First things first, we import the AWS SDK with const AWS = require('aws-sdk'), then update the region under config to us-east-1, because that's the region where we created our Kinesis stream. Next we define an S3 client — we'll just call it s3, new AWS.S3() — and a Kinesis client as well.

Now let's define the handler. First we log the event to see what it looks like, and then we extract the bucket name and the key name from the event body. The event comes in as a list; we only take the first item because there's only one object inside, and from it we read the bucket name and the key name. Then we define the parameters to get the file from S3 — the bucket and the key — and use the S3 client to fetch it: getObject(params).promise().then(...). The callback has to be an async function as well, because inside it we're going to await another async function, so the async keyword is necessary. First we get the data string with data.Body.toString() — the body comes in as a buffer, and here we convert it into a string — and then we define a payload containing that data string. After constructing the payload object we're ready to send it: await sendToKinesis(payload, partitionKey) — we'll define this function in a moment, and the second argument is just the partition key we're going to use. If we hit any errors, we just log them in the catch.

Now we define the sendToKinesis function. It's an async function that takes the payload and the partition key. First we construct the params for sending to Kinesis: for Data we stringify the payload object — this is important, we have to stringify it because the Kinesis client expects a string for the Data parameter — the next one is the PartitionKey, and lastly we specify the StreamName, which we called gen-meister-data-stream. Then we can finally send the data to Kinesis with the client's putRecord(params).promise(); we just log the response, and if there's an error we log that as well. That's everything for the producer, so let's deploy it.
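Pieced together from the narration above, the producer handler would look roughly like this sketch. The bucket and stream names follow the transcript's rendering, so treat them as placeholders for your own resources, and the partition key value is arbitrary.

```javascript
// Sketch of the producer Lambda described in the video (Node.js, AWS SDK v2).
const AWS = require('aws-sdk');
AWS.config.update({ region: 'us-east-1' }); // region where the Kinesis stream lives

const s3 = new AWS.S3();
const kinesis = new AWS.Kinesis();

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2)); // inspect the S3 event shape

  // The S3 notification arrives as a list; a single upload produces one record.
  const bucketName = event.Records[0].s3.bucket.name;
  const keyName = event.Records[0].s3.object.key;

  const params = { Bucket: bucketName, Key: keyName };

  await s3.getObject(params).promise()
    .then(async (data) => {
      const dataString = data.Body.toString(); // Body is a Buffer
      const payload = { data: dataString };
      await sendToKinesis(payload, 'partition-key-1'); // partition key is arbitrary here
    })
    .catch((err) => console.error(err));
};

async function sendToKinesis(payload, partitionKey) {
  const params = {
    Data: JSON.stringify(payload),        // Kinesis expects a string or Buffer here
    PartitionKey: partitionKey,
    StreamName: 'gen-meister-data-stream' // placeholder; use your own stream's name
  };

  await kinesis.putRecord(params).promise()
    .then((response) => console.log(response))
    .catch((err) => console.error(err));
}
```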
Now let's write the code for the consumers. For the first one, click on index.js and delete everything inside the handler function. First things first, we log the event body to see what it looks like. Remember what we mentioned before: the event object comes in as a list. In the producer we only had one item in the list because we only upload one file at a time, but here a single event can carry multiple records, so we use a for loop over them. For each record we extract the data with JSON.parse — the data comes in as a buffer as well, base64-encoded, so we decode it and parse it into a JSON object. In here you can do all kinds of things with the data, such as sending emails to clients or publishing the data to social media, but for this demo I'm just going to log the data out. Save it, then do the same thing for consumer number two — just change the log label to consumer 2 — and deploy both.
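Again reconstructed from the narration, each consumer handler is essentially the sketch below; only the log label differs between consumer 1 and consumer 2.

```javascript
// Sketch of a consumer Lambda described in the video.
// Kinesis delivers record data base64-encoded, so each record is decoded and parsed back to JSON.
exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2)); // one event can contain multiple records

  for (const record of event.Records) {
    const data = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString());
    console.log('consumer 1 received:', data); // change the label to "consumer 2" in the second function

    // A real consumer would act on the data here:
    // e.g. email clients, post to social media, or write to a database.
  }
};
```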
What we need to do next is connect our Kinesis stream to our consumers. On consumer 1, click Add trigger, type in Kinesis, choose the Kinesis stream we just created, and hit Add. Do the same thing for consumer number two: Kinesis, then Add. Now both triggers are enabled, and we're ready to test our data streaming system.

What I'm going to do is go to VS Code, open a new folder, and create a test file, test.txt — remember we said it has to be a text file — and type something like: this is just a test file, just another line, write more stuff here, more data. Now let's upload this file to our S3 bucket and see how the entire system works out (a scripted version of this upload is sketched at the end of this section). Go back to the AWS console, open the S3 bucket, hit Upload, Add files, navigate to the file we just created, hit Open, and Upload.

Now let's go to the producer and see if it was triggered. Scroll up to Monitor, View logs in CloudWatch, and open the latest log stream. Looking at the logged object, this is the event that triggered the Lambda from our S3 bucket — there's the bucket name and the file name — and after it sends the data to Kinesis it returns a successful response. Now let's go to our consumers and see if they processed the data. Go to Monitor, View logs in CloudWatch: it got executed, and that's the event object from Kinesis, with the data encrypted and encoded. After we extract and decode the data and log it out, this is what we get — this is from consumer 1, and it's the exact data we wrote in the file. Consumer number two shows the same thing: there you go, consumer 2, and then the data.

One thing I forgot to mention before: if you go back to the producer Lambda and hit refresh, you should be able to see S3 configured as a trigger for the producer Lambda. And this is it, everyone. I hope you learned something, and if you liked this video, I hope you can give it a thumbs up. I'll see you in the next video.
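If you would rather script the test upload instead of clicking through the console, a minimal sketch with the same SDK could look like this; the bucket name and local file path are placeholders. Uploading any .txt object fires the whole pipeline.

```javascript
// Sketch: upload the test file to the source bucket, triggering producer and consumers.
const fs = require('fs');
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-1' });

s3.putObject({
  Bucket: 'gen-meister-data-source',   // placeholder; use your own bucket name
  Key: 'test.txt',
  Body: fs.readFileSync('./test.txt')  // the local test file created in VS Code
}).promise()
  .then(() => console.log('uploaded - check the producer and consumer logs in CloudWatch'))
  .catch(console.error);
```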
Info
Channel: Felix Yu
Views: 8,825
Keywords: aws, kinesis, streaming, data streaming, real time data streaming, lambda, s3, data producer, data consumer, automatic file processing
Id: We5Jr4GGLL0
Length: 17min 12sec (1032 seconds)
Published: Fri Apr 16 2021