AWS Lambda Trigger on S3 in Java | S3 Lambda Trigger in Java | Process CSV file in Lambda on S3 PUT

Captions
Welcome everyone to the next video in the Lambda tutorial series with Java. In this video I will cover Lambda invocation on an S3 PUT object. We will take a simple example: we'll upload a CSV file to S3, and that will generate an event which triggers our Lambda function.

The agenda for this video: first I'll discuss various use cases for this scenario; then general settings in Lambda, so that you don't waste time debugging issues; then how you can enable Lambda invocation on S3 PUT (there are two ways, one from the S3 console and one from the Lambda console, and I'll show you both in the demo); and finally the Java code to read the S3 event. We'll do live coding in this video, and you'll learn how to process a CSV file in Java.

Coming to the use cases: I feel you should be very comfortable with Lambda invocations on events in general, be it S3, SQS, or SNS, because this is a very important part of implementing event-driven architectures on AWS. The first example on screen is an application storing CSV data in DynamoDB. In this reference architecture, Lambda is invoked on the PUT object event; you can perform some cleansing and filtering of the data, and once that is done you store the records in the database. The second example is video transcoding, generally used to support multiple devices on streaming platforms like YouTube and Netflix. The process is the same: you upload a video to S3, Lambda is triggered by the PUT object event, and then AWS's in-house transcoding service converts the file to different resolutions. Here we capture three resolutions, and all three outputs are saved into different S3 buckets. This is a real-life example you might encounter in system design interviews. The last one is an ETL flow, where we upload our data, transform it as per our needs in the Lambda function itself, then save the data.
In that last ETL example the data is saved in Parquet format, and believe me, in your real-life projects, if you have huge data in CSV you should convert it to Parquet.

Now let's cover a few settings in Lambda. First, the bucket should be in the same region where you create the Lambda function. You might say that S3 is a global service, but remember that we do select a region when we create a bucket. Second, Lambda should have permission to read from S3: create a new role with the basic Lambda execution policy and S3 read-only access. The last one is more of an error-handling concern. Based on your use case you can use a DLQ (dead-letter queue) or custom destinations. By default, Lambda retries twice on asynchronous invocation. Let me show you: in the Lambda console, go to Configuration, then Asynchronous invocation. You can see the maximum age of the event and the retry attempts; currently we have two retry attempts, so if your Lambda fails it will retry twice, and after that you can configure the dead-letter queue. If I edit this, I have the option to select SNS or SQS, and if I select one I need to choose the topic or queue. This is a best practice you should follow in production-grade code, because it helps you know when your Lambda is failing so you can take the necessary action.

Next, this is an example request payload of an S3 PUT event. We get an array of records; for one object we get one entry. Inside, we have the S3 details: the bucket name and the ARN of the bucket. Note that we don't get the actual object, only details about the bucket and the object, including the object key. We will use the bucket name and the object key to fetch the actual object using the S3 SDK.

So the step-by-step process is: derive the bucket name and object key from the event object, as explained in the previous slide; then create an S3 client. For creating the S3 client you don't need an AWS access key and secret key inside Lambda.
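A trimmed S3 PUT event payload looks roughly like this; bucket name, key, and other values are illustrative, and several metadata fields are omitted:

```json
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "awsRegion": "eu-west-2",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "my-csv-bucket",
          "arn": "arn:aws:s3:::my-csv-bucket"
        },
        "object": {
          "key": "mock-data.csv",
          "size": 8912
        }
      }
    }
  ]
}
```

The handler code later in the video reads exactly the `s3.bucket.name` and `s3.object.key` fields from each record.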
You can use the default credentials provider chain to get the client. Once you have the client, you pass the bucket name and object key to the getObject method, which returns a stream of the object that you can process. I will demo all four steps when we code our Lambda in Java.

Let's start the demo. The first step is to create an S3 bucket. I'll click Create bucket, type a name (hopefully we get this bucket name), and choose the London region, eu-west-2. We got the bucket.

The next step is to create an IAM role for Lambda. I'll go to Roles, create a new role, choose the basic Lambda execution policy, and also S3 read-only access, because our Lambda will be reading the object from S3. Click Next, give it a name, and click Create role. Our role is created.

Now I can go to the Lambda console and create a new function. I'll name it lambda-s3-event, choose Java 11 as the runtime, and under execution role choose the existing role I just created. I'll click Create function.

Now, to invoke Lambda on the S3 PUT event I have two options: I can add the trigger here in the Lambda console, or I can go to my S3 bucket, open Properties, and use Event notifications (Create event notification). Both places are fine; even if I make the change from the S3 console, I should see it appear as a trigger in Lambda, and vice versa. Let me show you that. I'll click Add trigger and choose S3 as the service, then choose the bucket I created, then the event type; you can choose a particular event for which you want to invoke Lambda, and I will choose PUT. It can be the case that the bucket has folders and subfolders and you only want to invoke Lambda for files in a particular folder; then you can add that folder name as a prefix, but for now I don't.
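For reference, the trigger being configured here corresponds to an S3 bucket notification configuration roughly like the JSON below. The ARN, IDs, and filter values are placeholders; the demo deliberately leaves the filter out, but this shows what a prefix/suffix restriction would look like if you only wanted .csv files under an incoming/ folder:

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "my-put-notification",
      "LambdaFunctionArn": "arn:aws:lambda:eu-west-2:123456789012:function:lambda-s3-event",
      "Events": ["s3:ObjectCreated:Put"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "incoming/" },
            { "Name": "suffix", "Value": ".csv" }
          ]
        }
      }
    }
  ]
}
```

Whether you create this from the Lambda console or the S3 console, it is stored on the bucket, which is why the entry shows up in both places.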
I don't have any folders inside the bucket, so I will not choose any prefix here. That means Lambda will be invoked for every object I upload, even if I later create a new folder inside the bucket. The suffix is also very important to know: if you want to invoke Lambda only for .jpg or .csv files, you can add that suffix, and Lambda will be invoked only for that file format. I will not add anything here; I'll keep it as it is, tick the acknowledgement checkbox, and click Add.

Now you can see S3 on the source side. If you want the details, you can open the trigger and check that the event type is "object created by PUT", along with the bucket name and the notification name; I'll copy the name for reference. Now I'll go to S3 and show you that this event is already replicated there. Under Properties, in Event notifications, we do see one entry, because we already created it from the Lambda console. So you have two options: configure it from Lambda or from S3. From S3, you click Create event notification, give an event name, then again choose the prefix, the suffix, and the event type (I chose only PUT), and then the destination. In our case we are invoking a Lambda function, but you can also choose an SNS topic or an SQS queue. For a Lambda function you choose from the list, where we have lambda-s3-event. I won't save these changes because I've already added the event. Our Lambda is now created with S3 as a source.

The next thing is to actually write the code that processes the S3 event. I'll go to IntelliJ; I'll use the same workspace from my previous video (video 3). I'll create a new Java class and implement the RequestHandler interface, which asks me to override the handleRequest method, typed for the input and output I want.
I'll use S3Event as the input type and return a Boolean as the output, so I'll change the generic types; even if I delete the method and override it again, it resolves to the correct types automatically.

Now we'll check whether we are getting any records. I can write an if check calling input.getRecords().isEmpty(); if it's empty, I return false. One more thing I can do is instantiate a LambdaLogger and log a message before returning false. The next step: if we are getting records, we should process them. I'll write a for loop iterating over each S3EventNotificationRecord. First I'll get the bucket name. As I explained in the slides, each record contains an S3 entity, and inside that we get the bucket information and the object information; we need the bucket name and object key to actually fetch the object from S3. So we get the bucket name from record.getS3().getBucket().getName(), and the object key from record.getS3().getObject().getKey().

This event doesn't give us the actual object; it only gives us the information required to fetch it. To fetch the object we need to call the getObject method, and for that we need a dependency on the S3 SDK. I'll go to Maven Central and copy the dependency, paste it into the pom.xml, and reimport the Maven project. This adds the required dependencies, and you can see the S3 SDK among them.

Now we have the object key and bucket name, so the next steps are: create the S3 client, invoke getObject, and process the input stream from S3. For creating the client it's always best practice to do it outside the handleRequest method, because the client can be reused by the next warm invocation.
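The dependency pasted into the pom.xml is presumably the AWS SDK v1 S3 artifact (the version number here is illustrative; pick a current one from Maven Central):

```xml
<!-- AWS SDK for Java v1, S3 client used by getObject in the handler -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.12.170</version>
</dependency>
```

This sits alongside the aws-lambda-java-core and aws-lambda-java-events dependencies already in the workspace from the earlier videos.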
I'll create the client using the default credentials provider chain and call build(); our client is created, so the first step is done. Now we invoke the getObject method, which takes the bucket name and the object key (that's exactly why we extracted those two from the event) and returns an S3Object. From the S3Object we call getObjectContent(), and this content is an S3ObjectInputStream.

Now we have the input stream and we want to process it. We'll be uploading a CSV file, so I can just print its contents. For processing CSV files you can use many methods, such as OpenCSV or Apache Commons CSV, but I would not suggest using third-party libraries unless it's necessary: they add extra dependencies, and that can make Lambda cold starts worse. I can just use a BufferedReader here over the input stream. This throws a checked exception, so I catch it, log the error message with the logger, and return false in that case. With the BufferedReader object I call lines(); since I have a header row, I skip the first row, then call forEach and log every line of the CSV file with logger.log, appending a newline. Finally I change the return value at the end to true. I think that's everything we need from our Lambda.

Now we can package this and upload it to Lambda. I'll run mvn clean package and refresh; we have the target directory, and I'll open it in Explorer. You can see our jar file is close to 8 MB, which is a huge size. I can go to the Lambda console and upload the jar file in the Code tab; it's uploaded.
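The CSV-reading part of the handler can be isolated into a small, plain-Java helper for clarity. The class and method names below are my own; inside the Lambda you would pass the S3ObjectInputStream instead of a ByteArrayInputStream and send each line to logger.log rather than System.out:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class CsvLineReader {

    // Collects all data rows from a CSV stream, skipping the header row,
    // mirroring the BufferedReader.lines().skip(1).forEach(...) pattern
    // used in the video.
    public static List<String> readDataRows(InputStream in) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines()
                .skip(1)   // skip the header row
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,Alice\n2,Bob\n";
        InputStream in = new ByteArrayInputStream(
                csv.getBytes(StandardCharsets.UTF_8));
        // In the handler each of these lines goes to logger.log(line + "\n")
        readDataRows(in).forEach(System.out::println);
    }
}
```

Keeping the parsing in plain java.io avoids the extra dependency (and cold-start cost) of OpenCSV or Apache Commons CSV for a simple line-by-line job like this.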
The next step is to upload a CSV file to S3. For creating a CSV file you can use Mockaroo to generate random CSV data. I'll keep these four fields, generate, say, 50 rows, and download the data. I'll save it, go to S3, and click Upload; the file is uploaded, so our event should be triggered. To confirm that, go to Monitor and click "View logs in CloudWatch"; if our Lambda was triggered, we should see a log stream here.

I think we forgot one thing: we should have changed the default handler in Runtime settings, because ours is different. I'll set it to the package name, then my class name, then the method name handleRequest, which I think is correct, and click Save. We do have one log entry, and it says "class not found". That means Lambda was invoked, but it looked for the default class Hello in the jar file and got a ClassNotFoundException. I'll go back, delete the file from S3, and upload it again. In CloudWatch, after a refresh, the invocation has started, and you can see our Lambda printed all 50 rows, because we generated 50 rows of data.

Now, if you want to replicate the retry scenario I explained during the theory part: go to Configuration and click Asynchronous invocation, and you'll see the retry attempts set to two. To actually see these retries happening, I can go to my code and, instead of processing the object, throw an exception: I'll comment out the processing code and throw a RuntimeException. I'll follow the same process, package it, go to Code in the console, and upload the jar file; it's uploaded successfully. I'll also go to CloudWatch and clear the logs, because I only want to see the failure logs.
To invoke the Lambda, I'll upload the file again to trigger it. After a refresh (a cold start is happening), we have the logs, and you can see the RuntimeException printed: "something wrong happened" (there's a typo in our message, but the failure itself is what matters). This is the first failure. Now it should retry a couple of times, not immediately, but within the next few minutes. The first failure happened at 22:18 and the first retry at 22:19, so almost one minute later; this is our second failure. Then the third and last attempt happened at 22:21, almost two minutes after that. So the first retry comes after about one minute and the second after about two minutes. That's how Lambda retries asynchronous invocations.

That's all I wanted to cover in this video. Thanks for watching, guys; you can subscribe to my channel to watch more such content on AWS. Thank you, bye!
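Putting the pieces from the video together, the finished handler looks roughly like this. It is a sketch, not a verbatim copy of the code written on screen: it assumes AWS SDK for Java v1 plus the aws-lambda-java-events library (whose S3 event model package path differs between library versions), the class name matches the handler string com.example.LambdaS3Event::handleRequest only as a placeholder, and error handling is minimal:

```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.lambda.runtime.events.models.s3.S3EventNotification.S3EventNotificationRecord;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class LambdaS3Event implements RequestHandler<S3Event, Boolean> {

    // Created outside handleRequest so warm invocations reuse the client;
    // no access key/secret key needed, the default chain picks up the
    // Lambda execution role's credentials.
    private static final AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(new DefaultAWSCredentialsProviderChain())
            .build();

    @Override
    public Boolean handleRequest(S3Event input, Context context) {
        LambdaLogger logger = context.getLogger();
        if (input.getRecords().isEmpty()) {
            logger.log("No records found in the S3 event\n");
            return false;
        }
        for (S3EventNotificationRecord record : input.getRecords()) {
            // The event only carries metadata; fetch the object itself
            String bucketName = record.getS3().getBucket().getName();
            String objectKey = record.getS3().getObject().getKey();
            try (S3Object object = s3.getObject(bucketName, objectKey);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(object.getObjectContent()))) {
                reader.lines()
                      .skip(1)  // skip the CSV header row
                      .forEach(line -> logger.log(line + "\n"));
            } catch (Exception e) {
                logger.log("Failed to process " + objectKey + ": "
                        + e.getMessage() + "\n");
                return false;
            }
        }
        return true;
    }
}
```

Returning false (or letting an exception escape, as in the retry demo) is what causes the asynchronous retries and, eventually, delivery to the configured DLQ or destination.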
Info
Channel: Ajay Wadhara
Views: 17,218
Keywords: S3 Lambda Event Example, S3 Lambda Trigger Example, Process CSV with Lambda Java, AWS Lambda in Java
Id: 3oV4Nj_ruOA
Length: 23min 52sec (1432 seconds)
Published: Tue Feb 15 2022