VPC FLOW LOGS | WHAT IS AGGREGATE INTERVAL | Visual Explanations

Video Statistics and Information

Captions
So we all know that log files have always been a savior when it comes to debugging issues with the services and applications we run. Do we have something in VPC that can help us with this? Yes, we do: let's talk about VPC Flow Logs and understand how you can consume logs while working with your own Virtual Private Cloud. If you're ready, let's begin. In today's episode on flow logs we will cover what VPC Flow Logs are and how they work, the logging format, what a flow log record is, what the aggregation interval is, and then a short hands-on demo. All the links are in the description below, so please make sure you check them out.

We have always used logs for various purposes, whether watching over console activity, internet traffic, REST API calls, or even terminal output, to understand exactly what is going on when we perform a certain activity. Working with logs in a VPC is no different: you will always want to keep track of the internet traffic coming into or going out of the VPC, and for that we use VPC Flow Logs.

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from the network interfaces in your VPC. Read that carefully: being a feature means you have the option to switch it on or off for a resource you are currently using, which is why the documentation rightly calls it a feature that enables you to capture information in the form of logs. You can publish the flow log data to Amazon CloudWatch Logs or to Amazon S3, so if you wish to see the logs you go to either of those services and view them there. There are many benefits to using flow logs, but three are actively mentioned in the documentation as well.
The first point is monitoring the traffic that is reaching your instance. This is helpful because you can review the incoming requests and analyze or change the application depending on the type of logs you receive. The second is also very useful: diagnosing overly restrictive security group rules. If there are connectivity issues with an instance or a service you are trying to access, flow logs can help you figure out the problem. The third is determining the direction of the traffic to and from the network interfaces. As you will see when we skim the log format, each record contains information about the source and the destination of the traffic, and that also helps with debugging.

Let's move on and understand how we can use VPC Flow Logs. You can create a flow log for three entities: the VPC itself, a subnet, or a network interface. If you enable it for a subnet, all the instances and interfaces within that subnet will be monitored as well. Neat, isn't it? Any traffic event that occurs on these entities generates an entry containing information about what exactly happened; that entry is a log, and the flow log data for a monitored network interface is recorded as flow log records. When you publish logs, you must keep three steps in mind: first, the resource for which to create the flow log (a network interface, a subnet, or a VPC); second, the type of traffic to capture (accepted traffic, rejected traffic, or all traffic); and third, the destination to which to publish the flow log data, either a file in S3 or a CloudWatch Logs log group.
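The three steps just described (resource, traffic type, destination) map directly onto the EC2 `CreateFlowLogs` API. Here is a minimal sketch assuming boto3; the helper name is my own, and the resource ID and bucket ARN are made-up placeholders:

```python
def build_flow_log_request(resource_id, resource_type="VPC",
                           traffic_type="ALL",
                           destination_type="s3",
                           destination_arn=None,
                           max_aggregation_interval=600):
    """Build the keyword arguments for the EC2 CreateFlowLogs call.

    resource_type:            "VPC", "Subnet", or "NetworkInterface"
    traffic_type:             "ACCEPT", "REJECT", or "ALL"
    destination_type:         "s3" or "cloud-watch-logs"
    max_aggregation_interval: 600 (10 minutes, the default) or 60 (1 minute)
    """
    if traffic_type not in ("ACCEPT", "REJECT", "ALL"):
        raise ValueError("traffic type must be ACCEPT, REJECT, or ALL")
    if max_aggregation_interval not in (60, 600):
        raise ValueError("aggregation interval must be 60 or 600 seconds")
    return {
        "ResourceIds": [resource_id],
        "ResourceType": resource_type,
        "TrafficType": traffic_type,
        "LogDestinationType": destination_type,
        "LogDestination": destination_arn,
        "MaxAggregationInterval": max_aggregation_interval,
    }

# With boto3 installed and AWS credentials configured, the actual call would be:
# import boto3
# ec2 = boto3.client("ec2")
# ec2.create_flow_logs(**build_flow_log_request(
#     "vpc-0abc123", destination_arn="arn:aws:s3:::my-flow-log-bucket"))
```

Keeping the request-building separate from the API call makes the three choices easy to see and validate before anything touches AWS.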
If you look at the visual, Subnet A has flow logs enabled only for one instance's network interface, so it publishes logs only for that interface. On the right-hand side, the flow log has been enabled for the whole subnet, and as I have already mentioned, this covers all the instances (3 and 4) and network interfaces that are part of the subnet. Unfortunately, since nothing is enabled for instance 2 in Subnet A, it misses out on the logs. Simple, isn't it?

We can also create flow logs for the network interfaces created by other services: Elastic Load Balancing, Amazon RDS, Amazon ElastiCache, Amazon Redshift, Amazon WorkSpaces, NAT gateways, transit gateways, and many more. That gives us a lot of scope to enable VPC Flow Logs and monitor these systems, and you can send the data to CloudWatch or Amazon S3; that's one added advantage, isn't it?

Now that we have seen how to create flow logs, let's look at the log format. The syntax of a flow log record contains a lot of information, and I won't discuss every field, but here are a few important ones. The first is version, the VPC flow log version. The default format uses version 2, and the recorded version is the highest version among the fields you specify: if you specify only version 2 fields, the version is 2, and if you specify a mixture of fields from versions 2, 3, and 4, it picks the highest number, so the record's version will be 4.
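The highest-version-wins rule can be sketched as a small lookup. The field-to-version mapping below is an illustrative subset of the registry, not the full list of fields:

```python
# Version in which each custom-format field was introduced (illustrative subset).
FIELD_VERSIONS = {
    "srcaddr": 2, "dstaddr": 2, "action": 2, "log-status": 2,
    "vpc-id": 3, "subnet-id": 3, "pkt-srcaddr": 3,
    "region": 4, "az-id": 4,
    "flow-direction": 5, "traffic-path": 5,
}

def record_version(fields):
    """A record's version is the highest version among its selected fields."""
    return max(FIELD_VERSIONS[f] for f in fields)

# Mixing a v2 field, a v3 field, and a v4 field yields a version-4 record:
print(record_version(["srcaddr", "vpc-id", "region"]))  # 4
```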
The next one is account-id, which is evident from the name itself: the AWS account ID of the owner of the source network interface for which the traffic is recorded. Then interface-id: every network interface has an interface ID, and it is recorded as part of the traffic. The source address (srcaddr) is the source of the incoming traffic, which will be the private IPv4 address, and the destination address (dstaddr) is the destination of the outgoing traffic, also a private IPv4 address. The source port (srcport) and destination port (dstport) are, evidently, the ports of the traffic.

Next is protocol, the IANA protocol number of the traffic. IANA is the Internet Assigned Numbers Authority, the organization responsible for maintaining the registries of protocol numbers. Then action, which is very important: the action associated with the traffic can be ACCEPT or REJECT. ACCEPT means the recorded traffic was permitted by the security groups and network ACLs; REJECT means it was not. So from the logs themselves you can identify whether there is a problem with the permissions in your network ACLs or security groups.

The last one here is log-status, which is also very important: the status of the flow log. There are three statuses: OK, NODATA, and SKIPDATA. OK means data is logging normally to the chosen destination, so there is no problem there. NODATA means there was no traffic to or from the network interface during the aggregation interval (more on that interval in a moment).
If for a specific period of time no data is captured, NODATA simply tells you there was no network traffic to or from the network interfaces in that window. Then there is SKIPDATA: some flow log records were skipped during the capture window, what we call the aggregation interval, due to an internal error. All of this is very valuable information that you get from the log format when using VPC Flow Logs.

After all this you might still have some doubts: yes, we send logs, but what is the frequency, and are they delivered in real time? Let's discuss that. When you read the line that says a flow log record represents a network flow in your VPC, it means the record captures information about the IP traffic flow, in other words the flow of packets carrying information from the source interface or instance to the destination. One more important thing I forgot to mention: you might worry that enabling flow logs impacts performance or latency, but it does not, because flow log collection runs outside the path of the network traffic itself; you can create flow logs without worrying about any impact on performance.

The time interval in which the traffic flow is captured is called a capture window, or more precisely an aggregation interval. Remember the meaning of aggregate very carefully: an entity formed by a combination or collection of things. Combine interval with aggregate and it means a combination of time intervals. Picture a time frame in which each cell is a one-minute interval.
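To make the field walkthrough concrete, here is a sketch that splits a default-format (version 2) record into named fields; the sample record itself is made up:

```python
# Field order of the default (version 2) flow log format.
V2_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

def parse_record(line):
    """Parse one space-separated default-format record into a dict."""
    values = line.split()
    if len(values) != len(V2_FIELDS):
        raise ValueError(f"expected {len(V2_FIELDS)} fields, got {len(values)}")
    return dict(zip(V2_FIELDS, values))

# A made-up sample record in the default format:
sample = ("2 123456789012 eni-0a1b2c3d 203.0.113.12 172.31.16.139 "
          "443 49761 6 20 4249 1418530010 1418530070 REJECT OK")
rec = parse_record(sample)
print(rec["action"], rec["protocol"], rec["log-status"])  # REJECT 6 OK
```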
As rightly stated, the aggregation interval is the period of time during which a particular flow is captured and aggregated into a flow log record, and by default the maximum aggregation interval is 10 minutes. That is the number to remember for VPC Flow Logs: the maximum aggregation interval is 10 minutes. Your aggregation interval is basically a collection of time frames in which your logs are captured, which is why it is also called a capture window. As I already said, it is made of smaller time frames called sampling intervals; a sampling interval is the time between successive measurements or data recordings. So S1 here has a sampling interval of one minute, and A1 has an aggregation interval of five minutes: it collects all the log records in a capture window consisting of five sampling intervals. Similarly, S2 has a sampling interval of 5 minutes and an aggregation interval of 10 minutes, giving a log capture window of exactly 10 minutes.

The flow log works on the principle of this capture window, and it produces more flow log records if the maximum aggregation interval is reduced: if I reduce it from 10 minutes to 1 minute, it is going to generate a huge number of log records. For the Nitro-based instances, the aggregation interval is always 1 minute or less, regardless of the setting. And the most important reason flow logs are not generated in real time is that once the data is received, it takes time to process it and push it to S3 or CloudWatch, so don't expect your results to appear on screen the instant you make a change. I hope that was clear.
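The effect of shrinking the aggregation interval on log volume is simple arithmetic. This sketch (the helper name is my own) shows the upper bound on records per hour for steady flows, since each active flow can produce at most one record per aggregation interval:

```python
def max_records_per_hour(n_flows, aggregation_interval_s):
    """Upper bound on flow log records per hour: each active flow can
    produce at most one record per aggregation interval."""
    return n_flows * (3600 // aggregation_interval_s)

# One steady flow: default 10-minute interval vs. 1-minute interval.
print(max_records_per_hour(1, 600))  # 6
print(max_records_per_hour(1, 60))   # 60
```

Dropping from 600 seconds to 60 seconds multiplies the worst-case record count by ten, which is exactly the "huge amount of log records" mentioned above.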
If you have some doubts, please put them in the comment section below and we can have a discussion on that as well. I know theoretical concepts can be boring sometimes, so let's do a small hands-on demo as well.

Okay, so here we have the VPC console. To enable flow logs, just click on the VPC you want, and you get the tabs you can see here: Details, CIDRs, Flow logs, and Tags. Come to Flow logs; we don't have any flow logs enabled as of now, so just click Create flow log, and this is the form you get. Give it a name, like my-flow-log-demo. As I already told you, you get three ways to capture the traffic: Accept, Reject, or All, where All combines both accept and reject; we can just choose All. The maximum aggregation interval is set to 10 minutes; I already told you what that means, so if you want to listen to that once again, please rewind the video. You can have it at either 10 minutes or 1 minute, and I'll just keep it at 1 minute for now, to capture the logs as fast as I can.

Then we have the destination to which to publish the flow log; I told you before, that is the third step. I can send the logs either to CloudWatch Logs or to an Amazon S3 bucket, so first let's create one for Amazon CloudWatch. For the destination log group I can use one that is already there, or I can create a new log group. To create a log group, go to CloudWatch; this is basically how your CloudWatch console looks, with the dashboard, the alarms, the logs, and the log groups. You need to go to Log groups.
I already have some default log groups that were created when I was working with Lambda, so don't worry about those. You can just create a new log group: I'll call it my-demo-flow-log, set retention to Never expire, skip the other settings, and create it. So this is my log group, and this is its ARN. Now go back and refresh, and you can see the option is already there; it's automatic, isn't it? Very good.

One more thing you need to do here: to access CloudWatch, the flow log needs an IAM role with the right permissions. If you don't have one, you can just click Set up permissions, choose to create a new IAM role, and give it a role name, say flow-logs-role; I don't think you need to change anything else, so just click Allow. That's it: one click, request sent and response received. Come back, refresh, and you will now have flow-logs-role, so select it. The log format can be either the AWS default format or a custom format, where you choose the fields yourself from the filters; we have already discussed what those fields do, so you can change it the way you want, but we will choose the default one. The tag is already given, so just click Create flow log. Now we have created a flow log, and that one is using CloudWatch Logs.

In the same way, I can create another flow log, my-demo-flow-log-s3; this will be my S3 flow log. I can just choose All, set the maximum aggregation interval to 1 minute, and send it to an S3 bucket. It is asking me for an Amazon Resource Name, so let's go back to S3. I already have a lot of buckets, but I can create one more, not a problem: this will be my log bucket. I'll choose the same region and leave the rest as is; I don't have to change any of the other settings.
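For reference, when you point a flow log at an S3 bucket, AWS attaches a bucket policy that lets the log delivery service write objects into the bucket. A sketch of roughly what that policy looks like, with a made-up bucket name and account ID:

```python
import json

# Made-up identifiers for illustration only.
BUCKET_ARN = "arn:aws:s3:::my-log-bucket-2020"
ACCOUNT_ID = "123456789012"

# Roughly the policy attached for flow log delivery to S3: the delivery
# service may put objects under AWSLogs/<account-id>/ and read the bucket ACL.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSLogDeliveryWrite",
            "Effect": "Allow",
            "Principal": {"Service": "delivery.logs.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"{BUCKET_ARN}/AWSLogs/{ACCOUNT_ID}/*",
            "Condition": {
                "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
            },
        },
        {
            "Sid": "AWSLogDeliveryAclCheck",
            "Effect": "Allow",
            "Principal": {"Service": "delivery.logs.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": BUCKET_ARN,
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

The console normally creates this for you, which is why the demo only needs the bucket ARN; knowing the shape helps when delivery silently fails due to a missing or edited policy.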
Just give the bucket a name and save it. Okay, my-log-bucket already exists; let's give it something else, not a problem, but bucket names have to be globally unique, so you have to remember that or creation will keep failing. Now copy the bucket's Amazon Resource Name, go back to your VPC console, and paste it. I don't think we need to do anything else, so just click Create flow log. It's a very good thing that we have two flow logs now.

The bucket has zero objects right now, but don't worry about it: within a few minutes, or even seconds, you will be getting logs. We can go back to CloudWatch as well, go to the log groups, click on our group, and we have already started receiving logs; that's great, isn't it? Click on one of them and you can see we are receiving REJECT records, one after another. Don't get confused about what I just did: I came to CloudWatch, clicked Log groups, clicked the log group I had created, clicked on it again, and here you get the log streams. A log stream is created for each capture window, that is, each aggregation interval, that has data; here it has created three streams, and you can click on each log stream to see the logs.

These are the events being captured, and you can see this one is a REJECT call. The first value you see here is the version number, which is 2 by default; then my account ID; then my interface ID; then the source IP; then the destination IP it is trying to reach. The source is not mine, so I think someone else is trying to use it; we will look at that as well. Then the source port, then the target port, and then the protocol, 6, which I think is TCP.
Indeed, in the IANA registry 6 is TCP (Transmission Control Protocol), 7 is CBT, 8 is EGP, 9 is IGP, and so on. Going back to the record, you then have the start time, the end time, and the type of the request: REJECT. I don't see any ACCEPT requests yet. But is there a way to check where these requests are coming from? That will be very interesting for us. Are there any sites that can help? Oh yes, there are IP address trackers: I can just copy the source IP address I have here and paste it in. So someone is trying to access one of our interfaces from China; interesting, isn't it? Let's take the source address from another record and paste it here: this one is from London. Another is from Russia, region unknown, city Saint Petersburg. Obviously, once you have a public IP address, people will try to access it.

If I want to do the same for my own instance, I can go to EC2, go to my instances, pick my public instance, go to the Networking section, and open the ENI, the network interface. Since we have already created the VPC-level flow log, this interface is already being tracked; you can see that here. But if you wanted to create a flow log for this specific ENI only, you can do that here as well: you don't necessarily have to go to the VPC and capture the data of all subnets and all instances; you can go to the interface ID and create it there. That's one more thing I wanted to share.

Once you go back to the S3 console and refresh, yes, we have the logs now: the AWSLogs prefix, the account ID, vpcflowlogs, the Region name, 2020/11/16, and the log .gz file. I can go ahead, download it, and save the file.
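Translating those protocol numbers by hand gets old quickly; a tiny lookup (only a few registry entries shown, not the whole IANA list) does it for you:

```python
# A few entries from the IANA protocol numbers registry.
IANA_PROTOCOLS = {1: "ICMP", 6: "TCP", 7: "CBT", 8: "EGP", 9: "IGP", 17: "UDP"}

def protocol_name(number):
    """Translate a flow log protocol number into a readable name."""
    return IANA_PROTOCOLS.get(number, f"protocol-{number}")

print(protocol_name(6))   # TCP
print(protocol_name(17))  # UDP
```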
This is the file we downloaded, and these are the logs that were generated; it's the same kind of data, though here most of the requests are ACCEPTs, which is a good thing for us. You can also parse these files using Athena, which we have already seen: go back to the Athena tutorials to see how we accessed these logs from S3 and wrote SQL scripts to query and parse the log files. So now we have seen the two destinations, or the two ways to do it: one is CloudWatch and the other is S3. You can use whichever you want, or both at once, sending the logs to S3 and to CloudWatch, and have fun looking at the logs, tracing out errors, and debugging.

Thanks everyone for joining in for this session on VPC Flow Logs. If you have any doubts, please put them in the comment section below, and if you wish to support me, the links to Instamojo, PayPal, and Patreon are in the description as well. Until next time, it's Pythoholic, signing off. [Music]
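Before reaching for Athena, you can sanity-check a downloaded .gz log file with plain Python. This self-contained sketch builds a fake gzipped log in memory (the records are made up) and tallies ACCEPT versus REJECT actions:

```python
import gzip
import io
from collections import Counter

# Two made-up records in the default format (version, account-id,
# interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets,
# bytes, start, end, action, log-status).
sample = (
    "2 123456789012 eni-0a1b2c3d 198.51.100.7 172.31.16.139 "
    "34234 22 6 10 840 1605500000 1605500060 REJECT OK\n"
    "2 123456789012 eni-0a1b2c3d 172.31.16.139 203.0.113.9 "
    "443 49761 6 25 5200 1605500000 1605500060 ACCEPT OK\n"
)

def count_actions(gz_bytes):
    """Count ACCEPT vs. REJECT records in a flow log .gz file's bytes."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), "rt") as fh:
        for line in fh:
            fields = line.split()
            # Skip blank lines and the header line S3 files may contain.
            if fields and fields[0] == "2":
                counts[fields[-2]] += 1  # action is the next-to-last field
    return counts

gz = gzip.compress(sample.encode())
counts = count_actions(gz)
print(counts["ACCEPT"], counts["REJECT"])  # 1 1
```

The same loop works on a real file downloaded from the bucket: read its bytes and pass them to `count_actions`.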
Info
Channel: Pythoholic
Views: 5,383
Keywords: RoadToAWS, SAA-C02, Pythoholic, amazon web services, pythoholic, awssolutionsarchitectassociate2020, what is flow logs, vpc flow logs, how to investigate network issues in vpc, troubleshoot network issues in vpc, diagnose overly restrictive security group rules, security tool to monitor the traffic, vpc flow log records, network flow in your flow log, aws cloud, aws vpc, aws vpc endpoint s3, aggregate interval
Id: TzwrVF-45t0
Length: 22min 46sec (1366 seconds)
Published: Tue Nov 17 2020