we are so-ho-ho back | TryHackMe Advent of Cyber 2023 Day 2 [Python + Jupyter Notebooks]

Video Statistics and Information

Captions
Well, ho ho hacky ho, hackers and gentle hackers, welcome. My name is HuskyHacks, and I look forward to this all year, and we finally made it: it's TryHackMe Advent of Cyber 2023. 'Tis the season, it's Hackmas, everybody. We made it, so let's have some fun today.

So what is on the docket for today? Well, it's day two, so we're going to have a nice, easy, gentle introduction to a wonderful little tool that I like to call Python, and specifically Jupyter Notebooks for data analysis. And what a skill to be leading off Advent of Cyber 2023 with: if you can get good with Python, Jupyter Notebooks, and data analysis, you are unstoppable when it comes to cyber security. So let's not waste any more time. Let's dive right in and get rolling.

All right, so when we get to Task 8, which is day two of the Advent of Cyber for this year, we've got a little bit of our story exposition. "After yesterday's resounding success, McHoneyBell walks into the AntarctiCrafts office with a gleaming smile. She takes out her company-issued laptop from her knapsack and decides to check the news. 'Traffic on the North 15 highway? Glad I skied into work today,' she boasts." There's a little bit of exposition here, and then she gets a message from Holly chat, and it's another task. It reads: "The B team has been tasked with understanding the network of AntarctiCrafts' South Pole site." Taking a minute to think about this task, McHoneyBell realizes that AntarctiCrafts has no fancy technology that captures events on the network. "No tech? No problem!" exclaims McHoneyBell. She opens up her Python terminal.

Oh baby, I love Python. I love Python because it is simple, it is approachable, and yet it is so powerful in so many different contexts. If you can get good with Python as a cyber security practitioner, a data analyst, or in any number of other ways you can use this technology, you are going to be nigh unstoppable. So today we're going to use Python. If you've never worked with the language before, it's a very clean, approachable, easy language to
work with. But we're specifically going to be using a really awesome tool that has Python at its core, and that tool is called Jupyter Notebooks.

Before we get any further: I have just started up the machine that we're using today, so go ahead and click Start Machine if you have not, and then you just have to wait for it to spin up. While it spins up, let's read a little bit more.

Data science 101. The core element of data science is interpreting data to answer questions. Fantastic. Data science often involves programming, statistics, and, recently, the use of artificial intelligence to examine large amounts of data, understand trends and patterns, and help businesses make predictions that lead to informed decisions. Okay, fantastic. So we are scientists, and we're using data, the information that's all around us, and we're going to harness that. The task talks about the different concepts of data science here: there's a lot of cleaning data and organizing data, and then drawing insights out of that data. Go ahead and read through that if you want.

All right, so that's all well and good for data science in the abstract, but what about data science specifically related to cyber security? That is where we get things like the SIEM, which is the correlation of lots of events. As things happen in a network or on a computer, they create logs, they create events. Grabbing those events, correlating them in one central location, and then being able to read through all of them is a really powerful way to extract a lot of insight about what's going on in a network. Threat trend analysis and predictive analysis, too: all of these are applications of data science as they relate to cyber security.

And now we're going to pick up the principal tool for this lab, which is Jupyter Notebooks. At its core, Jupyter Notebooks uses Python.
Let's say we had our terminal right here. If we run python3, we open up the Python interpreter, and we now have the ability to give Python some syntax and have it do things based on that syntax. So if we print "hello AOC", the Python interpreter takes that text and says, "Oh, they want me to print hello AOC, so that's what I'll do." Right here we're using Python in the terminal, so this is one way to interact with Python.

There are other ways to interact with Python as well. One is IPython, the interactive Python interpreter, which is actually very similar to what we'll be using today. We can do the same kind of thing here: print, and then "hello AOC", and it's the same deal. We're using the interactive interpreter to give Python some syntax and have it do something on our behalf.

But in the regular Python terminal and in IPython, we're missing something. What are we missing? We're missing the context about why this code is running. Let's say that your friend has developed a Python script for you, and it's extremely useful, but they hand it to you without any other explanation. You can read through the code a little bit and reason about what it's doing, but there's a lot of context missing. Jupyter Notebooks is one solution for providing the context for why this code is running and what it's actually doing.

Now, if you have not already, go ahead and go to the main notebooks section here, and we're going to open up this first one, Intro to Python. You can click on this ipynb notebook and you'll get the notebook in the side here. I'm just going to collapse this terminal so we can see this more easily. What you see in front of you is an interactive Python notebook (ipynb).
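As a minimal sketch of the terminal example described in the captions above (the variable name `greeting` is an illustrative assumption, not something shown in the video), the same lines behave identically in the plain Python interpreter, in IPython, and in a notebook code cell:

```python
# The text to print; "hello AOC" is the string spoken in the captions.
greeting = "hello AOC"

# print() writes the text to standard output, whichever front end runs it.
print(greeting)
```

What changes between the three front ends is not the code but how much explanation can sit next to it.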
Specifically, the important thing to notice here is that we've got some readable text. We've got "Introduction: Python is an extremely versatile, high-level programming language", some examples, and so on. Notice how this is all just nice, readable text. Then we can go down to another cell in this notebook, and we have Python that we can actually run.

Before you do anything else, go up to the Kernel tab here and choose "Restart Kernel and Clear Outputs of All Cells". It's going to ask whether you want to do that: yes. The reason we do that is that the last time somebody ran this notebook, the output from everything they ran was left there, so we want to clear that output and get a nice clean slate to work with.

Let's talk a little bit about how to control what's going on inside these notebooks. This is all good, but how do we actually interact with it? That is the key to Jupyter Notebooks: it is interactive. In contrast, what's going on up here in the Hello World block is all Markdown syntax. If you double-click on it, you can see the Markdown that renders all of the text for us. We can even add to this. If we want to write some Markdown, I can put another title block here and say "New Markdown", and I can write "Hello everyone, happy Hackmas". If I hold Control and hit Enter, I run this Markdown, and I've got a new section that says "New Markdown: Hello everyone, happy Hackmas".

So what is the point of having all this Markdown here? It's to document our code. We're going to offer explanations alongside the code that we're running, and in this case we are explaining that we are about to run Hello World and print it out using Python. The syntax in this case is print, followed by a pair of parentheses, with whatever you want to print in double quotation marks inside of
that. So we select this cell, this little block of Python code, hold Control and hit Enter, and it says "Hello World". Look at that: from within our interactive Python notebook, our Jupyter notebook, we have printed out "Hello World" by running Python code. I think you're starting to see the board here about what's going on. Instead of just handing somebody a script, or just having a block of text explaining what is going on, you have both in a Jupyter notebook: interactive Python code, and explanations in Markdown around it.

Now, this is a gentle introduction to what's going on inside of Python, so we're talking a little bit about variables. If we click on this cell right here, we see that we're defining a couple of variables. We also have these lines right here, which are comments in our code. You know they're comments because in Python they start with this pound sign, the hashtag, and anything on the line after this symbol is not going to be interpreted by the Python kernel. It gets passed right over, and that's just more context we can use to explain what the code is doing.

In this case we are defining three variables. We're defining an integer, and we know that because we're giving it a number with no quotation marks or anything like that. We're defining a string, and we know that because we are giving it a value wrapped in double quotes. And we're also giving it what's known as a Boolean value, which is either true or false. In computational theory, things can be either true or false, and we represent that by passing in the capital-T True or the capital-F False value. Those are keywords in Python.

I want to point something out: how does the interpreter know what kind of data type each of these is? I'll let you think about that for a second. So we're giving it an age and
we're saying, hey, this is a number, and it knows that this is an integer, a number value. We're giving it a name wrapped in double quotation marks and saying this is a string value, a string of characters. And then we're giving it a Boolean, true or false. Now, the answer: the Python interpreter is smart enough to make a type inference. It says, "Hey, you don't even need to tell me that 23 is a number. I know that just by virtue of what it is." It uses that type inference to determine the data type. So if we hit Control and Enter here, we have now assigned each of these variables its corresponding value.

Now we go down here and read on. Again, we've got nice, clean, documented code: "Variables are essential in programming." For now we're just going to use the Python print function. Yes, you can use your variables in many different ways in your program, but if we run this next cell, we get name Ben, age is 23, and is left-handed is True.

Another thing we can do, and remember this is all within the Jupyter notebook, so we can edit or change the code as we see fit: if we're unsure what type our variable is at any given point, we can use the type function. We say print out the type of name, and if I hit Control and Enter one more time, I now have a new printed line that says str, which stands for string. That's good for name, but what about age? I can run that again, and now I have an integer. So remember, Python is smart enough to know, based on the actual shape of the data that you give your variables, what type of data it is. Sometimes that can be very important, and sometimes it's going to shoot you in the foot, so just be careful about your data types in Python.

We can also reassign a variable: we can redefine it with a new value. Let's say that we had Ben
right here, who is age 23 and is left-handed True. Now we can run a new block, a new cell of Python code in our notebook, to change the age variable to 24, and the rest is going to remain the same. When we run that, we now have age is 24.

Notice that I have to run these in order. You can tell the order by the little bracket right here: bracket five means this was the fifth cell that was executed, and correspondingly the latest one that I just ran is bracket six, and we have age is 24. But what would happen if I went back up here to bracket number two, held Control, and hit Enter? That cell has now run as the seventh cell, and it has redefined all of the variables. Instead of 24, age has been set back to 23. I know that because if I go back down here, and let's say the age was not being redefined, so I comment that line out right there, and I rerun this code, we now have age is 23 again.

If you're not following, the point of all of this is that in a Jupyter notebook, Python cells take effect in the order you run them, and you can go back and rerun earlier cells, which may change what the variable values are at any given point in the program.

If you want to clean the slate completely and start over from scratch, a handy little key binding is that hitting zero twice on your keyboard brings up "Restart the kernel? Do you want to restart this? All variables will be lost." That's basically the same as quitting out of Python and then re-entering Python. Now we don't have any variable definitions for this eighth block that we've run, so if I run it again, it says, "Hey, I don't know what any of these are." That's because they have not been defined since I restarted the kernel.

Moving on: in Python there is the concept of the list. This is a structured set of data, and lists are good for collecting things into a set. So if you want transport
to have car, plane, and train, and age to have 22, 19, and 35, you can define those in square brackets and have each one be a list. Something else to note here is that these are strings; again, the type is inferred because of the quotation marks. But you can have a list of any kind of data type. You can even have lists of lists. It's very strange. So let's go ahead and run this one (remember, Control and Enter), and now we have defined a list called transport that has three values, car, plane, and train, and a list called age with 22, 19, and 35 as its values.

Here's something else that's interesting. Remember, in Jupyter notebooks I can just add a cell if I want to, so right under this cell, why don't I do something like this: I can print out transport. Let's run that, and I've got car, plane, and train. Then I can do something called indexing into this list. This list has a set of items, so I have the first, second, and third item in the list. Python is a little special: it identifies the first element of a list as position zero. So if I want the first element of the list, I open up square brackets after transport and ask for the zeroth, in other words the first, element of the list, and that's car. I can also ask for the "oneth" element, which is the second item in the list (a little confusing, just something you have to get used to in Python), and that is plane. You can do the same for the third element of the list, the position-two element, which is train. If I try anything after that, it says there is no value in that position, so it doesn't know what I want it to do. We can just get rid of that cell for now.

Okay, next up: importing libraries. Libraries are just packages of code. They're code that somebody else wrote that's useful to you, and you
want to get them into your program. In this case we're going to work with a library called pandas. We can import pandas by just using the import keyword and then pandas, the name of the package, or we can rename it to be a little cleaner. The idea is that if you're going to use this imported library a lot, you might want to name it something shorter so that you don't have to keep typing out "pandas do this, pandas do that"; you can just say pd. Importing it under a shorthand like this is a very conventional, very Pythonic thing. Again, in Jupyter notebooks it's as simple as this: we hit Control, we hit Enter, and we import these code packages into our notebook.

We don't do anything further here, and this is actually an important part. If we had other code after this in our Jupyter notebook, we've now imported pandas and could use it in that other code. However, we're going to move on to the next notebook in this room, and these notebooks are self-contained: they do not know that there are other notebooks unless you tell them specifically. So when we go into number two here, Intro to Pandas, we are in a completely new, separate Python kernel, and if we want to use pandas, we have to re-import it.

So far we've been using vanilla, plain Python, which is just the basic standard set of Python that will print variables, assign variables, and that kind of thing. But here we're specifically looking to use Python for data science and data analytics, so we're importing pandas to help us. This is a robust library designed for data collection, data analysis, and working with data sets. Pandas allows us to manipulate, analyze, explore, and clean data sets. Cleaning is essential to data analytics, as it will improve the data quality, produce more accurate results, and make the data easier to work with. So one of the
first things that we're going to do here is make a data set, or a series. This is a set of key-value pairs: in this case the key zero has the value train, the key one has the value plane, and the key two has the value car. Let's see how to make this work. In fact, before we do anything else, let's go up to the Kernel menu and restart the kernel and clear the outputs of all cells, so we've got a nice clean slate to work with. Then let's import pandas as pd (we hit Control, we hit Enter) and create a variable that is going to contain our data, which is a Python list.

I want to take an aside here, actually. Notice that for TryHackMe this makes so much sense. Jupyter Notebooks goes right along with the feeling of TryHackMe, which is to walk you through what's going on with text, with easy, approachable explanation, and then put you right alongside the code as it happens. From a teaching perspective, that's so powerful. But imagine taking that power and applying it anywhere else. If you're a data analyst and you're using Jupyter Notebooks to run and analyze your data in your organization, you can hand a notebook over to somebody else and say, "Don't worry, I have all of the explanation right alongside the code in Markdown." Think about how powerful that is. I think what doesn't get talked about a lot in our profession is that code is fantastic and code solves problems, but explanation, context, and giving people a step-by-step walkthrough of something is so undervalued, and it's so critical. Jupyter Notebooks makes that so easy, and it makes sense here because it feels like what TryHackMe does in the first place.

So we're going to make a list and then turn it into a series. We're going to take a list of things, train, plane, and car, and give it key-value pairs to be used in
pandas. We define a list called transportation, and then we just print it out so we know that we've got our list all ready to go, and there it is. Then we use the Series function right here. We're asking pandas, "Hey, I know you have a function called Series, and you're going to take in the list that I call transportation, which is train, plane, and car, and I want you to make a series out of it." We've already imported pandas as pd, so all we have to do is put a dot, call Series, and pass in the argument, in this case our list. When we run this, we have a series: zero is train, one is plane, and two is car.

Another transformation that we can do with this data is a DataFrame. Think of this as almost like an Excel spreadsheet. Instead of just the items of a list and their corresponding integer keys, which was the series, we're going to give everything context. In this case we're saying, hey, we've got things like a name, an age, and a country of residence, and here are the data items in this spreadsheet. This is a little funky, but what we're doing is defining a two-dimensional list, so we have both a set of column headers and a set of rows. Within our data list we have three different sets of data: Ben, who is 24 and lives in the UK; Jacob, who is 32 and lives in the USA; and Alice, who is 19 and lives in Germany. What we can do is have pandas make us a DataFrame, which in this case we're going to call df. When we run this cell, we have almost a spreadsheet, but we've done it in Python. In the column headers we have Name, Age, and Country of Residence, and we've filled in our spreadsheet, our DataFrame, with the values that we defined earlier in this list. And so we
have just transformed this data from being kind of random and all over the place: we have given it structure by putting it in a DataFrame. This is really cool, because once we have a DataFrame, we have all manner of manipulations and ways to sort and look at that data available to us in pandas.

If we want to retrieve a specific row of our new DataFrame, we can use the loc (location) method and give it the index, the position of the row that we want. Remember, like I said earlier, Python is a little weird: indexes start at zero. So if we want the first row, we pass in a location of zero, but if we want the second row, we pass in a location of one. When we run this, instead of the first row, which is at position zero, we get the second row right there. Again, a little confusing; it's just something you have to get used to. Python is a zero-indexed language.

But we're not stopping there, because we've got even more manipulations and methods of handling the data that we can use with just pandas and the DataFrame. What we can do is group by and sum the data inside our spreadsheet. If you think of an Excel spreadsheet, you can select all of a certain type of cell, you can sort them by their values, ascending or descending, and you can do lots of different manipulations like that. Well, pandas allows you to do that as well.

In the file system here we also have this awards.csv file. This is just a basic comma-delimited set of data: we've got Ben, who is in the IT department and has one prize, and so on. What we can do is read in that data just by opening up the file and reading the CSV. Let's close this and make this a little bit bigger. We do that by running this cell. In this case we're making a new DataFrame and saying, "Hey, pandas, I've got a CSV in my working directory called awards.csv. Can you go ahead and read in all of that data and make a new DataFrame out of it?" And it's that easy: it reads in the CSV and creates our new DataFrame, and now we have the indexed position of each of our rows from this CSV.

Now that we have a DataFrame of our CSV, what can we do with it? Well, we can use things like the groupby and sum methods. What we're doing in this case is saying, "Hey, pandas, for that DataFrame we've defined, let's group it by department and prize and give me the sum." When we run this, it says the accounts department has one prize, the IT department has two prizes, and the support department has no prizes, and it gives you the breakdown of the data as well.

All right, so let's go back. The next thing up is that we're going to use Matplotlib. Matplotlib is my favorite part of data analysis. Who doesn't love a graph? I agree, I totally agree. All this data is awesome, and we're going to get a lot of insights out of it, but what if we could make really pretty charts and graphs that we can show other people to really drive the point home? What we're using in this case is a combination of pandas, which we've been using before, and Matplotlib, which is an additional Python library for plotting things out on graphs. So let's go up to the Kernel menu and restart and clear all outputs so we are at a clean slate, and let's run the cell to import pandas
and Matplotlib. Again, we're importing them under shorthand names so we don't have to type all of that out. To create our first plot we use the plot method, and it's as simple as that: if we run this cell, we have a graph. Cool, but there's nothing in it. Of course, if you have a graph with no data points in it, it's going to be an empty graph. So let's fill that graph with a couple of different values: on the x-axis January, February, March, and April, and on the y-axis 8, 14, 23, and 40. Let's try that, and there we go, there's our pretty little graph: January, February, March, April on the x-axis, and on the y-axis everything up to the top value, 40. What's important to note here is that those arguments are passed in as x and then y. The first argument is a list of everything we want on the x-axis, and the second argument is a list of everything we want on the y-axis. Along the corridor and up the stairs: along, and then up.

We can also make graphs that have a little bit more context. When we run this one, we also have a descriptor (remember, Jupyter Notebooks is all about context): "A line graph showing the number of toys produced between September and December". We've got our months of the year labeled on the x-axis and the number of toys produced labeled on the y-axis. It looks like we've got a crazy uptick since October, and that makes a lot of sense, because we're gearing up for the holidays.

In addition to this, we can make bar graphs as well. Just like before, when we read a CSV into pandas to make a DataFrame, we can do the same thing here. In the working directory we have drinks.csv, which has a breakdown of drinks and, I guess, the number of votes each one got as a favorite drink. Let's make a graph of this. We build the DataFrame by reading in the CSV and call it spreadsheet. Then we define two variables, drinks and votes, that equal the spreadsheet values for drink and vote. Then we plot our figures on a new bar graph, in this case using the Matplotlib barh method. Let's check it out. Awesome. We've got a bar graph showing the employees' favorite drinks, and we've got a clear winner in the Frostbite Frappuccino, which had 16 votes. The Holly Jolly Fizz was a distant competitor in second place with eight, so the Frostbite Frappuccino doubled the votes of its nearest competitor. There is a clear number one here. And that demonstrates Matplotlib and pandas working together to make bar graphs. Who doesn't love a nice graph?

Finally, we're going to go back to the capstone; here is our final capstone for this room. What we're going to do is follow along with the narrative here. What we have in our working directory is, let's see here, network_traffic.csv, and it's got a breakdown of packet number, timestamp, source IP, destination IP, and protocol. We are going to use this CSV to understand a little more about what's going on in the internal network of this company, and we're going to use pandas to do so. "Your task is to apply the skills learned throughout today's task to analyze the packet capture from AntarctiCrafts' network." So you're going to use pandas to analyze the source, destination, and protocol for these packets, and then use functions like sum, average, and size to describe the groupings.

Let's go up to the Kernel menu one more time and restart the kernel and clear all outputs, just so that we have a nice clean slate. We start by importing the packages that we need, pandas and Matplotlib. Then we read the network traffic CSV into our DataFrame and print out the head, the first five entries in the DataFrame. It looks like we have successfully loaded our data into the DataFrame. Fantastic. But the data in that CSV is a lot longer than these five rows, so here is where we need to get a little savvy to answer the following questions.

Question number one: how many packets were captured? What we want is the DataFrame count, so we call df.count(). When we run the cell, we get the outputs, and our count for packet number goes up to 100, so 100 is the answer for that question.

Next up, the second question: which IP address sent the most traffic during the packet capture? What we want to do is follow along with what this is doing right here. Remember, we called our DataFrame df, and we're going to use the groupby method. We open up parentheses, and then we have to say what the column header is for the thing we want to group by. The name in this case, in single
quotation marks, is the source, because we're looking at which IP sent the most traffic during this packet capture, so the source is what we're interested in. Then we can use the .size() method as well. If we run this, we get kind of a disorganized output, but we do in fact have the data that we need. One thing I like to do as well is use the sort_values method with ascending=False, with a capital F right there. That gives you the sorted set of data for what you're looking for. The source IP with the most packets captured during this packet capture is 10.10.1.4, with 15 packets sent from it. So we put in 10.10.1.4, and that is the correct answer.

Finally, we've got question number four: what was the most frequent protocol? In this case we're doing almost the same thing that we just did up here, so we can even reuse this code. We do a df.groupby, and we copy this right down here, but instead of the source, what we're looking for is the protocol. Let's run that one as well. For protocol, our data shows that ICMP, which is of course ping, had 27 packets, with DNS following closely behind at 25, and HTTP and TCP following that. ICMP is clearly the protocol with the most traffic in this packet capture, so we input ICMP, and there we go.

All right, that is the challenge for today. That's a gentle introduction to some of the most powerful tools that we have at our disposal as cyber security practitioners. If you can get good with data, if you can get good at wrangling, interpreting, and analyzing data, there is no question in this industry that you won't be able to solve, and I mean that. Personally, I use Python in some capacity every single day of my life, and I use Jupyter Notebooks often. This goes to
show you just the power that you can unleash if you can get good with these technologies.

So that's it for day two of Advent of Cyber. I hope you enjoyed it. Please let me know what you thought about the video, leave a comment below, and I hope to see you here in a couple of days (hint, hint), so be on the lookout for that. But hey, take care of yourselves, enjoy Advent of Cyber, and see you around.
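The capstone steps described in the captions (count the packets, then group by a column, take the size of each group, and sort in descending order) can be sketched roughly as follows. This is an illustration under stated assumptions, not the room's notebook: the column names `PacketNumber`, `Source`, `Destination`, and `Protocol` and the six sample rows are hypothetical stand-ins, and the real lab reads its data from a CSV file rather than building it in memory.

```python
import pandas as pd

# Hypothetical sample rows standing in for the lab's packet capture.
rows = [
    (1, "10.10.1.4", "10.10.1.2", "ICMP"),
    (2, "10.10.1.4", "10.10.1.9", "ICMP"),
    (3, "10.10.1.2", "10.10.1.4", "DNS"),
    (4, "10.10.1.4", "10.10.1.2", "DNS"),
    (5, "10.10.1.2", "10.10.1.9", "ICMP"),
    (6, "10.10.1.9", "10.10.1.4", "HTTP"),
]
# Column names are assumptions made for this sketch.
columns = ["PacketNumber", "Source", "Destination", "Protocol"]
df = pd.DataFrame(rows, columns=columns)

# How many packets were captured: count the non-null packet numbers.
packet_count = int(df["PacketNumber"].count())

# Packets per source IP, largest group first.
by_source = df.groupby("Source").size().sort_values(ascending=False)
top_source = by_source.index[0]

# Same pattern for the protocol column.
by_protocol = df.groupby("Protocol").size().sort_values(ascending=False)
top_protocol = by_protocol.index[0]

print(packet_count, top_source, top_protocol)
```

Sorting with `ascending=False` is only a convenience: it puts the largest group at position zero so the answer can be read off the first row instead of scanning the whole output.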
Info
Channel: HuskyHacks
Views: 22,668
Id: L_GinPxbuzI
Length: 32min 14sec (1934 seconds)
Published: Sat Dec 02 2023