How The 'awk' Command Made Me A 10x Engineer

Video Statistics and Information

Captions
Before I learned about the awk command, I was a very unproductive employee. My boss would assign me simple tasks, and I just couldn't get them done fast enough to meet his expectations. One day I was asked to make some changes to a text file containing temperature values. Some of the temperature values were stated in Celsius while others were stated in Fahrenheit, and they needed me to convert all of them into Celsius. I tried converting them by hand, but this took forever and the result was full of mistakes. Writing a dedicated program to solve this problem wasn't an option, because that would take forever and this task was never going to be repeated again in the future. I tried everything to find a way to solve this problem faster. I was beginning to feel like it was hopeless, but that's when I discovered the power of the awk command. With awk, I was able to immediately complete this task in one second with the following command.

To understand how this awk command works, let's start by examining a simpler awk command that does something similar. This awk command simply iterates through every line in the file. Let's compare this output with the original file. As you can see, this file contains two columns: one with the temperature and one with the unit. The two columns are separated by a space. Using this awk command, you can see that we capture the first column into the $1 variable and the second column into the $2 variable. One of the primary things that awk is good for is taking lines of text and splitting them into columns. Once the columns are separated, you can work with them individually.

Now that you can see how to extract the columns with awk, let's try to get back to the command that normalizes temperature values to Celsius. We can start by trying to use the same formula as before. This awk command performs the Fahrenheit-to-Celsius calculation on all lines of the file. This isn't what we want, because it mistakenly tries to convert values that are already in Celsius. We can use the ternary operator to only convert values that are in Fahrenheit and leave the Celsius ones alone. Okay, that's a bit closer. Now we can add back in the units, but this is incorrectly changing the first line of our file. We can prevent this by adding NR > 1. NR stands for number of record; the first line of the file is considered record number one. Let's try this. Okay, that's a bit closer, but we're still missing the header. To print the header without changing it, we can add this statement. This will check to see if we're on the first record number and then execute an empty statement, which defaults to printing out the current line. And now we have the full completed output that was shown previously.

But that's not all. The awk command is a general-purpose text processing tool that can solve even more problems than the one we've just seen. When we look at this output, any of the values that were converted from Fahrenheit look a little bit messy. They include more decimal places than the Celsius values. We can improve upon this by using a slightly more complicated awk command. Instead of just using the print function, which we used before, we can use the formatted print function. This lets us specify fancy printing options for the numbers, but this command is otherwise the same.

When we used this basic awk command to separate the two columns, it almost seems like magic how it knew where to separate them. This is because the field separator is set to be a space by default. If the columns in our data were separated by a comma instead, we could specify the field separator to be a comma and it would still work. Here's what happens with the updated file if we don't set the field separator: as you can see, both columns get stuck together in $1. If we specify the field separator with -F, now it works just like before. This works the same way if the field separator is a tab instead.

Unfortunately, awk is not quite as smart as a CSV parser. For a case like this, most CSV parsers would understand that this comma is inside of double quotes; therefore, a CSV parser would probably treat this entire thing as the first column. As you'll see, awk is not quite this smart. And here you can see how awk isn't quite as good as a CSV parser.

If you play around with awk, you'll find that $1 gives you the first column, $2 gives you the second column, and if there was a third column, $3 would give you that, but in our case there isn't one. $0 always gives you the entire line.

Awk is specifically designed for iterating over text files one line at a time. The syntax for many awk commands can look very strange and confusing. You could easily be misled into thinking that the syntax for awk commands is extremely hard to learn, but the opposite is true: awk commands are very simple. Think of awk as just a simple program that iterates over your file one line at a time. For every line in the file, your program checks to see if that line has a certain pattern. If that line matches your pattern, some action is performed. Then you check for another pattern; if that pattern matches, you do something else. Since this way of doing things is so common in awk, the if statement is never written, and if you do write it, you'll get a syntax error. Therefore, you can think of awk commands like this: for every line in the file, test for the pattern that's in the parentheses; if it matches, do whatever's inside the braces.

Here's a more specific example. The pattern that we're checking for here is to see if this is record number one. If it is, then we print "this is line one" and then print the entire line. Then we say, if it's record number two, print "this is line two" and print the entire line. Even though we only see two lines of output, these two expressions were tested on every line in the file. It just happens to be the case that there's only one line that's line number one and one line that's line number two.

It's also helpful to know that the parentheses are optional. The fact that so much of the syntax in awk is optional makes it very difficult to read for inexperienced users. The awk command makes a surprisingly large number of default assumptions. For example, if we just say print, it will assume that we want to print the entire line. To be even more terse, we can just have an empty statement with a semicolon. However, if you want to print nothing on these lines, you can add empty braces. If we want to print the rest of the lines in the file other than lines 1 or 2, we can add an action statement that defaults to printing out the entire line. It's useful to know that an action statement with no pattern associated with it defaults to always matching. If we add the print statement back to these two lines, now you can see that the first two lines get printed twice, because line one matches this pattern but also the default empty pattern, and line two matches this pattern and also the default empty pattern. The rest of the lines in the file are only able to match this pattern.

The awk command can increase your productivity by up to 10 times on certain tasks, thus making you a 10x engineer. Here's an example that illustrates this fact.

The awk command is also capable of doing regular expression matches. Let's use awk to extract all lines from this file that end with the letter e. As we saw before, $0 represents the entire line. This part here is our regular expression, specifying that we want to match an e that comes just before an end of line. This symbol here specifies that the entire line will evaluate to true if our regular expression matches the entire line. And here are the results. As we saw before, the parentheses are optional with awk. To make things even more simple, if you don't specify what the regular expression is being matched against, it will assume that you're matching it against the entire line. To make things even more simple, if you don't specify what to do after the regular expression match, by default it will print out the entire line. This command is simple enough that you could almost consider using it as a replacement for grep.

In a similar vein of thought, you could almost consider using the awk command to replace the sed command as well. This awk command will execute a regex replacement to replace any e at the end of the line with five Z characters. And here you can see the result. Having said this, this syntax is a lot more awkward than the syntax with sed, and not all of the features are exactly the same. Similarly, the feature set that you get with awk is not exactly the same as what you get with grep. Most of the advantages that you get with grep come from the simplicity of the command line flags, which were discussed in another video. Most of the advantages of awk come from situations where you want to do something more complicated than a simple search or stream replacement.

This example shows how awk is more like a fully functional programming language than a simple line editor. Earlier we discussed how awk uses a sequence of pattern and action statements. The word BEGIN is a special pattern in awk. It doesn't match any specific line, but instead causes the action to execute when awk starts up. Similarly, the END pattern executes the action when awk is about to shut down. This lets you run awk commands that are much more sophisticated and act like a fully functional computer program. The entire text that you see here is one awk command. This awk command iterates over all lines in the temperature values file from before. For each line, it normalizes the temperature value to Celsius just like it did before, but in this case we add the Celsius value to a running total. Our total is initialized to 0 in the BEGIN action. After all lines in the file have been processed, we can calculate and print out the average temperature.

Now that I've learned to use the awk command, I really feel like I'm starting to move up in the corporate world. In fact, my boss said that if I worked overtime seven days a week for the next three consecutive years, I might be eligible for a one percent raise.
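
The commands shown on screen are not part of the captions. As a rough sketch of the temperature normalization the captions describe, the following might be close to what was run. The sample data, the header text "temp unit", and the unit letters C and F are assumptions made for illustration and are not taken from the video.

```shell
# Hypothetical sample data: a header line, then "<value> <unit>" per line.
temps=$(mktemp)
cat > "$temps" <<'EOF'
temp unit
20 C
212 F
50 F
37.5 C
EOF

# NR==1 with no action prints the header line unchanged.
# For every later record, the ternary converts Fahrenheit values and
# leaves Celsius values alone; printf limits the output to one decimal place.
normalized=$(awk 'NR==1; NR>1 { printf "%.1f C\n", ($2 == "F" ? ($1 - 32) / 1.8 : $1) }' "$temps")
printf '%s\n' "$normalized"
rm -f "$temps"
```

With this sample input, the header comes through untouched and every other line ends in C, with the Fahrenheit rows converted.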
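
The field separator behaviour described in the captions can be sketched in a similar way. The input lines here, including the quoted "Paris, France" value, are made up for illustration; only the -F option and the $1 and $2 field variables come from the video.

```shell
line='21.5,C'

# Without -F, awk splits on whitespace, so the whole line lands in $1.
without_sep=$(printf '%s\n' "$line" | awk '{ print $1 }')

# With -F, the comma becomes the field separator and $2 is the unit.
with_sep=$(printf '%s\n' "$line" | awk -F, '{ print $2 }')

# Plain -F does not understand CSV quoting: the comma inside the
# double quotes still splits the field.
quoted=$(printf '%s\n' '"Paris, France",C' | awk -F, '{ print $1 }')

printf '%s\n' "$without_sep" "$with_sep" "$quoted"
```

The last result shows the limitation mentioned in the captions: the first field is cut off at the quoted comma instead of covering the whole quoted value.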
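
The pattern and action rules, the regular expression match, and the sed-like replacement from the captions might look roughly like the sketch below. The word list is invented for illustration, and the exact commands in the video may differ.

```shell
# Hypothetical input: one word per line.
words=$(printf '%s\n' apple banana grape kiwi)

# Each pattern is tested against every record; the action runs only on a match.
labelled=$(printf '%s\n' "$words" | awk '
  (NR == 1) { print "this is line one"; print $0 }
  (NR == 2) { print "this is line two"; print $0 }
')

# Explicit form: match the whole line ($0) against a regex, then print it.
explicit=$(printf '%s\n' "$words" | awk '$0 ~ /e$/ { print $0 }')

# Terse form: a bare regex is matched against $0 and the default action prints.
terse=$(printf '%s\n' "$words" | awk '/e$/')

# sed-like replacement: sub() edits $0 in place, and print outputs every line.
replaced=$(printf '%s\n' "$words" | awk '{ sub(/e$/, "ZZZZZ"); print }')

printf '%s\n' "$labelled" "$explicit" "$terse" "$replaced"
```

The explicit and terse forms select the same lines, which is the point the captions make about awk's defaults.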
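
For the running average described near the end of the captions, a minimal sketch using BEGIN and END could look like this. The variable name total matches the captions; the sample data, the output wording, and the guard against an empty file are assumptions.

```shell
# Hypothetical sample data with a header line.
temps=$(mktemp)
cat > "$temps" <<'EOF'
temp unit
20 C
212 F
50 F
EOF

average=$(awk '
  BEGIN  { total = 0 }                                       # runs once, before any input
  NR > 1 { total += ($2 == "F" ? ($1 - 32) / 1.8 : $1) }     # skip header, add Celsius value
  END    { if (NR > 1) printf "Average: %.1f C\n", total / (NR - 1) }  # runs once, after all input
' "$temps")

printf '%s\n' "$average"
rm -f "$temps"
```

In the END action NR still holds the number of records read, so subtracting one for the header gives the count of temperature values.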
Info
Channel: RobertElderSoftware
Views: 169,613
Id: FbSpuZVb164
Length: 10min 40sec (640 seconds)
Published: Fri Dec 11 2020