Introduction to Regular Expressions

Captions
Hello, my name is Gary Sims and this is Gary Explains. Today I want to take a look at regular expressions. A regular expression is a way of defining a pattern for searching through some text, or through a string, to find particular matches. So if you want to find out more about regular expressions, please let me explain.
Regular expressions are one of those things that pop up all the time in different areas of computing. For example, if you're doing any kind of system administration on Linux, or on something like FreeBSD, you're going to come across regular expressions when you use a tool like grep, and we'll talk about grep in a moment. But also, if you're doing any kind of programming in any of the major programming languages, whether you're an amateur or a professional, you're going to come across regular expressions when it comes to string matching inside a program. Basically, a regular expression lets you describe a pattern so that you can match certain sequences of characters.
It's one of those things where we could study all the theory of the language, but I've found the best way is to see it in practice, and then you kind of say "ah, now I understand how it works". So to do that, we're going to go over to a Raspberry Pi and use grep, a very basic command that you find on Linux and on FreeBSD. The "r" and the "e" in grep (g-r-e-p) stand for regular expression, so it's a tool that's designed for finding strings based on a regular expression. Let's go to the Raspberry Pi.
OK, so here I am on my Raspberry Pi. I have a file which contains the complete works of Conan Doyle for the Sherlock Holmes series. There it is; it's a long, big text file. If we do a word count on it, we can see that it is pretty big: around 107,000 lines and over half a million words. That gives us something really good to search through.
As I mentioned, we're going to be using grep, which searches for a string. If you just do grep Watson sherlock.txt, then of course you get all the occurrences of the word Watson, and that is a very simple way of searching through a file. What's interesting about regular expressions is that they are case sensitive by default: if I now search for watson with a lowercase w, it doesn't find any. There is a way of doing a case-insensitive grep, but I don't want to talk about grep's flags and options; I want to talk just about what you can do inside regular expressions.
So let's take another example. If we look for the word Some, we find all the occurrences, and they all begin with a capital S. If we search with a lowercase s, we get different matches highlighted: the word some with a lowercase s. Now we're going to look at our first real piece of regular expression syntax. We type a square bracket, which starts a list, put in it a big S and a little s, close the bracket, and follow it with ome, giving [Ss]ome. What we're saying is: I would like to match a big S or a little s, followed by ome. If we search for that, we get examples with a capital S and examples with a lowercase s. So there is your very first regular expression: match one character from a list, followed by some other letters.
Of course, the list doesn't have to come first; you can put it in the middle or anywhere else in the expression. For example, we can look for the word realize, and here it is, using the spelling with a z. What's interesting is that this book also has, for whatever reason, the alternative spelling with an s; this is one of the differences between British English and American English. So we've got both of them in there.
How would we search for both of those with a grep regular expression? First, I'm going to put the pattern in quotes. That just makes sure that what we type is passed to grep as a parameter and isn't interpreted by the command line, the bash shell, itself. Now we put an s and a z in a list, reali[sz]e, and the order inside the list doesn't matter. We're saying: match the word realize with either an s or a z. Here the list of what we're trying to find is in the middle of the word, and if we run that we get both matches: here at the bottom you can see one with an s, and here you can see one with a z.
There are other things you can do to alter the way you search. For example, if we just look for the word wind, that finds wind, window and whirlwind; these are all words that I know are inside this text file. By using backslash less-than, \<, you can say "I only want wind when it is at the beginning of a word". So \<wind excludes whirlwind, but it still finds wind and window: here we find wind, and here we find window. Now, what if I want just the word wind on its own? You can specify both the beginning and the end of a word, \<wind\>, and now you can see it only matches the word wind in all of these examples. You can also make that a bit easier with backslash b, \b, which means any word boundary. You don't have to specify the beginning or the end, and you don't have to remember the less-than and greater-than; \bwind\b matches at the beginning or the end of a word, and there we can see it has matched the same things.
Another thing you can do is match something only at the beginning of a line, using a caret, ^. So we can search for the word The, with a capital T, at the beginning of a line: ^The. You can see it only matches at the beginning of a line; this occurrence here in the middle of a line isn't matched, but the one at the start of the line is. We also matched the word There, because it begins with The. If we put a word boundary after it, ^The\b, then we only get The and not There, and you'll notice that There isn't matched anymore.
Building on the beginning of a line, we can find all the lines that begin with a B, C or D: ^[BCD]. That lists out all the lines that begin with B, C or D, and we can see them all highlighted. If you wanted A, B, C, D, E, F, G, H, I, J, writing them all out would get tedious, so you can put a range inside the list instead: [A-G] means A through to G. It's a shortcut that automatically fills in everything from the first letter to the last, and here ^[A-G] is matching A through to G in exactly the same way as ^[BCD] did a moment ago.
Another thing we can do is search for all the four-digit numbers in the text. We could build a list, 0, 1, 2 and so on, but there's a predefined character class for that, [:digit:], which you write inside a list as square bracket, colon, digit, colon, square bracket: [[:digit:]]. That means any digit. Then we add backslash curly bracket, 4, backslash curly bracket, \{4\}, which means exactly four instances: not three, not five, just four digits. So [[:digit:]]\{4\} finds all the four-digit numbers, and there we go: all the years are listed out, from around when this was written, along with the dates relative to that period. That's a clever way of doing it.
There are different flavors of regular expressions: basic regular expressions, extended regular expressions, and Perl regular expressions. There is another program called egrep which uses extended regular expressions, and one of the main differences is that a few of the characters don't need a backslash in front of them. In egrep you can just write [[:digit:]]{4}, which you must admit looks a bit nicer, and it finds exactly the same thing. If we ran that pattern through plain grep, it wouldn't find anything, because grep doesn't treat the curly brackets as special on their own; they need the backslash. So when you're using regular expressions in a programming language, for example inside Perl, or in a library for C, C++, C#, Python or whatever you're using, you've got to work out whether it uses basic, extended, or Perl-like regular expressions, and that determines whether you need some of these backslashes or not.
Now, if we want to look for the word flood, that's pretty easy, and we can also look for the word fluid. But you can search for both fluid and flood at once with fl..d: I want something that begins with an f, followed by an l, followed by any character, followed by any character, and then followed by a d. It's almost like solving a crossword: each dot says any character can go there, but the match must be exactly five characters long. If we run this, we can see it finds fluid and flood.
You can do a very similar thing with optional characters. For example, if you search for the word died, it finds that in the text, but we also have the word did, which obviously turns up a lot as well. So how do you search for both? You include the letter that you want, e in this case, and then by putting backslash question mark after it, die\?d, you're saying the e is optional. So it will match did and it will match died, because the e can be matched zero or one times, but not more than once. If we search for that, you'll see it finds died and it finds did. That's really quite clever. Here again we could use egrep, with extended regular expressions, and get rid of the backslash, so it becomes die?d, where the question mark makes the e optional, and you can see that works. But if you run that through plain grep, it won't work, because grep doesn't treat the question mark as special without the backslash in front of it.
I've got two more quick examples to show you, and then we'll summarize what we have learned. Say we want to look for some really long words in this text. We could start by listing A, B, C, D, E, F, but, like before with digits, there's a shortcut, [:alpha:], which means all alphabetic characters. Then, like before, we could ask for, say, 16-character words, but we'll learn something more here: you can add a comma and 17, \{16,17\}, which means between 16 and 17 of them. So let's run [[:alpha:]]\{16,17\} and look at the words we're finding: enthusiastically, indistinguishable, misunderstanding. Those are all 16 or 17 letters long. So that's another good way of matching different kinds of things, and again, if you were doing crosswords or something like that, it could be quite interesting.
One more example. I can of course search for the month of June in this file, and I can search for the month of July, and it finds both of them. So how do you search for both? You search for June, then backslash bar, meaning "or", then July: June\|July. That searches for both June and July, and as you can see here, it finds July and it finds June.
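If you want to try these patterns from a program rather than the shell, here is a rough sketch in Python, assuming Python's re module. It is only an approximate translation of the grep patterns above: re uses a Perl-style syntax, so, like egrep, it needs no backslash before {, ? or |; it has no \< or \>, so \b stands in for both; and it does not support POSIX classes such as [[:digit:]], so [0-9] and [A-Za-z] are used instead (an ASCII-only assumption). The sample lines and the grep() helper are hypothetical, made up for illustration; they are not taken from the book or from grep itself.

```python
import re

# Hypothetical sample lines for illustration only; they are not quoted from the book.
SAMPLE = [
    "Some people say some things.",
    "I realize now, and you realise too.",
    "The wind rattled the window during the whirlwind.",
    "There was a letter dated 1887 on the desk.",
    "The flood was as thick as any fluid.",
    "Everyone said he died, but he did not.",
    "It was indistinguishable from a misunderstanding.",
    "We left in June and returned in July.",
]

# grep (basic syntax) pattern from the video -> approximate Python `re` equivalent.
PATTERNS = {
    "[Ss]ome": r"[Ss]ome",
    "reali[sz]e": r"reali[sz]e",
    "\\<wind": r"\bwind",                    # \b right before a letter can only be a word start
    "\\<wind\\>": r"\bwind\b",
    "^The\\b": r"^The\b",
    "^[A-G]": r"^[A-G]",
    "[[:digit:]]\\{4\\}": r"[0-9]{4}",       # no POSIX classes in `re`; braces need no backslash
    "fl..d": r"fl..d",
    "die\\?d": r"die?d",                     # `?` is special without a backslash
    "[[:alpha:]]\\{16,17\\}": r"[A-Za-z]{16,17}",
    "June\\|July": r"June|July",             # `|` means "or" without a backslash
}


def grep(pattern, lines):
    """Return (line, matched substrings) for every line that contains a match."""
    regex = re.compile(pattern)
    results = []
    for line in lines:
        found = [m.group(0) for m in regex.finditer(line)]
        if found:
            results.append((line, found))
    return results


if __name__ == "__main__":
    for grep_pattern, py_pattern in PATTERNS.items():
        print(f"{grep_pattern:24} -> {py_pattern}")
        for line, found in grep(py_pattern, SAMPLE):
            print(f"    {found} in {line!r}")
```

grep itself prints each matching line; the helper also collects every match on the line with finditer, which is roughly what grep's highlighting shows.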
Now, there's obviously a lot more to learn about regular expressions. You can build up some very complicated, quite complex regular expressions, and there are some people who are really, really good at this; it's kind of a hobby for them to come up with interesting regular expressions. I do have a cheat sheet, which I'm showing you now, that you can download from my GitHub examples repository, and it gives you a basic understanding of how regular expressions work.
So there you have it, that is my introduction to regular expressions. I say introduction because of course there is more you can go into, but if you understand these basic concepts, you will be able to start using regular expressions for simple tasks. My name's Gary Sims, this is Gary Explains. I really hope you enjoyed this video; if you did, please give me a thumbs up. Please don't forget to subscribe, and you can also hit that bell notification icon. That's it, I'll see you in the next one.
Info
Channel: Gary Explains
Views: 21,940
Keywords: Gary Explains, Tech, Explanation, Tutorial, Regular Expressions, grep, Regex, Linux, regexp, rational expression, search, strings, pattern, pattern matching
Id: vcRPNhLbhoc
Length: 13min 31sec (811 seconds)
Published: Wed Apr 24 2019