Python Regular Expressions

Video Statistics and Information

Captions
In this video we're going to talk about regular expressions in Python. Regular expressions are a really useful tool that all Python programmers, and programmers in general, should have. They have many applications, so let me just tell you a little bit about them.

Suppose you had a document, let's say a word processor document, and in that document you wanted to find an email address. If you knew the email address, you could just use the find function in the word processor and it would match that email address. You could do the same with a URL, a file name, or anything else you were looking for, as long as you have the specific name.

Now imagine that you didn't have the actual email address that you wanted to look for. Imagine you had a document or a stream of information that you knew contained many email addresses, and you wanted to extract all of them. That's where regular expressions become very useful: they can search for patterns. You enter the pattern for an email address as a regular expression and you'll be able to find every single email address in that document. Then, using Python or another programming language, you can extract all of those and put them into a list or a data frame. You can do that with any type of text pattern: file names, URLs, just about anything that you want to find.

In the next section I'm going to give you a little overview of how regular expressions work. In this first part I just want to give you a general overview of what regular expressions can do and how they're used in Python, so don't worry too much about the detail; I'll explain that in the next section.

To use regular expressions in Python you need the re library, so you import that. Then let's say we wanted to look at a piece of text and extract the email address from it. How would we do that?
Well, we'd come up with what we think defines an email address in regular expressions, and that's this set of characters here. Don't worry too much about the details; I'll explain that later. That's stored, and then we have our text, which is this text here: "My email address is giles@nospamplease.com, but please do not send me spam." What we want to do is find that email address within that text. So we have this variable match, and we're using the email pattern that we've created to search the text and see whether we can match the pattern we're looking for.

We've run that, so now let's print out what this variable match is. You can see that this is more than just a variable; it's in fact an object. What it tells us when we print it is that there is indeed a match: it's found the email address. It's given us some other information as well, and that's this span information, which tells us where the match starts and ends within the text.

There are several other things that we can do with this match object. We can use this group method here and it will return the match, the actual email address that we were looking for. We can use start, which tells us where the match starts within the text, and end, which is where it ends. And we can ask for the span as well.

What else can we do? Well, I've got this text2 here, which has several email addresses in it: it's got giles@nospamplease.com, james@mydomain.co.uk and janet@carleton.com. So let's have match2, and this time we're going to search text2 using the same email pattern. What do we get? You can see here we only get one email address; it hasn't found the others. That's because we've used search. What search does is look for the first match of the pattern, and then it doesn't look any further. So it returns the first match that it finds, which we have now.
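
The code shown on screen is not part of the captions, so the following is only a rough sketch of the search example described above. The pattern, the wording of text2 and the variable names are assumptions made for illustration.

```python
import re

# Assumed pattern: the one shown on screen is not captured in the captions.
email_pattern = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.]+\.[a-zA-Z]{2,}")

text = "My email address is giles@nospamplease.com, but please do not send me spam."
match = email_pattern.search(text)

print(match)          # a match object, including the span and the matched text
print(match.group())  # giles@nospamplease.com
print(match.start())  # 20
print(match.end())    # 42
print(match.span())   # (20, 42)

# Hypothetical second text containing several addresses.
text2 = "Contact giles@nospamplease.com, james@mydomain.co.uk or janet@carleton.com"
match2 = email_pattern.search(text2)
print(match2.group())  # giles@nospamplease.com (search stops at the first match)
```

Note that the end position is one past the last matched character, so text[match.start():match.end()] gives back the matched address.
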
Okay. Last time we used search; this time we're going to use match, which is another re function in Python, and we're going to use it on this text2. When we print the return value here, match3, we get None. This function returns None if there's no match. The reason there was no match is that we used match, which only looks for a match at the start of the text. If the pattern doesn't occur at the start of the text, it won't find it anywhere else in the text.

Then finally we have this variable match4, and we're going to use findall on this second lot of text. When we run that and print out what match4 returns, we get a list of tuples containing all of the email addresses that are in that text. We also get something else returned: a second search group. The reason we have that is because we have these brackets here, which form a sort of second search group. So we get everything that we were looking for.

Finally, here is findall again but with a different way of searching for a pattern. We put in the pattern that we're looking for, we put in the text that we want to search, and we get the email addresses returned. As you can see, we've used a different style of looking for an email address compared to the style that was used up here, but I will explain the details of it in the next section. That just gives you an overview of the sort of thing that regular expressions can do, so you can see how powerful they are.

Now in this next section I want to show you some of the features that Python has when using regular expressions. We're going to dig down a little and see how regular expressions work. If you want to match simple strings, characters or digits, you can match the exact string. For example, here you have "ion", and when we do the search on "Great Expectations" it returns "ion", and that's exactly what we wanted.

There are some special characters in regular expressions, and those are these. If you want to search for these in their non-special context, you have to backslash them, which makes them no longer a special character. For example, if you wanted to look for the dollar sign, because the dollar sign is a special character you have to backslash it. When we do that here, we have the text "the cost is $20", and we find the dollar sign.

Notice this r here; that indicates a raw string in Python. If we have a look at this example, if you were to print a\tb\tc, instead of printing this string you actually get tabs, because in Python \t means a tab in a string. What you want is to print the actual string and not the tab, so you just tell Python that you want the raw string and then you get that there.

Okay, so let's have a look at other special characters and character groups. Here the pattern is a raw string, \w\s\w. If we have a look at the text that this is matching, the return is "e f", "x i", "s 9" and "s o", and those are these matches here. That's because we have used these special characters, so let's have a look at what they mean. \w means match any alphanumeric character, \s means match any whitespace, and then we're matching any alphanumeric character again. So it's looking for character, space, character, and that's what we've got there. You can start to see how versatile and powerful regular expressions are.

Let's have a look at some more. Here we have a group: [aeiou] in these square brackets means that it's looking for any of these. It's not necessarily looking for them in this order; it's looking for any of these vowels and it's going to match them. We've used split here on the word "consequential". What this does is split the word wherever these letters occur, so it splits it where it finds the o, then the e, then the u, then the i, and then the a. You can also combine these groups.
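
The examples in this part of the video can be sketched roughly as follows. The on-screen code is not in the captions, so the grouped email pattern, the wording of text2 and the sentence used for \w\s\w are assumptions chosen to reproduce the results that are read out.

```python
import re

# --- match() versus findall(); the pattern and text2 are assumptions ---
# The inner brackets create a second group, so findall returns tuples.
grouped_pattern = re.compile(r"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.]+(\.[a-zA-Z]{2,}))")
text2 = "Contact giles@nospamplease.com, james@mydomain.co.uk or janet@carleton.com"

match3 = grouped_pattern.match(text2)    # only tries at the very start of the text
print(match3)                            # None

match4 = grouped_pattern.findall(text2)  # one (address, ending) tuple per match
print(match4)

# --- exact strings and escaped special characters ---
ion = re.findall("ion", "Great Expectations")
dollar = re.findall(r"\$", "the cost is $20")
print(ion, dollar)                       # ['ion'] ['$']

# --- raw strings ---
print("a\tb\tc")                         # \t is turned into a tab
print(r"a\tb\tc")                        # prints a\tb\tc

# --- special sequences and character sets ---
triples = re.findall(r"\w\s\w", "the fox is 9 years old")
print(triples)                           # ['e f', 'x i', 's 9', 's o']

pieces = re.split("[aeiou]", "consequential")
print(pieces)                            # ['c', 'ns', 'q', '', 'nt', '', 'l']
```

The empty strings in the split result come from two vowels sitting next to each other ("ue" and "ia"), where there is nothing between the two split points.
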
So here we're looking for any uppercase letter followed by any number between 0 and 9. When we run that we get G2 and H6, but it ignores this here.

You can do other things as well. Here we have \w, which looks for any alphanumeric character, and these curly brackets mean that we want it repeated three times. So we're looking for any alphanumeric character repeated three times. When we search this text we get "The", and then "qui", "bro" and "fox".

You can also use wildcard characters, which have certain meanings. The plus here means match one or more repetitions of the preceding group. Here we have \w, which is any alphanumeric character, followed by a plus. So here we have "The quick brown fox", and we get "The", "quick", "brown", "fox", because it matches alphanumeric characters occurring one or more times, up to a non-alphanumeric character. There's also the star, which matches zero or more repetitions.

That's a very quick overview of regular expressions in Python. I have also got some other interesting stuff which I think might help you. If you go to this Stack Overflow page, it has useful patterns for matching various different inputs. There is also a very nice tutorial on regular expressions here at machine learning class dot com, and there's an online regular expression tester which you can run in Python. I've put all these links in the description. You can also go to the Python page on regular expressions, and if you want a cheat sheet on regular expressions, Dataquest has a nice one here.

A lot of that Jupyter content was taken from a GitHub page by Jake VanderPlas, who has written extensively on Python and Python for data science. There are links to his work in the description to this video, so do check those out. There are also other links about regular expressions for you to find out more about them.

I did promise to explain that email matching pattern that we had in the previous section, so I'll do that now. First of all we want to match lowercase letters, uppercase letters, the numbers from 0 to 9, a dot, an underscore, a percent character, a plus, or a minus or dash, and we want to match one or more of those. Then we want to match the at sign. Then we want to match the domain name, with letters, numbers and a dot. Finally we want to match the .com or .co.uk section of the address. That's how it's done, and I'm going to let you work out the other email matching pattern that we used for yourselves.

If this video has been helpful then please do like it, subscribe to the channel, and I'll see you in the next one. Bye bye.
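
As a rough sketch of the repetition examples and of the email pattern as it is described in words above: the exact pattern from the video is not in the captions, so the final part (a dot followed by two or more letters), the sample texts and the "ab*" example are assumptions.

```python
import re

# Combined character sets: an uppercase letter followed by a digit.
combined = re.findall("[A-Z][0-9]", "1043879, G2, H6")
print(combined)   # ['G2', 'H6']

# Curly brackets: exactly three alphanumeric characters.
threes = re.findall(r"\w{3}", "The quick brown fox")
print(threes)     # ['The', 'qui', 'bro', 'fox']

# Plus: one or more repetitions of the preceding item.
words = re.findall(r"\w+", "The quick brown fox")
print(words)      # ['The', 'quick', 'brown', 'fox']

# Star: zero or more repetitions (hypothetical example, not from the video).
stars = re.findall(r"ab*", "a ab abb")
print(stars)      # ['a', 'ab', 'abb']

# The email pattern, written out piece by piece with re.VERBOSE.
email_pattern = re.compile(r"""
    [a-zA-Z0-9._%+-]+   # one or more letters, digits, or . _ % + -
    @                   # the at sign
    [a-zA-Z0-9.]+       # domain name: letters, digits and dots
    \.[a-zA-Z]{2,}      # assumed ending: a dot plus letters, e.g. .com or .uk
""", re.VERBOSE)

text2 = "Contact giles@nospamplease.com, james@mydomain.co.uk or janet@carleton.com"
emails = email_pattern.findall(text2)
print(emails)
```

With no brackets in the pattern, findall returns plain strings rather than tuples. For james@mydomain.co.uk the domain part matches "mydomain.co" and the ending matches ".uk", which is how one pattern covers both .com and .co.uk addresses.
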
Info
Channel: Python Programmer
Views: 9,759
Rating: 4.8950438 out of 5
Keywords: python regular expressions, regular expressions python, regular expressions, python regex, learn how to use regular expressions in python
Id: rcM26jV7Mdo
Length: 11min 43sec (703 seconds)
Published: Thu Jun 14 2018