Regular Expressions in Python | Regular Expressions Python Tutorial

Captions
Hey guys, in this video let us try to understand the concept of regular expressions in Python, also called regex.

What is a regular expression? Let's say you have a file or a document and you want to search for a specific email ID or phone number in it. You can do this search by specifying the exact email ID or phone number. However, if your document has thousands of different email IDs and phone numbers, searching for each of them one by one would be an almost impossible task. This is where regular expressions come into the picture. Using regular expressions you can define a pattern for your email ID, your phone number, or for that matter any other text, and then search for this pattern in your data. Even if your data has millions of different email IDs and phone numbers, you should be able to search for and extract all of them.

There are only four steps that we need to follow in order to use regular expressions in Python. To explain them, let's say I have a text here with some dummy values, and I want to extract all the numbers from it.

The first step is to import the regular expressions module, which is re, so just type import re. All the functions and features that we need in order to use regular expressions are available in this module, so we need to import it first. This is my step number one.

The next step is to create the pattern. To do that I can use the function called compile, which is part of the re module, and pass my pattern to it as the argument. Since I want to search for three digits, I can use a character class written \d, and I repeat it three times because I'm trying to match three digits. Don't worry about this
character class for now; I'm going to explain it in a short while. For now just remember that \d is a regular expression character class that matches one digit, and since I want to match three digits I'm writing \d three times. I'm going to assign this pattern to a variable, let's say pattern, and I'm going to make the pattern a raw string.

What exactly is a raw string? Let's say I have a print statement where I want to print something like old\new. If I execute this program, you can see that even though I wrote old\new, Python prints old and then continues on the next line, because the \n that I wrote here is treated as a special character, the newline character. To avoid this kind of transformation, we can tell Python to treat the whole string as a raw string by putting an r before the start of the string. Once I mark it as a raw string, Python treats the string exactly as it was given. Now if I execute this program, you can see that it just prints old\new; it doesn't interpret this \n as a newline character or anything else. That is the advantage of using r, and when we create a pattern we try to always make it a raw string to avoid any conflict. This is just a good practice. So this is my second step; I'll mark it as step number two.

Once I have created my pattern, the third step is to try to match this pattern in my given string. To do that I'm going to create a variable called matches, and I'm going to use a function called search on the pattern to search in the given text. So this is my step 3.
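The first three steps can be sketched roughly as follows. The sample text is hypothetical (the captions never show the video's actual text); it is chosen only so that the first three-digit number sits at index 4 to 7.

```python
import re  # step 1: import the regular expressions module

# A normal string turns \n into a newline; a raw string keeps it as typed.
print("old\new")   # the \n is interpreted as a newline
print(r"old\new")  # printed exactly as written

# Hypothetical sample text, not the one used in the video.
text = "abc 123 def 456 ghi 100"

pattern = re.compile(r"\d\d\d")  # step 2: a pattern for three digits in a row
matches = pattern.search(text)   # step 3: look for the first match in the text
```

At this point matches holds a single match object (or None if nothing was found), which is why only one number comes back.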
What my step 3 is doing is searching for this pattern in this particular text, using the search method. search looks for the pattern in the given text, and as soon as it finds a match it returns it, and I store it in the variable matches. Now I'm going to print matches. That's it. If I execute this program, you can see the output: it says 123 is matched and that it is present from index 4 to index 7. So this is what the search method has done.

But as you can see, I have 123, I also have 456, and I also have 100, yet only one number has been matched. This is because the search method only looks for the first possible match and returns that. If I wanted all the possible matches in the given input, I can use a different method called finditer, which returns all the possible matches into this variable. To print them I need a for loop, because what finditer returns is an iterator, so it can hold multiple values. So: for match in matches, and then I just print match. Now if I execute this program, you can see that it prints all three matches, 123, 456 and 100, all three numbers that were matched in this input. If we just want to print the exact match, we can say match.group(). This is another method, and it returns the whole matched value. Now if I execute this program, you can see that only the matched value is being returned now.
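A small sketch of the difference between search and finditer described above. The sample text is again an assumption, not the text from the video.

```python
import re

# Hypothetical sample text containing the three numbers mentioned above.
text = "abc 123 def 456 ghi 100"
pattern = re.compile(r"\d\d\d")

# search stops at the first match and returns one match object.
first = pattern.search(text)
print(first.group(), first.span())

# finditer returns an iterator over every non-overlapping match.
found = []
for match in pattern.finditer(text):
    found.append(match.group())  # group() gives just the matched text
    print(match.group())
```

Collecting the results in a list (found here is just an illustrative name) is one way to reuse the extracted values later in a program instead of only printing them.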
So this is my step 4, and that is it. I had my text, I wanted to search for numbers, and I used regular expressions to find them in the given string. To repeat: the first step is to import the module, the second step is to create a pattern, the third step is to search for that pattern in the given text, and the fourth step, in this case, is extracting the matches and printing them. If you want to perform any other task with the extracted values, you can do that in your program. This is the whole flow of how we use regular expressions in Python.

The most important step here is the second one, creating the pattern. Here I have created the simplest pattern that you can think of, but if I wanted to match more complex strings like an email ID, a URL, a phone number or a date, I would have to write somewhat more complex patterns. So now let's try to understand the different symbols, character classes and special characters that are available in regular expressions.

First let's look at the character classes that we can use in Python. The first one is \d, which stands for digit character. \d will match any digit from 0 to 9. As we saw in our previous program, since I wanted to match three digits, I wrote \d\d\d; each \d matches one digit, so I wrote it three times. Next we have \w, which stands for word character. \w will match any letter, a numeric digit, or an underscore character. Next we have \s, which stands for space character; it will match a space, a newline or a tab character. Next we have \D, with an uppercase D. This is the reverse of lowercase \d: \D matches any character that is not a
digit, so anything other than a digit from 0 to 9 is matched by \D. Likewise \W, with an uppercase W, matches any non-word character. It is the reverse of lowercase \w, so \W matches any character that is not a letter, not a digit and not an underscore. Similarly \S, with an uppercase S, matches any character that is not a space, a newline or a tab. This should be pretty simple, right? These are the main character classes. There are a few others as well, but they are not as widely used; you should be familiar with these first.

Once you understand these character classes, there are a few symbols as well that we can use in regular expressions along with the character classes to perform specific tasks. Let's look at them one by one.

The first one is the group, written with parentheses. We can group multiple patterns together into a single group. There are several advantages to doing this, and we are going to look at some examples in a short while.

Next we have plus and star. If I put a + next to a character class, it will look for that character class once or more than once. * is almost the same as +, but the difference is that * will match even if there is no occurrence at all.

The next symbol is braces. If we want to repeat a particular pattern a certain number of times, we can specify that number within the braces. If I wanted to match three digits, I could type \d and then 3 inside the braces, \d{3}, and this will match three digits. For example, going back to the program I wrote, where I wanted to match three digits and wrote \d\d\d, I could also write the same
thing using braces with a 3 inside. If I execute this program, you can see that I get the same output, because \d\d\d and \d{3} mean the same: repeat the pattern three times.

There are other advantages of using braces as well. If I want to look for a pattern repeated within a range, say three to five digits, I can write {3,5}: it will match a minimum of three and a maximum of five digits. If I don't want to specify a maximum, I can write {3,}, and it will match a minimum of three digits up to any number of digits. Similarly, if I leave out the minimum, as in {,5}, it matches from zero digits, so even if there is no digit the pattern still matches, up to a maximum of five digits. This is what braces do.

The next symbol is the question mark. It has two different meanings in regular expressions. The first is that it marks something as optional. If I have a pattern like \d and I put a question mark after it, the \d becomes an optional pattern: whether or not it matches anything in the string, the overall match will still be successful. The second use is that a question mark after the braces asks for the shortest possible match. To understand this, let's say you have a string like hihihi and your pattern is hi inside a group, so that it is treated like a single unit, and then outside the group, in the braces, you just
write 1,3, that is (hi){1,3}, so it will try to match a minimum of one and a maximum of three times. Since I'm not using a question mark here, it will go for the longest possible match, hihihi. Even though one hi would be enough, this input has three of them, so it matches all three and returns hihihi. But if I want the shortest possible match, I can put a question mark after the braces. Now it finds the shortest possible match right after the second character, so it returns just the first two characters, hi. This is how, by using a question mark after the braces, you can tell Python to return the shortest possible match.

The next symbol is the dot. This is a special character that matches any character other than the newline character.

The next one is square brackets, which let us create our own character class. The character classes that we saw earlier may not be sufficient for everything. If I wanted to match only the English letters a to z, there is no built-in character class for that, because \w will also match the underscore and digits. That is why we use square brackets: inside them we list all the characters that we want to match. If I wanted to match all the vowels, I could write [aeiou], and only these five characters would be matched. If I use a dash, as in a-z, it will match all the lowercase English letters. If I wanted to match all the uppercase English letters, then I
could just write A-Z, and 0-9 will match all the digits.

The next symbol is the pipe. This is something like an or search: it will try to match what is on the left-hand side of the pipe, or else what is on the right-hand side of the pipe. Don't worry if this looks a little confusing, because very soon we are going to write programs with different patterns that use all of these symbols, which should make regular expressions much simpler to understand.

Going ahead, the next symbol is the caret. The caret has two different meanings in regular expressions. The first is that it matches the beginning of the text, so if you only want to match text that starts with a specific value, you can use the caret. The second meaning applies when I use a caret at the start of my square brackets. If I write the vowels inside the square brackets without the caret, it matches every character that is a vowel, but with a caret, as in [^aeiou], it matches every character that is not a vowel. So it reverses what the square brackets would otherwise do.

The last symbol that we are going to look at here is the dollar, which is used to match the end of the text. If I wanted to look for text ending with a particular character, I can write that character and then a dollar after it.

Lastly, we also have a list of special characters. These are all the characters that have a special meaning in regular expressions, so if you want to write a pattern that searches for these characters themselves, you need to escape them. How
you escape these characters is by adding a backslash before them.

Now we are going to use all of this and write programs; hopefully that will be a better way for you to understand the concept and how to use regular expressions. I have a text here, and in it I can see a few dates, a few email IDs and a few phone numbers. We are going to write a program using regular expressions to match these dates, email IDs and phone numbers.

Let's start with a program that will match all the dates. As I told you, the first step is to import the regular expressions module, which is import re. Once I have imported the module, the next step is to create the pattern. I'm going to name it date_pattern, since first I'm going to search for all the dates in this string. To create the pattern I say re.compile and pass a raw string, and I need to write my pattern inside it.

Let's start by looking at these three dates. In each of them the first two characters are digits, followed by a dash, then three letters, then another dash, and then four digits. This format is the same across all three dates, so first let's write a pattern that matches this date format. We know the first two characters are digits, so I type \d\d, because \d matches any digit from 0 to 9. The next one is a dash, so I can put a dash straight away. Then come three letters. I could use the word character class for letters, but the problem with \w is that it will also match an underscore or a digit, and I only want to match three letters. In such
cases, where I want to match only a specific set of characters, I can use square brackets, the symbol for creating your own character class. I open a square bracket and close it, and inside I list all the characters that I want to match. In this case I want to match all the English letters, so I write a-z, which matches everything from a to z, all 26 letters. But this is in lowercase, and I also want to cover uppercase, so I add A-Z. Now any character in the range a to z in lowercase or A to Z in uppercase will be matched.

I know that there will always be three letters in this month format. One thing I can do is repeat the class three times, one, two, three, but this is not the best way of doing it. The best way is to use braces. If I go back to my symbols, the braces symbol is used whenever you want to repeat a pattern a certain number of times: you specify that number inside the braces. So I open my braces, write 3 and close the braces, and now it will look for a letter three times. This is how we match the month.

Next we add a dash again, because we know the separator here is a dash, and then I know there will be four digits. I could write \d\d\d\d, and that is also fine, but instead of repeating the digit four times I can use the same concept of braces: I write \d once and then 4 inside the braces, so that it will match
four digits. I can apply the same thing at the beginning as well: I keep one \d, remove the other, and write 2 in braces. So this is my pattern, and it should be able to match all of these dates.

The next step is to search for this pattern in the given text. To do that I create a variable called matches, and on the pattern I use the method finditer and pass in my text. This searches for the pattern in the text, and finditer returns an iterator, so all the possible matches are stored in the variable matches. Then I print the values with a for loop: for match in matches, and then print match.group() so that I display just the exact matched result.

Now if I execute this program, I get an error. I do not have anything called pattern; the variable I created is date_pattern. Let me give the proper name and execute again. Now you can see that it returns three dates: it has matched the first date, the second date and the third date. So we used the character class \d, we used braces to repeat the digit, we created our own character class with square brackets to complete the pattern, and finally we were able to scan through the text and extract all the dates.

This is fine, but if you look here, we also have three more dates, and the problem is that they are in different formats. In this date, instead of Apr the month is written as 04; here the separator is a dot instead of a dash; and here the separator is a slash. Let us modify the pattern so that it matches all the dates present in this text. The first thing to check: we see that the
first two characters in all of these dates are still two digits, here, here and here, so that part of our pattern will work fine and we don't need to change it.

The next thing is the separator. In the first three dates the separator was a dash, and here we also have a dash, but here we have a dot and here we have a slash. So the possible values of the separator are dot, slash and dash. What I'm going to do is create my own character class: I open a square bracket and close it, and inside I list all the possible separators. It can be a dash, it might also be a dot, and it might also be a slash. One thing to remember: I told you that dot is a special character, and that special characters need a backslash. But there is an exception: inside square brackets most special characters, the dot included, don't need to be escaped. You can write the dot as it is, because Python will treat it as the literal character. So this separator is going to look for a dash, a dot or a slash, and I'm going to use the same separator class in the second position as well.

The next thing is the month. I have defined the month as three letters, but now I know that it can be either letters or a number. If it's letters it is three characters long, and if it's a number it is two characters long. So first I'm going to put the whole month into a group, and inside this group I'm going to use a pipe. As I told you previously, the pipe character that we
have here does an or match. Whenever we want an or match, meaning it has to be either this or that, we can use a pipe. If my month is three letters, then the existing pattern will match it. But if the month is two digits, I need to add another pattern, which is \d with 2 in braces. Since I have put all of this into a group with a pipe in the middle, it will try to match either one pattern or the other: the first pattern, on the left-hand side of the pipe, matches three letters, and the second pattern, on the right-hand side, matches two digits. This should work for my month. The separator I have already changed, so whether it is a dash, a dot or a slash it will get matched. And finally we have the year, which is four digits, so I don't need to make any changes there.

Now that I've modified my pattern, let me execute the program and see if it catches all the dates. And it's working fine: you can see that the first three dates were extracted, and the last three dates, which are on this line, were also extracted. Even though these dates are in different formats, the program works because I modified the pattern to match all of these date formats, and I'm able to extract all the dates.

Now let us create another pattern to match our email IDs. First let me comment out these lines and create another pattern here, let's say email_pattern. It's the same process: re.compile, with a raw string that holds my pattern. Let's look at our emails. I can see that our email
will always have a username, then an @ symbol, then a domain, and then .com, or it can also be something like .co.in. Let's write a pattern that matches all of these email IDs and returns them to us. To begin with, let's focus only on these two email IDs, because their formats are almost the same. This other one is slightly different because it has .co.in; we'll come to that later.

We can see that the username has letters or digits, and it can also have an underscore. We know that the character class \w matches a letter, a digit or an underscore, so we can use \w straight away. But I am not sure how many \w to write, because the usernames here have different numbers of characters. For such cases we can use the plus symbol. If I go back to the symbols, plus matches one or more occurrences. So I write \w, which matches a letter, digit or underscore, and then a plus, so whether there is one such character or many, it will be a match. All my usernames, whether they have letters, digits or underscores, will be matched by this part of the pattern.

That is followed by an @ symbol. I know that an email will always have an @, so I can put an @ straight away. Next I have the domain. I'm assuming that the domain name generally will not have a number or an underscore, only letters, so I'm going to create my own character class, like the one I created earlier. Inside it I just write a-z. I'm assuming it's always going to be lowercase, hopefully, so with a-z all the
lowercase letters will get matched here. But I don't know how many characters there will be: here I have five characters and there only four. So I put a plus, which matches one or more such characters.

Next I need a pattern that will match the .com or .net. First let's match the dot. Dot is a special character, as you can see if I go back to my special characters tab, so if I want to match a literal dot I need to write it as \. with a backslash. After this I know there are going to be three letters, so I create my own character class with a-z and say there will be three such characters.

If I read this pattern again: I'm using \w, which matches any letter, digit or underscore, one or more times. After that there should be an @, so it will match an @. Then there is a character class for lowercase letters, again with a plus, so one such character or more will match. Then comes a dot, written as \. because dot is a special character and I want to match exactly a dot. Then I have another character class, a-z, all the lowercase letters, and I'm saying there will be three of them. So this is my pattern for email.

The next two steps are the same again. I create a variable matches equal to email_pattern.finditer and pass my input text, and then I use my for loop, for match in matches, and print match.group(). If I execute this program, you can see that I get two different email IDs in the output, myname2020@dummy.com and
askhelp@demo.net. Both of these email IDs have been extracted by this pattern. But we also have another email ID with a slightly different format. The username should be fine, this pattern should be able to fetch it, and the @ and the domain, demo, are fine too. But the last part, where the others have .com, is written here as .co.in. So we will have to modify our pattern to match this kind of email address as well.

I know that up to the domain name there will be no changes, so I am not going to touch that. After the domain name, I'm going to put everything into a group. Inside my group the first thing I match is the dot, and then the character class that is looking for three letters. But here I have only two letters after the dot, so I'm going to write 2,3 in the braces: it will look for a letter a minimum of two times and a maximum of three times. So co, which has two letters, will get matched, and com, which has three letters, will also get matched. A dot followed by either two or three letters is matched, the whole of this is inside a group, and then I add a plus after the group, because after .co I also have .in. Even if this whole thing is present more than once it should still match, and that is why I'm using a plus.

Now if I execute this program, you can see that even the last email ID is being fetched. So we have written a pattern that is able to match all of the email IDs, even though they are in slightly different formats. I hope this was clear.

Finally, let's write another pattern that will match all the phone numbers. Let me comment out these lines and write it here. To find all the phone numbers I'm going to create another pattern, phone_pattern, equal to re.compile and
a raw string. Now again it's time to create the pattern. As you can see, the first phone number here starts with +6, and there is a +6 in the second phone number as well, but in the third phone number there is no +6. That means I first need to write a pattern that handles this +6, and it should be an optional pattern. I know that the first character is a plus, but plus is a special character, as you can see in the list, and every special character needs to be escaped. That is why I'm going to use a backslash before it. After that I know there's going to be a digit. I put all of this into a group, so inside my group it looks first for a plus symbol and then for a digit, and after the group I put a question mark, meaning that this whole thing is optional. If there is a phone number that has a plus and a digit, it's going to match, and if there is a phone number that does not have a plus and a digit, it's still going to match. So we have handled this.

Next we see that there are three digits in all of these phone numbers. To handle three digits I write \d, open my braces and write 3, so this will match three digits. They are followed by a dash. But while I have a dash here, in the next phone number I have a dot, and in the one after that a dash again, so it can be either a dash or a dot. What I'm going to do is create my own character class that says it's going to be either a dash or a dot. This character class should be able to handle all of these phone numbers: whether the number has a dash or a dot, it should still match. After this separator come three more digits.
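The part of the phone pattern built up to this point might look like the sketch below. The phone numbers are made up for illustration (the video's real sample text is not in the captions), and the name partial_phone is hypothetical.

```python
import re

# Hypothetical phone numbers: two with a "+6" prefix, one without,
# using a dash, a dot and a dash as the separator.
text = "+6123-456 7890 +6123.456 7890 123-4567890"

# (\+\d)?  optional group: an escaped plus followed by one digit
# \d{3}    three digits
# [-.]     custom character class: a dash or a dot
# \d{3}    three more digits
partial_phone = re.compile(r"(\+\d)?\d{3}[-.]\d{3}")

prefixes = [m.group() for m in partial_phone.finditer(text)]
print(prefixes)
```

Inside the square brackets the dash is placed first so that it is read as a literal dash rather than as a range, and the dot needs no backslash there.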
so again i just mention backslash d open the braces and close the braces and i mention three here so this will match three digits after three digits i see that there is a space here and here also there is a space but here after these three digits i don't see any space so it means that there can either be a space or the space may not be there so what i'm going to do is i'm going to use this backslash s because backslash s as i told you here actually stands for a whitespace character and then i have spaces here but i do not have a space in this particular phone number so this space has to be an optional search so i just make it optional by entering a question mark after this backslash s so we have now handled this pattern up to this particular section the last four characters are digits in all the three phone numbers so i just mention backslash d and then inside the braces i just mention four so hopefully this pattern should be able to match all of these phone numbers let's proceed let's write now the matches equal to the same thing like phone pattern dot finditer and then i mention my text and then here i just display so in order to display this i use a for loop for match in matches and then i just print this print match dot group okay so now if i let me just clear this and execute this you can see that this pattern is now able to match all the three phone numbers so even though the formats of these phone numbers were different this pattern was able to handle it so this is how we can use regular expressions to match any type of text you just need to know how to write a pattern so it might take some time for you to start understanding how different patterns work and how different symbols and different character classes and different special characters in regular expressions work but once you get a hold of it and you need to practice it's actually pretty simple so now let me just comment out all of these lines and if
i execute the whole program now you can see that from this particular text i'm able to extract all the phone numbers all the email ids and all the dates okay so this is how regular expressions can be really useful in extracting the data that you want so if i go back to my excel here we have seen all these character classes which should be pretty simple and we have seen most of these symbols as well there are a few symbols that we have not seen like the caret symbol and the dollar symbol and also the star symbol so let's write some programs to see how we can use them so let me go back to my vs code and let me just create another program so i'm just going to save it like demo3 dot py and let me just close this so here let me just import my module import re i'm going to create a dummy text so let's say my dummy text would be something like hi how are you okay and so what i'm going to try to do here is i want to match a text where it starts with a hi and always ends with the question mark okay to do that i'm going to create my pattern so let's say pattern equal to re dot compile and here i just mention so i know that my text has to always start with hi so i just put this together into a group and i know that it should start with a hi so i just use the caret symbol so if i go back to my symbols the caret symbol is used when you want to match a particular text or a particular character at the beginning of the text so i want to make sure that the beginning of my text is always hi so i put this together into a group so this will be treated as a single unit and then i know that by using caret i tell python or the regular expression that the text always has to start with hi and then it can be followed by anything right so what i do is i'm going to use a dot character because as i told you a dot character can match anything so it will match anything or any character other than the new line
character so i just put a dot but there can be more than one character here so i just put a dot and then i put a star so what star does is even if there is a character or if there is no character this will still do a match okay so i'll explain it further but now let's proceed so at the end i want to match the question mark but the question mark as you can see here is a special character so in order to use a special character i need to escape it so i just mention backslash question mark and then i put a dollar symbol so what the dollar symbol does is it's going to make sure that it's only going to match this text if the text ends with the question mark that is the last character is a question mark okay so now i have created my pattern now again the same steps i need to search for this pattern in the text and then fetch those results so to do that i create matches equal to pattern dot finditer i pass my text and then i just use a for loop for match in matches i just print this so i just say match dot group okay so if i execute this program now let me clear this okay so if i execute this program now you can see that it's printing hi how are you so this whole string has been matched but let's say instead of this hi i just made it like a lowercase hi and if i execute this you can see that it's not extracting anything because patterns are case sensitive so i'm trying to match an uppercase h and i but here i have a lowercase h so it does not match hence there is no result so let me go change it back to capital h and now if i execute you can see that there is a match and it's returning that whole match now let's say if i remove this question mark or after this question mark let's say if i add a space if i execute this program you see that there is no result because i'm using a dollar here so what this means is the last character has to be what i mentioned here that is a question mark so if i enter a space it does not work now let's
say that i remove all of this so i remove all of this and there is just the first part that i am looking for hi and the last character that i am looking for question mark so i need to have an escape character here if i run this program now it will still work it's still returning what i have in my text the reason is i'm using a dot and dot will match anything but after dot i'm using a star and what star will do is it will try to match 0 or more number of times so if i go back here and if i see my symbol it says matches 0 or more characters so even if there are no characters or if there is one character or more than one character it will still do a match so even if i had anything mentioned here any value okay my star would be able to match so if i execute this you can see that the whole string is getting returned so this is how we can use star the caret symbol and the dollar symbol i hope this was clear now the last thing that i want to explain in regular expressions is the sub method which is used to substitute a particular value to explain the sub method i'm going to create a new program so let's say i'm just going to save it like demo4 dot py and let me just clear this so here i'm just going to do an import re because that's the most basic step and then let's say i have a text here i already created this text so what this text is i have like my income in 2019 is 85,000 but now it is 120,000 okay so this is my text now let's say i want to modify this text such that i want to hide the amount this is a confidential piece of information so i want to hide it with some other values so in such cases i can use a method called as sub which is available in our regular expression so to use that what i just need to do is first i need to create a pattern to match the string which i want to replace so let's say pattern equal to re dot compile and then i just mention my pattern here so in order to capture this
pattern i know that it always starts with a dollar but dollar as you can see here is a special character so i need to use backslash so backslash dollar and then followed by some digits and it also can have a comma so i create my own character class by using the square brackets and i mention 0 to 9 so it will match any digit from 0 to 9 and also a comma so it will also match a comma and i say it's going to be present once or more than once okay so now this is my pattern so this pattern will match this amount as well as this amount and after the pattern is created the next step is let's say i do a matches equal to pattern dot sub and inside my sub so this basically stands for substitute or kind of a replace there will be two arguments the first argument here is the value that you want to replace it with so let's say i want to replace that amount with something like a star star star okay and then the next argument is the text itself the input so what happens is now it's going to search for this pattern in this text and if it finds a match it's going to replace that match with this particular value that i passed here okay so now let me just print my matches since i'm not using a finditer here it's not going to return me multiple values it's just going to return the whole string with whatever it matched replaced by this string so i just print matches here so now if i execute this you can see that just to be more clear let me execute it again you can see that it's printing the whole statement but it has replaced the dollar amount with these stars okay so this is basically how we can use the sub method to search for a particular pattern and then replace that pattern with some text that we have mentioned here okay so i hope all of this was clear i believe this was a pretty long session but regular expression is not a very
simple topic to understand you need to practice if you have any doubts then please make sure to leave a comment i'll try to clarify those doubts as much as i can if you like this video and if you found this useful please make sure to like and share this video with friends and colleagues who may be interested in learning regular expressions thank you
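The patterns described in this part of the video can be sketched roughly as follows. This is a reconstruction from the narration, not the code shown on screen: the sample texts are hypothetical, and the user-name and domain portion of the email pattern is an assumption, since the captions only describe the grouped ending that was added.

```python
import re

# Hypothetical sample text; the text used in the video is not reproduced in the captions.
text = (
    "Write to support@demo.com or askhelp@demo.net; in India use info@demo.co.in. "
    "Call +6012-345 6789 or +6019.876 5432, or 013-5557788."
)

# Email: the part before the group is an assumed user name and domain.
# The group (a dot followed by 2 or 3 letters) may repeat, so ".co.in" matches too.
email_pattern = re.compile(r"[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+(\.[a-zA-Z]{2,3})+")
emails = [match.group() for match in email_pattern.finditer(text)]

# Phone: optional "+digit" group, 3 digits, a dash or a dot, 3 digits,
# an optional whitespace character, then 4 digits.
phone_pattern = re.compile(r"(\+\d)?\d{3}[-.]\d{3}\s?\d{4}")
phones = [match.group() for match in phone_pattern.finditer(text)]

# Caret, star and dollar: the text must start with "Hi" and end with "?".
greeting_pattern = re.compile(r"^(Hi).*\?$")
greeting = greeting_pattern.search("Hi how are you?")

# sub: replace every dollar amount with a fixed mask.
income_text = "My income in 2019 was $85,000 but now it is $120,000"
amount_pattern = re.compile(r"\$[0-9,]+")
masked = amount_pattern.sub("***", income_text)

print(emails)
print(phones)
print(greeting.group() if greeting else "no match")
print(masked)
```

Note that match.group() with no argument returns the whole match, so the parentheses used for grouping do not change what is printed. With re.findall the same patterns would return only the group contents, which is one reason to prefer finditer here.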
Info
Channel: techTFQ
Views: 2,031
Rating: 4.9436622 out of 5
Keywords: Regular Expressions in Python, Regular Expressions Tutorial for Beginners in Python, regular expressions, python regex, python regular expression, regex, learn regular expression, how to use regular expression in python, python regular expressions, python tutorial, python regex tutorial, python regex module, re module, python re module, match patterns, regex pattern, regular expression patterns, learn python, techtfq, Regular Expressions Python Tutorial
Id: V_BozMwoYe4
Length: 46min 40sec (2800 seconds)
Published: Thu Apr 22 2021