Regex - Text Processing with Regular Expressions in C#

Captions
Hello there, my name is Preslav Mihaylov. I am the head of training at Software University, and I'm also a software engineer here. Today we're talking about regular expressions, or, in short, regex, which help us achieve advanced text manipulation. We'll first cover what regex is, why we use it, why we need it and how we use it. Then we'll cover the different components of regex, like characters, operators, constructs and many more. Afterwards we'll see what regular expressions in C# are and how we use them in C#, because regex can actually be used in many different languages, and C# is only one of them which offers an API for using it. We'll explore just that. So let us start. I can't wait to start, and I hope you are with me on that. Are you ready for some new knowledge? Let's go ahead.

Now let's see what regular expressions actually are. A regular expression looks like this, the thing at the top. It's a bit funky, but it's basically a specification for how you want to get some data from text. I'll help you understand this better with an example. Imagine that you have a set of strings, which can be anything, but you also have emails among them. For example, you can have "hello", you can have the string "ivan@abv.bg" (abv.bg is a Bulgarian mail service), you can have "George", you can have "me@pmihaylov.com", you can have "SoftUni". All these are examples of strings, but only some of them are emails. So how do we find an email? We have a problem here: we have a set of strings, and we have to identify which of these strings are actually emails. So which strings are emails? Emails might be those strings which contain the @ symbol. In this case we have several matches. We have "ivan@abv.bg", which is an email, and that's correct. We have "me@pmihaylov.com", which is also an email. So those two are actually correct email
addresses, based on the specification we have. The rest are not. However, if someone passes a string such as "@a" or "a@", is this an email? According to our specification it is, but actually it's not. So there should be some other rules we enforce as well. And how do you achieve this? The easiest thing you can do is run a for loop over the string and check whether it contains an @. But what if you have more complex conditions? For example, you want an email to start with a letter, and you don't allow emails which start with a non-letter. Then you can modify that for loop and do something else. Or you might have a specification which says that you want some alphanumeric string, let's say some word, then an @, some word, a dot and some word again, and that's an email. So you can have different specifications about what kind of pattern you're trying to match in a string.

One way to achieve that is by using normal string operations, but doing all those different string operations for things like that can be really exhausting. It can take a lot of time, and sometimes the conditions you have can be really hard. That's why you have regex. What regex gives us is something like a language for defining patterns. For example, with regex you can easily say "I want a string which contains the @ symbol", or "I want a string which starts with a number", or "a string which starts with a letter". You can do that with the knowledge we have so far, but with regex it's much, much easier, because with regex you have something like an API you use and apply. It's not exactly an API, though, because you're not invoking some specific method which says "get all strings which contain the @ symbol". Instead, you have a language which defines patterns, and that's what regex is: a language which defines patterns. Then you can use
that specification to check whether the pattern you defined matches. For example, look at this. This is an example of a pattern. It looks very cryptic, and the characters in it don't carry much meaning on their own, but it actually defines a specification for some pattern. In this case it says: I want to match a string starting with an uppercase character and having letters afterwards. For example, a string like "Preslav" is something which matches this. The next part is the same thing; it says I want to match the same kind of string, for example "Mihaylov", which is another thing that matches. Then I want a space and a dash, and then I want to match an email. This part matches an email: it says I want to match something which has a word, an @ symbol, a word, a dot and a word again. Having this specification allows us to match some pattern, for example a first name and a last name followed by some email. Here you can even see how the color coding helps us realize which part of the regex catches which part of the match: this thing corresponds to this, this corresponds to that, and so on.

So, as you can see, with a regex you can define what patterns you want to match, and then against the pattern you defined you can see whether the string you're given matches or not. That way you can filter from a list all the strings which are emails, or all the strings which are sites, and things like that, depending on the use case you have. For now you may not be able to understand these special letters here, but maybe based on this example you get some hints about what those strange things in the pattern might mean. Now it's time to explore that.

Now that we know regular expressions are useful for matching patterns, we can start exploring how we build them and how we use them. Before that, I want you to do some research and
read more about regex. Read an article about regular expressions, and understand why we use them and where we use them. Think of some use cases where this tool can be useful. After that, go to the next video, where I'll be waiting for you and where we'll start actually exploring how a regex is built. We'll start by covering character classes.

From now on we're starting to cover the different utilities that regular expressions give us. One of those utilities is defining a character class. Before that, let me give you an example of the easiest regex you can think of. Imagine that you want to match every string which is just "Preslav", or, let's say, "programming". You can make a regular expression in which you write the letters "programming" directly, and you will be able to match them.

Now I will show you a site where you can practice that. The site is called regexr.com, and it lets you test regex in a very simple way. Here is where you write your pattern, your regular expression; that's where you define what you want to match. And here you have some text which you can use for testing. The simplest regex you can write is plain text. For example, if I write "below" here, it finds all the occurrences of "below" in the text; in this case it's one. If I write "RegExr", it finds all occurrences of "RegExr". This is case sensitive; the case matters. You can also find everything which is a comma, and here you can see there are four commas: one here, one here, here and here. So the easiest way to create a regex is just writing plain text, and then the pattern you're searching for is exactly the characters you provide. This is an easy way of searching for, let's say, a substring in a string by using regex. You can say: I want to find all the occurrences of Linux in my text, or I want to find
all the occurrences of "text" in the text I'm given; in this case it's one. So that's the easiest type of regex. Now we will cover the more advanced parts of it. What I've covered so far shouldn't be anything strange or new, because it's really simple: you just write some text and you try to find it in a bigger text. It's like finding a substring in a string, but this time we're not doing it with for loops and some algorithms; we just use regex.

One remark I want to make here: on this site you can write regex and test your patterns. So far we don't know how to apply these patterns in programming, but by the end of this lecture you will see what regex API you have in C#. Once you understand it, all you have to do is apply what you learned on this site, and it will be directly applicable inside your C# program as well. So don't worry about that. We won't see much code in this lecture, but by its end we'll see how we use this knowledge. For now, understand that we use regex on the site in order to match some patterns, and later we will see how we can use these patterns to find something in text in C#.

So now let's start with the different parts of regex. First is the character class. Imagine that you have some text. Let me give you an example: I'll just use the string "SoftUni". What you want to do now is find all the strings which contain "oftUni" preceded by one of the letters, let's say, S, P or N. In other words, you want to match all of "SoftUni", "PoftUni" and "NoftUni", whatever that means. You want to find any of these strings. So what you're trying to specify here is: match either S, P or N, and then match exactly "oftUni". So that's what we're trying to say.
Match either one of these characters, and then match the string "oftUni". So how do we do that? The thing we use is a character class. It looks like this: you use square brackets and you specify characters inside them. What this means is that you want to match any one of these characters. For example, this one means "I want to match either n, v or g".

Let's see how this is applied. As you see, I have "SoftUni", "PoftUni" and "NoftUni" right here. You want to match "SoftUni", so you write this, but it doesn't match the other two strings. What you can do then is use a character class and say "I want to match any one of these letters". In this case it's just one letter, but you can continue and write P and N. Once you do this, I'm saying I want to match "oftUni" preceded by any of these characters, so this is something which matches this, this and that one. Once we hover over it, you can see the explanation here. It says I'm first matching the character set, which can be any one of the characters S, P or N, and then I'm matching the characters o, f, t, U, n, i exactly. So that's what your regex is trying to say, and this is what you match.

So that's a character class: you can specify "match any one of these characters". There are some other variants of the same principle. When you add this symbol at the start of the class (^, usually called a caret), it means "I want to match anything which is not one of these symbols". What if you don't want to match "SoftUni", "PoftUni" and "NoftUni", but you want to match everything else, like "AoftUni", "QoftUni" and "BoftUni"? In this case you add the caret, which means "I want to match any character which is not in this set". It's something like negation: you say "I don't want to match S, P or N; I want to match anything else but those". So in this case "SoftUni", "PoftUni" and "NoftUni" aren't matched, because they contain the characters we're trying to avoid.
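The character class and its negated form can be tried out in C# with a minimal sketch like the one below. It assumes .NET 6 or later (top-level statements) and the standard System.Text.RegularExpressions API; the variable names are illustrative, and the sample strings are the ones used in the demo.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample strings taken from the demo above.
string[] inputs = { "SoftUni", "PoftUni", "NoftUni", "AoftUni", "QoftUni" };

// [SPN] matches exactly one character that is S, P or N.
var oneOf = new Regex("[SPN]oftUni");

// [^SPN] matches exactly one character that is NOT S, P or N.
var noneOf = new Regex("[^SPN]oftUni");

// Keep only the strings in which each pattern finds a match.
string[] matchedByOneOf = inputs.Where(s => oneOf.IsMatch(s)).ToArray();
string[] matchedByNoneOf = inputs.Where(s => noneOf.IsMatch(s)).ToArray();

Console.WriteLine(string.Join(", ", matchedByOneOf));  // SoftUni, PoftUni, NoftUni
Console.WriteLine(string.Join(", ", matchedByNoneOf)); // AoftUni, QoftUni
```

Regex.IsMatch only reports whether the pattern occurs somewhere in the string, which is enough here because each sample contains "oftUni" exactly once.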
But everything else is matched. So, so far we've learned the basic structure of a character class, [SPN], which means "match any of these characters", and the one with the caret (let's call it an up arrow, or a cap), which says "match any other character". That's what the character class does.

Another kind of character class is one in which you specify a range. Imagine that you're trying to say "I want to match any letter". One way to match any letter is to write A, B, C, D, E, F, G, H, I, J, K and on and on until we get to Z. But since that's tedious, you can instead specify a range of letters and say "I want to match anything which is in the range from A to Z". Once you do that, you match all the strings here, because they all start with a capital English letter, which is this pattern here. If you look at the explanation, notice that this is a range. So we can use character classes not only with distinct characters; we can also specify ranges of characters, like in this example.

Notice that this won't match, let's say, "softUni" with a lowercase s. If you want to include the lowercase letters as well, you can just add another range, but this time the range is for the small letters, from a to z. Wait, why is this not highlighted? I got confused here. Oh, wait, this should be uppercase: "oftUni". Notice that previously we had a problem: the U should have been a capital letter, because we're trying to match it with a capital letter in the regex, and it wasn't. So here we match any English letter, whether it's uppercase or lowercase, and that's one way of doing it: by specifying ranges.

What if you want to match digits, let's say "0oftUni", "1oftUni" and things like that? Then you can specify a number range, just like 0 to 9, and it matches them all. You also don't have to use the whole range; you can use part of it. For example, you can say "I want to match all the digits which are between 5 and 9", like here, and now "0oftUni" or "1oftUni"
doesn't match, but if I write "5oftUni", it matches.

So that's it for the character class, and now I have a challenge for you. I want you to write a regex which matches "Softuni" with a lowercase u, but keeps all the other matches as well. Notice that the reason we don't match "Softuni" now is that we're expecting an uppercase U here, while we're given a lowercase u. So I want you to write the regex which will match both this and this; well, not this, but these. That's your challenge. Now open regexr.com and attempt to do it.

So that's it for the character class. Basically, you can match specific letters as you want, and not only letters but any kind of symbol; you can match any symbol other than those which are specified; and you can match ranges of letters and digits.

You also have some other ways of specifying character classes, which are a bit more simplified. For example, if you want to match a word character, you can use \w, which is the same as writing a to z, A to Z uppercase and so on. In other words, if I now write "the quick brown fox went over the lazy dog" and remove this regex, and I want to match all the letters, I can write that I want all the letters which are lowercase. But let's say there are uppercase letters; then I'll just add the uppercase range. But what if there were numbers in this as well? What if this was a one? We want to match digits as well, so you add another range, and this goes on. Finally, you can have underscores between those words, like this. What do you do then? You add the underscore character. This complex character class matches any lowercase letter, any uppercase letter, any digit or any underscore character, and there's a simplification for it.

By the way, this only matches a single character. In order to match a whole word, you have to add the plus. We will explain this plus later, but for now understand that if I write it just like that, without the plus, it
matches individual characters. I'm saying I want to match any character in this set, but only one character at a time, which means that I'll match t, h, e, underscore, q, u, 1 and on and on until the end, each one individually. If I want to match them as a whole, I can add a plus character, which says "I want to match any character of this set, but repeat it until the pattern no longer matches". In this case every single character here matches this character class, but once we get to the whitespace, the whitespace no longer matches, so that's where we finish. That's what the plus does, to put it simply. Of course it's a bit more involved, but we'll see that later.

A simpler way of doing the same thing is to just write \w. Writing \w is the same thing as writing [a-zA-Z0-9_]; it's just a simplification. (That holds on this site; in .NET, \w by default also matches letters and digits from other alphabets.) And if you want to match a whole word, you can add the plus: \w+. So this matches any character among the lowercase letters, uppercase letters, digits and the underscore. As in this example, it matches this thing, and it doesn't match this, because these are either non-English characters or special characters.

If you have the same thing but in uppercase, \W, it means "match anything else". It's a negation: match everything which doesn't fit that pattern. Notice how here we matched a, b, c, d, 0, 9 and underscore, but here we matched everything else. This goes not only for the w but for any character class of this type: when you write it in uppercase, it means "I want to match everything which doesn't match that pattern", while if you write it in lowercase, it means "I want to match it". So it's like a negation.

There are others as well. \s means "match any whitespace character". For example, if I write \s, it will match all the whitespaces here; we have four, and if I go down, you see all the whitespaces matched. If I
write an uppercase \S, it matches everything other than whitespace; in this case it's this thing here. So this is everything other than whitespace, similarly to how \W works.

\d matches any digit. The \d here says "I want to match a digit". For example, if I write [0-9], I'll match all the digits, but if I write \d, I'll match the digits as well, which gives me the same result. And if I write an uppercase \D, it will match everything else.

So these character classes are simplifications of what you can achieve by using the normal character class, where you specify some range or specific characters. Since some character classes are searched for very often, like any letter of the English alphabet or a digit, you have special character classes for them, which are easily accessible with a specific character like \w, \s or \d. But notice that this is only a simplification: even if you don't know this part of the character classes, you can still achieve anything you want using just these three forms.

So that's it for character classes. Now go to regexr.com and practice a bit. I'll give you a challenge again: I want you to match the following text, "Programming is the number one thing!". I want you to match every single character of the above by using the character classes we just covered. That's your challenge. I'll see you in the next video.

Another concept which we can use in regex is called quantifiers. We already saw one example of a quantifier, actually: the plus sign. Essentially, a quantifier lets us take a pattern and repeat it some number of times, for when you want to match some specific pattern more than once. For example, previously I showed you how you can match word characters: you can write \w and match everything, but you match the characters individually, independently. If you want to say "I want to match this pattern continuously", you use the
plus. What the plus means is "I don't want to match a single letter; I want to match it multiple times". More specifically, the plus sign means "I want to match the previous element one or more times". In other words, it will match something when there is at least one occurrence of \w. For example, here this "Programming" is matched because it has the P, which matches the first time, and then it continues matching on and on. There is no limit; you can match very big words. This is essentially a pattern for matching words. It says: take any word character, which is a letter, a digit or an underscore, and match it one or more times. So that's what the plus means.

There are others as well. The star means "I want to match the element zero or more times". It's very similar to the plus, but it also allows the element to be matched zero times. For example, if I write the star here, it will match the words. What would be the difference? In this case there is no difference, but I'll give you an example where this can be useful.

So, removing the pattern, let's say you have "SoftUni is a [great] place", and you want to match this exactly, so you write "SoftUni is a [great] place". Now, in order to match these bracket characters, I have to escape them, and then I match this string exactly; if I change anything in it, it no longer matches. But what if I want to have this thing as a placeholder? That is, I want everything else to stay as is, but the thing in the square brackets I want to be dynamic; I want it to be anything. For example, I want to match this, but I also want to match "SoftUni is a [bad] place", or "SoftUni is a [awesome] place", although that isn't grammatically correct. Anyway, I want to match all three strings here. What you can do is write \w+. When you write \w+, it means "match any letter one or more times", so as long as you have valid letters here, you will match them. For example, if I write "SoftUni is a [something] place", with "Soft
Uni is a [something] place" as the text, it matches too, because you have a letter repeated one or more times. But what if it's empty? What if you want to match "SoftUni is a [] place"? Well, you don't have anything there, and I want to include both these things on the top and this thing on the bottom. In its current form the pattern doesn't match it, because the plus quantifier means "I want to match a letter at least once for this pattern to be valid". In this case there isn't even one letter between the brackets. But if I use a star instead, it means "I want to match a letter any number of times, and I also accept it being empty".

So that's the difference between the star and the plus quantifier. The plus matches as long as whatever you specify occurs at least once. It can occur multiple times as well, but it has to occur at least once. The star will also match arbitrarily many occurrences of, let's say, a letter, but it will also match an empty string, like in this scenario. That's what the star and the plus quantifiers do.

The question mark matches the element zero times or once. That's for when you want to have something either there or not. For example, let's say I'm giving my email. Maybe I shouldn't give it, but let's say it's preslav@gmail.com, and I want to match it. So I write "@gmail.com", and I want to have "preslav" beforehand. Properly, the dot should be escaped. So now we match this email. However, in Gmail there is this option that you can have any number of dots in the name. For example, an email with my own name written as pr.eslav@gmail.com is the same as the one above; p.reslav@gmail.com is the same as well; pre.slav@gmail.com should match as well. So all these should match, along with the first one. What you can do here is apply the question
mark quantifier. What I'm trying to match is a dot: you may find a dot here, a dot here, a dot here, here, here or here. I'm not sure whether you're allowed to have one at the end. Let's just start, for simplicity, with this one. Now we match this case, but we no longer match the first case, which was the most basic one. So how do we match it? What we actually need is to match this pattern but allow for the possibility of a dot in the second place. In this scenario you can apply the question mark. When you apply the question mark quantifier, it means "I want this symbol zero times or one time". In other words, if the dot is not present, that's fine, I will accept the pattern; if it's present, I will accept it as well. So it can either be there or not. And in order for this to be complete, I have to do the same for every single position, like this. Now it will match any possible variation of this pattern. As you can see, all we did was keep p, r, e, s, l, a, v matching, but with a dot and a question mark quantifier in between. So that's what the question mark does: it tells us "match the previous thing either zero times or once", and it can be useful when something can either be there or be missing.

One more example: let's say you want to match a valid variable name, for instance in C#, which starts with letters. Let's say "num" is valid, but "9num" is not valid, because you can't start a variable name with a number. However, you can start a variable name with an underscore; that is also valid. So a variable name is one which either starts with an underscore or consists of normal letters. Of course, you can always have a variable name like this, which is valid, but I wouldn't suggest doing it, and we won't cover that case. So we want to match "num" and "_num". The first one we can match by writing "num" directly, or just by writing "match any letter", so [a-zA-Z], and that's it, and you
match it multiple times. But notice how in the second case we don't match the underscore character. So we can write "I want to match the underscore character as well, at the start", but then we no longer match the first string. That's why we can use the question mark, to say "I can optionally have an underscore character, and then I want to have letters". That way you match a variable name which consists only of letters, but you also match a variable name which starts with an underscore. Of course, variable names are not fully validated by this kind of regex; I mean, this won't match all the valid variable names. For example, if I write "num9", this 9 here won't be matched; I'd have to extend this regex to cover it. But that's not important; I just mention it so that you're not confused. What is important here is to understand how the question mark works: it says "match the previous thing either once or zero times".

So that's it about quantifiers. To recap: the star quantifier, the asterisk, tells us to match something zero or more times; that is, it can be absent, or it can be there once or many times. The plus quantifier means "I want to match the previous element one or more times". It means that if you have an empty string there, that won't be valid, but if you have at least one character which matches, that is valid. And finally, the question mark tells us you can have it either zero times or once. So it's for something like a special case, in which you have the normal pattern but there is one character which may or may not be there; in that case you can use the question mark.

Let's see. With the star you match "+359" followed by more digits, and the lone "+" as well. Why is that? Well, the plus sign is always matched, in all three cases, and then we're saying "match a digit". Here's a digit, and here; here there is no digit. With the asterisk quantifier we're saying: I
want to match the digit either zero times or more. In this case this is the "more" case, because we have many digits and we're matching them. But we also match this one, because although you only have the plus sign and no digits afterwards, we said that the digit can be there zero or more times. In this case it's zero times: there are zero digits, which is still correct.

With the plus quantifier, however, the second case won't match, because the plus quantifier means "I want to match the digit one or more times". The upper scenario matches, because you have a digit at least once; you even have more than that. But the second one is not matched, because you don't have any digit; there should be at least one digit there in order for this pattern to be valid.

And the question mark means "I want to match a digit only once or zero times". In this case we match this one with zero digits, because there is no digit; it's okay not to have a digit, it's optional. But in the scenario above we will only match the first two symbols of the string, because this doesn't say "match zero or more times"; it says "match either zero times or exactly one time". So in this case it matched exactly one digit and it didn't match the rest of the string. With these three visual examples you can see the difference between the quantifiers.

So now let's think of a challenge. I showed you that this is not a valid regex for matching variable names. Your challenge is to write a regex which can correctly match variable names in C#. Let me remind you: variable names begin with an underscore or a letter, and contain letters, digits and underscores. In other words, "num" is correct, because it starts with a letter and contains letters, digits and underscores; "num9" is correct; "_name" is correct; "num_" is correct; but "9num" is not correct. So I want you to write a regex which will match these cases but not this one. That's your challenge. Go on.
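The three quantifiers compared above can be checked in C# with a small sketch like this one. It assumes .NET 6 or later (top-level statements) and the standard System.Text.RegularExpressions API; the helper FirstMatch and the sample number are made up for illustration, and the patterns follow the "+ followed by digits" example from the recap.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first match of the pattern, or "(no match)".
string FirstMatch(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : "(no match)";
}

string withDigits = "+35912345"; // made-up sample number
string plusOnly = "+";           // a plus sign with no digits after it

// \+ is a literal plus sign; \d is a digit.
string starDigits = FirstMatch(@"\+\d*", withDigits);     // zero or more digits
string starEmpty = FirstMatch(@"\+\d*", plusOnly);
string plusDigits = FirstMatch(@"\+\d+", withDigits);     // one or more digits
string plusEmpty = FirstMatch(@"\+\d+", plusOnly);
string optionalDigits = FirstMatch(@"\+\d?", withDigits); // zero or one digit
string optionalEmpty = FirstMatch(@"\+\d?", plusOnly);

Console.WriteLine($"*: {starDigits} | {starEmpty}");         // *: +35912345 | +
Console.WriteLine($"+: {plusDigits} | {plusEmpty}");         // +: +35912345 | (no match)
Console.WriteLine($"?: {optionalDigits} | {optionalEmpty}"); // ?: +3 | +
```

The last line shows the behaviour described above: with the question mark, only the first two symbols of the longer string are matched.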
Have fun, and I'll see you in the next video.

Next are character escapes, which we have to mention for regex. Sometimes you need to look for some special characters, and these special characters might be a tab, for example, or a new line. The way you can match them is by using an escape character, just like in normal programming. Not the \w, but this thing and this thing. You do it just like you would write it in a C# program: with the escaping character and the special character which denotes whatever you mean. In this case we have a tab and a new line.

Let's see this in practice. Let's say I have "Name:", I press tab, "Preslav", then I have a new line, "Last name:", tab, "Mihaylov". I want to match both these things; not only one of them, but both. I can start writing: I want to have "Name", then I want a colon, then I want a tab. Now, if I write a space here, whether it's four spaces or one space, it doesn't match the tab character. In order to match a tab character you have to write \t. \t means "I want to match a tab character", which is the character you type with the key two keys underneath your Escape key; it's the one you use to open the scoreboard in Counter-Strike, if you're into that stuff. Then I want a name, which I match with \w+, and now I've matched just this part. But now I want to go to the next line and match "Last name" with something else. In order to do that, I have to match a new line. How do I match a new line? I can't just press Enter; I'm trying to press Enter and nothing happens. In order to say "I want to match a new line", you have to escape it and write \n, like this. And finally you write that you want "Last name", a colon, a tab and a word again. Notice what we match: we match this whole thing, because we matched "Name:" here, a tab character, some word, then a new line, and then again "Last name:", a tab character and some word. And that's it. So there are some cases like this in which you would like to match these special symbols. Some of
These special symbols include a tab and a new line, and this is the way you can match them: by using escaping characters. Here you have an example.

Okay, now a challenge for character escapes. I showed you how to match this thing; I want you to complete the example by matching a whole form, which is name, last name, phone number. I want you to match a correct phone number, like +3 5 9 8 9 8 8 8 20 24 7 8 9 (this is a phone number in Bulgaria, by the way), and I want you to write the regex to match it. Then I want to have a gender, which can be M or F, and that's it for now. So I want you to complete this example and match this whole part. Be careful about how you match a phone number, and be careful about how you match a gender. By the way, for matching a gender I thought you would need something we haven't learned yet; actually, it's okay, you can use character classes for it. Anyway, this is a bit more complicated and involved as an example, but it only requires what you already know. So try a bit with it, "experiment", as we say here, and do your best. I'll see you in the next video.

Another component of regular expressions is anchors. There are two anchors, to be exact, and they define the start of a line and the end of a line. How can they be used? Imagine that we have this, like before (let's remove this example here), but imagine that someone has padded it with whitespace, and you want to match this pattern only if it starts at the beginning of the line. You don't want to match it anywhere; you want to match it only when it's at the beginning. The way you can do it is by using an anchor, and the anchor is an up arrow, the caret (^). When you write it, it means "I want this thing here, this first part, whatever it is, to be at the beginning of a line; if it's not, I won't match it". Now this pattern will only match if it's at the beginning of the line. There's another anchor for matching the end of the line. Imagine that someone here
wrote "Last name: Mihaylov and partners", which is incorrect, and yet you're still matching this pattern. You want to say "I want this part here to be the end of the line, and I don't want to have anything afterwards". You can do it again with an anchor, but this time you use a dollar sign ($). The dollar sign means "I want this thing here to be the end of a line, and that's it". So "Mihaylov and partners" won't work, but if this is at the end... wait, interesting. Let me try something else. Oh, there was a flag which I should have covered. Yes, it's this: in order for this to work you have to go to the flags and tick "multiline". The thing is that without this flag the whole text here is a single string. You try to say "I want to match the end of the string", but you will only match the end of the string way down here. If you want to treat every line of this input as a separate string, you should have the multiline flag on. Now that you have it, notice how "Mihaylov" matches when it's at the end of the line; if "and partners" follows, it doesn't match. Now let's get back to the previous example; I hope it will work this time. And yes, it does. So now "Mihaylov" works because it's at the end of the string, but if I write "and partners" (wait, I should have put the dollar sign here) it doesn't work, and when I remove those words it works this time. Okay, so that's about the beginning-of-string and end-of-string anchors: you use a caret or a dollar sign.

Here's an example which is a bit more involved; we haven't seen this yet. Previously we saw quantifiers, which could say "I want to match something one or more times, zero or more times", et cetera. But what if you want to match something exactly ten times? The way you can do it is by using these brackets. Let me show it to you. I have a phone number: plus three
five nine, eight nine, four four four, twenty twenty. Now I write: I want to have a plus (I escape it), I want to have some digit, and I want to have it many times. When I have this, it matches this pattern. But what if I had a bigger number, like this? It will also match. What if I have a shorter number? It will also match. I want to match a number only if it has ten characters, no more and no less. In that case you can use curly brackets and specify a number. This number defines how many times you want this to match. In this case it matches one, two, three, four, five, six, seven, eight, nine, ten times. Now, it has to be twelve in order for the whole number to match. However, notice that although this works, it still matches part of the incorrect pattern. What must you do to not match it? Expect the end of the string here. In order for a phone number to be correct, it should have twelve digits and then the string should end; there shouldn't be anything afterwards. Okay, that's the curly brackets. One more thing you can specify is a range. You can say "I want to match anything between four characters and twelve". In this case it matches this because it has more than four characters and fewer than twelve, and it matches this because it's exactly twelve characters. So that's another way of specifying exactly how many times you want something to occur. For example, you can use this to check how weak a password is if you are making some kind of login screen or whatnot.

Okay, cool, so that's about anchors. Now let me give you a challenge for anchors. Hmm, what challenge must I give you? Anchors are quite simple, actually; I showed you all the use cases that I could think of, to be honest. Let's keep it simple: you have a phone number, and I want you to complete this example so that when there is whitespace at the beginning, the phone number is not matched. It's very simple, so think about it; just use an anchor we already covered. I'll see you in the next video.
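The anchors, the multiline flag and the curly-bracket counts can be combined in a short C# sketch. Everything named here (`AnchorDemo`, `ValidPhoneLines`, `HasFourToTwelveDigits`, the sample lines) is hypothetical; `RegexOptions.Multiline` is assumed to be the .NET counterpart of the "multiline" flag ticked on the site.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // ^ and $ anchor every line because of RegexOptions.Multiline;
    // \d{12} demands exactly twelve digits after the escaped plus.
    private static readonly Regex Phone =
        new Regex(@"^\+\d{12}$", RegexOptions.Multiline);

    // Returns the lines of the text that consist of a valid phone number only.
    public static List<string> ValidPhoneLines(string text)
    {
        var result = new List<string>();
        foreach (Match m in Phone.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }

    // {4,12} is a range: at least four and at most twelve digits.
    public static bool HasFourToTwelveDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d{4,12}$");
    }

    public static void Main()
    {
        // Hypothetical input; lines are separated by \n only.
        string text = string.Join("\n", new[]
        {
            "+359894442020",               // valid
            "  +359894442020",             // padded with whitespace
            "+359894442020 and partners",  // extra text after the number
            "+35989"                       // too short
        });

        foreach (string line in ValidPhoneLines(text))
        {
            Console.WriteLine(line);
        }

        Console.WriteLine(HasFourToTwelveDigits("12345"));
        Console.WriteLine(HasFourToTwelveDigits("123"));
    }
}
```

One caveat with this sketch: in .NET, `$` in multiline mode matches just before `\n`, so input with Windows-style `\r\n` line endings would need the pattern adjusted.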
It's time to practice a bit more formally. We have to match a full name; we should already know how to do this. You're given a sequence of words and you should find those which are full names. You can use regexr. Okay, so I have regexr here. Now, these are the names: Ivan Ivanov, George Michael, Michael Jackson, Terry John. Okay, cool, so those are the words. Now we have to match them. What is a full name? It consists of a first name, a space, and a last name. We already know how to match letters, don't we? We can use character classes. So we can write: match everything from a to z, plus; a space; anything from a to z, plus. However, we have an uppercase character as well. That's why, before we match this, there should be an uppercase character here. So each of our words should start with one uppercase character, and it should be from A to Z. And now notice what we get: Ivan Ivanov, George Michael, Michael Jackson, Terry John. Okay, cool, that's our full name. Essentially we start with an uppercase character, get one or more lowercase characters, a space, then an uppercase character and one or more lowercase characters. Let us see the actual solution here. It's very similar, as you can see, but there's one more addition: this \b thing. What's that \b? It can come in handy sometimes; essentially it matches a word boundary, so it says "I want this thing here to have some separator before it". In this case it's not necessary, to be honest. I could probably think of a case in which it would be, but right now I can't. Essentially, if you want to match the start of a word, you can use \b. Experiment with it; maybe you can find some use cases where it can be useful. But if you don't use it, you'll do well for most tasks as well. So that's our simple problem here. We actually did way more complex ones before, so this is just a recap: we have
character classes, we have quantifiers, and some special escaping characters. Now, if you haven't already tried this yourself, this is your challenge, and I'll see you in the next video.

Another concept of regex is called groups, which can come in very handy sometimes. Imagine that you want to catch the following pattern: "Peter is Peter and that's it". You want to catch a pattern in which this part is constant and this is a word, any arbitrary word, as long as this part here is the same as this one. So you want to match any word in the first slot, but you want the second one to be the same. For example, you want to catch "Peter is Peter and that's it", but you don't want to match "Peter is George and that's it". Okay, so how do you do that? First you can write the constant part: a word, "is", a word, "and that's it". Notice that we captured both lines, because we're saying "as long as there's any word in these two slots, we'll match". But how can we say "I want to match this only if the second word is the same as the first one"? In problems like this you can utilize groups. A group doesn't add much to the matching itself, but you can define groups and later reuse and repeat them. The way you create a group is to write these brackets. After you do this, this "Peter" here is now a group; you've grouped him. So far nothing has happened, but since Peter is in a group, I can now say "I want to match the result of the first group". Now it says: the first group consists of a word, any word, but the second word is only valid if it's whatever the result of that first group is. And you can have more than one group. In this case it worked: "Peter is Peter and that's it" matched and the other patterns didn't. The tool says this means "match the result of capture group 1". That's what a group does, essentially. You can use it many times. For example, I add it here as well, and you might expect
here to have "Peter" as well; otherwise it won't be valid. And as you can see, you can have more than one group. You can, for example, put this in a group as well and expect it to come here: "Peter is Peter and that's it and that's it". So here we have group one, which matched here and here; this is group two, and it matched here. That's what we define with this pattern. Essentially, that's how a group can be helpful: it allows you to group an expression and reuse its result afterwards in problems like this. So you have some expression, you put it in normal brackets, and then you have a group. You also have a non-capturing group, which is defined like this, (?:...). To be honest, I've never used it, because I don't see any reason to, but essentially what it means is that you cannot reuse it. So this is a non-capturing group, and you can't say "I want the result of group one here", because you haven't captured group one. You're just taking advantage of the brackets without defining a group, which to me really is pointless, because you can just use normal groups, and the other thing just adds clutter.

Okay, so that's groups. Now you can practice with them. I want you to match a five-character palindrome. A palindrome is something which reads the same way from left and from right: "abcba" is a palindrome, "patap" is a palindrome, and "patpa" is not, because it doesn't read the same from right and from left. You want to capture a five-letter palindrome, and that's your task. Do it with regex, and one hint: use groups, because they will be very helpful. So that's your challenge for learning how to use groups. I'll see you in the next video.

Backreference constructs are something we actually already covered. Essentially, a backreference helps you get the result of a group, and we saw how to do it. The first type of backreference is a backslash and some number, like this, and it
tells the regex to expect here the result of the capturing group with the index you specify. For example, this is a backreference: you say "here I want the result of this group". This group matches "Peter", so I want "Peter" to be here as well. You already know how to use it. The second thing, however, we haven't seen. It's another variant, and this way you can still match the result of a group, but you match a named group. That is, you can specify a group not by its index but by a name which you give it. For example, here I can write a question mark and the name, (?<name>...), and this means that the group I just captured is called "name", so you can refer to this group with the alias "name". Now, instead of writing 1 here, I can write \k<name>. However, one bad part is that this doesn't work in regexr.com; the site just doesn't work with those kinds of regexes. But if you go to regex101, it works. This is another site; it's the same kind of thing as regexr, but it has a different interface, and it seems that it supports these named groups. So let's see how it works. Notice how this pattern matched, the same as before: this is group 1, and this is the result of group 1. However, this time we didn't refer to the group by index; we referred to it by name. So you can use this construct, naming groups explicitly, in order to bring a bit more clarity. For example, if you have too many groups, having groups 1, 2, 3, 4, 5 can be hard to trace. That's why you can use named groups. I'll give you an example: how about a task in which you need them? Let's say you have the following line: "20-01-2018: today is 20-01-2018". You have these two lines, and you want to match the first one and not the second one, so you should check whether this is the same as this. Or maybe, to make it a bit more structured: "the day is 20, the month is 1, the year is 2018", so you want to match all three parameters. And the second line: "the day is 21, the
month is 01, the year is 1995". So we want to match line one and not line two. This can be achieved a bit more easily with named groups. First let's match a date: we have digits, a dash (let's see what this is... yes, a dash character), digits, a dash, digits. Then you have a colon, "the day is"... well, for now just write this; "the month is" this; "the year is" this. So that's our regex. Now we want this place here to match the result of the first group, so we add groups, like this. Now we have three groups, and we can refer to them with backreferences: I can write \1 here, \2 here and \3 here. As you can see, the first line matches and the second one doesn't. In this example it was straightforward to use numbered groups, but imagine it was a bit more involved. Maybe you don't want to use numbers because you have too many groups, and you want to use names. Then you can add named groups. So I write ?<day>, ?<month>, ?<year>. Here I want to have the day, this one should be the month, and this should be the year. Now I achieve the same thing, but this time I use named groups. This way your regex is a bit less cluttered and more easily read. Maybe; maybe not. I'd use named groups only in rare circumstances, because you don't want to make your regexes too complicated, but when a regex gets complicated, this can help you achieve more clarity. So that's about backreferences: essentially, something which helps you reuse the result of a group.

A challenge for you is to understand what this regex and this one match. I haven't explained them; you actually have the results here. I want you to read them and understand why this one matches these results and this one matches something else. So it's a bit of a decryption task for you. Okay, cool, that's about backreferences. I'll see you in the next video.

So now we know how to work with regexes: we know what they are, and we know how to use them on some sites. Let's now see how to use them in our computer programs.
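The numbered and named backreferences from the last two videos can be tried out in C# roughly as follows. This is a sketch under assumptions: the class name `BackreferenceDemo`, the three pattern constants and the exact wording of the sample lines are made up to mirror the spoken examples, not copied from the screen.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \1 must repeat whatever the first group captured.
    public const string SameWord = @"^(\w+) is \1 and that's it$";

    // Numbered groups: \1, \2 and \3 repeat the day, month and year.
    public const string Numbered =
        @"^(\d+)-(\d+)-(\d+): the day is \1, the month is \2, the year is \3$";

    // The same check with named groups and \k<name> backreferences.
    public const string Named =
        @"^(?<day>\d+)-(?<month>\d+)-(?<year>\d+): " +
        @"the day is \k<day>, the month is \k<month>, the year is \k<year>$";

    public static bool IsConsistent(string line, string pattern)
    {
        return Regex.IsMatch(line, pattern);
    }

    public static void Main()
    {
        // Hypothetical sample lines modelled on the spoken example.
        string good = "20-01-2018: the day is 20, the month is 01, the year is 2018";
        string bad = "20-01-2018: the day is 21, the month is 01, the year is 1995";

        Console.WriteLine(IsConsistent("Peter is Peter and that's it", SameWord));
        Console.WriteLine(IsConsistent("Peter is George and that's it", SameWord));
        Console.WriteLine(IsConsistent(good, Numbered));
        Console.WriteLine(IsConsistent(bad, Numbered));
        Console.WriteLine(IsConsistent(good, Named));
        Console.WriteLine(IsConsistent(bad, Named));
    }
}
```

Both date patterns accept and reject the same lines; the named version only changes how the groups are referred to.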
For that purpose, we'll explore the regular expressions API, part one. First we start with the class which allows us to use regex in C#. The class is called Regex, and it's part of the System.Text.RegularExpressions namespace. Regex pattern is equal to new Regex, and here I specify the pattern. First I have to include the namespace: I press Alt+Enter and it suggests this, which is correct. Now you have the namespace included and you can start using the class. The pattern I will use is this one here. Let's say I have some text which I want to check against the pattern; here's my text. First of all, notice how I have these errors here. The errors occur because C# thinks this is trying to escape the k with an escaping character, and it says there is no special character k whatsoever, so it throws an error. For regex you have to double-escape: you have to escape the backslash in order for it to work. So you add an additional backslash, like this, everywhere a backslash occurs, and I think that's enough. Okay. Now you can use the IsMatch method. You just write: if pattern.IsMatch(someText), then say "the text matches"; else say "the text doesn't match". After I run this program, I see that the text matches. If I change this to "George", it no longer matches. So that's how you can use the knowledge you just learned in your computer programs. One suggestion: when you're using regex in some task, I would advise you to always start from a site like regex101.com or regexr.com. The reason is that there you have immediate feedback and you can see if anything is not okay. For example, if I accidentally put a whitespace here, I immediately see that I don't match my text. But if you do things in C# and accidentally put a whitespace here, the text doesn't match although it looks
okay, so now I have to debug, which is not okay. That's why my suggestion is: always start from this site and then go to the source code. So that's the IsMatch method; it basically checks whether a text matches your pattern.

Another thing you can use is the Match method. The Match method basically returns something like the substring which got matched. For example, I want to extract a full name out of a text. First let me remove the patterns. I have "Hello, my name is Preslav Mihaylov, and that's it", and I want to extract this full name. Let's first copy our text and put it here. I want to match a full name, where a full name starts with an uppercase character and has lowercase characters (I use the plus quantifier to match them), and then this repeats. That's how I extract the full name out of this string. Now I copy this pattern, I put it inside my Regex, and I can say: Match m is equal to pattern.Match(text). This gives me access to the substring which got found. For example, I can write m.Groups[0]. It's a bit strange, but anyway, this returns the substring which got matched: "Preslav Mihaylov". That way, if you have a large text, you can extract, let's say, all the full names. Well, for now we only know how to extract one full name, but assume you learn how to extract many names, or whatever you need, and work with them. So regex is very useful for data manipulations like that. That's the Match method. Now try these two methods out, IsMatch and Match, and I'll see you in the next video.

Now let's explore some additional methods of the Regex API. The first one is looking for all matches. That is, what if you don't want to extract just one full name out of a string, but all the full names? For example, here I have "Hello, my name is Preslav Mihaylov, this is my friend Billie Jean and my other friend Laura Evans". So you want to extract these names.
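The IsMatch and Match calls described above might be put together as in the following sketch. The class and method names (`MatchDemo`, `ContainsFullName`, `FirstFullName`) and the sample sentence are illustrative assumptions; `Regex`, `IsMatch`, `Match` and `Groups` are the standard System.Text.RegularExpressions members. The first line of `Main` also shows that doubling every backslash and using a verbatim string (`@"..."`) produce the same pattern text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Two capitalized words separated by a single space.
    private static readonly Regex FullName = new Regex(@"[A-Z][a-z]+ [A-Z][a-z]+");

    // True when the pattern is found anywhere in the text.
    public static bool ContainsFullName(string text)
    {
        return FullName.IsMatch(text);
    }

    // Returns the first full name, or an empty string when none is found.
    public static string FirstFullName(string text)
    {
        Match m = FullName.Match(text);
        return m.Success ? m.Groups[0].Value : string.Empty;
    }

    public static void Main()
    {
        // Double-escaped and verbatim forms are the same string.
        Console.WriteLine("(\\w+) is \\1" == @"(\w+) is \1");

        // Hypothetical sample text.
        string text = "Hello, my name is Preslav Mihaylov, and that's it";
        if (ContainsFullName(text))
        {
            Console.WriteLine("The text matches: " + FirstFullName(text));
        }
        else
        {
            Console.WriteLine("The text doesn't match");
        }
    }
}
```

Note that `IsMatch` succeeds if the pattern occurs anywhere in the input; to require that the whole text matches, the pattern would need the `^` and `$` anchors covered earlier.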
Now, if I just have this text here like this, paste it, and just find a match, I'll find the first full name. But what if I want to iterate through all the full names and extract them all? In this case you can use not Match but Matches. It returns a MatchCollection, which can be iterated. You can say: foreach Match m in matches (this should be renamed), and now I want to write match.Groups[0]. So now I get all the full names by iterating the match collection with foreach. Apart from that, you have some additional info: let's say, "total names found" is matches.Count, so you can see how many findings you have in total, and things like that. Okay. One thing here is that when you have groups, you can access them by using this indexer, something I didn't show you. Let's take this example: you want to match digits, a dash, digits, a dash, digits, and you want to add groups, so "the day is" \1, "the month is" \2... we already saw this; I'm just repeating it. Okay, this is my pattern, and it has some groups in it. I put it in as my pattern. Well, now I have to escape every backslash there is here. Okay, cool, that's my pattern. Now I just need a text; this is the text. And now I want to find a match: Match m is equal to pattern.Match(text). If I write m.Groups[0], I get everything I matched, the whole string. But if I ask for specific groups, like groups 1, 2 and 3, what I get is the captured groups. For example, Groups[1] gives me 20, and this was the first group; Groups[2] gives me 1; Groups[3] gives me 2018; et cetera. This way I can extract some specific groups out of a text. For example, what if you have a text with a date in it, and you want to extract the day from that date? First you have to match a text with a date, which you can do with your knowledge so far; but if you also add a group around the day, you can easily extract it later. So that's the Match API, and Matches, which basically gives you a match collection with
which you can iterate. You can also replace. You can say "I have some text and I want to replace something". For example, let's say you want to hide all the names in a text. I want to match a full name with my notorious regex, which is plain and simple, like this. Now I want to censor this string. I can say: here's my regex (nothing needs to be changed), here is my text, and now I can say pattern.Replace(text...). Wait, yes, I should pass the text and the replacement, which is five stars, or maybe like this, since it's a full name. This returns a new string. As you know, strings are immutable; you can't modify them in place. That's why I need a result, like this. Now I can write the result. Notice how I just had a text and I have hidden all the names in it. An easy way to achieve that is using regex, as you can see, and the Replace functionality.

And the final thing we'll see in this class is splitting. As you know, when you have some text, normally you can split it by specifying some characters, but sometimes it can be way easier to just use a regex. Let's say you have a text which has the numbers one, two, three, four, five, and they have an unspecified length of whitespace between them. One way you can split it is by using the method we already know. Another way is to use regex. You can say: I want to match all the whitespace characters; I want to match a whitespace once or more. So this regex will match this part, this part, and all the others. Now I can say pattern.Split(text). This returns a string array. Okay, and now I can iterate through it: foreach string in tokens, write the token. Once I do that (well, first I have to escape the backslash), I get all the numbers. That way I can take the string and split it no matter how much whitespace it has. This can also be achieved with the normal Split utility we already know from the previous
lecture, but you can also do it with regex. And there might be some other, more convoluted example in which you have multiple separator characters. For example, let's say you split by whitespace, but you also have some special characters and things like that, and you still want to extract the numbers. How do you do it? I'll just use this here: first you need a character class; you specify all the special symbols, like this, and match a whitespace as well. Then you match any of those multiple times, like this, and you split by them. Once you do that, you still get all the numbers. So that's an example of an input which is a bit harder to parse using the normal utilities, but with regex it's relatively easy.

Okay, cool, so that's about splitting, and that's about the regular expressions API. We learned IsMatch, Match, Matches, Split and Replace. So let's recap. IsMatch returns a boolean telling whether the text matches the pattern or not. Match returns exactly one match, which is the first place in which your pattern hits something that matches it. Matches returns a collection of matches which can be iterated; that way you can, for example, extract all the full names out of a string, not only the first one as in the Match case. Then we saw Replace, which allows you to replace a pattern you encounter in a string with whatever text you specify. And then you have Split, the final thing we covered, which allows you to split a string using regex, not using an enumeration of characters. So that's about the regular expressions API. Play a bit with it; you have the opportunity for that in the exercises. I'll see you in the next video.
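As a recap in code, the remaining methods (Matches, group indexing, Replace and Split) could be exercised as in this sketch. The helper names, the separator set `[\s,;!]+`, the replacement text and the sample strings are assumptions chosen for illustration; the recording does not show the exact special characters or the exact replacement used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexApiRecap
{
    private static readonly Regex FullName = new Regex(@"[A-Z][a-z]+ [A-Z][a-z]+");
    private static readonly Regex Date = new Regex(@"(\d+)-(\d+)-(\d+)");

    // Matches: collect every full name, not only the first one.
    public static List<string> AllFullNames(string text)
    {
        var names = new List<string>();
        MatchCollection matches = FullName.Matches(text);
        foreach (Match m in matches)
        {
            names.Add(m.Groups[0].Value);
        }
        return names;
    }

    // Groups[1] is the first capture group (the day); Groups[0] is the whole match.
    public static string DayOf(string text)
    {
        Match m = Date.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Replace returns a new string; the original is left untouched.
    public static string Censor(string text)
    {
        return FullName.Replace(text, "***** *****");
    }

    // Split on one or more whitespace or assumed separator characters.
    public static string[] SplitNumbers(string text)
    {
        return Regex.Split(text, @"[\s,;!]+");
    }

    public static void Main()
    {
        string text = "Hello, my name is Preslav Mihaylov, this is my friend " +
                      "Billie Jean and my other friend Laura Evans";

        List<string> names = AllFullNames(text);
        foreach (string name in names)
        {
            Console.WriteLine(name);
        }
        Console.WriteLine("Total names found: " + names.Count);

        Console.WriteLine(DayOf("Today is 20-01-2018"));
        Console.WriteLine(Censor(text));

        foreach (string token in SplitNumbers("1   2\t3,;4 ! 5"))
        {
            Console.WriteLine(token);
        }
    }
}
```

If the input starts or ends with a separator, `Regex.Split` returns an empty string at that end of the array, so real inputs may need trimming or filtering first.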
Info
Channel: Svetlin Nakov
Views: 5,744
Id: DS9IO0W7-0Q
Length: 105min 14sec (6314 seconds)
Published: Thu Mar 19 2020