Python Regular Expressions - Part #4 - Character Sets - Custom Character Sets

Captions
All right, what's going on, guys? In this video we'll be going over more types of character sets. In the previous video we went over \w and \W, which mainly target alphanumeric characters. In this video we're going to look at other types of character sets.

The first character set I want to look at is \d. \d matches digits, so anything from 0 to 9 will match. Uppercase \D is the complement of lowercase \d, so it matches anything that \d doesn't match. Let's look at an example. We have a string with two digits, a bunch of letters, and then two plus symbols. We're going to run the character set \d with the plus quantifier and see what we pull out. Remember, it matches digits, so as expected it pulls out the two digits in the string. That's pretty self-explanatory.

Next, \s represents any whitespace character, and uppercase \S is the complement of \s, so it represents any non-whitespace character. Whitespace characters are newlines, tabs, spaces, and so on, so when you see \n or \t, those are whitespace characters. Non-whitespace characters are everything else; they're the complement of \s. Let's take another look. Here we have a string with no whitespace: two digits, a bunch of letters, and two plus characters. \S matches anything that's not whitespace, and since there's no whitespace in this string, running it should grab the full string. We run it, and it grabs the full string.

Now for another example. I have a pretty lengthy string, an article, and we're going to use re.findall with \S+. It finds every run of characters that isn't whitespace, so it ignores all the spaces and pulls out the words, essentially giving us all of the text without its formatting. The output is pretty lengthy: "Robots are branching out", "University of California", "3D structures", "72 meters", "10 meters". It's pulling out everything, as it should based on the regular expression. One thing we can do is take the list of words we pulled out with \S+ and join it back together using spaces, and we get the article back, with single spaces in place of the original whitespace. So that's \S: it pulls out all the words without the spaces, and we can join everything back together to rebuild the article.

The next thing I want to show you is the dot. The dot matches any character except the newline. There's actually a flag you can use to make the dot match the newline too, but we'll go over flags in another video. So we have our string, and we're going to try the dot with the plus quantifier: the dot is the character set, and the plus is the quantifier. When we run it, we get the opening of the article, and it stops at "to explore its environment". If you look at the original string, that's where the first line ends: there's a newline, and the next line starts with "Vines and fungi extend from their tips". So we grab everything we can before we hit the newline.

Actually, I'll give you an example of the flag now. We copy the same code, and this time we pass the third parameter, flags, set to re.DOTALL. re.DOTALL means the dot also matches the newline, so this time we should pull out the entire article, and we do, newlines included. You might think you could get rid of those newlines by calling strip() on the string, but strip() only removes characters from the beginning and end, and these newlines are in the middle, so a simple strip won't get them out. To sum up: by default the dot is a character set that represents all characters except the newline, and with re.DOTALL it includes the newline as well.

Those are some of the most common character sets. The next thing I want to go into is creating your own character sets, so let's dive right in. To create your own character set, you use square brackets. Square brackets are another metacharacter in regular expressions, and anything included within them is part of the custom character set you're creating. The hyphen is also a special metacharacter here. In [A-Z], the hyphen is not a character included in the set, so the set is not "A, hyphen, Z"; it's A to Z, meaning anything from A to Z is part of our custom set. If you actually want a literal hyphen, you can escape it with a backslash, \-, so that it's treated as a regular string character rather than the metacharacter it normally is inside a set.

So [A-Z] represents all uppercase letters. Let's implement this custom character set and see how it works. We have a "hello there how are you" string; it has several capital letters, and each word is separated by a comma. Because we're using re.findall, this character set pulls out every capital letter it finds, and when we run it, it pulls out all the capital letters.

Now let's look at another custom set: everything from A to Z plus a comma. Be careful with this comma. It looks like we're separating items in a list, but it's not a metacharacter; it's a literal comma, so the set will search for commas. Our character set is any capital letter from A to Z plus a comma, so in addition to the capital letters we should pull out the commas. When we run it, we get H, a comma, T, a comma, H, and so on, so we're pulling out the commas as well.

Now we have a new string that ends with three dots, and a custom set that also includes a dot. Within a custom set, the dot doesn't represent the dot we're used to. Normally the dot in a regular expression is a character set, as we explored a little earlier, but inside square brackets it's just a period. So be careful: inside square brackets, a lot of these metacharacters lose their special meaning. This set should pull out everything from A to Z, all the commas, and all the dots, and when we run it, we get everything we pulled out before plus the three dots.

For the next string, I've also included \s in the set, which matches whitespace. That works because shorthand character sets like \s, \d, and \w keep their meaning inside square brackets, even though the dot becomes a literal period. This is a pretty big custom set: we're pulling out all the capital letters, all the lowercase letters, any commas, any whitespace, and any periods, so essentially the whole string. If we run this, we get "hello there how are you", essentially the whole string.

So that was a brief introduction to custom sets. This video is a little longer than I expected, so we'll continue in the next video with quantifiers, using quantifiers with custom sets. I'll see you guys in the next video.
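Here is a minimal Python sketch of the \d and \D examples, using the standard re module. The sample string is an assumption standing in for the one in the video (two digits, some letters, two plus signs); the exact on-screen string isn't reproduced in the transcript.

```python
import re

# Hypothetical sample: two digits, some letters, and two plus signs.
text = "42abcdef++"

# \d matches a single digit 0-9; the + quantifier extends it to a run of digits.
digits = re.findall(r"\d+", text)
print(digits)  # ['42']

# \D is the complement: one or more characters that are NOT digits.
non_digits = re.findall(r"\D+", text)
print(non_digits)  # ['abcdef++']
```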
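The \s and \S examples can be sketched the same way. The article text below is a hypothetical stand-in; the point is that re.findall with \S+ returns the words, and " ".join() puts them back together with single spaces, so the original newlines, tabs, and repeated spaces are not preserved.

```python
import re

# Hypothetical stand-in for the article (the video's text isn't reproduced here).
article = "Robots are branching out.\nA new   prototype\ttakes inspiration from plants."

# With no whitespace at all, \S+ grabs the entire string as one match.
no_spaces = re.findall(r"\S+", "42abcdef++")
print(no_spaces)  # ['42abcdef++']

# \S+ matches runs of non-whitespace characters, i.e. the "words"
# (punctuation attached to a word stays with it).
words = re.findall(r"\S+", article)
print(words)

# Joining with single spaces rebuilds the text, but every run of whitespace
# (the newline, the tab, the repeated spaces) collapses to one space.
rebuilt = " ".join(words)
print(rebuilt)  # Robots are branching out. A new prototype takes inspiration from plants.

# \s on its own matches one whitespace character at a time.
spaces = re.findall(r"\s", "a b\tc\nd")
print(spaces)  # [' ', '\t', '\n']
```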
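A minimal sketch of the dot and the re.DOTALL flag. It assumes re.search as the matching function (the transcript doesn't say which re function was used on screen) and a made-up two-line string in place of the article. In re.search(pattern, string, flags=0), flags is the third parameter.

```python
import re

# Hypothetical two-line string standing in for the article.
text = "First line of the article.\nSecond line of the article."

# By default . matches any character except a newline,
# so .+ stops at the end of the first line.
first = re.search(r".+", text).group()
print(first)  # First line of the article.

# flags is the third parameter; re.DOTALL lets . match newlines too,
# so .+ now runs to the end of the whole string.
everything = re.search(r".+", text, flags=re.DOTALL).group()
print(everything == text)  # True

# str.strip() only trims the ends, so the newline in the middle survives.
print("\n" in everything.strip())  # True

# One way to remove inner newlines (an addition, not shown in the video).
flat = everything.replace("\n", " ")
print(flat)  # First line of the article. Second line of the article.
```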
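Finally, a sketch of the custom sets from the second half of the video. The sample strings are assumptions modeled on the description (comma-separated, capitalized words in a "hello there how are you" string), not the exact strings used on screen.

```python
import re

# Hypothetical sample: capitalized words separated by commas.
text = "Hello,There,How,Are,You"

# [A-Z]: the hyphen builds a range, so the set is every uppercase letter A to Z.
caps = re.findall(r"[A-Z]", text)
print(caps)  # ['H', 'T', 'H', 'A', 'Y']

# Inside the brackets a comma is a literal comma, not a list separator.
caps_commas = re.findall(r"[A-Z,]", text)
print(caps_commas)  # ['H', ',', 'T', ',', 'H', ',', 'A', ',', 'Y']

# Inside the brackets a dot is a literal period, not "any character".
caps_commas_dots = re.findall(r"[A-Z,.]", text + "...")
print(caps_commas_dots[-3:])  # ['.', '.', '.']

# Shorthand classes such as \s keep their meaning inside a set,
# so this set covers letters, commas, periods and whitespace.
sentence = "Hello, there, how are you..."
whole = re.findall(r"[A-Za-z,.\s]+", sentence)
print(whole)  # ['Hello, there, how are you...']

# To put a literal hyphen in a set, escape it as \-.
hyphenated = re.findall(r"[A-Z\-]+", "A-B C-D")
print(hyphenated)  # ['A-B', 'C-D']
```

Placing the hyphen first or last in the set, as in [A-Z-], also makes it literal, but escaping it is the more explicit choice.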
Info
Channel: PyMoondra
Views: 11,607
Rating: 4.9576721 out of 5
Keywords: Python Beautiful Soup, Python Requests, Python Hacking, Python Webscraping 2017, Python webscraping tutorials, Python Webscraping for beginners, Advanced Python Webscraping, Python Webscraping, Scraping gamefaqs, Scraping videogames sites, Intermediate Python, Advanced Python, Python Webscraping 2018, Python regular expressions Intro to Python regular expressions, Gentle Intro Python regular expressions
Id: IqF_XGrFbC8
Length: 12min 13sec (733 seconds)
Published: Mon Jul 24 2017