WHAT Is "Glob" In Python?! (It's Actually Very Useful!)

Video Statistics and Information

Captions
In this lesson we're going to be talking about the glob module, and that's quite a mouthful. Glob is just such a weird word; no matter how many times you say it, it doesn't get any better. Glob, glob, glob. As you can hear, it's not getting any better, but let's discuss it, because it's a very powerful module in Python.

The glob module finds all the path names matching a specified pattern according to the rules used by the Unix shell, and the results are returned in an arbitrary order. So we kind of have this file-finding feature, and I'm going to be showing you how it works, of course, rather than just reading the documentation. But it's important to note that the pattern rules for globs are not regular expressions; instead, they follow standard Unix path expansion rules, and you'll know what this means as soon as we start writing these regex-like expressions. And finally, for the more advanced programmers: shell variable names and the tilde are not expanded when you are working with glob.

So that was just a bit of dry text to really kick off this tutorial. I know how much you guys love dry text, so I thought that would be a great start, but let's jump into actually using the glob module.

To get globs to actually work, you need to have some files somewhere on your computer that you want to target. In this example I created three JavaScript files: ananas.js, apple.js and banana.js. But I have a lot of JavaScript files inside this project, so we will be finding those as well using the glob module.

Now the first thing we're going to do is try using glob so you can get a general understanding of how it works. Here we have glob.glob, and with this function we can specify some sort of pattern for the file that we're trying to find. So here we're going to type in, for example, apple.js, and if we print this now, this won't be the most useful function in the world, because we literally typed in apple.js and it's going to look for a file in our current folder that
matches apple.js and return it inside a list. If there's no file that matches this pattern, it's just going to return an empty list.

Where this becomes powerful is when we start using these Unix path expansion rules. For example, instead of typing apple, we can pretend we don't know what the first two letters are, and we can use the question mark, kind of like a wildcard for a single character, in place of each one. So if we run that, we'll get apple.js. Now, we can both agree that banana and ananas have the exact same number of letters, so one cool thing you can do, of course, is type in, let's say, six question marks, and with this we will find all files whose names are exactly six characters followed by .js. So if we run that, we'll get ananas.js and banana.js back. So the question mark just matches any single character, and I'm going to be pasting in these comments so you can remember these as we move on with this program. But to keep it simple I'll just type in ?pple.js, and if we run that, of course we get apple.js back in the list.

Now, something that you might find even more useful is the absolute wildcard. We can just copy and paste that, and instead of typing in apple.js we can type in *.js, and this will absorb all of the content before .js. So it will match that, and then if the name ends with .js, it's going to put it inside our list. If we run this, we will get apple.js, ananas.js and banana.js all back from this glob, because it matched everything before the .js. You can also do it with .py and other extensions; whatever you want to match, you can use it inside here. If we run that, we will get main.py back, because this matched all the characters before .py. And you can even be crazy and do *.*, so it will match everything in front of the dot, then the file name must contain a dot, and it will match everything after the dot. So if we run that, we will get all of the files back, but really we just want to get all of
the JavaScript files in this folder, so we will type in *.js. Now I will put this down so we have some more space.

Let's copy and paste this down here, because we have something else to cover, and that is the square brackets. The square brackets allow us to insert some characters that we want to use to match a certain character from a file name. So instead of choosing to match any character, we can use these square brackets to match only specified characters. If we put a, v and d, it will only match these characters for the first character that we're searching for. So if we put [avd], the only things we're going to get back for the first character are ananas and apple, and we still need to define what the rest of it is going to look like, so we can put an asterisk there. The first character must be one of these, and then the rest can be whatever it wants, followed by .js. If we run that, we'll get apple and ananas back, because a was inside the selection, but v and d were not.

But if we put something such as b and a, it's going to allow both b and a to be matched. So if we run that, we will get everything back, because the first character in banana is a b, which is inside this square bracket, and the first character of apple is also inside the square bracket. So those are the first characters that must match. If we put something such as z and, let's say, f, it's not going to match anything, because none of these have z or f for the first character. So simply put, this matches any character in the sequence.

Now there's also the exact opposite. Let's just take this, paste it under, and add the negation symbol, the exclamation mark, here. So now this checks that these are not in this sequence before matching the file: if z and f are not located at the first character, it's going to match it. And we can just add that comment as well, so we can add: matches any character not in the sequence. To demonstrate this, we will add another file called zebra.js, and we'll go back to main, and now if we
run it, you'll see that zebra is not going to be inside this glob list; we will only have apple, ananas and banana. But in the one where we did do the matching, we will only get zebra back, because z is inside this sequence while the other characters are not.

So that was just the basic introduction to how we can use glob with its syntax, but now let's actually use it in some context that makes more sense, such as searching folders and inside folders. If we're only searching in our current folder, it maybe doesn't make too much sense if it's this small, because we can literally see all the files here. But maybe we want to look inside all of these folders to find out which ones are JavaScript files and which ones are Python files.

So we're going to just remove all of this. Now that we understand the syntax, we're going to type in print, glob.glob, and we're going to use something a bit more special, which is the double asterisk and then slash .js, and actually we need to add an asterisk here, so the pattern is **/*.js. What this is saying is that we want to search all of the folders recursively. So we're going to check in, let's say, scheduled, for example, open it up, check in shorts, for example, open it up, and if there's a JavaScript file inside here, it's going to match it here. So this will just keep opening up the folders, however deep they go.

Now we also want to specify a root directory, so here we can type in root_dir, and for the root directory I'm just going to copy the absolute path of the root directory for this folder, but it can be any root directory from your computer, so we will just use this one here. Then we'll use a comma once again, and we need to set recursive to True so it continuously looks inside the folders. And in Python 3.11 they introduced the include_hidden parameter, which allows us to check for hidden files, which I'm just going to set to True. Now if we run this program, we're going to get all the JavaScript files back from my Python folder, so
here, as you can see, it actually recursively checked inside scheduled. It opened up scheduled, it looked inside all the shorts, there was nothing there, then it opened up videos, it realized that there was another section called globs, and I had some JavaScript files in there, so it was able to find them. So here we have, of course, test.js, hello.js and index.js. And you can still use the rules here, so you can also say something such as: we only want files that begin with h or with i. If we run that, we're going to get fewer files back, namely hello.js and index.js, because those were the rules that we defined. But I like it better with that asterisk, so I will leave it there.

One thing to mention about using glob.glob is that this can become really slow, especially if you have lots of files, because it's going to collect all the matching paths and put them into a list all at once. So you're going to have a lot of data on your hands if you just run this directly and store it directly. A way to avoid loading all of these paths into memory immediately is by creating a generator from this. To do that, we're going to remove the print statement, and here we'll type in globs equals, and at the moment it's not doing anything, of course. So what we have to do here is say glob.iglob, for iterator glob, and that will turn it into a generator, so we can access that data in a more memory-efficient way, but everything else is exactly the same.

Now we can access this one element at a time. For example, we can type in print, globs.__next__(), and if we run that, we will get the first element back. We can call next two more times and we'll get each one on demand. Now if you want to ignore that and print all of them anyway, you can type in for i, file in enumerate, and here I will type in globs, starting at one, and we'll print i followed by the file, with a separator of a colon and a space. So if we run that, we
can see how many files we have that are JavaScript files in our main folder, which is my Pi December folder. Now again, this works for other files as well, so you can define whatever rules you want, and if you change this to .py, it's going to be much bigger. As you can see right here, we have a lot of .py files inside our program, and this can easily go up into the tens of thousands depending on what files you're looking for on your computer. So it might be smarter to create a generator instead of just loading all of those paths into memory immediately.

Anyway, do let me know what you think about glob, or whether you already use this in your code or not. But with that being said, as always, thanks for watching, and I'll see you in the next video.
Info
Channel: Indently
Views: 22,741
Keywords: pyton, pyhton, pythn
Id: tATFQUx0Zx0
Length: 11min 14sec (674 seconds)
Published: Fri Dec 09 2022