Python Get Files from SharePoint and Load to Azure Storage

Captions
What up, everybody, what up, what up. I am Lu. Today we're going to be talking about how to download files from SharePoint, but we're going to upload them to an Azure Storage blob. So I got a request. I made a video before related to downloading from SharePoint and uploading files to an S3 bucket, and I had a viewer ask me, hey, can you make a similar video but have it get uploaded to an Azure Storage blob? So I told them, yeah, I'll put a video together. Ultimately what we're going to end up doing is I'm going to clone that project, the SharePoint-to-S3 one, make some small tweaks, and just direct it to a storage blob. But before we get started, guys, give me a like, give me a follow. I appreciate it, man, I really do.

Hey, the sponsor of today's video is Go Share It. Go check them out at go share it dot io. It's a sharing platform that allows you to share large and small files, and it makes sharing very easy: all your recipient needs is a download link and they're good to go.

All right guys, let's get started. So I have an empty project right now. I just created a new folder and called it python sharepoint to azure blob. Again, I don't have any files. I did go through the process already: I created a virtual environment and installed the two packages that we will need. Let me go ahead and bring them up so you can get those installed as well. We're going to need the Python azure-storage-blob package, so make sure to install it, and you're also going to need SharePlum, so make sure to install that package as well. Again, install those two packages because that's what we're going to be using.

Now, because I did mention we're going to take source code from the previous project that I made, which is the SharePoint to S3, let me go to that repo, which is over here. You can find it under my GitHub account, which is python download... sure enough, that's not it. Which one is it? It's
going to be... oh, there it goes: SharePoint files to S3. So we're going to take some of the same files we have here. We're going to take our configuration; let me go ahead and copy this, and then we create a new file in here and call it config.json. We're going to make some tweaks here, since we're not dealing with AWS, but nevertheless this is our starting point. The next thing I'm going to copy from here is our SharePoint file, same thing, called sharepoint.py. Let's copy that. And then the third thing is going to be our project file. So here's the thing: we're going to be making changes to these files, and we'll walk through what those changes consist of, but we're pretty much starting from there. That's our starting point [Music], and then we'll make the changes that we need. So again, make sure to install those two packages that we talked about, azure-storage-blob and SharePlum.

Once you get those two installed, let's start off with our configuration file. That's going to be the first thing we tweak. So this part could stay the same, because we're still connecting to SharePoint; there's nothing different there. What we do want to change (let me make it bigger so it's easy for you to see) is this guy. This configuration was specific to AWS, and now we're going to change it so it's specific to Azure Storage. So we're going to change this to azure storage. Next, our first key is going to be azure account name, the second is going to be azure access key, and the third one is going to be container name. So those are our three: account name, access key, and container name. So now that we kind of
have this configuration in place, if you're not familiar with where to go get this information, let's walk through that real quick. Let me go to my Azure portal [Music]. When you're logged into Azure, go to Storage accounts. In my case I have three accounts right now. One of them is called sharepoint test i am lu, so I'm going to click on this storage account. If you scroll down to where it says Security + networking, there's a section for Access keys, so go ahead and click on that. This is where you will get all your information. See, we have the account name here, so I'm going to copy this, and this would be my account name. I'm not going to show it, but if you click on show key and connection string, the key becomes visible and there will be a copy button. Go ahead and copy that key; that key value would go here.

And then of course we have our container name. If we scroll up, see, under Data storage we have Containers. In this case I have five different containers, and I'm going to put my files into this staging container, so the name of my container is going to be staging. Now, if you happen to have folders inside that container, let's say 2022 and then january, something of that sort, then just type it out the way I did: type a forward slash and every folder in the path until you get to the destination where you want your files to drop. But in my case I'm just dropping into my staging container, and my blobs will get created in there. So that's all you've got to do. Again, pretty straightforward to get set up, easier than AWS in my opinion.

So once you've got all of that set up, up here for your SharePoint, the setup is similar to the way we did it in the other video. If you haven't seen
the other video: ultimately this is only for Office 365. If you're using SharePoint on-prem, this is not the configuration that would work. Either way, it would be your email account, then of course the password that you use to log in to Office 365, and then you have your url and site. I think this is where some people get a little confused, so let me show you my SharePoint account so you get an idea of what it is. Right now if we look at the URL, see how we have the subdomain, which is i am lu, then dot sharepoint.com. This section of the URL is the url piece that we're going to use. Now, when it comes to site, see where right after that you have sites, then the name, then a forward slash. It's going to be whatever the name is; in my case it's called development (the name shows here as well), but in your case it will be whatever that name is. So again it would be the subdomain, the domain, which is sharepoint.com, sites, forward slash, then whatever your site name is. You copy that, paste it in here, and add a forward slash at the end as well. So that's that setup. Again, I think some people have gotten confused on how to configure that.

The other piece I think some people get confused by is this guy here, Shared Documents. So this doc library is pretty much this section here. As you can tell, I'm not providing anything inside my doc library because I want that to be dynamic. If you did not want it to be dynamic at all, you could specify the path. If I go back to SharePoint, I could specify data/2022, where I have my files; I could do that here as well. But I don't like doing it that way. The reason why is I like to specify the root section of my library, which is Shared
Documents. Think of that as the main section for all of your folders in SharePoint, and from there you could have many folders. In my case I only have one, but you could have ten folders in here. That's where I like it to be more dynamic: you specify which folder you want to access not through the configuration but through our Python code. You specify, I want to access this folder and this subfolder path, and that way you can reuse the code. So leave it as is. Do not change it. When we run the project, you can specify the starting point of your files, and I'll show you in a minute. Hopefully this makes sense on the configuration. I'll put my key in in a minute; I just wanted to walk you through that process. Let me move this to the side because I'm going to need it in a few.

All right, cool. So let's go to our sharepoint file. We're not changing anything here; we're going to leave it as is. There is one thing we will change, but it's more of a case of I should have done it this way to begin with. We're going to bring in, from pathlib, import PurePath. What this does is give you a better way to join directory paths and file names together. The way I did it right here only works on Windows; it doesn't work on Mac or Linux. To make it work on all operating systems, I should have used PurePath to begin with, and that's my mistake. What it does is build your path based on the operating system that you're using, so if you're using Mac or Linux it builds it in the Mac/Linux format
which is forward slashes, as opposed to backslashes on Windows. So what I'm going to do here, just to compare the two: it'll be config path equals PurePath, then root directory, then in this case config.json. That's it; that's the replacement here. These two do the same thing. The only difference is PurePath does it in the format that works for your operating system, whether it's Mac, Linux, or Windows. Now that I have this, I can get the old one out of the way. That's the only change we're making here.

Now we're going to go to our project file. There's going to be a good amount of changes in this file. For one, we don't need this import; it was strictly for AWS, and since we're not dealing with AWS anymore, that's gone. For Azure it is from azure.storage.blob import BlobClient. Again, this is all part of the azure-storage-blob package that we installed earlier [Music].

There are a few other changes that have nothing to do with AWS or Azure; they have more to do with the way it was working. Before, it downloaded from SharePoint, saved the file locally on your hard drive, then read the file again and uploaded it to S3. Well, there's no need to download, save, and then re-upload. As we download, we have that object in memory, so we're going to take that object and send it straight to the blob. Because of that, we don't need this local folder destination. This argument was, where do you want to save the file that you download from SharePoint? We don't want to save them because we're going to keep them in
memory, so there's no need to save them anywhere on your hard drive. I'm going to remove that, which means we're only going to have three arguments total, and I need to change these as well, the index, from two, three. Okay. Same thing here with the configuration: we need to bring in from pathlib import PurePath, and then we're going to replace this with PurePath. It's not a list format, so I need to take that away. There you go, we have our directory path and our file. Boom, that's good.

Also, see how when we're reading this configuration we're reading the AWS bucket? That's not what we're doing anymore. Now we're reading azure storage; that's the key. So I'm going to replace it with azure storage, and all of these will go away. I'm going to delete them and replace them with the Azure keys, which are azure account name, azure access key, and container name. So in this case it would be config container name, config azure access key, and config azure account name, because these are the keys over here. The values get assigned to these variables that we're going to use in the process.

Now that we have that in place, see how we have this upload to S3? We don't need that because we're not uploading to S3 anymore, so I'm going to delete this and replace it with a function for Azure Storage. But before we create that function, let me go through and clean up the other stuff. This was specific to AWS as well, the bucket subfolder build. We're not building subfolders because we don't have that need here, so this gets deleted as well. Same thing for save file. We're not saving the file anymore, because remember what was happening before in the
original source code: we were downloading from SharePoint, saving on the local machine, then re-reading the files and uploading them to AWS S3. There's no need for that. Since we're already downloading the files from SharePoint, we have that object in memory, so we could just take that object and pass it through to upload directly to S3. There's no need to even save. It's one of those deals where the more I thought about it and looked at the code, it doesn't make sense to do that.

So this is fine, the SharePoint download files. Again, we're not saving anything anymore, so we're going to remove this save action. We don't need to build a file directory path anymore, because that was only there to read the files locally, and now it's already in memory. And this file name was there to generate our file name when we were dealing with subfolders, which we don't have to do with Azure Blob, so this gets removed. On top of that, we're going to have a new function here instead of this guy, so this can go away, and we'll update this in a minute with the new function that we create. I'm just going through and cleaning up all the stuff that we don't need. All of this could stay the same.

So now we're going to create our new function. It's going to be called, what do we call it, upload file to blob, and it's going to take two arguments. The first is the file object, because again there's no need to read a file; we're just taking the object of the file. Then we need to bring in the file name. Inside, we're going to create a blob object, and this is where we're going to end
up calling our BlobClient, and then the method called from connection string. We're going to have a few arguments here. One is connection string, the other is container name, and the third, my bad, we've got four, so the third one is the blob name and the last one is credentials. So now let's start populating everything.

There is one thing I forgot that we need to do up here. See how we have our connection string? We need to build that connection string, so we're going to create another variable called azure connect string. If you're familiar with databases, where you specify the database, the username, the password, all of that in your connection string, this is very similar. Let me copy what I have already to speed up the process, and let me close this out so we can take a look at it, because it's a bit long. We have our default endpoints protocol, in this case https. Then we have our account name, which comes from the configuration file; the account key, same thing, coming from our configuration file; and our endpoint suffix is just core.windows.net. That's pretty much it for building our Azure connection string, and down here is where we end up passing that azure connect string. Under container name we pass the container name object. Then blob name: remember, the blob name is the same as your file name, so if you're downloading, let's say, five files, this has to be a variable; it has to be unique, and this is
why we're passing a file name in over here as an argument. This is going to be our file name. When we save it out to the blob, we want it saved under the same file name. If you don't want the same file name, or you want to modify it, then of course you could, for example, take the file name and add some sort of date-time aspect at the end. But in this case I'm just saving it with the same file name it has in SharePoint. Then for credentials, this is going to be the azure access key. So ultimately this generates a connection to our blob, specifying a file name.

Now what do we need to provide? What is it that we're going to save? We created the connection and we have a file name established, but now we need to pass some data to save to this blob. That's where we call the upload method [Music], which is blob dot upload blob, and in this case we pass file object. That's what we're passing in over here, and that's what we end up saving to this blob. And that's it; that's all we have to do here. It's pretty straightforward.

Now that we've created this new function for the upload piece to Azure, we need to add it in the get files section, because this is where we're getting the files and where we have our file object. So we're going to call our upload file to blob, pass the file object that we have there, and then pass in the file name as well, which we need because we're saving under the same file name. So now that we have that
established, let's see, I think that's it, man. I think we're good to go. Let me go back to SharePoint so we can see everything in action. I'm going to show you how to pass this as an argument, but I'm going to be calling the data/2022 folder, and I want to pull all of these files, download them, and upload them to my blob, my Azure Storage. That's the goal. You can tell I'm dealing with two different formats, CSV and Excel. And my path here does not include Documents. Think of Documents as the root path, which is part of the configuration. As I specified here with Shared Documents, think of this as the starting point, this Documents section here. After the root path for your SharePoint, you have your first folder, which is data, and then of course 2022. That's what I'm going to be passing in: data/2022 is one of my arguments, the folder that I want to pull data from to ultimately upload to Azure Storage.

Next, let me put in my credentials here, my email, password, and my Azure access key, and then we'll test it out and make sure that it works. Let me go ahead and do that real quick.

All right, cool, let me go back and share my screen. Now that I have my configuration completed with my credentials, I should be able to run this. What should happen is it downloads the files from SharePoint, doesn't save them locally, keeps them in memory, and uploads them directly to the blob. Let's go to our Azure Storage to make sure that it loads properly. Right now I'm under Storage accounts. This is my account, which is sharepoint test i am lu, and inside here I have staging, because that's what I specified in my configuration: I want my container name to
be staging. So I'm going to click on staging, and I have no data here. Refresh: no data. Now when I run it... let me clear this out and make this bigger, because it's pretty small. Again, activate your environment; I have my environment activated. Then it's going to be python, then the project. Remember we have three arguments; we got rid of the one we don't need anymore. First we have folder name. If we look at the description: SharePoint folder name, may include subfolders. The example is youtube/2022, but in my case it's called data/2022. That's my starting point. Again, as you can tell, it's not Documents; you don't need to specify Documents. It's going to be the first folder in here, which is data/2022.

Next I'm going to hit space. The next argument is the SharePoint file name, for when you're trying to download one specific file. Assuming you want to download one file only, you can just specify something like file.csv. But in my case I want to download all files, so I'm going to put None. The third argument is the file name pattern. Here's a good example: see, I have this pattern of customer february, customer january, but then this one says sample csv file. Let's say I wanted to download the customer files only. Then for argument three I could specify customer, and it's going to look for that pattern and only download files that contain customer in the file name. So keep in mind you can provide a pattern argument if you're trying to download specific files. In my case I want to download all files, so I'm going to specify None. Once I do that, I think we're good, man. Everything
seems good. Let's give it a test and see what happens. No such file, blah blah blah. What happened here? Oops, I forgot to specify project.py. That's my mistake, so let's go back and add project.py. Now let's run it again, cross fingers. Does it work, does it not work? No errors so far. Is it going to work first try? No errors, that's good. So now let's go [Music] to our storage. I'm going to hit refresh... boom, we have our four files in there. So again, we were able to connect to our SharePoint, download these four files, and upload them to a specific container in a specific storage account.

Again guys, hopefully this helps out. I got a request from somebody, and you can tell what we did: we took code that we already had for another project and made some small modifications to it. One of the big changes was the download step: saving locally, re-reading the file, then re-uploading. There's no need for that piece. When I reviewed the code earlier today, I was thinking, why the hell did I do it that way? It makes no sense. So of course I got rid of it, and it works: just keep it in memory and send it on over to the destination. So again, hopefully this helps. If you have any requests, let me know; you can send me an email at contact i am at imleu.net, and I'll try to make some more videos based on the requests I get. Again guys, give me a like, give me a follow, and peace.
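
The configuration section and connection string described in the captions could be sketched roughly as follows. The key names (azure_storage, azure_account_name, azure_access_key, container_name) are guesses based on how they are spoken in the video, and the account name and key are placeholders, so treat this as an illustration and not the author's actual file.

```python
import json

# Hypothetical config.json content. Key spellings are assumed from the
# spoken description; the values are placeholders, not real credentials.
SAMPLE_CONFIG = """
{
    "azure_storage": {
        "azure_account_name": "mystorageaccount",
        "azure_access_key": "PLACEHOLDER_KEY",
        "container_name": "staging"
    }
}
"""


def build_connection_string(account_name, access_key):
    """Assemble an Azure Storage connection string from its four parts."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={access_key};"
        "EndpointSuffix=core.windows.net"
    )


config = json.loads(SAMPLE_CONFIG)["azure_storage"]
azure_connect_string = build_connection_string(
    config["azure_account_name"], config["azure_access_key"]
)
container_name = config["container_name"]
print(azure_connect_string)
```

Building the string in one small function keeps the account name and key out of the rest of the code, which only needs the finished connection string and the container name.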
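
The PurePath change mentioned in the captions can be illustrated with a short sketch. ROOT_DIR here is a hypothetical stand-in for the project's root directory variable; the two explicit flavors are included only to show how the separator differs by operating system.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

ROOT_DIR = "project"  # hypothetical root directory name

# PurePath picks the flavor that matches the OS the script runs on.
config_path = PurePath(ROOT_DIR, "config.json")

# The explicit flavors show what each operating system would produce.
posix_path = PurePosixPath(ROOT_DIR, "config.json")
windows_path = PureWindowsPath(ROOT_DIR, "config.json")

print(config_path)
print(posix_path)    # project/config.json
print(windows_path)  # project\config.json
```

Because the parts are passed as separate arguments, no separator is hard-coded, which is the portability problem the captions describe with the original string concatenation.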
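
A minimal sketch of the in-memory upload function described in the captions might look like this. It assumes the azure-storage-blob package and uses BlobClient.from_connection_string and upload_blob. The client_factory parameter, the FakeBlobClient class, and overwrite=True are additions made here so the example can run without an Azure account; they are not from the video. The video also passes the access key as a credential, which this sketch leaves out on the assumption that the connection string already carries the account key.

```python
import io


def upload_file_to_blob(file_obj, file_name, conn_str, container_name,
                        client_factory=None):
    """Send an in-memory file object straight to a blob named file_name."""
    if client_factory is None:
        # Imported lazily so the dry run below works without the package.
        from azure.storage.blob import BlobClient
        client_factory = BlobClient.from_connection_string
    blob = client_factory(
        conn_str=conn_str,
        container_name=container_name,
        blob_name=file_name,
    )
    # overwrite=True is an assumption: without it, uploading a blob
    # name that already exists raises an error.
    blob.upload_blob(file_obj, overwrite=True)
    return blob


class FakeBlobClient:
    """Hypothetical stand-in that records what would have been uploaded."""

    def __init__(self, conn_str, container_name, blob_name):
        self.container_name = container_name
        self.blob_name = blob_name
        self.data = None

    def upload_blob(self, data, overwrite=False):
        self.data = data.read() if hasattr(data, "read") else data


# Dry run with placeholder values; nothing touches the disk or the network.
blob = upload_file_to_blob(
    io.BytesIO(b"id,name\n1,example\n"),
    "customer_january.csv",
    "placeholder-connection-string",
    "staging",
    client_factory=FakeBlobClient,
)
print(blob.container_name, blob.blob_name, len(blob.data))
```

Leaving client_factory unset makes the function use the real BlobClient, so the same code path serves both the dry run and an actual upload.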
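
The three command-line arguments described in the captions (folder name, file name, file name pattern, with None meaning "not set") could be handled along these lines. The function names, the sample file names, and the case-sensitive substring match are assumptions for illustration; the original project.py may do this differently.

```python
def parse_args(argv):
    """argv is [script, folder_name, file_name, file_name_pattern]."""
    folder_name = argv[1]
    # The literal text "None" on the command line means "not provided".
    file_name = None if argv[2] == "None" else argv[2]
    pattern = None if argv[3] == "None" else argv[3]
    return folder_name, file_name, pattern


def select_files(available, file_name=None, pattern=None):
    """Pick one exact file, files containing a pattern, or everything."""
    if file_name is not None:
        return [name for name in available if name == file_name]
    if pattern is not None:
        return [name for name in available if pattern in name]
    return list(available)


# Hypothetical folder listing, loosely modeled on the files shown on screen.
files = ["customer_january.csv", "customer_february.xlsx", "sample.csv"]

folder, single, pattern = parse_args(
    ["project.py", "data/2022", "None", "customer"]
)
print(folder, select_files(files, single, pattern))
```

Passing None for both the file name and the pattern returns the whole listing, which matches the run in the video where every file in the folder is uploaded.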
Info
Channel: I am Lu
Views: 6,268
Id: 5ALAkznC_xM
Length: 32min 15sec (1935 seconds)
Published: Sun Jun 12 2022