Python SharePoint files to AWS S3

Captions
hey, what's up y'all — I'm Lou, and today we're going to be talking about downloading SharePoint files (CSV files, whatever type, it doesn't matter) and uploading them directly to an AWS S3 bucket. I got a few requests about this one, so I decided to put something together. We're going to take the code from a previous project of mine — downloading files from SharePoint — as our starting point, and we're going to add to it: the ability to connect to your AWS environment and, of course, communicate with your S3 bucket so we can upload files. Let's get started. Alright guys, before we begin: give me a like, give me a follow — that's all I ask. Let's go.

I created a new folder for this project; I called it python-sharepoint-files-to-aws-s3 — long name, but that's fine. Let me open up VS Code, already pointing at that folder; right now it's empty, there's nothing there. The first thing I need to do is create a virtual environment — that's one of the things you always hear me mentioning over and over again — so let's get that created real quick: python -m venv env. This is something I've done many, many times; I even have a video related to it that goes into more detail and tests different scenarios, so take a look at that if you want to know more. Next I'll activate it with env\Scripts\activate — if you're using a Mac or Linux it will be env/bin/activate; Scripts is replaced with bin, and that's the only difference. Now that I have that activated,
the first thing we're going to use is the AWS SDK for Python, which is boto3. Honestly, I've heard people pronounce it different ways — when I read it I want to say it one way, but supposedly "boto-three" is the proper pronunciation. I won't get stuck on pronouncing it; either way, it's the AWS SDK for Python and it's pretty much what we'll be using, so let's get it installed: pip install boto3. Once that's done installing, the next thing to install is SharePlum, because that's what we use to connect to SharePoint: pip install shareplum. Now that we have those two installed, I think that's all we need.

As I mentioned before, we're going to use the source code from my previous video — the download-files-from-SharePoint one — so let's get that imported right now and then we'll get started. I'll copy over all of the files I need, and there they are. Let me make sure my config doesn't have my credentials in it... nope, it doesn't, so that's good. Again, this is our starting point: clone that project if you want (that'll probably help you out), or just copy the code over — it can be found on my GitHub account, and I'll provide the URL in the description for this video. So we have our username and password, and this is all straightforward — it's what we need in order to connect to SharePoint; the previous video goes over how we're utilizing all of it. One of the
things we're going to add here is another key, which I'm going to call aws_bucket. Under aws_bucket I'll have some additional settings specific to AWS. One of them is going to be aws_access_key_id — I'll go through what this is in a minute. You do need to create a user in your AWS account and, on top of that, configure it so it's accessible via the API; that's a whole separate video I'm going to make, on setting up that user, so hopefully you're aware of the process — if not, I'll be making a video soon. The other values are going to be aws_secret_access_key; bucket_name, where we put the name of the bucket we want to access (because in AWS you can have many buckets — one bucket, ten buckets, it really doesn't matter); and bucket_subfolder. Let's get these set up real quick (and let me make my text bigger — so small... there it goes).

First, some context. I created an AWS account strictly for the purpose of doing YouTube videos. I have another AWS account that I use for my business, and I did not want to use that one — there's the concern that I might accidentally show things I shouldn't be showing, more confidential things of that nature. This account is strictly for tutorials and videos, so if for some crazy reason it ever gets hacked or somebody takes the account away, there's nothing sensitive on it — besides, I guess, my bank account information, but they can't access that anyway.
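Back to those settings for a second — here's roughly the shape my config.json ends up with once the new key is added. Placeholder values only; the key names are just the ones I chose, and your existing SharePoint settings sit alongside this in the same file:

```json
{
    "aws_bucket": {
        "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
        "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
        "bucket_name": "youtube-storage-10045",
        "bucket_subfolder": "staging"
    }
}
```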
(And sure, my card is saved in there, so someone could use services like crazy and run up my bill — but that's a different issue, and I can easily cancel it, so it's not a big deal either way.) In this account I created a bucket called youtube-storage-10045 (again, let me make this bigger — I don't know why it's so damn small... there it goes). Inside this bucket I have a folder called staging, and this is where the subfolder setting comes in. If you happen to have subfolders where you want your files to get uploaded, that's what you put in — in this case I'm putting in staging. But say you don't have a subfolder, or you do have subfolders but you want the files uploaded directly to the root of the bucket — then this setting stays blank. It all depends on where you want the files to go: if you want a specific folder, specify the folder; if you have subfolders nested within each other — say staging, then by year, 2022, then by month, january — specify the whole path. In our case we only have the one subfolder, staging. So let me go back, copy this bucket name, and get it configured... and now we have our bucket name and our bucket subfolder configured.

The next thing that's needed is to enter our access key ID and our secret access key. This information comes from Identity and Access Management (IAM) in AWS — I think I can show this. I created a user which I called api-user, and this user has been configured for API access, and
then ultimately you get an access key ID and, of course, your secret key — which I'm not going to show, of course. If you're familiar with this and have access to IAM, you can set it up yourself; otherwise your admin needs to set it up first, and they should be able to provide you with that user's API keys. I created this user to have access to S3 only — it does not have access to all the other AWS resources like Glue or EC2; strictly the S3 bucket. I'll make another video on how to configure a user for API access if you're not familiar with that; it's on the to-do list.

So next, let me populate my information — but before I put in my credentials, let's take a look at the config so you're familiar with the information we're putting in. I have a SharePoint account I created specifically for making YouTube videos, and if I look at my URL, that's ultimately what my subdomain is: iamlu. If you're not familiar with this, just go to your SharePoint site and you'll be able to figure out what your subdomain is. So I'm going to replace this placeholder — and just FYI, I think some of the issues people have had came from replacing only the word "domain": you've got to take away the curly brackets too. Then over here, in the site setting, I need to specify the site name. Again, if you're not sure what that is, go to your SharePoint site and you'll see your site name right after /sites/ — in my
case it's called development. So let me copy that over, and there you go — all my info is populated. The only things left to put in are my username, password and, of course, my access keys; I'm not going to show that information, so let me go off screen and enter it real quick. Okay — I've got all my credentials entered: my AWS API keys in the config file, and my username and password for SharePoint connectivity.

Now let's go into the SharePoint module we copied over from the previous project. We're not changing anything in here — it stays exactly the way it is — so let's close out of that and go to the project file. This is the piece we're going to end up modifying, so let me show you some of the changes. Right off the top, two of the things we're going to import are os and json, because we're now going to read the configuration file — config.json — to extract our AWS keys, since that's where we configured them. The next thing to import is boto3, because, again, that's the package needed to connect to AWS. And then: from botocore.exceptions import ClientError — that way, if we have any errors, we'll get a response and a message back to identify them. Everything else here you can keep as is; again, the previous video explains what everything does and gives you some insights. One thing we probably
should do is write some better descriptions for these arguments, so they hopefully make better sense. The first argument is the SharePoint folder name, which may include subfolders. In my case the SharePoint folder is called Youtube 2022, so that's what you'd pass in in that scenario. The second argument is the local or remote folder destination — ultimately, this is where you want your files to get saved. Whenever you download SharePoint files, you need to save them somewhere: it could be locally, it could be some other remote server — it's really up to you. In our case we'll be saving them locally; if you happen to have a server running this script, for example, it could just save them on that server temporarily. Argument three is the SharePoint file name, used only when a single file is being downloaded — when you're not trying to download all the files in SharePoint, just one. And the last argument is a SharePoint file-name pattern, for when you're trying to match file names. Say you have a folder with a hundred files — some of them sales files, and others relating to different things, like CRM data. If you only want the sales files, and the file names have the word "sales" in them, you can do a pattern match on "sales" and it will download only those files.
Next, we're going to read the JSON file, the same way we did it in our SharePoint module — same concept, so technically we can just copy that code over (oh, and I'm missing my path constant, so let me copy that as well). We read the JSON file and put it into this config object; the difference is that our key isn't "sharepoint" — we called it aws_bucket (or whatever name you gave it; if you called it something else, use that). Inside that object are the four values we set: the access key, the secret access key, the bucket, and the bucket subfolder, so let's pull all of those out. I'll make these capitalized constants — AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, BUCKET_NAME, and BUCKET_SUBFOLDER — each read from the config object.

With all of that in place, we're still going to use a lot of the functions we have: save_file, which saves to your local directory, remote directory or network folder — wherever you're pointing it — and the get-files functions; all of that we're keeping. The only difference is we'll be making some small modifications. But before those small modifications, we're actually going to create two new functions
right — let me label these as functions used for AWS, since they're specifically for that. One of them is going to be the upload function, for uploading; the other is for bucket subfolders. Depending on whether you populated a subfolder in that config key, there has to be logic that says: okay, a subfolder was provided, so concatenate it, so we know which subfolder to upload to — and if you didn't provide one, it needs to stay blank. So let's create a function to manage that. I'm going to call it bucket_subfolder_build, and its parameters are going to be bucket_subfolder and file_name — it takes in those two arguments. Now some logic: if bucket_subfolder is not blank, we assign a value to an object called file_path_name, using the join function to join the bucket subfolder and the file name, which gives us file_path_name. Ultimately what we're doing is taking the file name and concatenating the subfolder onto the front of it, because that's how AWS wants it: you cannot tack the subfolder onto the bucket name — the bucket name needs to be by itself — the subfolder has to be added in front of the file name in order for S3 to know which folder to put the file in. I think it's kind of backwards (you'd think you could just add it to your bucket), but it is what it is. Then, if a subfolder was provided, we return file_path_name; else, we return file
name. So it either returns the plain file name, or the file name with the subfolder prefix — one or the other. That's what this function is for.

The other function we're creating I'll call upload_file_to_s3. It takes in file_dir_path — so not just the file name, but the full path of the file from the root: on Windows that would be C:\, then whatever folders you have it under, and then of course the file name; on Linux it would be /home, then your profile, then wherever you put the file — you've got to provide all of that. Then it takes the bucket, and then what we'll call file_name. Inside, we create an s3_client object with boto3.client('s3', ...) — 's3' because that's the resource we want to use (it could be S3, it could be Glue, it could be EC2, whatever you're trying to use) — and then we pass in aws_access_key_id and aws_secret_access_key. A quick side note: if you happen to have the AWS command line interface installed on your machine, boto3 should look for your AWS config file by default, bring in whatever settings you already have there, and utilize those. In my case I don't have the AWS CLI installed on this machine, so I've got to populate my keys myself — and this is the way to do it.

Next, let's add a try block: response = s3_client.upload_file(...), passing in file_dir_path, bucket and, of course, file_name. Just to show what this means: the bucket tells it which bucket to put the file into, and file_name is the name it saves the file under. Just because you upload something to an S3 bucket doesn't mean it keeps its name — say you upload a CSV file named sales_totals: if the file_name you pass in happens to be something different, the file will actually be saved in S3 under that different name. So if you do want to rename your files, this is where you'd do it — make sure file_name represents the name you want it saved under. Then: except ClientError as e — we can print the error if we want to take a look at it, and return False; otherwise, return True. Ultimately we're evaluating whether there was an error or not. We could probably inspect the response — that's one way — but another way is to just look at the status: did we get False back from this function, or True? If it's True, we know everything was successful; if we receive False, we know something's wrong. That's all this is doing.

Now that we've created these two functions, I think we're good to go. Next, we'll make some small tweaks to what we have: this can stay the same, this can stay the same, get_files stays the same — the only function we're going to tweak is this guy here, get_file; that's the one we're tweaking, and
here's what we're going to end up doing. Even when we run get_files — when we want many files — you can tell we iterate over the list and call the get_file function, so regardless, the files get handled one by one, and each call saves the file, whether to a local directory or some sort of remote directory; it saves it somewhere. The next thing we want is to create an object for file_dir_path, which will be our folder destination — one of the arguments we pass in when we run our Python script — joined with the file name. Ultimately this reconstructs the path of where my file is located: the file is being saved to a specific folder — you can see save_file up here doing the same thing, taking the destination folder and saving to a specific path — and here I'm just regenerating that so I know which folder the file landed in, because I'm going to need it. The next thing is to create our file name. This file_name object is ultimately what came in as file_name — so if you're doing some modifications to your file name, maybe adding some kind of prefix or suffix for whatever reason, this would be the time to do that. In our case we're not really renaming the files; we're taking the file names as they are in SharePoint and moving them over to S3. So I'm going to call bucket_subfolder_build, passing in bucket_subfolder and then file_name, and ultimately I'll get a new name back — and the new name only differs if there happens to be a
subfolder present. The reason — let me just type it out — is this: if you do not have a subfolder, meaning your files get uploaded directly to the bucket, your file name will look something like sales_data.csv. But say you do provide a subfolder because you want the files to go into another folder — say the subfolder is called staging. What happens now is that we need to make the name staging/sales_data.csv; this is going to be the new file name. It needs to be in this format because, in order to upload into subfolders, you have to prepend the path in front of the file name with a forward slash — whatever path it's going to. So if you happen to have multiple nested folder names — staging, then maybe a year, then maybe january — then staging/2022/january/sales_data.csv is ultimately going to be your new file name, because that's how S3 knows where to upload it. Also keep in mind: say those folders don't exist — the 2022 and january. If you add them to your file name the way I just did, S3 will create those folders for you automatically, even though they don't exist in the bucket. So keep that in mind as well; that's ultimately what's happening here.

Then the final step is the upload to the S3 bucket. This is where we provide file_dir_path — again, this has to be the full path of your file, the path plus the file name, so it knows where to read the file from — then the bucket, which holds our bucket name, and then of course the file name. In this case the file name is really the new file name — an updated one, because it takes the bucket subfolder into consideration. All
right — so now that we have that... oh, wait, did I forget something? Hold on, is that right? Nope, everything's good. I think we have everything updated here that we needed to update. Again, all we did was add two functions and update the get_file function; besides that, everything stays pretty much the same, including everything down below.

Now, in my project folder, let me create another folder — I'm going to call it staging. Then let's go ahead and run the script and see if it works; hopefully with no issues. Again, I'm using the environment that has all my packages installed (let me make this bigger... there it goes, too small). So let's run this: python project.py. My first argument is going to be my folder name — and to know the folder name, if we go back: in my case I'm going to take these files here; these are the files I want. Actually, before I do anything, let me go to my S3 bucket and delete some files — I had already uploaded these while I was doing my testing. Alright, that's been deleted; now let's delete the ones inside staging as well. Cool, it's all emptied out now — no data there. What we're going to do is take these files and download them, and again, they're coming from Youtube 2022. If you have subfolders in SharePoint, this is where you'd have to specify, say, youtube/2022 — and if there's a january or something under that, you've got to specify the full path. In my case it's only the one folder.
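For reference, here's how those four positional arguments might map onto sys.argv — a sketch under my own assumptions about the script's argument handling (the real project.py may parse them differently), with a hypothetical local path:

```python
import sys  # in the script itself you would call parse_args(sys.argv)


def parse_args(argv):
    """Map the four positional arguments onto names.

    The literal string "None" stands for "not provided" — an assumption,
    mirroring how I pass None/None on the command line below.
    """
    sharepoint_folder, folder_destination, file_name, file_pattern = argv[1:5]

    def none_if_missing(value):
        return None if value == "None" else value

    return (sharepoint_folder, folder_destination,
            none_if_missing(file_name), none_if_missing(file_pattern))


# My run: everything in "Youtube 2022", saved to a (hypothetical) local
# staging folder, no single-file name, no file-name pattern:
args = parse_args(
    ["project.py", "Youtube 2022", r"C:\projects\staging", "None", "None"]
)
```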
So that's argument number one. Hit space — argument number two is going to be where I want these files saved, my destination folder: they're going to be saved here, in the staging folder (let's see if I can copy the path... yep, there it goes). So the files I access from SharePoint will get saved into this folder — again, it could be a network drive; it's up to you. Arguments three and four: I don't want just one file, I want all of them, so this is going to be None, space, None — that's all I need to do. Now let's test it out. Let's run it... oh no, what's wrong here? I spelled something wrong — yep, that's what it was; easy fix. Let's run it again... no error messages so far... oh, there is an error message. What's happening here? Oh yes, I forgot: my second argument — my folder destination, where I want to save the files — needs to go in double quotes. I forgot. Okay, now let's run it... what happened here? "Failed to upload." Alright, so let's see what I'm missing... this looks right... for my bucket and save file — oh, duh, what the hell, I've got them backwards! No wonder. Okay, that makes sense. So let me save that, run it again, and see what happens... dammit, what the hell, again? "Failed to upload." Let's at least see if it downloaded the files — there's one file, so one file did get downloaded; that's good, and it's pointing to the right path: there's the file name, .csv, going to youtube-storage-10045/staging. But an error occurred: SignatureDoesNotMatch when calling the PutObject operation — "the request signature we calculated does not match... check your keys." I know this was wrong initially, I did have a wrong... oh damn, did I spell this wrong too? What the
hell, man. Alright, let me switch screens and check my config file... under secret key I had put in the access key again — I don't know why. So this issue was just me typing in the wrong secret key; that's pretty much what it was. Now let's go back and rerun it. Everything's fine... there it goes — it ran successfully. Again, that one was my error: I just put in the wrong secret key, so make sure you put the right secret keys into your config. You can tell we downloaded three files, which got saved into this folder — the same three files we have in SharePoint — and if I go to my bucket and refresh: boom, we see the same three files over here. And that's it — ultimately we did what we were trying to do: we downloaded the files from SharePoint and uploaded them to an S3 bucket. It wasn't too bad, pretty straightforward. Again, we took our existing project and made some small modifications — so if that project already worked for you, a few small tweaks and one extra package and you're good to go, man.

So again, I got a request for this video; hopefully it helps you all out. I'm going to make another video related to this — a part two of what we just did — and we'll go into more detail on some additional steps. In that one, after you move files over from SharePoint to AWS, we're going to delete the SharePoint files we downloaded from, and the same for the storage folder we had locally: we'll make sure that once the files get uploaded to the S3 bucket, we delete them from that folder (or your network folder, or whatever) as well. That's going to be part two — a video where we delete those files. So if you are trying to migrate
over — maybe you're going to store everything in an S3 bucket because that's going to be your new archive place — then part two will help you out with deleting those files from SharePoint and, of course, from your local directory path. Hopefully this video helps. Again, give me a like, give me a follow if you can. Every Friday we'll be streaming Drinking with Lou — we'll be testing tequilas, bourbons, etc. We'll talk then. Peace.
Info
Channel: I am Lu
Views: 5,513
Id: gV4Src14kmY
Length: 44min 5sec (2645 seconds)
Published: Wed Apr 06 2022