DataFactory: Extract data from SharePoint Online

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi guys in this video i'm going to show you how to extract files from a sharepoint online with data factory and we will finish loading the data into a data lake the first thing that i would like to do is to explain the whole process so that you can understand all the involved parts in fact most of the configurations that we are going to explain in this video are not data factory related are more related to issue active directory or to sharepoint online so that you can help the ide departments of your company to set them up so let's go to cd process the first thing that we need to do is to register an application in the azure active directory to be used as a service account for our process the second step will be to grant the application id access to the sharepoint online site once that we have that we are ready to move to the step 3 and start creating the data factory pipeline there we will configure our activity to get a token and a second activity to copy the data from the sharepoint to the data link i have pasted in the description of this video so links that help me on this process so that you can also get help from them let's start now with the first step we are going to move to the azure portal we are going to look for the azure active directory and the first thing that we need to copy is the tenant id that is here we are going to copy it because we will need it later here on the left bar in the manage area we are going to get into the app registration we are going to register a new one on the name we are going to call it a sp access app and for the rest of the actions we are going to keep them as they are in our case there is the setpoint line is in the same tenant so with the selected option is ok register and here we have the application id we are also going to copy it because we will need it later and what we have to do now is to establish a cigarette so here we have the certificates and secrets let's create a new one i'll add a description here i'll keep the six months period add and here we have the value of the credentials to be used let's also copy to use it later with this we have finished the configuration in the azure active directory part we are going to start now with the configuration in the sharepoint online site for that we are going to open one of the links that you have in the descriptions because there's something in the instructions that was a bit confusing to me and got me several weeks fighting with my iit department if we scroll down to this part we have here the url to grant access to the sharepoint on it we have to replace the url of the site if we open my sharepoint site that is this one i took the url and replace it on the text with this and in fact this link works and allows us to grant access but these permissions are not inherited to all the sites of sharepoint site let me go back to the sharepoint home and here in my sites i need to look for the one containing the files that we want to export in our case is this one habiside we are going to get in it and this is the url that we need to use to replace so we are going to add here slash sites slash and hobbysite and with this url we are going again to the browser we are going to paste here the app id we previously copied click on lookup and it will automatically find the app name for the next cells we are going to follow the instructions recommended so in the domain we are going to use localhost.com in the redirect the https for localhost and in the permissions the xml script that we are copying that basically grants the application read permissions as we can see here now we click on create trust it and that's it for the separate site we are now ready to go to the step 3 and to start with the data factory process creation in the following link we can get the url to get the token we are going to copy here and we just need to replace on it the tenant id that we previously got we are going to the data factory we are going to create a new pipeline we are going to insert our activity i i'm going to call it get token and in the url part we are going to copy the one we created this method should be a post and if we look in the documentation we should add a header with name content type and value this one for the body we are going to copy this string into a notepad and we are going to replace the client id here the tenant id here and here the client secret needs to be pasted here and we need the tenant name okay now we are going to copy the body into the web activity and if everything is fine this should work let's debug it okay if we check the output we can see that the token has been generated the second step is to build the url to request the file the url must be something like this we need to replace the site url with the sharepoint url plus the site show this one and the relative path to the file should be the site again and the structure of folders until we get to the file name let's just test the link in our browser and as you can see a file called value has been downloaded it has no extension but if we open it we can see that it contains our csv data the browser doesn't ask for credentials because i'm locked with those credentials but in data factory the token is going to do that part of the job we are going back to the data factory we are going to add a copy activity and connect it and we are going to create a new linked service it should be of type http we are going to call it sharepoint online paste the base url disable the certificate and set the authentication type to anonymous it's okay create it now let's go to the copy activity in the source let's create a new data set it should be http type and format binary choose the previously created linker service and in the relative url we are going to paste the rest of the url the only pending part here is to add the header of the source that will be a concatenation of authorization colon bitter space and with the token value that is the output of the previous task if we go to the output of the previous execution we can see that the value is in access token we are going to copy this and paste it here and that should be it the source is configured so let's go to the sync i need to create a data set pointing to the data lake i already have the link selected for it configured and the type should be binary this is going to be the path and the file name and that's all it's time to test it let me show you first the data lake destination so that you can check that there's nothing there i'll run the process okay it's fine let's go now to the data link and here it is the file exported directly from the sharepoint online site this is the end of the video i hope this content is useful for you see you in the next clips
Info
Channel: javiBI
Views: 63,759
Rating: undefined out of 5
Keywords: Data Factory, Synapse, Azure, Cloud, SharePoint, Sharepoint Online, ETL, ELT, Business Intelligence, BI, Data, Data Lake
Id: FFfNu3cI-uw
Channel Id: undefined
Length: 10min 29sec (629 seconds)
Published: Sat Feb 26 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.