Skip to content

Copy Data from SharePoint to ADLS Using Azure Data Factory: A Step-by-Step Guide

In modern data architectures, integrating data from various sources is a common requirement. One such scenario Copying data from SharePoint to Azure Data Lake Storage (ADLS) is a common need for businesses that want to centralize their data for analytics and reporting. Using Azure Data Factory (ADF), you can automate this process with ease. In this guide, we’ll walk you through the steps to set up a pipeline in ADF to copy data from SharePoint to ADLS, ensuring a smooth data transfer process.

Prerequisites:

The SharePoint List Online connector uses service principal authentication to connect to SharePoint. Follow these steps to set it up:

  • Register an application with the Microsoft identity platform.
  • Grant SharePoint Online site permission to your registered application.
  • Create a Pipeline in Azure Data Factory
    • To Get the Access Token
  • Extract the Data from SharePoint

Step 1: Register an application with the Microsoft identity platform:

Login to Azure Portal and open Azure Active directory.

  • On the left sidebar open App Registrations a New registration Provide a meaningful name and leave the default values as is à Click on Register
  • Copy Application Id, Tenant Id
  • On the left sidebar, open Certificates and Secrets a New client secret a Provide the description and Expiration period (According to your requirement)
  • Copy secret Value
  • On the left sidebar of the manage, tab click on Api Permissions Add a permission a Microsoft Graph Application Permissions a select required permissions
    Note: The above access can be only given by the Admin

Step 2: Grant SharePoint Online site permission to your registered application:

  • Open your SharePoint Online site link. The URL in the format 
  • https:///sites/<>_layouts/15/appinv.aspx
  • where the placeholder <your-site-url> is your site URL and site name(the site name that you want to access from SharePoint).
  • Provide The Application Id you copied earlier as when you provide the id the title will be visible automatically à App Domain paste localhost.com àRedirect URL paste https://www.localhost.com à
  • Permission Required XML provides the Content below:
  • <AppPermissionRequests AllowAppOnlyPolicy=”true”> <AppPermissionRequestScope=”http://sharepoint/content/sitecollection/web” Right=”Read”/></AppPermissionRequests>
  • Click “Trust It” for this app.

Step 3: Create a Pipeline in Azure Data Factory.

Create a linked service to a SharePoint Online List using UI

  • Use the following steps to create a linked service to a SharePoint Online List in the Azure portal UI.
  • Browse to the Manage tab in your Azure Data Factory and select Linked Services, then click New:
  • Search for HTTP and select the HTTP connector.
  • Configure the service details, test the connection, and create the new linked service.
  • Click on Test Connection to check the connection successfully established.

Create a Dataset according to your use case:

  • Example: Creation of Dataset to Copy Excel Files
  • Create required Parameters and provide
  • Base_url: Base URL of SharePoint which will not include the site name
  • Relative URL: GetFileByServerRelativeUrl(‘<<full path where your file is present in sharepoint>>’)/$value
  • Sheetname: Name of the sheet which you can access

Copy file from SharePoint Online:

You can copy files from SharePoint Online by using Web activity to authenticate and grab access token from SPO, then passing to subsequent Copy activity to copy data with HTTP connector as source

Create a Web Activity to get the access token from SharePoint Online:

  • URL: https://accounts.accesscontrol.windows.net/[Tenant-ID]/tokens/OAuth/2. Replace the tenant ID.
  • Method: POST
  • Headers:
  • Content-Type: application/x-www-form-URL-encoded
  • Body: grant_type=client_credentials&client_id=[Client-ID]@[Tenant-ID]&client_secret=[Client-Secret]&resource=00000003-0000-0ff1-ce00-000000000000/[Tenant-Name].sharepoint.com@[Tenant-ID].
  • Replace the client ID (application ID), client secret (application key), tenant ID, and tenant name (of the SharePoint tenant) from the above steps.

In Copy activity source:

  • Request method: GET
  • Additional header: use the following expression@{concat(‘Authorization: Bearer ‘, activity(‘<Web-activity-name>’).output.access_token)}, which uses the Bearer token generated by the upstream Web activity as authorization header.
  • Replace the Web activity name.
  • Configure the copy activity sink as usual to write to ADLS.

Note: Apostrophe (‘) and hash (#) is not allowed in file names and in the folder names.

Conclusion:

In this guide, we demonstrated a streamlined process to move data from SharePoint to Azure Data Lake Storage using Azure Data Factory. This approach provides a robust and scalable solution for automating data transfers, making it easier to centralize and manage your data. By utilizing Azure’s capabilities, you can seamlessly integrate SharePoint data into your analytics workflows, allowing you to focus on generating business insights and making informed decisions.

For More Details, Diggibyte Technologies Pvt Ltd has all the experts you need. Contact us Today to embed intelligence into your organization.

Author: Chilukamari Durga Bhavani

Leave a Reply

Your email address will not be published. Required fields are marked *