Setting Up Azure Data Lake as a Target
  • Updated on 19 Mar 2020

Setting Up Azure Data Lake Analytics As Your Target Data Platform

Overview

Welcome to Getting Started with Rivery and Azure Data Lake Analytics.
This guide shows you how to create the required Azure resources and configure them so that Rivery can load and orchestrate data into your Azure Data Lake Analytics (ADLA) account.
Before you use this guide, make sure you have a valid Azure account with permission to create and update resources in that account.

Prepare your Azure environment using Rivery

To run on your Azure Data Lake Analytics account, Rivery requires you to create or manage several Azure resources. Make sure you have all of them. If any are missing, you can create them by following the steps later in this guide.
The steps in this guide are:

  1. Get an Azure account with the appropriate admin/management permissions.
  2. Create an Azure subscription.
  3. Create an Azure resource group.
  4. Create an Azure Data Lake Analytics account.
  5. Create an Azure Data Lake Store account.
  6. Create an Azure Blob Storage account.
  7. Deploy assemblies to your Data Lake Analytics account.
  8. Get the Azure tenant id (the directory id in Azure Active Directory).
  9. Invite the azure@rivery.io user to your Active Directory.
  10. Grant permissions to azure@rivery.io in your platform.

1. Creating a Subscription
An Azure subscription is the main billing and tier account. You must have one to run any resource in the Azure portal or environment.

If you don't have an Azure subscription, you can create one:

  1. In the search box at the top right, search for subscriptions. Click Subscriptions in the results.
  2. On the Subscriptions screen, click the + Add button.
  3. Follow the steps of the Microsoft subscription wizard.
    Recommended: Name your subscription <your-account>-rivery, so that you can manage the billing for Rivery's use of Data Lake Analytics.
  4. After the subscription is created, save the Subscription Id from the subscription panel.

2. Creating a Resource Group
A Resource Group is a logical group in Azure that lets you define and manage permissions, users, and the resources under it. Rivery requires a Resource Group, so if you don't already have one, create it as follows.

Important: Create a new Resource Group dedicated to Rivery. This lets you manage the users, permissions, and resources that Rivery uses, and monitor the capacity and usage of those resources. The Resource Group name must be Rivery.

  1. Click Resource Groups in the main menu of the Azure portal:

    setting-up-azure-data-lake-as-a-target_mceclip04.png

  2. Click the + Add button in the top panel.

  3. In the form, name your Resource Group Rivery, choose your subscription (an existing one, or one created using the steps above), and choose your location.

3. Create Azure Data Lake Analytics Account
If you don't have an Azure Data Lake Analytics account in your subscription and resource group, create one as follows:

  1. Click on + Create a Resource in the main menu.

    setting-up-azure-data-lake-as-a-target_mceclip43.png

  2. In the New window, search for Data Lake Analytics in the search box, or choose Data + Analytics -> Data Lake Analytics. Click it.

    setting-up-azure-data-lake-as-a-target_mceclip52.png

  3. In the New Data Lake Analytics Account form, name your Data Lake Analytics account. In our example, it is named riveryadl.
    Choose your subscription, choose your existing resource group (Rivery),
    and click Data Lake Store -> Configure required settings.

    setting-up-azure-data-lake-as-a-target_mceclip62.png

  4. In the Select Data Lake Store form, choose Create New Data Lake Store.

    setting-up-azure-data-lake-as-a-target_mceclip8.png

  5. In the New Data Lake Store form,
    give your Data Lake Store account the same name as your Data Lake Analytics account. In our case, the name is riveryadl.
    Click OK.
    setting-up-azure-data-lake-as-a-target_mceclip9.png

  6. Click Create in the New Data Lake Analytics Account form.

4. Deploy JSON and Avro Assemblies in your Data Lake Analytics Account
Rivery uses external, verified assemblies in Data Lake Analytics to load and export JSON and Avro files and to perform its operations. Deploy the assemblies as follows:

  1. Download the assemblies to your computer and extract the tar file. You should get 3 DLL files:

    Microsoft.Analytics.Samples.Formats.dll
    Microsoft.Hadoop.Avro.dll
    Newtonsoft.Json.dll

  2. Go to your Azure Data Lake Store account by choosing it from All Resources.

  3. Click Data Explorer in the main menu.

  4. In the explorer, choose Create New Folder. Name it Assemblies.

    setting-up-azure-data-lake-as-a-target_mceclip20.png

  5. Open the Assemblies folder and click Upload.

  6. Select the DLL files and upload them to your Assemblies folder.

Now, let's create the assemblies in the Azure Data Lake Analytics account:

  1. Go to your Azure Data Lake Analytics account by choosing it from All Resources.

  2. Create a New Job.

  3. Paste the job as defined here. Make sure the paths match the location of the DLL files in your Azure Data Lake Store. In our example, they point to a folder named Assemblies.

    USE DATABASE master;

    CREATE ASSEMBLY [master].[Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Microsoft.Analytics.Samples.Formats.dll";

    CREATE ASSEMBLY [master].[Microsoft.Hadoop.Avro] FROM @"/Assemblies/Microsoft.Hadoop.Avro.dll";

    CREATE ASSEMBLY [master].[Newtonsoft.Json] FROM @"/Assemblies/Newtonsoft.Json.dll";

  4. Wait for the job to complete.
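
If your DLL files live in a folder other than Assemblies, the job text above has to be adjusted in three places. The following is a minimal Python sketch, not part of Rivery or Azure, that generates the same U-SQL for any folder. The function name build_assembly_job and its parameters are hypothetical; only the DLL names and the CREATE ASSEMBLY statements come from this guide.

```python
from posixpath import join

# DLL file names listed earlier in this guide.
ASSEMBLY_DLLS = [
    "Microsoft.Analytics.Samples.Formats.dll",
    "Microsoft.Hadoop.Avro.dll",
    "Newtonsoft.Json.dll",
]


def build_assembly_job(folder: str = "/Assemblies", database: str = "master") -> str:
    """Return U-SQL text that registers each DLL stored under `folder`.

    Hypothetical helper: it only builds a string, it does not submit a job.
    """
    lines = [f"USE DATABASE {database};", ""]
    for dll in ASSEMBLY_DLLS:
        # The assembly name is the file name without the .dll extension.
        assembly_name = dll[: -len(".dll")]
        # Normalize the folder so the path always starts with a single slash.
        path = join("/", folder.strip("/"), dll)
        lines.append(
            f'CREATE ASSEMBLY [{database}].[{assembly_name}] FROM @"{path}";'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_assembly_job("/Assemblies"))
```

Paste the printed text into the New Job editor as described in step 3.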

5. Create a Blob Storage Account and Blob Storage Container
Rivery uploads your source data to an Azure Blob Storage container and then pulls the data from there into your Data Lake Analytics.

This container serves as a FileZone, where your data is staged before it is loaded into Data Lake Analytics. You can also use the FileZone container or its objects as a base for other Hadoop or Spark workloads operated by Azure HDInsight, or by your other services.

Note: You can find the up-to-date documentation on Blob Storage operations and getting started here.

So, let's create a Blob Storage account (if you don't already have one) and a container in it:

  1. Click on + Create a Resource in the main menu.

    setting-up-azure-data-lake-as-a-target_mceclip44.png

  2. In the New window, choose Storage, and then Storage account - blob, file, table, queue.

    setting-up-azure-data-lake-as-a-target_mceclip101.png

  3. Fill in the form to define your storage account. Name it, and choose your Subscription and the Resource Group you created earlier.
    It is important that the storage type (Account Kind) is Blob storage.

    setting-up-azure-data-lake-as-a-target_mceclip121.png

After creating the storage account, let's create a container and get the access keys.

1. Click All Resources in the main menu.

2. Search for your storage account name and click it.

3. In the Storage Account panel menu, under Blob Service, click Blobs.

4. Press +Containers and create a container with a name of your choice.

5. Go to Access Keys in the storage account menu.

6. Copy one of your keys and save it for later.

We will use it later, in the Azure Data Lake connection in Rivery.
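
Azure rejects storage account and container names that break its naming rules, so it can help to check a name before filling in the forms. The sketch below is a hypothetical helper, not part of Rivery or the Azure SDK. It assumes the rules Microsoft documents at the time of writing: storage account names are 3 to 24 lowercase letters and digits, and container names are 3 to 63 lowercase letters, digits, and hyphens, starting and ending with a letter or digit and with no consecutive hyphens. Check the current Azure documentation if a name is rejected anyway.

```python
import re

# Assumed rule: storage account names are 3-24 lowercase letters and digits.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Assumed rule: container names are 3-63 characters (lowercase letters, digits,
# hyphens) that start and end with a letter or digit.
_CONTAINER_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_storage_account_name(name: str) -> bool:
    """Check a storage account name against the assumed Azure naming rule."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Check a container name; consecutive hyphens are not allowed."""
    return _CONTAINER_RE.fullmatch(name) is not None and "--" not in name


if __name__ == "__main__":
    # Example names are placeholders, not values required by Rivery.
    print(is_valid_storage_account_name("riveryadl"))
    print(is_valid_container_name("rivery-filezone"))
```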

6. Get your Tenant Id (Directory Id)
Rivery connects to your Azure platform, which is organized by tenants (also called directories). To connect, you need to find your tenant id.

  1. Go to Azure Active Directory in the main menu.

  2. Choose Properties in the Active Directory panel.

  3. Your tenant id is the Directory Id in the properties window. Copy it and save it for later.

    setting-up-azure-data-lake-as-a-target_mceclip15.png
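
Both the tenant id and the Subscription Id saved earlier are GUIDs in the hyphenated 8-4-4-4-12 form, so a copy-paste slip such as pasting the directory's domain name instead can be caught with a quick format check. This is a minimal sketch using Python's standard uuid module. The function name is_guid is hypothetical and the values in the example are placeholders.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if `value` is a GUID in hyphenated 8-4-4-4-12 form."""
    candidate = value.strip().lower()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare
    # against the canonical hyphenated form to reject those variants.
    return str(parsed) == candidate


if __name__ == "__main__":
    # Placeholder values; substitute the ids you copied from the portal.
    print(is_guid("11111111-2222-3333-4444-555555555555"))
    print(is_guid("contoso.onmicrosoft.com"))
```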

7. Invite Rivery Azure Management User to Your Tenant
To connect to your Azure platform, Rivery uses its own Azure user, which must be invited to your tenant (directory). This is a crucial step.
You can define the Rivery user's permissions at the subscription level, at the Resource Group level, or separately for each resource you've created.
To invite the Rivery user to your platform:

  1. Go to Azure Active Directory in the main menu.

  2. In the menu, click Users.

  3. In the top panel of the window that opens, click + New Guest User.

    setting-up-azure-data-lake-as-a-target_mceclip16.png

  4. Enter azure@rivery.io as the Email of the user. You can add a personal invitation message. Click Invite.

    setting-up-azure-data-lake-as-a-target_mceclip18.png

  5. The azure@rivery.io user will now receive an invitation email. Our team at Rivery will accept it shortly.

If the invitation is not accepted for any reason, you can send an email to the user or to helpme@rivery.io. We promise to accept your invitation as soon as possible.

8. Permit azure@rivery.io in your platform
After the invitation to azure@rivery.io is accepted, you can define its permissions in your platform. You can grant them at the subscription level, at the resource group level, or separately on each relevant resource (Blob Storage Account, Azure Data Lake Analytics, Azure Data Lake Store, Resource Group, Subscription...).
In our example, we'll grant permissions to the azure@rivery.io user on the resource group.

  1. Go to Resource Groups in the main menu.

  2. In the panel menu, click Access Control (IAM).

  3. Click the +Add button at the top.

  4. Choose azure@rivery.io from your user list. Set the role to Contributor.

    setting-up-azure-data-lake-as-a-target_mceclip19.png

  5. Save it.

Configure your Blob Storage as File Zone in Rivery

Now, for the fun part.
We need to do a quick setup in Rivery to start using it with Azure Data Lake.

  1. Log into Rivery.

  2. Now we’ll set your container as the default container for the file zone in Rivery:

    1. In the main menu, go to Variables.

    2. Set the value of your {azure_file_zone} variable to the name of the Blob Storage container you created. The value is saved automatically, so you don’t need to press Save anywhere.

    3. If you don’t have the {azure_file_zone} variable, press + Add Variable and add a new variable with that name, using the name of the Blob Storage container you created as its value.

      setting-up-azure-data-lake-as-a-target_image31.png

  3. Let’s create a new connection for your Azure Data Lake:

    1. Go to Connections.

    2. Press New Connection.

    3. From the source list, choose Azure DataLake.

      setting-up-azure-data-lake-as-a-target_mceclip211.png

  4. Now, enter the credentials and Azure details you saved earlier:

    • Blob Storage Account Name.

    • Blob Storage Account Key.

    • Your Azure DataLake Account Name. Make sure your DataLake Store has the same name as your DataLake Analytics account.

    • Tenant Id from your Active Directory.

    • Subscription Id.

    • Your Resource Group Name. The default here is Rivery.

      setting-up-azure-data-lake-as-a-target_mceclip221.png

  5. You can test your connection by clicking Test Connection.

  6. Give your connection a name, and Save.

You can now use that connection for any Azure Data Lake River that you have.
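
Before opening the connection form, it can be useful to confirm that every value collected in the earlier sections is at hand. The following Python sketch is a hypothetical checklist, not Rivery's API: the class AzureDataLakeConnection and the function missing_fields are illustrative names, and the fields simply mirror the form entries listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AzureDataLakeConnection:
    """Hypothetical container for the values the connection form asks for."""

    blob_storage_account_name: str
    blob_storage_account_key: str
    data_lake_account_name: str  # same name for the Store and Analytics accounts
    tenant_id: str
    subscription_id: str
    resource_group_name: str = "Rivery"  # default named in this guide


def missing_fields(details: AzureDataLakeConnection) -> List[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [
        f.name
        for f in fields(details)
        if not str(getattr(details, f.name)).strip()
    ]


if __name__ == "__main__":
    # Placeholder values only; never commit a real account key to source control.
    details = AzureDataLakeConnection(
        blob_storage_account_name="riveryblob",
        blob_storage_account_key="",
        data_lake_account_name="riveryadl",
        tenant_id="11111111-2222-3333-4444-555555555555",
        subscription_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    )
    print(missing_fields(details))
```

An empty result means every entry in the form has a value ready to paste.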

Conclusion

This guide showed you how to set up your Azure platform for use with Rivery, define your Blob Storage container as the default Rivery file zone, and create a new connection to your Azure Data Lake.
