Setting up Amazon Redshift as a Target
  • Updated on 22 Jun 2020
  • 4 minutes to read
  • Contributors
  • Print
  • Share
  • Dark
    Light

Setting up Amazon Redshift as a Target

  • Print
  • Share
  • Dark
    Light

Getting started with Rivery and Redshift:

This guide will show you how to create your first Amazon Redshift cluster, how to create an S3 bucket, and how to get the right credentials for using Redshift with Rivery. By the end of this guide, you will be able to quickly setup your new Redshift cluster in Rivery.

Prerequisites

Before you use this guide, please make sure you’ve signed up for AWS and that you have a console admin user.

If you don’t have one of these prerequisites, you can start here .

Rivery uses an S3 bucket for loading source data, and from there the data is sent into Redshift. Therefore, you need to create a user in the Amazon Platform Console that will have access to the relevant bucket and Redshift cluster.

Create FileZone S3 Bucket

Rivery needs an S3 bucket to serve as a FileZone before data is loaded to Redshift. You can either use the FileZone bucket or objects as a base to other Hadoop or Spark operations by Amazon EMR, or by your other services.

Amazon's S3 documentation can be found here.

So, let's create an S3 bucket for Rivery:

  1. Go to S3 Management in the AWS Console .

  2. Click on Create Bucket

  3. Give the bucket a name, and choose the same region where your Redshift resides (in most cases, the US Standard region is sufficient). Click on Next.
    image.png

  4. Follow the default steps in the wizard and click on Create .

Create an AWS Policy for the Rivery User

  1. Go to IAM in the AWS Console -> Policies

  2. Select the Create Policy button

  3. In the Policy Creation wizard, go to the JSON tab, and paste the policy below into that tab. Please don’t forget to replace the bolded areas with your current settings as set earlier in this guide.

    Replace the <RiveryFileZoneBucket> with the S3 bucket name you created earlier.

    {
     "Version": "2012-10-17",
     "Statement": [
     {
     "Sid": "ManagingFZPolicy",
     "Effect": "Allow",
     "Action": [
     "s3:ReplicateObject",
     "s3:PutObject",
     "s3:GetObject",
     "s3:ListBucketMultipartUploads",
     "s3:RestoreObject",
     "s3:GetObjectTagging",
     "s3:ListBucket",
     "s3:DeleteObject",
     "s3:GetBucketLocation",
     "s3:ListMultipartUploadParts"
     ],
     "Resource": [
     "arn:aws:s3:::<RiveryFileZoneBucket>",
     "arn:aws:s3:::<RiveryFileZoneBucket>/*"
     ]
     },
     {
     "Sid": "HeadingBuckets",
     "Effect": "Allow",
     "Action": [
     "s3:ListAllMyBuckets",
     "s3:HeadBucket"
     ],
     "Resource": "*"
     }
     ]
    }
    
  4. After pasting the policy, click on Review Policy.

  5. Name the policy. (i.e. RiveryFileZoneBucketPolicy). Click on Create Policy.

Create a User for Rivery

We recommend having a specific Rivery user in AWS that has its own permissions for interfacing with the platform.

How to create a user for Rivery:

  1. Sign into the AWS Console

  2. Go to IAM in the AWS Console -> Users -> Add User

  3. In the wizard, set your username (i.e: RiveryUser), and select the Programmatic Access type. Click Next .

    image4.png

  4. In Permissions, choose the Attach Existing Policy Directly option. From here, search and choose the FileZone policy you've created above.image.png

  5. In the Review Screen - press Next

  6. Once complete, you will get the user’s AWS credentials (aws_access_key_id, aws_secret_access_key) and the user's ARN .

  7. Save these credentials in a safe place! You will need these later on in Rivery.

Create a Redshift Cluster

  1. Now, let’s create a Redshift cluster for Rivery If you don’t have one, you can create one by following these instructions .

  2. Open your cluster and authorize access to the cluster by following these tasks.

Create your Schema and User in the Redshift Database

In the Cluster creation wizard in AWS, you must define the database where Rivery will upload data.

  1. Connect to your Redshift cluster with the credentials set in the cluster creation step with your preferred SQL management tool.

    We use usually, the free tool of Aginity to Redshift, SQL Workbench/J or DBeaver .

    You can find your host address in the Redshift cluster details. Choose the URL from the tool you use.

    image6.png

  2. Create the user and schema - In the SQL Management tool, run the following commands (replace the rivery_user_name and my_schema_name with your relevant names):

    CREATE USER <rivery_user_name> with password '<myPwd>' CREATEUSER;
    CREATE SCHEMA IF NOT EXISTS <my_schema_name> authorization <rivery_user_name>;
    

Security Group Access

Open your Redshift Cluster for the following IP addresses:
52.14.86.20 , 13.58.140.165 and 52.14.192.86

Configure your Redshift in Rivery

How to setup Redshift for use in Rivery:

  1. Log into Rivery.

  2. Next, set your bucket as the default S3 bucket in Rivery:

    • In the main menu, go to Variables

    • Set your {aws_file_zone} variable value to the s3 bucket namethat you’ve created. This will be saved automatically by clicking anywhere on the screen.

    • If you don’t have the { aws_file_zone } variable, you should add a new variable with that name and the S3 bucket name that you’ve created as the value. Press on + Add Variable .

      image33.png

  3. Let’s create a new connection for your Redshift instance:

    • Go to Connections.

    • Select New Connection .

    • From the source list, choose Amazon Redshift.

      image2.png

    • Now, enter your credentials information from the Redshift cluster and database (ie. host address, database, username and password of <rivery_user_name> ) and for the Rivery user that you’ve created.

      redshift_connection.PNG

    • Choose the file zone region from the drop down list (Default: us-east-1 (N.Virginia)).

    • Choose the desired authentication method. There are 3 options to choose from:

    1. AWS Keys :

      aws_keys.PNG

      Enter the credentials information that you created earlier in this guide

    • AWS_access_key_id
    • AWS_secret_access_key
    1. IAM Role - Automatic :
      Follow the detailed instructions in the connection window to create the Role ARN Key

    2. IIAM Role - Manual :
      Follow the detailed instructions in the connection window to create the Role ARN Key

  • You can test your connection by clicking the Test Connection button.
  • Give your connection a name and click Save .
  1. You can now use this connection in any river that has Redshift as the database target.

Further Reading

Check out the Best Practices, Considerations, and Requirements for Using Redshift and Rivery article when using Redshift as a target in Rivery.

Was this article helpful?