Configuring Amazon Redshift as a Target

6 Minutes to read

Print
Share
Dark
Light
PDF

Article Summary

Share feedback

Thanks for sharing your feedback!

Introduction

This guide will walk you through the steps of setting up your first Amazon Redshift cluster and acquiring the essentials needed for integration with Rivery.

Important Note:

This connector is not compatible with Redshift Serverless.

Prerequisites

Before using this guide, ensure the following prerequisites are met:

Select a Source

Setting up a Target requires first selecting a Source and creating a connection to it.

Amazon Redshift Connection

Ensure that you have established a connection with Amazon Redshift as your Target.

Configuration Process

Click the curved arrow next to Schema on the right side of the row. After the refresh is complete, click the row and choose the Schema where the data will be stored.
Enter the Table Name.
Select a Distribution Method (Note that this option is available only when you select Custom Report, Custom Query or Legact River as the River Mode.):
- All - Involves distributing a complete copy of the table to every node in the cluster. Best suited for tables that are rarely updated or are relatively static (slow moving). It's not typically beneficial for small tables, as the cost of redistribution during queries is low.
- Even - Involves the leader node distributing data rows across slices in a round-robin manner, without considering the values in any specific column. This method is ideal for tables that are not involved in joins. It's also a suitable choice when neither KEY nor ALL distribution methods are clearly advantageous.
- Key - Rows are allocated based on the values in a designated column. The leader node places rows with matching values in the same node slice. Particularly effective when distributing a pair of tables on their joining keys. This ensures that rows with matching values from the common columns are stored together physically, facilitating efficient join operations.
  It's important to remember that when using the Key Distribution Method, you must choose a single key column to perform the slicing.
Please Note:
When using Multi-Tables or Predefined Reports as your River Mode, you have the option to choose a Distribution Method. This can be done in the Table Settings, which is accessible by clicking on a particular table in the Schema tab.
Set the Loading Mode.
Please Note:
Amazon Redshift offers the flexibility to choose different Merge methods. For more information about these options, please refer to the Amazon Redshift Upsert-Merge Loading Mode Options documentation.

In the Additional Options menu, the following options are available:

Truncate Columns - is designed to handle instances where the length of an array exceeds the maximum VARCHAR length allowed in Redshift. Since Redshift's array type has limited flexibility and can change between different data loads, Rivery ensures compatibility by converting arrays into VARCHAR(max) type. However, if an array surpasses the maximum VARCHAR length, the Truncate Columns option, found under Advanced Options in the Target tab, can be used to truncate the array data to fit within Redshift's maximum size constraints.
Compression Update - updates the column compression in the target table during data loading. This action occurs only if the target table has not been previously created.
Keep Schema-Binding Views - ensures that all schema-binding views remain intact when employing upsert-merge or overwrite methods. If this option is not selected, any schema-binding views that depend on the target table will be dropped.
Add Rivery Metadata - designed to enhance the Target table by automatically including three columns: Rivery_last_update, Rivery_river_id, and Rivery_run_id. This feature also provides the flexibility to incorporate more metadata fields by utilizing expressions.
Please Note:
When the Source is in Multi Table mode, this option becomes available.

5.If you have configured a Custom File Zone, select a Bucket and specify a path where your data will be stored. Additionally, establish a timeframe for period partitioning within a FileZone folder.
Note: You can instruct Rivery to divide the data according to the data insertion day, the Day/Hour, or the Day/Hour/Minute. This means that Rivery will produce data files from your sources under folders that correspond to the correct partition you've selected.
6. Any Source to Target River can now send data to your Amazon Redshift Bucket.

Type Mapping

We match Redshift data types to Rivery data types as we extract your data.

The following table shows the mapping of Redshift data types to Rivery-compatible types:

Amazon Redshift Type	Rivery Type
INTERVAL	TIMESTAMP
DOUBLE_PRECISION	FLOAT
FLOAT	FLOAT
CHAR	STRING
DATE	DATE
VARCHAR	STRING
TIMESTAMP	TIMESTAMP
ENUM	STRING
BOOLEAN	BOOLEAN
ARRAY	STRING
array	STRING
BYTEA	STRING
BIT	BOOLEAN
REAL	FLOAT
BIGINT	INTEGER
TEXT	STRING
SMALLINT	INTEGER
NUMERIC	FLOAT
INTEGER	INTEGER
CIDR	STRING
DropEnumType	STRING
INET	STRING
MACDDR	STRING
OID	STRING
TSVECTOR	STRING
UUID	STRING
DATERANGE	INTEGER
INT4RANGE	INTEGER
INT8RANGE	INTEGER
JSON	STRING
JSONB	STRING
NUMRANGE	FLOAT
TSRANGE	FLOAT
HSTORE	STRING
SUPER	JSON

Note:
If you want to use SUPER datatypes, simply change the mapping from Object to JSON and the SUPER datatype will be generated.

Scenarios of Data Type Conflicts

In instances where a datatype mismatch occurs, for example, when the datatype of an existing column in the source table is modified, the datatype of the corresponding column in the target table will be determined based on certain defined rules, thereby resolving the conflict. These rules ensure that the most comprehensive data type is chosen to maintain data integrity and avoid loss of information.

Examples of Data Type Conflicts

The table below outlines various scenarios of data type conflicts and the resolution rules. The Inferred Type column represents the chosen data type in the target table when there's a conflict.

Conflicting Types	Inferred Type	Example Scenario
TIMESTAMP vs. DATE	TIMESTAMP	If a column in the source, initially of TIMESTAMP type, is changed to DATE, it will be converted into TIMESTAMP in the target.
SMALLINT vs. BOOLEAN	SMALLINT	A conflict between SMALLINT and BOOLEAN types in the source column will lead to the target column adopting the SMALLINT type.
INTEGER vs. (SMALLINT / BOOLEAN)	INTEGER	If an INTEGER is updated to SMALLINT or BOOLEAN in the source, it will be promoted to INTEGER in the target.
BIGINT vs. (INTEGER / SMALLINT / BOOLEAN)	BIGINT	If a column of BIGINT type is changed to SMALLINT, BOOLEAN, or INTEGER in the source, it will be elevated to BIGINT in the target.
DECIMAL vs. (INTEGER / BIGINT / SMALLINT / BOOLEAN)	DECIMAL	A column that is INTEGER or DECIMAL in the source will be transformed into DECIMAL.
DOUBLE PRECISION vs. (INTEGER / BIGINT / SMALLINT / BOOLEAN)	DOUBLE PRECISION	If there is a conflict between DOUBLE PRECISION and SMALLINT in the source column, the target column will adopt DOUBLE PRECISION.
REAL vs. (DOUBLE PRECISION / DECIMAL / INTEGER / BIGINT / SMALLINT / BOOLEAN)	REAL	If a column of DOUBLE PRECISION type is altered to REAL type in the source, it will remain as REAL in the target.
FLOAT vs. (INTEGER / BIGINT / SMALLINT / DECIMAL / REAL / DOUBLE PRECISION / BOOLEAN)	FLOAT	A source column initially of FLOAT type that is updated to INTEGER will remain as FLOAT in the target.
VARCHAR vs. ALL DATATYPES	VARCHAR	If a column in the source was VARCHAR type and has been updated to INTEGER type, the target column will still retain the VARCHAR type.
OBJECT vs. ALL DATATYPES	OBJECT	If a source column is initially set as OBJECT type and later changed to DATE or INTEGER, it will still be treated as OBJECT type in the target.

Please Note:

VARCHAR and OBJECT have higher priority and will override other data types in conflicts.

Was this article helpful?

What's Next

Amazon Redshift Upsert-Merge Loading Mode Options

Table of contents

Introduction
Prerequisites
Configuration Process
Type Mapping