Skip to main content
Skip to main content
Edit this page

Amazon Redshift SQL translation guide

Data types

Users moving data between ClickHouse and Redshift will immediately notice that ClickHouse offers a more extensive range of types, which are also less restrictive. While Redshift requires users to specify possible string lengths, even if variable, ClickHouse removes this restriction and burden from the user by storing strings without encoding as bytes. The ClickHouse String type thus has no limits or length specification requirements.

Furthermore, users can exploit Arrays, Tuples, and Enums - absent from Redshift as first-class citizens (although Arrays/Structs can be imitated with SUPER) and a common frustration of users. ClickHouse additionally allows the persistence, either at query time or even in a table, of aggregation states. This will enable data to be pre-aggregated, typically using a materialized view, and can dramatically improve query performance for common queries.

Below we map the equivalent ClickHouse type for each Redshift type:

RedshiftClickHouse
SMALLINTInt8 *
INTEGERInt32 *
BIGINTInt64 *
DECIMALUInt128, UInt256, Int128, Int256, Decimal(P, S), Decimal32(S), Decimal64(S), Decimal128(S), Decimal256(S) - (high precision and ranges possible)
REALFloat32
DOUBLE PRECISIONFloat64
BOOLEANBool
CHARString, FixedString
VARCHAR **String
DATEDate32
TIMESTAMPDateTime, DateTime64
TIMESTAMPTZDateTime, DateTime64
GEOMETRYGeo Data Types
GEOGRAPHYGeo Data Types (less developed e.g. no coordinate systems - can be emulated with functions)
HLLSKETCHAggregateFunction(uniqHLL12, X)
SUPERTuple, Nested, Array, JSON, Map
TIMEDateTime, DateTime64
TIMETZDateTime, DateTime64
VARBYTE **String combined with Bit and Encoding functions
* ClickHouse additionally supports unsigned integers with extended ranges i.e. UInt8, UInt32, UInt32 and UInt64.
**ClickHouse’s String type is unlimited by default but can be constrained to specific lengths using Constraints.

DDL syntax

Sorting keys

Both ClickHouse and Redshift have the concept of a “sorting key”, which define how data is sorted when being stored. Redshift defines the sorting key using the SORTKEY clause:

Comparatively, ClickHouse uses an ORDER BY clause to specify the sort order:

In most cases, you can use the same sorting key columns and order in ClickHouse as Redshift, assuming you are using the default COMPOUND type. When data is added to Redshift, you should run the VACUUM and ANALYZE commands to re-sort newly added data and update the statistics for the query planner - otherwise, the unsorted space grows. No such process is required for ClickHouse.

Redshift supports a couple of convenience features for sorting keys. The first is automatic sorting keys (using SORTKEY AUTO). While this may be appropriate for getting started, explicit sorting keys ensure the best performance and storage efficiency when the sorting key is optimal. The second is the INTERLEAVED sort key, which gives equal weight to a subset of columns in the sort key to improve performance when a query uses one or more secondary sort columns. ClickHouse supports explicit projections, which achieve the same end-result with a slightly different setup.

Users should be aware that the “primary key” concept represents different things in ClickHouse and Redshift. In Redshift, the primary key resembles the traditional RDMS concept intended to enforce constraints. However, they are not strictly enforced in Redshift and instead act as hints for the query planner and data distribution among nodes. In ClickHouse, the primary key denotes columns used to construct the sparse primary index, used to ensure the data is ordered on disk, maximizing compression while avoiding pollution of the primary index and wasting memory.