
Snowflake SQL translation guide

Data types

Numerics

Users moving data between ClickHouse and Snowflake will immediately notice that ClickHouse offers more granular control when declaring numeric types. For example, Snowflake offers the type Number for numerics. This requires the user to specify a precision (total number of digits) and scale (digits to the right of the decimal place) up to a total of 38. Integer declarations are synonymous with Number and simply define a fixed precision and scale with the same range. This convenience is possible because modifying the precision (scale is 0 for integers) does not impact the size of data on disk in Snowflake - the minimal required bytes are used for a numeric range at write time at a micro-partition level. The scale does, however, impact storage space, although this is offset by compression. A Float64 type offers a wider range of values with a loss of precision.

Contrast this with ClickHouse, which offers multiple signed and unsigned precisions for integers and floats. With these, ClickHouse users can be explicit about the precision required for integers to optimize storage and memory overhead. A Decimal type, equivalent to Snowflake's Number type, also offers twice the precision and scale at 76 digits. In addition to a comparable Float64 value, ClickHouse also provides a Float32 for when precision is less critical and compression paramount.
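
As an illustration of the mapping, the sketch below (using a hypothetical trades table) shows a Snowflake NUMBER-based declaration alongside a ClickHouse equivalent with explicit integer and decimal widths:

```sql
-- Snowflake: NUMBER(p, s) covers both integers and decimals
-- CREATE TABLE trades (quantity NUMBER(10,0), price NUMBER(38,10));

-- ClickHouse (hypothetical table): pick the narrowest type that fits the data
CREATE TABLE trades
(
    quantity UInt32,          -- unsigned 4-byte integer instead of a generic NUMBER
    price    Decimal(38, 10)  -- Decimal supports up to 76 digits of precision if needed
)
ENGINE = MergeTree
ORDER BY quantity;
```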

Strings

ClickHouse and Snowflake take contrasting approaches to the storage of string data. The VARCHAR in Snowflake holds Unicode characters in UTF-8, allowing the user to specify a maximum length. This length has no impact on storage or performance, with the minimum number of bytes always used to store a string; it only provides a constraint that can be useful for downstream tooling. Other types, such as Text and NChar, are simply aliases for this type. ClickHouse conversely stores all string data as raw bytes with a String type (no length specification required), deferring encoding to the user, with query-time functions available for different encodings. We refer the reader to "Opaque data argument" for the motivation behind this. The ClickHouse String is thus more comparable to the Snowflake Binary type in its implementation. Both Snowflake and ClickHouse support “collation”, allowing users to override how strings are sorted and compared.
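
As a minimal sketch (hypothetical events table), a Snowflake VARCHAR with a length limit maps to a plain String in ClickHouse, with the limit optionally re-imposed as a table constraint:

```sql
-- Snowflake: the length argument of VARCHAR is only a constraint, not a storage hint
-- CREATE TABLE events (message VARCHAR(255));

-- ClickHouse (hypothetical table): String has no length parameter
CREATE TABLE events
(
    message String,
    -- optional: roughly mimic VARCHAR(255) at insert time
    -- (length() counts bytes; lengthUTF8() counts code points)
    CONSTRAINT message_len CHECK length(message) <= 255
)
ENGINE = MergeTree
ORDER BY tuple();
```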

Semi-structured types

Snowflake supports the VARIANT, OBJECT and ARRAY types for semi-structured data.

ClickHouse offers the equivalent Variant, Object (deprecated) and Array types. Additionally, ClickHouse has the JSON type, which replaces the now deprecated Object('json') type and is particularly performant and storage-efficient in comparison to other native JSON types.
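
A brief, hypothetical sketch of the JSON type is shown below (depending on the ClickHouse version, a setting such as allow_experimental_json_type may need to be enabled first):

```sql
-- Hypothetical table: column types inside the JSON document are inferred at insert time
CREATE TABLE logs
(
    payload JSON
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO logs VALUES ('{"user": {"id": 42, "name": "alice"}, "tags": ["a", "b"]}');

-- nested paths can be queried directly
SELECT payload.user.id FROM logs;
```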

ClickHouse also supports named Tuples and arrays of Tuples via the Nested type, allowing users to explicitly map nested structures. Codecs and type optimizations can then be applied throughout the hierarchy, unlike Snowflake, which requires the user to use the OBJECT, VARIANT, and ARRAY types for the outer object and does not allow explicit internal typing. This internal typing also simplifies queries on nested numerics in ClickHouse, which do not need to be cast and can be used in index definitions.

In ClickHouse, codecs and optimized types can also be applied to substructures. This provides the added benefit that compression with nested structures remains excellent and comparable to that of flattened data. In contrast, as a result of the inability to apply specific types to substructures, Snowflake recommends flattening data to achieve optimal compression. Snowflake also imposes size restrictions for these data types.
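
The sketch below (hypothetical page_views table) illustrates the idea: nested sub-columns are declared with explicit types, and a codec can be attached to an individual sub-column. Declaring the parallel metrics.* arrays directly is equivalent to a Nested(name, value) declaration when nested flattening is enabled (the default):

```sql
-- Hypothetical table: the nested structure is mapped explicitly rather than stored as raw JSON
CREATE TABLE page_views
(
    url      String,
    visitor  Tuple(id UInt64, country String),
    -- a Nested column is stored as parallel arrays; declaring them explicitly
    -- lets each sub-column carry its own type and codec
    `metrics.name`   Array(LowCardinality(String)),
    `metrics.value`  Array(Float64) CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY url;

-- nested numerics can be aggregated without casting
SELECT url, arraySum(`metrics.value`) AS total FROM page_views GROUP BY url;
```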

Type reference

| Snowflake | ClickHouse | Note |
|-----------|------------|------|
| NUMBER | Decimal | ClickHouse supports twice the precision and scale of Snowflake - 76 digits vs. 38. |
| FLOAT, FLOAT4, FLOAT8 | Float32, Float64 | All floats in Snowflake are 64-bit. |
| VARCHAR | String | |
| BINARY | String | |
| BOOLEAN | Bool | |
| DATE | Date, Date32 | DATE in Snowflake offers a wider date range than ClickHouse, e.g. the minimum for Date32 is 1900-01-01 and for Date is 1970-01-01. Date in ClickHouse provides more cost-efficient (two-byte) storage. |
| TIME(N) | No direct equivalent, but can be represented by DateTime and DateTime64(N). | DateTime64 uses the same concept of precision. |
| TIMESTAMP - TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ | DateTime, DateTime64 | DateTime and DateTime64 can optionally have a TZ parameter defined for the column. If not present, the server's timezone is used. Additionally, a --use_client_time_zone parameter is available for the client. |
| VARIANT | JSON, Tuple, Nested | The JSON type infers the column types at insert time. Tuple, Nested, and Array can also be used to build explicitly typed structures as an alternative. |
| OBJECT | Tuple, Map, JSON | Both OBJECT and Map are analogous to the JSON type in ClickHouse, where the keys are a String. ClickHouse requires the value to be consistent and strongly typed, whereas Snowflake uses VARIANT, meaning the values of different keys can be of different types. If this is required in ClickHouse, explicitly define the hierarchy using Tuple or rely on the JSON type. |
| ARRAY | Array, Nested | ARRAY in Snowflake uses VARIANT for the elements - a super type. Conversely, these are strongly typed in ClickHouse. |
| GEOGRAPHY | Point, Ring, Polygon, MultiPolygon | Snowflake imposes a coordinate system (WGS 84), while ClickHouse applies one at query time. |
| GEOMETRY | Point, Ring, Polygon, MultiPolygon | |
The following ClickHouse types have no direct Snowflake counterpart but can be useful when migrating:

| ClickHouse Type | Description |
|-----------------|-------------|
| IPv4 and IPv6 | IP-specific types, potentially allowing more efficient storage than Snowflake. |
| FixedString | Allows a fixed length of bytes to be used, which is useful for hashes. |
| LowCardinality | Allows any type to be dictionary encoded. Useful when the cardinality is expected to be < 100k. |
| Enum | Allows efficient encoding of named values in either 8 or 16-bit ranges. |
| UUID | For efficient storage of UUIDs. |
| Array(Float32) | Vectors can be represented as an Array of Float32 with supported distance functions. |
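
As a hypothetical sketch combining several of these types in one table definition:

```sql
-- Hypothetical table using ClickHouse-specific types from the list above
CREATE TABLE access_log
(
    id          UUID,
    client_ip   IPv4,
    user_agent  LowCardinality(String),          -- dictionary-encoded; suits low-cardinality values
    status      Enum8('ok' = 1, 'error' = 2),    -- named values stored in a single byte
    body_hash   FixedString(32),                 -- raw 32-byte (256-bit) hash
    embedding   Array(Float32)                   -- vector usable with distance functions
)
ENGINE = MergeTree
ORDER BY (client_ip, id);
```

Distance functions such as L2Distance or cosineDistance can then be applied to the embedding column.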

Finally, ClickHouse offers the unique ability to store the intermediate state of aggregate functions. This state is implementation-specific, but allows the result of an aggregation to be stored and later queried (with corresponding merge functions). Typically, this feature is used via a materialized view and, as demonstrated below, offers the ability to improve the performance of specific queries with minimal storage cost by storing the incremental result of queries over inserted data (more details here).
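
As a minimal sketch of the pattern, assuming a hypothetical trips table (names and schema are illustrative only):

```sql
-- Hypothetical source table
CREATE TABLE trips
(
    pickup_date Date,
    fare        Float64
)
ENGINE = MergeTree
ORDER BY pickup_date;

-- Target table storing the intermediate aggregation state per day
CREATE TABLE trips_daily
(
    pickup_date Date,
    avg_fare    AggregateFunction(avg, Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY pickup_date;

-- Materialized view incrementally populates the state on every insert into trips
CREATE MATERIALIZED VIEW trips_daily_mv TO trips_daily AS
SELECT pickup_date, avgState(fare) AS avg_fare
FROM trips
GROUP BY pickup_date;

-- At query time, merge the stored states instead of scanning the raw rows
SELECT pickup_date, avgMerge(avg_fare) AS avg_fare
FROM trips_daily
GROUP BY pickup_date;
```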