Using native and binary formats in ClickHouse

ClickHouse supports multiple binary formats, which offer better performance and space efficiency than text formats. Binary formats are also safe with respect to character encoding, since data is stored in binary form.

We're going to use the some_data table and its data for demonstration; feel free to reproduce it on your ClickHouse instance.
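
If you want to follow along, here is a minimal sketch of a matching table (the MergeTree engine and the ordering key are assumptions; the columns match the schema used throughout this article):

-- A hypothetical definition of the demo table used in this article
CREATE TABLE some_data
(
    path String,
    month Date,
    hits UInt32
)
ENGINE = MergeTree
ORDER BY (path, month);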

Exporting in a Native ClickHouse format

The most efficient format to export and import data between ClickHouse nodes is the Native format. Exporting is done using the INTO OUTFILE clause:

SELECT * FROM some_data
INTO OUTFILE 'data.clickhouse' FORMAT Native

This will create the data.clickhouse file in the Native format.

Importing from a Native format

To import data, we can use file() for smaller files or exploration purposes:

DESCRIBE file('data.clickhouse', Native);
┌─name──┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ path  │ String │              │                    │         │                  │                │
│ month │ Date   │              │                    │         │                  │                │
│ hits  │ UInt32 │              │                    │         │                  │                │
└───────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
Tip

When using the file() function with ClickHouse Cloud, you will need to run the commands in clickhouse client on the machine where the file resides. Another option is to use clickhouse-local to explore files locally.
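
For example, assuming clickhouse-local is installed, the same exploration can be done entirely locally (a sketch; the query is identical to the one above):

clickhouse-local -q "DESCRIBE file('data.clickhouse', Native)"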

In production, we use FROM INFILE to import data:

INSERT INTO sometable
FROM INFILE 'data.clickhouse'
FORMAT Native
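
Here sometable is assumed to already exist with a structure compatible with the file. One way to create it is to clone the structure (and engine) of the source table:

-- Assumes some_data exists; copies its structure and engine
CREATE TABLE sometable AS some_data;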

Native format compression

We can also enable compression while exporting data to the Native format (as well as to most other formats) using the COMPRESSION clause:

SELECT * FROM some_data
INTO OUTFILE 'data.clickhouse'
COMPRESSION 'lz4'
FORMAT Native

We've used LZ4 compression for the export. The same compression method has to be specified when importing the data:

INSERT INTO sometable
FROM INFILE 'data.clickhouse'
COMPRESSION 'lz4'
FORMAT Native
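
Note that recent ClickHouse versions can also infer the compression method from the file extension, in which case the explicit COMPRESSION clause can be omitted (a sketch; behavior depends on your version):

SELECT * FROM some_data
INTO OUTFILE 'data.clickhouse.lz4'
FORMAT Native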

Exporting to RowBinary

Another binary format supported is RowBinary, which allows importing and exporting data in binary-represented rows:

SELECT * FROM some_data
INTO OUTFILE 'data.binary' FORMAT RowBinary

This will generate a data.binary file in a binary row format.

Exploring RowBinary files

Automatic schema inference is not supported for this format, so to explore a file before loading it, we have to define the schema explicitly:

SELECT *
FROM file('data.binary', RowBinary, 'path String, month Date, hits UInt32')
LIMIT 5
┌─path───────────────────────────┬──────month─┬─hits─┐
│ Bangor_City_Forest             │ 2015-07-01 │   34 │
│ Alireza_Afzal                  │ 2017-02-01 │   24 │
│ Akhaura-Laksam-Chittagong_Line │ 2015-09-01 │   30 │
│ 1973_National_500              │ 2017-10-01 │   80 │
│ Attachment                     │ 2017-09-01 │ 1356 │
└────────────────────────────────┴────────────┴──────┘

Consider using RowBinaryWithNames, which also adds a header row with the list of column names. RowBinaryWithNamesAndTypes additionally adds a header row with column types.
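
Since RowBinaryWithNamesAndTypes stores column names and types in its header, recent ClickHouse versions can infer the schema from such files, so the explicit structure becomes unnecessary (a sketch; the data_full.binary filename is an assumption):

SELECT * FROM some_data
INTO OUTFILE 'data_full.binary' FORMAT RowBinaryWithNamesAndTypes

DESCRIBE file('data_full.binary', RowBinaryWithNamesAndTypes)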

Importing from RowBinary files

To load data from a RowBinary file, we can use a FROM INFILE clause:

INSERT INTO sometable
FROM INFILE 'data.binary'
FORMAT RowBinary

Importing a single binary value using RawBLOB

Suppose we want to read an entire binary file and save it into a field in a table. This is a case where the RawBLOB format can be used. This format can only be used with a single-column table:

CREATE TABLE images(data String) Engine = Memory

Let's save an image file to the images table:

cat image.jpg | clickhouse-client -q "INSERT INTO images FORMAT RawBLOB"

We can check the length of the data field, which will be equal to the original file size:

SELECT length(data) FROM images
┌─length(data)─┐
│         6121 │
└──────────────┘

Exporting RawBLOB data

This format can also be used to export data using an INTO OUTFILE clause:

SELECT * FROM images LIMIT 1
INTO OUTFILE 'out.jpg'
FORMAT RawBLOB

Note that we had to use LIMIT 1 because exporting more than a single value will create a corrupted file.
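
To sanity-check the round trip, the exported file can be compared with the original, for example by hashing both (a sketch, assuming a Unix-like shell):

md5sum image.jpg out.jpg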

MessagePack

ClickHouse supports importing from and exporting to MessagePack using the MsgPack format. To export to the MessagePack format:

SELECT *
FROM some_data
INTO OUTFILE 'data.msgpk'
FORMAT MsgPack

To import data from a MessagePack file:

INSERT INTO sometable
FROM INFILE 'data.msgpk'
FORMAT MsgPack
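
As with other binary formats, file() can be used to peek at the data first; a sketch with an explicitly defined structure:

SELECT *
FROM file('data.msgpk', MsgPack, 'path String, month Date, hits UInt32')
LIMIT 5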

Protocol Buffers

To work with Protocol Buffers we first need to define a schema file:

syntax = "proto3";

message MessageType {
  string path = 1;
  uint32 month = 2;  // proto3 has no date type; ClickHouse serializes Date as an integer
  uint32 hits = 3;
};

The path to this schema file (schema.proto in our case) is set in the format_schema setting for the Protobuf format:

SELECT * FROM some_data
INTO OUTFILE 'proto.bin'
FORMAT Protobuf
SETTINGS format_schema = 'schema:MessageType'

This saves data to the proto.bin file. ClickHouse also supports importing Protobuf data, including nested messages. Consider using ProtobufSingle to work with a single Protocol Buffer message (length delimiters will be omitted in this case).
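
Importing uses the same schema setting; a sketch (the exact placement of the SETTINGS clause may vary between versions):

INSERT INTO sometable
FROM INFILE 'proto.bin'
SETTINGS format_schema = 'schema:MessageType'
FORMAT Protobuf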

Cap’n Proto

Another popular binary serialization format supported by ClickHouse is Cap’n Proto. Similarly to Protobuf format, we have to define a schema file (schema.capnp) in our example:

@0xec8ff1a10aa10dbe;

struct PathStats {
  path @0 :Text;
  month @1 :UInt32;
  hits @2 :UInt32;
}

Now we can import and export using CapnProto format and this schema:

SELECT
    path,
    CAST(month, 'UInt32') AS month,
    hits
FROM some_data
INTO OUTFILE 'capnp.bin'
FORMAT CapnProto
SETTINGS format_schema = 'schema:PathStats'

Note that we had to cast the Date column to UInt32 to match the corresponding type in the schema.
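
Importing works the other way around: read with the schema and cast back to Date. A sketch using the file() function (the explicit structure mirrors the Cap'n Proto schema above):

INSERT INTO sometable
SELECT
    path,
    CAST(month, 'Date') AS month,
    hits
FROM file('capnp.bin', CapnProto, 'path String, month UInt32, hits UInt32')
SETTINGS format_schema = 'schema:PathStats'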

Other formats

ClickHouse supports many formats, both text and binary, to cover various scenarios and platforms. Explore more formats and ways to work with them in the following articles:

Also check out clickhouse-local, a portable full-featured tool for working with local/remote files without starting a ClickHouse server.