Connecting Multiple Kafka Clusters in ClickHouse Using Named Collections

September 25, 2023

Introduction:

ClickHouse is a fast columnar database, and one of its strengths is its integration with external data sources such as Kafka. Modern data architectures often involve more than one Kafka cluster, and ClickHouse’s Named Collections feature makes such setups easier to manage. This guide shows how to use it to connect a single ClickHouse server to two distinct Kafka clusters.

Why Use Named Collections?

Before diving into the configuration, it helps to understand what Named Collections offer. They allow us to:

Reduce Repetition: Define connection settings once instead of repeating them in every table definition.
Centralize Management: Maintain all configurations in a single, easily manageable location.
Improve Security: Safeguard sensitive credentials, keeping them out of the reach of non-administrative users.

Configuring Named Collections for Kafka:

With the value of Named Collections established, let’s connect to two distinct Kafka clusters: primary and secondary.

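XML Configuration:

<clickhouse>
    <named_collections>
        <!-- Primary Kafka cluster -->
        <primary_kafka_cluster>
            <kafka_broker_list>primary-kafka-cluster:9094</kafka_broker_list>
            <kafka>
                <client_id>primary_kafka_client</client_id>
                <security_protocol>SASL_PLAINTEXT</security_protocol>
                <sasl_mechanism>SCRAM-SHA-512</sasl_mechanism>
                <sasl_username>clickhouse_primary</sasl_username>
                <sasl_password>primary_secret_password</sasl_password>
            </kafka>
        </primary_kafka_cluster>
        <!-- Secondary Kafka cluster -->
        <secondary_kafka_cluster>
            <kafka_broker_list>backup-kafka-cluster:9095</kafka_broker_list>
            <kafka>
                <client_id>secondary_kafka_client</client_id>
                <security_protocol>SASL_PLAINTEXT</security_protocol>
                <sasl_mechanism>SCRAM-SHA-512</sasl_mechanism>
                <sasl_username>clickhouse_secondary</sasl_username>
                <sasl_password>secondary_secret_password</sasl_password>
            </kafka>
        </secondary_kafka_cluster>
    </named_collections>
</clickhouse>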

For more detail on the configuration, refer to Pull Request #31691. Starting from ClickHouse v21.12, it provides a more streamlined approach to using named_collections.
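Once the server has loaded the configuration, it can be useful to confirm that both collections are visible before creating any tables. The query below is a minimal sketch. It assumes a ClickHouse release recent enough to expose the system.named_collections table and a user that is permitted to see named collections; on older versions this table may not exist.

```sql
-- Assumption: system.named_collections is available on this server and the
-- current user has permission to list named collections.
SELECT name
FROM system.named_collections
WHERE name IN ('primary_kafka_cluster', 'secondary_kafka_cluster');
```

If either name is missing from the result, the XML file was most likely not picked up, or the collection name in the configuration differs from the one used in the table definitions.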

Setting Up Permanent Storage: MergeTree Table

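After configuring our Kafka connections, the focus shifts to ClickHouse itself. For each cluster we’ll create three objects: a Kafka engine table that consumes the topic, a MergeTree table that stores the data permanently, and a materialized view that moves rows from one to the other.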

1. Kafka Engine Table:

To read from our Kafka topics, we’ll define tables in ClickHouse using the Kafka engine, passing the name of the named collection as the engine argument. Here’s how you can define these tables:

For the primary Kafka cluster:

CREATE TABLE kafka_cluster_a
(
    `id` UInt32,
    `first_name` String,
    `last_name` String
)
-- Broker list and SASL credentials come from the named collection
ENGINE = Kafka(primary_kafka_cluster)
SETTINGS kafka_topic_list = 'your_topic_name_for_primary',
         kafka_group_name = 'your_consumer_group_for_primary',
         kafka_format = 'JSONEachRow';

For the secondary Kafka cluster:

CREATE TABLE kafka_cluster_b
(
    `id` UInt32,
    `first_name` String,
    `last_name` String
)
-- Broker list and SASL credentials come from the named collection
ENGINE = Kafka(secondary_kafka_cluster)
SETTINGS kafka_topic_list = 'your_topic_name_for_secondary',
         kafka_group_name = 'your_consumer_group_for_secondary',
         kafka_format = 'JSONEachRow';

2. MergeTree Table:

We’ll use the MergeTree table to persistently store the data streamed from Kafka:

For kafka_cluster_a:

CREATE TABLE cluster_a_storage
(
    `id` UInt32,
    `first_name` String,
    `last_name` String
) ENGINE = MergeTree()
ORDER BY id;

For kafka_cluster_b:

CREATE TABLE cluster_b_storage
(
    `id` UInt32,
    `first_name` String,
    `last_name` String
) ENGINE = MergeTree()
ORDER BY id;

3. Materialized View:

The materialized view acts as the consumer of the Kafka engine table: it reads each batch of messages and writes the rows into the MergeTree table:

For kafka_cluster_a:

CREATE MATERIALIZED VIEW cluster_a_mv TO cluster_a_storage AS
SELECT
    id,
    first_name,
    last_name
FROM kafka_cluster_a;

For kafka_cluster_b:

CREATE MATERIALIZED VIEW cluster_b_mv TO cluster_b_storage AS
SELECT
    id,
    first_name,
    last_name
FROM kafka_cluster_b;
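With the views in place, a quick check can confirm that rows are actually reaching the storage tables. The following is a sketch rather than a definitive procedure. The system.kafka_consumers table is an assumption about the server version (it is only present on newer releases, around 23.8 and later), so skip that query if your server does not have it.

```sql
-- Row counts should increase as messages are published to each topic.
SELECT count() FROM cluster_a_storage;
SELECT count() FROM cluster_b_storage;

-- Assumption: system.kafka_consumers exists on this ClickHouse version.
-- It reports per-consumer state such as assignments and recent errors.
SELECT *
FROM system.kafka_consumers
FORMAT Vertical;
```

Query the MergeTree tables rather than the Kafka engine tables when checking: a direct SELECT from a Kafka engine table consumes messages, so they would no longer reach the materialized view.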

Practical Applications:

With the above groundwork, ClickHouse continuously ingests and stores data from both Kafka clusters. Any message published to the configured topics is consumed in batches and becomes queryable in the corresponding MergeTree table shortly afterwards. This is particularly useful for teams that need near-real-time analytics across data originating from separate Kafka environments.
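As an illustration of querying both sources together, the sketch below combines the two storage tables and labels each row with its origin. The source column and its 'primary' and 'secondary' labels are hypothetical names introduced here for the example; they are not part of the tables defined earlier.

```sql
-- Compare volume and variety of records received from each Kafka cluster.
SELECT
    source,
    count() AS record_count,
    uniqExact(last_name) AS distinct_last_names
FROM
(
    SELECT 'primary' AS source, id, first_name, last_name
    FROM cluster_a_storage
    UNION ALL
    SELECT 'secondary' AS source, id, first_name, last_name
    FROM cluster_b_storage
)
GROUP BY source
ORDER BY source;
```

Keeping one storage table per cluster, as this guide does, makes it easy to tell where a row came from. The trade-off is that cross-cluster queries need a UNION ALL like the one above.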

Conclusion:

With ClickHouse’s Named Collections, connecting to multiple Kafka clusters becomes not just possible but efficient and organized. Each cluster’s connection details live in one place, and data from every cluster becomes available for querying soon after it is produced, which simplifies real-time analytics.
