ClickHouse 是由 Yandex 于 2016 年开源的列式数据库管理系统(DBMS),专为在线分析处理(OLAP)场景设计。ClickHouse 以其 极高的查询性能、高效的列式存储、强大的数据压缩 著称。
ClickHouse 的核心定位是 实时 OLAP 分析数据库。它提供了:
ClickHouse 于 2016 年由 Yandex 开源,最初用于 Yandex.Metrica(类似 Google Analytics)的实时数据存储和分析。ClickHouse 需要处理每秒数千亿行数据,并支持复杂的实时分析查询。
-- 创建表(MergeTree 引擎)
CREATE TABLE events (
event_time DateTime,
user_id UInt64,
event_type String,
session_id String,
page_url String,
duration UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time, user_id)
SETTINGS index_granularity = 8192;
-- 插入数据
INSERT INTO events (event_time, user_id, event_type, session_id, page_url, duration)
VALUES
('2024-01-15 10:00:00', 1001, 'view', 'sess_001', '/home', 120),
('2024-01-15 10:05:00', 1001, 'click', 'sess_001', '/product/123', 0),
('2024-01-15 10:10:00', 1002, 'view', 'sess_002', '/about', 80);
-- 批量插入(推荐)
INSERT INTO events
SELECT
toDateTime('2024-01-15 10:00:00') + toIntervalSecond(number),
rand() % 10000,
['view', 'click', 'purchase', 'share'][rand() % 4 + 1],
concat('sess_', toString(rand() % 1000)),
concat('/page_', toString(rand() % 100)),
rand() % 300
FROM numbers(1000000);
-- 基础查询
SELECT COUNT(*) FROM events;
-- 分组统计
SELECT
event_type,
COUNT(*) as count,
AVG(duration) as avg_duration
FROM events
WHERE event_time >= '2024-01-15'
GROUP BY event_type
ORDER BY count DESC;
-- 时间聚合(按小时)
SELECT
toStartOfHour(event_time) as hour,
COUNT(*) as events,
COUNT(DISTINCT user_id) as users
FROM events
GROUP BY hour
ORDER BY hour;
-- 漏斗分析
SELECT
event_type,
COUNT(DISTINCT user_id) as users
FROM events
WHERE event_time BETWEEN '2024-01-15' AND '2024-01-16'
GROUP BY event_type
ORDER BY event_type;
-- 窗口函数
SELECT
user_id,
event_time,
event_type,
row_number() OVER (PARTITION BY user_id ORDER BY event_time) as event_seq
FROM events
WHERE event_time >= '2024-01-15'
LIMIT 100;
-- 物化视图(预聚合)
CREATE MATERIALIZED VIEW events_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type)
AS SELECT
toStartOfHour(event_time) as hour,
event_type,
COUNT(*) as count,
SUM(duration) as total_duration
FROM events
GROUP BY hour, event_type;
-- 查询物化视图
SELECT * FROM events_hourly
WHERE hour >= '2024-01-15'
ORDER BY hour;
-- 分布式表(需要集群)
CREATE TABLE events_distributed
ENGINE = Distributed('cluster_name', 'default', 'events', rand());
-- 查询分布式表
SELECT COUNT(*) FROM events_distributed;
-- 窗口函数(排名)
SELECT
user_id,
SUM(duration) as total_duration,
rank() OVER (ORDER BY total_duration DESC) as rank
FROM events
GROUP BY user_id
ORDER BY rank
LIMIT 10;
-- JSON 支持
CREATE TABLE logs (
ts DateTime,
data JSON
) ENGINE = MergeTree()
ORDER BY ts;
INSERT INTO logs (ts, data)
VALUES
(now(), '{"level": "info", "message": "User logged in", "user_id": 1001}'),
(now(), '{"level": "error", "message": "Database connection failed", "code": 500}');
-- 查询 JSON 字段
SELECT
ts,
data.level,
data.message
FROM logs
WHERE data.level = 'error';
-- 创建 Kafka 表引擎(消费 Kafka 数据)
CREATE TABLE events_kafka (
event_time DateTime,
user_id UInt64,
event_type String,
session_id String,
page_url String,
duration UInt32
) ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'events',
kafka_group_name = 'clickhouse_consumer',
kafka_format = 'JSONEachRow';
-- 创建物化视图,自动写入 MergeTree
CREATE MATERIALIZED VIEW events_mv TO events
AS SELECT * FROM events_kafka;
| 对比项 | ClickHouse | MySQL | Elasticsearch | Snowflake |
|---|---|---|---|---|
| 类型 | 列式 OLAP | 行式 OLTP | 搜索引擎 | 云数仓 |
| 查询性能 | 极快 | 快 | 快 | 快 |
| 压缩比 | 5-10x | 1x | 1-2x | 5-10x |
| 写入速度 | 极高 | 高 | 高 | 高 |
| SQL 支持 | ✅ | ✅ | ❌ | ✅ |
| 事务支持 | ❌ | ✅ | ❌ | ❌ |
| 适用场景 | 实时分析 | OLTP | 全文搜索 | 企业数仓 |
ClickHouse 安装、SQL 基础、MergeTree 表引擎
分区、排序键、索引、物化视图、窗口函数
分布式表、集群配置、Kafka 集成、JSON 支持
日志分析平台、实时 BI 报表、性能调优、生产部署
ClickHouse 是实时分析数据库的性能之王。
它用 列式存储、向量化执行、高压缩比 实现了万亿行数据的毫秒级查询。ClickHouse 是日志分析、实时 BI 的最佳选择。
"ClickHouse 让实时分析变得简单而高效。" 📊