Disclaimer: This blog post is automatically generated from project documentation and technical proposals using AI assistance. The content represents our development journey and architectural decisions. Code examples are simplified illustrations and may not reflect the exact production implementation.
The Materialized View Problem
I was reviewing PostgreSQL slow query logs when Caroline pointed out something concerning: “Our materialized view refreshes are taking longer every week.”
She pulled up the stats:
SELECT pid, query, query_start, now() - query_start AS refresh_duration
FROM pg_stat_activity
WHERE query ILIKE 'REFRESH MATERIALIZED VIEW%'
ORDER BY refresh_duration DESC;
The match_statistics view was taking 18 seconds to refresh. The player_momentum view was at 12 seconds.
“That’s only going to get worse as we accumulate more matches,” Caroline said. “And these views lock tables during refresh.”
Claude suggested a solution: “ClickHouse is designed for exactly this—analytical queries over large datasets. Keep PostgreSQL for transactions, use ClickHouse for analytics.”
We decided to migrate.
Architecture: OLTP vs OLAP
Caroline drew the architecture on the whiteboard:
| PostgreSQL (OLTP) | ClickHouse (OLAP) |
|---|---|
| Match state | Shot history |
| Event sourcing | Score changes |
| Transactions | Analytics queries |
| Current scores | Time-series analysis |
The key insight: separate transactional and analytical workloads.
“PostgreSQL is perfect for event sourcing,” Claude explained. “It has strong consistency, transactions, and foreign keys. But for analytical queries over millions of shots, ClickHouse is 10-100x faster.”
I asked: “How do we keep them in sync?”
“NATS,” Caroline answered. “Events flow through NATS to both databases. PostgreSQL stores the current state, ClickHouse stores the analytical history.”
Data Flow
Match Events → NATS Stream → Projections Service
                                     │
                          ┌──────────┴──────────┐
                          ↓                     ↓
                     PostgreSQL            ClickHouse
                  (Current state)          (Analytics)
Phase 1: Infrastructure Setup (1 Day)
Caroline started with the Docker Compose configuration:
clickhouse:
  image: clickhouse/clickhouse-server:24.3
  container_name: clickhouse
  ports:
    - '8123:8123' # HTTP interface
    - '9000:9000' # Native protocol
  volumes:
    - clickhouse-data:/var/lib/clickhouse
    - ./db/clickhouse/migrations:/docker-entrypoint-initdb.d
  environment:
    CLICKHOUSE_DB: shock
    CLICKHOUSE_USER: shock
    CLICKHOUSE_PASSWORD: shock
“ClickHouse automatically runs .sql files from /docker-entrypoint-initdb.d on first startup,” she explained. “No manual migration runner needed.”
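Before touching the schema, it is worth confirming that the container and credentials are wired up. A minimal check with the official @clickhouse/client package might look like this (connection values mirror the Compose file above and are illustrative):

import { createClient } from '@clickhouse/client';

// Connects over the HTTP interface exposed on port 8123 and pings the server.
const clickhouse = createClient({
  url: 'http://localhost:8123',
  username: 'shock',
  password: 'shock',
  database: 'shock',
});

const ping = await clickhouse.ping();
if (!ping.success) {
  throw new Error('ClickHouse is not reachable');
}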
We created the initial schema:
-- db/clickhouse/migrations/001_initial_schema.sql
CREATE DATABASE IF NOT EXISTS shock;
CREATE TABLE IF NOT EXISTS shock.events (
event_id String,
match_id String,
event_type String,
event_timestamp DateTime64(3),
event_data String, -- Full JSON for flexibility
created_at DateTime64(3) DEFAULT now64(3)
) ENGINE = ReplacingMergeTree(created_at)
PARTITION BY toYYYYMM(event_timestamp)
ORDER BY (match_id, event_timestamp, event_id);
Caroline noted: “We use ReplacingMergeTree with created_at as the version column. This handles duplicate inserts—NATS delivers at-least-once, so we need deduplication.”
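One caveat: ReplacingMergeTree deduplicates only when background merges run, so a read can still see both copies of an event for a while. When a query must be duplicate-free, FINAL forces deduplication at read time. A sketch using the client from the check above (the WHERE clause is illustrative):

// FINAL collapses duplicate rows using the ReplacingMergeTree version column
// (created_at), trading some query speed for exact results.
const dedupedEvents = await clickhouse.query({
  query: `
    SELECT event_id, event_type, event_timestamp
    FROM shock.events FINAL
    WHERE match_id = {matchId:String}
    ORDER BY event_timestamp
  `,
  query_params: { matchId: 'some-match-id' },
  format: 'JSONEachRow',
});

For large scans, grouping with argMax(..., created_at) is the usual cheaper alternative to FINAL.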
Phase 2: Event Projection (1 Day)
Next, we needed to stream events from NATS to ClickHouse. Caroline extended our existing projections service:
ClickHouse Client Setup
// app/packages/matches-app/index.mts
import { createClient } from '@clickhouse/client';
const clickhouse = createClient({
url: config.clickhouse.url,
username: config.clickhouse.username,
password: config.clickhouse.password,
database: config.clickhouse.database,
});
const app = new Application(database, nats, clickhouse, tracer, logger);
“Now ClickHouse is a first-class dependency,” Caroline explained. “Just like PostgreSQL.”
Batch Processing
We created a ClickHouseWriter class with batch support:
import type { ClickHouseClient } from '@clickhouse/client';

class ClickHouseWriter {
  constructor(private readonly clickhouse: ClickHouseClient) {}

  async write(events: DomainEvent[]): Promise<void> {
    // Map domain events to flat rows matching the shock.events schema
    const rows = events.map((event) => ({
      event_id: event.id,
      match_id: event.detail.matchId,
      event_type: event.type,
      event_timestamp: new Date(event.detail.eventTime),
      event_data: JSON.stringify(event.detail),
    }));

    await this.clickhouse.insert({
      table: 'shock.events',
      values: rows,
      format: 'JSONEachRow',
    });
  }
}
Caroline integrated it with the projections service:
// app/packages/matches-projections/index.mts
const batchDuplex = new BatchDuplex({
maxBatchSize: 10,
maxWaitTime: 5000, // 5 seconds
});
consumer
.messages()
.pipe(batchDuplex)
.on('data', async (batch) => {
try {
await clickhouseWriter.write(batch);
} catch (error) {
logger.error(error, 'Failed to write batch to ClickHouse');
}
});
“We batch 10 events or wait 5 seconds, whichever comes first,” she explained. “This balances latency and throughput.”
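BatchDuplex is an in-house helper; conceptually it is just a size-or-time buffer. A rough equivalent written as a Node.js Transform stream, to show the shape of the idea (a sketch, not the production class):

import { Transform, type TransformCallback } from 'node:stream';

// Flushes a batch when it reaches maxBatchSize, or when maxWaitTime elapses,
// whichever comes first.
class BatchTransform<T> extends Transform {
  private batch: T[] = [];
  private timer: NodeJS.Timeout | null = null;

  constructor(private readonly maxBatchSize = 10, private readonly maxWaitTime = 5000) {
    super({ objectMode: true });
  }

  _transform(item: T, _enc: BufferEncoding, done: TransformCallback): void {
    this.batch.push(item);
    if (this.batch.length >= this.maxBatchSize) {
      this.emitBatch();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.emitBatch(), this.maxWaitTime);
    }
    done();
  }

  _flush(done: TransformCallback): void {
    this.emitBatch();
    done();
  }

  private emitBatch(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.batch.length > 0) {
      this.push(this.batch); // downstream receives an array of events
      this.batch = [];
    }
  }
}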
I asked: “What if a batch fails?”
“Retry with exponential backoff,” Claude answered. “3 attempts with 1s, 2s, 4s delays. If all fail, we log the error and continue processing the next batch.”
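The retry wrapper itself is small. A hedged sketch of that loop (the helper name is ours; the delays and attempt count follow Claude's description):

// Retry a ClickHouse batch write up to 3 times with 1s, 2s, 4s backoff.
// If every attempt fails, log and move on so the consumer keeps draining NATS.
async function writeWithRetry(
  writer: ClickHouseWriter,
  batch: DomainEvent[],
  attempts = 3,
): Promise<void> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      await writer.write(batch);
      return;
    } catch (error) {
      if (attempt === attempts - 1) {
        logger.error(error, 'ClickHouse batch failed after retries, skipping');
        return;
      }
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}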
Phase 3: Denormalized Schema (1 Day)
The initial schema stored everything as JSON. Caroline proposed denormalization:
“We should extract hot columns—the fields we query most often—into native ClickHouse types for better performance.”
Shots Table
CREATE TABLE IF NOT EXISTS shock.shots (
event_id String,
match_id String,
event_timestamp DateTime64(3),
-- Denormalized hot columns
player UInt8, -- Changed from String to save space
shot_type String, -- forehand, backhand, serve, volley
shot_outcome String, -- WINNER, ERROR, FORCED_ERROR, NEUTRAL
winner_team UInt8, -- 0 or 1 (team that won the point)
text String, -- Original description
-- Full event for flexibility
event_data String,
created_at DateTime64(3)
) ENGINE = ReplacingMergeTree(created_at)
PARTITION BY (match_id, toYYYYMMDD(event_timestamp))
ORDER BY (match_id, player, shot_type, event_timestamp);
Caroline explained the benefits:
- Storage: player as UInt8 (1 byte) vs String (~10 bytes) = ~90% savings
- Query speed: Native types avoid JSON parsing overhead (compare the two queries sketched after this list)
- Indexes: Sparse indexes on player and shot_type for filtering
- Compression: ClickHouse compresses native types better than JSON
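To make the query-speed point concrete, here is a rough before/after for a single filter, assuming a matchId in scope and a shotType key inside event_data (both names are illustrative): the JSON-only events table parses event_data on every row, while the denormalized shots table filters a native column.

// JSON route: every row pays for JSONExtractString over event_data
const viaJson = await clickhouse.query({
  query: `
    SELECT count()
    FROM shock.events
    WHERE match_id = {matchId:String}
      AND JSONExtractString(event_data, 'shotType') = 'forehand'
  `,
  query_params: { matchId },
});

// Denormalized route: shot_type is a native column covered by the sorting key
const viaColumn = await clickhouse.query({
  query: `
    SELECT count()
    FROM shock.shots
    WHERE match_id = {matchId:String}
      AND shot_type = 'forehand'
  `,
  query_params: { matchId },
});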
Score Changes Table
CREATE TABLE IF NOT EXISTS shock.score_changes (
event_id String,
match_id String,
event_timestamp DateTime64(3),
sequence UInt32,
-- Current game score
current_game_point_0 String, -- e.g., '0', '15', '30', '40', 'AD'
current_game_point_1 String,
-- Set and match status
current_set UInt8,
match_finished Boolean,
match_winner Int8, -- -1 if not finished, 0 or 1 for winner
-- Sets data (JSON due to variable length)
sets String,
event_data String,
created_at DateTime64(3)
) ENGINE = ReplacingMergeTree(created_at)
PARTITION BY (match_id, toYYYYMMDD(event_timestamp))
ORDER BY (match_id, sequence, event_timestamp);
Phase 4: Query Migration (2 Days)
Now the hard part: migrating 11 analytical queries from PostgreSQL to ClickHouse.
Migration Strategy
Caroline created a systematic approach:
- Feature flags - Run both PostgreSQL and ClickHouse versions in parallel (see the sketch after this list)
- Identical interfaces - Same input parameters, same output structure
- Query optimization - Leverage ClickHouse-specific features
- Testing - Compare results between implementations
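A minimal sketch of what the flag plus parallel comparison could look like (the flag name and query-handler names are placeholders, not the production wiring):

// While the flag is off, serve PostgreSQL results and run ClickHouse in
// shadow mode, logging any mismatch before trusting the new path.
async function getPlayerMomentum(matchId: string): Promise<unknown[]> {
  if (config.featureFlags.clickhouseAnalytics) {
    return getPlayerMomentumClickhouse(matchId);
  }

  const fromPostgres = await getPlayerMomentumPostgres(matchId);
  try {
    const fromClickhouse = await getPlayerMomentumClickhouse(matchId);
    if (JSON.stringify(fromClickhouse) !== JSON.stringify(fromPostgres)) {
      logger.warn({ matchId }, 'ClickHouse result differs from PostgreSQL');
    }
  } catch (error) {
    logger.error(error, 'ClickHouse shadow query failed');
  }
  return fromPostgres;
}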
Example: Player Momentum
The PostgreSQL version used materialized views:
-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW match_momentum AS
SELECT
  match_id,
  -- One-minute buckets derived from the shot timestamp
  date_trunc('minute', event_timestamp) AS time_window,
  player,
  SUM(CASE WHEN winner = player THEN 1 ELSE 0 END) AS points_won,
  SUM(CASE WHEN winner != player THEN 1 ELSE 0 END) AS points_lost
FROM shots
GROUP BY match_id, time_window, player;
Caroline migrated it to ClickHouse:
// get.player_momentum.clickhouse.query.mts
const query = `
WITH time_windows AS (
SELECT
match_id,
player,
floor(date_diff('second', match_start, event_timestamp) / 60) as time_window,
countIf(winner_team = team) as points_won,
countIf(winner_team != team) as points_lost
FROM shock.shots
WHERE match_id = {matchId:String}
GROUP BY match_id, player, time_window
),
momentum_calc AS (
  SELECT
    time_window,
    player,
    points_won,
    points_lost,
    -- Exponential decay (factor 0.7) over a lookback of 4 windows;
    -- lagInFrame is ClickHouse's equivalent of lag()
    points_won
      + 0.7 * lagInFrame(points_won, 1, 0) OVER w
      + 0.49 * lagInFrame(points_won, 2, 0) OVER w
      + 0.343 * lagInFrame(points_won, 3, 0) OVER w as weighted_points_won,
    points_lost
      + 0.7 * lagInFrame(points_lost, 1, 0) OVER w
      + 0.49 * lagInFrame(points_lost, 2, 0) OVER w
      + 0.343 * lagInFrame(points_lost, 3, 0) OVER w as weighted_points_lost
  FROM time_windows
  WINDOW w AS (
    PARTITION BY player
    ORDER BY time_window
    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
  )
)
SELECT
  time_window,
  player,
  round(weighted_points_won / (weighted_points_won + weighted_points_lost) * 100, 2) as momentum_pct
FROM momentum_calc;
`;
const result = await clickhouse.query({
query,
query_params: { matchId },
});
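One detail the snippet above leaves out is reading the rows back: the client returns a result set that still has to be parsed. Something like this works (the row type is inferred from the SELECT list, and the exact json() shape depends on the client version and output format):

type MomentumRow = { time_window: number; player: number; momentum_pct: number };

// With the default JSON output format, the parsed body carries the rows under `data`.
const { data: momentum } = await result.json<MomentumRow>();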
“The logic is identical to PostgreSQL,” Caroline said. “But ClickHouse processes it 10-20x faster because of columnar storage and vectorized execution.”
Complex Query: Pressure Analysis
The most challenging migration was GetPressureAnalysis. It combines three scoring components:
- Base Weight (0.3-1.0): Shot outcome impact
- Situation Weight (1.0-5.0): Game score context (deuce, game point, advantage)
- Momentum Factor (0.5-1.5): Recent point trend with exponential decay
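For intuition, the three components multiply per shot: a WINNER hit at advantage during a positive run scores roughly 1.0 × 5.0 × 1.3 ≈ 6.5, while a neutral rally shot at an ordinary score with flat momentum lands near 0.5 × 2.5 × 1.0 ≈ 1.25. The same stroke counts several times more under pressure.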
Caroline showed me the ClickHouse query:
const query = `
WITH scored_shots AS (
SELECT
s.event_timestamp,
s.player,
s.shot_outcome,
-- Base weight from outcome
multiIf(
s.shot_outcome = 'WINNER', 1.0,
s.shot_outcome = 'FORCED_ERROR', 0.8,
s.shot_outcome = 'UNFORCED_ERROR', 0.6,
s.shot_outcome = 'LET', 0.3,
0.5
) as base_weight,
-- Situation weight from game score
multiIf(
sc.current_game_point_0 = 'AD' OR sc.current_game_point_1 = 'AD', 5.0,
sc.current_game_point_0 = '40' AND sc.current_game_point_1 = '30', 4.0,
sc.current_game_point_0 = '40' AND sc.current_game_point_1 = '40', 3.5,
2.5
) as situation_weight,
      -- Last 5 points before this shot (may be fewer early in a match)
      groupArray(5)(s.winner_team) OVER (
        PARTITION BY s.match_id
        ORDER BY s.event_timestamp
        ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
      ) as recent_winners,
      -- Momentum factor: exponential decay of 0.2 per point; arrayEnumerate keeps
      -- the index array the same length as recent_winners on early rows
      1.0 + (
        arrayReduce('sum',
          arrayMap((w, i) -> if(w = team, exp(-0.2 * (i - 1)), -exp(-0.2 * (i - 1))),
            recent_winners,
            arrayEnumerate(recent_winners)
          )
        ) / 5.0
      ) * 0.5 as momentum_factor
FROM shock.shots s
ASOF LEFT JOIN shock.score_changes sc
ON s.match_id = sc.match_id
AND sc.event_timestamp <= s.event_timestamp
WHERE s.match_id = {matchId:String}
)
SELECT
player,
round(avg(base_weight * situation_weight * momentum_factor), 3) as pressure_score
FROM scored_shots
GROUP BY player;
`;
I was amazed: “You’re using ASOF LEFT JOIN to correlate shots with the closest previous score change?”
Caroline nodded: “ClickHouse’s time-series joins are perfect for this. And arrayMap + arrayReduce let us calculate momentum directly in SQL—no intermediate tables.”
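Concretely: if a shot is recorded at, say, 00:12:07.250 and score changes exist at 00:12:05.100 and 00:12:09.800, the condition sc.event_timestamp <= s.event_timestamp makes the ASOF join attach the 00:12:05.100 row, the score that was actually on the board when the shot was played, not the one recorded after it.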
Performance Results
After migrating all 11 queries, we benchmarked:
| Query | PostgreSQL (P95) | ClickHouse (P95) | Speedup |
|---|---|---|---|
| Player Momentum | 450ms | 42ms | 10.7x |
| Team Momentum | 380ms | 38ms | 10.0x |
| Pressure Analysis | 820ms | 68ms | 12.1x |
| Player Profiles | 620ms | 54ms | 11.5x |
| Match Statistics | 580ms | 48ms | 12.1x |
| Head-to-Head | 740ms | 71ms | 10.4x |
| Shot Type Stats | 410ms | 36ms | 11.4x |
| Set Stats | 390ms | 34ms | 11.5x |
| Situation Performance | 680ms | 58ms | 11.7x |
| Outcomes Over Time | 520ms | 46ms | 11.3x |
| Match Streaks | 710ms | 62ms | 11.5x |
| Average | 573ms | 51ms | 11.2x |
Caroline was thrilled: “We’re seeing 11x faster queries on average. And this gap will grow as we accumulate more data.”
Phase 5: PostgreSQL Cleanup (1 Day)
With all queries migrated, we cleaned up PostgreSQL:
Materialized Views Dropped
DROP MATERIALIZED VIEW IF EXISTS match_statistics;
DROP MATERIALIZED VIEW IF EXISTS match_momentum;
DROP MATERIALIZED VIEW IF EXISTS match_streaks;
DROP MATERIALIZED VIEW IF EXISTS match_pressure_analysis;
DROP MATERIALIZED VIEW IF EXISTS player_shot_type_statistics;
DROP MATERIALIZED VIEW IF EXISTS player_set_statistics;
DROP MATERIALIZED VIEW IF EXISTS player_situation_performance;
DROP MATERIALIZED VIEW IF EXISTS player_head_to_head_statistics;
“We dropped 8 materialized views,” Caroline said. “That’s 8 fewer things to maintain, 8 fewer background refresh jobs, and a lot less disk space.”
Disk Space Savings
Before cleanup:
PostgreSQL database size: 2.4 GB
- Transactional data: 680 MB
- Materialized views: 1.72 GB
After cleanup:
PostgreSQL database size: 680 MB
ClickHouse database size: 890 MB (compressed)
Total: 1.57 GB (34% reduction)
“We saved about 830 MB,” I noted. “And queries are 11x faster.”
What Stayed in PostgreSQL
We kept transactional data in PostgreSQL:
- Event sourcing tables: matches, match_events, match_projections
- Current match state: Live scores, active matches
- Reference tables: shot_types, shot_outcomes
- Transactional queries: GetMatch, GetMatchScore, GetLatestMatches
Claude explained: “PostgreSQL excels at transactional workloads—strong consistency, ACID guarantees, foreign keys. ClickHouse excels at analytical workloads—columnar storage, vectorized execution, distributed queries.”
Challenges and Lessons Learned
Challenge 1: Column Name Mismatches
Caroline hit an error early on:
DB::Exception: Unknown identifier: outcome
“The PostgreSQL table had outcome, but the ClickHouse table had shot_outcome,” she explained. “I had to update 6 queries.”
Lesson: Consistent naming across databases is critical during migration.
Challenge 2: Subquery Cloning Errors
ClickHouse doesn’t support correlated subqueries in some contexts:
-- This fails in ClickHouse: correlated subquery referencing the outer row
SELECT s.player,
  (SELECT avg(points) FROM shots i WHERE i.team = s.team) AS avg_team_points
FROM shots s;
Caroline’s fix: Use window functions instead:
-- This works: the same aggregate expressed as a window over each team
SELECT player, avg(points) OVER (PARTITION BY team) AS avg_team_points FROM shots;
Lesson: ClickHouse window functions are more powerful than subqueries for analytical queries.
Challenge 3: Timestamp Casting
Date functions in ClickHouse require explicit casting:
-- This fails
SELECT toStartOfMinute(event_timestamp) FROM shots;
-- This works
SELECT toStartOfMinute(CAST(event_timestamp AS DateTime64)) FROM shots;
Lesson: ClickHouse type system is stricter than PostgreSQL—explicit casts prevent runtime errors.
Challenge 4: Type Consistency
We changed player from String to UInt8 across the entire stack:
// Before (domain/events.mts)
interface ShotPlayedEvent {
player: string; // '1', '2', '3', '4'
}
// After
interface ShotPlayedEvent {
player: 1 | 2 | 3 | 4; // Numeric union type
}
Caroline explained: “This cascaded through commands, events, projections, and queries. But it caught bugs—we found places where player IDs were strings when they should be numbers.”
Lesson: Strict typing prevents errors, but requires discipline during migration.
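During the transition, events written before the change still carried string players, so a small parse step at the projection boundary keeps both shapes flowing. A minimal sketch, assuming both legacy and new payloads arrive (the helper name is ours, not the production code):

type PlayerId = 1 | 2 | 3 | 4;

// Accept both legacy string IDs ('1'..'4') and the new numeric IDs, rejecting
// anything else so bad data fails loudly at the projection boundary.
function parsePlayerId(value: string | number): PlayerId {
  const numeric = typeof value === 'string' ? Number.parseInt(value, 10) : value;
  if (numeric === 1 || numeric === 2 || numeric === 3 || numeric === 4) {
    return numeric;
  }
  throw new Error(`Invalid player id: ${String(value)}`);
}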
Monitoring and Observability
Caroline added OpenTelemetry spans for all ClickHouse operations:
// SpanStatusCode comes from '@opentelemetry/api'; the tracer is injected into the writer
async write(events: DomainEvent[]): Promise<void> {
  return this.tracer.startActiveSpan('clickhouse.write', async (span) => {
    span.setAttribute('event.count', events.length);
    // rows is built from events exactly as in the earlier ClickHouseWriter example
    const rows = events.map((event) => ({
      event_id: event.id,
      match_id: event.detail.matchId,
      event_type: event.type,
      event_timestamp: new Date(event.detail.eventTime),
      event_data: JSON.stringify(event.detail),
    }));
    try {
      await this.clickhouse.insert({ table: 'shock.events', values: rows, format: 'JSONEachRow' });
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}
“Now we can see ClickHouse latency in Grafana,” she said. “Inserts, queries, errors—all traced.”
Mermaid Diagram: Migration Architecture
graph TB
subgraph EventSource["📝 Event Source"]
MATCHES[Match Events<br/>ShotPlayed, ScoreChanged]
end
subgraph MessageBus["📨 NATS JetStream"]
STREAM[MATCHES Stream<br/>At-least-once delivery]
end
subgraph Projections["🔄 Projections Service"]
CONSUMER[Event Consumer<br/>BatchDuplex 10/5s]
PG_WRITER[PostgreSQL Writer<br/>Current state]
CH_WRITER[ClickHouse Writer<br/>Analytics history]
end
subgraph PostgreSQL["💾 PostgreSQL (OLTP)"]
PG_TABLES[(matches<br/>match_events<br/>match_projections)]
PG_QUERIES[Transactional Queries<br/>GetMatch, GetMatchScore]
end
subgraph ClickHouse["📊 ClickHouse (OLAP)"]
CH_TABLES[(shots<br/>score_changes<br/>events)]
CH_QUERIES[Analytical Queries<br/>11 migrated queries]
end
subgraph API["🌐 API Layer"]
API_ROUTES[API Routes<br/>Split by workload]
end
MATCHES --> STREAM
STREAM --> CONSUMER
CONSUMER --> PG_WRITER
CONSUMER --> CH_WRITER
PG_WRITER --> PG_TABLES
CH_WRITER --> CH_TABLES
PG_TABLES --> PG_QUERIES
CH_TABLES --> CH_QUERIES
PG_QUERIES --> API_ROUTES
CH_QUERIES --> API_ROUTES
classDef source fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef message fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef projection fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef postgres fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
classDef clickhouse fill:#fff9c4,stroke:#f57f17,stroke-width:2px
classDef api fill:#ffebee,stroke:#c62828,stroke-width:2px
class MATCHES source
class STREAM message
class CONSUMER,PG_WRITER,CH_WRITER projection
class PG_TABLES,PG_QUERIES postgres
class CH_TABLES,CH_QUERIES clickhouse
class API_ROUTES api
Takeaways
- Separate OLTP from OLAP - PostgreSQL for transactions, ClickHouse for analytics. Each database optimized for its workload.
- Denormalize for performance - Extract hot columns to native types. Avoid JSON parsing overhead.
- Batch for efficiency - 10 events or 5 seconds. Balance latency and throughput.
- Feature flags for safe migration - Run both implementations in parallel. Compare results before switching.
- ClickHouse time-series features are powerful - ASOF JOIN, window functions, array operations.
Caroline summarized: “We went from 8 materialized views refreshing every 5 minutes to real-time ClickHouse queries that are 11x faster.”
Claude agreed: “This is how you scale analytics. PostgreSQL handles state, ClickHouse handles history, NATS keeps them in sync.”
I was just happy our dashboard loaded instantly now.