Jun 27, 2026

OLAP – Phase 2 Columnar Storage

Phase 1 gave us in-memory vectors and data chunks. But a database needs to persist data to disk and read it back. The challenge is designing a disk format that preserves the benefits of columnar layout: read only the columns you need, skip entire sections when the data can’t match your query.

This phase builds two structures: RowGroups (horizontal partitions of a table, each holding up to 122,880 rows) and ColumnSegments (one column’s serialized data within a row group, annotated with min/max zone maps).

In DuckDB, these are RowGroup (src/storage/table/row_group.cpp) and ColumnSegment (src/storage/table/column_segment.cpp). The default row group size of 122,880 (60 x 2048 vectors) is defined in row_group.hpp.

Here is the roadmap for the phases to come:

Phase 2: Columnar storage (row groups and column segments)
Phase 3: Compression
Phase 4: Table, catalog, and bulk loading
Phase 5: Vectorized expressions and scan/filter/project
Phase 6: Hash aggregation
Phase 7: Hash join
Phase 8: SQL parser
Phase 9: Query planner and optimizer
Phase 10: Sorting, parallel execution, REPL, and server

Full Source Code

The code referenced in this post can be found in https://gitlab.com/kimserey.lam/olap-learn.

Why 122,880 Rows?

A row group batches 60 vector-sized chunks:

const RowGroupSize = 60 * vector.VectorSize // 122,880

Why 60? It’s a trade-off:

Too small (e.g., 2048 rows): too many segments, zone maps are useless because the value ranges overlap everywhere
Too large (e.g., 10 million rows): reading one row group loads too much data, nullifying the benefit of skipping
122,880: large enough that zone maps are meaningful (a sorted column will have tight min/max ranges per group), small enough that skipping a group saves real I/O

Column Segments and Zone Maps

A ColumnSegment holds all values for one column within one row group:

type ColumnSegment struct {
    Type        vector.LogicalType
    Count       uint32
    NullCount   uint32
    Compression CompressionType
    ZoneMap     ZoneMap
    Data        []byte
    Nulls       []byte
}

The ZoneMap stores min/max statistics for the segment’s values:

type ZoneMap struct {
    HasStats bool
    Min      any
    Max      any
    HasNulls bool
}

Zone maps are the OLAP equivalent of a B-tree index, but free — they’re computed once during writes and checked during reads. For WHERE year = 2024, the scan operator checks each row group’s zone map before decompressing any data:

min=2020, max=2022 → skip (2024 can’t be in this range)
min=2023, max=2025 → maybe (decompress and check row by row)
min=2024, max=2024 → all match (every row satisfies the predicate)

The zone map supports five comparison checks:

func (z *ZoneMap) CheckEquals(val any) ZoneMapResult {
    if !z.HasStats {
        return ZoneMapMaybe
    }
    cmpMin := compareValues(val, z.Min)
    cmpMax := compareValues(val, z.Max)
    if cmpMin < 0 || cmpMax > 0 {
        return ZoneMapSkip
    }
    if cmpMin == 0 && cmpMax == 0 {
        return ZoneMapAllMatch
    }
    return ZoneMapMaybe
}

And similarly for CheckGreaterThan, CheckLessThan, CheckGreaterThanOrEqual, CheckLessThanOrEqual.

Computing Zone Maps

During serialization, we scan all valid values to find min and max:

func computeMinMax(v *vector.Vector) (hasStats bool, min, max any) {
    count := v.Count()
    if count == 0 {
        return false, nil, nil
    }
    firstValid := -1
    for i := 0; i < count; i++ {
        if v.IsValid(i) {
            firstValid = i
            break
        }
    }
    if firstValid < 0 {
        return false, nil, nil
    }
    // type-switch to find min/max across valid values
    // ...
}

Building a Segment from a Vector

SegmentFromVector serializes an in-memory Vector into a ColumnSegment:

func SegmentFromVector(v *vector.Vector) *ColumnSegment {
    seg := &ColumnSegment{
        Type:        v.Type(),
        Count:       uint32(v.Count()),
        Compression: CompressionNone,
    }
    seg.NullCount = uint32(v.NullCount())
    seg.ZoneMap.HasNulls = seg.NullCount > 0
    seg.Nulls = serializeNullBitmask(v)
    rawData := serializeData(v)
    seg.ZoneMap.HasStats, seg.ZoneMap.Min, seg.ZoneMap.Max = computeMinMax(v)

    compression, compressedData := AutoSelectCompression(rawData, int(seg.Count), seg.Type)
    seg.Compression = compression
    seg.Data = compressedData
    return seg
}

The null bitmask is serialized as one bit per value (1=valid, 0=null):

func serializeNullBitmask(v *vector.Vector) []byte {
    count := v.Count()
    nbytes := (count + 7) / 8
    buf := make([]byte, nbytes)
    for i := 0; i < count; i++ {
        if v.IsValid(i) {
            buf[i/8] |= 1 << uint(i%8)
        }
    }
    return buf
}

Row Groups

A RowGroup groups column segments together:

type RowGroup struct {
    RowCount uint32
    Columns  []*ColumnSegment
}

Building a row group from DataChunks merges all chunks’ columns into single vectors, then converts each to a segment:

func NewRowGroupFromChunks(chunks []*vector.DataChunk) *RowGroup {
    colCount := chunks[0].ColumnCount()
    totalRows := 0
    for _, c := range chunks {
        totalRows += c.Count()
    }

    merged := make([]*vector.Vector, colCount)
    for col := 0; col < colCount; col++ {
        typ := chunks[0].Column(col).Type()
        v := vector.NewFlatVector(typ)
        for _, c := range chunks {
            src := c.Column(col)
            for row := 0; row < c.Count(); row++ {
                // append each value from src to v
            }
        }
        merged[col] = v
    }

    segments := make([]*ColumnSegment, colCount)
    for i, v := range merged {
        segments[i] = SegmentFromVector(v)
    }
    return &RowGroup{RowCount: uint32(totalRows), Columns: segments}
}

Reading Back with Column Projection

When deserializing, we only decompress the columns the query needs:

func (rg *RowGroup) ToDataChunk(columnIndices []int) *vector.DataChunk {
    types := make([]vector.LogicalType, len(columnIndices))
    for i, colIdx := range columnIndices {
        types[i] = rg.Columns[colIdx].Type
    }
    chunk := vector.NewDataChunk(types)
    for i, colIdx := range columnIndices {
        v := rg.Columns[colIdx].ToVector()
        chunk.SetColumn(i, v)
    }
    chunk.SetCount(int(rg.RowCount))
    return chunk
}

If a query only needs 2 of 10 columns, only those 2 segments are decompressed.

File Format

The on-disk format is a single binary file per table:

File Header (16 bytes):
  magic: "QUAK" (4 bytes)
  version: 1 (4 bytes)
  column count (4 bytes)
  row group count (4 bytes)

RowGroup 0:
  row count (4 bytes)
  ColumnSegment 0: header(18 bytes) + zone map + null bitmask + data
  ColumnSegment 1: header + zone map + null bitmask + data
  ...

RowGroup 1:
  ...

The segment header is 18 bytes:

func (seg *ColumnSegment) WriteTo(w io.Writer) error {
    // type(1) + count(4) + nullcount(4) + compression(1) + datalen(4) + nulllen(4) = 18
    header := make([]byte, 18)
    header[0] = byte(seg.Type)
    binary.LittleEndian.PutUint32(header[1:5], seg.Count)
    binary.LittleEndian.PutUint32(header[5:9], seg.NullCount)
    header[9] = byte(seg.Compression)
    binary.LittleEndian.PutUint32(header[10:14], uint32(len(seg.Data)))
    binary.LittleEndian.PutUint32(header[14:18], uint32(len(seg.Nulls)))
    // write header, zone map, nulls, data
}

Selective Reading

For queries that don’t need all columns, ReadRowGroupsSelective reads segment headers to determine byte offsets, then seeks past unneeded columns:

func ReadRowGroupsSelective(r io.ReadSeeker, columnIndices []int) ([]*RowGroup, error) {
    // read file header
    needed := make(map[int]bool, len(columnIndices))
    for _, idx := range columnIndices {
        needed[idx] = true
    }

    for i := 0; i < rgCount; i++ {
        for col := 0; col < colCount; col++ {
            if needed[col] {
                seg, _ := ReadSegmentFrom(r)
                segments[segIdx] = seg
            } else {
                SkipSegmentFrom(r)  // seek past without decompressing
            }
        }
    }
}

SkipSegmentFrom reads just the 18-byte header to learn the data length, then seeks past the data bytes. This means reading 2 of 10 columns only touches ~20% of the file.

Summary

The row group + column segment model gives us three properties that define OLAP storage:

Column projection: only read the columns the query needs
Zone map pruning: skip entire row groups when min/max statistics prove no match is possible
Compression boundary: each segment can be compressed independently (Phase 3)

The file format is simple but effective — no index structures, no page management, no buffer pool. Just sequential column segments with headers that enable skipping.