OLTP – Phase 2 Tuple/Record Format

In Phase 1, pages store raw bytes — InsertTuple([]byte("alice")). The page has no idea what those bytes mean. Phase 2 adds the layer that gives bytes meaning: a schema that defines columns and types, and a serializer that converts typed values to/from bytes.

After this phase, we can say “this row has an INT called id, a VARCHAR called name, and a BOOL called active” and reliably pack/unpack those values.

In PostgreSQL, this is implemented in src/backend/access/common/heaptuple.c with the tuple header defined in src/include/access/htup_details.h.

Here is the roadmap for the phases to come:

  • Phase 2: Tuple and record format
  • Phase 3: Table and catalog
  • Phase 4: Sequential scan and filtering
  • Phase 5: Buffer pool manager
  • Phase 6: SQL parser
  • Phase 7: Query executor
  • Phase 8: B-tree index
  • Phase 9: Write-ahead log
  • Phase 10: Transactions, concurrency, REPL, and server

Full Source Code

The code referenced in this post can be found in https://gitlab.com/kimserey.lam/db-learn.

The Schema (tuple/types.go)

What is it?

The schema is an ordered list of column definitions. Each column has a name, a type, and whether it can be NULL.

1
2
3
4
5
6
7
schema := &Schema{
    Columns: []ColumnDef{
        {Name: "id",     Type: TypeInt32,   Nullable: false},
        {Name: "name",   Type: TypeVarchar, Nullable: true, MaxLen: 50},
        {Name: "active", Type: TypeBool,    Nullable: false},
    },
}

Why order matters

The schema is how the database knows what bytes mean. Column 0 is always serialized first, column 1 second, etc. When you INSERT INTO users VALUES (1, 'alice', true), the values are matched to columns by position:

1
2
3
Value 1       → Column 0 (id, INT)      → write 4 bytes
Value 'alice' → Column 1 (name, VARCHAR) → write 2-byte length + string bytes
Value true    → Column 2 (active, BOOL)  → write 1 byte

The column names (“id”, “name”, “active”) are NEVER stored in the tuple bytes. They only exist in the schema. When you write SELECT name FROM users, the parser looks up “name” in the schema, finds it’s column 1, and the executor reads column 1’s bytes from the tuple.

Column types and their sizes

Every type has a defined size — this is a design decision, not something derived from the data:

Type Size Format
TypeInt32 always 4 bytes little-endian uint32, cast to int32
TypeInt64 always 8 bytes little-endian uint64, cast to int64
TypeBool always 1 byte 0x00 = false, 0x01 = true
TypeVarchar variable [2-byte length] [string bytes]

Fixed-size types (INT, BIGINT, BOOL) need no length prefix — the schema says exactly how many bytes to read. Variable-size types (VARCHAR) MUST store their length in the data because it changes per row (“alice” is 5 bytes, “bob” is 3 bytes).

The FixedSize() method returns the byte count for fixed types and -1 for variable types:

1
2
3
TypeInt32.FixedSize()   // → 4
TypeBool.FixedSize()    // → 1
TypeVarchar.FixedSize() // → -1 (variable)

The Value (tuple/tuple.go)

What is it?

A Value represents one column’s data in a row. It’s either NULL or holds a Go value:

1
2
3
4
type Value struct {
    IsNull bool
    Data   interface{}  // int32, int64, bool, or string
}

NULL is NOT the same as empty. StringValue("") is an empty string (has value, takes 2 bytes for the length prefix). NullValue() is no value at all (takes zero data bytes, just a bit in the bitmap).

Helper constructors avoid verbose struct literals:

1
2
3
4
IntValue(42)          // Value{Data: int32(42)}
StringValue("alice")  // Value{Data: "alice"}
BoolValue(true)       // Value{Data: true}
NullValue()           // Value{IsNull: true}

Serialization: Values → Bytes

The byte layout

A serialized tuple looks like this:

1
[Null Bitmap] [Column 0 data] [Column 1 data] [Column 2 data] ...

Step by step: Serialize(schema, [1, “alice”, true])

Step 1: Write the null bitmap.

The bitmap uses 1 bit per column. 3 columns = 1 byte (rounded up from 3/8).

1
2
3
4
5
All values present → bitmap = 0x00 = 00000000
                                          |||
                                          ||└─ bit 0: id     → 0 (not null)
                                          |└── bit 1: name   → 0 (not null)
                                          └─── bit 2: active → 0 (not null)

Step 2: Write column 0 (id = 1, TypeInt32).

Schema says TypeInt32 → write 4 bytes, little-endian.

1
1 in little-endian = [01 00 00 00]

Step 3: Write column 1 (name = “alice”, TypeVarchar).

Schema says TypeVarchar → write 2-byte length, then string bytes.

1
2
Length 5 = [05 00]
"alice"  = [61 6C 69 63 65]    (ASCII codes for a, l, i, c, e)

Step 4: Write column 2 (active = true, TypeBool).

Schema says TypeBool → write 1 byte.

1
true = [01]

Final result:

1
2
3
4
5
6
7
Byte:  00  01 00 00 00  05 00 61 6C 69 63 65  01
       ──  ───────────  ─────────────────────  ──
       null  id=1        name="alice"           active=true
       bitmap
       1B    4B          2B + 5B = 7B           1B

Total: 13 bytes

This 13-byte blob is what gets passed to page.InsertTuple() from Phase 1.

What happens with NULLs

When name is NULL: Serialize(schema, [42, NULL, false])

1
2
3
4
5
6
7
8
9
10
11
Bitmap: 0x02 = 00000010
                    |||
                    ||└─ bit 0: id     → 0 (has value)
                    |└── bit 1: name   → 1 (NULL!)
                    └─── bit 2: active → 0 (has value)

Data:   [02] [2A 00 00 00] [00]
         ──   ───────────   ──
         bitmap  id=42      active=false

Total: 6 bytes (name takes ZERO bytes)

NULL columns save space — no length prefix, no data bytes, just a single bit in the bitmap. This is why the bitmap exists: without it, you’d need a per-column “is this null?” marker.


Deserialization: Bytes → Values

The reverse process. Deserialize reads the schema and walks through the bytes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Input bytes: [00 01 00 00 00 05 00 61 6C 69 63 65 01]
Position:     ^

1. Read bitmap (1 byte): 0x00 → no nulls
   Position: 1

2. Column 0 is INT → read 4 bytes: [01 00 00 00] → 1
   Position: 5

3. Column 1 is VARCHAR → read 2 bytes: [05 00] → length=5
                          read 5 bytes: [61 6C 69 63 65] → "alice"
   Position: 12

4. Column 2 is BOOL → read 1 byte: [01] → true
   Position: 13 (done)

The pos variable in the code tracks the current read position as it advances through each column.

If a column’s null bit is set, the deserializer produces a NullValue() and reads ZERO bytes for that column — it just skips to the next one.

Bounds checking

Before reading each column, the code checks if enough bytes remain:

1
2
3
if pos+4 > len(data) {
    return nil, fmt.Errorf("%w: not enough bytes for int32", ErrCorruptData)
}

This catches corrupted or truncated data instead of panicking on an out-of-bounds slice access.


The Null Bitmap

How bits are packed

The bitmap uses 1 bit per column, packed left-to-right into bytes:

1
2
3
4
5
Byte 0:  [bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0]
          col7 col6 col5 col4 col3 col2 col1 col0

Byte 1:  [bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0]
          col15 col14 ...                    col8

A 1-bit means NULL. A 0-bit means the column has a value.

Side note: endianness does NOT apply to the bitmap

Endianness only matters when a single number is too big for one byte and gets split across multiple bytes — “which byte do I put first?” For example, int32(4091) becomes [FB 0F 00 00] in little-endian or [00 00 0F FB] in big-endian.

The bitmap never splits a number across bytes. Each bit is set independently, and each byte in the bitmap is self-contained:

1
2
3
4
5
6
7
8
Endianness matters — one number split across bytes:
  int32(4091) → [FB 0F 00 00] or [00 00 0F FB]? Must pick an order.
  uint16(256) → [00 01] or [01 00]? Must pick an order.

Endianness does NOT matter — each byte is independent:
  Bitmap: byte 0 has columns 0-7, byte 1 has columns 8-15. No splitting.
  String: "alice" → [61 6C 69 63 65]. Each byte is one character.
  Bool: true → [01]. Just one byte.

If you control each byte individually — setting bit 3 in byte 0, writing one ASCII character at a time — there’s no ordering question. Endianness is only the question “I have a number too big for one byte, which end do I start from?” If that question doesn’t arise, endianness is irrelevant.

Bitmap size calculation

1
2
3
func nullBitmapSize(numColumns int) int {
    return (numColumns + 7) / 8
}

This is integer division that rounds up:

  • 1-8 columns → 1 byte
  • 9-16 columns → 2 bytes
  • 17-24 columns → 3 bytes

The +7 before dividing by 8 is the standard trick for rounding up integer division: ceil(n/8) = (n+7)/8 using integer math.

Setting a bit

1
2
3
4
5
func setNullBit(bitmap []byte, i int) {
    byteIndex := i / 8       // which byte
    bitIndex := uint(i % 8)  // which bit within that byte
    bitmap[byteIndex] |= 1 << bitIndex
}

For column 5: byteIndex = 0, bitIndex = 5, so 1 << 5 = 00100000. The |= sets that one bit without touching the others.

Checking a bit

1
2
3
4
5
func isNullBit(bitmap []byte, i int) bool {
    byteIndex := i / 8
    bitIndex := uint(i % 8)
    return bitmap[byteIndex]&(1<<bitIndex) != 0
}

Uses & (AND) to mask out all bits except the one we care about. If the result is non-zero, the bit is set → the column is NULL.


How Phase 1 and Phase 2 connect

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Phase 2: Serialize(schema, values)
    │
    └──→ produces []byte{00 01 00 00 00 05 00 61 6C 69 63 65 01}
            │
            ▼
Phase 1: page.InsertTuple(bytes)
    │
    ├── Creates ItemPointer: offset=4083, length=13
    └── Copies 13 bytes to page position 4083

Later:

Phase 1: page.GetTuple(0)
    │
    └──→ returns []byte{00 01 00 00 00 05 00 61 6C 69 63 65 01}
            │
            ▼
Phase 2: Deserialize(schema, bytes)
    │
    └──→ returns [IntValue(1), StringValue("alice"), BoolValue(true)]

The page stores raw bytes. The serializer gives those bytes meaning. Neither knows about the other — they communicate through []byte.


Key Go patterns used

interface{} for Value.Data

Data holds different Go types (int32, int64, bool, string). Go’s interface{} (or any) can hold any type. When reading, you use a type assertion:

1
2
v := value.Data.(int32)  // panics if Data is not int32
v, ok := value.Data.(int32)  // returns ok=false instead of panicking

The serializer uses the ok form to produce a useful error message instead of a panic.

append for building byte slices

Serialize starts with a fixed-size bitmap slice and grows it with append:

1
2
buf := make([]byte, bitmapSize)  // start with bitmap
buf = append(buf, b...)          // grow for each column

This is simpler than pre-calculating the total size. The Go runtime handles reallocation. For a database that serializes millions of rows per second, you’d pre-calculate — but for learning, clarity wins.

Bit manipulation

Setting, checking, and clearing individual bits is fundamental to database internals. The three operations:

  • Set a bit: byte |= 1 << n (OR with a mask)
  • Check a bit: byte & (1 << n) != 0 (AND with a mask)
  • Clear a bit: byte &^= 1 << n (AND-NOT with a mask, not used here)

PostgreSQL comparison

PostgreSQL’s tuple format (HeapTupleHeaderData) is more complex:

  • t_xmin / t_xmax — transaction IDs that created/deleted this tuple (for MVCC, Phase 10)
  • t_cid — command ID within the transaction
  • t_ctid — physical location (page, offset) of this tuple or its newer version
  • t_infomask — flags for null bitmap presence, variable-length attributes, etc.
  • t_hoff — offset to the actual data (header is variable-size)
  • t_bits — our null bitmap

Our format strips this down to just the null bitmap + column data. We’ll add transaction IDs in Phase 10 when we build MVCC.


Files created in Phase 2

File Purpose Lines
tuple/types.go ColumnType, ColumnDef, Schema — defines what columns a table has ~85
tuple/tuple.go Serialize/Deserialize, Value type, null bitmap helpers ~170
tuple/tuple_test.go 13 tests: round-trip, nulls, all types, edge cases, byte dumps ~300

What’s next

Phase 3 (Table and Catalog) ties these two layers together: a Table = Schema + HeapFile. You’ll be able to do table.InsertRow(values) and it will serialize the values (Phase 2) and store them in pages (Phase 1) automatically.

Observations

  • Phase 1 explains how bytes are stored. Phase 2 explains what those bytes mean.
  • The key abstraction here is that a schema is an ordered list of columns, and a row is an ordered list of values that must line up with that schema.
  • The null bitmap is a compact way to track which columns are absent in a given row without changing the schema itself.
  • The link with Phase 1 is direct: schema plus values go through serialization, then the page stores the raw bytes. On the way back, the page returns tuple bytes and deserialization reconstructs typed values from the same schema.