OLTP – Phase 4 Sequential Scan and Filtering

Phase 3 gave us table.ScanAll() — iterate through every row. But there’s no way to say “only rows where age > 25” or “only the name column.” Phase 4 adds that.

This is the first time the database actually looks at what’s inside the rows. Before this, data was just stored and retrieved. Now it’s evaluated.

In SQL terms, this phase implements:

  • WHERE age > 25 → filtering (skip rows that don’t match)
  • SELECT name, age → projection (return only certain columns)
  • WHERE age > 25 AND active = true → compound expressions

In PostgreSQL, this maps to nodeSeqscan.c (sequential scan node) and execQual.c (expression evaluation).

Here is the roadmap for the phases to come:

  • Phase 4: Sequential scan and filtering
  • Phase 5: Buffer pool manager
  • Phase 6: SQL parser
  • Phase 7: Query executor
  • Phase 8: B-tree index
  • Phase 9: Write-ahead log
  • Phase 10: Transactions, concurrency, REPL, and server

Full Source Code

The code referenced in this post can be found in https://gitlab.com/kimserey.lam/db-learn.

Expressions (executor/expression.go)

What is an expression?

An expression is anything that can be evaluated against a row to produce a value. It’s an interface:

1
2
3
type Expression interface {
    Eval(row []tuple.Value, schema *tuple.Schema) tuple.Value
}

You give it a row and a schema, it gives you back a value. That’s the contract.

The expression types

There are five types, and they compose together like building blocks:

ColumnRef — “give me this column’s value”

1
ColumnRef{Name: "age"}

Looks up “age” in the schema, finds it’s column 2, returns row[2]. This is how SELECT age becomes code — the parser turns the column name into a ColumnRef, and the executor evaluates it.

Literal — “this constant value”

1
Literal{Value: tuple.IntValue(25)}

Always returns 25, regardless of the row. This is how WHERE age > 25 represents the “25” — it’s a Literal.

CompareExpr — “compare two things”

1
2
3
4
5
CompareExpr{
    Left:  ColumnRef{Name: "age"},       // get the value of age
    Right: Literal{Value: IntValue(25)}, // the constant 25
    Op:    OpGt,                          // greater than
}

Evaluates both sides, compares them, returns true or false. This is age > 25. The six operators: =, !=, <, >, <=, >=.

LogicalExpr — “combine two booleans”

1
2
3
4
5
LogicalExpr{
    Left:  CompareExpr{...age > 25...},
    Right: CompareExpr{...active = true...},
    Op:    OpAnd,
}

Evaluates both sides (which must be booleans), combines with AND or OR. This is age > 25 AND active = true.

NotExpr — “negate a boolean”

1
NotExpr{Expr: ColumnRef{Name: "active"}}

Evaluates the inner expression, flips true↔false. This is NOT active.

How expressions compose into a tree

The SQL WHERE age > 25 AND active = true becomes a tree of expressions:

1
2
3
4
5
6
        LogicalExpr (AND)
        /              \
CompareExpr (>)     CompareExpr (=)
   /       \           /         \
ColumnRef  Literal   ColumnRef   Literal
 "age"      25       "active"    true

Evaluation walks the tree bottom-up:

  1. ColumnRef("age") → looks up age in the row → 30
  2. Literal(25) → 25
  3. CompareExpr(30 > 25) → true
  4. ColumnRef("active") → looks up active → true
  5. Literal(true) → true
  6. CompareExpr(true = true) → true
  7. LogicalExpr(true AND true) → true

The row passes the filter.

NULL handling (three-valued logic)

In SQL, NULL is not true and not false — it’s unknown. Comparing anything to NULL produces NULL, not false:

1
2
age > 25   where age is NULL  →  NULL (not false)
NULL = NULL                    →  NULL (not true!)

Our code handles this:

1
2
3
if left.IsNull || right.IsNull {
    return tuple.NullValue()
}

When filtering, only rows where the filter evaluates to true are included. Both NULL and false are excluded:

1
2
3
4
5
6
7
if result.IsNull {
    continue  // skip
}
b, ok := result.Data.(bool)
if !ok || !b {
    continue  // skip
}

This matches SQL behavior — WHERE age > 25 excludes rows where age is NULL.

The compareValues function

All six comparison operators use the same comparison function:

1
func compareValues(a, b tuple.Value) int

It returns -1, 0, or 1 (less than, equal, greater than), like C’s strcmp. Then the operator just checks the result:

1
2
case OpGt:  result = cmp > 0    // "greater than" means cmp returned 1
case OpLtEq: result = cmp <= 0  // "less or equal" means cmp returned -1 or 0

One comparison function, six operators. Each type has its own comparison logic:

  • int32/int64: numeric < > ==
  • string: lexicographic (alphabetical) order
  • bool: false < true

Sequential Scan (executor/seqscan.go)

The RowIterator interface

1
2
3
type RowIterator interface {
    Next() ([]tuple.Value, bool)
}

Same pattern as TableIterator from Phase 3, but now it’s an interface. Any operation that produces rows implements this. The sequential scan is the first, but later phases will add index scans, joins, etc. — all returning RowIterator.

This is the Volcano model (also called the “iterator model”). PostgreSQL uses the same approach — every executor node (SeqScan, IndexScan, HashJoin, Sort) implements the same Next() interface.

SeqScanNode

1
2
3
4
5
type SeqScanNode struct {
    Table   *catalog.Table
    Filter  Expression   // WHERE clause (nil = no filter)
    Columns []int        // which columns to return (nil = all)
}

Three fields, mapping directly to SQL:

1
2
3
4
SELECT name, age FROM users WHERE active = true
       ──────────            ─────────────────
       Columns: [1, 2]       Filter: CompareExpr{...}
                              Table: "users"

How the scan works

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SeqScanNode.Execute()
    │
    └── creates seqScanIterator
            │
            └── Next() is called repeatedly:

                1. Get next row from TableIterator (Phase 3)
                   └── which loads pages from HeapFile (Phase 1)
                       └── which deserializes tuples (Phase 2)

                2. If filter exists, evaluate it against the row
                   └── if false or NULL → skip, loop back to step 1
                   └── if true → continue

                3. If projection exists, select only those columns
                   └── [all 4 columns] → [name, age] (2 columns)

                4. Return the row

Filtering: the inner loop

The for loop in seqScanIterator.Next() is important:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
for {
    row, ok := it.tableIter.Next()
    if !ok {
        return nil, false  // no more rows
    }

    if it.filter != nil {
        result := it.filter.Eval(row, it.schema)
        if result.IsNull || result.Data.(bool) != true {
            continue  // skip this row, try next
        }
    }

    return row, true  // row passes filter
}

Each call to Next() might skip many rows internally before finding one that passes the filter. The caller just sees one row per call — the skipping is hidden inside.

For WHERE age > 100 on a table with no ages over 100, Next() reads every single row, evaluates the filter on each, skips all of them, and returns false. This is why sequential scans are slow on large tables — they always read everything.

Projection: selecting columns

Projection happens after filtering:

1
2
3
4
5
6
7
if it.columns != nil {
    projected := make([]tuple.Value, len(it.columns))
    for i, colIdx := range it.columns {
        projected[i] = row[colIdx]
    }
    return projected, true
}

Columns: []int{1, 2} means “give me column 1 (name) and column 2 (age).” The code creates a new smaller slice with just those values.

Note: the filter runs on the full row (all columns), then projection strips it down. The filter needs access to all columns because WHERE active = true references a column that might not be in the SELECT list.


How Phase 4 connects to previous phases

1
2
3
4
5
6
7
8
9
10
11
Phase 4: SeqScanNode{Table, Filter, Columns}
    │
    ├── Filter.Eval(row, schema)           ← expression evaluation (new)
    │     ├── ColumnRef → schema.FindColumn → row[idx]    (Phase 2: schema)
    │     ├── Literal → constant value                     (Phase 2: types)
    │     └── CompareExpr → compareValues                  (new)
    │
    └── TableIterator.Next()               ← from Phase 3
          └── HeapFile.ReadPage()          ← from Phase 1
                └── Page.GetTuple()        ← from Phase 1
                      └── Deserialize()    ← from Phase 2

The expression system is the new layer. Everything underneath (table scan, pages, serialization) is reused unchanged.


What this looks like in action

From the test dump:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
=== All Rows (no filter) ===
  id=1, name=alice, age=30, active=true
  id=2, name=bob, age=25, active=false
  id=3, name=carol, age=35, active=true
  id=4, name=dave, age=20, active=false
  id=5, name=eve, age=28, active=true

=== WHERE age > 25 AND active = true ===
  id=1, name=alice, age=30, active=true
  id=3, name=carol, age=35, active=true
  id=5, name=eve, age=28, active=true

=== SELECT name FROM users WHERE active = true ===
  name=alice
  name=carol
  name=eve

The first scan reads everything. The second filters (bob and dave excluded — bob isn’t active, dave’s age is too low). The third filters AND projects (only name column returned).


Why this is slow (and what fixes it later)

Sequential scan reads every row in the table. For WHERE id = 42 on a million-row table, it reads all million rows to find one.

Phase 8 (B-Tree Index) fixes this. An index on id lets the executor jump directly to the row with id = 42 in O(log n) time — reading maybe 3-4 pages instead of thousands.

But sequential scan is always the fallback. If there’s no matching index, or if the query touches a large percentage of the table, a sequential scan is the only option (and sometimes the best one).


Key Go patterns used

Interface for polymorphism

1
2
3
type Expression interface {
    Eval(row []tuple.Value, schema *tuple.Schema) tuple.Value
}

ColumnRef, Literal, CompareExpr, LogicalExpr, and NotExpr all implement this interface. The filter doesn’t care which kind of expression it is — it just calls Eval(). This is how Go does polymorphism without inheritance.

Type switch

1
2
3
4
5
6
7
8
switch av := a.Data.(type) {
case int32:
    bv := b.Data.(int32)
    // compare
case string:
    bv := b.Data.(string)
    // compare
}

.(type) is a special Go syntax that checks the type inside an interface{} and branches on it. It combines the type check and the extraction into one step. Used in compareValues to handle different column types.

Composition over nesting

The expression tree is built by composing simple structs:

1
2
3
4
LogicalExpr{
    Left: CompareExpr{Left: ColumnRef{...}, Right: Literal{...}},
    Right: CompareExpr{Left: ColumnRef{...}, Right: Literal{...}},
}

No special “tree” data structure. Each expression holds other expressions as fields. The Eval call naturally recurses through the tree.


PostgreSQL comparison

PostgreSQL’s expression evaluation (execQual.c) is much more complex:

  • Supports hundreds of operators and functions
  • Has a “JIT” compiler that generates machine code for hot expressions
  • Handles type coercion (comparing int to float, etc.)
  • Supports subqueries in expressions (WHERE id IN (SELECT ...))
  • Has specialized fast paths for common patterns

Our system supports 6 comparison operators, 2 logical operators, and 4 data types. But the architecture is the same: tree of expression nodes, evaluated bottom-up against each row.


Files created in Phase 4

File Purpose Lines
executor/expression.go Expression interface, ColumnRef, Literal, CompareExpr, LogicalExpr, NotExpr, compareValues ~175
executor/seqscan.go SeqScanNode, RowIterator interface, filtering and projection ~75
executor/executor_test.go 19 tests: expressions, comparisons, scans with filter/projection ~370

What’s next

Phase 5 (Buffer Pool Manager) adds a page cache in memory so we don’t read from disk on every page access. Right now every ReadPage hits the disk — even if we just read that same page one call ago. The buffer pool fixes this by keeping hot pages in RAM and only evicting cold ones when the cache is full.

Observations

  • This is the phase where stored values start becoming queryable values.
  • The expression tree is the important abstraction: the WHERE clause is not executed as text, it is evaluated as a structured tree against each row.
  • SeqScan stays simple on purpose. It reuses the table iterator, evaluates the filter row by row, and projects only the requested columns.
  • What I like about this phase is that it makes query execution feel incremental: first scan rows, then filter them, then shape the output.