Exploring Sql Joins In Postgres PostgreSQL

Apr 9th, 2021 - written by Kimserey with .

Even after using SQL for years, I still get caught from time to time with join cardinality where the result of simple queries surprise me as I omit to consider the relationships between tables assuming one-to-one relationships. In this post, we will look at the different type of joins that Postgres supports and see how the cardinality affects the result.

Create Test Data

We start by creating a demo database:

1
2
createdb demo
pgcli demo

And create a simple schema:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
create table person (
  id serial primary key, 
  name varchar 
);

create table address (
  id serial primary key,
  name varchar,
  owner integer
);

create table animal (
  id serial primary key,
  name varchar,
  owner integer
);

In this simple model, each person may have zero or more animals where each animal have zero or one owner. And each person may have zero or one addresses where each address is associated with zero or one owner.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
    +------------+             +----------+
    |            | 0..1   0..* |          |
    |   person   +-------------+  animal  |
    |            |             |          |
    |            |             |          |
    +------------+             +----------+
          | 0..1 
          |
          |
          |
          | 0..*
    +-----------+
    |           |
    |  address  |
    |           |
    |           |
    +-----------+

We can then insert test data:

1
2
3
4
5
6
7
8
9
10
11
12
13
insert into person (name) values ('kim');
insert into person (name) values ('tom');
insert into person (name) values ('bob');

insert into address (name , owner) values ('Paris 123', 1);
insert into address (name , owner) values ('Bordeaux 123', 1);
insert into address (name , owner) values ('Marseille', 1);
insert into address (name , owner) values ('London', 2);
insert into address (name , owner) values ('Oxford', 2);
insert into address (name , owner) values ('Oxford', 7);

insert into animal (name, owner) values ('cookie', 1);
insert into animal (name, owner) values ('brownie', 1);

With those data, we end up with the following tables:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select * from person;
+------+--------+
| id   | name   |
|------+--------|
| 1    | kim    |
| 2    | tom    |
| 3    | bob    |
+------+--------+

select * from address;
+------+--------------+---------+
| id   | name         | owner   |
|------+--------------+---------|
| 1    | Paris 123    | 1       |
| 2    | Bordeaux 123 | 1       |
| 3    | Marseille    | 1       |
| 4    | London       | 2       |
| 5    | Oxford       | 2       |
| 6    | Oxford       | 7       |
+------+--------------+---------+

select * from animal;
+------+---------+---------+
| id   | name    | owner   |
|------+---------+---------|
| 1    | cookie  | 1       |
| 2    | brownie | 1       |
+------+---------+---------+

Exploring Joins

The first way of joining tables is by selecting from two tables and filtering those where the person.id equals the address.owner.

1
2
3
4
5
6
7
8
9
10
select * from person, address WHERE person.id = address.owner;
+------+--------+------+--------------+---------+
| id   | name   | id   | name         | owner   |
|------+--------+------+--------------+---------|
| 1    | kim    | 1    | Paris 123    | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       |
| 1    | kim    | 3    | Marseille    | 1       |
| 2    | tom    | 4    | London       | 2       |
| 2    | tom    | 5    | Oxford       | 2       |
+------+--------+------+--------------+---------+

What we see from here is that the query has returned five rows. This is because a selection on multiple tables will create a row for each combinations which would be 16 rows (3 persons * 6 addresses) and filtering by person.id = address.owner will then reduce it to only the rows where there is a matching person for the address owner. This is the same as the following query:

1
2
3
4
5
6
7
8
9
10
11
select * from person
join address on address.owner = person.id;
+------+--------+------+--------------+---------+
| id   | name   | id   | name         | owner   |
|------+--------+------+--------------+---------|
| 1    | kim    | 1    | Paris 123    | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       |
| 1    | kim    | 3    | Marseille    | 1       |
| 2    | tom    | 4    | London       | 2       |
| 2    | tom    | 5    | Oxford       | 2       |
+------+--------+------+--------------+---------+

Now if we join the animal with person after having join the address, we will have the following result:

1
2
3
4
5
6
7
8
9
10
11
12
13
select * from person
join address on address.owner = person.id
join animal on animal.owner = person.id;
+------+--------+------+--------------+---------+------+---------+---------+
| id   | name   | id   | name         | owner   | id   | name    | owner   |
|------+--------+------+--------------+---------+------+---------+---------|
| 1    | kim    | 1    | Paris 123    | 1       | 2    | brownie | 1       |
| 1    | kim    | 1    | Paris 123    | 1       | 1    | cookie  | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       | 2    | brownie | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       | 1    | cookie  | 1       |
| 1    | kim    | 3    | Marseille    | 1       | 2    | brownie | 1       |
| 1    | kim    | 3    | Marseille    | 1       | 1    | cookie  | 1       |
+------+--------+------+--------------+---------+------+---------+---------+

From the first join, we end up with rows with only id 1 and 2 and when we join the animals which have 2 rows with owner 1, we end up with 6 rows. We get all possible combination between person, address and animal. Any row that doesn’t have any matching with the conditions will be excluded from the result.

If we want to keep the results that don’t match, we can use an outer join:

1
2
3
4
5
6
7
8
9
10
11
12
select * from person
left outer join address on address.owner = person.id;
+------+--------+--------+--------------+---------+
| id   | name   | id     | name         | owner   |
|------+--------+--------+--------------+---------|
| 1    | kim    | 1      | Paris 123    | 1       |
| 1    | kim    | 2      | Bordeaux 123 | 1       |
| 1    | kim    | 3      | Marseille    | 1       |
| 2    | tom    | 4      | London       | 2       |
| 2    | tom    | 5      | Oxford       | 2       |
| 3    | bob    | <null> | <null>       | <null>  |
+------+--------+--------+--------------+---------+

The left outer join will leave rows from the left that don’t have any matching. Here bob doesn’t have any matching addresses hence the address side is returned as id: <null>, name: <null>, owner: <null>. We can also do the opposite and use right outer join:

1
2
3
4
5
6
7
8
9
10
11
12
select * from person
right outer join address on address.owner = person.id;
+--------+--------+------+--------------+---------+
| id     | name   | id   | name         | owner   |
|--------+--------+------+--------------+---------|
| 1      | kim    | 1    | Paris 123    | 1       |
| 1      | kim    | 2    | Bordeaux 123 | 1       |
| 1      | kim    | 3    | Marseille    | 1       |
| 2      | tom    | 4    | London       | 2       |
| 2      | tom    | 5    | Oxford       | 2       |
| <null> | <null> | 6    | Oxford       | 7       |
+--------+--------+------+--------------+---------+

The right outer join will leave rows from teh right hence here Oxford 7 is the row that doesn’t have any matching person.

Lastly we can use full outer join to keep all sides:

1
2
3
4
5
6
7
8
9
10
11
12
13
select * from person
full outer join address on address.owner = person.id;
+--------+--------+--------+--------------+---------+
| id     | name   | id     | name         | owner   |
|--------+--------+--------+--------------+---------|
| 1      | kim    | 1      | Paris 123    | 1       |
| 1      | kim    | 2      | Bordeaux 123 | 1       |
| 1      | kim    | 3      | Marseille    | 1       |
| 2      | tom    | 4      | London       | 2       |
| 2      | tom    | 5      | Oxford       | 2       |
| <null> | <null> | 6      | Oxford       | 7       |
| 3      | bob    | <null> | <null>       | <null>  |
+--------+--------+--------+--------------+---------+

Now that we’ve seen the different type of joins, we can see how the amount of rows returned is based of the cardinality of the relationships and the type of join being used. For example if we join person and address, and subsequently left outer join animal, we can predict that we will have 8 rows:

  • first join will result in 3 rows for kim and 2 rows for tom
  • second left join will then result in 8 rows, since the animal only match with kim, we end up with 2 * 3 kim + 2 unmatched tom.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
select * from person
join address on address.owner = person.id
left outer join animal on animal.owner = person.id;
+------+--------+------+--------------+---------+--------+---------+---------+
| id   | name   | id   | name         | owner   | id     | name    | owner   |
|------+--------+------+--------------+---------+--------+---------+---------|
| 1    | kim    | 3    | Marseille    | 1       | 1      | cookie  | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       | 1      | cookie  | 1       |
| 1    | kim    | 1    | Paris 123    | 1       | 1      | cookie  | 1       |
| 1    | kim    | 3    | Marseille    | 1       | 2      | brownie | 1       |
| 1    | kim    | 2    | Bordeaux 123 | 1       | 2      | brownie | 1       |
| 1    | kim    | 1    | Paris 123    | 1       | 2      | brownie | 1       |
| 2    | tom    | 5    | Oxford       | 2       | <null> | <null>  | <null>  |
| 2    | tom    | 4    | London       | 2       | <null> | <null>  | <null>  |
+------+--------+------+--------------+---------+--------+---------+---------+

And similarly if we left outer join both address and animal, we’ll end up with 9 rows:

  • first join will result in 3 rows for kim, 2 rows for tom and 1 unmatched row for bob,
  • second join will then result in 6 rows for kim and the 3 extra unmatched rows for tom and bob.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select * from person
left outer join address on address.owner = person.id
left outer join animal on animal.owner = person.id;
+------+--------+--------+--------------+---------+--------+---------+---------+
| id   | name   | id     | name         | owner   | id     | name    | owner   |
|------+--------+--------+--------------+---------+--------+---------+---------|
| 1    | kim    | 3      | Marseille    | 1       | 1      | cookie  | 1       |
| 1    | kim    | 2      | Bordeaux 123 | 1       | 1      | cookie  | 1       |
| 1    | kim    | 1      | Paris 123    | 1       | 1      | cookie  | 1       |
| 1    | kim    | 3      | Marseille    | 1       | 2      | brownie | 1       |
| 1    | kim    | 2      | Bordeaux 123 | 1       | 2      | brownie | 1       |
| 1    | kim    | 1      | Paris 123    | 1       | 2      | brownie | 1       |
| 2    | tom    | 5      | Oxford       | 2       | <null> | <null>  | <null>  |
| 2    | tom    | 4      | London       | 2       | <null> | <null>  | <null>  |
| 3    | bob    | <null> | <null>       | <null>  | <null> | <null>  | <null>  |
+------+--------+--------+--------------+---------+--------+---------+---------+

And that concludes today’s post!

Conclusion

In today’s post we looked at the different SQL joins that can be used in Postgres. We went through regular join, left join, right join and full join. While looking at the queries, we’ve seen how the relationships affect the amount of rows being returned. I hope you liked this post and I see you on the next one!

External Sources

Designed, built and maintained by Kimserey Lam.