Apr 9th, 2021 - written by Kimserey with .
Even after using SQL for years, I still get caught from time to time with join cardinality where the result of simple queries surprise me as I omit to consider the relationships between tables assuming one-to-one relationships. In this post, we will look at the different type of joins that Postgres supports and see how the cardinality affects the result.
We start by creating a demo
database:
1
2
createdb demo
pgcli demo
And create a simple schema:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
create table person (
id serial primary key,
name varchar
);
create table address (
id serial primary key,
name varchar,
owner integer
);
create table animal (
id serial primary key,
name varchar,
owner integer
);
In this simple model, each person may have zero or more animals where each animal have zero or one owner. And each person may have zero or one addresses where each address is associated with zero or one owner.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
+------------+ +----------+
| | 0..1 0..* | |
| person +-------------+ animal |
| | | |
| | | |
+------------+ +----------+
| 0..1
|
|
|
| 0..*
+-----------+
| |
| address |
| |
| |
+-----------+
We can then insert test data:
1
2
3
4
5
6
7
8
9
10
11
12
13
insert into person (name) values ('kim');
insert into person (name) values ('tom');
insert into person (name) values ('bob');
insert into address (name , owner) values ('Paris 123', 1);
insert into address (name , owner) values ('Bordeaux 123', 1);
insert into address (name , owner) values ('Marseille', 1);
insert into address (name , owner) values ('London', 2);
insert into address (name , owner) values ('Oxford', 2);
insert into address (name , owner) values ('Oxford', 7);
insert into animal (name, owner) values ('cookie', 1);
insert into animal (name, owner) values ('brownie', 1);
With those data, we end up with the following tables:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select * from person;
+------+--------+
| id | name |
|------+--------|
| 1 | kim |
| 2 | tom |
| 3 | bob |
+------+--------+
select * from address;
+------+--------------+---------+
| id | name | owner |
|------+--------------+---------|
| 1 | Paris 123 | 1 |
| 2 | Bordeaux 123 | 1 |
| 3 | Marseille | 1 |
| 4 | London | 2 |
| 5 | Oxford | 2 |
| 6 | Oxford | 7 |
+------+--------------+---------+
select * from animal;
+------+---------+---------+
| id | name | owner |
|------+---------+---------|
| 1 | cookie | 1 |
| 2 | brownie | 1 |
+------+---------+---------+
The first way of joining tables is by selecting from two tables and filtering those where the person.id
equals the address.owner
.
1
2
3
4
5
6
7
8
9
10
select * from person, address WHERE person.id = address.owner;
+------+--------+------+--------------+---------+
| id | name | id | name | owner |
|------+--------+------+--------------+---------|
| 1 | kim | 1 | Paris 123 | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 |
| 1 | kim | 3 | Marseille | 1 |
| 2 | tom | 4 | London | 2 |
| 2 | tom | 5 | Oxford | 2 |
+------+--------+------+--------------+---------+
What we see from here is that the query has returned five rows. This is because a selection on multiple tables will create a row for each combinations which would be 16 rows (3 persons * 6 addresses) and filtering by person.id = address.owner
will then reduce it to only the rows where there is a matching person for the address owner. This is the same as the following query:
1
2
3
4
5
6
7
8
9
10
11
select * from person
join address on address.owner = person.id;
+------+--------+------+--------------+---------+
| id | name | id | name | owner |
|------+--------+------+--------------+---------|
| 1 | kim | 1 | Paris 123 | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 |
| 1 | kim | 3 | Marseille | 1 |
| 2 | tom | 4 | London | 2 |
| 2 | tom | 5 | Oxford | 2 |
+------+--------+------+--------------+---------+
Now if we join the animal
with person
after having join the address
, we will have the following result:
1
2
3
4
5
6
7
8
9
10
11
12
13
select * from person
join address on address.owner = person.id
join animal on animal.owner = person.id;
+------+--------+------+--------------+---------+------+---------+---------+
| id | name | id | name | owner | id | name | owner |
|------+--------+------+--------------+---------+------+---------+---------|
| 1 | kim | 1 | Paris 123 | 1 | 2 | brownie | 1 |
| 1 | kim | 1 | Paris 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 2 | brownie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 3 | Marseille | 1 | 2 | brownie | 1 |
| 1 | kim | 3 | Marseille | 1 | 1 | cookie | 1 |
+------+--------+------+--------------+---------+------+---------+---------+
From the first join, we end up with rows with only id
1
and 2
and when we join the animals
which have 2 rows with owner 1
, we end up with 6 rows. We get all possible combination between person
, address
and animal
. Any row that doesn’t have any matching with the conditions will be excluded from the result.
If we want to keep the results that don’t match, we can use an outer join
:
1
2
3
4
5
6
7
8
9
10
11
12
select * from person
left outer join address on address.owner = person.id;
+------+--------+--------+--------------+---------+
| id | name | id | name | owner |
|------+--------+--------+--------------+---------|
| 1 | kim | 1 | Paris 123 | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 |
| 1 | kim | 3 | Marseille | 1 |
| 2 | tom | 4 | London | 2 |
| 2 | tom | 5 | Oxford | 2 |
| 3 | bob | <null> | <null> | <null> |
+------+--------+--------+--------------+---------+
The left outer join
will leave rows from the left that don’t have any matching. Here bob
doesn’t have any matching addresses hence the address
side is returned as id: <null>, name: <null>, owner: <null>
. We can also do the opposite and use right outer join
:
1
2
3
4
5
6
7
8
9
10
11
12
select * from person
right outer join address on address.owner = person.id;
+--------+--------+------+--------------+---------+
| id | name | id | name | owner |
|--------+--------+------+--------------+---------|
| 1 | kim | 1 | Paris 123 | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 |
| 1 | kim | 3 | Marseille | 1 |
| 2 | tom | 4 | London | 2 |
| 2 | tom | 5 | Oxford | 2 |
| <null> | <null> | 6 | Oxford | 7 |
+--------+--------+------+--------------+---------+
The right outer join
will leave rows from teh right hence here Oxford 7
is the row that doesn’t have any matching person
.
Lastly we can use full outer join
to keep all sides:
1
2
3
4
5
6
7
8
9
10
11
12
13
select * from person
full outer join address on address.owner = person.id;
+--------+--------+--------+--------------+---------+
| id | name | id | name | owner |
|--------+--------+--------+--------------+---------|
| 1 | kim | 1 | Paris 123 | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 |
| 1 | kim | 3 | Marseille | 1 |
| 2 | tom | 4 | London | 2 |
| 2 | tom | 5 | Oxford | 2 |
| <null> | <null> | 6 | Oxford | 7 |
| 3 | bob | <null> | <null> | <null> |
+--------+--------+--------+--------------+---------+
Now that we’ve seen the different type of joins, we can see how the amount of rows returned is based of the cardinality of the relationships and the type of join being used. For example if we join person
and address
, and subsequently left outer join
animal
, we can predict that we will have 8 rows:
kim
and 2 rows for tom
animal
only match with kim
, we end up with 2 * 3 kim
+ 2 unmatched tom
.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
select * from person
join address on address.owner = person.id
left outer join animal on animal.owner = person.id;
+------+--------+------+--------------+---------+--------+---------+---------+
| id | name | id | name | owner | id | name | owner |
|------+--------+------+--------------+---------+--------+---------+---------|
| 1 | kim | 3 | Marseille | 1 | 1 | cookie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 1 | Paris 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 3 | Marseille | 1 | 2 | brownie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 2 | brownie | 1 |
| 1 | kim | 1 | Paris 123 | 1 | 2 | brownie | 1 |
| 2 | tom | 5 | Oxford | 2 | <null> | <null> | <null> |
| 2 | tom | 4 | London | 2 | <null> | <null> | <null> |
+------+--------+------+--------------+---------+--------+---------+---------+
And similarly if we left outer join
both address
and animal
, we’ll end up with 9 rows:
kim
, 2 rows for tom
and 1 unmatched row for bob
,kim
and the 3 extra unmatched rows for tom
and bob
.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select * from person
left outer join address on address.owner = person.id
left outer join animal on animal.owner = person.id;
+------+--------+--------+--------------+---------+--------+---------+---------+
| id | name | id | name | owner | id | name | owner |
|------+--------+--------+--------------+---------+--------+---------+---------|
| 1 | kim | 3 | Marseille | 1 | 1 | cookie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 1 | Paris 123 | 1 | 1 | cookie | 1 |
| 1 | kim | 3 | Marseille | 1 | 2 | brownie | 1 |
| 1 | kim | 2 | Bordeaux 123 | 1 | 2 | brownie | 1 |
| 1 | kim | 1 | Paris 123 | 1 | 2 | brownie | 1 |
| 2 | tom | 5 | Oxford | 2 | <null> | <null> | <null> |
| 2 | tom | 4 | London | 2 | <null> | <null> | <null> |
| 3 | bob | <null> | <null> | <null> | <null> | <null> | <null> |
+------+--------+--------+--------------+---------+--------+---------+---------+
And that concludes today’s post!
In today’s post we looked at the different SQL joins that can be used in Postgres. We went through regular join, left join, right join and full join. While looking at the queries, we’ve seen how the relationships affect the amount of rows being returned. I hope you liked this post and I see you on the next one!