Generate Fake Data In Python With Faker

Oct 9th, 2020 - written by Kimserey.

Reducing repetition in a codebase is a well-understood concept in software development. When writing features, we try to reuse existing functionality so that we don’t duplicate similar logic. Surprisingly, this concept is often skipped when writing tests, where we end up with over a hundred test cases that repeatedly construct input objects to fit every scenario being tested. In most languages (I haven’t checked all of them), developers have addressed this problem by providing ways to fake inputs. In today’s post, we will look into the Python Faker package and how to use it to improve code reuse.

Faker

In tests, we have to compose input objects which are passed to our system under test (or SUT). At the beginning, it’s easy to write fake input like firstname = abc or age=1. But that quickly grows out of hand as we add new test cases, as new developers work on the same codebase, and as more properties are added to the input (even worse if those properties are objects themselves). This approach leads to unmaintainable test suites where comprehensibility drops as the complexity of the input grows.

So, in order to cut the complexity of composing input data, we want to use a tool to generate it. In Python, the Faker package fulfills that role.

pip install faker

Then we instantiate a fake instance:

from faker import Faker

fake = Faker()

From there, we have access to the default providers’ methods to generate fake data. For example, fake.name():

for _ in range(5):
    print(fake.name())


Yvonne Rhodes
Wesley Mcconnell
Amy Webster
Stephen Vasquez
Sabrina Hammond DDS

which comes from the faker.providers.person provider. We can see that a lot of pre-existing methods are available by listing the Provider methods:

from faker.providers.person import Provider

help(Provider)

Help on class Provider in module faker.providers.person:

class Provider(faker.providers.BaseProvider)
 |  Provider(generator)
 |
 |  Method resolution order:
 |      Provider
 |      faker.providers.BaseProvider
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  first_name(self)
 |
 |  first_name_female(self)
 |
 |  first_name_male(self)
 |
 |  first_name_nonbinary(self)
 |
 |  language_name(self)
 |      Generate a random i18n language name (e.g. English).
 |
 |  last_name(self)
 |
 |  last_name_female(self)
 |
 |  last_name_male(self)
 |
 |  last_name_nonbinary(self)
 |
 |  name(self)
 |      :example 'John Doe'
 |
 |  name_female(self)
 |
 |  name_male(self)
 |
 |  name_nonbinary(self)
 |
 |  prefix(self)
 |
 |  prefix_female(self)
 |
 |  prefix_male(self)
 |
 |  prefix_nonbinary(self)
 |
 |  suffix(self)
 |
 |  suffix_female(self)
 |
 |  suffix_male(self)
 |
 |  suffix_nonbinary(self)

or we can look at faker.providers.profile, which provides fake.profile():

In [84]: fake.profile()
Out[84]:
{'job': 'Minerals surveyor',
 'company': 'York, Roth and Hughes',
 'ssn': '410-41-0507',
 'residence': '1008 Hernandez Rue Apt. 777\nWest John, GA 97755',
 'current_location': (Decimal('-87.6969825'), Decimal('-96.917776')),
 'blood_group': 'A-',
 'website': ['http://www.barker.org/',
  'https://www.roach.biz/',
  'https://wallace-ray.net/',
  'http://www.greene.com/'],
 'username': 'jennifergarcia',
 'name': 'Joseph Wilson',
 'sex': 'M',
 'address': '395 Danielle Stravenue Suite 795\nEast Brandon, AZ 09248',
 'mail': '[email protected]',
 'birthdate': datetime.date(1930, 8, 14)}

We can see that fake.profile() generated a whole dictionary of fake data which makes it easy for us to use in tests.

This is essentially how Faker is used: we import it and call its methods to generate random data.

Faker Providers

As we saw, the person and profile providers give us methods to create a person (first_name, last_name, prefix, suffix) and a profile (profile or simple_profile). There are all sorts of providers, and the list can be seen under faker.providers:

import faker

help(faker.providers)

Help on package faker.providers in faker:

NAME
    faker.providers

PACKAGE CONTENTS
    address (package)
    automotive (package)
    bank (package)
    barcode (package)
    color (package)
    company (package)
    credit_card (package)
    currency (package)
    date_time (package)
    file (package)
    geo (package)
    internet (package)
    isbn (package)
    job (package)
    lorem (package)
    misc (package)
    person (package)
    phone_number (package)
    profile (package)
    python (package)
    ssn (package)
    user_agent (package)

These providers generate fake data for values that would be tedious to write by hand, such as bank-related values or phone numbers:

In [100]: fake.phone_number()
Out[100]: '270.130.6750x46207'

In [101]: fake.phone_number()
Out[101]: '(513)291-1492'

In [102]: fake.phone_number()
Out[102]: '001-118-246-1055'

In [103]: fake.iban()
Out[103]: 'GB33PFFS96993009295239'

In [104]: fake.iban()
Out[104]: 'GB37KNCU53825255040525'

For regular Python types, the python provider allows us to generate fake data too:

In [107]: fake.pybool()
Out[107]: True

In [108]: fake.pydecimal()
Out[108]: Decimal('-32.2712863')

In [109]: fake.pydecimal(3, 2, False)
Out[109]: Decimal('-264.29')

In [110]: fake.pydecimal(3, 2, False)
Out[110]: Decimal('919.83')

or make a random pick:

In [117]: fake.random_element(elements=("Active", "Inactive", "Suspended"))
Out[117]: 'Suspended'

In [118]: fake.random_element(elements=("Active", "Inactive", "Suspended"))
Out[118]: 'Inactive'

In [119]: fake.random_element(elements=("Active", "Inactive", "Suspended"))
Out[119]: 'Active'

As we saw here, some of the methods accept arguments, positional or keyword, to specify the boundaries of the fake data.

Usage

With our knowledge of Faker, we can now create functions in our test cases which generate the input necessary for our SUT:

def generate_participants():
    return { "id": fake.pyint(0, 100), "name": fake.name(), "email": fake.email() }

def generate_event(**kwargs):
    res = {
        "event_name": fake.bs(),
        "start_date": fake.date_between('now', '+1y'),
        "participants": [generate_participants() for _ in range(fake.pyint(1, 5))],
        "status": fake.random_element(elements=("Active", "Cancelled")),
        "website": fake.uri(),
        "address": fake.address()
    }
    res.update(kwargs)
    return res

And we can then create events quickly:

In [184]: generate_event()
Out[184]:
{'event_name': 'maximize end-to-end metrics',
 'start_date': datetime.date(2020, 11, 14),
 'participants': [{'id': 24,
   'name': 'Harold Luna',
   'email': '[email protected]'},
  {'id': 95, 'name': 'Kristen Lopez', 'email': '[email protected]'},
  {'id': 34, 'name': 'Julie Mahoney', 'email': 'mi[email protected]'}],
 'status': 'Active',
 'website': 'https://woodward-duffy.org/post.html',
 'address': '645 Valerie Lane Suite 781\nMckinneyside, NH 14414'}

In [185]: generate_event()
Out[185]:
{'event_name': 'repurpose seamless e-services',
 'start_date': datetime.date(2020, 12, 13),
 'participants': [{'id': 84,
   'name': 'Kimberly Peterson',
   'email': '[email protected]'}],
 'status': 'Cancelled',
 'website': 'http://www.clark.com/explore/index/',
 'address': '02596 Nathan Dale Suite 759\nJamesland, CO 24083'}

In [186]: generate_event()
Out[186]:
{'event_name': 'extend cross-media relationships',
 'start_date': datetime.date(2020, 12, 31),
 'participants': [{'id': 41,
   'name': 'Amanda Watkins',
   'email': '[email protected]'},
  {'id': 85, 'name': 'Eric James', 'email': '[email protected]'}],
 'status': 'Cancelled',
 'website': 'https://www.haney.biz/categories/search/list/post/',
 'address': '40210 Miranda Centers\nBrianside, AL 29138'}

We also left open the possibility to override attributes of the event with hardcoded values, which lets us pass in data specific to a given test. For example, if we wanted to make sure that the event name is not allowed to contain special characters like # or @, we could override the event name:

In [192]: generate_event(event_name="[email protected]")
Out[192]:
{'event_name': '[email protected]',
 'start_date': datetime.date(2021, 9, 22),
 'participants': [{'id': 89,
   'name': 'John Thomas',
   'email': '[email protected]'},
  {'id': 35, 'name': 'Traci Wells', 'email': '[email protected]'}],
 'status': 'Active',
 'website': 'http://www.wise-hanna.com/',
 'address': '202 Mark Courts Suite 208\nLake Bill, MD 41906'}

which would give us an event ready for our test.

And that concludes today’s post!

Conclusion

In today’s post we looked into Faker, which we used to generate fake data for our tests. We started by looking at how to import Faker and use it quickly. We then looked in more detail at how to find fake data in the providers, and lastly we showcased how Faker can be used to build functions that generate completely fake events and participants, and how we can override parts of them to make writing tests easier. I hope you liked this post and I’ll see you in the next one!

Designed, built and maintained by Kimserey Lam.