Serialization With Marshmallow

Oct 25th, 2019 - written by Kimserey.

Marshmallow is a library for converting different datatypes to and from Python objects. Its most common use is to deserialize JSON objects into Python objects, or to serialize Python objects into JSON objects, in a web API. Marshmallow does this through the definition of a schema, which can apply rules to validate the data being deserialized or change the way data is serialized. Today we will look in more detail at how to use Marshmallow, how to apply validation on fields and how to configure schemas through the Meta class.

Schema
Fields Validation
Meta

Schema

We start first by installing marshmallow.

pip install marshmallow

The main component of Marshmallow is the Schema. A schema defines the rules that guide deserialization, called load, and serialization, called dump. It lets us define the fields that will be loaded or dumped and add requirements on those fields, such as validation or being required. It also lets us inject computation to transform the data during load and dump.

from marshmallow import Schema, fields

class UserSchema(Schema):
    firstname = fields.Str()
    lastname = fields.Str(required=True)

schema = UserSchema()
data = { "firstname": "Kim" }
user = schema.load(data)

For example, here we define a simple schema with two string fields, where lastname is required. When we try to load an object without lastname, we get the following validation error:

ValidationError: {'lastname': ['Missing data for required field.']}

By adding the lastname, the object will be correctly deserialized. There are times when the property name we receive differs from the one we wish to deserialize to. We can use data_key to specify the field name in the raw object.

class UserSchema(Schema):
    firstname = fields.Str(data_key="name")
    lastname = fields.Str(required=True)

schema = UserSchema()
data = { "name": "Kim", "lastname": "lam"}
user = schema.load(data)

And that will deserialize to {'lastname': 'lam', 'firstname': 'Kim'}.

When creating the schema, we can pass arguments:

only: a list of the only fields to consider for dump and load,
exclude: a list of fields to exclude from dump and load,
many: whether the schema handles a collection of objects rather than a single object,
context: a context object made available during dump and load,
load_only: a list of fields to be considered only during load,
dump_only: a list of fields to be considered only during dump,
partial: a list of required fields that may be omitted on load, or True to allow any of them to be omitted,
unknown: the behavior to apply to unknown fields (EXCLUDE, INCLUDE, RAISE).

For example,

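from marshmallow import EXCLUDE, Schema, fields

class UserSchema(Schema):
    firstname = fields.Str(data_key="name")
    lastname = fields.Str(required=True)
    password = fields.Str()
    age = fields.Integer(required=True)

# EXCLUDE is the constant exported by marshmallow; recent versions reject other spellings such as 'EXCLUDE'
schema = UserSchema(load_only=['password'], unknown=EXCLUDE, partial=['age'])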

With this schema, we added a password field and an age field. password is load-only, so it is excluded on dump. Any unknown field is excluded on load. age is partial: it can be omitted, but if it is provided it cannot be None.

data = schema.load(dict(name='kim', lastname='lam', password='123', something='123'))
# data = {'lastname': 'lam', 'password': '123', 'firstname': 'kim'} <= something is excluded
res = schema.dump(data)
# res = {'lastname': 'lam', 'name': 'kim'} <= password is not dumped

For collection of objects, we can use many:

from marshmallow import EXCLUDE

schema = UserSchema(load_only=['password'], unknown=EXCLUDE, partial=['age'], many=True)
u = dict(name='kim', lastname='lam', password='123', something='123')
data = schema.load([u, u, u])
# data = [{'password': '123', 'lastname': 'lam', 'firstname': 'kim'},
#         {'password': '123', 'lastname': 'lam', 'firstname': 'kim'},
#         {'password': '123', 'lastname': 'lam', 'firstname': 'kim'}]
schema.dump(data)                                                       
# [{'lastname': 'lam', 'name': 'kim'},
#  {'lastname': 'lam', 'name': 'kim'},
#  {'lastname': 'lam', 'name': 'kim'}]

Marshmallow also provides hooks to perform transformations before and after dump or load. One use for them is to deserialize into a custom type, which can be achieved with the @post_load hook:

from marshmallow import Schema, fields, post_load

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return '<User {}/{}>'.format(self.name, self.age)

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()

    # Runs after validation; the returned value becomes the result of load()
    @post_load
    def make_user(self, data, **kwargs):
        return User(**data)

When we load data, we will directly get back a User:

schema = UserSchema()
user = schema.load({'name': 'Tom', 'age': 60}) # <User Tom/60>

Now that we know the basics of schema creation, we can start to look into how fields can be validated.

Fields Validation

Each schema contains fields which define the fields to load or dump. Fields are defined using the fields module.

The types available to define fields are:

Mapping,
Dict,
List,
Tuple,
String,
UUID,
Number,
Integer,
Decimal,
Boolean,
Float,
DateTime,
NaiveDateTime,
AwareDateTime,
Time,
Date,
TimeDelta,
Url,
URL,
Email,
Method,
Function,
Str,
Bool,
Int,
Constant,
Pluck.

Str, Bool, Int and URL are aliases for String, Boolean, Integer and Url respectively.

We define fields by specifying their types:

name = fields.Str()

Each field derives from Field, which provides common configuration options:

default: value used in serialization (dump) when the value is missing (named dump_default in marshmallow 3.13 and later).
missing: value used in deserialization (load) when the value is missing (named load_default in marshmallow 3.13 and later).
data_key: the key in the raw data, used when it differs from the field name.
validate: a validator or a list of validators (we will see more about validation later).
required: specifies whether the field is required during deserialization.
allow_none: specifies whether None is a valid value during deserialization. It defaults to True when missing is set to None, else False.
load_only: if True, the field is skipped during serialization (since it is only for load).
dump_only: if True, the field is skipped during deserialization (since it is only for dump).
error_messages: a dictionary used to override the default error messages.

For example, previously we defined

class UserSchema(Schema):
    firstname = fields.Str(data_key="name")
    lastname = fields.Str(required=True)
    password = fields.Str()
    age = fields.Integer(required=True)

a schema containing four fields, two required and one having a different key on the raw data.

Another important part of fields is validation. It is specified through the validate argument, which takes a single validator or a list of validators. The built-in validators come from the validate module.

from marshmallow import Schema, fields, validate

class UserSchema(Schema):
    firstname = fields.Str(validate=validate.Length(min=1))

Validation only occurs at deserialization and not at serialization. When the validation fails, a ValidationError exception is raised.

The built-in validators are:

ContainsOnly: validates that the value is a subset of the allowed values,
Email: validates that the value is an email address,
Equal: validates that the value equals the comparison value,
Length: validates the length of the value using len(),
NoneOf: validates that the value is not one of the forbidden values,
OneOf: validates that the value is one of the allowed values,
Predicate: validates by calling the named method on the value and checking that it returns a truthy result,
Range: validates that the value lies within a range,
Regexp: validates the value against a regular expression,
URL: validates that the value is a URL.

It’s also possible to pass our own validation function directly in validate:

from marshmallow import Schema, fields, ValidationError

def validate_name(name):
    if name != 'kim':
        raise ValidationError('Name must be kim')

class UserSchema(Schema):
    name = fields.Str(validate=validate_name)

schema = UserSchema()
errors = schema.validate({'name': 'tom'})
# errors = {'name': ['Name must be kim']} <= validate returns the errors, while load would raise ValidationError

It’s also possible to write the validator as a method of the schema itself using the @validates decorator:

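from marshmallow import Schema, fields, validates, ValidationError

class UserSchema(Schema):
    name = fields.Str()

    # **kwargs absorbs extra keyword arguments that newer marshmallow versions pass to validators
    @validates('name')
    def validate_name(self, name, **kwargs):
        if name != 'kim':
            raise ValidationError('Name must be kim')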

Besides the simple types, a field can also hold another schema, which is defined with fields.Nested().

class UserProfile(Schema):
    address = fields.Str()

class UserSchema(Schema):
    name = fields.Str()
    profile = fields.Nested(UserProfile)

Like schemas, Nested accepts the arguments exclude, only, unknown and many, which are applied to the underlying schema: exclude defines the fields to exclude from the nested schema, only defines the fields to include from the nested schema, unknown defines the behavior on unknown fields, and many defines whether the field is a collection of the underlying schema.

The nested schema can also be specified by its class name as a string. This allows two-way nesting, where two schemas reference each other in a circular way.

class UserProfile(Schema):
    address = fields.Str()
    user = fields.Nested('UserSchema')

class UserSchema(Schema):
    name = fields.Str()
    profile = fields.Nested(UserProfile)

Conclusion

Today we looked at Marshmallow, a library used for converting different datatypes to Python objects with JSON-friendly serialization and deserialization. We started by looking into how Marshmallow uses its Schema type to create schemas that convert to and from objects, how schemas can be used to add validation, and how we can specify fields to exclude, include or partially include. We then moved on to the configuration of fields, looking into how we can apply validation, which types are supported by default, which validators are supported by default and how we can extend the validation. Lastly we completed the post by talking about the Meta class, which allows us to share configuration of the schema between different instances of the schema. I hope you liked this post and I'll see you in the next one!