Python 101: An Intro to Object Serialization with Pickle

Python's "batteries included" philosophy extends to object serialization, which the standard library provides through the pickle module. Serialization goes by other names, such as marshalling or flattening. In Python, it's known as "pickling". In Python 2, the pickle module also has an optimized C-based version known as cPickle that, according to the documentation, can run up to 1000 times faster than the pure-Python pickle. The documentation also comes with an important warning, so it is reprinted below:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
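
To make the warning a little more concrete, here is a minimal sketch of one way to limit what an unpickler is allowed to load. It follows the "restricting globals" pattern from the Python 3 documentation and assumes Python 3. The names SAFE_BUILTINS, RestrictedUnpickler and restricted_loads are made up for this example. Treat it as a way to reduce risk, not as something that makes untrusted pickles safe.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals this unpickler will resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything that is not on the allow-list.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals can be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and numbers load as usual, while a pickle that refers to anything outside the allow-list (for example the built-in len function) raises pickle.UnpicklingError instead of being resolved.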

Now that we have that out of the way, we can start learning how to use pickle! By the end of this post, you may be hungry!

Writing a Simple Pickle Script

We will start out by writing a simple script that demonstrates how to pickle a Python list. Here's the code:

import pickle

#----------------------------------------------------------------------
def serialize(obj, path):
    """
    Pickle a Python object
    """
    with open(path, "wb") as pfile:
        pickle.dump(obj, pfile)
        
#----------------------------------------------------------------------
def deserialize(path):
    """
    Extracts a pickled Python object and returns it
    """
    with open(path, "rb") as pfile:
        data = pickle.load(pfile)
    return data

#----------------------------------------------------------------------
if __name__ == "__main__":
    my_list = [i for i in range(10)]
    pkl_path = "data.pkl"
    serialize(my_list, pkl_path)
    saved_list = deserialize(pkl_path)
    print(saved_list)

Let's take a few minutes to study this code. We have two functions, the first of which is for saving (or pickling) a Python object. The second is for deserializing (or unpickling) the object. To do the serialization, you just need to call pickle's dump function and pass it the object to be pickled and an open file handle. To deserialize the object, you just call pickle's load function. You can pickle multiple objects into one file, but the pickling works like a FIFO (first in, first out) queue, so you'll get the items out in the order that you put them in. Let's change the code above to demonstrate this concept!

import pickle

#----------------------------------------------------------------------
def serialize(objects, path):
    """
    Pickle each Python object in a list, one after another
    """
    with open(path, "wb") as pfile:
        for obj in objects:
            pickle.dump(obj, pfile)
        
#----------------------------------------------------------------------
def deserialize(path):
    """
    Extracts the three pickled Python objects and returns them
    """
    with open(path, "rb") as pfile:
        lst = pickle.load(pfile)
        dic = pickle.load(pfile)
        string = pickle.load(pfile)
    return lst, dic, string

#----------------------------------------------------------------------
if __name__ == "__main__":
    my_list = [i for i in range(10)]
    my_dict = {"a":1, "b":2}
    my_string = "I'm a string!"
    
    pkl_path = "data.pkl"
    serialize([my_list, my_dict, my_string], pkl_path)
    
    data = deserialize(pkl_path)
    print(data)

In this code, we pass in a list of 3 Python objects: a list, a dictionary and a string. Note that we have to call pickle's dump method to store each of these objects. When you deserialize, you'll need to call pickle's load method the same number of times.
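
If you don't know ahead of time how many objects are in the file, one option is to keep calling load until pickle raises EOFError. Here is a small sketch of that idea. The helper names dump_all, load_all and round_trip are invented for this example, and round_trip uses a temporary file only so that the snippet cleans up after itself.

```python
import os
import pickle
import tempfile


def dump_all(objects, path):
    """Pickle each object in turn into the same file."""
    with open(path, "wb") as pfile:
        for obj in objects:
            pickle.dump(obj, pfile)


def load_all(path):
    """Yield every pickled object in the file, in the order it was written."""
    with open(path, "rb") as pfile:
        while True:
            try:
                yield pickle.load(pfile)
            except EOFError:
                # pickle.load raises EOFError when there is nothing left to read
                break


def round_trip(objects):
    """Write the objects to a temporary file and read them all back."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    os.close(fd)
    try:
        dump_all(objects, path)
        return list(load_all(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(round_trip([[0, 1, 2], {"a": 1, "b": 2}, "I'm a string!"]))
```

The objects come back in the same order they went in, which is the first in, first out behavior described earlier.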

Other Notes about Pickling

You can't pickle everything. For example, you cannot pickle Python objects that have ties to C/C++ underneath, such as wxPython widgets. If you try to, you'll typically receive a PicklingError. According to the documentation, the following types can be pickled:

  • None, True, and False
  • integers, long integers, floating point numbers, complex numbers
  • normal and Unicode strings
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).
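
A quick way to find out whether a particular object falls into one of these categories is to try pickling it in memory and see what happens. The is_picklable helper below is a made-up name for this sketch. It catches TypeError and AttributeError alongside PicklingError because, depending on the object and the Python version, any of the three may be raised.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts the object, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    # Containers of picklable objects are fine
    print(is_picklable([1, 2.0, {"a": (None, True)}]))
    # A lambda is not defined under a name at the top level of a module
    print(is_picklable(lambda x: x))
```

The first call reports that the nested list is picklable, while the lambda is rejected because pickle stores functions by name and cannot look a lambda up again later.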

Also note that if you use the cPickle module for the speedup in processing time, you cannot subclass its Pickler() and Unpickler(). The cPickle module does not support subclassing them because they're actually functions in cPickle, not classes. That's a rather sneaky gotcha that you need to be aware of.
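
If you want the cPickle speedup on Python 2 without tying your script to it, a common pattern is to try importing cPickle first and fall back to the regular module. This is only a sketch. On Python 3 there is no cPickle module, so the fallback branch always runs, and the standard pickle module already uses its C accelerator when one is available.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3, or a build without cPickle

# The rest of the script uses the name "pickle" either way
data = pickle.dumps({"a": 1, "b": 2})
restored = pickle.loads(data)
```

Because both modules expose the same dump, dumps, load and loads functions, the rest of the code doesn't need to care which one was imported.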

By default, pickle's output data format uses a printable ASCII representation. Let's take a look at the file that the second script writes, just for fun:

(lp0
I0
aI1
aI2
aI3
aI4
aI5
aI6
aI7
aI8
aI9
a.(dp0
S'a'
p1
I1
sS'b'
p2
I2
s.S"I'm a string!"
p0
.

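Now, I'm not an expert on this format, but you can kind of see what's going on. Each pickled object ends with a period, which is the opcode that tells the unpickler to stop, so the three periods that sit outside the quoted strings mark the ends of the list, the dictionary and the string. Also note that in Python 2 the pickle module uses protocol version 0 by default. Protocols 1 and 2 are also available there, and Python 3 adds newer ones. You can specify which protocol you want by passing it in as the 3rd argument to pickle's dump function.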
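
Here is a short sketch that compares two protocols and uses the standard library's pickletools module to print a readable breakdown of the protocol 0 output. Note that with dumps the protocol is the 2nd argument, since there is no file handle. The variable names are just for illustration.

```python
import pickle
import pickletools

data = list(range(10))

# Protocol 0 is the printable ASCII format, protocol 2 is a binary format
text_pickle = pickle.dumps(data, 0)
binary_pickle = pickle.dumps(data, 2)

# Every pickle ends with the STOP opcode, which is a single period
ends_with_stop = text_pickle.endswith(b".") and binary_pickle.endswith(b".")

if __name__ == "__main__":
    print(len(text_pickle), len(binary_pickle))
    # Print one line per opcode for the protocol 0 pickle
    pickletools.dis(text_pickle)
```

The binary pickle comes out smaller than the ASCII one, and both load back to the same list with pickle.loads.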

Finally, there's a really cool video on the pickle module from PyCon 2011 by Richard Saunders.

Wrapping Up

At this point, you should be able to use pickle for your own data serialization needs. Have fun!

Copyright © 2024 Mouse Vs Python | Powered by Pythonlibrary