Memoizing expensive functions in python and saving results

| categories: programming | tags:

Sometimes a function is expensive (time-consuming) to run, and you would like to save all the results of the function having been run to avoid having to rerun them. This is called memoization. A wrinkle on this problem is to save the results in a file so that later you can come back to a function and not have to run simulations over again.

In python, a good way to do this is to "decorate" your function. This way, you write the function to do what you want, and then "decorate" it. The decoration wraps your function and in this case checks if the arguments you passed to the function are already stored in the cache. If so, it returns the result, if not it runs the function. The memoize decorator below was adapted from here.

from functools import wraps
def memoize(func):
    cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            print 'Running func'
            cache[args] = func(*args)
        else:
            print 'result in cache'
        return cache[args]
    return wrap

@memoize
def myfunc(a):
    return a**2

print myfunc(2)
print myfunc(2)

print myfunc(3)
print myfunc(2)
Running func
4
result in cache
4
Running func
9
result in cache
4

The example above shows the principle, but each time you run that script you start from scratch. If those were expensive calculations that would not be desirable. Let us now write out the cache to a file. We use a simple pickle file to store the results.

import os, pickle
from functools import wraps
def memoize(func):
    if os.path.exists('memoize.pkl'):
        print 'reading cache file'
        with open('memoize.pkl') as f:
            cache = pickle.load(f)
    else:
        cache = {}
    @wraps(func)
    def wrap(*args):
        if args not in cache:
            print 'Running func'
            cache[args] = func(*args)
            # update the cache file
            with open('memoize.pkl', 'wb') as f:
                pickle.dump(cache, f)
        else:
            print 'result in cache'
        return cache[args]
    return wrap

@memoize
def myfunc(a):
    return a**2


print myfunc(2)
print myfunc(2)

print myfunc(3)
print myfunc(2)
reading cache file
result in cache
4
result in cache
4
result in cache
9
result in cache
4

Now you can see if we run this script a few times, the results are read from the cache file.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter