Uellue's Blog

Python attribute caching with low overhead

For my work I need to run various lengthy numerical calculations on large datasets, and the results do not change as long as the data stays the same. The program is interactive, so I do not know in advance which calculations I will need on which data. The results depend only on the dataset; there are no additional parameters. The usual way to implement this is a cache plus code that checks whether the result is already in the cache before doing the calculation. I developed a generic technique that caches the results in such a way that no function call is necessary at all if the result is cached. It works by dynamically creating and deleting object attributes. If the attribute is set, it is used directly. If it is not set, Python calls __getattr__, which runs the corresponding function to calculate the value and then sets the attribute. If the data changes, all designated attributes are unset and will be recalculated the next time they are accessed.
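
The whole trick rests on one rule of Python: __getattr__ is only called when normal attribute lookup fails. Here is a minimal sketch of just that part. The class Lazy, the attribute total and the counter calls are hypothetical names made up for this illustration.

```python
class Lazy(object):
    """Hypothetical minimal example; all names are illustrative only."""

    def __init__(self, data):
        self._data = list(data)
        self.calls = 0  # counts how often __getattr__ computes a value

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name != 'total':
            raise AttributeError(name)
        self.calls += 1
        value = sum(self._data)
        # Store the result as a real instance attribute. Later reads find
        # it in the instance __dict__ and never reach __getattr__.
        self.total = value
        return value

    def append(self, x):
        self._data.append(x)
        # Unset the cached attribute so it is recomputed on next access.
        self.__dict__.pop('total', None)


obj = Lazy([1, 2, 3])
first = obj.total    # computed via __getattr__
second = obj.total   # read directly from the instance
obj.append(4)
third = obj.total    # recomputed after invalidation
```

After these lines obj.calls is 2: one calculation for the first access and one after append() removed the cached attribute. The second access did not go through __getattr__ at all.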

A class for lists demonstrates how it works:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Works similarly with Python 2.x, apart from some syntax differences.

class CachedList(object):
    # Attributes that can be cached.
    # The keys of a dict give a fast lookup of whether an attribute is managed.
    # Managed names must not start with an underscore: the calculating method
    # would then start with two underscores and be subject to name mangling,
    # so getattr() could not find it.
    # Each managed attribute needs a method that calculates its value, named
    # like the attribute plus a leading underscore: average -> _average(), ...
    _managed_attributes = {'average': 1, 'sum': 1, 'len': 1, 'exc': 1}
    def __init__(self, l):
        # copy the list
        self._list = list(l)
        self._exception_cache = {}
        
    # This method is called automatically if an undefined attribute is accessed.
    # If the attribute already exists, this method will not be called.
    def __getattr__(self, attr):
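        # Names with a leading underscore are never managed. Rejecting them
        # first also prevents infinite recursion if _exception_cache itself
        # is not set yet, for example while the object is copied or unpickled.
        if attr.startswith('_'):
            raise AttributeError(attr)
        # Avoid recalculating attributes that raised an exception;
        # raise the cached exception instead.
        # Some of my calculations fail with an exception after a very long
        # calculation, so it makes sense to cache the exception.
        exception = self._exception_cache.get(attr)
        if exception is not None:
            raise exception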
        if attr in self._managed_attributes:
            # Try to get and call a method that "makes" the entry
            f = getattr(self, '_'+attr)
            try:
                # Just some tracing code for the example
                print("calculate {0}".format(attr))
                result = f()
                setattr(self, attr, result)
            except Exception as e:
                # "remember" the exception
                self._exception_cache[attr] = e
                raise
        else:
            raise AttributeError("'{0}' object has no attribute '{1}'".format(type(self).__name__, attr))
        # result has been set if this point is reached
        return result
    
    def append(self, x):
        self._list.append(x)
        self._invalidate_cache()
        
    def _invalidate_cache(self):
        # just some tracing
        print("empty cache")
        self._exception_cache = {}
        for a in self._managed_attributes:
            try:
                delattr(self, a)
            except AttributeError:
                pass
            
            
    def _average(self):
        # Cached attributes can be used just like normal attributes
        return self.sum/self.len
        
    def _sum(self):
        return sum(self._list)
        
    def _len(self):
        return len(self._list)
        
    def _exc(self):
        raise Exception("Exception!")
    
# now test the code!    

l = range(10000000)
cl = CachedList(l)
# attribute is calculated and set
print(cl.average)
# attribute is used
print(cl.average)
# cache is cleared
cl.append(100000000)
# attribute is recalculated
print(cl.average)
# exceptions are also cached
try:
    print(cl.exc)
except Exception as e:
    print(e)

# second access raises the cached exception without recalculating
try:
    print(cl.exc)
except Exception as e:
    print(e)

user@host:~$ ./propertycache.py

calculate average
calculate sum
calculate len
4999999.5
4999999.5
empty cache
calculate average
calculate sum
calculate len
5000008.9999991
calculate exc
Exception!
Exception!
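
To get a rough feeling for the overhead, one can compare a plain attribute read with a conventional cache that is hidden behind a method call. The following is only a sketch under my own assumptions: AttrCached and DictCached are hypothetical stand-in classes, the cached value 42 is arbitrary, and the timings depend on the machine and the Python version, so I do not show any numbers.

```python
import timeit


class AttrCached(object):
    # Hypothetical stand-in: the result is already stored as an attribute.
    def __init__(self):
        self.value = 42


class DictCached(object):
    # Hypothetical conventional cache: every read goes through a method
    # call that checks a dict before returning the result.
    def __init__(self):
        self._cache = {'value': 42}

    def value(self):
        if 'value' not in self._cache:
            self._cache['value'] = 42
        return self._cache['value']


a = AttrCached()
d = DictCached()
n = 200000
# Both variants are wrapped in a lambda, so that cost is the same for each.
t_attr = timeit.timeit(lambda: a.value, number=n)
t_call = timeit.timeit(lambda: d.value(), number=n)
print("attribute read: {0:.4f} s".format(t_attr))
print("method + dict:  {0:.4f} s".format(t_call))
```

The attribute read skips both the method call and the dictionary check, which is the saving the technique is after.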