Python attribute caching with low overhead
For my work I need to run various lengthy numerical calculations on large datasets, and the results do not change as long as the data stays the same. The program is interactive, so I do not know in advance which calculations I will need on which data. The results depend only on the dataset; there are no additional parameters. The usual way to implement this is a cache plus code that checks whether the result is already in the cache before doing the calculation.
I developed a generic technique that caches the results in such a way that no method call is necessary at all when the result is cached. It works by dynamically creating and deleting object attributes. If the attribute is set, it is used directly. If it is not set, a corresponding method that calculates the value is called and the attribute is set. If the data changes, all designated attributes are deleted and will be recalculated the next time they are accessed.
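The core of the mechanism can be sketched in a few lines. This is only an illustration: the class `LazySquare` and its `fallback_calls` counter are made-up names, not part of my program. The counter shows that `__getattr__` runs on a cache miss only, while a cache hit is a plain attribute lookup.

```python
class LazySquare(object):
    """Hypothetical example class, used only to illustrate the mechanism."""

    def __init__(self, n):
        self.n = n
        self.fallback_calls = 0  # counts how often __getattr__ runs

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        if name != 'square':
            raise AttributeError(name)
        self.fallback_calls += 1
        value = self.n * self.n
        # Store the result as an ordinary instance attribute.
        setattr(self, name, value)
        return value


obj = LazySquare(12)
first = obj.square    # miss: __getattr__ calculates and stores the value
second = obj.square   # hit: plain attribute lookup, __getattr__ is not called
calls_after_hit = obj.fallback_calls

del obj.square        # invalidate: the next access falls through again
third = obj.square
calls_after_invalidation = obj.fallback_calls
```

After the second access the counter is still 1; it only increases again once the attribute has been deleted.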
A class for lists demonstrates how it works:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Works similarly with Python 2.x, apart from some syntax differences.


class CachedList(object):
    # Names of the attributes that can be cached.
    # A set gives a fast check of whether an attribute is managed.
    # Names must not start with an underscore: the calculating method would
    # then begin with two underscores and be subject to name mangling.
    # A method that calculates the result has to be defined for each of
    # these attributes. It has the same name as the attribute plus a
    # leading underscore: average -> _average(), ...
    _managed_attributes = frozenset(['average', 'sum', 'len', 'exc'])

    def __init__(self, values):
        # Copy the list.
        self._list = list(values)
        self._exception_cache = {}

    # This method is called automatically if an undefined attribute is accessed.
    # If the attribute already exists, this method will not be called.
    def __getattr__(self, attr):
        # Reject unmanaged names first. This also avoids infinite recursion
        # if _exception_cache itself is missing, for example when the object
        # is created without calling __init__, as copy and pickle do.
        if attr not in self._managed_attributes:
            raise AttributeError("'{0}' object has no attribute '{1}'".format(
                type(self).__name__, attr))
        # Avoid recalculating attributes that raised an exception.
        # Instead raise the cached exception.
        # Some of my calculations fail with an exception after a very long
        # calculation, so it makes sense to cache the exception.
        exception = self._exception_cache.get(attr)
        if exception is not None:
            raise exception
        # Get the method that "makes" the entry.
        f = getattr(self, '_' + attr)
        try:
            # Just some tracing code for the example.
            print("calculate {0}".format(attr))
            result = f()
            setattr(self, attr, result)
        except Exception as e:
            # "Remember" the exception.
            self._exception_cache[attr] = e
            raise
        return result

    def append(self, x):
        self._list.append(x)
        self._invalidate_cache()

    def _invalidate_cache(self):
        # Just some tracing.
        print("empty cache")
        self._exception_cache = {}
        for a in self._managed_attributes:
            try:
                delattr(self, a)
            except AttributeError:
                # The attribute was not cached, nothing to delete.
                pass

    def _average(self):
        # Cached attributes can be used just like normal attributes.
        return self.sum / self.len

    def _sum(self):
        return sum(self._list)

    def _len(self):
        return len(self._list)

    def _exc(self):
        raise Exception("Exception!")


# Now test the code!
values = range(10000000)
cl = CachedList(values)
# Attribute is calculated and set.
print(cl.average)
# Attribute is used.
print(cl.average)
# Cache is cleared.
cl.append(100000000)
# Attribute is recalculated.
print(cl.average)
# Exceptions are also cached.
try:
    print(cl.exc)
except Exception as e:
    print(e)
try:
    print(cl.exc)
except Exception as e:
    print(e)
user@host:~$ ./propertycache.py
calculate average
calculate sum
calculate len
4999999.5
4999999.5
empty cache
calculate average
calculate sum
calculate len
5000008.9999991
calculate exc
Exception!
Exception!
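Since the technique is generic, the lookup and invalidation logic could also live in a shared base class. The following is a rough sketch under that assumption; `AttributeCache` and `Samples` are hypothetical names, and the exception caching and tracing from the list example are left out to keep it short.

```python
class AttributeCache(object):
    """Hypothetical base class; subclasses list cached names in _managed_attributes."""

    _managed_attributes = frozenset()

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. on a cache miss.
        if attr not in type(self)._managed_attributes:
            raise AttributeError(attr)
        value = getattr(self, '_' + attr)()
        setattr(self, attr, value)
        return value

    def _invalidate_cache(self):
        for attr in type(self)._managed_attributes:
            # Cached values live in the instance dictionary.
            vars(self).pop(attr, None)


class Samples(AttributeCache):
    """Hypothetical data class that uses the base class."""

    _managed_attributes = frozenset(['minimum', 'maximum', 'spread'])

    def __init__(self, values):
        self._values = list(values)

    def add(self, value):
        self._values.append(value)
        self._invalidate_cache()

    def _minimum(self):
        return min(self._values)

    def _maximum(self):
        return max(self._values)

    def _spread(self):
        # Uses two other cached attributes.
        return self.maximum - self.minimum


s = Samples([3, 9, 4])
spread_before = s.spread
cached_names = sorted(vars(s))    # cached values appear as instance attributes
s.add(20)
names_after_add = sorted(vars(s))  # only the data itself is left
spread_after = s.spread
```

Inspecting `vars(s)` shows that the cached results are ordinary instance attributes, which is why reading them costs no more than any other attribute access.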