XD blog

lambda function, memoization, python


2015-02-16 Delay evaluation

The following class is meant to be a kind of repository for many tables. Its main issue is that it loads everything up front. That takes time and may be unnecessary if not all the tables are required.

import pandas

class DataContainer:
    def __init__( self, big_tables ):
        self.big_tables = big_tables
        
    def __getitem__(self, i):
        return self.big_tables[i]
        
filenames = [ "file1.txt", "files2.txt" ]
          
def load(filename):
    return pandas.read_csv(filename, sep="\t")
    
container = DataContainer ( [ load(f) for f in filenames ] )

So the goal is to load the data only when it is required. But I would like to avoid tweaking the interface of the class, and the logic that loads the data should stay outside the container. However, I would like an access to the container to trigger the loading of the data. So instead of giving the class DataContainer the data itself, I give it functions able to load the data. Each lambda binds its file name through a default argument (t=f); without it, every lambda would see the last value of f.

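def memoize(f):
    # Cache the results of a method, one cache per instance.
    def helper(self, x):
        # Keeping the cache on the instance prevents two containers
        # from sharing results for the same index.
        memo = self.__dict__.setdefault("_memo", {})
        if x not in memo:
            memo[x] = f(self, x)
        return memo[x]
    return helper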
        
class DataContainerDelayed:
    def __init__( self, big_tables ):
        self.big_tables = big_tables
        
    @memoize
    def __getitem__(self, i):
        return self.big_tables[i]()
        
container = DataContainerDelayed ( [ lambda t=f : load(t) for f in filenames ] )        
for i in range(0,2): print(container[i])

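But I also want the data to be loaded only once. So I used a memoization mechanism: the first access to container[i] calls the loading function and stores the result, and later accesses return the stored table.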
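
As a variation, the cache could sit on the loading function instead of on `__getitem__`, which keeps the container free of any caching logic. The sketch below assumes the standard `functools.lru_cache` decorator. The names `cached_load`, `LazyContainer` and `calls` are hypothetical, and the loader is a stand-in that reads no file, so the number of real loads can be checked.

```python
import functools

calls = []  # records every real load, to check that nothing is loaded twice

@functools.lru_cache(maxsize=None)
def cached_load(filename):
    # Hypothetical stand-in for load(filename): no file is read here.
    calls.append(filename)
    return "table from " + filename

class LazyContainer:
    # Same interface as DataContainerDelayed, without the decorator.
    def __init__(self, loaders):
        self.loaders = loaders  # list of zero-argument callables

    def __getitem__(self, i):
        return self.loaders[i]()

names = ["file1.txt", "files2.txt"]
lazy = LazyContainer([lambda t=n: cached_load(t) for n in names])

print(calls)  # [] : nothing is loaded when the container is built
lazy[0]
lazy[0]
print(calls)  # ['file1.txt'] : the second access reuses the cached result
```

With this design the cache is keyed on the file name, so two containers built on the same file share one loaded object. That saves memory, but a change made to the table through one container is visible through the other.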


Xavier Dupré