The following class is meant to be a kind of repository for many tables. Its main issue is that it loads everything up front. That takes time and may be unnecessary if not all of the tables are needed.
```python
import pandas


class DataContainer:
    def __init__(self, big_tables):
        self.big_tables = big_tables

    def __getitem__(self, i):
        return self.big_tables[i]


filenames = ["file1.txt", "files2.txt"]


def load(filename):
    return pandas.read_csv(filename, sep="\t")


container = DataContainer([load(f) for f in filenames])
```
So the goal is to load the data only when it is required. But I would like to avoid changing the interface of the class, and the logic that loads the data lives outside the container. However, I would like an access to the container to trigger the loading of the data. So instead of giving the class DataContainer the data itself, I give it functions able to load the data.
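One detail matters when such loader functions are built in a loop: a plain `lambda: load(f)` looks up `f` only when it is called, so every loader would end up using the last filename. Binding the value through a default argument avoids that. A minimal sketch, where `fake_load` is a hypothetical stand-in for the real loading function:

```python
def fake_load(filename):
    # Hypothetical stand-in for an expensive loader such as pandas.read_csv.
    return "contents of " + filename


names = ["a.txt", "b.txt"]

# Late binding: each lambda reads the loop variable when it is called.
late = [lambda: fake_load(f) for f in names]

# Early binding: the default argument freezes the current value of f.
bound = [lambda t=f: fake_load(t) for f in names]

late_results = [g() for g in late]
bound_results = [g() for g in bound]
print(late_results)   # both entries refer to "b.txt"
print(bound_results)  # one entry per file
```

Nothing is loaded when the lists are built. The work only happens when a function from the list is called.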
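```python
def memoize(f):
    # The cache lives in the closure and is created once, at decoration time.
    memo = {}

    def helper(self, x):
        if x not in memo:
            memo[x] = f(self, x)
        return memo[x]

    return helper


class DataContainerDelayed:
    def __init__(self, big_tables):
        # big_tables holds zero-argument functions, not the tables themselves.
        self.big_tables = big_tables

    @memoize
    def __getitem__(self, i):
        return self.big_tables[i]()


# The default argument t=f binds each filename when the lambda is created.
container = DataContainerDelayed([lambda t=f: load(t) for f in filenames])

for i in range(0, 2):
    print(container[i])
```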
But I would also like to load the data only once. So I used a memoization mechanism.
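One caveat with a closure-based `memoize`: the `memo` dictionary is created once, when the method is decorated, so all instances of the class share it, and index 0 of a second container would return the first container's table. A possible variant keeps the cache on the instance instead. This is only a sketch, and the names `LazyContainer` and `make_loader` are hypothetical:

```python
class LazyContainer:
    """Holds zero-argument loaders and calls each one at most once."""

    def __init__(self, loaders):
        self._loaders = loaders
        self._cache = {}  # per-instance cache: index -> loaded object

    def __getitem__(self, i):
        if i not in self._cache:
            self._cache[i] = self._loaders[i]()
        return self._cache[i]


calls = []  # records every real load, to check the behaviour


def make_loader(name):
    # Hypothetical loader factory; a real one would call pandas.read_csv.
    def loader():
        calls.append(name)
        return "table from " + name
    return loader


first = LazyContainer([make_loader("file1.txt"), make_loader("file2.txt")])
second = LazyContainer([make_loader("other.txt")])

first[0]
first[0]           # second access is served from the cache
value = second[0]  # not confused with first[0]
print(calls)
print(value)
```

The interface is unchanged (`container[i]` still returns a table), and a table that is never requested is never loaded.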