
My Python script does some heavy computation. To improve performance, it caches the computed data on disk so that the next time I run it, it doesn't waste time recomputing the same thing. However, before reading data from the cache, it needs to check that the cache is not stale. This is where I am stuck.

My first idea was to compare the creation time of the cache with the modification time of the Python script: if the latter is larger (i.e. more recent) than the former, I would consider the cache stale, otherwise not. However, since the Linux kernel does not store creation times of files, I am stuck at this point.

Similar situation:
When the Python interpreter creates .pyc files from .py files, it does something similar: it creates a new .pyc file if I modify my .py file after the .pyc file was created, and otherwise it does not. How does it do that? I would like to know the algorithm. Thank you.

Pushpak Dagade
  • Why not use the last-modified timestamp then? – Martijn Pieters Sep 10 '12 at 10:42
  • You might want to have a look at http://stackoverflow.com/questions/50499/in-python-how-do-i-get-the-path-and-name-of-the-file-that-is-currently-executin - then you can compare timestamps. This might only work though if your script is simple, for example would you want the results to be reprocessed if a library used by the script was upgraded? – George Sep 10 '12 at 10:58
  • I remember hearing a lecture about how yahoo deals with this issue. I'll have a look later and try to find the slides, hopefully it'll be helpful. – amit Sep 10 '12 at 10:58
  • Here it is: Blanco et al. article: [Caching Search Engine Results over Incremental Indices](http://www.google.co.il/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCAQFjAA&url=http://work-tmp.googlecode.com/svn/trunk/SE/Papers/Cache/p82.pdf&ei=18pNUK_qIsfKtAbS94CoCA&usg=AFQjCNEr4Cm03KfGAbznxJszW5ax76w5RQ&sig2=KdiZ22ErkvOfLceJSB2Bsw&cad=rja). Published in SIG-IR2010. Tell me if you find it helpful and want me to post it as an answer. – amit Sep 10 '12 at 11:16
  • @MartijnPieters Your idea works. So silly of me! I should have thought on this just a little bit deeper :P. Thank you :) Please post it as an answer and I'll be happy to accept it. – Pushpak Dagade Sep 10 '12 at 16:52
  • @amit Thanks for your efforts, but my task isn't that complicated, so I'll stick with Martijn's answer. – Pushpak Dagade Sep 10 '12 at 16:53

2 Answers


Just check the last-modified time of your cache file instead.

In fact, that is what you really want to check anyway: when you update the cache with a newly computed value, you want to know when it was last written, not when it was first created. :-)
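As a rough illustration of that comparison, here is a minimal sketch. The function name `cache_is_stale` and both path arguments are made up for this example; it assumes the cache is a single file and that the script is the only thing the cached data depends on.

```python
import os


def cache_is_stale(cache_path, source_path):
    """Return True if the cache is missing or older than the source file.

    Hypothetical helper: both arguments are plain file paths.
    """
    try:
        cache_mtime = os.path.getmtime(cache_path)
    except OSError:
        # No readable cache file yet, so it has to be (re)built.
        return True
    # Stale when the source was modified after the cache was last written.
    return os.path.getmtime(source_path) > cache_mtime
```

Inside the script itself, something like `cache_is_stale("results.cache", __file__)` would compare the cache against the running script, where `results.cache` is only a placeholder name.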

Martijn Pieters

You can keep a metadata file that lists all cached entities together with their creation times.
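One possible shape for such a metadata file, sketched under assumptions: the JSON format, the function names and the per-entity keys below are all invented for this example, and timestamps are stored as seconds since the epoch.

```python
import json
import os
import time


def load_metadata(meta_path):
    """Read the {entity key: creation time} mapping, or {} if unavailable."""
    try:
        with open(meta_path) as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt metadata is treated as "nothing cached".
        return {}


def record_entry(meta_path, key, created=None):
    """Store the creation time of one cached entity (defaults to now)."""
    meta = load_metadata(meta_path)
    meta[key] = time.time() if created is None else created
    with open(meta_path, "w") as f:
        json.dump(meta, f)


def entry_is_fresh(meta_path, key, source_path):
    """True if the entity was cached at or after the source's last change."""
    created = load_metadata(meta_path).get(key)
    return created is not None and created >= os.path.getmtime(source_path)
```

Because the creation time is written by the script itself, this approach does not rely on the filesystem recording it.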

MichaelT