Let's say I have a page I'd like to render which will present some (expensive to compute) data in a few ways. For example, I want to hit my database and get a large pile of data. Then I want to group that data and otherwise manipulate it in Python (for example, using Pandas). Say the result of this manipulation is a Pandas DataFrame that I'll call prepped_data. And say everything up to this point takes 3 seconds, i.e. long enough that I don't want to do it more than once per page load.

Then I want to summarize that data at a single URL (/summary): I'd like to show a bar graph, a pie chart and also an HTML table. Each of these elements depends on a subset of prepped_data.

One way I could handle this is to make 3 separate views hooked up to 3 separate URLs. I could make pie_chart_view, which would make a dynamically generated pie chart available at /piechart.svg. I could make bar_graph_view, which would make a dynamically generated bar graph available at /bargraph.svg. And I could make summary_view, which would finish by rendering a template. That template would use context variables generated by summary_view itself to build my HTML table, and it would include the graphs by linking to their URLs from within the template. In this structure, all 3 view functions would need to independently calculate prepped_data, so the 3-second computation would run three times for a single page load. That seems less than ideal.

As an alternative, I could turn on some kind of caching. Maybe I could make a view called raw_data_view which would make the data itself available at /raw_data.json. I could set this to cache itself (using whatever Django caching backend) for a short amount of time (30 seconds?). Then each of the other views could hit this URL to get their data, and that way I could avoid doing the expensive calculations 3 times. This seems a bit dicey as well, though, because there's some real judgement involved in setting the cache time: too short and the views recompute anyway, too long and users see stale data.
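To make the caching idea concrete, here is a minimal sketch that skips the intermediate URL and caches the function result in-process for a fixed time. It uses only the standard library rather than Django's cache framework; `ttl_cache`, `CALLS`, the 30-second lifetime, and the stand-in data are all assumptions for illustration.

```python
import time
from functools import wraps


def ttl_cache(seconds):
    """Cache a zero-argument function's result for `seconds` seconds."""
    def decorator(func):
        state = {}  # holds "value" and "expires" once the function has run

        @wraps(func)
        def wrapper():
            now = time.monotonic()
            if "expires" not in state or now >= state["expires"]:
                # Cache is empty or stale: run the slow path again.
                state["value"] = func()
                state["expires"] = now + seconds
            return state["value"]

        return wrapper
    return decorator


CALLS = {"count": 0}  # only here to show how often the slow path runs


@ttl_cache(seconds=30)  # assumed lifetime, matching the 30 seconds above
def get_prepped_data():
    # Stand-in for the 3-second database query plus Pandas work.
    CALLS["count"] += 1
    return {"bar": [3, 1, 2], "pie": [0.5, 0.3, 0.2]}


# Three "views" asking for the data in quick succession share one computation.
bar_data = get_prepped_data()["bar"]
pie_data = get_prepped_data()["pie"]
table_data = get_prepped_data()
```

A cache like this lives inside one Python process, so with several server worker processes each one would compute its own copy. A shared cache backend avoids that, at the cost of serializing the data.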

One other route could involve creating both graphs within summary_view and embedding the graphics directly within the rendered HTML (which is possible with .svg). But I'm not a huge fan of that since you wind up with bulky HTML files and graphics that are hard for users to take with them. More generally, I don't want to commit to doing all my graphics in that format.

Is there a generally accepted architecture for handling this sort of thing?

Edit: How I'm making the graphs:

One comment asked how I'm making the graphs. Broadly speaking, I'm doing it in matplotlib. So once I have a Figure I like generated by the code, I can save it to an svg easily.

8one6
  • Take a look at [memoization](http://stackoverflow.com/questions/1988804/what-is-memoization-and-how-can-i-use-it-in-python). It's a pretty generic technique that you can use to cache computation results for pretty much anything. – Lukas Graf Mar 19 '14 at 16:10
  • It would help if you explained how you're making your graphs. – Joel Burton Mar 19 '14 at 16:30

1 Answer


I think your idea to store the prepped data in a file is a good one. I might name the file something like this:

/tmp/prepped-data-{{session_id}}.json

You could then just have a function in each view called get_prepped_data(session_id) that either computes it or reads it from the file. You could also delete old files when that function is called.

Another option would be to store the data directly in the user's session so it is cleaned up when their session goes away. The feasibility of this approach depends a bit on how much data needs to be stored.
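As a sketch of the session variant: Django's `request.session` behaves like a dictionary, so the same compute-or-reuse logic can be written against any mapping. The key name `SESSION_KEY`, the 30-second maximum age, and `compute_prepped_data` are assumptions, and a plain dict stands in for the session here.

```python
import time

SESSION_KEY = "prepped_data"  # hypothetical key name
MAX_AGE_SECONDS = 30          # assumed lifetime before recomputing


def compute_prepped_data():
    # Stand-in for the slow step. With a real DataFrame, something like
    # df.to_dict(orient="records") would give a JSON-friendly structure.
    return {"rows": [[1, 2], [3, 4]]}


def get_prepped_data(session):
    """Return data stored in the session, recomputing when missing or stale."""
    entry = session.get(SESSION_KEY)
    if entry is not None and time.time() - entry["created"] < MAX_AGE_SECONDS:
        return entry["data"]
    data = compute_prepped_data()
    session[SESSION_KEY] = {"created": time.time(), "data": data}
    return data


# In a view this would be get_prepped_data(request.session).
session = {}
data = get_prepped_data(session)
```

Whatever goes into the session has to be serializable by the configured session serializer, and cookie-based sessions in particular can only hold a few kilobytes, which is where the size concern above comes in.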

DavidM