
I would like to know what the best practice is for efficiently storing (and subsequently accessing) sets of multi-dimensional data arrays of variable length. The focus is on performance, but I also need to be able to change the length of an individual data set during runtime without too much overhead.

Note: I know this is a somewhat lengthy question, but I have looked around quite a lot and could not find a solution or example which describes the problem at hand with sufficient accuracy.

Background

The context is a computational fluid dynamics (CFD) code that is based on the discontinuous Galerkin spectral element method (DGSEM) (cf. Kopriva (2009), Implementing Spectral Methods for Partial Differential Equations). For the sake of simplicity, let us assume a 2D data layout (it is in fact in three dimensions, but the extension from 2D to 3D should be straightforward).

I have a grid that consists of K square elements k (k = 0,...,K-1) that can be of different (physical) sizes. Within each grid element (or "cell") k, I have N_k^2 data points. N_k is the number of data points in each dimension, and can vary between different grid cells.

At each data point n_k,i (where i = 0,...,N_k^2-1) I have to store an array of solution values whose length nVars is the same everywhere in the domain and does not change during runtime.
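
To make the indexing concrete, here is a purely illustrative (and deliberately naive) way of writing down this layout in C++; the names and the fixed value of nVars are made up:

```cpp
#include <array>
#include <vector>

constexpr int nVars = 5;                      // example value; fixed for the whole run

struct Cell {
    int N;                                    // N_k: data points per direction (2..8)
    std::vector<std::array<double, nVars>> u; // N*N points, nVars solution values each
};

std::vector<Cell> grid;                       // K cells; K can change during runtime
```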

Dimensions and changes

The number of grid cells K is of O(10^5) to O(10^6) and can change during runtime.
The number of data points per dimension, N_k, in each grid cell is between 2 and 8 and can change during runtime (and may be different for different cells).
The number of variables nVars stored at each grid point is around 5 to 10 and cannot change during runtime (it is also the same for every grid cell).

Requirements

Performance is the key issue here. I need to be able to regularly iterate in an ordered fashion over all grid points of all cells in an efficient manner (i.e. without too many cache misses). Generally, K and N_k do not change very often during the simulation, so for example a large contiguous block of memory for all cells and data points could be an option.
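
To illustrate what the "large contiguous block" option could look like, here is a rough sketch of such a sweep; all names are made up and this is not meant as a final design:

```cpp
#include <cstddef>
#include <vector>

// All values of all points of all cells stored back-to-back in one array.
// nPointsPerCell[k] would hold N_k*N_k for cell k.
void sweep(std::vector<double>& data,
           const std::vector<int>& nPointsPerCell, int nVars)
{
    std::size_t idx = 0;
    for (std::size_t k = 0; k < nPointsPerCell.size(); ++k)  // all cells
        for (int p = 0; p < nPointsPerCell[k]; ++p)          // all points of cell k
            for (int v = 0; v < nVars; ++v)                  // all variables
                data[idx++] *= 2.0;                          // placeholder update
}
```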

However, I do need to be able to refine or coarsen the grid (i.e. delete cells and create new ones, the new ones may be appended to the end) during runtime. I also need to be able to change the approximation order N_k, so the number of data points I store for each cell can change during runtime as well.

Conclusion

Any input is appreciated. If you have experience yourself, or just know a few good resources that I could look at, please let me know. However, while this choice will be crucial to the performance of the final program, it is just one of many problems I have to deal with, so the solution needs to be of an applied nature rather than purely academic.

Should this be the wrong venue to ask this question, please let me know what a more suitable place would be.

  • Does that mean that you can guarantee a maximum number of required elements? – Nathan White Jul 10 '12 at 12:13
  • Only with a very large bound. The whole idea is that the grid may be refined dynamically, i.e. depending on the development of the flow solution at certain points in (physical) space. So for all intents and purposes... probably no. – Michael Schlottke-Lakemper Jul 10 '12 at 12:20
  • How are you iterating over the grid? Does it matter? Do you access neighboring grid points during an update? Sorry, I don't have the chance to look up the primary source right now. – simon Jul 10 '12 at 13:30
  • In general, yes. Data points of neighboring cells in all major directions (-x, +x, -y, +y) will be accessed (though not all of them, but just the data points that lie closest to the current cell). However, there are also sweeps across all cells where the order does not really matter since only internal data points in each cell are accessed. – Michael Schlottke-Lakemper Jul 10 '12 at 14:00

2 Answers


Often, these sorts of dynamic mesh structures can be very tricky to deal with efficiently, but in block-structured adaptive mesh refinement codes (common in astrophysics, where complex geometries aren't important), or in a spectral element code like yours where you have large block sizes, it is often much less of an issue. You have so much work to do per block/element (with at least 10^5 cells x 2 points/cell in your case) that the cost of switching between blocks is comparatively minor.

Keep in mind, too, that you generally can't do much of the hard work on each element or block until a substantial amount of that block's data is already in cache. You're going to have had to flush most of block N's data out of cache before getting much work done on block N+1 anyway. (There might be some operations in your code which are exceptions to this, but those are probably not the ones where you're spending much time anyway, cache or no cache, because there's not a lot of data reuse - e.g., elementwise operations on cell values.) So keeping the blocks/elements beside each other isn't necessarily a huge deal; on the other hand, you definitely want each block/element to be internally contiguous.

Also notice that you can move blocks around to keep them contiguous as things get resized, but not only will all those memory copies wipe your cache, the copies themselves get very expensive. If your problem is filling a significant fraction of memory (and aren't we always?), say 1 GB, and you have to move 20% of that around after a refinement to make things contiguous again, that's 0.2 GB read plus 0.2 GB written at ~20 GB/s, or ~20 ms, compared to reloading (say) 16k cache lines at ~100 ns each, or ~1.5 ms. And your cache is trashed after the shuffle anyway. This might still be worth doing if you knew that you were going to do the refinement/derefinement very seldom.
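
Spelled out (reading the 20% as 0.2 GB that has to be both read and written):

$$
t_{\text{copy}} \approx \frac{0.2\,\mathrm{GB} + 0.2\,\mathrm{GB}}{20\,\mathrm{GB/s}} = 20\,\mathrm{ms},
\qquad
t_{\text{reload}} \approx 16{,}384 \times 100\,\mathrm{ns} \approx 1.6\,\mathrm{ms}.
$$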

But as a practical matter, most adaptive mesh codes in astrophysical fluid dynamics (where I know the codes well enough to say) simply maintain a list of blocks and their metadata and don't worry about their contiguity. YMMV of course. My suggestion would be - before spending too much time crafting the perfect data structure - to first just test the operation on two elements, twice: once with the elements in order, computing on them 1-2, and once doing the operation in the "wrong" order, 2-1, and to time the two computations several times.
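
A minimal sketch of such a timing experiment, with a made-up element size and a placeholder kernel standing in for the real per-element work:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Placeholder for whatever per-element operation dominates your runtime.
void compute(std::vector<double>& element) {
    for (double& v : element) v = 0.5 * v + 1.0;
}

int main() {
    // Example element size: 8x8 points x 10 variables (adjust to your case).
    const std::size_t n = 8 * 8 * 10;
    std::vector<double> elem1(n, 1.0), elem2(n, 1.0);

    auto time_order = [&](std::vector<double>& a, std::vector<double>& b) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100000; ++rep) { compute(a); compute(b); }
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    };

    std::printf("order 1-2: %g s\n", time_order(elem1, elem2));
    std::printf("order 2-1: %g s\n", time_order(elem2, elem1));
    std::printf("checksum: %g\n", elem1[0] + elem2[0]); // keep the work observable
}
```

Compile with optimizations (e.g. -O2) and run it a few times; if the two orderings show no consistent difference, the element ordering is probably not worth optimizing for.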

Jonathan Dursi
  • Hm. What do you think about maintaining a separate memory location for each value of `N_k`, and storing the number of cells having that number of data points? Usually `N_k` should not vary by more than 3-4 over the whole domain, so having 3-4 separate lists to go through should remain relatively easy to handle. Of course that would complicate things like unique cell ids, neighbor information etc. Alas, maybe you're right and I should just use a simple array of object pointers... – Michael Schlottke-Lakemper Jul 10 '12 at 15:55

For each cell, store the offset at which to find that cell's data in a contiguous array. This offset mapping is very efficient and widely used. You can reorder the cells for cache reuse in traversals. When the order or number of cells changes, create a new array and interpolate, then throw away the old arrays. This storage is much better for external analysis because operations like inner products in Krylov methods and stages in Runge-Kutta methods can be managed without reference to the mesh. It also requires minimal memory per vector (e.g. in Krylov bases and with time integration).
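
A minimal sketch of that scheme (all names are made up for illustration, and the interpolation step is only indicated by a comment):

```cpp
#include <cstddef>
#include <vector>

// All solution data lives in one contiguous array; per cell we only keep
// its number of points per direction N and the offset of its first value.
struct Mesh {
    int nVars = 5;                    // fixed number of variables per point
    std::vector<int> N;               // N[k]: points per direction in cell k
    std::vector<std::size_t> offset;  // offset[k]: start of cell k's data
    std::vector<double> data;         // size = sum_k N[k]*N[k]*nVars

    double& value(std::size_t k, int point, int var) {
        return data[offset[k] + static_cast<std::size_t>(point) * nVars + var];
    }

    // After refinement/coarsening: build new offsets and a new data array,
    // interpolate the old cell values into it, then discard the old arrays.
    void rebuild(const std::vector<int>& newN) {
        std::vector<std::size_t> newOffset(newN.size());
        std::size_t total = 0;
        for (std::size_t k = 0; k < newN.size(); ++k) {
            newOffset[k] = total;
            total += static_cast<std::size_t>(newN[k]) * newN[k] * nVars;
        }
        std::vector<double> newData(total);
        // ... interpolate/project old cell data into newData here ...
        N = newN;
        offset.swap(newOffset);
        data.swap(newData);
    }
};
```

Rebuilding the whole array on each change sounds expensive, but since K and N_k change only rarely, that cost is amortized over many traversals of a cache-friendly layout.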

Jed
  • What do you mean by "can be managed without reference to the mesh"? – Michael Schlottke-Lakemper Aug 26 '12 at 07:51
  • Linear algebraic operations know nothing (mathematically) about the physical meaning of the vectors. This is a good property to maintain, otherwise you end up with tighter coupling between software components and have to do much more work to use libraries (e.g. writing all your own primitive operations) and often sacrifice some performance due to more complicated traversal. – Jed Aug 26 '12 at 15:28