
I've implemented a game somewhat similar to this one in Java and currently find that I'm hitting a ceiling of ~80k particles. My game board is a 2D array of references to 'Particle' objects, each of which must be updated every frame. Different kinds of 'Particle' have different behaviors and may move or change their state in response to environmental conditions such as wind or adjacent particles.

Some possible 'rules' that might be in effect:

  • If a Particle of type lava is adjacent to a Particle of type water, both disappear, and the lava is replaced by obsidian
  • If a gas Particle is adjacent to a Lava, Fire, Ember, etc. Particle, it will ignite, and produce fire and smoke
  • If a sufficient number of dust particles are stacked on top of one another, those at lower levels, as if under pressure, can become sedimentary rock
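The first two rules above are purely local (a cell's next state depends only on itself and its immediate neighbours). A minimal sketch of how such a rule could be evaluated — all type and method names here are my own illustration, not from my actual code:

```java
// Hypothetical sketch: the lava/water rule as a purely local update over a
// 2D grid of cell types. Lava touching water becomes obsidian; water
// touching lava disappears.
enum Cell { EMPTY, WATER, LAVA, OBSIDIAN }

final class Rules {
    // Next state of the cell at (x, y).
    static Cell next(Cell[][] grid, int x, int y) {
        Cell c = grid[y][x];
        if (c == Cell.LAVA && touches(grid, x, y, Cell.WATER)) return Cell.OBSIDIAN;
        if (c == Cell.WATER && touches(grid, x, y, Cell.LAVA)) return Cell.EMPTY;
        return c;
    }

    // True if any of the four orthogonal neighbours has the target type.
    static boolean touches(Cell[][] grid, int x, int y, Cell target) {
        int[][] offsets = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        for (int[] d : offsets) {
            int nx = x + d[0], ny = y + d[1];
            if (ny >= 0 && ny < grid.length && nx >= 0 && nx < grid[0].length
                    && grid[ny][nx] == target) return true;
        }
        return false;
    }
}
```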

I've searched around and haven't been able to find any algorithms or data structures that seem particularly well-suited to speeding up the task. It seems that some kind of memoization might be useful? Would a quad tree be of any use here? I've seen them used in the somewhat similar Conway's Game of Life with the Hashlife algorithm. Or, is it the case that I'm not going to be able to do too much to increase the speed?

  • This sounds like the kind of problem that a GPU is a great fit for. I know little about GPU programming, but http://mikeinnes.github.io/2017/08/24/cudanative.html suggests that it might be easier to get into than you think. – btilly Aug 30 '17 at 03:24
  • Hashlife relies on the locality of computation and you've told us little about your rules. – maaartinus Aug 30 '17 at 03:49
  • @maaartinus I've added some info about the rules – paleto-fuera-de-madrid Aug 30 '17 at 13:06
  • @paleto-fuera-de-madrid I guess, hashlife is compatible with the first two rules (local interactions only), but not with the last. I'm also skeptical about using memoization because of the much bigger number of possibilities. If you could post the whole code on [CR](https://codereview.stackexchange.com), you could get quite some help there (drop me a note if you do). Even minor improvement can give you a nice speed factor. – maaartinus Aug 31 '17 at 04:29
  • @maaartinus Ok. I'll do that. How do I "drop you a note?" – paleto-fuera-de-madrid Aug 31 '17 at 15:55
  • By mentioning me in a comment just like you did. ;) – maaartinus Aug 31 '17 at 17:32
  • @maaartinus https://codereview.stackexchange.com/questions/174508/optimized-updates-of-a-grid-based-particle-system – paleto-fuera-de-madrid Aug 31 '17 at 23:00

3 Answers


Hashlife will work in principle but there are two reasons why you might not get as much out of it as Conway Life.

Firstly it relies on recurring patterns. The more cell states you have and the less structured the plane the fewer cache hits you'll encounter and the more you'll be working with brute force.

Secondly, as another poster noted, rules that involve non-local effects mean your primitives (4x4 in Conway Life) will need to be bigger, so you will have to abandon divide and conquer at, say, 8x8 or 16x16 or whatever size guarantees you can correctly calculate the middle portion in n/2 time. That's made worse by the diversity of states. In Conway Life it's common to pre-calculate all 4x4 grids, or at least have nearly all relevant ones in cache. With 2 states there are only 65536 4x4 grids (peanuts on modern platforms), but with just 3 states there are 43046721. If you have to have 8x8 primitives it gets very big very quickly and goes beyond any realistic storage.

So the larger the primitive and the more states you have, the more quickly that becomes unrealistic.

One way to address primitive size is to have the rock rule propagate pressure. A Rock+n (n representing pressure) becomes Rock+(n+1) in the next generation if it has Rock+m with m>=n above it, up to some threshold k at which it turns to sedimentary rock.

That means cells are still only dependent on their immediate neighbours but again multiplies up the number of states.
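A sketch of that encoding, with pressure folded into the cell state so the rule stays local — the constants and names here are my own assumptions, not part of the answer:

```java
// Hypothetical sketch: pressure encoded in the cell state. ROCK_BASE + n
// represents Rock under pressure n; at THRESHOLD the cell compacts into
// sedimentary rock. The rule depends only on the cell directly above.
final class RockPressure {
    static final int EMPTY = 0;
    static final int SEDIMENTARY = -1;
    static final int ROCK_BASE = 1;   // ROCK_BASE + n == Rock with pressure n
    static final int THRESHOLD = 8;   // pressure at which rock compacts

    // Next state of a cell given the state of the cell directly above it.
    static int next(int cell, int above) {
        if (cell < ROCK_BASE) return cell;         // not rock: unchanged
        int n = cell - ROCK_BASE;
        int m = above - ROCK_BASE;                 // pressure above, if rock
        if (above >= ROCK_BASE && m >= n) n++;     // pressure propagates down
        return n >= THRESHOLD ? SEDIMENTARY : ROCK_BASE + n;
    }
}
```

The trade-off the answer describes is visible here: locality is preserved, but every pressure level 0..k is a distinct state, multiplying the state count.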

If you have cell types like the 'Bird' in the example given, and you have velocities that you don't keep to a minimum (say -1, 0, 1 in either direction), you'll totally collapse memoization. Even then, the chaotic nature of such rules may make cache hits on those areas vanishingly small.

If your rules don't lead to steady states (or repeating cycles) like Conway Life often does the return on memoization will be limited unless your plane is mostly empty.

Persixty
  • Since Hashlife doesn't seem of much use in a case like this due to the great variance of behaviors, can you think of any other relevant algorithms that I might research? Or do you think that my best bet would be to come up with clever ways to reduce the per-particle work? – paleto-fuera-de-madrid Sep 04 '17 at 20:40
  • @paleto-fuera-de-madrid You may still gain something with some kind of spacial-tree. The obvious are to optimize is ignoring dead space and Hashlife is pretty good at that - ignore the memoizing. You need a data structure where it's easy to iterate through non-dead cells and also obtain their near neighbours for interaction. An alternative would be a couple of hashmaps (x,y)->state. Iterate the 'old' map and fill the 'new' map recognising you can access neighbours easily. Or some kind of sparse matrix with links that jump dead space. – Persixty Sep 04 '17 at 21:00
  • I tried hashmaps before and found them to be too slow for the numbers of particles that I was dealing with. I also tried using some ints to keep track of the max and min occupied coordinates to minimize iteration over the array, but that seemed slower too, when I tested it. Is it feasible that a plain old array might be the fastest data structure for this? – paleto-fuera-de-madrid Sep 05 '17 at 01:44

I don't understand your problem completely, but I think CUDA or OpenGL (GPU programming in general) can easily handle a game like your reference link: https://dan-ball.jp/en/javagame/dust/

dk1111

I'd use a fixed NxN grid for this mainly because there are too many points moving around each frame to benefit from the recursive subdividing nature of the quad-tree. This is a case where a straightforward data structure with the data representations and memory layouts tuned appropriately can make all the difference in the world.

The main thing I'd do for Java here is avoid modeling each particle as an object. Each particle should be plain old data: floats or ints. You want contiguity guarantees for spatial locality with sequential processing, and you don't want to pay for padding and the per-instance object header overhead. Split cold fields away from hot fields.

For example, you don't necessarily need to know a particle's color to move it around and apply physics. As a result, you don't want an AoS representation here which has to load in a particle's color into cache lines during the sequential physics pass only to evict it and not use it. Cram as much relevant memory used together into a cache line as you can by separating it away from the irrelevant memory for a particular pass.
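A sketch of what that structure-of-arrays split could look like in Java — the field names and the `Particles` class are my own illustration of the idea, not code from the answer:

```java
// Hypothetical sketch of an SoA layout: hot fields used every physics tick
// live in their own primitive arrays; cold fields (like color) sit in
// separate arrays that only the render pass touches.
final class Particles {
    final int capacity;
    int count;

    // Hot: read/written by the per-frame physics pass.
    final float[] x, y, vx, vy;
    final byte[] type;

    // Cold: only the renderer needs this, so keep it out of the hot arrays.
    final int[] color;

    Particles(int capacity) {
        this.capacity = capacity;
        x = new float[capacity];  y = new float[capacity];
        vx = new float[capacity]; vy = new float[capacity];
        type = new byte[capacity];
        color = new int[capacity];
    }

    // The physics pass streams through contiguous floats and never pulls
    // color data into the cache lines it touches.
    void integrate(float dt) {
        for (int i = 0; i < count; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}
```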

Each cell in the grid should just store an index into a particle, with each particle storing an index to the next particle in the cell (a singly-linked list, but an intrusive one which requires allocating no nodes and just uses indices into arrays). A -1 can be used to indicate the end of the list as well as empty cells.

To find collisions between particles of interest, look in the same cell as the particle you're testing, and you can do this in parallel where each thread handles one or more cells worth of particles.
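The grid and its intrusive lists can be sketched like this — the class and method names are mine, but the representation (head index per cell, next index per particle, -1 as the sentinel) is the one described above:

```java
// Hypothetical sketch of the intrusive index-linked grid: cellHead[c] is the
// index of the first particle in cell c, next[i] is the index of the next
// particle in the same cell, and -1 marks both empty cells and end-of-list.
final class ParticleGrid {
    final int numCols, numRows;
    final int[] cellHead;   // one head index per cell, -1 == empty
    final int[] next;       // one next index per particle, -1 == end of list

    ParticleGrid(int numCols, int numRows, int maxParticles) {
        this.numCols = numCols;
        this.numRows = numRows;
        cellHead = new int[numCols * numRows];
        next = new int[maxParticles];
        java.util.Arrays.fill(cellHead, -1);
        java.util.Arrays.fill(next, -1);
    }

    // O(1) and allocation-free: push particle i onto the front of the cell's list.
    void insert(int particle, int cell) {
        next[particle] = cellHead[cell];
        cellHead[cell] = particle;
    }

    // Collision candidates for a cell are just the particles reachable from
    // its head; each thread can walk its own range of cells independently.
    int countInCell(int cell) {
        int n = 0;
        for (int i = cellHead[cell]; i != -1; i = next[i]) n++;
        return n;
    }
}
```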

The NxN grid should be very fine given the boatload of moving particles you can have per frame. Play with how many cells you create to find something optimal for your input sizes. You might even have multiple grids. If certain particles don't interact with each other, don't put them in the same grid. Don't worry about the memory usage of the grid here. If each grid cell just stores a 32-bit index to the first particle in the cell, then a 200x200 grid only takes 160 kilobytes with a 32-bit next index overhead per particle.

I made something similar to this some years back in C using the technique above (but not with as many interesting particle interactions as the demo game), which could handle about 10 million particles before it started to drop below 30 FPS, on older hardware with only 2 cores. It did use C as well as SIMD and multithreading, but I think you can get a very speedy solution in Java handling a boatload of particles at once if you do the above.

Data structure: (diagram omitted)

As particles move from one cell to the next, all you do is manipulate a couple of integers to move them from one cell to the other. Cells don't "own memory" or allocate any. They're just 32-bit indices.
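Those couple of integer manipulations could look like this — a free-standing sketch under the same head/next index convention, with names of my own choosing:

```java
// Hypothetical sketch: moving particle i from oldCell to newCell touches only
// a few integers. cellHead[c] is the first particle index in cell c, next[i]
// the next particle in the same cell, -1 the end-of-list sentinel.
final class CellMove {
    static void move(int[] cellHead, int[] next, int i, int oldCell, int newCell) {
        // Unlink i from oldCell's singly-linked list.
        if (cellHead[oldCell] == i) {
            cellHead[oldCell] = next[i];
        } else {
            int p = cellHead[oldCell];
            while (next[p] != i) p = next[p];  // find i's predecessor
            next[p] = next[i];
        }
        // Push i onto the front of newCell's list.
        next[i] = cellHead[newCell];
        cellHead[newCell] = i;
    }
}
```

The unlink does walk the old cell's list to find the predecessor, but with a fine grid each cell holds only a handful of particles, so it stays cheap.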

To figure out which cell a particle occupies, just do:

cell_x = (int)(particle_x[particle_index] / cell_size)
cell_y = (int)(particle_y[particle_index] / cell_size)
cell_index = cell_y * num_cols + cell_x

... a much cheaper constant-time operation than traversing a tree structure and having to rebalance it as particles move around.

  • Could you elaborate on the collision-checking mechanism? I don't get how looking at a cell tells you about all the neighbors of a particle. It seems that some of the particle's neighbors could be in a neighboring cell. – paleto-fuera-de-madrid Mar 20 '18 at 12:54
  • If all particles have the same size, then you can determine what particles may collide with a given particle by checking the cell(s) that the particle overlaps (all cells that intersect the particle's circle or AABB, e.g.). If the particles have different sizes then it's a little bit trickier. You can do either of two things: 1) insert particles that overlap multiple cells into each cell. 2) Make the cells loose and expand/shrink their AABBs to fit the particles inside. I have a rather lengthy write-up on the second method here: https://stackoverflow.com/a/48384354/4842163 –  Mar 21 '18 at 12:05
  • Basically if all particles have uniform sizes, then you can treat them as just points for insertion and insert each particle to just one cell. However, for collision detection, you query an *area* (a circle or AABB, e.g.) to determine what particles might collide with a given particle. If the particles don't have uniform sizes, then you either insert a particle to all the cells it overlaps or insert to a single cell but one whose AABB can grow/shrink. Then you do the same thing when querying an area and check all cells that the particle overlaps. –  Mar 21 '18 at 12:07
  • Does the size of the particle objects stored in the list have a significant impact on the performance in this approach? Also, do you have a reference implementation that we could look at? – paleto-fuera-de-madrid Mar 22 '18 at 02:48
  • Not in an absolute sense but in a relative sense to the cell size of the data structure. If you use teeny cell sizes that are a fraction of the size of a particle, for example, and the particles vary wildly in size, then big particles end up getting inserted to many cells, area searches end up requiring checking a boatload of cells, and performance suffers that way. Yet that applies to an extent to quadtrees as well (and spatial hashes) with the exception of loose variants. In general spatial indexes tend to require some level of tuning with respect to the content being stored. –  Mar 22 '18 at 21:58
  • Loose variants tend to work very well for content which varies wildly in size, since the size of the cells themselves adjust based on what's inserted to them (which means you only have to insert an element to one cell/node regardless of size). However, they have a drawback in that your searches now require checking the AABBs of the cells whereas with non-loose ("tight") variants, they only require looking at one point to determine which cell to traverse or which quadrant of a tree (quadtree, kd-tree, etc) to traverse. –  Mar 22 '18 at 21:59
  • Still the loose variants (loose quadtrees, loose grids) are probably the closest to a well-balanced data structure for collision detection where you can just throw whatever you want at it and have it do a good job. As for source code, that link provides a full implementation for a quadtree which tends to be a decent start. The loose quadtree and loose grid tend to be quite easy to implement afterwards. –  Mar 22 '18 at 22:01
  • For collision detection in particular with very dynamic content, loose variants get my personal vote and I've had the most success with them for content which varies wildly in size. They'd be horrible for contexts like raytracing where your bottlenecks are dominated by search queries. However, for collision detection, you tend to have hotspots distributed between updating the data structure (with things moving every single frame) and searching it. In those cases, the loose variant makes updating (insertion/removal) really, really cheap even though it makes searches a bit more expensive. –  Mar 22 '18 at 22:05