
I have a large number of rectangles, some of which overlap others; each rectangle has an absolute z-order and a colour. (Each 'rectangle' is actually the axis-aligned bounding box of a particle effect, mesh or texture and may be semi-transparent. But it's easier to think abstractly about coloured rectangles, as long as you don't try to cull rectangles behind others, so I will use that in the problem description.)

The cost of changing the 'colour' is quite high; it's much faster to draw two blue rectangles in succession than it is to draw two different-coloured rectangles.

The cost of drawing rectangles that are not even on the screen is quite high too and should be avoided.

If two rectangles do not overlap, the order in which they are drawn relative to one another is not important. It's only if they overlap that the z-order matters.

For example:

[Image: Overlapping Rectangles]

1 (red) and 4 (red) can be drawn together. 2 (blue) and 5 (blue) can also be drawn together, as can 3 (green) and 7 (green). But 8 (red) must be drawn after 6 (blue). So either we draw all three red together and draw the blue in two sets, or we draw all the blue together and draw the red in two sets.

And some of the rectangles may move occasionally. (Not all of them; some rectangles are known to be static; others are known to move.)

I will be drawing this scene in JavaScript/webGL.

How can I draw the rectangles in a reasonable order to minimize colour changes, with a good trade-off of JavaScript culling code vs letting the GPU cull?

(Just working out which rectangles overlap and which are visible is expensive. I have a basic quadtree, and it sped up my scene drawing immensely (compared to just emitting the draw-ops for the whole scene); now the question is how to minimize OpenGL state changes and concatenate attribute arrays as much as possible.)

UPDATE I have created a very simple test app to illustrate the problem and serve as a basis for demonstration of solutions: http://williame.github.com/opt_rects/

The source-code is on github and can easily be forked: https://github.com/williame/opt_rects

It turns out it's hard to make a little test app with sufficient state change to actually recreate the problem I see in my full game. At some point you'll have to take it as a given that state changes can be sufficiently expensive. What is also important is how to speed up the spatial index (a quadtree in the demo) and the overall approach.

Will
  • How many rectangles do you have? What is the maximum possible value? – Толя Jan 14 '13 at 11:39
  • @Tom a few hundred. https://github.com/williame/ludum_dare_25_you_are_the_villain/blob/gh-pages/data/level1.json#L3852 kind of data (each artwork has a size too, so it's a rectangle.) – Will Jan 14 '13 at 11:50
  • Do you have access to the rectangles coordinates? – mitchus Jan 15 '13 at 12:20
  • @mitchus yes the code will know the rectangles. For testing purposes, imagine random values between 0 and 1000. – Will Jan 15 '13 at 14:09
  • off my head: start with a topological sort of ascending z-order considering only pairs of rectangles that overlap. take the corner of an overlapping rectangle as the coordinate at which to split the overlapped rectangle into 2 or 3 smaller rectangles of the same color, one of which will be occluded completely (drop it at once) or partially (parts can be dropped later). update the toposort relation. repeat until the toposort relation is empty. at that time, there will be no overlaps, so draw by sets of same color. the final number of rects should be linear in the original number. – collapsar Jan 15 '13 at 15:20
  • @collapsar cunning and cool and was actually a google interview question I got once! Sadly my rectangles can be semi-transparent and even move; I've tried to clarify in the question – Will Jan 15 '13 at 15:53
  • Can you explain what do you want achieve? You already have a quadtree and you already told us it's a quite speed up and now you ask us for further improvement? – Gigamegs Jan 15 '13 at 19:16
  • @Phpdevpad I want it to draw at a higher framerate. It'll never be too fast. The faster I can redraw the screen, the richer a scene I can draw at an acceptable framerate, or the more complex a scene I can simulate in the physics. One day soon Apple is going to flip the switch that allows webGL in the iPad's Safari, and then my stuff will run on an iPad, and I want it to run at a playable, acceptable frame-rate. – Will Jan 16 '13 at 06:26
  • My idea is that's already micro-optimization. Maybe you can invest your brain cells in other things. I would also wonder if you would share your solution in an understandable answer? But to me it looks very much like a webgl specific problem, not really a problem generally spoken an employer should solve for whoever. – Gigamegs Jan 16 '13 at 06:51
  • @Will Can you provide us with a self-contained example of your current method, so we can benchmark and compare improvements? – Alex L Jan 16 '13 at 08:34
  • How many colors and how many z-orders are there? – mitchus Jan 16 '13 at 08:37
  • Also, is there a typical proportion of rectangles having an overlap? – mitchus Jan 16 '13 at 09:29
  • @AlexL updated the question with a quick example app – Will Jan 16 '13 at 09:57
  • So would the appropriate metric be simply the number of color changes required to paint all the rectangles, or is there something more complicated to optimize? – Scott Sauyet Jan 16 '13 at 13:45
  • @ScottSauyet colour changes, and culling those not visible – Will Jan 16 '13 at 13:52
  • The "minimise colour changes" objective is clear, but it's not clear what "culling invisible rectangles" should mean. The latter seems to be a step that would precede the former. Do you have an axis-aligned bounding box representing the view, and want to find a nice data structure to quickly cull the boxes that are entirely outside this AABB? A quadtree seems ideal for this step. – j_random_hacker Jan 16 '13 at 14:01
  • @Will: How would those two factors be combined into a reasonable measure of the success of an algorithm? How would the culling be measured? Or could this be considered a separate optimization problem, as j_random_hacker suggests, after the culling has been performed? – Scott Sauyet Jan 16 '13 at 14:06
  • there is a quadtree in the demo; whether it's a particularly efficient one is questionable. Overall I want to draw very large, complex 2D scenes as quickly as possible. Basic goals like trying to avoid unnecessary state changes and recomputing things and so on are general problems and I think they have algorithmic solutions - perhaps someone can offer a better r-tree for the static content, and/or a simple way to sort layers that increases the chances of two objects with the same state being drawn in succession, therefore avoiding the state changes and so on; that would be cool. – Will Jan 16 '13 at 14:11
  • Are there just 3 "colours"? Or could there be many more? – j_random_hacker Jan 16 '13 at 14:17
  • @j_random_hacker the example code given has 10, but in real life the number could be much higher; possibly every object in the scene has unique state. But quite likely many share each state. In a 2D platform game, it's quite likely that the scene is made from 'tiles', even in 3D-drawn 2D platform games – Will Jan 16 '13 at 14:20
  • How about: 1. order all by Z. 2. Start from back and put in a set each rectangle until one overlaps another from the set. 3. Color sort this set and draw it. 4. Goto 2 – qwertzguy Jan 16 '13 at 15:58
  • Would be fair to know **how** expensive a colour change is. Is the penalty 5 times the execution time of `aabb_intersects`? 100 times? 10000 times? – ilmiacs Jan 16 '13 at 16:17
  • @ilmiacs well 'colour change' is really a GPU state change, and http://stackoverflow.com/questions/6769581/texture-change-and-other-state-change-costs-on-modern-gpus has some good links; in the example test app, I couldn't get it to be significant. In my real games, swapping out so all meshes use one texture and so on really really has an order of magnitude difference, which is why I've asked this question. – Will Jan 16 '13 at 16:38
  • I would be very surprised if there is any modern hardware where there is a CPU solution that is cheaper than the cost of overdraw when using a naive z-buffer solution. Plus with 2D you can use a texture atlas to reduce state changes. So while I appreciate there's a fascinating abstract question here, I'm not convinced there's a solution that applies to the real world. – Kylotan Jan 17 '13 at 12:18
  • @Kylotan yes I first tried to phrase my question very abstractly because I wanted a general algorithmic (but specific) answer. I've been slowly putting more and more context into it just to answer people's questions. As there is semi-transparency involved, draw order is important and the z-buffer doesn't save you. I do use texture atlases extensively too. I'm hoping for a good interesting algorithmic answer with example code, and perhaps a faster spatial index, even if my framerate in real life hardly moves. – Will Jan 17 '13 at 12:34
  • You're doomed to consider pretty much every single pixel on every semi-transparent surface anyway, and the ways in which you can choose to draw them are effectively limited. That's why there is pretty much one canonical solution here - render opaque objects sorted by texture, then render translucent objects sorted by distance. – Kylotan Jan 17 '13 at 12:51
  • @Kylotan yes I made the obvious, functional and correct code and when I was writing it I thought 'wait a sec, if I just examine the non-moving layers and see for each if I can move it upwards to be adjacent to one the same colour, and with having a quadtree so I just have to walk to parents to see overlaps and early out when it fails or I pass a moving object rather than do an exhaustive search, then I can perhaps reduce the state changes in the common case for drawing any subrect?'. And so I've asked here and I'd be happy with anyone that can show an algorithm with prototype. – Will Jan 17 '13 at 18:09

4 Answers

16

You are making the very wrong assumption that the performance you will be getting on the desktop browser will somehow determine the performance on your iPhone. You need to understand that the iPhone hardware implements tile-based deferred rendering which means that the fragment shader is used very late in the pipeline anyway. As Apple themselves say (“Do not waste CPU time sorting objects front to back”), Z-sorting your primitives will get you little performance gain.

But here’s my suggestion: if changing the colour is expensive, just don’t change the colour: pass it as a vertex attribute, and merge the behaviours into one super shader so you can draw everything in one or a few batches without even sorting. Then benchmark and determine the optimal batch size for your platform.
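A minimal sketch of the "colour as a vertex attribute" idea, assuming each rectangle is a plain object with `x`, `y`, `w`, `h` and an RGBA `colour` array. The function name `buildBatch` and the interleaved layout (position followed by colour) are illustrative choices, not something taken from the question's code:

```javascript
// Hypothetical interleaved layout: x, y, r, g, b, a per vertex,
// two triangles (6 vertices) per rectangle.
const FLOATS_PER_VERTEX = 6;
const VERTICES_PER_RECT = 6;

// Packs every rectangle into one Float32Array so the whole list can be
// drawn with a single draw call. The array order is the draw order, so
// pass the rectangles sorted back to front if blending is enabled.
function buildBatch(rects) {
  const data = new Float32Array(
    rects.length * VERTICES_PER_RECT * FLOATS_PER_VERTEX);
  let o = 0;
  for (const { x, y, w, h, colour } of rects) {
    const corners = [
      [x, y], [x + w, y], [x, y + h],          // first triangle
      [x + w, y], [x + w, y + h], [x, y + h],  // second triangle
    ];
    for (const [cx, cy] of corners) {
      data[o++] = cx;
      data[o++] = cy;
      data[o++] = colour[0];
      data[o++] = colour[1];
      data[o++] = colour[2];
      data[o++] = colour[3];
    }
  }
  return data;
}

// Usage with WebGL (posLoc and colLoc are assumed attribute locations):
//   gl.bufferData(gl.ARRAY_BUFFER, buildBatch(rects), gl.DYNAMIC_DRAW);
//   gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, 24, 0);
//   gl.vertexAttribPointer(colLoc, 4, gl.FLOAT, false, 24, 8);
//   gl.drawArrays(gl.TRIANGLES, 0, rects.length * VERTICES_PER_RECT);
```

The stride of 24 bytes is 6 floats of 4 bytes each, and the colour starts 8 bytes in, after the two position floats. Whether one big batch beats several smaller ones is exactly what the benchmark would have to show.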

sam hocevar
  • I think, because he uses quotes with 'colour', he's not referring to simple color, but possibly other state like a different shader. – Jari Komppa Jan 17 '13 at 09:08
  • @JariKomppa Sure, I understood it like that, too; which doesn't prevent merging those different shaders. I'll clarify my answer. – sam hocevar Jan 17 '13 at 09:37
  • @SamHocevar yes the question is *how* to merge those things that can be drawn with the same state and so to minimize batches... I cannot trivially draw different types of particle effect with the same shader or draw meshes with that same shader either, but I can try and draw as many of the same type of particle effects with the same shader at the same time rather than flipping between them... – Will Jan 17 '13 at 10:27
  • @Will You will need to provide more information about what you wish to do, then, because using a large `if()` to merge several shaders is certainly trivial. If the idea of "colour" extends to render states then it's not the same question at all, especially if alpha blending is involved. – sam hocevar Jan 17 '13 at 10:55
  • yes I could have a mega shader with switch statement. It wasn't what I was imagining, but it would work. – Will Jan 17 '13 at 11:03
  • @SamHocevar I've been googling and there are people complaining that different cards support different maximum shader lengths, and that there is no way to know what the limit is at runtime nor even to know the size of your program :( – Will Jan 17 '13 at 11:44
  • @Will Yes, that's a whole deal of new problems to solve and compromises to make :-) But with the information you are providing, you'll only get general strategies for improvements, not full-featured solutions. – sam hocevar Jan 17 '13 at 15:56
12

Choose colours, not boxes!

At any point in time, one or more boxes will be paintable, i.e. they are able to be painted next without introducing problems (though possibly introducing a cost due to having a different colour from the most recently painted box).

The question at every point is: What colour should we pick to draw next? It's not necessary to think about picking individual paintable boxes to draw, because as soon as you pick a particular box to draw next, you might as well draw all available boxes of the same colour that can be drawn at that time. That's because painting a box never adds constraints to the problem, it only removes them; and choosing not to paint a paintable box when you could do so without changing the current colour cannot make the solution less expensive than it would otherwise be, since you will later have to paint this box and that may require a colour change. This also means it doesn't matter in which order we paint paintable boxes of the same colour, since we will paint all of them at once in a single "block" of box painting operations.

The dependency graph

Start by building a "lies underneath" dependency graph, where each coloured rectangle is represented by a vertex and there is an arc (arrow) from v to u if rectangle v overlaps rectangle u and lies underneath it. My first thought was to use this to build a "must be drawn before" dependency graph by finding the transitive closure, but actually we don't need to do this, since all the algorithms below care about is whether a vertex is paintable or not. Paintable vertices are the vertices that have no predecessors (in-arcs), and taking the transitive closure does not alter whether a vertex has 0 in-arcs or not.

In addition, whenever a box of a given colour has only boxes of the same colour as its ancestors, it will be painted in the same "block" -- since all those ancestors can be painted before it without changing colours.
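As a rough sketch of the graph construction, assuming each rectangle is an object with `x`, `y`, `w`, `h` and a unique `z` (the names `overlaps`, `buildDependencyGraph` and `paintableBoxes` are made up for illustration):

```javascript
// Axis-aligned overlap test; rectangles that only touch at an edge do not overlap.
function overlaps(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Builds the "lies underneath" graph: above[v] lists every u such that
// rectangle v overlaps u and lies underneath it; inDegree[u] counts the
// rectangles that must be painted before u.
// This is the naive all-pairs loop; a spatial index such as the quadtree
// could supply candidate pairs instead.
function buildDependencyGraph(rects) {
  const above = rects.map(() => []);
  const inDegree = rects.map(() => 0);
  for (let i = 0; i < rects.length; i++) {
    for (let j = i + 1; j < rects.length; j++) {
      if (!overlaps(rects[i], rects[j])) continue;
      const [lo, hi] = rects[i].z < rects[j].z ? [i, j] : [j, i];
      above[lo].push(hi);
      inDegree[hi]++;
    }
  }
  return { above, inDegree };
}

// Paintable boxes are exactly the vertices with no in-arcs.
function paintableBoxes(graph) {
  const result = [];
  graph.inDegree.forEach((d, i) => { if (d === 0) result.push(i); });
  return result;
}
```

Painting a box then amounts to decrementing `inDegree` for everything in its `above` list, which is the usual bookkeeping for a topological sort.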

A speedup

To cut down on computation, notice that whenever all paintable boxes of some particular colour have no different-coloured descendants, painting this colour won't open up any new opportunities for other boxes to become paintable, so we don't need to consider this colour when considering which colour to paint next -- we can always leave it till later with no risk of increasing the cost. In fact it's better to leave painting this colour till later, since by that time other boxes of this colour may have become paintable. Call a colour helpful if there is at least one paintable box of that colour that has a different-coloured descendant. When we get to the point when there are no helpful colours remaining (i.e. when all remaining boxes overlap only boxes of the same colour, or no boxes at all) then we are done: just paint the boxes of each remaining colour, picking colours in any order.

Algorithms

These observations suggest two possible algorithms:

  1. A fast but possibly suboptimal greedy algorithm: Choose to paint next the colour that produces the most new paintable vertices. (This will automatically consider only helpful colours.)
  2. A slower, exact DP or recursive algorithm: For each possible helpful colour c, consider the dependency graph produced by painting all paintable c-coloured boxes next:

    Let f(g) be the minimum number of colour-changes required to paint all boxes in the dependency graph g. Then

    f(g) = 1 + min(f(p(c, g)))

    for all helpful colours c, where p(c, g) is the dependency graph produced by painting all paintable boxes of colour c. When no helpful colours remain, f(g) is simply the number of distinct colours left. If G is the dependency graph for the original problem, then f(G) will be the minimum number of changes. The colour choices themselves can be reconstructed by tracing backwards through the DP cost matrix.

    f(g) can be memoised to create a dynamic programming algorithm that saves time whenever 2 different permutations of colour choices produce the same graph, which will happen often. But it might be that even after DP, this algorithm could take an amount of time (and therefore space) that is exponential in the number of boxes... I will have a think about whether a nicer bound can be found.
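Both algorithms can be sketched in a few lines, under some assumptions: `colours[i]` is the colour of box i, and `under[i]` lists the boxes that overlap box i and lie underneath it (its in-arcs). All function names are illustrative. For brevity the exact version tries every paintable colour rather than only the helpful ones, which is slower but gives the same answer, and it counts colour blocks (the number of changes is one less if the first colour is free):

```javascript
// Boxes not yet painted whose predecessors have all been painted.
function paintable(colours, under, done) {
  const result = [];
  for (let i = 0; i < colours.length; i++) {
    if (!done.has(i) && under[i].every(u => done.has(u))) result.push(i);
  }
  return result;
}

// p(c, g): paint every paintable box of colour c, repeating while that
// frees further boxes of the same colour (they belong to the same block).
function paintColour(colours, under, done, c) {
  const next = new Set(done);
  let batch;
  do {
    batch = paintable(colours, under, next).filter(i => colours[i] === c);
    for (const i of batch) next.add(i);
  } while (batch.length > 0);
  return next;
}

function paintableColours(colours, under, done) {
  return [...new Set(paintable(colours, under, done).map(i => colours[i]))];
}

// Greedy: pick the colour whose block makes the most new boxes paintable.
// Ties are broken arbitrarily (first candidate wins).
function greedyBlocks(colours, under) {
  let done = new Set();
  const blocks = [];
  while (done.size < colours.length) {
    const before = new Set(paintable(colours, under, done));
    let best = null;
    for (const c of paintableColours(colours, under, done)) {
      const after = paintColour(colours, under, done, c);
      const gained = paintable(colours, under, after)
        .filter(i => !before.has(i)).length;
      if (best === null || gained > best.gained) best = { c, after, gained };
    }
    blocks.push(best.c);
    done = best.after;
  }
  return blocks;
}

// Exact: f(g) = 1 + min over colours c of f(p(c, g)), memoised on the
// set of boxes already painted. Exponential in the worst case.
function minBlocks(colours, under, done = new Set(), memo = new Map()) {
  if (done.size === colours.length) return 0;
  const key = [...done].sort((a, b) => a - b).join(",");
  if (memo.has(key)) return memo.get(key);
  let best = Infinity;
  for (const c of paintableColours(colours, under, done)) {
    const rest = minBlocks(colours, under, paintColour(colours, under, done, c), memo);
    best = Math.min(best, 1 + rest);
  }
  memo.set(key, best);
  return best;
}
```

For a red box, a blue box, and a second red box lying on top of the blue one (`colours = ["red", "blue", "red"]`, `under = [[], [], [1]]`), both functions paint blue first and then all the red in one block. Both assume the graph is acyclic, which an absolute z-order guarantees.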

j_random_hacker
  • Good idea with the paintable boxes and the dependency graph. However, my gut tells me that building the graph will be expensive. The greedy algorithm can simply be proven to be suboptimal. The recursive algorithm could be expensive as well. Altogether this solution is possibly not a good tradeoff between js and gpu work. Code shall prove. – ilmiacs Jan 16 '13 at 17:14
  • @ilmiacs: It seems to me that building that graph will be necessary in almost any approach that offers some algorithmic optimizations. But you need to do that work in any case, regardless of whether you store it in a particular data structure, so creating the graph should not add much overhead to any approach that has to understand which boxes overlap each other. But I agree that the recursive algorithm here is almost certainly too expensive for reasonably sized data sets. – Scott Sauyet Jan 16 '13 at 17:49
  • My gut goes with @ilmiacs but it's possible perhaps that an optimal ordering computed once when you make a tree can cause the rectangles that are computed to be visible by the tree each time the screen rect moves to be in *near*-optimal general order without any further work, or with a cheapish sort? ***Code shall prove.*** – Will Jan 16 '13 at 17:53
  • @Will: I haven't looked at your sample yet (maybe from home this evening, maybe not until the weekend.) But if you have a tree structure, doesn't it already encapsulate the overlap? If so, you're most of the way to the graph. I actually wouldn't be surprised if j_random's greedy algorithm would be all you need to achieve the performance you want. Nor would I be surprised to find that the actual optimization problem is NP-complete. It has the feeling of that sort of problem. – Scott Sauyet Jan 16 '13 at 18:01
  • Whether the algorithm will be efficient depends on whether the complexity of building that tree is low enough, i.e. O(N log N). The quadtree is O(N log N), but yours is different. Giving solutions that are O(N^2) is easy, some won't require trees at all, but are inefficient. ;-) – ilmiacs Jan 16 '13 at 19:36
  • @Will: The dependency graph I describe is in general not a tree, since one box may lie on top of more than one other box. And generating this graph (whether implicitly or explicitly) is **absolutely necessary** for any correct algorithm! (Try to imagine a solution that doesn't involve computing it :-P Ultimately you still need to do overlap tests between pairs of boxes, which is exactly the same thing as testing whether an arc exists between two vertices in the graph.) – j_random_hacker Jan 17 '13 at 00:36
  • @ilmiacs: In general, building the dependency graph is O(n^2), because there could be O(n^2) intersections: imagine a row of tall, skinny boxes overlaid by a column of short, wide boxes to form a "grid". But as I said, all correct algorithms need to contend with this. – j_random_hacker Jan 17 '13 at 00:40
  • @j_random_hacker I think Scott and ilmiacs have some insight here; if you see the equivalence of the tree and graph, and the rectangles in the nodes are sorted by z, then the all overlapping case becomes O(n) because you can early-out on overlap? Hmm, there seems to be a big gap between your waffle and an actual implementation of anything concrete; any chance of putting demo code for the 500 pts I've put up for grabs? – Will Jan 17 '13 at 07:15
  • @Will Is 'your waffle' really necessary? – MrKWatkins Jan 17 '13 at 17:43
  • Seems my comment was removed for being offensive. Let me try again: "@Will: My waffle? To answer your question, no, I won't be providing demo code for you. See if you can figure out why." – j_random_hacker Jan 17 '13 at 21:47
2

Here's a possibility. You'll have to benchmark it to see if it's actually an improvement.

For all rectangles, back to front:
  If this rectangle has been marked as drawn, skip to the next one
  Set a screen-sized unseen surface to all black
  Call this rectangle's color "the color"
  Clear the to-draw list
  For rectangles starting with this one and proceeding toward the front:
    If this rectangle has been marked as drawn, skip to the next one
    If (this rectangle's color is the color and
        all the pixels of this rectangle on the unseen are black) then
      Add this rectangle to the to-draw list
    Draw a white rectangle with this rectangle's shape on the unseen surface
    If the unseen surface is more than half white, break
  For all rectangles on the to-draw list:
    Draw the rectangle
    Mark it as drawn

It's not guaranteed to produce the optimal ordering, but I think it will come pretty close, and the pre-drawing step is quadratic in the worst case. It does depend on readbacks from the graphics buffer being fast. One trick that might help there is to create a new one-pixel surface that is a shrunken version of the area of interest. Its color will be the fraction of the original that was white.
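If buffer readbacks turn out to be slow, the same batching loop can be tried entirely on the CPU. The sketch below is a variation rather than a translation of the pseudocode: it replaces the unseen surface with a list of "blocker" rectangles and plain overlap tests, only lets rectangles that were passed over act as blockers, and drops the half-white early exit. It assumes `rects` is already sorted back to front and that each entry has `x`, `y`, `w`, `h` and `colour`; the function names are made up:

```javascript
function overlaps(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Groups rectangles into same-coloured batches that are safe to draw in
// the returned order. Returns [{ colour, indices }, ...], with indices
// into the back-to-front sorted input.
function batchBackToFront(rects) {
  const drawn = new Array(rects.length).fill(false);
  const batches = [];
  for (let i = 0; i < rects.length; i++) {
    if (drawn[i]) continue;
    const colour = rects[i].colour;
    const blockers = []; // stands in for the white area of the unseen surface
    const indices = [];
    for (let j = i; j < rects.length; j++) {
      if (drawn[j]) continue;
      const free = blockers.every(b => !overlaps(b, rects[j]));
      if (rects[j].colour === colour && free) {
        indices.push(j);   // nothing undrawn lies underneath it
        drawn[j] = true;
      } else {
        blockers.push(rects[j]); // must wait, and hides what is above it
      }
    }
    batches.push({ colour, indices });
  }
  return batches;
}
```

This is still quadratic or worse in the number of rectangles, so it probably only makes sense on the rectangles that survived culling, or when the result can be cached for static content.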

dspeyer
2

Start by drawing in a random (but correct) order, for example in strict z order. When drawing each frame, either count the number of color changes, or possibly measure the actual time a complete frame takes. Each frame, try swapping the order of two rectangles. The rectangles to be swapped must not overlap each other, and if they are not adjacent in the list, neither may overlap any rectangle drawn between them; then the swap cannot violate correctness. Aside from that they can be chosen at random, or by a linear pass through the list, or in some other way. If doing the swap reduces the number of color changes, keep the new order; if it increases it, revert and try a different swap in the next frame. If doing the swap neither reduces nor increases the number of color changes, keep it with 50% odds. For any rectangles which did not overlap in a previous frame but which start overlapping due to a move, simply exchange them so they are in z order.

This has some relationship to sorting algorithms which swap pairs of items, except that we cannot compare items; we need to go through the whole list and count color changes. This will perform very badly at first but converge to a good order relatively quickly, and will adapt to scene changes. I think it is probably not worth it to go through and calculate an optimum order every frame; this will get to, and maintain, a near-optimum order with very little extra work.

Referring to the drawing you have: Initial draw order picked at random: 1,6,2,4,5,8,3,7 (5 color changes). Swap 5,8. New order: 1,6,2,4,8,5,3,7 (4 color changes) => Keep new order.
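One step of this can be sketched as follows. To keep the correctness check simple, the sketch only ever swaps neighbours in the list, so it is enough that the two rectangles do not overlap each other. `colourOf` maps a rectangle id to its colour and `overlaps(a, b)` is whatever overlap test you already have; both, like the function names, are placeholders:

```javascript
// Number of places in the draw order where the colour differs from the
// previous rectangle's colour.
function countColourChanges(order, colourOf) {
  let changes = 0;
  for (let i = 1; i < order.length; i++) {
    if (colourOf[order[i]] !== colourOf[order[i - 1]]) changes++;
  }
  return changes;
}

// Tries swapping the neighbours at positions i and i + 1.
// Returns the new order if the swap is kept, otherwise the original.
function trySwap(order, i, colourOf, overlaps, random = Math.random) {
  const a = order[i];
  const b = order[i + 1];
  if (overlaps(a, b)) return order; // swapping would break the z-order
  const candidate = order.slice();
  candidate[i] = b;
  candidate[i + 1] = a;
  const before = countColourChanges(order, colourOf);
  const after = countColourChanges(candidate, colourOf);
  if (after < before) return candidate;                     // improvement: keep
  if (after === before && random() < 0.5) return candidate; // neutral: coin flip
  return order;                                             // worse: revert
}

// The example above, assuming rectangles 5 and 8 do not overlap:
const colourOf = { 1: "red", 2: "blue", 3: "green", 4: "red",
                   5: "blue", 6: "blue", 7: "green", 8: "red" };
const order = [1, 6, 2, 4, 5, 8, 3, 7];
console.log(countColourChanges(order, colourOf));   // 5
const swapped = trySwap(order, 4, colourOf, () => false);
console.log(swapped, countColourChanges(swapped, colourOf)); // [1,6,2,4,8,5,3,7] 4
```

Recounting the whole list on every attempt is wasteful; only the boundaries next to the two swapped positions can change, so a real implementation could compare just those.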

Alex I
  • 18,105