# Making a Game in Go, part 5

Last post about draw ordering, I promise.

In the first four posts of this series, I iterated on an implementation for a 2D game engine, starting from a minimal implementation of the ebiten.Game interface through to a system for organising components both in a logical hierarchy as well as a data structure for drawing the right component at the right time (assuming the components can be totally ordered). The previous post ended on some open pondering on whether switching from sort.Sort to a topological ordering made sense, from both an ease-of-use perspective (the user can define fewer ordering constraints) and an efficiency perspective (big-O amount of work is potentially quadratic).

Friends, it does make sense to implement topological ordering. And all the other kinds, too.

Making things easier for the users of a library is often a good call. Accidentally sorting things incorrectly, because they happen to not be totally ordered, could be really confusing.

And it’s not always the library writer’s job to shield the users from inefficiencies. While a library should ideally aim to avoid needless inefficiency, there are other concerns to balance, like maintainability and ease-of-use, and some problems are by their very nature intractable, or at least difficult in the sense of requiring a lot of processing to solve.

A library or framework can embody the opinions of its author. A responsible library author would clearly document the chosen tradeoffs - at least a big fat “this implementation is hella slow” in the godoc and example code would go a long way. Designing the library interface to avoid or discourage inefficient usage could be even better.

If a game maker wants to overlap thousands of components in the same place, knowing that it would probably be inefficient given the engine being used, then that’s a little bit on them.

And good library might offer a few different approaches, and let the user pick among them.

But all discussion of theoretical efficiency is kind of moot. What matters is how the program behaves when it is actually run. As long as the game’s performance isn’t annoying players, it almost doesn’t matter what kind of algorithm is being used.

By performance I don’t just mean framerate. Other concerns players might have include memory usage, remaining battery, GPU fan noise, wear on the hard drive, secretly mining cryptocurrencies (which, to be clear, are all evil - this isn’t hyperbole - if you support cryptocurrencies or NFTs then you are morally bankrupt and any rights you might have had to read my blog are revoked), the ongoing climate catastrophe…

Benchmarking and profiling tools exist to measure how fast an implementation is. A heuristic I’ve been using is to take a CPU profile, and compare how long is spent preparing to draw versus actually drawing. Actually drawing should take a bit longer.

For my own goals, I’m aiming to make my game run at 60fps under WebAssembly on an iPad. In a rough early cut of implementing topological ordering, the framerate on my iPad began to dip a bit below 60. But, with a little work, it runs about as quickly (often faster) than the previous sort method, even though it would be even slower in the worst case (and, bonus: everything is correctly ordered now).

### The gory details

More data structures? Why not keep everything in a flat slice? Because incremental changes to the right data structure might be more efficient overall compared with doing processing in-place on a flat slice.

Firstly, here is a type that is a set (of things to draw):

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  type drawerSet map[Drawer]struct{} // Creating a drawerSet: s := make(drawerSet) // Inserting v into s: s[v] = struct{}{} // Removing v from s: delete(s, v) // Iterating over s: for v := range s { ... } 

struct{} has no fields and there’s only one value of the type, which is struct{}{}, but you can still use it as a value type in a map.

Representing a set as map[...]struct{} versus map[...]bool is a stylistic choice.

On the one hand, iterating over map[...]struct{} does not require checking if the corresponding value is false, if that is a possibility, and there is (theoretically) no memory required to store each struct{}{}.

On the other, testing to see if something is an element is easier in map[...]bool, because failed lookups result in the zero value for the value type (false). if m[key] { instead of the longer if _, found := m[key]; found {. Also, true is slightly easier to type on many keyboards, compared with struct{}{}, when adding new values.

Secondly, a direct acyclic graph (DAG) data structure, implemented in a two-sided edge-set fashion:

 1 2 3 4 5  type edges struct { in, out drawerSet } type dag map[Drawer]edges 

This dag maps things to draw (type Drawer) to two sets: a set of things to draw before it (in) and a set of things to draw after it (out). I will now switch terminology: the things to draw are the vertices, and the ordering constraints are the edges linking the vertices together.

Some edge-list or edge-set implementations of graphs focus only on the outgoing edges from each vertex. I’m also recording ingoing edges, for reasons that will be explained soon. What is important is that if v is an element of d[u].out, then u should be an element of d[v].in.

Here’s a few methods for managing edges and vertices in dag:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37  // addVertex ensures the vertex is present, even if there are // no edges. func (d dag) addVertex(v Drawer) { if _, found := d[v]; found { return } d[v] = edges{ in: make(drawerSet), out: make(drawerSet), } } // removeVertex removes v, and all edges associated with v, // in O(degree(v)). func (d dag) removeVertex(v Drawer) { for u := range d[v].in { delete(d[u].out, v) } for w := range d[v].out { delete(d[w].in, v) } delete(d, v) } // addEdge adds the edge (u -> v) in O(1). func (d dag) addEdge(u, v Drawer) { d.addVertex(u) d.addVertex(v) d[v].in[u] = struct{}{} d[u].out[v] = struct{}{} } // removeEdge removes the edge (u -> v) in O(1). func (d dag) removeEdge(u, v Drawer) { delete(d[v].in, u) delete(d[u].out, v) } 

A quick note about addEdge and removeEdge. If u isn’t already in d, then d[u] has the value edges{}, or more verbosely, edges{in: nil, out: nil}. Using .in and .out on this value is safe (unlike, say, if the map stored pointers to edges, because then d[u] would be nil if u is not in the map) and deleting from .in and .out is also safe (because delete is documented to do nothing for nil maps). Setting values in .in and .out (where nil) would panic, however. This is why addEdge needs to call addVertex to ensure the maps exist, but removeEdge does not.

The degree of a vertex is how many edges connect to it. In a directed graph, the outdegree of a vertex is how many edges start with that vertex, and the indegree is how many end with that vertex. In- and out-degrees can be counted easily with dag: indegree of v is len(d[v].in) and outdegree is len(d[v].out)

Ease of counting indegrees and iterating out-edges suggests using a topological sorting algorithm based on… counting indegrees and iterating out-edges (Kahn’s Algorithm):

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  // topWalk visits each vertex in topological order, in time // O(|V| + |E|) and temporary memory O(|V|). func (d dag) topWalk(visit func(Drawer)) { // We know the queue will contain each vertex at some point, // so preallocate to the full size. queue := make([]Drawer, 0, len(d)) indegree := make(map[Drawer]int, len(d)) for v, e := range d { if len(e.in) == 0 { queue = append(queue, v) } else { indegree[v] = len(e.in) } } // Visit every vertex (O(|V|)) and decrement indegrees for // every out-edge of each vertex visited (O(|E|)). // Total: O(|V|+|E|). for len(queue) > 0 { u := queue[0] visit(u) queue = queue[1:] // Decrement indegree of each vertex v on an out-edge of u, // and enqueue v if its indegree is now 0. for v := range d[u].out { indegree[v]-- if indegree[v] == 0 { queue = append(queue, v) } } } } 

This can be extended to tolerate (though not truly “support”) cyclic graphs; if there is a cycle, the queue will become empty but some vertices will remain unvisited. But, if there is a cycle, then there is no draw ordering that respects all the constraints. At this level of the library, it may be appropriate to enqueue an arbitrary unvisited vertex, perhaps one with minimal indegree, let it draw, and hope for the best.

Hope? What happened to “hope is not a strategy”?

Actually, hope is a strategy, just often a bad one.

### Using the dag

Now I just have to get a dag loaded up with stuff to draw and enough ordering constraints, and we’re good to go. In the previous post this is where I threw my hands up, exlaiming “comparing everything against everything else is quadratic! This is going to suck!”. And that’s still true, in the worst case.

In practice, a game consisting of lots of things overlapping a small area is likely to be inscrutable to the player, and there might be easier ways to achieve the intended effect (e.g. a single flat image or animation instead).

The number of comparisons and ordering constraints can be reduced in the following two ways, among others:

1. If something hasn’t moved (e.g. components that are “fixed terrain”), then we don’t have to re-evaluate its ordering against other fixed things.
2. If, after projecting into screen space, two items don’t overlap, we don’t have to compare them. We can speed up finding potentially overlapping things in a variety of ways.

To simplify the rest of the discussion, I will assume every component that needs drawing also has a 3D bounding box, i.e. the Drawer interface now looks like:

 1 2 3 4  type Drawer interface { Draw(*ebiten.Image, *ebiten.DrawImageOpts) BoundingBox() geom.Box } 

where geom.Box is a 3D analogue of image.Rectangle. This definition makes it harder on components that “just want to provide a Z value”, since they now have to invent a bounding box. Though, that bounding box could be:

 1 2 3 4  geom.Box{ Min: geom.Point{math.MinInt, math.MinInt, z}, Max: geom.Point{math.MaxInt, math.MaxInt, z}, } 

This is an edge case worth keeping in mind later - what happens with incredibly huge components?

For the sake of letting the game maker choose different draw-ordering approaches, let’s expose the DAG-based drawing code via a new type:

 1 2 3 4 5 6 7 8 9  // DrawDAG manages drawing all subcomponents via a directed // acyclic graph (DAG) to provide a topological ordering that // satisfies draw ordering constraints. type DrawDAG struct { Components []interface{} dag // TODO: any other helper data structures } 

Other approaches to drawing can have their own type. For instance, DrawDFS (short for “draw depth-first search”) could draw subcomponents in a depth-first traversal of the tree, without regard to whether the components have other ideas about their draw ordering. The game maker can then pick and choose which “draw manager” to use, or even write their own. With a bit of care, multiple draw managers could be used for different parts of the game tree!

Anyway.

The API for DrawDog DrawDAG will involve being notified of subcomponents being added and removed, as well as needing to take stock of all subcomponents initially. Since each component is actually a tree of components, registering and unregistering will have to happen recursively (recall Scan for finding immediate subcomponents of a component).

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78  // Prepare prepares the DrawDAG by registering subcomponents. func (d *DrawDAG) Prepare() { d.dag = make(dag) for _, c := range d.Components { d.Register(c) } } // Register registers the component and all subcomponents, // recursively. func (d *DrawDAG) Register(component interface{}) { if dr, ok := component.(Drawer); ok { d.registerOne(dr) } if sc, ok := component.(Scanner); ok { for _, x := range sc.Scan() { d.Register(x) } } } // Unregister unregisters the component and all subcomponents, // recursively. func (d *DrawDAG) Unregister(component interface{}) { if sc, ok := component.(Scanner); ok { for _, x := range sc.Scan() { d.Unregister(x) } } if dr, ok := component.(Drawer); ok { d.unregisterOne(dr) } } func (d *DrawDAG) registerOne(x Drawer) { d.dag.addVertex(x) // TODO: compare x against other vertices to find // draw ordering constraints to add to d.dag as edges. // For a slow quadratic-time implementation, imagine a // loop that compares x against every v in d.dag. } func (d *DrawDAG) unregisterOne(x Drawer) { d.dag.removeVertex(x) } // Draw draws everything in the DAG in topological order. func (d *DrawDAG) Draw(screen *ebiten.Image) { // This is similar to the previous implementation (Game.Draw), // except g is replaced with d as the root component, and the // loop over g.drawList is replaced with: d.dag.topWalk(func(x Drawer) { // ... loop body that determines if x is hidden and what // draw options to apply ... x.Draw(screen, &state.opts) }) } // Update updates the DAG with changes to draw ordering // constraints. func (d *DrawDAG) Update() error { // Assume each component calls Update on any subcomponents, // similar to this loop. for _, x := range d.Components { if u, ok := x.(Updater); ok { if err := u.Update(); err != nil { return err } } } // TODO: update d.dag return nil } 

After Update-ing subcomponents, DrawDAG is ready to update d.dag. A simplistic approach would be to delete d.dag and build it from scratch. But some work can be saved if we only update for components whose bounding boxes have changed. If registerOne and unregisterOne are relatively quick, we can remove all the components that changed, and re-add them. These changes to DrawDAG might look like:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40  type DrawDAG struct { Components []interface{} dag // Cache of last-known bounding box for each component: boxCache map[Drawer]geom.Box // TODO: any other helper data structures } func (d *DrawDAG) Update() error { // update subcomponents loop here // Remove and re-add components that have moved. var readd []Drawer for dr, bb := range d.boxCache { nbb := dr.BoundingBox() if bb != nbb { d.unregisterOne(dr) readd = append(readd, dr) } } for _, db := range readd { d.registerOne(dr) } return nil } func (d *DrawDAG) registerOne(x Drawer) { d.dag.addVertex(x) d.boxCache[x] = x.BoundingBox() // TODO: find draw ordering constraints involving x } func (d *DrawDAG) unregisterOne(x Drawer) { d.dag.removeVertex(x) delete(d.boxCache, x) } 

Note that this does not deal with the possibility that draw ordering might need to change in cases where bounding boxes remain unchanged. This could be added using an interface for components to report whether they’ve changed shape in the current update.

Still, this is an improvement on rebuilding the DAG on every update - components that have changed have their constraints checked, and components that are fixed only change if they were or are connected to something that changed. In the worst case this could still be every component, but for a scene consisting mostly of fixed terrain, this is a big improvement.

Next, let’s deal with avoiding comparisons in registerOne by checking if there are overlaps in screen space. A blunt-force approach, that makes the theoretical worst-case even more worse, is to divide screen space into equally-sized square chunks, and for each chunk, track the set of components that overlap that chunk.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84  type DrawDAG struct { Components []interface{} ChunkSize int // tunable! dag boxCache map[Drawer]geom.Box chunks map[image.Point]drawerSet projection geom.Projection } func (d *DrawDAG) registerOne(x Drawer) { d.dag.addVertex(x) xbox := x.BoundingBox() d.boxCache[x] = xbox // Project xbox into screen space and find chunk coordinates xrect := xbox.BoundingRect(d.projection) min := xrect.Min.Div(d.ChunkSize) max := xrect.Max.Sub(image.Pt(1, 1)).Div(d.ChunkSize) // Look in each overlapping chunk, and build a set of // candidates to check (union of all overlapping chunks). candidates := make(drawerSet) var p image.Point for p.Y = min.Y; p.Y <= max.Y; p.Y++ { for p.X = min.X; p.X <= max.X; p.X++ { chunk := d.chunks[p] if chunk == nil { d.chunks[p] = drawerSet{x: {}} continue } // Merge chunk into candidates for c := range chunk { candidates[c] = struct{}{} } // Add x to the chunk chunk[x] = struct{}{} } } // Compare x against each candidate y for y := range candidates { // Don't create self-cycles if x == y { continue } // Do the bounding rectangles actually overlap? yrect := y.BoundingBox().BoundingRect(d.projection) if !xrect.Overlaps(yrect) { // x and y both overlap some chunk, but not each other continue } switch { case drawOrderConstraint(x, y): d.dag.addEdge(x, y) case drawOrderConstraint(y, x): d.dag.addEdge(y, x) } } } func (d *DrawDAG) unregisterOne(x Drawer) { d.dag.removeVertex(x) // Remove x from d.chunks // Since x.BoundingBox() might have changed, use the value // cached in d.boxCache instead (saved in d.registerOne) to // find the chunks containing x. xrect := d.boxCache[x].BoundingRect(d.projection) min := xrect.Min.Div(d.ChunkSize) max := xrect.Max.Sub(image.Pt(1, 1)).Div(d.ChunkSize) var p image.Point for p.Y = min.Y; p.Y <= max.Y; p.Y++ { for p.X = min.X; p.X <= max.X; p.X++ { delete(d.chunks[p], x) } } delete(d.boxCache, x) } // drawOrderConstraint details omitted; compares boxes to see // which is in front/behind/on top/below/etc, calls DrawBefore/DrawAfter func drawOrderConstraint(x, y Drawer) bool {...} 

If the scene consists of small components that are spread out, this approach is pretty good. Each chunk will only have a few components, and each component is small, so there are few chunks to gather and few components to compare.

If the scene consists of lots of large, perpetually moving components, all overlapping the same chunks, then all this is bad! - at least $$O(A|V| + |V|^2)$$, where A is the average screen area of the components (as measured in chunks, but that’s a constant factor).

It’s a start!