Josh's blog

Advent of Crimes (in Go)

In which I confess my Go crime-ing.

Advent of Code is a yearly bit of fun where, every day from 1 Dec through to 25 Dec, a small problem is posted on adventofcode.com, and participants try to solve each problem. A typical problem lends itself to a computer-programming-based solution. It’s a little bit of programming fun at the end of the year.

It would be healthy to treat it purely as a little bit of fun, but there is also a leaderboard and point-scoring. The first 100 participants to correctly solve each part of each day’s problem score some points based on how quick they were. Some real sickos manage to score lots of points! (I do not!)

Despite being in a convenient timezone, until this year I had never placed in the top 100 on any problem. Now that I have scored some points (159, to be exact), I can start giving advice.

I have some general tips for trying to get on the leaderboard, but the one that matters for this post: something that has saved me a bit of time was writing a library of generic algorithms (at least, the algorithms save time when they actually work…).

Misdemeanours

The following save time when speed-coding, but have fairly well-known downsides:

Must

It’s important not to ignore errors, even when speed-coding. You don’t want to fail to parse the input and then waste time looking for bugs in the processing step.

The Go way of handling errors is to take advantage of multiple return values: return an error, then check it in an if block. In the overwhelming majority of cases this is more readable and maintainable than “try/catch”, because you deal with the error immediately (i.e. in the context of the code with the error, rather than many lines away), and the strategy can be tailored to each error: self-recover, log a useful message, terminate the program, or make clear (to future maintainers) the deliberate choice to ignore an error.
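For the record, a minimal example of that shape (the input file name here is hypothetical):

import (
    "log"
    "os"
)

// ...

f, err := os.Open("input.txt")
if err != nil {
    // Deal with the error immediately, in context.
    log.Fatalf("Couldn't open input: %v", err)
}
defer f.Close()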

But it is an inconvenient amount of typing when speed-coding. The input file should be there - if not, the program should panic. If the input contains lines of integers, they should all parse correctly with strconv.Atoi - if not, the program should panic. And so on.

The standard library packages text/template and html/template both have a convenient function called Must. It is implemented like so:

func Must(t *Template, err error) *Template {
    if err != nil {
        panic(err)
    }
    return t
}

The idea is that sometimes you want to make sure your template parses, but the Go compiler doesn’t parse templates - the template library does, at runtime. So the next best thing to a compile error is to terminate the program as soon as it starts. You wrap the call to template.Parse in a call to template.Must, at init time, and if the template doesn’t parse, you find out before main runs:

var myCoolTemplate = template.Must(
    template.New("cool").Parse("{{.}} is cool!~"),
)

Go doesn’t have generics, so we would have to re-implement this for every single type we want to use it on. The End. …Oh wait, Go does have generics now:

func Must[T any](t T, err error) T {
    if err != nil {
        panic(err)
    }
    return t
}
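A sketch of the speed-coding usage this enables (the input file name is hypothetical):

import (
    "os"
    "strconv"
    "strings"
)

// ...

// Panic immediately if the input is missing or malformed.
raw := string(Must(os.ReadFile("input.txt")))
lines := strings.Split(strings.TrimSpace(raw), "\n")
first := Must(strconv.Atoi(lines[0]))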

Why Must[T] is a crime

If template.Must is a part of the standard library, and is recommended for parsing templates, why would Must[T] be any different? Truthfully, it isn’t. However, outside of init-time usage for exposing programmer-caused problems as soon as possible, Must encourages coding in a style where any error that happens at all is dumped into a panic. That’s fine if your user is also the developer of the particular program, but it’s an unacceptable user experience for anyone else.

Func and games

Sometimes a more functional approach is desired. Consider some operation that is easy to write as a loop:

sum := 0
for _, value := range numbers {
    sum += value
}
fmt.Println(sum)

How primitive and basic! I should be coding more sophisticatedly with some functional programming. Clearly what is needed is some Sum:

import "golang.org/x/exp/constraints"

type Addable interface {
    constraints.Integer | constraints.Float | 
        constraints.Complex | ~string
}

func Sum[S ~[]E, E Addable](in S) E {
    var accum E
    for _, x := range in {
        accum += x
    }
    return accum
}

Or what about the act of converting decimal strings into ints? What is this rudimentary loop nonsense? Don’t you have a PhD?

nums := make([]int, 0, len(input))
for _, line := range input {
    nums = append(nums, Must(strconv.Atoi(line)))
}

That is just Map! You want to call my buddy Map here!

func Map[S ~[]X, X, Y any](in S, f func(X) Y) []Y {
    out := make([]Y, len(in))
    for i, x := range in {
        out[i] = f(x)
    }
    return out
}

//...

nums := Map(input, func(line string) int {
    return Must(strconv.Atoi(line))
})

Heck - Must inside Map must be pretty common, so let’s have a MustMap that does both:

func MustMap[S ~[]X, X, Y any](in S, f func(X) (Y, error)) []Y {
    out := make([]Y, len(in))
    for i, x := range in {
        y, err := f(x)
        if err != nil {
            panic(fmt.Sprintf("at index %d: %v", i, err))
        }
        out[i] = y
    }
    return out
}

// ...

nums := MustMap(input, strconv.Atoi)

Why this is a crime

Functional style is nice in straightforward cases, and saves a lot of typing. But loops are great! With a loop you can see that the code is about to embark on a linear pass through something. You can set up state before the loop exactly as you need, skip iterations as needed, and break out of it as needed. So… I think that loops are both more readable (you can see what it is doing) and maintainable (making it do something different later on is a smaller delta).

But also… how about those generic type parameters? Sum and Map take slices of some element type E, but also any type whose underlying type is []E - hence the additional S ~[]E, and then the E itself. Map is even worse! It’s a jungle of capital letters!

Sets

Some people come to Go, see there is a map, and expect there to be a set, and then get upset when they see there is no set, and the standard advice is to just use map.

Which is fine advice, when you consider maintainability. What might begin as a set often later needs to have extra values associated with each element - so you may as well use a map to begin with, and require less code change later on.

Speed coding is, again, different. There are approaches where what is needed is definitely a set. Sometimes the most natural way to think about the solution algorithm is in terms of set operations like intersection, such as in the rucksack reorganisation problem.

Sadly, there is no way to make a generic set type. …Shut up, yes there is:

// Set is a generic set type based on map.
type Set[T comparable] map[T]struct{}

// Insert inserts x into the set. If s == nil, Insert
// returns a new set containing x.
func (s Set[T]) Insert(x ...T) Set[T] {
    if s == nil {
        s = make(Set[T])
    }
    for _, x := range x {
        s[x] = struct{}{}
    }
    return s
}

// make(Set[T]), delete(s, x), and len(s) are so
// simple that I'm not making methods.

// Contains reports whether s contains x.
func (s Set[T]) Contains(x T) bool {
    _, c := s[x]
    return c
}

// Intersection returns a new set with elements
// common to both sets s and t.
func (s Set[T]) Intersection(t Set[T]) Set[T] {
    // (Fewer lookups in large map) is faster
    // than (more lookups in small map)?
    // I wrote a benchmark - it seems to be true.
    if len(s) > len(t) {
        s, t = t, s
    }
    u := make(Set[T])
    if len(s) == 0 || len(t) == 0 {
        return u
    }
    for x := range s {
        if t.Contains(x) {
            u.Insert(x)
        }
    }
    return u
}

and so on.

Why Set[T] is a crime

The main pieces of Set are just very thin wrappers around existing syntax for working with maps. Insert is really just s[x] = struct{}{}. Contains is really just _, c := s[x]; return c. Do these save any typing? Barely - and the map syntax would be shorter if, instead of map[T]struct{} being the underlying type, I switched to map[T]bool (and only ever store true). So all Set is really doing is obscuring how it works in the name of abstraction. Wrapping make(Set[T]), delete(s, x), and len(s) in methods would have been too much even for me.

Methods like Union, Intersection, Keep, Disjoint, SubsetOf, and SymmetricDifference save time when the set abstraction is the most useful way to think about the problem. But together with quick conversion functions ToSlice, SetFromSlice, RuneSet, and ByteSet, they may encourage writing code in a way that hides quadratic time or memory complexity.
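For instance, here is roughly how the rucksack solution falls out, with a minimal stand-in for RuneSet (my library’s version may differ):

// RuneSet converts a string into the set of its runes.
func RuneSet(s string) Set[rune] {
    var set Set[rune]
    for _, r := range s {
        set = set.Insert(r)
    }
    return set
}

// ...

// line is one rucksack from the input. The common item
// appears in both halves.
half := len(line) / 2
common := RuneSet(line[:half]).Intersection(RuneSet(line[half:]))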

Tricks with image

The image package exists mainly for operating on images. But it contains two convenient types: image.Point and image.Rectangle. A Point is a struct{ X, Y int }, with some methods. A Rectangle is a struct{ Min, Max Point }.

Point has some basic vector arithmetic methods. Points can also be asked if they are inside a Rectangle. This is very convenient for avoiding writing manual bounds checks: I often create a Rectangle value called bounds, and do the bounds check for a Point p as p.In(bounds). This beats hand-writing four different inequalities that tend to be prone to off-by-one errors.

Rectangles can be unioned and intersected, can provide an inset subrectangle, and so on. Very convenient.

Because Point and Rectangle are comparable structs, they work well as keys for map and Set. This trick is useful for problems involving sparse arrangements of points (e.g. unbounded Conway’s Game of Life).
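A quick sketch of that sparse representation, reusing the Set type from earlier:

import "image"

// ...

// Live cells in an unbounded Game of Life, keyed by position.
var live Set[image.Point]
live = live.Insert(image.Pt(0, 1), image.Pt(1, 1), image.Pt(2, 1))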

I also save some time by having pre-written slices of neighboring points: for example, Neigh4 = []image.Point{{0, -1}, {1, 0}, {0, 1}, {-1, 0}}. Considering the four neighbors of a point p can then be done as a loop:

for _, d := range Neigh4 {
    q := p.Add(d)
    if !q.In(bounds) {
        continue
    }
    // do something with q
}

Why this is a crime

Your Honour, I argue that it is not. image.Point and image.Rectangle are being used exactly as designed… just not on “images” of the kind the library was intended for. For maintainability it might be better to introduce named types applicable to the problem at hand.

Some things I might add to the image library to make it more convenient would be methods on image.Point that compute common norms (vector length in e.g. the Manhattan metric) and the ability to grow an image.Rectangle to cover a given Point.
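Hedged sketches of what those might look like (Manhattan and Cover are names I just made up):

// Manhattan returns the Manhattan (L1) norm of p,
// treating p as a vector.
func Manhattan(p image.Point) int {
    x, y := p.X, p.Y
    if x < 0 {
        x = -x
    }
    if y < 0 {
        y = -y
    }
    return x + y
}

// Cover grows r just enough to contain the point p.
// (Rectangle.Max is exclusive, hence the +1 in each axis.)
func Cover(r image.Rectangle, p image.Point) image.Rectangle {
    return r.Union(image.Rectangle{Min: p, Max: p.Add(image.Pt(1, 1))})
}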

Also, I found myself reaching for things during Advent of Code problems that are moral equivalents of image.Point and image.Rectangle in dimensions other than 2. An integer range type would be useful for the camp cleanup problem, and a three-dimensional axis-aligned box type would be useful for doing bounds checks, e.g. for the lava droplet problem.

It would be nice if one could genericise types by an integer parameter…

Grids

So often in Advent of Code, the input consists of some grid of digits or characters. The hill-climbing problem is a nice example.

Go has arrays (fixed size, size is part of the type) and slices (based on arrays but there is support for dynamically growing them). Both of them are one-dimensional. Sure, you can make a “two-dimensional slice” which is really a slice of slices:

// h and w contain the grid height and width, respectively
grid := make([][]byte, h)
for i := range grid {
    grid[i] = make([]byte, w)    
}

But rotations, transpositions, and resizing start to get fiddly - it’s easy to put the wrong loop variable in the wrong index, or get a subtraction the wrong way around, and once again you are debugging a part of the program that could have been a well-tested library implementation.

So I wrote a somewhat tested library. There are two kinds of grid: Dense[T] uses the slice-of-slices approach, and Sparse[T] is a map[image.Point]T under the hood.
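As a flavour of the index-juggling such a library absorbs, here is a sketch of a generic transpose (not necessarily how Dense[T] actually does it):

// Transpose returns a new grid g2 with g2[x][y] == g[y][x].
// It assumes g is rectangular (every row the same length).
func Transpose[T any](g [][]T) [][]T {
    if len(g) == 0 {
        return nil
    }
    out := make([][]T, len(g[0]))
    for x := range out {
        out[x] = make([]T, len(g))
        for y := range g {
            out[x][y] = g[y][x]
        }
    }
    return out
}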

Why grid is a crime

The main problem with grid is that it’s probably inefficient compared with highly-optimised number-crunching libraries. But it’s mostly OK? It has tests? The API isn’t horrible?

Pub-sub

If only I were working in a language that makes concurrent programs easy to write… hang on, I am!

What about a bit of concurrency maximalism? Wouldn’t it be nice to make all those CPU cores work for once?

Consider the monkeys-shouting-numbers problem, or little Bobby Tables’ electronics kit. We could solve these by evaluating each monkey / logic gate serially, until no values change. Or we could be slightly more clever and maintain a queue of monkeys / gates that have updated, and then re-evaluate only those that depend on them.

Or we could… make each monkey operation / logic gate its own goroutine, and use a pub-sub topic for each monkey / wire. As each operation is evaluated, it announces its new value to any interested parties via its topic.

// Topic is a pub-sub topic.
type Topic[T any] struct {
    mu     sync.Mutex
    subs   []chan T
    closed bool
}

// Pub publishes the value v onto the topic.
func (q *Topic[T]) Pub(v T) {
    q.mu.Lock()
    defer q.mu.Unlock()
    if q.closed {
        panic("topic is closed")
    }
    for _, c := range q.subs {
        c <- v
    }
}

// Sub subscribes to messages on the topic.
func (q *Topic[T]) Sub() <-chan T {
    q.mu.Lock()
    defer q.mu.Unlock()
    if q.closed {
        panic("topic is closed")
    }
    ch := make(chan T, 1)
    q.subs = append(q.subs, ch)
    return ch
}

// Close closes the topic.
func (q *Topic[T]) Close() {
    q.mu.Lock()
    defer q.mu.Unlock()
    for _, c := range q.subs {
        close(c)
    }
    q.subs = nil
    q.closed = true
}

// ...

func monkey(xch, ych <-chan int, out *Topic[int], op func(int, int) int) {
    // Wait until each side has a value
    lx, ly := <-xch, <-ych
    lo := op(lx, ly)
    out.Pub(lo)
    for {
        // Each time either side receives a
        // value, re-evaluate, and publish
        // any changes.
        // Replace each input channel with
        // nil once it is closed, to avoid
        // reading zeroes over and over.
        select {
        case x, open := <-xch:
            if open {
                lx = x
            } else {
                xch = nil
            }
        case y, open := <-ych:
            if open {
                ly = y
            } else {
                ych = nil
            }
        }
        if xch == nil && ych == nil {
            break
        }
        if o := op(lx, ly); o != lo {
            out.Pub(o)
            lo = o
        }
    }
    out.Close()
}

// ...

// Track each monkey's output topic.
topics := make(map[string]*Topic[int])
topic := func(label string) *Topic[int] {
    t, ok := topics[label]
    if !ok {
        t = new(Topic[int])
        topics[label] = t
    }
    return t
}

for _, line := range input {
    // ... handle monkeys with a single number ...
    // ... parse line into "label: left op right" ...
    
    out, l, r := topic(label), topic(left), topic(right)
    
    go monkey(l.Sub(), r.Sub(), out, op)
}

fmt.Println(<-topic("root").Sub())

Why Topic is a crime

Topic itself is not a crime. It’s pretty much exactly how you have to implement a concurrent single-publisher multiple-subscriber topic.

The real crime is to make the Go runtime do all the work. Using goroutines for literally everything started to fall out of fashion around Go 1.3, and for good reason: goroutines are actually not as lightweight as you want them to be. They’re still good enough for one-goroutine-per-incoming-HTTP-request, but definitely not the right fit for one-goroutine-per-item-in-a-massively-parallel-operation.

Emulation

This is one hack I’m kind of proud of. Kind of.

A theme in Advent of Code is the problem of the form: “Here is a description of a simple computer, and your input is a program for that computer. What is the output of your program?”

It’s enough to emulate the simple computer as described. But I thought it would be cool if I could use the big complex computer I’m writing this blog post on to optimise the input program in some way. It turns out we already have programs that do this: compilers. All I have to do is translate the input into some language a compiler understands (like Go), run the compiler, then run the compiled code, like a JIT compilation process.

Thus, I wrote a library that manages the translation, compilation, and running. You provide the input program and a set of translators. Each translator has to know how to turn an input instruction into a Go equivalent. Once translated, the equivalent Go program is fed through the Go compiler with the flag -buildmode=plugin. Once the plugin is built, it can be loaded back into the program with the plugin standard library package, and run directly.
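A sketch of the compile-and-load step (buildAndRun and the Run symbol are illustrative names, not my library’s actual API):

import (
    "os/exec"
    "path/filepath"
    "plugin"
)

// buildAndRun compiles the translated Go module in dir as a
// plugin, loads it, and calls its exported Run function.
func buildAndRun(dir string) error {
    cmd := exec.Command("go", "build", "-buildmode=plugin", "-o", "emu.so", ".")
    cmd.Dir = dir
    if err := cmd.Run(); err != nil {
        return err
    }
    p, err := plugin.Open(filepath.Join(dir, "emu.so"))
    if err != nil {
        return err
    }
    sym, err := p.Lookup("Run")
    if err != nil {
        return err
    }
    sym.(func())() // assumes the plugin exports func Run()
    return nil
}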

Why emu is a crime

It’s absolutely not hardened against security issues. It would be a mistake to expose this library to the internet in such a way that an attacker could supply the input - that’s literally remote code execution. Fortunately, the toy computers described in AoC problems aren’t very capable, so an attack may only end in resource consumption or denial of service.

On the host, an attacker-controlled process could probably race the loader, swapping out the freshly-compiled plugin for a plugin of its own. This would be useful for elevating from one user to another.

It’s also not very efficient. As fast as the Go compiler is, it isn’t designed for quickly compiling very small snippets of code in a JIT fashion, and there are multiple round trips to disk (writing and reading the temporary Go program, writing and reading the compiled plugin).

Finally, Go plugins are only supported on Linux, FreeBSD, and macOS. (It’s dlopen under the hood.)