In which I confess my Go crime-ing.
Advent of Code is a yearly bit of fun where, every day from 1 Dec through to 25 Dec, a small problem is posted on adventofcode.com, and participants try to solve each problem. A typical problem lends itself to a computer-programming-based solution. It’s a little bit of programming fun at the end of the year.
It would be healthy to treat it purely as a little bit of fun, but there is also a leaderboard and point-scoring. The first 100 participants to correctly solve each part of each day’s problem score some points based on how quick they were. Some real sickos manage to score lots of points! (I do not!)
Despite being in a convenient timezone, until this year I have not placed in the top 100 of any problem. Now that I have scored some points (159, to be exact), I can start giving advice.
Some general tips for trying to get on the leaderboard:
- Do all the previous problems, because there are definitely themes. For example, the first problem is usually a variation on summing a list of numbers. There’s usually a problem involving finding the best path through a maze / elevation grid. There’s usually a problem involving emulating a computer. And so on. Having encountered similar problems before is useful.
- Write a small script that downloads the input file for you. This saves about 3 seconds of time.
- Pre-generate source files, so that you aren’t wasting time writing the boilerplate (e.g. in Go, things like `package main` and `func main() {`). This saves another 4 seconds.
- Once the problems start getting harder, always test with the example input until that works.
Finally, something that has saved me a bit of time was to write a library of generic algorithms (at least, they save time when they actually work…).
Misdemeanours
The following save time when speed-coding, but have fairly well-known downsides:
- No comments on anything.
- One-letter variable names.
- No unit tests of any kind.
- Inline types together with inline funcs that would be better organised as top-level types with methods.
- Bonus points for implicitly capturing all state in the inline closures.
- Implementing each solution in its own file, but all in the one package, so they have to be built and run individually.
- More frequent use of `goto`.
- Not caring about data races - or, when data races are even considered, handling them with `sync/atomic` functions.
Must
It’s important to not ignore errors, even when speed-coding. You don’t want to fail to parse the input, and then waste time looking for bugs in the processing step.
The Go way for handling errors is to take advantage of multiple return args:
return an error, then check it in an if
block. In the overwhelming majority
of cases this is more readable and maintainable than “try/catch” because you are
dealing with the error immediately (i.e. in the context of the code with the
error, rather than many lines away), and the strategy can be tailored to each
error: self-recover, log useful messages, terminate the program, or make clear
(to future maintainers) the deliberate choice to ignore an error.
But it is an inconvenient amount of typing when speed-coding. The input file
should be there - if not, the program should panic. If the input contains lines
of integers, they should all parse correctly with `strconv.Atoi` - if not, the
program should panic. And so on.
The standard library packages `text/template` and `html/template` both have a
convenient function called `Must`. It is implemented like so:
```go
func Must(t *Template, err error) *Template {
	if err != nil {
		panic(err)
	}
	return t
}
```
The idea is that, sometimes, you want to make sure your template can parse, but
the Go compiler doesn’t parse templates - the template library does, at runtime.
So the next best thing is to terminate the program as soon as it starts. You
wrap the call to `template.Parse` in a call to `template.Must`, at init time,
and if the template doesn’t parse, you find out before `main`:
```go
var myCoolTemplate = template.Must(
	template.New("cool").Parse("{{.}} is cool!~"),
)
```
Go doesn’t have generics, so we would have to re-implement this for every single
type we want to use it on. The End.

Oh wait, Go does have generics now:
```go
func Must[T any](t T, err error) T {
	if err != nil {
		panic(err)
	}
	return t
}
```
Why `Must[T]` is a crime
If `template.Must` is part of the standard library, and is recommended for
parsing templates, why would `Must[T]` be any different? Truthfully, they’re
not. However, outside of init-time usage for exposing programmer-caused problems
as soon as possible, `Must` encourages coding in a way that, if any errors
happen at all, dumps said error in a panic. That’s fine if your user is also the
developer of the particular program, but it is an unacceptable user experience
for anyone else.
Func and games
Sometimes a more functional approach is desired. Consider some operation that is easy to write as a loop:
```go
sum := 0
for _, value := range numbers {
	sum += value
}
fmt.Println(sum)
```
How primitive and basic! I should be coding more sophisticatedly, with some
functional programming. Clearly what is needed is some `Sum`:
```go
import "golang.org/x/exp/constraints"

// Addable is the set of types that support the + operator.
type Addable interface {
	constraints.Integer | constraints.Float |
		constraints.Complex | ~string
}

func Sum[S ~[]E, E Addable](in S) E {
	var accum E
	for _, x := range in {
		accum += x
	}
	return accum
}
```
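The point of the `S ~[]E` parameter is that `Sum` accepts not only `[]int` but also named slice types, with no conversion at the call site. A quick self-contained demo (with a stripped-down `Addable` here, to avoid the `x/exp` dependency):

```go
package main

import "fmt"

// Addable is a stripped-down stand-in for the
// constraints-based version above.
type Addable interface {
	~int | ~float64 | ~string
}

func Sum[S ~[]E, E Addable](in S) E {
	var accum E
	for _, x := range in {
		accum += x
	}
	return accum
}

// Calories is a named slice type; S ~[]E lets Sum accept it.
type Calories []int

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))     // 6
	fmt.Println(Sum(Calories{100, 200})) // 300
	fmt.Println(Sum([]string{"a", "b"})) // ab
}
```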
Or what about the act of converting decimal strings into ints? What is this rudimentary loop nonsense? Don’t you have a PhD?
```go
nums := make([]int, 0, len(input))
for _, line := range input {
	nums = append(nums, Must(strconv.Atoi(line)))
}
```
That is just `Map`! You want to call my buddy `Map` here!
```go
// Map applies f to each element of in, returning a slice of the results.
func Map[S ~[]X, X, Y any](in S, f func(X) Y) []Y {
	out := make([]Y, len(in))
	for i, x := range in {
		out[i] = f(x)
	}
	return out
}

// ...

nums := Map(input, func(line string) int {
	return Must(strconv.Atoi(line))
})
```
Heck - `Must` inside `Map` must be pretty common, so let’s have a `MustMap` that
does both:
```go
// MustMap applies f to each element of in, panicking on the first error.
func MustMap[S ~[]X, X, Y any](in S, f func(X) (Y, error)) []Y {
	out := make([]Y, len(in))
	for i, x := range in {
		y, err := f(x)
		if err != nil {
			panic(fmt.Sprintf("at index %d: %v", i, err))
		}
		out[i] = y
	}
	return out
}

// ...

nums := MustMap(input, strconv.Atoi)
```
Why this is a crime
Functional style is nice in straightforward cases, and saves a lot of typing. But loops are great! With a loop you can see that the code is about to embark on a linear pass through something. You can set up state before the loop exactly as you need, skip iterations as needed, and break out of it as needed. So… I think that loops are both more readable (you can see what it is doing) and maintainable (making it do something different later on is a smaller delta).
But also… how about those generic type parameters? `Sum` and `Map` take slices
of some element type `E`, but also types whose underlying type is `[]E`, hence
the additional `S ~[]E`. And then there's the `E` itself. `Map` is even worse!
It’s a jungle of capital letters!
Sets
Some people come to Go, see there is a map
, and expect there to be a set
,
and then get upset when they see there is no set
, and the standard
advice is to just use map
.
Which is fine advice, when you consider maintainability. What might begin as a
set often later needs extra values associated with each element - so you may as
well use a `map` to begin with, and require less code change later on.
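For reference, the standard advice looks like this in practice - a `map[T]struct{}` pressed into set duty:

```go
package main

import "fmt"

func main() {
	// Collect the distinct items, map-as-set style.
	seen := make(map[string]struct{})
	for _, s := range []string{"a", "b", "a"} {
		seen[s] = struct{}{}
	}
	// The two-value lookup doubles as the membership test.
	_, ok := seen["a"]
	fmt.Println(len(seen), ok) // 2 true
}
```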
Speed coding is, again, different. There are approaches where what is needed is definitely a set. Sometimes the most natural way to think about the solution algorithm is in terms of set operations like intersection, such as in the rucksack reorganisation problem.
Sadly, there is no way to make a generic set type. Shut up, yes there is:
```go
// Set is a generic set type based on map.
type Set[T comparable] map[T]struct{}

// Insert inserts x into the set. If s == nil, Insert
// returns a new set containing x.
func (s Set[T]) Insert(x ...T) Set[T] {
	if s == nil {
		s = make(Set[T])
	}
	for _, x := range x {
		s[x] = struct{}{}
	}
	return s
}

// make(Set[T]), delete(s, x), and len(s) are so
// simple that I'm not making methods.

// Contains reports whether s contains x.
func (s Set[T]) Contains(x T) bool {
	_, c := s[x]
	return c
}

// Intersection returns a new set with elements
// common to both sets s and t.
func (s Set[T]) Intersection(t Set[T]) Set[T] {
	// (Fewer lookups in large map) is faster
	// than (more lookups in small map)?
	// I wrote a benchmark - it seems to be true.
	if len(s) > len(t) {
		s, t = t, s
	}
	u := make(Set[T])
	if len(s) == 0 || len(t) == 0 {
		return u
	}
	for x := range s {
		if t.Contains(x) {
			u.Insert(x)
		}
	}
	return u
}
```
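With `Intersection` in hand, the rucksack problem's "find the common item" step reads naturally. A self-contained sketch (with a trimmed-down copy of `Set`, and a `RuneSet`-style helper for building a `Set[rune]` from a string):

```go
package main

import "fmt"

// Set is a generic set type based on map (trimmed down).
type Set[T comparable] map[T]struct{}

// Insert inserts the given values into the set.
func (s Set[T]) Insert(x ...T) Set[T] {
	if s == nil {
		s = make(Set[T])
	}
	for _, v := range x {
		s[v] = struct{}{}
	}
	return s
}

// Intersection returns a new set with elements common to s and t.
func (s Set[T]) Intersection(t Set[T]) Set[T] {
	u := make(Set[T])
	for x := range s {
		if _, ok := t[x]; ok {
			u.Insert(x)
		}
	}
	return u
}

// RuneSet builds a Set of the runes in a string.
func RuneSet(s string) Set[rune] {
	set := make(Set[rune])
	for _, r := range s {
		set.Insert(r)
	}
	return set
}

func main() {
	// The two compartments of the rucksack share exactly
	// one item type: 'p'.
	common := RuneSet("vJrwpWtwJgWr").Intersection(RuneSet("hcsFMMfFFhFp"))
	for r := range common {
		fmt.Printf("%c\n", r) // p
	}
}
```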
Why `Set[T]` is a crime
The main pieces of `Set` are just very thin wrappers around existing syntax for
working with maps. `Insert` is really just `s[x] = struct{}{}`. `Contains` is
really just `_, c := s[x]; return c`. Do these save any typing? Barely - and
the map syntax would be shorter if, instead of `map[T]struct{}` being the
underlying type, I switched to `map[T]bool` (and only ever stored `true`). So
all `Set` is really doing is obscuring how it works in the name of abstraction.

Wrapping `make(Set[T])`, `delete(s, x)`, and `len(s)` in methods would have
been too much even for me.
Methods like `Union`, `Intersection`, `Keep`, `Disjoint`, `SubsetOf`, and
`SymmetricDifference` save time when the set abstraction is the most useful way
to think about the problem. But together with quick conversion functions like
`ToSlice`, `SetFromSlice`, `RuneSet`, and `ByteSet`, they may encourage writing
code in a way that hides quadratic time or memory complexity.
Tricks with image
The `image` package exists mainly for operating on images. But it contains two
convenient types: `image.Point` and `image.Rectangle`.

A `Point` is a `struct{ X, Y int }` with some methods. A `Rectangle` is a
`struct{ Min, Max Point }`.
`Point` has some basic vector arithmetic methods. `Point`s can also be asked
whether they are inside a `Rectangle`. This is very convenient for avoiding
writing manual bounds checks: I often create a `Rectangle` value called
`bounds`, and do the bounds check for a `Point` `p` as `p.In(bounds)`. This
beats hand-writing four different inequalities that tend to be prone to
off-by-one errors.

`Rectangle`s can be unioned and intersected, can provide an inset subrectangle,
and so on. Very convenient.
Because `Point` and `Rectangle` are comparable structs, they work well as keys
for `map` and `Set`. This trick is useful for problems involving sparse
arrangements of points (e.g. unbounded Conway’s Game of Life).
I also save some time by having pre-written slices of neighboring points: for
example, `Neigh4 = []image.Point{{0, -1}, {1, 0}, {0, 1}, {-1, 0}}`.
Considering the four neighbors of a point `p` can then be done as a loop:
```go
for _, d := range Neigh4 {
	q := p.Add(d)
	if !q.In(bounds) {
		continue
	}
	// do something with q
}
```
Why this is a crime
Your Honour, I argue that it is not. `image.Point` and `image.Rectangle` are
being used exactly as designed… just not on “images” of the kind the library
was intended for. For maintainability it might be better to introduce named
types applicable to the problem at hand.
Some things I might add to the `image` library to make it more convenient would
be methods on `image.Point` that compute common norms (vector length in, e.g.,
the Manhattan metric), and the ability to grow an `image.Rectangle` to cover a
given `Point`.
Also, I found myself reaching for things during Advent of Code problems that
are moral equivalents of `image.Point` and `image.Rectangle` in dimensions
other than 2. An integer range type would be useful for the camp cleanup
problem, and a three-dimensional axis-aligned box type would be useful for
doing bounds checks, e.g. for the lava droplet problem.
It would be nice if one could genericise types by an integer parameter…
Grids
So often in Advent of Code, the input consists of some grid of digits or characters. The hill-climbing problem is a nice example.
Go has arrays (fixed size, size is part of the type) and slices (based on arrays but there is support for dynamically growing them). Both of them are one-dimensional. Sure, you can make a “two-dimensional slice” which is really a slice of slices:
```go
// h and w contain the grid height and width, respectively.
grid := make([][]byte, h)
for i := range grid {
	grid[i] = make([]byte, w)
}
```
But rotations, transpositions, and resizing start to get fiddly - it’s easy to put the wrong loop variable in the wrong index, or get a subtraction the wrong way around, and once again you are debugging a part of the program that could have been a well-tested library implementation.
So I wrote a somewhat tested library. There are two kinds of grid: `Dense[T]`
uses the slice-of-slices approach, and `Sparse[T]` is a `map[image.Point]T`
under the hood.
Why `grid` is a crime
The main problem with `grid` is that it’s probably inefficient compared with
highly-optimised number-crunching libraries. But it’s mostly OK? It has tests?
The API isn’t horrible?
Pub-sub
If only we were working in a language that made concurrent programs easy to
write… hang on, I am!
What about a bit of concurrency maximalism? Wouldn’t it be nice to make all those CPU cores work for once?
Consider the monkeys-shouting-numbers problem, or little Bobby Tables’ electronics kit. We could solve these by evaluating every monkey / logic gate serially, until no values change. Or we could be slightly more clever: maintain a queue of monkeys / gates whose values have updated, and re-evaluate only those that depend on them.
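The queue-based approach is a classic worklist algorithm. A sketch for the gates case, with a made-up two-gate circuit (the representation here is hypothetical, not from my solutions):

```go
package main

import "fmt"

// gate computes one wire from two named input wires.
type gate struct {
	a, b string
	op   func(int, int) int
}

// settle re-evaluates gates from a worklist until no output
// changes, then returns the final wire values.
func settle(gates map[string]gate, values map[string]int) map[string]int {
	// deps[w] lists the gates that read wire w.
	deps := map[string][]string{}
	for name, g := range gates {
		deps[g.a] = append(deps[g.a], name)
		deps[g.b] = append(deps[g.b], name)
	}
	// Seed the worklist with everything downstream of a known wire.
	var queue []string
	for w := range values {
		queue = append(queue, deps[w]...)
	}
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		g := gates[name]
		// Unknown inputs read as zero; the gate is re-queued
		// once the upstream value actually arrives.
		v := g.op(values[g.a], values[g.b])
		if old, ok := values[name]; ok && old == v {
			continue // no change: nothing to propagate
		}
		values[name] = v
		queue = append(queue, deps[name]...)
	}
	return values
}

func main() {
	gates := map[string]gate{
		"c": {a: "a", b: "b", op: func(x, y int) int { return x + y }},
		"d": {a: "c", b: "b", op: func(x, y int) int { return x * y }},
	}
	values := settle(gates, map[string]int{"a": 1, "b": 2})
	fmt.Println(values["d"]) // (1+2)*2 = 6
}
```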
Or we could… make each monkey operation / logic gate its own goroutine, and use a pub-sub topic for each monkey / wire. As each operation is evaluated, it announces its new value to any interested parties via its topic.
```go
// Topic is a pub-sub topic.
type Topic[T any] struct {
	mu     sync.Mutex
	subs   []chan T
	closed bool
}

// Pub publishes the value v onto the topic.
func (q *Topic[T]) Pub(v T) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		panic("topic is closed")
	}
	for _, c := range q.subs {
		c <- v
	}
}

// Sub subscribes to messages on the topic.
func (q *Topic[T]) Sub() <-chan T {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		panic("topic is closed")
	}
	ch := make(chan T, 1)
	q.subs = append(q.subs, ch)
	return ch
}

// Close closes the topic.
func (q *Topic[T]) Close() {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, c := range q.subs {
		close(c)
	}
	q.subs = nil
	q.closed = true
}
```
```go
// ...

func monkey(xch, ych <-chan int, out *Topic[int], op func(int, int) int) {
	// Wait until each side has a value.
	lx, ly := <-xch, <-ych
	lo := op(lx, ly)
	out.Pub(lo)
	for {
		// Each time either side receives a value,
		// re-evaluate, and publish any changes.
		// Replace each input channel with nil once it is
		// closed, to avoid reading zeroes over and over.
		select {
		case x, open := <-xch:
			if open {
				lx = x
			} else {
				xch = nil
			}
		case y, open := <-ych:
			if open {
				ly = y
			} else {
				ych = nil
			}
		}
		if xch == nil && ych == nil {
			break
		}
		if o := op(lx, ly); o != lo {
			out.Pub(o)
			lo = o
		}
	}
	out.Close()
}
```
```go
// ...

// Track each monkey's output topic.
topics := make(map[string]*Topic[int])
topic := func(label string) *Topic[int] {
	t, ok := topics[label]
	if !ok {
		t = new(Topic[int])
		topics[label] = t
	}
	return t
}
for _, line := range input {
	// ... handle monkeys with a single number ...
	// ... parse line into "label: left op right" ...
	out, l, r := topic(label), topic(left), topic(right)
	go monkey(l.Sub(), r.Sub(), out, op)
}
fmt.Println(<-topic("root").Sub())
```
Why `Topic` is a crime
`Topic` itself is not a crime. It’s pretty much exactly how you have to
implement a concurrent single-publisher multiple-subscriber topic.
The real crime is to make the Go runtime do all the work. Using goroutines for literally everything started to fall out of fashion around Go 1.3, and for good reason: goroutines are actually not as lightweight as you want them to be. They’re still good enough for one-goroutine-per-incoming-HTTP-request, but definitely not the right fit for one-goroutine-per-item-in-a-massively-parallel-operation.
Emulation
This is one hack I’m kind of proud of. Kind of.
A theme in Advent of Code is the problem of the form: “Here is a description of a simple computer, and your input is a program for that computer. What is the output of your program?”
It’s enough to emulate the simple computer as described. But I thought it would be cool if I could use the big complex computer I’m writing this blog post on to optimise the input program in some way. It turns out we already have programs that do this: compilers. All I have to do is translate the input into some language a compiler understands (like Go), run the compiler, then run the compiled code, like a JIT compilation process.
Thus, I wrote a library that manages the translation, compilation, and running.
You provide the input program and a set of translators. Each translator has to
know how to turn an input instruction into a Go equivalent. Once translated,
the equivalent Go program is fed through the Go compiler with the flag
`-buildmode=plugin`. Once the plugin is built, it can be loaded back into the
running program with the `plugin` standard library package, and run directly.
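To give the flavour of the translation step: a sketch for a made-up two-instruction toy ISA (the library's real translators, and the plugin build-and-load plumbing, are more involved):

```go
package main

import (
	"fmt"
	"strings"
)

// translate turns one toy instruction into a line of Go.
// The ISA here is invented for illustration: "add r n" adds n
// to register r, and "jnz r i" jumps to instruction i if
// register r is non-zero.
func translate(instr string) string {
	f := strings.Fields(instr)
	switch f[0] {
	case "add":
		return fmt.Sprintf("%s += %s", f[1], f[2])
	case "jnz":
		// Toy jumps become gotos to per-instruction labels -
		// finally, a legitimate use for goto.
		return fmt.Sprintf("if %s != 0 { goto i%s }", f[1], f[2])
	default:
		panic("unknown instruction: " + instr)
	}
}

func main() {
	// Label each instruction so jumps have a target.
	for i, instr := range []string{"add x 3", "jnz x 0"} {
		fmt.Printf("i%d: %s\n", i, translate(instr))
	}
	// The generated lines are then wrapped in a package and a
	// function, built with `go build -buildmode=plugin`, and
	// loaded back with plugin.Open and Lookup.
}
```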
Why `emu` is a crime
It’s absolutely not hardened against any security issues. It would be a mistake to expose this library to the internet in such a way that an attacker could supply input - that’s literally remote code execution. Fortunately, the toy computers described in AoC problems aren’t very capable, so an attack would probably only end in resource consumption or denial of service.
On the host, an attacker-controlled process could probably race the loader, swapping out the freshly compiled plugin for a plugin of its own. This would be useful for elevating from one user to another.
It’s also not very efficient. As fast as the Go compiler is, it is not designed for quickly compiling very small snippets of code in a JIT fashion, and there are multiple round-trips to disk (writing and reading the temporary Go program, writing and reading the compiled plugin).
Finally, Go plugins are only supported on Linux, FreeBSD, and macOS. (It’s
`dlopen` under the hood.)