The Millhouse Group Blog: performance

Showing posts with label performance. Show all posts

Saturday, 30 September 2023

Frenzy.js - the saga of flood-fill

It may surprise you to know that yes, I am back working on Frenzy.js after a multi-year hiatus. It's a bit surreal working on an "old-skool" (pre-hooks) React app but the end is actually in sight. As of September 2023 I have (by my reckoning) about 80% of the game done;

Basic geometry (it's all scaled 2x)
Levels with increasing numbers of Leptons with increasing speed
Reliable collision detection
High-score table that persists to a cookie
Mostly-reliable calculation of the area to be filled
Accurate emulation of the game state, particularly when the game "pauses"

The big-ticket items I still need to complete are:

Implement "chasers" on higher levels
Fine-tune the filled-area calculation (it still gets it wrong sometimes)
Animated flood-fill
Player-start, player-death and Lepton-death animations
(unsure) Sound.

It all comes flooding back

To remind you (if you were an 80s Acorn kid) or (far more likely) educate you on what I mean by "flood-fill", here's Frenzy doing its thing in an emulator; I've just completed drawing the long vertical line, and now the game is flood-filling the smaller area:

I wanted to replicate this distinctive style of flood-fill exactly in my browser-based version, and it's been quite the labour of love. My first attempt (that actually worked there were many iterations that did not) was so comically slow that I almost gave up on the whole idea. Then I took a concrete pill and decided that if I couldn't get a multiple-GHz multi-cored MONSTER of a machine to replicate a single-cored 2MHz (optimistically) 8-bit grot-box from the early 1980s, I may as well just give up...

The basic concept for this is:

Given a polygonal area A that needs to be flood-filled;

Determine bottom-rightmost inner point P within A.
The "frontier pixels" is now the array [P]
On each game update "tick":
  Expand each of the "frontier pixels" to the N,S,E and W; but
    Discard an expansion if it hits a boundary of A
    Also discard it if the pixel has already been filled
  The new "frontier pixels" is all the undiscarded pixels 
Stop when the "frontier pixels" array is empty

I got this to be pretty efficient using a bit-field "sparse array" to quickly check for already-filled pixels. In the browser, I could perform the per-tick operations in less than 0.1 milliseconds for any size of A. Not too surprising given the entire game area is only 240x180 pixels, and the maximum possible polygonal area could only ever be half that big: 21,600 pixels.

The problem now became efficiently shifting the big pile'o'filled-pixels from the algorithm onto the HTML5 canvas that is the main gameplay area. I'm using the excellent React Konva library as a nice abstraction over the canvas, but the principal problem is that a canvas doesn't expose per-pixel operations in its API, and nor does Konva. The Konva team has done an admirable job making their code as performant as possible, but my first cut (instantiating a pile of tiny 1x1 Rects on each tick) simply couldn't cope once the number of pixels got significant:

This has led me down a quite-interesting rabbit-hole at the intersection of HTML5 Canvas, React, React-Konva, and general "performance" stuff which is familiar-yet-different. There's an interesting benchmark set up for this, and the results are all over the shop depending on browser and platform. Mobile results are predictably terrible but I'm deliberately not targeting them. This was a game for a "desktop" (before we called them that) and it needs keyboard input. I contemplated some kind of gestural control but it's just not good enough I think, so I'd rather omit it.

What I need to do, is find a way to automagically go from a big dumb pile of individual filled pixels into a suitable collection of optimally-shaped polygons, implemented as Konva Lines.

The baseline

In code, what I first naïvely had was:

type Point = [number, number]

// get the latest flood fill result as array of points
const filledPixels:Array<Point> = toPointArray(sparseMap);

// Simplified a little - we use some extra options
// on Rect for performance...
return filledPixels.map((fp) => 
  <Rect x={fp[0]} 
        y={fp[1]}
        width={1}
        height={1}
        lineCap="square"
        fillColor="red"
  />
);

With the above code, the worst-case render time while filling the worst-case shape (a box 120x180px) was 123ms. Unacceptable. What I want is:


// Konva just wants a flat list of x1,y1,x2,y2,x3,y3
type Poly = Array<number>;

// get the latest flood fill result as array of polys
const polys:Array<number> = toOptimalPolyArray(sparseMap);

// far-fewer, much-larger polygons
return polys.map((poly) => 
  <Line points={poly} 
        lineCap="square"
        fillColor="red"
        closed
  />
);

So how the hell do I write toOptimalPolyArray()?

Optimisation step 1: RLE FTW

My Googling for "pixel-to-polygon" and "pixel vectorisation" failed me, so I just went from first principles and tried a Run-Length-Encoding on each line of the area to be filled. As a first cut, this should dramatically reduce the number of Konva objects required. Here's the worst-case render time while filling the worst-case shape (a box 120x180px): 4.4ms

Optimisation step 2: Boxy, but good

I'd consider this to be a kind of half-vectorisation. Each row of the area is optimally vectorised into a line with a start and end point. The next step would be to iterate over the lines, and simply merge lines that are "stacked" directly on top of each other. Given the nature of the shapes being filled is typically highly rectilinear, this felt like it would "win" quite often. Worst-case render time now became: 1.9ms

Optimisation step 3: Know your enemy

I felt there was still one more optimisation possible, and that is to exploit the fact that the game always picks the bottom-right-hand corner in which to start filling. Thus there is a very heavy bias towards the fill at any instant looking something like this:

----------------
|              |
|              |
|             P|
|            LL|
|           LLL|
|          LLLL|
|         LLLLL|
|        LLLLLL|
|       LLLLLLL|
|      LLLLLLLL|
|     LLLLLLLLL|
|    LLLLLLLLLL|
|   LLLLLLLLLLL|
|  LLLLLLLLLLLL|
| LLLLLLLLLLLLL|
|SSSSSSSSSSSSSS|
|SSSSSSSSSSSSSS|
|SSSSSSSSSSSSSS|
----------------

where

P is an unoptimised pixel
L is a part line, that can be fairly efficiently represented by my "half-vectorisation", and
S is an optimal block from the "stacked vectorisation" approach

You can see there are still a large number of lines (the Ls and the P) bogging down the canvas. They all share a common right-hand edge, and then form a perfect right-triangle. I started implementing this change but ended up aborting that code. Worst-case render time is already significantly below the "tick" rate, and the code was getting pretty complex. Okay, it's not optimal optimal, but it's Good Enough. Whew.

Sunday, 8 September 2019

I like to `watch`

Remember how computers were supposed to do the work, so people had time to think?

Do you ever catch yourself operating in a tight loop, something like:

  10 Edit file
  20 Invoke [some process] on file
  30 Read output of [some process]
  40 GOTO 10

Feels a bit mechanical, no? Annoying having to switch from the text editor to your terminal window, yes?

Let's break down what's actually happening in your meatspace loop*:

  10 Edit file in text editor
  15 Save file (e.g. Ctrl-S)
  17 Change active window to Terminal (e.g. Cmd/Alt+Tab)
  20 Press [up-arrow] once to recall the last command
  22 Visually confirm it's the expected command
  25 Hit [Enter] to execute the command
  30 Wait for the command to complete
  35 Interpret the output of the command
  40 Cmd/Alt+Tab back to the editor
  45 GOTO 10

That's even more blatant! Most of the work is telling the computer what to do next, even though it's exactly the same as the last iteration.
(*) Props to anyone recognising the BASIC line-numbering reference 😂

Wouldn't something like this be better?

  10 Edit file in text editor
  15 Save file (e.g. Ctrl-S)
  20 Swivel eyeballs to terminal window
  30 Wait for the command to complete
  35 Interpret the output of the command
  40 Swivel eyeballs back to editor
  45 GOTO 10

It can be better

If you've got npm on your system somewhere it's as simple as:

  $ npm install -g watch

and arranging your UI windows suitably. Having multiple monitors is awesome for this. Now by invoking:

  $ watch '[processing command with arguments]' dir1 dir2 ... dirN

you have the machine on your side. As soon as you save any file in dir1, dir2 etc, the command will be run for you. Here are some examples:

Validate a CircleCI build configuration

You're editing the circleci/config.yml of a CircleCI continuously-integrated project. These YAML files are notoriously tricky to get right (whitespace matters...🙄) - so you can get the circleci command-line tool to check your work each time you save the file:

  $ brew install circleci
  $ watch 'circleci config validate' .circleci

Validate a Terraform configuration

You're working on a Terraform infrastructure-as-code configuration. These .TF files can have complex interrelationships - so you can get the terraform command-line tool to check your work each time you save the file:

  $ brew install terraform
  $ watch 'terraform validate' .

Auto-word-count an entire directory of files

You're working on a collection of files that will eventually be collated together into a single document. There's a word-limit applicable to this end result. How about running wc to give you a word-count whenever you save any file in the working directory?:

  $ watch 'wc -w *.txt' .

Power tip

Sometimes, the command in your watch expression is so quick (and/or its output so terse), you can't tell whether you're seeing the most-recent output. One way of solving this is to prefix the time-of-day to the output - a quick swivel of the eyeballs to the system clock will confirm which execution you're looking at:

  $ watch 'echo `date '+%X'` `terraform validate`' .
  > Watching .
  13:31:59 Success! The configuration is valid. 
  13:32:23 Success! The configuration is valid.
  13:34:41 Success! The configuration is valid.

Tuesday, 29 August 2017

Don't Fetch What You Don't Need

I've been using GraphQL a bit at work recently - it's an interesting approach that seems in many ways to be the next evolution of RESTful APIs, where the client gets to choose exactly what they'd like the server to return to them.

GraphQL is a work-in-progress. The data type primitives are very limiting (how can I represent a UNIX/JavaScript timestamp with just an Int?), and always POSTing to the server seems like a backward step to the bad-old-days of SOAP. But as with all things in the JavaScript world, it's improving at a truly breakneck pace.

Something that I immediately saw as valuable was being able to save bandwidth by not including fields in the desired response - it also felt familiar, and yesterday I realised why - in MongoDB it's trivially easy to do this whenever you write a query. This excellent feature was sadly not exposed in my Mondrian library for Scala; something I've now rectified in the 0.6.x release.

Some quick tests involving documents that had large arrays of heavyweight fields showed that dropping them using a projection typically saved 50ms of latency, even on very small collections of documents. This has worked out very well for the use case of my current top-secret side project, where upon arrival at the front page, we need to quickly fetch a "summary" version of the most-recent 10 documents. The page visitor can then browse these and we can paginate for more summaries, or, if they are interested in a particular summary, we perform a findById and get the full "heavy" object.