theor

solutions looking for problems

Demo effects using Rust and WebAssembly


I needed a reason to play with Rust and Wasm, and I have always loved the demoscene and its cleverness. I also wanted the result to be embeddable here on my blog made with Astro. Guess who’s about to fill buffers with fire, plasma, stars and roads.

Rust, Wasm bootstrap

The Rust community has put a lot of work into making WebAssembly a functional and efficient target, and it shows. First, the rustc compiler has multiple targets for it: we’ll use the seemingly undecided wasm32-unknown-unknown target. The workflow that worked for me uses wasm-pack, which, in one command, handles the compilation using cargo, the wasm optimization/stripping using wasm-opt, and the generation of the .js/.ts glue code.

I also discovered the lovely cargo watch to monitor my Rust folder and recompile it on change, using this command:

cargo watch -s "wasm-pack build --profiling --target web -d blog-astro/src/content/blog/lorem-ipsum/_pkg --no-typescript --no-pack"

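About the arguments: --profiling builds with optimizations while keeping debug info, --target web emits an ES module that the browser can import directly, -d sets the output directory, --no-typescript skips the .d.ts declarations, and --no-pack skips the package.json and other npm packaging files.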

The resulting .js files export both an init function, which returns a promise that resolves once the actual wasm is loaded, and every symbol exported by the Rust code.

Rust exports are made using wasm-bindgen:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Plasma { /* ... */ }

#[wasm_bindgen]
impl Plasma {
    #[wasm_bindgen(constructor)]
    pub fn new(w: usize, h: usize) -> Self { /* ... */ }

    pub fn update(&mut self, t: f32) { /* ... */ }
}

On the JavaScript side, the generated module is imported like any other:

import init, { Plasma, StatefulFire, Stars, Roads2 } from "./_pkg/sample_rust";

await init();
const p = new Plasma(WIDTH, HEIGHT);
p.update(time);

I added controls using TweakPane, a great JS library for controls à la datgui/imgui. Then it’s just a matter of filling an HTML Canvas using Rust code - more on that in the Wasm interop optimization section.

Plasma effect

The plasma effect is typical of demo effects: trivial to do today with a GPU, impressive when it was written on a CPU, often without a floating point unit. Look at this Amstrad demo:

The gist of it is to compute a color for each pixel using some combination of trigonometric functions; usually, the function returns a value used as an index in a pre-computed palette. An indexed palette used to be a requirement - the above Amstrad CPC had at best a 16-color palette out of 27, in Mode 0 (160x200 pixels) before using any of the typical palette-expanding tricks of the time. However, palette indexing is still more performant today and gives other opportunities like palette shifting (more details in my esp32/demoscene article).

Let’s start with simple concentric circles and a greyscale palette. The palette is a simple lerp outputting colors as u32 values laid out as 0xAABBGGRR: the canvas ImageData stores its bytes in R, G, B, A order, which reads as ABGR in a little-endian u32:


fn col32((r, g, b): (u8, u8, u8)) -> u32 {
    255 << 24 | (b as u32) << 16 | (g as u32) << 8 | (r as u32)
}

const PALETTE_SIZE: usize = 256;
let mut palette: Vec<u32> = vec![0; PALETTE_SIZE];

for i in 0..PALETTE_SIZE {
    // color_sys crate
    let c = col32((i as u8, i as u8, i as u8));
    palette[i] = c;
}
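
To see why that layout matches what the canvas expects, here is a small self-contained check. It repeats col32 and adds a hypothetical helper, col32_bytes (not part of the original code), that exposes the bytes as they sit in little-endian memory, which is the byte order wasm32 uses:

```rust
/// Packs an (r, g, b) triple into a fully opaque u32 laid out as 0xAABBGGRR.
fn col32((r, g, b): (u8, u8, u8)) -> u32 {
    255u32 << 24 | (b as u32) << 16 | (g as u32) << 8 | (r as u32)
}

/// Hypothetical helper: the same color as the four bytes stored in
/// little-endian memory, i.e. in the R, G, B, A order ImageData reads.
fn col32_bytes(rgb: (u8, u8, u8)) -> [u8; 4] {
    col32(rgb).to_le_bytes()
}
```

Pure red, col32((255, 0, 0)), is 0xFF0000FF as a number, and col32_bytes((1, 2, 3)) is [1, 2, 3, 255] in memory.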

The concentric circles are made by feeding the distance to the center into a sine:

for y in 0..self.h {
    for x in 0..self.w {
        // position as ratios, in the [0,1] range
        let fx = x as f32 / (self.w - 1) as f32;
        let fy = y as f32 / (self.h - 1) as f32;

        // palette index
        let v = (
            // distance from the current point to the center
            vec2(fx - 0.5, fy - 0.5)
            // scaled to get multiple bands
            * 16.0
        )
        .length()
        // make it oscillate
        .sin();

        // buffer index
        let i = y * self.w + x;
        b[i] = self.palette[
            remap(
                v,
                -1.0..=1.0,
                0.0..=255.0)
            as usize % 256];
    }
}
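
The remap and lerp helpers used in these snippets are not shown in the article. A minimal sketch of what they could look like, assuming the (value, from, to) and (range, t) argument orders seen in the calls, and assuming no clamping, so that out-of-range values are left for the % 256 to wrap:

```rust
use std::ops::RangeInclusive;

/// Linear interpolation across `range`: t = 0 gives the start, t = 1 the end.
/// The range may be "reversed" (start > end), as in 203.0..=31.0.
fn lerp(range: RangeInclusive<f32>, t: f32) -> f32 {
    let (a, b) = (*range.start(), *range.end());
    a + (b - a) * t
}

/// Maps `v` from the `from` range to the `to` range, without clamping.
fn remap(v: f32, from: RangeInclusive<f32>, to: RangeInclusive<f32>) -> f32 {
    let (a, b) = (*from.start(), *from.end());
    // normalized position of v inside `from`
    let t = (v - a) / (b - a);
    lerp(to, t)
}
```

With these definitions, remap(0.0, -1.0..=1.0, 0.0..=255.0) gives 127.5, and an out-of-range input such as 3.0 maps to 510.0, which % 256 wraps to index 254.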

And that’s all. The result, with the palette drawn below:

It’s now time to animate the effect, first by shifting the palette, which is just a matter of offsetting the palette index with the time in seconds. I’ll do it before remapping the [-1,1] value to [0,255] :

remap(
    (vec2(fx - 0.5, fy - 0.5) * 16.0).length().sin()
     + time , // shift
    -1.0..=1.0,
    0.0..=255.0,
) as usize
% 256

Doing it this way means we’ll cycle around the palette, which exposes its discontinuity: the last entry (white) wraps straight back to the first one (black). Let’s make the palette loop:

for i in 0..PALETTE_SIZE {
    // ramp up over the first half of the palette, back down over the second
    let c = if i < PALETTE_SIZE / 2 {
        i as i32 * 2
    } else {
        // start the descent at 255 so the value stays in the u8 range
        PALETTE_SIZE as i32 - 1 - ((i - PALETTE_SIZE / 2) as i32 * 2)
    };
    palette[i] = col32(Rgb::from((c, c, c)).into());
}

That’s better. Time for a better palette:

for i in 0..PALETTE_SIZE {
    let f = i as f32 / PALETTE_SIZE as f32;
    let ff = if i < PALETTE_SIZE / 2 {
        f * 2.0
    } else {
        1.0 - ((f - 0.5) * 2.0)
    };

    palette[i] = col32(Rgb::from(Hsl::from((
        // lerp the hue: ff / 2.0 stays in [0, 0.5], so from blue (203) to green (117)
        lerp(203.0..=31.0, ff / 2.0),
        82.0,
        // lerp the lightness
        ((ff / 2.0 + 0.5).powi(2) * 100.0).clamp(10.0, 100.0),
    ))).into());
}

Now let’s animate the circle center too:

remap(
    (vec2(
        fx - 0.25 - time.sin() * 0.2,
        fy - 0.3 - time.cos() * 0.3
    ) * 16.0).length().sin()
    + time,
    -1.0..=1.0,
    0.0..=255.0,
) as usize
    % 256

We now have everything we need: a rendering function akin to an SDF (signed distance function), animation, and palette shifting. It just needs an interesting rendering function. I’ll use the bracket-noise crate:

let w = noise.get_noise(
    // warp the noise domain
    fx * 3.0 + (time / 3.0).sin(),
    fy + (time / 7.0).cos().powi(2));
remap(
    // feed the noise to itself
    noise.get_noise(
        fx + w + time.cos(),
        fy + w.powi(3) + (time/1.7).sin())
    // palette shifting
    + time * (fx * fy + 0.1) / 6.0,
    0.0..=0.7,
    0.0..=255.0,
) as usize
    % 256

Optionally, the palette could be stepped into non-continuous colors:

for i in 0..PALETTE_SIZE {
    let f = i as f32 / PALETTE_SIZE as f32;
    let ff = if i < PALETTE_SIZE / 2 {
        f * 2.0
    } else {
        1.0 - ((f - 0.5) * 2.0)
    };
    // stepping: quantize ff to 5 levels (0, 0.25, 0.5, 0.75, 1)
    let stepped = (ff * 4.0).round() / 4.0;
    palette[i] = col32(Rgb::from(Hsl::from((
        lerp(203.0..=31.0, stepped / 2.0),
        82.0,
        ((stepped / 2.0 + 0.5).powi(2) * 100.0).clamp(10.0, 100.0),
    ))).into());
}

The real bottleneck is the sheer number of sin/cos calls. I tried to implement a precomputed look-up table for the trigonometry, as we used to, but Rust’s own trig was still twice as fast. Maybe I’ll investigate, there’s a potential article here…
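
For reference, a nearest-entry sine table can be sketched as follows. This is an illustration, not the table tried above: LUT_SIZE, build_sin_lut and lut_sin are hypothetical names, and the table size is an arbitrary choice.

```rust
use std::f32::consts::TAU;

/// Hypothetical table size; a power of two so the index can wrap with a mask.
const LUT_SIZE: usize = 1024;

/// Precomputes one full period of sine, one entry per 1/LUT_SIZE of a turn.
fn build_sin_lut() -> Vec<f32> {
    (0..LUT_SIZE)
        .map(|i| (i as f32 / LUT_SIZE as f32 * TAU).sin())
        .collect()
}

/// Table lookup without interpolation: trades accuracy for fewer math ops.
fn lut_sin(lut: &[f32], angle: f32) -> f32 {
    // rem_euclid keeps negative angles in the [0, TAU] range
    let turns = angle.rem_euclid(TAU) / TAU;
    // truncate to a table slot; the mask wraps the rare index == LUT_SIZE case
    let i = (turns * LUT_SIZE as f32) as usize & (LUT_SIZE - 1);
    lut[i]
}
```

With 1024 entries and no interpolation, the error stays below about TAU / 1024, roughly 0.006, which is invisible once remapped to 256 palette entries.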

Fire effect

Another classic effect. Fabien Sanglard talks at length about its principle, as done in the PSX port of Doom.

At its core, it’s a cellular automaton: a grid where each cell evolves according to its neighbors. The difference with Conway’s Game of Life is that cell values aren’t just binary, but values over a range used to index a palette. Values are propagated upward, with a bit of horizontal randomization, and attenuated along the way.
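
A minimal sketch of such an automaton, assuming a heat grid whose bottom row is a constant heat source and whose width and height are at least 1. The Fire struct, its fields, the xorshift generator and the exact jitter and cooling rule are assumptions made for illustration, not the code behind the demo:

```rust
/// Hypothetical fire grid: one heat value per cell, used as a palette index.
struct Fire {
    w: usize,
    h: usize,
    cells: Vec<u8>,
    rng: u32,
}

impl Fire {
    fn new(w: usize, h: usize, max_heat: u8) -> Self {
        let mut cells = vec![0u8; w * h];
        // the bottom row is the heat source and is never overwritten
        cells[(h - 1) * w..].fill(max_heat);
        Self { w, h, cells, rng: 0x1234_5678 }
    }

    /// xorshift32: a tiny pseudo-random generator, good enough for flames.
    fn next_rand(&mut self) -> u32 {
        let mut x = self.rng;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.rng = x;
        x
    }

    /// One step: every cell pushes its heat to the row above.
    fn update(&mut self) {
        for y in 1..self.h {
            for x in 0..self.w {
                let heat = self.cells[y * self.w + x];
                let r = self.next_rand() % 3; // 0, 1 or 2
                // horizontal jitter of -1, 0 or +1, clamped to the grid
                let dst_x = (x + r as usize).saturating_sub(1).min(self.w - 1);
                // attenuation: lose one unit of heat when r is odd
                let cooled = heat.saturating_sub((r & 1) as u8);
                self.cells[(y - 1) * self.w + dst_x] = cooled;
            }
        }
    }
}
```

Each frame, update() is called once and every cell value is looked up in the palette, exactly like the plasma.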

Stars

The earliest instance I found is in Spacewar! on the PDP-1. Draw dots on the screen, and move them away from the center of the screen to simulate perspective.

Instead of simple points, I draw lines between the previous and the current position to give a feeling of speed, and make that speed change over time. The vanishing point moves over time and follows the cursor. I like the lo-fi Star Wars hyperdrive vibe.

Each star is stored in polar coordinates: an angle and a radius. This is a good opportunity for an interactive SVG visualization test:

Then it’s a matter of keeping an array of stars, incrementing the distance to the center each frame and resetting it when that distance is out of bounds.
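
A minimal sketch of that update loop, assuming hypothetical names (Star, update_stars), a constant outward speed and a respawn at the center. It returns the previous and current screen positions of each star so that a line can be drawn between them:

```rust
/// A star in polar coordinates around the vanishing point.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Star {
    angle: f32,  // radians
    radius: f32, // distance to the center, in pixels
}

impl Star {
    /// Converts the polar coordinates to a screen position.
    fn position(&self, center: (f32, f32)) -> (f32, f32) {
        (
            center.0 + self.angle.cos() * self.radius,
            center.1 + self.angle.sin() * self.radius,
        )
    }
}

/// Moves every star outward and returns one (previous, current) segment per
/// star that is still on screen.
fn update_stars(
    stars: &mut [Star],
    center: (f32, f32),
    speed: f32,
    max_radius: f32,
) -> Vec<((f32, f32), (f32, f32))> {
    let mut segments = Vec::with_capacity(stars.len());
    for star in stars.iter_mut() {
        let prev = star.position(center);
        star.radius += speed;
        if star.radius > max_radius {
            // out of bounds: respawn at the center, no segment this frame
            star.radius = 0.0;
            continue;
        }
        segments.push((prev, star.position(center)));
    }
    segments
}
```

Changing center between frames is enough to move the vanishing point, since the stars only store an angle and a radius.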

By using the previous position as the start of a line to the current position, we get a feeling of speed. Line drawing is an arcane art, and the reference is Bresenham’s algorithm. Here I used Xiaolin Wu’s line algorithm, which also does anti-aliasing by returning an alpha value for each pixel along the line.
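
The core idea of Wu’s algorithm can be sketched as below. This is a simplified variant that skips the special endpoint handling of the full algorithm; wu_line and its plot callback are hypothetical names, not the demo’s API:

```rust
/// Simplified Xiaolin Wu line: calls `plot(x, y, alpha)` for the two pixels
/// straddling the ideal line at each step along the major axis.
fn wu_line(
    mut x0: f32,
    mut y0: f32,
    mut x1: f32,
    mut y1: f32,
    mut plot: impl FnMut(i32, i32, f32),
) {
    // iterate along the major axis: swap x and y for steep lines
    let steep = (y1 - y0).abs() > (x1 - x0).abs();
    if steep {
        std::mem::swap(&mut x0, &mut y0);
        std::mem::swap(&mut x1, &mut y1);
    }
    // always walk from left to right
    if x0 > x1 {
        std::mem::swap(&mut x0, &mut x1);
        std::mem::swap(&mut y0, &mut y1);
    }
    let dx = x1 - x0;
    let gradient = if dx == 0.0 { 1.0 } else { (y1 - y0) / dx };

    let mut y = y0;
    for x in (x0.round() as i32)..=(x1.round() as i32) {
        let base = y.floor();
        // fractional part = coverage of the second pixel
        let frac = y - base;
        let (a, b) = (base as i32, base as i32 + 1);
        if steep {
            plot(a, x, 1.0 - frac);
            plot(b, x, frac);
        } else {
            plot(x, a, 1.0 - frac);
            plot(x, b, frac);
        }
        y += gradient;
    }
}
```

The callback receives an alpha in [0, 1] that the caller blends with whatever is already in the framebuffer; the two alphas emitted at each step always add up to 1.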

The only things missing are the planets (created using deepfold’s planet generator), which take a primitive sprite drawing algorithm. I implemented scaled drawing but not rotation, which would have made it a proper rotozoom.
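
Scaled sprite drawing can be sketched with nearest-neighbour sampling. The function below is an illustration under assumptions: blit_scaled and its parameters are hypothetical, scale is assumed to be strictly positive, and fully transparent source pixels are simply skipped instead of being blended:

```rust
/// Draws `sprite` (sw x sh pixels) into `dst` (dw x dh pixels) with its
/// top-left corner at (x0, y0), scaled by `scale`. Pixels are 0xAABBGGRR.
fn blit_scaled(
    dst: &mut [u32], dw: usize, dh: usize,
    sprite: &[u32], sw: usize, sh: usize,
    x0: i32, y0: i32, scale: f32,
) {
    let out_w = (sw as f32 * scale) as i32;
    let out_h = (sh as f32 * scale) as i32;
    for oy in 0..out_h {
        let y = y0 + oy;
        // clip against the destination
        if y < 0 || y >= dh as i32 {
            continue;
        }
        // source row for this output row (truncated)
        let sy = ((oy as f32 / scale) as usize).min(sh - 1);
        for ox in 0..out_w {
            let x = x0 + ox;
            if x < 0 || x >= dw as i32 {
                continue;
            }
            let sx = ((ox as f32 / scale) as usize).min(sw - 1);
            let c = sprite[sy * sw + sx];
            // alpha lives in the top byte; skip fully transparent pixels
            if c >> 24 == 0 {
                continue;
            }
            dst[y as usize * dw + x as usize] = c;
        }
    }
}
```

Iterating over destination pixels and sampling backwards into the sprite avoids holes when scaling up, which is also the approach a rotozoom would extend with a rotation of the sampling coordinates.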

The bitmap data itself is loaded asynchronously, by abusing the HTML image element, then extracting the image data and caching it:

interface Bitmap {
    w: number,
    h: number,
    data: Uint32Array,
}

const bitmapCache = new Map<string, Promise<Bitmap>>();

async function loadBitmap(name: string): Promise<Bitmap> {
    if (bitmapCache.has(name)) {
        return bitmapCache.get(name)!;
    }
    const p = new Promise<Bitmap>((resolve, reject) => {
        // <img> html element
        const image = new Image();

        // we'll need a canvas to extract the raw pixels from the image
        const canvas = document.createElement("canvas");

        image.onload = () => {
            canvas.width = image.width;
            canvas.height = image.height;

            const ctx = canvas.getContext('2d')!;
            ctx.drawImage(image, 0, 0);

            // raw u32 pixels
            const data = new Uint32Array(ctx.getImageData(0, 0, canvas.width, canvas.height).data.buffer);
            resolve({
                w: image.width,
                h: image.height,
                data: data,
            });
        };
        image.onerror = e => reject(e);
        // triggers the actual loading
        image.src = name;

    });
    bitmapCache.set(name, p);
    return p;
}

OutRun 3D roads

Credit: background from Panda, tree from the original OutRun game. Use the keyboard arrows!

OutRun made a big splash when it was released by Sega in 1986, and part of that was the 3D rendering. It’s a form of raycasting: starting from the bottom of the screen, find the road segment by projecting the screen position and draw it with the right height and horizontal offset according to the player’s position.
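
The projection at the heart of that technique can be sketched for a flat, straight road. This is a simplification with hypothetical names (scanline_depth, road_span) and an assumed projection scale focal; it ignores curves and hills and is not the code behind the demo:

```rust
/// Depth of the flat ground visible at screen row `y`, for a camera placed
/// `cam_height` above the road. `focal` is a projection scale in pixels.
/// Rows at or above the horizon show no ground, hence the Option.
fn scanline_depth(y: f32, horizon: f32, cam_height: f32, focal: f32) -> Option<f32> {
    let dy = y - horizon;
    if dy <= 0.0 {
        None
    } else {
        Some(cam_height * focal / dy)
    }
}

/// Screen-space center and half-width of the road at depth `z`, for a player
/// shifted by `player_x` from the road's centerline.
fn road_span(
    z: f32,
    screen_w: f32,
    road_half_width: f32,
    player_x: f32,
    focal: f32,
) -> (f32, f32) {
    // perspective: everything shrinks proportionally to 1 / z
    let scale = focal / z;
    let center = screen_w / 2.0 - player_x * scale;
    (center, road_half_width * scale)
}
```

Walking the rows from the bottom of the screen up to the horizon, each row gets a depth, and that depth gives both the road segment to sample and the width and offset to draw it with.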

Lou’s pseudo 3d page is a great resource to dive into that, so I won’t repeat it here.

Today’s rabbit hole: Wasm interop optimization

While the performance of Rust code compiled to Wasm was a pleasant surprise, the first issue I encountered was the cost of passing arrays from JS. My initial approach to canvas drawing was a function taking the canvas’ ImageData buffer as a &mut [u8] slice. My understanding is that wasm-bindgen copies the data into the Wasm memory and back out again, which could take several milliseconds for a 640x480 framebuffer.

I considered creating the canvas from Rust code, but adding a dependency on web-sys came with tens of kilobytes in the resulting wasm. The current .wasm file, containing all the demos, weighs 61 kB.

The best solution I found is to allocate the framebuffer from Rust as a Vec, create a wasm-bindgen exposed method returning the offset of the Vec in the wasm memory, and construct the canvas ImageData from that offset and the wasm memory buffer:

#[wasm_bindgen]
pub struct Demo {
    w: usize,
    h: usize,
    buffer: Vec<u32>,
}

#[wasm_bindgen]
impl Demo {
    #[wasm_bindgen(constructor)]
    pub fn new(w: usize, h: usize) -> Self {
        Self {
            w,
            h,
            buffer: vec![0; w*h],
        }
    }
    // even if ImageData expects a u8 buffer, we can just reinterpret-cast it from the js side
    // ugly, but it works
    #[wasm_bindgen]
    pub fn get_ptr(&self) -> *const u32 { self.buffer.as_ptr() }

    #[wasm_bindgen]
    pub fn update(&mut self) {
        self.buffer.fill(0);
    }
}

The JavaScript side then wraps that memory without copying it:

// startup
const demo = new Demo(WIDTH, HEIGHT);
const data = {
    buffer: <ImageData | undefined>undefined
};
// each frame
demo.update();

// when the wasm memory grows (e.g. after an allocation), the old ArrayBuffer
// is detached and every view on it reports a byteLength of 0
// see https://github.com/rustwasm/wasm-bindgen/issues/2439
if (!data.buffer || data.buffer.data.byteLength === 0) {
    // no allocation as we provide the pre-allocated buffer
    let raw_buffer = new Uint8ClampedArray(
        // the entire sandboxed memory of the wasm app
        // (assumption: `memory` is the wasm export returned by `await init()`)
        memory.buffer,
        // our method to get the framebuffer offset in that memory
        demo.get_ptr(),
        // 4 bytes per pixel
        WIDTH * HEIGHT * 4
    );
    data.buffer = new ImageData(raw_buffer, WIDTH, HEIGHT);
}
canvasContext.putImageData(data.buffer, 0, 0);

Conclusion

There’s much more to do (benchmark wasm-compiled trigonometry functions, compile Rust in no_std mode to reduce the binary size even more, …), but that’s all for now. Maybe I’ll write about my experiment plugging the GNU Rocket sync tracker in there, and this article is a good prelude to my fascination with fantasy consoles. The code is available here: https://github.com/theor/rust-demoeffects/tree/main/sample/sample-rust/src