Demo effects using Rust and WebAssembly
I needed a reason to play with rust+wasm, and I have always loved the demoscene and its cleverness. I also wanted the result to be embeddable here on my blog made with Astro. Guess who’s about to fill buffers with fire, plasma, stars and roads.
Rust, Wasm bootstrap
The rust community has spent much time making WebAssembly a functional and efficient target, and it shows. First, the rustc
compiler has multiple targets for it: we’ll use the seemingly undecided wasm32-unknown-unknown
target. The workflow that worked for me uses wasm-pack, which, in one command, handles the compilation using cargo
, the wasm optimization/stripping using wasm-opt
and generates the .js/.ts glue code.
I also discovered the lovely cargo watch to monitor my rust folder and compile in on change, using this command:
cargo watch -s "wasm-pack build --profiling --target web -d blog-astro/src/content/blog/lorem-ipsum/_pkg --no-typescript --no-pack"
About arguments:
- I use
--profiling
to get function names when profiling in Chrome. I never used--debug
; I don’t know if I got too much in the oldskool mindset, but I didn’t even searched how to attach a debugger with WASM, or if it is even possible --target web
outputs wasm ready for the browser. Didn’t get--target bundler
to work with Astro- I did write my glue code in Typescript, but the ts typings that wasm-build outputs made Astro choke on a const with no value
--no-pack
prevents the generation of apackage.json
file, which IIUC is used when publishing npm packages. It also made Astro choke
The resulting .js files export both an init
promise loading the actual wasm and every symbol exported by the rust code.
Rust exports are made using wasm-bindgen
:
#[wasm_bindgen]
pub struct Plasma { /* ... */ }
#[wasm_bindgen]
impl Plasma {
#[wasm_bindgen(constructor)]
pub fn new(w: usize, h: usize) -> Self { /* ... */ }
pub fn update(&mut self,t: f32) { /* ... */ }
}
import init, { Plasma, StatefulFire, Stars, Roads2, } from "./_pkg/sample_rust";
await init();
const p = new Plasma(WIDTH, HEIGHT);
p.update(time);
I added controls using TweakPane, a great js lib for controls à la datgui/imgui. Then it’s just a matter of filling an HTML Canvas using rust code - more on that in the Wasm interop optimization section
Plasma effect
The plasma effect is typical of demo effects: trivial to do today with a GPU, impressive when it was written on a CPU, often without a floating point unit. Look at this Amstrad demo:
The gist of it is to compute a color for each pixel using some combination of trigonometric functions; usually, the function returns a value used as an index in a pre-computed palette. An indexed palette used to be a requirement - the above Amstrad CPC had at best a 16-color palette out of 27, in Mode 0 (160x200 pixels) before using any of the typical palette-expanding tricks of the time. However, palette indexing is still more performant today and gives other opportunities like palette shifting (more details in my esp32/demoscene article).
Let’s start with simple concentric circles and a greyscale palette. The palette is a simple lerp outputting ABGR colors as u32
, as the HTML canvas expects that format:
fn col32((r,g,b):(u8,u8,u8)) -> u32 {
255 << 24 | (b as u32) << 16 | (g as u32) << 8 | (r as u32)
}
const PALETTE_SIZE: usize = 256;
let mut palette: Vec<u32> = vec![0; PALETTE_SIZE];
for i in 0..PALETTE_SIZE {
// color_sys crate
let c = col32((i as u8, i as u8, i as u8));
palette[i] = c;
}
The concentric circles are made by piping the distance to the center in a sine :
for y in 0..self.h {
for x in 0..self.w {
// position as ratios, in the [0,1] range
let fx = x as f32 / (self.w - 1) as f32;
let fy = y as f32 / (self.h - 1) as f32;
// palette index
let v = (
// distance from the current point to the center
vec2(fx - 0.5, fy - 0.5)
// scaled to get multiple bands
* 16.0
)
.length()
// make it oscillate
.sin();
// buffer index
let i = y * self.w + x;
b[i] = self.palette[
remap(
v,
-1.0..=1.0,
0.0..=255.0)
as usize % 256];
}
}
And that’s all. The result, with the palette drawn below :
It’s now time to animate the effect, first by shifting the palette, which is just a matter of offsetting the palette index with the time in seconds. I’ll do it before remapping the [-1,1] value to [0,255] :
remap(
(vec2(fx - 0.5, fy - 0.5) * 16.0).length().sin()
+ time , // shift
-1.0..=1.0,
0.0..=255.0,
) as usize
% 256
Doing it this way means we’ll cycle around the palette, which shows that the palette is not continuous. Let’s loop the palette:
for i in 0..PALETTE_SIZE {
// go from 0 to 1 to 0 over the palette range
let c = if i < PALETTE_SIZE / 2 {
i as i32 * 2
} else {
PALETTE_SIZE as i32 - ((i - PALETTE_SIZE / 2) as i32 * 2)
};
col32(Rgb::from((c, c, c)).into())
}
That’s better. Time for a better palette:
for i in 0..PALETTE_SIZE {
let f = i as f32 / PALETTE_SIZE as f32;
let ff = if i < PALETTE_SIZE / 2 {
f * 2.0
} else {
1.0 - ((f - 0.5) * 2.0)
};
col32(Rgb::from(Hsl::from((
// lerp the hue from blue to green-yellow
lerp(203.0..=31.0, ff / 2.0),
82.0,
// lerp the lightness
((ff / 2.0 + 0.5).powi(2) * 100.0).clamp(10.0, 100.0),
))).into())
}
Now let’s animate the circle center too :
remap(
(vec2(
fx - 0.25 - time.sin() * 0.2,
fy - 0.3 - time.cos() * 0.3
) * 16.0).length().sin()
+ time,
-1.0..=1.0,
0.0..=255.0,
) as usize
% 256
We now have everything we need: a rendering function akin to a SDF, animated, and palette shifting. It just needs an interesting rendering function. I’ll use the bracket-noise crate :
let w = noise.get_noise(
// warp the noise domain
fx * 3.0 + (time / 3.0).sin(),
fy + (time / 7.0).cos().powi(2));
remap(
// feed the noise to itself
noise.get_noise(
fx + w + time.cos(),
fy + w.powi(3) + (time/1.7).sin())
// palette shifting
+ time * (fx * fy + 0.1) / 6.0,
0.0..=0.7,
0.0..=255.0,
) as usize
% 256
Optionally, the palette could be stepped into non-continuous colors:
for i in 0..PALETTE_SIZE {
let f = i as f32 / PALETTE_SIZE as f32;
let ff = if i < PALETTE_SIZE / 2 {
f * 2.0
} else {
1.0 - ((f - 0.5) * 2.0)
};
// stepping :
let stepped = (ff * 4.0).round() / (4.0);
col32(Rgb::from(Hsl::from((
lerp(203.0..=31.0, stepped / 2.0),
82.0,
((stepped / 2.0 + 0.5).powi(2) * 100.0).clamp(10.0, 100.0),
))).into())
}
The real bottleneck is the massive amount of sin/cos
calls. I tried to implement a precomputed look-up table for the trigonometry as we used to but Rust’s own trig is still twice as fast. Maybe I’ll investigate, there’s a potential article here…
Fire effect
Another classic effect. Fabien Sanglard talks in length about its principle, as done in the PSX port of Doom.
At its core, it’s a cellular automaton : a grid where each cell evolves according to its neighbors. The difference with Conway’s is that cell values aren’t just binary, but values over a range used to index a palette. Values are propagated upward, with a bit of horizontal randomization, and attenuated.
Stars
The earlier instance I found is in Spacewar! on PDP-1. Draw dots on the screen, and move them away from the center of the screen to simulate perspective.
Instead of simple points, I draw lines between the previous and the current position to give a feeling of speed and make that speed change over time. The point of perspective is changed over time and when moving the cursor. I like the lo-fi Star Wars hyperdrive vibe.
Each star is stored in polar coordinates, an angle and a radius. This is a good opportunity for an interactive SVG visualization test :
Then it’s a matter of keeping an array of stars, incrementing the distance to the center each frame and resetting it when that distance is out of bounds :
By using the previous position as the start of a line to the current position, we get a feeling of speed. Line drawing is an arcane art, and the reference is the Bresenham algorithm. Here I used Xiaolin Wu’s line algorithm, which also does anti-aliasing, by returning an alpha value for each pixel along the line.
The last only things missing are the planets (created using deepfold’s planet generator), which take a primitive sprite drawing algorithm. I implemented scaled drawing but not rotation, which would have made it a proper rotozoom.
The bitmap data itself is loaded asynchronously, by abusing the HTML image element, then extracting the image data and caching it:
interface Bitmap {
w: number,
h: number,
data: Uint32Array,
}
const bitmapCache = new Map<string, Promise<Bitmap>>();
async function loadBitmap(name: string): Promise<Bitmap> {
if (bitmapCache.has(name)) {
return bitmapCache.get(name)!;
}
const p = new Promise<Bitmap>((resolve, reject) => {
// <img> html element
const image = new Image();
// we'll need a canvas to extract the raw pixels from the image
const canvas = document.createElement("canvas");
image.onload = () => {
canvas.width = image.width;
canvas.height = image.height;
const ctx = canvas.getContext('2d')!;
ctx.drawImage(image, 0, 0);
// raw u32 pixels
const data = new Uint32Array(ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height,).data.buffer);
resolve({
w: image.width,
h: image.height,
data: data,
});
};
image.onerror = e => reject(e);
// triggers the actual loading
image.src = name;
});
bitmapCache.set(name, p);
return p;
}
Outrun 3d roads
Credit: background from Panda, tree from the original Outrun game. Use the keyboard arrows!
OutRun made a big splash when it was released by Sega in 1986, and part of that was the 3D rendering. It’s a form of raycasting: starting from the bottom of the screen, find the road segment by projecting the screen position and draw it with the right height and horizontal offset according to the player’s position.
Lou’s pseudo 3d page is a great resource to dive into that, so I won’t repeat it here.
Today’s rabbit hole : Wasm interop optimization
If the performance of Rust code compiled to WASM was a good surprise, the first issue I encountered was the cost of passing arrays from JS. My initial approach to canvas drawing was a function taking the canvas’ ImageData buffer as a &mut [u8]
slice. My understanding is that wasm-bindgen
copies the data in and out of WASM, which could take several milliseconds for a 640x480 framebuffer.
I considered creating the canvas from rust code, but adding a dependency on web-sys came with tens of kilobytes in the resulting wasm. The current .wasm file, containing all the demos, is a 61kb file.
The best solution I found is to allocate the framebuffer from rust as a Vec
, create a wasm-bindgen
exposed method returning the offset of the vec in the wasm memory and construct the canvas ImageData from that offset and the wasm memory pointer:
struct Demo {
w: usize,
h: usize,
buffer: Vec<u32>,
}
impl Demo {
#[wasm_bindgen(constructor)]
pub fn new(w: usize, h: usize) -> Self {
Self {
w,
h,
buffer: vec![0; w*h],
}
}
// even if ImageData expects a u8 buffer, we can just reinterpret-cast it from the js side
// ugly, but it works
#[wasm_bindgen]
pub fn get_ptr(&self) -> *const u32 { self.buffer.as_ptr() }
#[wasm_bindgen]
pub fn update(&mut self) {
self.buffer.fill(0);
}
}
// startup
const demo = new Demo(WIDTH, HEIGHT);
const data = {
buffer: <ImageData | undefined>undefined
};
// each frame
demo.update();
// the wasm memory is reallocated/invalidated after an allocation?
// see https://github.com/rustwasm/wasm-bindgen/issues/2439
if (!data.buffer || data.buffer.data.byteLength === 0) {
// no allocation as we provide the pre-allocated buffer
let raw_buffer = new Uint8ClampedArray(
// the entire sandboxed memory of the wasm app
memory.buffer,
// our method to get the framebuffer offset in that memory
p.get_ptr(),
// 4 bytes per pixel
WIDTH * HEIGHT * 4
);
data.buffer = new ImageData(raw_buffer, WIDTH, HEIGHT);
}
canvasContext.putImageData(data.buffer, 0, 0);
Conclusion
There’s much more to do (benchmark wasm-compiled trigonometry functions, compile rust in no-std mode to reduce even more the binary size, …) but that’s all for now. Maybe I’ll write on my experiment plugging the GNU-Rocket sync tracker in there, and this article is a good prelude to my fascination with fantasy consoles, but I’ll stop here for now! The code is available here: https://github.com/theor/rust-demoeffects/tree/main/sample/sample-rust/src