A minimalist, zero-boilerplate WebGPU framework designed for rapid prototyping of compute-driven visuals, simulations, and multi-pass post-process effects.
TinyShade simplifies the complex WebGPU binding model into a chainable API. It handles Ping-Ponging (feedback textures), Dynamic Compute Dispatching, and Intelligent Dependency Management automatically.
πThink of TinyShade as the WebGPU evolution of Shadertoyβbuilt for those who need more than just a fragment shader, but less than a 600KB engine."
TinyShade is engineered for the 64k and 4k intro philosophy. It provides a full-scale WebGPU orchestration layer with a footprint smaller than a typical favicon.
| Component | Size (Min/Gzip) | Strategy |
|---|---|---|
| TinyShade Core | ~9 KB | Tree-shaken dev-time graph manager |
| TinyShade Runner | ~4.5 KB | Ultra-lean binary replay engine |
| Bake Output | Compressed | Frozen, minified, and packed final artifact |
# Clone the repository
git clone https://github.com/MagnusThor/TinyShade.git
# Install dependencies
npm install
# Start the development server
npm run start
TinyShade uses an intelligent chaining API. This example demonstrates a branched DAG: running two independent simulations and combining them in a final fragment pass.
import { TinyShade } from "./TinyShade";
const start = async () => {
// 1. Initialize the WebGPU context
const app = await TinyShade.create("canvas");
(await app
.setUniforms()
// 2. Implicit Dependency: Since 'deps' is omitted,
// this pass is added to the linear stack by default.
.addCompute("physics", physicsWGSL)
// 3. Explicit Empty Dependency: By passing [], we tell TinyShade
// this pass depends on NOTHING. This allows the GPU to
// potentially run "fluid" and "physics" in parallel.
.addCompute("fluid", fluidWGSL, 0, [])
// 4. Multi-Dependency Main: The final fragment pass
// explicitly requests the output textures from both prior passes.
.main(mainWGSL, ["physics", "fluid"])
).run(); // Start the render loop
};
start();
In TinyShade, you only write deps when you want to be specific.
Omit the deps argument: The pass is "greedy"βit automatically binds every prior pass in the chain. This is the fastest way to build a post-processing stack.
Pass an empty array []: The pass is "isolated"βit binds nothing but the global uniforms. This is the best way to optimize performance for independent compute tasks.
Pass a specific array ["a", "b"]: The pass is "surgical"βit binds only the named resources you requested.
TinyShade treats your pipeline as a Directed Acyclic Graph (DAG) using a Convention over Configuration API.
If no dependency array is provided, a pass automatically sees all prior passes.
deps)TinyShade simplifies the complex WebGPU binding model through a "Convention over Configuration" approach. You only define dependencies when you need to optimize.
deps argument (Linear Default)The pass is "greedy." It automatically binds every prior pass in the chain. This is the fastest way to build a classic post-processing stack where each layer builds on the last.
[] (Isolated)The pass is "isolated." It binds nothing but the global uniforms. Use this for independent tasks (like a noise generator or an independent physics sim) to maximize GPU occupancy.
["a", "b"] (Surgical)TinyShade generates a custom BindGroup containing only the requested resources. This keeps your shader code clean and reduces the performance overhead of binding unused textures.
| Type | Name | Source | Description |
|---|---|---|---|
vec3f |
u.resolution |
Internal | Width, height, DPR |
f32 |
u.time |
Internal | Global clock (seconds) |
f32 |
u.sceneId |
Sequencer | Active timeline segment |
f32 |
u.progress |
Sequencer | 0β1 progress in current scene |
f32 |
u.flags |
Sequencer | Bitmask for event triggers |
texture_2d |
<name> |
addCompute | Output of named compute pass |
texture_2d |
prev_<name> |
addPass | Previous-frame feedback |
sampler |
samp |
Internal | Global linear sampler |
TinyShade is built around a named execution model that balances simplicity with surgical control:
Data flows through a Directed Acyclic Graph (DAG) β where every pass can access the present and the past of its ancestors.
Every frame, TinyShade executes your pipeline in the order it was defined. By naming your passes, your shaders reference data via semantic names rather than opaque binding indices:
Greedy Access (Default): If you omit deps, a pass can see the output of every stage before it automatically.
Surgical Access (deps): If you specify deps, the pass only binds the specific resources requested, reducing GPU overhead and enabling internal parallel optimizations.
Uniforms
β
"sim_data" (Compute)
β
"post_fx" (Fragment) β sees ["sim_data"]
β
main() β sees ["post_fx", "sim_data"] β Canvas
Temporal logic is a first-class citizen in TinyShade. Every fragment pass is automatically double-buffered. If you name a pass "fx", TinyShade manages two textures behind the scenes:
fx: The current texture being written (available to subsequent passes).
prev_fx: The texture as it looked in the previous frame (available to the current pass).
This makes complex effects like motion blur, temporal accumulation, and cellular automata effortless to implement.
When using addAtomicCompute, remember that the storage buffer persists. A common pattern is to have one pass "clear" the buffer (or use the u.frame index to offset) and a second pass "splat" data into it.
TypeScript
// Clear/Reset pass (optional depending on shader logic)
app.addCompute("clear", clearWGSL, 1, []);
// Splatting pass
app.addAtomicCompute("particles", splatWGSL, COUNT, ["clear"]);
TinyShade's API is designed to be declarative yet chainable. It eliminates the "Boilerplate Tax" of WebGPU by inferring BindGroups and Pipeline layouts from your graph structure.
Initialize the WebGPU context and attach it to a canvas. This setup is asynchronous and handles adapter/device discovery internally.
const app = await TinyShade.create("canvas-id");
Registers a GPU compute task. TinyShade automatically handles the creation of the output storage texture and calculates the dispatch workgroups based on your canvas size.
app.addCompute("particles", particlesWGSL, 1024, ["background_sim"]);
);
A specialized compute pass that pre-configures a storage buffer with atomic<u32> support. Essential for "Many-to-One" operations like splatting particles onto a grid or building heatmaps.
app.addAtomicCompute("heatmap", heatmapWGSL, PIXEL_COUNT, ["physics"]);
Registers a full-screen render pass. If a pass depends on itself (e.g., ["blur"]), TinyShade automatically creates a Ping-Pong feedback loop, allowing you to sample prev_blur in your WGSL.
app.addPass("blur", blurWGSL, ["simulation"]);
TinyShade uses a Plug-and-Tick architecture. Animation logic, timelines, and audio sync live in optional plugins.
const seq = new TSSequencer(timeline, TOTAL_LENGTH_MS, 120, 4);
await app.addSequencer(seq).run();
export interface IBakePlugin {
code: string;
activator: any[];
data: any;
}
TSSequencer is built for synchronization. By integrating the sequencer, the framework drives your entire GPU pipeline through a unified uniform contract. Transition between scenes, modulate effects via u.progress, and trigger visual events using bitwise u.flagsβall perfectly locked to the timeline.
const app = await TinyShade.create("canvas");
const L = 170000;
const seq = new TSSequencer([], L, 120, 4);
const SS = [
[seq.getUnitsFromMs(1800, L), 0x4000, 1],
[seq.getUnitsFromBars(4, L), 0x4001, 2],
[seq.getUnitsFromMs(5000, L), 0x0000, 3],
[255, 0x0000, 0]
];
seq.timeline = SS;
(await app
.addSequencer(seq)
.setUniforms().
//....
and within the main pass as an example you do
@fragment fn main(in: VSOut) -> @location(0) vec4f {
let noise = textureSample(noise_source, samp, in.uv).r;
let color = textureSample(color_mask, samp, in.uv).rgb;
var finalColor = vec3f(0.0);
var sId:f32 = u.sceneId;
if (sId == 1.0) {
finalColor = vec3f(noise);
} else if (sId == 2.0) {
finalColor = color * noise;
} else if (sId == 3.0) {
finalColor = mix(color * noise, 1.0 - (color * noise), u.progress);
} else {
finalColor = vec3f(1.); // Idle state
}
return vec4f(finalColor, 1.0);
}
TinyShade supports sample-accurate timing. By implementing IAudioPlugin, an engine (like GPUSynth) can drive the u.time uniform.
app.addAudio(mySynth) // u.time is now driven by the audio clock
.run();
My dear friend PCrush (Peter C) and co-developer of GPUSynth created several example tracks for GPUSynth. These can be found in the following folder:
src/example/music/PCrushSongs/
Note: PCrush forked my old repository https://github.com/MagnusThor/demolishedAudio and significantly improved it, modernizing the codebase and adapting it toward a more WebGPU / WGSLβstyle architecture.
You can try the live examples here:
TinyShade Examples
β οΈ Note: TinyShade is under active development. APIs, visuals, and performance characteristics may change.
The source code for each example can be found in:
Each example is self-contained and intended to demonstrate a specific feature or rendering technique.
Bake freezes your live dependency graph into a portable, replay-only artifact.
"While standard WebGPU wrappers start at 100 KB+, TinyShade delivers a complete Compute/Fragment pipeline at ~4% of the size."
.html output| Stage | Raw Dev State | Baked & Packed |
|---|---|---|
| Framework/Runner | ~15 KB | 4.5 KB |
| Shader Library | 128 KB | ~15 KB |
| Total | 143 KB | < 20 KB |
Once your TinyShade scene/app is complete, baking it into a distributable artifact is a single call.
import { TinyShadeBake } from "./TinyShadeBake";
import RunnerSource from "../TinyShaderRunner.ts?raw";
const minifiedRunnerCode = await minifyJS(RunnerSource);
await TinyShadeBake.downloadSelfContained(app, "demo.html", minifiedRunnerCode.code!);
This produces:
.html fileThe runner source is embedded directly into the baked payload and instantiated at runtime.
This allows full control over execution while keeping the original scene untouched.
| Command | Action | Description |
|---|---|---|
npm run build |
tsc |
Compile TypeScript sources using tsconfig.json. |
npm run prepublishOnly |
build hook | Ensures compiled output before publishing. |
npm run start |
dev server | Launches webpack dev server with live reload. |
npm run start-prod |
prod server | Dev server simulating production build. |
npm run build-examples |
bundle | Builds example projects to static output. |
npm run wgsl:minify |
custom script | Minifies WGSL shaders for size and packing. |
The wgsl:minify script is a specialized build-step utility designed for WebGPU workflows.
Recursive Processing: It scans the src/ directory for any .wgsl files, including those nested deep in subfolders.
Clean & Compress: It strips out single-line (//) and multi-line (/* */) comments and collapses redundant whitespace into a single space.
Artifact Creation: For every source.wgsl, it generates a source.min.wgsl.
Purpose: This reduces the final "Bake" payload size (essential for the PNG-encoded self-contained demos) and provides a basic layer of code obfuscation for shared shaders.
Atomic Pass Orchestration: TinyShade treats the GPU as a sequential state machine. It handles the heavy lifting of CommandEncoder management and Compute-to-Render synchronization, ensuring zero-latency data handover between simulation and visualization stages.
Recursive Temporal Buffer Management: Implements a sophisticated "Ping-Pong" texture strategy. By maintaining dual-buffer states for every fragment pass, the engine enables $O(1)$ access to historical frame data (prev_name), turning linear shaders into recursive feedback systems.
Adaptive Dispatch Heuristics: Rather than using naive thread counts, the engine queries hardware limits to calculate the Optimal Workgroup Topology. It aligns dispatch grids with the GPU's internal SIMD width, maximizing occupancy and throughput across varying architectures.
Sample-Locked Synchronization: By hijacking the uniform update loop with IAudioPlugin, the engine achieves sample-accurate phase alignment between visuals and audio. This eliminates the "clock drift" common in requestAnimationFrame and ensures every pixel update is chronologically locked to the audio sample clock.
Procedural Geometry Injection: Utilizes a "Vertex-less" rendering technique. By generating Clip Space coordinates directly from @builtin(vertex_index), it bypasses the entire Input Assembler stage, reducing memory bandwidth overhead and eliminating the need for CPU-side vertex buffers.
Magnus Thor