Deferred vs forward rendering

# ==Deferred vs forward==

<p class="doc-sub">// status: seedling</p>

"Where does the shading happen?" sounds like a boring question. It isn't. It decides your lighting cost model, your MSAA story, your material flexibility, your transparency strategy, and a pile of other downstream choices.

# Forward rendering

Classic: each object is rasterized, and for every fragment of every object you iterate all the lights that affect it and shade.

Pros:

- Simple.
- MSAA works naturally.
- Transparency works naturally.
- Arbitrary per-material shading.

Cons:

- Cost is `O(pixels × lights)` — naive many-light scenes collapse.
- Overdraw penalises you _shaded_: you pay the full BRDF for every fragment that is later overwritten by something nearer.
- A depth pre-pass removes the overdraw cost, but the per-pixel light loop still doesn't scale to hundreds of lights.

# Deferred rendering

Split the pipeline:

1. ==G-buffer pass== — rasterize all opaque geometry, write material attributes (albedo, normal, roughness, metallic, depth…) to multiple render targets.
2. ==Lighting pass== — for each light, draw its volume; for each covered pixel, read the G-buffer, compute shading.

Pros:

- Shading cost decouples from overdraw. Only visible surfaces are shaded, once per light that covers the pixel.
- Many lights are cheap: each light only costs the pixels its volume covers.
- Post-processing passes (SSAO, SSR, fog) have everything they need.

Cons:

- Fat G-buffer = memory bandwidth. Usually the single biggest cost on modern hardware.
- MSAA is miserable — you'd have to store and shade per-sample, which is expensive enough that almost everyone uses TAA instead.
- Transparency doesn't fit; you need a separate forward pass on top.
- Material variety is constrained — everything must fit in the G-buffer channels.

# Tiled and clustered forward (Forward+)

The modern "best of both" approach:

1. Depth pre-pass.
2. Bin the view into screen-space tiles (e.g. 16×16 pixels) — or 3D clusters that also slice depth.
3. For each tile/cluster, determine which lights affect it via a compute shader.
4. Forward-shade normally, but each pixel only iterates the lights listed for its tile/cluster.
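
The binning in steps 2 to 4 can be sketched on the CPU as below. This is a minimal illustration, not how an engine does it: real implementations run in a compute shader and also cull against the per-tile depth range from the pre-pass. The `ScreenLight` type and the `bin_lights` and `lights_for_pixel` helpers are hypothetical names, and the sketch assumes each light has already been projected to a screen-space bounding circle.

```python
from dataclasses import dataclass

TILE_SIZE = 16  # pixels per tile edge, matching the 16x16 example


@dataclass
class ScreenLight:
    """Hypothetical light, already projected to a screen-space circle."""
    x: float       # centre in pixels
    y: float
    radius: float  # bounding-circle radius in pixels


def bin_lights(lights, width, height, tile_size=TILE_SIZE):
    """Return (per-tile light index lists, number of tiles per row).

    Conservative test: a light is added to every tile touched by the
    bounding square of its circle. Depth is ignored in this sketch.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    tile_lights = [[] for _ in range(tiles_x * tiles_y)]

    for index, light in enumerate(lights):
        # Tile range covered by the light, clamped to the screen.
        min_tx = max(int((light.x - light.radius) // tile_size), 0)
        max_tx = min(int((light.x + light.radius) // tile_size), tiles_x - 1)
        min_ty = max(int((light.y - light.radius) // tile_size), 0)
        max_ty = min(int((light.y + light.radius) // tile_size), tiles_y - 1)
        # Fully off-screen lights produce an empty range and are skipped.
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                tile_lights[ty * tiles_x + tx].append(index)

    return tile_lights, tiles_x


def lights_for_pixel(tile_lights, tiles_x, px, py, tile_size=TILE_SIZE):
    """Light indices a pixel has to iterate when it is shaded (step 4)."""
    return tile_lights[(py // tile_size) * tiles_x + (px // tile_size)]
```

With a 64×32 target, a small light near the top-left corner ends up in only the first tile, so a pixel in the far corner iterates no lights at all. That is the whole point: the per-pixel loop shrinks from every light in the scene to the few lights in its tile.
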

Forward+ gets deferred's many-light scaling without paying the G-buffer bandwidth. It handles MSAA and transparency naturally. It has become a common choice in modern engines (Doom 2016 popularised clustered; Unity HDRP and Unreal's forward renderer use it).

# Visibility buffer / deferred texturing

The newest family. Instead of writing material attributes into a G-buffer, store _what_ to shade:

1. Rasterize to a tiny buffer containing `(triangleID, barycentrics)` per pixel.
2. In a compute pass, fetch the vertex data for that triangle, interpolate, shade.

Tradeoffs:

- G-buffer bandwidth vanishes — the visibility buffer is ~8 bytes per pixel.
- You re-fetch and re-interpolate vertex data per pixel.
- Material branching is painful; materials are usually sorted/binned.
- Pairs beautifully with mesh shaders and huge numbers of small triangles (cf. Nanite).

Likely where the industry is going for opaque geometry.

# Typical G-buffer layouts

A common AAA-ish G-buffer (fits in 12–16 bytes/pixel with packing):

| Target | Contents | Format |
|--------|----------|--------|
| RT0 | Albedo.rgb + AO | `RGBA8` |
| RT1 | Normal.xyz (oct-encoded to rg) + roughness + metallic | `RGBA8` / `RG16F + RG8` |
| RT2 | Motion vectors + material ID | `RGBA16F` or packed |
| Depth | Hardware depth | `D32F` or `D24_S8` |

Tricks used everywhere:

- **Octahedron-encoded normals** — pack a unit vector into 2×16 or 2×8 bits.
- **Store perceptual roughness** — keep the artist-facing roughness in the G-buffer and square it to get the GGX α in the lighting pass; the quantisation steps are then perceptually close to linear.
- **Material ID** — 8 bits that switch between BRDFs in the lighting pass.
- **Stencil** for material type — nearly free, avoids branching in the lighting pass.

Smaller is better. Every additional byte is multiplied by the pixel count (e.g. `1920 × 1080`), the MSAA sample count, and the number of times the G-buffer is written (overdraw) and read (lighting and post passes) per frame.

# Transparency

No deferred technique handles transparency gracefully. The standard approach:

- Opaque pass → G-buffer → lighting → HDR colour buffer.
- Transparent objects → forward shaded on top, sorted back-to-front.
- For many overlapping transparent surfaces, where sorting breaks down: order-independent transparency (OIT) via weighted-blended OIT, moment-based OIT, or per-pixel linked lists.

# What I'd pick, given the shape of the project

- **Indie / small-scene / stylised** — forward with a reasonable light cap. Simplicity wins.
- **Realistic, many lights, opaque-heavy** — clustered forward. Modern default.
- **Heavy dynamic lighting, post-processing pipeline** — deferred, accept the TAA tradeoff.
- **Film-quality asset density, mesh shader pipeline** — visibility buffer.

# Things that tripped me up

- **G-buffer clears** — you _must_ clear every target every frame, otherwise stale normals/roughness from last frame leak into skybox pixels etc.
- **Oct encoding breaks at -Z** — handle the edge case or use a mapping that doesn't.
- **MSAA in deferred** — just don't. Use TAA + (optional) SMAA.
- **Light volume geometry** — point-light spheres that intersect the near plane (camera inside the volume) need a special render path (front faces vs back faces, depth test flip). Or use a full-screen quad and tile-cull.
- **Tile/cluster indirection** — the light list per tile needs a max cap, and you _will_ hit it in worst cases. Profile with pathological scenes early.

---

Back to [[Index|Notes]] · see also [[Physically Based Rendering]] · [[Shadow mapping]] · [[Compute shaders]]