# ==Deferred vs forward==
<p class="doc-sub">// status: seedling</p>
"Where does the shading happen?" sounds like a boring question. It isn't. It decides your lighting cost model, your MSAA story, your material flexibility, your transparency strategy, and a pile of other downstream choices.
# Forward rendering
Classic: each object is rasterized, and for every pixel of every object you iterate all the lights that affect it and shade.
Pros:
- Simple.
- MSAA works naturally.
- Transparency works naturally.
- Arbitrary per-material shading.
Cons:
- Cost is `O(pixels × lights)` — naive many-light scenes collapse.
- Overdraw is paid at full shading cost: every fragment that passes the depth test runs the whole BRDF and light loop, even if a nearer surface covers it later.
- Depth pre-pass helps but doesn't scale to hundreds of lights.
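A rough cost model for the cons above, as a sketch rather than a real renderer. `Fragment` and `forward_shading_cost` are made-up names; the only thing counted is BRDF evaluations, assuming one evaluation per light per fragment that survives the depth test.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    pixel: int    # flattened pixel index
    depth: float  # smaller means closer to the camera


def forward_shading_cost(fragments, num_lights, depth_prepass=False):
    """Count BRDF evaluations for a naive forward renderer.

    Fragments are processed in submission order with a less-than
    depth test, the way a GPU would see unsorted draws.
    """
    nearest = {}
    if depth_prepass:
        # Pre-pass: write final depth only, no shading at all.
        for f in fragments:
            nearest[f.pixel] = min(f.depth, nearest.get(f.pixel, float("inf")))

    evaluations = 0
    for f in fragments:
        if depth_prepass:
            # Equal-depth test: only the visible fragment gets shaded.
            if f.depth == nearest[f.pixel]:
                evaluations += num_lights
        elif f.depth < nearest.get(f.pixel, float("inf")):
            # Passed the depth test, so it is shaded even if covered later.
            nearest[f.pixel] = f.depth
            evaluations += num_lights
    return evaluations
```

With three layers on one pixel drawn back to front and 100 lights, this counts 300 evaluations without the pre-pass and 100 with it. The pre-pass removes the overdraw factor but the `num_lights` factor is untouched, which is the part that doesn't scale.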
# Deferred rendering
Split the pipeline:
1. ==G-buffer pass== — rasterize all opaque geometry, write material attributes (albedo, normal, roughness, metallic, depth…) to multiple render targets.
2. ==Lighting pass== — for each light, draw its volume; for each covered pixel, read the G-buffer, compute shading.
Pros:
- Lighting cost decouples from overdraw: only the visible surface at each pixel is lit.
- Many lights are cheap: each light costs only the pixels its volume covers.
- Post-processing passes (SSAO, SSR, fog) have everything they need.
Cons:
- A fat G-buffer eats memory bandwidth, which is usually the single biggest cost of going deferred.
- MSAA is miserable — you'd have to store and shade per-sample, which is expensive enough that almost everyone uses TAA instead.
- Transparency doesn't fit; you need a separate forward pass on top.
- Material variety is constrained — everything must fit in the G-buffer channels.
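The two passes can be sketched on the CPU in a few lines. This is a toy under loud assumptions: the G-buffer is a dict keyed by pixel index, the only material attributes are albedo and normal, shading is plain Lambert, and the lights are directional so every light "volume" covers the whole screen. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    pixel: int
    depth: float    # smaller means closer to the camera
    albedo: tuple   # (r, g, b)
    normal: tuple   # unit vector


@dataclass
class DirectionalLight:
    direction: tuple  # unit vector from the surface toward the light
    intensity: float


def gbuffer_pass(fragments):
    """Keep the nearest fragment's attributes per pixel. No lighting here."""
    gbuffer = {}
    for f in fragments:
        stored = gbuffer.get(f.pixel)
        if stored is None or f.depth < stored.depth:
            gbuffer[f.pixel] = f
    return gbuffer


def lighting_pass(gbuffer, lights):
    """Accumulate Lambert lighting per light, like additive blending."""
    image = {pixel: (0.0, 0.0, 0.0) for pixel in gbuffer}
    evaluations = 0
    for light in lights:
        # A directional light touches every pixel; a point light would
        # only visit the pixels covered by its volume.
        for pixel, surface in gbuffer.items():
            n_dot_l = max(0.0, sum(n * l for n, l in zip(surface.normal, light.direction)))
            image[pixel] = tuple(
                acc + a * n_dot_l * light.intensity
                for acc, a in zip(image[pixel], surface.albedo)
            )
            evaluations += 1
    return image, evaluations
```

However many fragments land on a pixel in `gbuffer_pass`, `lighting_pass` does exactly one evaluation per stored pixel per light, which is the "decoupled from overdraw" property.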
# Tiled and clustered forward (Forward+)
The modern "best of both" approach:
1. Depth pre-pass.
2. Bin the view into screen-space tiles (e.g. 16×16 pixels) — or 3D clusters that also slice depth.
3. For each tile/cluster, determine which lights affect it via a compute shader.
4. Forward-shade normally, but each pixel only iterates the lights listed for its tile/cluster.
Gets deferred's many-light scaling without paying the G-buffer bandwidth. Handles MSAA and transparency naturally. Has become a common default in modern engines (Doom 2016 popularised clustered; Unity HDRP and Unreal's forward renderer use it).
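Steps 2 to 4 above, sketched on the CPU instead of in a compute shader. Assumptions to flag: lights are already projected to screen-space circles, culling is 2D only (no per-tile depth bounds, no clusters), the circle is tested by its bounding box so it is conservative, and `MAX_LIGHTS_PER_TILE` is an arbitrary illustrative cap. All names are hypothetical.

```python
from dataclasses import dataclass

TILE_SIZE = 16
MAX_LIGHTS_PER_TILE = 4  # illustrative cap, real budgets are much larger


@dataclass
class ScreenLight:
    x: float       # centre in pixels
    y: float
    radius: float  # screen-space radius in pixels


def build_tile_light_lists(lights, width, height,
                           tile_size=TILE_SIZE, max_per_tile=MAX_LIGHTS_PER_TILE):
    """Return (per-tile light index lists, tiles per row, overflow count)."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    lists = [[] for _ in range(tiles_x * tiles_y)]
    overflow = 0
    for index, light in enumerate(lights):
        # Tile range touched by the light's bounding box, clamped to the screen.
        x0 = max(0, int((light.x - light.radius) // tile_size))
        x1 = min(tiles_x - 1, int((light.x + light.radius) // tile_size))
        y0 = max(0, int((light.y - light.radius) // tile_size))
        y1 = min(tiles_y - 1, int((light.y + light.radius) // tile_size))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bucket = lists[ty * tiles_x + tx]
                if len(bucket) < max_per_tile:
                    bucket.append(index)
                else:
                    overflow += 1  # light silently dropped for this tile
    return lists, tiles_x, overflow


def lights_for_pixel(lists, tiles_x, px, py, tile_size=TILE_SIZE):
    """What the forward shader would loop over for pixel (px, py)."""
    return lists[(py // tile_size) * tiles_x + (px // tile_size)]
```

The `overflow` counter is worth keeping around in a real implementation too: when it goes non-zero, lights are being dropped from tiles and the result is visible popping.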
# Visibility buffer / deferred texturing
The newest family. Instead of writing material attributes into a G-buffer, store only _what_ to shade:
1. Rasterize to a tiny buffer containing `(triangleID, barycentrics)` per pixel.
2. In a compute pass, fetch the vertex data for that triangle, interpolate, shade.
Tradeoffs:
- G-buffer bandwidth vanishes — the visibility buffer is ~8 bytes per pixel.
- You re-fetch and re-interpolate vertex data per pixel.
- Material branching is painful; materials are usually sorted/binned.
- Pairs beautifully with mesh shaders and huge numbers of small triangles (cf. Nanite).
Likely where the industry is going for opaque geometry.
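To make the 8 bytes concrete, here is one possible packing, sketched with `struct`: a 32-bit triangle ID plus two 16-bit quantised barycentrics (the third is implied). This layout is an assumption for illustration; the resolve step stands in for the compute pass and interpolates a single attribute from hypothetical index and vertex buffers.

```python
import struct


def pack_visibility(triangle_id, b0, b1):
    """Pack into 8 bytes: uint32 triangle ID + two unorm16 barycentrics."""
    return struct.pack("<IHH", triangle_id, round(b0 * 65535), round(b1 * 65535))


def unpack_visibility(data):
    """Return (triangle_id, (b0, b1, b2)) with b2 reconstructed."""
    triangle_id, q0, q1 = struct.unpack("<IHH", data)
    b0 = q0 / 65535
    b1 = q1 / 65535
    return triangle_id, (b0, b1, 1.0 - b0 - b1)


def resolve_attribute(data, index_buffer, vertex_attributes):
    """Fetch the triangle's three vertices and interpolate one attribute."""
    triangle_id, bary = unpack_visibility(data)
    i0, i1, i2 = index_buffer[3 * triangle_id: 3 * triangle_id + 3]
    corners = (vertex_attributes[i0], vertex_attributes[i1], vertex_attributes[i2])
    return tuple(
        sum(w * corner[c] for w, corner in zip(bary, corners))
        for c in range(len(corners[0]))
    )
```

`resolve_attribute` is the cost mentioned in the tradeoffs: the index fetch, three vertex fetches, and the interpolation all happen per pixel instead of being done by the rasterizer.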
# Typical G-buffer layouts
A common AAA-ish G-buffer (fits in 12–16 bytes/pixel with packing):
| Target | Contents | Format |
|--------|----------|--------|
| RT0 | Albedo.rgb + AO | `RGBA8` |
| RT1 | Normal.xyz (oct-encoded to rg) + roughness + metallic | `RGBA8` / `RG16F + RG8` |
| RT2 | Motion vectors + material ID | `RGBA16F` or packed |
| Depth | Hardware depth | `D32F` or `D24_S8` |
Tricks used everywhere:
- **Octahedron-encoded normals** — pack a unit vector into 2×16 or 2×8 bits.
- **Pack perceptual roughness, not α = roughness²** — gets perceptually linear quantisation; square it in the shader.
- **Material ID** — 8 bits that switch between BRDFs in the lighting pass.
- **Stencil** for material type — free, avoids branching in the lighting pass.
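A minimal sketch of the octahedron encoding from the first trick, in Python rather than shader code. It follows the usual formulation (project onto the L1 octahedron, fold the lower hemisphere over the diagonals); the function names and the 8-bit quantisation helpers are my own, not from any particular engine.

```python
import math


def _sign_not_zero(v):
    # Must never return 0, otherwise the fold collapses axis-aligned normals.
    return 1.0 if v >= 0.0 else -1.0


def _fold(u, v):
    return ((1.0 - abs(v)) * _sign_not_zero(u),
            (1.0 - abs(u)) * _sign_not_zero(v))


def oct_encode(n):
    """Map a unit vector to two values in [-1, 1]."""
    x, y, z = n
    inv_l1 = 1.0 / (abs(x) + abs(y) + abs(z))
    u, v = x * inv_l1, y * inv_l1
    if z < 0.0:
        u, v = _fold(u, v)  # lower hemisphere goes to the outer triangles
    return u, v


def oct_decode(e):
    """Inverse of oct_encode; renormalises to absorb quantisation error."""
    u, v = e
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = _fold(u, v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)


def quantize_unorm8(e):
    """Store each component in 8 bits, as an RG8 target would."""
    return tuple(round((c * 0.5 + 0.5) * 255) for c in e)


def dequantize_unorm8(q):
    return tuple(c / 255 * 2.0 - 1.0 for c in q)
```

Round-tripping `(0.0, 0.0, -1.0)` is the test worth running first, since that is where a missing fold or a `sign()` that returns 0 shows up.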
Smaller is better. Every additional byte per pixel costs `width × height × MSAA samples` bytes each time the G-buffer is written (times overdraw) and again each time a pass reads it back.
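A back-of-envelope calculator for that bandwidth. It is a rough estimate under simple assumptions: no framebuffer compression, no tile memory, writes scale with overdraw, and every reading pass touches the whole G-buffer once. The parameter names are made up for this sketch.

```python
def gbuffer_bytes_per_frame(width, height, bytes_per_pixel,
                            msaa_samples=1, write_overdraw=1.0, read_passes=1):
    """Estimate G-buffer traffic for one frame, in bytes."""
    samples = width * height * msaa_samples
    writes = samples * bytes_per_pixel * write_overdraw
    reads = samples * bytes_per_pixel * read_passes
    return writes + reads


def bandwidth_gb_per_second(bytes_per_frame, fps):
    """Convert per-frame traffic to GB/s (decimal gigabytes)."""
    return bytes_per_frame * fps / 1e9
```

For a 16-byte G-buffer at 1920 × 1080 with no overdraw and one lighting read, `gbuffer_bytes_per_frame(1920, 1080, 16)` comes to 66,355,200 bytes per frame, or about 3.98 GB/s at 60 fps. Overdraw, extra reading passes, and MSAA multiply that quickly.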
# Transparency
No deferred technique handles transparency gracefully. The standard approach:
- Opaque pass → G-buffer → lighting → HDR colour buffer.
- Transparent objects → forward shaded on top, sorted back-to-front.
- For many overlapping transparent layers, where sorting breaks down: order-independent transparency (OIT) via weighted-blended OIT, moment-based OIT, or per-pixel linked lists.
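Why the sort matters, as a small sketch of the standard "over" blend for one pixel. The layer tuples and the function name are hypothetical; larger depth is assumed to mean farther from the camera, and colours are not premultiplied.

```python
def composite_over(background, layers, sort=True):
    """Blend (depth, rgb, alpha) layers onto an opaque background colour.

    With sort=True the layers are drawn back to front; with sort=False
    they are drawn in submission order, which is what goes wrong when
    the sort is skipped or cannot be resolved per object.
    """
    ordered = sorted(layers, key=lambda layer: layer[0], reverse=True) if sort else layers
    color = background
    for _depth, rgb, alpha in ordered:
        color = tuple(alpha * src + (1.0 - alpha) * dst for src, dst in zip(rgb, color))
    return color
```

With a near blue layer submitted before a far red one, both at alpha 0.5 over black, the sorted result is `(0.25, 0.0, 0.5)` and the unsorted one is `(0.5, 0.0, 0.25)`. The OIT techniques exist to remove that order dependence when sorting per object isn't enough.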
# What I'd pick, given the shape of the project
- **Indie / small-scene / stylised** — forward with a reasonable light cap. Simplicity wins.
- **Realistic, many lights, opaque-heavy** — clustered forward. Modern default.
- **Heavy dynamic lighting, post-processing pipeline** — deferred, accept the TAA tradeoff.
- **Film-quality asset density, mesh shader pipeline** — visibility buffer.
# Things that tripped me up
- **G-buffer clears** — you _must_ clear every component every frame, otherwise stale normals/roughness from last frame leak into skybox pixels etc.
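- **Oct encoding breaks at -Z** — the lower hemisphere has to be folded over the diagonals, and the fold needs a sign function that never returns 0. Handle the edge case or use a mapping that doesn't have it.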
- **MSAA in deferred** — just don't. Use TAA + (optional) SMAA.
- **Light volume geometry** — point-light spheres clipped against the near plane need a special render path (front faces vs back faces, depth test flip). Or use a full-screen quad and tile-cull.
- **Tile/cluster indirection** — the light list per tile needs a max cap, and you _will_ hit it in worst cases. Profile with pathological scenes early.
---
Back to [[Index|Notes]] · see also [[Physically Based Rendering]] · [[Shadow mapping]] · [[Compute shaders]]