# ==Vulkan==, learning log
<p class="doc-sub">// status: seedling</p>
Vulkan is OpenGL's antithesis: no global state, no implicit synchronisation, no driver-side memory management, no shader-string compilation. You write a thousand lines of setup before the first triangle. In return you get predictability, multi-threading, and performance that doesn't depend on knowing which driver bugs to work around. Compare with [[OpenGL - learning log]].
# The big mental shift
OpenGL: "please draw something, driver, figure it out." Vulkan: "I'm telling you what hardware to use, which memory to allocate, when to flush caches, and when to wait for the previous frame to finish." The driver becomes a thin translator.
Concretely, you become responsible for:
- Picking a physical device and queue families.
- Allocating GPU memory and sub-allocating from it.
- Transitioning image layouts between uses.
- Synchronising between queue submissions and within them.
- Tracking per-frame-in-flight resources.
- Compiling shaders to SPIR-V ahead of time.
This sounds terrible. It's actually great once the setup is done, because nothing is hidden.
# The object zoo
In rough order of creation for a "hello triangle":
- **Instance** — the Vulkan loader handle; holds layers and extensions.
- **Physical device** — a specific GPU visible to the system.
- **Logical device** — your opened session with it; creates everything else.
- **Queue** — where commands are submitted. Graphics, compute, transfer, present — not all on the same queue family on all hardware.
- **Surface** — platform-specific window integration (via `VK_KHR_surface` + the OS-specific extension).
- **Swapchain** — the set of images you render into and present.
- **Render pass / dynamic rendering** — declares the attachments and subpasses used by a pipeline. Dynamic rendering (core in 1.3) removes most of this ceremony.
- **Pipeline** — a baked, immutable state object: shaders, vertex input, blending, depth, the lot. Creating one is expensive; cache and reuse them.
- **Descriptor sets** — how you bind resources (buffers, images) to shaders. Allocated from a descriptor pool, organised by set layouts.
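- **Command buffer** — recorded on the CPU, then submitted to a queue. It can be recorded once and submitted N times, though most renderers re-record per frame. The thing that actually carries work to the GPU.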
- **Semaphores / fences / barriers** — synchronisation primitives (more below).
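The "not all on the same queue family" remark under **Queue** is easiest to see in code. What follows is a minimal sketch, not the one true selection policy: the flag constants are stand-ins that mirror the values of `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_TRANSFER_BIT`, the input vector stands in for the `queueFlags` you would read from `vkGetPhysicalDeviceQueueFamilyProperties`, and `findQueueFamily` / `pickTransferFamily` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in flag values; they mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT
// and VK_QUEUE_TRANSFER_BIT from the Vulkan headers.
constexpr uint32_t kGraphics = 0x1;
constexpr uint32_t kCompute  = 0x2;
constexpr uint32_t kTransfer = 0x4;

// Hypothetical helper: index of the first family that has every `required`
// bit and none of the `avoided` bits, or nullopt if no family qualifies.
std::optional<uint32_t> findQueueFamily(const std::vector<uint32_t>& familyFlags,
                                        uint32_t required, uint32_t avoided = 0) {
    assert(required != 0);  // asking for no capability at all is likely a bug
    for (uint32_t i = 0; i < familyFlags.size(); ++i) {
        const uint32_t flags = familyFlags[i];
        if ((flags & required) == required && (flags & avoided) == 0) return i;
    }
    return std::nullopt;
}

// Prefer a transfer-only family (often a dedicated DMA engine). Otherwise fall
// back to any family that can transfer; graphics and compute families support
// transfer implicitly even when they do not report the transfer bit.
std::optional<uint32_t> pickTransferFamily(const std::vector<uint32_t>& familyFlags) {
    if (auto dedicated = findQueueFamily(familyFlags, kTransfer, kGraphics | kCompute)) {
        return dedicated;
    }
    for (uint32_t i = 0; i < familyFlags.size(); ++i) {
        if (familyFlags[i] & (kTransfer | kGraphics | kCompute)) return i;
    }
    return std::nullopt;
}
```

On a typical discrete GPU this picks the transfer-only family for uploads, while an integrated GPU that exposes a single do-everything family just gets index 0 back.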
# Memory
No more `glBufferData(... GL_STATIC_DRAW)`. You:
1. `vkCreateBuffer` / `vkCreateImage` — get a handle with no backing storage.
2. `vkGetBufferMemoryRequirements` / `vkGetImageMemoryRequirements` — ask for the required size, alignment, and the memory types it may be bound to.
3. `vkAllocateMemory` from a heap with the right properties (`DEVICE_LOCAL`, `HOST_VISIBLE`, `HOST_COHERENT`…).
4. `vkBindBufferMemory` / `vkBindImageMemory` — attach.
In practice, nobody does this per-resource. You use the [Vulkan Memory Allocator (VMA)](https://gpuopen.com/vulkan-memory-allocator/) — AMD's open-source sub-allocator — or write your own. VMA is the default choice and more or less mandatory for any real project.
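To make "sub-allocating" concrete, here is a minimal sketch of the simplest possible scheme: a linear (bump) allocator that hands out aligned offsets inside one large block. It is an illustration only, not how VMA works internally. `LinearAllocator` and `alignUp` are hypothetical names, the block is assumed to come from a single `vkAllocateMemory` call, and the sketch ignores `bufferImageGranularity` and freeing individual allocations.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Round `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, which Vulkan alignments are.
constexpr uint64_t alignUp(uint64_t offset, uint64_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator over one big device memory block.
class LinearAllocator {
public:
    explicit LinearAllocator(uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset to bind the resource at, or nullopt when the block is full.
    std::optional<uint64_t> allocate(uint64_t size, uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const uint64_t start = alignUp(head_, alignment);
        if (start > capacity_ || size > capacity_ - start) return std::nullopt;
        head_ = start + size;
        return start;
    }

    // Releases every allocation at once, e.g. when a level is unloaded.
    void reset() { head_ = 0; }

private:
    uint64_t capacity_;
    uint64_t head_ = 0;
};
```

The returned offset is what you would pass as the memory offset when binding a buffer or image. A general-purpose allocator needs free lists and defragmentation on top of this, which is the work VMA saves you.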
Memory properties you'll see:
- `DEVICE_LOCAL` — on the GPU, fast for GPU access; on discrete GPUs usually not mappable by the CPU unless the type is also `HOST_VISIBLE`. Use for most resources.
- `HOST_VISIBLE | HOST_COHERENT` — mappable, coherent without manual flushes. Staging buffers.
- `HOST_VISIBLE | HOST_CACHED` — mappable + CPU-cache-friendly readback. Needs `vkInvalidate...` / `vkFlush...` unless the type is also `HOST_COHERENT`.
- Unified-memory GPUs (integrated, Apple) expose `DEVICE_LOCAL | HOST_VISIBLE` — fast path on those.
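Choosing a memory type from these properties comes down to a bitmask search. The sketch below assumes stand-in constants that mirror the `VK_MEMORY_PROPERTY_*_BIT` values and takes the per-type property flags as a plain vector instead of a `VkPhysicalDeviceMemoryProperties` struct; `findMemoryType` is a hypothetical helper name, though most tutorials carry something equivalent.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-ins mirroring the VK_MEMORY_PROPERTY_*_BIT values.
constexpr uint32_t kDeviceLocal  = 0x1;
constexpr uint32_t kHostVisible  = 0x2;
constexpr uint32_t kHostCoherent = 0x4;
constexpr uint32_t kHostCached   = 0x8;

// memoryTypeBits: bit i set means memory type i is acceptable for the resource
//                 (this comes from the resource's memory requirements).
// typeFlags[i]:   property flags of memory type i as reported by the device.
// wanted:         properties the caller insists on.
std::optional<uint32_t> findMemoryType(uint32_t memoryTypeBits,
                                       const std::vector<uint32_t>& typeFlags,
                                       uint32_t wanted) {
    assert(typeFlags.size() <= 32);  // Vulkan exposes at most 32 memory types
    for (uint32_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const bool hasAll  = (typeFlags[i] & wanted) == wanted;
        if (allowed && hasAll) return i;
    }
    return std::nullopt;
}
```

A common refinement is to try a preferred set first (say `DEVICE_LOCAL | HOST_VISIBLE` for per-frame uniforms) and fall back to a required set (`HOST_VISIBLE | HOST_COHERENT`) when that returns nothing.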
# Synchronisation
The single hardest thing about Vulkan.
- **Semaphores** — GPU↔GPU synchronisation across queue submits. Acquiring a swapchain image signals one and the render submit waits on it; the submit signals another and present waits on that.
- **Fences** — GPU→CPU; the CPU waits for a fence to know a submission finished. Used to reuse per-frame resources.
- **Pipeline barriers** — in-queue ordering and cache flushes. Express "after these stages, before those stages, also transition this image's layout".
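- **Events / timeline semaphores** — finer-grained. Events have been there since 1.0; timeline semaphores are newer (`VK_KHR_timeline_semaphore`, core in 1.2) and mostly replace both binary semaphores and fences for new code. The exception is swapchain acquire and present, which still take binary semaphores.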
Barriers are the knife's edge. Over-synchronise and performance dies. Under-synchronise and you get undefined behaviour that looks right on your machine and flickers on every other GPU.
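`VK_LAYER_KHRONOS_validation` catches most sync mistakes, provided its synchronization validation feature is switched on (it is opt-in, for example through `vkconfig`). Run it always in dev.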
# Shaders and pipelines
Shaders are ingested as **SPIR-V** — a stable bytecode. The common flow is GLSL → `glslang` → SPIR-V, or HLSL → DXC → SPIR-V. Both are fine. HLSL has nicer ergonomics for modern features; GLSL is the Vulkan-native dialect.
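Whichever compiler produced it, the result is a stream of 32-bit words that starts with a fixed magic number, so a cheap sanity check before handing the bytes to the driver catches truncated or mislabelled files early. The sketch below is an assumption about how you might do that; `toSpirvWords` is a hypothetical helper, and it assumes the file was written in the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kSpirvMagic = 0x07230203;  // first word of every SPIR-V module
constexpr size_t kSpirvHeaderWords = 5;       // magic, version, generator, bound, schema

// Hypothetical helper: turn raw file bytes into SPIR-V words.
// Returns an empty vector if the data cannot be a SPIR-V module.
// A byte-swapped module fails the magic check and is rejected as well.
std::vector<uint32_t> toSpirvWords(const std::vector<unsigned char>& bytes) {
    if (bytes.size() % 4 != 0 || bytes.size() < kSpirvHeaderWords * 4) return {};
    std::vector<uint32_t> words(bytes.size() / 4);
    std::memcpy(words.data(), bytes.data(), bytes.size());
    assert(words.size() * sizeof(uint32_t) == bytes.size());
    if (words[0] != kSpirvMagic) return {};
    return words;
}
```

The resulting vector gives you the word pointer and byte size that shader module creation expects. This checks the container only; it says nothing about whether the shader itself is valid.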
A graphics pipeline bakes:
- All shader stages (SPIR-V modules + entry points + specialisation constants).
- Vertex input layout (or none, if you pull from buffers manually).
- Rasterizer state, depth/stencil, blend state.
- Viewport/scissor (or mark as dynamic).
- The render pass / rendering info the pipeline is compatible with.
This is a _lot_ of state baked into one object. To avoid paying the creation cost over and over, use `VkPipelineCache` (save it to disk between runs). Two newer extensions cut stutter further: `VK_EXT_graphics_pipeline_library` lets you compile pipeline pieces independently and link them quickly, and `VK_EXT_shader_object` drops monolithic pipelines in favour of shader objects bound at draw time.
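Separately from the driver-level `VkPipelineCache`, "cache and reuse" usually also means an application-side lookup so the same state never triggers a second creation call. A minimal sketch under loud assumptions: `PipelineKey` is a hypothetical and heavily reduced description (a real key has to cover every piece of baked state), `PipelineHandle` is a stand-in for `VkPipeline`, and the creation callback stands in for the real pipeline creation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical, reduced pipeline description used as the lookup key.
struct PipelineKey {
    uint64_t shaderHash = 0;  // e.g. a hash over the SPIR-V modules in use
    bool depthTest = true;
    bool alphaBlend = false;

    bool operator==(const PipelineKey& o) const {
        return shaderHash == o.shaderHash && depthTest == o.depthTest &&
               alphaBlend == o.alphaBlend;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        return std::hash<uint64_t>{}(k.shaderHash) ^
               (std::size_t(k.depthTest) << 1) ^ (std::size_t(k.alphaBlend) << 2);
    }
};

using PipelineHandle = uint64_t;  // stand-in for VkPipeline

// Creates each distinct pipeline once and returns the stored handle afterwards.
class PipelineTable {
public:
    using CreateFn = std::function<PipelineHandle(const PipelineKey&)>;

    explicit PipelineTable(CreateFn create) : create_(std::move(create)) {
        assert(create_);  // an empty factory would throw on the first miss
    }

    PipelineHandle get(const PipelineKey& key) {
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;
        const PipelineHandle handle = create_(key);  // the expensive part
        table_.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return table_.size(); }

private:
    CreateFn create_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> table_;
};
```

Ideally every key is known at load time so `get` only ever misses behind a loading screen; a miss in the middle of a frame is exactly the stutter the extensions above try to remove.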
# Descriptor sets and bindless
Descriptor sets bind resources to shaders. Three main approaches today:
- **Traditional descriptor sets** — allocate per frame, update explicitly. Verbose but explicit.
- **Push descriptors** (`VK_KHR_push_descriptor`) — small, immediate, no pool churn. Great for per-draw bindings.
- **Descriptor indexing / bindless** (`VK_EXT_descriptor_indexing`, core in 1.2) — one giant descriptor set with thousands of textures, shaders index into it. How modern renderers avoid descriptor-update overhead entirely.
The last is how AAA engines ship: fill a big array once, index with a `uint` per draw, make draws cheap.
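The CPU side of bindless is mostly bookkeeping: something has to decide which array element each texture lives in. Here is a minimal sketch of that bookkeeping, assuming a fixed-capacity descriptor array; `SlotAllocator` and `kInvalidSlot` are hypothetical names, and writing the actual descriptor into the chosen slot is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kInvalidSlot = UINT32_MAX;  // returned when the array is full

// Hypothetical index allocator for one large descriptor array.
// The returned index is what a draw would pass to the shader as its `uint`.
class SlotAllocator {
public:
    explicit SlotAllocator(uint32_t capacity) : capacity_(capacity) {}

    uint32_t acquire() {
        if (!free_.empty()) {  // reuse a released slot first
            const uint32_t slot = free_.back();
            free_.pop_back();
            return slot;
        }
        if (next_ < capacity_) return next_++;
        return kInvalidSlot;
    }

    // Caution: only release a slot once no in-flight frame can still read it.
    void release(uint32_t slot) {
        assert(slot < next_);
        free_.push_back(slot);
    }

private:
    uint32_t capacity_;
    uint32_t next_ = 0;
    std::vector<uint32_t> free_;
};
```

In a real renderer the release would typically be deferred by the number of frames in flight, so a texture slot is not overwritten while the GPU is still sampling from it.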
# Frames in flight
The canonical double/triple buffering pattern:
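```
frameIndex = (frameIndex + 1) % FRAMES_IN_FLIGHT;
// Block until the GPU has finished the frame that last used this slot.
// (Create the fences already signalled so the very first wait passes.)
vkWaitForFences(device, 1, &inFlight[frameIndex], VK_TRUE, UINT64_MAX);
vkAcquireNextImageKHR(...); // signals imageAvailable[frameIndex]
// record command buffer for frameIndex
// Fences stay signalled until reset; reset only once a submit is certain.
vkResetFences(device, 1, &inFlight[frameIndex]);
vkQueueSubmit(... waitSemaphores: imageAvailable[frameIndex],
              signalSemaphores: renderFinished[frameIndex],
              fence: inFlight[frameIndex]);
// Note: indexing renderFinished by swapchain image instead of frameIndex
// avoids re-signalling a semaphore that present may still be waiting on.
vkQueuePresentKHR(... waitSemaphores: renderFinished[frameIndex]);
```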
Each in-flight frame owns its own command buffer, its own descriptor resources, its own uniform buffer suballocation. The fence prevents the CPU from getting too far ahead of the GPU.
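If you swap the per-frame fences for a single timeline semaphore, the same pacing becomes arithmetic on a counter. The sketch below assumes a convention that is mine, not the spec's: the timeline starts at 0 and the submit for frame number n (1-based) signals value n. `FramePacer` is a hypothetical name, and the actual wait and submit calls are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for frames in flight paced by one timeline semaphore.
struct FramePacer {
    uint32_t framesInFlight;
    uint64_t frameNumber = 0;  // 1-based count of frames started so far

    // Start a new frame and return which per-frame resource slot it uses.
    uint32_t begin() {
        ++frameNumber;
        return slot();
    }

    uint32_t slot() const {
        assert(frameNumber > 0 && framesInFlight > 0);
        return uint32_t((frameNumber - 1) % framesInFlight);
    }

    // Timeline value the CPU must wait for before touching this slot's
    // resources: the value signalled by the frame that used the slot last.
    // Zero means the slot has never been used, so there is nothing to wait for.
    uint64_t waitValue() const {
        return frameNumber > framesInFlight ? frameNumber - framesInFlight : 0;
    }

    // Value this frame's submit should signal on completion.
    uint64_t signalValue() const { return frameNumber; }
};
```

With two frames in flight, frame 3 reuses slot 0 and waits for value 1, the value frame 1 signalled. That is the same guarantee the fence array gives, with one object instead of `FRAMES_IN_FLIGHT`.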
# Things that tripped me up
- **Layouts** — images are in some layout (`UNDEFINED`, `COLOR_ATTACHMENT_OPTIMAL`, `SHADER_READ_ONLY_OPTIMAL`, `PRESENT_SRC_KHR`…). You transition them with barriers. Forget one and you get garbled output or a validation error novel.
- **Y axis and clip space** — Vulkan's clip-space Y points down vs OpenGL's up. Either flip the viewport with a negative height (`VK_KHR_maintenance1`, core since 1.1) or flip in the shader.
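- **Negative viewport height is a feature, not a bug** — it's how you get OpenGL-style Y-up without touching the shader. Set the viewport's `y` to the framebuffer height and its `height` to the negated framebuffer height.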
- **Device loss happens** — on mobile and under TDR (Timeout Detection and Recovery) on Windows. You need a recovery path or at least a clean crash.
- **`vkCmdPipelineBarrier2`** — the new barrier API (`VK_KHR_synchronization2`, core in 1.3) is clearer and more expressive. New code should use it. Old tutorials use the legacy one.
- **Don't allocate inside the frame** — command pools, descriptor sets, memory. Allocate up front or in background threads; reuse ring-buffer style.
- **Tooling is worth the setup** — validation layers, RenderDoc, Nsight, AMD Radeon GPU Profiler. The explicitness that makes Vulkan painful also makes the tooling unusually informative.
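The negative-height trick from the Y axis item is easy to get half right, so here is a small worked sketch. `Viewport` is a stand-in with the same fields as `VkViewport`, `flippedViewport` and `framebufferY` are hypothetical helpers, and `framebufferY` applies the viewport transform for the Y coordinate, `y_f = (height / 2) * y_ndc + (y + height / 2)`.

```cpp
#include <cassert>

// Stand-in with the same fields as VkViewport.
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
};

// Y-up viewport: move the origin to the bottom edge and negate the height.
Viewport flippedViewport(float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return Viewport{0.0f, height, width, -height, 0.0f, 1.0f};
}

// Framebuffer-space Y for a given NDC Y under the viewport transform.
float framebufferY(const Viewport& vp, float ndcY) {
    return vp.height * 0.5f * ndcY + (vp.y + vp.height * 0.5f);
}
```

For an 800 by 600 target the flipped viewport maps NDC Y of +1 to framebuffer row 0 (the top), whereas the unflipped one maps it to row 600 (the bottom). Forgetting to move `y` to the bottom edge pushes the whole image off screen, which is the usual symptom of the half-applied fix.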
# A sensible starter stack
- **GLFW** or **SDL2** — window + surface creation.
- **VMA** — memory.
- **volk** — meta-loader that fetches entry points straight from the driver, so you can skip linking against the Vulkan loader library and avoid its per-call dispatch overhead.
- **glslang** / **DXC** — shader compilation.
- **Dear ImGui** + `imgui_impl_vulkan` — debug UI.
- **Vulkan SDK** — includes validation layers, `vkconfig`, debug utils.
- **RenderDoc** / **Nsight Graphics** — frame debuggers.
# References
- [Khronos Vulkan](https://www.khronos.org/vulkan/)
- [vulkan-tutorial.com](https://vulkan-tutorial.com/) — best on-ramp, slightly out of date on sync
- [Vulkan Guide](https://vkguide.dev/) — modern, uses dynamic rendering and sync2
- _Writing an efficient Vulkan renderer_ — Zeux's blog post, the one to read after the hello triangle
- [Sascha Willems' samples](https://github.com/SaschaWillems/Vulkan)
---
Back to [[Index|Notes]] · see also [[OpenGL - learning log]] · [[Compute shaders]] · [[Spatial acceleration structures]]