Vulkan - learning log

# ==Vulkan==, learning log

<p class="doc-sub">// status: seedling</p>

Vulkan is OpenGL's antithesis: no global state, no implicit synchronisation, no driver-side memory management, no shader-string compilation. You write a thousand lines of setup before the first triangle. In return you get predictability, multi-threading, and performance that doesn't depend on knowing which driver bugs to work around. Compare with [[OpenGL - learning log]].

# The big mental shift

OpenGL: "please draw something, driver, figure it out."

Vulkan: "I'm telling you what hardware to use, which memory to allocate, when to flush caches, and when to wait for the previous frame to finish." The driver becomes a thin translator.

Concretely, you become responsible for:

- Picking a physical device and queue families.
- Allocating GPU memory and sub-allocating from it.
- Layout transitions of images between uses.
- Synchronisation between queue submissions and within them.
- Tracking per-frame-in-flight resources.
- Compiling shaders to SPIR-V ahead of time.

This sounds terrible. It's actually great once the setup is done, because nothing is hidden.

# The object zoo

In rough order of creation for a "hello triangle":

- **Instance** — the Vulkan loader handle; holds layers and extensions.
- **Physical device** — a specific GPU visible to the system.
- **Logical device** — your opened session with it; creates everything else.
- **Queue** — where commands are submitted. Graphics, compute, transfer, present — not all on the same queue family on all hardware.
- **Surface** — platform-specific window integration (via `VK_KHR_surface` + the OS-specific extension).
- **Swapchain** — the set of images you render into and present.
- **Render pass / dynamic rendering** — declares the attachments and subpasses used by a pipeline. Dynamic rendering (core in 1.3) removes most of this ceremony.
- **Pipeline** — a baked, immutable state object: shaders, vertex input, blending, depth, the lot.
  Creating one is expensive; cache and reuse them.
- **Descriptor sets** — how you bind resources (buffers, images) to shaders. Allocated from a descriptor pool, organised by set layouts.
- **Command buffer** — record once and resubmit, or re-record each frame. The thing that actually carries work to the GPU.
- **Semaphores / fences / barriers** — synchronisation primitives (more below).

# Memory

No more `glBufferData(... GL_STATIC_DRAW)`. You:

1. `vkCreateBuffer` / `vkCreateImage` — get a handle with no backing storage.
2. `vkGetBufferMemoryRequirements` (or `vkGetImageMemoryRequirements`) — ask what size, alignment, and memory types it accepts.
3. `vkAllocateMemory` from a memory type with the right properties (`DEVICE_LOCAL`, `HOST_VISIBLE`, `HOST_COHERENT`…).
4. `vkBindBufferMemory` (or `vkBindImageMemory`) — attach.

In practice, nobody does this per-resource. You use the [Vulkan Memory Allocator (VMA)](https://gpuopen.com/vulkan-memory-allocator/) — AMD's open-source sub-allocator — or write your own. VMA is the default choice and more or less mandatory for any real project.

Memory properties you'll see:

- `DEVICE_LOCAL` — on the GPU, fast for GPU access, not mappable by the CPU on its own. Use for most resources.
- `HOST_VISIBLE | HOST_COHERENT` — mappable, coherent without manual flushes. Staging buffers.
- `HOST_VISIBLE | HOST_CACHED` — mappable + CPU-cache-friendly readback. Needs `vkInvalidate...` / `vkFlush...`.
- Unified-memory GPUs (integrated, Apple) expose `DEVICE_LOCAL | HOST_VISIBLE` — fast path on those.

# Synchronisation

The single hardest thing about Vulkan.

- **Semaphores** — GPU↔GPU synchronisation across queue submits. Acquire signals one and the render submit waits on it; that submit signals another and present waits on it.
- **Fences** — GPU→CPU; the CPU waits for a fence to know a submission finished. Used to reuse per-frame resources.
- **Pipeline barriers** — in-queue ordering and cache flushes. Express "after these stages, before those stages, also transition this image's layout".
- **Events / timeline semaphores** — finer-grained; timeline semaphores are the newer of the two (`VK_KHR_timeline_semaphore`, core in 1.2).
  Timeline semaphores mostly replace both binary semaphores and fences for new code; swapchain acquire and present still take binary semaphores.

Barriers are the knife's edge. Over-synchronise and performance dies. Under-synchronise and you get undefined behaviour that looks right on your machine and flickers on every other GPU.

`VK_LAYER_KHRONOS_validation` catches most sync mistakes once its synchronization validation is switched on (it is off by default). Run it always in dev.

# Shaders and pipelines

Shaders are ingested as **SPIR-V** — a stable bytecode. The common flow is GLSL → `glslang` → SPIR-V, or HLSL → DXC → SPIR-V. Both are fine. HLSL has nicer ergonomics for modern features; GLSL is the Vulkan-native dialect.

A graphics pipeline bakes:

- All shader stages (SPIR-V modules + entry points + specialisation constants).
- Vertex input layout (or none, if you pull from buffers manually).
- Rasterizer state, depth/stencil, blend state.
- Viewport/scissor (or mark as dynamic).
- The render pass / rendering info the pipeline is compatible with.

This is a _lot_ of state baked into an object. To avoid paying that cost over and over, use `VkPipelineCache` (save to disk between runs) and the newer `VK_EXT_graphics_pipeline_library` / `VK_EXT_shader_object`. The former lets you compile pipeline pieces independently and link them quickly; the latter binds shaders at draw time without a monolithic pipeline. Both reduce stutter massively.

# Descriptor sets and bindless

Descriptor sets bind resources to shaders. Three main approaches today:

- **Traditional descriptor sets** — allocate per frame, update explicitly. Verbose but explicit.
- **Push descriptors** (`VK_KHR_push_descriptor`) — small, immediate, no pool churn. Great for per-draw bindings.
- **Descriptor indexing / bindless** (`VK_EXT_descriptor_indexing`, core in 1.2) — one giant descriptor set with thousands of textures, shaders index into it. How modern renderers avoid descriptor-update overhead entirely.

The last of these is how AAA engines ship: fill a big array once, index with a `uint` per draw, make draws cheap.
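The "fill a big array once, index with a `uint`" idea needs something on the CPU side to decide which index each texture gets. Below is a minimal sketch of such a slot allocator, kept free of Vulkan calls so it stands alone. `BindlessSlotAllocator` and its methods are hypothetical names for illustration, not part of any Vulkan API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not a Vulkan type): hands out indices into one large
// descriptor array and recycles the ones that have been released.
class BindlessSlotAllocator {
public:
    explicit BindlessSlotAllocator(std::uint32_t capacity) : capacity_(capacity) {}

    // Returns a free index, or std::nullopt when the table is full.
    std::optional<std::uint32_t> acquire() {
        if (!freeList_.empty()) {
            const std::uint32_t slot = freeList_.back();
            freeList_.pop_back();
            return slot;
        }
        if (next_ < capacity_) {
            return next_++;
        }
        return std::nullopt;
    }

    // Makes a slot reusable. The caller must make sure no in-flight frame
    // still reads it, for example by waiting on that frame's fence first.
    void release(std::uint32_t slot) {
        assert(slot < next_);  // only indices that were handed out can come back
        freeList_.push_back(slot);
    }

    // Number of slots currently handed out.
    std::uint32_t liveCount() const {
        return next_ - static_cast<std::uint32_t>(freeList_.size());
    }

private:
    std::uint32_t capacity_;
    std::uint32_t next_ = 0;               // first never-used index
    std::vector<std::uint32_t> freeList_;  // released indices, reused LIFO
};
```

In a real renderer, each acquired slot would presumably be written with `vkUpdateDescriptorSets` (using the slot as `dstArrayElement`) and passed to the shader as a push constant or per-draw data. That wiring is left out here.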
# Frames in flight

The canonical double/triple buffering pattern:

```
frameIndex = (frameIndex + 1) % FRAMES_IN_FLIGHT;
// block until the GPU has finished the submission that last used this slot
vkWaitForFences(device, 1, &inFlight[frameIndex], VK_TRUE, UINT64_MAX);
vkAcquireNextImageKHR(...);  // signals imageAvailable[frameIndex]
// reset only once a submit is certain, or the next wait on this fence never returns
vkResetFences(device, 1, &inFlight[frameIndex]);
// record command buffer for frameIndex
vkQueueSubmit(... waitSemaphores:   imageAvailable[frameIndex],
                  signalSemaphores: renderFinished[frameIndex],
                  fence:            inFlight[frameIndex]);
vkQueuePresentKHR(... waitSemaphores: renderFinished[frameIndex]);
```

Each in-flight frame owns its own command buffer, its own descriptor resources, its own uniform buffer suballocation. The fence prevents the CPU from getting too far ahead of the GPU. Create the fences already signalled (`VK_FENCE_CREATE_SIGNALED_BIT`) so the first wait on each one returns immediately.

# Things that tripped me up

- **Layouts** — images are in some layout (`UNDEFINED`, `COLOR_ATTACHMENT_OPTIMAL`, `SHADER_READ_ONLY_OPTIMAL`, `PRESENT_SRC_KHR`…). You transition them with barriers. Forget one and you get garbled output or a validation error novel.
- **Y axis and clip space** — Vulkan's clip-space Y points down vs OpenGL's up. Either flip the viewport with a negative height (`VK_KHR_maintenance1`, core since 1.1) or flip in the shader.
- **Negative viewport height is a feature, not a bug** — it's how you get OpenGL-style Y-up without touching the shader.
- **Device loss happens** — on mobile and under TDR on Windows. You need a recovery path or at least a clean crash.
- **`vkCmdPipelineBarrier2`** — the new barrier API (`VK_KHR_synchronization2`, core in 1.3) is clearer and more expressive. New code should use it. Old tutorials use the legacy one.
- **Don't allocate inside the frame** — command pools, descriptor sets, memory. Allocate up front or in background threads; reuse ring-buffer style.
- **Tooling is worth the setup** — validation layers, RenderDoc, Nsight, AMD Radeon GPU Profiler. The explicitness that makes Vulkan painful also makes the tooling unusually informative.

# A sensible starter stack

- **GLFW** or **SDL2** — window + surface creation.
- **VMA** — memory.
- **volk** — meta-loader that fetches Vulkan entry points at runtime, so you don't link against the loader library and device-level calls can skip its dispatch overhead.
- **glslang** / **DXC** — shader compilation.
- **Dear ImGui** + `imgui_impl_vulkan` — debug UI.
- **Vulkan SDK** — includes validation layers, `vkconfig`, debug utils.
- **RenderDoc** / **Nsight Graphics** — frame debuggers.

# References

- [Khronos Vulkan](https://www.khronos.org/vulkan/)
- [vulkan-tutorial.com](https://vulkan-tutorial.com/) — best on-ramp, slightly out of date on sync
- [Vulkan Guide](https://vkguide.dev/) — modern, uses dynamic rendering and sync2
- _Writing an efficient Vulkan renderer_ — Zeux's blog post, the one to read after the hello triangle
- [Sascha Willems' samples](https://github.com/SaschaWillems/Vulkan)

---

Back to [[Index|Notes]] · see also [[OpenGL - learning log]] · [[Compute shaders]] · [[Spatial acceleration structures]]