Anti-Lag+ is AMD’s answer to NVIDIA Reflex. But how does it compare to Reflex, and why did AMD remove Anti-Lag+ so quickly after its launch?
The problem is that it’s not technically correct. The GPU can’t bottleneck the CPU because the pipeline goes one way. Nothing prevents the CPU from drawing as many frames as it wants. What happens in most game engines is that the engine checks whether the GPU is ready to accept another frame, and if it’s not, it tells the CPU to chill a bit. Not all games do that properly, though: in NFS Undercover, for example, the rendering thread sticks to 100% at all times, or at least it did last time I checked. Saying “GPU bound” is more technically correct, as “bottleneck” implies that there is some other PC part further down the pipeline, which isn’t the case for the GPU: the GPU is the last part. But I’m just being pedantic here; objectively, Battle(non)sense indeed does a great job at explaining things to people in simple terms.
It’s more complicated than that. Yes, the physical pipeline ends at the GPU, since the frame just sits in the GPU until the OS is ready to present it, but the logical pipeline loops back to the CPU, since the CPU then moves on to the next frame in the render queue, which may or may not be available. Ideally it would simply be available: the GPU has finished rendering that frame and the OS has finished presenting it, which gives the CPU free rein over it. But it may be in a present-pending state, where it’s waiting for the OS to present it, or it may be in a currently-rendering state, where the GPU is actively rendering it.
If the frame is in a currently-rendering state then the CPU cannot use that frame since that frame’s resources are being actively used by the GPU and trying to access those resources leads to a very bad time, so the CPU has to try another frame. If the frame is in a present-pending state then the CPU can use it so long as vsync is disabled and screen tearing is acceptable, as that frame’s resources aren’t being actively used anymore and the OS generally allows reusing a present-pending frame (after all, that’s why vsync is typically an option and not mandatory).
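To make the three states concrete, here is a rough Python sketch of the decision described above. The names (`FrameState`, `cpu_can_use`, `pick_frame`) are made up for illustration and don't correspond to any real graphics API; it only models the rule "available is fine, present-pending is fine if tearing is acceptable, currently-rendering is off limits".

```python
from enum import Enum, auto
from typing import List, Optional


class FrameState(Enum):
    AVAILABLE = auto()            # GPU finished and the OS has presented it
    PRESENT_PENDING = auto()      # GPU finished, waiting for the OS to present
    CURRENTLY_RENDERING = auto()  # GPU is still working on it


def cpu_can_use(state: FrameState, vsync: bool) -> bool:
    """Return True if the CPU may start working on this frame slot."""
    if state is FrameState.AVAILABLE:
        return True
    if state is FrameState.PRESENT_PENDING:
        # Reuse is only acceptable when tearing is allowed (vsync off).
        return not vsync
    # CURRENTLY_RENDERING: the GPU still owns the frame's resources.
    return False


def pick_frame(states: List[FrameState], vsync: bool) -> Optional[int]:
    """Index of the first usable slot, or None if the CPU has to wait."""
    for index, state in enumerate(states):
        if cpu_can_use(state, vsync):
            return index
    return None
```

With one slot rendering and one present-pending, `pick_frame` returns the present-pending slot when vsync is off and `None` (CPU has to wait) when vsync is on.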
If the CPU is sufficiently far ahead of the GPU then it will always eventually hit a wall where it tries to use a currently-rendering frame, has no other frames it can use, and is forced to sit idle. If you’re on newer APIs such as Vulkan or DirectX 12 then you can bypass this somewhat by using the mailbox presentation mode (not sure what the name is under DirectX 12, but that’s the name under Vulkan) to at least tell the OS that you intend to ping-pong between two different frames in a triple-buffer setup, which lets the CPU ping-pong between those two frames while the GPU is busy rendering its currently-rendering frame. Things get exponentially more complicated under DirectX 12 and Vulkan, however, as the engine itself is now responsible for building and managing the render queue; the API/driver/OS just handles the presentation side of things.
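The "wall" can be shown with a toy timeline. The sketch below is a simplified model, not how any driver actually schedules work: it assumes fixed per-frame CPU and GPU costs, a fixed number of frame slots, and that slot `i % slots` can only be reused once the GPU has finished the frame that last used it. `simulate` and its parameters are hypothetical names.

```python
def simulate(cpu_ms: float, gpu_ms: float, slots: int, frames: int):
    """Return (total CPU idle time, time the last frame finishes on the GPU)."""
    gpu_done = []       # completion time of each frame on the GPU
    cpu_idle = 0.0      # total time the CPU spent waiting for a free slot
    cpu_free_at = 0.0   # when the CPU finished its previous frame

    for i in range(frames):
        # The slot is free once the GPU finished the frame that last used it.
        slot_free_at = gpu_done[i - slots] if i >= slots else 0.0
        start = max(cpu_free_at, slot_free_at)
        cpu_idle += start - cpu_free_at
        cpu_free_at = start + cpu_ms

        # The GPU starts when the CPU work is done and the GPU is free.
        gpu_free_at = gpu_done[-1] if gpu_done else 0.0
        gpu_done.append(max(cpu_free_at, gpu_free_at) + gpu_ms)

    return cpu_idle, gpu_done[-1]
```

With a fast CPU and a slow GPU (say `simulate(2, 10, 3, 100)`) the CPU runs ahead for the first three frames and then idles every frame; swap the costs and the idle time drops to zero.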
What do you mean by “frame may not be available” for CPU? I assumed CPU creates frames. And then “CPU cannot use that frame”. Did you mean to say “frame buffer”?
What do you mean by “frame’s resources”?
Isn’t “the wall” render queue limit typically?
I guess mailbox presentation mode is LIFO-queued triple buffering. What you described sounds like the CPU is filling frame buffers with some data that might or might not be later used by the GPU, but I assumed it’s the GPU that creates and fills frame buffers with data. Are you sure it has anything to do with the CPU’s job?
In an unlocked-framerate scenario with no VSync, when the GPU is at 99% usage, CPU usage drops in most games because the render queue is full. That, however, is not the case for some games, like NFS Undercover. How specifically does this process happen in such a scenario, or what tells the CPU to wait instead of drawing more frames?
What do you mean by “frame may not be available” for CPU? I assumed CPU creates frames. And then “CPU cannot use that frame”. Did you mean to say “frame buffer”?
I meant the render queue, of which the framebuffer/swapchain is a part.
What do you mean by “frame’s resources”?
In this case I mean GPU resources that the CPU may need to access. Think uniform buffers that pipe game state information to the shaders, textures that hold animations that update each frame, vertex/index buffers that hold mesh data that updates each frame, etc. Each frame typically has to be given its own set of these resources so that the CPU updating the resources for frame N doesn’t change or potentially corrupt the resources that the GPU is actively using for frame N-1.
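As a rough illustration of "each frame gets its own set of resources", here is a small Python sketch. Everything in it is hypothetical (`FRAMES_IN_FLIGHT`, `FrameResources`, the `camera_x` uniform); a plain dict stands in for a uniform buffer. The point is only that frame N and frame N-1 index into different sets, so a CPU write for N can't touch what the GPU reads for N-1.

```python
FRAMES_IN_FLIGHT = 2  # assumed value, picked only for the example


class FrameResources:
    """Stand-in for one frame's uniform buffers, dynamic textures, etc."""

    def __init__(self):
        self.uniforms = {}


# One independent resource set per frame that can be in flight.
resources = [FrameResources() for _ in range(FRAMES_IN_FLIGHT)]


def resources_for(frame_number: int) -> FrameResources:
    """Frames cycle through the sets, so N and N-1 never share one."""
    return resources[frame_number % FRAMES_IN_FLIGHT]


def cpu_update(frame_number: int, camera_x: float) -> None:
    """CPU side: write game state for the frame being prepared."""
    resources_for(frame_number).uniforms["camera_x"] = camera_x


def gpu_read(frame_number: int) -> float:
    """GPU side: read the state recorded for the frame being rendered."""
    return resources_for(frame_number).uniforms["camera_x"]
```

If the CPU calls `cpu_update(1, 2.0)` while the GPU is still on frame 0, `gpu_read(0)` still sees whatever was written for frame 0. The set only gets reused at frame 2, which is why the CPU has to wait for frame 0 to finish before touching it.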
Isn’t “the wall” render queue limit typically?
Yes and no, depends on how well the CPU and GPU stay in sync with each other.
I guess mailbox presentation mode is LIFO-queued triple buffering. What you described sounds like the CPU is filling frame buffers with some data that might or might not be later used by the GPU, but I assumed it’s the GPU that creates and fills frame buffers with data. Are you sure it has anything to do with the CPU’s job?
Yes, since it basically lets the CPU bounce between two available/present-pending frames while it waits for a currently-rendering frame to clear. This way the CPU never sits idle, it’s just constantly overwriting previously recorded command lists and previously updated resources that haven’t been picked up by the GPU yet.
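A toy model of that "newest frame wins" behaviour, in Python. This is not the Vulkan API (the real mode is `VK_PRESENT_MODE_MAILBOX_KHR` and it works on swapchain images); the `Mailbox` class and its methods are invented for the sketch and only capture the idea that submitting never blocks and a frame that wasn't picked up gets overwritten.

```python
class Mailbox:
    """Single pending slot: a new submission replaces an unconsumed one."""

    def __init__(self):
        self._pending = None
        self.dropped = 0  # frames that were overwritten before being used

    def submit(self, frame_id):
        """Producer side: never blocks; replaces any frame not yet picked up."""
        if self._pending is not None:
            self.dropped += 1
        self._pending = frame_id

    def acquire(self):
        """Consumer side: take the newest pending frame, or None if empty."""
        frame_id, self._pending = self._pending, None
        return frame_id
```

Submitting frames 0 through 4 back to back and then calling `acquire()` hands over frame 4 only; the other four count as dropped, which is the work the CPU did that never reached the screen.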
In an unlocked-framerate scenario with no VSync, when the GPU is at 99% usage, CPU usage drops in most games because the render queue is full. That, however, is not the case for some games, like NFS Undercover. How specifically does this process happen in such a scenario, or what tells the CPU to wait instead of drawing more frames?
Normally what tells the CPU to wait is the API/system call that tells the render queue to present the current frame and swap to the next frame. In older APIs it’s a lot more nebulous, so I can’t tell you exactly why NFS Undercover does that, but my guess would be that either the CPU and GPU are close enough to not exhaust the render queue quickly, or the API is detecting that some usage pattern lets the CPU access resources that are in use by the GPU in some places in the pipeline.
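Here is a small Python sketch of a present-style call putting the CPU to sleep, using a bounded `queue.Queue` as a stand-in for the render queue. The `run` function, the queue depth and the sleep used as "GPU work" are all assumptions for the example; real drivers do this with their own synchronization, not Python queues.

```python
import queue
import threading
import time


def run(frames: int, depth: int, gpu_seconds: float):
    """Return (frames in presentation order, times the CPU had to block)."""
    render_queue = queue.Queue(maxsize=depth)
    presented = []

    def gpu():
        for _ in range(frames):
            frame = render_queue.get()
            time.sleep(gpu_seconds)  # pretend to render the frame
            presented.append(frame)

    worker = threading.Thread(target=gpu)
    worker.start()

    blocked_puts = 0
    for frame in range(frames):
        try:
            render_queue.put_nowait(frame)  # queue has room: no waiting
        except queue.Full:
            blocked_puts += 1
            render_queue.put(frame)  # blocks: the CPU thread sleeps here

    worker.join()
    return presented, blocked_puts
```

The blocking `put` is the "chill a bit" part: the thread is parked by the OS until a slot frees up, so CPU usage drops. A game that polls in a tight loop instead of blocking would show a render thread pinned at 100%, which is one possible (unconfirmed) explanation for the NFS Undercover behaviour.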
No problem. I left out some of the more complicated details and simplified others, so if you want to learn more I’d recommend looking into how Vulkan’s command buffers, device queues, and fence/semaphore resources work, which are all part of the logical side of the render queue, as well as how Vulkan’s swapchain works for the frame presentation side of the render queue. Vulkan and DirectX 12 both expose quite a lot of how the render queue works, so they can shed some light on what the driver has to do behind the scenes for DirectX 11 and OpenGL.
aside from the antilag stuff:
this is an excellent visual representation of CPU & GPU bottlenecks. what they are and how it occurs.
most people don’t understand it well enough
This raises some questions.
Thanks for taking your time to explain all this!