This topic has grown on me over the years as I have seen shader code on slides at conferences, by brilliant people, where the code could have been written in a much better way. Occasionally I hear a “this is unoptimized” or “educational example” attached to it, but most of the time this excuse doesn't hold. I sometimes sense that the author may use “unoptimized” or “educational” as an excuse because they are unsure how to make it right. And then again, code shipping in SDK samples from IHVs isn't always doing it right either.
When the best of the best aren't doing it right, then we have a problem as an industry.
(x - 0.3) * 2.5 = x * 2.5 + (-0.75)
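The identity above can be sketched on the CPU side. This is a minimal illustration in C++, not shader code from the talk: the function names `scale_bias_add_mul` and `scale_bias_mad` are hypothetical, and `std::fma` stands in for a GPU multiply-add instruction. The assumption is that the target hardware offers a multiply-add but no add-then-multiply, so the first form costs two dependent operations while the second maps to one.

```cpp
#include <cmath>

// Add followed by multiply, as written on the left-hand side of the identity.
// Assumption: with no add-then-multiply instruction available, this takes
// two dependent operations (a subtract, then a multiply).
inline float scale_bias_add_mul(float x) {
    return (x - 0.3f) * 2.5f;
}

// The same math rearranged into multiply-add form, x * a + b, with the
// constant -0.3 * 2.5 = -0.75 folded ahead of time.
// std::fma is a fused multiply-add (one rounding step); a GPU MAD may round
// differently, so results can differ in the last bits.
inline float scale_bias_mad(float x) {
    return std::fma(x, 2.5f, -0.75f);
}
```

Both functions agree to within floating-point rounding, for example at x = 1.0 both give approximately 1.75. The rearrangement is not bit-exact in general, which is one reason a compiler may not be allowed to do it for you.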
Assembly languages are dead. The last time I used one was 2003. Since then it has been HLSL and GLSL for everything. I haven't looked back.
Shading has of course evolved, and it is a natural development that we are seeing higher-level abstractions as we move along. Nothing wrong with that. But as the gap between the hardware and the abstractions we work with widens, there is an increasing risk of losing touch with the hardware. If we only ever see the HLSL code, but never see what the GPU runs, this will become a problem. The message in this presentation is that maintaining a low-level mindset while working in a high-level shading language is crucial for writing high-performance shaders.
This is a clear illustration of why we should bother with low-level thinking. With no other change than moving things around a little and adding some parentheses, we achieved a substantially faster shader. This is enabled by understanding the underlying HW and how HLSL constructs map to it.
The HW used in this presentation is a Radeon HD 4870 (selected because it features the most readable disassembly), but almost everything in this slide deck is general and applies to any GPU unless stated otherwise.
Hardware comes in many configurations