There’s little information out there on SV_Coverage (gl_SampleMaskIn & gl_SampleMask in GL lingo; gl_SampleMask was introduced by ARB_sample_shading and gl_SampleMaskIn by ARB_gpu_shader5).
And I got bitten by it over and over again. I guess part of the reason is that MSAA has often been treated as a magic black box, and most uses of SV_Coverage are advanced topics.
So I figured I’d write a blogpost so others don’t make the same mistakes. This post assumes you’re moderately familiar with how MSAA works and you’re looking for ways to use SV_Coverage.
Myth: SV_Coverage forces per subsample rendering
SV_Coverage does not force per-subsample rendering. I think the main source of this myth is Crytek’s presentation, where they say “Avoid default SV_COVERAGE, since it results in redundant processing on regions not requiring MSAA”.
I thought they meant you should avoid SV_Coverage because it would force per-subsample rendering (i.e. redundant processing).
But what they really meant is that SV_Coverage overestimates: it flags the borders of every triangle, while we only need to apply MSAA to the borders of surfaces. That causes a lot of edges to be flagged as requiring MSAA when it’s not really necessary. If two triangles are connected and very similar, there’s no need to resolve the edge between them.
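To make the overestimation concrete, here is a tiny CPU-side model in Python (not shader code). The function name is_edge_pixel and the example masks are made up for illustration. It assumes 4xMSAA and the naive rule "any partially covered pixel needs MSAA", which is roughly what you get if you use the raw SV_Coverage value to flag pixels:

```python
def is_edge_pixel(coverage, sample_count=4):
    """Naive classification: a pixel is 'edge' if the triangle misses any subsample."""
    full_mask = (1 << sample_count) - 1  # 0xF for 4xMSAA
    return coverage != full_mask


# Hypothetical pixel crossed by the shared edge of two triangles that belong
# to the same flat surface: one covers samples 0-1, the other samples 2-3.
triangle_a = 0x3
triangle_b = 0xC

print(is_edge_pixel(triangle_a))  # True
print(is_edge_pixel(triangle_b))  # True
print(is_edge_pixel(0xF))         # False (fully covered interior pixel)
```

Both triangles flag the pixel even though together they fill it with the same surface, so nothing there actually needs to be resolved per subsample.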
Do not confuse it with SV_SampleIndex/gl_SampleID, which does force per-subsample rendering.
Issue: Outputting to SV_Coverage disables early depth optimization
Modifying SV_Coverage means depth testing must be done after running the pixel shader. It is unclear whether just reading from SV_Coverage also disables EarlyZ. In theory it shouldn’t, but the compiler would have to analyze whether the value is actually written to. I guess that’s the reason GLSL preferred to split the value in two (gl_SampleMaskIn for reading & gl_SampleMask for writing).
Remember you can overcome this issue by enabling [earlydepthstencil] (HLSL) or early_fragment_tests (GLSL); but of course this may result in different output depending on what you’re doing.
Issue: SV_Coverage is always before depth & stencil tests
I screamed in rage when I found out about this. Regardless of whether you forced earlydepthstencil, SV_Coverage always returns the value of coverage before doing depth & stencil tests.
This means that if a triangle covers all 4 subsamples of a pixel (assuming 4xMSAA) but the depth test kills one of these subsamples, SV_Coverage will still return 0xF instead of e.g. 0xD.
NVIDIA’s Maxwell has a driver hack for D3D11/12 via NVAPI and a GL extension ARB_post_depth_coverage to enable post-Z coverage. It’s unclear if using this functionality comes at a performance cost.
A workaround, as suggested by Adam Miles, is to send Z and W from the vertex shader (you can’t use SV_Position.z because it’s not an actual interpolant), evaluate them per subsample with EvaluateAttributeAtSample/interpolateAtSample, sample the depth buffer, and perform the comparison yourself. This workaround only works if you already have depth buffer contents (i.e. you did a depth prepass) and depth writes are disabled (or you made a copy of the depth buffer).
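The logic of that manual comparison can be sketched on the CPU in Python. This is only a model of the per-subsample loop, not shader code. The function name post_depth_coverage and the depth values are hypothetical, and it assumes a LESS_EQUAL depth function with non-reversed Z. In a real shader, fragment_depths would come from Z/W evaluated at each subsample and stored_depths from the multisampled depth buffer:

```python
def post_depth_coverage(pre_mask, fragment_depths, stored_depths):
    """Rebuild a post-depth coverage mask with a manual per-subsample depth test."""
    post_mask = 0
    for s, (frag_z, stored_z) in enumerate(zip(fragment_depths, stored_depths)):
        covered = pre_mask & (1 << s)
        # Assumed depth function: LESS_EQUAL (smaller depth is closer)
        if covered and frag_z <= stored_z:
            post_mask |= 1 << s
    return post_mask


# The triangle covers all four subsamples, but subsample 1 is hidden behind
# closer geometry already stored in the depth buffer by the prepass.
pre_mask = 0xF
fragment_depths = [0.50, 0.50, 0.50, 0.50]
stored_depths = [0.50, 0.25, 0.50, 0.50]

print(hex(post_depth_coverage(pre_mask, fragment_depths, stored_depths)))  # 0xd
```

In practice the interpolated depth may not match the stored one bit for bit, so you may need a small bias in the comparison.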
Issue: You can’t make SV_Coverage cover more than what you’re already covering (i.e. override)
If the mask you got is 0x3, then you can only modify it to 0x2, 0x1 or 0x0, because the value you output is ANDed with the coverage you received. If you output for example 0x4, then likely nothing will be rendered.
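Here is a minimal Python model of that rule, assuming the output mask is simply ANDed with the incoming coverage (the function name effective_coverage is made up for illustration):

```python
def effective_coverage(input_mask, output_mask):
    """Model of the final coverage: the shader can only clear bits, never set new ones."""
    return input_mask & output_mask


input_mask = 0x3  # the triangle covers subsamples 0 and 1

print(hex(effective_coverage(input_mask, 0x2)))  # 0x2: dropping subsample 0 works
print(hex(effective_coverage(input_mask, 0x4)))  # 0x0: subsample 2 was never covered
print(hex(effective_coverage(input_mask, 0xF)))  # 0x3: asking for more changes nothing
```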
Again, NVIDIA Maxwell has driver hacks for D3D11/12 and a GL extension, NV_sample_mask_override_coverage, to arbitrarily override SV_Coverage.
So that basically covers everything I found out. The last one was the only one I had suspected, so it was the only one that wasn’t a surprise (although TBH the one about disabling EarlyZ should’ve been obvious). Neither the GL spec, the MSDN docs nor my other readings warned me about these gotchas. MJP’s blogposts were very useful, but that’s it. And they weren’t very specific to SV_Coverage.
Credit:
Thanks to
- Jesse Hall for telling me SV_Coverage is before depth/stencil.
- Matt Pettineo for his awesome MSAA explanations.
- Adam Miles for helping out with the workaround.
Until next time!