Why do ZWrite, ZTest, and ColorMask simplifications not optimize shader performance?

I figured my shaders would run faster if I cut down the rendering work wherever possible. The supposed optimizations I tried were:

ZWrite Off
ZTest Always
ColorMask RGB

But none of them seem to have any positive impact. On my MacBook Pro, there isn't any negative impact, either.

However, I found that using Colormask RGB is 20-40% SLOWER than using standard RGBA on my iPod touch.

If you're trying to use the Geometry Queue, using Zwrite Off/Ztest Always is slower than not using them, which makes sense, given the Tile-Based Deferred Rendering. (I don't even know if your shader still uses the Geometry queue if you use these commands.) However, if you don't have overlapping objects, I would think these two commands should have a benefit, but they don't seem to.

So why are these commands not helpful? It would make sense to me that the shader could run faster if the GPU has to do less work.
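For reference, here is a minimal sketch of where these three commands sit in a ShaderLab pass. The shader name, the `_Color` property, and the fixed-function `Color` command are illustrative assumptions, not taken from the shader actually being tested. Note that the render queue comes from the `Queue` tag (Geometry by default), so adding these commands does not by itself move the shader out of the Geometry queue.

```shaderlab
// Hypothetical test shader; the name and property are placeholders.
Shader "Hypothetical/RenderStateTest" {
    Properties {
        _Color ("Main Color", Color) = (1,1,1,1)
    }
    SubShader {
        // The queue is chosen here, independently of the render states below.
        Tags { "Queue" = "Geometry" }
        Pass {
            ZWrite Off      // do not write depth
            ZTest Always    // never reject a fragment on depth
            ColorMask RGB   // write red, green, and blue; leave alpha untouched
            Color [_Color]  // legacy fixed-function solid color
        }
    }
}
```

Removing the three render-state lines gives the baseline to compare against, since the defaults are ZWrite On, ZTest LEqual, and ColorMask RGBA.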

I think your problem is that these features aren't actually supposed to be used as optimisations. They are there as features, to allow you more flexibility when designing your shaders, and to allow you to achieve effects that you wouldn't be able to if they didn't exist.

So, while your hunch may be that certain settings offer improved performance, that is not the main purpose of these features. The inner workings of the GPU and the shader compiler are complicated, and there is already so much optimisation going on under the hood that it's difficult to guess whether a given feature causes less or more work for the GPU!

Turning writes/tests on or off can affect performance in a positive or negative way. Turning color writes off (completely), for example, can speed up rendering a lot - but only if the rendering was the bottleneck!

In other words, maybe the bottleneck is somewhere else entirely? E.g. you're limited by the CPU?
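Turning color writes off completely corresponds to `ColorMask 0`. As a rough sketch (the shader name is a placeholder), a depth-only pass would look something like this:

```shaderlab
// Hypothetical depth-only shader; the name is a placeholder.
Shader "Hypothetical/DepthOnly" {
    SubShader {
        Tags { "Queue" = "Geometry" }
        Pass {
            ColorMask 0   // write no color channels at all
            ZWrite On     // still fill the depth buffer
        }
    }
}
```

Whether this shows up as a higher framerate still depends on the GPU's pixel output being the limiting factor in the first place.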

As a side note, turning off alpha writes but leaving RGB writes on is never an optimization. The GPU still writes more or less "complete pixels" and just leaves alpha untouched. The iPhone has a rather quirky architecture where this can even be slower (another iPhone-specific quirk: alpha blending is faster than alpha testing, whereas on most other platforms it's the other way around).
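To make the blending versus testing comparison concrete, here is a hedged sketch using legacy fixed-function ShaderLab. The shader name, the `_MainTex` property, and the 0.5 cutoff are assumptions chosen for illustration.

```shaderlab
// Hypothetical shader for comparing the two approaches; names are placeholders.
Shader "Hypothetical/BlendVersusTest" {
    Properties {
        _MainTex ("Base (RGB) Alpha (A)", 2D) = "white" {}
    }
    SubShader {
        Tags { "Queue" = "Transparent" }
        Pass {
            // Alpha blending: mix the fragment with what is already in the framebuffer.
            ZWrite Off
            Blend SrcAlpha OneMinusSrcAlpha

            // Alpha-tested alternative: replace the two lines above with the
            // legacy command below, and typically use the "AlphaTest" queue.
            // AlphaTest Greater 0.5

            SetTexture [_MainTex] { combine texture }
        }
    }
}
```

Profiling both variants on the target device is the only reliable way to see which one is cheaper there.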

I did a quick test and came to the same result. The ColorMask setting has no impact, and turning off ZWrite and/or setting ZTest to Always lowers the framerate (because more polygons are drawn). For non-overlapping objects, the performance always stays the same...

So why are these commands not helpful? It would make sense to me that the shader could run faster if the GPU has to do less work.

I am not sure about this, but perhaps the writing and checking of the ZBuffer is a dedicated pipeline stage and lies idle if there is nothing to do?