My own little DirectX FAQ

Wednesday, July 24, 2002
Why ZBIAS is not a good thing

ZBIAS looks great at first sight. It's a little tweak to your Z values that stops Z-fighting when things are coplanar but don't actually share vertices (e.g. posters stuck on walls). However, it's a problem. The actual behaviour of ZBIAS is not well defined - the number you feed in has no absolute meaning, and each card is free to interpret it any way it likes. So a ZBIAS of 3 may work in a particular scene on card A. Card B only needs a ZBIAS of 1 to work. However, card C still Z-fights, so you need to increase the number to 7. OK, so you set it to 7 on everything. Except now that's far too high for card B, and when a person walks past the poster on the wall, the poster is drawn in front of the person! Argh. This is not a mythical example - it actually happens on two very common cards: the value that works for one is far too high for the other, and vice versa.

For this reason, many cards simply do not support ZBIAS. This is actually a Good Thing :-)

The safest option is not to use ZBIAS. The easy alternative is to push your near and far clip planes away from the camera a bit when rendering things that need biasing towards the camera. This will not change where on the screen the points are rendered, it will simply move those objects slightly nearer the camera in the Z buffer. The advantage of this method is that it works the same way on all cards, and the same bias works consistently (within a small factor because of slightly variable Z-precision). It is not very expensive to change the projection matrix (no more than any other matrix change), and if you compute and store two projection matrices (one biased, one normal) each time the viewpoint changes (which isn't very often), there's no per-object recalculation needed - you just send the correct matrix to the card.

ZBIAS is not implemented the same way on any two cards, and is not supported at all by some. Moving the clip planes is implemented the same way everywhere, and is supported by everything.

Wednesday, July 03, 2002
What's the best way to do state changes?

Remember - there are three sides to this question. (1) what the app does, (2) what the D3D layer does and (3) what the videocard driver does. All affect performance in different ways. A few facts:

  • On a PURE device, SRS and STSS calls go straight to the driver (or rather, they go into the D3D command-buffer that goes to the driver).

  • On a non-PURE device, they are filtered by D3D. Redundant calls (i.e. changes to the same value) get filtered. Non-redundant values are added to the command-buffer.

  • On almost all drivers, when they get a state change command, they simply remember it. When they get a DIP-style command (DrawIndexedPrimitive and friends), then they think about actually setting states and so on.

  • I believe all IHVs now handle state blocks well. Older nVidia drivers did support state blocks natively, but with an eccentric and slow implementation - be careful of them.

  • On drivers that don't handle state blocks natively, the D3D runtime expands them internally into multiple SRS and STSS calls, filters redundant ones, and adds the others to the command-stream.

  • On drivers that handle state blocks natively, on a PURE device, the D3D runtime just puts the "SetStateBlock" command into the command buffer. It does not snoop it for redundant states.

  • On drivers that handle state blocks natively, on a non-PURE device, I think the D3D runtime snoops the state block command for the renderstates it contains, updates its internal settings (so it can filter out future redundant states) and passes the SetStateBlock command into the command buffer.

  • Drivers that handle state blocks natively will either expand them into SRS and STSS states internally (only slightly more efficient than reading them from the command-stream), or will do UltraCunningThings that minimise bus traffic and CPU time and all that malarkey. Note that redundant states stored in the state block will be vaped very quickly in either case.

    From these facts we can see that:

  • Redundant state sets get vaped by the D3D layer, except on a PURE device. But you still cost yourself an app->D3D call.

  • Redundant state sets and multiple state sets (i.e. the same state, set several times, to different values) get vaped by the driver. But you cost yourself an app->D3D call and an entry in the command-buffer (a tiny cost).

  • All states are set at the same time in the driver. For this reason, most IHVs say that changing a single state is expensive, but changing lots of states at the same time is not much more expensive.

  • Using state blocks at worst saves you multiple SRS and STSS calls. At best, it goes really fast.

    So, personally I use slightly-wrappered state blocks. If I detect that the driver is shockingly slow at state blocks (some old nVidia drivers are), then I switch to emulating them myself. Essentially, I do what the D3D runtime does for drivers that don't support them - I expand them out into SRS and STSS calls (culling redundant calls of course). This is extremely fast to do - it is faster than the conventional wrapping of SRS and STSS calls that people do, because all the setting/culling is done in one single tight loop, not spread all over the code.

    I have a big chunk of code that shows how to effectively capture and replay state to/from state blocks. So you can set your states up conventionally using SRS/STSS calls, then capture them into state blocks (done at start of day). This code can also serve as the replay of state blocks when the driver doesn't like you just calling SetStateBlock, although in practice I use a much tighter loop with redundancy checking, but it is a very similar structure. At the moment you can find it in the DirectX mailing list archives here (if that doesn't work, just search for the word "ADDRENDERSTATE" in the archives). I'll put it somewhere more sensible in time.

    (SRS = SetRenderState, STSS=SetTextureStageState. There are other state-setting calls as well such as SetTransform - include them in the above comments).