I am seeing counterintuitive iOS performance when rendering multiple meshes. I start with a single mesh of about 30K tris and 120K verts. Rendered as a single object, it takes about 3 ms on an iPad 1. If I dice it into 64 objects (8x8), it takes about 8 ms, and at 16x16 it takes about 22 ms (mostly in Render.OpaqueGeometry). This happens even though most of the diced object is culled, so far fewer tris and verts are actually rendered (16x16 renders an average of 8K tris and 15K verts, for instance).
I have tried turning dynamic batching on and off and marking the mesh as static; neither seems to make much difference. The number of draw calls is basically the same no matter how the mesh is diced. Culling alone couldn't possibly be the cause (how long can it take to cull 64 bounding boxes?).
I really want to dice up this object (and more like it), but performance tanks whenever I do. Has anyone seen this effect, or does anyone know of a workaround? (NOTE: I am not using animations or skinned meshes. This is just a single static mesh with about 17 submeshes.)
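As a rough back-of-the-envelope check on the iPad 1 timings above, here is a minimal Python sketch that works out the extra time per additional object. The helper name `marginal_cost_ms` is made up for illustration, and the calculation assumes the cost grows roughly linearly with object count, which is only an assumption.

```python
# Frame times reported above for iPad 1: object count -> milliseconds.
measurements = {1: 3.0, 64: 8.0, 256: 22.0}


def marginal_cost_ms(a, b, data=measurements):
    """Extra milliseconds per additional object between two object counts.

    Hypothetical helper: assumes cost scales linearly with object count.
    """
    return (data[b] - data[a]) / (b - a)


for a, b in [(1, 64), (64, 256), (1, 256)]:
    cost = marginal_cost_ms(a, b)
    print(f"{a:>3} -> {b:>3} objects: {cost:.3f} ms per extra object")
```

All three pairs come out between roughly 0.07 and 0.08 ms per extra object, so the slowdown looks like a fairly constant per-object cost rather than anything tied to the amount of geometry drawn.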
Update: This only happens with OpenGL ES 2.0. With ES 1.1, drawing fewer tris is faster, as expected.
Update 2: I tested on PC and also see odd results. I will try on the iPhone again tomorrow. You can see two distinct performance regions in the profiler snapshot on the PC below.
Test1: A single "static" mesh with no textures. 17 draw calls. 0 batches. 33K tris, 105K verts. 0.07 ms.
Test2: The mesh diced into 8x8 (64) objects. 28 draw calls. 125 batches. 14K tris, 23.5K verts. 0.4 ms.
The diced version renders fewer tris and verts (as expected because of view frustum culling), but it is much slower.
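To put the two PC tests side by side, here is a small Python sketch that compares them per thousand triangles. The `ProfilerSample` class and its field names are made up for illustration; the numbers are the ones from Test1 and Test2 above.

```python
from dataclasses import dataclass


@dataclass
class ProfilerSample:
    """One profiler reading (hypothetical container for the numbers above)."""
    name: str
    draw_calls: int
    batches: int
    tris_k: float   # thousands of triangles
    verts_k: float  # thousands of vertices
    ms: float       # render time in milliseconds

    def ms_per_k_tris(self):
        """Render time per thousand triangles."""
        return self.ms / self.tris_k


single = ProfilerSample("Test1 (single mesh)", 17, 0, 33.0, 105.0, 0.07)
diced = ProfilerSample("Test2 (8x8 diced)", 28, 125, 14.0, 23.5, 0.4)

slowdown = diced.ms / single.ms                 # total time ratio
tri_ratio = diced.tris_k / single.tris_k        # fraction of tris drawn
per_tri_ratio = diced.ms_per_k_tris() / single.ms_per_k_tris()

print(f"Diced draws {tri_ratio:.0%} of the tris but takes {slowdown:.1f}x as long")
print(f"Cost per 1K tris is about {per_tri_ratio:.1f}x higher when diced")
```

By these numbers the diced version draws well under half the triangles yet takes several times as long, so the cost per triangle is more than ten times higher than for the single mesh.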