Feature #3670: Performance Testing for 3.0
Test Performance for culling
I profiled on an iPhone 4S; the results are here: https://docs.google.com/spreadsheet/ccc?key=0AptvTVhRiiNBdHBhek1MTjY0QWhnRVRBa2ZpeHRGS2c&usp=drive_web#gid=0
I tested with AABB culling and with no culling. The profiling results are as follows:
1. FPS does not change whether culling is enabled or not.
2. With culling enabled, the CPU profiling results show a dramatic increase in the time spent in Scene::visit().
So we can confirm that the bottleneck is on the GPU.
My suggestion for improving rendering performance is to fully parallelize GPU and CPU work.
Our new rendering model:
step1: Scene::visit() generates rendering commands and adds them to the render queue.
step2: renderer::renderer() walks the render queue and issues the OpenGL commands (glXXX functions).
step3: wait for the GPU to finish rendering this frame, then swap buffers.
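The three steps above can be sketched as a simplified, single-threaded model. `RenderCommand`, `RenderQueue`, `Node` and `drawFrame` are hypothetical names, not the actual cocos2d-x 3.0 classes, and the GL calls and the buffer swap are stand-in callbacks. The point it illustrates is the separation of concerns: the scene traversal only records commands, and a later pass issues them in order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the actual cocos2d-x classes.
struct RenderCommand {
    std::string label;            // which node recorded the command
    std::function<void()> issue;  // in a real renderer this would wrap glXXX calls
};

class RenderQueue {
public:
    void add(RenderCommand cmd) { commands_.push_back(std::move(cmd)); }

    // Step 2: issue every recorded command in order, then clear the queue.
    // Returns how many commands were issued.
    std::size_t flush() {
        for (auto& cmd : commands_) cmd.issue();
        std::size_t issued = commands_.size();
        commands_.clear();
        return issued;
    }

    std::size_t size() const { return commands_.size(); }

private:
    std::vector<RenderCommand> commands_;
};

struct Node {
    std::string name;
    std::vector<Node> children;

    // Step 1: traverse the scene graph and only record commands (no GL calls).
    void visit(RenderQueue& queue, std::vector<std::string>& log) const {
        queue.add({name, [&log, n = name] { log.push_back("draw " + n); }});
        for (const auto& child : children) child.visit(queue, log);
    }
};

// One frame of the model. Step 3 (wait for the GPU, swap buffers) is
// platform-specific and represented only by the callback.
std::size_t drawFrame(const Node& scene, RenderQueue& queue,
                      std::vector<std::string>& log,
                      const std::function<void()>& swapBuffers) {
    scene.visit(queue, log);             // step 1
    std::size_t issued = queue.flush();  // step 2
    swapBuffers();                       // step 3
    return issued;
}
```

Because visit() never touches GL state, the recorded queue could in principle be handed to another thread, which is what makes the CPU/GPU overlap proposed above possible.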
The GPU profiler shows that swapBuffer() takes more than 30% of CPU time. I would like to know what work iOS does when swapBuffer() is called. If the major overhead is the CPU waiting for the GPU to finish GL rendering, we can use this waiting time to let the CPU process rendering commands for the next frame. I tried this, but the overhead of swapBuffer() did not decrease. Is something wrong with this approach? Does anyone have suggestions for reducing the overhead of swapBuffers()?
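One way to use that waiting time is to pipeline frames: a render thread issues frame N's commands and swaps buffers while the main thread already runs visit() for frame N+1. The sketch below assumes that design; `FrameCommands`, `FrameMailbox` and `runPipelined` are hypothetical names, and the GL work is only a comment. In a real engine the render thread would have to own the GL context, because an OpenGL ES context is current on only one thread at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical payload: the commands visit() recorded for one frame.
struct FrameCommands {
    int frameIndex;
    std::vector<int> commandIds;
};

// One-slot hand-off between the main thread (producer) and the render
// thread (consumer). The producer only blocks if the next frame is ready
// before the render thread has taken the previous one.
class FrameMailbox {
public:
    void put(FrameCommands frame) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !slot_.has_value(); });
        slot_ = std::move(frame);
        cv_.notify_all();
    }

    // Returns std::nullopt once close() was called and no frame is pending.
    std::optional<FrameCommands> take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return slot_.has_value() || closed_; });
        if (!slot_) return std::nullopt;
        FrameCommands frame = std::move(*slot_);
        slot_.reset();
        cv_.notify_all();
        return frame;
    }

    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::optional<FrameCommands> slot_;
    bool closed_ = false;
};

// Records `frames` frames on the calling thread while a render thread
// consumes them. Returns frame indices in the order they were rendered.
std::vector<int> runPipelined(int frames, int commandsPerFrame) {
    FrameMailbox mailbox;
    std::vector<int> rendered;

    std::thread renderThread([&] {
        while (auto frame = mailbox.take()) {
            // A real render thread would issue the glXXX calls for
            // frame->commandIds here and then swap buffers.
            rendered.push_back(frame->frameIndex);
        }
    });

    for (int i = 0; i < frames; ++i) {
        FrameCommands f{i, std::vector<int>(commandsPerFrame, i)};
        mailbox.put(std::move(f));  // hand off, then start recording frame i+1
    }
    mailbox.close();
    renderThread.join();
    return rendered;
}
```

With a single slot, at most one recorded frame waits for the render thread, so input-to-display latency grows by roughly one frame in exchange for the overlap.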
A version of 2D bounds culling has been implemented.
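A minimal sketch of a 2D bounds test, assuming the approach is: transform the four corners of each sprite's local rectangle into world space, take their axis-aligned bounding box, and skip the draw when that box does not overlap the visible rectangle. `Rect2D`, `Affine2D`, `transformedBounds` and `isVisible` are hypothetical names; this is not the implementation referred to above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal types; cocos2d-x has its own rect/transform classes.
struct Rect2D {
    float x, y, width, height;  // origin at the bottom-left corner

    bool intersects(const Rect2D& o) const {
        return x < o.x + o.width && o.x < x + width &&
               y < o.y + o.height && o.y < y + height;
    }
};

// 2D affine transform: x' = a*x + c*y + tx, y' = b*x + d*y + ty
struct Affine2D {
    float a, b, c, d, tx, ty;
};

// Transforms the four corners of a local-space rect and returns their
// axis-aligned bounding box in world space.
Rect2D transformedBounds(const Rect2D& local, const Affine2D& t) {
    const float xs[4] = {local.x, local.x + local.width, local.x, local.x + local.width};
    const float ys[4] = {local.y, local.y, local.y + local.height, local.y + local.height};
    float minX = 0, maxX = 0, minY = 0, maxY = 0;
    for (int i = 0; i < 4; ++i) {
        float wx = t.a * xs[i] + t.c * ys[i] + t.tx;
        float wy = t.b * xs[i] + t.d * ys[i] + t.ty;
        if (i == 0) {
            minX = maxX = wx;
            minY = maxY = wy;
        } else {
            minX = std::min(minX, wx);
            maxX = std::max(maxX, wx);
            minY = std::min(minY, wy);
            maxY = std::max(maxY, wy);
        }
    }
    return {minX, minY, maxX - minX, maxY - minY};
}

// A sprite is submitted for drawing only if its world bounds overlap the view.
bool isVisible(const Rect2D& localBounds, const Affine2D& worldTransform,
               const Rect2D& visibleRect) {
    return transformedBounds(localBounds, worldTransform).intersects(visibleRect);
}
```

The test costs four point transforms and a few comparisons per sprite, which is why it pays off mainly when the frame is CPU-bound by per-sprite submission work.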
Test results are listed below (3000 sprites in TestCpp->PerformanceTest->SpritePerfTest):
(FPS)   Culling on                  Culling off
Test    iPhone 4S   iPod touch 4    iPhone 4S   iPod touch 4
D9      60          60              30          ~11
D10     60          60              60          ~55
E9      ~60         ~35             30          ~9
E10     60          60              60          ~52
F9      ~22         ~9              22          ~9
F10     ~42         ~26             43          ~27
G9      ~45         ~22             22          ~9
G10     ~55         ~33             43          ~28
A new test with 2000 sprites (iPhone 4S):
With culling on: A1: 12 FPS, E1: 52 FPS
With culling off: A1: 12 FPS, E1: 50 FPS
Unlike E1, whose frame rate barely changes, D9, E9 and G9 show a dramatic performance increase when culling is enabled.
Reason:
The sprites in A1 and E1 are very big, whereas those in D9 and E9 are rather small, and big sprites put a heavy load on the GPU. If we profile CPU time, swapBuffer() takes more than 30% in E1, compared with 2-3% in D9, E9, etc.
E1 performance is bound by both the CPU and the GPU, while D9, E9, etc. are CPU-bound.
This means that when performance is CPU-bound, a fast culling method is very effective at improving it.
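The bottleneck reasoning above can be expressed with an idealized model, assuming per-frame CPU and GPU costs are known in milliseconds and the display caps the rate at 60 FPS. When CPU and GPU work fully overlap (the parallelized model suggested earlier), the slower side sets the frame time; when they run back to back, the costs add. `estimatedFps`, `serialFps` and `classify` are hypothetical helpers, and any numbers fed to them are illustrative, not measurements from the tests above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fully overlapped CPU and GPU: the slower side determines the frame time.
double estimatedFps(double cpuMs, double gpuMs, double refreshHz = 60.0) {
    double frameMs = std::max(cpuMs, gpuMs);
    return std::min(refreshHz, 1000.0 / frameMs);
}

// No overlap: the CPU waits for the GPU every frame, so the costs add up.
double serialFps(double cpuMs, double gpuMs, double refreshHz = 60.0) {
    return std::min(refreshHz, 1000.0 / (cpuMs + gpuMs));
}

enum class Bound { Cpu, Gpu, Both };

// Which side limits the frame; costs within `tolerance` (a fraction of the
// larger one) of each other are treated as balanced.
Bound classify(double cpuMs, double gpuMs, double tolerance = 0.1) {
    if (std::abs(cpuMs - gpuMs) <= tolerance * std::max(cpuMs, gpuMs)) return Bound::Both;
    return cpuMs > gpuMs ? Bound::Cpu : Bound::Gpu;
}
```

In this model, with 40 ms of CPU work and 10 ms of GPU work, cutting the CPU cost to 15 ms raises the estimate from 25 to the 60 FPS cap; with 10 ms CPU and 40 ms GPU, halving the CPU cost leaves it at 25 FPS. That mirrors the observation that culling helps the CPU-bound cases much more than E1.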
It should contain the following tests:
1. Performance when using frustum culling
2. Performance when using 2D box culling
3. Performance when not using any kind of culling
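For the frustum-culling case, a common test checks each node's bounding box against the six frustum planes. This sketch assumes the planes are already extracted (for example from the view-projection matrix, not shown) with normals pointing into the frustum; `Vec3`, `Plane`, `AABB` and `aabbInFrustum` are hypothetical names, not cocos2d-x API.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane n·p + d = 0, with the normal n pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct AABB { Vec3 min, max; };

// Returns false only if the box lies entirely on the outside of some plane.
// The test is conservative: a few boxes near frustum corners are reported
// visible although they are outside, which is acceptable for culling.
bool aabbInFrustum(const AABB& box, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        // Box corner farthest along the plane normal (the "positive vertex").
        Vec3 v{p.n.x >= 0 ? box.max.x : box.min.x,
               p.n.y >= 0 ? box.max.y : box.min.y,
               p.n.z >= 0 ? box.max.z : box.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0) return false;
    }
    return true;
}
```

Compared with the 2D box test, this needs six plane checks per node, so measuring both against the no-culling baseline shows whether the extra CPU cost is worth it on the target devices.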