Feature #3670: Performance Testing for 3.0

Test Performance for culling

Feature #3717 [Closed]
nite 2014-01-15 19:24 . Updated about 9 years ago

It should contain the following tests:

Performance when using Frustum Culling
Performance when using 2d Box Culling
Performance when not using any kind of culling
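For reference, the 2D box culling case amounts to an axis-aligned overlap test between each sprite's bounds and the visible area. The following is only a minimal sketch under stated assumptions: `Box2D`, `overlaps` and `countDrawn` are hypothetical names for illustration, not the engine's types or its actual culling code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned rectangle; not the engine's own rect type.
struct Box2D {
    float minX, minY, maxX, maxY;
};

// True when the two boxes overlap (touching edges count as overlapping).
inline bool overlaps(const Box2D& a, const Box2D& b) {
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minY <= b.maxY && a.maxY >= b.minY;
}

// Counts how many sprite boxes would be submitted for drawing.
// With culling disabled, every sprite is submitted.
inline std::size_t countDrawn(const std::vector<Box2D>& sprites,
                              const Box2D& viewport,
                              bool cullingEnabled) {
    // The viewport must be a well-formed box.
    assert(viewport.minX <= viewport.maxX && viewport.minY <= viewport.maxY);
    std::size_t drawn = 0;
    for (const Box2D& s : sprites) {
        if (!cullingEnabled || overlaps(s, viewport)) {
            ++drawn;
        }
    }
    return drawn;
}
```

The per-sprite test is four comparisons, so the CPU cost of culling stays small compared with issuing a draw for an off-screen sprite.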

nite 2014-01-15 19:25
  • Tracker changed from Bug to Feature
zhangxm 2014-01-22 02:54
  • Assignee set to dabingnn
dabingnn 2014-02-12 08:13
  • Description updated (diff)

I profiled on an iPhone 4s; the results are at https://docs.google.com/spreadsheet/ccc?key=0AptvTVhRiiNBdHBhek1MTjY0QWhnRVRBa2ZpeHRGS2c&usp=drive_web#gid=0
I have tested AABB culling and no culling. The profiling results are as follows:
1. FPS does not change whether culling is enabled or not.
2. With culling enabled, the CPU profile shows a dramatic increase in the time spent in Scene::visit().

So we can confirm that the bottleneck is on the GPU.

dabingnn 2014-02-12 08:32

My suggestion for improving rendering performance is to fully parallelize GPU and CPU work.
Our new rendering model:
step 1: Scene::visit() generates rendering commands and adds them to the render queue
step 2: renderer::renderer(): the render queue issues OpenGL commands (glXXX functions)
step 3: wait for the GPU to finish rendering this frame, then swap buffers.
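The three steps can be sketched roughly as follows. This is an illustrative model only, not the engine's renderer: `RenderCommand`, `RenderQueue`, `renderFrame` and the `swapBuffers` callback are hypothetical stand-ins, and no real OpenGL calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical render command: a callable that would issue GL calls.
using RenderCommand = std::function<void()>;

class RenderQueue {
public:
    // Step 1: record a command during scene traversal.
    void add(RenderCommand cmd) { commands_.push_back(std::move(cmd)); }

    // Step 2: execute every recorded command in submission order,
    // then clear the queue. Returns the number of commands executed.
    std::size_t flush() {
        std::size_t executed = 0;
        for (auto& cmd : commands_) {
            cmd();
            ++executed;
        }
        commands_.clear();
        return executed;
    }

    std::size_t size() const { return commands_.size(); }

private:
    std::vector<RenderCommand> commands_;
};

// One frame of the three-step model. swapBuffers is a hypothetical
// callback standing in for the platform's buffer swap.
inline std::size_t renderFrame(RenderQueue& queue,
                               const std::vector<RenderCommand>& sceneCommands,
                               const std::function<void()>& swapBuffers) {
    assert(static_cast<bool>(swapBuffers));
    for (const auto& cmd : sceneCommands) {
        queue.add(cmd);                        // step 1: visit fills the queue
    }
    std::size_t executed = queue.flush();      // step 2: queue issues commands
    swapBuffers();                             // step 3: wait and swap
    return executed;
}
```

In this sketch all three steps run back to back on one thread, so any time spent waiting in step 3 is time the CPU is not using to build the next frame's commands.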

The GPU profiler shows that swapBuffer() takes more than 30% of CPU time. I want to know what work iOS does when swapBuffer() is called. If the major overhead is the CPU waiting for the GPU to finish GL rendering, we could use that waiting time to have the CPU process rendering commands for the next frame. I tried this, but the overhead of swapBuffer() did not decrease. Is something wrong with my approach? Does anyone have suggestions for reducing the overhead of swapBuffers()?

zhangxm 2014-03-03 06:55
  • Status changed from New to Resolved
dabingnn 2014-03-03 07:17

A version of 2D bound culling has been implemented.
Test results are listed below
(for 3000 sprites in TestCpp->PerformanceTest->SpritePerfTest)

(iPhone 4s)             (iPod touch 4)
With Culling On:
D9: 60 FPS              D9: 60 FPS
D10: 60 FPS             D10: 60 FPS
E9: ~60 FPS             E9: ~35 FPS
E10: 60 FPS             E10: 60 FPS
F9:  ~22 FPS            F9:  ~9 FPS
F10: ~42 FPS            F10: ~26 FPS
G9: ~45 FPS             G9: ~22 FPS
G10: ~55 FPS            G10: ~33 FPS

With Culling Off:
D9: 30 FPS              D9: ~11 FPS
D10: 60 FPS             D10: ~55 FPS
E9: 30 FPS              E9: ~9 FPS
E10: 60 FPS             E10: ~52 FPS
F9: 22 FPS              F9: ~9 FPS
F10: 43 FPS             F10: ~27 FPS
G9: 22 FPS              G9: ~9 FPS
G10: 43 FPS             G10: ~28 FPS

A new Test for 2000 sprites

(iPhone 4s)
With Culling On:
A1: 12 FPS
E1: 52 FPS
With Culling Off:
A1: 12 FPS
E1: 50 FPS

Compared with the E1 result, D9, E9 and G9 show a dramatic performance increase.

Reason:

The sprites in A1 and E1 are very big, while those in D9 and E9 are rather small, so A1 and E1 put a heavy load on the GPU. In a CPU time profile, swapbuffer() takes more than 30% for E1, compared with 2-3% for D9, E9, etc.
E1 performance is bound by both CPU and GPU, while D9, E9, etc. are bound by the CPU.
This means that when the workload is CPU-bound, a fast culling method is very effective at improving performance.
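To put numbers on this comparison, the FPS figures above can be turned into speedup ratios and frame times. The helpers below are a small hypothetical sketch (the names `cullingSpeedup` and `frameTimeMs` are made up for illustration); they only do arithmetic on the reported values.

```cpp
#include <cassert>

// Speedup from enabling culling: FPS with culling divided by FPS without.
inline double cullingSpeedup(double fpsOn, double fpsOff) {
    assert(fpsOff > 0.0);
    return fpsOn / fpsOff;
}

// Frame time in milliseconds for a given frame rate.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

Using the iPhone 4s results, D9 goes from 30 FPS to 60 FPS, a speedup of 2.0, and G9 goes from 22 FPS to about 45 FPS, roughly 2.05. E1 goes from 50 FPS to 52 FPS, a speedup of only 1.04, which is consistent with the GPU-heavy cases gaining little from culling.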

zhangxm 2014-03-07 08:35
  • Status changed from Resolved to Closed

Status: Closed
Start date: 2014-01-15
Priority: Normal
Due date:
Assignee: dabingnn
% Done: 0%
Category: all
Target version: 3.0-rc0