Inconsistency between OpenGL and CUDA maximum number of threads


My GPU is an NVIDIA GeForce GT 440, compute capability 2.x. NVIDIA's official CUDA C Programming Guide points out:

Limit 1. Maximum number of threads per block = 1024
Limit 2. Maximum number of resident threads per multiprocessor = 1536

However, the OpenGL compute shader implementation limits include:

Limit 3. GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS = 1536

My questions are:
1. Why is limit 1 not equal to limit 2 and limit 3?
2. Should the real number of threads per block (invocations per work group) be 1024 or 1536?

Why is limit 1 not equal to limit 2 and limit 3?

Because they aren't the same thing. Blocks are a logical construct in CUDA, and each block is limited to a maximum of 1024 threads. A multiprocessor (SM) can run multiple blocks concurrently (up to 8 in the case of your hardware). So an SM can have 1536 concurrent threads in hardware, but not all of those threads can come from a single block.
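To make the difference between the per-block and per-multiprocessor limits concrete, here is a minimal C++ sketch of the arithmetic. It uses only the three numbers quoted above (1024 threads per block, 1536 resident threads per SM, 8 resident blocks per SM). The names `SmLimits`, `kCc2x`, `resident_blocks` and `resident_threads` are hypothetical, chosen for illustration. The sketch deliberately ignores register and shared memory usage, which can reduce the number of resident blocks further on real hardware.

```cpp
#include <algorithm>

// Limits for a compute capability 2.x multiprocessor, as quoted in the
// question and answer. All names here are hypothetical, for illustration.
struct SmLimits {
    int max_threads_per_block;  // limit 1
    int max_threads_per_sm;     // limit 2
    int max_blocks_per_sm;      // "up to 8" resident blocks per SM
};

const SmLimits kCc2x{1024, 1536, 8};

// How many blocks of the given size can be resident on one SM at once,
// considering only the thread-count and block-count limits.
inline int resident_blocks(const SmLimits& sm, int threads_per_block) {
    // A block larger than limit 1 (or an empty block) cannot be launched.
    if (threads_per_block <= 0 || threads_per_block > sm.max_threads_per_block) {
        return 0;
    }
    return std::min(sm.max_blocks_per_sm,
                    sm.max_threads_per_sm / threads_per_block);
}

// Total threads resident on one SM for that block size.
inline int resident_threads(const SmLimits& sm, int threads_per_block) {
    return resident_blocks(sm, threads_per_block) * threads_per_block;
}
```

Under these assumptions, a block size of 1024 leaves room for only one resident block per SM (1024 of the 1536 thread slots), while a block size of 512 allows three blocks, or 192 allows eight, and both fill all 1536 slots. A block size of 128 is capped by the 8-block limit at 1024 resident threads.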

Should the real number of threads per block be 1024 or 1536?

1024, for the reasons outlined above. You can see a complete summary of the capabilities of all supported hardware here.

