Inconsistency between OpenGL and CUDA maximum number of threads
My GPU is an NVIDIA GeForce GT 440, compute capability 2.x. NVIDIA's official CUDA C Programming Guide points out:
Limit 1. Maximum number of threads per block = 1024
Limit 2. Maximum number of resident threads per multiprocessor = 1536
However, one of the OpenGL compute shader implementation limits is:
Limit 3. GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS = 1536
My questions are:
1. Why is limit 1 not equal to limit 2 and limit 3?
2. Should the real number of threads per block (invocations per work group) be 1024 or 1536?
Why is limit 1 not equal to limit 2 and limit 3?
Because they aren't the same thing. Blocks are a logical construct in CUDA, and they are limited to a maximum of 1024 threads. A multiprocessor can run multiple blocks concurrently (up to 8 in the case of your hardware). So an SM can have 1536 concurrent threads in hardware, but not all of those threads can come from a single block.
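To make the arithmetic concrete, here is a rough Python sketch of how many blocks of a given size one SM could hold at once. It only uses the three figures quoted in this thread (1024 threads per block, 1536 resident threads per SM, 8 resident blocks per SM). The function name `resident_blocks` is made up for illustration, and the sketch assumes that registers and shared memory are not the limiting factor, which on real kernels they often are.

```python
# Limits for compute capability 2.x, taken from the figures quoted above.
MAX_THREADS_PER_BLOCK = 1024  # limit 1
MAX_THREADS_PER_SM = 1536     # limit 2
MAX_BLOCKS_PER_SM = 8         # "up to 8" concurrent blocks per multiprocessor


def resident_blocks(threads_per_block: int) -> int:
    """Upper bound on blocks of this size that one SM can hold at once.

    Hypothetical helper: only the thread-count and block-count limits are
    considered. Register and shared memory usage, which can lower the
    number further, are ignored.
    """
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size violates the per-block limit")
    return min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)


for size in (192, 512, 768, 1024):
    blocks = resident_blocks(size)
    print(f"{size:>4} threads/block -> {blocks} block(s), "
          f"{blocks * size} resident threads")
```

Under these assumptions, block sizes such as 192, 512 or 768 can fill all 1536 resident thread slots, while a single 1024-thread block leaves 512 slots unused because a second block of that size no longer fits.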
Should the real number of threads per block be 1024 or 1536?
1024, for the reasons outlined above. You can see a complete summary of the capabilities of all supported hardware here.