c++ - Advanced issues with GPU thread divergence
My situation: I have a dynamic programming algorithm to implement on GPUs using OpenCL as part of my PhD studies. The GPUs I am working with include the AMD HD 7970 and 7750, the A10-5800K APU, and the NVIDIA GTX 680. I understand the principles involved and most of the best practices necessary for obtaining performance.
My program contains 4 nested loops, and in the data-parallel formulation I am able to unfold 2 of the outer loops. Due to the nature of the problem, the innermost loop cannot be handled without causing divergence. The output is a table that represents schedules of jobs on machines (in the computer science sense).
When threads diverge (work-items in a wavefront take different routes), I get wrong values, and it looks as if the work-items repeat themselves. For example:
t = 0, 1, 2, 3, 4, ... 63, 64, 65, 66, 67, ...
m1 0, 0, 0, 9, 9, ... 9, 0, 0, 0, 9, ...
The above is for a work-group size of 64. The first values up to t=63 are correct, but notice how the pattern repeats again at t=64! There shouldn't be zeros there. Here each work-item is mapped to a time t.
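To make the mapping concrete, here is a minimal host-side C++ sketch of how a one-dimensional index t splits into a work-group index and a local index when the work-group size is 64. The names `WorkItemPos` and `locate` are hypothetical and not part of my program, and a zero global offset is assumed. It only shows that t=64 is the first work-item of the second work-group, which is where the pattern restarts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: position of a 1-D work-item,
// assuming a zero global offset.
struct WorkItemPos {
    std::size_t group;  // what get_group_id(0) would report
    std::size_t local;  // what get_local_id(0) would report
};

// Maps a global index t to (group, local) for a given work-group size.
inline WorkItemPos locate(std::size_t t, std::size_t group_size) {
    assert(group_size > 0);
    return WorkItemPos{t / group_size, t % group_size};
}
```

Under these assumptions, `locate(63, 64)` is the last item of group 0 and `locate(64, 64)` is item 0 of group 1.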
If I fix the parameter that causes divergence, the table gets filled with the expected (though wrong) results, with no gaps (zeros): the value 9 from t=0 to tmax, where tmax is a multiple of 64.
Question: does thread divergence have a tendency to result in wrong calculations or undefined thread behavior?
I have dug through the internet, documentation, and books for everything I can find on thread divergence and memory consistency. I have implemented the whole program in different ways, including one that calls the kernel multiple times to rule out global memory inconsistency, but the results are the same.
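Comparing the variants comes down to an element-wise check of their output tables. A minimal host-side sketch of such a check follows; `first_mismatch` is a hypothetical helper, not code from my program, and the tables are assumed to be flattened into `std::vector<int>`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first element where two flattened
// output tables differ, or -1 if they are identical.
// A length mismatch counts as a difference at the end of the shorter table.
inline std::ptrdiff_t first_mismatch(const std::vector<int>& a,
                                     const std::vector<int>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return static_cast<std::ptrdiff_t>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

Knowing the first index that differs makes it easier to tell whether a discrepancy starts at a work-group boundary or somewhere else.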
Any input is appreciated. Thanks!
After further investigation, and I'm ashamed to admit it here, one of the computation conditions was giving wrong values, so it only looked as if the work-items were acting strangely when they weren't. The problem is corrected. Thanks!
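For anyone who lands here with the same question: divergence by itself costs performance, not correctness, as long as no barrier sits inside the divergent branch. The host-side C++ sketch below is a simplified model of lockstep execution under that assumption, not actual GPU code; the wavefront width of 64, the branch bodies, and all names are made up for illustration. Every lane walks both branch paths, and a mask decides which lanes commit a result, so each lane ends up with the same value a scalar run would give.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 64;  // assumed wavefront width
using Lanes = std::array<int, kLanes>;

// What a single work-item computes on its own (illustrative branch bodies).
inline int scalar_item(int x) {
    return (x % 2 == 0) ? x * 2 : x + 100;
}

// Simplified lockstep model: both paths are executed for the whole
// wavefront, and the mask selects which lanes store a result on each pass.
inline Lanes simt_wavefront(const Lanes& in) {
    Lanes out{};
    std::array<bool, kLanes> take_then{};
    for (std::size_t i = 0; i < kLanes; ++i) take_then[i] = (in[i] % 2 == 0);

    for (std::size_t i = 0; i < kLanes; ++i)  // pass 1: "then" path
        if (take_then[i]) out[i] = in[i] * 2;
    for (std::size_t i = 0; i < kLanes; ++i)  // pass 2: "else" path
        if (!take_then[i]) out[i] = in[i] + 100;
    return out;
}

// True if every lane of the lockstep model agrees with the scalar version.
inline bool matches_scalar(const Lanes& in) {
    const Lanes out = simt_wavefront(in);
    for (std::size_t i = 0; i < kLanes; ++i)
        if (out[i] != scalar_item(in[i])) return false;
    return true;
}
```

In this model the two passes are the cost of divergence: the wavefront spends time on both paths, but no lane's result changes.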