Compilation

When compiling CUDA kernels, do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:

nvcc device.cu -c -o device.o -g -lineinfo -O3