Compilation
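When compiling CUDA kernels, do not generate debug information for device
code (the -G or --device-debug flag), because it disables most device-code
optimizations by default and can significantly impair runtime performance.
Use -lineinfo instead, which keeps source line information without turning
off optimizations, for example: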
nvcc device.cu -c -o device.o -g -lineinfo -O3
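
One way to keep the two flag sets apart is to pick them by build type in a build script. The following is a minimal sketch, not a prescribed setup: the function name nvcc_flags and the build types "debug" and "profile" are hypothetical names chosen for illustration, and only the nvcc options themselves (-g, -G, -lineinfo, -O3) come from the text above.

```shell
# Hypothetical helper: print the nvcc flags for a given build type.
# "debug" and "profile" are illustrative names, not nvcc concepts.
nvcc_flags() {
  case "$1" in
    # Full device debug info; device-code optimizations are turned off.
    debug)   echo "-g -G" ;;
    # Line information only; device code stays optimized.
    profile) echo "-g -lineinfo -O3" ;;
    *)       echo "unknown build type: $1" >&2; return 1 ;;
  esac
}
```

With this in place, the command above could be written as nvcc device.cu -c -o device.o $(nvcc_flags profile), leaving the unquoted substitution to split into separate flags.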