# When something goes wrong

In the past few sections, we've learned how to allocate and manage device memory, write kernels which execute in parallel and in multiple dimensions, and decompose our device code into functions so that our kernels may grow ever more sophisticated. By now, we have all the basic tools needed to start thinking about building non-trivial parallel applications. On the other hand, these tools we've collected also enable us to write buggy programs. With hundreds of thousands of threads come hundreds of thousands of things which could go wrong. This section discusses how to detect and deal with errors as soon as they occur.

To get started, let's write an obviously incorrect program just to see what happens.

```c++
__global__ void foo(int *ptr)
{
  *ptr = 7;
}

int main(void)
{
  foo<<<1,1>>>(0);
  return 0;
}
```

This program doesn't even bother to allocate device memory -- it just passes a null pointer to a kernel which immediately writes to it. This is a rather blatant instance of a common programming error -- dereferencing a null pointer. Any sane program would sooner crash than attempt something so heinous. However, when I compile and run this program on my system, there's no indication that anything went wrong:

```
$ nvcc crash.cu -o crash
$ ./crash
```

Most C and C++ programmers are probably used to some sort of immediate feedback that something funny is afoot. A debugging window might open, or an error message might be automatically written to the terminal indicating a segmentation fault. However, in a CUDA program, if we suspect an error has occurred during a kernel launch, then we must explicitly check for it after the kernel has executed.

The CUDA Runtime is happy to field questions about abnormal program execution, but tends to be tight-lipped when it comes to offering unsolicited information. To find out if an error has occurred inside a kernel, the host needs to actively call the CUDA Runtime function `cudaGetLastError`, which returns a value encoding the kind of the last error it has encountered:

```c++
cudaError_t cudaGetLastError(void);
```
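
As an aside, `cudaGetLastError` also resets the runtime's error state back to `cudaSuccess`, so calling it twice in a row reports an error only once. If you'd rather inspect the error without clearing it, the CUDA Runtime also provides a non-destructive variant (not used in the examples below):

```c++
cudaError_t cudaPeekAtLastError(void);
```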

To make things slightly more complicated, we need to be sure to check for the error only after we're sure a kernel has finished executing. Because kernel launches are asynchronous, the host thread doesn't wait on the device to finish its work before continuing on with its own business, which might include launching more kernels, copying memory, enjoying a Mai Tai, etc. So in order to correctly check for an error, the host needs to block so it can synchronize with the device. We can do this with the `cudaDeviceSynchronize` function.

To see how all the pieces fit together, let's check for errors in our last example:

```c++
#include <stdlib.h>
#include <stdio.h>

__global__ void foo(int *ptr)
{
  *ptr = 7;
}

int main(void)
{
  foo<<<1,1>>>(0);

  // make the host block until the device is finished with foo
  cudaDeviceSynchronize();

  // check for error
  cudaError_t error = cudaGetLastError();
  if(error != cudaSuccess)
  {
    // print the CUDA error message and exit
    printf("CUDA error: %s\n", cudaGetErrorString(error));
    exit(-1);
  }

  return 0;
}
```

This example, when compiled and run, produces the following output:

```
$ nvcc check_for_error.cu -o check_for_error
$ ./check_for_error
CUDA error: an illegal memory access was encountered
```

Note that we've used the function `cudaGetErrorString` to conveniently turn the enumerated variable, `error`, into human-readable form. This saves us the hassle of looking up the error description in a table somewhere.
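
One more detail worth noting, though the example above doesn't depend on it: `cudaDeviceSynchronize` itself returns a `cudaError_t`, and that return value can report errors from previously launched asynchronous work. A slightly more compact sketch of the same check might capture it directly:

```c++
// sketch: fold the synchronization and the error check into one step
cudaError_t error = cudaDeviceSynchronize();
if(error != cudaSuccess)
{
  printf("CUDA error: %s\n", cudaGetErrorString(error));
  exit(-1);
}
```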

In this example, we've explicitly synchronized with the device in order to check its error status, but many CUDA Runtime functions are already implicitly synchronous and return the current error status as a result. For example, consider the next code listing, which demands an unreasonable amount of memory from `cudaMalloc` (`UINT_MAX` bytes is roughly 4 GiB -- more than my device has to offer, though a device with more memory than that could conceivably grant the request):

```c++
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>  // for UINT_MAX

int main(void)
{
  int *ptr = 0;

  // gimme!
  cudaError_t error = cudaMalloc((void**)&ptr, UINT_MAX);
  if(error != cudaSuccess)
  {
    // print the CUDA error message and exit
    printf("CUDA error: %s\n", cudaGetErrorString(error));
    exit(-1);
  }

  return 0;
}
```

This program outputs a different error message:

```
$ nvcc big_malloc.cu -o big_malloc
$ ./big_malloc
CUDA error: out of memory
```

Though it's important to routinely check for errors during the debugging process, you shouldn't liberally sprinkle `cudaDeviceSynchronize` and `cudaGetLastError` calls throughout your code. Explicit synchronization may come with a significant performance penalty. A common idiom is to wrap synchronous error-checking calls in a function which can be enabled in debug mode and elided upon release:

```c++
#include <stdlib.h>
#include <stdio.h>

inline void check_cuda_errors(const char *filename, const int line_number)
{
#ifdef DEBUG
  cudaDeviceSynchronize();
  cudaError_t error = cudaGetLastError();
  if(error != cudaSuccess)
  {
    printf("CUDA error at %s:%i: %s\n", filename, line_number, cudaGetErrorString(error));
    exit(-1);
  }
#endif
}

__global__ void foo(int *ptr)
{
  *ptr = 7;
}

int main(void)
{
  foo<<<1,1>>>(0);
  check_cuda_errors(__FILE__, __LINE__);

  return 0;
}
```

During debugging, we can compile with a macro, `DEBUG`, defined to enable our synchronous error-checking code:

```
$ nvcc -DDEBUG debug_errors.cu -o debug_errors
$ ./debug_errors
CUDA error at debug_errors.cu:25: unspecified launch failure
```

Here, we've used the predefined preprocessor macros `__FILE__` and `__LINE__` to easily identify where in the code the error occurred.
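
One further refinement worth sketching: wrapping the call in a macro lets the preprocessor supply `__FILE__` and `__LINE__` automatically at each call site, so we don't have to type them out by hand. The `CHECK_CUDA_ERRORS` name below is just for illustration; it isn't part of the example above.

```c++
// sketch: a macro that fills in the call site's file and line automatically
#define CHECK_CUDA_ERRORS() check_cuda_errors(__FILE__, __LINE__)

// usage:
//   foo<<<1,1>>>(0);
//   CHECK_CUDA_ERRORS();
```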

Expanded source code for [this example](cuda_errors.cu) is available in the site's code repository.