This is a tracking document for some things I’ve found useful when writing CUDA extensions for PyTorch.
I found it useful to put these at the top of my Python file.
manual_seed is for reproducibility, and
set_printoptions makes it easier to quickly check whether or not two numbers match up.
This answer suggests that the first step for debugging CUDA code is to enable CUDA launch blocking by setting the CUDA_LAUNCH_BLOCKING environment variable to 1 at the top of the Python file, before any CUDA work starts:
However, this didn’t help with an odd memory access issue I was having. This guide was more helpful, and I made a minor improvement to its function:
I found it very important to know how to do integer division that rounds up. Normally, integer division rounds down (strictly, it truncates toward zero, which is the same thing for non-negative values). For example, the code below will print
0, 0, 0, 0, 1, 1, 1, 1, 1, 2.
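The original snippet is not shown here. A minimal C++ sketch that reproduces this output might look like the following, assuming numerators 1 through 10 and a denominator of 5 (both inferred from the printed values). The function name floor_quotients is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "q1, q2, ..., q10" where qi = i / denominator,
// using plain integer division for numerators 1 through 10.
std::string floor_quotients(int denominator) {
    std::ostringstream out;
    for (int numerator = 1; numerator <= 10; ++numerator) {
        if (numerator > 1) out << ", ";
        // Integer division discards the remainder, so 4 / 5 == 0 and 9 / 5 == 1.
        out << numerator / denominator;
    }
    return out.str();
}
```

Printing floor_quotients(5) from a main function, for example with std::cout, gives the sequence above.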
To round up, use the identity
(numerator + denominator - 1) / denominator, which holds for a non-negative numerator and a positive denominator. The code below will print
1, 1, 1, 1, 1, 2, 2, 2, 2, 2:
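The original snippet is not shown here. As a rough sketch, the identity can be wrapped in a small helper, again assuming numerators 1 through 10 and a denominator of 5 (inferred from the printed values). The names div_round_up and ceil_quotients are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: integer division that rounds up.
// Assumes numerator >= 0 and denominator > 0, and that the sum does not overflow.
constexpr int div_round_up(int numerator, int denominator) {
    return (numerator + denominator - 1) / denominator;
}

// Builds "q1, q2, ..., q10" where qi = ceil(i / denominator) for i = 1..10.
std::string ceil_quotients(int denominator) {
    std::ostringstream out;
    for (int numerator = 1; numerator <= 10; ++numerator) {
        if (numerator > 1) out << ", ";
        out << div_round_up(numerator, denominator);
    }
    return out.str();
}
```

Printing ceil_quotients(5) gives the sequence above. A typical use in a CUDA extension is sizing a launch grid: div_round_up(1000, 256) is 4, so four blocks of 256 threads cover 1000 elements, with the surplus threads in the last block guarded by a bounds check in the kernel.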
These are some definitions which I found useful.
Additionally, here’s a useful struct to handle CUDA streams.
Below are some of the resources that I found useful.