
GPU

Handling CUDA error messages

http://cuda-programming.blogspot.com/2013/01/handling-cuda-error-messages.html

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call returned an error code.
void cuda_safe(cudaError_t error, const char* message) {
    if (error != cudaSuccess) {
        printf("ERROR: %s : %i\n", message, error);
        exit(-1);
    }
}

// Check the last error recorded by the runtime, e.g. after a kernel
// launch, which returns no status directly.
void Check_CUDA_Error(const char* message) {
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess) {
        printf("ERROR: %s : %s\n", message, cudaGetErrorString(error));
        exit(-1);
    }
}

int main() {
    ...
    cuda_safe(cudaMalloc(...), "cudaMalloc");
    ...
    cuda_safe(cudaFree(...), "cudaFree");
    ...
    Check_CUDA_Error("Kernel Launch Failed!");
    
    return 0;
}

Common errors:

  • error: a host function call can not be configured.

  • Invalid Configuration Argument.

    The launch requested more threads per block, or larger grid/block dimensions, than the device supports.

  • Too Many Resources Requested for Launch.

    This error means that the number of registers available on the multiprocessor is being exceeded. Reduce the number of threads per block to solve the problem.

  • Unspecified launch failure.

How to overlap data transfers in CUDA C/C++
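
The core pattern from that topic, as a minimal sketch: split the data into chunks, give each chunk its own stream, and queue H2D copy, kernel, and D2H copy per stream. Here n, h_a, d_a, and kernel are placeholders, and h_a must be pinned host memory (cudaMallocHost) for the async copies to actually overlap.

const int nStreams = 4;
const int streamSize = n / nStreams;              // assumes n % nStreams == 0
const size_t streamBytes = streamSize * sizeof(float);

cudaStream_t stream[nStreams];
for (int i = 0; i < nStreams; ++i)
    cudaStreamCreate(&stream[i]);

for (int i = 0; i < nStreams; ++i) {
    int offset = i * streamSize;
    // Work within one stream is serialized, but copies and kernels in
    // different streams can overlap on hardware with copy engines.
    cudaMemcpyAsync(&d_a[offset], &h_a[offset], streamBytes,
                    cudaMemcpyHostToDevice, stream[i]);
    kernel<<<streamSize / 256, 256, 0, stream[i]>>>(d_a, offset);
    cudaMemcpyAsync(&h_a[offset], &d_a[offset], streamBytes,
                    cudaMemcpyDeviceToHost, stream[i]);
}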

Compared with earlier architectures (e.g. A100), H100 introduces the Cluster level:

Thread —> Block —> Grid ==> Thread —> Block —> Cluster —> Grid

  • before: data exchange between blocks goes through global memory.

  • after: blocks within the same cluster can exchange data through shared memory (see the sketch below).

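A minimal sketch of the new exchange path, assuming CUDA 12+ and an sm_90 device; the kernel name and buffer size are illustrative, and the launch must use blockDim.x == 256 with a grid x-dimension that is a multiple of the cluster size (2 here).

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(2, 1, 1) exchange(float* out) {
    __shared__ float smem[256];
    cg::cluster_group cluster = cg::this_cluster();

    smem[threadIdx.x] = (float)threadIdx.x;
    cluster.sync();  // every block's shared memory is now populated

    // Pointer to the neighbouring block's shared memory within the cluster.
    unsigned peer = (cluster.block_rank() + 1) % cluster.num_blocks();
    float* peer_smem = cluster.map_shared_rank(smem, peer);
    out[blockIdx.x * blockDim.x + threadIdx.x] = peer_smem[threadIdx.x];

    cluster.sync();  // keep peer shared memory alive until all reads finish
}
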
cudaMemcpyToSymbol

Which copy is performed depends on how the destination symbol was declared:

  • If the symbol is declared in device memory, cudaMemcpyToSymbol copies from host to global memory.

  • If the symbol is declared in constant memory, cudaMemcpyToSymbol copies from host to constant memory.

__device__   float g_damp_x[MAX_DIM];
__constant__ float c_damp_y[MAX_DIM];
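
A minimal sketch of both copies against the two declarations above; the host arrays are placeholders.

float h_damp_x[MAX_DIM], h_damp_y[MAX_DIM];
cudaMemcpyToSymbol(g_damp_x, h_damp_x, MAX_DIM * sizeof(float));  // host -> global memory
cudaMemcpyToSymbol(c_damp_y, h_damp_y, MAX_DIM * sizeof(float));  // host -> constant memory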

Math functions for the half type live in cuda_fp16.h, for example:

__device__ __half hsqrt(const __half a)
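
A minimal sketch of calling it from a kernel; the kernel name is illustrative.

#include <cuda_fp16.h>

__global__ void half_sqrt(const __half* in, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = hsqrt(in[i]);  // element-wise sqrt in half precision
}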

#pragma unroll 4

    See grid_sampler_grad_impl.cu

_Pragma("unroll")

    See rcwm_small_impl.cu; a sketch of both forms follows below.

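A minimal sketch: #pragma unroll 4 asks the compiler to unroll the following loop four times; _Pragma("unroll") is the operator form of the same directive, usable inside macros where #pragma cannot appear. The kernel name is illustrative.

// Operator form, legal inside a macro definition:
#define UNROLL _Pragma("unroll")

__global__ void scale4(float* x, float a, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
    #pragma unroll 4                 // compiler emits four copies of the body
    for (int k = 0; k < 4; ++k) {
        int i = base + k;
        if (i < n) x[i] *= a;
    }
}
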
Using CUDA warp-level primitives

    Zhihu: Using CUDA Warp-Level Primitives

    See add_relu_v2_impl.cu

__ballot_sync(__activemask(), predicate)

__shfl_down_sync (see the sketch below)
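
A minimal sketch of the classic use of __shfl_down_sync: a warp-wide sum where lane 0 ends up with the total. The full-warp mask 0xffffffff assumes all 32 lanes are active.

__inline__ __device__ float warp_reduce_sum(float val) {
    // Halve the stride each step; each lane adds the value held by the
    // lane `offset` positions above it.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}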

Three-dimensional block

    See adaptive_max_pool2d_impl.cu

    dim3 (see the sketch below)

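A minimal sketch of a 3-D launch configuration with dim3; w, h, d, d_out, and my_kernel are placeholders.

dim3 block(8, 8, 4);  // 8 * 8 * 4 = 256 threads per block
dim3 grid((w + block.x - 1) / block.x,   // round up in each dimension
          (h + block.y - 1) / block.y,
          (d + block.z - 1) / block.z);
my_kernel<<<grid, block>>>(d_out, w, h, d);
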
Thrust, the CUDA C++ template library

    See non_zero_impl.cu

    See batchnorm_fold_impl.cu

    See batchnorm_fold2_impl.cu

    See unique_consecutive_impl.cu
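
A minimal sketch of Thrust operating on device data, roughly the sort-then-dedupe pattern behind a unique-consecutive style kernel; h_data, n, and the function name are placeholders.

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>

void sorted_unique(const int* h_data, int n) {
    thrust::device_vector<int> v(h_data, h_data + n);   // host -> device copy
    thrust::sort(v.begin(), v.end());                   // runs on the GPU
    auto new_end = thrust::unique(v.begin(), v.end());  // drop adjacent duplicates
    v.erase(new_end, v.end());                          // v now holds unique values
}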

Shared memory

maxSharedMemoryPerBlock

    Static: batchnorm_grad_impl.cu

        bias_add_grad_impl.cu

        topk_impl.cu

    Dynamic: multinomial_impl.cu

        layer_norm_impl.cu

        layer_norm_grad_impl.cu

        layer_norm_grad_grad_impl.cu

        transpose_impl_opt.cu

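A minimal sketch of the two declaration styles: static shared memory is sized at compile time, dynamic shared memory is sized per launch via the third <<<>>> parameter, bounded by maxSharedMemoryPerBlock. Kernel names are illustrative.

__global__ void reverse_static(float* out) {
    __shared__ float tile[256];                  // static: fixed at compile time
    tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = tile[255 - threadIdx.x];
}

__global__ void reverse_dynamic(float* out, int n) {
    extern __shared__ float tile[];              // dynamic: sized by the launch
    if (threadIdx.x < n) tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    if (threadIdx.x < n) out[threadIdx.x] = tile[n - 1 - threadIdx.x];
}

// reverse_static<<<1, 256>>>(d_out);
// reverse_dynamic<<<1, 256, 256 * sizeof(float)>>>(d_out, 256);
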
Nsight Compute

    Zhihu: Analysis-driven optimization with Nsight Compute

This post is licensed under CC BY 4.0 by the author.
