CUDA "unspecified launch failure"

This article is taken from: http://www.herikstad.net/2009/05/cuda-unspecified-launch-failure.html

The error "unspecified launch failure" usually means the same thing as a segmentation fault in host code: the kernel touched memory it should not have. Check that your code never accesses elements outside the arrays it uses. A common mistake is indexing shared memory with the full global index ('the whole idx') instead of the thread's index within its block. Here's an example:

int idx = blockIdx.x * blockDim.x + threadIdx.x;
shared[idx] = input[idx];


This will give you an error, because in every block after the first idx is larger than a shared array sized for one block. It should look like this:

int idx = blockIdx.x * blockDim.x + threadIdx.x;
int tid = threadIdx.x;
shared[tid] = input[idx];


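Note: index shared memory with threadIdx only (a block-local index).
      Each block gets its own copy of the shared array, so the index has to stay within that block's allocation.
      (On the hardware of the time, shared memory requests were serviced one half-warp at a time; that affects bank conflicts, not which indices are valid.)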
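
The indexing rule above covers the shared array, but the global arrays can be overrun in the same way when the element count is not a multiple of the block size. Below is a minimal sketch of a guarded version. The names stageCopy, runStageCopy, BLOCK_SIZE and the element count n are assumptions made for illustration; they are not part of the original post.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 128  // assumed block size for this sketch

// Stage each element through shared memory, then write it back out.
__global__ void stageCopy(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK_SIZE];  // one slot per thread in the block

    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int tid = threadIdx.x;                            // block-local index

    // Guard the global read: the last block may extend past n.
    tile[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();  // outside any branch, so every thread reaches it

    if (idx < n)
        out[idx] = tile[tid];
}

// Hypothetical host helper: returns the number of mismatching elements,
// or -1 if any CUDA call failed. Expects n > 0.
int runStageCopy(int n)
{
    std::vector<float> hostIn(n), hostOut(n, -1.0f);
    for (int i = 0; i < n; ++i) hostIn[i] = 0.5f * i;

    size_t bytes = n * sizeof(float);
    float *devIn = nullptr, *devOut = nullptr;
    if (cudaMalloc((void **)&devIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&devOut, bytes) != cudaSuccess) {
        cudaFree(devIn);
        return -1;
    }

    cudaError_t err = cudaMemcpy(devIn, hostIn.data(), bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;  // round up
        stageCopy<<<blocks, BLOCK_SIZE>>>(devIn, devOut, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut.data(), devOut, bytes,
                         cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hostOut[i] != hostIn[i]) ++mismatches;
    return mismatches;
}
```

Calling runStageCopy(1000) from main exercises the guard, since 1000 is not a multiple of 128 and the last block is only partly used.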


My kernel...

__global__ void kernel(float *in, float *out){

  // Declare shared memory.
  // "extern" means the size is set at kernel launch (third <<<...>>> argument).
  extern __shared__ float tempsa[];

  // Global index; assumes the arrays hold exactly gridDim.x * blockDim.x elements.
  int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;

  // Shared memory is indexed with the block-local thread index.
  tempsa[threadIdx.x] = in[globalIdx];
  __syncthreads();
  out[globalIdx] = tempsa[threadIdx.x];
}
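
Because the kernel declares its shared array as extern, the size has to be supplied at launch, and a launch that supplies too little is another way to run off the end of shared memory. The following is a rough sketch of a launch with error checking, not the author's host code. The names copyViaShared and launchCopy are hypothetical, and the kernel is repeated only so the block compiles on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the kernel above, repeated so this sketch is self-contained.
__global__ void copyViaShared(const float *in, float *out)
{
    extern __shared__ float tempsa[];  // sized by the launch below
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    tempsa[threadIdx.x] = in[globalIdx];
    __syncthreads();
    out[globalIdx] = tempsa[threadIdx.x];
}

// Hypothetical helper: launches the kernel and returns the first CUDA error.
// Assumes devIn and devOut each hold exactly blocks * threads floats.
cudaError_t launchCopy(const float *devIn, float *devOut,
                       int blocks, int threads)
{
    // One float of shared memory per thread in the block.
    size_t sharedBytes = threads * sizeof(float);

    copyViaShared<<<blocks, threads, sharedBytes>>>(devIn, devOut);

    // Errors in the launch configuration show up here.
    cudaError_t err = cudaGetLastError();

    // Faults during execution show up once the kernel has finished.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Checking both cudaGetLastError() and cudaDeviceSynchronize() matters because kernel launches are asynchronous: without the synchronize, a fault inside the kernel is usually reported by some later, unrelated CUDA call.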
