『 GoINg mY WAy 』: Common Problems with OpenMP

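I have been learning OpenMP lately. The basics can mostly be picked up from the link in my previous post.
But when I converted my own code to OpenMP, it ran almost 100 times slower than the original...
So I am writing the problems down here, so that I know how to handle them the next time I run into them.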
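1. rand()
rand() keeps its generator state in one hidden global variable that every thread shares, so calling it from several threads at once raises a thread-safety problem.
Implementations that protect that state with a lock (glibc does this) only admit one thread at a time, so the calls end up serialized.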

So far I know of two ways around this.
i. rand() → rand_r()
   drand48() → drand48_r()

Example: rand_r()

double a = 0.0; // assumed to be a double, since drand48_r() later writes into it
#pragma omp parallel firstprivate(a) num_threads(4)
{
    // per-thread seed: the thread number here (any other per-thread value works too)
    unsigned int seed = omp_get_thread_num();

    for (int i = 0; i < 100; i++) {
        a = rand_r(&seed);
        printf("threadnum = %d a = %f\n", omp_get_thread_num(), a);
    }
}
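To see the private-seed idea do some actual work, here is a minimal sketch that estimates pi by Monte Carlo sampling. The function name `estimate_pi` and its parameters are my own assumptions for illustration, not something from the referenced articles. Each thread keeps its own `seed`, so no thread ever waits on shared generator state, and the counts are combined with a reduction.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() when compiling in strict ISO C mode */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not from the original post): estimate pi by
 * sampling random points in the unit square. */
double estimate_pi(long samples_per_thread, unsigned int base_seed)
{
    long hits = 0;   /* points that landed inside the quarter circle */
    long total = 0;  /* points sampled by all threads together */

    #pragma omp parallel reduction(+:hits, total)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Declared inside the region, so every thread owns its seed. */
        unsigned int seed = base_seed + (unsigned int)tid;

        for (long i = 0; i < samples_per_thread; i++) {
            double x = rand_r(&seed) / ((double)RAND_MAX + 1.0);
            double y = rand_r(&seed) / ((double)RAND_MAX + 1.0);
            if (x * x + y * y < 1.0)
                hits++;
            total++;
        }
    }
    return 4.0 * (double)hits / (double)total;
}
```

Calling `estimate_pi(200000, 1u)` should give a value close to 3.14. Compile with `-fopenmp` to run it in parallel; without that flag the pragma is ignored and the same code runs on a single thread.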

Example: drand48_r()

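double a = 0.0; // drand48_r() writes its result into a double
int txt = 0;    // shared pass counter for the do/while loop
#pragma omp parallel firstprivate(a) num_threads(4)
{
    struct drand48_data drand_buf;
    unsigned short int x = 0, y = 0, z = 0;
    long int seed = 0;
    unsigned short int seed16v[3] = {x, y, z};

    seed48_r(seed16v, &drand_buf);
    srand48_r(seed, &drand_buf);
    // Every thread seeds with the same values, so all four threads produce the same results.
    // Remember to initialize drand_buf before the first drand48_r() call
    // (I call both seed48_r and srand48_r; the later call, srand48_r, sets the starting state),
    // otherwise the results may surprise you.
    do {
        for (int i = 0; i < 100; i++) {
            drand48_r(&drand_buf, &a);
            printf("threadnum = %d a = %f\n", omp_get_thread_num(), a);
        }
#pragma omp barrier
#pragma omp single
        {
            txt++;
        }
    } while (txt < 10);
}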
Reference article 1
Reference article 2
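In the drand48_r() example every thread uses the same seed, so every thread prints the same numbers. If you want different numbers per piece of work, one option is sketched below. It assumes glibc, since the `drand48_r()` family is a GNU extension, and the helper names `fill_uniform` and `fill_rows` are made up for illustration. The seed is tied to the row index instead of the thread number, so the output does not change with the number of threads. Note that neighbouring seeds are not guaranteed to give statistically independent streams; that is the job of a proper parallel generator.

```c
#define _DEFAULT_SOURCE  /* exposes the GNU drand48_r() family in glibc */
#include <stdlib.h>

/* Hypothetical helper: fill out[0..n-1] with uniform doubles in [0, 1)
 * using a private drand48_data buffer seeded with `seed`. */
void fill_uniform(long seed, double *out, int n)
{
    struct drand48_data buf;   /* lives on this call's stack: never shared */
    srand48_r(seed, &buf);     /* seed the buffer before the first draw */
    for (int i = 0; i < n; i++)
        drand48_r(&buf, &out[i]);
}

/* Hypothetical helper: fill an n_rows x n_cols matrix (row-major),
 * giving row r the seed base_seed + r. */
void fill_rows(double *rows, int n_rows, int n_cols, long base_seed)
{
    #pragma omp parallel for
    for (int r = 0; r < n_rows; r++)
        fill_uniform(base_seed + r, rows + (size_t)r * n_cols, n_cols);
}
```

For example, `fill_rows(m, 4, 8, 100)` fills a 4 x 8 matrix, and row 1 is exactly what `fill_uniform(101, ...)` would produce on its own, whichever thread happened to handle it.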

ii. Look for a dedicated parallel random number generator
    → SPRNG (I don't know how to use it yet)
    → CUDA  (see the user guide)

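2. malloc & free (I suddenly feel very unlucky... I happened to run into both problems。_。)
The reason is much like the one for rand(): malloc and free work on a heap shared by the whole process, so the memory cannot simply be handed out to each thread independently and the allocator has to coordinate between threads. (I am not sure my understanding here is correct; I still don't know much about multithreading.)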

The following examples are quoted from Jim Dempsey's blog:
When you want each thread to have their own array
double* array = 0;   // *** bad, pointer in wrong scope
                     // ok to do this when shared(array) on pragma
#pragma omp parallel
{
    array = new double[count]; // *** bad all threads sharing same pointer
                                // *** 2nd and later threads overwrite pointer
   ...
   delete [] array; // *** 2nd and later threads returning same memory
}

------------------------------------

#pragma omp parallel
{
    double* array = 0;
    array = new double[count]; // *** good when you want each thread to have separate copy
    ...
    delete [] array; // *** good each thread returning separate copy
}

--------------------

double* array = 0; // OK because of private(array) on pragma
#pragma omp parallel private(array)
{
    array = new double[count]; // *** good when you want each thread to have separate copy
    ...
    delete [] array; // *** good each thread returning separate copy
}

--------------------

double* array = 0;
#pragma omp parallel private(array)
{
    array = new double[count]; // *** good when you want each thread to have separate copy
    ...
}
delete [] array; // *** bad main thread returning one copy

There is nothing wrong with new/delete inside parallel regions, in fact it may be required when you want each thread to have separate data (e.g. for temporary arrays).
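The same pattern in plain C with malloc/free might look like the sketch below. This is my own illustration, not code from Jim Dempsey's post, and the function name `reverse_rows` is hypothetical. The pointer is declared inside the parallel region, so each thread gets its own scratch array, and each thread frees the block it allocated. Allocating once per thread rather than once per row also means fewer trips to the shared allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse every row of an n_rows x n_cols matrix
 * (row-major, n_cols > 0) using a per-thread scratch array.
 * Returns 0 on success, -1 if any thread failed to allocate. */
int reverse_rows(double *rows, int n_rows, int n_cols)
{
    int failed = 0;

    #pragma omp parallel reduction(|:failed)
    {
        /* Declared inside the region: one private pointer per thread. */
        double *scratch = malloc((size_t)n_cols * sizeof *scratch);
        if (scratch == NULL)
            failed = 1;

        #pragma omp for
        for (int r = 0; r < n_rows; r++) {
            if (scratch == NULL)
                continue;  /* this thread could not allocate; skip its rows */
            double *row = rows + (size_t)r * n_cols;
            for (int c = 0; c < n_cols; c++)
                scratch[c] = row[n_cols - 1 - c];
            memcpy(row, scratch, (size_t)n_cols * sizeof *row);
        }

        free(scratch);  /* each thread releases its own block; free(NULL) is a no-op */
    }
    return failed ? -1 : 0;
}
```

Given a 2 x 3 matrix holding 1 2 3 / 4 5 6, a successful call leaves 3 2 1 / 6 5 4 in place and returns 0.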

Link to the original article
