Hmm... my professor asked me how to change the hostname "gauss" to "Gauss",
and I was stumped @_@, because I had no idea either.
So I looked up how to change the hostname.
Debian:
The hostname is stored in
/etc/hostname
After editing that file, run
/etc/init.d/hostname.sh start
then log out and log back in, and the hostname file will be reloaded.
Otherwise, just reboot. XP
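The steps above, collected into one sketch. To keep it safe to try, it writes to a TARGET variable (my stand-in, not from the post); on a real Debian box TARGET would be /etc/hostname and you would run it as root:

```shell
TARGET="${TARGET:-/tmp/hostname.demo}"   # stand-in for /etc/hostname
echo "Gauss" > "$TARGET"                 # the file that stores the hostname
cat "$TARGET"                            # verify the file contents
# then reload it: /etc/init.d/hostname.sh start   (or log out/in, or reboot)
```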
Common OpenMP problems
I've been learning OpenMP recently; the basics can mostly be picked up from the links in my previous post.
But when I converted my code to OpenMP, it ran almost 100x slower than before......
So I'm writing the problems down here... to avoid being stuck the next time I run into them.
1. rand()
rand() keeps its state in global memory, so when multiple threads call it at the same time,
there is a thread-safety problem: in effect only one thread is allowed in at a time, so the calls get serialized.
There are two workarounds so far:
i. rand() → rand_r()
   drand48() → drand48_r()
Example: rand_r()
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
    int a = 0;
    #pragma omp parallel firstprivate(a) num_threads(4)
    {
        unsigned int seed = omp_get_thread_num(); // seed with the thread number (or some other per-thread value)
        for (int i = 0; i < 100; i++) {
            a = rand_r(&seed);                    // reentrant: the state lives in seed, not in global memory
            printf("threadnum = %d a = %d\n", omp_get_thread_num(), a);
        }
    }
    return 0;
}
Example drand48_r()
#define _GNU_SOURCE   // for drand48_r and friends (glibc extensions)
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
    double a = 0.0;
    int txt = 0;                              // shared round counter
    #pragma omp parallel firstprivate(a) num_threads(4)
    {
        struct drand48_data drand_buf;        // per-thread generator state
        unsigned short int seed16v[3] = {0, 0, 0};
        long int seedval = 0;
        seed48_r(seed16v, &drand_buf);
        srand48_r(seedval, &drand_buf);
        // With identical seeds, all four threads produce the same sequence.
        // Remember to initialize with both seed48_r and srand48_r,
        // otherwise the results may surprise you.
        do {
            for (int i = 0; i < 100; i++) {
                drand48_r(&drand_buf, &a);
                printf("threadnum = %d a = %f\n", omp_get_thread_num(), a);
            }
            #pragma omp barrier
            #pragma omp single
            {
                txt++;                        // exactly one thread advances the counter
            }
        } while (txt < 10);
    }
    return 0;
}
Reference article
Reference article 2
ii. Look for other parallel random number generators
→ SPRNG (haven't learned to use it yet)
→ CUDA (see the user guide)
2. malloc & free (I suddenly feel unlucky... I ran into both of these. ._.)
The reason is similar to rand(): the memory is allocated from the global heap, so it can't simply be handed out to each thread separately (I'm not sure my understanding here is correct; I'm still new to multithreading).
The following examples are quoted from Jim Dempsey's blog:
When you want each thread to have their own array
double* array = 0; // *** bad, pointer in wrong scope
// ok to do this when shared(array) on pragma
#pragma omp parallel
{
array = new double[count]; // *** bad all threads sharing same pointer
// *** 2nd and later threads overwrite pointer
...
delete [] array; // *** 2nd and later threads returning same memory
}
------------------------------------
#pragma omp parallel
{
double* array = 0;
array = new double[count]; // *** good when you want each thread to have separate copy
...
delete [] array; // *** good each thread returning separate copy
}
--------------------
double* array = 0; // OK because of private(array) on pragma
#pragma omp parallel private(array)
{
array = new double[count]; // *** good when you want each thread to have separate copy
...
delete [] array; // *** good each thread returning separate copy
}
--------------------
double* array = 0;
#pragma omp parallel private(array)
{
array = new double[count]; // *** good when you want each thread to have separate copy
...
}
delete [] array; // *** bad main thread returning one copy
There is nothing wrong with new/delete inside parallel regions; in fact, it may be required when you want each thread to have separate data (e.g. for temporary arrays).
Link to the original post
Example OpenMP
My research requires it, so I've started learning OpenMP; the posts that follow are all notes from my experiments.
Let's start with the simplest tutorial.
First, how to get OpenMP compiled in with GNU g++:
$ g++ filename.cpp -fopenmp
Without -fopenmp, the compiler simply ignores all the OpenMP directives.
Now let's start testing.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main()
{
for (int i = 0; i < 8; i++) {
printf("thread [%d]: print number %d\n", omp_get_thread_num(), i);
}
return 0;
}
The result is:
thread [0]: print number 0
thread [0]: print number 1
thread [0]: print number 2
thread [0]: print number 3
thread [0]: print number 4
thread [0]: print number 5
thread [0]: print number 6
thread [0]: print number 7
omp_get_thread_num() tells you which thread is currently running; the simplest way to think of it is which CPU core is executing this piece of code.
Clearly, this is just the usual single-threaded style.
Next, add the OpenMP parallel directive:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main()
{
#pragma omp parallel for
for (int i = 0; i < 8; i++) {
printf("thread [%d]: print number %d\n", omp_get_thread_num(), i);
}
return 0;
}
The result is:
thread [1]: print number 2
thread [1]: print number 3
thread [0]: print number 0
thread [0]: print number 1
thread [2]: print number 4
thread [2]: print number 5
thread [3]: print number 6
thread [3]: print number 7
You can see that the for loop is no longer executed by a single thread; different threads run parts of it,
so the output is not the same as in the first example.
Finally, let's look at:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int main()
{
#pragma omp parallel
for (int i = 0; i < 8; i++) {
printf("thread [%d]: print number %d\n", omp_get_thread_num(), i);
}
return 0;
}
The result will be:
thread [2]: print number 0
thread [2]: print number 1
thread [2]: print number 2
thread [2]: print number 3
thread [2]: print number 4
thread [2]: print number 5
thread [2]: print number 6
thread [2]: print number 7
thread [1]: print number 0
thread [1]: print number 1
thread [1]: print number 2
thread [1]: print number 3
thread [1]: print number 4
thread [1]: print number 5
thread [1]: print number 6
thread [1]: print number 7
thread [3]: print number 0
thread [3]: print number 1
thread [3]: print number 2
thread [3]: print number 3
thread [3]: print number 4
thread [3]: print number 5
thread [3]: print number 6
thread [3]: print number 7
thread [0]: print number 0
thread [0]: print number 1
thread [0]: print number 2
thread [0]: print number 3
thread [0]: print number 4
thread [0]: print number 5
thread [0]: print number 6
thread [0]: print number 7
That's all for now, since there is still a lot I don't understand.
Reference articles:
VIML
OpenMP并行程序设计(二)
Task division
Variable environment
Visual Profiler V3.2 on Debian
I wanted to use the Visual Profiler to see where my program could still be improved,
but the one in the v3.2 CUDA toolkit wouldn't even start! It complained:
GLIBCXX_3.4.11 not found (required by computeprof)
I was stunned for a moment; I remember it used to work.
So I went to the NVIDIA forums and found the answer:
Solution (link)
But I'll write it out here anyway, so it's faster to look up next time.
1.
$ /usr/local/cuda/computeprof/bin/computeprof &
computeprof: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by computeprof)
If you see the message above,
first download libstdc++6 from the Debian website:
Debian official website
2.
# dpkg -x libstdc++6_4.4.4-8_amd64.deb /tmp
# cp /tmp/usr/lib/libstdc++.so.* /usr/local/cuda/computeprof/bin/
After that it should work.
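To confirm whether a given libstdc++.so.6 really provides GLIBCXX_3.4.11, you can list the symbol-version tags embedded in the library. This is a generic check I'm adding, not from the original forum answer, and the search paths are assumptions; adjust them for your system:

```shell
# Find a libstdc++.so.6 and list the GLIBCXX version tags it exports.
lib=$(find /usr/lib /usr/lib64 -name 'libstdc++.so.6*' 2>/dev/null | head -n 1)
grep -ao 'GLIBCXX_[0-9.]\{1,\}' "$lib" | sort -u
# If GLIBCXX_3.4.11 is not in the list, that library is too old for computeprof.
```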