Lupin / Lectures_2007 / Introduction to Parallel Programming (.pdf)
How to mix MPI and OpenMP* in one program?
A sequential program working on a data set:
– Replicate the program
– Add glue code
– Break up the data
• Create the MPI program with its data decomposition.
• Use OpenMP inside each MPI process.
*Other names and brands may be claimed as the property of others.
Pi program with MPI and OpenMP*
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

static long num_steps = 100000;

int main(int argc, char *argv[])
{
    int i, my_id, numprocs;
    long my_steps;
    double x, pi, step, sum = 0.0;

    step = 1.0/(double) num_steps;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    my_steps = num_steps/numprocs;

    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = my_id*my_steps; i < (my_id+1)*my_steps; i++) {
        x = (i+0.5)*step;
        sum += 4.0/(1.0+x*x);
    }
    sum *= step;

    MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (my_id == 0) printf("pi = %f\n", pi);
    MPI_Finalize();
    return 0;
}
1D Heat Diffusion Equation – sequential
#include <stdio.h>
#include <stdlib.h>
#define NX 100

int main(void) {
    double ukArray[NX], ukp1Array[NX];
    double *uk = ukArray;
    double *ukp1 = ukp1Array;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;
    double *temp;

    uk[0] = 1.0; uk[NX-1] = 10.0;
    ukp1[0] = 1.0; ukp1[NX-1] = 10.0;
    for (int i = 1; i < NX-1; ++i) uk[i] = 0.0;

    for (int k = 0; k < 10000; ++k) {
        for (int i = 1; i < NX-1; ++i) {
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        temp = ukp1; ukp1 = uk; uk = temp;
    }
    return 0;
}
1D Heat Diffusion Equation – OpenMP*
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define NX 100

int main(void) {
    double ukArray[NX], ukp1Array[NX];
    double *uk = ukArray;
    double *ukp1 = ukp1Array;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;
    double *temp;

    uk[0] = 1.0; uk[NX-1] = 10.0;
    ukp1[0] = 1.0; ukp1[NX-1] = 10.0;
    for (int i = 1; i < NX-1; ++i) uk[i] = 0.0;

    #pragma omp parallel
    for (int k = 0; k < 10000; ++k) {
        #pragma omp for
        for (int i = 1; i < NX-1; ++i) {
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        #pragma omp single
        { temp = ukp1; ukp1 = uk; uk = temp; }
    }
    return 0;
}
1D Heat Diffusion Equation – MPI
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define NX 100
#define NSTEPS 10000

int main(int argc, char *argv[]) {
    double *uk, *ukp1, *temp;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;
    int numProcs, myID, leftNbr, rightNbr, numPoints;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    leftNbr  = myID - 1;  // ID of left "neighbor" process
    rightNbr = myID + 1;  // ID of right "neighbor" process
    numPoints = NX / numProcs;

    // Local points 1..numPoints, plus ghost cells at 0 and numPoints+1
    uk   = malloc(sizeof(double) * (numPoints+2));
    ukp1 = malloc(sizeof(double) * (numPoints+2));
    for (int i = 0; i < numPoints+2; ++i) uk[i] = ukp1[i] = 0.0;
    if (myID == 0)          uk[1] = ukp1[1] = 1.0;                  // left boundary
    if (myID == numProcs-1) uk[numPoints] = ukp1[numPoints] = 10.0; // right boundary

    for (int k = 0; k < NSTEPS; ++k) {
        // Exchange ghost cells with neighboring processes
        if (myID != 0)
            MPI_Send(&uk[1], 1, MPI_DOUBLE, leftNbr, 0, MPI_COMM_WORLD);
        if (myID != numProcs-1)
            MPI_Send(&uk[numPoints], 1, MPI_DOUBLE, rightNbr, 0, MPI_COMM_WORLD);
        if (myID != 0)
            MPI_Recv(&uk[0], 1, MPI_DOUBLE, leftNbr, 0, MPI_COMM_WORLD, &status);
        if (myID != numProcs-1)
            MPI_Recv(&uk[numPoints+1], 1, MPI_DOUBLE, rightNbr, 0, MPI_COMM_WORLD, &status);

        // Update interior points
        for (int i = 2; i < numPoints; ++i)
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);

        // Update edge points, leaving the fixed physical boundaries alone
        if (myID != 0) {
            int i = 1;
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        if (myID != numProcs-1) {
            int i = numPoints;
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        temp = ukp1; ukp1 = uk; uk = temp;
    }
    free(uk); free(ukp1);
    MPI_Finalize();
    return 0;
}
Outline
Parallel programming, wetware, and software
Parallel programming APIs
–Thread Libraries
   –Win32 API
   –POSIX threads
–Compiler Directives
   –OpenMP*
–Message Passing
   –MPI
More complicated examples
Choosing which API is for you
Choosing a parallel programming language
Which should you use?
–If you need to run on clusters, SMPs, and many-core systems, use MPI.
–MPI is the assembly code of parallel programming. It demands very little of the hardware and hence runs “everywhere”.
–If you have very complex data structures that are hard to break into distinct chunks, use one of the shared-memory approaches.
–Both OpenMP* and the thread libraries are based on shared address spaces, so sharing big, complex data structures is easy.
–BUT BE CAREFUL … a shared address space means you might be sharing when you don’t know it, and that can mean race conditions.
–If you are writing complex applications and need to focus on the application, not the API, use OpenMP.
–If you are writing system software and need to control everything, use a thread library.
Summary
Most parallel programs today are written with:
–A low-level threading library (Pthreads or Windows threads)
–MPI
–OpenMP*
Pick one and start becoming familiar with parallel programming.
–Don’t stress too much over picking the best one. Programmers almost always work with multiple languages, and the same holds for parallel programmers and parallel languages.
Tools can help tremendously. Attend the next webinars in this series to learn about Intel’s parallel software tools.
Thank you for attending!
five more in this series… come interact with experts…
http://on24.com/event/36/88/3/rt/1/?eventid=36883
April 3  | A Gentle Introduction to Parallel Software | Dr. Tim Mattson
April 17 | Software Performance Analysis for Multi-Core CPUs and Windows Vista* | Gary Carleton
May 1    | Three Steps to Threading and Performance. Part 1 – Thread Correctness: Maintaining Deterministic Results in Developing, Maintaining and Tuning Threaded Software | Dr. David Mackay
May 15   | Three Steps to Threading and Performance. Part 2 – Expressing Parallelism: Case Studies with Intel® Threading Building Blocks | Victoria Gromova
June 5   | Three Steps to Threading and Performance. Part 3 – Tuning Threaded Software: Next Steps After Concurrency | Vasanth Tovinkere
June 19  | Using Intel® C++ and Fortran Compilers, Version 10.0 for Performance, Multithreading, and Security | Joe Wolf
Read detailed abstracts & sign-up
