Введение в параллельное программирование (Introduction to Parallel Programming), Лупин, Лекции 2007

How to mix MPI and OpenMP* in one program?

Start from a sequential program working on a data set. Replicate the program, add the glue code, and break up the data:

•Create the MPI program with its data decomposition.

• Use OpenMP inside each MPI process.

Get the MPI part done first, then add OpenMP pragmas where it makes sense to do so.

*Other names and brands may be claimed as the property of others.

Pi program with MPI and OpenMP*

#include <mpi.h>
#include <omp.h>

static long num_steps = 100000;   /* step count; value assumed, not shown on the slide */

int main(int argc, char *argv[])
{
    int i, my_id, numprocs;
    double x, pi, step, sum = 0.0;
    long my_steps;

    step = 1.0/(double) num_steps;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    my_steps = num_steps/numprocs;

    /* each MPI process integrates its own block of steps;
       OpenMP threads split that block and combine via the reduction */
    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = my_id*my_steps; i < (my_id+1)*my_steps; i++) {
        x = (i+0.5)*step;
        sum += 4.0/(1.0+x*x);
    }
    sum *= step;

    /* combine the per-process partial sums on rank 0 */
    MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

*Other names and brands may be claimed as the property of others.

1 D Heat Diffusion Equation - sequential

#include <stdio.h>
#include <stdlib.h>
#define NX 100

int main(void) {
    double ukArray[NX], ukp1Array[NX];
    double *uk = ukArray;
    double *ukp1 = ukp1Array;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;   /* dt/(dx*dx) = 0.5: the explicit stability limit */
    double *temp;

    /* boundary conditions and initial state */
    uk[0] = 1.0;  uk[NX-1] = 10.0;
    ukp1[0] = 1.0; ukp1[NX-1] = 10.0;
    for (int i = 1; i < NX-1; ++i) uk[i] = 0.0;

    for (int k = 0; k < 10000; ++k) {
        for (int i = 1; i < NX-1; ++i) {
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        /* swap the old and new time-step arrays */
        temp = ukp1; ukp1 = uk; uk = temp;
    }
    return 0;
}
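A useful sanity check on this kernel: with the boundaries held at 1.0 and 10.0, the iteration relaxes toward the steady state, which for 1-D diffusion is a straight line between the boundary values. Here is a hedged variant of the same loop, packaged as a function so the final profile can be checked (the function name is ours, and the step count is a parameter so the run can be driven to convergence):

```c
#include <math.h>

#define NX 100

/* Run nsteps of the explicit diffusion update from the slide and
   return the largest deviation from the linear steady-state profile. */
double heat_steady_error(int nsteps)
{
    double ukArray[NX], ukp1Array[NX];
    double *uk = ukArray, *ukp1 = ukp1Array, *temp;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;

    uk[0] = 1.0;  uk[NX-1] = 10.0;
    ukp1[0] = 1.0; ukp1[NX-1] = 10.0;
    for (int i = 1; i < NX-1; ++i) uk[i] = 0.0;

    for (int k = 0; k < nsteps; ++k) {
        for (int i = 1; i < NX-1; ++i)
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        temp = ukp1; ukp1 = uk; uk = temp;
    }

    double err = 0.0;
    for (int i = 0; i < NX; ++i) {
        double exact = 1.0 + 9.0*i/(NX-1);   /* straight line from 1.0 to 10.0 */
        double d = fabs(uk[i] - exact);
        if (d > err) err = d;
    }
    return err;
}
```

The same check applies unchanged to the OpenMP and MPI versions that follow, which makes it handy when verifying that a parallelization did not change the numerics.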

1 D Heat Diffusion Equ. – OpenMP*

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define NX 100

int main(void) {
    double ukArray[NX], ukp1Array[NX];
    double *uk = ukArray;
    double *ukp1 = ukp1Array;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;
    double *temp;

    uk[0] = 1.0;  uk[NX-1] = 10.0;
    ukp1[0] = 1.0; ukp1[NX-1] = 10.0;
    for (int i = 1; i < NX-1; ++i) uk[i] = 0.0;

    /* one parallel region spans all time steps; note the directive
       must be lower-case "omp", not "OMP" */
    #pragma omp parallel
    for (int k = 0; k < 10000; ++k) {
        #pragma omp for
        for (int i = 1; i < NX-1; ++i) {
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1] - 2*uk[i] + uk[i-1]);
        }
        /* exactly one thread swaps the pointers; "single" carries an
           implied barrier, so no thread starts step k+1 early */
        #pragma omp single
        { temp = ukp1; ukp1 = uk; uk = temp; }
    }
    return 0;
}

*Other names and brands may be claimed as the property of others.

1 D Heat Diffusion Equation - MPI

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define NX 100
#define NSTEPS 10000

int main(int argc, char *argv[]) {
    double *uk, *ukp1, *temp;
    double dx = 1.0/NX;
    double dt = 0.5*dx*dx;
    int numProcs, myID, leftNbr, rightNbr;
    int numPoints;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    leftNbr = myID - 1;   // ID of left "neighbor" process
    rightNbr = myID + 1;  // ID of right "neighbor" process
    numPoints = NX / numProcs;

    /* local points 1..numPoints, plus ghost cells 0 and numPoints+1 */
    uk   = malloc(sizeof(double) * (numPoints+2));
    ukp1 = malloc(sizeof(double) * (numPoints+2));

    /* initial state; the fixed boundary values live on the end ranks */
    for (int i = 0; i < numPoints+2; ++i) uk[i] = 0.0;
    if (myID == 0)          { uk[1] = 1.0;          ukp1[1] = 1.0; }
    if (myID == numProcs-1) { uk[numPoints] = 10.0; ukp1[numPoints] = 10.0; }

    for (int k = 0; k < NSTEPS; ++k) {
        /* exchange ghost cells with the neighbors */
        if (myID != 0) MPI_Send(&uk[1], 1, MPI_DOUBLE,
            leftNbr, 0, MPI_COMM_WORLD);
        if (myID != numProcs-1) MPI_Send(&uk[numPoints],
            1, MPI_DOUBLE, rightNbr, 0, MPI_COMM_WORLD);
        if (myID != 0) MPI_Recv(&uk[0], 1, MPI_DOUBLE, leftNbr,
            0, MPI_COMM_WORLD, &status);
        if (myID != numProcs-1) MPI_Recv(&uk[numPoints+1], 1,
            MPI_DOUBLE, rightNbr, 0, MPI_COMM_WORLD, &status);

        /* update the interior points, which need no ghost data */
        for (int i = 2; i < numPoints; ++i)
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1]-2*uk[i]+uk[i-1]);

        /* update the edge points that used the ghost cells; the global
           boundary points on ranks 0 and numProcs-1 stay fixed */
        if (myID != 0) {
            int i = 1;
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1]-2*uk[i]+uk[i-1]);
        }
        if (myID != numProcs-1) {
            int i = numPoints;
            ukp1[i] = uk[i] + (dt/(dx*dx))*(uk[i+1]-2*uk[i]+uk[i-1]);
        }
        temp = ukp1; ukp1 = uk; uk = temp;
    }

    MPI_Finalize();
    return 0;
}

Outline

Parallel programming, wetware, and software

Parallel programming APIs

Thread Libraries

Win32 API

POSIX threads

Compiler Directives

OpenMP*

Message Passing

MPI

More complicated examples

Choosing which API is for you

*Other names and brands may be claimed as the property of others.

Choosing a parallel programming language

Which should you use?

If you need to run on clusters, SMPs, and many-core systems, use MPI.

MPI is the assembly code of parallel programming. It demands very little of the hardware and hence runs “everywhere”.

If you have very complex data structures that are hard to break into distinct chunks, use one of the shared-memory approaches.

Both OpenMP* and the thread libraries are based on shared address spaces, so sharing big, complex data structures is easy.

BUT BE CAREFUL … a shared address space means you might be sharing when you don’t know it, and that can mean race conditions.
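To make the race warning concrete, here is a small sketch in the spirit of the pi example: every thread adds into one shared accumulator, so without the reduction clause updates can interleave and increments get lost; reduction(+:sum) gives each thread a private copy and combines them safely at the end. (The function name is ours; compiled without OpenMP, the pragma is simply ignored and the code runs serially with the same result.)

```c
#include <math.h>

static const long num_steps = 1000000;

/* Midpoint-rule integration of 4/(1+x^2) on [0,1], which equals pi.
   The reduction clause is what prevents a data race on "sum". */
double calc_pi(void)
{
    double step = 1.0/(double)num_steps;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5)*step;   /* x is private: declared inside the loop */
        sum += 4.0/(1.0 + x*x);
    }
    return step*sum;
}
```

Dropping the reduction clause while keeping sum shared would still compile cleanly, which is exactly why shared-memory races are easy to ship by accident.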

If you are writing complex applications and need to focus on the application not the API, use OpenMP.

If you are writing system software and need to control everything, use a thread library.

*Other names and brands may be claimed as the property of others.
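For the "control everything" case, here is a minimal POSIX-threads sketch (struct and function names are illustrative) showing the kind of explicit management a thread library gives you: you partition the work, create each thread, pass it its arguments, and join it yourself.

```c
#include <pthread.h>

#define NTHREADS 4
#define N 1000000

/* Each thread sums its own block of integers; main joins and combines. */
struct work { long lo, hi; double sum; };

static void *partial_sum(void *arg)
{
    struct work *w = arg;
    double s = 0.0;
    for (long i = w->lo; i < w->hi; ++i)
        s += (double)i;
    w->sum = s;   /* no race: each thread writes only its own struct */
    return NULL;
}

double threaded_sum(void)
{
    pthread_t tid[NTHREADS];
    struct work w[NTHREADS];
    double total = 0.0;

    for (int t = 0; t < NTHREADS; ++t) {
        w[t].lo = (long)N * t / NTHREADS;
        w[t].hi = (long)N * (t+1) / NTHREADS;
        pthread_create(&tid[t], NULL, partial_sum, &w[t]);
    }
    for (int t = 0; t < NTHREADS; ++t) {
        pthread_join(tid[t], NULL);
        total += w[t].sum;   /* safe: the thread has finished */
    }
    return total;
}
```

Everything OpenMP does with one pragma (thread creation, work splitting, the final combine) is spelled out by hand here, which is the trade-off: more code, but full control over scheduling and lifetime.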

Summary

Most Parallel programs today are written with:

A low level threading library (Pthreads or Windows threads)

MPI

OpenMP*

Pick one and start becoming familiar with parallel programming.

Don’t stress too much about picking the best one. Programmers almost always work with multiple languages, and the same holds for parallel programmers and parallel languages.

Tools can help tremendously. Attend the next webinars in this series to learn about Intel’s parallel software tools.

*Other names and brands may be claimed as the property of others.

Thank you for attending!

Five more in this series… come interact with experts…

http://on24.com/event/36/88/3/rt/1/?eventid=36883

April 3: A Gentle Introduction to Parallel Software (Dr. Tim Mattson)

April 17: Software Performance Analysis for Multi-Core CPUs and Windows Vista* (Gary Carleton)

May 1: Three Steps to Threading and Performance, Part 1 – Thread Correctness: Maintaining Deterministic Results in Developing, Maintaining and Tuning Threaded Software (Dr. David Mackay)

May 15: Three Steps to Threading and Performance, Part 2 – Expressing Parallelism: Case Studies with Intel® Threading Building Blocks (Victoria Gromova)

June 5: Three Steps to Threading and Performance, Part 3 – Tuning Threaded Software: Next Steps After Concurrency (Vasanth Tovinkere)

June 19: Using Intel® C++ and Fortran Compilers, Version 10.0 for Performance, Multithreading, and Security (Joe Wolf)

Read detailed abstracts & sign-up

*Other names and brands may be claimed as the property of others.