
- Explanatory note
- Introduction
- 1. Analysis of the given task
- 1.1 Creation of the system of equations
- 1.2 Topological analysis: definition of tree and antitree branches, definition of matrices
- 1.3 Transforming the given system of equations into a form that can be solved by the Euler method
- Matrix inversion
- 2. Development of the serial program for the given task
- 3. Development of the parallel program for the given task
- 3.1 Preliminary analysis of possible parallelization variants
- Virtual speedup and efficiency of resource use
- 3.2 Development of the parallel program
- int MPI_Init(int *argc, char ***argv)
- int MPI_Barrier(MPI_Comm comm)
- Conclusion
- References
int MPI_Barrier(MPI_Comm comm)
IN comm - communicator.
Barrier synchronization is used, for example, when all processes must complete some stage of the solution whose results are needed at a later stage. The barrier guarantees that no process proceeds to the next stage until every process of the communicator has reached the barrier, that is, until the results of the previous stage are fully formed. MPI_Barrier is a collective operation, so every process of the communicator comm must call it. Other collective functions may also synchronize the processes implicitly.
Data is distributed to all processes with MPI_Bcast. The process with rank root sends the message from its buffer to all processes of the communicator comm.
int MPI_Bcast (void * buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
INOUT buffer - starting address of the memory area that holds the data to be sent;
IN count - the number of elements to be sent;
IN datatype - type of the elements to be sent;
IN root - rank of the sending process;
IN comm - communicator.
When the call completes, every process of the communicator comm, including the sender itself, holds a copy of the message sent by the process root.
Fig. 3.4 shows a graphic interpretation of the Bcast operation.
Figure 3.4 - Graphic interpretation of the Bcast operation.
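The copy semantics of the broadcast can be illustrated without an MPI runtime. The sketch below is a single-process simulation, not MPI code: `simulate_bcast` and `bcast_demo` are hypothetical helpers introduced only for this illustration. Each simulated process is one entry of a `std::vector`, and the helper copies the first `count` elements of the root's buffer into every other buffer, which is the observable effect described for MPI_Bcast.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MPI): models the observable effect of
// MPI_Bcast inside one process. buffers[r] plays the role of the buffer
// owned by the process with rank r.
void simulate_bcast(std::vector<std::vector<float>>& buffers,
                    std::size_t count, std::size_t root)
{
    assert(root < buffers.size());
    assert(count <= buffers[root].size());
    for (std::size_t r = 0; r < buffers.size(); ++r) {
        if (r == root) continue;               // the root keeps its own data
        assert(count <= buffers[r].size());
        for (std::size_t k = 0; k < count; ++k)
            buffers[r][k] = buffers[root][k];  // every other rank gets a copy
    }
}

// Builds three simulated processes, broadcasts from rank 0 and reports
// whether every buffer now equals the root's buffer.
bool bcast_demo()
{
    std::vector<std::vector<float>> buffers = {
        {1.0f, 2.0f, 3.0f},   // rank 0 (root) holds the data
        {0.0f, 0.0f, 0.0f},   // rank 1
        {0.0f, 0.0f, 0.0f}    // rank 2
    };
    simulate_bcast(buffers, 3, 0);
    return buffers[1] == buffers[0] && buffers[2] == buffers[0];
}
```

In a real MPI program all ranks issue the same MPI_Bcast call with the same root; the loop over `r` here only stands in for that collective behaviour.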
The results of execution are shown in fig. 3.5.
Figure 3.5 - Results of execution of parallel program
The efficiency of the parallel solution is compared with that of the serial solution using the following indicators:
Acceleration (speedup)
Efficiency
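Both indicators follow from two measured run times. The sketch below uses the standard definitions, acceleration (speedup) S = T_serial / T_parallel and efficiency E = S / p for p processes; the function names are chosen for this illustration only, and the run times are assumed to be positive.

```cpp
#include <cassert>

// Acceleration (speedup): how many times faster the parallel run is.
// Assumes both times are positive and measured in the same units.
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

// Efficiency: the speedup divided by the number of processes used,
// i.e. the fraction of the ideal linear speedup that was achieved.
double efficiency(double t_serial, double t_parallel, int processes)
{
    return speedup(t_serial, t_parallel) / processes;
}
```

For example, with hypothetical times of 8 s for the serial run and 2 s for the parallel run on 8 processes (illustrative numbers, not measurements from this work), the speedup is 4 and the efficiency is 0.5.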
Conclusion
In this course work, a program was developed that forms the system of equations and solves it serially.
Approaches to parallelization were considered, together with the performance parameters of the solution on a parallel computer system:
Load of the processes;
Ratio between the volume of computation and the volume of data exchange operations;
Virtual acceleration.
While developing the parallel program, the MPI features that were used and the main characteristics of autocommutation systems were considered.
Finally, the efficiency of the parallel solution was compared with that of the serial one.
References
Абрамов Ф.А., Фельдман Л.П., Святный В.А. Моделирование динамических процессов рудничной аэрологии. К.: Наук. думка, 1981. – 284 с.
Святний В.А. Проблеми паралельного моделювання складних динамічних систем. Наукові праці ДонДТУ, Серія ІКОТ, вип. 6, 1999, с. 6-14.
Святний В.А., Молдованова О.В., Перерва А.О. Проблемно орієнтоване паралельне моделююче середовище для динамічних мережних об’єктів. Наукові праці ДонДТУ, Серія ІКОТ, вип. 29, 2001, с. 246-253.
Перерва А.А. Магістерська дисертація, ДонДТУ, Донецьк, 2000.
Parallaxis Version 2 User Manual / Ingo Barth, Thomas Braunl, Stefan Engelhardt, Frank Sembach. Universitat Stuttgart, Fakultat Informatik, Feb.1991.
Паралельне програмування: Початковий курс: Навч. посібник / Вступ. слово А. Ройтера; Пер. з нім. В.А.Святного. – К.: Вища шк., 1997. – 358 с.: іл.
Т.Бройнль. Паралельне програмування (переклад з німецької мови В.А. Святного), Київ: ВШ, 1997, 358с.
Корнеев В.В. Параллельные вычислительные системы. – М.: «Нолидж», 1999, 320с.
Воеводин В.В., Воеводин Вл.В. Параллельные вычисления. – «БХВ-Петербург», 2002, 599с.
Немнюгин С.А., Стесик О.А., «Параллельное программирование для многопроцессорных вычислительных систем». – «БХВ-Петербург», 2002, 396с.
Богачев К.Ю. Основы параллельного программирования.- М.:Бионом – 2003, -342с.
Хьюз К., Хьюз Т. Параллельное и распределенное программирование с использованием С++ .– М.: «Вильямс», 2004, 672с.
Хорошевский В.Г. Архитектура вычислительной системы, - М.:МГТУ, 2008, - 520с.
Гергель В.П. Теория и практика параллельных вычислений.
APPENDIX A
Code of the serial implementation of the Euler method
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <math.h>
#include <windows.h>
#define ERR_CODE 1
#define EPS 0.0001
#define STEP 0.1
using namespace std;
FILE *fin,*fout1,*fout2,*fout3,*fout4,*fout5,*fout6, *fout7, *fout8, *fout9, *fout10, *fout11;
FLOAT H[11][11],TPH[11][11],RUZ[11][11],F[11][11],W[11][11],TP[11][11],RU[11][11],Z[11][11],U[11][11],Y[11][11],X[11][11], Ut[11], Uold[4], Xold[3];
//---------------------------------------------------
VOID read_matrix(FLOAT matrix[11][11],INT n1,INT m1, FILE *f)
{
for(INT i=0;i<n1;i++)
{
for(INT j=0;j<m1;j++)
fscanf(f,"%f",&matrix[i][j]);
}
}
//---------------------------------------------------
VOID write_matrix(FLOAT matrix[11][11],INT n1,INT m1)
{
for(INT i=0;i<n1;i++)
{
printf("\n");
for(INT j=0;j<m1;j++)
printf("\t%f ",matrix[i][j]);
}
printf("\n");
}
//---------------------------------------------------
VOID write_vector(FLOAT matrix[11][11],INT n1)
{
INT i;
printf("\n");
for(i=0;i<n1;i++)
{
printf("\t%3.3f ",matrix[i][0]);
}
printf("\n");
}
//-----------------------------------------------------
VOID mul_matrix(FLOAT X[11][11],FLOAT Y[11][11],FLOAT C[11][11], INT n, INT m, INT p)
{
for(INT i=0;i<n;i++)
for(INT j=0;j<p;j++)
{
C[i][j]=0;
for (INT k=0;k<m;k++) C[i][j]=C[i][j]+X[i][k]*Y[k][j];
}
}
//-----------------------------------------------------
VOID mul_matrix_chislo(FLOAT X[11][11],FLOAT Y,FLOAT C[11][11], INT n, INT m)
{
for(INT i=0;i<n;i++)
for(INT j=0;j<m;j++)
{
C[i][j]=X[i][j]*Y;
}
}
//------------------------------------------------------
VOID mul_vector_chislo(FLOAT X[11][11],FLOAT Y,FLOAT C[11][11], INT n)
{
for(INT i=0;i<n;i++)
C[i][0]=X[i][0]*Y;
}
//------------------------------------------------------
VOID add_matrix(FLOAT X[11][11],FLOAT Y[11][11],FLOAT C[11][11], INT n, INT m)
{
for(INT i=0;i<n;i++)
for(INT j=0;j<m;j++)
{
C[i][j]=X[i][j]+Y[i][j];
}
}
//------------------------------------------------------
VOID sub_matrix(FLOAT X[11][11],FLOAT Y[11][11],FLOAT C[11][11], INT n, INT m)
{
for(INT i=0;i<n;i++)
for(INT j=0;j<m;j++)
{
C[i][j]=X[i][j]-Y[i][j];
}
}
//------------------------------------------------------
VOID sub_vector(FLOAT X[11][11],FLOAT Y[11][11],FLOAT C[11][11], INT n)
{
for(INT i=0;i<n;i++)
C[i][0]=X[i][0]-Y[i][0];
}
//---------------------------------------------------------
VOID add_vector(FLOAT X[11][11],FLOAT Y[11][11],FLOAT C[11][11], INT n)
{
for(INT i=0;i<n;i++)
C[i][0]=X[i][0]+Y[i][0];
}
//---------------------------------------------------------
VOID write_vectorXY(FLOAT x[11][11],FLOAT y[11][11])
{
INT i;
printf("\n");
for(i=0;i<3;i++)
{
printf("\t%3.3f ",x[i][0]);
}
for(i=0;i<4;i++)
{
printf("\t%3.3f ",y[i][0]);
}
printf("\n");
}
//-----------------------------------------------------
VOID write_vectorXYfile(FLOAT x[11][11],FLOAT y[11][11])
{
// write the values passed as parameters rather than the globals X and U
fprintf(fout1,"\n%5.3f",x[0][0]);
fprintf(fout2,"\n%5.3f",x[1][0]);
fprintf(fout3,"\n%5.3f",x[2][0]);
fprintf(fout4,"\n%5.3f",y[0][0]);
fprintf(fout5,"\n%5.3f",y[1][0]);
fprintf(fout6,"\n%5.3f",y[2][0]);
fprintf(fout7,"\n%5.3f",y[3][0]);
}
//-----------------------------------------------------
VOID init(VOID)
{
if((fin=fopen("data.txt","r"))==0)
{
cout<<"Error read file data.txt"<<endl;
exit(ERR_CODE);
}
if((fout1=fopen("file-out1.txt","w+"))==0)
{
cout<<"Error open fout1";
exit(ERR_CODE);
}
if((fout2=fopen("file-out2.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
if((fout3=fopen("file-out3.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
if((fout4=fopen("file-out4.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
if((fout5=fopen("file-out5.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
if((fout6=fopen("file-out6.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
if((fout7=fopen("file-out7.txt","w+"))==0)
{
cout<<"Error open fout";
exit(ERR_CODE);
}
SecureZeroMemory(&W,sizeof(W));
SecureZeroMemory(&H,sizeof(H));
SecureZeroMemory(&TPH,sizeof(TPH));
SecureZeroMemory(&RUZ,sizeof(RUZ));
SecureZeroMemory(&F,sizeof(F));
SecureZeroMemory(&TP,sizeof(TP));
SecureZeroMemory(&RU,sizeof(RU));
SecureZeroMemory(&Z,sizeof(Z));
SecureZeroMemory(&Y,sizeof(Y));
SecureZeroMemory(&X,sizeof(X));
SecureZeroMemory(&U,sizeof(U));
SecureZeroMemory(&Ut,sizeof(Ut));
for(INT i=0;i<7;i++)
fscanf(fin,"%f",&H[i][0]);
read_matrix(W,3,4,fin);
read_matrix(TP,4,7,fin);
read_matrix(RU,4,7,fin);
printf("\n\t\tVector E\n");
for(INT i=0;i<7;i++)
printf("\t%5.2f",H[i][0]);
printf("\n\t\tMatrix W");
write_matrix(W,3,4);
printf("\n\t\tMatrix TP");
write_matrix(TP,4,7);
printf("\n\t\tMatrix RU");
write_matrix(RU,4,7);
X[0][0]=0.001;
X[1][0]=0.001;
X[2][0]=0.001;
U[0][0]=0.001;
U[1][0]=0.001;
U[2][0]=0.001;
U[3][0]=0.001;
}
//-----------------------------------------------------
INT main(INT argc, TCHAR* argv[])
{
SYSTEMTIME stime,etime;
DWORD min,sec,ms;
INT cnt = 0;
BOOL done = FALSE;
init();
getch();
// TP (4x7) * H (7x1) => TPH (4x1), constant over all iterations
GetSystemTime(&stime);
mul_matrix(TP,H,TPH,4,7,1);
//cout<<"TPH"<<endl;
//write_matrix(TPH,4,1);
// Euler method
for(INT j=0;j<150000;j++)
{
// fill Z[0..2] with the squares of X[0..2] (the x values)
for (INT i = 0; i<3; i++)
Z[i][0] = X[i][0]*X[i][0];
// fill Z[3..6] with the squares of U[0..3] (the y values)
for (INT i = 0; i<4; i++)
Z[i+3][0] = U[i][0]*U[i][0];
mul_matrix(RU,Z,RUZ,4,7,1);
sub_vector(TPH,RUZ,F,4);
mul_vector_chislo(F,STEP,F,4);
for(INT i=0;i<4;i++)
Uold[i] = U[i][0];
add_vector(U,F,U,4);
for(INT i=0;i<3;i++)
Xold[i] = X[i][0];
mul_matrix(W,U,X,3,4,1);
mul_vector_chislo(X,-1,X,3);
//cout<<"X ="<<X[0][0]<<" "<<X[1][0]<<" "<<X[2][0]<<endl;
//cout<<"DEBUG!!!!!! ENDED"<<endl<<endl;
write_vectorXY(X,U);
write_vectorXYfile(X,U);
// U[i][0] - vector of y values at the next step
// Uold[], Xold[] - x/y values at the current step
// X[0..2][0] - x values at the next step
if((fabs(U[0][0]-Uold[0])<=EPS)&& //Y1
(fabs(U[1][0]-Uold[1])<=EPS)&& //Y2
(fabs(U[2][0]-Uold[2])<=EPS)&& //Y3
(fabs(U[3][0]-Uold[3])<=EPS)&& //Y4
(fabs(X[0][0]-Xold[0])<=EPS)&& //X1
(fabs(X[1][0]-Xold[1])<=EPS)&& //X2
(fabs(X[2][0]-Xold[2])<=EPS)&& //X3
(j>1))
{
cout<<endl<<"Amount of iterations = "<<j<<endl;
done = TRUE;
break;
}
cnt++;
}
if (!done)
{
cout<<"Not done or done with errors"<<endl;
getch();
}
GetSystemTime(&etime);
// elapsed time in milliseconds; assumes the run takes less than one hour
LONG elapsed=((LONG)etime.wMinute-(LONG)stime.wMinute)*60000
+((LONG)etime.wSecond-(LONG)stime.wSecond)*1000
+((LONG)etime.wMilliseconds-(LONG)stime.wMilliseconds);
if (elapsed<0)
elapsed+=3600000; // the hour rolled over during the run
min=elapsed/60000;
sec=(elapsed/1000)%60;
ms=elapsed%1000;
cout<<"\n-----------------------------------------"<<endl;
cout<<"Time of solving "<<min<<":"<<sec<<":"<<ms<<endl;
getch();
return NO_ERROR;
}
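The update performed inside the main loop of the listing above can be condensed into one function. The following is a minimal sketch under the assumption that the scheme is exactly the one in the listing: Z holds the squares of X and U, U advances by an explicit Euler step U + h*(TPH - RU*Z), and X is then recomputed as -W*U. The `State` type, the `euler_step` name and the use of `std::array` are choices made for this illustration and do not appear in the original program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NX = 3;  // number of X unknowns
constexpr std::size_t NU = 4;  // number of U unknowns

struct State {
    std::array<float, NX> x;
    std::array<float, NU> u;
};

// One explicit Euler step of size h:
//   Z = (x0^2, x1^2, x2^2, u0^2, u1^2, u2^2, u3^2)
//   U_next = U + h * (TPH - RU * Z)
//   X_next = -W * U_next
State euler_step(const State& s,
                 const std::array<float, NU>& tph,
                 const std::array<std::array<float, NX + NU>, NU>& ru,
                 const std::array<std::array<float, NU>, NX>& w,
                 float h)
{
    std::array<float, NX + NU> z{};
    for (std::size_t i = 0; i < NX; ++i) z[i] = s.x[i] * s.x[i];
    for (std::size_t i = 0; i < NU; ++i) z[NX + i] = s.u[i] * s.u[i];

    State next{};
    for (std::size_t i = 0; i < NU; ++i) {
        float ruz = 0.0f;                       // i-th element of RU * Z
        for (std::size_t k = 0; k < NX + NU; ++k) ruz += ru[i][k] * z[k];
        next.u[i] = s.u[i] + h * (tph[i] - ruz);
    }
    for (std::size_t i = 0; i < NX; ++i) {
        float wu = 0.0f;                        // i-th element of W * U_next
        for (std::size_t k = 0; k < NU; ++k) wu += w[i][k] * next.u[k];
        next.x[i] = -wu;
    }
    return next;
}
```

Calling `euler_step` repeatedly until successive states differ by less than a tolerance reproduces the stopping rule of the listing, where the tolerance is EPS and the step is STEP.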
APPENDIX B
Code of the parallel implementation of the Euler method
#include "stdafx.h"
#include <mpi.h>
#include <conio.h>
#include <stdio.h>
#include <math.h>
#include <Windows.h>
#define ROOT 0
//---------------------------------------------------
#define MPI_START_ERR 1
#define NOT_A_LOT_OF_PROCS 2
#define FILE_WORK_ERR 3
#define MIN_AMNT_PROC 7
#define XY_AMNT 7
#define STEPAMNT 548
#define STEP 0.1f
#define MSIZE 11 // must match the [11][11] bounds in the function prototypes below
//---Prototypes of functions
//---------------------------------------------------
VOID read_matrix(FLOAT matrix[11][11],INT n1,INT m1, FILE *f);
//---------------------------------------------------
// computes one element: C[rank-displ] = sign * (row rank-displ of Y) * Z
VOID mul_add_sub_matrixex(INT sign, FLOAT Y[11][11],FLOAT Z[11],FLOAT C[11], INT displ, INT n, INT rank);
//---------------------------------------------------
INT _tmain(INT argc, CHAR* argv[])
{
FLOAT H[MSIZE];
FLOAT TP[MSIZE][MSIZE];
FLOAT RU[MSIZE][MSIZE];
FLOAT W[MSIZE][MSIZE];
FILE *fout=NULL;
CHAR fName[MAX_PATH];
FLOAT X[XY_AMNT];
FLOAT U[XY_AMNT];
FLOAT XY[XY_AMNT];
FLOAT XYpow2[XY_AMNT];
FLOAT TPH[XY_AMNT];
FLOAT RUZ[XY_AMNT];
FLOAT F[XY_AMNT];
double starttime, endtime;
INT proccnt, rank, rc;
//-----------------------------------------------
// Initialize MPI and check whether the initialization succeeded
rc = MPI_Init(&argc, &argv);
if (rc != MPI_SUCCESS)
{
printf ("Error starting MPI program. Terminating.\n");
// MPI was not initialized, so MPI_Finalize is not called here
return MPI_START_ERR;
}
//------------------------------------------------
// Determine the number of processes and the rank of this process
MPI_Comm_size(MPI_COMM_WORLD, &proccnt);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if(proccnt < MIN_AMNT_PROC)
{
if (rank == 0)
printf("You need %d processes at minimum, you have %d \n",MIN_AMNT_PROC,proccnt);
MPI_Finalize();
return NOT_A_LOT_OF_PROCS;
}
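//initialization of memory
// the input arrays are zeroed as well, because only part of each matrix is read from the file
SecureZeroMemory(&H,sizeof(H));
SecureZeroMemory(&W,sizeof(W));
SecureZeroMemory(&TP,sizeof(TP));
SecureZeroMemory(&RU,sizeof(RU));
SecureZeroMemory(&X,sizeof(X));
SecureZeroMemory(&U,sizeof(U));
SecureZeroMemory(&XY,sizeof(XY));
SecureZeroMemory(&XYpow2,sizeof(XYpow2));
SecureZeroMemory(&TPH,sizeof(TPH));
SecureZeroMemory(&RUZ,sizeof(RUZ));
SecureZeroMemory(&F,sizeof(F));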
//-----------------------------------------------
if (rank == 0)
{
FILE *fin;
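if((fin=fopen("data.txt","r"))==0)
{
printf("Error read file data.txt");
// abort all processes; returning from rank 0 alone would leave the other ranks blocked in the collectives below
MPI_Abort(MPI_COMM_WORLD, FILE_WORK_ERR);
return FILE_WORK_ERR;
}
// reading H W TP RU
for(INT i=0;i<7;i++)
fscanf(fin,"%f",&H[i]);
read_matrix(W,3,4,fin);
read_matrix(TP,4,7,fin);
read_matrix(RU,4,7,fin);
fclose(fin);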
} //if
//-----------------------------------------------
MPI_Barrier(MPI_COMM_WORLD);
// sending from proc_0 to all processes H W TP RU
// sending H
MPI_Bcast(&H, 7, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
// sending W (the whole array)
MPI_Bcast(&W, MSIZE*MSIZE, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
// sending TP (the whole array)
MPI_Bcast(&TP, MSIZE*MSIZE, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
// sending RU (the whole array)
MPI_Bcast(&RU, MSIZE*MSIZE, MPI_FLOAT, ROOT, MPI_COMM_WORLD);
//-----------------------------------------------
for (INT i=0;i<XY_AMNT;i++)
{
if (rank==i)
{
//opening files
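//every proc opens its own file
sprintf_s(&fName[0],sizeof(fName),"file-out%d.txt",i+1);
if(fopen_s(&fout,fName,"w+")!=0)
{
printf("Can't create file %s",fName);
// abort all processes so that the remaining ranks do not block in later collectives
MPI_Abort(MPI_COMM_WORLD, FILE_WORK_ERR);
return FILE_WORK_ERR;
} //if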
}// if
//TPH=TP*H
//each process computes its own element of TPH
if (rank<XY_AMNT)
TPH[rank]=TPH[rank]+TP[rank][i]*H[i];
} // for
//-----------------------------------------------
for (INT i=0;i<XY_AMNT;i++)
MPI_Bcast(&TPH[i],1, MPI_FLOAT, i, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
// saving current time
starttime = MPI_Wtime();
// Euler method
//-----------------------------------------------
for(INT j=0;j<STEPAMNT;j++)
{
//X = -W*Y (U)
if(rank<3)
{
mul_add_sub_matrixex(-1,W,U,X,0,XY_AMNT,rank);
}
//BCAST X
for(INT i=0;i<3;i++)
MPI_Bcast(&X[i],1, MPI_FLOAT, i, MPI_COMM_WORLD);
//getting XY and XYpow2
for (INT i=0;i<3;i++)
XY[i]=X[i];
for (INT i=0;i<4;i++)
XY[i+3]=U[i];
for (INT i=0;i<7;i++)
XYpow2[i]=XY[i]*fabs((FLOAT)XY[i]);
MPI_Barrier(MPI_COMM_WORLD);
// U = U+F*STEP; F=TPH-RUZ
if((rank>2)&&(rank<XY_AMNT))
{
mul_add_sub_matrixex(1,RU,XYpow2,RUZ,3,XY_AMNT,rank);
F[rank-3]=TPH[rank-3]-RUZ[rank-3];
F[rank-3]=F[rank-3]*STEP;
U[rank-3]=U[rank-3]+F[rank-3];
}
// BCAST U
for(INT i=0;i<4;i++)
MPI_Bcast(&U[i],1, MPI_FLOAT, i+3, MPI_COMM_WORLD);
if (fout!=NULL)
fprintf(fout,"\n%5.3f",XY[rank]);
MPI_Barrier(MPI_COMM_WORLD);
if (rank==0)
{
printf("\n STEP = %d : %5.3f %5.3f %5.3f %5.3f %5.3f %5.3f %5.3f",j,XY[0],XY[1],XY[2],XY[3],XY[4],XY[5],XY[6]);
}
MPI_Barrier(MPI_COMM_WORLD);
}// for
//-----------------------------------------------
// end time of the Euler iterations
endtime = MPI_Wtime();
MPI_Barrier(MPI_COMM_WORLD);
//-----------------------------------------------
// each process prints its own value, one rank after another;
// the label now carries the rank that actually prints the line
static const CHAR *names[XY_AMNT]={"X0","X1","X2","Y1","Y2","Y3","Y4"};
for (INT i=0;i<XY_AMNT;i++)
{
if (rank == i)
printf ("%s\nPROCESS%d: %s = %f",(i==0)?"\n":"",i,names[i],XY[i]);
MPI_Barrier(MPI_COMM_WORLD);
}
//-----------------------------------------------
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0)
{
printf("\n-----------------------------------------\n");
printf("\nExecution time = %f seconds\n", endtime-starttime);
}
// close the per-process output file before shutting MPI down
if (fout!=NULL)
fclose(fout);
MPI_Finalize();
return NO_ERROR;
}
//---------------------------------------------------
//---function definitions
VOID mul_add_sub_matrixex(INT sign, FLOAT Y[11][11],FLOAT Z[11],FLOAT C[11], INT displ, INT n, INT rank)
{
C[rank-displ]=0;
for (INT i=0;i<n;i++)
if (sign == -1)
C[rank-displ]=C[rank-displ]-Y[rank-displ][i]*Z[i];
else C[rank-displ]=C[rank-displ]+Y[rank-displ][i]*Z[i];
}
//---------------------------------------------------
VOID read_matrix(FLOAT matrix[11][11],INT n1,INT m1, FILE *f)
{
for(INT i=0;i<n1;i++)
{
for(INT j=0;j<m1;j++)
fscanf(f,"%f",&matrix[i][j]);
}
}
//---------------------------------------------------
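The decomposition used by mul_add_sub_matrixex in the listing above gives each process one row of the matrix, so each process produces one element of the matrix-vector product. The sketch below mimics that split in a single process, without MPI: the loop over `rank` stands in for the separate processes, and storing the element in `result` stands in for the broadcast that follows in the parallel program. The names `Matrix`, `row_times_vector` and `rowwise_product` are hypothetical and used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Work of a single rank: the dot product of its own matrix row with v.
// Assumes every row of a has at least v.size() elements.
float row_times_vector(const Matrix& a, const std::vector<float>& v,
                       std::size_t rank)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < v.size(); ++k)
        sum += a[rank][k] * v[k];
    return sum;
}

// Serial stand-in for the parallel run: iteration `rank` does the work
// that the process with that rank would do on its own.
std::vector<float> rowwise_product(const Matrix& a,
                                   const std::vector<float>& v)
{
    std::vector<float> result(a.size(), 0.0f);
    for (std::size_t rank = 0; rank < a.size(); ++rank)
        result[rank] = row_times_vector(a, v, rank);
    return result;
}
```

Because each row is handled independently, the amount of computation per process is one dot product, while every step still needs one broadcast per computed element; this is the ratio between computation and data exchange mentioned in the conclusion.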