
Fig. 11.5 Combined grid-cloud security architecture

1. A GSI-authenticated request for a new image deployment is received.
2. The security component checks the MyCloud repository for the Clouds for which the user has valid credentials.
3. A new credential is generated for the new instance that needs to be started. If multiple images need to be started, the same instance credential can be reused to reduce the credential generation overhead (about 6–10 s in our experiments, including the communication overhead).
4. The new instance credentials are stored in the MyImage repository, which is only accessible to the enactment engine service for job execution after proper GSI authentication.
5. A start-instance request is sent to the Cloud using the newly generated instance credential (steps 3–5 are sketched in the example below).
6. When an instance is released, the resource manager deletes the corresponding credential from the MyInstance repository.
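The per-instance credential handling of steps 3–5 can be illustrated with a short sketch against an EC2-compatible API, such as the one exposed by Eucalyptus. This is only a minimal illustration under assumed names, not the actual MyCloud/MyInstance implementation: the endpoint URL, repository path, and image ID are hypothetical placeholders.

```python
# Illustrative sketch of steps 3-5: generate a per-instance key pair,
# store the private key in a protected repository, and start the instance
# with the new credential. Endpoint, paths and image ID are hypothetical.
import os
import boto3

def deploy_image(user_id: str, image_id: str, cloud_endpoint: str) -> str:
    ec2 = boto3.client("ec2", endpoint_url=cloud_endpoint)  # EC2-compatible API (e.g. Eucalyptus)

    # Step 3: generate a fresh credential for the new instance
    key_name = f"myinstance-{user_id}-{image_id}"
    key = ec2.create_key_pair(KeyName=key_name)

    # Step 4: store the private key in the instance credential repository
    # (readable only by the GSI-authenticated enactment engine)
    key_path = os.path.join("/srv/myinstance-repo", f"{key_name}.pem")
    with open(key_path, "w") as f:
        f.write(key["KeyMaterial"])
    os.chmod(key_path, 0o600)

    # Step 5: start the instance using the newly generated credential
    result = ec2.run_instances(ImageId=image_id, MinCount=1, MaxCount=1,
                               KeyName=key_name)
    return result["Instances"][0]["InstanceId"]
```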

11.4  Evaluation

We extended the ASKALON enactment engine to support our Cloud extensions by transferring files and submitting jobs to Cloud resources using the SCP/SSH provider of the Java CoG kit [23]. Technical problems with these providers of the CoG kit required us to modify the source code and create a custom build of the library to integrate it seamlessly into the existing system.
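The role of the SCP/SSH provider can be summarized in a few lines. The following is an illustrative Python/paramiko equivalent of what the enactment engine performs through the Java CoG kit (stage the input files, run the activity, fetch the output); the host, user, key file, and paths are placeholders, not the CoG kit API.

```python
# Illustrative sketch of SSH-based staging and job submission to a Cloud
# instance, analogous to the SCP/SSH provider used by the enactment engine.
# Host, user, key file and paths are placeholders.
import paramiko

def run_activity(host: str, key_file: str, local_input: str,
                 remote_dir: str, command: str) -> str:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="root", key_filename=key_file)

    # Stage the input file to the instance (SCP-like transfer via SFTP)
    sftp = client.open_sftp()
    sftp.put(local_input, f"{remote_dir}/input.dat")

    # Submit the job and wait for it to finish
    stdin, stdout, stderr = client.exec_command(f"cd {remote_dir} && {command}")
    exit_code = stdout.channel.recv_exit_status()

    # Fetch the output produced by the activity
    sftp.get(f"{remote_dir}/output.dat", "output.dat")
    sftp.close()
    client.close()
    if exit_code != 0:
        raise RuntimeError(stderr.read().decode())
    return "output.dat"
```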

For our experiments, we selected a scientific workflow application called Wien2k [24], a program package for performing electronic structure calculations of solids using density functional theory, based on the full-potential (linearized) augmented plane-wave ((L)APW) and local orbital (lo) method. The Wien2k Grid workflow splits the computation into several coarse-grain activities, with the work distribution achieved by two parallel loops (the second and fourth activities) consisting of a large number of independent activities calculated in parallel.

The number of sequential loop iterations is statically unknown. We chose a problem case (called atype) that we solved using 193 and 376 parallel activities, and problem sizes of 7.0, 8.0, and 9.0, which represent the number of plane waves and equal the size of the eigenvalue problem (i.e. the size of the matrix to be diagonalized), referred to as problem complexity in this work.

Figure 11.6 shows on the left the UML representation of the workflow as executed by ASKALON and, on the right, a concrete execution directed acyclic graph (DAG) showing one iteration of the while loop with four parallel activities in each parallel section. The workflow size is only determined at runtime: the parallelism is calculated by the first activity, and the last activity generates the result, which decides whether the main loop is executed again or the result meets the specified convergence criteria.
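The loop structure just described can be captured in a short structural sketch. This is only an illustration of the control flow, assuming placeholder activity functions rather than the real Wien2k kernels.

```python
# Structural sketch of the Wien2k workflow: the first activity determines the
# degree of parallelism, two parallel-for sections (pforLAPW1/pforLAPW2) run
# independent activities, and the last activity decides whether the main loop
# has converged. Activity bodies are placeholders.
from concurrent.futures import ThreadPoolExecutor

def wien2k_workflow(case, run_first, run_second, run_third, run_fourth, run_last):
    converged = False
    while not converged:
        # "first": also yields the number of parallel activities at runtime
        parallelism, data = run_first(case)

        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            # pforLAPW1: independent "second" activities
            data = list(pool.map(run_second, [data] * parallelism))

        data = run_third(data)

        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            # pforLAPW2: independent "fourth" activities
            data = list(pool.map(run_fourth, [data] * parallelism))

        # "last": produces the result and the convergence decision
        converged, case = run_last(data)
    return case
```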

We executed the workflow on a distributed testbed summarized in Table 11.3, consisting of four heterogeneous Austrian Grid sites [25] and 12 virtual CPUs from an "academic Cloud" called dps.cloud, built using the Eucalyptus middleware [6] and the XEN virtualization mechanism [7]. We configured the dps.cloud resource classes to use one core, as multi-core configurations were prevented by a bug in the Eucalyptus software (planned to be fixed in the next release).

 

 

Fig. 11.6 The Wien2k workflow in UML (left) and DAG (right) representation


Table 11.3 Overview of resources used from the grid and the private cloud for workflow execution

Grid site    Location   Cores used  CPU type  GHz  Mem/core
karwendel    Innsbruck  12          Opteron   2.4  1,024 MB
altix1.uibk  Innsbruck  12          Itanium   1.4  1,024 MB
altix1.jku   Linz       12          Itanium   1.4  1,024 MB
hydra.gup    Linz       12          Itanium   1.6  1,024 MB
dps.cloud    Innsbruck  12          Opteron   2.2  1,024 MB

Table 11.4 Wien2k execution time and cost analysis on the Austrian grid with and without cloud resources for different numbers of parallel activities and problem sizes

Parallel    Problem       Grid       Grid + cloud  Speedup      Used instances   Paid instances   $/min
activities  complexity    exec. (s)  exec. (s)     using Cloud  Hours    $       Hours    $       saved
193         Small (7.0)     874.66     803.66      1.09          2.7     0.54    12       2.04    1.72
193         Medium (8.0)  1,915.41   1,218.09      1.57          4.1     0.82    12       2.04    0.18
193         Big (9.0)     3,670.18   2,193.79      1.67          7.3     1.46    12       2.04    0.08
376         Small (7.0)   1,458.92   1,275.31      1.14          4.3     0.86    12       2.04    0.67
376         Medium (8.0)  2,687.85   2,020.17      1.33          6.7     1.34    12       2.04    0.18
376         Big (9.0)     5,599.67   4,228.90      1.32         14.1     2.81    24       4.08    0.17

We fixed the machine size of each Grid site to 12 cores to eliminate the variability in resource availability and make the results across different experiments comparable.

We used a just-in-time scheduling mechanism that tries to map each activity onto the fastest available Grid resource. Once the Grid becomes full (because the size of the workflow parallel loops is larger than the total number of cores in the testbed), the scheduler starts requesting additional Cloud resources for executing the remaining workflow activities in parallel. Once these additional resources are available, they are used like Grid resources, only with different job submission methods.
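A minimal sketch of this just-in-time policy is given below; the resource descriptions, the provisioning call, and the data structures are simplified assumptions for illustration, not the actual ASKALON scheduler.

```python
# Simplified sketch of the just-in-time scheduling policy: map each ready
# activity to the fastest idle Grid core and fall back to freshly provisioned
# Cloud instances once the Grid is full. Data structures are simplified.
def schedule(ready_activities, grid_cores, cloud, submit):
    # grid_cores: list of {"speed": GHz, "busy": bool} entries describing Grid cores
    for activity in ready_activities:
        free = [c for c in grid_cores if not c["busy"]]
        if free:
            # Pick the fastest currently idle Grid core
            core = max(free, key=lambda c: c["speed"])
            core["busy"] = True
            submit(activity, core)
        else:
            # Grid is full: request an additional Cloud instance and use it
            # like a Grid resource, just with a different submission method
            instance = cloud.request_instance()
            submit(activity, instance)
```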

Our goal was to compare the workflow execution for different problem sizes on the four Grid sites, with the execution using the same Grid environment supplemented by additional Cloud resources from dps.cloud. We executed each workflow instance five times and reported the average values obtained. The runtime variability in the Austrian Grid was less than 5%, because the testbed was idle during our experiments and each CPU was dedicated to running its activity with no external load or other queuing overheads.

Table 11.4 shows the workflow execution times for 376 and 193 parallel activities in six different configurations. The small, medium, and big configuration values represent a problem-size parameter that influences the execution time of the parallel activities. The improvement from using Cloud resources, compared with using only the four Grid sites, increases from a small 1.08 speedup for short workflows with a 14-min execution time to a good 1.67 speedup for large workflows with a 93-min execution time. The results show that a small and rather short workflow does not benefit much from the Cloud resources, because the provisioning and data transfer overheads are high relative to the small amount of computation. The main bottleneck when using Cloud resources is that the provisioned single-core instances use separate file systems, which require separate file transfers before the computation can start. In contrast, Grid sites are usually parallel machines that share one file system across a larger number of cores, which significantly decreases the data transfer overheads. Nevertheless, for large problem sizes, the Cloud resources can help to significantly shorten the workflow completion time in case Grids become overloaded.
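As a sanity check, the figures of the first row of Table 11.4 (193 activities, small problem size) can be reproduced with a few lines; the $0.17 per instance-hour rate is the dps.cloud cost estimate from Table 11.6, and the last column of Table 11.4 is consistent with the Cloud cost divided by the number of minutes of execution time saved.

```python
# Reproducing one row of Table 11.4 (193 activities, small problem size).
grid_time = 874.66          # s, Grid-only execution
hybrid_time = 803.66        # s, Grid + Cloud execution
paid_instance_hours = 12    # instance hours billed (whole hours)
rate = 0.17                 # $ per instance hour (dps.cloud estimate, Table 11.6)

speedup = grid_time / hybrid_time                 # ~1.09
cost = paid_instance_hours * rate                 # 12 * 0.17 = $2.04
saved_minutes = (grid_time - hybrid_time) / 60.0  # ~1.2 min saved
dollars_per_minute_saved = cost / saved_minutes   # ~1.72 $/min

print(round(speedup, 2), round(cost, 2), round(dollars_per_minute_saved, 2))
```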

Table 11.5 gives further details on the file transfer overheads and the distribution of activity instances between the pure Grid and the combined Grid-Cloud execution. The file transfer overhead can be reduced by increasing the size of a resource class (i.e. number of cores underneath one instance, which share a file system and the input files for execution), which may result in a lower resource allocation efficiency as the resource allocation granularity increases. We plan to investigate this tradeoff in future work.

To understand and quantify the benefits and potential costs of using commercial Clouds for similar experiments (without re-running the Wien2k workflows, for cost reasons), we executed the LINPACK benchmark [26], which measures the sustained GFlops performance of the resource classes offered by three Cloud providers: Amazon EC2, GoGrid (GG), and our academic dps.cloud (see Table 11.1). We configured LINPACK to use the GotoBLAS linear algebra library (one of the fastest implementations on Opteron processors in our experience) and MPI Chameleon [27] for instances with multiple cores. Table 11.6 summarizes the results, which show the m1.large EC2 instance to be the closest to dps.cloud, assuming that the two cores are used separately, which indicates an approximate realistic cost of $0.20 per core hour. The best sustained performance is offered by GG; however, it has extremely large resource provisioning latencies (see Table 11.6).

Table 11.5 Grid versus cloud file transfer and activity instance distribution to grid and cloud resources

Parallel    File transfers                     Activities run
activities  Total    To grid    To cloud       Total    On cloud
376         2,013    1,544      469 (23%)      759      209 (28%)
193         1,127      778      349 (31%)      389      107 (28%)

Table 11.6 Average LINPACK sustained performance and resource provisioning latency results of various resource classes (see Table 11.1)

Instance               dps.cloud  m1.small  m1.large  m1.xl   c1.medium  c1.xl   GG.1gig  GG.4gig
Linpack (GFlops)       4.40       1.96      7.15      11.38   3.91       51.58   8.81     28.14
Number of cores        1          1         2         4       2          8       1        3
GFlops per core        4.40       1.96      3.58      2.845   1.955      6.44    8.81     9.38
Speedup to dps         1          0.45      1.63      2.58    0.88       11.72   2.00     6.40
Cost [$ per hour]      0 (0.17)   0.085     0.34      0.68    0.17       0.68    0.18     0.72
Provisioning time [s]  312        83        92        65      66         66      558      1,878
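The derived rows of Table 11.6 follow directly from the measured LINPACK performance and the core counts of each resource class; the short sketch below (values copied from the table) reproduces the "GFlops per core" and "Speedup to dps" rows.

```python
# Reproducing the derived rows of Table 11.6 from the measured values.
instances = {
    # name: (LINPACK GFlops, number of cores)
    "dps.cloud": (4.40, 1), "m1.small": (1.96, 1), "m1.large": (7.15, 2),
    "m1.xl": (11.38, 4), "c1.medium": (3.91, 2), "c1.xl": (51.58, 8),
    "GG.1gig": (8.81, 1), "GG.4gig": (28.14, 3),
}
dps_gflops = instances["dps.cloud"][0]

for name, (gflops, cores) in instances.items():
    per_core = gflops / cores          # "GFlops per core" row
    speedup = gflops / dps_gflops      # "Speedup to dps" row (whole instance)
    print(f"{name:10s} {per_core:5.2f} GFlops/core, {speedup:5.2f}x dps.cloud")
```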

 

 

 

 

 

 

 

 
