
Job Scheduling in Hadoop Non-dedicated Shared Clusters

Capacity, and the proposed policy. For the Capacity scheduler, we defined four queues, one for each workgroup, and jobs submitted to the cluster were assigned to the queue of their workgroup. Cluster resources such as available slots were distributed evenly among all queues. The other policies use a single job queue.

Figure 6 shows makespan times for an increasing number of jobs belonging to the four workgroups. The jobs issued are divided equally among the workgroups. The results show that the proposed policy improves the average makespan by 7.3% compared to FIFO and by 3.2% compared to the Capacity scheduler.

Figure 6: Makespan times of 4 workgroup execution

Both makespan figures show the benefits of applying our framework adaptations to our system. As Hadoop applications are very dependent on the reduce phase, further improvements need to focus on the reducers to obtain more relevant results.

5 Conclusions and future work

We have analysed the adaptation of the Hadoop map-reduce framework to run multiple instances of bioinformatics applications in a non-dedicated computer system. To that end, we have described the data consumption patterns of the applications and presented a new scheduling policy that defines workgroups of applications so that they are co-scheduled. The execution of these Hadoop bioinformatics applications must be adapted to use existing computing resources so that local applications are not affected. We propose the use of cgroups to make adequate resource reservations during the daily use of our non-dedicated computer system.

Next steps in the research will consider the dynamic definition of local resources and the impact of local resource occupation on the Hadoop application workgroups. Then, the scheduler will modify its choice of applications to consider tasks that better fit the available resources.

© CMMSE

Page 193 of 1573

ISBN: 978-84-615-5392-1

A. Bezerra, P. Hernandez, A. Espinosa, J.C. Moure

Acknowledgements

This work has been supported by projects TIN2007-64974 and TIN2011-28689 of the Spanish Ministerio de Ciencia y Tecnologia (MICINN).


Proceedings of the 12th International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2012, July 2-5, 2012.

Ordering and Allocating Parallel Jobs on Multi-Cluster Systems

Héctor Blanco¹, Jordi Lladós¹, Fernando Guirado¹ and Josep Lluís Lérida¹

¹ Computer Science Department, Universitat de Lleida

emails: hectorblanco@diei.udl.cat, jordi.llados@udl.cat, f.guirado@diei.udl.cat, jlerida@diei.udl.cat

Abstract

The scheduling of jobs in a heterogeneous multi-cluster environment is known to be an NP-hard problem, not only because of the resource heterogeneity, but also because co-allocation can be applied to take advantage of the larger pool of resources. Previous works in the literature have usually dealt with the co-allocation problem by acting on each job in the waiting queue individually. In a previous work, the authors improved on these approaches by presenting a strategy based on Mixed Integer Programming, which was able to simultaneously allocate all those jobs that fitted into the available resources.

In this paper, the authors present a new algorithm able to treat all jobs in the waiting queue as a complete set. The algorithm determines the job execution order so as to obtain a fair allocation, or co-allocation when necessary, that provides better execution times for all of them.

Key words: Job Scheduling, Multi-Cluster Heterogeneity and Performance, Co-Allocation

1 Introduction

Multi-cluster environments are made up of several clusters of computers connected by a dedicated interconnection network whose performance is more predictable than in grid environments [1]. In these environments, co-allocation strategies allow jobs to be allocated across different clusters. This permits the execution of jobs that require more resources than any single cluster can offer, reduces internal fragmentation by taking advantage of available resources from different clusters, and thus increases job throughput by reducing the waiting times in the system queue [2]. However, allocating jobs across different clusters can reduce the overall performance when co-allocated jobs contend for the inter-cluster network bandwidth. Moreover, the heterogeneity of the processing and communication resources notably increases the complexity of the scheduling [2][3][4][5][6].

A common issue in those previous works is that jobs are treated individually. Allocating a job without taking the rest of the jobs into account can reduce the performance of future allocations and could decrease overall system performance [7]. To extend those previous approaches, the authors developed a new scheduling strategy, named PAS [8], which uses a Mixed Integer Programming model to select the most suitable resources for a set of jobs that fits the available resources, but without changing the job order in the system queue.

The main constraint of the PAS strategy is that it can only act on a set of jobs that fits the available resources without disturbing the arrival order. In the present work, the authors propose a new strategy called METL, for Minimum Execution Time Loss, which overcomes this limitation by considering not only the best allocation but also the order of all the jobs in the waiting queue. This strategy has been tested experimentally and compared with the most common techniques in the literature. The results show that ordering and allocating jobs according to the available resources and the jobs' processing and communication requirements provides better job execution times than the classic policies.

The rest of the paper is organized as follows. In Section 2, the authors present the strategy for multiple job co-allocation in a multi-cluster environment. Section 3 shows the experimental results. Finally, the conclusions are presented in Section 4.

2 METL Scheduling Policy

In this paper, we consider parallel jobs that follow the Bulk-Synchronous Parallel (BSP) model [9], where jobs consist of a fixed number of tasks with similar processing and communication requirements. Under these assumptions the execution time can be expressed as

$$Te_j = Tb_j \cdot \left[ \sigma_j \cdot SP_j + (1 - \sigma_j) \cdot SC_j \right] \qquad (1)$$

where Tb_j denotes the base time of job j, obtained by executing it on dedicated resources, and σ_j denotes the weight of the processing time relative to the communication time. SP_j and SC_j are the slowdowns due to the allocated processing resources and to the inter-cluster links, respectively. While none of the inter-cluster links used is saturated, SC_j = 1. Otherwise, SC_j takes its value from the degree of saturation of the most saturated inter-cluster link used by j, calculated as explained in [10].

A good job co-allocation reduces network saturation. Previous results [10] have shown that allocating sets of jobs while considering both their processing and communication requirements can be beneficial for the overall performance of the jobs.

On the other hand, SP_j determines the effect of the allocated resources on the job execution time. Let R be the set of resources allocated to job j; then SP_j is defined as

$$SP_j = \max_{r \in R} \left\{ (\Gamma_r)^{-1} \right\} \qquad (2)$$

where Γ_r is the effective power of each resource r ∈ R. This normalized metric, defined in [10], relates the processing power of each resource to its availability: Γ_r = 1 when resource r ∈ R has the capacity to run tasks at full speed, and Γ_r < 1 otherwise.
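Equations (1) and (2) can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation; the function and parameter names are illustrative assumptions, and SC_j is passed in as a plain number rather than computed from link saturation as in [10].

```python
def processing_slowdown(effective_powers):
    """SP_j from Eq. (2): inverse effective power of the slowest allocated resource."""
    return max(1.0 / gamma for gamma in effective_powers)


def execution_time(base_time, sigma, effective_powers, comm_slowdown=1.0):
    """Te_j from Eq. (1).

    base_time        -- Tb_j, execution time on dedicated resources
    sigma            -- sigma_j, weight of processing versus communication
    effective_powers -- Gamma_r of every resource allocated to the job
    comm_slowdown    -- SC_j (assumed 1.0, i.e. no saturated inter-cluster link)
    """
    sp = processing_slowdown(effective_powers)
    return base_time * (sigma * sp + (1.0 - sigma) * comm_slowdown)


# Illustrative job: Tb_j = 100, sigma_j = 0.5, two nodes with Gamma = 0.75.
print(round(execution_time(100, 0.5, [0.75, 0.75]), 2))
```

Because SP_j is a maximum over the allocated resources, adding a single slow node to an otherwise fast allocation raises the estimate for the whole job.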

The METL policy is able to treat a set of jobs, obtaining both their allocation and their execution order. When the policy is invoked is an important issue. If it is invoked every time a single job is in the waiting queue, it cannot take advantage of its ordering capability. On the other hand, trying to allocate a large number of jobs together could lengthen the waiting times of the jobs and leave computing resources unnecessarily idle. For this reason, METL is invoked under the following two conditions:

1. If there is a single job in the system queue and enough resources are available, it is scheduled alone on the most powerful resources while minimizing the number of inter-cluster boundaries crossed.

2. If there are not enough resources to allocate the job, it must wait in the system queue. Before the job is allocated, other jobs may enter the system queue. METL is invoked when any resources become free, and all jobs waiting in the system queue then form the set of jobs to be treated.

In the following subsections we describe how METL obtains the job resource allocation and determines the execution order.

2.1 Job Resource Allocation

The main aim of this step is to determine the allocation that reduces the execution time of a set of jobs while also reducing the number of computational nodes with higher effective power that are used. This is done in two steps:

1. Calculating the best allocation. Taking into account the resources available when the policy is applied, the allocation that yields the minimum execution time is calculated for each individual job. This allocation defines the lower bound of the execution time.

2. Reducing under-utilised resources. For each of the allocations obtained in the previous step, the task assignments that do not contribute to reducing the job execution time are identified. This situation arises because the co-allocation process can allocate computational nodes r with different effective power (Γ_r). The processing time of the tasks allocated to the more powerful resources is then reduced, but the global execution time of the job is not, because of communication synchronizations. Thus, tasks assigned to nodes with higher Γ_r are moved to other nodes with the same Γ_r as the slowest allocated resource. This re-allocation helps to reduce the usage of inter-cluster links and to release the under-utilised resources, providing better opportunities for future allocations.

Figure 1 shows an example of the job resource allocation procedure. We assume an environment made up of two clusters, C1={N1,N2} with effective power Γ_N1 = Γ_N2 = 0.75 and C2={N3,N4} with Γ_N3 = Γ_N4 = 0.5, which means that cluster C1 is more powerful than C2. The example also assumes a job made up of three tasks that is ready to be allocated. In the figure, the x-axis represents the execution time and the y-axis the computational nodes, with their respective effective power values. The two steps of the allocation procedure are shown side by side.

Figure 1: Job Resource Allocation.

On the left, the first allocation step obtains the allocation with the minimum execution time for the job, which is {N1,N2,N3}. However, the final execution time of the job is bounded by the slowest allocated node, which is N3. Thus, using the most powerful nodes does not reduce the job execution time. In this situation, the second step redefines the allocation, as shown on the right side, by moving a task from N1 to N4, without penalizing the job execution time and leaving better opportunities for future allocations.
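As a check on this example, Eq. (2) gives the same processing slowdown for both allocations, so the move to N4 leaves the estimated execution time unchanged. This assumes that no inter-cluster link is saturated in either case, which the example does not state explicitly.

```latex
SP_{\{N1,N2,N3\}} = \max\left\{\tfrac{1}{0.75}, \tfrac{1}{0.75}, \tfrac{1}{0.5}\right\} = 2,
\qquad
SP_{\{N2,N3,N4\}} = \max\left\{\tfrac{1}{0.75}, \tfrac{1}{0.5}, \tfrac{1}{0.5}\right\} = 2
```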

2.2 Job Allocation Ordering

The main aim of this step is to determine the best order for the set of jobs in the system queue so as to minimize the global execution time. To reach this global optimization with a fair scheduling for all the jobs, our proposal is to select, in each step of the algorithm, the job with the least loss in execution time given the status and availability of the resources.

1. Execution time loss calculation. For each job, the difference between its execution time with the current resource status and its execution time in the same environment in dedicated mode is calculated.

2. Job selection. The job with the lowest loss in execution time is selected. If no resources are available for allocation, the algorithm estimates which job will finish next, releases its resources and re-evaluates the jobs waiting in the system queue.

This process is repeated until all jobs in the system queue have been processed, yielding the execution order and the allocation for each of them. As the set of jobs is treated as a whole, resource starvation is avoided.
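The selection step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `select_job` and its two dictionary arguments are hypothetical names, and the sample times are the ones reported for the worked example of Section 2.3 (Tables 1 and 2).

```python
def select_job(estimated, ideal):
    """Return the job with the smallest execution-time loss, or None.

    estimated -- job -> estimated time with the currently free resources;
                 jobs that do not fit are simply absent
    ideal     -- job -> execution time in dedicated mode
    Ties keep the first job in iteration order, i.e. the arrival order
    when the dictionary was filled in arrival order.
    """
    losses = {job: estimated[job] - ideal[job] for job in estimated}
    if not losses:
        return None  # nothing fits: the caller must wait for a release
    return min(losses, key=losses.get)


# Times taken from the worked example (second iteration).
ideal = {"J2": 100.0, "J3": 87.5, "J4": 123.33}
estimated = {"J2": 258.3, "J3": 187.5, "J4": 330.0}
print(select_job(estimated, ideal))
```

With these values the sketch selects J3, the job with a loss of 100s. Returning None when no job fits mirrors the case in which the algorithm must first release the resources of the next job to finish.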

2.3 Policy Implementation

The main algorithm was implemented as shown in Algorithm 1. Its final result is the schedule for all the jobs, consisting of a list with the order in which the jobs must be executed together with their allocated resources.

The algorithm starts by finding the ideal allocation for all the jobs, assuming a dedicated multi-cluster environment (lines 2-4). These allocations determine the lower bound of the execution time for each job.

Next, using the function CalculateAllocation(J, SR) (line 9), the allocation for each job given the current resource availability is calculated. This function is detailed in Algorithm 2 and returns the best possible job allocation with the fewest under-utilised resources. When there are not enough available resources to allocate any job, the algorithm estimates the next job to finish (line 17), releases its corresponding resources (lines 18-19), and tries to find the most suitable job to be executed under the new conditions.

To find the most suitable resources with the minimum under-utilisation, we implemented Algorithm 2. This algorithm receives the set of resources as a list ordered by effective power. First, the number of tasks n required by the job is determined (line 2). The first n resources in the list are allocated to the job (line 3). Then, if the job must be co-allocated (line 4), i.e. the number of clusters used is greater than 1, the tasks on the most powerful resources are re-allocated to the slowest cluster (line 5), and the final allocation for the job is returned (line 7).

To illustrate how the policy works, the rest of this section walks through an example execution of the algorithms. In this example, we assume a single cluster made up of 5 heterogeneous nodes with effective powers Γ_N1 = Γ_N2 = 0.75, Γ_N3 = 0.5, Γ_N4 = 0.25 and Γ_N5 = 0.15. The set of jobs waiting to be allocated in the system queue is detailed in Table 1. The example is presented as successive iterations of the main algorithm.


Algorithm 1 METL algorithm implementation

1: function MainAlgorithm(SJ: Set of jobs, SR: Set of resources)
2:   for all J in SJ do  // Calculate ideal allocations
3:     Ideal_Allocation[J] ← CalculateAllocation(J, SR)
4:   end for
5:   while SJ ≠ ∅ do  // While there are jobs to allocate
6:     min_exec ← ∞
7:     Selected_Job ← NULL
8:     for all J in SJ do  // Calculate real allocations
9:       Allocation[J] ← CalculateAllocation(J, SR)
10:      if Allocation[J] ≠ NULL then  // If the job can be allocated
11:        if min_exec > (Allocation[J] − Ideal_Allocation[J]) then
12:          min_exec ← (Allocation[J] − Ideal_Allocation[J])
13:          Selected_Job ← J
14:        end if
15:      end if
16:    end for
17:    if Selected_Job = NULL then  // If no job found that can be allocated
18:      Locate J' in Scheduling_List that finishes earliest
19:      SR ← SR + Allocation[J']  // Release resources used by J'
20:    else
21:      Scheduling_List ← (Selected_Job, Allocation[Selected_Job])
22:      SJ ← SJ − Selected_Job
23:      SR ← SR − Allocation[Selected_Job]  // Update resource availability
24:    end if
25:  end while
26:  return Scheduling_List
27: end function

Algorithm 2 Resource allocation implementation

1: function CalculateAllocation(J: Job to treat, SR: Set of resources, ordered by effective power)
2:   n ← number of tasks of J
3:   Allocation[J] ← first n nodes from SR
4:   if #Clusters in Allocation[J] > 1 then  // Co-allocation
5:     Move tasks from faster resources to the slowest used cluster
6:   end if
7:   return Allocation[J]
8: end function
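Algorithm 2 can be rendered as a short Python sketch. This is one possible reading of the pseudocode rather than the authors' implementation: the tuple layout `(name, cluster, effective_power)`, the function name and the rule of moving tasks only to free nodes whose effective power equals that of the slowest allocated node are assumptions drawn from the description in Section 2.1. The sample data reproduce the two-cluster environment of Figure 1.

```python
def calculate_allocation(n_tasks, free_nodes):
    """Sketch of Algorithm 2.

    free_nodes -- list of (name, cluster, effective_power) for the free nodes
    Returns the list of allocated node names, or None if too few nodes are free.
    """
    if len(free_nodes) < n_tasks:
        return None
    # Step 1: order by decreasing effective power and take the first n nodes.
    ordered = sorted(free_nodes, key=lambda node: -node[2])
    chosen = ordered[:n_tasks]
    if len({node[1] for node in chosen}) > 1:  # co-allocation
        slowest = min(chosen, key=lambda node: node[2])
        # Free, unused nodes of the slowest cluster that are as fast as its slowest node.
        spare = [node for node in ordered
                 if node not in chosen
                 and node[1] == slowest[1] and node[2] == slowest[2]]
        # Step 2: move tasks off faster nodes while spare slow nodes remain.
        for fast in [node for node in chosen if node[2] > slowest[2]]:
            if not spare:
                break
            chosen[chosen.index(fast)] = spare.pop(0)
    return [node[0] for node in chosen]


figure1_nodes = [("N1", "C1", 0.75), ("N2", "C1", 0.75),
                 ("N3", "C2", 0.5), ("N4", "C2", 0.5)]
print(sorted(calculate_allocation(3, figure1_nodes)))
```

For the three-task job of Figure 1, the sketch first picks {N1, N2, N3} and then moves one task from N1 to N4, releasing N1 for future jobs without changing the slowest allocated node.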


Iteration 1: First, the ideal and real allocations are calculated. Table 1 shows the ideal allocation for all jobs and their estimated execution times. Initially all resources are free, so the estimated execution time of each job is the same as the ideal one and the difference is 0 in all cases. The jobs are therefore evaluated in arrival order to reduce their waiting time, so in this example J1 is the first job selected for allocation. Next, the resource status is updated, with N1 and N2 becoming unavailable for subsequent allocations.

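| Job | τ_j | σ_j | Tb_j | Ideal exec. time | Ideal alloc. |
|-----|-----|-----|------|------------------|--------------|
| J1  | 2   | 0.5 | 100  | 116.7s           | N1, N2       |
| J2  | 3   | 0.7 | 50   | 100s             | N1, N2, N3   |
| J3  | 2   | 0.5 | 75   | 87.5s            | N1, N2       |
| J4  | 2   | 0.7 | 100  | 123.33s          | N1, N2       |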

Table 1: Set of jobs in the system queue with the best execution time and allocation in dedicated resources.

Iteration 2: Now only N3, N4 and N5 are available, and jobs J2, J3 and J4 are waiting to be allocated. The best allocation for each waiting job is recalculated taking into account the available resources. The results are shown in Table 2. As can be seen, J3, which is allocated to N3 and N4, is the job with the lowest loss of time compared to its ideal, so it is the next job to be allocated, and the status of its allocated resources is updated.

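| Job | Execution time | Difference with ideal | Allocation |
|-----|----------------|-----------------------|------------|
| J2  | 258.3s         | 158.3s                | N3, N4, N5 |
| J3  | 187.5s         | 100s                  | N3, N4     |
| J4  | 330s           | 206.7s                | N3, N4     |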

Table 2: Results for the second iteration. Difference between the ideal and the estimated allocation based on the current resource status.

Iteration 3: Now only N5 is available. Neither J2 nor J4 fits the available resources, so the algorithm determines which running job will finish first. In our case, the first job to finish is J1, which releases resources N1 and N2. Thus, both J2 and J4 can be allocated and are evaluated as in Iteration 1. The process continues until all jobs are finally allocated. Figure 2 shows the resulting schedule for all the jobs.

As can be observed, the order in which the jobs were executed is different from the original queue order. Jobs J3 and J4 were advanced and J2 was delayed.
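The example can be replayed with a compact sketch of the main loop of Algorithm 1. It is a simplified reading under several assumptions, not the authors' implementation: a single cluster (so SC_j = 1 and no co-allocation step), a simulated clock to decide which running job finishes first, and hypothetical dictionary keys `tasks`, `sigma` and `tb` for τ_j, σ_j and Tb_j. Times are computed directly from Eq. (1) and need not coincide with every tabulated value; the sketch is only meant to reproduce the resulting order.

```python
def exec_time(job, nodes, power):
    """Eq. (1) with SC_j = 1 (single cluster, no inter-cluster links)."""
    sp = max(1.0 / power[n] for n in nodes)
    return job["tb"] * (job["sigma"] * sp + 1.0 - job["sigma"])


def allocate(job, free, power):
    """First n free nodes by decreasing effective power, or None if too few."""
    if len(free) < job["tasks"]:
        return None
    return sorted(free, key=lambda n: (-power[n], n))[:job["tasks"]]


def metl(jobs, power):
    """Return [(job, start time, nodes)] in scheduling order.

    Assumes every job fits in the empty system.
    """
    free = set(power)
    ideal = {name: exec_time(j, allocate(j, free, power), power)
             for name, j in jobs.items()}
    waiting = list(jobs)   # arrival order
    running = []           # (finish time, nodes) of jobs not yet released
    schedule, clock = [], 0.0
    while waiting:
        best, best_loss, best_nodes = None, float("inf"), None
        for name in waiting:
            nodes = allocate(jobs[name], free, power)
            if nodes is None:
                continue  # job does not fit the free resources
            loss = exec_time(jobs[name], nodes, power) - ideal[name]
            if loss < best_loss:  # strict: ties keep arrival order
                best, best_loss, best_nodes = name, loss, nodes
        if best is None:
            # No job fits: advance to the earliest finish and release its nodes.
            running.sort()
            finish, nodes = running.pop(0)
            clock = max(clock, finish)
            free |= set(nodes)
        else:
            finish = clock + exec_time(jobs[best], best_nodes, power)
            running.append((finish, best_nodes))
            schedule.append((best, clock, best_nodes))
            waiting.remove(best)
            free -= set(best_nodes)
    return schedule


power = {"N1": 0.75, "N2": 0.75, "N3": 0.5, "N4": 0.25, "N5": 0.15}
jobs = {"J1": {"tasks": 2, "sigma": 0.5, "tb": 100},
        "J2": {"tasks": 3, "sigma": 0.7, "tb": 50},
        "J3": {"tasks": 2, "sigma": 0.5, "tb": 75},
        "J4": {"tasks": 2, "sigma": 0.7, "tb": 100}}
schedule = metl(jobs, power)
print([name for name, _, _ in schedule])
```

Under these assumptions the jobs are scheduled in the order J1, J3, J4, J2, which agrees with the observation that J3 and J4 are advanced and J2 is delayed.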
