7  A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud 125

From the statistics reported earlier, and from the results of our experiments, we see that a master failure causes the loss of dozens of CPU hours for a typical MapReduce job. Moreover, when the number of machines available to each user is limited (as in typical Cloud systems, where resources are shared among thousands of users), a master failure also causes a significant loss of time, because the job completion time increases as the number of machines decreases.
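To make this cost argument concrete, the following back-of-the-envelope sketch computes the CPU hours and the wall-clock time lost when a master fails and the job has to restart from scratch. It assumes ideal linear speedup and that every machine stays busy until the failure. The function `master_failure_cost` and the example figures are hypothetical; they are not taken from the experiments reported in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCost:
    cpu_hours_lost: float
    wall_clock_hours_lost: float


def master_failure_cost(
    total_work_cpu_hours: float, machines: int, fraction_done: float
) -> FailureCost:
    """Estimate the work discarded when the master fails and the job restarts.

    Hypothetical model: ideal linear speedup, all machines busy until the failure.
    """
    if machines <= 0:
        raise ValueError("machines must be positive")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be in [0, 1]")
    # Ideal linear speedup: completion time shrinks as 1 / machines.
    completion_hours = total_work_cpu_hours / machines
    # Wall-clock time already spent on the job, all of which must be repeated.
    wall_lost = fraction_done * completion_hours
    # Every machine was busy for that interval, so its CPU time is wasted too.
    cpu_lost = machines * wall_lost
    return FailureCost(cpu_lost, wall_lost)
```

With 100 CPU hours of work and a failure at 80% progress, the loss is 80 CPU hours regardless of cluster size, but the wall-clock loss grows from 1.6 h on 50 machines to 8 h on 10 machines, which is the effect described above for users with few available machines.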

7.4  Conclusions

Providing effective mechanisms to manage master failures, job recovery, and the participation of intermittent nodes is fundamental to exploiting the MapReduce model in the implementation of data-intensive applications in dynamic Cloud environments or Cloud-of-clouds scenarios, where current MapReduce implementations could be unreliable.

The P2P-MapReduce model presented in this chapter exploits a P2P model to perform job state replication, manage master failures, and allow the participation of intermittent nodes in a decentralized but effective way. Using a P2P approach, we extended the MapReduce architectural model, making it suitable for highly dynamic environments where failures must be managed to avoid a critical loss of computing resources and time.
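The idea of job state replication with master failover can be illustrated with a deliberately simplified sketch: one primary master and several backup masters for a single job, where every progress update is copied to all live backups and a backup is promoted when the primary fails. The class `JobGroup`, its methods, the synchronous replication, and the lowest-ID election rule are hypothetical simplifications, not the actual P2P-MapReduce protocol; a real deployment would also have to cope with asynchronous messaging and network partitions.

```python
from dataclasses import dataclass, field


@dataclass
class JobState:
    job_id: str
    completed_tasks: set[str] = field(default_factory=set)


class JobGroup:
    """Hypothetical sketch: one primary master plus backup masters for one job."""

    def __init__(self, job_id: str, master_ids: list[int]) -> None:
        if not master_ids:
            raise ValueError("need at least one master")
        self.alive = set(master_ids)
        # Assumed rule: the master with the smallest ID starts as primary.
        self.primary = min(master_ids)
        # Every master holds its own replica of the job state.
        self.replicas = {m: JobState(job_id) for m in master_ids}

    def complete_task(self, task_id: str) -> None:
        # The primary records progress and replicates it to every live backup
        # (synchronously here, for simplicity).
        for m in self.alive:
            self.replicas[m].completed_tasks.add(task_id)

    def fail(self, master_id: int) -> None:
        self.alive.discard(master_id)
        if master_id == self.primary:
            if not self.alive:
                raise RuntimeError("all masters failed; job state lost")
            # Assumed election rule: the live backup with the smallest ID takes
            # over and resumes from its replica instead of restarting the job.
            self.primary = min(self.alive)

    def progress(self) -> set[str]:
        return set(self.replicas[self.primary].completed_tasks)
```

Because the promoted backup already holds the set of completed tasks, the job continues from where it stopped rather than from scratch, which is what avoids the loss of CPU hours discussed above.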


Chapter 8

Enhanced Network Support for Scalable Computing Clouds

Francesco Palmieri and Silvio Pardi

F. Palmieri (*)
Università degli Studi di Napoli Federico II, CSI, Via Cinthia, 5, 80126 Napoli, Italy
e-mail: fpalmier@unina.it

N. Antonopoulos and L. Gillam (eds.), Cloud Computing: Principles, Systems and Applications, Computer Communications and Networks, DOI 10.1007/978-1-84996-241-4_8, © Springer-Verlag London Limited 2010

Abstract We introduce the concept of network resource visibility and performance awareness into the cloud control logic, aiming to optimize transport-layer activities within the cloud and thus to cope with the scalability problems that large-scale data processing applications experience in traditional Internet-based clouds. With new dynamic “network on demand” facilities complementing the existing portfolio of cloud services, we gain some control over the underlying transport layer, which lets us bypass the current locality constraints in resource allocation and flexibly orchestrate resources that are available at different sites and belong to different administrative domains.

8.1  Introduction

Sharing computing and storage resources has become a popular solution for a number of key enterprise applications, including solving complex simulation tasks, distributing high workloads among several sites, and dispersing critical data and/or information technology assets among several locations to minimize the risk of catastrophic failures. In times of limited budgets, resource sharing has also become a popular means of reducing costs. Traditionally, this approach was limited to data center infrastructures, but recent trends such as virtualization and broadband interconnects have pushed resource-sharing concepts even further. The emerging cloud-computing paradigm allows us to locate computing and storage resources anywhere in the world. The computer (whether a PC or a supercomputer) no longer has to be co-located with its users or its funding institution. More precisely, cloud computing refers to an information service that is available to end-users out of a “transparent” cloud, where the cloud is an abstract model that, from the end-user’s point of view, has no specific physical location. The cloud is generally a conglomerate of interconnected, redundant data centers built to provide certain services. Clouds originally hosted Internet-related services such as search engines; now more traditional services, applications, and tasks that used to reside on an end-user’s terminal or computer are being transferred to the cloud as well. The only requirement for accessing them is a broadband connection. With today’s high-bandwidth optical networks, cloud resources can be located in properly equipped sites at remote locations throughout the world. The move towards clouds signals a fundamental shift in how we handle information. At the most basic level, it is the computing equivalent of the evolution of the electricity supply a century ago, when farms and businesses shut down their own generators and bought power from efficient industrial utilities instead. Unfortunately, the best-effort delivery model of the Internet, which serves as the underlying transport network for most existing cloud infrastructures, imposes severe constraints on the transfer of massive amounts of data and thus restricts the deployment of the above-mentioned applications at wide-area scale. Besides the lack of bandwidth, the inability to provide dedicated links makes current network technology ill-suited for performance-critical Grid computing. What is needed is a way to provide critical data-intensive applications with dedicated end-to-end connections that can be allocated dynamically, either on demand or by scheduled reservation. Accordingly, in this chapter, we introduce the concept of network resource visibility and network performance awareness into the cloud control logic, to cope with the severe scalability limits that network-oblivious cloud infrastructures exhibit for the more demanding data-intensive applications.
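Dedicated end-to-end connections that can be allocated on demand or by scheduled reservation imply admission control on every link they traverse. A minimal sketch for a single link follows. The class `LinkScheduler`, the capacity model in Gb/s, and the half-open time intervals are illustrative assumptions, not the chapter’s own mechanism; an on-demand request is modeled simply as a reservation whose start time is the current time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float  # start time (e.g., hours); the interval is [start, end)
    end: float
    gbps: float


class LinkScheduler:
    """Hypothetical advance-reservation admission control for one dedicated link."""

    def __init__(self, capacity_gbps: float) -> None:
        self.capacity = capacity_gbps
        self.reservations: list[Reservation] = []

    def _load_at(self, t: float) -> float:
        # Bandwidth already committed at instant t.
        return sum(r.gbps for r in self.reservations if r.start <= t < r.end)

    def _peak_load(self, start: float, end: float) -> float:
        # Committed bandwidth only increases when a reservation starts, so the
        # peak over [start, end) occurs at `start` or at a reservation start in it.
        points = {start} | {
            r.start for r in self.reservations if start <= r.start < end
        }
        return max(self._load_at(t) for t in points)

    def reserve(self, start: float, end: float, gbps: float) -> bool:
        """Admit the request only if capacity is never exceeded in [start, end)."""
        if end <= start or gbps <= 0:
            raise ValueError("need end > start and positive bandwidth")
        if self._peak_load(start, end) + gbps > self.capacity:
            return False
        self.reservations.append(Reservation(start, end, gbps))
        return True
```

For example, on a 10 Gb/s link already carrying a 6 Gb/s reservation over [0, 2), a 5 Gb/s request over [1, 3) is rejected, while a 4 Gb/s request over the same window is admitted.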
We present the benefits of such an extended cloud by proposing a new service and resource management model, in which each service is associated with specific performance requirements that are enforced by considering both the availability of the needed runtime resources and the end-to-end communication features of the connections between them. We focus our efforts on the transport facilities located at the “lowest” layer of cloud systems, because at this level we can provide a solid foundation on top of which language-, service-, and application-level cloud-computing systems can be explored and developed. By introducing some form of control over the underlying transport layer, we bypass the usual locality constraint in computation and storage resource allocation, which is needed to ensure acceptable performance within the cloud runtime system, and allow the flexible orchestration of resources that are available at different sites and belong to different administrative domains. Also, by adopting proven circuit-switched network concepts with modern wavelength-routed networks as an improved hybrid transport facility within clouds, we address the “missing link” in the cloud networking “big picture”, i.e., the concept of dynamic “network on demand” services complementing the existing cloud resource-sharing and computing-services portfolio.
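The proposed model couples each service’s runtime requirements with the end-to-end features of the connections it needs. A minimal sketch of such a network-aware placement decision follows, assuming that the network control plane can report, for each candidate compute site, the bandwidth and latency it can guarantee towards the site holding the data. The names `place_service`, `Requirement`, and `Path`, the single data site, and the lowest-latency preference are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    cores: int
    min_gbps: float
    max_latency_ms: float


@dataclass(frozen=True)
class Path:
    gbps: float
    latency_ms: float


def place_service(
    req: Requirement,
    data_site: str,
    free_cores: dict[str, int],
    paths: dict[tuple[str, str], Path],
) -> Optional[str]:
    """Pick a compute site meeting both runtime and network requirements.

    free_cores maps site -> free cores; paths maps (compute_site, data_site) to
    the end-to-end path the network can guarantee. Returns None if nothing fits.
    """
    candidates: list[tuple[float, str]] = []
    for site, cores in free_cores.items():
        if cores < req.cores:
            continue  # runtime resources insufficient at this site
        if site == data_site:
            candidates.append((0.0, site))  # co-located with data: no WAN transfer
            continue
        path = paths.get((site, data_site))
        if (
            path is not None
            and path.gbps >= req.min_gbps
            and path.latency_ms <= req.max_latency_ms
        ):
            candidates.append((path.latency_ms, site))
    # Prefer the lowest-latency feasible site (ties broken by site name).
    return min(candidates)[1] if candidates else None
```

In this sketch a remote site becomes an acceptable placement as soon as the network can guarantee the required path, which is the sense in which control over the transport layer relaxes the usual locality constraint.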

8.2  The Cloud Evolution

The upcoming evolution of cloud computing is a major change in our computing technology. One of the most important parts of that evolution is the advent of the first production platforms based on the cloud paradigm. Such platforms promise real
