
- •Cloud Computing
- •Foreword
- •Preface
- •Introduction
- •Expected Audience
- •Book Overview
- •Part 1: Cloud Base
- •Part 2: Cloud Seeding
- •Part 3: Cloud Breaks
- •Part 4: Cloud Feedback
- •Contents
- •1.1 Introduction
- •1.1.1 Cloud Services and Enabling Technologies
- •1.2 Virtualization Technology
- •1.2.1 Virtual Machines
- •1.2.2 Virtualization Platforms
- •1.2.3 Virtual Infrastructure Management
- •1.2.4 Cloud Infrastructure Manager
- •1.3 The MapReduce System
- •1.3.1 Hadoop MapReduce Overview
- •1.4 Web Services
- •1.4.1 RPC (Remote Procedure Call)
- •1.4.2 SOA (Service-Oriented Architecture)
- •1.4.3 REST (Representative State Transfer)
- •1.4.4 Mashup
- •1.4.5 Web Services in Practice
- •1.5 Conclusions
- •References
- •2.1 Introduction
- •2.2 Background and Related Work
- •2.3 Taxonomy of Cloud Computing
- •2.3.1 Cloud Architecture
- •2.3.1.1 Services and Modes of Cloud Computing
- •Software-as-a-Service (SaaS)
- •Platform-as-a-Service (PaaS)
- •Hardware-as-a-Service (HaaS)
- •Infrastructure-as-a-Service (IaaS)
- •2.3.2 Virtualization Management
- •2.3.3 Core Services
- •2.3.3.1 Discovery and Replication
- •2.3.3.2 Load Balancing
- •2.3.3.3 Resource Management
- •2.3.4 Data Governance
- •2.3.4.1 Interoperability
- •2.3.4.2 Data Migration
- •2.3.5 Management Services
- •2.3.5.1 Deployment and Configuration
- •2.3.5.2 Monitoring and Reporting
- •2.3.5.3 Service-Level Agreements (SLAs) Management
- •2.3.5.4 Metering and Billing
- •2.3.5.5 Provisioning
- •2.3.6 Security
- •2.3.6.1 Encryption/Decryption
- •2.3.6.2 Privacy and Federated Identity
- •2.3.6.3 Authorization and Authentication
- •2.3.7 Fault Tolerance
- •2.4 Classification and Comparison between Cloud Computing Ecosystems
- •2.5 Findings
- •2.5.2 Cloud Computing PaaS and SaaS Provider
- •2.5.3 Open Source Based Cloud Computing Services
- •2.6 Comments on Issues and Opportunities
- •2.7 Conclusions
- •References
- •3.1 Introduction
- •3.2 Scientific Workflows and e-Science
- •3.2.1 Scientific Workflows
- •3.2.2 Scientific Workflow Management Systems
- •3.2.3 Important Aspects of In Silico Experiments
- •3.3 A Taxonomy for Cloud Computing
- •3.3.1 Business Model
- •3.3.2 Privacy
- •3.3.3 Pricing
- •3.3.4 Architecture
- •3.3.5 Technology Infrastructure
- •3.3.6 Access
- •3.3.7 Standards
- •3.3.8 Orientation
- •3.5 Taxonomies for Cloud Computing
- •3.6 Conclusions and Final Remarks
- •References
- •4.1 Introduction
- •4.2 Cloud and Grid: A Comparison
- •4.2.1 A Retrospective View
- •4.2.2 Comparison from the Viewpoint of System
- •4.2.3 Comparison from the Viewpoint of Users
- •4.2.4 A Summary
- •4.3 Examining Cloud Computing from the CSCW Perspective
- •4.3.1 CSCW Findings
- •4.3.2 The Anatomy of Cloud Computing
- •4.3.2.1 Security and Privacy
- •4.3.2.2 Data and/or Vendor Lock-In
- •4.3.2.3 Service Availability/Reliability
- •4.4 Conclusions
- •References
- •5.1 Overview – Cloud Standards – What and Why?
- •5.2 Deep Dive: Interoperability Standards
- •5.2.1 Purpose, Expectations and Challenges
- •5.2.2 Initiatives – Focus, Sponsors and Status
- •5.2.3 Market Adoption
- •5.2.4 Gaps/Areas of Improvement
- •5.3 Deep Dive: Security Standards
- •5.3.1 Purpose, Expectations and Challenges
- •5.3.2 Initiatives – Focus, Sponsors and Status
- •5.3.3 Market Adoption
- •5.3.4 Gaps/Areas of Improvement
- •5.4 Deep Dive: Portability Standards
- •5.4.1 Purpose, Expectations and Challenges
- •5.4.2 Initiatives – Focus, Sponsors and Status
- •5.4.3 Market Adoption
- •5.4.4 Gaps/Areas of Improvement
- •5.5.1 Purpose, Expectations and Challenges
- •5.5.2 Initiatives – Focus, Sponsors and Status
- •5.5.3 Market Adoption
- •5.5.4 Gaps/Areas of Improvement
- •5.6 Deep Dive: Other Key Standards
- •5.6.1 Initiatives – Focus, Sponsors and Status
- •5.7 Closing Notes
- •References
- •6.1 Introduction and Motivation
- •6.2 Cloud@Home Overview
- •6.2.1 Issues, Challenges, and Open Problems
- •6.2.2 Basic Architecture
- •6.2.2.1 Software Environment
- •6.2.2.2 Software Infrastructure
- •6.2.2.3 Software Kernel
- •6.2.2.4 Firmware/Hardware
- •6.2.3 Application Scenarios
- •6.3 Cloud@Home Core Structure
- •6.3.1 Management Subsystem
- •6.3.2 Resource Subsystem
- •6.4 Conclusions
- •References
- •7.1 Introduction
- •7.2 MapReduce
- •7.3 P2P-MapReduce
- •7.3.1 Architecture
- •7.3.2 Implementation
- •7.3.2.1 Basic Mechanisms
- •Resource Discovery
- •Network Maintenance
- •Job Submission and Failure Recovery
- •7.3.2.2 State Diagram and Software Modules
- •7.3.3 Evaluation
- •7.4 Conclusions
- •References
- •8.1 Introduction
- •8.2 The Cloud Evolution
- •8.3 Improved Network Support for Cloud Computing
- •8.3.1 Why the Internet is Not Enough?
- •8.3.2 Transparent Optical Networks for Cloud Applications: The Dedicated Bandwidth Paradigm
- •8.4 Architecture and Implementation Details
- •8.4.1 Traffic Management and Control Plane Facilities
- •8.4.2 Service Plane and Interfaces
- •8.4.2.1 Providing Network Services to Cloud-Computing Infrastructures
- •8.4.2.2 The Cloud Operating System–Network Interface
- •8.5.1 The Prototype Details
- •8.5.1.1 The Underlying Network Infrastructure
- •8.5.1.2 The Prototype Cloud Network Control Logic and its Services
- •8.5.2 Performance Evaluation and Results Discussion
- •8.6 Related Work
- •8.7 Conclusions
- •References
- •9.1 Introduction
- •9.2 Overview of YML
- •9.3 Design and Implementation of YML-PC
- •9.3.1 Concept Stack of Cloud Platform
- •9.3.2 Design of YML-PC
- •9.3.3 Core Design and Implementation of YML-PC
- •9.4 Primary Experiments on YML-PC
- •9.4.1 YML-PC Can Be Scaled Up Very Easily
- •9.4.2 Data Persistence in YML-PC
- •9.4.3 Schedule Mechanism in YML-PC
- •9.5 Conclusion and Future Work
- •References
- •10.1 Introduction
- •10.2 Related Work
- •10.2.1 General View of Cloud Computing frameworks
- •10.2.2 Cloud Computing Middleware
- •10.3 Deploying Applications in the Cloud
- •10.3.1 Benchmarking the Cloud
- •10.3.2 The ProActive GCM Deployment
- •10.3.3 Technical Solutions for Deployment over Heterogeneous Infrastructures
- •10.3.3.1 Virtual Private Network (VPN)
- •10.3.3.2 Amazon Virtual Private Cloud (VPC)
- •10.3.3.3 Message Forwarding and Tunneling
- •10.3.4 Conclusion and Motivation for Mixing
- •10.4 Moving HPC Applications from Grids to Clouds
- •10.4.1 HPC on Heterogeneous Multi-Domain Platforms
- •10.4.2 The Hierarchical SPMD Concept and Multi-level Partitioning of Numerical Meshes
- •10.4.3 The GCM/ProActive-Based Lightweight Framework
- •10.4.4 Performance Evaluation
- •10.5 Dynamic Mixing of Clusters, Grids, and Clouds
- •10.5.1 The ProActive Resource Manager
- •10.5.2 Cloud Bursting: Managing Spike Demand
- •10.5.3 Cloud Seeding: Dealing with Heterogeneous Hardware and Private Data
- •10.6 Conclusion
- •References
- •11.1 Introduction
- •11.2 Background
- •11.2.1 ASKALON
- •11.2.2 Cloud Computing
- •11.3 Resource Management Architecture
- •11.3.1 Cloud Management
- •11.3.2 Image Catalog
- •11.3.3 Security
- •11.4 Evaluation
- •11.5 Related Work
- •11.6 Conclusions and Future Work
- •References
- •12.1 Introduction
- •12.2 Layered Peer-to-Peer Cloud Provisioning Architecture
- •12.4.1 Distributed Hash Tables
- •12.4.2 Designing Complex Services over DHTs
- •12.5 Cloud Peer Software Fabric: Design and Implementation
- •12.5.1 Overlay Construction
- •12.5.2 Multidimensional Query Indexing
- •12.5.3 Multidimensional Query Routing
- •12.6 Experiments and Evaluation
- •12.6.1 Cloud Peer Details
- •12.6.3 Test Application
- •12.6.4 Deployment of Test Services on Amazon EC2 Platform
- •12.7 Results and Discussions
- •12.8 Conclusions and Path Forward
- •References
- •13.1 Introduction
- •13.2 High-Throughput Science with the Nimrod Tools
- •13.2.1 The Nimrod Tool Family
- •13.2.2 Nimrod and the Grid
- •13.2.3 Scheduling in Nimrod
- •13.3 Extensions to Support Amazon’s Elastic Compute Cloud
- •13.3.1 The Nimrod Architecture
- •13.3.2 The EC2 Actuator
- •13.3.3 Additions to the Schedulers
- •13.4.1 Introduction and Background
- •13.4.2 Computational Requirements
- •13.4.3 The Experiment
- •13.4.4 Computational and Economic Results
- •13.4.5 Scientific Results
- •13.5 Conclusions
- •References
- •14.1 Using the Cloud
- •14.1.1 Overview
- •14.1.2 Background
- •14.1.3 Requirements and Obligations
- •14.1.3.1 Regional Laws
- •14.1.3.2 Industry Regulations
- •14.2 Cloud Compliance
- •14.2.1 Information Security Organization
- •14.2.2 Data Classification
- •14.2.2.1 Classifying Data and Systems
- •14.2.2.2 Specific Type of Data of Concern
- •14.2.2.3 Labeling
- •14.2.3 Access Control and Connectivity
- •14.2.3.1 Authentication and Authorization
- •14.2.3.2 Accounting and Auditing
- •14.2.3.3 Encrypting Data in Motion
- •14.2.3.4 Encrypting Data at Rest
- •14.2.4 Risk Assessments
- •14.2.4.1 Threat and Risk Assessments
- •14.2.4.2 Business Impact Assessments
- •14.2.4.3 Privacy Impact Assessments
- •14.2.5 Due Diligence and Provider Contract Requirements
- •14.2.5.1 ISO Certification
- •14.2.5.2 SAS 70 Type II
- •14.2.5.3 PCI PA DSS or Service Provider
- •14.2.5.4 Portability and Interoperability
- •14.2.5.5 Right to Audit
- •14.2.5.6 Service Level Agreements
- •14.2.6 Other Considerations
- •14.2.6.1 Disaster Recovery/Business Continuity
- •14.2.6.2 Governance Structure
- •14.2.6.3 Incident Response Plan
- •14.3 Conclusion
- •Bibliography
- •15.1.1 Location of Cloud Data and Applicable Laws
- •15.1.2 Data Concerns Within a European Context
- •15.1.3 Government Data
- •15.1.4 Trust
- •15.1.5 Interoperability and Standardization in Cloud Computing
- •15.1.6 Open Grid Forum’s (OGF) Production Grid Interoperability Working Group (PGI-WG) Charter
- •15.1.7.1 What will OCCI Provide?
- •15.1.7.2 Cloud Data Management Interface (CDMI)
- •15.1.7.3 How it Works
- •15.1.8 SDOs and their Involvement with Clouds
- •15.1.10 A Microsoft Cloud Interoperability Scenario
- •15.1.11 Opportunities for Public Authorities
- •15.1.12 Future Market Drivers and Challenges
- •15.1.13 Priorities Moving Forward
- •15.2 Conclusions
- •References
- •16.1 Introduction
- •16.2 Cloud Computing (‘The Cloud’)
- •16.3 Understanding Risks to Cloud Computing
- •16.3.1 Privacy Issues
- •16.3.2 Data Ownership and Content Disclosure Issues
- •16.3.3 Data Confidentiality
- •16.3.4 Data Location
- •16.3.5 Control Issues
- •16.3.6 Regulatory and Legislative Compliance
- •16.3.7 Forensic Evidence Issues
- •16.3.8 Auditing Issues
- •16.3.9 Business Continuity and Disaster Recovery Issues
- •16.3.10 Trust Issues
- •16.3.11 Security Policy Issues
- •16.3.12 Emerging Threats to Cloud Computing
- •16.4 Cloud Security Relationship Framework
- •16.4.1 Security Requirements in the Clouds
- •16.5 Conclusion
- •References
- •17.1 Introduction
- •17.1.1 What Is Security?
- •17.2 ISO 27002 Gap Analyses
- •17.2.1 Asset Management
- •17.2.2 Communications and Operations Management
- •17.2.4 Information Security Incident Management
- •17.2.5 Compliance
- •17.3 Security Recommendations
- •17.4 Case Studies
- •17.4.1 Private Cloud: Fortune 100 Company
- •17.4.2 Public Cloud: Amazon.com
- •17.5 Summary and Conclusion
- •References
- •18.1 Introduction
- •18.2 Decoupling Policy from Applications
- •18.2.1 Overlap of Concerns Between the PEP and PDP
- •18.2.2 Patterns for Binding PEPs to Services
- •18.2.3 Agents
- •18.2.4 Intermediaries
- •18.3 PEP Deployment Patterns in the Cloud
- •18.3.1 Software-as-a-Service Deployment
- •18.3.2 Platform-as-a-Service Deployment
- •18.3.3 Infrastructure-as-a-Service Deployment
- •18.3.4 Alternative Approaches to IaaS Policy Enforcement
- •18.3.5 Basic Web Application Security
- •18.3.6 VPN-Based Solutions
- •18.4 Challenges to Deploying PEPs in the Cloud
- •18.4.1 Performance Challenges in the Cloud
- •18.4.2 Strategies for Fault Tolerance
- •18.4.3 Strategies for Scalability
- •18.4.4 Clustering
- •18.4.5 Acceleration Strategies
- •18.4.5.1 Accelerating Message Processing
- •18.4.5.2 Acceleration of Cryptographic Operations
- •18.4.6 Transport Content Coding
- •18.4.7 Security Challenges in the Cloud
- •18.4.9 Binding PEPs and Applications
- •18.4.9.1 Intermediary Isolation
- •18.4.9.2 The Protected Application Stack
- •18.4.10 Authentication and Authorization
- •18.4.11 Clock Synchronization
- •18.4.12 Management Challenges in the Cloud
- •18.4.13 Audit, Logging, and Metrics
- •18.4.14 Repositories
- •18.4.15 Provisioning and Distribution
- •18.4.16 Policy Synchronization and Views
- •18.5 Conclusion
- •References
- •19.1 Introduction and Background
- •19.2 A Media Service Cloud for Traditional Broadcasting
- •19.2.1 Gridcast the PRISM Cloud 0.12
- •19.3 An On-demand Digital Media Cloud
- •19.4 PRISM Cloud Implementation
- •19.4.1 Cloud Resources
- •19.4.2 Cloud Service Deployment and Management
- •19.5 The PRISM Deployment
- •19.6 Summary
- •19.7 Content Note
- •References
- •20.1 Cloud Computing Reference Model
- •20.2 Cloud Economics
- •20.2.1 Economic Context
- •20.2.2 Economic Benefits
- •20.2.3 Economic Costs
- •20.2.5 The Economics of Green Clouds
- •20.3 Quality of Experience in the Cloud
- •20.4 Monetization Models in the Cloud
- •20.5 Charging in the Cloud
- •20.5.1 Existing Models of Charging
- •20.5.1.1 On-Demand IaaS Instances
- •20.5.1.2 Reserved IaaS Instances
- •20.5.1.3 PaaS Charging
- •20.5.1.4 Cloud Vendor Pricing Model
- •20.5.1.5 Interprovider Charging
- •20.6 Taxation in the Cloud
- •References
- •21.1 Introduction
- •21.2 Background
- •21.3 Experiment
- •21.3.1 Target Application: Value at Risk
- •21.3.2 Target Systems
- •21.3.2.1 Condor
- •21.3.2.2 Amazon EC2
- •21.3.2.3 Eucalyptus
- •21.3.3 Results
- •21.3.4 Job Completion
- •21.3.5 Cost
- •21.4 Conclusions and Future Work
- •References
- •Index
6 Open and Interoperable Clouds: The Cloud@Home Way |
109 |
physically stored on chunk providers and corresponding storage masters index the chunks through specific file indexes (FI). The storage master is the directory server, indexing the data stored in the associated chunk providers. It directly interfaces with the resource engine to discover the resources storing data. In this context, the resource engine can be considered, in its turn, as the directory server indexing all the storage masters. To improve the storage Cloud reliability, storage masters must be replicated. Moreover, a chunk provider can be associated to more than one storage master.
In order to avoid a storage master becoming a bottleneck, once the chunk providers have been located, data transfers are implemented by directly connecting end users and chunk providers. Similar techniques to the ones discussed about VM schedulers can be applied to storage masters for improving performance and reliability of the storage Clouds.
Chunk providers physically store the data that, as introduced earlier, are encrypted to achieve the confidentiality goal. Data reliability can be improved by replicating data chunks and chunk providers, consequently updating the corresponding storage masters. In this way, a corrupted data chunk can be automatically recovered and restored through the storage masters, without involving the end user.
In order to achieve QoS/SLA requirements in a storage Cloud, it is necessary to periodically monitor its storage resources, as done in the execution Cloud for VM. For this reason, in the Cloud@Home core structure of Fig. 6.3, we have introduced a specific storage resource monitor block. As it monitors the state of a chunk provider, it is physically located and deployed into each chunk provider composing the storage Cloud. The choice of replicating the resource monitor in both execution and storage Clouds is motivated by the fact that we want to implement two different, separated, and independent services.
6.4 Conclusions
In this chapter, we have discussed an innovative computing paradigm merging volunteer contributing and Cloud approaches into Cloud@Home. This proposal represents a solution for building Clouds, starting from heterogeneous and independent nodes, not specifically conceived for this purpose. Cloud@Home implements a generalization of both Volunteer and Cloud computing by aggregating the computational potentialities of many small, low-power systems, exploiting the long-tail effect of computing.
In this way, Cloud@Home opens the Cloud computing world to scientific and academic research centers, as well as to public administration and communities, and potentially single users: anyone can voluntarily support projects by sharing his/ her resources. On the other hand, it opens the utility computing market to the single user who wants to sell his/her computing resources. To realize this broader vision, several issues must be adequately taken into account: reliability, security, portability of resources and services, interoperability among Clouds, QoS/SLA, and business models and policies.
110 |
V.D. Cunsolo et al. |
It is necessary to have a common understanding regarding an ontology that fixes concepts, such as resources, services, virtualization, protocol, format, interface, and corresponding metrics, including Clouds’ functional and nonfunctional parameters (QoS, SLA, and so on), which must be translated into specific interoperable standards.
References
1.Anderson C. (2006) The long tail: how endless choice is creating unlimited demand. Random House Business Books, London
2.Anderson DP, Fedak G (2006) The computational and storage potential of volunteer computing. In: CCGRID ’06, pp 73–80
3.Baker S (2008, December 24) Google and the wisdom of clouds. BusinessWeek. http://www. businessweek.com/magazine/content/07_52/b4064048925836.htm
4.Barham P, Dragovic B, Fraser K, Hand S, Harris T, Ho A, Neugebauer R, Pratt I, Warfield A (2003) Xen and the art of virtualization. In: Proceedings of the nineteenth ACM symposium on operating systems principles (SOSP ’03), ACM, pp 164–177
5.Cohen B (2008) The BitTorrent protocol specification. BitTorrent.org. http://www.bittorrent. org/beps/bep_0003.html
6.Cunsolo VD, Distefano S, Puliafito A, Scarpa M (2009) Implementing data security in grid environment. Enabling technologies, IEEE international workshops on 0, pp 177–182
7.Distributed Systems Architecture Research Group: OpenNEbula Project [URL]. Universidad Complutense de Madrid (2009). http://www.opennebula.org/
8.Foster I (2005) Service-oriented science. Science 308(5723)
9.Foster I (2008) There’s grid in them thar clouds. Ian Foster’s blog. http://ianfoster.typepad. com/blog/2008/01/theres-grid-in.html
10. Ghemawat S, Gobioff H, Leung ST (2003) The google file system. SIGOPS Oper Syst Rev 37(5):29–43
11. Hacking S, Hudzia B (2009) Improving the live migration process of large enterprise applications. In: Proceedings of the 3rd international workshop on virtualization technologies in distributed computing (VTDC ’09), ACM, pp 51–58
12. Jin H, Deng L, Wu S, Shi X, Pan X (2009) Live virtual machine migration with adaptive, memory compression. In: IEEE cluster computing and workshops (CLUSTER ’09), pp 1–10
13. Kleinrock L (2005) A vision for the internet. ST J Res 2(1):4–5
14. MacKenzie CM, Laskey K, McCabe F, Brown PF, Metz R, Hamilton BA (2006) Reference model for service oriented architecture 1.0. http://docs.oasis-open.org/soa-rm/v1.0/
15. Martin R (2007, August 20) The red shift theory. InformationWeek. http://www.informationweek.com/news/hardware/showArticle.jhtml?articleID=201800873
16. Narayana SS, Jarke M, Prinz W (2006) Mobile web service provisioning. In: AICT-ICIW ’06, IEEE Computer Society, p 120
17. Nurmi D, Wolski R, Grzegorczyk C, Obertelli G, Soman S, Youseff L, Zagorodnov D (2008) The eucalyptus open-source cloud-computing system. In: Proceedings of cloud computing and its applications (2008)
18. Reservoir Consortium: Reservoir Project [URL] (2009). http://www-03.ibm.com/press/us/en/ pressrelease/23448.wss/
19. Saint-Andre P, Tronçon R, Smith K (2008) XMPP: the definitive guide: building real-time applications with jabber technologies, rough cuts version edn. O’Reilly
20. Tim O’Reilly: What is WEB 2.0 (2005). http://www.oreillynet.com/pub/a/oreilly/tim/ news/2005/09/30/what-is-web-20.html
6 Open and Interoperable Clouds: The Cloud@Home Way |
111 |
21. Tuecke S, Welch V, Engert D, Pearlman L, Thompson M (2004) Internet X.509 Public Key Infrastructure (PKI) proxy certificate profile
22. University of Chicago-University of Florida-Purdue University-Masaryk University: Nimbus-Stratus-Wispy-Kupa Projects [URL] (2009). http://workspace.globus.org/clouds/ nimbus.html/, http://www.acis.ufl.edu/vws/, http://www.rcac.purdue.edu/teragrid/resources/ #wispy, http://meta.cesnet.cz/cms/opencms/en/docs/clouds
23. VMWare: Understanding Full Virtualization, Paravirtualization, and Hardware Assist (2007). White Paper
24. Waldburger M, Stiller B (2006) Toward the mobile grid: service provisioning in a mobile dynamic virtual organization. In: IEEE international conference on computer system and application, pp 579–583
25.Wang L, Tao J, Kunze M, Castellanos AC, Kramer D, Karl W (2008) Scientific cloud computing: early definition and experience. In: HPCC ’08, pp 825–830
26. Youseff L, Butrico M, Da Silva D (2008) Toward a unified ontology of cloud computing. In: Grid computing environments workshop (GCE ’08), pp 1–10
27. Zadok E, Iyer R, Joukov N, Sivathanu G, Wright CP (2006)) On incremental file system development. ACM Trans Storage 2((2):161–196

Chapter 7
A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments
Fabrizio Marozzo, Domenico Talia, and Paolo Trunfio
Abstract MapReduce is a programming model widely used in Cloud computing environments for processing large data sets in a highly parallel way. MapReduce implementations are based on a master-slave model. The failure of a slave is managed by re-assigning its task to another slave, while master failures are not managed by current MapReduce implementations, as designers consider failures unlikely in reliable Cloud systems. On the contrary, node failures – including master failures – are likely to happen in dynamic Cloud scenarios, where computing nodes may join and leave the network at an unpredictable rate. Therefore, providing effective mechanisms to manage master failures is fundamental to exploit the MapReduce model in the implementation of data-intensive applications in those dynamic Cloud environments where current MapReduce implementations could be unreliable. The goal of our work is to extend the master-slave architecture of current MapReduce implementations to make it more suitable for dynamic Cloud scenarios. In particular, in this chapter, we present a Peer-to-Peer (P2P)-MapReduce framework that exploits a P2P model to manage participation of intermittent nodes, master failures, and MapReduce job recovery in a decentralized but effective way.
7.1 Introduction
Cloud computing is gaining increasing interest both in science and industry for its promise to deliver service-oriented remote access to hardware and software facilities in a highly reliable and transparent way. A key point for the effective implementation of large-scale Cloud systems is the availability of programming models that support a wide range of applications and system scenarios. One of the most successful programming models currently adopted for the implementation of dataintensive Cloud applications is MapReduce [1].
F. Marozzo (*)
Department of Electronics, Computer Science and Systems (DEIS), University of Calabria, Rende, Italy
e-mail: fmarozzo@deis.unical.itomenico
N. Antonopoulos and L. Gillam (eds.), Cloud Computing: Principles, |
113 |
Systems and Applications, Computer Communications and Networks,
DOI 10.1007/978-1-84996-241-4_7, © Springer-Verlag London Limited 2010