
- •Cloud Computing
- •Foreword
- •Preface
- •Introduction
- •Expected Audience
- •Book Overview
- •Part 1: Cloud Base
- •Part 2: Cloud Seeding
- •Part 3: Cloud Breaks
- •Part 4: Cloud Feedback
- •Contents
- •1.1 Introduction
- •1.1.1 Cloud Services and Enabling Technologies
- •1.2 Virtualization Technology
- •1.2.1 Virtual Machines
- •1.2.2 Virtualization Platforms
- •1.2.3 Virtual Infrastructure Management
- •1.2.4 Cloud Infrastructure Manager
- •1.3 The MapReduce System
- •1.3.1 Hadoop MapReduce Overview
- •1.4 Web Services
- •1.4.1 RPC (Remote Procedure Call)
- •1.4.2 SOA (Service-Oriented Architecture)
- •1.4.3 REST (Representative State Transfer)
- •1.4.4 Mashup
- •1.4.5 Web Services in Practice
- •1.5 Conclusions
- •References
- •2.1 Introduction
- •2.2 Background and Related Work
- •2.3 Taxonomy of Cloud Computing
- •2.3.1 Cloud Architecture
- •2.3.1.1 Services and Modes of Cloud Computing
- •Software-as-a-Service (SaaS)
- •Platform-as-a-Service (PaaS)
- •Hardware-as-a-Service (HaaS)
- •Infrastructure-as-a-Service (IaaS)
- •2.3.2 Virtualization Management
- •2.3.3 Core Services
- •2.3.3.1 Discovery and Replication
- •2.3.3.2 Load Balancing
- •2.3.3.3 Resource Management
- •2.3.4 Data Governance
- •2.3.4.1 Interoperability
- •2.3.4.2 Data Migration
- •2.3.5 Management Services
- •2.3.5.1 Deployment and Configuration
- •2.3.5.2 Monitoring and Reporting
- •2.3.5.3 Service-Level Agreements (SLAs) Management
- •2.3.5.4 Metering and Billing
- •2.3.5.5 Provisioning
- •2.3.6 Security
- •2.3.6.1 Encryption/Decryption
- •2.3.6.2 Privacy and Federated Identity
- •2.3.6.3 Authorization and Authentication
- •2.3.7 Fault Tolerance
- •2.4 Classification and Comparison between Cloud Computing Ecosystems
- •2.5 Findings
- •2.5.2 Cloud Computing PaaS and SaaS Provider
- •2.5.3 Open Source Based Cloud Computing Services
- •2.6 Comments on Issues and Opportunities
- •2.7 Conclusions
- •References
- •3.1 Introduction
- •3.2 Scientific Workflows and e-Science
- •3.2.1 Scientific Workflows
- •3.2.2 Scientific Workflow Management Systems
- •3.2.3 Important Aspects of In Silico Experiments
- •3.3 A Taxonomy for Cloud Computing
- •3.3.1 Business Model
- •3.3.2 Privacy
- •3.3.3 Pricing
- •3.3.4 Architecture
- •3.3.5 Technology Infrastructure
- •3.3.6 Access
- •3.3.7 Standards
- •3.3.8 Orientation
- •3.5 Taxonomies for Cloud Computing
- •3.6 Conclusions and Final Remarks
- •References
- •4.1 Introduction
- •4.2 Cloud and Grid: A Comparison
- •4.2.1 A Retrospective View
- •4.2.2 Comparison from the Viewpoint of System
- •4.2.3 Comparison from the Viewpoint of Users
- •4.2.4 A Summary
- •4.3 Examining Cloud Computing from the CSCW Perspective
- •4.3.1 CSCW Findings
- •4.3.2 The Anatomy of Cloud Computing
- •4.3.2.1 Security and Privacy
- •4.3.2.2 Data and/or Vendor Lock-In
- •4.3.2.3 Service Availability/Reliability
- •4.4 Conclusions
- •References
- •5.1 Overview – Cloud Standards – What and Why?
- •5.2 Deep Dive: Interoperability Standards
- •5.2.1 Purpose, Expectations and Challenges
- •5.2.2 Initiatives – Focus, Sponsors and Status
- •5.2.3 Market Adoption
- •5.2.4 Gaps/Areas of Improvement
- •5.3 Deep Dive: Security Standards
- •5.3.1 Purpose, Expectations and Challenges
- •5.3.2 Initiatives – Focus, Sponsors and Status
- •5.3.3 Market Adoption
- •5.3.4 Gaps/Areas of Improvement
- •5.4 Deep Dive: Portability Standards
- •5.4.1 Purpose, Expectations and Challenges
- •5.4.2 Initiatives – Focus, Sponsors and Status
- •5.4.3 Market Adoption
- •5.4.4 Gaps/Areas of Improvement
- •5.5.1 Purpose, Expectations and Challenges
- •5.5.2 Initiatives – Focus, Sponsors and Status
- •5.5.3 Market Adoption
- •5.5.4 Gaps/Areas of Improvement
- •5.6 Deep Dive: Other Key Standards
- •5.6.1 Initiatives – Focus, Sponsors and Status
- •5.7 Closing Notes
- •References
- •6.1 Introduction and Motivation
- •6.2 Cloud@Home Overview
- •6.2.1 Issues, Challenges, and Open Problems
- •6.2.2 Basic Architecture
- •6.2.2.1 Software Environment
- •6.2.2.2 Software Infrastructure
- •6.2.2.3 Software Kernel
- •6.2.2.4 Firmware/Hardware
- •6.2.3 Application Scenarios
- •6.3 Cloud@Home Core Structure
- •6.3.1 Management Subsystem
- •6.3.2 Resource Subsystem
- •6.4 Conclusions
- •References
- •7.1 Introduction
- •7.2 MapReduce
- •7.3 P2P-MapReduce
- •7.3.1 Architecture
- •7.3.2 Implementation
- •7.3.2.1 Basic Mechanisms
- •Resource Discovery
- •Network Maintenance
- •Job Submission and Failure Recovery
- •7.3.2.2 State Diagram and Software Modules
- •7.3.3 Evaluation
- •7.4 Conclusions
- •References
- •8.1 Introduction
- •8.2 The Cloud Evolution
- •8.3 Improved Network Support for Cloud Computing
- •8.3.1 Why the Internet is Not Enough?
- •8.3.2 Transparent Optical Networks for Cloud Applications: The Dedicated Bandwidth Paradigm
- •8.4 Architecture and Implementation Details
- •8.4.1 Traffic Management and Control Plane Facilities
- •8.4.2 Service Plane and Interfaces
- •8.4.2.1 Providing Network Services to Cloud-Computing Infrastructures
- •8.4.2.2 The Cloud Operating System–Network Interface
- •8.5.1 The Prototype Details
- •8.5.1.1 The Underlying Network Infrastructure
- •8.5.1.2 The Prototype Cloud Network Control Logic and its Services
- •8.5.2 Performance Evaluation and Results Discussion
- •8.6 Related Work
- •8.7 Conclusions
- •References
- •9.1 Introduction
- •9.2 Overview of YML
- •9.3 Design and Implementation of YML-PC
- •9.3.1 Concept Stack of Cloud Platform
- •9.3.2 Design of YML-PC
- •9.3.3 Core Design and Implementation of YML-PC
- •9.4 Primary Experiments on YML-PC
- •9.4.1 YML-PC Can Be Scaled Up Very Easily
- •9.4.2 Data Persistence in YML-PC
- •9.4.3 Schedule Mechanism in YML-PC
- •9.5 Conclusion and Future Work
- •References
- •10.1 Introduction
- •10.2 Related Work
- •10.2.1 General View of Cloud Computing frameworks
- •10.2.2 Cloud Computing Middleware
- •10.3 Deploying Applications in the Cloud
- •10.3.1 Benchmarking the Cloud
- •10.3.2 The ProActive GCM Deployment
- •10.3.3 Technical Solutions for Deployment over Heterogeneous Infrastructures
- •10.3.3.1 Virtual Private Network (VPN)
- •10.3.3.2 Amazon Virtual Private Cloud (VPC)
- •10.3.3.3 Message Forwarding and Tunneling
- •10.3.4 Conclusion and Motivation for Mixing
- •10.4 Moving HPC Applications from Grids to Clouds
- •10.4.1 HPC on Heterogeneous Multi-Domain Platforms
- •10.4.2 The Hierarchical SPMD Concept and Multi-level Partitioning of Numerical Meshes
- •10.4.3 The GCM/ProActive-Based Lightweight Framework
- •10.4.4 Performance Evaluation
- •10.5 Dynamic Mixing of Clusters, Grids, and Clouds
- •10.5.1 The ProActive Resource Manager
- •10.5.2 Cloud Bursting: Managing Spike Demand
- •10.5.3 Cloud Seeding: Dealing with Heterogeneous Hardware and Private Data
- •10.6 Conclusion
- •References
- •11.1 Introduction
- •11.2 Background
- •11.2.1 ASKALON
- •11.2.2 Cloud Computing
- •11.3 Resource Management Architecture
- •11.3.1 Cloud Management
- •11.3.2 Image Catalog
- •11.3.3 Security
- •11.4 Evaluation
- •11.5 Related Work
- •11.6 Conclusions and Future Work
- •References
- •12.1 Introduction
- •12.2 Layered Peer-to-Peer Cloud Provisioning Architecture
- •12.4.1 Distributed Hash Tables
- •12.4.2 Designing Complex Services over DHTs
- •12.5 Cloud Peer Software Fabric: Design and Implementation
- •12.5.1 Overlay Construction
- •12.5.2 Multidimensional Query Indexing
- •12.5.3 Multidimensional Query Routing
- •12.6 Experiments and Evaluation
- •12.6.1 Cloud Peer Details
- •12.6.3 Test Application
- •12.6.4 Deployment of Test Services on Amazon EC2 Platform
- •12.7 Results and Discussions
- •12.8 Conclusions and Path Forward
- •References
- •13.1 Introduction
- •13.2 High-Throughput Science with the Nimrod Tools
- •13.2.1 The Nimrod Tool Family
- •13.2.2 Nimrod and the Grid
- •13.2.3 Scheduling in Nimrod
- •13.3 Extensions to Support Amazon’s Elastic Compute Cloud
- •13.3.1 The Nimrod Architecture
- •13.3.2 The EC2 Actuator
- •13.3.3 Additions to the Schedulers
- •13.4.1 Introduction and Background
- •13.4.2 Computational Requirements
- •13.4.3 The Experiment
- •13.4.4 Computational and Economic Results
- •13.4.5 Scientific Results
- •13.5 Conclusions
- •References
- •14.1 Using the Cloud
- •14.1.1 Overview
- •14.1.2 Background
- •14.1.3 Requirements and Obligations
- •14.1.3.1 Regional Laws
- •14.1.3.2 Industry Regulations
- •14.2 Cloud Compliance
- •14.2.1 Information Security Organization
- •14.2.2 Data Classification
- •14.2.2.1 Classifying Data and Systems
- •14.2.2.2 Specific Type of Data of Concern
- •14.2.2.3 Labeling
- •14.2.3 Access Control and Connectivity
- •14.2.3.1 Authentication and Authorization
- •14.2.3.2 Accounting and Auditing
- •14.2.3.3 Encrypting Data in Motion
- •14.2.3.4 Encrypting Data at Rest
- •14.2.4 Risk Assessments
- •14.2.4.1 Threat and Risk Assessments
- •14.2.4.2 Business Impact Assessments
- •14.2.4.3 Privacy Impact Assessments
- •14.2.5 Due Diligence and Provider Contract Requirements
- •14.2.5.1 ISO Certification
- •14.2.5.2 SAS 70 Type II
- •14.2.5.3 PCI PA DSS or Service Provider
- •14.2.5.4 Portability and Interoperability
- •14.2.5.5 Right to Audit
- •14.2.5.6 Service Level Agreements
- •14.2.6 Other Considerations
- •14.2.6.1 Disaster Recovery/Business Continuity
- •14.2.6.2 Governance Structure
- •14.2.6.3 Incident Response Plan
- •14.3 Conclusion
- •Bibliography
- •15.1.1 Location of Cloud Data and Applicable Laws
- •15.1.2 Data Concerns Within a European Context
- •15.1.3 Government Data
- •15.1.4 Trust
- •15.1.5 Interoperability and Standardization in Cloud Computing
- •15.1.6 Open Grid Forum’s (OGF) Production Grid Interoperability Working Group (PGI-WG) Charter
- •15.1.7.1 What will OCCI Provide?
- •15.1.7.2 Cloud Data Management Interface (CDMI)
- •15.1.7.3 How it Works
- •15.1.8 SDOs and their Involvement with Clouds
- •15.1.10 A Microsoft Cloud Interoperability Scenario
- •15.1.11 Opportunities for Public Authorities
- •15.1.12 Future Market Drivers and Challenges
- •15.1.13 Priorities Moving Forward
- •15.2 Conclusions
- •References
- •16.1 Introduction
- •16.2 Cloud Computing (‘The Cloud’)
- •16.3 Understanding Risks to Cloud Computing
- •16.3.1 Privacy Issues
- •16.3.2 Data Ownership and Content Disclosure Issues
- •16.3.3 Data Confidentiality
- •16.3.4 Data Location
- •16.3.5 Control Issues
- •16.3.6 Regulatory and Legislative Compliance
- •16.3.7 Forensic Evidence Issues
- •16.3.8 Auditing Issues
- •16.3.9 Business Continuity and Disaster Recovery Issues
- •16.3.10 Trust Issues
- •16.3.11 Security Policy Issues
- •16.3.12 Emerging Threats to Cloud Computing
- •16.4 Cloud Security Relationship Framework
- •16.4.1 Security Requirements in the Clouds
- •16.5 Conclusion
- •References
- •17.1 Introduction
- •17.1.1 What Is Security?
- •17.2 ISO 27002 Gap Analyses
- •17.2.1 Asset Management
- •17.2.2 Communications and Operations Management
- •17.2.4 Information Security Incident Management
- •17.2.5 Compliance
- •17.3 Security Recommendations
- •17.4 Case Studies
- •17.4.1 Private Cloud: Fortune 100 Company
- •17.4.2 Public Cloud: Amazon.com
- •17.5 Summary and Conclusion
- •References
- •18.1 Introduction
- •18.2 Decoupling Policy from Applications
- •18.2.1 Overlap of Concerns Between the PEP and PDP
- •18.2.2 Patterns for Binding PEPs to Services
- •18.2.3 Agents
- •18.2.4 Intermediaries
- •18.3 PEP Deployment Patterns in the Cloud
- •18.3.1 Software-as-a-Service Deployment
- •18.3.2 Platform-as-a-Service Deployment
- •18.3.3 Infrastructure-as-a-Service Deployment
- •18.3.4 Alternative Approaches to IaaS Policy Enforcement
- •18.3.5 Basic Web Application Security
- •18.3.6 VPN-Based Solutions
- •18.4 Challenges to Deploying PEPs in the Cloud
- •18.4.1 Performance Challenges in the Cloud
- •18.4.2 Strategies for Fault Tolerance
- •18.4.3 Strategies for Scalability
- •18.4.4 Clustering
- •18.4.5 Acceleration Strategies
- •18.4.5.1 Accelerating Message Processing
- •18.4.5.2 Acceleration of Cryptographic Operations
- •18.4.6 Transport Content Coding
- •18.4.7 Security Challenges in the Cloud
- •18.4.9 Binding PEPs and Applications
- •18.4.9.1 Intermediary Isolation
- •18.4.9.2 The Protected Application Stack
- •18.4.10 Authentication and Authorization
- •18.4.11 Clock Synchronization
- •18.4.12 Management Challenges in the Cloud
- •18.4.13 Audit, Logging, and Metrics
- •18.4.14 Repositories
- •18.4.15 Provisioning and Distribution
- •18.4.16 Policy Synchronization and Views
- •18.5 Conclusion
- •References
- •19.1 Introduction and Background
- •19.2 A Media Service Cloud for Traditional Broadcasting
- •19.2.1 Gridcast the PRISM Cloud 0.12
- •19.3 An On-demand Digital Media Cloud
- •19.4 PRISM Cloud Implementation
- •19.4.1 Cloud Resources
- •19.4.2 Cloud Service Deployment and Management
- •19.5 The PRISM Deployment
- •19.6 Summary
- •19.7 Content Note
- •References
- •20.1 Cloud Computing Reference Model
- •20.2 Cloud Economics
- •20.2.1 Economic Context
- •20.2.2 Economic Benefits
- •20.2.3 Economic Costs
- •20.2.5 The Economics of Green Clouds
- •20.3 Quality of Experience in the Cloud
- •20.4 Monetization Models in the Cloud
- •20.5 Charging in the Cloud
- •20.5.1 Existing Models of Charging
- •20.5.1.1 On-Demand IaaS Instances
- •20.5.1.2 Reserved IaaS Instances
- •20.5.1.3 PaaS Charging
- •20.5.1.4 Cloud Vendor Pricing Model
- •20.5.1.5 Interprovider Charging
- •20.6 Taxation in the Cloud
- •References
- •21.1 Introduction
- •21.2 Background
- •21.3 Experiment
- •21.3.1 Target Application: Value at Risk
- •21.3.2 Target Systems
- •21.3.2.1 Condor
- •21.3.2.2 Amazon EC2
- •21.3.2.3 Eucalyptus
- •21.3.3 Results
- •21.3.4 Job Completion
- •21.3.5 Cost
- •21.4 Conclusions and Future Work
- •References
- •Index
12 Peer-to-Peer Cloud Provisioning: Service Discovery and Load-Balancing |
201 |
is restricted in App Engine. Load-balancing strategies, service provisioning, and autoscaling are all automatically managed by the system behind the scenes.
In addition, no single Cloud infrastructure provider has its data centers at all possible locations throughout the world. As a result, Cloud application service (SaaS) providers will have difficulty in meeting QoS expectations for all their users. Hence, they would like to logically construct hybrid Cloud infrastructures (mixing multiple public and private clouds) to provide better support for their specific user needs. This kind of requirement often arises in enterprises with global operations and applications such as Internet service, media hosting, and Web 2.0 applications. This necessitates building technologies and algorithms for seamless integration of Cloud infrastructure service providers for provisioning of services across different Cloud providers.
12.4 Cloud Service Discovery and Load-Balancing Using
DHT Overlay
12.4.1 Distributed Hash Tables
Structured systems such as DHTs offer deterministic query search results within logarithmic bounds on network message complexity. Peers in DHTs such as Chord, CAN, Pastry, and Tapestry maintain an index for O (log n) peers where n is the total number of peers in the system. Inherent to the design of a DHT are the following issues [19]: (i) generation of node-ids and object-ids, called keys, using cryptographic/randomizing hash functions such as SHA-1 [19–22] – the objects and nodes are mapped on the overlay network depending on their key value and each node is assigned responsibility for managing a small number of objects; (ii) building up routing information (routing tables) at various nodes in the network – each node maintains the network location information of a few other nodes in the network; and (iii) an efficient look-up query resolution scheme.
Whenever a node in the overlay receives a look-up request, it must be able to resolve it within acceptable bounds such as in O (log n) routing hops. This is achieved by routing the look-up request to the nodes in the network that are most likely to store the information about the desired object. Such probable nodes are identified by using the routing table entries. Though at the core various DHTs (Chord, CAN, Pastry, and Tapestry, etc.) are similar, still there exist substantial differences in the actual implementation of algorithms including the overlay network construction (network graph structure), routing table maintenance, and node join/ leave handling. The performance metrics for evaluating a DHT include fault-toler- ance, load-balancing, efficiency of look-ups and inserts, and proximity awareness [23]. In Table 12.2, we present the comparative analysis of Chord, Pastry, CAN, and Tapestry based on basic performance and organization parameters. Comprehensive details about the performance of some common DHTs under churn can be found in [24].

202
Table 12.2 Summary of complexity of distributed hash table overlays
|
|
|
|
Routing table |
|
Join/ |
DHT system |
Overlay structure |
Look-up protocol |
Network parameters |
size |
Routing complexity |
leave overhead |
|
|
|
|
|
|
|
Chord |
Circular identifier |
Matching key and |
n =number of servers |
O (log n) |
O (log n) |
O ((log n)2) |
|
space |
server-id |
|
|
|
|
Pastry |
Plaxton style mesh |
Matching key and prefix |
n =number of servers |
O (logb n) |
O (b logb n) + b |
O (log n) |
|
|
in server-id |
in the network, |
|
|
|
|
|
|
b= base of the |
|
|
|
|
|
|
identifier |
|
|
|
CAN |
Multidimensional space |
Key, value pair map to a |
n =number of servers |
O (2 d) |
O (d n1/d) |
O (2 d) |
|
|
point in space |
in the network, |
|
|
|
|
|
|
d = dimensions |
|
|
|
Tapestry |
Plaxton style mesh |
Matching suffix in |
n =number of servers |
O (logb n) |
O (b logb n) + b |
O (log n) |
|
|
server-id |
in the network, |
|
|
|
|
|
|
b = base of the |
|
|
|
|
|
|
identifier |
|
|
|
|
|
|
|
|
|
|
.al et Ranjan .R
12 Peer-to-Peer Cloud Provisioning: Service Discovery and Load-Balancing |
203 |
Other classes of structured peer-to-peer systems such as Mercury [25] do not apply randomizing hash functions for organizing data items and nodes. The Mercury system organizes nodes into a circular overlay and places data contiguously on this ring. As Mercury does not apply hash functions, data partitioning among nodes is not uniform. Hence, it requires an explicit load-balancing scheme. In recent developments, new-generation P2P systems have evolved to combine both unstructured and structured P2P networks. We refer to this class of systems as hybrid. Structella [26] is one such P2P system that replaces the random graph model of an unstructured overlay (Gnutella) with a structured overlay, while still adopting the search and content placement mechanism of unstructured overlays to support complex queries. Other hybrid P2P design includes Kelips [27] and its variants. Nodes in Kelips overlay periodically gossip to discover new members of the network, and during this process nodes may also learn about other nodes as a result of look-up communication. Other variants of Kelips allow routing table entries to store information for every other node in the system. However, this approach is based on the assumption that the system experiences low churn rate [24]. Gossiping and one-hop routing approach has been used for maintaining the routing overlay in the work [28].
12.4.2 Designing Complex Services over DHTs
Limitations of Basic DHT Implementations and Query Types: Traditionally, DHTs have been efficient for single-dimensional queries such as “finding all resources that match the given attribute value.” Since Cloud computing IaaS and PaaS level services such as servers, VMs, enterprise computers (private cloud resources), storage devices, and databases are identified by more than one attribute, a search query for these services is always multidimensional. These search dimensions or attributes can include service type, processor speed, architecture, installed operating system, available memory, and network bandwidth.
Based on recent information published by Amazon EC2 CloudWatch service, each Amazon Machine Image (AMI) instance has seven performance metrics (see Table 12.3) and four dimensions (see Table 12.4) associated with it. Additionally, these AMIs can host different application service types, including web hosting,
Table 12.3 |
Performance metrics associated with an Amazon EC2 AMI instance |
|
||||||
CPU |
|
Network |
Network |
Disk Write |
|
Disk Read |
Disk |
Disk |
Utilization |
Incoming |
Outgoing |
Operations |
Operations |
Write |
Read |
||
|
|
Traffic |
Traffic |
|
|
|
Bytes |
Bytes |
|
|
|
||||||
Table 12.4 |
Performance dimensions associated with an Amazon EC2 AMI instance |
|
||||||
|
|
|
|
|
||||
Image ID |
|
Autoscaling group name |
Instance ID |
Instance type |
||||
|
|
|
|
|
|
|
|
|
204 |
R. Ranjan et al. |
social networking, content-delivery, and high-performance computing, that have varying request invocation, access, and distribution patterns. The type of application services hosted by an AMI instance is dependent on the business needs and scientific experiments. In these cases, a Cloud service discovery query (which can be issued by provisioning software) will combine the aforementioned attributes related to AMI instances and application service types and therefore can have the following semantics:
Cloud Service Type = “web hosting” && Host CPU Utilization < “50%” && Instance OSType = “WinSrv2003” && Host Processor Cores > “1” && Host Processors Speed > “1.5 GHz” && Host Cloud Location = “Europe”
On the other hand, VM instances deployed on the Cloud hosts needs to publish their information so that provisioning software can search and discover them. VM instances update their software and hardware configuration and the deployed services’ availability status by sending update query to the DHT overlay. An update query has the following semantics:
Cloud Service Type = “web hosting” && Host CPU Utilization = “30%” && Instance OSType = “WinSrv2003” && Host Processor Cores = “2” && Host Processors Speed = “1.5 GHz” && Host Cloud Location = “Europe”
Extending DHTs to support indexing and matching of multidimensional range (service discovery query) or point (update query) queries, to index all resources whose attribute value overlaps a given search space, is a complex problem. Multidimensional range queries are based on ranges of values for attributes rather than on specific values. Compared to single-dimensional queries, resolving multidimensional queries is far more complicated, as there is no obvious total ordering of the points in the attribute space. Further, the query interval has varying size, aspect ratio, and position such as a window query. The main challenges involved in enabling multidimensional queries in a DHT overlay include designing efficient service attribute data: (i) distribution or indexing techniques; and (ii) query routing techniques.
Data Indexing Techniques for Mapping Multidimensional Range and Point Queries: A data indexing technique partitions the multidimensional attribute space over the set of VMs in a DHT network. Efficiency of the distribution mechanism directly governs how the query processing load is distributed among the Cloud peers. A good distribution mechanism should possess the following characteristics [29]: (i) locality: data points nearby in the attribute space should be mapped to the same Cloud peer, hence limiting the distributed lookup complexity; (ii) load balance: the number of data points indexed by each Cloud peer should be approximately the same to ensure uniform distribution of query processing; (iii) minimal metadata: prior information required for mapping the attribute space to the overlay space should be minimal; and (iv) minimal management overhead: during VM instantiation and destruction operation, update policies such as the transfer of data points to a newly joined Cloud peer should cause minimal network traffic. Note that the assumption here is that every VM instance hosts a Cloud peer service, which is responsible for managing activities related to overlay network.
12 Peer-to-Peer Cloud Provisioning: Service Discovery and Load-Balancing |
205 |
There are different kinds of database indices [30] that can handle mapping of multidimensional objects such as the space filling curves (SFCs) (including the Hilbert curves, Z-curves), k-d tree, MX-CIF Quad tree, and R*-tree in a DHT overlay. In literature, these indices are referred to as spatial indices [31]. Spatial indices are well suited for handling the complexity of multidimensional queries. Although some spatial indices can have issues as regards to routing load-balance in case of a skewed attribute/data set, all the spatial indices are generally scalable in terms of the number of hops traversed and messages generated while searching and routing multidimensional/spatial service discovery and update queries. However, there are different tradeoffs involved with each of the spatial indices, but basically they can all support scalability and Cloud service discovery. Some spatial index would perform optimally in one scenario but the performance could degrade if the attribute/ data distribution changed significantly.
Routing Techniques for Handling Multidimensional Queries in DHT Overlay: DHTs guarantee deterministic query look-up with logarithmic bounds on network message cost for single-dimensional queries. However, Cloud’s service discovery and update query are multidimensional (as discussed in previous sections). Hence, the existing DHT routing techniques need to be augmented in order to efficiently resolve multidimensional queries. Various data structures that we discussed in the previous section effectively create a logical multidimensional index space over a DHT overlay. A look-up operation involves searching for an index or set of indexes in a multidimensional space. However, the exact query routing path in the multidimensional logical space is directly governed by the data distribution mechanism (i.e. based on the data structure that maintains the indexes). In this context, various approaches have proposed different routing/indexing heuristics.
Efficient query routing algorithms should exhibit the following characteristics [29]: (i) routing load balance: every peer in the network should route forward/route approximately the same number of query messages; and (ii) low routing state per Cloud peer: each Cloud peer should maintain a small number of routing links hence limiting new Cloud peer (VM) join and Cloud peer (VM) state update cost. In the current peer-to-peer literature, multidimensional data distribution mechanisms based on the following structures have been proposed: (i) space filling curves; and (ii) tree-based structures. Resolving multidimensional queries over a DHT overlay that utilizes SFCs for data distribution consists of two basic steps [10]: (i) mapping the multidimensional query onto the set of relevant clusters of SFC-based index space; and (ii) routing the message to all VMs that fall under the computed SFCbased index space. On the other hand, routing multidimensional query in a DHT overlay that employs tree-based structures for data distribution requires routing to start from the root. However, the root VM presents a single point of failure and load imbalance. To overcome this, the authors introduced the concept of fundamental minimum level. This means that all the query processing and the data storage should start at that minimal level of the tree rather than at the root. There are a number of techniques available for distributed routing in multidimensional space. The performance of techniques varies depending on the distribution of data in the multidimensional space, and VM in the underlying DHT overlay.