
1  Tools and Technologies for Building Clouds

H. Jin et al.

The pairs are partitioned into groups for processing, and are sorted according to their key as they arrive for reduction. (In the example, the pairs are now grouped according to the key.)

The key/value pairs are reduced, once for each unique key in the sorted list, to produce a combined result. (In this example, the result is the count of each word.)
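The steps above (map, group-by-key, reduce) can be sketched in plain Python. The function names and the word-count job are illustrative, not part of any MapReduce API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every input document."""
    return [(word, 1) for doc in documents for word in doc.split()]

def shuffle_phase(pairs):
    """Sort pairs by key and group them, as the framework does between phases."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: [v for _, v in group]
            for key, group in groupby(pairs, key=itemgetter(0))}

def reduce_phase(grouped):
    """Invoke the reducer once per unique key to combine its values."""
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle_phase(map_phase(["the quick fox", "the lazy dog"])))
# counts == {"dog": 1, "fox": 1, "lazy": 1, "quick": 1, "the": 2}
```

A real framework runs the map and reduce functions in parallel across many nodes; the sequential pipeline here only shows the data flow between the phases.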

MapReduce has been applied widely in various fields, including data- and compute-intensive applications, machine learning, and multicore programming, and many implementations have been developed in different programming languages for various purposes.

The popular open source implementation of MapReduce, Hadoop [48], was developed primarily by Yahoo, where it processes hundreds of terabytes of data on at least 10,000 cores [49], and is now used by other companies, including Facebook, Amazon, Last.fm, and the New York Times [50]. Research groups from enterprises and academia are starting to study the MapReduce model as a better fit for cloud computing, and explore the possibilities of adapting it for more applications.

1.3.1  Hadoop MapReduce Overview

Hadoop Common [48], formerly called the Hadoop core, includes the filesystem, RPC (remote procedure call), and serialization libraries, and provides the basic services for building a cloud computing environment on commodity hardware. The two main subprojects are the MapReduce framework and the Hadoop Distributed File System (HDFS).

HDFS is a distributed file system designed to run on commodity hardware. It is highly fault-tolerant and can therefore be deployed on low-cost machines, and it provides the high-throughput access to application data needed by applications with large data sets. The Hadoop MapReduce framework relies heavily on its shared file system, and it ships with plug-ins for HDFS, CloudStore [51], and the Amazon Simple Storage Service (S3).

The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling a job's component tasks on the slaves (i.e., it queries the HDFS master NameNode for data block locations and assigns each task to the TaskTracker closest to where the data to be processed is physically stored), monitoring them, and re-executing any failed tasks. The slaves execute the tasks as directed by the master.
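The JobTracker's data-locality preference can be modeled in a few lines. This is a toy sketch with hypothetical node names, not Hadoop's actual scheduler, which also weighs rack locality and free task slots:

```python
def assign_task(block_locations, alive_trackers):
    """Pick a TaskTracker for one task: prefer a node that holds a replica
    of the task's input block (a data-local assignment), otherwise fall
    back to any alive tracker (a non-local assignment)."""
    for node in block_locations:        # nodes holding a replica, per the NameNode
        if node in alive_trackers:
            return node                 # data-local: computation moves to the data
    return sorted(alive_trackers)[0]    # non-local fallback: data must move

tracker = assign_task(["node3", "node7"], alive_trackers={"node1", "node7"})
# tracker == "node7": the only alive tracker that holds a replica
```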

1.4  Web Services

To support a cloud computing infrastructure efficiently, and to express business models easily, designers and developers need a group of web services technologies to construct real, user-friendly, and content-rich applications on top of their clouds. This section introduces four fundamental tools and technologies that can be employed to construct cloud applications at the infrastructure, architecture, and presentation levels: Remote Procedure Call (RPC), Service-Oriented Architecture (SOA), Representational State Transfer (REST), and Mashup.

1.4.1  RPC (Remote Procedure Call)

Reliable and stable communication among cloud resources is fundamental to the infrastructure, and thus an important consideration. Remote Procedure Call (RPC) has proven to be an efficient mechanism for implementing the client-server model in a distributed computing environment. It was initially proposed by Sun Microsystems as a significant advance over raw sockets (e.g., the programmer is not concerned with the underlying communication, which is hidden inside the RPC layer). In RPC, the client must know what features the server provides; these are indicated by a service definition written in an IDL (Interface Description Language). An RPC call is a synchronous operation that suspends the calling program until the results of the call are returned. When an RPC program is compiled, a stub representing the remote service is included in the compiled code. When the program runs, it calls the stub, which knows where the operation resides and how to reach the service, and the stub sends the message over the network to the server. The result of the procedure is returned to the client in the same way.
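The stub-based, synchronous call pattern can be demonstrated with Python's standard-library XML-RPC modules; the `add` procedure is an illustrative example, and the port is chosen by the OS:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose a procedure under a registered name (a minimal
# stand-in for an IDL-style service definition).
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy plays the role of the compiled stub -- it knows
# where the service is and marshals the call over the network.
proxy = ServerProxy(f"http://localhost:{port}")
result = proxy.add(2, 3)   # synchronous: blocks until the reply arrives
# result == 5
```

Note how the call site looks like a local function call; the marshalling, transport, and unmarshalling are entirely hidden by the stub.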

Many commercial products built on the RPC mechanism have proven efficient and convenient for constructing enterprise applications.

In 2002, Microsoft released .NET Remoting [52], which evolved incrementally from DCOM and ActiveX, to support .NET applications intercommunicating in a loosely coupled environment. Similar to RPC stubs, .NET Remoting initializes “Channel” objects to proxy the remote calls. To improve transparency and convenience, serialization and marshalling are completed automatically by the .NET runtime. Each .NET Remoting object is identified by a unique URL and can be safely accessed by remote clients.

Extending Java Remote Method Invocation (RMI) [53], the Java community produced the complete J2EE specification [54] to standardize communication among loosely coupled Java components. The enhancements include Enterprise JavaBeans (EJB), connectors, servlets, and portlets. The complete J2EE family of specifications helps designers construct business logic easily and assists developers in implementing it clearly. Although .NET Remoting and J2EE have been widely adopted by industry, the RPC mechanism is not well suited to constructing cloud applications. One problem is that RPC implementations, as shown in Table 1.4, can be incompatible with each other, so using any one of them results in a tight dependence on that particular implementation.


Table 1.4  Web service toolkits comparison

        Age    Dep.            Transport        Key Tech.               Categories                Implementations
RPC     1974   -               TCP/IP           Stubs, IDL              Infrastructure, IaaS      Java RMI [53], XML-RPC, .Net Remoting [52], RPyC, CORBA
SOA     1998   WS-RPC          HTTP, FTP, SMTP  WSDL, UDDI, SOAP        Architecture level, PaaS  IBM Websphere, Microsoft .Net, IIS, Weblogic
REST    2000   HTTP            HTTP, FTP, SMTP  Web-oriented            Architecture level, DaaS  RIP, Rails, Restlet, JBoss RESTEasy, Apache CXF, Symfony
MASHUP  2000+  REST, SOA, RSS  HTTP             Web-oriented (Web 2.0)  Application level, SaaS   Google Mashup editor, JackBe, Mozilla Ubiquity

1.4.2  SOA (Service-Oriented Architecture)

The goal of a Service-Oriented Architecture (SOA) [55, 56] is to compose fairly large chunks of functionality into service-oriented applications built almost entirely from existing software services. SOA adopts a set of open standards (1) to wrap components running in different localized runtime environments (e.g., in Java or .NET); (2) to give different clients, including pervasive devices, free access; and (3) to reuse existing components to compose further services. This significantly reduces development costs and helps designers and developers concentrate on business models and their internal logic.

SOAs use several communication standards based on XML to enhance interoperability among application systems. As the atomic access points inside an SOA, web services are formally defined by three kernel standards: the Web Service Description Language (WSDL), the Simple Object Access Protocol (SOAP), and Universal Description, Discovery and Integration (UDDI). Normally, the functional interfaces and parameters of a specific service are described using WSDL. Web service messages are encoded in the SOAP messaging framework and transported over HTTP or other internet protocols (SMTP, FTP, and so forth). A typical web service lifecycle envisions the following scenario: a service provider publishes the WSDL description of its service in a UDDI registry, which permits universal description, discovery, and integration of web services. Subsequently, service requesters can inspect the UDDI registry and locate/discover web services of interest. Using the information provided by the WSDL description, they can directly invoke the corresponding web service. Further, several web services can be composed to achieve more complex functionality. All the invocation procedures are similar to RPC, except that the communications and deployments are described in open standards.
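The publish-discover-invoke lifecycle can be sketched with an in-memory stand-in for the UDDI registry. The registry layout, service name, and endpoint below are illustrative, not the actual UDDI data model:

```python
# Toy registry: maps a service name to a minimal stand-in for its WSDL
# description (an endpoint plus operation signatures).
registry = {}

def publish(name, description):
    """Provider step: register the service description in the 'UDDI'."""
    registry[name] = description

def discover(keyword):
    """Requester step: find services whose name matches a keyword."""
    return [desc for name, desc in registry.items() if keyword in name]

publish("weather-report", {"endpoint": "http://example.org/weather",
                           "operations": {"getForecast": ["city"]}})

matches = discover("weather")       # locate/discover the service of interest
endpoint = matches[0]["endpoint"]   # the WSDL-style description tells the
                                    # requester where and how to invoke it
# endpoint == "http://example.org/weather"
```

The real standards add machine-readable typing (WSDL), a message envelope (SOAP), and a distributed registry protocol (UDDI) on top of this basic three-party pattern.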

Moreover, open standards organizations such as the W3C, OASIS, and the DMTF contribute many higher-level standards that help users construct reusable, interoperable, and discoverable services and applications. Some of these standards, such as the Web Services Resource Framework (WSRF) [57], Web Services Security (WS-Security) [58], and Web Services Policy (WS-Policy) [59], have been widely adopted to construct grid and cloud systems.

1.4.3  REST (Representational State Transfer)

REST [60] is an architectural style that Roy T. Fielding, now chief scientist at Day Software, first defined in his doctoral thesis. REST stipulates mechanisms for defining and accessing resources in specific distributed systems such as the web. In a REST implementation, resources are addressed via uniform resource identifiers (URIs). That is, a given URI is used to access the representational state of a resource, and also to modify that resource. For example, web URLs can be used to give descriptive information about resources, and consumers then need to know only the URL to read the information. Furthermore, an authorized user can also modify the information if needed.

REST defines three architectural entities as follows [60–62]:

Data elements: resource identifiers such as URIs and URLs, and resource representations, such as HTML documents, images, and XML documents

Components: origin servers, gateways, proxies, and user agents

Connectors: clients, servers, and caches

The representational state for resources in an HTTP-based REST system should be accessed using the standard HTTP methods.

A simple breakdown of these methods is as follows: GET is used to transfer the current representational state of a resource from a server to a client; PUT is used to transfer the modified representational state of a resource from the client to the server; POST is used to transfer the new representational state of a resource from the client to the server; and DELETE is used to transfer information needed to change a resource to a deleted representational state.
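The method-to-state mapping above can be modeled in memory; the resource store and URI are illustrative, and a real system would dispatch these methods inside an HTTP server:

```python
store = {}  # current representational state of each resource, keyed by URI

def handle(method, uri, body=None):
    """Dispatch the four standard HTTP methods against the store."""
    if method == "GET":               # transfer current state server -> client
        return store.get(uri)
    if method in ("PUT", "POST"):     # transfer modified/new state client -> server
        store[uri] = body
        return body
    if method == "DELETE":            # move the resource to a deleted state
        return store.pop(uri, None)
    raise ValueError(f"unsupported method: {method}")

handle("POST", "/users/1", {"name": "Ada"})    # create the resource
handle("PUT", "/users/1", {"name": "Ada L."})  # modify it in place
state = handle("GET", "/users/1")              # state == {"name": "Ada L."}
handle("DELETE", "/users/1")                   # now "/users/1" not in store
```

The key REST property shown here is uniformity: one URI addresses the resource for all four operations, and the verb alone determines how its state is transferred.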

1.4.4  Mashup

A mashup has been defined in Wikipedia [63] as “a web page or application that combines data or functionality from two or more external sources to create a new service.” To be more precise, Mashup technology concentrates on the following tasks
