
3  Towards a Taxonomy for Cloud Computing from an e-Science Perspective


3.2  Scientific Workflows and e-Science

This section presents the main definitions regarding e-Science and scientific workflow concepts. These concepts are presented along with some important aspects to be considered when modeling or executing scientific experiments using cloud computing. These aspects are used as a basis for elaborating the classes of the cloud computing taxonomy.

3.2.1  Scientific Workflows

According to the Workflow Management Coalition [31], a workflow may be defined as “the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules.” A workflow defines the order in which tasks are invoked, or the conditions under which they must be invoked, as well as how the tasks are synchronized. Although this definition refers to business workflows, it can also be applied in the scientific domain [26], where tasks are related to scientific applications instead of business ones. An example of a scientific workflow is presented in Fig. 3.1. This workflow is part of a real deep water oil exploitation scientific experiment [20].
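The idea that a workflow fixes the order of task invocations can be illustrated with a minimal sketch. It assumes a workflow is represented as a dependency graph and uses Python's standard `graphlib` module (Python 3.9+) to derive a valid invocation order. The task names (`preprocess`, `simulate`, `analyze`) and the `run_workflow` helper are hypothetical; they do not reproduce the workflow of Fig. 3.1 or the behavior of any particular SWfMS.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Invoke tasks in an order that respects their dependencies.

    tasks: maps a task name to a callable taking a list of upstream outputs.
    dependencies: maps a task name to the list of tasks it depends on.
    Returns the invocation order and the output of every task.
    """
    # static_order() yields each task only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    outputs = {}
    for name in order:
        # Synchronization: a task only runs once its inputs are available.
        inputs = [outputs[dep] for dep in dependencies.get(name, ())]
        outputs[name] = tasks[name](inputs)
    return order, outputs


# Hypothetical three-step scientific workflow (illustrative values only).
tasks = {
    "preprocess": lambda inputs: 2.0,
    "simulate": lambda inputs: inputs[0] * 10,
    "analyze": lambda inputs: inputs[0] + 1,
}
dependencies = {
    "preprocess": [],
    "simulate": ["preprocess"],
    "analyze": ["simulate"],
}

order, outputs = run_workflow(tasks, dependencies)
```

In this sketch the procedural rules are reduced to data dependencies; conditional invocation and remote execution, which a real SWfMS would handle, are left out.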

3.2.2  Scientific Workflow Management Systems

Scientific Workflow Management Systems (SWfMSs) are responsible for coordinating the invocation of programs, either locally or in remote environments. Many different SWfMSs can be found in the literature [1, 5]. Although current SWfMSs have evolved and offer many important features, according to Weske et al. [30] they need to offer adequate support for the scientist throughout the experimentation process, including: (i) designing the workflow through a guided interface; (ii) controlling several variations of workflows; (iii) executing the workflow in an efficient way; (iv) handling failures; and (v) accessing, storing, and managing data.

Most of this support can be achieved using the cloud computing paradigm. More specifically, efficient execution of scientific experiments, as well as management of the large amount of scientific data produced by the experiment, is provided by the computational infrastructure of cloud computing environments. The next section presents some important aspects for scientific experiments to be considered when choosing a cloud computing environment.

Fig. 3.1  Deep water oil exploitation scientific workflow [20]

D. de Oliveira et al.

3.2.3  Important Aspects of In Silico Experiments

In silico experiments (which are usually modeled as scientific workflows) have some important aspects to be considered when they are modeled or executed. Many of these aspects should be taken into account when choosing a supporting cloud computing environment. Cloud computing environments have characteristics that relate to those aspects and may influence which cloud environment scientists choose. This section presents these aspects (business model, privacy, pricing, technological infrastructure, architecture, access, and standards), as they guide the choice of classes for the proposed taxonomy.

One of the most important aspects for scientific experiments is reproducibility. To reproduce and validate an experiment, scientists must have all available information related to the experiment, including the parameter values used in each execution and the results (both final and intermediate) produced during that execution. This type of information is called provenance [8]. Provenance data is stored in databases or in specialized provenance services, which handle failures and preserve data integrity. Therefore, to achieve experiment reproducibility, the supporting cloud computing environment should provide two fundamental features: data storage and environment configuration. Data storage is required to store provenance data. Preferably, there should be a service that provides storage or database mechanisms to enable the scientist to access provenance data and track how the results of an experiment execution were obtained. Environment configuration is required because it should be possible to reconfigure the entire environment used to execute the experiment. These characteristics are related to the business model followed by a cloud computing environment.
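As a rough illustration of the data storage feature, the sketch below records, for each execution, the parameter values used and the results produced, and then retrieves them to trace how a result was obtained. It uses Python's standard `sqlite3` module with an in-memory database as a stand-in for a cloud storage or database service. The table layout, the function names (`open_store`, `record`, `trace`), and the sample values are assumptions made for this example, not a provenance model defined in this chapter or in [8].

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a provenance store (hypothetical schema, in-memory by default)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS provenance ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "execution_id TEXT NOT NULL, "
        "task TEXT NOT NULL, "
        "parameters TEXT NOT NULL, "
        "result TEXT NOT NULL, "
        "is_final INTEGER NOT NULL)"
    )
    return conn


def record(conn, execution_id, task, parameters, result, is_final=False):
    """Store the parameters and result of one task invocation."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO provenance "
            "(execution_id, task, parameters, result, is_final) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                execution_id,
                task,
                json.dumps(parameters, sort_keys=True),
                json.dumps(result),
                int(is_final),
            ),
        )


def trace(conn, execution_id):
    """Return the recorded steps of one execution, in insertion order."""
    rows = conn.execute(
        "SELECT task, parameters, result, is_final FROM provenance "
        "WHERE execution_id = ? ORDER BY id",
        (execution_id,),
    ).fetchall()
    return [
        {
            "task": task,
            "parameters": json.loads(params),
            "result": json.loads(result),
            "is_final": bool(final),
        }
        for task, params, result, final in rows
    ]


# Illustrative values only; they are not taken from a real experiment.
conn = open_store()
record(conn, "run-1", "simulate", {"depth_m": 1500}, 20.0)
record(conn, "run-1", "analyze", {"threshold": 0.5}, 21.0, is_final=True)
history = trace(conn, "run-1")
```

Keeping both intermediate and final results under the same execution identifier is what lets a scientist follow the chain from parameters to outcome when validating an experiment.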

Privacy is also a major issue for the scientific community. Usually, provenance data and programs related to a scientific experiment are considered intellectual property and, because of that, are not made public until the research is published in a scientific paper. Therefore, the privacy aspect of cloud environments must be analyzed when dealing with scientific experiments.

Another important aspect to be considered is pricing. Scientists frequently use open-source programs and community environments. Such programs and environments are freely available for general use, which contributes to the reproducibility of experiment executions. The open-software culture of the scientific community must be considered, since most cloud environments are commercial, which means that the service must be paid for. Thus, scientists should take the pricing of environments into account.
