
you might require not only separate servers but also multiple bandwidth providers and possibly even disparate data center spaces in which to house redundant site facilities.

- Capacity—On the flip side, sites are often moved to a clustered setup to meet their increasing traffic demands. Scaling to meet traffic demands often entails one of two strategies:

  - Splitting a collection of services into multiple small clusters

  - Creating large clusters that can serve multiple roles


Figure 15.1 An application that does not meet the cluster definition.

Load Balancing

This book is not about load balancing. Load balancing is a complex topic, and the scope of this book doesn’t allow for the treatment it deserves. There are myriad software and hardware solutions available, varying in price, quality, and feature sets. This chapter focuses on how to build clusters intelligently and how to extend many of the techniques covered in earlier chapters to applications running in a clustered environment. At the end of the chapter I’ve listed some specific load-balancing solutions.

While both splitting a collection of services into multiple small clusters and creating large clusters that can serve multiple roles have merit, the first strategy is the more prone to abuse. I’ve seen numerous clients crippled by “highly scalable” architectures (see Figure 15.3).



Figure 15.2 A simple clustered service.


Figure 15.3 An overly complex application architecture.

The many benefits of this type of setup include the following:

- By separating services onto different clusters, you can ensure that the needs of each can be scaled independently if traffic does not increase uniformly over all services.

- A physical separation is consistent with and reinforces the logical design separation.


The drawbacks are considerations of scale. Many projects are overdivided into clusters: You have 10 logically separate services? Then you should have 10 clusters. Every service is business critical, so each should have at least two machines representing it (for redundancy). Very quickly, we have committed ourselves to 20 servers. In the bad cases, developers take advantage of the knowledge that the clusters are actually separate servers and write services that use mutually exclusive facilities. Sloppy reliance on the separation of services can also involve things as simple as using the same-named directory for storing data. Design mistakes like these can be hard or impossible to fix and can result in having to keep all the servers physically separate.

Having 10 separate clusters handling different services is not necessarily a bad thing. If you are serving several million pages per day, you might be able to efficiently spread your traffic across such a cluster. The problem occurs when you have a system design that requires a huge amount of physical resources but is serving only 100,000 or 1,000,000 pages per day. Then you are stuck in the situation of maintaining a large infrastructure that is highly underutilized.

Dot-com lore is full of grossly “mis-specified” and underutilized architectures. Not only are they wasteful of hardware resources, they are also expensive to build and maintain. Although it is easy to blame company failures on mismanagement and bad ideas, one should never forget that a $5 million data center setup does not help the bottom line. As a systems architect for dot-com companies, I’ve always felt my job was not only to design infrastructures that can scale easily but also to build them to maximize the return on investment.

Now that the cautionary tale of over-clustering is out of the way, how do we break services into clusters that work?

Clustering Design Essentials

The first step in breaking services into clusters that work, regardless of the details of the implementation, is to make sure that an application can be used in a clustered setup. Every time I give a conference talk, I am approached by a self-deprecating developer who wants to know the secret to building clustered applications. The big secret is that there is no secret: Building applications that don’t break when run in a cluster is not terribly complex.

This is the critical assumption that is required for clustered applications:

Never assume that two requests will have access to the same data unless it is in an explicitly shared resource.

In practical terms, this generates a number of corollaries:

- Never use files to store dynamic information unless control of those files is available to all cluster members (over NFS, Samba, and so on); a sketch of this corollary follows the list.

- Never use DBMs to store dynamic data.


- Never require subsequent requests to have access to the same resource. For example, requiring subsequent requests to use exactly the same database connection resource is bad, but requiring subsequent requests to be able to make connections to the same database is fine.
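To make the first corollary concrete, here is a minimal sketch of the difference, assuming a MySQL database that every cluster member can reach. The hostname, credentials, and table here are hypothetical:

<?php
// Broken in a cluster: the counter lives in a local file, so each
// Web server accumulates its own count and no server sees the truth.
function increment_pageview_local($page) {
    $file = "/var/data/counters/$page";        // local filesystem only
    $count = (int) @file_get_contents($file);
    file_put_contents($file, $count + 1);
}

// Cluster-safe: the counter lives in an explicitly shared resource.
// Hypothetical table:
//   CREATE TABLE pageviews (page VARCHAR(128) PRIMARY KEY, cnt INT);
function increment_pageview_shared($page) {
    $dbh = mysql_connect('db1.prod.example.com', 'app', 'password');
    mysql_select_db('appdata', $dbh);
    $page = mysql_real_escape_string($page, $dbh);
    mysql_query("INSERT INTO pageviews (page, cnt) VALUES ('$page', 1)
                 ON DUPLICATE KEY UPDATE cnt = cnt + 1", $dbh);
}
?>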

Planning to Fail

One of the major reasons for building clustered applications is to protect against component failure. This isn’t paranoia; Web clusters in particular are often built on so-called commodity hardware. Commodity hardware is essentially the same components you run in a desktop computer, perhaps in a rack-mountable case or with a nicer power supply or a server-style BIOS. Commodity hardware suffers from relatively poor quality control and very little fault tolerance. In contrast to more advanced enterprise hardware platforms, commodity machines have little ability to recover from failures such as faulty processors or physical memory errors.

The compensating factor for this lower reliability is a tremendous cost savings. Companies such as Google and Yahoo! have demonstrated the huge cost savings you can realize by running large numbers of extremely cheap commodity machines versus fewer but much more expensive enterprise machines.

The moral of this story is that commodity machines fail, and the more machines you run, the more often you will experience failures—so you need to make sure that your application design takes this into account. Here are some guidelines for avoiding the common pitfalls:

- Ensure that your application has the most recent code before it starts. In an environment where code changes rapidly, it is possible that the code base your server was running when it crashed is not the same as what is currently running on all the other machines.

- Purge local caches before an application starts unless the data is known to be consistent (see the sketch after this list).

- Even if your load-balancing solution supports it, a client’s session should never be required to be bound to a particular server. Using client/server affinity to promote good cache locality is fine (and in many cases very useful), but the client’s session shouldn’t break if the server goes offline.
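Here is a minimal sketch of the cache-purging guideline, run once at startup before the application takes traffic. The cache directory and the clean-shutdown marker convention are hypothetical:

<?php
// Purge locally cached files at startup unless a marker file left by a
// clean shutdown says the cache is known to be consistent.
function purge_local_cache($cachedir) {
    if (file_exists("$cachedir/.clean-shutdown")) {
        unlink("$cachedir/.clean-shutdown");   // trust the cache; clear marker
        return;
    }
    $files = glob("$cachedir/*");
    if ($files) {
        foreach ($files as $file) {
            if (is_file($file)) {
                unlink($file);
            }
        }
    }
}

purge_local_cache('/cache/www.example.com');
?>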

Working and Playing Well with Others

It is critical to design for cohabitation, not for exclusivity. Applications shrink as often as they grow. It is not uncommon for a project to be overspecified, leaving it using much more hardware than needed (and thus incurring higher capital commitment and maintenance costs). Often, the design of the architecture makes it impossible to coalesce multiple services onto a single machine. This directly violates the scalability goal of being flexible to both growth and contraction.


Designing applications for comfortable cohabitation is not hard. In practice, it involves very little specific planning or adaptation, but it does require some forethought in design to avoid common pitfalls.

Always Namespace Your Functions

We have talked about this maxim before, and with good reason: Proper namespacing of function, class, and global variable names is essential to coding large applications because it is the only systematic way to avoid symbol-naming conflicts.

In my code base I have my Web logging software. There is a function in its support libraries for displaying formatted errors to users:

function displayError($entry) {
    // ... weblog error display function
}

I also have a function in my general-purpose library for displaying errors to users:

function displayError($entry) {
    // ... general error display function
}

Clearly, I will have a problem if I want to use the two code bases together in a project; if I use them as is, I will get function redefinition errors. To make them cohabitate nicely, I need to change one of the function names, which will then require changing all its dependent code.

A much better solution is to anticipate this possibility and namespace all your functions to begin with, either by putting your functions in a class as static methods, as in this example:

class webblog {
    static function displayError($entry) {
        // ...
    }
}

class Common {
    static function displayError($entry) {
        // ...
    }
}

or by using the traditional PHP4 method of name-munging, as is done here:

function webblog_displayError($entry) {
    // ...
}

function Common_displayError($entry) {
    // ...
}


Either way, by protecting symbol names from the start, you can eliminate the risk of conflicts and avoid the large code changes that conflicts often require.

Reference Services by Full Descriptive Names

Another good design principle that is particularly essential for safe code cohabitation is to reference services by full descriptive names. I often see application designs that reference a database called dbhost and then rely on dbhost being specified in the /etc/hosts file on the machine. As long as there is only a single database host, this method won’t cause any problems. But invariably you will need to merge two services whose dbhost entries are not in fact the same host; then you are in trouble. The same goes for database schema names (database names in MySQL): Using unique names allows databases to be safely consolidated if the need arises. Using descriptive and unique database host and schema names mitigates the risk of confusion and conflict. A brief sketch follows.
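Here is a minimal sketch of the two styles; all hostnames, schema names, and credentials in it are hypothetical:

<?php
// Fragile: relies on a generic alias in /etc/hosts that is likely to
// collide when two services are merged onto the same machines.
$dbh = mysql_connect('dbhost', 'app', 'password');
mysql_select_db('production', $dbh);

// Robust: each service references its database by a full descriptive
// name, so consolidating services later causes no ambiguity.
$weblog_dbh = mysql_connect('weblog-db.prod.example.com', 'weblog', 'password');
mysql_select_db('weblog_production', $weblog_dbh);

$store_dbh = mysql_connect('store-db.prod.example.com', 'store', 'password');
mysql_select_db('store_production', $store_dbh);
?>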

Namespace Your System Resources

If you are using filesystem resources (for example, for storing cache files), you should embed your service name in the path of the file to ensure that you do not interfere with other services’ caches and vice versa. Instead of writing your files in /cache/, you should write them in /cache/www.foo.com/.
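A minimal sketch of that convention follows; the service name, cache root, and helper function are hypothetical:

<?php
// Embed the service name in every cache path so that two services
// sharing a machine can never clobber each other's cache files.
function cache_path($service, $key) {
    $dir = "/cache/$service";
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);   // create /cache/<service>/ as needed
    }
    return "$dir/" . md5($key) . '.cache';
}

$rendered_page = '<html>...</html>';   // content to cache
file_put_contents(cache_path('www.foo.com', 'homepage'), $rendered_page);
?>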

Distributing Content to Your Cluster

In Chapter 7, “Enterprise PHP Management,” you saw a number of methods for content distribution. All those methods apply equally well to clustered applications. There are two major concerns, though:

- Guaranteeing that every server is consistent internally

- Guaranteeing that servers are consistent with each other

The first point is addressed in Chapter 7. The most complete way to ensure that you do not have mismatched code is to shut down a server while updating its code. The reason only a shutdown will suffice to be completely certain is that PHP parses and runs its include files at runtime: Even if you replace all the old files with new files, scripts that are executing at the time the replacement occurs will run some old and some new code. There are ways to reduce the amount of time that a server needs to be shut down, but a shutdown is the only way to avoid a momentary inconsistency. In many cases this inconsistency is benign, but it can also cause errors that are visible to the end user if the API in a library changes as part of the update.

Fortunately, clustered applications are designed to handle single-node failures gracefully. A load balancer or failover solution will automatically detect that a service is unavailable and direct requests to functioning nodes. This means that if it is properly configured, you can shut down a single Web server, upgrade its content, and reenable it without any visible downtime.
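One common way to arrange this is to give the load balancer a health-check URL that an administrator can fail on demand. The following is a hedged sketch, not a feature of any particular load balancer; it assumes the balancer polls this script and removes nodes that return a non-200 status, and the maintenance-flag path is hypothetical:

<?php
// healthcheck.php: polled by the load balancer. Touching the flag file
// drains this node before a code upgrade; removing the file restores
// the node to the rotation.
if (file_exists('/var/run/maintenance.flag')) {
    header('HTTP/1.0 503 Service Unavailable');
    echo "down for maintenance\n";
} else {
    header('HTTP/1.0 200 OK');
    echo "ok\n";
}
?>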


Making upgrades happen instantaneously across all machines in a cluster is more difficult. But fortunately, this is seldom necessary. Having two simultaneous requests by different users run old code for one user and new code for another is often not a problem, as long as the time taken to complete the whole update is short and individual pages all function correctly (whether with the old or new behavior).

If a completely atomic switch is required, one solution is to disable half of the Web servers for a given application. Your failover solution will then direct traffic to the remaining functional nodes. The downed nodes can then all be upgraded and their Web servers restarted while leaving the load-balancing rules pointing at those nodes still disabled. When they are all functional, you can flip the load-balancer rule set to point to the freshly upgraded servers and finish the upgrade.

This process is clearly painful and expensive. For it to be successful, half of the cluster needs to be able to handle full traffic, even if only for a short time. Thus, this method should be avoided unless it is an absolutely necessary business requirement.

Scaling Horizontally

Horizontal scalability is somewhat of a buzzword in the systems architecture community. Simply put, it means that the architecture can scale linearly in capacity: To handle twice the usage, twice the resources will have to be applied. On the surface, this seems as though it should be easy. After all, you built the application once; can’t you, in the worst-case scenario, build it again and double your capacity? Unfortunately, perfect horizontal scalability is almost never possible, for a couple of reasons:

- Many applications’ components do not scale linearly. Say that you have an application that tracks the interlinking of Web logs. The number of possible links between N entries is O(N²), so you might expect superlinear growth in the resources necessary to support this information.

- Scaling RDBMSs is hard. On one side, hardware costs scale superlinearly for multi-CPU systems. On the other, multimaster replication techniques for databases tend to introduce latency. We will look at replication techniques in much greater depth later in this chapter, in the section “Scaling Databases.”

The guiding principle in horizontally scalable services is to avoid specialization. Any server should be able to handle a number of different tasks. Think of it as a restaurant: If you hire a vegetable-cutting specialist, a meat-cutting specialist, and a pasta-cooking specialist, you are efficient only as long as your menu doesn’t change. If there is a rise in the demand for pasta, your vegetable and meat chefs will be underutilized, and you will need to hire another pasta chef to meet your needs. In contrast, you could hire general-purpose cooks who specialize in nothing. While they will not be as fast or as good as the specialists on any given meal, they can be easily repurposed as demand shifts, making them a more economical and efficient choice.


Specialized Clusters

Let’s return to the restaurant analogy. If bread is a staple part of your menu, it might make sense to bring in a baking staff to improve quality and efficiency.

Although these staff members cannot be repurposed to other tasks, if bread is consistently on the menu, having them on staff is a sound choice. In large applications, it sometimes also makes sense to use specialized clusters. Situations in which this is appropriate include the following:

- Services that benefit from specialized tools—A prime example of this is image serving. There are Web servers such as Tux and thttpd that are particularly well designed for serving static content. Serving images through a set of servers specifically tuned for that purpose is a common strategy.

- Conglomerations of acquired or third-party applications—Many environments are forced to run a number of separate applications because they have legacy applications with differing requirements. Perhaps one application requires mod_python or mod_perl. Often this is due to bad planning—for example, a developer choosing the company environment as a testbed for new ideas and languages. Other times, though, it is unavoidable—for example, if an application is acquired and is either proprietary or too expensive to reimplement in PHP.

- Segmenting database usage—As you will see later in this chapter, in the section “Scaling Databases,” if your application grows particularly large, it might make sense to break it into separate components that each serve distinct and independent portions of the application.

- Very large applications—Like the restaurant that opens its own bakery because of the popularity of its bread, if your application grows to a large enough size, it makes sense to divide it into more easily managed pieces. There is no magic formula for deciding when it makes sense to segment an application. Remember, though, that to withstand hardware failure, you need the application running on at least two machines. I never segment an application into parts that do not fully utilize at least two servers’ resources.

Caching in a Distributed Environment

Using caching techniques to increase performance is one of the central themes of this book. Caching, in one form or another, is the basis for almost all successful performance improvement techniques, but unfortunately, a number of the techniques we have developed, especially content caching and other interprocess caching techniques, break down when we move them straight to a clustered environment.

Consider a situation in which you have two machines, Server A and Server B, both of which are serving up cached personal pages. Requests come in for Joe Random’s personal page, and it is cached on Server A and Server B (see Figure 15.4).


Figure 15.4 Requests being cached across multiple machines.

Now Joe comes in and updates his personal page. His update request happens on Server A, so his page gets regenerated there (see Figure 15.5).

This is all that the caching mechanisms we have developed so far will provide. The cached copy of Joe’s page was poisoned on the machine where the update occurred (Server A), but Server B still has a stale copy and has no way to know that the copy is stale, as shown in Figure 15.6. So the data is inconsistent, and you have yet to develop a way to deal with it.


Figure 15.5 A single cache write leaving the cache inconsistent.

Cached session data suffers from a similar problem. Joe Random visits your online marketplace and places items in a shopping cart. If that cart is implemented by using the session extension on local files, then each time Joe hits a different server, he will get a completely different version of his cart, as shown in Figure 15.7.

Given that you do not want to have to tie a user’s session to a particular machine (for the reasons outlined previously), there are two basic approaches to tackle these problems:

- Use a centralized caching service (sketched after this list).

- Implement consistency controls over a decentralized service.
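As a taste of the first approach, here is a minimal sketch of a centralized cache shared by all Web servers, written against the PECL Memcache extension. The cache host, key conventions, and the generate/save helpers are hypothetical, and error handling is omitted:

<?php
// Every Web server talks to the same cache host, so poisoning the
// cache once invalidates Joe's page for the entire cluster.
$cache = new Memcache;
$cache->connect('cache.prod.example.com', 11211);

function get_personal_page($cache, $user) {
    $page = $cache->get("personal_page:$user");
    if ($page === false) {
        $page = generate_personal_page($user);              // hypothetical helper
        $cache->set("personal_page:$user", $page, 0, 300);  // cache for 5 minutes
    }
    return $page;
}

function update_personal_page($cache, $user, $content) {
    save_personal_page($user, $content);    // hypothetical helper
    $cache->delete("personal_page:$user");  // one delete is seen by every server
}
?>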
