Apress.Pro.Drupal.7.Development.3rd.Edition.Dec.2010.pdf

CHAPTER 23 OPTIMIZING DRUPAL

if your server or file system mounts go down, the site is affected unless you also create a cluster of file servers.

If there are many large media files to be served, it may be best to serve these from a separate server using a lightweight web server, such as Nginx, to avoid having a lot of long-running processes on your web servers contending with requests handled by Drupal. An easy way to do this is to use a rewrite rule on your web server to redirect all incoming requests for a certain file type to the static server. Here’s an example rewrite rule for Apache that rewrites all requests for JPEG files:

RewriteCond %{REQUEST_URI} ^/(.*\.jpg)$ [NC]
RewriteRule .* http://static.example.com/%1 [R]

The disadvantage of this approach is that the web servers are still performing the extra work of redirecting traffic to the file server. An improved solution is to rewrite all file URLs within Drupal, so the web servers are no longer involved in static file requests.
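In Drupal 7, one way to implement that rewriting is hook_file_url_alter(). The sketch below is a minimal illustration rather than a drop-in solution; the module name mymodule and the host static.example.com are assumptions:

```php
/**
 * Implements hook_file_url_alter().
 *
 * Points public file URLs at a dedicated static file server so the
 * Drupal web servers never see requests for static assets at all.
 */
function mymodule_file_url_alter(&$uri) {
  // Only rewrite public files; private files still need Drupal's
  // access checking, so they must keep going through Drupal.
  if (file_uri_scheme($uri) == 'public') {
    $wrapper = file_stream_wrapper_get_instance_by_scheme('public');
    $path = $wrapper->getDirectoryPath() . '/' . file_uri_target($uri);
    // static.example.com is an illustrative host name.
    $uri = 'http://static.example.com/' . $path;
  }
}
```

With this hook in place, every URL Drupal generates for a public file points directly at the static server, so no redirect round-trip is needed.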

Beyond a Single File System

If the amount of storage is going to exceed a single file system, chances are you’ll be doing some custom coding to implement storage abstraction. One option would be to use an outsourced storage system like Amazon’s S3 service.

Multiple Database Servers

Multiple database servers introduce additional complexity, because the data being inserted and updated must be replicated or partitioned across servers.

Database Replication

In MySQL database replication, a single master database receives all writes. These writes are then replicated to one or more slaves. Reads can be done on any master or slave. Slaves can also be masters in a multitiered architecture.
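As a sketch, a minimal master/slave setup in my.cnf looks roughly like this (server IDs and log names are illustrative; the slave must additionally be pointed at the master with CHANGE MASTER TO and started with START SLAVE):

```
# Master: enable the binary log that slaves replicate from.
[mysqld]
server-id = 1
log-bin   = mysql-bin

# Slave: any unique server ID; applies events from the master's binary log.
[mysqld]
server-id = 2
```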

Database Partitioning

Since Drupal can handle multiple database connections, another strategy for scaling your database architecture is to put some tables in one database on one machine, and other tables in a different database on another machine. For example, moving all cache tables to a separate database on a separate machine and aliasing all queries on these tables using Drupal’s table prefixing mechanism can help your site scale.
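As a sketch, Drupal 7's $databases array in settings.php accepts per-table prefixes, which can alias cache tables into a second schema. The schema name cachedb and the credentials are assumptions, and this particular trick only works when both schemas are reachable over the same database connection:

```php
// settings.php: alias cache tables into a separate schema so their
// load can be managed independently of the main Drupal database.
$databases['default']['default'] = array(
  'driver'   => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'secret',
  'host'     => 'localhost',
  'prefix'   => array(
    'default'    => '',
    // Queries against these tables become cachedb.cache, and so on.
    'cache'      => 'cachedb.',
    'cache_page' => 'cachedb.',
  ),
);
```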

Finding the Bottleneck

If your Drupal site is not performing as well as expected, the first step is to analyze where the problem lies. Possibilities include the web server, the operating system, the database, file system, and the network.


Knowing how to evaluate the performance and scalability of a system allows you to quickly isolate and respond to system bottlenecks with confidence, even amid a crisis. You can discover where bottlenecks lie with a few simple tools and by asking questions along the way. Here’s one way to approach a badly performing server. We begin with the knowledge that performance is going to be bound by one of the following variables: CPU, RAM, I/O, or bandwidth. So begin by asking yourself the following questions:

Is the CPU maxed out? If examining CPU usage with top on Unix or the Task Manager on Windows shows CPU(s) at 100 percent, your mission is to find out what’s causing all that processing. Looking at the process list will let you know whether it’s the web server or the database eating up processor cycles. Both of these problems are solvable.

Is the server paging excessively? If the server lacks enough physical memory to handle the allocated task, the operating system will use virtual memory (disk) to handle the load. Reading and writing from disk is significantly slower than reading and writing to physical memory. If your server is paging excessively, you’ll need to figure out why.

Are the disks maxed out? If examining the disk subsystem with a tool like vmstat on Unix or the Performance Monitor on Windows shows that disk activity cannot keep up with the demands of the system while plenty of free RAM remains, you’ve got an I/O problem. Possibilities include excessively verbose logging, an improperly configured database that is creating many temporary tables on disk, background script execution, improper use of a RAID level for a write-heavy application, and so on.

Is the network link saturated? If the network pipe is filled up, there are only two solutions. One is to get a bigger pipe. The other is to send less information while making sure the information that is being sent is properly compressed.
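On a Linux server, the four questions above map to a handful of standard commands; this is a sketch, and exact tool names and flags vary by distribution:

```shell
top -b -n 1 | head -n 20   # CPU: which processes are eating cycles?
vmstat 5 5                 # Paging: watch the si/so (swap-in/out) columns
free -m                    # RAM: physical memory and swap actually in use
iostat -x 5 2              # Disk: %util near 100 means the disks are maxed out
ip -s link                 # Network: per-interface byte and packet counters
```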

Tip Investigating your page serving performance from outside your server is also useful. A tool like YSlow (http://developer.yahoo.com/yslow/help/) can be helpful when pinpointing why your pages are not downloading as quickly as you’d like when you haven’t yet hit a wall with CPU, RAM, or I/O. A helpful article on YSlow and Drupal can be found at http://wimleers.com/article/improving-drupals-page-loading-performance.

Web Server Running Out of CPU

If your CPU is maxed out and the process list shows that the resources are being consumed by the web server and not the database (which is covered later), you should look into reducing the web server overhead incurred to serve a request. Often the execution of PHP code is the culprit. See the description of PHP optimizations earlier in the chapter.

Often custom code and modules that have performed reasonably well for small-scale sites can become a bottleneck when moved into production. CPU-intensive code loops, memory-hungry algorithms, and large database retrievals can be identified by profiling your code to determine where PHP is spending most of its time and thus where you ought to spend most of your time debugging.


If, even after adding an opcode cache and optimizing your code, your web server cannot handle the load, it is time to get a beefier box with more or faster CPUs or to move to a different architecture with multiple web server front ends.

Web Server Running Out of RAM

The RAM footprint of the web server process serving the request includes all of the modules loaded by the web server (such as Apache’s mod_mime, mod_rewrite, etc.) as well as the memory used by the PHP interpreter. The more web server and Drupal modules that are enabled, the more RAM used per request.

Because RAM is a finite resource, you should determine how much is being used on each request and how many requests your web server is configured to handle. To see how much real RAM is being used on average by each request, use a program like top (on Linux) or the Task Manager (on Windows) to inspect the process list. In Apache, the maximum number of simultaneous requests that will be served is set using the MaxClients directive.

A common mistake is thinking that the solution to a saturated web server is to increase the value of MaxClients. That only makes the problem worse: the server accepts more simultaneous requests, exhausts its RAM, and starts swapping to disk, at which point it becomes unresponsive. Let’s assume, for example, that your web server has 2GB of RAM and each Apache process uses roughly 20MB (check the actual value with top or the Task Manager). You can calculate a good value for MaxClients by using the following formula; keep in mind that you will need to reserve memory for your operating system and other processes:

2GB RAM / 20MB per process = 100 MaxClients

If your server consistently runs out of RAM even after disabling unneeded web server modules and profiling any custom modules or code, your next step is to make sure the database and the operating system are not the causes of the bottleneck. If they are, then add more RAM. If the database and operating system are not causing the bottlenecks, you simply have more requests than you can serve; the solution is to add more web server boxes.

Tip Since memory usage of Apache processes tends to increase to the level of the most memory-hungry page served by that child process, memory can be regained by setting the MaxRequestsPerChild value to a low number, such as 300 (the actual number will depend on your situation). Apache will work a little harder to generate new children, but the new children will use less RAM than the older ones they replace, so you can serve more requests in less RAM. The default setting for MaxRequestsPerChild is 0, meaning the processes will never expire.
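Put together, the relevant prefork settings in httpd.conf might look like this sketch; the values are illustrative and must be tuned to your own RAM and workload:

```
# Apache prefork MPM settings for a box with roughly 2GB of RAM for Apache.
<IfModule mpm_prefork_module>
    StartServers           8
    MinSpareServers        5
    MaxSpareServers       20
    MaxClients           100
    MaxRequestsPerChild  300
</IfModule>
```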

Identifying Expensive Database Queries

If you need to get a sense of what is happening when a given page is generated, devel.module is invaluable. It has an option to display all the queries that are required to generate the page along with the execution time of each query.

Another way to find out which queries are taking too long is to enable slow query logging in MySQL. This is done in the MySQL option file (my.cnf) as follows:


# The MySQL server
[mysqld]
log-slow-queries

This will log all queries that take longer than ten seconds to a log file at example.com-slow.log in MySQL’s data directory. You can change the number of seconds and the log location as shown in this code, where we set the slow query threshold to five seconds and the file name to example-slow.log:

# The MySQL server
[mysqld]
long_query_time = 5
log-slow-queries = /var/log/mysql/example-slow.log
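Once entries accumulate, the mysqldumpslow utility that ships with MySQL can summarize the log; for example, to show the ten slowest query patterns sorted by query time:

```shell
mysqldumpslow -s t -t 10 /var/log/mysql/example-slow.log
```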

Identifying Expensive Pages

To find out which pages are the most resource intensive, enable the statistics module that is included with Drupal. Although the statistics module increases the load on your server (since it records access statistics for your site into your database), it can be useful to see which pages are the most frequently viewed and thus the most ripe for query optimization. It also tracks total page generation time over a period, which you can specify in Configuration -> Statistics. This is useful for identifying out-of-control web crawlers that are eating up system resources, which you can then ban on the spot by visiting Reports -> Top visitors and clicking “ban.” Be careful, though—it’s just as easy to ban a good crawler that drives traffic to your site as a bad one. Make sure you investigate the origin of the crawler before banning it.

Identifying Expensive Code

Consider the following resource-hogging code:

// Very expensive, silly way to get node titles. First we get the node IDs
// of all published nodes.
$query = db_select('node', 'n');
$query->fields('n', array('nid'));
$query->condition("n.status", 1);
$query->addTag('node_access');
$result = $query->execute();

// Now we do a node_load() on each individual node and save the title.
foreach ($result as $row) {
  $node = node_load($row->nid);
  $titles[] = check_plain($node->title);
}

Fully loading a node is an expensive operation: hooks run, modules perform database queries to add or modify the node, and memory is used to cache the node in node_load()’s internal cache. If you are not depending on modification to the node by a module, it’s much faster to do your own query of the node table directly. Certainly this is a contrived example, but the same pattern can often be found, that
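A sketch of the direct-query alternative: fetch the titles in the same query instead of loading each node, keeping the node_access tag so node-access modules still apply:

```php
// Much cheaper: one query, no node_load() per row.
$query = db_select('node', 'n');
$query->fields('n', array('nid', 'title'));
$query->condition('n.status', 1);
$query->addTag('node_access');
$result = $query->execute();

$titles = array();
foreach ($result as $row) {
  $titles[] = check_plain($row->title);
}
```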
