Apress.Pro.Drupal.7.Development.3rd.Edition.Dec.2010.pdf

CHAPTER 21

Writing Secure Code

It seems that almost daily we see headlines about this or that type of software having a security flaw. Keeping unwanted guests out of your web application and server should be a high priority for any serious developer.

There are many ways in which a user with harmful intent can attempt to compromise your Drupal site. Some of these include slipping code into your system and getting it to execute, manipulating data in your database, viewing materials to which the user should not have access, and sending unwanted e-mail through your Drupal installation. In this chapter, you’ll learn how to program defensively to ward off these kinds of attacks.

Fortunately, Drupal provides some tools that make it easy to eliminate the most common causes of security breaches.

Handling User Input

When users interact with Drupal, it is typically through a series of forms, such as the node submission form or the comment submission form. Users might also post remotely to a Drupal-based blog via XML-RPC using the blogapi module (http://drupal.org/project/blogapi). Drupal’s approach to user input can be summarized as “store the original; filter on output.” The database should always contain an accurate representation of what the user entered. As user input is being prepared to be incorporated into a web page, it is sanitized (i.e., potentially executable code is neutralized).

Security breaches can be caused when text entered by a user is not sanitized and is executed inside your program. This can happen when you don’t think about the full range of possibilities when you write your program. You might expect users to enter only standard characters, when in fact they could enter nonstandard strings or encoded characters, such as control characters. You might have seen URLs with the string %20 in them, for example, http://example.com/my%20document.html. This is a space character that has been encoded in compliance with the URL specification (see www.w3.org/Addressing/URL/url-spec.html). When someone saves a file named my document.html and it’s served by a web server, the space is encoded. The % denotes an encoded character, and the 20 shows that this is ASCII character 32 (20 is the hexadecimal representation of 32). Tricky use of encoded characters by nefarious users can be problematic, as you’ll see later in this chapter.
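The encoding described above can be checked with a few built-in PHP functions. This is a small sketch for illustration only; it uses the file name from the example and no Drupal code.

```php
<?php
// Percent-encode a file name the way it would appear in a URL path.
$encoded = rawurlencode('my document.html');
echo $encoded, "\n"; // my%20document.html

// %20 is hexadecimal 20, which is decimal 32, the ASCII space character.
echo hexdec('20'), "\n"; // 32
var_dump(chr(hexdec('20')) === ' '); // bool(true)

// Decoding restores the original name.
echo rawurldecode($encoded), "\n"; // my document.html
```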

Thinking About Data Types

When dealing with text in a system such as Drupal where user input is displayed as part of a web site, it’s helpful to think of the user input as a typed variable. If you’ve programmed in a strongly typed language such as Java, you’ll be familiar with typed variables. For example, an integer in Java is really an integer, and will not be treated as a string unless the programmer explicitly makes the conversion. In PHP (a weakly typed language), you’re usually fine treating an integer as a string or an integer, depending on the context, due to PHP’s automatic type conversion. But good PHP programmers think carefully about types and use automatic type conversion to their advantage.

In the same way, even though user input from, say, the Body field of a node submission form can be treated as text, it’s much better to think of it as a certain type of text. Is the user entering plain text? Or is the user entering HTML tags and expecting that they’ll be rendered? If so, could these tags include harmful tags, such as JavaScript that replaces your page with an advertisement for cell phone ringtones? A page that will be displayed to a user is in HTML format; user input is in a variety of “types” of textual formats and must be securely converted to HTML before being displayed. Thinking about user input in this way helps you to understand how Drupal’s text conversion functions work. Common types of textual input, along with functions to convert the text to another format, are shown in Table 21-1.
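To make the point about types concrete, here is a short sketch in plain PHP. The variable $input is a hypothetical stand-in for a value taken from a form; nothing here is Drupal-specific.

```php
<?php
// PHP converts between strings and integers automatically.
$sum = '5' + 3;   // the numeric string is converted to an integer
var_dump($sum);   // int(8)

// An explicit check makes the expected type part of the code.
// $input is a hypothetical stand-in for a submitted form value.
$input = '42';
$age = filter_var($input, FILTER_VALIDATE_INT);
var_dump($age);   // int(42)

// Anything that is not a clean integer is rejected rather than coerced.
var_dump(filter_var('42<script>', FILTER_VALIDATE_INT)); // bool(false)
```

The same habit carries over to text: decide which kind of text a value is supposed to be, then convert it deliberately.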

Table 21-1. Secure Conversions from One Text Type to Another

Source Format  Target Format  Drupal Function        What It Does
-------------  -------------  ---------------------  ------------------------------------------------
Plain text     HTML           check_plain()          Encodes special characters into HTML entities
                                                     and validates strings as UTF-8 to prevent
                                                     cross-site scripting attacks on Internet
                                                     Explorer 6
HTML text      HTML           filter_xss()           Removes characters and constructs that can
                                                     trick browsers. Makes sure that all HTML
                                                     entities are well formed. Makes sure that all
                                                     HTML tags and attributes are well formed, and
                                                     makes sure that no HTML tags contain URLs
                                                     with a disallowed protocol (e.g., javascript:)
Rich text      HTML           check_markup()         Runs text through all enabled filters
Plain text     URL            drupal_encode_path()   Encodes a Drupal path for use in a URL
URL            HTML           check_url()            Strips out harmful protocols, such as
                                                     javascript:
Plain text     MIME           mime_header_encode()   Encodes non-ASCII, UTF-8 encoded characters

Plain Text

Plain text is text that is supposed to contain only, well, plain text. For example, if you ask a user to type in his or her favorite color in a form, you expect the user to answer “green” or “purple,” without markup of any kind. Including this input in another web page without checking to make sure that it really does contain only plain text is a gaping security hole. For example, the user might enter the following instead of entering a color:

<img src="javascript:window.location ='<a href="http://evil.example.com/133/index.php?s=11&">http://evil.example.com/133/index.php?s=11&</a>;ce_cid=38181161'">

Drupal provides check_plain() to neutralize such input: it encodes the characters that have special meaning in HTML as HTML entities. The text that is returned from check_plain() will have no HTML tags of any kind, as they’ve all been converted to entities. If a user enters the evil JavaScript in the preceding code, the check_plain() function will turn it into the following text, which will be harmless when rendered in HTML:

&lt;img src=&quot;javascript:window.location =&#039;&lt;a href=&quot;http://evil.example.com/133/index.php?s=11&amp;&quot;&gt;http://evil.example.com/133/index.php?s=11&amp;&lt;/a&gt;;ce_cid=38181161&#039;&quot;&gt;
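You can try this conversion outside Drupal. The following is a minimal sketch, assuming that check_plain() behaves like PHP’s htmlspecialchars() called with ENT_QUOTES and UTF-8; the function name sketch_check_plain() is hypothetical and is not part of Drupal.

```php
<?php
// Hypothetical stand-in for check_plain(), for use outside Drupal.
function sketch_check_plain($text) {
  // ENT_QUOTES encodes both double and single quotes.
  return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

$color = '<script>alert("xss")</script>';
echo sketch_check_plain($color), "\n";
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

// Text without special characters passes through unchanged.
echo sketch_check_plain('green'), "\n"; // green
```

Inside a Drupal module you would call check_plain() itself rather than a stand-in like this one.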

HTML Text

HTML text can contain HTML markup. However, you can never blindly trust that the user has entered only “safe” HTML; generally you want to restrict users to using a subset of the available HTML tags. For example, the <script> tag is not one that you generally want to allow because it permits users to run scripts of their choice on your site. Likewise, you don’t want users using the <form> tag to set up forms on your site.

Rich Text

Rich text is text that contains more information than plain text but is not necessarily in HTML. It may contain wiki markup, or Bulletin Board Code (BBCode), or some other markup language. Such text must be run through a filter to convert the markup to HTML before display.

Note: For more information on filters, see Chapter 12.
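As an illustration of converting rich text to HTML, here is a minimal sketch of a filter that understands only the [b] and [i] tags of BBCode. The function sketch_bbcode_to_html() is hypothetical and far simpler than a real Drupal filter; it only shows the order of operations, which is to escape first and convert afterward.

```php
<?php
// Hypothetical filter: converts a tiny subset of BBCode to HTML.
function sketch_bbcode_to_html($text) {
  // Escape first so that any raw HTML in the input is neutralized.
  $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  // Then turn the markup we recognize into HTML tags.
  $html = preg_replace('!\[b\](.*?)\[/b\]!s', '<strong>$1</strong>', $html);
  $html = preg_replace('!\[i\](.*?)\[/i\]!s', '<em>$1</em>', $html);
  return $html;
}

echo sketch_bbcode_to_html('[b]Hello[/b] <script>alert(1)</script>'), "\n";
// <strong>Hello</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```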

URL

A URL in this sense is one that has been built from user input or from another untrusted source. You might have expected the user to enter http://example.com, but the user entered javascript:runevilJS() instead. Before displaying the URL in an HTML page, you must run it through check_url() to make sure it is well formed and does not contain attacks.
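The idea behind this check can be sketched in plain PHP: strip any leading protocol that is not on an allow-list, then escape the result for HTML. The function sketch_check_url() and its list of allowed protocols are assumptions made for illustration; this is not Drupal’s own code, and in a module you would call check_url() instead.

```php
<?php
// Hypothetical stand-in for check_url(); not Drupal's own implementation.
function sketch_check_url($uri, array $allowed = array('http', 'https', 'ftp', 'mailto')) {
  // Repeatedly strip any leading protocol that is not on the allow-list.
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon > 0) {
      $protocol = substr($uri, 0, $colon);
      // A colon that follows /, ? or # belongs to the path or query, not to a protocol.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      if (!in_array(strtolower($protocol), $allowed, true)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);
  // Escape what is left so it is safe inside an HTML attribute.
  return htmlspecialchars($uri, ENT_QUOTES, 'UTF-8');
}

echo sketch_check_url('http://example.com'), "\n";     // http://example.com
echo sketch_check_url('javascript:runevilJS()'), "\n"; // runevilJS()
```

The loop matters: without it, an input such as javascript:javascript:alert(1) would still begin with a harmful protocol after a single pass.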

Using check_plain() and t() to Sanitize Output

Use check_plain() any time you have text that you don’t trust and in which you do not want any markup.

Here is a naïve way of using user input, assuming the user has just entered a favorite color in a text field. The following code is insecure:

drupal_set_message("Your favorite color is $color!"); // No input checking!

The following is secure but bad coding practice:

drupal_set_message('Your favorite color is ' . check_plain($color));

This is bad code because the text string is not passed through the t() function, which should always be used for text strings that are shown to users. If you write code like the preceding, be prepared for complaints from angry translators, who will be unable to translate your phrase because it doesn’t pass through t().

You cannot just place variables inside double quotes and give them to t(). The following code is still insecure because no placeholder is being used:

drupal_set_message(t("Your favorite color is $color!")); // No input checking!

The t() function provides a built-in way of making your strings secure by using a placeholder token with a one-character prefix. The following is secure and in good form:

drupal_set_message(t('Your favorite color is @color', array('@color' => $color)));

Note that the key in the array (@color) is the same as the replacement token in the string. This results in a message like the following:

Your favorite color is brown.

The @ prefix tells t() to run the value that is replacing the token through check_plain().

Note: When running a translation of Drupal, the value that replaces the token is run through check_plain(), but the translated string itself is not, so you need to trust your translators.

In this case, we probably want to emphasize the user’s choice of color by changing the style of the color value. This is done using the % prefix, which means “run drupal_placeholder($value) on the value.” This passes the value through check_plain() indirectly, as shown in Figure 21-1. The % prefix is the most commonly used prefix.

The following is secure and good form:

drupal_set_message(t('Your favorite color is %color', array('%color' => $color)));

This results in a message like the following. In addition to escaping the value, drupal_placeholder() has wrapped the value in <em class="placeholder"></em> tags.

Your favorite color is brown.

If you have text that has been previously sanitized, you can disable checks in t() by using the ! prefix. For example, the l() function builds a link, and for convenience, it runs the text of the link through check_plain() while building the link. So in the following example, the ! prefix can be safely used:

// The l() function runs text through check_plain() and returns sanitized text,
// so there is no need for us to call check_plain($link) or to have t() do it for us.
$link = l($user_supplied_text, $path);
drupal_set_message(t('Go to the website !website', array('!website' => $link)));

Note: The l() function passes the text of the link through check_plain() unless you have indicated to l() that the text is already in HTML format by setting html to TRUE in the options parameter. See http://api.drupal.org/api/function/l/7.

The effect of the @, %, and ! placeholders on string replacement in t() is shown in Figure 21-1. Although for simplicity’s sake it isn’t shown in the figure, remember that you may use multiple placeholders by defining them in the string and adding members to the array, for example:

drupal_set_message(t('Your favorite color is %color and you like %food', array('%color' => $color, '%food' => $food)));

Be especially cautious with the use of the ! prefix, since that means the string will not be run through check_plain().
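To see the three prefixes side by side, here is a minimal sketch of the placeholder handling in plain PHP. The function sketch_format() is a hypothetical stand-in, not Drupal’s t(): it performs no translation, it assumes that escaping is equivalent to htmlspecialchars() with ENT_QUOTES, and the class attribute on the <em> tag is likewise an assumption.

```php
<?php
// Hypothetical stand-in for the placeholder handling in t(); no translation is done.
function sketch_format($string, array $args = array()) {
  foreach ($args as $key => $value) {
    $escaped = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    switch ($key[0]) {
      case '!':
        // Inserted as is; the caller must already have sanitized the value.
        break;
      case '@':
        // Escaped only.
        $args[$key] = $escaped;
        break;
      default:
        // The % prefix (and, in this sketch, anything else): escaped and emphasized.
        $args[$key] = '<em class="placeholder">' . $escaped . '</em>';
        break;
    }
  }
  // Replace each token in the string with its prepared value.
  return strtr($string, $args);
}

$color = '<b>brown</b>';
echo sketch_format('Your favorite color is @color', array('@color' => $color)), "\n";
// Your favorite color is &lt;b&gt;brown&lt;/b&gt;
echo sketch_format('Your favorite color is %color', array('%color' => $color)), "\n";
// Your favorite color is <em class="placeholder">&lt;b&gt;brown&lt;/b&gt;</em>
echo sketch_format('Your favorite color is !color', array('!color' => $color)), "\n";
// Your favorite color is <b>brown</b>
```

Only the last call lets the markup through untouched, which is why the ! prefix is safe only for values that have already been sanitized.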
