
Beginning Python (2005)


Accessing Databases

cursor.execute(query, (employee,))

# fetchone returns a single row, so this loop walks that row's columns.
for row in cursor.fetchone():
    if row is not None:
        empid = row

# Now, delete the employee.
cursor.execute("delete from employee where empid=?", (empid,))
connection.commit()

cursor.close()
connection.close()

When you run this script, you need to pass the user name of the person to terminate. You should see no output unless the script raises an error:

$ python finduser.py bunny
bunny : Bunny Wailer managed by Eric Foster-Johnson in qa
$ python terminate.py bunny
$ python finduser.py bunny

How It Works

This script uses the same techniques as the updatemgr.py script by performing an initial query to get the employee ID for the given user name and then using this ID in a later SQL statement. With the final SQL statement, the script deletes the employee from the employee table.

Note that this script leaves the record in the user table. Question 3 of the exercises at the end of this chapter addresses this.

Working with Transactions and Committing the Results

Each connection, while it is engaged in an action, manages a transaction. With SQL, data is not modified unless you commit a transaction. The database then guarantees that it will perform all of the modifications in the transaction or none. Thus, you will not leave your database in an uncertain and potentially erroneous state.

To commit a transaction, call the commit method of a connection:

connection.commit()

Note that the transaction methods are part of the Connection class, not the Cursor class.

If something goes wrong, such as an exception that you can handle, call the rollback method to undo the effects of the incomplete transaction. This restores the database to the state it was in before you started the transaction:

connection.rollback()
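The commit and rollback calls usually sit on either side of an exception handler. The following is a minimal sketch of that pattern, not the chapter's own script: it assumes Python's standard sqlite3 module as a stand-in for Gadfly, and the table layout and row values are invented for illustration.

```python
import sqlite3

# Assumption: sqlite3 stands in for Gadfly; table and values are hypothetical.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table employee (empid integer primary key, lastname text)")
cursor.execute("insert into employee values (?, ?)", (1, "Wailer"))
cursor.execute("insert into employee values (?, ?)", (2, "Foster-Johnson"))
connection.commit()  # the two starting rows are now permanent

try:
    # Both statements belong to one transaction.
    cursor.execute("delete from employee where empid=?", (1,))
    # empid 2 already exists, so this statement raises IntegrityError.
    cursor.execute("insert into employee values (?, ?)", (2, "Duplicate"))
    connection.commit()
except sqlite3.IntegrityError:
    # Undo the delete as well: all of the transaction, or none of it.
    connection.rollback()

cursor.execute("select count(*) from employee")
remaining = cursor.fetchone()[0]  # both original rows are still present

cursor.close()
connection.close()
```

Because the delete and the failed insert share a transaction, the rollback restores the deleted row too.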

The capability to roll back a transaction is very important, as you can handle errors by ensuring that the database does not get changed. In addition, rollbacks are very useful for testing. You can insert, modify, and delete a number of rows as part of a unit test and then roll back the transaction to undo the effects of all the changes. This enables your unit tests to run without making any permanent changes to the database. It also enables your unit tests to be run repeatedly, because each run resets the data.

See Chapter 12 for more on testing.
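As a rough illustration of rollbacks in tests, the sketch below rolls back in tearDown so that every test starts from the same committed data. It assumes the standard unittest and sqlite3 modules rather than Gadfly, and the make_database helper, the table, and the test names are hypothetical.

```python
import sqlite3
import unittest


def make_database():
    """Build a throwaway database with one committed row (hypothetical data)."""
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("create table employee (empid integer, lastname text)")
    cursor.execute("insert into employee values (?, ?)", (1, "Wailer"))
    connection.commit()
    cursor.close()
    return connection


class DeleteEmployeeTest(unittest.TestCase):
    connection = make_database()  # shared by every test in this class

    def tearDown(self):
        # Discard whatever the test changed; committed data is untouched.
        self.connection.rollback()

    def count_rows(self):
        cursor = self.connection.cursor()
        cursor.execute("select count(*) from employee")
        count = cursor.fetchone()[0]
        cursor.close()
        return count

    def test_delete_removes_row(self):
        cursor = self.connection.cursor()
        cursor.execute("delete from employee where empid=?", (1,))
        cursor.close()
        self.assertEqual(self.count_rows(), 0)

    def test_row_is_present(self):
        # Passes whether or not the delete test ran first.
        self.assertEqual(self.count_rows(), 1)
```

Running the class with unittest.main() or any test runner should show both tests passing in either order, because no test ever commits.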

Examining Module Capabilities and Metadata

The DB API defines several globals that need to be defined at the module level. You can use these globals to determine information about the database module and the features it supports. The following table lists these globals.

Global        Holds

apilevel      Should hold '2.0' for the DB API 2.0, or '1.0' for the 1.0 API.

paramstyle    Defines how you can indicate the placeholders for dynamic data in your SQL statements. The values include the following:

              'qmark': Use question marks, as shown in the examples in this chapter.

              'numeric': Use a positional number style, with ':1', ':2', and so on.

              'named': Use a colon and a name for each parameter, such as :name.

              'format': Use the ANSI C sprintf format codes, such as %s for a string and %d for an integer.

              'pyformat': Use the Python extended format codes, such as %(name)s.

In addition, remember that pydoc is your friend. You can use pydoc to display information on modules, such as the database modules.
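The module-level globals listed in the table can be read like any other attribute. This short sketch assumes the standard sqlite3 module rather than Gadfly; the sample_queries dictionary and its SQL text are made up to show what each placeholder style looks like.

```python
import sqlite3

# Assumption: sqlite3 is the database module being inspected.
api_level = sqlite3.apilevel
param_style = sqlite3.paramstyle

# Hypothetical query text for each placeholder style named in the DB API.
sample_queries = {
    "qmark": "select * from employee where empid=?",
    "numeric": "select * from employee where empid=:1",
    "named": "select * from employee where empid=:empid",
    "format": "select * from employee where empid=%s",
    "pyformat": "select * from employee where empid=%(empid)s",
}

# Pick the form that this particular module expects.
query_for_this_module = sample_queries[param_style]
print(api_level, param_style, query_for_this_module)
```

Checking paramstyle this way is one option when a script has to run against more than one database module.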

With a Cursor object, you can check the description attribute to see information about the data returned. This information should be a sequence of seven-element sequences, one for each column of result data. These sequences include the following items:

(name, type_code, display_size, internal_size, precision, scale, null_ok)

None can be used for all but the first two items. The Gadfly database, though, does not fill in the type code, as shown in this example:

(('FIRSTNAME', None, None, None, None, None, None),
 ('LASTNAME', None, None, None, None, None, None),
 ('NAME', None, None, None, None, None, None))
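To see this metadata for yourself, run a query and read the cursor's description afterward. The sketch below assumes the standard sqlite3 module, which, like Gadfly, fills in only the column name; the employee table is hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table employee (firstname text, lastname text)")

# description is filled in after a query runs, even if no rows come back.
cursor.execute("select firstname, lastname from employee")
column_names = [column[0] for column in cursor.description]
entry_lengths = [len(column) for column in cursor.description]

for column in cursor.description:
    print(column)  # one seven-item sequence per result column

cursor.close()
connection.close()
```

Other database modules may fill in the type code and size fields, so treat anything beyond the name as optional.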

Handling Errors

Errors happen. With databases, errors happen a lot. The DB API defines a number of errors that must exist in each database module. The following table lists these exceptions.


Exception            Usage

Warning              Used for non-fatal issues. Must subclass StandardError.

Error                Base class for errors. Must subclass StandardError.

InterfaceError       Used for errors in the database module, not the database itself. Must subclass Error.

DatabaseError        Used for errors in the database. Must subclass Error.

DataError            Subclass of DatabaseError that refers to errors in the data.

OperationalError     Subclass of DatabaseError that refers to errors such as the loss of a connection to the database. These errors are generally outside of the control of the Python scripter.

IntegrityError       Subclass of DatabaseError for situations that would damage the relational integrity, such as uniqueness constraints or foreign keys.

InternalError        Subclass of DatabaseError that refers to errors internal to the database module, such as a cursor no longer being active.

ProgrammingError     Subclass of DatabaseError that refers to errors such as a bad table name and other things that can safely be blamed on you.

NotSupportedError    Subclass of DatabaseError that refers to trying to call unsupported functionality.

Your Python scripts should handle these errors. You can get more information about them by reading the DB API specification. See www.python.org/topics/database/ and http://www.python.org/peps/pep-0249.html for more information.
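One way to handle these exceptions is to catch the most specific class first and fall back to Error. The sketch below assumes the standard sqlite3 module in place of Gadfly; the run helper and the table are hypothetical. Which subclass a module raises for a given problem can vary, so the fallback clause matters.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table employee (empid integer primary key, lastname text)")
cursor.execute("insert into employee values (?, ?)", (1, "Wailer"))
connection.commit()


def run(sql, params=()):
    """Execute one statement; return 'ok' or the name of the exception raised."""
    try:
        cursor.execute(sql, params)
        connection.commit()
        return "ok"
    except sqlite3.IntegrityError as error:
        # Most specific class first: a constraint was violated.
        connection.rollback()
        return type(error).__name__
    except sqlite3.Error as error:
        # Base class of the DB API errors; catches everything else.
        connection.rollback()
        return type(error).__name__


duplicate_key = run("insert into employee values (?, ?)", (1, "Again"))
missing_table = run("select * from no_such_table")
valid_insert = run("insert into employee values (?, ?)", (2, "Foster-Johnson"))
print(duplicate_key, missing_table, valid_insert)
```

Rolling back in each handler keeps a failed statement from leaving a half-finished transaction open.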

Summary

Databases provide a handy means for storing data. You can write Python scripts that can access all the popular databases using add-on modules. This chapter provided a whirlwind tour of SQL, the Structured Query Language, and covered Python’s database APIs.

You also learned about the DBM modules that enable you to persist a dictionary using a variety of DBM libraries. These modules enable you to use dictionaries and transparently persist the data.

In addition, this chapter covered the Python database APIs, which define a standard set of methods and functions that you should expect from all database modules. This includes the following:

A Connection object encapsulates a connection to the database. Use the connect function on the database module to get a new Connection. The parameters you pass to the connect function may differ for each module.

A Cursor provides the main object for interacting with a database. Use the Connection object to get a Cursor. The Cursor enables you to execute SQL statements.


You can pass dynamic data as a tuple of values to the Cursor execute method. These values are filled into your SQL statements, enabling you to create reusable SQL statements.

After performing a query operation, the Cursor object holds the data. Use the fetchone or fetchall methods to extract the data.

After modifying the database, call commit on the Connection to commit the transaction and save the changes. Use the rollback method to undo the changes.

Call close on each Cursor when done. Call close on the Connection when done.

The DB APIs include a defined set of exceptions. Your Python scripts should check for these exceptions to handle the variety of problems that may arise.
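The points above can be pulled together in one short script. This is a sketch under stated assumptions, not one of the chapter's examples: it uses the standard sqlite3 module with an in-memory database, and the table and rows are invented.

```python
import sqlite3

# connect: the parameters differ for each database module.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

try:
    cursor.execute(
        "create table employee (empid integer, firstname text, lastname text)"
    )

    # Dynamic data is passed as a tuple, so one statement serves many rows.
    insert = "insert into employee values (?, ?, ?)"
    for row in [(1, "Bunny", "Wailer"), (2, "Eric", "Foster-Johnson")]:
        cursor.execute(insert, row)
    connection.commit()  # save the changes

    # After a query, the cursor holds the results.
    cursor.execute("select firstname, lastname from employee where empid=?", (1,))
    first_match = cursor.fetchone()

    cursor.execute("select lastname from employee order by lastname")
    all_lastnames = [row[0] for row in cursor.fetchall()]
except sqlite3.Error:
    connection.rollback()  # undo anything left uncommitted
    raise
finally:
    cursor.close()
    connection.close()

print(first_match, all_lastnames)
```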

Chapter 15 covers XML, HTML and XSL style sheets, technologies frequently used for web development.

Exercises

1. Suppose you need to write a Python script to store the pizza preferences for the workers in your department. You need to store each person’s name along with that person’s favorite pizza toppings. Which technologies are most appropriate to implement this script?

   a. Set up a relational database such as MySQL or Gadfly.

   b. Use a DBM module such as anydbm.

   c. Implement a web-service-backed rich Web application to create a buzzword-compliant application.

2. Rewrite the following example query using table name aliases:

   select employee.firstname, employee.lastname, department.name
   from employee, department
   where employee.dept = department.departmentid
   order by employee.lastname desc

3. The terminate.py script, shown previously, removes an employee row from the employee table; but this script is not complete. There remains a row in the user table for the same person. Modify the terminate.py script to delete both the employee and the user table rows for that user.


15

Using Python for XML

XML has exploded in popularity over the past few years as a medium for storing and transmitting structured data. Python supports the wealth of standards that have sprung up around XML, either through standard libraries or a number of third-party libraries.

This chapter explains how to use Python to create, manipulate, and validate XML. It also covers the standard libraries bundled with Python, as well as the popular PyXML library.

What Is XML?

The term XML is bandied about in corporate boardrooms and meetings around the world. Its flexibility and extensibility have encouraged people to think big, advocating XML for everything from a new, formatting-independent semantic code storage mechanism to a replacement for object serialization. But beyond the buzzwords and hype, what is it, really? Is it a panacea for the world’s woes? Probably not. But it is a powerful, flexible, open-standards-based method of data storage. Its vocabulary is infinitely customizable to fit whatever kind of data you want to store. Its format makes it human readable, while remaining easy for programs to parse. It encourages semantic markup, rather than formatting-based markup, separating content from presentation so that a single piece of data can be repurposed many times and displayed in many ways.

A Hierarchical Markup Language

At the core of XML is a simple hierarchical markup language. Tags are used to mark off sections of content with different semantic meanings, and attributes are used to add metadata about the content.


Following is an example of a simple XML document that could be used to describe a library:

<?xml version="1.0"?>
<library owner="John Q. Reader">
  <book>
    <title>Sandman Volume 1: Preludes and Nocturnes</title>
    <author>Neil Gaiman</author>
  </book>
  <book>
    <title>Good Omens</title>
    <author>Neil Gaiman</author>
    <author>Terry Pratchett</author>
  </book>
  <book>
    <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
    <author>Harlan Ellison</author>
  </book>
</library>

Notice that every piece of data is wrapped in a tag and that tags are nested in a hierarchy that contains further information about the data it wraps. Based on the previous document, you can surmise that <author> is a child piece of information for <book>, as is <title>, and that a library has an attribute called owner.
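The hierarchy described above is easy to walk from Python. The sketch below is one possible approach, assuming the standard xml.etree.ElementTree module (available in current Python releases) rather than the libraries this chapter goes on to cover; the LIBRARY_XML string is a shortened copy of the example document.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the library document from the text.
LIBRARY_XML = """<?xml version="1.0"?>
<library owner="John Q. Reader">
  <book>
    <title>Good Omens</title>
    <author>Neil Gaiman</author>
    <author>Terry Pratchett</author>
  </book>
  <book>
    <title>Sandman Volume 1: Preludes and Nocturnes</title>
    <author>Neil Gaiman</author>
  </book>
</library>
"""

root = ET.fromstring(LIBRARY_XML)  # the document root, <library>
owner = root.get("owner")          # attributes hold metadata

# Each <book> is a child of the root; <title> and <author> are its children.
authors_by_title = {
    book.findtext("title"): [author.text for author in book.findall("author")]
    for book in root.findall("book")
}

print(root.tag, owner)
for title, authors in authors_by_title.items():
    print(title, "by", ", ".join(authors))
```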

Unlike semantic markup languages like LaTeX, every piece of data in XML must be enclosed in tags. The top-level tag is known as the document root, which encloses everything in the document. An XML document can have only one document root.

Just before the document root is the XML declaration: <?xml version="1.0"?>. This declaration is optional in XML 1.0, though recommended, and it lets the processor know that this is an XML document. Nearly every document you encounter declares version 1.0 (XML 1.1 exists but is rarely used), so in practice the declaration can usually be ignored. If you need to handle other versions of XML, you may need to parse the declaration to handle the document correctly.

One problem with semantic markup is the possibility for confusion as data changes contexts. For instance, you might want to ship a list of book titles off to a database about authors. However, without a human to look at it, the database has no way of knowing that <title> means a book title, as opposed to an editor’s business title or an author’s honorific. This is where namespaces come in. A namespace is used to provide a frame of reference for tags and is given a unique ID in the form of a URL, plus a prefix to apply to tags from that namespace. For example, you might create a library namespace, with an identifier of http://server.domain.tld/NameSpaces/Library and with a prefix of lib: and use that to provide a frame of reference for the tags. With a namespace, the document would look like this:

<?xml version="1.0"?>
<lib:library owner="John Q. Reader" xmlns:lib="http://server.domain.tld/NameSpaces/Library">
  <lib:book>
    <lib:title>Sandman Volume 1: Preludes and Nocturnes</lib:title>
    <lib:author>Neil Gaiman</lib:author>
  </lib:book>
  <lib:book>
    <lib:title>Good Omens</lib:title>
    <lib:author>Neil Gaiman</lib:author>
    <lib:author>Terry Pratchett</lib:author>
  </lib:book>
  <lib:book>
    <lib:title>"Repent, Harlequin!" Said the Tick-Tock Man</lib:title>
    <lib:author>Harlan Ellison</lib:author>
  </lib:book>
</lib:library>

It’s now explicit that the title element comes from a set of elements defined by a library namespace, and can be treated accordingly.
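A namespace-aware parser identifies each element by the namespace URL, not by the prefix. The following sketch shows this with the standard xml.etree.ElementTree module, which is an assumption of this example and not a library the text has introduced; the NAMESPACES mapping and the shortened document are illustrative.

```python
import xml.etree.ElementTree as ET

LIBRARY_URL = "http://server.domain.tld/NameSpaces/Library"

# A shortened copy of the namespaced library document.
NAMESPACED_XML = """<?xml version="1.0"?>
<lib:library owner="John Q. Reader" xmlns:lib="%s">
  <lib:book>
    <lib:title>Good Omens</lib:title>
    <lib:author>Neil Gaiman</lib:author>
    <lib:author>Terry Pratchett</lib:author>
  </lib:book>
</lib:library>
""" % LIBRARY_URL

root = ET.fromstring(NAMESPACED_XML)

# The parser replaces the prefix with the namespace URL in braces.
root_tag = root.tag

# The prefix used in a search is local to this mapping; only the URL matters.
NAMESPACES = {"lib": LIBRARY_URL}
titles = [t.text for t in root.findall("lib:book/lib:title", NAMESPACES)]
authors = [a.text for a in root.findall("lib:book/lib:author", NAMESPACES)]

# The owner attribute has no prefix, so it is looked up by its plain name.
owner = root.get("owner")
print(root_tag, titles, authors, owner)
```

Because only the URL matters, the search would work just as well if the mapping used a prefix other than lib.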

A namespace declaration can be added to any node in a document, and that namespace will be available to every descendant node of that node. In most documents, all namespace declarations are applied to the root element of the document, even if the namespace isn’t used until deeper in the document. In this case, the namespace is applied to every tag in the document, so the namespace declaration must be on the root element.

A document can have and use multiple namespaces. For instance, the preceding example library might use one namespace for library information and a second one to add publisher information.

Notice the xmlns: prefix for the namespace declaration. The xml: and xmlns: prefixes are reserved for use by XML itself, and others, such as xsl:, are used by convention for its associated languages.

This is a fairly simple document. A more complex document might contain CDATA sections for storing unprocessed data, comments, and processing instructions for storing information specific to a single XML processor. For more thorough coverage of the subject, you may want to visit http://w3cschools.org or pick up Wrox Press’s Beginning XML, 3rd Edition (0764570773) by David Hunter et al.

A Family of Standards

XML is more than just a way to store hierarchical data. If that were all there were to it, XML would quickly fall to more lightweight data storage methods that already exist. XML’s big strength lies in its extensibility and in its companion standards: XSLT, XPath, the Schema and DTD languages, and a host of other standards for querying, linking, describing, displaying, and manipulating data. Schemas and DTDs provide a way of describing XML vocabularies and a way to validate documents. XSLT provides a powerful transformation engine to turn one XML vocabulary into another, or into HTML, plaintext, PDF, or a host of other formats. XPath is a query language for describing XML node sets. XSL-FO provides a way to create XML that describes the format and layout of a document for transformation to PDF or other visual formats.

Another good thing about XML is that most of the tools for working with XML are also written in XML, and can be manipulated using the same tools. XSLTs are written in XML, as are schemas. What this means in practical terms is that it’s easy to use an XSLT to write another XSLT or a schema or to validate XSLTs or schemas using schemas.


What Is a Schema/DTD?

Schemas and DTDs (Document Type Definitions) are both ways of implementing document models. A document model is a way of describing the vocabulary and structure of a document. It’s somewhat akin to what a DBA does when creating a database. You define the data elements that will be present in your document, what relationship they have to one another, and how many of them you expect. In plain English, a document model for the previous XML example might read as follows: “A library is a collection of books with a single owner. Each book has a title and at least one author.”

DTDs and schemas have different ways of expressing this document model, but they both describe the same basic formula for the document. There are subtle differences between the two, as you shall see later, but they have roughly the same capabilities.

What Are Document Models For?

Document models are used when you want to be able to validate content against a standard before manipulating or processing it. They are useful whenever you will be interchanging data with an application that may change data models unexpectedly, or when you want to constrain what a user can enter, as in an XML-based documentation system where you will be working with hand-created XML rather than with something from an application.

Do You Need One?

In some applications, a document model might not be needed. If you control both ends of the data exchange and can predict what elements you are going to be receiving, a document model would be redundant.

Document Type Definitions

A DTD is a Document Type Definition. DTDs were the original method of expressing a document model and are ubiquitous throughout the Internet. They were originally created for describing SGML, and the syntax has barely changed since that time, so DTDs have had quite a while to proliferate. The W3C (the World Wide Web Consortium, one of the groups that brings standards to the Internet) continues to express document types using DTDs, so there are DTDs for each of the HTML standards, for Scalable Vector Graphics (SVG), for MathML, and for many other useful XML vocabularies.

An Example DTD

If you were to translate the English description of the example library XML document into a DTD, it might look something like the following:

<?xml version="1.0"?>
<!ELEMENT library (book+)>
<!ATTLIST library
  owner CDATA #REQUIRED
>
<!ELEMENT book (title, author+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>

To add a reference to this DTD in the library file discussed before, you would insert a line at the top of the file, after the XML declaration, that reads <!DOCTYPE library SYSTEM "library.dtd">, where library.dtd is the path to the DTD on your system. The name after DOCTYPE must match the document's root element, which here is library.

Let’s break this down, one step at a time. The first line, <?xml version=”1.0”?>, tells you that this is going to be an XML document. Technically, this line is optional; DTDs don’t behave like other XML documents, but we’ll get to that later. The next line, <!ELEMENT library (book+)>, tells you that there is an element known as library, which can have one or more child elements of the book type. The syntax for element frequencies and grouping in DTDs is terse, but similar to that of regular expressions. The following table lists element frequency and element grouping operators in DTDs.

Operator    Definition

?           Specifies zero or one of the preceding element. For instance, editor? would mean that a book could have an optional editor element.

+           Specifies one or more of the preceding element. As in the previous example, author+ means that a book has one or more authors.

,           Specifies a sequence of elements that must occur in that order. (title, author+) means that the book must have a title, followed by one or more authors, in that order.

(list)      Groups elements together. An operator applied after parentheses applies to the group as a whole. For instance, (author, editor)+ would mean that a document could have one or more author and editor pairs, each an author followed by an editor.

|           Or operator. This operator permits a choice between alternatives. As an example, (author | editor) would permit a book to have an author or an editor, but not both.

*           Specifies that zero or more of the preceding element or group can appear. (book, CD)* would permit the library to have any number of book and CD pairs in it, or none at all; (book | CD)* would permit any mix of books and CDs.
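Because these operators behave so much like regular expressions, a content model can be imitated with Python's re module. The sketch below is only an illustration of that similarity, not a DTD validator: the CONTENT_MODELS table is translated by hand from the example DTD, the conforms function is hypothetical, and attributes and character data are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the example DTD. Each child tag name is followed by
# one space, so "(book )+" plays the role of (book+).
CONTENT_MODELS = {
    "library": r"(book )+",          # <!ELEMENT library (book+)>
    "book": r"title (author )+",     # <!ELEMENT book (title, author+)>
    "title": r"",                    # (#PCDATA): no child elements
    "author": r"",                   # (#PCDATA): no child elements
}


def conforms(element):
    """Return True if element and all its descendants fit CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # an element the models do not declare
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<library owner='x'><book><title>T</title>"
    "<author>A</author><author>B</author></book></library>"
)
wrong_order = ET.fromstring(
    "<library owner='x'><book><author>A</author><title>T</title></book></library>"
)
empty = ET.fromstring("<library owner='x'/>")

good_ok = conforms(good)
wrong_order_ok = conforms(wrong_order)
empty_ok = conforms(empty)
print(good_ok, wrong_order_ok, empty_ok)
```

The second document fails because the author comes before the title, and the third fails because book+ requires at least one book.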

 

 

The next bit is a little more complex:

<!ATTLIST library
  owner CDATA #REQUIRED
>

The first line specifies that the library element has a list of attributes. Notice that the attribute list is separate from the library element declaration itself and linked to it by the element name. If the element name changes, the attribute list must be updated to point to the new element name. Next is a list of attributes for the element. In this case, library has only one attribute, but the list can contain an unbounded number of attributes. The attribute declaration has three mandatory elements: an attribute name, an attribute type, and an attribute description. An attribute type can either be a data type, as specified by the DTD specification, or a list of allowed values. The attribute description is used to specify the behavior of the attribute. A default value can be described here, and whether the attribute is optional or required.

DTDs Aren’t Exactly XML

As a holdover from SGML, DTDs are technically not exactly XML. Unlike schemas, they are difficult to manipulate and validate using the same tools as XML. If you apply a document type declaration at the beginning of a DTD, your parser will either ignore it or, more likely, generate a syntax error. Although there is a specification for creating DTDs, there is no document model in the form of a DTD for validating the structure of a DTD. There are tools for validating DTDs, but they are distinct from the tools used to validate XML. On the other hand, there is a document model in the form of a schema against which schemas can be validated using standard XML tools.

Limitations of DTDs

DTDs have a number of limitations. Although it is possible to express complex structures in DTDs, they become very difficult to maintain. DTDs have difficulty cleanly expressing numeric bounds on a document model. If you wanted to specify that a library could contain no more than 100 books, you could write <!ELEMENT library (book, book, book, book etc etc)>, but that quickly becomes an unreadable morass of code. DTDs also make it hard to permit a number of elements in any order. If you have three elements that you could receive in any order, you have to write <!ELEMENT book ((author, ((title, publisher) | (publisher, title))) | (title, ((author, publisher) | (publisher, author))) | (publisher, ((author, title) | (title, author))))>, which is beginning to look more like LISP (which is a language with a lot of parentheses) than XML and is far more complicated than it really should be. Finally, DTDs don’t permit you to specify a pattern for data, so you can’t express constructs such as “A telephone number should be composed of digits, dashes, and plus signs.” Thankfully, the W3C has published a specification for a slightly more sophisticated language for describing documents, known as Schema.
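To get a feel for how quickly the any-order problem grows, you can list the orderings with a few lines of Python. This is a small illustration only: the any_order_model function is hypothetical, and it writes the orderings as a flat list of alternatives, whereas the declaration in the text factors out the shared first element.

```python
from itertools import permutations


def any_order_model(names):
    """Return (model_text, ordering_count) for elements allowed in any order."""
    orderings = ["(" + ", ".join(order) + ")" for order in permutations(names)]
    return "(" + " | ".join(orderings) + ")", len(orderings)


model, count = any_order_model(["author", "title", "publisher"])
print(count)
print(model)

# The number of orderings is the factorial of the number of elements.
_, count_for_five = any_order_model(["a", "b", "c", "d", "e"])
print(count_for_five)
```

Three elements already need six orderings, and five elements need 120, which is why this style of content model does not scale.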

Schemas

Schema was designed to address some of the limitations of DTDs and provide a more sophisticated XML-based language for describing document models. It enables you to cleanly specify numeric models for content, describe character data patterns using regular expressions, and express content models such as sequences, choices, and unrestricted models.
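As a rough idea of what a regular-expression pattern buys you, here is the telephone number rule from the DTD discussion written with Python's re module. This is only an analogy: Schema patterns use their own regular expression dialect, and the PHONE_PATTERN name and the sample values are made up.

```python
import re

# "Composed of digits, dashes, and plus signs", at least one character long.
PHONE_PATTERN = re.compile(r"[0-9+-]+")


def looks_like_phone_number(text):
    """Return True if every character is a digit, a dash, or a plus sign."""
    return PHONE_PATTERN.fullmatch(text) is not None


accepted = looks_like_phone_number("+1-555-0100")
rejected = looks_like_phone_number("555 0100 ext. 7")
print(accepted, rejected)
```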

An Example Schema

If you wanted to translate the hypothetical library model into a schema with the same information contained in the DTD, you would wind up with something like the following:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
