Beginning Python (2005)


Numerical Programming

Generally, you can use an array object just as you would an ordinary list. You can insert, append, or delete elements, and the indexing syntax is the same. (Note that in versions of Python earlier than 2.4, an array object is somewhat more limited than a list object.) For example:

>>> a.append(15)
>>> a.extend([20, 17, 0])
>>> print a
array('l', [15, 20, 17, 0])
>>> a[1] = 42
>>> print a
array('l', [15, 42, 17, 0])
>>> del a[2]
>>> print a
array('l', [15, 42, 0])

You can also convert a list or tuple to an array object by passing it to the constructor:

>>> t = (5.6, 3.2, -1.0, 0.7)
>>> a = array.array("d", t)
>>> print a
array('d', [5.5999999999999996, 3.2000000000000002, -1.0, 0.69999999999999996])

Here again you see the approximate nature of floating-point values.
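The digits in that output can be reproduced directly. The following is a minimal sketch written for Python 3 (an assumption; the sessions in this chapter use the Python 2 print statement). It formats each value with 17 significant digits, which is how older interpreters displayed floats, and then checks which values are stored exactly.

```python
from decimal import Decimal

values = (5.6, 3.2, -1.0, 0.7)

# 17 significant digits are enough to tell any two distinct 64-bit
# floats apart, so this shows the value that is really stored.
seventeen_digits = [format(v, ".17g") for v in values]
print(seventeen_digits)

# Decimal(float) converts the stored binary value exactly, while
# Decimal(str(float)) uses the short decimal text. They agree only
# when the number has an exact binary representation.
is_exact = [Decimal(v) == Decimal(str(v)) for v in values]
print(is_exact)
```

Only -1.0 is stored exactly; the other three values are the nearest binary fractions to the decimal numbers that were typed.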

In fact, because an array object behaves very much like a list, you can pass it to the same stddev function you wrote previously, and it works just fine:

>>> print stddev(a)
2.50137462208

If you ever need to convert back to an ordinary tuple or list, just pass the array to the tuple or list constructor:

>>> back_again = tuple(a)
>>> print back_again
(5.5999999999999996, 3.2000000000000002, -1.0, 0.69999999999999996)

Compared to lists, array objects have the following advantages and disadvantages:

All elements of an array are the same type.

Like a list, an array is one-dimensional.

The array module is part of Python’s standard library (but don’t forget to import it).

An array object cannot automatically be pickled.

An array object stores its values much more efficiently than a list of numbers does. However, computations on the numbers are generally not much faster, as computations are performed using Python’s normal number objects.
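The storage claim can be checked with a short experiment. This is a rough sketch for Python 3 (an assumption, as the chapter's sessions use Python 2), and the names numbers, packed, and list_bytes are chosen here for illustration. Exact byte counts vary by platform and interpreter version, so treat the figures as indicative only.

```python
import sys
from array import array

numbers = [float(i) for i in range(1000)]
packed = array("d", numbers)

# Each element of a 'd' array is a raw C double.
bytes_per_element = packed.itemsize
payload_bytes = len(packed) * packed.itemsize

# A list stores one pointer per element, and every float it refers to
# is a separate Python object with its own overhead.
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(x) for x in numbers)

print(bytes_per_element, payload_bytes, list_bytes)

# Arithmetic still goes through ordinary float objects: indexing the
# array creates a Python float on the fly.
element_type = type(packed[0])
print(element_type)
```

The last two lines illustrate why computations are not much faster: each element is turned back into a normal float object before any arithmetic happens.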


TEAM LinG

Chapter 19

The numarray Package

This chapter ends with a brief look at one more array package, numarray. The numarray package is much more sophisticated than the array module, and supports multidimensional arrays and operations on entire arrays. If you are familiar with an array manipulation package such as Matlab, you will recognize many of the features of numarray.

Unfortunately, numarray is not part of Python’s standard library, so you must install it yourself. Fortunately, it is also free software, and easy to download and install. If you work with arrays of numbers and would like to use Python for this, it’s definitely worth the trouble of installing numarray because of the rich set of features it provides.

The web page for numarray is at www.stsci.edu/resources/software_hardware/numarray.

You can browse or download the documentation, which is quite extensive. The Download link takes you to the package’s SourceForge page, from which you can download the full source code, including installation instructions. If you are using Windows, your life is easier: You can download an installer from the same place. If you are using GNU/Linux, you can download RPM packages from www.python.org/pyvault/.

After you have installed numarray correctly, you should be able to import it:

>>> import numarray

To explain all the features of numarray would fill an entire book by itself. Instead, here is a brief tour, which demonstrates how to write the stddev function using numarray. It is hoped that this will whet your appetite to learn more about the package on your own.

Using Arrays

The array type in numarray is also called array. You can convert a tuple or list to an array object by passing it to the constructor:

>>> a = numarray.array([5.6, 3.2, -1.0, 0.7])
>>> print a
[ 5.6 3.2 -1. 0.7]

Notice that when it printed the array, numarray omitted the commas between elements, which produces output similar to the notation used in mathematics. The elements of an array object must all be of the same type, just as in the array module, but numarray guesses the type from the elements you give it. You can ask to see what type it chose with the type method:

>>> print a.type()
Float64

The “Float64” type stores a 64-bit floating-point value, suitable for Python’s float objects.

A key feature of numarray is that you can perform operations on entire arrays. For example, to double all of the values in an array, just multiply by two, like so:

>>> print 2 * a

[ 11.2 6.4 -2. 1.4]


You can perform operations on two arrays, too. The operation is performed elementwise — that is, on pairs of corresponding elements from the two arrays:

>>> b = numarray.array([0.5, 0.0, -1.0, 2.0])
>>> print 2 * a + b
[ 11.7 6.4 -3. 3.4]
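To make "elementwise" concrete, here is what the same computation looks like on ordinary lists. This is a plain-Python sketch (Python 3 assumed), not numarray code; the helper names scale and add are invented for illustration.

```python
a = [5.6, 3.2, -1.0, 0.7]
b = [0.5, 0.0, -1.0, 2.0]


def scale(factor, values):
    """Multiply every element by the same factor."""
    return [factor * v for v in values]


def add(xs, ys):
    """Add corresponding elements of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return [x + y for x, y in zip(xs, ys)]


# Equivalent of the array expression 2 * a + b
result = add(scale(2, a), b)
print(result)
```

The array expression does the same pairing of elements, but without the explicit loops showing up in your code.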

You can also create multidimensional arrays by passing to the constructor lists of lists, lists of lists of lists, and so on. A two-dimensional, three-by-three array looks like this:

>>> m = numarray.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> print m
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Observe how each of the inner lists became a row in the array. Here, because all of the numbers in the array are integers, numarray uses an integer type:

>>> print m.type()
Int32

You can index a one-dimensional array just as you would a list. To index a higher-dimensional array, separate the indices by commas:

>>> print m[0,1]
2
>>> m[1,2] = 12
>>> print m
[[ 1  2  3]
 [ 4  5 12]
 [ 7  8  9]]

The shape of an array is a tuple containing the number of elements along each dimension. A one-dimensional array has a shape of length one, a two-dimensional array has a shape of length two, and so on. Use the shape attribute to obtain this:

>>> print a.shape
(4,)
>>> print m.shape
(3, 3)
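For comparison, the same indexing and shape ideas can be mimicked with nested lists. The sketch below is plain Python 3 and assumes the nested list is rectangular (every row the same length); shape_of is a hypothetical helper written for this illustration, not part of any package.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    level = nested
    while isinstance(level, list):
        dims.append(len(level))
        if not level:
            break
        level = level[0]  # descend into the first element only
    return tuple(dims)


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Nested lists need one pair of brackets per index: m[1][2], not m[1,2].
m[1][2] = 12
print(m[0][1])

vector_shape = shape_of([5.6, 3.2, -1.0, 0.7])
matrix_shape = shape_of(m)
print(vector_shape, matrix_shape)
```

An array object keeps its shape as an attribute, so it never has to walk the data to find it as this helper does.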

Computing the Standard Deviation

Now write stddev using numarray:

from math import sqrt
import numarray

def stddev2(numbers):
    n, = numbers.shape
    sum = numarray.sum(numbers)
    sum_of_squares = numarray.sum(numbers * numbers)
    return sqrt(sum_of_squares / n - (sum / n) ** 2)


Here is a line-by-line description of how this function works:

1. The first line extracts the length of the array. We could have instead written n = len(numbers), but we chose to exploit the fact that the shape of a one-dimensional array is a tuple whose only element is the length of the array.

2. The second line uses the sum function in numarray, which adds numbers in the array.

3. The third line uses the sum function again, but this time on numbers * numbers. What is this object? It’s the elementwise product of the array with itself, which, if you think about it, is an array whose elements are the squares of the elements of numbers. Calling sum on this array computes the sum of squares.

4. The last line computes the standard deviation as before.

Make sure it produces the same result:

>>> print stddev2(a)
2.50137462208
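The formula in the last line of the function, the square root of the mean of the squares minus the square of the mean, is the population standard deviation. As a cross-check, here is a sketch of the same formula in plain Python 3 (an assumption; it does not use numarray), compared against statistics.pstdev from the standard library. The name stddev_plain is chosen here for illustration.

```python
from math import sqrt
from statistics import pstdev


def stddev_plain(numbers):
    """Population standard deviation: sqrt(mean of squares - mean squared)."""
    n = len(numbers)
    total = sum(numbers)
    sum_of_squares = sum(x * x for x in numbers)
    return sqrt(sum_of_squares / n - (total / n) ** 2)


a = [5.6, 3.2, -1.0, 0.7]
result = stddev_plain(a)
reference = pstdev(a)
print(result, reference)
```

Both values should agree with the result printed above to within floating-point rounding.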

Why should you bother installing, learning, and using numarray?

As with the array module, its arrays are stored efficiently. This matters a lot if you are working with very large data sets.

It supports multidimensional arrays.

An array object from numarray can be pickled. (Make sure the numarray package is available when unpickling.)

Notice that our numarray version of stddev2 contains no explicit operations on the individual elements of the array. Computations are performed by the sum function and the multiplication operator * acting on arrays. As a result, numarray performs the computations on individual elements internally, without resorting to arithmetic using Python objects. For large arrays, this executes very much faster. (For example, on one of the authors’ computers, stddev2 runs about 20 times faster than stddev for an array of a million numbers.)

In addition, the code for stddev2 is somewhat shorter than the code for stddev, because the former doesn’t contain an explicit loop. Computations on array elements can usually be expressed more simply using the functions and methods in numarray — in some cases, very much more simply.

Summary

In this chapter, you learned how to perform many kinds of numerical computations in Python. You experimented first with Python’s built-in integer and floating point number types and saw how to use Python’s built-in arithmetic operations. Then you moved on to higher mathematics, using the special functions in the math module and Python’s complex number type.

Finally, you learned three different ways of representing arrays of numbers: The simplest method is to use a list or tuple of numbers. For more efficient storage, use the array module included with Python. For greatest flexibility and a host of sophisticated features for programming with arrays, download and install the numarray package.


Exercises

1. Write a function that expresses a number of bytes as the sum of gigabytes, megabytes, kilobytes, and bytes. Remember that a kilobyte is 1024 bytes, a megabyte is 1024 kilobytes, and so on. The number of each should not exceed 1023. The output should look something like this:

>>> print format_bytes(9876543210)
9 GB + 203 MB + 5 KB + 746 bytes

2. Write a function that formats an RGB color in the color syntax of HTML. The function should take three numerical arguments, the red, green, and blue color components, each between zero and one. The output is a string of the form #RRGGBB, where RR is the red component as a value between 0 and 255, expressed as a two-digit hexadecimal number, and GG and BB likewise for the green and blue components.

For example:

>>> print rgb_to_html(0.0, 0.0, 0.0)  # black
#000000
>>> print rgb_to_html(1.0, 1.0, 1.0)  # white
#ffffff
>>> print rgb_to_html(0.8, 0.5, 0.9)  # purple
#cc80e6

3. Write a function named normalize that takes an array of float numbers and returns a copy of the array in which the elements have been scaled such that the square root of the sum of their squares is one. This is an important operation in linear algebra and other fields.

Here’s a test case:

>>> for n in normalize((2.2, 5.6, 4.3, 3.0, 0.5)):
...     print "%.5f" % n,
...
0.27513 0.70033 0.53775 0.37518 0.06253

For bonus geek credit, implement it using numarray.


20

Python in the Enterprise

Enterprise applications are software that addresses the needs of a company or other organization that runs a portion of its business through that software. Of course, this definition could encompass nearly any kind of software, but the software usually thought of as enterprise business software is that which supports the modeling of business processes. Enterprise software relies on infrastructure platforms: the flexible components that applications are built on, such as relational databases, third-party libraries, high-availability suites, and more.

A business process is any repeatable way for an organization to accomplish a task that is part of its business or that supports its business. Business processes usually start as undocumented, ad hoc actions that have grown out of trial-and-error and at some point have become essential to getting work done; sometimes they exist only in the heads of particular individuals who simply know that, for instance, “the ABC forms are always sent to Mr. Kofax in HR on Thursday afternoons.”

This kind of undocumented process is dangerous when an organization has grown, because if it is lost for any reason (due to a long vacation, for instance, or a physical disaster), the business will suffer. This loss will involve associated costs due to missed deadlines, confusion, and disorganization when employees leave the company or simply move on to new duties, and their replacements are forced to resolve problems reactively instead of being able to rely on documented processes to get work done. In addition, brand-new regulatory requirements for business auditing may even make undocumented processes illegal in some cases. Therefore, it’s not at all surprising that an entire industry is devoted to the understanding, documentation, and formalization of business processes.

There are benefits beyond legal compliance. After a process has been documented, it becomes amenable to being supported by software and to being modeled using relatively standard platforms. Of course, several legacy applications support business processes — the larger the company, the more elaborate and often older they are. These legacy applications include standard infrastructure, such as relational database systems and e-mail systems, and custom software ranging from simple periodic processing scripts and standardized batch jobs, all the way through vast online architectures and bullet-proofed enterprise platforms.


Chapter 20

This chapter will show you the following:

Some of the main types of enterprise applications and how they can be useful, not only to the Fortune-500 CEO, but to the sole proprietor, and to any organization in between

Some of the regulatory frameworks out there that are making people think harder about their business processes lately

How you can use Python and the open-source workflow toolkit wftk, either to talk to any existing enterprise platform already available or simply to roll your own if you’re on a tight budget but still need to get organized

You’ll also be introduced to a few fairly simple applications of wftk and Python, which you can easily use as a starting point for more elaborate enterprise architecture.

Enterprise Applications

You now know (if you didn’t already) that enterprise applications take advantage of an infrastructure for modeling the organization and its activities. In general, things like organization charts, reporting trees, and activities that are shown as flowcharts are the familiar forms of this kind of modeling. When they’re being created, these models involve three main types of objects, which are then reflected in the infrastructure involved:

Documents and data

People, their authority to take action, and their groups (organizational structure)

Actions to be taken

Keep these in mind while you take a closer look at the subdivisions of enterprise infrastructure systems.

Document Management

A document is basically anything you can think of as being printed on a piece of paper. It might be a letter or a fax; it could be a 100-page Word document or an Excel spreadsheet; it might be a file of programming code; or it could be a simple e-mail message.

The default way to handle documents when you’re actually producing them is by storing them in files in your local computer or on a shared network folder, but there are problems with this approach:

The computers’ file systems are often on local machines or have different access paths from different areas of an organization (so a shared file in New York could be in \\local_server\file, but in Seattle it is in \\NYC\public\local_server\file). This creates a barrier to sharing files stored in them because each office, or even each department within a company, needs to know various, different, and specific information that they can’t usefully share with others in the organization.

The name of a file in a file system is arbitrary. If some system exists only in a person’s head, then files could be named something like FaxFromBob20050105.tiff, and this might be easy to figure out. But what if you just clicked Save and the file automatically received the name 7EF78S.tiff? You’ll almost certainly lose track of it; and, of course, anybody else who needs your files doesn’t stand a chance (think about what it would mean if this fax had great legal significance or was confirmation of a multi-million-dollar order!).

A file system has no way to store extra information about a document, such as comments, date of origination, process status (“needs handling” or “old business”, for instance), which means that it also can’t be searched for documents based on this desirable information. Nor is there a way to place it in its context; given the document, why do you have it? What is it for? Is it part of an ongoing customer complaint? A tax audit? Speaking of audits, do you know which of your files an auditor needs to see? Are you sure?

If the document is something you’re working on, or is something that changes (think of your organization’s employee manual, or the current list of catering service vendors — anything that can change over time), there is no way to tell when it changed, why it changed, whether it changed, or how it changed. You can’t tell what it might have looked like last November when you had that meeting about it, and you can’t tell why or when somebody added or removed content from it.

All of these business-level problems with simply storing files on file systems are reasons for you to get interested in thinking about document management, which is really a tautology. If you work on documents at all, you’re doing document management just by naming, writing, changing, and saving them. Even if you’re writing programs, source code is simply another type of document.

Therefore, the question is really this: How well are you doing document management? As soon as you’re talking about any organization of more than one person, the complexity and difficulty of managing documents well increases, multiplying many times for each person you add to the mix.

The Evolution of Document Management Systems

Documents aren’t the only data managed on an enterprise level. You also need to provide care and feeding of the infrastructure components. One of the first types of enterprise infrastructure software was the database; and the modern databases, relational databases, are a type of software that nearly all programmers have dealt with (and so have you if you’ve followed through these chapters in order, as Chapter 14 introduces you to databases). By standardizing the storage of record-based data, the advent of database management systems changed the way software was designed, by making it possible to concentrate on the specific ways in which data could be manipulated, related, and reported, while leaving the nitty-gritty of actually storing it to the database software.

Relational databases originally hit the market in the late 70s; and during the 80s, document management software that worked with this class of software also made its debut. Many early document management systems were basically standalone software. Sometimes you’ll see the acronym EDMS; the “E” stands for “electronic,” because the marketing strategy at the time was to replace paper filing systems with technology solutions that were purely electronic filing systems. As such, the design of these systems often reflected the assumption that document management would be the motivation for a given organization to buy computers at all! It wasn’t really a consideration that the organization had any separate interest in databases or their management.

By the early 90s, computers had gone through the global workplace like wildfire; and at the same time, all document management systems had moved to the model of utilizing the relational database for their storage of data about documents and leveraging their ability to store data, index it, and retrieve it easily.


Document management systems usually use the term metadata to refer to data about documents — that is, information specifying things such as “who wrote it,” “when was it created,” “who modified it last,” “who authorized it,” and so on, as they consider the “data” they manage to be the documents themselves.

What You Want in a Document Management System

In short, a document management system can be thought of as a special case of a relational database system. This is because a document management system stores information about the document in one or more database records, while the content of the document will be attached to the record and can usually be retrieved. This much functionality can easily be implemented in, say, Oracle, by using Oracle’s standard BLOB or CLOB mechanism for storing large amounts of binary data (BLOB means Binary Large OBject) or character data (CLOB means Character Large OBject). If you store your documents in an Oracle database (or any RDBMS with LOB support — that is, Large Object support), you already have some advantages over the file system. What you gain by doing this is that you don’t need to worry as much about naming your files and your metadata (there’ll be a structure in the database) or the capability to store extra data about your documents (the database can store metadata in a column alongside the file), and your storage location is global (all enterprise databases can be connected to remotely), searchable, and, from the point of view of an organization that can afford to ensure that someone with the right expertise is on staff, easy to manage.
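The idea of a database record that carries metadata in ordinary columns and the document itself in a BLOB column can be sketched in a few lines. This example uses SQLite through Python 3's built-in sqlite3 module rather than Oracle, purely so that it runs without a server. The table layout, column names, and sample values are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        author  TEXT NOT NULL,
        created TEXT NOT NULL,
        status  TEXT NOT NULL,
        content BLOB NOT NULL
    )
    """
)

# Placeholder bytes standing in for a scanned fax image.
fax_bytes = b"placeholder bytes standing in for a scanned fax"

conn.execute(
    "INSERT INTO documents (title, author, created, status, content) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Fax from Bob", "Bob", "2005-01-05", "needs handling", fax_bytes),
)
conn.commit()

# The metadata is searchable even though the content is opaque bytes.
rows = conn.execute(
    "SELECT id, title, content FROM documents WHERE status = ?",
    ("needs handling",),
).fetchall()
print([(doc_id, title, len(content)) for doc_id, title, content in rows])
```

Notice that nothing here depends on a file name or a network path: the document is found by what is known about it.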

Of course, you still don’t have that list of versions of each document or a way to keep track of who is currently working on a new version. These features are the final basic requirements for realizing true document management as it’s practiced today.

Maintaining the version list of a document is pretty straightforward (and if you use CVS, Subversion, or some other version control system for your programming work, you already know how this can work). The principle is simple: When a user wants to work on a document, she registers herself as the user who is working on a new version. When this has been done, the user is referred to as having checked out the document, which indicates that she can work on it; and when she’s done, she performs a corresponding check in action when the new version is no longer being edited. The combination of the checkout and checkin enables the management system to know what version the document started at, and if other people have also, in parallel, checked out, edited, and checked in the same document, both users can be informed of the conflict and they can resolve it. There may be various draft versions of a document, with the currently valid production version being flagged in some way. In addition, there are often more or less elaborate schemes for avoiding or settling “conflicts” that result when more than one person is trying to work on the document, such as branching of versions, merging variants, and so on.
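The checkout and checkin bookkeeping described above can be sketched as a small class. This is a deliberately simplified model in Python 3, not the behavior of CVS, Subversion, or any particular document management product; the class and method names are hypothetical, and a real system would offer merging rather than simply rejecting the second check-in.

```python
class CheckoutConflict(Exception):
    """Raised when a check-in is based on an outdated version."""


class VersionedDocument:
    def __init__(self, content):
        self.versions = [content]  # version 1 is stored at index 0
        self.checked_out = {}      # user -> version number they started from

    def check_out(self, user):
        """Register the user as editing the current version."""
        base = len(self.versions)
        self.checked_out[user] = base
        return base, self.versions[-1]

    def check_in(self, user, new_content):
        """Store a new version unless someone else checked in first."""
        base = self.checked_out.pop(user)  # KeyError if never checked out
        current = len(self.versions)
        if base != current:
            raise CheckoutConflict(
                f"{user} edited version {base}, but version {current} exists"
            )
        self.versions.append(new_content)
        return len(self.versions)


doc = VersionedDocument("draft")
doc.check_out("alice")
doc.check_out("bob")  # a parallel checkout of the same version

new_version = doc.check_in("alice", "draft with Alice's edits")

try:
    doc.check_in("bob", "draft with Bob's edits")
    conflict_detected = False
except CheckoutConflict as exc:
    conflict_detected = True
    print("conflict:", exc)
```

Because each checkout records the version it started from, the second check-in can be recognized as stale and the two users can be told to resolve the difference.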

The key functionality inherent in document management is simply this: A document management system provides an organization with one unified system to track all of its documents, whether they’re computer-generated text files such as program source code, more complex textlike documents such as Word documents, scanned images from paper, or even just information about actual paper stored in a box somewhere. Once it’s actively used, that unified system can tell you the current status of any document, including who created it, when, and why, and can store whatever other information may be pertinent.

Because a document management system knows everything that an organization considers to be important about a document, including its history, it can also be used to control the retention of documents.
