Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Python (2005)

.pdf
Скачиваний:
158
Добавлен:
17.08.2013
Размер:
15.78 Mб
Скачать

 

 

Accessing Databases

 

 

 

 

dumbdbm

Uses a simple, but portable, implementation of the DBM library

 

gdbm

Uses the GNU DBM library

 

whichdb

Guesses which DBM module to use to open a file

 

 

 

All of these libraries exist because of the history of the DBM library. Originally, this library was only available on commercial versions of Unix. Free versions of Unix, and later Linux, Windows, and so on, could not use the DBM library. This lead to the creation of alternative libraries, such as the Berkeley Unix library and the GNU gdbm library.

With all the incompatible file formats, this plethora of libraries can be a real pain. The anydbm module, though, offers a handy alternative to choosing a specific DBM module. With the anydbm module, you can let it choose for you. In general, the anydbm module will choose the best implementation available on your system when creating a new persistent dictionary. When reading a file, the anydbm module uses the whichdb module to make an informed guess as to which library created the data file.

Unless you need a specific advanced feature of one of the DBM libraries, use the anydbm module.

Creating Persistent Dictionaries

All of the DBM modules support an open function to create a new dbm object. Once opened, you can store data in the dictionary, read data, close the dbm object (and the associated data file or files), remove items, and test for the existence of a key in the dictionary.

To open a DBM persistent dictionary, use the open function on the module you choose. For example, you can create a persistent dictionary with the anydbm module.

Try It Out

Creating a Persistent Dictionary

Enter the following code and name your file dbmcreate.py:

import anydbm

db = anydbm.open(‘websites’, ‘c’)

# Add an item.

db[‘www.python.org’] = ‘Python home page’

print db[‘www.python.org’]

# Close and save to disk. db.close()

When you run this script, you’ll see output like the following:

$ python dbmcreate.py

Python home page

251

TEAM LinG

Chapter 14

How It Works

This example uses the recommended anydbm module.

The open function requires the name of the dictionary to create. This name gets translated into the name of the data file or files that may already be on the disk. (The DBM module may — though not always — create more than one file, usually a file for the data and one for the index of the keys.) The name of the dictionary is treated as a base filename, including the path. Usually, the underlying DBM library will append a suffix such as .dat for data. You can find the file yourself by looking for the file named websites, most likely in your current working directory.

You should also pass the optional flag. This flag is not optional for the dbhash module. The following table lists the available flags.

Flag

Usage

 

 

c

Opens the data file for reading and writing, creating the file if needed.

n

Opens the file for reading and writing, but always creates a new empty file.

 

If one already exists, it will be overwritten and its contents lost.

w

Opens the file for reading and writing, but if the file doesn’t exist it will not

 

be created.

You can also pass another optional parameter, the mode. The mode holds a set of Unix file permissions. See Chapter 8 for more on opening files.

The open method of the dbm modules returns a new dbm object, which you can then use to store and retrieve data.

After you open a persistent dictionary, you can write values as you normally would with Python dictionaries, as shown in the following example:

db[‘www.python.org’] = ‘Python home page’

Both the key and the value must be strings and can’t be other objects, like numbers or python objects. Remember, however, that if you want to save an object, you can serialize it using the pickle module, as you saw in Chapter 8.

The close method closes the file or files and saves the data to disk.

Accessing Persistent Dictionaries

With the DBM modules, you can treat the object you get back from the open function as a dictionary object. Get and set values using code like the following:

db[‘key’] = ‘value

value = db[‘key’]

Remember that the key and the value must both be text strings.

252

TEAM LinG

Accessing Databases

You can delete a value in the dictionary using del:

del db[‘key’]

You can determine whether a particular key is stored in the dictionary using if:

if db[‘key’] != None:

print ‘Key exists in dictionary’

If you use the dbhash module, you can use the following syntax as an alternate means to determine whether a particular key is stored in the dictionary:

if (‘key’ in db):

print ‘Key exists in dictionary’

This syntax works with the dbhash type of DBM module. It does not work with all other DBM modules.

The keys method returns a list of all the keys, in the same way it would with a normal dictionary:

for key in db.keys():

# do something...

The keys method may take a long time to execute if there are a huge number of keys in the file. In addition, this method may require a lot of memory to store the potentially large list that it would create with a large file.

You can use the following script as a guide for how to program with DBM persistent dictionaries.

Try It Out

Accessing Persistent Dictionaries

Enter the following script and name the file dbmaccess.py:

import anydbm import whichdb

# Check the type.

print “Type of DBM file =”, whichdb.whichdb(‘websites’)

# Open existing file.

db = anydbm.open(‘websites’, ‘w’)

#Add another item. db[‘www.wrox.com’] = ‘Wrox home page’

#Verify the previous item remains. if db[‘www.python.org’] != None:

print ‘Found www.python.org’ else:

print ‘Error: Missing item’

#Iterate over the keys. May be slow.

253

TEAM LinG

Chapter 14

# May use a lot of memory. for key in db.keys():

print “Key =”,key,” value =”,db[key]

del db[‘www.wrox.com’]

print “After deleting www.wrox.com, we have:”

for key in db.keys():

print “Key =”,key,” value =”,db[key]

# Close and save to disk. db.close()

When you run this script, you’ll see output similar to the following:

$ python dbmaccess.py Type of DBM file = dbhash Found www.python.org

Key = www.wrox.com value = Wrox home page Key = www.python.org value = Python home page After deleting www.wrox.com, we have:

Key = www.python.org value = Python home page

How It Works

This script works with a small database of web site URLs and descriptions. You need to first run the dbmcreate.py example, shown previously. That example creates the DBM file and stores data in the file. The dbmaccess.py script then opens the pre-existing DBM file. The dbmaccess.py script starts out using the whichdb.whichdb function to determine the type of DBM file created by the previous example, dbmcreate.py, for the DBM persistent dictionary websites. In the example here, it’s correctly determined that the type is dbhash.

The dbmaccess.py script then opens the persistent dictionary websites in read/write mode. The call to the open function will generate an error if the necessary data file or files do not exist on disk in the current directory.

From the previous example, dbmcreate.py, there should be one value in the dictionary, under the key www.python.org. This example adds the Wrox web site, www.wrox.com, as another key.

The script verifies that the www.python.org key exists in the dictionary, using the following code:

if db[‘www.python.org’] != None: print ‘Found www.python.org’

else:

print ‘Error: Missing item’

Next, the script prints out all of the keys and values in the dictionary:

for key in db.keys():

print “Key =”,key,” value =”,db[key]

Note that there should be only these two entries.

254

TEAM LinG

Accessing Databases

After printing out all of the entries, the script removes one using del:

del db[‘www.wrox.com’]

The script then prints all of the keys and values again, which should result in just one entry, as shown in the output.

Finally, the close method closes the dictionary, which involves saving all the changes to disk, so the next time the file is opened it will be in the state you left it.

As you can see, the API for working with persistent dictionaries is incredibly simple because it works like files and like dictionaries, which you’re already familiar with.

Deciding When to Use DBM and When to Use a Relational Database

The DBM modules work when your data needs can be stored as key/value pairs. You can store more complicated data within key/value pairs with some imagination — for instance, by creating formatted strings that use a comma or some other character to delimit items in the strings, both on the key and the value part of the dictionary. This can be useful, but it can also be very difficult to maintain, and it can restrict you because your data is stored in an inflexible manner. Another way that you can be limited is technical: Note that some DBM libraries limit the amount of space you can use for the values (sometimes to a maximum of 1024 bytes, which is very, very little).

You can use the following guidelines to help determine which of these two types of data storage is appropriate for your needs:

If your data needs are simple, use a DBM persistent dictionary.

If you only plan to store a small amount of data, use a DBM persistent dictionary.

If you require support for transactions, use a relational database. (Transactions are when more than one thing happens at once — they let you keep your data from getting changed in one place but not in another; you get to define what happens concurrently with transactions.)

If you require complex data structures or multiple tables of linked data, use a relational database.

If you need to interface to an existing system, use that system, obviously. Chances are good this type of system will be a relational database.

Unlike the simple DBM modules, relational databases provide a far richer and more complex API.

Working with Relational Databases

Relational databases have been around for decades so they are a mature and well-known technology. People who work with relational databases know what they are supposed to do, and how they are supposed to work, so relational databases are the technology of choice for complex data storage.

255

TEAM LinG

Chapter 14

In a relational database, data is stored in tables that can be viewed as two-dimensional data structures. The columns, or vertical part of the two-dimensional matrix, are all of the same type of data; like strings, numbers, dates, and so on. Each horizontal component of the table is made up of rows, also called records. Each row in turn is made up of columns. Typically, each record holds the information pertaining to one item, such as an audio CD, a person, a purchase order, an automobile, and so on.

For example, the following table shows a simple employee table.

empid

firstname

lastname

department

manager

phone

 

 

 

 

 

 

105

Peter

Tosh

2

45

555-5555

201

Bob

Marley

1

36

555-5551

 

 

 

 

 

 

This table holds six columns:

empid holds the employee ID number. Relational databases make extensive use of ID numbers where the database manages the assignment of unique numbers so that each row can be referenced with these numbers to make each row unique (even if they have identical data). We can then refer to each employee by the ID number. The ID alone provides enough information to look up the employee.

firstname holds the person’s first name.

lastname holds the person’s last name.

department holds the ID of the department in which the employee works. This would likely be a numeric ID of the department, where departments are defined in a separate table that has a unique ID for each department.

manager holds the employee ID of the manager of the given employee. This is sort of selfreferential, because in this example, a manager is actually an employee.

phone holds the office phone number.

In real life, a company would likely store a lot more information about an employee, such as a taxation authority identification number (social security number in the U.S.), home address, and more, but not anything that’s really different in principle to what you’ve already seen.

In this example, the column empid, the employee ID, would be used as the primary key. A primary key is a unique index for a table, where each element has to be unique because the database will use that element as the key to the given row and as the way to refer to the data in that row, in a manner similar to dictionary keys and values in Python. So, each employee needs to have a unique ID number, and once you have an ID number, you can look up any employee. So, the empid will act as the key into this table’s contents.

The department column holds an ID of a department — that is, an ID of a row in another table. This ID could be considered a foreign key, as the ID acts as a key into another table. (In databases, a foreign key has a much more strict definition, so it’s okay to think of it this way.)

For example, the following table shows a possible layout for the department table.

256

TEAM LinG

 

 

Accessing Databases

departmentid

name

Manager

1

development

47

2

qa

32

In these examples, the employee Peter Tosh works for department 2, the qa, or quality assurance, department in a dynamic world-class high-quality software development firm. Bob Marley works for department 1, the development department.

In a large enterprise, there may be hundreds of tables in the database, with thousands or even millions of records in some tables.

Writing SQL Statements

The Structured Query Language, or SQL, defines a standard language for querying and modifying databases.

You can pronounce SQL as “sequel” or “s-q-l.”

SQL supports the basic operations listed in the following table.

Operation

Usage

 

 

Select

Perform a query to search the database for specific data.

Update

Modify a row or rows, usually based on a certain condition.

Insert

Create new rows in the database.

Delete

Remove a row or rows from the database.

 

 

In general, these basic operations are called QUID, short for Query, Update, Insert, and Delete, or CRUD, short for Create, Read, Update, and Delete. SQL offers more than these basic operations, but for the most part, these are the majority of what you’re going to use to write applications.

If you are not familiar with SQL, look at a SQL book or search on the Internet. You will find a huge amount of tutorial material. You may also look at the web site for this book for more references to SQL resources.

SQL is important because when you access databases with the Python DB API, you must first create SQL statements and then execute these statements by having the database evaluate them. You then retrieving the results and use them. Thus, you will find yourself in the awkward position of using one language, Python, to create commands in another language, SQL.

257

TEAM LinG

Chapter 14

The basic SQL syntax for the CRUD operations follows:

SELECT columns FROM tables WHERE condition ORDER BY columns ascending_or_descending

UPDATE table SET new values WHERE condition

INSERT INTO table (columns) VALUES (values)

DELETE FROM table WHERE condition

In addition to this basic look at the available syntax, there are many more parameters and specifiers for each operation that are optional. You can still use them with Python’s DB API if you’re familiar with SQL.

To insert a new row in the employee table, using the previous employee example, you can use a SQL query like the following (even though it’s adding data and not getting data, the convention is that all SQL commands or statements can also be called queries):

insert into employee (empid, firstname, lastname, manager, dept, phone)

values (3, ‘Bunny’, ‘Wailer’, 2, 2, ‘555-5553’)

In this example, the first tuple (it’s useful to think of these in Python terms, even though SQL will give these different names) holds the names of the columns in the order you are using for inserting your data. The second tuple, after the keyword values, holds the data items in the same order. Notice how SQL uses single quotes to delimit strings, and no quotes around numbers. (The phone number is different — it’s actually a string because it has to be able to contain nonnumbers, like dashes, periods, and plus signs, depending on how the data is entered.)

With queries, you can use shortcuts such as * to say that you want an operation to be performed using all of the columns in a table. For example, to query all of the rows in the department table, showing all of the columns for each row, you can use a query like the following:

select * from department

Note that SQL is not case-sensitive for its keywords, such as SELECT and FROM. But, some databases require table and column names to be all uppercase. It is common, therefore, to see people use SELECT and FROM and other operations in all capital letters to make them easily distinguished from other parts of the query.

This SQL statement omits the names of the columns to read and any conditions that would otherwise narrow down the data that would be returned. Thus the query will return all of the columns (from the *) and all of the rows (due to there being no where clause).

You can perform a join with the select command, to query data from more than one table, but present it all in a single response. It’s called a join because the data from both tables will be returned as though it was queried from a single table. For example, to extract the department name with each employee, you could perform a query like the following (all of which would need to be in one string to be a single query):

select employee.firstname, employee.lastname, department.name from employee, department

where employee.dept = department.departmentid order by lastname desc

258

TEAM LinG

Accessing Databases

In this example, the select statement requests two columns from the employee table (the firstname and the lastname, but these are specified as coming from employee by the convention of specifying the table name and the column name in the table) and one from the department table (department.name). The order by section of the statement tells the database to order the results by the value in the lastname column, in descending order.

To simplify these queries, you can use aliases for the table names, which make them easier to type and to read (but don’t change the logic or the syntax of your queries). For example, to use the alias e with the employee table, you can start a query as follows:

select e.firstname, e.lastname from employee e

...

In this case, you must place the alias, e, after the table name in the from clause. You can also use the following format with the optional key word as, which could be easier for you to read:

select e.firstname, e.lastname from employee as e

...

To modify (or update) a row, use a SQL statement like the following:

update employee set manager=55 where empid=3

This example modifies the employee with an ID of 3 by setting that employee’s manager to the employee with an ID of 55. As with other queries, numbers don’t need to have quotes around them; however, strings would need to be quoted with single quotes.

To delete a row, use a SQL statement like the following:

delete employee where empid=42

This example deletes the employee with an ID of 42 but doesn’t affect anything else in the database.

Defining Tables

When you first set up a database, you need to define the tables and the relations between them. To do this, you use the part of the SQL language called the DDL, or Data-Definition Language. (It defines the structure of your tables — get it?) DDL basics are pretty simple, where you use one operation to create tables, and another one to remove them:

CREATE TABLE tablename (column, type column type, . . . )

DROP TABLE tablename

There is also an ALTER TABLE command to modify an existing table, but you won’t need to do that for now. When you want to use this, a dedicated SQL book or web page will have more about this command.

Unfortunately, SQL is not an entirely standard language, and there are parts of it that each database doesn’t do the same. The DDL remains a part of SQL that has not been standardized. Thus, when

259

TEAM LinG

Chapter 14

defining tables you will find differences between the SQL dialects supported by the different databases, though the basics concepts are the same.

Setting Up a Database

In most cases when you’re the programmer, you will already have a database that’s up and running, perhaps even a database chosen by some other organization that you’re going to have to use. For example, if you host your web site with a web site hosting company that provides bells and whistles, like a database, your hosting package may include access to the MySQL database. If you work for a large organization, your IT department may have already standardized on a particular database such as Oracle, DB/2, Sybase, or Informix. These latter packages are likely present in your workplace if you create enterprise applications with Python.

If you have no database at all, yet still want to work on the examples in this chapter, then a good starting database is Gadfly. The main virtues of Gadfly include the fact that the database is written in Python, so Gadfly can run on any platform where Python runs. In addition, Gadfly is simple and small, but functional. This makes it a great candidate for your experimentation while you’re learning, even if you’ve got another database available to you. Just keep in mind that each database has its own quirks.

The examples in this chapter were written to work with Gadfly so that you can follow them without any external infrastructure being needed. You can easily modify these examples, though, to work with a different database. That’s one of the great aspects of the Python DB API.

Download the ZIP file for the latest Gadfly release from http://gadfly.sourceforge.net/. As with other Python modules, you can install Gadfly with the following steps:

1.Unpack the file. (You can use Unzip on Unix, Winzip or something similar on Windows. Make sure to use the options that will create the directory structure that’s embedded in the zip file.)

2.Change to the gadflyZip directory.

3.Run the command python setup.py install.

For example on a Linux or Unix platform (such as Mac OS/X):

$ python setup.py install

When you run this command, you may need administrator or root permissions to install the Gadfly scripts in the system-wide location alongside the Python installation.

Once you have installed the Gadfly modules, you need to create a database. This part of working with a database is not standardized as part of the DB API, so you need to write some Python code that is specific to the Gadfly database to handle this.

If you are working with another database, such as SQL Server, chances are good that a database has already been created. If not, follow the instructions from your database vendor. (A lot of the time, you can get help on tasks like this from your Database Administrator, or DBA, who would really rather have you working on a test database instead of on a production database.)

With Gadfly, creating a database is rather easy.

260

TEAM LinG