Beginning Python (2005)

Python in the Enterprise

4. Now run the storage script:

C:\projects>python simplemysql.py
['simple', 'mtest']
['1']
<record id="1" list="mtest" key="1">
<field id="id">1</field>
<field id="entry">2005-03-12 20:25:51</field>
<field id="body">this is a test</field>
</record>
['1', '2']
<rec list="mtest">
<field id="body">this is a test value</field>
<field id="entry">2005-03-12 20:42:09</field>
</rec>
this is a test value

How It Works

Except for the specification of the list storage, this example is simply a combination of the code from the previous two examples, cleaned up a little because you already know what it’s doing. The code can stay much the same because, after you have defined a context for the wftk, you no longer have to worry about where things are stored; the wftk interfaces handle that for you.

Although this example doesn’t demonstrate it, there is one significant difference between storage in MySQL and storage in a local directory: if you add an extra field that is not defined in the table, similar to the “extra” field in the storage example shown earlier, the MySQL adapter will simply ignore it when writing the record to the database. This may have surprising consequences later when the record is retrieved again. Simply put, your data will be lost.

A feature of the MySQL adapter can improve the situation: any fields not included in the table definition can be stored as XML in a designated BLOB field. If you are defining your MySQL tables to suit wftk, this can be a good solution, but if you need to work with an existing schema, adding a BLOB field for this purpose may not be an option. The next example shows an alternative way of dealing with this kind of requirement.
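To make the BLOB idea concrete, here is a minimal sketch in plain Python. It is not wftk code: the function name split_record, the column set, and the <extra> wrapper element are all assumptions made for illustration, and the real adapter may serialize leftover fields differently.

```python
import xml.etree.ElementTree as ET

def split_record(fields, table_columns):
    """Split a field dict into (column values, XML string of leftover fields)."""
    known = {k: v for k, v in fields.items() if k in table_columns}
    extra = ET.Element("extra")  # hypothetical wrapper element
    for name in sorted(set(fields) - set(table_columns)):
        child = ET.SubElement(extra, "field", id=name)
        child.text = fields[name]
    # The overflow string is what would be written to the BLOB column.
    overflow = ET.tostring(extra, encoding="unicode")
    return known, overflow

record = {"id": "1", "body": "this is a test", "extra": "not in the table"}
known, overflow = split_record(record, {"id", "entry", "body"})
print(known)
print(overflow)
```

Fields that match a column go to the table as usual, and everything else survives in the XML string instead of being dropped silently.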

Try It Out

Storing and Retrieving Documents

Storage in relational databases gives you some nice advantages, at the cost of some disadvantages. The most obvious advantage is that you have the selection and ordering facilities of the database available when searching for data, as you saw in the last code example. The default local directory storage adapter is very weak in this regard; although it does have some rudimentary logic for searching and sorting, there’s not much point in rewriting a database when it’s so simple to install a professionally designed database system initially.

However, a relational database has its own weaknesses. Chief among them is the rigidity of its schema, that is, its data format. Every record in an RDBMS table must have the same fields, and there are only limited facilities available for storage of attachments. Of course, BLOBs or similar structures can often be used, depending on requirements, but sometimes it’s convenient to combine the strengths of the default storage format (completely flexible storage in local files) with those of the relational database (good searching capabilities). The mechanism for doing this is called an index.

Chapter 20

An index is an auxiliary storage mechanism to store selected fields from the main list. If an index is set on a list, the index is used to perform key generation and can also be used to perform searching and sorting. Thus, using a relational database to store the index, which refers to data that is in directory storage, can give you some real advantages, depending on how scalable the result needs to be and the resources you have available. In this code example, you’ll use just such a setup to create a simple document management system:

1. Using the text editor of your choice, open the file system.defn in the example repository from the previous examples and add a new document storage list to the repository as follows (you can leave in place the lists that you defined previously, but you’re not going to be using them anymore):

<repository loglevel="6">
<list id="docs" list-from="docindex">
  <field id="id"/>
  <field id="title"/>
  <field id="descr" type="text"/>
  <field id="content" type="document"/>
  <index id="docindex" storage="mysql:wftk" table="docindex" order="title">
    <field id="id" special="key"/>
    <field id="title"/>
    <field id="descr"/>
    <field id="created_by" from="content.created_by"/>
    <field id="created_on" from="content.created_on"/>
    <field id="edited_by" from="content.edited_by"/>
    <field id="edited_on" from="content.edited_on"/>
    <field id="size" from="content.size"/>
  </index>
</list>

<list id="_users" storage="here">
  <user id="me" password="x" name="John Q. User"/>
  <user id="you" password="x" name="Jane D. User"/>
</list>
</repository>

2. Start your MySQL client and add a document index table like this (the size column matches the size field defined in the index and shown in the query output):

mysql> create table docindex (
    -> id int(11) not null primary key auto_increment,
    -> created_by text,
    -> created_on datetime,
    -> edited_by text,
    -> edited_on datetime,
    -> size int(11),
    -> title text,
    -> descr text);
Query OK, 0 rows affected (0.43 sec)

3. Now create a new script called docs.py and enter the following code:

import wftk

repos = wftk.repository()
# Register the current user so the attachment records who acted
repos.user_auth('me', 'x')

# Build a new entry for the docs list from XML
e = wftk.entry(repos, 'docs')
e.parse("""
<rec>
<field id="title">Code file simple.py</field>
<field id="descr">This script demonstrates wftk reads.</field>
</rec>
""")
print "BEFORE SAVING:"
print e
e.save()
print "AFTER SAVING:"
print e
print

# Attach the file simple.py under the field named content
e.attach_file('content', 'simple.py')
print "AFTER ATTACHMENT:"
print e
print

# Search the index with an SQL WHERE clause
l = wftk.list(repos, 'docs')
l.query("edited_by='me'")
print l.keys()
print l[l.keys()[0]]

# Read the attachment back
print e.retrieve('content')

How It Works

Here is where things start to get interesting with document management; and as always, when things get interesting, they start to get more complicated. The main thing to note is that this example saves a single document (using the code from the first example) into the document management system, does some simple searching, and then retrieves the document. On the way, of course, is a lot more detail than in the previous examples, so let’s look at everything in turn, starting at the top.

First, notice that the docs list is more complex than anything you’ve seen so far. In addition to having a field of type “document” (the content field) to store the document outside the main object, it also defines an index. The index, as explained previously, is a way of having your cake and eating it, too. It enables you to have a simple, easy-to-search table in MySQL that is always synchronized with the main list storage implemented as a directory containing XML files and attachment documents. Whenever a change is made to this list, the change is executed for both the main list storage and the MySQL table that mirrors it.

Here, the index uses a from attribute on several of the fields to extract information about the main attachment. This attachment field is named “content”, and so, for example, “content.size” refers to the “size” attribute of the final field XML (you can scan down to the output to see the structure of that field; we’ll come back to this later). This means that you can build a simple SQL query, one that uses a WHERE clause to find particular objects that have been recorded in the repository, such as only those with particular sizes, or those created or edited before or after particular dates, or by particular users. All of this information is always saved by the attachment process, so you know that it will be available for you to query.
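The mapping from attachment attributes to index columns can be sketched in plain Python. This is only an illustration under stated assumptions: sqlite3 stands in for the MySQL index table, and the names INDEX_FIELDS and index_row are hypothetical rather than part of the wftk.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Index column -> source; "field.attr" mimics the from="content.size" notation.
INDEX_FIELDS = {
    "title": "title",
    "created_by": "content.created_by",
    "size": "content.size",
}

def index_row(record_xml):
    """Build a flat index row from a record, following INDEX_FIELDS."""
    rec = ET.fromstring(record_xml)
    by_id = {f.get("id"): f for f in rec.findall("field")}
    row = {}
    for column, source in INDEX_FIELDS.items():
        field_id, _, attr = source.partition(".")
        field = by_id.get(field_id)
        if field is None:
            row[column] = None
        elif attr:
            row[column] = field.get(attr)   # value comes from an attribute
        else:
            row[column] = field.text        # value comes from the element text
    return row

record = """<rec list="docs" key="1">
<field id="title">Code file simple.py</field>
<field id="content" type="document" created_by="me" size="272"/>
</rec>"""

db = sqlite3.connect(":memory:")  # stand-in for the MySQL index table
db.execute("create table docindex (title text, created_by text, size integer)")
row = index_row(record)
db.execute("insert into docindex values (:title, :created_by, :size)", row)
matches = db.execute(
    "select title from docindex where created_by = 'me' and size > 100"
).fetchall()
print(matches)
```

Once the attributes are flattened into columns, an ordinary WHERE clause can filter on them without ever opening the attachment itself.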

Note that the storage setup for this simple document management system, although it may encounter issues in scaling, can easily be a test bed for a more industrial-strength version at some later date.

For instance, you could easily replace the back-end with an Oracle database, storing documents in a BLOB field, and continue to use any Python scripts you’d written against the test system with no changes at all. The same applies to whatever workflow you define to be used with this system. Moreover, if you move to a commercial document management system, at most you would have to write an adapter module to interface the wftk to the new storage system, and you could continue to use the same scripts and workflow.

The second feature of this new repository definition, and something you haven’t learned in this book yet, is that it contains a user list. This user list is very simple, and obviously it isn’t built with security in mind; in a real system, you would want to have some more convincing security. However, for demonstration purposes, and in other limited circumstances, this can be a valid solution.

This list uses “here” storage, meaning it’s a read-only list that is defined and stored right in the repository definition. It defines two users, me and you. Of course, you need a user list because next you’re going to register a user before attaching documents, so that the attaching user can be used as a parameter to be searched for. This is needed in most real environments.

Moving along to the SQL definition of the docindex table, note that the primary key of the table has an auto_increment attribute. This is a nifty MySQL feature that assigns a unique key to a record when it’s saved; the wftk, after the key field is defined as a key field with attribute keygen set to “auto”, will first save the record to the index, ask MySQL what key it was assigned, and then will modify the record in the main storage to match.
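The save order described here (index first, then main storage) can be sketched as follows. This is a sketch under assumptions, not the wftk implementation: sqlite3 with its autoincrement keyword stands in for MySQL’s auto_increment, a dictionary stands in for the directory of XML files, and the save function is hypothetical.

```python
import sqlite3

index = sqlite3.connect(":memory:")   # stand-in for the MySQL index
index.execute(
    "create table docindex (id integer primary key autoincrement, title text)"
)
main_storage = {}                      # stand-in for the directory of XML files

def save(record):
    """Save to the index first, then store the record under the key it was given."""
    cur = index.execute(
        "insert into docindex (title) values (?)", (record["title"],)
    )
    record["id"] = str(cur.lastrowid)  # ask the database which key it assigned
    main_storage[record["id"]] = record
    return record["id"]

first = save({"title": "Code file simple.py"})
second = save({"title": "Another document"})
print(first, second)
```

Letting the database hand out the key means two writers can never pick the same one, which is why the record in main storage is updated only after the index insert succeeds.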

Now take a look at the code. There are several new features to study here, the first being the call to user_auth to assign a user to the current session. The current assigned user has very little obvious effect, but it allows workflow, permissions, and the attachment process to note who is taking action. You’ll come back to user authorization when you look at more advanced workflow later.

The document object is created and saved in the same way you’ve created and saved your objects so far, but now you also attach a file to it. Note that attachments are named by field, and that you can attach arbitrary numbers and types of documents to an object. Attachment fields don’t all have to be predefined in the list definition.

Because you’ve already defined an attachment field and have indexed some of its attributes, only attachments with that name will affect the index.

You aren’t restricted to attaching the contents of files, though. The attach method can specify an arbitrary string to be written to the attachment. In Python, this gives you a lot of flexibility because you could even attach a pickled object that you wanted to be run on the object when it’s retrieved!

When the file is attached, things get interesting. Because this list has MySQL behind it, you can use the power of SQL to search on it. The next few lines build a special list object to perform the query on, and then call the query with the text of an SQL WHERE clause. After the query is done, there is a little manipulation of the data, and then you can retrieve the attachment again and print it as a demonstration.

Looking at the output from this example, you can first see three versions of the object as it’s built, saved, and then given an attachment. Remember that after it has been saved, it is given an id field. MySQL writes a unique key to this field automatically, because of the auto_increment option that was set when the table was defined.

After attachment happens, you can see that a new field, the content field, has been written to the object. This field doesn’t store the attachment itself, but rather specifies where the adapter can retrieve it when you do want it. Because attached files can be of any size, it’s better not to retrieve them every time the object is used; loading a large file that you don’t need would slow down your programs a lot.
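The idea of a descriptor that points at the content without loading it can be shown with a small sketch. The Attachment class below is hypothetical and uses a temporary directory; it only mirrors the location and size attributes seen in the output, not the wftk’s actual storage layout.

```python
import os
import tempfile

class Attachment:
    """Descriptor that records where the content lives, not the content itself."""
    def __init__(self, location):
        self.location = location
        self.size = os.path.getsize(location)

    def retrieve(self):
        # The file is read only when the caller actually asks for it.
        with open(self.location, "rb") as handle:
            return handle.read()

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "_att_1_content_.dat")
with open(path, "wb") as handle:
    handle.write(b"print 'hello'\n")

att = Attachment(path)
print(att.size)
print(att.retrieve())
```

Creating the descriptor costs one file-size lookup, so listing many objects stays cheap even when their attachments are large.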

The descriptor field for an attachment is decorated with various useful attributes, which the wftk can use to retrieve information about the attachment during the attachment process — things that you’ve seen discussed and that you already know are important, such as information about the time at which events occurred to the document, about the user, and about the size of the file itself.

This is the data that you abstract out for the MySQL index summary, and you’ll see it and use it again later in the output. You can also see it by running queries using your MySQL client with what you already know about mysql — for instance, querying with SELECT * FROM docindex.

After the file is attached to the object, the results of the query are returned. The query searches on objects last edited by user “me,” so if you run this example code several times, you’ll see that all of those objects are retrieved by this query, which could be more useful in the future when you are looking for multiple results. Of course, you can easily modify the attachment code to do something else, and then the results of this query will change based on what you’ve done.

Here is the result of running the script:

C:\projects\simple>python docs.py
BEFORE SAVING:
<rec>
<field id="title">Code file simple.py</field>
<field id="descr">This script demonstrates wftk reads.</field>
</rec>
AFTER SAVING:
<rec list="docs" key="1">
<field id="title">Code file simple.py</field>
<field id="descr">This script demonstrates wftk reads.</field>
<field id="id">1</field>
</rec>

AFTER ATTACHMENT:
<rec list="docs" key="1">
<field id="title">Code file simple.py</field>
<field id="descr">This script demonstrates wftk reads.</field>
<field id="id">1</field>
<field id="content" type="document" created_on="2005-03-19 20:16:27" edited_on="2005-03-19 20:16:27" created_by="me" edited_by="me" size="272" mimetype="" location="_att_1_content_.dat"/>
</rec>

['1']
<record id="1">
<field id="id">1</field>
<field id="created_by">me</field>
<field id="created_on">2005-03-19 20:16:27</field>
<field id="edited_by">me</field>
<field id="edited_on">2005-03-19 20:16:27</field>
<field id="size">272</field>
<field id="title">Code file simple.py</field>
<field id="descr">This script demonstrates wftk reads.</field>
</record>
import wftk
repos = wftk.repository('site.opm')
e = wftk.entry (repos, 'simple')
e.parse ("""
<rec>
<field id="field2">this is a test value</field>
</rec>
""")
print "BEFORE SAVING:"
print e
e.save()
print "AFTER SAVING:"
print e
print
l = repos.list('simple')
print l

The results of the query show first the list of keys returned by the query, and then an example record after it has been returned. Note that these returned records come from MySQL; they have a different structure from the records actually saved to the main list storage. Specifically, you can see that the attachment-specific fields such as size and created_on have been stored as separate fields in the database and that they remain separate fields here in the XML output.
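If you want to work with such a returned record outside the wftk, its flat structure is easy to read with the standard library. The following sketch assumes the record layout shown in the output above; the helper name record_to_dict is hypothetical.

```python
import xml.etree.ElementTree as ET

record_xml = """<record id="1">
<field id="id">1</field>
<field id="created_by">me</field>
<field id="size">272</field>
<field id="title">Code file simple.py</field>
</record>"""

def record_to_dict(xml_text):
    """Map each <field id="..."> element to its text content."""
    root = ET.fromstring(xml_text)
    return {field.get("id"): field.text for field in root.findall("field")}

fields = record_to_dict(record_xml)
print(fields["title"], int(fields["size"]))
```

Every value arrives as text, so numeric fields such as size need an explicit conversion before you compare them.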

Finally, the output dumps the content of the attachment, which is just the code from the first sample, which was saved.

There are now a hundred different things you could do to make this script serve a specific, useful purpose in your work. One of those is to manage your stored documents in a document retention framework, so let’s look at that.

Try It Out

A Document Retention Framework

Ready to get your feet wet with something really useful? Try putting together a simple document retention manager. You already know nearly everything you need from the preceding examples; all that’s needed is to roll it all up into one program. As noted, you shouldn’t be terribly worried at the lack of scalability of this test system; you can easily swap out the back-end for applications with more heavy-duty performance requirements.

This example assumes that you’ve worked through the last one and that you have a few documents already stored in the docs list. If you didn’t define a docs list, this example isn’t going to work. Of course, even if you did define a docs list, its contents are going to be pretty bland if you haven’t modified the example code, but you’ll still get a feel for how this works.

1. Using the text editor of your choice, open the file system.defn in the example repository from the previous examples and add a new list for storing the retention rules you’ll be defining:

<repository loglevel="6">
<list id="rules" storage="mysql:wftk" table="rules" order="sort">
  <field id="id"/>
  <field id="sort"/>
  <field id="name"/>
  <field id="rule"/>
</list>

Make sure you don’t change the rest of the repository definition!

2. Start your MySQL client and add the rule table and a couple of rules, like this:

mysql> create table rules (
    -> id int(11) not null primary key auto_increment,
    -> sort int(11),
    -> name text,
    -> rule text);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into rules (sort, name, rule) values (1, 'files by me', "created_by='me' and to_days(now()) - to_days(edited_on) > 4");
Query OK, 1 row affected (0.01 sec)

mysql> insert into rules (sort, name, rule) values (0, 'files by you', "created_by='you' and to_days(now()) - to_days(edited_on) > 3");
Query OK, 1 row affected (0.01 sec)

3. Create a new script called trash.py and enter the following code:

import wftk

repos = wftk.repository()

rlist = wftk.list(repos, 'rules')
rlist.query("")
for r in rlist.keys():
    rule = wftk.xmlobj(xml=rlist[r])
    print "Running rule '%s'" % rule['name']
    docs = wftk.list(repos, 'docs')
    docs.query(rule['rule'])
    for d in docs.keys():
        print " -- Deleting document %s" % d
        doc = repos.get('docs', d)
        doc.delete()

4. Run it (naturally, your output will look different from this):

C:\projects\simple>python trash.py
Running rule 'files by you'
Running rule 'files by me'
 -- Deleting document 2

How It Works

Given that this program is actually the first one in this chapter that may have some real-world application, you might find it surprising that it’s so simple. Here, the approach has been to define the working environment (the repository) first and then proceed to define the actions to take. Let’s look at things in detail.

The definition of the rules list is deliberately simplified so that the structure of the program will be easy to see. For instance, there is no way to define the action to be taken or the archival level to be used for any rule that fires; the only action, by default, is deletion. All you’ve really done is to define a list of “WHERE” clauses, give them convenient names, and make it possible to define a sort order. The order attribute on the rules list ensures that the wftk will retrieve them in the right order when you want to use them.

In step 2, you define the rules table in MySQL and then insert two rules. Files created by the user “you” are deleted once more than three days have passed since their last edit, but files created by the user “me” aren’t deleted until more than four days have passed. This is an example of how roles can be used to define a retention policy.
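The date test inside each rule subtracts day numbers, so it counts calendar days and ignores the time of day. A rough Python equivalent, with hypothetical helper names and made-up timestamps, might look like this:

```python
from datetime import datetime

def days_since_edit(edited_on, now):
    """Whole calendar days between two timestamps, like TO_DAYS(a) - TO_DAYS(b)."""
    return now.date().toordinal() - edited_on.date().toordinal()

def rule_fires(created_by, edited_on, now, owner, max_days):
    """True when the document belongs to owner and is older than max_days."""
    return created_by == owner and days_since_edit(edited_on, now) > max_days

# Illustrative timestamps only; they are not taken from a real run.
now = datetime(2005, 3, 24, 9, 0, 0)
edited = datetime(2005, 3, 19, 20, 16, 27)
print(days_since_edit(edited, now))
print(rule_fires("me", edited, now, "me", 4))
print(rule_fires("me", edited, now, "you", 3))
```

Because the comparison is strictly greater than, a document edited exactly four days ago is still kept by the “files by me” rule.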

Just to demonstrate a point, the sort order of the two rules is the opposite of the order in which they’re defined (and, thus, the opposite of the numeric order of their keys). The point is that the wftk will retrieve them in the sort order that you specified, without needing any additional prompting from you.

To start the program, then, connect to the repository and define a list of rules to be retrieved; then use the query method to retrieve them. Using the keys method to iterate along the rules, define a wftk XML object for each retrieved record to make it available to work with. An XML object is the standard record storage structure of the wftk, so after you’ve defined the object, you can address named fields directly, making it easy to work with list query results. Now, proceed to iterate through the keys and process each rule in turn. After telling the user which rule is active, a second query is built to retrieve all of the documents matching the rule, and those documents are deleted. It’s as simple as that.
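The same control flow can be tried without a wftk installation. The sketch below is an approximation under stated assumptions: sqlite3 replaces MySQL, the docindex table is reduced to two columns, and the age_days column is a hypothetical simplification of the to_days arithmetic used in the real rules.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table rules (id integer primary key, sort integer, name text, rule text)")
db.execute("create table docindex (id integer primary key, created_by text, age_days integer)")
db.executemany("insert into rules (sort, name, rule) values (?, ?, ?)", [
    (1, "files by me", "created_by='me' and age_days > 4"),
    (0, "files by you", "created_by='you' and age_days > 3"),
])
db.executemany("insert into docindex (created_by, age_days) values (?, ?)", [
    ("me", 1), ("me", 6), ("you", 2),
])

deleted = []
rules = db.execute("select name, rule from rules order by sort").fetchall()
for name, where in rules:
    print("Running rule '%s'" % name)
    # The WHERE clause comes from a trusted rules table, never from user input.
    keys = [row[0] for row in db.execute("select id from docindex where " + where)]
    for key in keys:
        print(" -- Deleting document %s" % key)
        db.execute("delete from docindex where id = ?", (key,))
        deleted.append(key)
```

Storing raw WHERE clauses keeps the program tiny, but it also means anyone who can edit the rules table can run arbitrary conditions, so access to that table should be restricted.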

The python-ldap Module

Now that you’ve gotten your feet wet with enterprise-style document management programming, look at the second open-source project of the chapter, OpenLDAP and the python-ldap module.

As mentioned earlier, OpenLDAP is a directory service that comes in a packaged form. It’s a convenient package because it runs well under Linux or Windows, and it implements the standard LDAP protocol. LDAP in general is a very solid, well-understood protocol for directory access, and it’s the long-term strategy for many key computer platforms, including Microsoft’s Active Directory, Sun’s SunOne directory server, and offerings from other vendors as well. LDAP, as an infrastructure component, is not going to go away soon, and using it from Python is incredibly simple.

When working through the LDAP examples, you have two options. You can use an existing LDAP server in your organization to try scripts out on (in this case, you’ll need to modify the scripts to match your server’s schema, authentication, and other values that will already be defined for you), or you can set up a test LDAP server, load some data into it, and use that to work with. The first Try It Out that follows explains how to set up your own LDAP server for these examples, if that’s the way you want to go, and it’s a good exercise to help you understand a little more about how LDAP works.

Unfortunately, although there are high-quality precompiled Windows distributions for the OpenLDAP server software itself, there are currently no recent Windows builds of the client software used by the python-ldap modules, and no interfaces from python-ldap to the standard Microsoft LDAP client in wldap32.dll. Therefore, for all of these LDAP examples, as of the time this book is published, you’ll need to have a Linux or Unix system to build an OpenLDAP instance. Fortunately, if you are a Windows user, you can use the cygwin toolkit to create an environment that you can use to build and run OpenLDAP. For instructions on downloading and installing OpenLDAP and the python-ldap module, see the web site for this book.

Try It Out

Using Basic OpenLDAP Tools

1. After you’ve downloaded and installed an OpenLDAP package and have followed this book’s web site instructions for how to set up a basic server, make sure that the domain name it serves in the slapd.conf file is “wftk.org” if you want to use the following examples without modifying them. When OpenLDAP is running on your system, use a text editor to create the following LDIF file anywhere you want:

# Add a simple, generic user
dn: cn=Different Person,dc=wftk,dc=org
objectClass: person
sn: Different Person
cn: Different Person

# Add another user
dn: cn=Michael Roberts,dc=wftk,dc=org
objectClass: person
sn: Roberts
cn: Michael Roberts

# Add a workflow group: wfstarter
dn: cn=wfstarter,dc=wftk,dc=org
objectclass: organizationalRole
cn: wfstarter
roleOccupant: cn=Michael Roberts
roleOccupant: cn=Different Person

2. Save the file as testldif.txt, and then use ldapadd to add the data you just entered:

[michael@me michael]$ ldapadd -x -D "cn=Manager,dc=wftk,dc=org" -W -f testldif.txt
Enter LDAP Password:
adding new entry "cn=Different Person,dc=wftk,dc=org"
adding new entry "cn=Michael Roberts,dc=wftk,dc=org"
adding new entry "cn=wfstarter,dc=wftk,dc=org"

3. Now, use ldapsearch to see what happened (note the command on the first line):

[michael@me michael]$ ldapsearch -x -b 'dc=wftk,dc=org' '(objectclass=*)'
# extended LDIF
# LDAPv3
# base <dc=wftk,dc=org> with scope sub
# filter: (objectclass=*)
# requesting: ALL
#

# Different Person, wftk.org
dn: cn=Different Person,dc=wftk,dc=org
objectClass: person
sn: Different Person
cn: Different Person

# Michael Roberts, wftk.org
dn: cn=Michael Roberts,dc=wftk,dc=org
objectClass: person
sn: Roberts
cn: Michael Roberts

# wfstarter, wftk.org
dn: cn=wfstarter,dc=wftk,dc=org
objectClass: organizationalRole
cn: wfstarter
roleOccupant: cn=Michael Roberts
roleOccupant: cn=Different Person

# search result
search: 2
result: 0 Success

# numResponses: 4
# numEntries: 3

How It Works

The LDIF format (LDAP Data Interchange Format) is defined in RFC 2849 and is the standard text-based format for defining data and dumping data to and from LDAP databases. It can even specify other arbitrary data changes, such as changes to and deletions of records. An LDIF file is divided into records by blank lines, and the first line of each record gives the distinguished name (the DN, or the key) of the record affected. The default operation is addition, and the file defined previously simply adds a few test records to an otherwise empty database. Use this test database for the rest of the examples.
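The record structure just described is simple enough to read with a few lines of Python. The parse_ldif function below is a hypothetical, deliberately minimal sketch: it handles only plain “name: value” lines and comments, and it ignores parts of RFC 2849 such as folded lines, base64 values, and change records.

```python
def parse_ldif(text):
    """Split simple LDIF text into records: a list of (dn, attributes) pairs."""
    records = []
    for block in text.strip().split("\n\n"):   # blank lines separate records
        dn, attrs = None, {}
        for line in block.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip empty lines and comments
            name, _, value = line.partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "dn" and dn is None:
                dn = value                      # first line names the record
            else:
                attrs.setdefault(name.lower(), []).append(value)
        if dn is not None:
            records.append((dn, attrs))
    return records

sample = """# Add another user
dn: cn=Michael Roberts,dc=wftk,dc=org
objectClass: person
sn: Roberts
cn: Michael Roberts

# Add a workflow group: wfstarter
dn: cn=wfstarter,dc=wftk,dc=org
objectclass: organizationalRole
cn: wfstarter
roleOccupant: cn=Michael Roberts
roleOccupant: cn=Different Person
"""

records = parse_ldif(sample)
for dn, attrs in records:
    print(dn, sorted(attrs))
```

Attribute names are lowercased because LDAP treats them as case-insensitive, and values are kept in lists because an attribute such as roleOccupant may appear more than once.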

The ldapadd utility is used to interpret LDIF files and make the LDAP API calls appropriate to carry out the instructions they contain. In addition, the ldapsearch utility can be used to search the database from the command line and format the results in a more or less LDIF format. These are handy tools to have at your disposal when you’re working with LDAP, but to do any more involved work, you’ll want to write your own code in Python, and that’s what the next example is all about.
