
Beginning Python (2005)


Web Applications and Web Services

app.cgi) or a standard string prepended to the PATH_INFO of a resource identifier (app.cgi/api/foo instead of app.cgi/foo). The PATH_INFO solution yields nicer-looking resource identifiers, but BittyWiki’s REST web service will be implemented as a separate CGI, just because it’s easier to present.

One final note with respect to PUT and DELETE. Web services are free from dependence on HTML forms. While PUT and DELETE requests can’t be sent from an HTML form in a web browser, they are supported by many (but not all) programmable clients. We could simplify the preexisting BittyWiki interface a little by bringing in PUT and DELETE. Doing this would let us get rid of the operation argument, which is used only to distinguish a PUT- or POST-style POST request from a DELETE-style POST request. However, for the sake of correspondence with the web application, and because not all programmable clients support PUT and DELETE, the BittyWiki REST web service won’t take this route.

The second thing to consider is which features of the web application it makes sense to expose through an external API. Why would someone want programmatic access to the contents of a wiki? A wiki’s users might create two types of robot:

A robot that modifies or creates wiki pages — for instance, an automated test system that posts a daily status report to a particular wiki page

A robot that retrieves wiki pages — to archive or mirror a wiki or to render wiki pages to an end-user in some format besides HTML

The first type of robot might need to create, edit, and delete a wiki page. That functionality can remain more or less intact, but unlike in a web application, there’s no need to present a nice-looking document after taking a requested action. All the robot needs to know is whether or not its request was carried out. The document returned for a POST operation need only contain a status message.

Both types of robot need to retrieve pages from the wiki. What they actually need, though, is not the HTML rendering of the page (the thing you get when you GET /bittywiki.cgi/PageName), but the raw page data (the thing that shows up in the edit box when you GET /bittywiki.cgi/PageName?operation=write). The first type of robot needs the data in this format because it’s going to modify the page and send it back, and the raw data is how the page is stored on the back-end (which is why it’s what shows up in the edit box). The second type of robot needs it in this format because it’s going to do its own archiving or rendering, and it’s easier to work from the raw data than from HTML.

BittyWiki’s REST API for robots is therefore basically similar to the REST API for web browsers. The only difference is the format of the responses: Instead of human-readable HTML documents, the REST web service outputs plaintext documents. A more complicated REST web service, like Amazon’s, would probably output documents formatted in XML or sparse HTML, expecting the client to parse them. Here’s the plaintext result of GETting http://localhost:8000/cgi-bin/bittywiki-rest.cgi; compare it to the HTML output when you GET http://localhost:8000/cgi-bin/bittywiki.cgi:

This is the home page for my BittyWiki installation.

Here you can learn about the philosophy and technologies that drive web applications: REST, CGI, and the PythonLanguage.

The structure of bittywiki-rest.cgi is also similar to bittywiki.cgi:


Chapter 21

#!/usr/bin/python
import cgi
import cgitb
cgitb.enable()
import os
import re

from BittyWiki import Wiki, Page, NotWikiWord

class WikiRestApiCGI:

    #The possible operations on a wiki page.
    VIEW = ''
    WRITE = 'write'
    DELETE = 'delete'

    #The possible response codes this application might return.
    RESPONSE_CODES = { 200 : 'OK',
                       400 : 'Bad Request',
                       404 : 'Not Found'}

    def __init__(self, wikiBase):
        "Initialize with the given wiki."
        self.wiki = Wiki(wikiBase)

    def run(self):
        """Determine the command, dispatch to the appropriate handler, and
        print the results as a plaintext document."""
        toDisplay = None
        try:
            page = os.environ.get('PATH_INFO', '')
            if page:
                page = page[1:]
            page = self.wiki.getPage(page)
        except NotWikiWord, badName:
            toDisplay = 400, '"%s" is not a valid wiki page name.' % badName

        if not toDisplay:
            form = cgi.FieldStorage()
            operation = form.getfirst('operation', self.VIEW)
            operationMethod = self.OPERATION_METHODS.get(operation)
            if operationMethod:
                if not page.exists() and operation != self.WRITE:
                    toDisplay = 404, 'No such page: "%s"' % page.name
                else:
                    toDisplay = operationMethod(self, page, form)
            else:
                toDisplay = 400, '"%s" is not a valid operation.' % operation

        #Print the response.
        responseCode, payload = toDisplay
        print 'Status: %s %s' % (responseCode, self.RESPONSE_CODES.get(responseCode))
        print 'Content-type: text/plain\n'
        print payload


The main code figures out the resource and the desired operation and hands this off (along with any provided representation) to a handler method. The result is then rendered — but this time as plaintext:

    def viewOperation(self, page, form=None):
        "Returns the raw text of the given wiki page."
        return 200, page.getText()

    def writeOperation(self, page, form):
        "Writes the specified page."
        page.text = form.getfirst('data')
        page.save()
        return 200, "Page saved."

    def deleteOperation(self, page, form=None):
        "Deletes the specified page."
        if not page.exists():
            toDisplay = 404, "You can't delete a page that doesn't exist."
        else:
            page.delete()
            toDisplay = 200, "Page deleted."
        return toDisplay

    #A registry mapping 'operation' keys to methods that perform the operations.
    OPERATION_METHODS = { VIEW : viewOperation,
                          WRITE: writeOperation,
                          DELETE: deleteOperation }

The three operation handler methods are also similar to their counterparts in bittywiki.cgi, though simpler because they produce less data.
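To see what the CGI emits without running a web server, here is a minimal Python 3 sketch that mimics run(): it looks up a handler in a dispatch table keyed the same way as OPERATION_METHODS and formats the Status and Content-type lines. The in-memory `pages` dict and the function names are hypothetical stand-ins for BittyWiki’s Wiki and Page classes.

```python
# Sketch of WikiRestApiCGI's dispatch-and-respond logic. The `pages` dict is a
# hypothetical in-memory stand-in for BittyWiki's file-backed Wiki.
RESPONSE_CODES = {200: "OK", 400: "Bad Request", 404: "Not Found"}

pages = {"HomePage": "This is the home page for my BittyWiki installation."}

def view_operation(name, form):
    """Return the raw text of the page."""
    return 200, pages[name]

def write_operation(name, form):
    """Store the 'data' field as the page's new text."""
    pages[name] = form.get("data", "")
    return 200, "Page saved."

def delete_operation(name, form):
    """Remove the page."""
    del pages[name]
    return 200, "Page deleted."

# Same keys as OPERATION_METHODS: the empty string means "view".
OPERATION_METHODS = {"": view_operation,
                     "write": write_operation,
                     "delete": delete_operation}

def handle(name, form):
    """Dispatch one request and return the full CGI output as a string."""
    operation = form.get("operation", "")
    method = OPERATION_METHODS.get(operation)
    if method is None:
        code, payload = 400, '"%s" is not a valid operation.' % operation
    elif name not in pages and operation != "write":
        code, payload = 404, 'No such page: "%s"' % name
    else:
        code, payload = method(name, form)
    # A CGI script reports its status through a Status header; a blank line
    # separates the headers from the plaintext body.
    return "Status: %d %s\nContent-type: text/plain\n\n%s\n" % (
        code, RESPONSE_CODES[code], payload)

print(handle("HomePage", {}))
print(handle("NoSuchPage", {}))
print(handle("FooDesign", {"operation": "write", "data": "Foo design notes"}))
```

Because the handler only returns a (code, payload) pair, the same handlers could sit behind an HTML front end or a plaintext one; only the final formatting step differs.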

Wiki Search-and-Replace Using the REST Web Service

What good is this web service for BittyWiki? Well, here’s an only slightly contrived example: Suppose that you get someone to host a BittyWiki installation for an open-source project you’re working on, called Foo. You create a lot of wiki pages that mention the name of the project in their text (“Foo is a triphasic degausser for semantic defribulation”) and in the titles of the pages (BenefitsOfFoo, FooDesign, etc.). All is going well until one day when you decide to change the name of your project to Bar. It would take a long time to manually change those wiki pages (including renaming many of them), and you don’t have access to the server on which the wiki is actually hosted, so you can’t write a script to crawl the file system. What do you do?

Here’s a Python script, WikiSpiderREST.py, which acts as a wiki search-and-replace spider. Starting at the HomePage of the wiki (which is a WikiWord), it crawls the wiki by following WikiWord links, and replaces all of the instances of one string (e.g., “Foo”) with another string (e.g., “Bar”). A page whose name contains the old string (e.g., “FooDesign”) is deleted and recreated under a different name (e.g., “BarDesign”). WikiSpiderREST.py keeps track of the pages it has processed so as not to waste time or get stuck in a loop:


#!/usr/bin/python
import re
import urllib

class WikiReplaceSpider:
    "A class for running search-and-replace against a web of wiki pages."

    WIKI_WORD = re.compile('(([A-Z][a-z0-9]*){2,})')

    def __init__(self, restURL):
        "Accepts a URL to a BittyWiki REST API."
        self.api = BittyWikiRestAPI(restURL)

    def replace(self, find, replace):
        """Spider wiki pages starting at the front page, accessing them and
        changing them via the provided API."""
        processed = {} #Keep track of the pages already processed.
        todo = ['HomePage'] #Start at the front page of the wiki.
        while todo:
            for pageName in todo:
                print 'Checking "%s"' % pageName
                try:
                    pageText = self.api.getPage(pageName)
                except RemoteApplicationException, message:
                    if str(message).find("No such page") == 0:
                        #Some page mentioned a WikiWord that doesn't exist
                        #yet; not a big deal.
                        pass
                    else:
                        #Some other problem; pass it on up.
                        raise
                else:
                    #This page actually exists; process it.

                    #First, find any WikiWords in this page: they may
                    #reference other existing pages.
                    for wikiWord in self.WIKI_WORD.findall(pageText):
                        linkPage = wikiWord[0]
                        if not processed.get(linkPage) and linkPage not in todo:
                            #We haven't processed this page yet: put it on
                            #the to-do list.
                            todo.append(linkPage)

                    #Run the search-and-replace on the page text to get the
                    #new text of the page.
                    newText = pageText.replace(find, replace)

                    #Check to see if this page name matches
                    #search and replace. If it does, delete it and
                    #recreate it with the new text; otherwise, just
                    #save the new text.
                    newPageName = pageName.replace(find, replace)
                    if newPageName != pageName:
                        print ' Deleting "%s", will recreate as "%s"' \
                              % (pageName, newPageName)
                        self.api.delete(pageName)
                    if newPageName != pageName or newText != pageText:
                        print ' Saving "%s"' % newPageName
                        self.api.save(newPageName, newText)

                    #Mark the new page as processed so we don't go through
                    #it a second time.
                    if newPageName != pageName:
                        processed[newPageName] = True
                processed[pageName] = True
                todo.remove(pageName)

So far, there’s been nothing REST-specific except the reference to a BittyWikiRestAPI class. That’s about to change as we go ahead and define that class, as well as others that implement a general Python interface to the BittyWiki REST API:

class BittyWikiRestAPI:
    "A Python interface to the BittyWiki REST API."

    def __init__(self, restURL):
        "Do all the work starting from the base URL of the REST interface."
        self.base = restURL

    def getPage(self, pageName):
        "Returns the raw markup of the named wiki page."
        return self._doGet(pageName)

    def save(self, pageName, data):
        "Saves the given data to the named wiki page."
        return self._doPost(pageName, { 'operation' : 'write',
                                        'data' : data })

    def delete(self, pageName):
        "Deletes the named wiki page."
        return self._doPost(pageName, { 'operation' : 'delete' })

    def _doGet(self, pageName):
        """Does a generic HTTP GET. Returns the response body, or throws
        an exception if the response code indicates an error."""
        url = self._makeURL(pageName)
        return self.Response(urllib.urlopen(url)).body

    def _doPost(self, pageName, data):
        """Does a generic HTTP POST. Returns the response body, or throws
        an exception if the response code indicates an error."""
        url = self._makeURL(pageName)
        return self.Response(urllib.urlopen(url, urllib.urlencode(data))).body

    def _makeURL(self, pageName):
        "Returns the URL to the named wiki page."
        url = self.base
        if url[-1] != '/':
            url += '/'
        return url + pageName

    class Response:
        """This class handles the HTTP response returned by the REST web
        service."""

        def __init__(self, inHandle):
            self.body = None
            statusCode = None

            info = inHandle.info()

            #The status has automatically been read into an object
            #that also contains all the HTTP headers. The status
            #string looks like '200 OK'
            statusHeader = info['status']
            statusCode = int(statusHeader.split(' ')[0])

            #The remaining data is the plain-text response. In a more
            #complex application, this might be structured text or
            #XML, and at this point it would need to be parsed.
            self.body = inHandle.read()

            #The response codes in the 2xx range are the only good
            #ones. Getting any other response code should result in
            #an exception.
            if statusCode // 100 != 2:
                raise RemoteApplicationException, self.body

class RemoteApplicationException(Exception):
    """A simple exception class for use when the REST API returns an
    error condition."""
    pass

The BittyWikiRestAPI class uses the urllib library to GET and POST things to BittyWiki’s REST interface CGI. It interprets the response as a status message, an exception message, or the text of a requested page. This class could be distributed in a standalone module to encourage developers to write BittyWiki add-ons in Python.
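In Python 3 the urllib functions this class relies on moved: urllib.urlencode became urllib.parse.urlencode, and urllib.urlopen became urllib.request.urlopen. The sketch below reproduces only the offline parts of the class (building the page URL, encoding a POST body, and checking a status string such as '200 OK') so they can be tested without a server. The function names are hypothetical; they are not part of BittyWiki.

```python
from urllib.parse import urlencode

class RemoteApplicationException(Exception):
    """Raised when the REST service reports a non-2xx status."""

def make_url(base, page_name):
    """Join the REST base URL and a page name, like _makeURL."""
    if not base.endswith("/"):
        base += "/"
    return base + page_name

def encode_post(fields):
    """Encode form fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields).encode("ascii")

def check_status(status_header, body):
    """Parse a status string like '404 Not Found'; raise on non-2xx codes."""
    code = int(status_header.split(" ")[0])
    if code // 100 != 2:
        raise RemoteApplicationException(body)
    return body

base = "http://localhost:8000/cgi-bin/bittywiki-rest.cgi"
print(make_url(base, "FooDesign"))
print(encode_post({"operation": "write", "data": "Bar is great"}))
```

Passing the encoded bytes as the data argument of urllib.request.urlopen turns the request into a POST, just as passing a string to the old urllib.urlopen did.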

Note that the Response class is defined within the BittyWikiRestAPI class: No one else is going to use it, and nesting it there keeps it out of the module’s top-level namespace (it’s still reachable as BittyWikiRestAPI.Response). This is completely optional, but it makes the top-level view neater.

Finally, some code that implements a command-line interface to the spider:

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 4:
        restURL, find, replace = sys.argv[1:]
    else:
        print 'Usage: %s [URL to BittyWiki REST API] [find] [replace]' \
              % sys.argv[0]
        sys.exit(1)
    WikiReplaceSpider(restURL).replace(find, replace)


Try It Out

Wiki Searching and Replacing

Use your BittyWiki installation to create a few wiki pages around a particular topic. In the example shown in Figure 21-9, a few pages have been created for the mythical Foo project.

Figure 21-9

Run the WikiSpiderREST.py command to change your topic to another one. You should see output similar to this:

$ python WikiSpiderREST.py http://localhost:8000/cgi-bin/bittywiki-rest.cgi Foo Bar
Checking "HomePage"
 Saving "HomePage"
Checking "FooCaseStudies"
 Deleting "FooCaseStudies", will recreate as "BarCaseStudies"
 Saving "BarCaseStudies"
Checking "CVSRepository"
 Saving "CVSRepository"
Checking "CaseStudy2"
Checking "BenefitsOfFoo"
 Deleting "BenefitsOfFoo", will recreate as "BenefitsOfBar"
 Saving "BenefitsOfBar"
Checking "CaseStudy1"
 Saving "CaseStudy1"
Checking "FooDesign"
 Deleting "FooDesign", will recreate as "BarDesign"
 Saving "BarDesign"

Lo and behold: The wiki pages have been changed and, where necessary, renamed (see Figure 21-10).

How It Works

WikiSpiderREST.py keeps a list of WikiWords to check and possibly subject to search-and-replace. To process one of the WikiWords, it retrieves the corresponding page through the BittyWiki web service API. If the page actually exists, its text is scanned, and all of its WikiWords are put on the list of items to check later.
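The scanning step relies on a detail of re.findall: because WIKI_WORD contains two groups, findall returns a tuple for each match, with the whole WikiWord first and the last capitalized chunk second. That is why replace() reads wikiWord[0]. A quick Python 3 check (the sample sentence is made up):

```python
import re

# Same pattern as WikiReplaceSpider.WIKI_WORD.
WIKI_WORD = re.compile(r"(([A-Z][a-z0-9]*){2,})")

text = "See FooDesign, HomePage and CVSRepository. Foo alone is not a link."
matches = WIKI_WORD.findall(text)
print(matches)

# Take the first element of each tuple, as replace() does with wikiWord[0].
print([m[0] for m in matches])

# With a non-capturing inner group, findall returns plain strings instead.
WIKI_WORD_FLAT = re.compile(r"(?:[A-Z][a-z0-9]*){2,}")
print(WIKI_WORD_FLAT.findall(text))
```

Single capitalized words such as "See" and "Foo" don’t match, since the pattern needs at least two capital-letter chunks.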

The page then has its text modified using string search-and-replace, and is saved through the web service API. If the page name contains the string to be replaced, it’s deleted and a new page with the same content is created — again, through the web service API. The next WikiWord in the list is then checked, and so on.
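The same crawl can be tried without a web server. This Python 3 sketch runs the search-and-replace spider against a dict standing in for the REST API; the `wiki` dict and the `spider_replace` name are hypothetical. It uses a deque plus a `seen` set rather than removing items from the list being iterated over, which is a design choice of this sketch, not the book’s loop.

```python
import re
from collections import deque

# Same WikiWord pattern as WikiReplaceSpider.WIKI_WORD.
WIKI_WORD = re.compile(r"(([A-Z][a-z0-9]*){2,})")

def spider_replace(wiki, find, replace, start="HomePage"):
    """Crawl `wiki` (a dict of page name -> raw text) from `start`,
    replacing `find` with `replace` in page text and page names."""
    todo = deque([start])
    seen = {start}
    while todo:
        name = todo.popleft()
        text = wiki.get(name)
        if text is None:
            # A WikiWord naming a page that doesn't exist yet.
            continue
        # Queue every WikiWord in this page that we haven't seen.
        for match in WIKI_WORD.findall(text):
            link = match[0]
            if link not in seen:
                seen.add(link)
                todo.append(link)
        new_text = text.replace(find, replace)
        new_name = name.replace(find, replace)
        if new_name != name:
            # "Renaming" means deleting the old page and saving a new one.
            del wiki[name]
            seen.add(new_name)
        if new_name != name or new_text != text:
            wiki[new_name] = new_text
    return wiki

wiki = {
    "HomePage": "Welcome. See FooDesign and BenefitsOfFoo.",
    "FooDesign": "Foo is built on CaseStudy1.",
    "BenefitsOfFoo": "Foo is fast.",
    "CaseStudy1": "A case study.",
    "OrphanFoo": "No page links here.",
}
spider_replace(wiki, "Foo", "Bar")
print(sorted(wiki))
```

OrphanFoo keeps its old name and text, because no page reachable from HomePage links to it.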


Figure 21-10

Because WikiSpiderREST.py has no knowledge of wiki pages that are inaccessible from the HomePage, it’s not guaranteed to get all of the pages on the wiki. It only gets the ones a human user would see if they started at the HomePage and clicked all of the links.

XML-RPC

XML-RPC is a protocol that does the same job as REST: It makes it easy to write a robot that accesses and/or modifies some remote application just by making HTTP requests. There are some important differences, though. Whereas a REST call looks like manipulation of a document repository, an XML-RPC call looks like a function call (in fact, in Python implementations the call to the web service is disguised as a function call). Instead of sending a GET or POST to the resource you want to retrieve or modify, as with REST, XML-RPC traditionally has you do all your calls by POSTing to one special “server” resource. The data you POST contains an XML representation of a function you’d like to call, and any arguments to that function. As with REST, the response to your call is a document containing any information you requested, any status messages, and so on.

BittyWiki is simple enough that everything you pass in or get out is a mere string. We’re fortunate in this regard because strings are the only data type supported by REST. If you need to pass an integer into a REST application, you need to encode it as a string and trust that the resource handler will know to turn it back into an integer. If you need to pass in an ordered list, you need to learn the server’s preferred way of representing an ordered list as a string. One REST application might represent lists as “item1,item2,item3”; another might represent them as “item1|item2|item3|”; a third might represent them as a custom-defined XML data structure. The major shortcoming of REST is that there’s no standard way of marshalling different data types into strings, or of unmarshalling a string into typed data. You need to relearn the request and response format for every REST web service you use.
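To make the marshalling problem concrete: a client talking to two REST services that use the two list formats mentioned above needs a separate hand-written parser for each. A Python 3 sketch (both formats are the hypothetical examples from the paragraph):

```python
def parse_comma_list(body):
    """Parse 'item1,item2,item3'."""
    return body.split(",") if body else []

def parse_pipe_list(body):
    """Parse 'item1|item2|item3|' (every item is followed by a pipe)."""
    return body.split("|")[:-1] if body else []

print(parse_comma_list("item1,item2,item3"))
print(parse_pipe_list("item1|item2|item3|"))

# Using the wrong parser doesn't fail loudly; it silently gives a wrong result.
print(parse_comma_list("item1|item2|item3|"))
```

Nothing in either response tells the client which parser to use; that knowledge has to come from each service’s documentation.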

Here’s the canonical sample XML-RPC client application. The public XML-RPC server betty.userland.com provides some example methods, including one that returns the name of a U.S. state, given an index into an alphabetical list:

>>> import xmlrpclib
>>> server = xmlrpclib.ServerProxy("http://betty.userland.com")
>>> server.examples.getStateName(41)
'South Dakota'
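If the public server isn’t reachable, you can reproduce the session against a local one. This Python 3 sketch (in Python 3, xmlrpclib is named xmlrpc.client and SimpleXMLRPCServer lives in xmlrpc.server) registers a hypothetical examples.getStateName function on a free localhost port and calls it through ServerProxy the same way. The 1-based indexing is an assumption chosen to match the 41 → South Dakota result above.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Alphabetical list of U.S. states; position 1 is Alabama (assumed 1-based).
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]

def get_state_name(index):
    """Return the name of the state at 1-based position `index`."""
    if not 1 <= index <= len(STATES):
        raise ValueError("index out of range: %d" % index)
    return STATES[index - 1]

# Port 0 asks the OS for any free port; logRequests=False keeps stderr quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_state_name, "examples.getStateName")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The dotted name is sent as-is, so it matches the registered name.
with xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port) as proxy:
    state = proxy.examples.getStateName(41)
print(state)

server.shutdown()
server.server_close()
```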


If this were a REST web service, the forty-first state in the list would be accessible as a distinct resource, perhaps “http://betty.userland.com/StateNames/41”. You’d get the name of a state by GETting the appropriate resource. You might have access to a Python library that handles the request and response details (the way the PyAmazon library handles the details of Amazon Web Services), but such libraries need to be written anew for each REST web service, as there’s no REST standard for data structure representation.
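For comparison, here is what that hypothetical REST version (one resource per state, such as /StateNames/41) might look like as a tiny WSGI application in Python 3. The URL layout comes from the paragraph above; the app, its plaintext response, and the shortened state list are assumptions for illustration. The client gets back a bare string and must know by convention how to interpret it.

```python
from wsgiref.util import setup_testing_defaults

# Truncated to the first five states for brevity (hypothetical data source).
STATES = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]

def state_names_app(environ, start_response):
    """Serve GET /StateNames/<n> as a plaintext state name (1-based n)."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "StateNames" and parts[1].isdigit():
        index = int(parts[1])
        if 1 <= index <= len(STATES):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [STATES[index - 1].encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"No such resource."]

def get(path):
    """Call the app directly, without a server, and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(state_names_app(environ, start_response))
    return captured["status"], body.decode("utf-8")

print(get("/StateNames/2"))
print(get("/StateNames/41"))
```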

XML-RPC’s main advantage over REST is that it provides a standard way of encoding simple data structures into request and response data. XML-RPC specifies different XML strings for encoding the integer 4, the floating-point value 4.0, the string “4”, and a list containing only the string “4”. What you get back from an XML-RPC call is not a document that you have to parse, but a description of a data structure that can be automatically created for you by xmlrpclib, the XML-RPC library that comes with Python. It’s possible to make any kind of XML-RPC call using just one library (xmlrpclib).
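You can see those distinct encodings without a network connection by asking the library to marshal a parameter tuple directly. This sketch uses Python 3’s xmlrpc.client (the renamed xmlrpclib), whose dumps and loads functions do the marshalling; the method name demo.echo is made up.

```python
import xmlrpc.client

values = (4, 4.0, "4", ["4"])
request = xmlrpc.client.dumps(values, methodname="demo.echo")
print(request)

# loads() returns (params, methodname); each value comes back with its type.
params, method = xmlrpc.client.loads(request)
print(method, [type(v).__name__ for v in params])
```

The integer is sent as an int element, the float as a double, the string as a string, and the list as an array, so the receiving side can rebuild each value with its original type.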

By now, you’ll have noticed that Python is not very fastidious about types, and it will work with you on transforming one type to another. That said, its built-in types cover just about everything for which XML-RPC defines a representation: Booleans, integers, floating-point numbers, strings, arrays, and dictionaries. For binary data and dates, xmlrpclib provides wrapper classes (Python got date/time object support in version 2.3, but xmlrpclib hasn’t yet been updated to use it).
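The Binary and DateTime wrapper classes are still present in Python 3’s xmlrpc.client. A short sketch of marshalling raw bytes and a timestamp with them (the values and the demo.store method name are arbitrary); later Python versions can also hand these values back as plain bytes and datetime objects through the use_builtin_types flag.

```python
import datetime
import xmlrpc.client

# Arbitrary sample values: three raw bytes and a timestamp.
blob = xmlrpc.client.Binary(b"\x00\x01\xff")
when = xmlrpc.client.DateTime(datetime.datetime(2005, 1, 31, 12, 30, 0))

payload = xmlrpc.client.dumps((blob, when), methodname="demo.store")
print(payload)

# By default the values come back wrapped in Binary and DateTime...
(blob_back, when_back), _ = xmlrpc.client.loads(payload)
print(blob_back.data, when_back.value)

# ...while use_builtin_types=True (Python 3.3+) yields bytes and datetime.
builtin, _ = xmlrpc.client.loads(payload, use_builtin_types=True)
print(builtin)
```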

The XML-RPC spec, at www.xmlrpc.com/spec, is short and sweet.

XML-RPC Quick Start: Get Tech News from Meerkat

Meerkat is a public web application that aggregates technology news from hundreds of weblogs and news sites. It was one of the first web applications to expose a web service interface: first a REST-like interface and then an XML-RPC interface.

Meerkat’s XML-RPC interface is described at www.oreillynet.com/pub/a/rss/2000/11/14/meerkat_xmlrpc.html.

Meerkat’s API exposes access to three types of objects: channels (weblogs and news sites), categories (groupings of channels), and items (stories published by channels). Unfortunately, to use any of the functions that deal with channels or categories, you must do some legwork ahead of time to ascertain the numeric channel or category IDs. The most generally useful method is therefore getItems, a search function that tries to match your search criteria against Meerkat’s database of recently posted news items.

Here’s a simple script, MeerkatSummary.py, that takes a search criterion as input and determines which Meerkat channels have the most stories that match the search:

import xmlrpclib

class MeerkatSummary:
    """Lists channels that match a search term, in order of how many
    stories match."""

    SERVER_URL = 'http://www.oreillynet.com/meerkat/xml-rpc/server.php'

    def __init__(self):
        "Set up a reference to the Meerkat server."


        #Passing 'verbose=True' to the server constructor will make it
        #print the text of the request and response for each XML-RPC
        #call, letting you see the internal workings of the protocol.
        #verbose = True
        verbose = False
        server = xmlrpclib.ServerProxy(self.SERVER_URL, verbose=verbose)
        self.meerkat = server.meerkat

    def findChannels(self, searchTerm):
        "Given a search term, find out which channels have the most hits."
        channelTotals = {}
        items = self.meerkat.getItems({'search' : searchTerm,
                                       'channels' : True})
        for item in items:
            channel = item['channel']
            totalForChannel = channelTotals.get(channel, 0)
            totalForChannel += 1
            channelTotals[channel] = totalForChannel

        #Turn the map into a list of (matches, channel name) tuples, and sort it.
        totalAndChannel = [(a,b) for b,a in channelTotals.items()]
        totalAndChannel.sort()
        totalAndChannel.reverse()

        print 'Meerkat report for "%s":' % searchTerm
        for total, channel in totalAndChannel:
            print "%2d %s" % (total, channel)

The actual web service call is self.meerkat.getItems, on the third line of MeerkatSummary.findChannels. If you blink you’ll miss it, because as far as Python is concerned, it’s just another method call, albeit one that’s implemented differently than a local method call. xmlrpclib’s ServerProxy defines a __getattr__ method that turns meerkat.getItems into a callable object; calling that object sends the XML-RPC request for getItems and unmarshals the response.
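The attribute trick can be sketched in a few lines of Python 3: __getattr__ builds up a dotted method name and returns a callable object, and only calling that object does any work. In this toy the class names are hypothetical, and the "send" step records the call instead of POSTing XML over HTTP as ServerProxy does.

```python
class _Call:
    """A callable that remembers its dotted method name."""

    def __init__(self, send, name):
        self._send = send
        self._name = name

    def __getattr__(self, attr):
        # server.meerkat.getItems: each attribute access extends the name.
        return _Call(self._send, "%s.%s" % (self._name, attr))

    def __call__(self, *args):
        # Only now is anything "sent".
        return self._send(self._name, args)

class RecordingProxy:
    """Stands in for ServerProxy; _send is where an HTTP POST would go."""

    def __init__(self):
        self.calls = []

    def _send(self, name, args):
        self.calls.append((name, args))
        return "result of %s" % name

    def __getattr__(self, attr):
        # Called only for attributes the proxy doesn't actually have.
        return _Call(self._send, attr)

server = RecordingProxy()
meerkat = server.meerkat
print(meerkat.getItems({"search": "Python"}))
print(server.calls)
```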

The previous section’s WishListBargainFinder also hid the complexity of a web service behind a standard Python method: In that case, it was amazon.searchWishList that activated the REST web service. The difference is that someone had to write a Python method called “searchWishList” that made an AWS-specific REST request and processed the AWS-specific response. The getItems method is handled by xmlrpclib — there’s no special code for dealing with the Meerkat XML-RPC server, no need for an actual Python method called getItems:

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 2:
        print "Usage: %s [search term]" % sys.argv[0]
        sys.exit(1)
    else:
        MeerkatSummary().findChannels(sys.argv[1])

Run the script, and you’ll see a variety of news channels that have mentioned Python:

$ python MeerkatSummary.py Python
Meerkat report for "Python":
22 Freshmeat Daily News
10 Python URL (daily updates)
 8 Vaults of Parnassus
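The tallying and sorting in findChannels is a common pattern; from Python 2.7 on, collections.Counter handles it in one step. A Python 3 sketch with made-up items shaped like getItems results (only the 'channel' key is assumed, and the channel names are invented):

```python
from collections import Counter

# Hypothetical items shaped like getItems results; only 'channel' is used.
items = [
    {"channel": "Example Daily"},
    {"channel": "Sample Weekly"},
    {"channel": "Example Daily"},
    {"channel": "Demo Blog"},
    {"channel": "Sample Weekly"},
    {"channel": "Example Daily"},
]

# Counter replaces the get/increment/store loop in findChannels.
channel_totals = Counter(item["channel"] for item in items)

# most_common() sorts by count, highest first.
lines = ["%2d %s" % (total, channel)
         for channel, total in channel_totals.most_common()]
print("\n".join(lines))
```

One small difference: the original sorts (count, name) tuples and reverses them, so ties fall in reverse alphabetical order of channel name, whereas most_common() keeps tied channels in the order they were first seen.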
