Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Python (2005)

.pdf
Скачиваний:
177
Добавлен:
17.08.2013
Размер:
15.78 Mб
Скачать

Web Applications and Web Services

2 Python Cookbook

1 Zope org

1 SourceForge Project News

1 Python org latest headlines

1 NetBSD Packages

1 Linux Weekly News

1 Linux Today

1IceWalkers

1Developer Shed

James Joyce, who got lots of results from an Amazon Web Services query, doesn’t do as well here:

$ python MeerkatSummary.py “James Joyce” Meerkat report for “James Joyce”:

1kottke.org

1Beyond the Beyond

Note that Meerkat’s getItems method will never return more than 50 results; and unlike with Amazon Web Services, there’s no pagination interface that lets you get more. Any script you write that runs against Meerkat will be limited to delving into the recent past.

The XML-RPC Request

The XML-RPC request body is the body of an HTTP POST request. It’s an XML document containing a methodCall element. The methodCall element contains two elements of its own: methodName, which designates the method to be called; and params, which contains a list of the parameters to be passed as arguments into the method.

Here’s a sample XML-RPC request for a hypothetical method that sorts a list of numbers in either ascending or descending order:

<?xml version=”1.0”?> <methodCall>

<methodName>searchsort.sortList</methodName>

<params>

<param>

<value>

<array>

<data>

<value><i4>10</i4></value>

<value><i4>2</i4></value>

</data>

</array>

</param>

<param><value><boolean>1</boolean></param>

</params>

</methodCall>

This is the XML-RPC equivalent of invoking a hypothetical local method with the following code:

import searchsort

searchsort.sortList([10, 2], True)

511

TEAM LinG

Chapter 21

Given what you know about xmlrpclib, it’s no surprise that this method request would be generated and POSTed when you ran code like this:

import xmlrpclib

xmlrpclib.ServerProxy(“http://sortserver/RPC”).searchsort.sortList([10, 2], True)

Representation of Data in XML-RPC

The XML-RPC methodName can be any string, but XML-RPC methods are traditionally grouped into named packages, such as searchsort in the preceding example. In a Python implementation, this makes it look like a module called searchsort that contains the functions to expose, like sortList.

XML-RPC parameters can be any of the following types:

 

Data Type

Sample XML-RPC Representation

 

 

 

 

 

 

Boolean True or False

<boolean>1</boolean>

 

 

A string

<string>James Joyce</string>

 

 

An integer

<i4>10</i4>

 

 

A floating-point number

<double>5.1</double>

 

 

An array (items can be of

<array>

 

 

any type, or mixed type)

<data>

 

 

 

 

 

 

<value><i4>10</i4></value>

 

 

 

<value><i4>2</i4></value>

 

 

 

</data>

 

 

 

</array>

 

 

A dictionary (keys must be

<struct>

 

 

strings; values can be any type)

<member>

 

 

 

 

 

 

<name>search</name>

 

 

 

<value><string>James Joyce</string></value>

 

 

 

</member>

 

 

 

<member>

 

 

 

<name>channels</name>

 

 

 

<value><boolean>1</boolean></value>

 

 

 

</member>

 

 

 

</struct>

 

512

TEAM LinG

 

 

Web Applications and Web Services

 

 

 

 

Data Type

Sample XML-RPC Representation

 

 

 

 

A date and time

<dateTime.iso8601>20050614T19:11:20 </dateTime.iso8601>

 

 

(Use xmlrpclib’s DateTime wrapper class, which can be

 

 

instantiated from a time tuple, seconds since epoch, etc.)

 

Binary data

<base64>AVRoaXMgaXMgYmluYXJ5IGRhdGEu</base64>

 

 

(Use xmlrpclib’s Binary wrapper class, which can be instan-

 

 

tiated from a string.)

 

 

 

Strongly typed languages can have problems with some of these: mixed-type arrays, for example. Dynamic languages like Python handle these in stride.

The XML-RPC Response

The body of the XML-RPC response is an XML document describing the return value of the function invoked by the request.

Assuming our hypothetical searchsort.sortList method does what it says, when invoked with the sample body given earlier it’ll return a response that looks like this:

<?xml version=”1.0”?> <methodResponse> <params>

<param>

<value>

<array>

<data>

<value><i4>2</i4></value>

<value><i4>10</i4></value>

</data>

</array>

</value>

</param>

</params>

</methodResponse>

The response has the same basic structure as the request, but it’s sparser. It’s missing a methodName element because it’s assumed you know which method you just called. It has a params element, just like the request; but whereas the request’s params element could contain any number of param children (the arguments to the method), the response list is only allowed to contain a single param child: the return value.

If Something Goes Wrong

A REST web service is expected to flag error conditions using HTTP status codes, in conjunction with error documents that describe the problem. As you might expect, XML-RPC does a similar thing in a more structured way.

513

TEAM LinG

Chapter 21

If an XML-RPC server can’t complete a request for any reason, it will return a response containing a fault, instead of one containing a return value in params. A sample fault response is as follows:

<?xml version=”1.0”?> <methodResponse> <fault>

<value>

<struct>

<member>

<name>faultCode</name>

<value><int>4</int></value>

</member>

<member>

<name>faultString</name>

<value><string>No such method: “searchSort.sortList”.</string></value> </member>

</struct>

</value>

</fault>

</methodResponse>

The fault element describes an XML-RPC struct (i.e., a Python dictionary) with two members: faultString, which contains a human-readable description of what went wrong, and faultCode, the equivalent to the HTTP status code used to signify failure in REST contexts (even an XML-RPC call that results in a fault response will have an HTTP status code of 200). The advantage of faultCodes is that you can define them as you please for whatever problems are specific to your application. The disadvantage is that, unlike with HTTP status codes, there’s no consensus as to what faultCodes mean. You’ll need to reach an understanding with your users about the meanings of your service’s faultCodes.

Within Python, a response with a fault corresponds to an xmlrpclib.Fault object, a subclass of Error. If you’re using Python’s XML-RPC libraries, you can just raise and catch exceptions normally, instead of having to worry about creating or parsing XML-RPC faults.

Exposing the BittyWiki API through XML-RPC

If you doubt that Python programmers are spoiled, consider this: Not only does the language come bundled with a library that makes it easy to write XML-RPC clients; it also comes bundled with an XML-RPC server along the lines of the bundled SimpleHTTPServer and SimpleCGIServer. As with the other Simple*Server classes, SimpleXMLRPCServer runs as a standalone web server on its own port. However, the XML-RPC functionality is also available as a CGI program that accepts HTTP POSTs in XML-RPC format. This is implemented in another class, CGIXMLRPCRequestHandler, the name of which probably has more consecutive capital letters than any other class name in the Python standard library.

CGIXMLRPCRequestHandler is only included with Python versions 2.3 and later. However, you should be able to use Python 2.3’s copy of SimpleXMLRPCServer.py with Python version 2.2. Just copy it into a local directory, give it a different name, and import it from 2.2. Remember, of course, when you upgrade to Python 2.3, to get rid of your copy and start using the “real” copy.

514

TEAM LinG

Web Applications and Web Services

Here’s a script, bittywiki-xmlrpc.cgi, that will expose the BittyWiki API either through an XMLRPC CGI (if you invoke it without command-line arguments, the way a CGI-enabled web server would) or through a standalone XML-RPC server (if you pass it the port to use on the command line):

If you’re using the EasyCGIServer presented earlier, or another server based on Python’s CGIHTTPServer, using this script as a CGI may not work for you. If you run into problems with the CGI, try using another web server, such as Apache, or running a standalone XML-RPC server instead of going through a CGI.

import sys

import SimpleXMLRPCServer from BittyWiki import Wiki

class BittyWikiAPI:

“””A simple wrapper around the basic BittyWiki functionality we want to expose to the API.”””

def __init__(self, wikiBase):

“Initialize a wiki located in the given directory.” self.wiki = Wiki(wikiBase)

def getPage(self, pageName):

“Returns the text of the given page.” page = self.wiki.getPage(pageName) if not page.exists():

raise NoSuchPage, page.name return page.getText()

def save(self, pageName, newText): “Saves a page of the wiki.”

page = self.wiki.getPage(pageName) page.text = newText

page.save()

return “Page saved.”

def delete(self, pageName): “Deletes a page of the wiki.”

page = self.wiki.getPage(pageName) if not page.exists():

raise NoSuchPage, pageName page.delete()

return “Page deleted.”

class NoSuchPage(Exception): pass

So far, nothing XML-RPC specific — just a nicely packaged interface to the three basic functions of the BittyWiki API. Next, we write a function that exposes those three functions to XML-RPC. There are two ways of doing this: You can register functions one at a time or register an object instance, which registers all that object’s methods at once. This example provides code for both ways of registering the methods, but the instance registration is commented out, because in earlier versions of Python it exposed a security vulnerability:

515

TEAM LinG

Chapter 21

def handlerSetup(handler, api):

“””This function registers the methods of the BittyWiki API as functions of an XML-RPC handler.”””

#Register the standard functions used by XML-RPC to advertise which methods #are available on a given server. handler.register_introspection_functions()

#Register the BittyWiki API methods as XML-RPC functions in the #’bittywiki’ namespace.

handler.register_function(api.getPage, ‘bittywiki.getPage’) handler.register_function(api.save, ‘bittywiki.save’) handler.register_function(api.delete, ‘bittywiki.delete’)

#Here’s a way of registering all three methods in one line of #code, by registering the API object itself. It’s commented out #because it exposes a security hole in older versions of Python #(specifically: 2.2, 2.3.4 and 2.4.0). See #http://www.python.org/security/PSF-2005-001/ for details.

#

#handler.register_instance(api)

#

#Note also that when you do this, the functions exposed to XML-RPC #will not have the ‘bittywiki.’ prefix: they’ll be named,

#e.g. “getPage” instead of “bittywiki.getPage”. If this disturbs #you, try something like this instead:

#class Container:

#pass

#container = Container() #container.bittywiki = api #handler.register_instance(container)

Finally, the script portion, which starts up either a standalone XML-RPC server that can serve any number of requests, or a CGI-based XML-RPC script, which serves only the current request:

if __name__ == ‘__main__’: WIKI_BASE = ‘wiki/’

api = BittyWikiAPI(WIKI_BASE) standalonePort = None

if len(sys.argv) > 1:

#The user provided a port number; that means they want to #run a standalone server.

standalonePort = sys.argv[1] try:

standalonePort = int(standalonePort)

except ValueError:

#Oops, that wasn’t a port number. Chide the user and exit. scriptName = sys.argv[0]

print ‘Usage:’

print ‘ “%s [port number]” to start a standalone server.’ \ % scriptName

print ‘ “%s” to invoke as a CGI.’ % scriptName sys.exit(1)

isStandalone = 1

print “Starting up standalone XML-RPC server on port %s.” \

516

TEAM LinG

Web Applications and Web Services

% standalonePort

handler = SimpleXMLRPCServer.SimpleXMLRPCServer\ ((‘localhost’, standalonePort))

else:

#No port number specified; this is a CGI invocation. handler = SimpleXMLRPCServer.CGIXMLRPCRequestHandler()

#The function registration step is identical for #SimpleXMLRPCServer and CGIXMLRPCRequestHandler. handlerSetup(handler, api)

if standalonePort: handler.serve_forever()

else: handler.handle_request()

Try It Out

Manipulating BittyWiki through XML-RPC

It’s now possible to make XML-RPC calls against BittyWiki from other machines and even other programming languages, just as we were earlier making XML-RPC calls against Meerkat (which is written in PHP).

In one window, start the standalone XML-RPC server (alternatively, make sure the web server that serves the XML-RPC CGI is running):

# python BittyWiki-XMLRPC.py 8001

Starting up standalone XML-RPC server on port 8001.

In another, start an interactive Python session:

>>>import xmlrpclib

>>>server = xmlrpclib.ServerProxy(“http://localhost:8001/”)

>>>#Alternatively, something like:

... #server = xmlrpclib.ServerProxy(“http://localhost/cgi-bin/bittywiki- xmlrpc.cgi”)

>>>bittywiki = server.bittywiki

>>>bittywiki.getPage(“CreatedByXMLRPC”) Traceback (most recent call last):

File “<stdin>”, line 1, in ?

...

raise Fault(**self._stack[0])

xmlrpclib.Fault: <Fault 1: ‘No such page:CreatedByXMLRPC’>

>>>bittywiki.save(“CreatedByXMLRPC”, “This page was created through the XML-RPC interface.”)

‘Page saved.’

>>>bittywiki.getPage(“CreatedByXMLRPC”)

‘This page was created through the XML-RPC interface.’

You’re using web services, but you didn’t have to write special client code or (except at the beginning, when you connected to the server) even be aware that you’re using web services. Of course, the changes you make to the wiki through this interface will also show up for people using the web application or BittyWiki’s REST-based web service.

517

TEAM LinG

Chapter 21

Wiki Search-and-Replace Using the XML-RPC Web Service

Remember WikiSpiderREST.py, the script that crawled BittyWiki pages using its REST API to perform search-and-replace operations? You had to write a custom class (BittyWikiRESTAPI) to construct the right URLs to use against the REST interface, and a custom XML parser to process the response documents you got in return. Of course, once you have written that stuff, it can be reused in any application that uses BittyWiki’s REST API, but the main selling point of XML-RPC is that such classes aren’t necessary: xmlrpclib handles everything. Let’s put that to the test by rewriting WikiSpiderREST.py as

WikiSpiderXMLRPC.py:

#!/usr/bin/python import re

import xmlrpclib

class WikiReplaceSpider:

“A class for running search-and-replace against a web of wiki pages.”

WIKI_WORD = re.compile(‘(([A-Z][a-z0-9]*){2,})’)

def __init__(self, rpcURL):

“Accepts a URL to a BittyWiki XML-RPC API.” server = xmlrpclib.ServerProxy(rpcURL) self.api = server.bittywiki

def replace(self, find, replace):

“””Spider wiki pages starting at the front page, accessing them and changing them via the XML-RPC API.”””

processed = {} #Keep track of the pages already processed. todo = [‘HomePage’] #Start at the front page of the wiki. while todo:

for pageName in todo:

print ‘Checking “%s”’ % pageName try:

pageText = self.api.getPage(pageName) except xmlrpclib.Fault, fault:

if fault.faultString.find(“No such page”) == 0: #We tried to access a page that doesn’t exist; #not a big deal.

pass else:

#Some other problem; pass it on up. raise xmlrpclib.Fault, fault

else:

#This page actually exists; process it.

#First, find any WikiWords in this page: they may #reference other pages.

for wikiWord in self.WIKI_WORD.findall(pageText): linkPage = wikiWord[0]

if not processed.get(linkPage) and linkPage not in todo: #We haven’t processed this page yet: put it on

#the to-do list. todo.append(linkPage)

518

TEAM LinG

Web Applications and Web Services

#Run the search-and-replace on the page text to get the #new text of the page.

newText = pageText.replace(find, replace)

#Check to see if this page name matches the search #string. If it does, delete it and recreate it #with the new text; otherwise, just save the new #text in the existing page.

newPageName = pageName.replace(find, replace) if newPageName != pageName:

print ‘ Deleting “%s”, will recreate as “%s”’ \

% (pageName, newPageName) self.api.delete(pageName)

if newPageName != pageName or newText != pageText: print ‘ Saving “%s”’ % newPageName

saveResponse = self.api.save(newPageName, newText) #Mark the new page as processed so we don’t go through #it a second time.

if newPageName != pageName: processed[newPageName] = True

processed[pageName] = True todo.remove(pageName)

The WikiReplaceSpider class looks almost exactly the same as before. The only big difference is that, whereas before a method call like api.getPage moved into custom REST code you had to write, it now moves into pre-existing xmlrpclib code. Without those API-specific classes to implement, the WikiReplaceSpider class is pretty much all the code:

if __name__ == ‘__main__’: import sys

if len(sys.argv) == 4:

rpcURL, find, replace = sys.argv[1:] else:

print ‘Usage: %s [URL to BittyWiki XML-RPC API] [find] [replace]’ \ % sys.argv[0]

sys.exit(1) WikiReplaceSpider(rpcURL).replace(find, replace)

That’s it. This spider works just like the REST version, but it takes less code because there’s no one-off code to deal with the specifics of the REST API. This script is run just like the REST version, but the URL passed in is the URL to the XML-RPC interface, instead of the URL to the REST interface:

$ python WikiSpiderXMLRPC.py http://localhost:8000/cgi-bin/bittywiki-xmlrpc.cgi Foo Bar

Checking “HomePage”

Saving “HomePage” Checking “FooCaseStudies”

...

519

TEAM LinG

Chapter 21

SOAP

XML-RPC solves REST’s main problem by defining a standard way to represent data types such as integers, dates, and lists. However, while XML-RPC was being defined, the W3C’s XML working group was working on its own representation of those data types and many others. After XML-RPC became popular, the W3C turned their attention to it, and started redesigning it to use their preexisting standards.

Along the way, ambition broadened the scope of their project to include any kind of message exchange, not just procedure calls and their return values. The result was SOAP. The acronym originally stood for Simple Object Access Protocol, but because the standard’s scope has been expanded so far beyond simple remote procedure calls, the acronym itself is no longer applicable.

SOAP may still be simple compared to COM or CORBA, but it’s a lot more complicated than XML-RPC. Fortunately, you don’t need all of SOAP just to expose a web application as a web service. The part you do need looks basically like XML-RPC with a more general XML encoding scheme. SOAP gives you access to a broader range of data types than XML-RPC, and it lets you define your own, but this introductory treatment will just treat it as an analogue of XML-RPC.

SOAP Quick Start: Surfing the Google API

Just as with REST and XML-RPC, a SOAP message is typically sent as the data portion of an HTTP POST request. Just as with those other protocols, then, it’s technically possible to use a SOAP web service without any SOAP-specific tools: Just construct the message by hand, send it off with urllib, and parse the response with the xml.sax module. Realistically, though, you need a SOAP library to use SOAP with Python. A SOAP library will deal with transforming Python data structures to SOAP’s XML representations and back, just as xmlrpclib does for XML-RPC.

Unfortunately, there’s no “soaplib” bundled with Python, but you can download one. There are two SOAP libraries for Python. The one library we’ll use in this chapter is SOAPpy, which provides an xmlrpclib-like version of a SOAP client and a SOAP server.

If you’re running Debian GNU/Linux, you can just install the “soappy” package; if not, you can download the distribution from http://pywebsvcs.sourceforge.net/. ZSI, the other Python SOAP package, is also available from that site. Be warned that SOAPpy requires two other packages: fpconst, a floating-point library, and PyXML, a set of XML utilities. More information and links to the packages are available in the SOAPpy README file.

To start using SOAPpy, you’ll need a SOAP service to call. As it happens, Google Web Services, one of the most celebrated web service APIs, uses SOAP. Google’s SOAP API gives you access to the data set used by their search engine.

To use the APIs, you’ll need to sign up with Google for an API key, just as with Amazon in the REST example. Also as with Amazon, Google puts limitations on your use of their API. Whereas Amazon allows only one API call per second per IP address, Google limits you to one thousand API calls per day per API key. Be careful while experimenting with this service; a script that goes out of control could leave you unable to make any more API calls until the next day.

Sign up for a Google API key at http://api.google.com/createkey.

520

TEAM LinG