Beginning Python (2005)


Writing Shareware and Commercial Programs

The preceding code fragment demonstrates some easy ways to handle dates with Python. The input to this function is a string in the format “5/12/2004” (May 12, 2004). Typically, this would be the expiration date of the user’s subscription. This function normalizes all such dates into the format “20040512”. The advantage is that dates can later be compared with a simple integer comparison.
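The normalizeDate code itself is not reproduced in this excerpt. A minimal sketch of the same idea, written for modern Python 3 under the hypothetical name normalize_date, might look like this; it assumes dates always arrive as month/day/year with a four-digit year:

```python
def normalize_date(date_string):
    """Convert a month/day/year string such as "5/12/2004" to "YYYYMMDD".

    Hypothetical helper: assumes a four-digit year and no surrounding spaces.
    """
    month, day, year = (int(part) for part in date_string.split("/"))
    # Zero-pad month and day so that numeric order matches date order.
    return "%04d%02d%02d" % (year, month, day)


# Once normalized, two dates compare correctly as plain integers.
expires = int(normalize_date("5/12/2004"))
today = int(normalize_date("1/3/2005"))
print(normalize_date("5/12/2004"))  # 20040512
print(expires < today)  # True: the subscription has expired
```

Comparing the raw strings would not work: as strings, "5/12/2004" sorts after "1/3/2005" simply because "5" comes after "1".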

import md5

#md5 sums a file
def getmd5sum(filename):
    m = md5.new()
    f = open(filename, "rb") #binary mode: no newline translation
    #insert error checking here
    s = f.read()
    f.close()
    m.update(s)
    return m.hexdigest()

#md5 sums data
def getmd5(data):
    m = md5.new()
    m.update(data)
    #sys.stderr.write("%s"%m)
    return m.hexdigest()

These two functions show how to use the md5 module in your programs to verify a file’s integrity: each produces a short, fixed-size digest (128 bits, returned by hexdigest() as 32 hexadecimal characters) that can be compared against a known value.

Of course, the getmd5sum function contains a simple gotcha, left in for the sake of simplicity. It calls f.read() and pulls the entire file into s, so if the file is very, very large, s can take up all of your free memory, crashing the Python program or, even worse, making your system unusable by hogging that important resource.
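One way around this is to feed the file to the hash in fixed-size chunks, so memory use stays constant no matter how large the file is. The sketch below uses the hashlib module, which replaced the md5 module in Python 2.5 (md5 was removed entirely in Python 3); the function name md5_of_file and the 64 KB chunk size are arbitrary choices, not part of CANVAS:

```python
import hashlib


def md5_of_file(filename, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file without loading it all at once."""
    digest = hashlib.md5()
    # Open in binary mode so that no newline translation alters the bytes.
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For any file, md5_of_file returns the same hex string that getmd5sum would, whatever chunk size is used.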

def getresult(who, ip, passwd):
    userdb = {}
    filedb = {}
    data = file("newuserdb.txt").readlines()
    for line in data:
        line = line.replace("\r", "").replace("\n", "")
        if line == "":
            continue
        #print "Doing %s"%line
        stuff = line.split(" ")
        try:
            name = stuff[0]
            date = stuff[1]
            passwordhash = stuff[2]
            number = stuff[3]
            userdb[name] = [date, passwordhash, number]
        except IndexError:
            #print "IndexError on *%s*"%line
            pass

From reading the preceding fragment, you should have a clear sense of the format of the CANVAS customer database file. Your own customer database file may be similar, assuming you keep it as a plaintext file. For code as small as this, it’s also perfectly acceptable to use something like the pickle module here.
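If you go the pickle route, saving and loading the whole customer dictionary takes only a few lines. This is a sketch assuming the same layout as userdb above (name mapped to [date, passwordhash, number]); the function names are hypothetical. Only unpickle files you wrote yourself, because loading a pickle can execute arbitrary code.

```python
import pickle


def save_userdb(userdb, path):
    """Write the customer dictionary to disk (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(userdb, f)


def load_userdb(path):
    """Read the customer dictionary back; only use this on trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The trade-off is the one discussed next: a pickle file is no longer something you can fix by hand in a text editor.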


Chapter 18

Alternatively, you could use Python’s excellent database support to connect to MySQL or another relational database, which is more suitable for large data sets. One good reason to keep it as a text file, as we do here, is that you’ll invariably find yourself looking at it and editing it manually to fix the kind of data-entry errors that lead to common customer billing problems.
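Python’s database modules all follow the same DB-API pattern. The sketch below uses the sqlite3 module from the standard library as a stand-in for MySQL (a MySQL driver would use its own connect() call and typically %s instead of ? placeholders); the table layout mirrors the text file described here and, like the helper names, is an assumption:

```python
import sqlite3


def open_userdb(path=":memory:"):
    """Create (if needed) and return a connection to a customer table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        " name TEXT PRIMARY KEY, expires TEXT, passwordhash TEXT, number INTEGER)"
    )
    return conn


def lookup_customer(conn, name):
    """Return (expires, passwordhash, number) for name, or None if unknown."""
    # The ? placeholder lets the driver quote the value, avoiding SQL injection.
    return conn.execute(
        "SELECT expires, passwordhash, number FROM customers WHERE name = ?",
        (name,),
    ).fetchone()


conn = open_userdb()
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    ("ALICE", "1/31/2006", "0" * 32, 1001),  # made-up sample row
)
print(lookup_customer(conn, "ALICE"))
```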

In this case, a sample line we’re reading in might look like this:

JUSTINE 5/12/2005 6372a7c27d981de94e464b934e9a6ebc 279321

The preceding line says that there is a user JUSTINE whose subscription expires on 5/12/2005, and who has a user ID of 279321. The long hexadecimal string between the date and JUSTINE’s user ID is the password hash (in a different format than the one used in the example in Chapter 9). Remember that the code that follows is a continuation of the function, so make sure that your indentation is correct:

    #check existence of user
    if not userdb.has_key(who):
        #print "No %s in userdb!"%who
        error(ip=ip, who=who)

At any point in the program, if we find that there is an error of any kind, we print an error message and bail. CGI programs need to be careful about what they print out — anything they print may show up on the user’s screen as the result of their query! Therefore, print statements are commented out here, but they can be enabled if you are testing this from the command line.

    #check password
    hash = userdb[who][1]
    if hash != getmd5(passwd):
        #print "Passwords did not match (%s)!"%hash
        error(ip=ip, who=who)

Note here that we give no indication as to whether it was the user name or the password that was incorrect. This is a security best practice for web applications. While the webmaster and application developer should be able to get the details as they are recorded by the error function, the user reading the error message in a browser shouldn’t be given information that could tell a hostile attacker they’ve found the name of a valid user.
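The same principle extends to timing. The sketch below is written for Python 3.3 or later and assumes a userdb dictionary shaped like the one built in getresult, with a hex MD5 password hash (kept only to match the example). It returns the same False for an unknown user and for a wrong password, does the same work in both cases, and compares hashes with hmac.compare_digest, whose running time does not depend on where the strings first differ. The name check_login is hypothetical.

```python
import hashlib
import hmac


def check_login(userdb, who, passwd):
    """Return True only if who exists and passwd matches the stored hash.

    Hypothetical helper: userdb maps name -> [date, passwordhash, number].
    """
    record = userdb.get(who)
    supplied = hashlib.md5(passwd.encode("utf-8")).hexdigest()
    # Compare against a dummy hash for unknown users, so both failure
    # paths do the same work and give the same answer.
    stored = record[1] if record is not None else "0" * 32
    match = hmac.compare_digest(supplied, stored)
    return record is not None and match
```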

    date = userdb[who][0]

The preceding line will return the date of this user (now authenticated with user name and password) from our customer database.

    number = int(userdb[who][2])

This line pulls the customer number from the user database and converts it to an integer. By default, int() assumes that the string is in base 10. If you think you’ll be dealing with numbers such as 0x01020304, you’ll want to use int(userdb[who][2], 0), which infers the base from the prefix.
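A quick illustration of the difference, using arbitrary values and Python 3 syntax: with a base of 0, int() reads the prefix to pick the base, just as Python source literals do.

```python
# Base 10 is the default, so a hex string is rejected.
try:
    int("0x01020304")
except ValueError:
    print("base 10 cannot parse 0x01020304")

# Base 0 infers the base from the prefix: 0x (hex), 0o (octal), 0b (binary).
print(int("0x01020304", 0))  # 16909060
print(int("279321", 0))      # 279321
```

One caveat: with base 0, a number written with a leading zero, such as "0123", is rejected by Python 3 and read as octal by Python 2, so zero-padded customer numbers are safer parsed in base 10.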

    date_normalized = normalizeDate(date)
    import time
    today = "%4.4d%2.2d%2.2d" % time.localtime()[:3]


Note that you can import modules at any point, and you may find yourself doing this in production code, although it’s not the best form.

    if int(date_normalized) < int(today):
        #customer has run out of canvas updates
        #Note - in production this is formatted as html
        print "Your subscription has run out on %s" % date
        sys.exit(1)

The ability to compare dates with a simple integer operation is why we have normalizeDate. It enables a human-readable date to be converted to a format that Python can work with easily.
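In Python 3 the same “YYYYMMDD” string for today can be produced with time.strftime, and the expiry test becomes a one-line comparison. A sketch, where today_yyyymmdd and subscription_expired are hypothetical names:

```python
import time


def today_yyyymmdd(now=None):
    """Return the local date as "YYYYMMDD"; now is an optional struct_time."""
    if now is None:
        now = time.localtime()
    return time.strftime("%Y%m%d", now)


def subscription_expired(normalized_expiry, today):
    """True if the expiry date ("YYYYMMDD") is strictly before today."""
    return int(normalized_expiry) < int(today)
```

For any struct_time t, today_yyyymmdd(t) equals "%4.4d%2.2d%2.2d" % t[:3], the expression used in getresult.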

    #log the download; open() raises IOError rather than returning None
    try:
        logfd = open("/var/CANVAS/log.txt", "a+")
        logfd.write("CANVAS downloaded: %s %s %s\n" % (time.ctime(), ip, who))
        logfd.close()
    except IOError:
        pass

    try:
        import os
        #remove old directory, if one exists
        os.system("rm -rf /var/CANVAS/CANVAS_%s" % who)
        #create a new CANVAS directory for this person
        os.system("cp -R /var/CANVAS/CANVAS_DEMO /var/CANVAS/CANVAS_%s" % who)
        #then add watermark

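The watermarking code has been removed for public release; you can easily come up with your own. We’re careful here to protect the input to os.system(): by this point, who has already been matched against a name in the customer database, so an attacker cannot slip arbitrary text into these shell commands. If you get lazy, using os.system() can have severe consequences for your security. Python 2.4 added the subprocess module specifically to help address this: its call() function takes the program and its arguments as a list and, by default, runs the program without a shell.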
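To see why the argument-list form is safer, the sketch below hands subprocess a list, so no shell ever parses the text and characters such as ; are passed through literally. It uses the running Python interpreter as a harmless stand-in for a real command, and the hostile user name is invented for the demonstration; subprocess.run needs Python 3.7 or later for capture_output.

```python
import subprocess
import sys

hostile_name = "JUSTINE; echo owned"

# With a list and the default shell=False, the whole string arrives as a
# single argument; nothing after the semicolon is run as a command.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # JUSTINE; echo owned
```

With the Python 2.4 subprocess module described above, subprocess.call(["cp", "-R", src, dst]) gives the same protection, although validating who remains a good idea.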

        #then compress and archive.
        #note: only works on linux or places with gnu tar
        os.system("cd /var/CANVAS; tar -czf CANVAS_%s.tgz CANVAS_%s > /dev/null" % (who, who))
        os.system("rm -rf /var/CANVAS/CANVAS_%s" % who)
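The tar step does not strictly need GNU tar. As a portable alternative, the standard tarfile module can write a gzip-compressed archive on any platform. This Python 3 sketch assumes a directory layout like the one used here; make_tgz and its arguments are hypothetical:

```python
import os
import tarfile


def make_tgz(source_dir, archive_path):
    """Pack source_dir into a .tgz whose top-level entry is its base name."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps absolute paths such as /var/CANVAS out of the archive.
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    return archive_path
```

Likewise, shutil.copytree and shutil.rmtree can stand in for cp -R and rm -rf, removing the shell from this part of the script entirely.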

        #then serve up (binary mode, since this is a gzip archive)
        fd = open("/var/CANVAS/CANVAS_" + who + ".tgz", "rb")
    except:
        import traceback
        traceback.print_exc(file=sys.stderr)

The traceback module is useful for your own debugging — we’re careful to send its output to stderr here so that the user doesn’t learn too much about our CGI script. (Of course, it’s now been printed — the whole script — in a book!)
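The same split (full detail on stderr for the developer, a bland message for the user) can be wrapped in a small helper. A Python 3 sketch; fail_quietly is a hypothetical name, and the two streams are parameters only so that the behavior is easy to check:

```python
import sys
import traceback


def fail_quietly(user_stream=None, log_stream=None):
    """Call inside an except block: log the traceback, show a short message."""
    if user_stream is None:
        user_stream = sys.stdout
    if log_stream is None:
        log_stream = sys.stderr
    # Web servers usually route a CGI script's stderr to their error log.
    traceback.print_exc(file=log_stream)
    user_stream.write("CANVAS not found!\n")
```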

        print "CANVAS not found!"
        error()
        sys.exit(1)

    data = fd.read()
    fd.close()
    os.system("rm -rf /var/CANVAS/CANVAS_%s.tgz" % who)


    print "Content-Type: application/octet-stream"
    print "Content-Disposition: attachment; filename=\"%s\"" % ("CANVAS_" + who + ".tgz")
    print "Content-Length: %d" % len(data)
    print ""
    sys.stdout.write(data)
    sys.exit(1)
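Assembling the headers in one place makes mistakes such as a misspelled header name easier to spot. The sketch below, written for Python 3 with a hypothetical download_response helper, builds the complete byte string a CGI script would send for a binary download, using the standard Content-Length header and CRLF line endings:

```python
def download_response(filename, data):
    """Return CGI output bytes: headers, a blank line, then the binary body."""
    headers = [
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % len(data),
    ]
    # An empty line separates the headers from the body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + data
```

In Python 3, binary output has to go through sys.stdout.buffer, for example sys.stdout.buffer.write(download_response("CANVAS_JUSTINE.tgz", data)).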

This next, and final, code fragment starts the whole script off by loading the variables passed from the web server with the cgi.FieldStorage class.

def run_as_cgi():
    form = cgi.FieldStorage()
    #print """Content-Type: text/html
    #
    #"""
    #print "form=*%s*"%os.getenv("REMOTE_ADDR")
    if form.has_key('username'):
        if not form.has_key('password'):
            error()
        who = form['username'].value
        passwd = form['password'].value
        ip = os.getenv("REMOTE_ADDR")
        getresult(who, ip, passwd)
    else:
        #If there are no form values, return a page with the form itself.
        #See chapter 22 for how to write a form for yourself. Remember
        #that you need to include all of the form elements that will allow
        #this to succeed!
        print formhtml
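The cgi module was deprecated in Python 3.11 and removed in Python 3.13. For a simple form like this one, urllib.parse.parse_qs can do the parsing. The sketch below handles both a query string and a POSTed form read from stdin; get_credentials and its parameters are hypothetical, and a login form should normally use POST so that passwords stay out of server logs.

```python
import os
import sys
from urllib.parse import parse_qs


def get_credentials(environ=None, stdin=None):
    """Return (username, password, remote_ip), or None if a field is missing."""
    if environ is None:
        environ = os.environ
    if environ.get("REQUEST_METHOD") == "POST":
        # A POSTed form arrives on stdin; CONTENT_LENGTH says how many bytes.
        stream = stdin if stdin is not None else sys.stdin.buffer
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = stream.read(length).decode("utf-8")
    else:
        body = environ.get("QUERY_STRING", "")
    # parse_qs maps each field name to a list of values.
    fields = parse_qs(body)
    if "username" not in fields or "password" not in fields:
        return None
    return fields["username"][0], fields["password"][0], environ.get("REMOTE_ADDR")
```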

Watermarking is a funny thing — often, simply saying you do watermarking is as effective as doing it. If you do decide to go this route, you’ll want to change your watermark every six months or so. But you should note that anyone with two copies of your program will be able to erase it, because they’ll be able to compare the two versions.
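The two-copies attack is easy to demonstrate. In this sketch, watermark_source is a deliberately naive, hypothetical scheme that appends a per-customer comment to a source file, and the customer IDs are made up; difflib from the standard library then shows that comparing two customers’ copies points straight at the mark:

```python
import difflib


def watermark_source(source, customer_id):
    """Hypothetical watermark: append a comment naming the customer."""
    return source + "# build %s\n" % customer_id


original = "def run():\n    return 42\n"
copy_a = watermark_source(original, "279321")
copy_b = watermark_source(original, "104455")

# Only the watermark lines differ, so they are trivial to find and strip.
diff = [
    line
    for line in difflib.unified_diff(
        copy_a.splitlines(), copy_b.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(diff)  # ['-# build 279321', '+# build 104455']
```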

And that’s it! That’s a working starting point for watermarking software.

Other Models

Looking at your product as more than just licensing revenue enables you to structure it to take advantage of Python in many other ways. This simple piece of code is what makes the main revenue stream of CANVAS, recurring subscriptions, possible:

def registerModule(name):
    "imports and adds an exploit module to our list, returns 1 on success"
    #print "RegisterModule %s"%name
    if name in exploitnames:
        return
    sys.path.append("exploits/%s" % name)
    try:
        code = "import %s as exploitmod" % name
        exec code
    except:
        print "Was unable to import %s" % name
        return 0 #failure
    #go on to add it to our list, since we were able to import it
    exploitnames.append(name)
    exploitmods.append(exploitmod)
    return 1 #success

This code takes a directory name from the CANVAS_name/exploits/ directory and assumes that the directory contains a file with the same name plus a .py extension. It then builds a one-line mini-script that imports that module into the current namespace, and adds the module to our internal list of modules.
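The exec trick works, but the importlib module (available since Python 2.7 and 3.1) does the same job without building code as a string. A Python 3 sketch under the same directory convention (exploits/NAME/NAME.py); register_module, and passing the two lists in as arguments, are illustrative choices rather than the CANVAS API:

```python
import importlib
import os
import sys


def register_module(name, names, modules, base_dir="exploits"):
    """Import <base_dir>/<name>/<name>.py and record it; return True on success."""
    if name in names:
        return True  # already registered
    module_dir = os.path.join(base_dir, name)
    if module_dir not in sys.path:
        sys.path.append(module_dir)
    # Modules dropped in after start-up may be missed without this.
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # like the original, report any failure
        print("Was unable to import %s: %s" % (name, exc))
        return False
    names.append(name)
    modules.append(module)
    return True
```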

As you can see, the CANVAS product is written so that it can continually be expanded with additional modules, which run within the framework. These enable the company selling it as a service to provide a continually expanding value to customers willing to pay for a subscription.

Therefore, in your own endeavors, take advantage of Python’s dynamic nature wherever possible to provide this kind of functionality in your product. While a C++ programmer can look to a shared library (DLL or .so files) loading as a way of expanding their features, only Python can do this so quickly and easily, while also offering a variety of important buzzwords that actually mean something: introspection, comprehensive exception handling, and portability built in!

Selling as a Platform, Rather Than a Product

Likewise, now that you have extensibility built into your product in fifteen lines or less, you can offer your product as a framework on which other people can build. Microsoft calls this “developing an ecosystem,” whereby they sell one copy of Microsoft Windows to everyone on Earth, and everyone else has to build to that standard. Not that this will automatically make you into Microsoft, but you don’t have to look too far to see that this model can work.

This is where you may find that having a completely open codebase at each of your customer sites is a huge advantage. They can build it into their own processes, doing things you would never have imagined. For instance, early on in CANVAS development, Immunity sold a copy to a large software company that then developed a process of their own using CANVAS. They were using CANVAS to scan their entire network once a day. They would use the results to automatically upload the patches needed to address the issue that had permitted them to break in. This is the sort of automation you allow your customers to have when your product is Pure-Python. They will find a lot of value in that.

Additionally, Python is so easy to debug that many non-programmers have sent Immunity patches they’ve figured out on their own in their environment. Although these patches may not always have been of the quality you wanted, it’s important to note that you won’t see customers going to that sort of effort for any compiled product, or even for a large open-source C codebase.

Your Development Environment

The right IDE makes all the difference with Python. Although many Python programmers stick to basic editors like vim, it and other vi variants tend to deal poorly with tabs and spaces, and intermixed tabs and spaces can make it nearly impossible to find errors in your programs. Besides, it makes sense to take full advantage of anything that can make your development team more agile than the competition — that’s why you went with Python in the first place!


To make everything work as well as possible, you should have your entire development team working with the same IDE and have them use the same number of spaces for their indentation. This will save you reams of problems down the road.
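A team convention is easier to keep if it is checked automatically. As a crude sketch (the function name and the four-space rule are assumptions, and a real checker would also need to tolerate continuation lines inside brackets), the following flags lines whose indentation contains a tab or is not a multiple of four spaces. Python 3 itself also refuses to run code that mixes tabs and spaces in a way that makes the meaning depend on the tab width, raising TabError.

```python
def indentation_problems(source, width=4):
    """Return (line_number, reason) pairs for lines that break the convention."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no meaningful indentation
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % width:
            problems.append((number, "indent not a multiple of %d" % width))
    return problems


sample = "def f():\n\treturn 1\n\ndef g():\n   return 2\n"
print(indentation_problems(sample))
```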

Immunity standardized on WingIDE, although BlackAdder and Emacs also work. Figure 18-1 shows WingIDE 2.0 running on Windows XP. WingIDE has several Python-specific features that you may not have needed in your C/C++/Java IDE. Primary among these is a “Source Assistant,” which attempts to guess the type of any variable you click. This can save a lot of time in Python, which is dynamically typed: the same variable may hold values of different types at different times. The key to selecting an IDE is to choose one that runs on all of the platforms your developers use; in this case study, Linux and Windows.

Figure 18-1

Part of choosing your IDE is acknowledging that it should come from the same community as your product. In this case, WingIDE developers can often be found on the pyGTK mailing lists, which tells you that you’ll be well supported when you have problems debugging a pyGTK-based program like ours. If your product is Qt-based, BlackAdder might be a good choice. If you’re already familiar with it, Emacs is also quite good with Python, and IDLE is always available as a backup because it comes with Python. Of course, each Python programmer has his or her favorite, and as Python becomes more and more popular, the field of editors and IDEs gets more crowded. CodeEditor (http://pythoncard.sourceforge.net/tools/codeEditor.html) is another favorite.

Finding Python Programmers

Whenever a company decides to build using a language or platform, they need to look at how difficult it is to find developers to work on that product as they expand. Even if your initial team are all master Python programmers, unless you can replace them as they come and go, and unless you can grow at a reasonable cost, Python might not be your best choice of languages. Thankfully, Python programmers are quite easy to find — often, they’re willing to leave other jobs to program in Python. There are really two major ways to grow your team, as described in the following sections.


Training non-Python Programmers

Python is reasonably C-like. Once you can get your new hire past the idea that he has to work within WingIDE (or your IDE of choice) and he has to work within a whitespace-sensitive language, it’s easy to get him up to speed on Python itself. This book might even help get him past the initial hump of learning new APIs.

The IDE shouldn’t be a big deal, but programmers can be creatures of habit. This is understandable, as once you’ve dedicated time to learning a complex set of tools like an editor, it’s hard to take baby steps again. You need to emphasize the benefits of the environment you’re offering, and provide new Python programmers the time they need to learn their environment.

Python Employment Resources

Finding Python programmers specifically can be done on any technical mailing list. You’ll be surprised how many skilled Python programmers there are in your direct community. Immunity draws from security mailing lists and conferences (Python has made significant inroads into the security community) and from the local Linux User Group mailing lists. Finding a list of all Python projects on SourceForge and e-mailing the developers of those will get you more responses than you might think.

Largely, however, because Python is so easy to learn, look for domain expertise first — it’s often harder to get someone who can think about the problems you’re working on. Once you’ve got that, you’ll develop their Python experience later.

Python Problems

Many people initially go through an infatuation phase with Python. Then, like in any relationship, they realize that Python is not perfect. In fact, the more you use it, the more you realize there are gremlins at every corner. Although some of those gremlins are covered here, it helps to know that not all is perfect in the promised land.

Porting to Other Versions of Python

Python can feel like an unstable development platform if you use parts of it that are changing and you aren’t forewarned. Compared to a stable C software stack, Python applications that rely on these changing parts of the language can appear fragile. If you want to build something large, you will most likely have to stay with one version of Python, rather than let your users and developers use the most recent version. This is often a problem, as you now face a dilemma:

1. The binary modules (such as pyGTK) that you rely on will usually support only the latest version of Python.

2. Some things in the new Python are guaranteed to break your application in subtle ways until you and other users have had a chance to shake them out. This process does not directly benefit your business.

This is not a dilemma any company wants to face with its development platform, though some form of it will often turn up no matter what development language or environment you pick. The only real solution is more engagement with the open-source community that develops the software stack. If you have the skills on hand, you can become a pyGTK (or whatever module you use) developer and do testing and maintenance on old versions for them, and in return receive better support than you could possibly buy.

Making the decision to give up the benefit of moving to new versions of Python or paying the size and complexity price of distributing your own Python with your product may initially seem counter-intuitive, but take a quick look at some of the things the Python development process brings to you that you may not be expecting if you are used to developing to a more commercialized platform:

On Python 2.3.4 or previous Python versions, the following code will produce the result you would expect from a C program:

>>> print "%x" % -2
fffffffe

However, a later version of Python will behave differently, as seen below with Python 2.4:

>>> print "%x" % -2
-2

In this trivial example, the problem is obvious, and it was somewhat documented by FutureWarning messages in versions of Python prior to 2.4. (When you see warnings about future changes to the language and you expect to use that feature, you should get onto a Python mailing list and ask what will change! It can save you a lot of work down the line.) Once your code is written and tested, and has stabilized into everyday use, however, this kind of problem may be hidden under many layers of complexity.
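If your code relies on the old C-like output, it is safer to say so explicitly by masking the value to the width you mean. A sketch for 32-bit values (hex32 is a hypothetical helper name), run here under Python 3:

```python
def hex32(value):
    """Format value as 8 hex digits of its 32-bit two's-complement form."""
    return "%08x" % (value & 0xFFFFFFFF)


print("%x" % -2)   # -2 on Python 2.4 and later
print(hex32(-2))   # fffffffe
print(hex32(255))  # 000000ff
```

Because the mask is explicit, the result no longer depends on how a particular Python version formats negative integers.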

This kind of language wart may seem like a small thing. But what happened is that everything under the covers regarding how unsigned integer types are implemented has changed. Integers are a base type you may be using in more than a few places in your program, so this kind of change can be very significant. This is the sort of thing that introduces subtle bugs, costing development time that could otherwise be going into new and cool features. This is not the only thing that changed in Python 2.4 that will affect you, and discovering every new incompatibility is going to potentially destroy your schedule if you don’t keep an eye out for them.

The Python community doesn’t seem to see this as a problem — they’re simply not as conservative about such changes as other platform stacks are, but it is a huge reason for other platforms’ commercial acceptance, such as Java’s. Stability is more important than almost anything else when you’re looking at a software development life cycle of ten years or more, and Python still doesn’t have that mindset.

As another example, struct.pack("L", 1) returns 4 bytes (32 bits) on 32-bit systems but 8 bytes (64 bits) on 64-bit Linux and most other 64-bit Unix systems, such as those running on the new AMD64 processors. This can completely break your network protocol support silently and irrevocably. For this reason, large parts of CANVAS eschew struct.pack and struct.unpack altogether, even though these functions are ubiquitous in common networking code.
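Much of this can be avoided without abandoning struct: a byte-order prefix (<, > or ! for network order) switches struct to standard sizes, in which L is always 4 bytes and Q always 8, whatever the platform. A short illustration:

```python
import struct

# Native mode ("L" with no prefix) follows the C compiler: 4 or 8 bytes.
print(struct.calcsize("L"))

# Standard mode: "L" is always 4 bytes, and the prefix fixes the byte order.
print(struct.pack("!L", 1))  # b'\x00\x00\x00\x01'
print(struct.pack("<L", 1))  # b'\x01\x00\x00\x00'
print(struct.calcsize("!L"), struct.calcsize("!Q"))  # 4 8
```

Using "!" everywhere a value goes on the wire also takes care of the endianness differences discussed in the Marshalling sidebar.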

Porting to Other Operating Systems

Python is not just a virtual machine, an API, and a set of syntax. It’s also a set of libraries that wraps each operating system’s internal API and converts the OS concepts to Python concepts. This is important when dealing with threading, certainly, but a lot of other APIs can cause you problems because the Python community wrapped them thinly, or without a lot of work to isolate you, the Python programmer, from these differences between platforms.


Marshalling

In computer science terms, to marshall is to take a program’s variables and transform them into a binary representation, which can then be sent over the network or used by another API; to unmarshall is to turn that representation back into variables. The pickle module’s capability to turn Python objects into a string is an example of marshalling. Marshalling is commonly used in Remote Procedure Call (RPC) libraries, and is also used here to format arguments for the Linux system call ioctl. Ioctl takes different arguments on each Unix-like operating system, and can also be endian dependent, so that Linux on SPARC would need different arguments than Linux on x86.

For example, the socket API, a commonly used API, throws different exceptions on Windows than it does on Linux. This means you’ll often need a blanket try: and except:, or you’ll have to catch both possible exceptions, making your code more complex than it seems like it should be; you’d expect that the same error could be handled the same way on two different platforms. Because it can’t, the resulting code can look and feel ugly, with what appears to be a puzzling duplication of effort.
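In Python 3 this particular problem is much smaller: socket.error is now an alias of OSError, and the platform-specific failures (refused connections, timeouts, address lookup errors) all derive from it, so one handler can cover them while errno still tells you what happened. A sketch, where try_connect is a hypothetical helper:

```python
import socket


def try_connect(host, port, timeout=2.0):
    """Return (True, None) on success, or (False, errno_or_None) on failure."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # One clause covers refused connections, timeouts and DNS errors on
        # every platform; exc.errno holds the OS-specific code, if there is one.
        return False, exc.errno
    conn.close()
    return True, None
```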

Here’s some CANVAS code that gets the IP address of a particular interface:

import socket

def getLinuxIPFromInterface(interface):
    import fcntl
    SIOCGIFADDR = 0x8915
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
    #pass a zero-padded struct ifreq; the address comes back at bytes 20-24
    r = fcntl.ioctl(s.fileno(), SIOCGIFADDR, interface + 256*'\x00')
    s.close()
    IP = socket.inet_ntoa(r[20:24])
    return IP

You’ll notice that this code has to manually marshall the arguments into a poorly documented system call using a magic number (ioctl and 0x8915, respectively) available only on Unix and Linux; and the magic number is different on different Unix systems (other Unix platforms will have a value that is different from Linux, and from each other). As you can imagine, this code does not work on Mac OS X, Solaris, Linux for SPARC or any other Unix.

The basic rule is that where Python has thin wrappers over the OS layer, your development is going to suffer. Unfortunately, sometimes where Python has a thick, well-tested, and mature wrapper, such as for threads, you’re going to suffer from third-party binary modules using it improperly (again, such as pyGTK).

Debugging Threads

Put simply, you can’t debug threads in Python. Print statements are as good as it gets for now. This is a significant omission in the environment, as many real programs are heavily threaded. Programmers who come from other backgrounds may find this difficult, so program carefully around threads.
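If print-style tracing is what you have, it helps to make every message say which thread produced it. The logging module can do that through its %(threadName)s field. This Python 3 sketch collects records in a list so the output is easy to inspect; the logger name and worker names are made up:

```python
import logging
import threading

records = []


class ListHandler(logging.Handler):
    """Keep formatted log lines in memory instead of printing them."""

    def emit(self, record):
        records.append(self.format(record))


log = logging.getLogger("canvas.threads")
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(threadName)s: %(message)s"))
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False  # keep these messages out of any root handlers


def worker(job):
    log.debug("starting job %s", job)


threads = [
    threading.Thread(target=worker, args=(n,), name="worker-%d" % n)
    for n in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(records))
# ['worker-0: starting job 0', 'worker-1: starting job 1', 'worker-2: starting job 2']
```

Swapping ListHandler for logging.StreamHandler() sends the same tagged lines to stderr.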

Common Gotchas

Large books have been written on “gotchas” in various languages, but Immunity found that one gotcha in particular riddled its early code. While many experienced Python programmers know about this particular feature of the language, it is not intuitively obvious to beginners and can cause otherwise fast programs to grind to a halt on large data sets. The problem stems from the fact that in Python, many data types, such as strings, are immutable (that is, they cannot be changed or added to in place). Programmers used to C’s pointer types are often more prone to this mistake than other programmers, but they are also equipped with the implementation knowledge to avoid it.

Take, for example, the following Python program fragment:

A = "A"
B = "B"
A += B

is really equivalent to the C code fragment:

char *tmp = malloc(strlen(A) + strlen(B) + 1); /* +1 for the terminating NUL */
sprintf(tmp, "%s%s", A, B);
A = tmp;

Hence, each concatenation is an O(N) operation: it takes time linear in the length of the strings, because every character of both strings is copied into a newly allocated string. That is fine for a single concatenation, but if you run this in a loop (say, as part of a file download program that appends each block it receives), each pass copies everything accumulated so far, so N appends cost O(N²) time in total, and you will suffer horrible performance. There are faster ways to do this; in computer science, considerable research goes into identifying and avoiding situations where you’re stuck with performance this bad.

Therefore, this code

A = "A"
B = "B" * 50
for i in range(0, 2000):
    A += B

should really be this much, much faster version:

A = "A"
B = "B" * 50
alist = [A]
for i in range(0, 2000):
    alist.append(B)
A = "".join(alist)
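A rough model makes the gap concrete. Assuming every += copies both operands into a new string, the first loop copies about 100 million bytes, while the join version copies each piece roughly once. (CPython 2.4 and later can sometimes extend a string in place for s += t, so measured timings may show a smaller gap.) The hypothetical helper below only does this arithmetic; it does not time anything:

```python
def bytes_copied(first_len, piece_len, count):
    """Model the bytes copied by repeated += versus a single "".join()."""
    # Pass i builds a new string of length first_len + piece_len * i.
    naive = sum(first_len + piece_len * i for i in range(1, count + 1))
    joined = first_len + piece_len * count  # join copies each piece once
    return naive, joined


naive, joined = bytes_copied(1, 50, 2000)
print(naive, joined)  # 100052000 100001
```

The ratio is roughly 1000 to 1, which is where a thousandfold improvement can come from.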

Fixing this simple gotcha may improve your performance a thousand times. Other than simple fixes like that, Immunity tries to maintain code that is as readable as possible:

Eschew the complex parts of Python (for example, lambda constructions or map() calls) unless you know why you want them.

Keeping your Python self-documenting is a matter of discipline.

Always assume that the next person to debug your code is looking at Python code for the first time. Writing for that reader also makes your code easier for you to understand when you come back to it.

Portable Distribution

Distributing your program is more than just sending people a tarball (Unix-speak for a tar.gz file, similar in Windows to a .zip file) — it’s also about having that tarball work the way you expect it to when it
