
#381 FEBRUARY 2006
Dr. Dobb’s Journal
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER
http://www.ddj.com
64-BIT COMPUTING!
Multiplatform Porting to 64 Bits
Mac OS X & 64 Bits
Examining µC++
Native Queries for Persistent Objects
Dynamic Bytecode Instrumentation
Summer of Code
$4.95 US / $6.95 CAN
Range Tracking & Comparison
GIF Images & Mobile Phones
Inside Sudoku
Viewing & Organizing Log Files
Porting Real-Time Operating Systems

CONTENTS
FEATURES
FEBRUARY 2006 VOLUME 31, ISSUE 2
Multiplatform Porting to 64 Bits 20
by Brad Martin, Anita Rettinger, and Jasmit Singh
Porting 300,000 lines of 32-bit code to nearly a dozen 64-bit platforms requires careful planning.
Mac OS X Tiger & 64 Bits 26
by Rodney Mach
Before migrating to 64-bit platforms, the first question to ask is whether you really need to do so.
Ajax: Asynchronous JavaScript and XML 32
by Eric J. Bruno
Ajax, short for “Asynchronous JavaScript and XML,” lets you create dynamic web pages.
Examining µC++ 36
by Peter A. Buhr and Richard C. Bilson
µC++ was designed to provide high-level concurrency for C++.
Native Queries for Persistent Objects 41
by William R. Cook and Carl Rosenberger
Among other benefits, native queries overcome the shortcomings of string-based APIs.
Dynamic Bytecode Instrumentation 45
by Ian Formanek and Gregg Sporar
Dynamic bytecode instrumentation is an innovative technique that makes profiling fast and easy.
Range Tracking & Comparison Algorithms 50
by Kirk J. Krauss
Some information is best viewed as a list of ranges. Kirk presents algorithms for dealing with ranges.
Displaying GIF Images on J2ME Mobile Phones 52
by Tom Thompson
Surprisingly, many Java-based mobile phones couldn’t display GIF image files— until now.
Sudoku & Graph Theory 56
by Eytan Suchard, Raviv Yatom, and Eitan Shapir
Understanding graph theory is central to building your own Sudoku solver.
Google’s Summer of Code: Part III 58
by DDJ Staff and Friends
Google’s Summer of Code resulted in thousands and thousands of lines of code. Here are more students who participated.
Viewing & Organizing Log Files 61
by Phil Grenetz
LogChipper, the tool Phil presents here, lets you view and organize the contents of log files.
EMBEDDED SYSTEMS PROGRAMMING
Porting an RTOS to a New Hardware Platform 65
by Byron Miller
Porting software to new hardware boards doesn’t need to be difficult.
COLUMNS
Programming Paradigms 68
by Michael Swaine
Everything Michael knows he attributes to Roger Penrose’s The Road to Reality: A Complete Guide to the Laws of the Universe.
Chaos Manor 74
by Jerry Pournelle
Beware of Sony’s Digital Rights Management (DRM) scheme, which covertly installs itself.
Embedded Space 71
by Ed Nisley
Ed remembers to tell you that memory really does matter.
Programmer’s Bookshelf 77
by Peter N. Roth
Peter reviews Stephen C. Perry’s Core C# and .NET.
FORUM
EDITORIAL 10
by Jonathan Erickson
LETTERS 12 by you
DR. ECCO’S OMNIHEURIST CORNER 14 by Dennis E. Shasha
NEWS & VIEWS 16 by DDJ Staff
PRAGMATIC EXCEPTIONS 24 by Benjamin Booth
OF INTEREST 79 by DDJ Staff
SWAINE’S FLAMES 80 by Michael Swaine
NEXT MONTH: The smart thing to do in March is to read our issue on Intelligent Systems.
http://www.ddj.com |
Dr. Dobb’s Journal, February 2006 |
5 |

DR. DOBB’S ONLINE CONTENTS
Online Exclusives
http://www.ddj.com/exclusives/
VB6 to VB.NET Migration
There are millions of Visual Basic 6 developers and an enormous amount of VB6 code. What does the landscape look like for this tremendous pool of legacy code and talent?
The Obsolete Operating System
To some, the modern definition of a computer operating system is obsolete.
The C/C++ Users Journal
http://www.cuj.com/
Flexible C++ #13: Beware Mixed Collection/Enumerator Interfaces
When the semantics of collection and enumerator interfaces are blurred, the result can mean trouble.
Dobbscast Audio
http://www.ddj.com/podcast/
SysML: A Modeling Language for Systems Engineering
Chris Sibbald discusses SysML, a visual modeling language for systems engineering applications.
Computer Theft: A Growing Problem
Biometric and computer security expert Greg Chevalier discusses the growing problem of mobile computer theft, and what you can do to combat it.
AADL: A Design Language for Embedded Systems
Peter Feiler discusses the Architecture Analysis and Design Language, a textual and graphical language that supports model-based engineering of embedded real-time systems.
COM Interop
.NET guru Juval Lowy explores how COM Interop can allow legacy VB6 applications to coexist in a .NET world.
Windows/.NET
http://www.ddj.com/topics/windows/
An Overview of Generics
In the .NET Framework 2.0, C# and Visual Basic .NET support generics.
Dotnetjunkies
http://www.dotnetjunkies.com/
Top 10 Must-Have Features in O/R Mapping Tools
What features would a good O/R mapping tool provide you with and how can it be beneficial to you?
BYTE.com
http://www.byte.com/
Why Can’t Windows Do Windows?
Multimedia apps require lots of desktop real estate, so having two or more displays can be the answer — if you can get them to work.
The News Show
http://thenewsshow.tv/
The Feds and IT Failures
The IRS spent nearly $2 billion on business modernization before it began to process even 1 percent of tax returns.
RESOURCE CENTER
As a service to our readers, source code, related files, and author guidelines are available at http://www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries should be sent to editors@ddj.com. For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to ddj@neodata.com, or write to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80322-6188.
If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp.com/feedback/permission.html or contact Customer Service at Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80322-6188.
Back issues may be purchased prepaid for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to orders@cmp.com, fax to 785-838-7566, or call 800-444-4881 (U.S. and Canada) or 785-838-7500 (all other countries). Please send payment to Dr. Dobb’s Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Digital versions of back issues and individual articles can be purchased electronically at http://www.ddj.com/.
WEBSITE ACCOUNT ACTIVATION
Dr. Dobb’s Journal subscriptions include full access to the CMP Developer Network web sites. To activate your account, register at http://www.ddj.com/registration/ using the web ALL ACCESS subscriber code located on your mailing label.
DR. DOBB’S JOURNAL (ISSN 1044-789X) is published monthly by CMP Media LLC., 600 Harrison Street, San Francisco, CA 94017; 415-947-6000. Periodicals Postage Paid at San Francisco and at additional mailing offices. SUBSCRIPTION: $34.95 for 1 year; $69.90 for 2 years. International orders must be prepaid. Payment may be made via Mastercard, Visa, or American Express; or via U.S. funds drawn on a U.S. bank. Canada and Mexico: $45.00 per year. All other foreign: $70.00 per year. U.K. subscribers contact Jill Sutcliffe at Parkway Gordon 01-49-1875-386. POSTMASTER: Send address changes to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80328-6188. Registered for GST as CMP Media LLC, GST #13288078, Customer #2116057, Agreement #40011901. INTERNATIONAL NEWSSTAND DISTRIBUTOR: Source Interlink International, 27500 Riverview Center Blvd., Suite 400, Bonita Springs, FL 34134, 239-949-4450. Entire contents © 2006 CMP Media LLC.
Dr. Dobb’s Journal® is a registered trademark of CMP Media LLC. All rights reserved.

PUBLISHER
Michael Goodman
EDITOR-IN-CHIEF
Jonathan Erickson
EDITORIAL
MANAGING EDITOR
Deirdre Blake
SENIOR PRODUCTION EDITOR
Monica E. Berg
ASSOCIATE EDITOR
Della Wyser
COPY EDITOR
Amy Stephens
ART DIRECTOR
Margaret A. Anderson
SENIOR CONTRIBUTING EDITOR
Al Stevens
CONTRIBUTING EDITORS
Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley,
Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley,
Jerry Pournelle, Dennis E. Shasha
EDITOR-AT-LARGE
Michael Swaine
PRODUCTION MANAGER
Stephanie Fung
INTERNET OPERATIONS
DIRECTOR
Michael Calderon
SENIOR WEB DEVELOPER
Steve Goyette
WEBMASTERS
Sean Coady, Joe Lucca
AUDIENCE DEVELOPMENT
AUDIENCE DEVELOPMENT DIRECTOR
Kevin Regan
AUDIENCE DEVELOPMENT MANAGER
Karina Medina
AUDIENCE DEVELOPMENT ASSISTANT MANAGER
Shomari Hines
AUDIENCE DEVELOPMENT ASSISTANT
Andrea Abidor
MARKETING/ADVERTISING
ASSOCIATE PUBLISHER
Will Wise
SENIOR MANAGERS, MEDIA PROGRAMS see page 78
Pauline Beall, Michael Beasley, Cassandra Clark, Ron Cordek, Mike Kelleher, Andrew Mintz
MARKETING DIRECTOR
Jessica Marty
SENIOR ART DIRECTOR OF MARKETING
Carey Perez
DR. DOBB’S JOURNAL
2800 Campus Drive, San Mateo, CA 94403
650-513-4300. http://www.ddj.com/
CMP MEDIA LLC
Steve Weitzner, President and CEO
John Day, Executive Vice President and CFO
Jeff Patterson, Executive Vice President, Corporate Sales and Marketing
Bill Amstutz, Senior Vice President, Audience Marketing and Development
Mike Azzara, Senior Vice President, Internet Business
Joseph Braue, Senior Vice President, CMP Integrated Marketing Solutions
Sandra Grayson, Senior Vice President and General Counsel
Anne Marie Miller, Senior Vice President, Corporate Sales
Marie Myers, Senior Vice President, Manufacturing
Alexandra Raine, Senior Vice President, Communications
Kate Spellman, Senior Vice President, Corporate Marketing
Michael Zane, Vice President, Audience Development
Robert Faletra, President, Channel Group
Tony Keefe, President, CMP Entertainment Media
Vicki Masseria, President, CMP Healthcare Media
Philip Chapnick, Senior Vice President, Group Director, Applied Technologies Group
Paul Miller, Senior Vice President, Group Director, Electronics and Software Groups
Fritz Nelson, Senior Vice President, Group Director, Enterprise Group
Stephen Saunders, Senior Vice President, Group Director, Communications Group
Printed in the USA
American Business Press

EDITORIAL
Bits and Bytes…
If you believe everything you read, “64 bits” is this week’s bee’s knees of computing. Microsoft must think so, as the company recently announced that at least some of its upcoming server offerings will run only on x86-compatible 64-bit processors. In fact, the ready availability of 64-bit platforms is an important step forward. Still, that doesn’t necessarily mean it’s time to post your 32-bit system on Craigslist or eBay. There’s a time and place for everything, including 64 bits.
According to Microsoft’s Bob Kelly, the time and place for 64-bit systems is with performance-critical applications such as Microsoft’s Exchange 12 e-mail server and its SQL Server database. Other application areas that benefit from 64-bit processors are complex engineering programs, games, and anything that involves audio/video encoding. Anything, in other words, that takes advantage of 64-bit arithmetic or requires addressing datasets beyond the 4-gigabyte constraint of 32-bit processors. A 64-bit processor can address up to 16 exabytes of memory. That’s roughly 18 billion gigabytes, and more than enough for most compute-intensive applications.
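The arithmetic behind those figures is easy to check. The following is only a sketch of the architectural ceiling implied by pointer width; the function and constant names are made up for illustration, and how much memory a given chip or operating system actually exposes is a separate matter.

```python
# Address-space arithmetic for 32-bit and 64-bit pointers.
GIB = 2**30   # one binary gigabyte, in bytes
EIB = 2**60   # one binary exabyte, in bytes

def addressable_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2**pointer_bits

limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(limit_32 // GIB, "binary gigabytes")      # 4
print(limit_64 // EIB, "binary exabytes")       # 16
print(round(limit_64 / 10**18, 1), "billion decimal gigabytes")  # 18.4
print(limit_64 // limit_32, "times the 32-bit address space")
```

The "18 billion" figure corresponds to decimal gigabytes (10^9 bytes); counted in binary gigabytes, the same limit comes to about 17.2 billion.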
Of course, in the spirit of “there’s no such thing as a free lunch,” the memory used by a 64-bit processor’s larger integers and/or pointers can also lead to more paging and disk I/O, thereby degrading performance. This means that while some applications don’t need 64-bit integers and/or pointers, they end up paying for them anyway.
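To get a feel for that cost, consider a back-of-the-envelope estimate. This sketch assumes a hypothetical doubly linked list node (two pointers plus a 4-byte integer payload) and ignores alignment padding, which a real compiler would usually add; the node layout and the count of ten million nodes are illustrative assumptions, not measurements.

```python
import struct

# Standard, platform-independent sizes, selected by the "<" prefix.
INT32 = struct.calcsize("<i")   # 4 bytes
INT64 = struct.calcsize("<q")   # 8 bytes

def node_bytes(pointer_size: int, payload_size: int = INT32) -> int:
    """Unpadded size of a hypothetical doubly linked list node:
    two pointers (prev, next) plus one integer payload."""
    return 2 * pointer_size + payload_size

nodes = 10_000_000
small = nodes * node_bytes(pointer_size=INT32)   # 32-bit pointers
large = nodes * node_bytes(pointer_size=INT64)   # 64-bit pointers

print(f"32-bit pointers: {small / 2**20:.0f} MiB")
print(f"64-bit pointers: {large / 2**20:.0f} MiB")
print(f"growth: {large / small:.2f}x")
```

The payload is identical in both cases; only the pointers grow, yet the structure as a whole takes roughly two-thirds more memory, which is the kind of overhead that can push a working set out of cache or into paging.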
In short, the fundamental difference between 32-bit and 64-bit processors isn’t necessarily the speed of the processor, but the amount of data that can be processed, which at times lends the appearance of faster speed. That said, there are workarounds (some of which involve virtual memory) that let you utilize 64-bit addressing on systems with less than 4 GB of memory, not to mention that you can gain some performance pop by running a 64-bit processor in 32-bit mode. The bottom line is that there’s still a lot to learn when it comes to effectively using next-generation platforms, and the sooner we jump on them, the better prepared we will be for the future.
Speaking of the future, anyone who doesn’t think the wireless world has found a home in academia hasn’t sat in on a college lecture class recently. What with everything from iPods and Instant Messaging to e-mail and FreeCell, there’s a whole lot of something going on, most of which seems to have little to do with learning.
That’s changing, however, with the advent of “Interactive Audience Response Systems,” referred to simply as “clickers”: radio frequency (RF) sender/receiver devices that let students and teachers interact in real time. A typical student/teacher scenario goes something like this: Students buy or rent a clicker (somewhat akin to a TV remote-control device but with fewer keys) at the beginning of the semester and register it with the school. Students can use a single clicker in multiple classes. When instructors want feedback, students answer, and their responses are instantly available and/or recorded for later review. Because many universities now have wired lecture halls, tracking and storing clicker information for professors isn’t a big deal. Alternatively, instructors can plug USB readers into their laptops and store the information locally. With typical systems, up to 1000 student RF keypads can be used per receiver, with up to 82 sessions (channels) running at the same time in close proximity without interference.
There are a number of companies that offer this technology, including Turning Technologies (http://www.turningtechnologies.com/) and eInstruction (http://www.einstruction.com/). eInstruction claims its system is being used in 800 institutions in 50 states and 20 countries, with more than a million devices in the hands of students.
Granted, audience response systems such as these have been around for a while. Early implementations were based on infrared (IR) technology, but RF offers clear advantages in range and in the ratio of sender units to receivers. Additionally, some vendors offer “virtual clickers”: soft keypads that run on PCs or PDAs and support all the features of standard clickers, with the added functionality of text messaging, which lets students submit questions to teachers and supports responses to fill-in-the-blank and essay questions.
And on a sad note, John Vlissides, coauthor of the seminal book Design Patterns: Elements of Reusable Object-Oriented Software, recently passed away. Along with his coauthors who made up the “Gang of Four,” John was a recipient of the Dr. Dobb’s Journal Excellence in Programming Award in 1998. He was also the author of several other books, most of which focused on software design and patterns. For much of his career, John was a researcher at IBM’s T.J. Watson Research Center. Prior to joining IBM Research, John was a postdoctoral scholar in the Computer Systems Lab at Stanford University, where he codeveloped InterViews. Memories of John have been put together on Ward Cunningham’s Wiki (http://c2.com/cgi/wiki?JohnVlissides/).
Jonathan Erickson editor-in-chief jerickson@ddj.com

LETTERS
Nuclear versus Wind Energy
Dear DDJ,
Luis de Sousa stated in “Letters” (DDJ, September 2005) that nuclear is not a clean energy due to mining, purifying, and disposing of nuclear wastes. Okay, as a 25-year nuclear health physicist who dealt with nuclear waste issues in about 15 of the 48 contiguous states, I might agree with the waste issue because our hosed-up government can’t find anybody willing to give enough kickbacks to make some Senator or Representative rich enough to make the waste issues work.
However, to compare the first two issues (mining and purifying), I have to ask Luis: how does he expect the windmills to be made? Will the same God that makes the wind provide the metal for the towers, the blades, the housings, and the generators; the metal for the cabling that will run for how far from the wind towers; the insulation for these same cables? (As an aside, the creation of insulation for cable is one of the most polluting manufacturing processes known to man. And the generators, breakers, and switches of a sea-mounted wind-powered farm filled with PCBs and other chemicals are just scary!) How about the environmental impact on the sea bed where his “wind-generators” will be placed? I believe when we start comparing the manufacturing of the materials that are used, well, the scales are pretty much balanced.
When nuclear (not “nukular” as GWB would say) people discuss the cleanliness of nuclear power, they are talking about the actual lack of emissions of any pollutants into the atmosphere: I mean sulfuric acid, sulfur dioxide, carbon monoxide, carbon dioxide, hydro- and hyperchloric acids, and the like, that come from burning fossil fuels. Granted, wind has great potential, but if you have driven through northern New Mexico and observed the miles and miles of wind-powered generators (most of them sitting idle, by the way, where land potential is surrendered to make room for 50+ foot wind-turbine blades by the score), well, I cannot consider wind as a viable option, unless we place a few wind-turbines around the inner-belt of the GW parkway in Washington, D.C. When Congress is in session, I am certain gigawatts of electricity could easily be generated by the hot air produced.
Ronald R. Goodwin goodwir2@nationwide.com
Piracy versus Privacy
Dear DDJ,
It is reported that Mr. Yale spent his entire life attempting to make a lock he himself could not pick. He never succeeded. Reading Dennis Shasha and Michael Rabin’s “Preventing Piracy While Preserving Privacy” (DDJ, October 2005) in the light of this insight leads me to several questions, none of them included in the FAQ:
1. The users of my software operate in remote parts of the globe, where Internet access is unavailable (or prohibitively expensive). Weekly access to your servers is out of the question. Also, I have a mission-critical WinXP PC here on my desk that has never been infected by a virus or adware or spyware trojan. How is this possible, given the notorious fragility of Microsoft software? I never let it on the Internet for any reason. I often transfer files on the local LAN to this Mac, but only through a physical A/B switch that disconnects the Internet when the PC is connected. Who cares about privacy if our mission-critical systems won’t work at all under your system?
2. Speaking of the notorious fragility of Microsoft products and the comparable (adjusted for market penetration) fragility of UNIX-based products, how do you propose to implement a “Supervising Program” that cannot be remotely cracked (to say nothing of local attacks)?
3. What happens if a clever pirate distributes a freeware program (no rights management needed) that runs under your SP and acts as a surrogate SP to run the protected content one step removed from the “Content Identifying” processes of the actual SP? For example, this rogue crypto-SP can process sound files, but instead of sending the sound waves out the speaker port where the real SP can measure the melodic content, it sends it out to an iPod on the USB bus? Everybody knows the iPod has no direct Internet connection to run your verification protocols. Or else to a rogue USB-to-speaker device sold on the black market? It is arbitrarily difficult for your SP to know it is sound content going out that port.
4. Speaking of a surrogate SP running under the real SP, given that your protocols must be open, how do you prevent rogue SPs from swamping the servers with bogus TTIDs?
5. Who is qualified to upload a CII signature to your “Superfingerprint” server? What happens if a “vendor” tries to upload a fingerprint that matches an existing fingerprint? In the case of music, I can imagine something keyed to melodic lines matching only if the music is, in fact, the same tune (although much modern “music” is, in fact, tuneless), but I can also imagine a clever programmer designing his software to have a signature that matches the signature of the program he wishes to bore.
These questions arose in just the few minutes it took me to read your article. Crackers have a lot more time to probe for weaknesses. Do you really think your system is any more secure than the existing software-based protection mechanisms?
I think the iPod phenomenon is a much more robust mechanism for reducing the market cost of piracy: The proportion of paid-for music to pirate copies has improved significantly since the iPod came to market. Furthermore, the remaining pirate copies do not represent nearly as great a loss to the content-creation industry as they want you to believe because most of those “librarians and 12-year-old kids” wouldn’t buy it anyway.
I was there when Dan Sokol came to the HomeBrew Computer Club with 10 copies of Altair Basic (which, as he pointed out, contained no copyright notice anywhere and was, therefore, legally in the public domain), and I watched over the years as those pirate copies were multiplied into thousands of local electronics businesses, so that when they needed a legitimate copy of Basic, they bought the version they knew— from Microsoft! My own Basic was too cheap to pirate, so it never reached the same market penetration. The result: Bill Gates is rich and I am not.
Tom Pittman tpittman@ittybittycomputers.com
Dennis and Michael respond: Thanks, Tom.
1. Superfingerprint downloads and callups can occur through intermediaries. So there is no need for a direct connection to the Internet. The fidelity of Superfingerprints is certainly an issue and will require substantial care.
2. The article refers to the Lampson-style boot strategy to assure the integrity of the Supervising Program. Trusted hardware is a part of this solution.
3. Content going out to unprotected devices may not be detected. We agree.
4. There will be a notion of hash-cash to prevent denial-of-service attacks.
5. When Superfingerprints are uploaded, they must be checked against existing ones to ensure that an author’s rights are protected. We will also provide a service to register freeware, so Superfingerprints don’t appear that prevent freeware from running.
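The response doesn’t say how hash-cash would be applied to TTID requests, so the following is only a generic sketch of the proof-of-work idea behind it: the client must burn CPU time to find a nonce, while the server checks the result with a single hash. The request label, the request:nonce layout, the 12-bit difficulty, and the use of SHA-256 are all assumptions made for illustration; none of them comes from the article or from the original Hashcash stamp format.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mint(request: str, bits: int) -> int:
    """Client side: search for the first nonce whose hash has
    at least `bits` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{request}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce

def verify(request: str, nonce: int, bits: int) -> bool:
    """Server side: one hash is enough to check the stamp."""
    digest = hashlib.sha256(f"{request}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits

request = "TTID-lookup:example"        # hypothetical request label
nonce = mint(request, bits=12)         # about 2**12 hashes on average
print(verify(request, nonce, bits=12)) # True
```

Each extra bit of difficulty doubles the expected work for the sender and costs the verifier nothing, which is what makes flooding a server with bogus requests expensive.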
DDJ

DR. ECCO’S OMNIHEURIST CORNER
Proteins for Fun and Profit
Dennis E. Shasha
Pulling a card out of the inside pocket of his well-tailored, dark suit, the professor presented it to Ecco. It read: Ming Thomas, PhD, protein industrialist.
“I’ve come with a project,” Thomas began after greeting us and taking a seat. “In the early days of molecular biology, people asserted (with the authority that only uncertainty could inspire) that every gene generates one protein.
“Now it seems that there are at least a few genes that produce thousands of proteins. Let me explain how.
“A gene is a sequence of DNA, but, in higher organisms, that DNA alternates between strings that in fact produce portions of proteins (called ‘exons’) and strings that don’t (called ‘introns’). Thus, a gene sequence has the form E1 I1 E2 I2 E3 I3… where the Es represent exons and the Is represent introns.
“Genes can produce many proteins because any (not necessarily consecutive) subsequence of exons can form a protein. For example, E2 E4 E5 can form a protein as can E1 E2 E7, but E6 E4 E5 cannot because the ordering E6 E4 violates the order of the original exon sequence. E3 E3 E5 cannot form a protein either because an exon at a given position cannot be repeated.
“When manufacturing proteins at industrial scale, we can handle up to seven exons. Our expense is directly related to the total length of those exons. We hope you can minimize our expense.
“Our first client wants us to generate 15 hydrophobic proteins that are alanine heavy. They believe these will act like sticky balls floating on top of water allowing translucent water sculpture. Think Los Angeles swimming pools. We want help designing the exons in order to minimize their size. I know you like warmups, so here is one. Suppose we could use only three exons and we wanted to generate the following proteins (where each amino acid is represented by a single letter; for example, Alanine is A):
GA
GAGAS
GAS
RAGA
RAGAS
What would the exons have to be to generate these proteins, trying to minimize the total length of the exons?”
Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at DrEcco@ddj.com.
Solution to Warm-Up:
The following three exons could do this, having a total length of seven.
RA
GA
GAS
“Just a minute,” Ecco interrupted, turning to his 17-year-old niece Liane, who had been listening in. “Liane, isn’t the biology here somewhat more complicated?”
“Well, yes, but probably not in an essential way,” Liane responded. “DNA doesn’t literally consist of amino acids, but rather, an alphabet of ‘nucleotides’ whose nonoverlapping consecutive triplets are translated to amino acids. So, when Dr. Thomas speaks of minimizing the length of the exons, he formally means minimizing the number of nucleotides. Provided each exon’s length is a multiple of three, however, the problems are mathematically identical because minimizing the number of amino acids produced by the exons minimizes the number of nucleotides in the exons themselves.”
“I couldn’t have explained this better myself,” said Thomas, visibly impressed. “For many reasons, we want each exon to generate full amino acids, so each exon’s length is in fact a multiple of three. Therefore, we can view each exon as consisting of the amino acid string it generates. Now do you understand the warm-up?”
“Sure,” said 11-year-old Tyler. “The protein RAGAS is generated from the RA and GAS exons, for example. RAGA is generated from the first two exons and GAGAS from the last two. So give us your big challenge.”
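Readers who want to test their own candidate answers may find a small checker handy. The sketch below is not part of the puzzle; the function names are made up for illustration. It enumerates every order-preserving selection of exons, joins each selection into a protein, and reports which targets are missing along with the total exon length. It is run here on the warm-up only.

```python
from itertools import combinations

def proteins_from(exons):
    """All strings obtainable by concatenating a nonempty, order-preserving
    subsequence of the exons (each exon position used at most once)."""
    made = set()
    for k in range(1, len(exons) + 1):
        # combinations() yields selections in the order of the input list
        for picked in combinations(exons, k):
            made.add("".join(picked))
    return made

def check(exons, targets):
    """Return (targets that cannot be produced, total exon length)."""
    made = proteins_from(exons)
    missing = [t for t in targets if t not in made]
    return missing, sum(len(e) for e in exons)

warmup_targets = ["GA", "GAGAS", "GAS", "RAGA", "RAGAS"]
warmup_exons = ["RA", "GA", "GAS"]

missing, total = check(warmup_exons, warmup_targets)
print(missing, total)   # [] 7
```

With n exons there are at most 2^n − 1 such selections: 7 for the warm-up’s three exons, 127 for seven exons, and 2,047 for eleven, so brute-force checking stays cheap for every problem in this column.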
Ming Thomas chuckled. “May I hire your whole family, Dr. Ecco?”
“We’re all confirmed puzzle freaks,” Ecco responded with a smile. “Do tell us which proteins you want.”
“Here they are,” said Thomas. “Remember that you are allowed seven exons and we want to minimize the total length (in amino acids) of those exons:
AGPA
APASAG
APASARAGPA
APASARASA
APASARASAPA
CAAPASAGASAPA
CAAPASARAG
CAAPASARPA
CARAPAPAS
CARAPAPASAGASA
CARAPAPASPA
CARAPASA
RAPAPASAGPA
RAPAPASASAPA
RAPASA
1. Can you find an encoding into exons whose total amino acid length is 20 or less?
Liane and Tyler worked this out.
“Very nice,” said Thomas. “That’s better than the solution we had thought of. Very nice work.
“Here is a follow-up question: One of our biochemists says he can manipulate up to 11 exons provided each produces two amino acids. In that case, what is the smallest total amino acid length of exons to create the following 15 proteins?
BAPAFADAFACA
BAPAGAPADA
RABAPAGADAFACA
RASA
RASAGAPAFAFACA
RASATABAPAGAPAFACA
RASATABAPAGAPAFAFA
RATAGAPAFADAFA
SABAPAFADACA
SAPADA
SAPAPAFADAFACA
SATABAGAPADAFA
SATABAPAGADAFACA
SATAPAGAPAFA
TABACA
Ecco helped his nephew and niece solve the problem this time. When Thomas saw the solution, he nodded and said, “Excellent. We have a long consulting arrangement ahead of us.”
2. Please give it a shot.
Ecco turned to the children after Thomas left: “The longest protein in Dr. Thomas’s last problem had a length of only 18. It is therefore conceivable that nine two-amino-acid exons would have been sufficient. Our solution required 11. Could we have done better?”
3. What do you think?
For the solution to last month’s puzzle, see page 70.
DDJ

Dr. Dobb’s News & Views
SECTION A: MAIN NEWS
DR. DOBB’S JOURNAL
February 1, 2006
IBM Previews Next-Generation DB2 Database
IBM has unveiled details about Viper, its next-generation DB2 database that is designed to help manage and access data across service-oriented architectures (http://www.ibm.com/db2/xml/). Viper will be the first database with both native XML data management and relational data capability. Scheduled for release in 2006, DB2 Viper will supposedly be able to seamlessly manage both conventional relational data and XML data without requiring the XML data to be reformatted or placed into a large object within the database. DB2 Viper also will simultaneously handle range partitioning, multidimensional clustering, and hashing, and provide XQuery support.
Smart Vehicles Show Off
Among the technology demonstrations presented at the 12th World Congress on Intelligent Transport Systems (ITS) (http://www.itsworldcongress.org/) were those involving: Vehicle-Infrastructure Integration (VII) technology, in which “smart” roads with roadside antennas wirelessly communicated information to cars equipped with on-board units (the communication network provides the driver with information about travel times and with warnings and locations of work zones or traffic incidents); Integrated Collision Warning Systems, in which conference attendees rode transit buses fitted with a front and side collision warning system designed for use both on highways and in dense urban environments; Automated Bus Rapid Transit Technology, in which buses were fitted with sensors, actuators, and computer-based processors that let them perform automated lane maneuvers and precisely dock at boarding platforms; and Smart Intersections, in which radar, GPS, and sensors were used to track the position of vehicles approaching intersections and activate warning signs. ITS is an organization of international researchers, industry professionals, and government officials developing advanced transportation technologies and deployment activities.
Microsoft Opens File Formats
Microsoft has announced that it will open up and submit its file format technology for its Office products (Word, PowerPoint, and Excel) to the Ecma International standards body. In turn, Ecma will develop and make available documentation of those formats. In addition, Microsoft will make available tools to enable old documents to make use of the open standard format.
Report Says Innovation Is Possible
In a study entitled “Innovation, R&D and Offshoring,” University of California at Berkeley researchers Dwight Jaffee and Ashok Bardhan concluded that technological innovation, even if it takes place in emerging international markets, will not spell economic doom. According to their study (http://repositories.cdlib.org/iber/fcreue/reports/1005/), new jobs and economic growth will result in the U.S., particularly in the Silicon Valley. Jaffee and Bardhan found that many large U.S. firms are increasingly sending R&D activities offshore by setting up affiliated, intrafirm R&D centers abroad. Their research also shows that smaller firms generally conduct their research in the U.S. and tend to produce more innovation. At the same time, the authors found that the U.S. market could benefit from the geographical dispersion of innovation and research to India, China, and other transitioning countries.
Iris Recognition Is an Eye Opener
Researchers at the University of Bath have developed a biometric iris recognition system that uses the colored part of the eye to validate a person’s identity (http://www.bath.ac.uk/elec-eng/pages/sipg/irisweb/). According to Professor Don Monro of the Department of Electronic and Electrical Engineering, the algorithm at the heart of the system has produced 100 percent accuracy in initial trials. Monro and his team are currently road testing the technology using a specially constructed database containing thousands of iris images collected from students and colleagues at the university. Iris recognition, which is regarded as the most accurate biometric recognition technology, works by “unwrapping” a digital image of a person’s iris and creating a unique encrypted “barcode” that is stored in a database. The images are captured using a special camera and an infrared light source that helps overcome problems caused by shadows and competing light sources. Hundreds of images can be captured in a few minutes, and the team selected 20 from each eye of each volunteer. Monro hopes to build a database with 16,000 iris images.
Sun Announces Postgres Support, ZFS Filesystem
Sun Microsystems will distribute the Postgres database with its Solaris 10 operating system. At the same time, the company announced integration of Solaris ZFS, a 128-bit filesystem with error detection and correction capabilities, into OpenSolaris. Finally, Sun announced plans to integrate Solaris Containers for Linux applications, which lets companies run Red Hat binaries unmodified in Containers on Solaris 10, into OpenSolaris. The Solaris ZFS filesystem supports self-healing data through advanced error detection and correction, task automation that simplifies storage management (in some cases reducing task times from hours to seconds), and built-in storage virtualization that eliminates the complexity of a volume manager.
Financial Industry Is Always a Target
In a recent study entitled “2005 Attack Trends: Beyond The Numbers,” security expert Bruce Schneier reports that criminals who are motivated by money are generally better funded, less risk-averse, and more tenacious than run-of-the-mill intruders who are in it for thrills (http://www.counterpane.com/cgi-bin/attack-trends2.cgi). Schneier also points out that, although the financial industry ranks second highest in attacks, it is actually the most vulnerable to criminal activity. Of the 13 major vertical markets tracked by Counterpane (the security company Schneier founded), approximately 50 percent of all targeted scans detected by Counterpane occurred within the financial industry. According to Schneier, damaging attacks such as Trojan viruses and bot networks are expected to increase. All categories of organizations are at risk, but the financial industry is expected to remain the highest risk vertical in the near term.
Security Threats: Cross-Platform Software
For the first time, the SANS Institute has included cross-platform applications as targets in its annual list of top Internet security threats (http://www.sans.org/top20/). The list includes backup programs, media players, antivirus software, PHP-based applications, and database software, among others.
16 |
Dr. Dobb’s Journal, February 2006 |
http://www.ddj.com |

Multiplatform Porting to 64 Bits
Up-front planning is worth the effort
BRAD MARTIN, ANITA RETTINGER, AND JASMIT SINGH
One project we were recently involved in was the port of a large 32-bit application, which supported 11 platforms, to a 64-bit environment. The application exceeded 300,000 lines of code. Because parts of the 32-bit application had been developed several years earlier, the code had very likely been modified by a variety of developers. For this and other reasons, we suspected that, among other problems, type mismatches that cause problems for a 64-bit port had been introduced as modules were added or removed over time.
We ported the 32-bit application to 64-bit to take advantage of the benefits of 64-bit technology: large file support, large memory support, and 64-bit computation, among other features. Our overall approach was iterative, alternating between zooming in on detailed issues, such as byte order and refining compiler flags, and stepping back to look at global issues, such as ANSI compliance and future portability of the source-code base.
Our first step was to research 64-bit resources to learn about each of the 11 operating systems’ compiler switches, memory models, and coding considerations. To define our starting point, we turned on the compiler warnings for one platform, ran a first build, and examined the build log’s messages. With these initial builds and later use of tools such as Parasoft’s Insure++ (http://www.parasoft.com/), lint, and native debuggers, we developed a road map of the issues we would encounter. From there, we proceeded to perform a complete inventory of the source code and examine every build configuration.
After initial code modifications, debug sessions, and passes through build messages, we had enough information to sort out and prioritize realistic milestones and the specific tasks required to get there. We reached a significant milestone when we had a running application with enough basic functionality that it could be debugged by running it through our automated test suite, which consists of backward compatibility tests in addition to new tests built to exercise 64-bit features.
If you have several 64-bit platforms as part of your conversion project, you might be tempted to work on one platform at a time. Once the application is running properly on the first platform, you might move on to the next platform, and so on. However, we found significant advantages to working on all platforms at the same time because:
• Each of the compilers provided different information in its warnings, and looking at the errors from several compilers can help to pinpoint problem areas.
• Errors behave differently on different platforms. The same problem might cause a crash on one platform and appear to run successfully on another.
The authors are senior software engineers for Visual Numerics. They can be contacted at http://www.vni.com/.
A final consideration in approaching this project was to plan ahead for the time required for the final release testing phase. Because our newly modified code base is shared across multiple 32-bit and 64-bit platforms, each 32-bit platform would need to be retested as thoroughly as our newly ported platforms, thereby doubling testing time and resources.
Cross-Platform Issues
There are a number of issues, ranging from compiler warnings to reading/writing binary data, that you can face when porting 32-bit applications to run on multiple 64-bit operating systems. Luckily, compilers can help identify 64-bit porting issues. Set the compilers’ warning flags to the strictest level on all platforms, paying close attention to warnings that indicate data truncation or assignment of 64-bit data to 32-bit data. However, stricter warning levels can produce an overwhelming number of warnings, many of which concern issues the compiler resolves automatically. Major warnings end up buried within the mass of minor ones, with no easy way to distinguish between the two. To resolve this issue, we enabled the warnings on multiple platforms and performed concurrent builds. This helped because different compilers give different warnings with different levels of detail. We then filtered the warnings using information from multiple compilers and were able to determine which warnings needed to be fixed.
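As an illustration of the kind of assignment those truncation warnings point at, the following sketch narrows a 64-bit value to 32 bits only after checking that it fits. The function name `narrow_to_int32` is hypothetical and not taken from the application described here; fixed-width types from `<stdint.h>` are used so that the sketch behaves the same on every platform.

```c
#include <stdint.h>

/* Hypothetical helper: stores value in *out and returns 1 if it fits in
   32 bits; returns 0 and leaves *out untouched otherwise. */
int narrow_to_int32(int64_t value, int32_t *out)
{
    if (value < INT32_MIN || value > INT32_MAX)
        return 0;               /* would be truncated: report, do not store */
    *out = (int32_t)value;      /* explicit cast documents the narrowing */
    return 1;
}
```

A plain assignment of the 64-bit value to a 32-bit variable is the pattern a strict warning level typically flags; routing it through a check like this is one possible way to resolve such a warning deliberately.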
Some application requirements call for binary data or files to work with both 64-bit and 32-bit applications. In these situations, you have to examine your binary format for issues resulting from larger longs and pointers. This may require modifications to your read/write functions to convert sizes and handle any Little- or Big-endian issues for multiple platforms. To get the correct machine endianness, the larger data sizes in 64-bit applications require extended byte swapping. For example, a 32-bit long:
Big Endian = (B0, B1, B2, B3)
can be converted to:
Little Endian = (B3, B2, B1, B0)
while a 64-bit long:
Big Endian = (B0, B1, B2, B3, B4, B5, B6, B7)
is converted to:
Little Endian = (B7, B6, B5, B4, B3, B2, B1, B0).
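The byte reversal above might be sketched in C as follows. The function names `swap32` and `swap64` are assumptions chosen for illustration, not routines from the ported application; fixed-width unsigned types are used so that the shifts are well defined.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value: (B0,B1,B2,B3) -> (B3,B2,B1,B0). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Reverse the byte order of a 64-bit value by swapping each 32-bit half
   and then exchanging the two halves. */
uint64_t swap64(uint64_t v)
{
    uint64_t low_swapped  = swap32((uint32_t)(v & 0xFFFFFFFFu));
    uint64_t high_swapped = swap32((uint32_t)(v >> 32));
    return (low_swapped << 32) | high_swapped;
}
```

For example, `swap64(0x0001020304050607ULL)` yields `0x0706050403020100ULL`. A read or write function would call such a routine only when the byte order of the file differs from that of the machine.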
Most compilers will find mismatched types and correct them during the build. This is true for simple assignments as well as most parameters passed to other functions. The real problems lie in the integer-long-pointer mismatches that are invisible to the compiler at compile time, or when an assumption the compiler makes at compile time is what produces a mismatch. The former concerns pointer arguments and function pointers, while the latter primarily concerns function prototypes.
Figure 1: Structure alignment in 32-bit and 64-bit systems. In a 32-bit system, the structure holds a 4-byte integer followed by a 4-byte long, each on a 4-byte natural boundary. In a 64-bit system, the structure holds a 4-byte integer plus 4 bytes of padding, followed by an 8-byte long on an 8-byte natural boundary.
Passing integer and long pointers as arguments to functions can cause problems if the pointers are then dereferenced as a different, incompatible type. These situations are not an issue in 32-bit code because integers and longs are interchangeable. However, in 64-bit code, these situations result in runtime errors because of the inherent flexibility of pointers. Most compilers assume that what you are doing is what you intended to do, and quietly allow it unless you can enable additional warning messages. It is only during runtime that the problems surface.
Listing One, for example, compiles without warnings on both Solaris and AIX (Forte7, VAC 6) in both 32-bit and 64-bit modes. However, the 64-bit version prints the incorrect value when run. While such problems may be easy to find in a short example, they are much harder to spot in larger code bases. This sort of problem might be hidden in real-world code, and most compilers will not find it.
Listing One works properly when built as a 64-bit executable on a Little-endian machine because the value of arg is entirely contained within the long’s four least-significant bytes. However, even on Little-endian x86 machines, the 64-bit version produces an error during runtime when the value of arg exceeds its four least-significant bytes.
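The effect described above can be imitated with a small sketch that is separate from Listing One. The hypothetical function `misread_long_as_int` copies only the first `sizeof(int)` bytes of a long, which is what a callee sees when it treats a long's storage as an int; `memcpy` is used instead of a mismatched pointer dereference so that the sketch itself stays well defined. The helper `is_little_endian` is likewise an assumption added for illustration.

```c
#include <string.h>

/* Returns 1 when the least-significant byte is stored first. */
int is_little_endian(void)
{
    unsigned int one = 1u;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

/* Hypothetical helper: the value seen when only the first sizeof(int)
   bytes of a long are read as an int. */
int misread_long_as_int(long value)
{
    int seen;
    memcpy(&seen, &value, sizeof seen);
    return seen;
}
```

On a Little-endian machine with 8-byte longs, `misread_long_as_int(42L)` returns 42 because the value lies entirely in the four least-significant bytes, whereas `misread_long_as_int(LONG_MAX)` (with `LONG_MAX` from `<limits.h>`) returns -1. On a Big-endian machine with 8-byte longs, even the first call returns 0.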
With function pointers, the compiler has no information about which function will be called, so it cannot correct or warn you about type mismatches that might exist. The argument and return types of all functions called via a particular function pointer should agree. If that is not possible, you may have to provide separate cases at the point at which the function is called to make the proper typecasts of the arguments and return values.
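One way to keep argument and return types in agreement is to route every call through a single typedef. The sketch below uses hypothetical names throughout (`transform_fn`, `negate_int`, and so on). It also shows an adapter function, a variation on the per-call-site typecasts mentioned above, which confines the int/long conversions to one place rather than casting the function pointer itself.

```c
/* One agreed signature for everything called through the pointer. */
typedef long (*transform_fn)(long);

/* A function that already matches the agreed signature. */
long double_long(long x)
{
    return 2 * x;
}

/* A hypothetical legacy function that still works on int. */
int negate_int(int x)
{
    return -x;
}

/* Adapter: converts argument and result explicitly so the legacy function
   can be called through a transform_fn without a pointer cast. */
long negate_adapter(long x)
{
    return (long)negate_int((int)x);
}

/* Calls whatever function it is given through the agreed signature. */
long apply(transform_fn fn, long value)
{
    return fn(value);
}
```

With this arrangement, `apply(double_long, 21L)` and `apply(negate_adapter, 7L)` are both checked by the compiler against the same prototype. The adapter still narrows its argument to int, so it is only appropriate where the values are known to fit.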
The second issue concerns implicit function declarations. If you do not provide a prototype for each function that your code calls, the compiler makes assumptions about them. Variations of the compiler warning “Implicit function declaration: assuming extern returning int” are usually inconsequential in 32-bit builds. However, in 64-bit builds, the assumption of an integer return value can cause real problems when the function returns either a long or a pointer (malloc, for example). To eliminate the need for the compiler to make assumptions, make sure that all required system header files are included and provide prototypes for your own external functions.
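A minimal sketch of the remedy follows, with the system header included and an explicit prototype given before use. The function names `total_bytes` and `make_buffer` are hypothetical, and `int64_t` stands in for a 64-bit long so that the example gives the same result on any platform.

```c
#include <stdint.h>
#include <stdlib.h>   /* declares malloc, so it is known to return void * */

/* Prototype for a function that, in a real project, would be defined in
   another source file and declared in a shared header. */
int64_t total_bytes(int64_t rows, int64_t row_size);

int64_t total_bytes(int64_t rows, int64_t row_size)
{
    return rows * row_size;   /* result may need more than 32 bits */
}

/* Allocates room for count doubles through the properly declared malloc;
   the caller releases the buffer with free. */
double *make_buffer(size_t count)
{
    return malloc(count * sizeof(double));
}
```

Without the prototype and the header, a compiler that accepts implicit declarations would assume both functions return int, and the upper half of a 64-bit result or pointer could be lost.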
Hidden Issues
There are, of course, issues that may not be readily apparent at the beginning of the project. For instance, in 64-bit applications, longs and pointers are larger, which also increases the size of a structure containing these data types. The layout of your structure elements determines how much space is required by the structure. For example, a structure that contains an integer followed by a long in a 32-bit application is 8 bytes, but a 64-bit application adds 4 bytes of padding to the first element of the structure to align the second element on its natural boundary; see Figure 1.
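The padding can be measured directly with `offsetof`. The structure and function names below are hypothetical; the 4-byte figure holds for typical ABIs with 4-byte integers and 8-byte, 8-byte-aligned longs, which is an assumption about the platform rather than a guarantee of the C language.

```c
#include <stddef.h>

/* The layout discussed above: an integer followed by a long. */
struct int_then_long {
    int  count;
    long total;
};

/* Number of padding bytes the compiler placed between the two members. */
size_t padding_after_count(void)
{
    return offsetof(struct int_then_long, total) - sizeof(int);
}
```

On a platform where int and long are both 4 bytes, `padding_after_count()` returns 0 and the structure occupies 8 bytes.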
To minimize this padding, reorder the data structure elements from largest to smallest. However, if data structure elements are accessed as byte streams, you need to change your code logic to adjust for the new order of elements in the data structure.
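The effect of reordering can be checked with `sizeof`. In this sketch the structure names are hypothetical, and the sizes quoted in the note after the code assume 4-byte integers and 8-byte longs aligned on 8-byte boundaries.

```c
#include <stddef.h>

/* Members in an order that forces padding when long is larger than int. */
struct mixed_order {
    int  a;
    long b;
    int  c;
};

/* The same members, ordered from largest to smallest. */
struct largest_first {
    long b;
    int  a;
    int  c;
};

/* Bytes saved per structure by the reordering. */
size_t bytes_saved(void)
{
    return sizeof(struct mixed_order) - sizeof(struct largest_first);
}
```

Under those assumptions, `struct mixed_order` takes 24 bytes and `struct largest_first` takes 16, so `bytes_saved()` returns 8. Where int and long have the same size, both structures are the same size and nothing is saved.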
For cases where reordering the data structures is not practical and the data structure’s elements are accessed as a byte stream, you need to account for padding. Our solution for these cases was to implement a helper function that eliminates the padding from the data structure before writing to the byte stream. A side benefit to this solution was that no changes were required on the reader side; see Listing Two.
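A helper of this kind might look like the sketch below. It is not the authors' Listing Two; `struct record`, `RECORD_PACKED_SIZE`, and `pack_record` are hypothetical names. The sketch only removes padding by copying members one at a time, and does not convert field widths or byte order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record that contains padding when long is larger than int. */
struct record {
    int  id;
    long value;
};

/* Bytes the record occupies in the stream once padding is removed. */
#define RECORD_PACKED_SIZE (sizeof(int) + sizeof(long))

/* Copies the members one by one into out, skipping any padding.
   out must have room for RECORD_PACKED_SIZE bytes.
   Returns the number of bytes written. */
size_t pack_record(const struct record *r, unsigned char *out)
{
    size_t n = 0;
    memcpy(out + n, &r->id, sizeof r->id);
    n += sizeof r->id;
    memcpy(out + n, &r->value, sizeof r->value);
    n += sizeof r->value;
    return n;
}
```

The writer would pass the packed buffer, rather than the structure itself, to its output routine, so that the stream holds the members back to back regardless of how the compiler laid out the structure.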
Arrays
64-bit long type arrays and arrays within structures will not only hold larger values than their 32-bit equivalents, but they may also hold more elements. Consider that 4-byte variables previously used to define array boundaries and allocate array sizes may also need to be converted to longs. (For help in determining whether existing long arrays should be reverted to integer type for better performance in your 64-bit application, see http://developers.sun.com/prodtech/cc/articles/ILP32toLP64Issues.html.)
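To see why a 4-byte size variable can fall short, the sketch below computes an array's byte count in 32-bit and in 64-bit arithmetic. The function names are hypothetical, and unsigned fixed-width types are used (rather than the signed int and long discussed above) so that the 32-bit wraparound is well defined.

```c
#include <stdint.h>

/* Byte count computed in a 32-bit result; wraps modulo 2^32. */
uint32_t array_bytes_32(uint32_t count, uint32_t elem_size)
{
    return count * elem_size;
}

/* The same computation carried out in 64-bit arithmetic. */
uint64_t array_bytes_64(uint64_t count, uint64_t elem_size)
{
    return count * elem_size;
}
```

For 600,000,000 elements of 8 bytes each, `array_bytes_64` returns 4,800,000,000, while `array_bytes_32` wraps around to 505,032,704, which would lead to an allocation far smaller than intended.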