Mastering Enterprise JavaBeans™ and the Java 2 Platform, Enterprise Edition - Roman E


Business Needs for XML

We begin by studying the needs of the business community that make XML such a useful standard. Why is XML important? What business problems does it solve? Why did we need to create XML rather than use existing technology? Those are the questions we will answer in this section.

Electronic Commerce

The business need for a standard such as XML has arisen with the advent of electronic commerce (also called e-commerce, e-business, or your favorite buzzword). When most people hear the term “electronic commerce,” they usually think of Web storefronts that you can visit to purchase goods electronically. This is called business-to-consumer e-commerce because a business is conducting a transaction with a consumer. Examples of business-to-consumer Web sites are Amazon.com (www.amazon.com), Buy.com (www.buy.com), and carOrder.com (www.carOrder.com).

But e-commerce extends beyond the business-to-consumer model. For instance, an online auction house such as eBay (www.ebay.com) facilitates transactions between consumers by hosting auctions. This is called consumer-to-consumer e-commerce because goods change hands between consumers.

A business may also sell goods to other businesses and take consumers out of the picture altogether. This economic model is called business-to-business e-commerce, and it is by far where most of the money changes hands, because every business needs to conduct inter-business transactions to survive. Manufacturers need to buy parts from suppliers. Resellers need to buy products from manufacturers. And all corporations need to buy office supplies and furniture. Geographically distributed companies, conglomerates, and even whole industries (such as aerospace) rely on communication, and the ability to distribute manufacturing activities gives some companies an essential economic advantage. Business-to-business e-commerce is the Internet’s single largest financial impact on the world economy, and it has been estimated to be 20 times as large as the other Internet economic models. As we will see, business-to-business e-commerce is where XML has the largest impact as well.

Inadequacies with Existing Technology

The challenge in conducting business electronically is for businesses to understand each other’s data, such as product, customer, and financial data. With a paper-based system, a human being always intervened and could make logical guesses about ambiguous data. With electronic business, however, computer programs need to receive accurate, structured data, or millions of dollars could be lost due to incorrect transactions.

Understanding the Extensible Markup Language (XML) 615

Thus, a structured data document standard is needed that businesses can use to share information. This document standard should be simple enough for anyone to use elegantly, yet be powerful enough to represent any business data. A computer program should be able to read an electronic document structured in this language and figure out the semantic details of the document based on its structure. For example, an application should be able to query a digital purchase order document and determine what product and quantity the purchase order is for.

Let’s take a look at the existing technology standards and examine why they are inadequate for our needs.

VANs and EDI

Electronic business is not a new concept. Companies have been doing it for years, in a very proprietary way. Before the Internet became mainstream, two corporations would conduct business electronically using a third-party vendor’s value-added network (VAN), a private network that links companies together. The four largest VAN vendors are General Electric Information Services, IBM Global Information Network, Sterling Commerce, Inc., and Harbinger Corporation.

The standard for conducting business over VANs is called Electronic Data Interchange (EDI), a standard for facilitating the electronic exchange of data. EDI has traditionally been used over VANs, although it has been extended to run over the Internet as well. EDI has widespread use in multiple vertical industries, from the business sector (transferring business documents) to the educational sector (transferring student records, transcripts, and test scores).

The problems with VANs and EDI are as follows:

■ VANs using EDI are a very expensive subscription service, and charge businesses outrageous per-transaction fees.

■ VANs are a challenge to link to other businesses that are already on the Internet.

■ VANs are designed for batch-mode processing (rather than just-in-time processing, which is necessary for efficient transactions).

■ Within industries, large companies typically define a set of EDI templates that lock other companies into proprietary standards for data exchange.

■ EDI is an outdated, cumbersome, and non-extensible format for transferring data.


Note that there is definitely a lot to be said for VANs and EDI. Many businesses run quite smoothly on these technologies today, as VANs using EDI are quite reliable and secure. Many corporations are also very concerned about gambling their businesses on anything new. Due to these factors, plus the slow rate of technology adoption, the EDI market is growing rapidly as we speak. In the long run, though, VANs and EDI are likely to die off in favor of newer technologies.

What would you do to replace VANs and EDI with an Internet-based model? First, you would need to replace the proprietary VAN networks with an Internet link. That’s simply a hardware problem. The larger issue is replacing or enhancing EDI with an efficient, modern, structured data document standard that businesses can use to exchange information. As we will see, XML is that standard, and it is what early-adopting businesses are tackling as an integration method, even as we speak.

SGML

The Standard Generalized Markup Language (SGML) is a meta-markup language: you can use a meta-markup language to design your own markup language (such as XML or HTML). SGML provides a mechanism to add structure to your documents, and has a great track record of successful deployments of applications, especially in the publishing realm. But, unfortunately, SGML has never become mainstream, largely because of its complexity. SGML is quite powerful, and it could easily be used to represent business data. Its power comes at the cost of ease of use, as SGML is more powerful than everyday business applications need. The learning curve for programming with SGML is particularly steep, and the cost of leveraging SGML is prohibitive. Few people use SGML in its raw form, but everyone uses languages derived from SGML, such as HTML and XML.

HTML

The HyperText Markup Language (HTML) is the predominant standard for Web documents. HTML is an application of SGML that is intended for multimedia presentation of information over the Internet.

HTML is an inappropriate markup language for electronic data, primarily because HTML was designed around the use of GUI tags, rather than business data content. HTML is great for displaying documents to end users, but it is very poor for defining other structure in a document. For example, consider the following HTML snippet:

<B>John Doe</B>

<I>The Doe Corporation</I>


Here, the <B> and <I> tags tell the client-side browser to represent the associated text in bold and italics, respectively. However, the structure ends there. The browser has no way of determining the semantic meaning of the text within the document. For example, by glancing at this code, there’s no way we can automatically identify that the string “John Doe” is the name of a person. Nor can a computer program discern that “The Doe Corporation” is the name of a company. Note that there are clunky ways around this (for example, you could add ID attributes).

Similarly, HTML is not extensible. If a business needs to add new tags to accommodate its needs, that business will run into a wall with HTML. This is because HTML is a markup language, but is not a meta-markup language.

XML

The Extensible Markup Language (XML) is a universal standard for structuring content in electronic documents. XML is extensible, enabling businesses to add new structure to their documents as needed. The XML standard does not suffer the version control problems of other markup languages such as HTML because it has no predefined tags. Rather, with XML you define your own tags for your business needs. XML is a meta-markup language because you can define your own markup language which is self-describing. This makes XML the ideal document format for transferring business data electronically, and it has a wide variety of other applications as well.

Benefits of XML

From a business perspective, XML is compelling because it allows businesses to structure data in an elegant, extensible way. But XML has other benefits as well:

XML is simple and easy to use. The raw XML language does not contain specific tags for vertical markets. Learning to use XML is straightforward and does not require much ramp-up time.

XML is an open Internet standard. The World Wide Web Consortium (W3C) recommended the XML 1.0 standard in February 1998. No single commercial company controls the standard, which means that everyone’s interests are taken into account.

XML is human-readable. An XML document can be stored as a simple text file, yet it can represent complex business data. If you want to inspect or modify an XML document, you can simply edit the text file. This is a huge benefit over binary data formats that cannot be easily viewed or modified, such as serialized Java objects (see Appendix A for more on Java object serialization).


XML compresses very well. Because an XML document can be stored as a flat text file, it gains the advantage of very high compression rates. This makes XML well suited for massive document storage, and it also makes XML useful as an on-the-wire data format.

XML has massive industry support behind it. Microsoft, IBM, Sun Microsystems, Oracle Corporation, webMethods, SAP, and many others are jumping on the XML bandwagon.

XML has great tools available. There are already numerous XML tools and other XML applications available for download or purchase. These include XML viewers, high-performance XML parsers, XML JavaBean toolkits, XML-based databases, XML browsers, XML search engines, XML file utilities, and much more. See the book’s accompanying Web site for links to XML resources.

XML is the basis for other standards. Already there are companies using XML as a foundation for standards in other technologies and industries. For example, webMethods has defined an interface definition language for the Web using XML. Sun Microsystems has used XML within its EJB and JSP specifications. By learning XML, you will be prepared to understand these new topics as well.

XML brings new power to content searches. Once you add structure to your data using XML, it is quite straightforward to search your documents for specific information. For example, let’s say you’re building a repository of historical information. Using XML, you can specify that the string George Washington represents a United States president. Once you’ve built up your historical information repository, you can search that repository for all documents that contain information about United States presidents. Note that this is unlikely to happen on a large scale (such as searching the Internet for XML tags) because of schema differences between companies.

XML is self-describing. An XML document can contain all the information needed for a program to interpret it. This makes XML highly useful for communicating data between applications because an application can discover information about a document at runtime, without preconceived knowledge of the document’s format.

XML uses Unicode, rather than ASCII. This makes XML highly suitable for international electronic commerce.

XML allows for the use of URLs. This makes XML ideal for Internet usage (SGML does not support URLs).
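To make the compression benefit listed above a little more concrete, here is a minimal Java sketch that is not part of the original text. It gzips a repetitive XML document with the JDK’s java.util.zip classes and compares the sizes. The class name XmlCompressionDemo, the generated sample library, and the choice of gzip are all assumptions made for illustration; real ratios depend on the document and on the compressor you use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: compares raw and gzipped sizes of an XML string.
final class XmlCompressionDemo {

    // Builds a repetitive sample document; the element names mirror a simple library.
    static String sampleLibrary(int books) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?><library>");
        for (int i = 0; i < books; i++) {
            sb.append("<book isbn=\"").append(1000000000L + i).append("\">")
              .append("<title>Title ").append(i).append("</title>")
              .append("<pages>").append(100 + i).append("</pages>")
              .append("</book>");
        }
        return sb.append("</library>").toString();
    }

    // Size of the document as UTF-8 bytes, before compression.
    static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the same bytes after gzip compression.
    static int gzippedSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("in-memory compression failed", e);
        }
        // The stream is closed here, so the buffer holds the complete gzip output.
        return buffer.size();
    }

    public static void main(String[] args) {
        String doc = sampleLibrary(200);
        System.out.println("raw bytes:     " + rawSize(doc));
        System.out.println("gzipped bytes: " + gzippedSize(doc));
    }
}
```

The repeated tag names are what a general-purpose compressor exploits, which is why tag-heavy documents tend to shrink well.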

XML Compared to EDI

While EDI is a useful format for structuring business data, it is also a fixed format. EDI does not have the flexibility that XML offers because it does not let you define rules for your business data. XML is a language that can be used to define message formats, whereas EDI defines a set of message formats that are used to conduct specific business-to-business transactions. Just as HTML has limited success as a Web markup language because it isn’t extensible, EDI has limited success in conducting business-to-business transactions because the predefined transactions aren’t extensible. XML’s extensibility is its big win over EDI.

It should also be noted that endeavors are underway to unite XML and EDI. For instance, The XML/EDI Group is working on XML/EDI, a standard that allows XML to express EDI, and also allows for EDI to be transported across the Internet rather than through traditional VANs. This opens up new potential for EDI, as XML brings widespread industry support with it. See the book’s accompanying Web site for links to XML/EDI resources.

XML Compared to SGML

XML is an application profile of SGML, meaning XML is a subset of SGML. The advantage that XML has over SGML is simplicity: it will not take you long at all to understand how XML works, yet XML is powerful enough to format any business’s data. XML packages the most important aspects of SGML into an easy-to-use document format that you can use to format data transferred over the Internet, using conventional Internet protocols such as HTTP.

XML Compared to HTML

HTML is also an application of SGML. Whereas HTML serves as a markup language that defines static tags such as <B> and <I>, XML is a meta-markup language that you can use to define your own markup language. You can invent your own tags that represent business data in XML, and you can use the tags to represent semantic information about your business data. The power that XML has over HTML is that XML documents can contain tags that convey business semantics, and not just formatting semantics.

XML Concepts

Now that you’ve seen the XML value proposition, let’s take a quick technical tour of XML concepts. The best way to learn XML is by example, and so that is how we will begin. Source C.1 shows a sample XML document.

Let’s dissect this document and reveal how XML works.


<?xml version="1.0"?>

<library>

  <book isbn="0451524934">
    <title>1984</title>
    <author>George Orwell</author>
    <pages>268</pages>
    <softcover/>
  </book>

  <book isbn="0201634465">
    <title>Essential COM</title>
    <author>Don Box</author>
    <pages>440</pages>
    <description>
      Microsoft COM explained for developers.
    </description>
    <softcover/>
  </book>

  <book isbn="0316769487">
    <title>The Catcher in The Rye</title>
    <author>J. D. Salinger</author>
    <pages>214</pages>
    <hardcover/>
  </book>

</library>

Source C.1 An XML document.

Prolog

Every XML document begins with a prolog, or a header statement introducing the document. The prolog in our example above is:

<?xml version="1.0"?>

This identifies that our document uses version 1.0 of XML (the only version of XML at the time of writing). You can put other things in the prolog as well, such as the document’s text encoding or whether the document is a stand-alone document that does not depend on external markup declarations.
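As a rough illustration of what a program can read back from the prolog, the following Java sketch parses a document whose prolog also declares an encoding and the stand-alone flag. It is not from the original text: the class name PrologInspector and the sample document are assumptions, and the parser used is the JAXP DOM parser bundled with current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: reads the values declared in an XML prolog.
final class PrologInspector {

    // Sample document invented for this sketch; only the prolog matters here.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><library/>";

    // Parses an XML string with the JDK's DOM parser, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // The XML version named in the prolog.
    static String version(String xml) {
        return parse(xml).getXmlVersion();
    }

    // True only when the prolog says standalone="yes".
    static boolean standalone(String xml) {
        return parse(xml).getXmlStandalone();
    }

    public static void main(String[] args) {
        Document doc = parse(DOC);
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());
    }
}
```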


XML Elements

An XML element is the basic building block for defining structured data. The following is an example of an XML element:

<title>Essential COM</title>

A typical XML element begins with a starting tag (such as <title>), has some data (such as Essential COM), and is followed by an ending tag (such as </title>).

XML elements are useful for structuring your document content. For instance, consider the following flat text file snippet:

0201634465

Essential COM

Don Box

440

Microsoft COM explained for developers.

There is no way for a computer program to discern what’s what in this document. What does 440 stand for? Is it the number of pages, or is it an area code for a phone number? There is no way for a computer program to know this. However, if you mark up the text with XML:

<book isbn="0201634465">
  <title>Essential COM</title>
  <author>Don Box</author>
  <pages>440</pages>
  <description>
    Microsoft COM explained for developers.
  </description>
  <softcover/>
</book>

Suddenly there is a wealth of knowledge that a computer program can discern from the document. A computer program can read this document in, and then you can query the program for the title, the author, or the number of pages because the computer program can parse that information from the document. This is why it’s important to mark up your documents’ text into elements.
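The kind of query described above might look like the following Java sketch, which reads the title, author, and page count out of the marked-up book. It is only an illustration under stated assumptions: the class name BookReader and its helper methods are hypothetical, and the sketch relies on the JAXP DOM parser that ships with current JDKs rather than on any parser discussed in this appendix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: answers simple questions about a <book> element.
final class BookReader {

    // The marked-up book from the text, written as a Java string.
    static final String BOOK =
        "<book isbn=\"0201634465\">"
        + "<title>Essential COM</title>"
        + "<author>Don Box</author>"
        + "<pages>440</pages>"
        + "<description>Microsoft COM explained for developers.</description>"
        + "<softcover/>"
        + "</book>";

    // Parses an XML string with the JDK's DOM parser, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Text of the first element with the given tag name.
    static String textOf(String xml, String tagName) {
        return parse(xml).getElementsByTagName(tagName).item(0).getTextContent().trim();
    }

    static String title(String xml)  { return textOf(xml, "title"); }
    static String author(String xml) { return textOf(xml, "author"); }

    // Because the element is named <pages>, the program knows 440 is a page count.
    static int pages(String xml)     { return Integer.parseInt(textOf(xml, "pages")); }

    public static void main(String[] args) {
        System.out.println(title(BOOK) + " by " + author(BOOK) + ", " + pages(BOOK) + " pages");
    }
}
```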

The more fine-grained your XML elements are, the finer your document searches and queries can be. For example, if you write an XML financial news article, you could mark up your news article with tags such as <stockPriceIncrease> and <marketCapitalization>. You could then search all news articles for stock price increases that were greater than 50 percent or search for all news articles about companies with a market capitalization over $30 billion.
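One way such a search could be expressed is with an XPath query, as in the Java sketch below. Everything beyond the <stockPriceIncrease> and <marketCapitalization> tag names is an assumption made for illustration: the <news>, <article>, and <headline> elements, the sample figures, and the class name NewsSearch are invented, and XPath itself is not covered in this appendix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class: searches marked-up news articles by stock price increase.
final class NewsSearch {

    // Invented sample data; the structure around the two tags from the text is assumed.
    static final String NEWS =
        "<news>"
        + "<article><headline>Acme soars</headline>"
        + "<stockPriceIncrease>62.5</stockPriceIncrease>"
        + "<marketCapitalization>45000000000</marketCapitalization></article>"
        + "<article><headline>Globex edges up</headline>"
        + "<stockPriceIncrease>3.1</stockPriceIncrease>"
        + "<marketCapitalization>12000000000</marketCapitalization></article>"
        + "</news>";

    // Returns the headlines of articles whose stock price increase exceeds the given percent.
    static List<String> headlinesWithIncreaseAbove(String xml, int percent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The predicate compares the element's text, converted to a number, with the threshold.
            String query = "/news/article[stockPriceIncrease > " + percent + "]/headline";
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
            List<String> headlines = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                headlines.add(hits.item(i).getTextContent());
            }
            return headlines;
        } catch (Exception e) {
            throw new IllegalStateException("search failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(headlinesWithIncreaseAbove(NEWS, 50));
    }
}
```

A search on market capitalization would follow the same pattern with a different predicate.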


Attributes

An element can have attributes associated with it that provide extra information with the element. For example, consider the following XML snippet:

<book isbn="0316769487">
  <title>The Catcher in The Rye</title>
  <author>J. D. Salinger</author>
  <pages>214</pages>
  <hardcover/>
</book>

The book element has one attribute whose name is isbn and whose value is 0316769487.

HTML programmers will recognize attributes right away, as HTML uses attributes extensively for setting GUI tag parameters. For instance, consider the following HTML snippet:

<a href="http://www.amazon.com">

Click here for Amazon.com

</a>

This is an HTML anchor tag that links one document to another. It is syntactically almost identical to our book snippet above. The HTML document, though, makes sense only when rendered in a graphical Web browser because the text refers to clicking on a link. The XML document, on the other hand, describes semantic information about the document—specifically, the book’s ISBN—and it can be queried at a later date.

You can also have XML elements that are empty and that stand alone. For example, the <hardcover/> element in the previous XML snippet is an empty element. Empty elements are single tags, rather than a pair of beginning and ending tags. They are useful for conveying element structure without text or subelements. Empty elements need to look like <hardcover/> instead of <hardcover> to keep things unambiguous.
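To show how a program might use both ideas, the Java sketch below reads the isbn attribute and checks which empty element is present. It is an illustration rather than code from the original text: the class name BookAttributes and its methods are hypothetical, and it uses the JAXP DOM parser bundled with current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical demo class: reads an attribute and detects empty elements.
final class BookAttributes {

    // The book snippet from the text, written as a Java string.
    static final String BOOK =
        "<book isbn=\"0316769487\">"
        + "<title>The Catcher in The Rye</title>"
        + "<author>J. D. Salinger</author>"
        + "<pages>214</pages>"
        + "<hardcover/>"
        + "</book>";

    // Parses the XML and returns its outermost element, wrapping checked exceptions.
    static Element bookElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Value of the isbn attribute on the book element.
    static String isbn(String xml) {
        return bookElement(xml).getAttribute("isbn");
    }

    // An empty element carries no text; its presence alone is the information.
    static String binding(String xml) {
        Element book = bookElement(xml);
        if (book.getElementsByTagName("hardcover").getLength() > 0) return "hardcover";
        if (book.getElementsByTagName("softcover").getLength() > 0) return "softcover";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(isbn(BOOK) + " (" + binding(BOOK) + ")");
    }
}
```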

The Root Element

A special type of element in a document is the document’s root element. The root element is the outermost element, whose tags wrap the entire document. In our library example, the <library> and </library> tags denote the root element.

A root element is a requirement in any XML document because it demarcates the beginning and ending of each document. If there were no root, it would be impossible to know whether you’ve reached the end of a document. Remember that XML documents can be streamed over a slow network, rather than simply read in from files. Without an ending root element tag, a program receiving an XML document would never know whether it has received the entire document, and it would never know if it should close the network connection.
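The single-root requirement can be observed with a parser, as in the Java sketch below. It is a sketch under assumptions, not code from the original text: the class name RootElementCheck is hypothetical, and the behavior shown is that of the JAXP DOM parser bundled with current JDKs, which rejects a document containing two top-level elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: finds the root element and rejects multi-root input.
final class RootElementCheck {

    // Parses an XML string; any parse problem surfaces as an exception.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Name of the outermost element of the document.
    static String rootName(String xml) {
        try {
            return parse(xml).getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // True if the parser accepts the text as a complete XML document.
    static boolean parses(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<library><book/></library>"));
        System.out.println(parses("<library/><library/>"));
    }
}
```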

XML Entities

An XML entity is a nickname for something else. XML entities allow you to type a brief keyword in your XML document, which when parsed results in something different. The following code illustrates this.

five is &lt; ten.

When the above line is parsed (by an XML parser that we’ll explain in a bit), the resulting text is:

five is < ten.

The reason you need to use the &lt; entity is that the less-than sign, <, is used to begin tags, such as <library>. It would be ambiguous if you could use < in your text because programs that read in your XML documents would not be able to determine what was a tag and what was regular content.
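The substitution can be seen by parsing a small element that contains the entity. In the Java sketch below, the <note> element and the class name EntityDemo are invented for illustration, and the parser is the JAXP DOM parser bundled with current JDKs rather than one introduced by this appendix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class: shows the text a parser produces for an entity reference.
final class EntityDemo {

    // Returns the text content of the document's root element after parsing.
    static String parsedText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The source text uses the entity; the parsed text contains the real character.
        System.out.println(parsedText("<note>five is &lt; ten.</note>"));
    }
}
```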

CDATA Sections

A CDATA section is a portion of your document that should be interpreted literally (similar to the HTML <PRE> tag). For example, consider the following XML:

five is < ten.

The above is not legal XML because it contains a less-than sign that the XML parser will interpret as the start of a tag. The following code is legal:

<![CDATA[

five is < ten.

]]>

It’s legal because we’ve declared that the text inside the CDATA section should be interpreted literally and should not be scanned for tags. CDATA sections are useful for long text sections with lots of symbols such as < and & that a program would normally interpret as markup. You cannot apply styling or format to CDATA sections.
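The difference between the two forms can be checked with a parser. The Java sketch below wraps each form in a <note> element, which is an assumption made so that the text has a root element; the class name CdataDemo is also hypothetical, and the parser is the JAXP DOM parser bundled with current JDKs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: contrasts a bare less-than sign with a CDATA section.
final class CdataDemo {

    // Parses an XML string; any parse problem surfaces as an exception.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // True if the parser accepts the text as XML.
    static boolean parses(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Text content of the root element, including text held in CDATA sections.
    static String parsedText(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<note>five is < ten.</note>"));
        System.out.println(parsedText("<note><![CDATA[five is < ten.]]></note>"));
    }
}
```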

Well-Formed Documents

An XML document is well-formed if it meets the well-formed criteria of the XML specification. Well-formed documents follow the syntactic rules of XML, such
