
Pro ASP.NET 2.0 In CSharp 2005 (2005) [eng]
428 C H A P T E R 1 2 ■ X M L
But if anything, XML is much more useful today than it ever was before. The benefits of using XML in a modern application include the following:
•Adoption: XML is ubiquitous. Many companies are using XML to store data or are actively considering it. Whenever data needs to be shared, XML is automatically the first (and often the only) choice that’s examined.
•Extensibility and flexibility: XML imposes no rules about data semantics and does not tie companies into proprietary networks, unlike EDI (Electronic Data Interchange). As a result, XML can fit any type of data and is cheaper to implement.
•Related standards and tools: Another reason for XML’s success is the tools (such as parsers) and the surrounding standards (such as XML Schema, XPath, and XSLT) that help in creating and processing XML documents. As a result, programmers in nearly any language have ready-made components for reading XML, verifying that XML is valid, verifying XML against a set of rules (known as a schema), searching XML, and transforming one format of XML into another.
XML acts like the glue that allows different systems to work together. It helps standardize business processes and transactions between organizations. But XML is not just suited for data exchange between companies. Many programming tasks today are all about application integration: web applications integrate multiple web services, e-commerce sites integrate legacy inventory and pricing systems, and intranet applications integrate existing business applications. All these applications are held together by the exchange of XML documents.
Well-Formed XML
XML is a fairly strict standard. This strictness is designed to preserve broad compatibility. HTML, on the other hand, is much more lenient. As a result, it’s quite possible to create an HTML web page with errors that will be successfully rendered in one browser but interpreted differently in another. When it comes to storing business data, this type of error could cause catastrophic problems.
To prevent this sort of problem, all XML parsers perform a few basic quality checks. If an XML document does not meet these standards, it’s rejected outright. If the XML document does follow these rules, it’s deemed to be well formed. Well-formed XML isn’t necessarily correct XML—for example, it could still contain incorrect data—but an XML processor can parse it.
To be considered well formed, an XML document must meet these criteria:
•Every start tag must have an end tag.
•An empty element must end with />.
•Elements can never overlap. In other words, <person><firstName></firstName></person> is valid, but <person><firstName></person></firstName> is not.
•An element cannot have two attributes with the same name because there will be no way to distinguish them from each other. However, you can have elements with the same name in different places. For example, you can place a <name> element inside multiple <product> elements and a separate <customer> element.
•A document can have only one root element.
•All attributes must have quotes around the value.
•The document must not contain illegal characters.
•Comments and processing instructions can’t be placed inside tags.
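The criteria above can also be checked in code. The following is a minimal sketch, not a listing from this chapter: it assumes a hypothetical helper class named WellFormedCheck and relies on the fact that XmlDocument.LoadXml() throws an XmlException when the text is not well formed.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: returns true if the parser accepts the text
    // as well-formed XML, false if it throws an XmlException.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Properly nested elements.
        Console.WriteLine(IsWellFormed("<person><firstName></firstName></person>")); // True
        // Overlapping elements.
        Console.WriteLine(IsWellFormed("<person><firstName></person></firstName>")); // False
        // Two attributes with the same name on one element.
        Console.WriteLine(IsWellFormed("<product id=\"1\" id=\"2\" />"));            // False
        // More than one root element.
        Console.WriteLine(IsWellFormed("<a /><b />"));                               // False
    }
}
```

A check like this only confirms that the document can be parsed. It says nothing about whether the data inside is correct.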

■Tip To quickly test if an XML document is well formed, try opening it in Internet Explorer. If there is an error, Internet Explorer will report a message and flag the offending line.
XML Namespaces
As the XML standard gained ground, dozens of XML markup languages (often called XML grammars) were created, and many of them are specific to certain industries, processes, and types of information. In many cases, it becomes important to extend one type of markup with additional company-specific elements, or even create XML documents that combine several different XML grammars. This poses a problem. What happens if you need to combine two XML grammars that use elements with the same names? How do you tell them apart? A related, but more typical, problem occurs when an application needs to distinguish between XML grammars in a document. For example, consider an XML document that has order-specific information using a standard called OrderML and client-specific information using a standard called ClientML. This document is sent to an order-fulfillment application that’s interested only in the OrderML details. How can it quickly filter out the information that it needs and ignore the unrelated details?
The solution is the XML Namespaces standard. The core idea behind this standard is that every XML markup language has its own namespace that uniquely identifies all related elements. Technically, namespaces disambiguate elements by making it clear to which markup language they belong.
All XML namespaces use URIs (Uniform Resource Identifiers). Typically, these URIs look like a web-page URL. For example, http://www.mycompany.com/mystandard is a typical name for a namespace. Though the namespace looks like it points to a valid location on the Web, this isn’t required (and shouldn’t be assumed). URIs are used for XML namespaces because they are more likely to be unique. Usually, if you create a new XML language, you’ll use a URI that points to a domain or website you control. That way, you can be sure that no one else is likely to use that URI. In practice, parsers treat the namespace name as an opaque string and compare it character by character, so they generally won’t check that it is a resolvable URI.
■Tip Namespace names must match exactly. If you change the capitalization in part of a namespace, add a trailing / character, or modify any other detail, the XML parser will interpret it as a different namespace.
To specify that an element belongs to a specific namespace, you simply need to add the xmlns attribute to the start tag and indicate the namespace. For example, the element shown here is part of the http://mycompany/OrderML namespace. If you don’t take this step, the element will not be part of any namespace.
<order xmlns="http://mycompany/OrderML"></order>
It would be cumbersome if you needed to type in the full namespace URI every time you wrote an element in an XML document. Fortunately, when you assign a namespace in this fashion, it becomes the default namespace for all child elements. For example, in the XML document shown here, the <order> and <orderItem> elements are both placed in the http://mycompany/OrderML namespace:
<?xml version="1.0"?>
<order xmlns="http://mycompany/OrderML">
   <orderItem>...</orderItem>
   <orderItem>...</orderItem>
</order>

You can declare a new namespace for separate portions of the document. The easiest way to deal with this is to use namespace prefixes. Namespace prefixes are short character sequences that you can insert in front of a tag name to indicate its namespace. You define the prefix in the xmlns attribute by inserting a colon (:) followed by the characters you want to use for the prefix.
Here’s an order document that uses namespace prefixes to map different elements into two different namespaces:
<?xml version="1.0"?>
<ord:order xmlns:ord="http://mycompany/OrderML"
           xmlns:cli="http://mycompany/ClientML">
   <cli:client>
      <cli:firstName>...</cli:firstName>
      <cli:lastName>...</cli:lastName>
   </cli:client>
   <ord:orderItem>...</ord:orderItem>
   <ord:orderItem>...</ord:orderItem>
</ord:order>
Namespace prefixes are simply used to map an element to a namespace. The actual prefix you use isn’t important as long as it remains consistent.
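To illustrate how an application can pick out only the OrderML details, here is a minimal sketch. The class name NamespaceFilter and the sample document are assumptions made for illustration. The sketch uses the XmlDocument.GetElementsByTagName() overload that takes a local name and a namespace URI, so the match depends on the namespace rather than on the prefix the document happens to use.

```csharp
using System;
using System.Xml;

public static class NamespaceFilter
{
    // The namespace name used in the order example.
    public const string OrderNs = "http://mycompany/OrderML";

    // Hypothetical helper: counts <orderItem> elements that belong to the
    // OrderML namespace, ignoring elements from any other namespace.
    public static int CountOrderItems(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList items = doc.GetElementsByTagName("orderItem", OrderNs);
        return items.Count;
    }

    public static void Main()
    {
        // The prefixes here (o and c) differ from the ones in the text (ord and cli)
        // to show that only the namespace URI matters.
        string xml =
            "<o:order xmlns:o='http://mycompany/OrderML' xmlns:c='http://mycompany/ClientML'>" +
            "<c:client><c:firstName>...</c:firstName></c:client>" +
            "<o:orderItem>...</o:orderItem>" +
            "<o:orderItem>...</o:orderItem>" +
            "</o:order>";

        Console.WriteLine(CountOrderItems(xml)); // 2
    }
}
```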
XML Schemas
A good part of the success of the XML standard is due to its remarkable flexibility. Using XML, you can create exactly the markup language you need. This flexibility also raises a few problems. With developers around the world using your XML format, how do you ensure that everyone is following the rules?
The solution is to create a formal document that states the rules of your custom markup language, which is called a schema. These rules won’t include syntactical details (such as the requirement to use angle brackets or properly nest tags) because these requirements are already part of the basic XML standard. Instead, the schema document will list the logical rules that pertain to your type of data. They include the following:
•Document vocabulary: This determines what element and attribute names are used in your XML documents.
•Document structure: This determines where tags can be placed and can include rules specifying that certain tags must be placed before, after, or inside others. You can also specify how many times an element can occur.
•Supported data types: This allows you to specify whether data is ordinary text or must be able to be interpreted as numeric data, date information, and so on.
•Allowed data ranges: This allows you to set constraints that restrict numbers to certain ranges or that allow only specific values.
The XML Schema standard defines the rules you need to follow when creating a schema document. The following is an XML schema that defines the rules for the product catalog document shown earlier:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="productCatalog">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="catalogName" type="xsd:string"/>
            <xsd:element name="expiryDate" type="xsd:date"/>
            <xsd:element name="products">
               <xsd:complexType>
                  <xsd:sequence>
                     <xsd:element name="product" type="product" maxOccurs="unbounded" />
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <xsd:complexType name="product">
      <xsd:sequence>
         <xsd:element name="productName" type="xsd:string"/>
         <xsd:element name="productPrice" type="xsd:decimal"/>
         <xsd:element name="inStock" type="xsd:boolean"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:integer"/>
   </xsd:complexType>
</xsd:schema>
Every schema document is an XML document that begins with a root <schema> element. Inside the <schema> element are two types of definitions—the <element> element, which defines the structure the target document must follow, and one or more <complexType> elements, which define smaller data structures that are used to define the document structure.
The <element> tag is really the heart of the schema, and it’s the starting point for all validation. In this example, the <element> tag identifies that the product catalog must begin with a root element named <productCatalog>. Inside the <productCatalog> element is a sequence of three elements. The first, <catalogName>, contains ordinary text. The second, <expiryDate>, includes text that fits the rules for date representation, as set out in the schema standard. The final element, <products>, contains a list of <product> elements.
Each <product> element is a complex type, and the type is defined with the <complexType> element at the end of the document. This <product> complex type consists of a sequence of three elements with product information. The elements must store this information as text (<productName>), a decimal value (<productPrice>), and a Boolean value (<inStock>), respectively.
■Note A full discussion of XML Schema is beyond the scope of this book. However, if you want to learn more, you can consider the excellent online tutorials at http://www.w3schools.com/schema or the standard itself at http://www.w3.org/XML/Schema.
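To show what checking a document against a schema can look like in code, here is a minimal sketch. It is not a listing from this chapter. The class name SchemaCheck and the trimmed, single-element schema are assumptions made to keep the example short. The sketch uses XmlReaderSettings with ValidationType.Schema and counts the problems reported through the ValidationEventHandler event.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Hypothetical helper: validates the XML text against the schema text
    // and returns the number of validation errors reported.
    public static int CountErrors(string xsd, string xml)
    {
        int errors = 0;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // Passing null uses the target namespace declared in the schema (none here).
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { errors++; };

        // Validation happens as the reader moves through the document.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        // A trimmed schema for illustration only: one <product> with two children.
        string xsd =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
            "<xsd:element name='product'><xsd:complexType><xsd:sequence>" +
            "<xsd:element name='productName' type='xsd:string'/>" +
            "<xsd:element name='productPrice' type='xsd:decimal'/>" +
            "</xsd:sequence></xsd:complexType></xsd:element>" +
            "</xsd:schema>";

        string good = "<product><productName>Tea</productName><productPrice>3.50</productPrice></product>";
        string bad = "<product><productName>Tea</productName><productPrice>cheap</productPrice></product>";

        Console.WriteLine(CountErrors(xsd, good)); // 0
        Console.WriteLine(CountErrors(xsd, bad) > 0); // True: "cheap" is not a decimal
    }
}
```

Both sample documents are well formed. Only the schema catches the second one, because its <productPrice> value doesn't fit the declared data type.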
Writing and Reading XML Programmatically
The .NET Framework allows you to manipulate XML data with a set of classes in the System.Xml namespace (and other namespaces that begin with System.Xml). These types fully support the XML DOM (Document Object Model) Level 2 Core, as defined by the W3C. They also add classes and methods that make it easier to read and write XML documents; navigate through nodes, attributes, and elements; and query, transform, and manipulate XML data in various ways.

Writing XML Files
The .NET Framework provides two approaches for writing XML data to a file:
•You can build the document in memory using the XmlDocument class and write it to a file when you’re finished by calling the Save() method. The XmlDocument represents XML using a tree of node objects.
•You can write the document directly to a stream using the XmlTextWriter. This outputs data as you write it, node by node.
The XmlDocument is a good choice if you need to perform other operations on XML content after you create it, such as searching it, transforming it, or validating it. It’s also the only way to write an XML document in a nonlinear way, because it allows you to insert new nodes anywhere. However, the XmlTextWriter provides a much simpler and better performing model for writing directly to a file, because it doesn’t store the whole document in memory at once.
■Tip You can use both the XmlDocument and the XmlTextWriter to create XML data that isn’t stored in a file. Both of these classes allow you to write information to any stream, and the XmlDocument allows you to retrieve the raw XML as string data. Using techniques such as these, you could build an XML document and then insert it into another storage location such as a text-based field in a database table.
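For comparison with the XmlTextWriter walkthrough, here is a minimal sketch of the in-memory approach. The class name InMemoryXml and the comment text are assumptions made for illustration. The sketch builds a small tree with XmlDocument, inserts a node out of order to show the nonlinear editing described earlier, and retrieves the raw XML as a string through the OuterXml property.

```csharp
using System;
using System.Xml;

public static class InMemoryXml
{
    // Hypothetical helper: builds a tiny DVD list in memory and returns
    // the raw XML as a string instead of saving it to a file.
    public static string BuildDvdXml()
    {
        XmlDocument doc = new XmlDocument();

        XmlElement root = doc.CreateElement("DvdList");
        doc.AppendChild(root);

        XmlElement dvd = doc.CreateElement("DVD");
        dvd.SetAttribute("ID", "1");
        root.AppendChild(dvd);

        XmlElement title = doc.CreateElement("Title");
        title.InnerText = "The Matrix";
        dvd.AppendChild(title);

        // Nonlinear edit: add a comment in front of the root element
        // after the rest of the tree has already been built.
        doc.InsertBefore(doc.CreateComment("sample"), root);

        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(BuildDvdXml());
        // <!--sample--><DvdList><DVD ID="1"><Title>The Matrix</Title></DVD></DvdList>
    }
}
```

The returned string could then be stored in a text-based database field, as the tip suggests. Calling doc.Save() with a file path or stream would write the same content out instead.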
The next web-page example shows how to use the XmlTextWriter to create a well-formed XML file. The first step is to create a private WriteXML() method that will handle the job. It begins by creating an XmlTextWriter object and passing the physical path of the file you want to create as a constructor argument.
private void WriteXML()
{
    string xmlFile = Server.MapPath("DvdList.xml");
    XmlTextWriter writer = new XmlTextWriter(xmlFile, null);
    ...
The XmlTextWriter has properties such as Formatting and Indentation, which allow you to specify whether the XML data will be automatically indented with the typical hierarchical structure and to indicate the number of spaces to use as indentation. You can set these two properties as follows:
...
writer.Formatting = Formatting.Indented;
writer.Indentation = 3;
...
■Tip Remember, in a datacentric XML document, whitespace is almost always ignored. But by adding indentation, you create a file that is easier for a human to read and interpret, so it can’t hurt.
Now you’re ready to start writing the file. The WriteStartDocument() method writes the XML declaration with version 1.0 ( <?xml version="1.0"?> ), as follows:
writer.WriteStartDocument();

The WriteComment() method writes a comment. You can use it to add a message with the date and time of creation:
writer.WriteComment("Created @ " + DateTime.Now.ToString());
Next, you need to write the real content—the elements, attributes, and so on. This example builds an XML document that represents a DVD list, with information such as the title, the director, the price, and a list of actors for each DVD. These records will be child elements of a parent <DvdList> element, which must be created first:
writer.WriteStartElement("DvdList");
Now you can create the child nodes. The following code opens a new <DVD> element:
writer.WriteStartElement("DVD");
Now the code writes two attributes, representing the ID and the related category. This information is added to the start tag of the <DVD> element.
...
writer.WriteAttributeString("ID", "1");
writer.WriteAttributeString("Category", "Science Fiction");
...
The next step is to add the elements with the information about the DVD inside the <DVD> element. These elements won’t have child elements of their own, so you can write them and set their values more efficiently with a single call to the WriteElementString() method. WriteElementString() accepts two arguments: the element name and its value (always as string), as shown here:
...
// Write some simple elements.
writer.WriteElementString("Title", "The Matrix");
writer.WriteElementString("Director", "Larry Wachowski");
writer.WriteElementString("Price", "18.74");
...
Next is a child <Starring> element that lists one or more actors. Because this element contains other elements, you need to open it and keep it open with the WriteStartElement() method. Then you can add the contained child elements, as shown here:
...
writer.WriteStartElement("Starring");
writer.WriteElementString("Star", "Keanu Reeves");
writer.WriteElementString("Star", "Laurence Fishburne");
...
At this point the code has written all the data for the current DVD. The next step is to close all the opened tags, in reverse order. To do so, you just call the WriteEndElement() method once for each element you’ve opened. You don’t need to specify the element name when you call WriteEndElement(). Instead, each time you call WriteEndElement() it will automatically write the closing tag for the last opened element.
...
// Close the <Starring> element.
writer.WriteEndElement();
// Close the <DVD> element.
writer.WriteEndElement();
...
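The steps so far can be put together in one self-contained sketch. Several details here are assumptions rather than part of the listing above: the class name DvdWriterSketch, the use of a StringWriter instead of a file (so the example runs outside a web page), and the final WriteEndDocument() and Close() calls that finish the document and release the writer.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DvdWriterSketch
{
    // Hypothetical helper: writes a one-DVD list to a string, following
    // the same sequence of XmlTextWriter calls described in the text.
    public static string WriteDvdList()
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.Formatting = Formatting.Indented;
        writer.Indentation = 3;

        writer.WriteStartDocument();
        writer.WriteComment("Created @ " + DateTime.Now.ToString());

        writer.WriteStartElement("DvdList");
        writer.WriteStartElement("DVD");
        writer.WriteAttributeString("ID", "1");
        writer.WriteAttributeString("Category", "Science Fiction");
        writer.WriteElementString("Title", "The Matrix");
        writer.WriteElementString("Price", "18.74");

        writer.WriteStartElement("Starring");
        writer.WriteElementString("Star", "Keanu Reeves");
        writer.WriteElementString("Star", "Laurence Fishburne");
        writer.WriteEndElement(); // closes <Starring>

        writer.WriteEndElement(); // closes <DVD>
        writer.WriteEndElement(); // closes <DvdList>

        // Assumed final steps: finish the document and release the writer.
        writer.WriteEndDocument();
        writer.Close();

        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteDvdList());
    }
}
```

Each WriteEndElement() call closes whichever element was opened most recently, so the order of the three calls mirrors the nesting of <Starring>, <DVD>, and <DvdList>.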


■Note Keep in mind that when you use the XmlTextWriter to create an XML file, you face all the limitations that you face when writing any other type of file in a web application. In other words, you need to take safeguards (such as generating unique filenames) to ensure that two different clients don’t run the same code and try to write the same file at once. Chapter 13 has more information about file access and dealing with these types of problems.
Reading XML Files
The following are ways to read and navigate the content of an XML file:
•Using XmlDocument: You can load the document using the XmlDocument class mentioned earlier. This holds all the XML data in memory once you call Load() to retrieve it from a file or stream. It also allows you to modify that data and save it back to the file later. The XmlDocument class implements the full XML DOM.
•Using XPathNavigator: You can load the document into an XPathNavigator (which is located in the System.Xml.XPath namespace). Like the XmlDocument, the XPathNavigator holds the entire XML document in memory. However, it offers a slightly faster, more streamlined model than the XML DOM, along with enhanced searching features. Unlike the XmlDocument, it doesn’t provide the ability to make changes and save them.
•Using XmlTextReader: You can read the document one node at a time using the XmlTextReader class. This is the least expensive approach in terms of server resources, but it forces you to examine the data sequentially from start to finish.
The following sections demonstrate each of these approaches to loading the DVD list XML document.
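As a quick preview of the sequential approach, here is a minimal sketch. The class name SequentialRead and the sample data are assumptions made for illustration. The sketch moves through the document one node at a time with XmlTextReader.Read() and collects the text of each <Title> element without ever holding the whole document in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SequentialRead
{
    // Hypothetical helper: returns the text of every <Title> element,
    // reading the XML forward-only from start to finish.
    public static List<string> ReadTitles(string xml)
    {
        List<string> titles = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Title")
                {
                    // ReadString() returns the element's text and leaves the
                    // reader on the end tag, so the next Read() skips nothing.
                    titles.Add(reader.ReadString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return titles;
    }

    public static void Main()
    {
        string xml =
            "<DvdList>" +
            "<DVD ID='1'><Title>The Matrix</Title><Price>18.74</Price></DVD>" +
            "<DVD ID='2'><Title>Second Sample</Title></DVD>" +
            "</DvdList>";

        foreach (string title in ReadTitles(xml))
        {
            Console.WriteLine(title);
        }
    }
}
```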
Using the XML DOM
Figure 12-2 shows the final web page that reads the DvdList.xml document and displays a list of elements, using different levels of indenting to show the overall structure.
The XmlDocument stores information as a tree of nodes. A node is the basic ingredient of an XML file and can be an element, an attribute, a comment, or a value in an element. A separate XmlNode object represents each node, and nodes are grouped together in collections.
You can retrieve the first level of nodes through the XmlDocument.ChildNodes property. In this example, that property provides access to the <DvdList> element. The <DvdList> element contains other child nodes, and these nodes contain still more nodes and the actual values. To drill down through all the layers of the tree, you need to use recursive logic, as shown in this example.

Figure 12-2. Retrieving information from an XML document
When the example page loads, it creates an XmlDocument object and calls the Load() method, which retrieves the XML data from the file. It then calls a recursive function in the page class named GetChildNodesDescr(). GetChildNodesDescr() takes an XmlNodeList object as an input and the index of the nesting level. It then returns the string with the content for that node and all its child nodes and attributes.
private void Page_Load(object sender, System.EventArgs e)
{
    string xmlFile = Server.MapPath("DvdList.xml");

    // Load the XML file in an XmlDocument.
    XmlDocument doc = new XmlDocument();
    doc.Load(xmlFile);

    // Write the description text.
    XmlText.Text = GetChildNodesDescr(doc.ChildNodes, 0);
}

THE XMLDOCUMENT AND USER CONCURRENCY
In a web application, it’s extremely important to pay close attention to how your code accesses the file system. If you aren’t careful, a web page that reads data from a file can become a disaster under heavy user loads. The problem occurs when two users access a file at the same time. If the first user hasn’t taken care to open a shareable stream, the second user will receive an error.
You’ll learn more about these issues in Chapter 13. However, all of this raises an excellent question—how does the XmlDocument.Load() method open a file? To find the answer, you need to dig into the IL code of the .NET Framework. What you’ll find is that several steps actually unfold to load an XML document into an XmlDocument object. First, the path you supply is examined by an XmlUrlResolver and passed to an XmlDownloadResolver, which determines whether it needs to make a web request (if you’ve supplied a URL) or can open a FileStream (if you’ve supplied a path). If it can use the FileStream, it explicitly opens the FileStream with shareable reads enabled. As a result, if more than one user loads the file with the XmlDocument.Load() method at once on different threads, no conflict will occur. Of course, the best approach is to reduce contention by using caching (see Chapter 11).
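If you open the file yourself rather than passing a path to Load(), you can request the same shareable read access explicitly. The following is a minimal sketch, not the framework's internal code: the class name SharedRead and the temporary file are assumptions made for illustration. It opens a FileStream with FileShare.Read and passes the stream to XmlDocument.Load().

```csharp
using System;
using System.IO;
using System.Xml;

public static class SharedRead
{
    // Hypothetical helper: loads an XML file through a stream that
    // allows other readers to open the same file at the same time.
    public static XmlDocument LoadShared(string path)
    {
        using (FileStream stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(stream);
            return doc;
        }
    }

    public static void Main()
    {
        // Create a throwaway file so the example is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<DvdList><DVD ID=\"1\" /></DvdList>");

        // Keep one reader open while a second load runs against the same file.
        using (FileStream first = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            XmlDocument doc = LoadShared(path);
            Console.WriteLine(doc.DocumentElement.Name); // DvdList
        }

        File.Delete(path);
    }
}
```

Shareable reads only help when every user is reading. A page that writes to the file at the same time still needs the safeguards discussed in Chapter 13.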
When the Page.Load event handler calls GetChildNodesDescr(), it passes an XmlNodeList object that represents the first level of nodes. (The XmlNodeList contains a collection of XmlNode objects, one for each node.) The code also passes 0 as the second argument of GetChildNodesDescr() to indicate that this is the first level of the structure. The string returned by the GetChildNodesDescr() method is then shown on the page using a Literal control.
■Tip What if you want to create an XmlDocument and fill it based on XML content you’ve drawn from another source, such as a field in a database table? In this case, instead of using the Load() method, you would use LoadXml(), which accepts a string that contains the content of the XML document.
The interesting part is the GetChildNodesDescr() method. It first creates a string with three spaces for each indentation level that it will later use as a prefix for each line added to the final HTML text.
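private string GetChildNodesDescr(XmlNodeList nodeList, int level)
{
    // Build the prefix: three nonbreaking spaces per nesting level,
    // because a browser collapses ordinary spaces in HTML output.
    string indent = "";
    for (int i=0; i<level; i++)
        indent += "&nbsp;&nbsp;&nbsp;";
    ...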
Next, the GetChildNodesDescr() method cycles through all the child nodes of the XmlNodeList. For the first call, these nodes include the XML declaration, the comment, and the <DvdList> element. An XmlNode object exposes properties such as NodeType, which identifies the type of item (for example, Comment, Element, Attribute, CDATA, Text, or EndElement), as well as Name and Value. The code checks for node types that are relevant in this example and adds that information to the string, as shown here:
...
StringBuilder str = new StringBuilder("");
foreach (XmlNode node in nodeList)
{
    switch(node.NodeType)
    {
        case XmlNodeType.XmlDeclaration:
            str.Append("XML Declaration: <b>");