Chapter 4 XML Processing
because such processing is potentially very CPU, memory, and input/output or
network intensive.
Why does XML document processing potentially impact performance so
significantly? Recall that processing an incoming XML document consists of
multiple steps, including parsing the document; optionally validating the document
against a schema (this implies first parsing the schema); recognizing, extracting,
and directly processing element contents and attribute values; or optionally
mapping these components to other domain-specific objects for further
processing. These steps must occur before an application can apply its business logic to
the information retrieved from the XML document. Parsing an XML document
often requires a great deal of encoding and decoding of character sets, along with
string processing. Depending on the API that is used, recognition and extraction
of content may consist of walking a tree data structure, or it may consist of
intercepting events generated by the parser and then processing these events according
to some context. An application that uses XSLT to preprocess an XML document
adds more processing overhead before the real business logic work can take place.
When the DOM API is used, it creates a representation of the document (a DOM
tree) in memory. Large documents result in large DOM trees and corresponding
consumption of large amounts of memory. The XML data binding process has, to
some extent, the same memory consumption drawback. Many of these constraints
hold true when generating XML documents.
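
The two extraction styles described above, walking a tree and intercepting parser events, can be sketched with the standard JAXP APIs. This is only an illustrative sketch: the class name, the method names, and the sample order document are hypothetical and do not come from the text. Both methods collect the same attribute values, but the DOM version first builds the whole document tree in memory, while the SAX version handles each element as the parser reports it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractionStyles {
    // Hypothetical sample document, used only for illustration.
    static final String ORDER_XML =
        "<order id=\"42\"><item sku=\"A1\">Widget</item>"
        + "<item sku=\"B2\">Gadget</item></order>";

    // DOM: the whole document is parsed into an in-memory tree, then walked.
    static List<String> skusWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> skus = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            skus.add(((Element) items.item(i)).getAttribute("sku"));
        }
        return skus;
    }

    // SAX: no tree is built; the handler reacts to each start-element event.
    static List<String> skusWithSax(String xml) throws Exception {
        List<String> skus = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("item".equals(qName)) {
                    skus.add(attrs.getValue("sku"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return skus;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(skusWithDom(ORDER_XML));
        System.out.println(skusWithSax(ORDER_XML));
    }
}
```

For a document this small the difference is negligible. The memory cost of the DOM approach grows with document size, which is the drawback noted above.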
There are other factors with XML document processing that affect
performance. Often, the physical and logical structures of an XML document may be
different. An XML document may also contain references to external entities.
These references are resolved and substituted into the document content during
parsing, but prior to validation. Given that the document may originate on a
system different from the application's system, and external entities and even
the schema itself may be located on remote systems, there may be network
overhead affecting performance. To perform the parsing and validation, external
entities must first be loaded or downloaded to the processing system. This may be a
network intensive operation, or require a great deal of input and output operations,
when documents have a complex physical structure.
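
One way to see where this cost arises is to intercept entity resolution. The following is a minimal sketch, assuming the standard JAXP DocumentBuilder and the SAX EntityResolver interface. The class name, the sample document, the URL http://example.invalid/footer.txt, and the replacement text are all hypothetical. The resolver supplies the entity content from a local string, so the parser never has to fetch the remote resource.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalEntityResolution {
    // Hypothetical document whose physical structure includes an external
    // entity that lives on a remote system.
    static final String NOTE_XML =
        "<!DOCTYPE note [<!ENTITY footer SYSTEM "
        + "\"http://example.invalid/footer.txt\">]>"
        + "<note>&footer;</note>";

    // Records every system identifier the parser asked us to resolve.
    static final List<String> requested = new ArrayList<>();

    static String parseWithLocalEntities(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption: a locally held copy of the entity content is available.
        // Returning an InputSource with its own character stream means the
        // parser reads from it instead of opening the remote URL.
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            return new InputSource(new StringReader("cached footer text"));
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithLocalEntities(NOTE_XML));
        System.out.println(requested);
    }
}
```

Without such a resolver, the parser would attempt to load the entity from its system identifier during parsing, which is the network or input/output cost described above.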
In summary, XML processing is potentially CPU, memory, and network
intensive, for these reasons:
It may be CPU intensive. Incoming XML documents need to be not only
parsed but also validated, and they may have to be processed using APIs that
may themselves be CPU intensive. It is important to limit the cost of validation