Performance Considerations
as much as possible without jeopardizing the application processing and to use
the most appropriate API to process the document.
It may be memory intensive. XML processing may require creating large numbers
of objects, especially when dealing with document object models.
It may be network intensive. A document may be the aggregation of different
external entities that, during parsing, may need to be retrieved across the
network. It is important to reduce as much as possible the cost of referencing
external entities.
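One common way to cut the network cost of external entities is to register an EntityResolver with the parser. The resolver serves known entities, such as a DTD, from local copies instead of fetching them remotely. The sketch below assumes the local copies are held in memory and keyed by system ID. The class name LocalEntityResolver, its register and localHits methods, and the example URL are hypothetical. EntityResolver.resolveEntity and InputSource are the standard SAX API.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical resolver that serves external entities (for example, DTDs)
 * from locally held copies so the parser does not fetch them over the network.
 * Unknown system IDs fall through to the parser's default retrieval.
 */
public class LocalEntityResolver implements EntityResolver {
    // System ID -> locally cached entity content (an assumed in-memory cache).
    private final Map<String, String> localCopies = new HashMap<>();
    // Number of entities served from the local cache.
    private int localHits = 0;

    public void register(String systemId, String content) {
        localCopies.put(systemId, content);
    }

    public int localHits() {
        return localHits;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = localCopies.get(systemId);
        if (content == null) {
            // Returning null tells the parser to open the system ID itself,
            // which may mean a network round trip.
            return null;
        }
        localHits++;
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

The resolver is installed with XMLReader.setEntityResolver or DocumentBuilder.setEntityResolver before parsing. A production version would more likely load the copies from the application's own resources than from strings.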
Following are some guidelines for improving performance when processing
XML documents. In particular, these guidelines examine ways of reducing CPU,
memory, and input/output or network consumption.
4.5.1 Limit Parsing of Incoming XML Documents
In general, it is best to parse incoming XML documents only when the request has
been properly formulated. In the case of a Web service application, if a document is
retrieved as a Source parameter from a request to an endpoint method, it is best to
first enforce security and validate any meta information that may have been passed
as additional parameters with the request.
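The following is a minimal sketch of that ordering in an endpoint method. It assumes two hypothetical meta-information parameters, a sender ID checked against a set of authorized senders and a schema version string. The names OrderEndpoint and submitOrder, and the parameters themselves, are illustrative and not a real endpoint contract. The Source is parsed, here through a standard JAXP identity transform into a DOMResult, only after both checks pass.

```java
import java.util.Set;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;

/** Hypothetical endpoint that checks cheap meta information before parsing. */
public class OrderEndpoint {
    // Hypothetical set of senders allowed to submit documents.
    private final Set<String> authorizedSenders;

    public OrderEndpoint(Set<String> authorizedSenders) {
        this.authorizedSenders = authorizedSenders;
    }

    public Document submitOrder(Source order, String senderId, String schemaVersion)
            throws TransformerException {
        // 1. Enforce security first, using only the lightweight parameters.
        if (senderId == null || !authorizedSenders.contains(senderId)) {
            throw new SecurityException("Unauthorized sender: " + senderId);
        }
        // 2. Validate the remaining meta information (assumed version check).
        if (!"1.0".equals(schemaVersion)) {
            throw new IllegalArgumentException("Unsupported schema version: " + schemaVersion);
        }
        // 3. Only now pay for parsing: an identity transform builds a DOM tree.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(order, result);
        return (Document) result.getNode();
    }
}
```

A rejected request never touches the Source. An unauthorized sender is therefore turned away even when the document body is malformed or very large.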
In a more generic messaging scenario, when a document is wrapped inside
another document (considered an envelope), and the envelope contains meta
information about security and how to process the inner document, you may apply
the same recommendation: Extract the meta information from the envelope, then
enforce security and validate the meta information before proceeding with the
parsing of the inner document. When implementing a SAX handler, and assuming
that the meta information is located at the beginning of the document, the handler
can throw a SAX exception as soon as either the security check or the validation of
the meta information fails. This immediately aborts the processing and minimizes
the overall impact on performance.
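Here is a minimal sketch of such a handler. It assumes a hypothetical envelope layout, `<envelope><header><sender>…</sender></header><body>…</body></envelope>`, and a set of authorized sender names. Those element names and the class name EnvelopeHandler are illustrative only. The handler validates the sender when the header closes and throws a SAXException if the check fails. The parser then stops before any element of the body is processed.

```java
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that validates envelope meta information first. */
public class EnvelopeHandler extends DefaultHandler {
    private final Set<String> authorizedSenders;
    private final StringBuilder text = new StringBuilder();
    private String sender;
    private boolean headerValidated;
    private int bodyDepth;        // > 0 while inside <body>
    private int payloadElements;  // elements of the inner document processed

    public EnvelopeHandler(Set<String> authorizedSenders) {
        this.authorizedSenders = authorizedSenders;
    }

    public int payloadElements() {
        return payloadElements;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        if (bodyDepth > 0) {
            bodyDepth++;
            payloadElements++; // application processing of the inner document goes here
            return;
        }
        if ("body".equals(qName)) {
            if (!headerValidated) {
                throw new SAXException("Body reached before the header was validated");
            }
            bodyDepth = 1;
            return;
        }
        text.setLength(0); // start collecting text for a header element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bodyDepth == 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (bodyDepth > 0) {
            bodyDepth--; // closing </body> brings the depth back to 0
            return;
        }
        if ("sender".equals(qName)) {
            sender = text.toString().trim();
        } else if ("header".equals(qName)) {
            if (sender == null || !authorizedSenders.contains(sender)) {
                // Abort right away; the parser stops and the body is never read.
                throw new SAXException("Unauthorized sender: " + sender);
            }
            headerValidated = true;
        }
    }
}
```

The handler compares qName values, so it assumes a parser that is not namespace-aware, which is the SAXParserFactory default. A namespace-aware version would compare the namespace URI and local name instead.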
4.5.2 Use the Most Appropriate API
It's important to choose the most appropriate XML processing API for your particular
task. In this section, we look at the different processing models in terms of the
situations in which they perform best and where their performance is limited.
In general, setting memory consumption aside, processing with the DOM API
tends to be slower than processing with the SAX API. This is because