Data Integration Guidelines
Another example of data transformation might involve customer data. Customer
data spans a range of information, and might include identity and address
information as well as credit and past ordering information. Different systems
may be interested in different parts of this customer data, and hence each system
may have a different notion of a customer.
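
As a minimal sketch of this idea, the following Python fragment derives two system-specific views from one customer record. The record, its field names, and the two functions are hypothetical and serve only to illustrate how each system keeps its own notion of a customer.

```python
# A full customer record. Every field name and value here is illustrative.
CUSTOMER = {
    "id": "C-1001",
    "name": "Ada Lovelace",
    "address": "12 Example Street",
    "credit_limit": 5000,
    "past_orders": ["O-17", "O-42"],
}


def shipping_view(customer):
    """A hypothetical shipping system cares only about identity and address."""
    return {key: customer[key] for key in ("id", "name", "address")}


def credit_view(customer):
    """A hypothetical credit system cares about credit and order history."""
    return {key: customer[key] for key in ("id", "credit_limit", "past_orders")}
```

Both views describe the same customer, yet neither system could consume the other's view without a transformation.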
Even schemas, including industry-standard schemas such as Electronic Data
Interchange For Administration, Commerce, and Transport (EDIFACT), Universal
Business Language (UBL), and RosettaNet, must be transformed into one another.
Often, enterprises need to use these industry-standard formats for external
communications while at the same time using proprietary formats for internal
communications.
One way you might solve the data transformation problem is to require that all
systems use the same standard data format. Unfortunately, this solution is
unrealistic and impractical, as illustrated by the Y2K problem of converting the
representation of a calendar year from two digits to four digits. Although going
from two to four digits should be a minor change, the cost to fix this problem was
enormous. System architects must live with the reality that data transformations
are here to stay, since different systems will inevitably have different
representations of the same information.
A good strategy for data transformation is to use the canonical data model.
An enterprise might set up one canonical data model for the entire enterprise
or separate models for different aspects of its business. Essentially, a canonical
data model is a data model independent of any application. For example, a
canonical model might use a standard format for dates, such as MM/DD/YYYY.
Rather than
transforming data from one application's format directly to another application's
format, you transform the data from the various communicating applications to
this common canonical model. You write new applications to use this common
format and adapt legacy systems to the same format.
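
The approach described above can be sketched in Python as follows. The two source systems (a billing system and a warehouse system), their field names, and their native date formats are assumptions made for illustration. Only the MM/DD/YYYY canonical date format comes from the text.

```python
from datetime import datetime

# Canonical date format named in the text: MM/DD/YYYY.
CANONICAL_DATE_FORMAT = "%m/%d/%Y"


def billing_to_canonical(record):
    """Adapter for a hypothetical billing system that uses ISO dates."""
    placed = datetime.strptime(record["order_date"], "%Y-%m-%d")
    return {
        "customerName": record["cust_name"],
        "orderDate": placed.strftime(CANONICAL_DATE_FORMAT),
    }


def warehouse_to_canonical(record):
    """Adapter for a hypothetical warehouse system that uses DD.MM.YYYY dates."""
    placed = datetime.strptime(record["date"], "%d.%m.%Y")
    return {
        "customerName": record["client"],
        "orderDate": placed.strftime(CANONICAL_DATE_FORMAT),
    }


def canonical_to_warehouse(canonical):
    """Reverse adapter: canonical record back to the warehouse format."""
    placed = datetime.strptime(canonical["orderDate"], CANONICAL_DATE_FORMAT)
    return {
        "client": canonical["customerName"],
        "date": placed.strftime("%d.%m.%Y"),
    }
```

Sending a billing record to the warehouse then becomes `canonical_to_warehouse(billing_to_canonical(record))`. Each application needs only one adapter pair to and from the canonical model, whereas direct application-to-application transformation needs a separate converter for every pair of systems that communicate.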
Use XML to represent a canonical data model.
XML provides a good means to represent this canonical data model, for a
number of reasons:
XML, through a schema language, can rigorously represent types. By using
XML to represent your canonical model, you can write various schemas that