| Guidelines:
Mapping an Object Model to a Relational Data ModelTopicsRelational databases and object orientation are not entirely compatible. They represent
two different views of the world: in an RDBMS, all you see is data; in an Object-Oriented
system, all you see is behavior. It is not that one perspective is better than the other:
the Object-Oriented model tends to work well for systems with complex behavior and
state-specific behavior in which data is secondary, or systems in which data is accessed
navigationally in a natural hierarchy (e.g. bills of materials).  The RDBMS model is
well-suited to reporting applications and systems in which the relationships are dynamic
or ad-hoc. The real fact of the matter is that a lot of information is stored in relational
databases, and if Object-Oriented applications want access to that data, they need to be
able to read and write to an RDBMS. In addition, Object-Oriented systems often need to
share data with non-Object-Oriented systems. It is natural, therefore, to use an RDBMS as
the sharing mechanism. While object-oriented and relational design share some common characteristics,
fundamental differences make seamless integration a challenge. The data models and
computational models are different. The relational model is composed of entities and relations. An entity may be a physical
table or a logical projection of several tables also known as a view. The figure below
illustrates LINEITEM and PRODUCT tables and the various relationships between them. A
relational model has the following elements: 
 A Relational Model An entity has columns. Each column is identified by a name and a type. In the Figure 1,
the LINEITEM entity has Description, Id, Number, Order_Id, and Quantity columns.  An entity has records or rows. Each row represents a unique set of information which
typically represents an object's persistent data.  Each entity has one or more primary keys. The primary keys uniquely identifies each
record (e.g. Id is the primary key for LINEITEM table).  Support for relations is vendor specific. The example illustrates the logical model and
the relation between the PRODUCT and LINEITEM tables. In the physical model relations are
typically implemented using foreign key / primary key references. If one entity relates to
another, it will contain columns which are foreign keys. Foreign key columns contain data
which can relate specific records in the entity to the related entity.  Relations have multiplicity(also known as cardinality). Common cardinalities are one to
one (1:1), one to many (1:m), many to one (m:1), and many to many (m:n). In the example,
LINEITEM has a 1:1 relationship with PRODUCT and PRODUCT has a 0:m relationship with
LINEITEM.  An object model contains, among other things, classes (see the Unified Modeling
Language, version 1.1, for a complete definition of an object model). Classes define the
structure and behavior of a set of objects, sometimes called objects instances.
The structure is represented as attributes (data values) and associations (relationships
between classes)The following figure illustrates a simple class diagram model, showing
only attributes (data) of the classes.  
 An Object Model (Class Diagram) An Order has a number (the Order Number), and an association to 1 or more (1..*) Line
Items.  Each Line Item has a quantity (the quantity ordered) The LineItem  The object model supports inheritance. A class can inherit data and behavior from
another class (e.g. SoftwareProduct and HardwareProduct products inherit attributes and
methods from Product class).  The majority of business applications utilize relational technology as a physical data
store. The challenge facing object-oriented applications developers is to sufficiently
separate and encapsulate the relational database so that changes in the data model do not
"break" the object model, and vice versa. Many solutions exist which let
applications directly access relational data; the challenge is in achieving a seamless
integration between the object model and the data model.   Database APIs come in standard flavors (e.g. Microsoft's Open Data Base Connectivity
API, or ODBC) and are proprietary (native bindings to specific databases). The APIs
provide data manipulation language (DML) pass through services which allow applications to
access raw relational data. In object-oriented applications, the data must undergo
object-relational translation prior to being used by the application. This requires
considerable amount of application code to translate raw database API results into
application objects. The role of the object-relational framework is to generically
encapsulate the physical data store and to provide appropriate object translation
services.  
 The Role of a Persistence Framework Application developers spend over 30% of their time implementing relational database
access in object-oriented applications. If the object-relational interface is not
correctly implemented, the investment is lost. Implementing an object-relational framework
captures this investment. The object-relational framework can be reused in subsequent
applications reducing the object-relational implementation cost to less than 10% of the
total implementation costs. The most important cost to consider when implementing any
system is maintenance. Over 60% percent of the total costs of a system over its entire
life-cycle can be attributed to maintenance. A poorly  implemented object relational
system is both a technical and financial maintenance nightmare. 
  Performance. Close consideration must be given towards decomposing
    objects into data and composing objects from data. In systems where data through-put is
    high and critical, this is often an Achilles heel of an inadequately designed access
    layer. Minimize design compromises. A familiar pattern to object technologists
    who have built systems which utilize relational databases is to adjust the object model to
    facilitate storage into relational systems, and to alter the relational model for easier
    storage of objects. While minor adjustments are often needed, a well designed access layer
    minimizes both object and relational model design degradation. Extensibility. The access layer is a white-box framework which allows
    application developers to extend the framework if certain functionality is desired in the
    framework. Typically, an access layer will support without extension 65-85% of an
    application's data storage requirements. If the access layer is not designed as an
    extensible framework, achieving the last 35-15% of an application's data storage
    requirements can be very difficult and costly. Documentation. The access layer is a both a black-box component, and a
    white-box framework. The API of the black-box component must be clearly defined, well
    documented, and easily understood. As previously mentioned, the access layer is designed
    to be extended. An extensible framework must be very thoroughly documented. Classes which
    are intended to be subclassed must be identified. The characteristics of each relevant
    class's protocol must be specified (e.g. public, private, protected, final, 
).
    Moreover, a substantial portion of the access layer framework's design must be exposed and
    documented to facilitate extensibility. Support for common object-relational mappings. An access layer should
    provide support for some basic object-relational mappings without the need for extension.
    These object-relational mappings are discussed further in a subsequent section of this
    document.Persistence Interfaces: In an object oriented application, the business
    model for an object application captures semantic knowledge of problem domain. Developers
    should manipulate and interact with objects without having to worry too much about the
    data storage and retrieval details. A well defined subset of persistent interfaces (save,
    delete, find) should be provided to application developers.  Common patterns are emerging for object-relational applications. IT professionals who
have repeatedly crossed the chasm are beginning to understand and recognize certain
structures and behaviors which successful object-relational applications exhibit. These
structures and behaviors have been formalized by the high-level CORBA Services
specifications (which apply equally well to COM/DCOM-based systems as well).
 The CORBA service specifications which are applicable and useful to consider for
object-relational mapping are:
 The following sections will use these categories to structure a discussion of common
object-relational services. The reader is encouraged to reference the appropriate CORBA
specifications for further details. Persistence is a term used to describe how objects utilize a secondary storage medium
to maintain their state across discrete sessions. Persistence provides the ability for a
user to save objects in one session and access them in a later session. When they are
subsequently accessed, their state (e.g. attributes) will be exactly the same as it was
the previous session. In multi-user systems, this may not be the case since other users
may access and modify the same objects. Persistence is interrelated with other services
discussed in this section. The consideration of relationship, concurrency and others is
intentional (and consistent with CORBA's decomposition of the services). Examples of specific services provided by persistence are: 
 
  Data source connection management: Object-relational applications must
    initiate connection to the physical data source. Relational database systems typically
    require identification of the server and database. The specifics of connection management
    tends to be database vendor specific and the framework must accordingly be designed in a
    flexible accommodating manner. Object retrieval: When objects are restored from the database, data is
    retrieved from the database and translated into objects. This process involves extracting
    data from database specific structures retrieved from the data source, marshaling the data
    from database types into the appropriate object types and/or classes, creation of the
    appropriate object, and setting the specific object attributes. Object storage: The process of object storage mirrors object retrieval.
    The values of the appropriate attributes are extracted from the object, a database
    specific structure is created with the attribute values (this may be a SQL string, stored
    procedure, or special remote procedure call), and the structure is submitted to the
    database.  Object deletion: Objects that are deleted from within a system, must
    have their associated data deleted from the relational database. Object deletion requires
    that appropriate information be extracted from the object, a deletion request be
    constructed (this may be a SQL string, stored procedure, or special remote procedure
    call), and the request submitted to the database.  Note that in some languages (e.g.
    Smalltalk and Java), explicit deletion is not supported; instead, a strategy called garbage
    collection is supported.  Persistence frameworks supporting these languages
    must provide an alternative way to remove data from the database once applications no
    longer reference the data.  One common way is for the database to maintain reference-counts
    of the number of times an object is referenced by other objects.  When the
    reference count for an object drops to zero, no other objects reference it, and it may
    be possible to delete it.  It may be acceptable to delete objects
    with a reference count of zero, since even when an object is no longer referenced, it may
    still be queried.  A database-wide policy on when object deletion is allowed is still
    needed. Persistent object storage is of little use without a mechanism to search for and
retrieve specific objects. Query facilities allow applications to interrogate and retrieve
objects based on a variety of criteria. The basic query operations provided by an
object-relational mapping framework are find and find unique. The find unique operation
will retrieve a specific object and find will return a collection of objects based on a
query criteria.  Data store query facilities vary significantly. Simple file-based data stores may
implement rigid home-grown query operations, while relational systems provide a flexible
data manipulation language. Object-relational mapping frameworks extend the relational
query model to make it object-centric rather than data centric. Pass-through mechanisms
are also implemented to leverage relational query flexibility and vendor-specific
extensions (e.g. stored-procedures). Note that there is some potential conflict between database-based query mechanisms and
the object paradigm: database query mechanisms are driven by values of
attributes (columns) in a table.  In the corresponding objects, the principle of
encapsulation prevents us from seeing the values of attributes; they are encapsulated
by the operations of the class.  The reason for encapsulation is that it makes
applications easier to change: we can alter the internal structure of a class without
concern for dependent classes as long as the publicly-visible operations of the class do
not change. A query mechanism based on the database is dependent on the internal
representation of a class, effectively breaking encapsulation.  The
challenge for the framework is to present queries from making the applications brittle to
changes. Transactional support enables the application developer to define an atomic unit of
work.  In database terminology, it means that the system must be able to apply a set
of changes to the database, or it must ensure that none of the changes are applied. The
operations within a transaction either all execute successfully or the transaction fails
as whole. Object-relational frameworks at a minimum should provide a relational
database-like commit/rollback transaction facility. Designing object-relational frameworks
in a multi-user environment can present many challenges and careful thought should be
given to it. In addition to the facilities provided by the persistence framework, the application
must understand how to handle errors.  When a transaction fails or is aborted, the
system must be able to restore its state to a stable prior state, usually by reading the
prior state information from the database.  Thus, there is a close interaction
between the persistence framework and the error handling framework. Multi-user object-oriented systems must control concurrent access to objects. When an
object is accessed simultaneously by many users, the system must provide a mechanism to
insure modifications to the object in the persistent store occur in a predictable and
controlled manner. Object-relational frameworks may implement pessimistic and/or
optimistic concurrency controls. 
 
  Pessimistic concurrency control requires that the application developer
    specify their intent when the object is retrieved from the data store (e.g. read only,
    write lock, 
). If objects are locked, other users may block when accessing the
    object and wait for the lock to be relinquished. Pessimistic concurrency should be used
    and implemented with caution as it is possible to create dead-lock situations.Optimistic concurrency control assumes that it is unlikely that the
    same object will be simultaneously accessed. Concurrency conflicts are detected when the
    modifications are saved to the database. Typically, if the object has been modified by
    another user since its retrieval, an error will be returned to the application indicating
    failure of the modify operation. It is the application's responsibility to detect and
    handle the error. This calls for the framework to cache the concurrent values of objects
    and compare them against the database.  Optimistic concurrency is less costly if
    there are few concurrency conflicts, but more expensive if the number of conflicts is
    fairly large (because of the need to re-do work when  conflicts occur). All applications using shared data must use the same concurrency strategy; you cannot
mix optimistic and pessimistic concurrency control in the same shared data or corruption
may occur.  The need for a consistent concurrency strategy is best handled through a
persistence framework. Objects have relationships to other objects. An Order object has many Line Item
objects. A Book object has many Chapter objects. An Employee object belongs to exactly one
Company object. In relational systems, relations between entities are implemented using
foreign key / primary key references. In object-oriented systems relations are usually
explicitly implemented through attributes. If an Order object has LineItem's, then Order
will contain an attribute named lineItems. The lineItems attribute of Order will contain
many LineItem objects. 
 The relationship aspects of an object-relational framework are interdependent with the
persistence, transaction, and query services. When an object is stored, retrieved,
transacted, or queried consideration must be given to its related objects:
 
  When an object is retrieved, should associated objects be retrieved as well?  
    Simplistically, yes, but doing so when the associated objects are not needed is very
    expensive.  A good framework will allow a mix of strategies.When an object is stored, should associated objects be stored as well if they have been
    changed?  Again, the answer depends on the context. While it is conceptually advantageous to consider common object-relational services
separately, their object-relational framework implementations will be codependent. The
services must be implemented consistently across not only individual organizations, but
all applications which share the same data.  A framework is the only economical way
to achieve this. The persistent classes in the Design Model represent the information the system must
store.  Conceptually, these classes may resemble a relational design (e.g. the
classes in the design model may be reflected in some fashion as entities in the relational
schema). As we move from elaboration into construction, however, the goals of the Design
Model and the Relational Data Model diverge. This divergence is caused because the
objective of relational database development is to normalize data whereas the goal of The
Design Model is to encapsulate increasingly complex behavior.  The divergence of
these two perspectives - data and behavior - leads to be need for mapping between related
elements in the two models. In a relational database written in third normal form, every row in the tables 
every "tuple"  is regarded as an object. A column in a table is equivalent
to a persistent attribute of a class (keep in mind that a persistent class may have
transient attributes). So, in the simple case where we have no associations to other
classes, the mapping between the two worlds is simple. The data type of the attribute
corresponds to one of the allowable data types for columns.  Example The class Customer: 
 when modeled in the RDBMS would translate to a table called Customer,
with the columns Name, Address and Customer_ID. We can visualize an instance of this table as: 
 For each persistent attribute, additional information is required to appropriately
model the persistent object in a relational data model: 
 
  Can it serve as a key or part of a key? Example: "attribute X, together with
    attribute Z, uniquely identifies the object." In the Customer table, the Customer_ID
    represents a primary key.What are the minimum and maximum values for the attribute?Will it be possible to search using this attribute as a key? It might, for instance, be
    part of a filter in a Select statement such as "It is common to search for all
    instances where Y > 1000".The attribute should have a description, e.g. "attribute X is the number of
    retransmissions per 100 000 transmitted charactersPossible numerical values and desired conversions between different numerical values.Who is allowed to update the attribute? Example: "T may only be changed by people
    in authority class nn".Who is allowed to read the attribute? Examples: "P may be read by people in
    authority classes yy and zz" or ""P is included in views Vi and Vj".Information about volumes and frequencies. Examples: "There are up to 50 000
    occurrences of A" or "On average 2000 As are changed per day".Is the attribute unique? For example, only one person can have the same drivers
    license number. Rows in tables need to have unique identity; they are uniquely identified by their
primary key values. The primary key will be used as a reference to a tuple in this table
from other tables, and it will uniquely identify the row/object for searches. As a result,
it must never change. Names in plain text are not suitable; people change their names, and
names are not unique. Because numeric comparisons are less resource-consumptive than
string comparisons, primary keys should be numeric, and preferably system-assigned so that
they do not change. Generally, primary keys are indexed for performance. This, however, is
an implementation decision and is independent of the object to table mapping.  Associations between two persistent objects are realized as foreign
keys to the associated objects. A foreign key is a column in one table
which contains the primary key value of associated object.  Example Assume we have the following association between Order and Customer: 
 When we map this into relational tables, we get an Order table and a
Customer table. The Customer table will have columns for attributes listed, plus an
additional column Customer_ID which contains foreign-key references to associated rows in
the Customer table.  For a given Order, the Customer_ID column will contain the
identifier of the Customer to whom the Order is associated. Foreign keys allow the RDBMS
to join related information together. Aggregation is also modeled using foreign key relationships. Example Assume we have the following association between Order and LineItem: 
 When we map this into relational tables, we get an Order table and a
Line_Item table. The Line_Item table will have columns for attributes listed, plus an
additional column Order_ID which contains foreign-key references to associated rows in the
Order table.  For a given Line Item, the Order_ID column will contain the Order_ID of
the Order that the Line Item is associated with. Foreign keys allow the RDBMS to join
related information together. In addition, to provide referential integrity in the data model, we would also want to
implement a cascading delete constraint, so that whenever the Order is
deleted, all of their Line Items are deleted as well. Note that in the object model, a customer knows about its purchase
orders; in the relational data model, a purchase order knows about its customers. This is
a feature of the way relationships are represented in a relational data model.  The standard relational data model does not support modeling inheritance associations
in a direct way. There are a number of strategies which can be used to
model inheritance. These are summarized as follows: 
 
  Use separate tables to represent the super-class and sub-class. Have, in the sub-class
    table, a foreign key reference to the super-class table. In order to
    instantiate a sub-class object, the two tables would have to be joined
    together. This approach is conceptually easier, and makes changes to the model easier, but
    it often performs poorly due to the extra work.Duplicate all inherited attributes and associations as separate columns in the sub-class
    table. This is similar to de-normalization in the standard relational
    data model. A standard technique in relational modeling is to use an intersection entity to
represent many-to-many associations. The same approach should be used here: an
intersection table should be used to represent the association. Example If Suppliers can supply many Products, and a Product can be supplied by
many Suppliers, the solution is to create a Supplier/Product table. This table would only
contain the primary keys of the Supplier and Product tables, and serves to link the
Suppliers and their related Products. There is no analog for this table in the object
model; it is strictly used to represent the associations in the relational data model. In the object model, referential integrity is not an issue; objects refer to each other
directly. In the relational data model, rows refer to other rows by their
primary key values. Referential integrity rules are needed to ensure that foreign key
references remain valid. Typically, these rules are implemented as integrity
constraints, or in some cases as triggers. The RDBMS reference
manuals can provide more direction on the appropriate implementation technique. DBA Database Administrator CORBA Common Object Request Broker Architecture DBMS Database Management System ODBC Open Database Connectivity, a generic database API standard
developed by Microsoft RDBMS Relational Data Base Management System 1NF First Normal Form 2NF Second Normal Form (etc.) ID# Keys in tables. The "#" represents integers; it is an "integer
pointer" 
 |