Specialization and generalization are the mechanisms for modeling subclass/superclass hierarchies in relational database design, akin to the concept of inheritance in Java. The idea is simple, yet it sometimes causes confusion because relational schemas and object-oriented technology organize data in fundamentally different ways. Relational databases simply do not map one-to-one onto the constructs of an object-oriented language. ORM tools can be a way out in many cases, but we cannot ignore the cost of inviting yet another layer into what may already be a complex situation. This article explores the idea and offers some implementation hints in simple terms.
Specialization and Generalization
Specialization is the process of defining subclasses of a superclass entity on the basis of some distinguishing characteristic of the entities in the superclass. There are two main reasons for creating such a hierarchical relationship:
- Certain attributes may apply to some, but not all, entities of the superclass. Subclasses can then be created to group exactly the entities to which these attributes apply, and the attributes are defined on those subclasses.
Figure 1: Subclasses of Employee
- One or more of the subclasses may participate in a relationship with another class that concerns neither its siblings nor its parent class.
Figure 2: Relationship between members of a subclass
Generalization is the bottom-up process of abstraction, in which we set aside the differences among entities, group them by their common features, and generalize them into a single superclass. The original entities thus become its subclasses. In simpler terms, generalization is just the reverse of specialization: specialization is a top-down process, whereas generalization is bottom-up.
So, whatever we say about specialization applies equally to generalization, since one is the flip side of the other.
Let's take as an example the financial transactions of a banking system. Here, we create three individual transaction classes, namely BalanceInquiry, Withdrawal, and Deposit, to represent the transactions that the system can perform.
Figure 3: Three transaction classes
Observe that the classes have one common attribute (accountNo) and one common operation (process). Each class requires the accountNo attribute to identify the account to which the transaction applies. Clearly, the classes represent types of transaction. So, in view of this common functionality, we can factor the shared parts out into a generalized class, called Transaction, to model inheritance.
Figure 4: Finding the commonality
The relationship indicates that the classes BalanceInquiry, Withdrawal, and Deposit extend Transaction. Viewed bottom-up, the superclass Transaction is a generalization of the subclasses BalanceInquiry, Withdrawal, and Deposit; viewed top-down, those subclasses are specializations of the superclass. Transaction is an abstract class, and its process method is also abstract, because only each concrete subclass knows how its own kind of transaction is processed.
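To make the hierarchy concrete, here is a minimal Java sketch of it. The accountNo field and the abstract process method come from the description above; everything else is an assumption for illustration only: the amount fields, the use of long cents for money, and the hypothetical ledger map (account number to balance) that stands in for real account storage so that process has something to act on.

```java
import java.util.Map;

// Superclass: the generalization of the three transaction types.
// It holds what they share (accountNo) and declares the shared operation (process).
abstract class Transaction {
    protected final int accountNo;

    protected Transaction(int accountNo) {
        this.accountNo = accountNo;
    }

    int getAccountNo() {
        return accountNo;
    }

    // Each subclass supplies its own behavior and returns the resulting balance.
    // The ledger (account number -> balance in cents) is a hypothetical stand-in.
    abstract long process(Map<Integer, Long> ledger);
}

class BalanceInquiry extends Transaction {
    BalanceInquiry(int accountNo) {
        super(accountNo);
    }

    @Override
    long process(Map<Integer, Long> ledger) {
        return ledger.getOrDefault(accountNo, 0L); // read-only
    }
}

class Withdrawal extends Transaction {
    private final long amount; // subclass-specific attribute (hypothetical), in cents

    Withdrawal(int accountNo, long amount) {
        super(accountNo);
        this.amount = amount;
    }

    @Override
    long process(Map<Integer, Long> ledger) {
        long balance = ledger.getOrDefault(accountNo, 0L);
        if (amount > balance) {
            throw new IllegalStateException("Insufficient funds in account " + accountNo);
        }
        ledger.put(accountNo, balance - amount);
        return balance - amount;
    }
}

class Deposit extends Transaction {
    private final long amount; // subclass-specific attribute (hypothetical), in cents

    Deposit(int accountNo, long amount) {
        super(accountNo);
        this.amount = amount;
    }

    @Override
    long process(Map<Integer, Long> ledger) {
        long updated = ledger.getOrDefault(accountNo, 0L) + amount;
        ledger.put(accountNo, updated);
        return updated;
    }
}
```

Because process is declared on Transaction, calling code can hold a Transaction reference and invoke process without knowing which subclass it has; each subclass decides what processing means for it.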
Where Complexity Creeps In
Representing hierarchical relationships in a relational database is tricky. Relational tables have no built-in notion of inheritance, and mainstream SQL databases offer no widely adopted, portable way to declare one. So, mapping an object-oriented hierarchy onto a relational schema requires a different approach altogether.
- One approach is to create a single table for all the classes in the hierarchy. Here, one table holds the data for every class: each object is stored as one row, a type (discriminator) column records which class the row belongs to, and the columns that do not apply to that class are left empty (NULL). The drawback of this approach is wasted space, because irrelevant, empty columns may occur more often than expected. Moreover, the table may grow very large, and because every class shares it, its indexes and frequent locking can hurt performance.
- Or, we can create one table for each concrete class in the hierarchy. The superclass then exists merely as an idea and has no table of its own; its columns are repeated in every concrete table. So, whenever the superclass changes on the Java side, the programmer must remember to cascade the change to all of those tables. This becomes awkward, and soon unmanageable, as the changes grow larger, and it can compromise referential integrity: for example, a foreign key cannot refer to "any transaction", because there is no single table for it to reference.
- Another approach is to create one table per class in the hierarchy, including the superclass. This may be the simplest solution, but it is prone to performance problems, because loading a single object can require several joins across the superclass and subclass tables.
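The single-table approach can be sketched in Java as follows. This is a minimal illustration, not a prescribed schema: the table name bank_transaction, its columns, the tx_type discriminator, and the TxType enum are all hypothetical, and the DDL is generic SQL that may need adjusting for a particular database. The Java side only shows how each transaction type fills one shared row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical discriminator values, one per concrete transaction class.
enum TxType {
    BALANCE_INQUIRY(false), WITHDRAWAL(true), DEPOSIT(true);

    final boolean hasAmount; // does this subclass use the amount column?

    TxType(boolean hasAmount) {
        this.hasAmount = hasAmount;
    }
}

// Single-table mapping: every transaction type is stored in one table.
final class SingleTableMapping {
    static final String DDL =
        "CREATE TABLE bank_transaction ("
        + "id BIGINT PRIMARY KEY, "
        + "tx_type VARCHAR(20) NOT NULL, " // discriminator: which subclass the row holds
        + "account_no INT NOT NULL, "      // common to all subclasses
        + "amount BIGINT)";                // meaningful only for withdrawals and deposits

    // Builds the column values for one row. Columns a type does not use are left
    // null, which is the "empty column" cost this approach accepts.
    static Map<String, Object> toRow(long id, TxType type, int accountNo, long amount) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("tx_type", type.name());
        row.put("account_no", accountNo);
        row.put("amount", type.hasAmount ? Long.valueOf(amount) : null);
        return row;
    }

    private SingleTableMapping() {}
}
```

A BALANCE_INQUIRY row always carries a null amount; with more subclasses, each adding its own columns, the share of empty cells per row grows accordingly.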
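For the one-table-per-concrete-class approach, the following minimal Java sketch (all table and column names hypothetical, generic SQL) generates one CREATE TABLE statement per concrete class by copying the superclass columns into each, and builds the UNION ALL query needed to read "all transactions" for an account across the separate tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Table-per-concrete-class mapping: the abstract Transaction has no table.
final class ConcreteTableMapping {
    // Columns inherited from the abstract superclass (hypothetical names).
    static final List<String> SUPERCLASS_COLUMNS =
        List.of("id BIGINT PRIMARY KEY", "account_no INT NOT NULL");

    // Table name -> columns specific to that concrete subclass.
    static final Map<String, List<String>> SUBCLASS_COLUMNS = new LinkedHashMap<>();
    static {
        SUBCLASS_COLUMNS.put("balance_inquiry", List.of());
        SUBCLASS_COLUMNS.put("withdrawal", List.of("amount BIGINT NOT NULL"));
        SUBCLASS_COLUMNS.put("deposit", List.of("amount BIGINT NOT NULL"));
    }

    // One CREATE TABLE per concrete class. The superclass columns are copied into
    // every table, so a change to them must be applied to each table.
    static List<String> ddl() {
        List<String> statements = new ArrayList<>();
        SUBCLASS_COLUMNS.forEach((table, extra) -> {
            List<String> columns = new ArrayList<>(SUPERCLASS_COLUMNS);
            columns.addAll(extra);
            statements.add("CREATE TABLE " + table + " (" + String.join(", ", columns) + ")");
        });
        return statements;
    }

    // Reading every transaction for an account means a UNION ALL over the
    // concrete tables, selecting only the columns they have in common.
    // The account number must be bound once per "?" placeholder.
    static String allTransactionsForAccount() {
        List<String> selects = new ArrayList<>();
        for (String table : SUBCLASS_COLUMNS.keySet()) {
            selects.add("SELECT id, account_no, '" + table + "' AS tx_type FROM "
                + table + " WHERE account_no = ?");
        }
        return String.join(" UNION ALL ", selects);
    }

    private ConcreteTableMapping() {}
}
```

Note that in this layout the id values are unique only within each table, unless the application coordinates them across all three.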
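The one-table-per-class approach can be sketched like this, again as a hedged illustration with hypothetical names and generic SQL: the abstract Transaction gets its own bank_transaction table, and each subclass table shares its primary key. Loading an object whose subclass is known takes one join here; loading a transaction whose subclass is not known in advance takes an outer join to every subclass table, which is where the join cost shows up.

```java
// Table-per-class mapping: the superclass has a table too, and each subclass
// table reuses the superclass primary key as its own key and foreign key.
final class TablePerClassMapping {
    static final String[] DDL = {
        "CREATE TABLE bank_transaction (id BIGINT PRIMARY KEY, account_no INT NOT NULL)",
        "CREATE TABLE balance_inquiry (id BIGINT PRIMARY KEY, "
            + "FOREIGN KEY (id) REFERENCES bank_transaction (id))",
        "CREATE TABLE withdrawal (id BIGINT PRIMARY KEY, amount BIGINT NOT NULL, "
            + "FOREIGN KEY (id) REFERENCES bank_transaction (id))",
        "CREATE TABLE deposit (id BIGINT PRIMARY KEY, amount BIGINT NOT NULL, "
            + "FOREIGN KEY (id) REFERENCES bank_transaction (id))"
    };

    // Subclass known in advance: join its table to the superclass table.
    // A deeper hierarchy would need one more join per extra level.
    static String loadKnownType(String subclassTable, String... subclassColumns) {
        StringBuilder sql = new StringBuilder("SELECT t.id, t.account_no");
        for (String column : subclassColumns) {
            sql.append(", s.").append(column);
        }
        sql.append(" FROM bank_transaction t JOIN ").append(subclassTable)
           .append(" s ON s.id = t.id WHERE t.id = ?");
        return sql.toString();
    }

    // Subclass unknown: outer-join every subclass table and infer the type
    // from whichever join found a matching row.
    static final String LOAD_ANY_TYPE =
        "SELECT t.id, t.account_no, w.amount AS withdrawal_amount, d.amount AS deposit_amount, "
        + "CASE WHEN b.id IS NOT NULL THEN 'BalanceInquiry' "
        + "WHEN w.id IS NOT NULL THEN 'Withdrawal' "
        + "WHEN d.id IS NOT NULL THEN 'Deposit' END AS tx_type "
        + "FROM bank_transaction t "
        + "LEFT JOIN balance_inquiry b ON b.id = t.id "
        + "LEFT JOIN withdrawal w ON w.id = t.id "
        + "LEFT JOIN deposit d ON d.id = t.id "
        + "WHERE t.id = ?";

    private TablePerClassMapping() {}
}
```

For example, loadKnownType("withdrawal", "amount") produces a query joining bank_transaction to withdrawal on id and selecting t.id, t.account_no, and s.amount.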
With no single best approach, the mapping often requires a smart mix and match of these patterns, since they are not mutually exclusive. One thing Java does in our favor is to leave out multiple inheritance of classes (a class can extend only one class, though it may implement many interfaces). This removes a good deal of the mapping complexity.
These are only glimpses of the complexity we encounter when mapping just one concept, specialization/generalization, between two mismatched platforms. Perhaps they help us appreciate a little more the effort that goes into building ORM tools that make our lives easier. Remember that a relational database and Java are two different species; matching them is inherently a complex task. We can either use an ORM tool and accept a small performance trade-off, or take on the burden ourselves and do the mapping from scratch.