Title of Project: Management of Data Incorporating Intervals


Participants: Dr Roger Johnson, Georgia Garani, James Green

Aims: To develop a DBMS which provides fast, user-friendly facilities for the manipulation of data incorporating intervals

Description of the Research

Many real world objects have attributes associated with them that are not single values but ranges or intervals. Common examples include cartographic and geological data and long term studies of medical patients. Experience of manipulating interval data with existing data base management systems has shown that it becomes very difficult to formulate correct queries involving intervals. The project has developed a relationally complete extension, called the XRM (eXtended Relational Model), to Codd's relational algebra to support data containing intervals. The extension includes intervals as a new data type, along with the conventional types, such as integers and strings. Two new relational algebra operations, UNFOLD and FOLD, have been formally defined. These convert data between an interval form and an equivalent set of points. The definitions of the other conventional operators have been extended to support intervals. Thus it has been possible to prove that the XRM is a proper superset of Codd's Relational Model. Consequently, all the results of relational algebra extend to XRM, and the XRM behaves identically to the conventional relational model when processing data that does not include intervals.

Consider a simple example, concerning the salaries paid to some employees over some period of time, represented by d1, d2, etc. Consider the query:

Retrieve John's salaries during the period [d4,d12) and the period during which those amounts were in effect.

This can be expressed in the XRM as:

R= SELECT [Name="john", Period common_points [d4,d12) ] (SALARY)
S= PROJECT [Amount,Period](R)

The result is given by the relation S; common_points is one of a series of new interval comparison operators developed in earlier work.


Extensive testing of users has shown that the relational calculus is the preferred style of query language. The standardisation of SQL has also added greatly to its importance. So far our work has concentrated on a relational algebra extension, in order to obtain a formal definition of the language. It is now essential to develop an extension to SQL to support generalised intervals.

Some preliminary work has suggested that, in SQL, the query would appear as:

SELECT Amount, Period
FROM SALARY
WHERE Name="john"
AND Period common_points [d4,d12)

A prototype implementation of the extension to the relational algebra has shown that the conversion of intervals to enumerated sets of points, as used in the formal statement of the relational algebra, is an unreasonable computational task. However, in addition to applying conventional optimisation techniques to queries, the use of a "lazy" approach to the enumeration of sets of points appears to offer an opportunity to avoid explicit enumeration in a wide range of circumstances. This should ensure that the system performance will not be substantially different from the evaluation of a conventional SQL query.

This work was systematically evaluated in the USA against eight other proposed extensions, as reported by McKenzie and Snodgrass. The survey showed that this work was the best currently available when measured against a broad-based range of criteria. The evaluation was of temporal extensions to relational languages. However, the earlier work has created the first generalised interval model.

Recent work has been devoted to assessing the suitability of the nested relational model to support interval based attribute timestamped models. This work led to the publication of what is claimed to be the first fully comprehensive nested Join algorithm. The work is expected to be presented in a PhD later in 2001.

Other work being carried out involves comparing the usability of two alternative temporal database extensions IXSQL and TSQL2. This has involved firstly developing a DBMS to support both SQL extensions. User Guides are currently being developed to explain each extension. Once these are complete initial tests with small groups of students will take place to evaluate the approach which if successful will be applied to larger student groups. It is hoped that the study of these two extensions will show whether there is value in comparative testing of language extensions, such as those to SQL, before final adoption.

Applications

The generalised interval extension to the relational model as described in our papers has given rise to much interest. It is clear that there are very widespread areas of application, including soil science, medicine, geography and forestry.

Publications

Garani, G. and Johnson, R. G. Joining Nested Relations and Sub-relations.  Information Systems Vol 25, 4 (2000) pp 287-307

Garani, G. and Johnson, R. "Nest and Unnest in Nested Relations Revisited". University of London Publications, 2000. ISBN: 071871637X
 
Jang, Y.P. and Johnson, R.G. Nested relation based temporal data representation. Proceedings of the 4th International Hong Kong Computer Society Database Workshop, Hong Kong, (1992), 94-111.

Jang, Y.P. and Johnson, R.G. Evolutions of object states in temporal object-oriented databases. Proceedings of the ACM Computer Science Conference, Phoenix, Arizona, (1994), 304-11.

Johnson, R. and Garani, G. "Joining Nested Sub-Relations", Technical Report, No. 9701, Birkbeck College, University of London, 1997

Johnson, R. and Garani, G. "A Temporal Database Model Using Nested  Relations" Revised Edition, Technical Report, No. 9608, Birkbeck College, University of London, 1996

Johnson, R. and Garani, G. "A Temporal Database Model Using Nested  Relations", Technical Report, No. 9518, Birkbeck College, University of London, 1995

Johnson, R.G. (with N.A. Lorentzos). An extension of the relational model to support generic intervals. Proceedings of the International Conference on Extending Database Technology, eds. J.W.Schmidt, S.Ceri and M.Missikoff, Heidelberg: Springer-Verlag, (1988), 528-42.

Johnson, R.G. (with N.A. Lorentzos). Extending relational algebra to manipulate temporal data. Information Systems, 13(1988), 289-96.

Johnson, R.G. (with N.A. Lorentzos). Requirements specification for a temporal extension to the relational model. IEEE TC Bulletin on Database Engineering, 11(1988), 26-33.

Lorentzos, N.A. A formal representation of the relational model for the presentation and manipulation of generic intervals. PhD Thesis, Birkbeck College, University of London, 1988.

Lorentzos, N.A. (with V.J. Kollias). The handling of depth and time intervals in soil information systems. Computers and Geosciences, 15(1989), 395-401.