In this thesis we examine the issues relevant to the design and implementation of an interactive user interface for a binary-relational database management system augmented with a deductive and constraint enforcement capability. We review previous work in the areas of semantic data models, interface language design and the application of first-order logic to databases. We then describe an implemented system which builds on this work and on our own experience with earlier experimental versions. The system provides uniform language constructs based on English sentences for all aspects of the use of a DBMS intended for non-programmers. In particular it facilitates the definition and manipulation of non-binary relationships and complex objects without reference to the underlying binary relational structure.
We examine the assumptions made about the truth or falsity of facts which are neither known to be true nor known to be false. This examination is undertaken in the context of database systems with a deductive and a constraint enforcement component. Consequently, we review those methods which have been proposed for:
Based upon the conclusions reached in our examination of these areas we define and propose the use of a guarded default database as a method of controlling the extent of incompleteness of the database. We describe a prototype implementation of a guarded default database system, and show how such a system can be implemented using binary relational storage structures as the underlying storage component.
An approach to the storage of binary relations is by use of a triple store and lexical token convertor. The triple store contains ordered 3-tuples comprising fixed-length internal identifiers, and the lexical token convertor maps such identifiers to external printable representations. This thesis considers ways of implementing triple stores in software. It describes an implementation based on a development of the grid file of Nievergelt et al. In this development, a query performance tuning facility is proposed which minimises the average number of disk accesses on the basis of the supplied probabilistic information on the expected queries, and a new method is used to organise the grid file directory. The tuning facility and the directory organisation are both evaluated by experimentation. The thesis also considers the functionality of the lexical token convertor and gives a new software design for its implementation.
In this thesis we consider the integration of a functional model for real-world data with the functional computational model of functional programming languages to give a computationally complete database language which has a simple semantics and yet is capable of expressing a variety of information.
We review previous database languages which adopt a functional data model and discuss the weaknesses of these languages, notably with respect to
We also review functional programming and note the unsuitability of existing functional programming languages for a database environment.
In the main part of the thesis we introduce the new database language FDL and show how this language overcomes the drawbacks previously identified. FDL adopts a unified functional model both for the representation of real-world data and for computation. The language uses a single formalism, namely equations, for the representation of information in the form of facts, procedures, derivation rules, default rules and global semantic integrity constraints.
FDL has been implemented over a binary relational storage structure. We discuss this implementation and, in particular, the storage of equations in a pre-interpreted, as opposed to source, form. We show how any expression on the right hand side of an equation is stored in the form of a graph, ready for subsequent evaluation by a graph-reduction based evaluator.
This thesis describes a functional database language, Fudal, which incorporates modal operators along with a flexible type of null value in order to handle incomplete information.
The null values which Fudal allows are of the form
Oneof L, indicating that the value of some
particular data item is unknown but must be a member of a
list L. The information content of a Fudal database is
defined by its set of completions, i.e. the
databases that can be obtained by replacing each
Oneof L with a value in the associated list
L. A Fudal database is queried by entering an expression.
If the database contains no nulls then the answer to the
query is the value of the expression. Otherwise the answer
is V1 OR ... OR Vn where V1, V2, ..., Vn
are the different values of the expression in the different
completions of the database, and where OR
is a reserved word in the language. The Modal operators
used in Fudal are Maybe and
Surely. If P is a truth-valued
expression then Surely P is true iff
P is true in all completions of the database.
Maybe P is true iff there is at least one
completion in which P is true.
The principal advantages of Fudal's treatment of incomplete
information are that, unlike languages using truth-functional
logic, we always get intuitively reasonable answers to
queries, and that incomplete information can be represented
and queried in a more expressive manner.
The thesis gives a detailed, though informal, description
of Fudal's semantics, along with examples of its use. The
implementation is also described. Various optimisations,
in particular the use of integer programming techniques
are discussed.
Most current database systems use a record-oriented data model. This makes efficient use of storage and allows data to be organised to optimise common queries. A drawback, however, is that such systems model the semantics of an application domain poorly and hence cannot provide facilities to retrieve all the information associated with a given entity or its association with another entity. Such associational queries correspond to finding paths through instance-level data. Data models such as the binary-relational or functional, capture more application semantics and make such path searches feasible. However previous systems based on such models have not provided support for associational facilities.
This thesis describes the design, and development of Hydra - a computationally complete functional database language which supports storage of both extensional and intensional data as well as associational features. Associational facilities are provided by built-in primitives which return functions which are applicable to a given value or which correspond to instance-level paths between two values. To allow easy reference to entities Hydra uses visible surrogates which may be system-generated or user specified. To integrate associational primitives into a functional language Hydra adopts a view of functions which distinguishes the application relationship they model from their value. A further enhancement has been to modify the standard polymorphic type system to include a new universal type. Besides allowing associational primitives to be well-typed this also makes possible the storage and retrieval of heterogeneous functions which store values of arbitrary type.
Hydra is, we believe, the first database language which organises information in a network and supports both conventional and associational facilities. This provides greater semantic power particularly for specialist application domains, such as criminal investigations, where there is a need to find associations in large quantities of data.
Recent advances in hardware and software combined with the human ability to efficiently process visual information have generated currently considerable interest in the field of Visual Programming. This thesis considers visual techniques for extracting and transforming information from a database. To this end we propose Gql, a declarative graphical query language based on the functional data model. The aim is to enable non-programmer users to pose ad-hoc queries directly to the database; database specialists could also use the language as a productivity enhancement tool.
In Gql a query is represented by a single diagram and the language is declarative in the sense that:
In order to formally define the syntax and semantics of the language, we abstract from particular graphical representations and concentrate on sets of conceptual objects. At this conceptual level, a query corresponds to a collection of objects and their attributes, called the base structure, and it must obey certain rules for it to be a legal Gql query. The semantics are given as a mapping from a base structure instance to a list abstraction based on the l-calculus.
A system has been implemented which allows users to enter their queries in Gql and retrieve the results from a database maintained by FDL (a Functional Database Language). Major parts of this system are a graphical editor for drawing Gql queries and a Gql to FDL translator. We note that in the same way that we can have different visual representations and interfaces for Gql, it is possible to have different underlying database engines and translators. The language could also be used with a number of different object-based data models.
Finally, a number of extensions to Gql which allow for the expression of updates, recursive queries and ultimately general computations are discussed.
Textual database programming languages are computationally complete, but have the disadvantage of giving the user a non-intuitive view of the information being manipulated. Visual languages developed in recent years allow naive users access to direct representations of data, often in graph form, but have concentrated on user interface considerations rather than complex programming tasks. There is a need for a system which combines the advantages of both these approaches.
We describe an implementation of Spider, an experimental visual database programming language aimed at programmers. It uses a graph rewriting paradigm as a basis for a fully visual, computationally complete language. The graphs it rewrites represent the schema and instances of a database. In principle the graphs may be arbitrary in structure. They also contain application nodes that initiate computation.
Programs written in Spider consist of transformations formed from a sequence of graph rewrites. Each has a LHS graph that specifies a template of the subgraph to be rewritten and a corresponding RHS graph that indicates which parts of the matched subgraph are to be deleted, and where to add nodes and arcs.
There may be many potential matching subgraphs. To choose one in a deterministic manner, Spider uses a subgraph ordering method that is based on sorting the nodes and arcs that appear in the entire graph. The method ensures that any non-equivalent primitives have a defined order.
The unique graph rewriting method used by Spider has syntactic and semantic simplicity. This can be seen by the small number of primitives and constructs that define the language. Its form of algorithmic expression allows complex computation to be represented in concise programs. Furthermore, Spider has greater power than normally provided in textual database systems, and we show that queries on the schema and associative queries can be performed without requiring additions to the language.
An inherent conflict arises with functional languages in the database context between change of state on update and the concept of referential transparency. Passing the database itself as a parameter resolves the issue conceptually but is not a practical approach. Update by side-effect is efficient but necessarily implies loss of referential transparency. Additionally side-effects are often incompatible with lazy evaluation since the occurrence and sequence of updates can become non-intuitive.
In this thesis we consider ways of applying updates to a functional database system in a functional manner. The database system in question is FDL which is lazy and supports the entity-function model. We include a review of current relevant research in handling state in functional programming languages. Our approach is to accept that a destructive change of database state will result in loss of global referential transparency but that referential transparency can be scoped so that areas of a program can be identified which retain the property. Equational reasoning and consequent optimisation can then be applied within these scopes.
We design and implement a variant of the functional database language FDL, called Relief, which is lazy and has been extended with functions to facilitate updates and reading files. These functions work by means of side-effects which are explicitly sequenced with an eager let expression and a derivative sequencing construct. Our approach includes the ability to program transactions. To redress the loss of global referential transparency an effects checker identifies referentially transparent regions or scopes. We discuss the benefitrs of our solution and illustrate them with examples.
Indexing of multi-dimensional data has been the focus of a considerable amount of research effort over many years but no generally agreed paradigm has emerged to compare with the impact of the B-Tree, for example, on the indexing of one-dimensional data. At the same time, the need for efficient methods is ever more important in an environment where databases become larger and more complex in their structures.
Mapping multi-dimensional data to one dimension, thus enabling one-dimensional access methods to be exploited, has been suggested in the literature but for the most part interest has been confined to the Z-order curve. The possibility of using other curves, such as the Hilbert and Gray-code curves, whose characteristics differ from those of the Z-order curve, has also been suggested.
In this thesis we design and implement a working file store which is underpinned by the principle of mapping multi-dimensional data to one of a variety of space-filling curves and their variants. Data is then indexed using a B+ Tree which remains compact, regardless of the volume and number of dimensions. The implementation has entailed developing algorithms for mapping data to one dimension and, most importantly, developing algorithms to facilitate the querying of data in a flexible way. We focus on the Hilbert curve but also consider other curves and propose new alternative algorithms for querying data mapped to the Z-order curve.
The current implementation accommodates data in up to sixteen dimensions but the approach is generic and not limited to this number. We report on preliminary testing of the implemetation, which provides very encouraging results. We also undertake a brief exploration of the application of space-filling curves to the indexing of spatial data.
Link Analysis (LA) is a visual data analysis technique originally developed for analysing crime related data. The technique enables the user to gain a better understanding of the connections between objects of interest in the problem domain by displaying the connections in a form of network diagram referred to as a link chart. A link chart is often altered during its lifetime, as part of the exploratory nature of the knowledge discovery process, to reflect new information, and to increase the level of comp rehensibility.
To provide the necessary flexibility for accessing and manipulating large volumes of data the data collected in an investigation is often stored in a database. Permitting link charts to be constructed from this data is of great value, as LA is not adequately supported by the database query systems currently available. This is because the record-based data models they often use are inappropriate for modelling connections between objects, the query languages they typically provide can only retrieve connections between objects if the way connections may be derived are specified in the query, and their data visualisation facilities generally do not allow the results of a number of queries to be integrated and edited.
This thesis concerns the Exploratory Database View Constructor (EDVC), an experimental visual database interface for supporting LA. The results obtained from a user evaluation of EDVC indicate that the system may be used by individuals with no experience of interacting directly with a database management system (DBMS). This is a consequence of the style of interaction supported, which allows the data stored to be browsed without having to possess explicit knowledge of the database schema. Such knowledge is typically a prerequisite for using a database query language and can prevent productive use of such a language by an inexperienced DBMS user.