
1995-present
Dr Keith Mannock; (past members to be listed)
General-purpose search engines exhibit a number of problems: (1) they have reasonably useful coverage but are not specialised, which means that (2) it is difficult to achieve high precision with them; and (3) formulating queries is not straightforward for the naïve user. For a portal user these problems need to be addressed, in particular the precision of the information being presented. In our work a portal is taken as being:
Most current portals offer a "My" option, personalised news coverage and similar features in an attempt to make the portal a "sticky" site. Portals typically command the highest prices for banner-ad placement. Creating and maintaining a portal poses two problems: (1) building a portal is a labour-intensive process, and (2) it requires significant ongoing effort.
In this research we are building a prototype system which has the following objectives:
To this end we have developed an architecture that draws on techniques from the following domains: Information Retrieval, Database Management, Machine Learning and Distributed Systems. The system is based upon a search engine architecture which has
The novel features of our architecture lie in three main areas:
DoSE has been deployed on a number of real-world portals to evaluate the functionality and efficiency of the architecture; specifically, it has been tested on:
The resulting study found that DoSE was twice as efficient as a topic-focussed spider and three times more efficient than a breadth-first search. DoSE extracts ten fields from spidered documents, including multimedia content, with 80% accuracy. It also places the URI of each document into a fifty-leaf hierarchy with 75% accuracy, which compares favourably with human levels of agreement.
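The efficiency comparison above contrasts different ways of ordering a crawl. The sketch below is a minimal illustration of that contrast, not DoSE's own algorithm: it assumes a toy in-memory link graph, a crude keyword-count relevance score, and harvest rate (the fraction of fetched pages that are on-topic) as the efficiency measure. The names `crawl`, `topic_score`, `harvest_rate`, `GRAPH`, `TEXTS` and `KEYWORDS` are all hypothetical. A breadth-first crawl expands links in the order they are discovered, whereas the topic-focused (best-first) variant first expands the links found on the pages that scored highest.

```python
import heapq
from collections import deque

# Hypothetical toy web: url -> outgoing links, and url -> page text.
GRAPH = {
    "s": ["off1", "on1"],
    "off1": ["off2", "off3"],
    "on1": ["on2", "on3"],
}
TEXTS = {
    "s": "portal home",
    "off1": "cooking recipes",
    "off2": "garden tips",
    "off3": "travel deals",
    "on1": "search engine crawler",
    "on2": "search ranking",
    "on3": "crawler design",
}
KEYWORDS = {"search", "crawler"}  # assumed topic description


def topic_score(text, keywords):
    """Crude relevance: how many topic keywords appear as words in the text."""
    words = set(text.lower().split())
    return sum(1 for k in keywords if k in words)


def crawl(graph, texts, seed, keywords, budget, focused):
    """Fetch up to `budget` pages from `seed`; return urls in fetch order.

    focused=False: breadth-first, a FIFO queue in discovery order.
    focused=True:  best-first, each link inherits the score of the page it
                   was found on, and the highest-scoring link is fetched next.
    """
    seen = {seed}
    fetched = []
    counter = 0  # tie-breaker so equal scores keep discovery order
    if focused:
        frontier = [(0, counter, seed)]  # heapq is a min-heap, so scores are negated
    else:
        frontier = deque([seed])
    while frontier and len(fetched) < budget:
        if focused:
            _, _, url = heapq.heappop(frontier)
        else:
            url = frontier.popleft()
        fetched.append(url)  # "fetching" is a dictionary lookup in this sketch
        score = topic_score(texts.get(url, ""), keywords)
        for link in graph.get(url, []):
            if link in seen:
                continue
            seen.add(link)
            if focused:
                counter += 1
                heapq.heappush(frontier, (-score, counter, link))
            else:
                frontier.append(link)
    return fetched


def harvest_rate(fetched, texts, keywords):
    """Fraction of fetched pages with a non-zero topic score."""
    if not fetched:
        return 0.0
    hits = sum(1 for u in fetched if topic_score(texts.get(u, ""), keywords) > 0)
    return hits / len(fetched)


if __name__ == "__main__":
    for focused in (False, True):
        pages = crawl(GRAPH, TEXTS, "s", KEYWORDS, budget=5, focused=focused)
        label = "focused" if focused else "breadth-first"
        print(label, pages, harvest_rate(pages, TEXTS, KEYWORDS))
```

On this toy graph, with a budget of five fetches, the breadth-first crawl reaches one on-topic page (harvest rate 0.2) while the focused crawl reaches three (0.6). These figures say nothing about DoSE's measured results; they only show why ordering the frontier by estimated relevance can raise the proportion of useful pages fetched for the same effort.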
Last updated: Sunday, November 17, 2002