Web Dynamics 2001

International Workshop on Web Dynamics

(In conjunction with the 8th International Conference on Database Theory)

London, UK, 3 January 2001

Abstracts of the Invited Talks

Soumen Chakrabarti, Indian Institute of Technology, Bombay

After crawling and keyword indexing, the next wave that has made a significant impact on Web search is topic distillation: analyzing properties of the hyperlink graph for enhanced ranking of Web pages in response to a query. Hyperlink-induced topic search (HITS) and PageRank (used in Google) are two examples. The linear algebra involved in HITS and PageRank is standard, but selecting the relevant subgraph of the Web to which these algorithms should be applied is considerably less clear. PageRank was intended for the entire Web graph (or as much of it as a crawler can collect), whereas HITS used keyword match followed by a distance-one graph expansion to determine the relevant subgraph.
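
As an illustration of the HITS procedure outlined above, the following is a minimal sketch rather than the original implementation. It assumes a toy in-memory link graph; the page names, the root set standing in for the keyword match, and the iteration count are all hypothetical.

```python
from math import sqrt

def expand_root_set(root, links):
    """Distance-one expansion: add every page that links to, or is linked from, a root page."""
    base = set(root)
    for src, targets in links.items():
        for dst in targets:
            if src in root or dst in root:
                base.update((src, dst))
    return base

def hits(nodes, links, iterations=50):
    """Power iteration for hub and authority scores, restricted to `nodes`."""
    edges = [(s, d) for s, targets in links.items() for d in targets
             if s in nodes and d in nodes]
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to the page.
        auth = {n: 0.0 for n in nodes}
        for s, d in edges:
            auth[d] += hub[s]
        # Hub score: sum of the authority scores of pages the page points to.
        hub = {n: 0.0 for n in nodes}
        for s, d in edges:
            hub[s] += auth[d]
        # Normalize both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical toy graph: page -> pages it links to.
links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b", "c"],
    "a": [],
    "b": ["a"],
    "far": ["hub1"],
    "unrelated": ["elsewhere"],
}
root = {"a", "hub1"}  # stands in for the pages returned by a keyword match
base = expand_root_set(root, links)
hub, auth = hits(base, links)
print(sorted(auth, key=auth.get, reverse=True))
```

The choice of `root` decides which pages the scores are computed over, which is the subgraph-selection question raised above: pages more than one link away from the root set never enter the computation.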

The clean graph model used in HITS and PageRank, where pages are nodes with no finer characteristics than a few scalar popularity scores, is also in question. Pages have valuable markup structure and accompanying text. Moreover, the `hubs' or resource lists that make HITS so successful are often `mixed', meaning only specific regions in those pages are relevant to the query.

In this talk we will discuss two enhancements to the graph selection process. First we will describe a learning system called a ``focused crawler'' which discovers and collects large relevant graphs useful for enhanced topic distillation, starting with a few relevant examples and without crawling the Web at large. Second we will discuss a fine-grained model for `micro-hubs' and new algorithms based on the Minimum Description Length principle. These algorithms let us identify the regions in mixed hubs that are relevant to a query, which enhances both topic distillation and information extraction.
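
The idea of crawling outward from a few relevant examples without visiting the Web at large can be sketched as a best-first crawl. This is only an illustration under stated assumptions, not the system described in the talk: the in-memory `WEB` dictionary, the `TOPIC_TERMS` set, the threshold, and the keyword-based `relevance` function (a stand-in for a learned classifier) are all hypothetical.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, outgoing links).
WEB = {
    "seed": ("cycling routes and bike repair", ["bikes", "news"]),
    "bikes": ("bike frames and cycling gear", ["gear", "cars"]),
    "gear": ("cycling gear reviews", []),
    "news": ("daily politics headlines", ["sports"]),
    "sports": ("football scores", []),
    "cars": ("used cars for sale", []),
}
TOPIC_TERMS = {"cycling", "bike", "gear"}

def relevance(text):
    """Stand-in for a learned classifier: fraction of words that are topic terms."""
    words = text.split()
    return sum(w in TOPIC_TERMS for w in words) / len(words)

def focused_crawl(seeds, budget, threshold=0.2):
    """Best-first crawl that only follows links found on relevant pages."""
    # heapq is a min-heap, so priorities are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, collected = [], []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        text, out_links = WEB[url]  # simulated page fetch
        fetched.append(url)
        score = relevance(text)
        if score < threshold:
            continue  # off-topic page: keep neither the page nor its links
        collected.append(url)
        for link in out_links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected, fetched

collected, fetched = focused_crawl(["seed"], budget=10)
print(collected)
```

In this sketch the link to the off-topic `news` page is fetched once and then abandoned, so the `sports` page behind it is never visited. That pruning is what keeps the crawl away from the Web at large.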

We will argue, using analyses and anecdotes, that as the Web evolves from static files to dynamically generated semi-structured content, these more complex models and algorithms will become crucial to the continued success of automatic resource discovery, extraction, and annotation.

Biography:

Soumen Chakrabarti received his B.Tech from I.I.T. in 1991 and his M.S. and Ph.D. in Computer Science from U.C. Berkeley in 1992 and 1996. At Berkeley he worked on compilers and runtime systems for scalable parallel scientific software. He was a Research Staff Member at IBM Almaden Research Center between 1996 and 1999. He is currently an Assistant Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Bombay. His current research interests include hypertext information retrieval, web analysis and data mining. At IBM he was a founder of the Clever search engine project. His work on Focused Crawling received the Best Paper award at the 8th International World Wide Web Conference. He has been on the program committees of KDD-1998, KDD-1999, WWW-2000, WWW-2001, and SIGIR-2001.


Knut Magne Risvik, R&D Director Search Technology, Fast Search & Transfer ASA

Search engines experience the effects of web dynamics all the time. Crawling and link management become harder as the web becomes more dynamic. We will present some studies of web dynamics based on our crawling and link management, and some ideas for making search engines more useful on a dynamic web.

Biography:

Mr. Knut Magne Risvik joined FAST shortly after its inception, and serves as our Director of Search Technology. Risvik holds an M.Sc. degree from the Norwegian University of Science and Technology (NTNU). He directs the R&D activities on search technology in FAST, and has been a key architect behind the FAST Search technology. Mr. Risvik holds two patents, with three more patents pending. Risvik's main fields of interest are search technology, parallel architectures and scalable computing. Mr. Risvik is also pursuing a PhD related to Search Technology while holding his position with FAST.

Luca Cardelli, Microsoft Research, Cambridge
Logics for Mobility

The ambient calculus is a process calculus based on mobility, where processes reside at the nodes of a dynamic hierarchy of locations. It becomes natural to discuss properties that hold at particular locations, and to discuss the dynamic evolution of the hierarchy of locations. We use a logic as a way of formalizing such descriptions.

We describe a modal logic for the ambient calculus, whose main novelty is the introduction of spatial connectives (in addition to standard and temporal connectives). Our logic can be used for specifying mobility protocols, for expressing mobility policies, and as a playground for model checking of mobile computation, with potential applications to bytecode verification of mobile code. Mobility properties of varying degrees of difficulty can be established and checked by typechecking, by model checking, or by proof search (as in proof-carrying code). In our latest development, we have extended our logic to describe systems including hidden and secret locations.

Biography:

Luca Cardelli was born in Montecatini Terme, Italy, studied at the University of Pisa (until 1979), and has a Ph.D. in computer science from the University of Edinburgh (1982). He worked at Bell Labs, Murray Hill, from 1982 to 1985, and at Digital Equipment Corporation, Systems Research Center in Palo Alto, from 1985 to 1997, before assuming his current position at Microsoft Research Ltd, in Cambridge UK.

His main interests are in type theory and operational semantics, mostly for applications to language design, semantics, and implementation. He implemented the first compiler for ML and one of the earliest direct-manipulation user-interface editors. He was a member of the Modula-3 design committee, and has designed a few experimental languages, of which the latest is Obliq: a distributed higher-order scripting language. His more protracted research activity has been in establishing the semantic and type-theoretic foundations of object-oriented languages, resulting in the recent book "A Theory of Objects" with Martin Abadi. Currently he is working on global and mobile computation issues.