Characteristics of Semi-Structured Data
structure is irregular: objects of the same kind may have missing or additional attributes (labels)
parts of the data lack structure, e.g., images
other parts may yield only a little structure, e.g., plain text
a-priori schema vs a-posteriori DataGuide (a structural summary derived from the data)
  db: fix the schema first, then populate the database
  web: design the pages first, then design a schema afterwards to facilitate access
schema can be large, sometimes comparable in size to the data itself
schema is often ignored, e.g., by information-retrieval (keyword) queries
schema is rapidly evolving
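The a-posteriori approach and the irregular structure above can be illustrated with a small sketch. It assumes the data is a tree of JSON-like nested Python dicts and lists (an assumption: real DataGuides are defined over labeled graphs and also record which objects each path reaches). The hypothetical helper label_paths collects every distinct label path exactly once, so the summary is derived from data that already exists rather than fixed in advance. Re-running it after new data arrives shows the summary evolving with the data.

```python
from __future__ import annotations

from typing import Any


def label_paths(node: Any, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """Collect every distinct label path from the root of a JSON-like tree.

    Hypothetical helper: a simplified, tree-only stand-in for a DataGuide.
    List elements are siblings sharing their parent's label, so a list
    adds no label of its own.
    """
    paths: set[tuple[str, ...]] = set()
    if isinstance(node, dict):
        for label, child in node.items():
            path = prefix + (label,)
            paths.add(path)
            paths |= label_paths(child, path)
    elif isinstance(node, list):
        for item in node:
            paths |= label_paths(item, prefix)
    # Atomic values (str, int, ...) end a path and add nothing further.
    return paths


# Irregular data (hypothetical sample): each "person" has a different set
# of attributes, and "name" is atomic in one record but nested in another.
db = {
    "person": [
        {"name": "Alice", "email": "alice@example.org"},
        {"name": {"first": "Bob", "last": "Smith"}, "phone": "555-0100"},
        {"name": "Carol", "email": ["c1@example.org", "c2@example.org"], "age": 42},
    ]
}

guide = label_paths(db)
for p in sorted(guide):
    print("/".join(p))

# The summary is computed a posteriori, so new data simply extends it.
db["person"].append({"name": "Dan", "homepage": "https://example.org/dan"})
print(sorted("/".join(p) for p in label_paths(db) - guide))
```

The first loop prints seven paths, including both person/name and person/name/first, because the same label is used with an atomic value in one record and a nested object in another; the last line prints ['person/homepage'], the only path introduced by the new record.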
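A query that ignores the schema can be sketched the same way: an information-retrieval style keyword query matches against every atomic value, wherever it sits in the data. The helper keyword_hits and the sample data are hypothetical, again assuming JSON-like nested dicts and lists; matching is a simple case-insensitive substring test.

```python
from __future__ import annotations

from typing import Any, Iterator


def keyword_hits(node: Any, keyword: str, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the label path of every atomic value that contains `keyword`.

    Hypothetical helper: matching is case-insensitive on str(value), and no
    schema or expected path is consulted at any point.
    """
    if isinstance(node, dict):
        for label, child in node.items():
            yield from keyword_hits(child, keyword, prefix + (label,))
    elif isinstance(node, list):
        for item in node:
            yield from keyword_hits(item, keyword, prefix)
    elif keyword.lower() in str(node).lower():
        yield prefix


db = {
    "person": [
        {"name": "Alice Smith", "email": "alice@example.org"},
        {"name": {"first": "Bob", "last": "Smith"}},
        {"name": "Carol", "note": "met via Smith & Co."},
    ]
}

for path in keyword_hits(db, "smith"):
    print("/".join(path))
```

The keyword is found under three different paths (person/name, person/name/last, person/note) without the query naming any of them, which is why such queries can work even when the schema is large, irregular, or changing.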
Peter Wood
12 of 21