Characteristics of Semi-Structured Data


  • structure is irregular: missing or additional attributes (labels)
  • parts of data lack structure, e.g., images
  • some data may yield little structure, e.g., plain text
  • a-priori schema vs a-posteriori dataguide
    • database: fix the schema first, then populate the database
    • web: design the pages first, then derive a schema to facilitate access
  • schema is large
  • schema is often ignored, e.g., information retrieval queries
  • schema is rapidly evolving
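
The a-posteriori idea above can be illustrated with a minimal sketch: instead of fixing a schema first, scan the data and summarise which label paths actually occur. This is a simplified path summary in the spirit of a dataguide, not a full dataguide construction. The records, the function names `label_paths` and `infer_path_summary`, and the use of nested Python dicts as the data model are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records with irregular structure: some attributes are
# missing, some are additional, and one value (photo) is opaque bytes.
records = [
    {"name": "Ann", "email": "ann@example.org"},
    {"name": "Bob", "phone": {"home": "555-0100"}},
    {"name": "Cy", "email": "cy@example.org", "photo": b"\x89PNG"},
]


def label_paths(value, prefix=()):
    """Yield every label path (a tuple of keys) found in a nested dict."""
    if isinstance(value, dict):
        for label, child in value.items():
            path = prefix + (label,)
            yield path
            # Recurse into nested objects; non-dict values yield nothing.
            yield from label_paths(child, path)


def infer_path_summary(items):
    """Count, for each label path, how many records contain it."""
    counts = Counter()
    for item in items:
        # Use a set so each path is counted at most once per record.
        counts.update(set(label_paths(item)))
    return counts


summary = infer_path_summary(records)
for path, n in sorted(summary.items()):
    print(".".join(path), n)
```

Paths that occur in only a few records (here `phone.home` and `photo`) show the irregularity directly, and rerunning the summary after the data changes reflects a schema that evolves with the data rather than preceding it.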

Peter Wood
