- idea predates XML but not HTML
- data is available electronically in
  - database systems
  - file systems, e.g., bibliographic data, Web data
  - data exchange formats, e.g., EDI, scientific data
- attempt to reconcile database and document "worlds"
- semi-structured data
  - organised in semantic entities
  - similar entities are grouped together
  - entities in the same group may not have the same attributes
  - order of attributes is not necessarily important
  - not all attributes may be required
  - size of the same attribute may differ between entities in a group
  - type of the same attribute may differ between entities in a group
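
The properties listed above can be illustrated with a minimal sketch in Python. The bibliographic records, attribute names, and helper functions (`attribute_profile`, `optional_attributes`) are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
# Hypothetical group of similar entities (bibliographic records).
# All titles, names and values are made up for illustration.
books = [
    # A "regular" record: three scalar attributes.
    {"title": "Book A", "author": "Smith", "year": 1999},
    # Different attribute order, no year, and author is a list (different size and type).
    {"author": ["Jones", "Lee"], "title": "Book B"},
    # No author, an extra attribute, and year stored as a string.
    {"title": "Book C", "year": "c. 1850", "publisher": "Acme Press"},
]


def attribute_profile(entities):
    """Map each attribute name to the set of type names it takes in the group."""
    profile = {}
    for entity in entities:
        for name, value in entity.items():
            profile.setdefault(name, set()).add(type(value).__name__)
    return profile


def optional_attributes(entities):
    """Return the attributes that are missing from at least one entity."""
    all_names = set().union(*(entity.keys() for entity in entities))
    return {name for name in all_names
            if any(name not in entity for entity in entities)}


profile = attribute_profile(books)
optional = optional_attributes(books)

for name in sorted(profile):
    marker = "optional" if name in optional else "always present"
    print(f"{name}: types={sorted(profile[name])} ({marker})")
```

In this sketch only `title` appears in every record; `author` and `year` each occur with two different types, which is the kind of irregularity a fixed relational schema would not allow without extra modelling.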