Schema-Driven Generation of Synthetic Graphs and Queries with gMark
- Speaker: Prof. George Fletcher, Eindhoven University of Technology
- Date: Wednesday, 1 March 2017 from 16:00 to 17:00
- Location: Room 151
Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for generating database instances and query workloads with predictable and controllable properties. In this talk, we present the design and engineering principles of gMark, a domain- and query-language-independent generator of graph instances and query workloads. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high-quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.
This is joint work with colleagues at CNRS Lyon (France), INRIA Lille (France), Université Clermont Auvergne (France), and TU Eindhoven (Netherlands).
George Fletcher (PhD, Indiana University Bloomington) is an associate professor of computer science at Eindhoven University of Technology. His research interests span query language design and engineering, foundations of databases, and data integration. His current focus is on management of massive graphs such as social networks and linked open data. He is a member of the LDBC Graph Query Language Standardization Task Force.