The data set for running the benchmark is publicly available:
As part of Google Summer of Code 2017, I am volunteering to mentor
a Summer of Code intern who is interested in using these tools to
improve the Derby query optimizer.
My suggestion for the overall process is this:
1) Acquire the benchmark tools, and the data set
2) Run the benchmark.
2a) Some of the benchmark queries may reveal bugs in Derby.
For each such bug, we need to isolate the bug and fix it.
3) Once we are able to run the entire benchmark, we need to
analyze the results.
3a) Some of the benchmark queries may reveal opportunities
for Derby to improve the query plans that it chooses for
various classes of queries (this is explained in detail in the
VLDB paper and other information available at Dr. Leis's site)
For each such improvement, we need to isolate the issue,
report it as a separate issue, and fix it (if we can).
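The run-and-record loop in steps 2 and 2a might be sketched as follows. This is a minimal sketch, not part of the benchmark tooling: the query-directory layout and the `run_query` callable (a thin wrapper over a Derby JDBC connection or ij session) are assumptions.

```python
import time
from pathlib import Path

def run_benchmark(query_dir, run_query):
    """Execute every .sql file in query_dir, recording timings and failures.

    `run_query` is a caller-supplied function (e.g. a wrapper around a
    Derby JDBC connection) that raises on error; it is an assumption of
    this sketch, not part of the benchmark kit.
    """
    timings, failures = {}, {}
    for path in sorted(Path(query_dir).glob("*.sql")):
        sql = path.read_text()
        start = time.perf_counter()
        try:
            run_query(sql)
            timings[path.name] = time.perf_counter() - start
        except Exception as exc:
            # Step 2a: this query revealed a bug; record it for isolation.
            failures[path.name] = str(exc)
    return timings, failures
```

The `failures` map feeds step 2a (isolate and fix each bug), while `timings` is the raw material for the analysis in step 3.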
While the benchmark is an interesting exercise in and of itself,
the overall goal of the project is to find-and-fix problems in the
Derby query optimizer, specifically in the 3 areas which are
the focus of the benchmark tool:
1) How good is the Derby cardinality estimator and when does
it lead to slow queries?
2) How good is the Derby cost model, and how well is it guiding
the overall query optimization process?
3) How large is the Derby enumerated plan space, and is it
large enough to contain the good plans?
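One way to make question 1 measurable is the q-error metric used in the VLDB paper: the multiplicative factor by which an estimate misses the true cardinality. A minimal sketch follows; in practice the estimated and actual row counts would be gathered from Derby's runtime statistics output, which is not shown here.

```python
def q_error(estimated, actual):
    """Q-error: factor by which an estimate deviates from the true
    cardinality; 1.0 means a perfect estimate."""
    # Clamp to 1 so empty intermediate results do not divide by zero.
    est, act = max(estimated, 1), max(actual, 1)
    return max(est / act, act / est)

def worst_estimates(cardinalities, n=5):
    """Rank (label, estimated, actual) triples by q-error, worst first,
    to spot the operators most likely to mislead the optimizer."""
    ranked = sorted(cardinalities,
                    key=lambda t: q_error(t[1], t[2]),
                    reverse=True)
    return ranked[:n]
```

Large q-errors flag the query plan nodes where the cardinality estimator is most likely responsible for a slow plan choice.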
While other Derby issues have been filed against these questions
in the past, the intent of this specific project is to use the
tools provided by the VLDB paper to make this effort rigorous and
successful at making concrete improvements to the Derby query
optimizer.