[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Pendleton updated DERBY-6921:
Labels: database gsoc2017 java optimizer (was: gsoc2017)
> How good is the Derby Query Optimizer, really
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Reporter: Bryan Pendleton
> Priority: Minor
> Labels: database, gsoc2017, java, optimizer
> Original Estimate: 2,016h
> Remaining Estimate: 2,016h
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> The data set for running the benchmark is publicly available:
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
> For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
> analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
> for Derby to improve the query plans that it chooses for
> various classes of queries (this is explained in detail in the
> VLDB paper and other information available at Dr. Leis's site).
> For each such improvement, we need to isolate the issue,
> report it as a separable improvement, and fix it (if we can).
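As a rough illustration of steps 2 and 2a, a harness might execute each benchmark query over JDBC and time it. It would record any SQLException as a candidate bug to isolate. This is a minimal sketch and is not part of the benchmark tools. It makes three assumptions:

- derby.jar is on the classpath.
- The data set has already been loaded into an embedded database named `imdb` (a hypothetical name).
- The queries are stored one per `.sql` file in a local directory passed as the first argument.

The class `JobRunner` and its methods are also hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical harness: runs each benchmark query file once and records its outcome. */
public class JobRunner {

    /** Derby's JDBC driver rejects a trailing ';', so strip it (and surrounding whitespace). */
    static String toDerbySql(String fileText) {
        String sql = fileText.trim();
        while (sql.endsWith(";")) {
            sql = sql.substring(0, sql.length() - 1).trim();
        }
        return sql;
    }

    /**
     * Runs every *.sql file in queryDir, in file-name order, and returns for each
     * file either its elapsed time or the SQLException message (a step 2a failure).
     */
    static Map<String, String> runAll(Connection conn, Path queryDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(queryDir, "*.sql")) {
            ds.forEach(files::add);
        }
        files.sort(null); // natural (lexicographic) Path ordering
        Map<String, String> results = new LinkedHashMap<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            String sql = toDerbySql(Files.readString(file));
            long start = System.nanoTime();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // drain every row so the whole plan is executed, not just the first fetch
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                results.put(name, ms + " ms");
            } catch (SQLException e) {
                results.put(name, "FAILED: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: derby.jar on the classpath and an existing embedded database "imdb".
        try (Connection conn = DriverManager.getConnection("jdbc:derby:imdb")) {
            runAll(conn, Path.of(args[0])).forEach((q, r) -> System.out.println(q + "\t" + r));
        }
    }
}
```

A single timed run is only a first pass. For the analysis in step 3, one would likely repeat each query and discard warm-up runs before comparing timings.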
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
> it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
> the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it large enough?
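The three questions above become measurable through two quantities from the VLDB paper. The paper reports estimation quality as the q-error, which is the factor by which an estimate differs from the true cardinality. It also compares plan spaces restricted to different tree shapes.

The sketch below shows both measurements under stated assumptions. It computes the q-error for a single estimate. It also counts the join trees that exist for n relations when cross products are allowed: n! left-deep orders versus (2n-2)!/(n-1)! bushy trees.

The class and method names are hypothetical. Clamping both row counts to at least 1 is an assumption made to avoid division by zero. In practice, the estimated and actual row counts might be read from Derby's runtime statistics (SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS). This sketch does not parse that output.

```java
import java.math.BigInteger;

/** Hypothetical helpers illustrating two of the benchmark's measurements. */
public class OptimizerMetrics {

    /**
     * q-error of a cardinality estimate: max(est/actual, actual/est).
     * 1.0 means a perfect estimate; 100.0 means off by a factor of 100 in either direction.
     * Assumption: both values are clamped to at least 1 row.
     */
    static double qError(double estimated, double actual) {
        double e = Math.max(estimated, 1.0);
        double a = Math.max(actual, 1.0);
        return Math.max(e / a, a / e);
    }

    static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    /** Left-deep join orders over n relations, cross products allowed: n!. */
    static BigInteger leftDeepPlans(int n) {
        return factorial(n);
    }

    /** Bushy join trees over n relations, cross products allowed: (2n-2)!/(n-1)!. */
    static BigInteger bushyPlans(int n) {
        return factorial(2 * n - 2).divide(factorial(n - 1));
    }

    public static void main(String[] args) {
        System.out.println("q-error of estimating 100 rows when 10000 qualify: " + qError(100, 10_000));
        for (int n = 2; n <= 10; n += 2) {
            System.out.println(n + " relations: left-deep=" + leftDeepPlans(n)
                + ", bushy=" + bushyPlans(n));
        }
    }
}
```

For 10 relations, these formulas give 3,628,800 left-deep orders but 17,643,225,600 bushy trees. This illustrates how quickly the bushy space outgrows the left-deep one as queries join more tables.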
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous, and
> to deliver concrete improvements to the Derby query optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
> with database management systems, with SQL, and in particular with
> cost-based query optimization. If terms such as "cardinality
> estimation", "correlated query predicates", or "bushy trees"
> aren't comfortable terms for you, this probably isn't the
> project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
> of the extensive body of introductory material on Derby to
> become familiar with it: read the Derby Getting Started manual,
> download the software and follow the tutorials, read the documentation,
> download the source code and learn how to build and run the
> test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
> need to read the paper(s), study the benchmark queries, and
> propose a detailed plan for how to use this benchmark as a tool
> for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!
This message was sent by Atlassian JIRA