[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really



JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942035#comment-15942035 ]

Harshvardhan Gupta commented on DERBY-6921:

Hi Derby community,

I have submitted a draft proposal for this project through Google. Please review it and help me refine it further. Please comment here if you are not able to access it through Google.

Harshvardhan Gupta

> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Priority: Minor
>              Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at the
> Technical University of Munich introduced a new benchmark suite for
> evaluating database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>      For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>    analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>    for Derby to improve the query plans that it chooses for
>    various classes of queries (this is explained in detail in the
>    VLDB paper and in other material available at Dr. Leis's site).
>    For each such improvement, we need to isolate the issue,
>    report it as a separate improvement, and fix it (if we can).
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>    it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>    the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>    appropriately sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>    with DBMS systems, and with SQL, and in particular with
>    cost-based query optimization. If terms such as "cardinality
>    estimation", "correlated query predicates", or "bushy trees"
>    aren't comfortable terms for you, this probably isn't the
>    project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>    of the extensive body of introductory material on Derby to
>    become familiar with it: read the Derby Getting Started manual,
>    download the software and follow the tutorials, read the documentation,
>    download the source code and learn how to build and run the
>    test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>    need to read the paper(s), study the benchmark queries, and
>    propose a detailed plan for how to use this benchmark as a tool
>    for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!
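As a rough illustration of two of the three questions in the description, the Java sketch below computes (a) the q-error of a cardinality estimate, which, as we understand the VLDB paper, is the factor by which an estimate differs from the true row count in either direction, and (b) how many join trees an optimizer could in principle enumerate for n relations: n! left-deep trees versus (2n-2)!/(n-1)! bushy trees, both counts allowing cross products. The class name OptimizerMetrics and its methods are hypothetical; they are not part of Derby or of the benchmark tools, and real Derby estimates and true row counts would have to be collected separately.

```java
import java.math.BigInteger;

/** Hypothetical helper (not part of Derby or the JOB tools) for analyzing benchmark results. */
public final class OptimizerMetrics {

    private OptimizerMetrics() {}

    /**
     * q-error of a cardinality estimate: max(est/actual, actual/est), so it is
     * always >= 1 and treats over- and underestimates symmetrically.
     * Assumption: this follows our reading of the VLDB paper's definition.
     */
    public static double qError(double estimated, double actual) {
        if (estimated <= 0 || actual <= 0) {
            throw new IllegalArgumentException("cardinalities must be positive");
        }
        return Math.max(estimated / actual, actual / estimated);
    }

    /** n!: number of left-deep join trees over n relations (cross products allowed). */
    public static BigInteger leftDeepTrees(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /** (2n-2)!/(n-1)!: number of bushy join trees over n relations (cross products allowed). */
    public static BigInteger bushyTrees(int n) {
        BigInteger result = BigInteger.ONE;
        // (2n-2)!/(n-1)! equals the product n * (n+1) * ... * (2n-2); empty (= 1) when n = 1.
        for (int i = n; i <= 2 * n - 2; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // An estimate of 100 rows for a result that really has 10,000 rows.
        System.out.println(qError(100, 10_000));
        // Plan-space size for a four-way join.
        System.out.println(leftDeepTrees(4) + " " + bushyTrees(4));
    }
}
```

Running main prints 100.0 (the estimate is off by a factor of 100) and then 24 120: even a four-way join admits 120 bushy trees against 24 left-deep ones, and the ratio between the two counts grows rapidly as more relations are joined, which is why the size of the enumerated plan space matters.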

This message was sent by Atlassian JIRA