[jira] [Updated] (DERBY-6921) How good is the Derby Query Optimizer, really


JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton updated DERBY-6921:
-----------------------------------
    Description:
At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
Technical University introduced a new benchmark suite for evaluating
database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf

The benchmark test suite is publicly available:
http://db.in.tum.de/people/sites/leis/qo/job.tgz

The data set for running the benchmark is publicly available:
ftp://ftp.fu-berlin.de/pub/misc/movies/database/

As part of Google Summer of Code 2017, I am volunteering to mentor
a Summer of Code intern who is interested in using these tools to
improve the Derby query optimizer.

My suggestion for the overall process is this:
1) Acquire the benchmark tools and the data set.
2) Run the benchmark.
2a) Some of the benchmark queries may reveal bugs in Derby.
     For each such bug, we need to isolate the bug and fix it.
3) Once we are able to run the entire benchmark, we need to
   analyze the results.
3a) Some of the benchmark queries may reveal opportunities
   for Derby to improve the query plans that it chooses for
   various classes of queries (this is explained in detail in the
   VLDB paper and other information available at Dr. Leis's site).
   For each such improvement, we need to isolate the issue,
   report it as a separable improvement, and fix it (if we can).
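
As a rough illustration of steps 2 through 3, one pass over the benchmark
could separate queries that fail (candidates for bug reports, step 2a) from
queries that complete (whose timings feed the analysis in step 3). This is
only a sketch: BenchmarkPass and QueryRunner are hypothetical names, and
the runner is an assumed hook that would execute one query against Derby,
for example through JDBC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BenchmarkPass {
    /** Hypothetical hook: executes one query, e.g. through a JDBC connection. */
    interface QueryRunner {
        void run(String sql) throws Exception;
    }

    /** Wall-clock time per query that completed, in nanoseconds. */
    final Map<String, Long> elapsedNanos = new LinkedHashMap<>();
    /** Exception per query that failed; each is a candidate bug to isolate. */
    final Map<String, Exception> failures = new LinkedHashMap<>();

    /** Runs every query once, keyed by query name, and records the outcome. */
    static BenchmarkPass runAll(Map<String, String> queriesByName, QueryRunner runner) {
        BenchmarkPass pass = new BenchmarkPass();
        for (Map.Entry<String, String> query : queriesByName.entrySet()) {
            long start = System.nanoTime();
            try {
                runner.run(query.getValue());
                pass.elapsedNanos.put(query.getKey(), System.nanoTime() - start);
            } catch (Exception e) {
                // Step 2a: keep going, but remember which query broke and why.
                pass.failures.put(query.getKey(), e);
            }
        }
        return pass;
    }
}
```

A single timing per query is noisy; a real harness would presumably repeat
each query several times and compare medians.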

While the benchmark is an interesting exercise in and of itself,
the overall goal of the project is to find-and-fix problems in the
Derby query optimizer, specifically in the 3 areas which are
the focus of the benchmark tool:
1) How good is the Derby cardinality estimator and when does
   it lead to slow queries?
2) How good is the Derby cost model, and how well is it guiding
   the overall query optimization process?
3) How large is the Derby enumerated plan space, and is it
   appropriately-sized?
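
Two of these questions can be made concrete with small calculations. For
question 1, the VLDB paper reports the q-error: the factor by which an
estimated cardinality differs from the true one. For question 3, it helps
to see how quickly the number of candidate join trees grows with the
number of relations. The sketch below is illustrative only: the class and
method names are hypothetical and not part of Derby, counts below 1 are
clamped to 1 as an assumption of this sketch, and the tree counts assume
that cross products are allowed and that swapping the two inputs of a
join gives a different plan.

```java
import java.math.BigInteger;

final class OptimizerMetrics {
    private OptimizerMetrics() {}

    /**
     * q-error of one estimate: max(est/actual, actual/est).
     * Always >= 1.0; 1.0 means the estimate was exact.
     * Counts below 1 are clamped to 1 to avoid division by zero
     * (an assumption of this sketch).
     */
    static double qError(double estimatedRows, double actualRows) {
        double est = Math.max(estimatedRows, 1.0);
        double act = Math.max(actualRows, 1.0);
        return Math.max(est / act, act / est);
    }

    /** Left-deep join orders for n relations: n! permutations. */
    static BigInteger leftDeepJoinOrders(int relations) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= relations; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /**
     * Join trees of any shape (bushy included) for n relations:
     * (2n-2)! / (n-1)!, computed as the product n * (n+1) * ... * (2n-2).
     */
    static BigInteger bushyJoinTrees(int relations) {
        BigInteger result = BigInteger.ONE;
        for (int i = relations; i <= 2 * relations - 2; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }
}
```

For example, an estimate of 10 rows against an actual 1000 rows has a
q-error of 100, the same as an estimate of 1000 against an actual 10;
whether the optimizer guessed too low or too high has to be tracked
separately.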

While other Derby issues have been filed against these questions
in the past, the intent of this specific project is to use the concrete
tools provided by the VLDB paper to make this effort rigorous and
successful at making concrete improvements to the Derby query
optimizer.

If you are interested in pursuing this project, please keep these
considerations in mind:
1) This is NOT an introductory project. You must be quite familiar
   with database systems and with SQL, and in particular with
   cost-based query optimization. If terms such as "cardinality
   estimation", "correlated query predicates", or "bushy trees"
   aren't comfortable terms for you, this probably isn't the
   project you're interested in.
2) If you are new to Derby, that is fine, but please take advantage
   of the extensive body of introductory material on Derby to
   become familiar with it: read the Derby Getting Started manual,
   download the software and follow the tutorials, read the documentation,
   download the source code and learn how to build and run the
   test suites, etc.
3) All I have presented here is an **outline** of the project. You will
   need to read the paper(s), study the benchmark queries, and
   propose a detailed plan for how to use this benchmark as a tool
   for improving the Derby query optimizer.

If these sorts of tasks sound like exciting things to do, then please
let us know!

  was:
At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
Technical University introduced a new benchmark suite for evaluating
database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf

The benchmark test suite is publicly available:
http://db.in.tum.de/people/sites/leis/qo/job.tgz

The data set for running the benchmark is publicly available:
ftp://ftp.fu-berlin.de/pub/misc/movies/database/

As part of Google Summer of Code 2017, I am volunteering to mentor
a Summer of Code intern who is interested in using these tools to
improve the Derby query optimizer.

My suggestion for the overall process is this:
1) Acquire the benchmark tools, and the data set
2) Run the benchmark.
2a) Some of the benchmark queries may reveal bugs in Derby.
     For each such bug, we need to isolate the bug and fix it.
3) Once we are able to run the entire benchmark, we need to
   analyze the results.
3a) Some of the benchmark queries may reveal opportunities
   for Derby to improve the query plans that it chooses for
   various classes of queries (this is explained in detail in the
   VLDB paper and other information available at Dr. Leis's site)
   For each such improvement, we need to isolate the issue,
   report it as a separable improvement, and fix it (if we can)

While the benchmark is an interesting exercise in and of itself,
the overall goal of the project is to find-and-fix problems in the
Derby query optimizer, specifically in the 3 areas which are
the focus of the benchmark tool:
1) How good is the Derby cardinality estimator and when does
   it lead to slow queries?
2) How good is the Derby cost model, and how well is it guiding
   the overall query optimization process?
3) How large is the Derby enumerated plan space, and is it
   appropriately-sized?

While other Derby issues have been filed against these questions
in the past, the intent of this specific project is to use the concrete
tools provided by the VLDB paper to make this effort rigorous and
successful at making concrete improvements to the Derby query
optimizer.


> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Priority: Minor
>              Labels: gsoc2017
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>      For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>    analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>    for Derby to improve the query plans that it chooses for
>    various classes of queries (this is explained in detail in the
>    VLDB paper and other information available at Dr. Leis's site)
>    For each such improvement, we need to isolate the issue,
>    report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>    it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>    the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>    appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>    with DBMS systems, and with SQL, and in particular with
>    cost-based query optimization. If terms such as "cardinality
>    estimation", "correlated query predicates", or "bushy trees"
>    aren't comfortable terms for you, this probably isn't the
>    project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>    of the extensive body of introductory material on Derby to
>    become familiar with it: read the Derby Getting Started manual,
>    download the software and follow the tutorials, read the documentation,
>    download the source code and learn how to build and run the
>    test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>    need to read the paper(s), study the benchmark queries, and
>    propose a detailed plan for how to use this benchmark as a tool
>    for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)