[jira] [Comment Edited] (DERBY-6938) Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/DERBY-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097260#comment-16097260 ]

Harshvardhan Gupta edited comment on DERBY-6938 at 7/22/17 11:41 AM:
---------------------------------------------------------------------

Hi Bryan,

The optimizer produces memory and cost estimates for all of the access paths it can build. Derby will let a hash join go forward only if it _predicts_ the join will fit into memory, and rejects all others, including hash joins that would be far more efficient than loop-based joins for some queries.

However, Derby's memory predictions may not match the actual memory requirement at execution time, and the hash table may grow larger than Derby predicted. DERBY-106 was an effort to mitigate the OutOfMemoryError exceptions arising from this behaviour. So although hash tables can spill to disk, only those access paths for which Derby predicts the hash table will not spill are allowed to go forward to execution.

These observations of mine match a comment on DERBY-1259 that was written a decade ago.

To resolve this problem, we need to investigate how to measure the execution cost of the hash-based access paths that Derby predicts will spill to disk, so that it can eventually consider them instead of simply ignoring them.

Also interesting is the observation that Derby currently does allow hash-based joins when users specify the join strategy themselves via query hints.
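For reference, such a hint is written with Derby's optimizer overrides, a specially formatted comment placed after a table name in the FROM clause. A minimal sketch (the table and column names here are hypothetical, not from this issue):

```sql
-- Force a hash join; the --DERBY-PROPERTIES comment runs to the end
-- of the line, so the rest of the query continues on the next line.
SELECT *
FROM t1, t2 --DERBY-PROPERTIES joinStrategy=HASH
WHERE t1.id = t2.id
```

With an override like this, Derby uses the requested strategy even where its memory prediction would otherwise have rejected the hash join.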





>  Obtain cardinality estimates and true estimates for base tables as well as for intermediate results for queries involving multiple joins.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6938
>                 URL: https://issues.apache.org/jira/browse/DERBY-6938
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>         Attachments: explain.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)