Derby Scheduler and FETCH FIRST question

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Derby Scheduler and FETCH FIRST question

Hohl, Gerrit

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Reply | Threaded
Open this post in threaded view
|

AW: Derby Scheduler and FETCH FIRST question

Hohl, Gerrit

Hello everyone,

 

tried something different. I thought I could fool the scheduler.

First maybe I should add that the "timestamp" column is a VARCHAR.

But it shouldn't make much difference as it has an index, right?

 

SELECT t2.timestamp FROM history t2 ORDER BY t2.timestamp DESC FETCH FIRST 10 ROWS ONLY

That took 3 seconds and returned just the timestamps. Okay, maybe - also the column has an index - there are really many rows so it takes some time.

But to be honest: I thought it would run in a blink of an eye.

 

Now I tried to inject that in the original query as sub-select, but I only select the timestamp as field for the result:

SELECT t1.timestamp FROM history t1 WHERE (t1.timestamp IN (

SELECT t2.timestamp FROM history t2 ORDER BY t2.timestamp DESC FETCH FIRST 10 ROWS ONLY

))

That took 11 seconds - more than tripled, also - as I already mentioned - that column has an index.

And this time I even filter for it.

 

If I query for the whole row instead of just one column it even gets worse:

SELECT t1.* FROM history t1 WHERE (t1.timestamp IN (

SELECT t2.timestamp FROM history t2 ORDER BY t2.timestamp DESC FETCH FIRST 10 ROWS ONLY

))

It takes 3 minutes. Maybe one would say that it is maybe the fault of the size of the Blob field. So it takes some time to transfer the selected rows. But the maximum size of the Blob field is 7 KB, the minimum 64 Bytes. So for 10 rows you have 70KB - which is almost nothing.

 

Would be great if someone could shed some light on this matter.

 

Regards,

Gerrit

 

Von: Hohl, Gerrit
Gesendet: Mittwoch, 24. Januar 2018 13:46
An: 'Derby Dis
cussion'
Betreff: Derby Scheduler and FETCH FIRST question

 

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Reply | Threaded
Open this post in threaded view
|

Re: Derby Scheduler and FETCH FIRST question

Rick Hillegas-3
In reply to this post by Hohl, Gerrit
On 1/24/18 4:45 AM, [hidden email] wrote:

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query plan which Derby chose for the query? See the section on "Working with RunTimeStatistics" in the Derby Tuning Guide: http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no filtering WHERE clause. That reduces the likelihood that Derby will pick an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH FIRST clause. That's worth filing a performance bug for. I think that the FETCH FIRST clause is only applied at execution time in order to short-circuit the number of rows which are returned.

Thanks,

-Rick

Reply | Threaded
Open this post in threaded view
|

AW: Derby Scheduler and FETCH FIRST question

Hohl, Gerrit

Hello Rick,

 

and thanks for your reply.

I will try what you've written as soon as I have time for it.

Unfortunately I already switch to another project and I don't know when I will get the time to have a look on this again (blame my superiors ;-) ).

 

2) is surely a good idea.

 

1) Did you also read my 2nd mail?

I also tried using a subselect, so I have a WHERE clause. I had the same idea as you that the scheduler might not recognize the ORDER BY and FETCH FIRST.

It was faster, but still not what I would have expected. I've worked a lot with Borland Interbase / Firebird, MySQL and especially with PostgreSQL.

And PostgreSQL would have done a lot faster than this.

By the way: PostgreSQL also has a more easy to use approach in aspect of analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html

Would be great of Derby would offer something similar.

 

 

Regards,

Gerrit

 

Von: Rick Hillegas [mailto:[hidden email]]
Gesendet: Freitag, 26. Januar 2018 00:39
An: [hidden email]
Betreff: Re: Derby Scheduler and FETCH FIRST question

 

On 1/24/18 4:45 AM, [hidden email] wrote:

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query plan which Derby chose for the query? See the section on "Working with RunTimeStatistics" in the Derby Tuning Guide: http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no filtering WHERE clause. That reduces the likelihood that Derby will pick an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH FIRST clause. That's worth filing a performance bug for. I think that the FETCH FIRST clause is only applied at execution time in order to short-circuit the number of rows which are returned.

Thanks,

-Rick

Reply | Threaded
Open this post in threaded view
|

AW: Derby Scheduler and FETCH FIRST question

Hohl, Gerrit
In reply to this post by Rick Hillegas-3

Hello Rick,

 

me again. As we had a blackout of our Internet connection I had some unexpected time. ;-)

So I tried the analysis of Derby:

 

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0)

 

The result of the "VALUES" command was:

 

> 1                                                  

> ----------------------------------------------------

> Statement Name: \n null\nStatement Text: \n            SELECT 

 

Not quite what I would have expected. But then I read on a different page of the documentation that I have to declare where the information of the analysis should be stored.

So I did ('myapp' is the schema of my app):

 

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('myapp')

 

Executing that command I get:

> ExampleExceptionFormatter: exception message was: 'SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA' is not recognized as a function or procedure.

 

I have a Derby 10.14 and took that command from the documentation of that version. So I'm not sure why it gives me this error message.

 

 

Regards,

Gerrit

 

Von: Hohl, Gerrit
Gesendet: Freitag, 26. Januar 2018 09:11
An: 'Derby Discussion'
Betreff: AW: Derby Scheduler and FETCH FIRST question

 

Hello Rick,

 

and thanks for your reply.

I will try what you've written as soon as I have time for it.

Unfortunately I already switch to another project and I don't know when I will get the time to have a look on this again (blame my superiors ;-) ).

 

2) is surely a good idea.

 

1) Did you also read my 2nd mail?

I also tried using a subselect, so I have a WHERE clause. I had the same idea as you that the scheduler might not recognize the ORDER BY and FETCH FIRST.

It was faster, but still not what I would have expected. I've worked a lot with Borland Interbase / Firebird, MySQL and especially with PostgreSQL.

And PostgreSQL would have done a lot faster than this.

By the way: PostgreSQL also has a more easy to use approach in aspect of analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html

Would be great of Derby would offer something similar.

 

 

Regards,

Gerrit

 

Von: Rick Hillegas [[hidden email]]
Gesendet: Freitag, 26. Januar 2018 00:39
An: [hidden email]
Betreff: Re: Derby Scheduler and FETCH FIRST question

 

On 1/24/18 4:45 AM, [hidden email] wrote:

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query plan which Derby chose for the query? See the section on "Working with RunTimeStatistics" in the Derby Tuning Guide: http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no filtering WHERE clause. That reduces the likelihood that Derby will pick an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH FIRST clause. That's worth filing a performance bug for. I think that the FETCH FIRST clause is only applied at execution time in order to short-circuit the number of rows which are returned.

Thanks,

-Rick

Reply | Threaded
Open this post in threaded view
|

Re: AW: Derby Scheduler and FETCH FIRST question

Rick Hillegas-3
Hi Gerrit,

Some responses inline...

On 1/26/18 12:50 AM, [hidden email] wrote:

Hello Rick,

 

me again. As we had a blackout of our Internet connection I had some unexpected time. ;-)

So I tried the analysis of Derby:

 

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0)

 

The result of the "VALUES" command was:

 

> 1                                                  

> ----------------------------------------------------

> Statement Name: \n null\nStatement Text: \n            SELECT 

This is indeed not what you'd expect. The output has been truncated by the default display width for the ij tool. You need to set MAXIMUMDISPLAYWIDTH to something very large. Here's an example of how to do this:

ij version 10.15
ij> connect 'jdbc:derby:memory:db;create=true';
ij> create table foo( a int );
0 rows inserted/updated/deleted
ij> MAXIMUMDISPLAYWIDTH 7000;
ij> CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
0 rows inserted/updated/deleted
ij> CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);
0 rows inserted/updated/deleted
ij> select count(*)  from foo where a <> 0;
1         
-----------
0         

1 row selected
ij> values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Statement Name:
    null
Statement Text:
    select count(*)  from foo where a <> 0
Parse Time: 25
Bind Time: 18
Optimize Time: 21
Generate Time: 15
Compile Time: 79
Execute Time: 1
Begin Compilation Timestamp : 2018-01-27 13:27:10.819
End Compilation Timestamp : 2018-01-27 13:27:10.898
Begin Execution Timestamp : 2018-01-27 13:27:10.963
End Execution Timestamp : 2018-01-27 13:27:11.003
Statement Execution Plan Text:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 1
Rows filtered = 0
restriction = false
projection = true
    constructor time (milliseconds) = 0
    open time (milliseconds) = 1
    next time (milliseconds) = 0
    close time (milliseconds) = 0
    restriction time (milliseconds) = 0
    projection time (milliseconds) = 0
    optimizer estimated row count: 1.00
    optimizer estimated cost: 100.40
Source result set:
    Scalar Aggregate ResultSet:
    Number of opens = 1
    Rows input = 0
        constructor time (milliseconds) = 55
        open time (milliseconds) = 1
        next time (milliseconds) = 0
        close time (milliseconds) = 25
        optimizer estimated row count: 1.98
        optimizer estimated cost: 100.40
    Index Key Optimization = false
    Source result set:
        Project-Restrict ResultSet (3):
        Number of opens = 1
        Rows seen = 0
        Rows filtered = 0
        restriction = false
        projection = true
            constructor time (milliseconds) = 0
            open time (milliseconds) = 1
            next time (milliseconds) = 0
            close time (milliseconds) = 25
            restriction time (milliseconds) = 0
            projection time (milliseconds) = 0
            optimizer estimated row count: 1.98
            optimizer estimated cost: 100.40
        Source result set:
            Table Scan ResultSet for FOO at read committed isolation level using instantaneous share row locking chosen by the optimizer
            Number of opens = 1
            Rows seen = 0
            Rows filtered = 0
            Fetch Size = 16
                constructor time (milliseconds) = 0
                open time (milliseconds) = 59
                next time (milliseconds) = 0
                close time (milliseconds) = 25

            scan information:
                Bit set of columns fetched=All
                Number of columns fetched=1
                Number of pages visited=1
                Number of rows qualified=0
                Number of rows visited=0
                Scan type=heap
                start position:
                    null
                stop position:
                    null
                qualifiers:
                    Column[0][0] Id: 0
                    Operator: =
                    Ordered nulls: false
                    Unknown return value: true
                    Negate comparison result: true
                optimizer estimated row count: 1.98
                optimizer estimated cost: 100.40

 

Not quite what I would have expected. But then I read on a different page of the documentation that I have to declare where the information of the analysis should be stored.

So I did ('myapp' is the schema of my app):

 

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('myapp')

 

Executing that command I get:

> ExampleExceptionFormatter: exception message was: 'SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA' is not recognized as a function or procedure.

 

I have a Derby 10.14 and took that command from the documentation of that version. So I'm not sure why it gives me this error message.

 

 

Regards,

Gerrit

 

Von: Hohl, Gerrit
Gesendet: Freitag, 26. Januar 2018 09:11
An: 'Derby Discussion'
Betreff: AW: Derby Scheduler and FETCH FIRST question

 

Hello Rick,

 

and thanks for your reply.

I will try what you've written as soon as I have time for it.

Unfortunately I already switch to another project and I don't know when I will get the time to have a look on this again (blame my superiors ;-) ).

 

2) is surely a good idea.

 

1) Did you also read my 2nd mail?

I also tried using a subselect, so I have a WHERE clause. I had the same idea as you that the scheduler might not recognize the ORDER BY and FETCH FIRST.

It was faster, but still not what I would have expected. I've worked a lot with Borland Interbase / Firebird, MySQL and especially with PostgreSQL.

And PostgreSQL would have done a lot faster than this.

By the way: PostgreSQL also has a more easy to use approach in aspect of analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html

Would be great of Derby would offer something similar.

 

 

Regards,

Gerrit

 

Von: Rick Hillegas [[hidden email]]
Gesendet: Freitag, 26. Januar 2018 00:39
An: [hidden email]
Betreff: Re: Derby Scheduler and FETCH FIRST question

 

On 1/24/18 4:45 AM, [hidden email] wrote:

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query plan which Derby chose for the query? See the section on "Working with RunTimeStatistics" in the Derby Tuning Guide: http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no filtering WHERE clause. That reduces the likelihood that Derby will pick an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH FIRST clause. That's worth filing a performance bug for. I think that the FETCH FIRST clause is only applied at execution time in order to short-circuit the number of rows which are returned.

Thanks,

-Rick


Reply | Threaded
Open this post in threaded view
|

Re: AW: Derby Scheduler and FETCH FIRST question

Rick Hillegas-3
In reply to this post by Hohl, Gerrit
On 1/26/18 12:11 AM, [hidden email] wrote:

Hello Rick,

 

and thanks for your reply.

I will try what you've written as soon as I have time for it.

Unfortunately I already switch to another project and I don't know when I will get the time to have a look on this again (blame my superiors ;-) ).

 

2) is surely a good idea.

 

1) Did you also read my 2nd mail?

I just responded to that 2nd message. Can you share the CREATE TABLE and CREATE INDEX statements associated with this table? They may provide some further clues.

Thanks,
-Rick

I also tried using a subselect, so I have a WHERE clause. I had the same idea as you that the scheduler might not recognize the ORDER BY and FETCH FIRST.

It was faster, but still not what I would have expected. I've worked a lot with Borland Interbase / Firebird, MySQL and especially with PostgreSQL.

And PostgreSQL would have done a lot faster than this.

By the way: PostgreSQL also has a more easy to use approach in aspect of analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html

Would be great of Derby would offer something similar.

 

 

Regards,

Gerrit

 

Von: Rick Hillegas [[hidden email]]
Gesendet: Freitag, 26. Januar 2018 00:39
An: [hidden email]
Betreff: Re: Derby Scheduler and FETCH FIRST question

 

On 1/24/18 4:45 AM, [hidden email] wrote:

Hello everyone,

 

I'm using Apache Derby v10.14.1.0 and having some problems using the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

I'm accessing the database using the Derby Embedded driver.

 

I have a table which contains some indexes as well as some fields and a BLOB field. The table is somewhat big (means many rows, ~13 GB).

I'm using a query like this (timestamp has an index):

 

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

 

The query takes ages (about 27 minutes for that ~13 GB table) and I can see how Derby slowly fills up my harddisk.

And a look in the "tmp" folder of the database shows several ".tmp" files.

First I get several files having 10 MB, then I get two big files having 5 GB, then the 10 MB files are deleted, then the 5 GB files are deleted and finally I get the result.

As I thought something is wrong with my application I also did the same query on the same database and table using SQuirreL v3.8.1. But the result is the same.

 

I would have expected that the scheduler of Derby would first look at the timestamp column / index (which should be sorted), taking the first 10 values from there and

finally reading the first 10 rows matching these values.

Instead it seems that it first processes the " SELECT * FROM history" part (as memory is not sufficient it swaps it to the harddisk), orders it and takes the first 10 elements.

 

Is that correct?

And if that is correct, where is the benefit of FETCH FIRST - beside that maybe not that much data is transferred (maybe only interesting if you use Derby not by the Embedded Driver because of the TCP/IP connection)?

 

 

Regards,

Gerrit

 

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query plan which Derby chose for the query? See the section on "Working with RunTimeStatistics" in the Derby Tuning Guide: http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no filtering WHERE clause. That reduces the likelihood that Derby will pick an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH FIRST clause. That's worth filing a performance bug for. I think that the FETCH FIRST clause is only applied at execution time in order to short-circuit the number of rows which are returned.

Thanks,

-Rick