Avoid locking on DELETE


Avoid locking on DELETE

Peter
Hi,

I have a table "mytable" with columns "id", "created_at" and "json"
(VARCHAR, BIGINT, LONG VARCHAR). Data arrives at roughly 200k new
entries per hour, and I would like to keep only the entries of the last
1 or 2 hours. It is expected behaviour for the user that old entries
get lost, since the table acts as a kind of LRU cache.

The current solution is to delete entries older than 4 hours every 30
minutes:

DELETE FROM mytable WHERE created_at < ?

I'm using this in a prepared statement where ? is the timestamp for
"4 hours ago" in milliseconds (derived from new DateTime().getMillis()).
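
In JDBC terms it is roughly this (a sketch only; "conn" stands for an already open java.sql.Connection, and the cutoff computation is just illustrative):

    // Sketch: the cleanup delete as a prepared statement.
    // Assumes java.sql.Connection conn and java.sql.PreparedStatement are available.
    long cutoff = System.currentTimeMillis() - 4L * 60 * 60 * 1000; // "4 hours ago" in millis
    try (PreparedStatement delete = conn.prepareStatement(
            "DELETE FROM mytable WHERE created_at < ?")) {
        delete.setLong(1, cutoff);
        int removed = delete.executeUpdate(); // runs as one large delete transaction
    }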

This works, but some (not all) INSERT statements are delayed by roughly
the time the DELETE takes (2-5 seconds), which is ugly. These INSERT
statements are executed independently of the DELETE, from different
threads.

Is there a better way? Can I somehow avoid locking the unrelated INSERT
operations?

What helps a bit is running the deletes more frequently: the delays get
smaller, but the number of delayed requests increases.

What also helps a bit (so far without a visible negative impact) is
increasing the page size for the Derby Network Server:
-Dderby.storage.pageSize=32768
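
The same value can also go into derby.properties on the server side (as far as I can tell, the page size only applies to tables and indexes created after the property is set):

    # derby.properties on the Derby Network Server
    derby.storage.pageSize=32768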

Regards
Peter


Re: Avoid locking on DELETE

Peter Ondruška-4
Peter, try this and see if it makes a difference:

1. Select the entries to be deleted and note their primary keys.
2. Delete by those primary keys (from step 1), using short transaction batches.


Re: Avoid locking on DELETE

Peter
Hi Peter,

Thanks a lot for the suggestion. This would be nice if it performs better.

Is the idea to split one request into smaller parts, or does "select + delete by IDs" simply perform better?

And regarding the latter option: is this possible in one SQL statement? Something like
DELETE FROM mytable WHERE id IN
( SELECT id FROM mytable WHERE created_at < some_fixed_millis OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY )

and then loop through the results by adjusting OFFSET and FETCH? (By the way, the created_at column is indexed.)

Or would you recommend doing this as two separate statements in Java/JDBC? Or maybe even just issuing the original DELETE more frequently?

Regards
Peter


Re: Avoid locking on DELETE

Peter Ondruška-4
In my case I use two separate steps. First I SELECT the primary keys of the records to be deleted (in your case: SELECT id FROM mytable WHERE created_at < some_fixed_millis). Then I issue DELETEs for those primary keys in batches of N statements (N being a configuration parameter). You could create a stored procedure for this with two parameters (some_fixed_millis, batch_size).

Your DELETE ... WHERE id IN (SELECT ... FETCH ...) idea with row limiting has to re-run the subquery for every DELETE step, causing unnecessary read I/O.
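
For the stored-procedure variant, Derby can expose a static Java method as an SQL procedure; a sketch with made-up names (PURGE_OLD, com.example.Purge.purgeOld):

    -- Register a Java method as a Derby procedure (names are illustrative)
    CREATE PROCEDURE PURGE_OLD(IN CUTOFF_MILLIS BIGINT, IN BATCH_SIZE INTEGER)
        LANGUAGE JAVA PARAMETER STYLE JAVA MODIFIES SQL DATA
        EXTERNAL NAME 'com.example.Purge.purgeOld'

    -- then call it, e.g. from a scheduled job
    CALL PURGE_OLD(1570000000000, 1000)

The matching Java method would be public static void purgeOld(long cutoffMillis, int batchSize) throws SQLException, and inside it the nested connection is obtained with DriverManager.getConnection("jdbc:default:connection").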



Re: Avoid locking on DELETE

Peter
Hi Peter,

Thanks! I have implemented this, and the maximum delays are indeed lower, but a delete batch now takes longer to complete (roughly 3-4 times longer, with batchSize=500 and around ~10,000 deleted rows in total). The problem is likely that my ID column is a VARCHAR.

If I instead increase the frequency of issuing the original DELETE statement:

DELETE FROM mytable WHERE created_at < ?

Won't that have a similar effect, since each run then deletes a smaller batch?

Regards
Peter


Re: Avoid locking on DELETE

Peter Ondruška-4
You would need to test various scenarios. First I would propose a larger batch size (several thousand rows). Are you sure you execute the deletes in batches? You should have autocommit off, execute the delete statement N times, commit, and repeat. Pseudo code (I am on a mobile phone):

1. Acquire a connection
2. Set the connection's autocommit to false
3. Create a prepared delete statement: DELETE FROM mytable WHERE id = ?
4. Build the set of primary keys to be deleted
5. Iterate over the set (4.), adding the keys to the delete statement (3.) as a batch
6. When you reach the batch size or the end of the key set, execute the batch and commit, then continue with (5.)
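
Spelled out in JDBC, those steps look roughly like this (a sketch only; conn, cutoffMillis and batchSize are placeholders for an open java.sql.Connection and the two parameters mentioned above; uses java.sql.* and java.util.*):

    conn.setAutoCommit(false);

    // 1) Collect the primary keys of the rows to delete.
    List<String> ids = new ArrayList<>();
    try (PreparedStatement sel = conn.prepareStatement(
            "SELECT id FROM mytable WHERE created_at < ?")) {
        sel.setLong(1, cutoffMillis);
        try (ResultSet rs = sel.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getString(1));
            }
        }
    }
    conn.commit(); // end the read-only transaction

    // 2) Delete by key in batches of batchSize, committing after each batch.
    try (PreparedStatement del = conn.prepareStatement(
            "DELETE FROM mytable WHERE id = ?")) {
        int inBatch = 0;
        for (String id : ids) {
            del.setString(1, id);
            del.addBatch();
            if (++inBatch == batchSize) {
                del.executeBatch();
                conn.commit();   // short transaction: releases locks quickly
                inBatch = 0;
            }
        }
        if (inBatch > 0) {       // flush the final partial batch
            del.executeBatch();
            conn.commit();
        }
    }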

In my case, with a slow disk, this really does perform better, and it should avoid your issue as well.

Peter


Re: Avoid locking on DELETE

Peter
Hi Peter,

Thanks.

This procedure with autocommit disabled is indeed simpler than what I had before via "DELETE FROM mytable WHERE id IN (...)", but the delete itself takes longer (5-6 times), and I do not see a difference between batch sizes.

I've also benchmarked this process against PostgreSQL, and Derby seems to be much slower here. I will investigate more, as migrating is also a risk.

Regards
Peter


RE: Avoid locking on DELETE

Jerry Lampi

Peter,

We have the same situation. We solved the locking issue by doing many smaller deletes instead of one large one.

Like you, we have a date column to base our deletes on.

Let's say you want to delete all rows whose created_at is < yesterday. Instead of deleting them all in one statement, start with a cutoff well before that date and delete everything older than the cutoff; this removes few or no rows. Then move the cutoff forward by an hour (or maybe only 10 minutes, whatever fits) and delete again, repeating until you reach the real cutoff. The point is to delete fewer rows each time to prevent locking. Of course this results in a longer overall "purge" duration, but at least in our case we didn't care: the goal is to not LOCK out everyone else while a delete is being run.

The finest granularity we required was 15 minutes. It sounds like you need finer granularity than that.

Here's sample code, meant to give you a feel for what to do. It is working code, but tailored to our use case. I imagine the main thing you would tweak is the return value of getTimeToAddToIntervalBasedOnRowsDeleted(). Ignore convertTimeFromSystem390ToMilliseconds(); hopefully you just have a Date object and can work with milliseconds directly.

/**
 * @param stckTimestamp - purge rows older than this date (System/390 STCK format)
 * @param tableName     - e.g. mytable
 * @param timestampName - e.g. TIMESTAMP or SESSTERMTIME
 */
private static long doIncrementalTimePurge(long stckTimestamp,
                                           String tableName,
                                           String timestampName) {
    long totalRowsDeleted = 0;
    try {
        Class.forName(driver);

        // Start the purge well before the real cutoff: go back daysToGoBack days
        // plus 1 hour, then walk the cutoff forward in small increments.
        long stckTimestampMs = DateUtil.convertTimeFromSystem390ToMilliseconds(stckTimestamp);
        int daysToGoBack = 5;
        stckTimestampMs -= ((((daysToGoBack * 24) + 1) * 60) * 60) * 1000;
        long startingTimestamp = DateUtil.convertTimeFromMillisecondsToSystem390(stckTimestampMs);

        Connection deleteConnection = DriverManager.getConnection(url, user, psw);
        Statement deleteStatement = deleteConnection.createStatement();

        String sqlDeletePrefix = "delete from " + schema + "." + tableName
                + " where " + timestampName + " < ";
        // Non-zero so the first interval is not treated as a second consecutive zero-row deletion
        int prevIterationRowsDeleted = 1;
        boolean finalTimeStampProcessed = false;

        while (startingTimestamp <= stckTimestamp) {
            String sqlDelete = sqlDeletePrefix + startingTimestamp;
            int rowsDeletedThisIteration = deleteStatement.executeUpdate(sqlDelete);
            totalRowsDeleted += rowsDeletedThisIteration;

            if (finalTimeStampProcessed) {
                break; // we're done
            }

            // Choose the next step size based on how many rows the last delete removed
            long timeToAdd = getTimeToAddToIntervalBasedOnRowsDeleted(rowsDeletedThisIteration,
                                                                      prevIterationRowsDeleted);
            stckTimestampMs += timeToAdd;
            startingTimestamp = DateUtil.convertTimeFromMillisecondsToSystem390(stckTimestampMs);
            prevIterationRowsDeleted = rowsDeletedThisIteration;

            if (startingTimestamp > stckTimestamp) {
                startingTimestamp = stckTimestamp; // final purge to honor the actual timestamp
                finalTimeStampProcessed = true;
            }
        }

        deleteStatement.close();
        deleteConnection.close();
    } catch (SQLException sqle) {
        sqle.printStackTrace();
        while (sqle != null) {
            String logMessage = "\n SQL Error: " + sqle.getMessage() + "\n\t\t"
                    + "Error code: " + sqle.getErrorCode() + "\n\t\t"
                    + "SQLState: " + sqle.getSQLState() + "\n";
            System.out.println(getLogPrefix() + logMessage);
            sqle = sqle.getNextException();
        }
    } catch (Exception e) {
        System.out.println("Exception in doIncrementalTimePurge(): " + e);
        e.printStackTrace();
    }
    return totalRowsDeleted;
}

private static long getTimeToAddToIntervalBasedOnRowsDeleted(int rowsDeletedThisIteration,
                                                             int prevIterationRowsDeleted) {
    long oneMinuteMs = 60000; // (60 * 1000)
    long returnDurationMs = oneMinuteMs * 60; // 1 hour default
    if (rowsDeletedThisIteration > 2000) {
        returnDurationMs = oneMinuteMs * 15; // 15 minutes
    } else if (rowsDeletedThisIteration > 1500) {
        returnDurationMs = oneMinuteMs * 30; // 30 minutes
    } else if (rowsDeletedThisIteration > 1000) {
        returnDurationMs = oneMinuteMs * 45; // 45 minutes
    } else if (rowsDeletedThisIteration > 0) {
        if (rowsDeletedThisIteration > 500) {
            returnDurationMs = oneMinuteMs * 60; // 1 hour
        } else if (rowsDeletedThisIteration > 250) {
            returnDurationMs = (oneMinuteMs * 60) + (oneMinuteMs * 15); // 1 hour, 15 minutes
        } else if (rowsDeletedThisIteration > 125) {
            returnDurationMs = (oneMinuteMs * 60) + (oneMinuteMs * 30); // 1 hour, 30 minutes
        } else { // fewer than 125 rows deleted, so increase the interval
            returnDurationMs = (oneMinuteMs * 60) + (oneMinuteMs * 45); // 1 hour, 45 minutes
        }
    } else { // zero rows deleted...
        if (prevIterationRowsDeleted <= 0) { // two consecutive zero-row deletions
            returnDurationMs = (oneMinuteMs * 60) * 3; // 3 hours
        } else {
            returnDurationMs = (oneMinuteMs * 60) * 2; // 2 hours
        }
    }
    return returnDurationMs;
}

Jerry