[jira] [Commented] (DERBY-6955) All database threads stuck and not making any progress

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[jira] [Commented] (DERBY-6955) All database threads stuck and not making any progress

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/DERBY-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121792#comment-16121792 ]

Brett Bergquist commented on DERBY-6955:
----------------------------------------

A quick look at the LogToFile.flush() method, it looks like there is two possible "wait" conditions.  The first is waiting if the database is "frozen" and the second is waiting if the log is being flushed.  There is comments that there may be multiple threads trying to flush the log, so that would explain a bit why there are multiple threads that have acquired the lock: they have acquired the lock and are waiting.

I do not see any thread that indicates that the database is frozen.   However, out backup process does freeze the database, take a filesystem snapshot, and unfreeze the database.   It is possible that this occurred at this point.

I had questioned a while back that if a connection is lost while the database is frozen, the database should be unfrozen automatically.   I have seen the database become frozen and no connections can be made to unfreeze the database.



> All database threads stuck and not making any progress
> ------------------------------------------------------
>
>                 Key: DERBY-6955
>                 URL: https://issues.apache.org/jira/browse/DERBY-6955
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.10.2.0
>         Environment: Solaris 10
> Very large database (about 750G) with about 100 tx/second insertion rate
> Network Server
>            Reporter: Brett Bergquist
>         Attachments: stacktrace1, stacktrace2
>
>
> A system in production got into a very strange state with none of the database engine threads making any progress.   Initially a connection could be made to the network server, but any query would hang.  The network server also hung on shutdown.
> The lack of progresss was observed by generating 6 stack traces, 10 seconds apart.  Each of these stack traces show the threads exactly as in the previous stack trace.  I will attach 2 of the stack traces which are 10 seconds apart.
> About half of the threads were in a call chain the looks like:
> {code:java}
> "DRDAConnThread_161" #187 prio=5 os_prio=64 tid=0x0000000101f1c800 nid=0xd6 waiting on condition [0xfffffffcad0fe000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xfffffffefa4524c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
> at org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
> at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
> at org.apache.derby.impl.store.access.RAMTransaction.findConglomerate(Unknown Source)
> at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown Source)
> at org.apache.derby.impl.store.access.RAMTransaction.openStoreCost(Unknown Source)
> at org.apache.derby.impl.sql.compile.CompilerContextImpl.getStoreCostController(Unknown Source)
> at org.apache.derby.impl.sql.compile.FromBaseTable.getStoreCostController(Unknown Source)
> at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(Unknown Source)
> at org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(Unknown Source)
> at org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(Unknown Source)
> at org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(Unknown Source)
> at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(Unknown Source)
> at org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(Unknown Source)
> at org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(Unknown Source)
> at org.apache.derby.impl.sql.compile.SelectNode.optimize(Unknown Source)
> at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(Unknown Source)
> at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(Unknown Source)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
> at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown Source)
> at org.apache.derby.jdbc.Driver40.newEmbedPreparedStatement(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
> - locked <0xffffffff12e07950> (a org.apache.derby.impl.jdbc.EmbedConnection40)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
> at org.apache.derby.impl.drda.DRDAStatement.prepare(Unknown Source)
> at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(Unknown Source)
> at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(Unknown Source)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
> at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> and the majority of the others look like:
> {code:java}
> "DRDAConnThread_118" #144 prio=5 os_prio=64 tid=0x0000000101072800 nid=0xab in Object.wait() [0xfffffffd342fe000]
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
> - locked <0xfffffffd605393c0> (a org.apache.derby.impl.store.raw.log.LogToFile)
> at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
> at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
> at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source)
> at org.apache.derby.impl.store.raw.xact.Xact.xa_commit(Unknown Source)
> at org.apache.derby.impl.store.access.RAMTransaction.xa_commit(Unknown Source)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown Source)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.xaCommit(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.xa_commit(Unknown Source)
> - locked <0xfffffffef72c71d8> (a org.apache.derby.impl.jdbc.EmbedConnection40)
> at org.apache.derby.jdbc.XATransactionState.xa_commit(Unknown Source)
> - eliminated <0xfffffffefa7b2958> (a org.apache.derby.jdbc.XATransactionState)
> at org.apache.derby.jdbc.EmbedXAResource.commit(Unknown Source)
> - locked <0xfffffffefa7b2958> (a org.apache.derby.jdbc.XATransactionState)
> - locked <0xffffffff5343b4c0> (a org.apache.derby.jdbc.EmbedXAResource)
> at org.apache.derby.impl.drda.DRDAXAProtocol.commitXATransaction(Unknown Source)
> at org.apache.derby.impl.drda.DRDAXAProtocol.commitTransaction(Unknown Source)
> at org.apache.derby.impl.drda.DRDAXAProtocol.parseSYNCCTL(Unknown Source)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
> at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> An interesting thing with the second one is the lock on
> - locked <0xfffffffd605393c0> (a org.apache.derby.impl.store.raw.log.LogToFile)
> Multiple threads claim to have locked this object which I find concerning as one would assume that only one thread could have a lock at a time.   See the attached stack trace and you will see that there are many threads that seem to have locked this object and are in the same state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Loading...