SwiftMQ doesn't handle disk-full errors cleanly

Posted By:   Fergus_Gallagher
Posted On:   Thursday, July 4, 2002 03:51 AM

[SwiftMQ 3.2.0 Production, Solaris, Sun Java 1.4]


The disks on one of our boxes filled up yesterday (oops!). We found that SwiftMQ didn't react in a very application-friendly manner.


In the SwiftMQ logs there was a message saying that it was shutting down, but in fact it didn't. Instead we got lots of application errors like:


javax.naming.NamingException: unable to connect, exception = javax.jms.JMSException: error creating socket connection to localhost:4001, message: errno: 149, error: Operation already in progress for fd: 7
    at com.swiftmq.jndi.ContextImpl.<init>(ContextImpl.java:48)
    at com.swiftmq.jndi.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:18)
    at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:662)
    at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:243)
    at javax.naming.InitialContext.init(InitialContext.java:219)
    at javax.naming.InitialContext.<init>(InitialContext.java:195)


These errors continued even after the disks were cleaned up. We got exactly the same error (including "fd: 7") even though each error came from a separate JVM.

Some connections didn't get this error but would hang indefinitely.
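
For clients that cannot afford to hang, one possible workaround is to bound the blocking call with a timeout. The following is only a sketch using standard java.util.concurrent classes; the class name BoundedLookup and the method callWithTimeout are made up for illustration and are not part of SwiftMQ or JNDI.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: not part of SwiftMQ or JNDI.
final class BoundedLookup {

    // Runs the task on a daemon thread and gives up after timeoutMillis.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the client JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker; blocked I/O may ignore this
                throw e;
            } catch (ExecutionException e) {
                // Surface the task's own exception (e.g. NamingException) to the caller.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A client could then wrap the lookup as callWithTimeout(() -> new InitialContext(env), 5000), where env is the usual JNDI environment table, and treat a TimeoutException as "router unavailable". Note that the worker thread may stay blocked in the socket call; the timeout only frees the caller.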


The only solution was to restart SwiftMQ (kill -KILL was needed).


I would like to suggest that SwiftMQ should shut down cleanly if it can't recover properly. Shutting down would certainly be acceptable for us, as having the disks fill up is undeniably our problem!


P.S. FYI, despite all our messages being non-persistent, SwiftMQ "recovery" took about 5 minutes.


Re: SwiftMQ doesn't handle disk-full errors cleanly

Posted By:   Andreas_Mueller  
Posted On:   Thursday, July 4, 2002 07:26 AM

Sorry that your disk filled up ;-)


SwiftMQ panics if there's an IOException while writing the transaction log or flushing cache pages. In that case, System.exit(-1) is called and a message is written to the error log. The latter isn't a good idea if the disk is full. Messages are also written to the log during shutdown, triggered from the shutdown hook.
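
To illustrate the general pattern (this is not SwiftMQ's actual code): System.exit() waits for all shutdown hooks to finish, so a hook that blocks, for example while writing to a full disk, can keep the JVM alive. Whether that is what happened here is an assumption. A minimal sketch of a more defensive panic path follows; the class PanicExit and its methods are hypothetical names. Logging is best-effort, and a watchdog thread calls Runtime.halt(), which skips shutdown hooks, if the orderly exit has not completed within a grace period.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch: not SwiftMQ code.
final class PanicExit {

    // Best-effort logging: a failure to log must never prevent the exit.
    static boolean tryLog(Runnable logAction) {
        try {
            logAction.run();
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // exit and halt are injected so the logic can be exercised without killing the JVM.
    static Thread panic(Runnable logAction, IntConsumer exit, Runnable halt, long graceMillis) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);
            } catch (InterruptedException e) {
                return; // cancelled
            }
            halt.run(); // orderly exit did not finish in time
        }, "panic-watchdog");
        watchdog.setDaemon(true); // daemon threads keep running while shutdown hooks execute
        watchdog.start();

        tryLog(logAction);
        exit.accept(-1); // with System::exit this call does not return
        return watchdog;
    }

    // Production wiring: orderly exit first, hard halt after 10 seconds.
    static void panicForReal(String message) {
        panic(() -> System.err.println(message),
              System::exit,
              () -> Runtime.getRuntime().halt(-1),
              10_000L);
    }
}
```

The grace period of 10 seconds is an arbitrary choice. Halting skips any remaining cleanup, so it is only a last resort when the normal shutdown sequence is stuck.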


As for the recovery: if you see "RecoverManager, blabla" during startup, it displays a percentage such as "10% so far", and this takes 5 minutes, then your transaction log was large and your messages are definitely persistent (please check this). What is your transaction log size?