Getting around EAGAIN

Posted By:   j_n
Posted On:   Thursday, October 28, 2004 05:38 PM

We're working with a large codebase to perform some data conversion activities and have been struggling with intermittent EAGAIN errors. We've seen different behavior on Solaris (Sun JVM 1.4.2_05) and Linux (JRockit 1.4.2_04-b05).


Overall, the Solaris instance misbehaves far more frequently than the Linux one. We can reproduce (though not necessarily isolate) the Solaris behavior at will; it happens all the time. On Linux, the condition only appears after hours of work and is much harder to reproduce.


We'd love to get a better sense of what can cause EAGAIN errors in practice.


Some facts:


1. We have three multithreaded applications running: one ("sender") that sends messages to a topic, a second application ("switch") that reads from the topic (durable subscriber) and sends to a queue, and a third application ("processor") that reads from the queue (and processes the message in a meaningful way). A sketch of the "switch" stage follows this list.


2. The messages are large (~20 KB each).


3. Typically, the processor is the slowest link in the chain. Flow control is enabled and flow control signals make their way through the chain.


4. When we attach a trivial "processor", flow control doesn't need to become active (and the problem disappears).


5. We've been allocating 512 MB of heap (never actually used in full) and a large number of threads to each of the running processes. CPU utilization is high, usually around 80%, with multiple processors on both boxes.
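

For concreteness, the "switch" stage of such a chain typically looks something like the minimal JMS sketch below. The JNDI names ("cf", "conversionTopic", "conversionQueue"), the client ID, and the subscription name are assumptions for illustration, and real error handling is omitted:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.Topic;
import javax.jms.TopicSubscriber;
import javax.naming.InitialContext;

// Minimal sketch of the "switch" stage: a durable topic subscriber
// that forwards every message it receives to a queue.
public class Switch implements MessageListener {
    private final MessageProducer producer;

    Switch(MessageProducer producer) {
        this.producer = producer;
    }

    public void onMessage(Message message) {
        try {
            producer.send(message);          // forward the message to the queue
        } catch (JMSException e) {
            e.printStackTrace();             // real code would handle/redeliver
        }
    }

    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("cf");
        Connection connection = cf.createConnection();
        connection.setClientID("switch");    // required for a durable subscription

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = (Topic) ctx.lookup("conversionTopic");
        Queue queue = (Queue) ctx.lookup("conversionQueue");

        TopicSubscriber subscriber = session.createDurableSubscriber(topic, "switch-sub");
        subscriber.setMessageListener(new Switch(session.createProducer(queue)));
        connection.start();                  // delivery happens on the session thread

        Thread.sleep(Long.MAX_VALUE);        // keep the process alive
    }
}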


Any thoughts, considerations, or suggestions?


-jonathan


Re: Getting around EAGAIN

Posted By:   Andreas_Mueller  
Posted On:   Friday, October 29, 2004 02:17 AM

Are you using the Network NIO Swiftlet? There was an issue with it in the past, but it was solved a long time ago (see "Using OP_WRITE Events").
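

For background on that pattern: "OP_WRITE events" are how an NIO server copes with a socket whose send buffer is full, which is exactly the condition the OS reports as EAGAIN. A rough Java NIO sketch (the class and method names here are illustrative, not SwiftMQ's actual code):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// When a non-blocking write cannot drain the buffer (the condition the
// kernel reports as EAGAIN), register interest in OP_WRITE and let the
// selector signal when the socket is writable again, instead of spinning.
public class OpWriteExample {
    static void writeOrDefer(SelectionKey key, ByteBuffer buffer) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(buffer);               // may write only part of the buffer
        if (buffer.hasRemaining()) {
            // Socket send buffer is full: wait for the selector to fire OP_WRITE.
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            // Fully written: clear OP_WRITE so the selector doesn't spin.
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}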


Rather, I think it has to do with the network buffers. You have quite large messages (20 KB). SwiftMQ uses prefetching on the consumer side, configurable via the "smqp-consumer-cache-size" attribute of the connection factory you use; the default is 500 messages. So if you have a slow consumer and its queue fills up, the router tries to send up to 500 messages in one go (one batch). With 20 KB messages that requires roughly 500 * 20 KB = ~10 MB of output buffer at the router. The router's output buffer is 128 KB by default; during output it extends the buffer by 64 KB, then again each time the buffer fills, and so on. Building up such a large buffer this way causes high CPU utilization. This only happens once per consumer connection, but if you disconnect and reconnect, the buffer has to be created again.
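

To see why that is expensive, here is a back-of-the-envelope sketch using the defaults cited above (128 KB initial buffer, 64 KB extension step, one ~10 MB batch of 500 prefetched ~20 KB messages); each pass through the loop stands for one allocate-and-copy extension:

// Back-of-the-envelope sketch of the router output buffer growth
// described above, using the cited defaults.
public class BufferGrowth {
    public static void main(String[] args) {
        long batch = 500L * 20 * 1024;   // ~10 MB sent in one go
        long size  = 128 * 1024;         // router-output-buffer-size default
        long step  = 64 * 1024;          // router-output-extend-size default
        int extensions = 0;
        while (size < batch) {
            size += step;
            extensions++;                // each extension allocates and copies
        }
        // Prints "155 extensions to reach 10289152 bytes": done once per
        // consumer connection, and again after every disconnect/reconnect.
        System.out.println(extensions + " extensions to reach " + size + " bytes");
    }
}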


The solution is one of the following:


  • Either: Decrease the smqp-consumer-cache-size of your consumer's connection factory to, e.g., 5 or 10 (a client-side sketch follows this list)

  • Or: Use a dedicated JMS listener for that consumer and configure larger network buffers (attributes "router-output-buffer-size" and "router-output-extend-size")
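

For the first option, the client code itself doesn't change, since the cache size lives on the connection factory definition in the router configuration. A sketch of the consumer side, where the factory name ("small-cache-cf"), the queue name ("conversionQueue"), the router URL, and the JNDI context factory class are all assumptions:

import java.util.Properties;
import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueReceiver;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;

// Sketch of the "processor" after lowering smqp-consumer-cache-size on a
// dedicated connection factory. The cache size itself is configured on
// the router's connection factory definition, not in this client code.
public class TunedProcessor {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.swiftmq.jndi.InitialContextFactoryImpl");  // assumed class
        env.put(Context.PROVIDER_URL, "smqp://localhost:4001"); // assumed URL
        InitialContext ctx = new InitialContext(env);

        QueueConnectionFactory cf =
            (QueueConnectionFactory) ctx.lookup("small-cache-cf");
        QueueConnection connection = cf.createQueueConnection();
        QueueSession session =
            connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = (Queue) ctx.lookup("conversionQueue");
        QueueReceiver receiver = session.createReceiver(queue);
        connection.start();

        // With a cache size of 5, the router sends at most 5 messages
        // (~100 KB) per batch instead of 500 (~10 MB).
        while (true) {
            Message m = receiver.receive();
            // ... process the message ...
        }
    }
}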
