Slow distributed-ack Messages
In systems with distributed-ack regions, a sudden large number of distributed-no-ack operations can cause distributed-ack operations to take a long time to complete.
The distributed-no-ack operations can come from anywhere. They may be updates to distributed-no-ack regions, or they may be other distributed-no-ack operations, like destroys, performed on any region in the cache, including the distributed-ack regions.
The main reasons why a large number of distributed-no-ack messages may delay distributed-ack operations are:
- For any single socket connection, all operations are executed serially. If any other operations are buffered for transmission when a distributed-ack operation is sent, the distributed-ack operation must wait to get to the front of the line before being transmitted. The operation’s calling process is also left waiting.
- The distributed-no-ack messages are buffered by their threads before transmission. If many messages are buffered and then sent to the socket at once, the line for transmission might be very long.
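The queueing effect described in these two points can be illustrated with a toy model. This is a minimal sketch, not Geode code: the class `SharedSocketModel`, its method names, and the fixed 2 ms per-message send cost are all hypothetical and chosen only for illustration. It treats a socket as a strict FIFO queue and computes how long a single distributed-ack message waits behind a burst of distributed-no-ack messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (hypothetical, not Geode code) of one socket that sends messages strictly in FIFO order. */
class SharedSocketModel {
    /** Assumed fixed cost, in milliseconds, to transmit one message. */
    static final long SEND_COST_MILLIS = 2;

    /** Returns how long the message labelled "ack" waits before its own transmission starts. */
    static long ackWaitMillis(Deque<String> sendQueue) {
        long waited = 0;
        for (String message : sendQueue) { // iterates from the head of the queue to the tail
            if (message.equals("ack")) {
                return waited;
            }
            waited += SEND_COST_MILLIS; // every earlier message must be sent first
        }
        throw new IllegalArgumentException("no ack message in queue");
    }

    /** Builds a queue holding the given number of no-ack messages followed by one ack message. */
    static Deque<String> queueWithBurst(int noAckCount) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < noAckCount; i++) {
            queue.addLast("no-ack");
        }
        queue.addLast("ack");
        return queue;
    }

    public static void main(String[] args) {
        // Shared socket: the ack waits behind a burst of 500 no-ack messages.
        System.out.println(ackWaitMillis(queueWithBurst(500))); // prints 1000
        // Dedicated socket: nothing is queued ahead of the ack.
        System.out.println(ackWaitMillis(queueWithBurst(0)));   // prints 0
    }
}
```

In this model the wait grows linearly with the size of the burst, which is why both the number of buffered messages and whether the socket is shared matter.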
You can take these steps to reduce the impact of this problem:
- If you’re using TCP, check whether you have socket conservation enabled for your members. It is configured by setting the Geode property conserve-sockets to true. If enabled, each application’s threads share sockets unless you override the setting at the thread level. Work with your application programmers to see whether you can disable sharing entirely, or at least for the threads that perform distributed-ack operations. These include operations on distributed-ack regions and also netSearches performed on regions of any distributed scope. (Note: netSearch is only performed on regions with a data-policy of empty, normal, or preloaded.) If you give each thread that performs distributed-ack operations its own socket, you effectively let it scoot to the front of the line ahead of the distributed-no-ack operations that are being performed by other threads. The thread-level override is done by calling the DistributedSystem.setThreadsSocketPolicy(false) method.
- Reduce your buffer sizes to slow down the distributed-no-ack operations. These changes slow down the threads performing distributed-no-ack operations and allow the distributed-ack operations to be sent in a more timely manner.
  - If you’re using UDP (you either have multicast-enabled regions or have set disable-tcp to true in gemfire.properties), consider reducing the byteAllowance of mcast-flow-control to something smaller than the default of 3.5 megabytes.
  - If you’re using TCP/IP, reduce the socket-buffer-size in gemfire.properties.
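The settings named in the steps above can be gathered in one place. This is a minimal sketch using only java.util.Properties: the class `AckTuningProperties` is hypothetical, the numeric values are placeholders rather than recommendations, and the three-part format shown for mcast-flow-control (byteAllowance, rechargeThreshold, rechargeBlockMs) is an assumption to check against the property reference for your Geode version.

```java
import java.util.Properties;

/** Hypothetical helper that collects the tuning properties discussed above. */
class AckTuningProperties {
    static Properties build(boolean usingUdp) {
        Properties props = new Properties();
        if (usingUdp) {
            // Assumed format: byteAllowance,rechargeThreshold,rechargeBlockMs.
            // 524288 is a placeholder for a smaller byteAllowance.
            props.setProperty("mcast-flow-control", "524288,0.25,5000");
        } else {
            // Stop application threads from sharing sockets.
            props.setProperty("conserve-sockets", "false");
            // Placeholder value for a reduced socket buffer, in bytes.
            props.setProperty("socket-buffer-size", "16384");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties tcp = build(false);
        System.out.println("conserve-sockets=" + tcp.getProperty("conserve-sockets"));
        System.out.println("socket-buffer-size=" + tcp.getProperty("socket-buffer-size"));
        // If sharing stays enabled member-wide, an individual thread can opt out
        // with the call named above: DistributedSystem.setThreadsSocketPolicy(false).
    }
}
```

The same key/value pairs could equally be written directly into gemfire.properties. Building them in code just makes the TCP and UDP cases easy to compare side by side.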