At $$dayjob we use WildFly with Infinispan as our clustered caching solution. We run in standalone-ha mode and deploy inside a Kubernetes cluster.
We ran into trouble using the invalidation cache in a clustered environment: the acknowledgement responses were either lost or delayed for more than 15 seconds, so the transaction was rolled back. The delay also held up other concurrent transactions, which were rolled back in turn, causing a cascade that brought the whole WildFly cluster down.
We are therefore looking into the various configuration options for Infinispan. One option is decreasing the remote-timeout to a much shorter value. A thread would then block for a much shorter time, resulting in a quicker rollback and therefore less contention in a heavily used application. However, in preliminary testing a 1s timeout is triggered even in a 2-node cluster running on a single laptop, so this doesn't seem like a good way to preserve our cluster's availability. Does anybody know the reason for the 15s remote-timeout default? Was it picked because it is a reasonable value, or as an upper bound? Is lowering the remote-timeout a sensible thing to consider?
Since, in my opinion, the door is closing on lowering the remote-timeout, I looked further and noticed that the timeout only applies when cache invalidation runs in SYNC mode (clustered caches). So my next experiment was to use ASYNC mode, as that seems to offer a possible way to keep our cluster from going the way of the dodo. Since we use versioning on all our entities, and a web application already has to take lazy locking into account, asynchronous invalidation might be our way out when synchronous invalidation causes problems.
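To make the trade-off concrete, here is a minimal plain-JDK sketch of the blocking behaviour I mean. It does not use the Infinispan API at all: the class and method names (InvalidationSketch, sendInvalidation, invalidateSync, invalidateAsync) are made up for illustration, and the "remote node" is just a sleeping task. It only models how long the calling thread is held, not what Infinispan actually does internally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InvalidationSketch {

    // Hypothetical stand-in for a remote node that acknowledges after ackDelayMillis.
    static CompletableFuture<Void> sendInvalidation(long ackDelayMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(ackDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // SYNC-style: the caller blocks until the ack arrives or the timeout fires.
    // Returns true if the ack arrived in time, false if the caller would roll back.
    static boolean invalidateSync(long ackDelayMillis, long remoteTimeoutMillis) {
        try {
            sendInvalidation(ackDelayMillis).get(remoteTimeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // ack too late: this is where the transaction would roll back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    // ASYNC-style: send the invalidation and return without waiting for the ack.
    static CompletableFuture<Void> invalidateAsync(long ackDelayMillis) {
        return sendInvalidation(ackDelayMillis);
    }

    // Helper: wall-clock time the calling thread spends inside the given action.
    static long elapsedMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long slowAck = 2000; // assumed delay of a slow acknowledgement, in ms

        long longTimeout = elapsedMillis(() -> invalidateSync(slowAck, 1500));
        long shortTimeout = elapsedMillis(() -> invalidateSync(slowAck, 200));
        long async = elapsedMillis(() -> invalidateAsync(slowAck));

        System.out.println("sync, long timeout:  caller blocked ~" + longTimeout + " ms");
        System.out.println("sync, short timeout: caller blocked ~" + shortTimeout + " ms");
        System.out.println("async:               caller blocked ~" + async + " ms");
    }
}
```

In this toy model a shorter timeout only shortens how long a thread is held before it gives up, while the async variant never holds the thread but also never learns whether the other node saw the invalidation. That second part is why I think entity versioning matters for us.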
But I am not able to configure WildFly's Hibernate Infinispan caches to use mode="ASYNC", because the Infinispan configuration XSD supplied with WildFly doesn't include the mode attribute for clustered-cache XML elements. The official Infinispan configuration XSD has allowed this for a long time (e.g. the Infinispan 9 XSD allows the mode attribute, as do 10 and 11).
I see that pferraro deprecated the mode=ASYNC option in WildFly 12 in this commit: https://github.com/wildfly/wildfly/commit/d778c2b2dd557cf6c2e6649a6bd4ae568e41abc7#diff-743127fb0ceae5222d9d9105cee80e10. However, the commit doesn't state why it is deprecated. Is running with mode=ASYNC not supported? Not tested? Not advised?