Google brought down my house of cards
For almost three years now we've been running ColdFusion (well, really JRun) session based replication and failover on all of our hosted ColdFusion environs. Things were great! And then, we let google in the door to index our content. Things went from "great", to "why is this always down??". I spent far too much time looking at JVM settings, JRun settings, etc., but never really came up with a concrete answer as to what was happening. Ultimately, we'd see a deluge of errors about the failure of session replication in the log files and a complete breaking of failover and sometimes even the JRun -> IIS connector.
Whilst searching for the session replication errors, I came across Sean Corfields postings across various blogs that replication just does not scale well, and that it remains a broken feature. Up until a few months ago, I'd argue this to be untrue. However, it appears that replication works... if you can manage to keep the amount of crap you throw into shared memory scopes down to a minimum.
It's been 4 days since I've disabled session replication, and my servers have remained standing for an equal amount of time. We've been getting indexed by google (and others), and things are sailing along with nary a blip. Next stop, creating a guide for users that want/need session based failover that can be accomplished in code. Otherwise known as, "Help me Sean Corfield, you're my only hope!" ;).