Google brought down my house of cards

For almost three years now we've been running ColdFusion (well, really JRun) session-based replication and failover on all of our hosted ColdFusion environments. Things were great! And then we let Google in the door to index our content. Things went from "great" to "why is this always down??". I spent far too much time looking at JVM settings, JRun settings, etc., but never really came up with a concrete answer as to what was happening. Ultimately, we'd see a deluge of session replication failure errors in the log files, along with a complete breakdown of failover and sometimes even of the JRun -> IIS connector.

Whilst searching for the session replication errors, I came across Sean Corfield's posts on various blogs saying that replication just does not scale well and that it remains a broken feature. Up until a few months ago, I'd have argued this was untrue. However, it appears that replication works... if you can manage to keep the amount of crap you throw into shared memory scopes down to a minimum.

It's been 4 days since I disabled session replication, and my servers have remained standing the whole time. We've been getting indexed by Google (and others), and things are sailing along with nary a blip. Next stop: creating a guide for users who want/need session-based failover that can be accomplished in code. Otherwise known as, "Help me Sean Corfield, you're my only hope!" ;).

Comments
For your readers... Matthew sent me an email and I explained in outline how Macromedia/Adobe deals with this. We never used JRun's session replication (because of this sort of problem - amongst others) so for applications that care about 100% session failover, we write-through certain session to the database and if an application finds itself without expected session data but with an encrypted cookie indicating a user is supposed to have a session, it retrieves the data from the DB.

The benefits include: only applications that care about session failover (a very small minority in real world use) need to implement anything special and only in the case of actual failover do you pay the penalty of restoring in-memory data from the DB.
# Posted By Sean Corfield | 7/4/09 2:14 AM
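[Editor's illustration] The write-through approach described in the comment above might look roughly like the sketch below. This is not Macromedia/Adobe's actual code, and it is Python rather than ColdFusion. Everything named here is hypothetical: the `Node` class, the `sessions` table, the `SECRET` key and the in-memory SQLite database standing in for the shared DB. The comment mentions an encrypted cookie; this sketch uses an HMAC-signed token instead, purely to stay within the standard library.

```python
import hashlib
import hmac
import json
import sqlite3

SECRET = b"demo-only-secret"  # hypothetical key; a real app would load it from config


def make_cookie(user_id):
    """Build a signed token (a stand-in for the encrypted cookie in the comment)."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"


def read_cookie(cookie):
    """Return the user id if the signature checks out, else None."""
    user_id, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    if user_id and hmac.compare_digest(sig.encode(), expected.encode()):
        return user_id
    return None


class Node:
    """One app server: its own in-memory sessions plus a shared database handle."""

    def __init__(self, db):
        self.db = db
        self.memory = {}   # per-node session scope; lost if the node goes down
        self.db_reads = 0  # counts how often we had to restore from the database

    def save(self, user_id, data):
        """Write-through: update memory and the shared database together."""
        self.memory[user_id] = data
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (user_id, data) VALUES (?, ?)",
            (user_id, json.dumps(data)),
        )
        self.db.commit()

    def load(self, cookie):
        """Serve from memory; read the database only when memory has no session."""
        user_id = read_cookie(cookie)
        if user_id is None:
            return None
        if user_id in self.memory:
            return self.memory[user_id]
        self.db_reads += 1
        row = self.db.execute(
            "SELECT data FROM sessions WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.memory[user_id] = json.loads(row[0])
        return self.memory[user_id]


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sessions (user_id TEXT PRIMARY KEY, data TEXT)")
    node_a, node_b = Node(db), Node(db)
    cookie = make_cookie("u42")
    node_a.save("u42", {"cart": [1, 2]})
    node_a.load(cookie)              # normal traffic: served from memory
    restored = node_b.load(cookie)   # failover: node_b has no memory copy
    node_b.load(cookie)              # later requests on node_b hit memory again
    return node_a.db_reads, node_b.db_reads, restored
```

In `demo()`, the node that created the session never reads the database, and the node that takes over reads it exactly once, which is the "only pay the penalty on actual failover" property the comment describes.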
Thank you sir, I appreciate both the return email and the comment on the blog. And you're right, I can't really see any but the most outer edge cases requiring failover. Since the servers are no longer crashing (stable from last Sunday to present - 7 days) the event of a node just going down is rather unlikely. We try to avoid downtime during the week anyway and reserve maintenance for the weekends. I'll still be doing a quick write up of how our devs should go about saving themselves from themselves, and share it here when it's completed.
# Posted By Matthew Williams | 7/6/09 1:06 PM
@Sean - You mention "...we write-through certain session to the database and if an application finds itself without expected session data but with an encrypted cookie indicating a user is supposed to have a session, it retrieves the data from the DB."

In ColdFusion, couldn't this be done using client variables? I'm just wondering if the client scope has some caveat I'm not aware of that would warrant rolling my own session management instead of using database persisted client variables. Thanks.
# Posted By Nathan Mische | 7/6/09 11:51 PM
@Nathan, client variables are always loaded on every request. Our approach only loaded the data if the session actually failed over (which was rare). We were careful to NOT use client variables due to the performance implications of such non-lazy behavior.
# Posted By Sean Corfield | 7/7/09 3:42 AM
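[Editor's illustration] To put a rough number on the lazy versus non-lazy difference described above, here is a toy model in Python. It does not use ColdFusion's client scope or any real API. The function name `count_store_reads` and its parameters are made up for this example, and it only counts how many times a backing store would be read under each strategy.

```python
def count_store_reads(requests, failover_at, eager):
    """Count backing-store reads for a run of requests from one user.

    eager=True models data that is fetched from the store on every request.
    eager=False models reading the store only when the in-memory copy is missing.
    failover_at is the request index at which the in-memory copy is lost
    (None means no failover happens).
    """
    reads = 0
    in_memory = True  # assume the session starts out populated in memory
    for i in range(requests):
        if failover_at is not None and i == failover_at:
            in_memory = False  # node went down; memory copy is gone
        if eager:
            reads += 1
        elif not in_memory:
            reads += 1
            in_memory = True   # restored once, served from memory afterwards
    return reads


eager_reads = count_store_reads(1000, failover_at=500, eager=True)
lazy_reads = count_store_reads(1000, failover_at=500, eager=False)
```

Under these assumptions the eager strategy reads the store once per request, while the lazy strategy reads it once per failover, so the gap grows with traffic rather than with the number of failures.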
Thanks Sean, I forgot about the non-lazy-loading behavior of client variables.
# Posted By Nathan Mische | 7/7/09 12:37 PM