Exceptions – 4

In my last post on exceptions I covered “Boneheaded Exceptions” and why they should not be caught (and what to do about them instead). Next up is another category that should hardly ever be caught… except in a very specific fashion.

“System Failures” / “Fatal Exceptions” (also: the system is down)
These are exceptions that originate in the implementation of the execution environment. Some can get thrown by specific (types of) IL instructions, such as “TypeLoadException” or “OutOfMemoryException”. Others can get thrown at literally any instruction, such as “ExecutionEngineException”.

The two key observations about these exceptions are that they cannot be prevented (because they originate from the low-level execution of your code itself), and that there are virtually no circumstances where your application code can do anything to resolve the indicated problem (something went wrong that is by definition out of the control of your code). They can happen at any time and there is no way to fix them; it should be obvious why they should not normally be caught.

If, like me, you find yourself trying to construct a scenario where you might want to catch one of these, ask the following questions. If a type fails to load cleanly, indicating a broken deployment, can you trust any further remedial action to even work? If you run out of memory, what kind of logic could you write that does not itself need to allocate memory? Worst of all, if the execution engine failed in some unspecified way, can you even rely upon correct execution of any further instructions?

Even if there are specific corner-cases where anything can be done at all, how much value would it add over just letting the application terminate from its illegal state and constructing some external mechanism to restart it into a valid state instead?

So, what to do?
If the foregone conclusion is that these cannot be handled in any way, then all that is left is ensuring the application dies as gracefully as possible.

First and foremost, use the “try {} finally {}” pattern wherever possible. There may be cases where the “finally” will fail in part or whole due to the nature of the system failure, but it maximizes the chances that files flush the last useful fragments, transactions get cleanly aborted, and shared system resources are restored to a safer state.
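As a minimal sketch of that pattern, assuming a hypothetical “ReportWriter” helper (the class, method, and parameter names are made up for illustration): the “finally” block releases the file without catching anything, so whatever went wrong keeps propagating.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ReportWriter
{
    // Writes the given lines and closes the file even when an
    // exception passes through on its way up the stack.
    public static void WriteReport(string path, string[] lines)
    {
        StreamWriter writer = new StreamWriter(path);
        try
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
        finally
        {
            // Flushes and closes the file. Nothing is caught here,
            // so an in-flight exception continues to propagate.
            writer.Dispose();
        }
    }
}
```

The C# “using” statement expands to this same try/finally shape, so it is usually the more compact way to write it.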

Very few “System Failures” / “Fatal Exceptions” get caught explicitly in a handler. This is precisely because there is nothing specific that can be done to remedy them. There is however a very commonly used handler that deserves scrutiny: the much-reviled “catch (Exception ex) {}”.

Since there are precious few fatal exceptions that can be meaningfully handled in any fashion, it should be obvious that writing a handler purporting to deal with all of them is even more preposterous. That is why the following is the only valid pattern for a general exception handler:

try
{
    // Some code
}
catch (Exception ex)
{
    // ???
    throw;
}

Only by re-throwing the exception at the end of the handler can we guarantee that all the various fatal exceptions keep bubbling up to the top of the application, where termination of the application is the final signal of an unrecoverable problem.

The following two questions need to be answered then:

  • What kind of “some code” could be protected in this structure?
  • What kind of logic can sensibly be placed at “???”?

To start with the latter: when something unspecified goes wrong, the only sensible options are either to record details that are not generally available in a stack trace, or to make general fixes to the state-space that “some code” may have trashed.

Recording additional detail can be done either by logging the values of relevant variables at the time the execution failed, or by wrapping the exception in a custom exception that records those values in its properties (in which case it should hold the original exception as an inner exception!).
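A sketch of the wrapping variant might look as follows. “OrderProcessingException”, “OrderProcessor”, and the “orderId” value are hypothetical names chosen for illustration; the point is only that the local detail travels in a property and the original exception travels as the inner exception.

```csharp
using System;

// Hypothetical exception type that carries local state as a property.
public class OrderProcessingException : Exception
{
    public int OrderId { get; }

    public OrderProcessingException(int orderId, Exception inner)
        : base("Failed while processing order " + orderId, inner)
    {
        OrderId = orderId;
    }
}

// Hypothetical caller, for illustration only.
public static class OrderProcessor
{
    public static void Process(int orderId, Action<int> work)
    {
        try
        {
            work(orderId);
        }
        catch (Exception ex)
        {
            // Record the local detail and keep the original exception
            // reachable through InnerException. The failure still
            // propagates; it is never swallowed.
            throw new OrderProcessingException(orderId, ex);
        }
    }
}
```

Anything higher up that logs the exception chain then sees both the order id and the original stack trace.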

Writing a general fix for corrupted state-space can be difficult. As one extreme, the fatal exception may have occurred in the middle of an allocation inside the “Dictionary.Add()” method, and now you’re stuck with a dictionary in an inconsistent and unrecoverable state. It may however be possible to just replace the dictionary with a new empty dictionary in the catch handler, provided that does not break any invariants that need to hold. In many cases, the “some code” will have made state-space changes that cannot be credibly put back in some correct default state, at which point you should resist the temptation to write any catch handler. If you cannot do anything… then don’t.

Now, it should be obvious what “some code” could be: anything that can benefit from additional information about the local state-space being recorded when a problem occurs, or anything for which affected state-space can be restored to some kind of safe default that does not break any invariants. (An example of the latter might be a manipulation of a cache of some sort that fails; restoring the cache to an empty state does not invalidate its invariants. It may hurt ongoing performance, but it does neatly restore the local state into a valid default.)
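The cache case could be sketched like this, assuming a hypothetical “PriceCache” class whose only invariant is that every entry it holds is valid (so an empty cache is always acceptable). The handler repairs the local state and then re-throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache, for illustration only.
public class PriceCache
{
    private Dictionary<string, decimal> _prices = new Dictionary<string, decimal>();

    public int Count
    {
        get { return _prices.Count; }
    }

    // Rebuilds the cache from scratch using the supplied lookup.
    public void Refresh(IEnumerable<string> keys, Func<string, decimal> lookup)
    {
        try
        {
            _prices.Clear();
            foreach (string key in keys)
            {
                _prices[key] = lookup(key);
            }
        }
        catch (Exception)
        {
            // The cache may be half-filled or otherwise suspect.
            // Replace it with a fresh, empty (and therefore valid) one.
            _prices = new Dictionary<string, decimal>();

            // Re-throw so that fatal exceptions keep bubbling up.
            throw;
        }
    }
}
```

Note that the handler replaces the dictionary rather than calling “Clear()” on it, since the old instance is exactly the object that can no longer be trusted.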

How to fix Fatal Handlers?
Many libraries or applications will have fallen prey to catching and swallowing “Exception” somewhere (including code I have written myself). The logical-sounding rationale usually is something like “If anything goes wrong while doing this, then let me put some default behaviour in that is good enough in its stead”. Default behaviour can range from returning a default value, all the way up to just logging the exception and moving on, hoping for the best.

while (...some high-level loop...)
{
    try
    {
        ...some piece of processing logic...
    }
    catch (Exception ex)
    {
        // Anti-pattern: every exception, fatal or not, is swallowed here
        LogException("Could not process, retry next iteration", ex);
    }
}

On the face of it, it is easy to make yourself believe this improves the robustness of the above processing loop. Now, if anything goes wrong, it will try again some number of times depending on the high-level loop.

But as we’ve seen above, this really just makes a whole range of potential problems worse rather than better. There is no guarantee that the next iteration of the loop will even do the same thing that the failed iteration did. Instead of producing files, the next iteration could be deleting them. Rather than having a simple, understandable error fall out of the application at the point of the original problem, we may end up doing all kinds of unpredictable things that are going to be impossible to diagnose or recover from after the fact.

When you find code that contains general exception handlers, warning bells should be ringing. There is a reason there is an FxCop rule that triggers on this coding pattern. It is an evil pattern that must be exorcised.

The only valid fixes for “Exception” handlers are as follows:

  • Re-throw the original exception at the end of the handler (see “what to do?” above)
  • Throw a new exception that includes further details about the problem, and which must include the original exception as an inner-exception (see “what to do?” above)
  • Make the exception type more specific so that a problem that can be credibly recovered from is caught instead (and make sure the handling logic actually addresses that problem!)
  • Remove the handler altogether, and just let the exception mechanism do its thing
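
As an illustration of the third fix applied to the processing loop shown earlier, here is a sketch that assumes, purely for the sake of example, that an “IOException” on a single item is something the loop can credibly skip. “BatchRunner” and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch loop, for illustration only.
public static class BatchRunner
{
    // Processes every item and returns how many were skipped
    // because of an I/O failure.
    public static int ProcessAll(
        IEnumerable<string> items,
        Action<string> process,
        Action<string> log)
    {
        int skipped = 0;
        foreach (string item in items)
        {
            try
            {
                process(item);
            }
            catch (IOException ex)
            {
                // Only the failure we know how to handle is caught.
                // Every other exception type leaves the loop untouched.
                log("Could not process '" + item + "': " + ex.Message);
                skipped++;
            }
        }
        return skipped;
    }
}
```

Whether skipping the item is actually a sound recovery depends on the application; the narrower catch only ensures that fatal exceptions are no longer swallowed along with it.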

Some of these remedies edge into the territory of “Logical Errors” / “Exogenous and Vexing Exceptions” and my next post will dig much deeper into how to deal with those. That’ll be where the rubber meets the road on what many would consider actual exception handling, and what kind of exceptions you can declare and throw yourself (and how to do so).