Making systems more resilient (1) - Circuit Breaker

Things break. Usually in production. In this series of articles we present various mechanisms for making your systems more resilient, so that your users do not experience a real outage despite the problems that may occur under the hood.
24.05.2019.
Circuit Breaker has an interesting name that comes from an analogy with the physical world of electricity. Every decent electrician knows that playing with electricity is dangerous and that she must install safety mechanisms to protect the rest of the electrical network if one device breaks down. What do electricians do? They install fuses. If a device (say, a toaster!) breaks down and produces a short circuit, a strong electric current will flow through the network, resulting in damage, overheating and possibly fire. Unless there is a fuse in the network! If a strong electric current flows, the fuse will melt or otherwise break the electrical circuit, stop the current and save the rest of the network.
Circuit Breaker mechanisms in software are very much like electrical fuses in the real world. Suppose one of your services, Service A, is calling another, Service B.
If Service B stops responding, Service A will patiently wait until a timeout occurs and will then either throw an exception or return questionable results. Neither is acceptable. We need a better solution that prevents the whole system from going down. That’s where Circuit Breakers come into play.
The Circuit Breaker is positioned between the two services, and every service call passes through it. If everything is fine, the Circuit Breaker just passes calls through. What happens when Service B doesn’t respond?
The Circuit Breaker will detect that and return an error or an alternative response. In electrical terms, the Circuit Breaker will “open the circuit”. What’s more, the Circuit Breaker will remember that Service B is down, and whoever calls Service B will immediately get an error without the Circuit Breaker even passing the request to Service B. Periodically, the Circuit Breaker will let one request through to Service B, just to check whether Service B is still down. If it is, the Circuit Breaker will keep returning errors without passing requests to Service B.
If, on the other hand, Service B has recovered and returns a valid result, the Circuit Breaker will detect that, return the result to the caller (Service A) and “close the circuit” again, i.e. resume passing all requests through to Service B.
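The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation promised for Part 2. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `retry_after` are assumptions made for this sketch, as is the decision to treat any exception as a failure of Service B and to open the circuit only after several consecutive failures. The sketch is also single-threaded, so "one trial request" simply means the first request that arrives after the waiting period.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, call, failure_threshold=3, retry_after=30.0,
                 clock=time.monotonic):
        self._call = call                            # function that calls Service B
        self._failure_threshold = failure_threshold  # assumed: failures before opening
        self._retry_after = retry_after              # assumed: seconds before a trial request
        self._clock = clock                          # injectable so the sketch can be tested
        self._failures = 0
        self._opened_at = None                       # None means the circuit is closed

    @property
    def is_open(self):
        return self._opened_at is not None

    def request(self, *args, **kwargs):
        if self.is_open and self._clock() - self._opened_at < self._retry_after:
            # Circuit is open: fail fast, Service B is not contacted at all.
            raise CircuitOpenError("Service B is marked as down")
        # Circuit is closed, or the waiting period is over and this is a trial request.
        try:
            result = self._call(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.is_open or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()      # open (or re-open) the circuit
            raise
        # Success: close the circuit and forget past failures.
        self._failures = 0
        self._opened_at = None
        return result
```

Service A would wrap its call once, for example `breaker = CircuitBreaker(call_service_b)` with a hypothetical `call_service_b` function, and then use `breaker.request(...)` everywhere instead of calling Service B directly.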
When the Circuit Breaker detects that the called service is down, it can do various things. It could return a meaningful error to the caller, but it could also return a “good enough” result that still provides some value to the user. Take the example of a banking application that shows an account balance fetched from a separate service. If that service goes down, the Circuit Breaker could return the last known balance from a cache. Yes, yesterday’s balance might not be valid today, but accompanied by a proper message it’s better than nothing. “Dear customer, we’re experiencing some trouble getting your balance at the moment, but your balance 3 hours ago was €234.55” beats “Your balance is €null” any time. Of course, what the Circuit Breaker may return needs to be carefully thought through from a business perspective.
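The "last known balance" idea could look roughly like the sketch below. It is deliberately self-contained and shows only the fallback part: `fetch_balance` stands for whatever call goes through the Circuit Breaker, and `cache` is a plain dictionary. The function name, the cache layout and the message wording are all assumptions made for illustration, and a real banking system would use a proper money type rather than a bare number.

```python
import time


def balance_with_fallback(fetch_balance, cache, account_id, clock=time.time):
    """Return (balance, message); serve the last known balance if the call fails."""
    try:
        balance = fetch_balance(account_id)
    except Exception:
        if account_id not in cache:
            raise  # nothing to fall back to: surface the error to the caller
        balance, fetched_at = cache[account_id]
        age_hours = (clock() - fetched_at) / 3600
        message = (
            "We're experiencing some trouble getting your balance at the moment, "
            f"but your balance {age_hours:.0f} hours ago was €{balance:.2f}."
        )
        return balance, message
    # Remember the last good value together with the time it was fetched.
    cache[account_id] = (balance, clock())
    return balance, f"Your balance is €{balance:.2f}."
```

The stale value is always returned together with a message stating its age, so the caller cannot mistake it for a fresh balance.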
Circuit Breakers are really cool. In Part 2 of this series we will lift the hood and show a concrete Circuit Breaker implementation.