Circuit Breaker Pattern with Polly
A circuit breaker is a state machine that stops sending requests to a failing service, protecting both the service and your system from cascading failures. It transitions through three states—Closed (pass requests), Open (reject immediately), and Half-Open (probe for recovery)—based on failure thresholds. When deployed correctly, circuit breakers can reduce failed request volume by 60-80% and give downstream services time to recover.
Circuit Breaker States Explained
The circuit breaker monitors failures and transitions between states:
Closed: Normal operation. Requests pass through to the target service. If failures stay below a threshold, the circuit remains closed.
Open: The failure threshold is exceeded. New requests fail immediately without calling the target service. This prevents hammering a degraded service and gives it time to recover.
Half-Open: After the break duration expires, the circuit lets a single trial request (or a small number) through to test the service. If it succeeds, the circuit closes; if it fails, the circuit opens again for another break duration.
Closed ──(failures exceed threshold)──> Open
↑                                        │
└──(test request succeeds)─ Half-Open────┘
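To make the transitions concrete, here is a minimal hand-rolled sketch of the state machine. It is illustrative only and is not how Polly is implemented: the names `SimpleBreaker`, `BreakerState`, `AllowRequest`, `RecordSuccess`, and `RecordFailure` are hypothetical, the current time is passed in explicitly so the behaviour is deterministic, and the class is neither thread-safe nor limited to a single probe while half-open.

```csharp
using System;

// Illustrative only: mirrors the three states described above.
public enum BreakerState { Closed, Open, HalfOpen }

public sealed class SimpleBreaker
{
    private readonly int _failureThreshold;    // consecutive failures before opening
    private readonly TimeSpan _breakDuration;  // how long the circuit stays open
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public SimpleBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    // Returns true if a call may proceed at time `now`.
    public bool AllowRequest(DateTime now)
    {
        // Once the break duration has elapsed, move to Half-Open and permit a probe.
        if (State == BreakerState.Open && now - _openedAt >= _breakDuration)
            State = BreakerState.HalfOpen;
        return State != BreakerState.Open;
    }

    // A success (including a successful probe) closes the circuit.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = BreakerState.Closed;
    }

    // A failed probe reopens immediately; otherwise open once the threshold is hit.
    public void RecordFailure(DateTime now)
    {
        _consecutiveFailures++;
        if (State == BreakerState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = BreakerState.Open;
            _openedAt = now;
        }
    }
}
```

A caller would check `AllowRequest` before each call, then report the outcome with `RecordSuccess` or `RecordFailure`. In practice a library such as Polly handles this bookkeeping, along with locking and callbacks.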
Implementing a Basic Circuit Breaker
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker; // BrokenCircuitException

// Treat both thrown HttpRequestExceptions and non-success responses as failures.
var circuitBreaker = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .CircuitBreakerAsync<HttpResponseMessage>(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        // These callbacks are synchronous delegates, so plain lambdas are used
        // instead of async ones.
        onBreak: (outcome, duration) =>
        {
            // Exception is null when the failure was a non-success response.
            var reason = outcome.Exception?.Message
                ?? $"HTTP {(int)outcome.Result.StatusCode}";
            Console.WriteLine($"Circuit opened for {duration.TotalSeconds}s. Error: {reason}");
        },
        onReset: () => Console.WriteLine("Circuit closed. Service recovered."),
        onHalfOpen: () => Console.WriteLine("Circuit half-open. Testing service..."));

using var client = new HttpClient();

for (int i = 0; i < 10; i++)
{
    try
    {
        var response = await circuitBreaker.ExecuteAsync(
            () => client.GetAsync("https://api.example.com/data"));

        // A handled failure response is still returned to the caller, so check it here.
        Console.WriteLine(response.IsSuccessStatusCode
            ? $"Request {i + 1}: Success"
            : $"Request {i + 1}: Failed with HTTP {(int)response.StatusCode}");
    }
    catch (BrokenCircuitException)
    {
        Console.WriteLine($"Request {i + 1}: Circuit is open. Failing fast.");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Request {i + 1}: Error - {ex.Message}");
    }

    await Task.Delay(TimeSpan.FromSeconds(2));
}
Configuration breakdown:
- handledEventsAllowedBeforeBreaking: 5: Open the circuit after 5 consecutive handled failures. A success resets the count.
- durationOfBreak: TimeSpan.FromSeconds(30): Keep the circuit open for 30 seconds.
- onBreak: Callback when the circuit opens (log, alert, notify).
- onReset: Callback when the circuit closes after recovery.
- onHalfOpen: Callback when the circuit starts testing whether the service has recovered.
Real-world scenario: A downstream service's database connection pool is exhausted, and 5 consecutive calls fail. The circuit opens and rejects new requests immediately. After 30 seconds, the circuit enters the half-open state and probes with a single request. If it succeeds, the circuit closes and normal traffic resumes.
Advanced Configuration: Failure Threshold as Percentage
Instead of a fixed failure count, you can use a percentage threshold. In Polly this is a separate policy, AdvancedCircuitBreakerAsync, which measures the failure rate over a sampling window. The example below opens the circuit if 50% or more of requests fail:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

var circuitBreaker = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .AdvancedCircuitBreakerAsync<HttpResponseMessage>(
        failureThreshold: 0.5,                      // 50% failure rate
        samplingDuration: TimeSpan.FromSeconds(10), // measured over a 10-second window
        minimumThroughput: 5,                       // require at least 5 requests to measure
        durationOfBreak: TimeSpan.FromSeconds(60),
        onBreak: (outcome, duration) =>
            Console.WriteLine($"Circuit opened: failure rate reached 50%. Waiting {duration.TotalSeconds}s."),
        // The overloads that accept onBreak also require onReset.
        onReset: () => Console.WriteLine("Circuit closed. Service recovered."));

// Usage
using var client = new HttpClient();
var response = await circuitBreaker.ExecuteAsync(
    () => client.GetAsync("https://api.example.com/data"));
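Why use a percentage? The count-based breaker tracks consecutive failures, so a single success resets the count. Under steady traffic where failures and successes interleave, it may never open even though the service is clearly degraded. A percentage threshold measured over a sampling window catches that case, and minimumThroughput keeps a handful of failed requests during a quiet period from opening the circuit.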
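The opening rule can be modelled with a few lines of arithmetic. The sketch below is a simplified model, not Polly's internal code: `FailureRateRule` and `ShouldBreak` are hypothetical names, the counts are assumed to cover one sampling window, and the threshold is treated as inclusive (50% "or more"), which is an assumption worth checking against the Polly documentation for your version.

```csharp
using System;

public static class FailureRateRule
{
    // Returns true when the window has enough samples and the failure
    // ratio is at or above the threshold (inclusive boundary assumed).
    public static bool ShouldBreak(
        int failures, int total, double failureThreshold, int minimumThroughput)
    {
        // Too few requests in the window to judge the service's health.
        if (total < minimumThroughput) return false;

        return (double)failures / total >= failureThreshold;
    }
}
```

With the settings used above (threshold 0.5, minimum throughput 5), 2 failures out of 3 requests do not open the circuit because the window is too small, 3 out of 6 do, and 2 out of 10 do not.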
Testing Circuit Breaker State Transitions
Unit tests verify the circuit breaker transitions correctly:
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker; // BrokenCircuitException, CircuitState
using Xunit;

public class CircuitBreakerTests
{
    [Fact]
    public async Task CircuitOpensAfterThresholdExceeded()
    {
        // A long break keeps the circuit open while the assertions run.
        var policy = Policy
            .Handle<Exception>()
            .CircuitBreakerAsync(
                handledEventsAllowedBeforeBreaking: 3,
                durationOfBreak: TimeSpan.FromSeconds(30));

        // Trigger 3 consecutive failures
        for (int i = 0; i < 3; i++)
        {
            await Assert.ThrowsAsync<Exception>(
                () => policy.ExecuteAsync(() =>
                    throw new Exception("Simulated failure")));
        }

        Assert.Equal(CircuitState.Open, policy.CircuitState);

        // Circuit is open; the delegate is not invoked and the call fails fast
        await Assert.ThrowsAsync<BrokenCircuitException>(
            () => policy.ExecuteAsync(() =>
                throw new Exception("This should not execute")));
    }

    [Fact]
    public async Task CircuitRecoversAfterBreakDuration()
    {
        var policy = Policy
            .Handle<Exception>()
            .CircuitBreakerAsync(
                handledEventsAllowedBeforeBreaking: 1,
                durationOfBreak: TimeSpan.FromMilliseconds(100));

        // Trigger one failure to open the circuit
        await Assert.ThrowsAsync<Exception>(
            () => policy.ExecuteAsync(() => throw new Exception("Fail")));

        // Wait for the break duration to expire
        await Task.Delay(TimeSpan.FromMilliseconds(150));

        // Circuit should enter half-open and allow the trial request
        var result = await policy.ExecuteAsync(() =>
            Task.FromResult("Success"));

        Assert.Equal("Success", result);
        // A successful trial request closes the circuit
        Assert.Equal(CircuitState.Closed, policy.CircuitState);
    }
}
Combining Circuit Breaker with Retry
A real microservice combines circuit breaker with retry: retry for transient glitches, but circuit breaker to protect against sustained outages:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Retry with exponential backoff for transient failures.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt =>
            TimeSpan.FromMilliseconds(Math.Pow(2, attempt) * 100));

// Open the circuit after 5 consecutive failures.
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30));

// Wrap: retry is the outer policy, circuit breaker the inner one,
// so every retry attempt passes through the breaker.
var combinedPolicy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);

// Note: both policies handle only HttpRequestException here; a non-success
// status code is returned to the caller without triggering either policy.
using var client = new HttpClient();
var response = await combinedPolicy.ExecuteAsync(
    () => client.GetAsync("https://api.example.com/data"));
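Execution order: Request → Retry (outer policy; up to 3 retries, so 4 attempts in total) → Circuit Breaker (checks state and rejects with BrokenCircuitException if open) → call target. Every attempt passes through the breaker, so each failed retry counts toward its threshold. Because the retry policy handles only HttpRequestException, a BrokenCircuitException is not retried and propagates to the caller.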
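To see how long the retry policy can hold a request before giving up, it helps to work out the backoff schedule. The sketch below reuses the same formula as the sleepDurationProvider (2^attempt × 100 ms, with the attempt number starting at 1); `BackoffSchedule`, `DelayFor`, and `Delays` are hypothetical helper names introduced only for this illustration.

```csharp
using System;
using System.Linq;

public static class BackoffSchedule
{
    // Same formula as the sleepDurationProvider: 2^attempt * 100 ms.
    public static TimeSpan DelayFor(int attempt) =>
        TimeSpan.FromMilliseconds(Math.Pow(2, attempt) * 100);

    // Delays before retry 1..retryCount (retry attempts are 1-based).
    public static TimeSpan[] Delays(int retryCount) =>
        Enumerable.Range(1, retryCount).Select(DelayFor).ToArray();
}
```

For retryCount: 3 this gives waits of 200 ms, 400 ms, and 800 ms, so a request that fails every attempt spends 1.4 seconds waiting between attempts, on top of the time taken by the four calls themselves.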
Key Takeaways
- Circuit breaker prevents cascading failures by stopping requests to a failing service.
- Three states (Closed, Open, Half-Open) manage the transition from normal operation to recovery.
- Threshold options include fixed failure count or percentage-based rate; choose based on your traffic volume.
- Break duration should allow time for the target service to recover (30 seconds to 5 minutes typical).
- Combine circuit breaker with retry where calls are safe to repeat: retry handles transients, circuit breaker handles sustained outages.
Frequently Asked Questions
What is the difference between circuit breaker open and closed?
Closed means the circuit is operational; requests pass through and failures are tracked. Open means the failure threshold is exceeded; new requests fail immediately with BrokenCircuitException without calling the target. This protects both the service and your threads.
How do I set the break duration?
Start with 30-60 seconds for most services. If the service takes longer to recover (e.g., database full restart), use 2-5 minutes. Monitor recovery time in production and adjust. Too short, and the circuit reopens before recovery completes. Too long, and users see unnecessary downtime.
Can I use circuit breaker for all operations?
Circuit breaker is most valuable for calls to external services (APIs, databases, message queues) where failures can cascade. Don't use it for in-process operations or where every millisecond counts; the state tracking adds a small overhead to each call.
What happens to requests in flight when the circuit opens?
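In-flight requests complete or fail normally. Only new requests are rejected immediately with BrokenCircuitException. Because those in-flight calls still reach the service after the circuit opens, the service may stay under load for a short while, which is one more reason to make the break duration long enough for it to actually recover.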