Resilience Patterns in .NET: Circuit Breaker

Resilience patterns protect microservices from cascading failures. When Service B (Inventory) is slow or crashes, Service A (Order) should not pass that failure on to its own callers. Instead, Order Service should bound each call with a timeout, retry transient failures with backoff, use a circuit breaker to stop hammering a broken service, and degrade gracefully (return cached data, a default response, or a clear error). The Polly library for .NET provides production-grade implementations of these patterns.

A circuit breaker monitors calls to a dependency. When failures reach a threshold, the circuit "opens". While it is open, calls are rejected immediately (fast-fail) and never reach the dependency. After a configured break duration, the circuit "half-opens" and lets a trial request through (Polly allows one). If the trial succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again. This keeps a broken service from being pounded with requests while it tries to recover.
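The sketch below makes the three states concrete. It is a minimal, hand-rolled state machine that uses only the base class library. It is an illustration, not Polly's implementation:

- It is single-threaded.
- It counts consecutive failures, as Polly's basic breaker does.
- It lets exactly one trial call through when half-open.

The class and member names are hypothetical. The clock is injected so the transitions can be tested.

```csharp
using System;

// The three states described above.
public enum BreakerState { Closed, Open, HalfOpen }

// Minimal consecutive-failure circuit breaker. Illustrative only:
// not thread-safe and not Polly's implementation.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _now; // injected clock, so transitions can be tested
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration, Func<DateTime> now)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _now = now;
    }

    public BreakerState State { get; private set; } = BreakerState.Closed;

    // Returns false when the caller must fail fast without contacting the dependency.
    public bool AllowRequest()
    {
        if (State == BreakerState.Open && _now() - _openedAt >= _breakDuration)
        {
            State = BreakerState.HalfOpen; // break elapsed: this caller becomes the single trial
            return true;
        }
        return State == BreakerState.Closed; // Open, or HalfOpen with a trial already in flight
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = BreakerState.Closed; // a successful trial (or normal call) closes the circuit
    }

    public void RecordFailure()
    {
        _consecutiveFailures++;
        // A failed trial re-opens immediately; otherwise open once the threshold is reached.
        if (State == BreakerState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = BreakerState.Open;
            _openedAt = _now();
        }
    }
}
```

To use it, a caller does three things:

1. Check `AllowRequest()` before each call.
2. Fail fast when it returns false.
3. Report the outcome with `RecordSuccess()` or `RecordFailure()`.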

Implementing Circuit Breaker with Polly

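Install Polly. For the HttpClientFactory integration shown later, also install Microsoft.Extensions.Http.Polly. The circuit breaker is part of the core Polly package; there is no separate circuit breaker package. The examples use Polly's Policy-based API (the v7 style, which the Polly v8 package still includes).

```shell
dotnet add package Polly
dotnet add package Microsoft.Extensions.Http.Polly
```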

Define the timeout, retry, and circuit breaker policies and combine them:

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Timeout;

public static class ResiliencePolicies
{
    public static IAsyncPolicy<HttpResponseMessage> GetInventoryPolicy()
    {
        var circuitBreakerPolicy = Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutRejectedException>()
            .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
            .CircuitBreakerAsync(
                handledEventsAllowedBeforeBreaking: 3,     // Open after 3 consecutive failures
                durationOfBreak: TimeSpan.FromSeconds(30), // Stay open for 30s
                onBreak: (outcome, breakDelay) =>
                {
                    // DelegateResult holds either the exception or the HTTP response
                    var reason = outcome.Exception?.Message ?? $"HTTP {(int)outcome.Result.StatusCode}";
                    Console.WriteLine($"Circuit breaker opened for {breakDelay.TotalSeconds}s. Reason: {reason}");
                },
                onReset: () =>
                {
                    Console.WriteLine("Circuit breaker closed; resuming normal operation");
                });

        var retryPolicy = Policy
            .Handle<HttpRequestException>()
            .Or<TimeoutRejectedException>()
            .OrResult<HttpResponseMessage>(r =>
                r.StatusCode == HttpStatusCode.RequestTimeout || (int)r.StatusCode >= 500)
            .WaitAndRetryAsync(
                retryCount: 3,
                // Exponential backoff: 1s, 2s, 4s
                sleepDurationProvider: retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt - 1)),
                onRetry: (outcome, delay, retryAttempt, context) =>
                {
                    Console.WriteLine($"Retry {retryAttempt} after {delay.TotalSeconds}s");
                });

        // Optimistic timeout: Polly cancels the CancellationToken it passes to the executed delegate
        var timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(
            TimeSpan.FromSeconds(5), TimeoutStrategy.Optimistic);

        // Outermost to innermost: retry -> circuit breaker -> per-attempt timeout
        return Policy.WrapAsync(retryPolicy, circuitBreakerPolicy, timeoutPolicy);
    }
}
```

Use the policy in an HTTP client:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

public class InventoryServiceClient
{
    // Shared by every instance: typed clients are transient, and a per-instance
    // circuit breaker would never see enough failures to open.
    private static readonly IAsyncPolicy<HttpResponseMessage> ResiliencePolicy =
        ResiliencePolicies.GetInventoryPolicy();

    private readonly HttpClient _httpClient;
    private readonly ILogger<InventoryServiceClient> _logger;

    public InventoryServiceClient(HttpClient httpClient, ILogger<InventoryServiceClient> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    public async Task<AvailabilityResult> CheckAvailabilityAsync(List<CartItem> items)
    {
        try
        {
            // Pass Polly's CancellationToken through so the optimistic timeout can cancel the call
            var response = await ResiliencePolicy.ExecuteAsync(async ct =>
            {
                _logger.LogInformation("Calling Inventory Service to check availability");

                return await _httpClient.PostAsJsonAsync(
                    "https://inventory-service/api/check-availability",
                    new { items },
                    ct);
            }, CancellationToken.None);

            if (!response.IsSuccessStatusCode)
            {
                _logger.LogWarning("Inventory Service returned {StatusCode}", response.StatusCode);
                return new AvailabilityResult { AllAvailable = false, Reason = "Inventory Service error" };
            }

            var result = await response.Content.ReadFromJsonAsync<AvailabilityResult>();
            _logger.LogInformation("Availability check succeeded");
            return result;
        }
        catch (BrokenCircuitException ex)
        {
            _logger.LogError(ex, "Circuit breaker is open; Inventory Service is unavailable");
            return new AvailabilityResult { AllAvailable = false, Reason = "Service temporarily unavailable" };
        }
        catch (TimeoutRejectedException ex)
        {
            _logger.LogError(ex, "Inventory Service request timed out");
            return new AvailabilityResult { AllAvailable = false, Reason = "Service request timeout" };
        }
        catch (HttpRequestException ex)
        {
            // Network failures that persisted through every retry
            _logger.LogError(ex, "Inventory Service could not be reached");
            return new AvailabilityResult { AllAvailable = false, Reason = "Service unreachable" };
        }
    }
}
```

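When called, the combined policy behaves as follows:

  1. Timeout: If an attempt takes longer than 5 seconds, it is cancelled and fails with TimeoutRejectedException. Because the strategy is optimistic, the HTTP call must accept the CancellationToken that Polly passes in.
  2. Retry: On a transient failure (network error, timeout, or retryable status code), retry up to 3 times with exponential backoff (1s, 2s, 4s).
  3. Circuit breaker: After 3 consecutive failures, the circuit opens. For the next 30 seconds, calls fail immediately with BrokenCircuitException and never reach Inventory Service.
  4. Half-open: After 30 seconds, one trial request is allowed. If it succeeds, the circuit closes. If it fails, the circuit opens for another 30 seconds.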
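The retry count, backoff, and per-attempt timeout above also bound how long a caller can wait. The sketch below works through that arithmetic; the helper names are hypothetical. With 3 retries there are 4 attempts, so the worst case is 4 × 5s of timeouts plus 1s + 2s + 4s of backoff, which is 27 seconds. This is an upper bound. If the circuit opens partway through, the retry policy does not handle BrokenCircuitException, so the call ends at that point.

```csharp
using System;
using System.Linq;

public static class RetryBudget
{
    // Delay before retry n (1-based) with exponential backoff starting at 1s: 1s, 2s, 4s, ...
    public static TimeSpan BackoffDelay(int retryAttempt) =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt - 1));

    // Upper bound on wall-clock time for one logical call: every attempt runs until
    // its timeout, plus every backoff delay between attempts.
    public static TimeSpan WorstCase(int retryCount, TimeSpan perAttemptTimeout)
    {
        int attempts = retryCount + 1; // the first try plus each retry
        TimeSpan waits = Enumerable.Range(1, retryCount)
            .Select(BackoffDelay)
            .Aggregate(TimeSpan.Zero, (sum, delay) => sum + delay);
        return perAttemptTimeout * attempts + waits;
    }
}
```

Compare this budget with the timeout of whoever calls Order Service. If the upstream timeout is shorter, the later retries do no useful work.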

Bulkhead Isolation

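Isolate dependencies so that one slow service cannot tie up all of Order Service's capacity. Polly's bulkhead does not create a thread pool. Instead, it:

- caps how many calls to a dependency can run at once,
- caps how many more can wait in a queue, and
- rejects the rest immediately with BulkheadRejectedException.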

```csharp
using System;
using System.Net.Http;
using Polly;

public class ResilienceConfig
{
    // Build once and reuse: the bulkhead's slot counts and the breaker's state live in these instances
    public static IAsyncPolicy<HttpResponseMessage> GetInventoryPolicyWithBulkhead()
    {
        var bulkheadPolicy = Policy.BulkheadAsync<HttpResponseMessage>(
            maxParallelization: 10, // Max 10 concurrent calls to Inventory
            maxQueuingActions: 5    // Queue up to 5 additional calls; reject beyond that
        );

        var circuitBreakerPolicy = Policy
            .Handle<HttpRequestException>()
            .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
            .CircuitBreakerAsync(3, TimeSpan.FromSeconds(30));

        // Breaker outside the bulkhead: while the circuit is open, calls fail fast without taking a slot
        return Policy.WrapAsync(circuitBreakerPolicy, bulkheadPolicy);
    }
}
```

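With the bulkhead, at most 10 calls to Inventory Service are in flight at once and at most 5 more wait in the queue. Further calls are rejected immediately instead of piling up. Because the calls are asynchronous, a slow Inventory Service does not block threads while it is awaited. What the bulkhead protects is everything else an unbounded pile-up would consume: outbound connections, memory, and the request-processing capacity needed for other operations. As with the circuit breaker, the policy instance must be shared across calls for the limits to apply.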
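The admission logic can be sketched with two SemaphoreSlim instances from the base class library:

- one limits how many calls execute at once;
- the other limits how many are executing or queued in total.

This is a hand-rolled illustration of the idea, not Polly's code, and the class name is hypothetical. Where Polly throws BulkheadRejectedException, this sketch throws InvalidOperationException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative bulkhead: at most maxParallelization operations run at once, at most
// maxQueuing more wait for a slot, and anything beyond that is rejected immediately.
public sealed class SimpleBulkhead
{
    private readonly SemaphoreSlim _executionSlots; // running calls
    private readonly SemaphoreSlim _admissionSlots; // running + queued calls

    public SimpleBulkhead(int maxParallelization, int maxQueuing)
    {
        _executionSlots = new SemaphoreSlim(maxParallelization, maxParallelization);
        int admission = maxParallelization + maxQueuing;
        _admissionSlots = new SemaphoreSlim(admission, admission);
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        // Non-blocking check: no room left to run or queue, so fail fast.
        if (!_admissionSlots.Wait(0))
            throw new InvalidOperationException("Bulkhead full: call rejected");
        try
        {
            await _executionSlots.WaitAsync(); // queued callers wait here for a running slot
            try
            {
                return await action();
            }
            finally
            {
                _executionSlots.Release();
            }
        }
        finally
        {
            _admissionSlots.Release();
        }
    }
}
```

As with the Polly policy, all callers of Inventory Service must share a single `SimpleBulkhead` instance. A new instance per call would never reject anything.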

Fallback Strategy

When a service fails, provide a fallback response:

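```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using Polly;

public class InventoryServiceClient
{
    private readonly HttpClient _httpClient;
    private readonly IMemoryCache _cache;

    public InventoryServiceClient(HttpClient httpClient, IMemoryCache cache)
    {
        _httpClient = httpClient;
        _cache = cache;
    }

    public async Task<AvailabilityResult> CheckAvailabilityAsync(List<CartItem> items)
    {
        var cacheKey = $"availability:{string.Join(",", items.Select(i => i.ProductId))}";

        var policy = Policy<AvailabilityResult>
            .Handle<HttpRequestException>()
            .FallbackAsync(_ => // the fallback receives a CancellationToken, unused here
            {
                // Fallback 1: return the last cached result for these items
                if (_cache.TryGetValue(cacheKey, out AvailabilityResult cached))
                {
                    Console.WriteLine("Using cached availability (fallback)");
                    return Task.FromResult(cached);
                }

                // Fallback 2: assume items are available (optimistic)
                Console.WriteLine("Assuming all items are available (fallback)");
                return Task.FromResult(new AvailabilityResult { AllAvailable = true });
            });

        return await policy.ExecuteAsync(async () =>
        {
            var response = await _httpClient.PostAsJsonAsync(
                "https://inventory-service/api/check-availability",
                new { items });

            // Throws HttpRequestException on a non-success status, which triggers the fallback
            response.EnsureSuccessStatusCode();

            var result = await response.Content.ReadFromJsonAsync<AvailabilityResult>();

            // Populate the cache so fallback 1 has data later (the 5-minute lifetime is illustrative)
            _cache.Set(cacheKey, result, TimeSpan.FromMinutes(5));
            return result;
        });
    }
}
```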

If Inventory Service fails, Order Service returns cached data or assumes items are available (graceful degradation). The user can place the order; if items are not actually available, Inventory Service later sends a cancellation event.

Registering Polly in DI

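```csharp
// Program.cs (requires the Microsoft.Extensions.Http.Polly package)
using Polly;

var builder = WebApplication.CreateBuilder(args);

// Handlers run in registration order: the retry wraps the circuit breaker
builder.Services.AddHttpClient<InventoryServiceClient>()
    .AddTransientHttpErrorPolicy(p =>
        p.WaitAndRetryAsync(3, _ => TimeSpan.FromSeconds(2)))
    .AddTransientHttpErrorPolicy(p =>
        p.CircuitBreakerAsync(3, TimeSpan.FromSeconds(30)));

// Do not also call AddScoped<InventoryServiceClient>(): that later registration overrides
// the typed client, so the client would bypass the handlers configured above.
var app = builder.Build();
```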

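Microsoft.Extensions.Http.Polly attaches policies to the IHttpClientBuilder as outgoing-request handlers. AddTransientHttpErrorPolicy handles HttpRequestException, HTTP 5xx, and HTTP 408 responses. Each policy is created once and shared by every client instance, so circuit breaker state persists across requests. Use either these handler policies or a policy executed inside the client (as in the earlier InventoryServiceClient), not both. Otherwise every call passes through two layers of retries.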

Monitoring Resilience

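Log circuit breaker state changes and record them as metrics. The example below uses .NET's built-in System.Diagnostics.Metrics API; the meter and counter names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Net.Http;
using Microsoft.Extensions.Logging;
using Polly;

public static class MonitoredPolicies
{
    private static readonly Meter ResilienceMeter = new("OrderService.Resilience");
    private static readonly Counter<long> Opened = ResilienceMeter.CreateCounter<long>("circuit_breaker_opened");
    private static readonly Counter<long> Closed = ResilienceMeter.CreateCounter<long>("circuit_breaker_closed");
    private static readonly KeyValuePair<string, object?> ServiceTag = new("service", "inventory");

    // Call once and share the returned policy; Policy<T>.Handle gives a result-typed builder
    public static IAsyncPolicy<HttpResponseMessage> CreateInventoryBreaker(ILogger logger) =>
        Policy<HttpResponseMessage>
            .Handle<HttpRequestException>()
            .CircuitBreakerAsync(
                handledEventsAllowedBeforeBreaking: 3,
                durationOfBreak: TimeSpan.FromSeconds(30),
                onBreak: (outcome, breakDelay) =>
                {
                    logger.LogCritical(outcome.Exception, "Circuit breaker opened for {ServiceName} for {BreakDelay}",
                        "InventoryService", breakDelay);
                    Opened.Add(1, ServiceTag);
                },
                onReset: () =>
                {
                    logger.LogInformation("Circuit breaker closed for {ServiceName}", "InventoryService");
                    Closed.Add(1, ServiceTag);
                });
}
```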

Use Application Insights, Prometheus, or Datadog to track circuit breaker state, retry counts, and timeout rates. Alert when a circuit breaker opens: an open circuit means a dependency is failing.
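To track the breaker's current state rather than only its transitions, expose the state as an observable gauge, which an exporter reads on each collection. The sketch below uses only System.Diagnostics.Metrics. The following are assumptions:

- the gauge name;
- the 0/1/2 state encoding;
- the `readState` hook. One option is to map a Polly breaker's CircuitState property to these codes.

ReadStateOnce collects the gauge through a MeterListener, much as an exporter would, which is also handy in tests.

```csharp
using System;
using System.Diagnostics.Metrics;

// Publishes the current circuit state as a gauge: 0 = closed, 1 = open, 2 = half-open.
// The encoding and the instrument name are illustrative choices, not a standard.
public static class CircuitStateMetrics
{
    public const string GaugeName = "circuit_breaker_state";

    // readState is a hypothetical hook into whatever tracks the breaker's state.
    public static ObservableGauge<int> RegisterStateGauge(Meter meter, Func<int> readState) =>
        meter.CreateObservableGauge<int>(GaugeName, readState, description: "0=closed, 1=open, 2=half-open");

    // Collects the gauge once through a MeterListener, roughly what an exporter does on each scrape.
    public static int ReadStateOnce(Meter meter)
    {
        int observed = -1;
        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter == meter && instrument.Name == GaugeName)
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) => observed = value);
        listener.Start();
        listener.RecordObservableInstruments(); // invokes readState for the enabled gauge
        return observed;
    }
}
```

With a gauge, an alert rule can fire on "state is 1 for more than N minutes" instead of on individual open events.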

Key Takeaways

  • Circuit breakers prevent cascading failures by fast-failing when a dependency is broken.
  • Combine timeout, retry, and circuit breaker policies for comprehensive resilience.
  • Use bulkhead isolation to prevent one slow dependency from blocking all requests.
  • Implement graceful fallbacks: use cached data, return defaults, or degrade functionality.
  • Monitor circuit breaker state and alert on failures.

Frequently Asked Questions

How do I choose the circuit breaker threshold?

Start with 3–5 failures before breaking. If the service is critical and fast, use a lower threshold (2–3). If it is tolerant of brief outages, use a higher threshold (5–10). Monitor in production and adjust based on false positives (circuits opening when the service is actually healthy).

Should I retry all errors?

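No. Retry only transient errors: timeouts, network errors, and server-side 5xx responses such as 503 Service Unavailable. Do not retry permanent errors (400 Bad Request, 401 Unauthorized, 404 Not Found). Specify which exceptions and status codes trigger retries with the PolicyBuilder methods Handle<TException>(), Or<TException>(), and OrResult(...). Alternatively, start from HttpPolicyExtensions.HandleTransientHttpError(), which comes from the Polly.Extensions.Http package that Microsoft.Extensions.Http.Polly references.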
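The sketch below is a minimal rule for HTTP status codes. It mirrors the status-code part of HandleTransientHttpError, which treats 5xx and 408 as transient. The TransientErrors class is hypothetical. Whether a 500 should count as transient is a judgment call for your API.

```csharp
using System.Net;

public static class TransientErrors
{
    // Retry 408 Request Timeout and any 5xx; treat other 4xx (400, 401, 404, ...) as permanent.
    public static bool IsTransient(HttpStatusCode statusCode) =>
        statusCode == HttpStatusCode.RequestTimeout || (int)statusCode >= 500;
}
```

In a Polly policy it plugs in as `.OrResult<HttpResponseMessage>(r => TransientErrors.IsTransient(r.StatusCode))`.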

What is the optimal retry count and backoff strategy?

Retry 2–3 times. Use exponential backoff (for example 1s, 2s, 4s) to give the service time to recover. Add jitter (a random extra delay) to spread retries out. Without it, clients that failed at the same moment all retry at the same moment (the thundering herd problem).
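The sketch below shows exponential backoff with jitter. It assumes one simple scheme: a random extra delay of up to maxJitter is added to each exponential step. "Full" and "decorrelated" jitter are common alternatives. The helper name is hypothetical.

```csharp
using System;

public static class JitteredBackoff
{
    // Exponential step (1s, 2s, 4s, ...) plus a uniform random extra delay in [0, maxJitter).
    public static TimeSpan Delay(int retryAttempt, TimeSpan maxJitter, Random random)
    {
        TimeSpan exponential = TimeSpan.FromSeconds(Math.Pow(2, retryAttempt - 1));
        // NextDouble() is in [0, 1), so the jitter never reaches maxJitter
        TimeSpan jitter = TimeSpan.FromTicks((long)(random.NextDouble() * maxJitter.Ticks));
        return exponential + jitter;
    }
}
```

The signature fits Polly's sleepDurationProvider. For example, `attempt => JitteredBackoff.Delay(attempt, TimeSpan.FromMilliseconds(500), Random.Shared)` works; Random.Shared is thread-safe and requires .NET 6 or later.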

How long should the circuit breaker stay open?

30 seconds to 1 minute is typical. For critical services, use shorter intervals (10–30s). For less critical services, longer intervals (1–2 minutes). Monitor how long the service takes to recover in production and adjust.

Can I use circuit breakers for database calls?

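Yes. Wrap database operations in a policy. However, many database errors are not transient (wrong credentials, constraint violations, corrupted data), and retrying them will not help. Retry only errors known to be transient, such as dropped connections or deadlocks. Timeouts and circuit breakers still help by keeping requests from hanging on an unresponsive database.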
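The sketch below shows the cooperative timeout style using only CancellationTokenSource. The operation receives a token that is cancelled after the deadline, and the cancellation is reported as a TimeoutException. This is the same contract as Polly's optimistic timeout, so it only helps if the operation honors the token. ADO.NET and EF Core async methods accept one. The Deadline helper is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Deadline
{
    // Runs an operation that observes a CancellationToken and cancels it after `timeout`.
    public static async Task<T> RunAsync<T>(Func<CancellationToken, Task<T>> operation, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await operation(cts.Token);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Our deadline fired (not some other cancellation): surface it as a timeout
            throw new TimeoutException($"Operation exceeded {timeout.TotalSeconds}s");
        }
    }
}
```

For example, you could call `Deadline.RunAsync(ct => dbContext.Orders.ToListAsync(ct), TimeSpan.FromSeconds(3))`. Here `dbContext.Orders` stands in for your own EF Core context and set.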

Further Reading