Troubleshooting applications can be tedious and time-consuming. It generally involves checking application logs, networking, firewalls, databases, database logs, third-party services, internal service logs... you get the idea. When service is interrupted, every second counts. Let your application help you track down issues faster. It already has the configuration needed to connect to each external service it depends on, so why not spend a few extra minutes and make your life a little easier?

What is a Health Check?

Health checks are generally very simple binary checks. A check can be as simple as a ping command. This covers two troubleshooting questions: can we connect to the service, and is that service operational? A failing check won't tell you what is wrong, but it can quickly point you in the right direction. Health checks can also be used to prevent or automatically resolve some issues. If you have auto scaling or load balancers that monitor an application's health, they can quickly add or remove servers when there are issues. As with all fully automated systems, this could backfire and accidentally remove all servers when minor hiccups occur. This is one reason health checks are best kept to simple up/down checks rather than checks based on variable values or metrics.
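
To make the "as simple as a ping" idea concrete, here is a minimal sketch of a binary connectivity check using only the JDK. The class name TcpPing and the method canConnect are made up for illustration. The check only answers "can we open a TCP connection in time?" and deliberately says nothing about why a failure happened.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Dropwizard or any library used in this post.
final class TcpPing {
    private TcpPing() {}

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMillis, false otherwise.
     */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all count as "down".
            return false;
        }
    }
}
```

A successful connection only proves the port is accepting connections, which is why the checks later in this post go one step further and make a real request.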

Dropwizard Health Checks

Once again Dropwizard Metrics comes through with a great addition to its JVM metrics. Simply create a class that extends HealthCheck, override the check() method, and register it with the HealthCheckRegistry. You can now easily run all health checks and see if the application is having any issues. Here is a sample health check that assumes we have an external service as a dependency. All it does is request a URL and expect a 2xx response code. HealthCheck also exposes a way to add a little context to a result for some additional debugging info.

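import com.codahale.metrics.health.HealthCheck;
import okhttp3.HttpUrl;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class ExternalServiceHealthCheck extends HealthCheck {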
    private final OkHttpClient client;
    private final HttpUrl path;

    public ExternalServiceHealthCheck(OkHttpClient client, HttpUrl path) {
        this.client = client;
        this.path = path;
    }

    @Override
    protected Result check() throws Exception {
        Request request = new Request.Builder()
            .url(path)
            .get()
            .build();
        // Close the response when done so the connection is released.
        try (Response response = client.newCall(request).execute()) {
            // Pass the check on any 2xx response code.
            if (response.isSuccessful()) {
                return Result.healthy();
            }
            return Result.unhealthy("code: %s - body: %s", response.code(), response.body().string());
        }
    }
}

What should you add health checks for? As many dependencies as makes sense: databases, third-party services, internal services, caches, and maybe even critical files. Some third-party libraries already implement health checks for you. HikariCP connection pooling, for example, provides an out-of-the-box SQL database health check.
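
As an example of the "critical files" case, here is a rough sketch of the core logic using only the JDK. The class CriticalFileCheck and the method problemWith are hypothetical names. Inside a Dropwizard check() method you could map an empty result to Result.healthy() and a message to Result.unhealthy(message).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class CriticalFileCheck {
    private CriticalFileCheck() {}

    /** Returns a short problem description, or empty if the file looks usable. */
    static Optional<String> problemWith(Path file) {
        if (!Files.isRegularFile(file)) {
            return Optional.of("missing: " + file);
        }
        if (!Files.isReadable(file)) {
            return Optional.of("not readable: " + file);
        }
        return Optional.empty();
    }
}
```

Like the other checks, this stays binary: the file is either usable or it is not.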

HealthCheckRegistry

A simple static singleton HealthCheckRegistry. Feel free to use DI if your heart desires.

public class HealthChecks {
    private HealthChecks() {}

    private static final HealthCheckRegistry healthCheckRegistry;
    static {
        healthCheckRegistry = new HealthCheckRegistry();
    }

    public static HealthCheckRegistry getHealthCheckRegistry() {
        return healthCheckRegistry;
    }
}
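
If you are wondering what a registry does for you, conceptually it is just a named collection of checks that can all be run at once. The sketch below is a simplified stand-in written for this explanation, not Dropwizard's implementation: MiniHealthRegistry is a made-up class, and it reduces each result to a boolean where the real registry returns Result objects that can carry a message.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical stand-in for a health check registry.
final class MiniHealthRegistry {
    private final Map<String, Callable<Boolean>> checks = new ConcurrentHashMap<>();

    void register(String name, Callable<Boolean> check) {
        checks.put(name, check);
    }

    /** Runs every registered check and returns name -> healthy, sorted by name. */
    SortedMap<String, Boolean> runAll() {
        SortedMap<String, Boolean> results = new TreeMap<>();
        checks.forEach((name, check) -> {
            boolean healthy;
            try {
                healthy = check.call();
            } catch (Exception e) {
                // A check that throws is reported as unhealthy instead of aborting the run.
                healthy = false;
            }
            results.put(name, healthy);
        });
        return results;
    }
}
```

One failing or exploding check does not stop the others from running, so a single call gives you the full picture.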

HttpHandler for the HealthCheckRegistry

Since we are making a web service, let's expose our health checks via HTTP as JSON. Notice how we also change the status code if any of the checks are unhealthy.

public static void health(HttpServerExchange exchange) {
    SortedMap<String, Result> results = HealthChecks.getHealthCheckRegistry().runHealthChecks();
    boolean unhealthy = results.values().stream().anyMatch(result -> !result.isHealthy());

    if (unhealthy) {
        /*
         *  Set a 500 status code also. A lot of systems / dev ops tools can
         *  easily test status codes but are not set up to parse JSON.
         *  Let's keep it simple for everyone.
         */
        exchange.setStatusCode(500);
    }
    Exchange.body().sendJson(exchange, results);
}

Example Routes

Here are a few quick example routes.

private static final HttpHandler ROUTES = new RoutingHandler()
    .get("/ping", timed("ping", (exchange) -> Exchange.body().sendText(exchange, "ok")))
    .get("/metrics", timed("metrics", CustomHandlers::metrics))
    .get("/health", timed("health", CustomHandlers::health))
    .setFallbackHandler(timed("notFound", RoutingHandlers::notFoundHandler))
;

Wiring up the Health Checks

Let's reuse the connection pools from our previous post on HikariCP connection pooling, which automatically register themselves as health checks. We will also add two of our custom health checks (let's pretend they are hitting an external service and not the application itself). One is set up to always fail, just as an example.

public static void main(String[] args) {
    /*
     *  Init connection pools. They auto register their own health checks.
     */
    ConnectionPools.getProcessing();
    ConnectionPools.getTransactional();

    // Assume some global HttpClient.
    OkHttpClient client = new OkHttpClient.Builder().build();

    HttpUrl passingPath = HttpUrl.parse("http://localhost:8080/ping");
    HealthCheck passing = new ExternalServiceHealthCheck(client, passingPath);
    HealthChecks.getHealthCheckRegistry().register("ping", passing);

    // Since this route doesn't exist it will respond with 404 and should fail the check.
    HttpUrl failingPath = HttpUrl.parse("http://localhost:8080/failingPath");
    HealthCheck failing = new ExternalServiceHealthCheck(client, failingPath);
    HealthChecks.getHealthCheckRegistry().register("shouldFail", failing);

    // Once again pull in a bunch of common middleware.
    SimpleServer server = SimpleServer.simpleServer(Middleware.common(ROUTES));
    server.start();
}

See it in action

curl -v localhost:8080/health
*   Trying ::1...
* Connected to localhost (::1) port 8080 (#0)
> GET /health HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.49.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Connection: keep-alive
< Content-Type: application/json
< Content-Length: 212
< Date: Tue, 07 Mar 2017 00:55:17 GMT
<
* Connection #0 to host localhost left intact
{
  "ping": {
    "healthy": true
  },
  "processing.pool.ConnectivityCheck": {
    "healthy": true
  },
  "shouldFail": {
    "healthy": false,
    "message": "code: 404 - body: Page Not Found!!"
  },
  "transactional.pool.ConnectivityCheck": {
    "healthy": true
  }
}

Once again, notice the status code is 500 because one of the checks was unhealthy.
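
Because the status code carries the overall result, a monitoring tool never has to parse the JSON. Here is a rough sketch of such a probe using only the JDK. HealthProbe and isHealthy are hypothetical names, and treating any non-2xx response or connection error as unhealthy is an assumption of this sketch.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical probe, similar in spirit to what a load balancer health check does.
final class HealthProbe {
    private HealthProbe() {}

    /** Returns true only when the endpoint answers with a 2xx status code. */
    static boolean isHealthy(String healthUrl, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(healthUrl).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                int code = conn.getResponseCode();
                return code >= 200 && code < 300;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Connection refused, timeout, etc. all count as unhealthy.
            return false;
        }
    }
}
```

Pointed at the /health route from this post, a call like HealthProbe.isHealthy("http://localhost:8080/health", 1000) would come back false for as long as the shouldFail check is registered.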