Time series metrics reporting and alerting are essential tools for monitoring production services. Graphs help you monitor trends over time, identify spikes in load or latency, spot bottlenecks in constrained resources, and more. Dropwizard Metrics is a great library for collecting metrics and has a lot of features out of the box, including various JVM metrics. There are also many third-party library hooks for collecting metrics on HikariCP connection pools, Redis client connections, HTTP client connections, and many more.

Once metrics are being collected, we need a time series datastore as well as a graphing and alerting system to get the most out of them. This example uses Grafana Cloud, which offers cloud-hosted Grafana (a graphing and alerting application that hooks into many datasources) as well as two options for time series datasources: Graphite and Prometheus. StubbornJava has public-facing Grafana dashboards, and new metrics will continue to be added to them as new content is added. Take a look at the StubbornJava Overview dashboard to start with.

Custom Dropwizard GraphiteSender

Note: This is not the implementation Grafana Cloud recommends. Grafana Cloud recommends using a Carbon-Relay-NG process for pre-aggregating and batch sending metrics to Grafana Cloud. Since this site currently runs on a single server, we opted to implement an HTTP sender using the Grafana Cloud API to reduce infrastructure overhead. If your system has multiple environments and services, it is highly recommended to use the Carbon-Relay-NG process.

This implementation is fairly straightforward. Dropwizard Metrics reporters run on a single thread on a timer, so we should not have to worry about thread safety in this class. Every time the reporter runs, it iterates over all of the metrics in our MetricRegistry, converts them to the appropriate format, and hands each value to our sender. The sender buffers the values passed to send() and, in flush(), posts the whole batch to the Grafana API using OkHttp, serializing to JSON with Jackson.
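To make the target format concrete before reading the sender, here is a minimal sketch of the JSON body a batch would serialize to. It is inferred only from the four fields of the GraphiteMetric class in the sender below (name, interval, value, time). The class GraphitePayloadSketch, its hand-rolled formatting, and the sample metric name are illustrative assumptions, not part of the original code or of any Grafana library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: hand-formats the JSON shape the sender's metrics are assumed to serialize to. */
final class GraphitePayloadSketch {

    /** Mirrors the four fields of the GraphiteMetric class in the sender. */
    static final class Point {
        final String name;
        final int interval;
        final double value;
        final long time;

        Point(String name, int interval, double value, long time) {
            this.name = name;
            this.interval = interval;
            this.value = value;
            this.time = time;
        }

        String toJson() {
            // No string escaping is done here; metric names are assumed to be plain dotted identifiers.
            // Locale.ROOT keeps number formatting independent of the JVM's default locale.
            return String.format(Locale.ROOT,
                    "{\"name\":\"%s\",\"interval\":%d,\"value\":%s,\"time\":%d}",
                    name, interval, Double.toString(value), time);
        }
    }

    /** Joins the points into a single JSON array, one object per metric value. */
    static String toJsonArray(List<Point> points) {
        List<String> parts = new ArrayList<>();
        for (Point p : points) {
            parts.add(p.toJson());
        }
        return "[" + String.join(",", parts) + "]";
    }

    public static void main(String[] args) {
        List<Point> batch = new ArrayList<>();
        // Hypothetical metric name and timestamp, chosen only for the example.
        batch.add(new Point("stubbornjava.prod.web1.requests.count", 10, 42.0, 1500000000L));
        System.out.println(toJsonArray(batch));
        // prints [{"name":"stubbornjava.prod.web1.requests.count","interval":10,"value":42.0,"time":1500000000}]
    }
}
```

The real sender leaves this work to Jackson. The sketch only shows that one flush produces a single JSON array with one object per reported value.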

/**
 * This is a hacked-together HTTP sender for Grafana Cloud.
 * This is NOT the recommended approach to collecting metrics.
 * The recommended approach is to use Carbon-Relay-NG.
 * @author billoneil
 *
 */
class GraphiteHttpSender implements GraphiteSender {
    @SuppressWarnings("unused")
    private static final Logger log = LoggerFactory.getLogger(GraphiteHttpSender.class);

    private final OkHttpClient client;
    private final String host;
    private final List<GraphiteMetric> metrics = Lists.newArrayList();

    public GraphiteHttpSender(OkHttpClient client, String host, String apiKey) {
        this.client = client.newBuilder()
                            .addInterceptor(HttpClient.getHeaderInterceptor("Authorization", "Bearer " + apiKey))
                            .build();
        this.host = host;
    }

    @Override
    public void connect() throws IllegalStateException, IOException {
        // No-op: every flush() issues its own HTTP request, so there is no connection to open.
    }

    @Override
    public void close() throws IOException {
        // No-op: no connection is held open between flushes.
    }

    @Override
    public void send(String name, String value, long timestamp) throws IOException {
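        // The interval (10) is assumed to be in seconds; keep it in sync with the period passed to reporter.start(...).
        metrics.add(new GraphiteMetric(name, 10, Double.parseDouble(value), timestamp));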
    }

    @Override
    public void flush() throws IOException {
        Request request = new Request.Builder()
                .url(host + "/metrics")
                .post(RequestBody.Companion.create(Json.serializer().toByteArray(metrics), MediaType.Companion.parse("application/json")))
                .build();
        Retry.retryUntilSuccessfulWithBackoff(() -> client.newCall(request).execute());
        metrics.clear();
    }

    @Override
    public boolean isConnected() {
        // No persistent connection is held, so there is no connected state to report.
        return false;
    }

    @Override
    public int getFailures() {
        // Failures are not tracked by this sender.
        return 0;
    }

    private static final class GraphiteMetric {
        private final String name;
        private final int interval;
        private final double value;
        private final long time;

        public GraphiteMetric(@JsonProperty("name") String name,
                              @JsonProperty("interval") int interval,
                              @JsonProperty("value") double value,
                              @JsonProperty("time") long time) {
            this.name = name;
            this.interval = interval;
            this.value = value;
            this.time = time;
        }

        // The getters below are only called by Jackson during serialization.
        @SuppressWarnings("unused")
        public String getName() {
            return name;
        }

        @SuppressWarnings("unused")
        public int getInterval() {
            return interval;
        }

        @SuppressWarnings("unused")
        public double getValue() {
            return value;
        }

        @SuppressWarnings("unused")
        public long getTime() {
            return time;
        }
    }
}
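The flush() method above relies on a Retry.retryUntilSuccessfulWithBackoff helper that is not shown in this post. As a rough idea of what such a helper does, here is a minimal sketch of retrying a call with exponential backoff. The class name BackoffRetrySketch, the method signature, the attempt limit, and the delay values are all assumptions made for illustration. This is not the actual Retry implementation.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a retry helper; not the Retry class used by the sender. */
final class BackoffRetrySketch {

    /**
     * Runs the call until it succeeds or maxAttempts is reached.
     * The wait between attempts starts at initialDelayMillis and doubles after each failure.
     */
    static <T> T retryWithBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulates a request that fails twice and then succeeds.
        String result = retryWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated failure " + calls[0]);
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```

In the sender, the callable would be the client.newCall(request).execute() lambda. OkHttp's Response is Closeable, so code that wraps the call this way would typically also close the returned response once it is done with it.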

Dropwizard Metrics Reporter

With our custom GraphiteSender implemented, all that is left is to plug it into the existing GraphiteReporter and start it. Our keys are partitioned by environment and host, which makes it easy to view either aggregates or host-by-host metrics. See it in action at StubbornJava Overview.

class MetricsReporters {
    private static final Logger log = LoggerFactory.getLogger(MetricsReporters.class);

    public static void startReporters(MetricRegistry registry) {
        // Graphite reporter to Grafana Cloud
        OkHttpClient client = new OkHttpClient.Builder()
            //.addNetworkInterceptor(HttpClient.getLoggingInterceptor())
            .build();

        if (!Configs.properties().hasPath("metrics.graphite.host")
            || !Configs.properties().hasPath("metrics.grafana.api_key")) {
            log.info("Missing metrics reporter key or host, skipping");
            return;
        }

        String graphiteHost = Configs.properties().getString("metrics.graphite.host");
        String grafanaApiKey = Configs.properties().getString("metrics.grafana.api_key");
        final GraphiteHttpSender graphite = new GraphiteHttpSender(client, graphiteHost, grafanaApiKey);
        final GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                                                          .prefixedWith(Metrics.metricPrefix("stubbornjava"))
                                                          .convertRatesTo(TimeUnit.MINUTES)
                                                          .convertDurationsTo(TimeUnit.MILLISECONDS)
                                                          .filter(MetricFilter.ALL)
                                                          .build(graphite);
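        // Report every 10 seconds, matching the interval of 10 that GraphiteHttpSender attaches to each metric.
        reporter.start(10, TimeUnit.SECONDS);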
    }
}
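The Metrics.metricPrefix helper used above is not shown in this post. A minimal sketch of how a prefix partitioned by environment and host might be built is given below. The class MetricPrefixSketch, the ENV environment variable, the segment order, and the sanitizing rule are all assumptions for illustration. The real helper may differ.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of an environment- and host-partitioned metric prefix. */
final class MetricPrefixSketch {

    /** Builds "app.env.host", e.g. "stubbornjava.prod.web1". */
    static String metricPrefix(String app, String env, String host) {
        return sanitize(app) + "." + sanitize(env) + "." + sanitize(host);
    }

    /**
     * Graphite uses '.' as the path separator, so dots and whitespace inside
     * a single segment are replaced with underscores.
     */
    static String sanitize(String segment) {
        return segment.trim().replaceAll("[.\\s]+", "_");
    }

    /** Falls back to "unknown" if the local host name cannot be resolved. */
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // "ENV" is a hypothetical variable name; use whatever your deployment defines.
        String env = System.getenv().getOrDefault("ENV", "local");
        System.out.println(metricPrefix("stubbornjava", env, localHostName()));

        System.out.println(metricPrefix("stubbornjava", "prod", "web1.example.com"));
        // prints "stubbornjava.prod.web1_example_com"
    }
}
```

With keys shaped like this, a Graphite query can use a wildcard in the host segment to aggregate across servers, or name one host to look at it in isolation.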