We briefly mentioned Dropwizard Metrics in our Simple Embedded Java REST server example. Let's see what some of the built-in metric sets such as GarbageCollectorMetricSet, CachedThreadStatesGaugeSet, and MemoryUsageGaugeSet, plus a Logback InstrumentedAppender, can offer us.

Metrics Implementation

The Metrics class below wires up live stats for this server. Once we create our MetricRegistry we can register some built-in metric sets. This is also where we set up our metric reporters, which report and graph metrics to various systems: hosted Grafana paired with hosted metrics, or self-hosted solutions including (but not limited to) Grafana, InfluxDB, Graphite, and Prometheus.

import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.amazonaws.util.EC2MetadataUtils;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.jvm.CachedThreadStatesGaugeSet;
import com.codahale.metrics.jvm.GarbageCollectorMetricSet;
import com.codahale.metrics.jvm.MemoryUsageGaugeSet;
import com.codahale.metrics.logback.InstrumentedAppender;

import ch.qos.logback.classic.LoggerContext;

public class Metrics {
    private static final Logger log = LoggerFactory.getLogger(Metrics.class);
    private static final MetricRegistry registry;
    static {
        registry = new MetricRegistry();
        registry.register("gc", new GarbageCollectorMetricSet());
        registry.register("threads", new CachedThreadStatesGaugeSet(10, TimeUnit.SECONDS));
        registry.register("memory", new MemoryUsageGaugeSet());

        // Logback metrics
        final LoggerContext factory = (LoggerContext) LoggerFactory.getILoggerFactory();
        final ch.qos.logback.classic.Logger root = factory.getLogger(ch.qos.logback.classic.Logger.ROOT_LOGGER_NAME);
        final InstrumentedAppender metrics = new InstrumentedAppender(registry);
        metrics.setContext(root.getLoggerContext());
        metrics.start();
        root.addAppender(metrics);

        // Register reporters here.
        MetricsReporters.startReporters(registry);
    }

    public static MetricRegistry registry() {
        return registry;
    }

    public static Timer timer(String first, String... keys) {
        return registry.timer(MetricRegistry.name(first, keys));
    }

    public static Meter meter(String first, String... keys) {
        return registry.meter(MetricRegistry.name(first, keys));
    }

    static String metricPrefix(String app) {
        Env env = Env.get();
        String host = env == Env.LOCAL ? "localhost" : getHost();
        String prefix = MetricRegistry.name(app, env.getName(), host);
        log.info("Setting Metrics Prefix {}", prefix);
        return prefix;
    }

    private static String getHost() {
        return EC2MetadataUtils.getLocalHostName().split("\\.")[0];
    }
}
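
The MetricsReporters class referenced above isn't shown in this post; here's a minimal sketch of what it could look like, assuming a Graphite backend. The host, port, application name, and reporting interval are placeholders, not values from the original setup.

import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class MetricsReporters {

    // Hypothetical sketch: start a Graphite reporter for the given registry.
    // Host, port, app name, and interval are placeholder values.
    static void startReporters(MetricRegistry registry) {
        Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
        GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                .prefixedWith(Metrics.metricPrefix("my-app"))
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(graphite);
        reporter.start(1, TimeUnit.MINUTES);
    }
}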

Garbage Collector Stats

Garbage collector stats from GarbageCollectorMetricSet tell us how often each collector runs (mark-sweep and scavenge here, since this JVM uses the Parallel collector) and how much total time those collections have taken. Monitoring these over time can help identify unnecessary object creation, or catch new code that incurs much higher garbage collection costs.

"gc.PS-MarkSweep.count": {
  "value": 2
},
"gc.PS-MarkSweep.time": {
  "value": 156
},
"gc.PS-Scavenge.count": {
  "value": 7
},
"gc.PS-Scavenge.time": {
  "value": 102
}
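
These values are exposed as plain gauges on the registry, so they can also be read programmatically. A minimal sketch, assuming com.codahale.metrics.Gauge is imported and log is an SLF4J logger; the gauge name matches the Parallel-collector output above (other collectors register different names), and the 10-second threshold is an arbitrary example:

// Hypothetical check: cumulative old-gen (mark-sweep) collection time in milliseconds.
Gauge<?> markSweepTime = Metrics.registry().getGauges().get("gc.PS-MarkSweep.time");
long totalMs = ((Number) markSweepTime.getValue()).longValue();
if (totalMs > 10_000) {
    log.warn("Old-gen GC has consumed {} ms since startup", totalMs);
}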

Thread State Stats

The CachedThreadStatesGaugeSet provides snapshots of how many threads we have in each state. It's best to use the cached version since gathering these stats requires traversing all threads, which makes it a little more expensive than most other metrics; with the constructor arguments above, the snapshot is refreshed at most every 10 seconds. These stats help identify whether the application is utilizing its threads well. It's not uncommon to see microservices these days running on 2-core machines with 200-400 threads. Get your services under control, people.

"threads.blocked.count": {
  "value": 0
},
"threads.count": {
  "value": 40
},
"threads.daemon.count": {
  "value": 21
},
"threads.deadlock.count": {
  "value": 0
},
"threads.deadlocks": {
  "value": []
},
"threads.new.count": {
  "value": 0
},
"threads.runnable.count": {
  "value": 10
},
"threads.terminated.count": {
  "value": 0
},
"threads.timed_waiting.count": {
  "value": 10
},
"threads.waiting.count": {
  "value": 20
}
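
These are also just gauges, so a quick sanity check on thread health can read them back off the registry, for example inside a health check. A minimal sketch; the metric names match the output above, and the blocked-thread threshold is an arbitrary example (the metrics-healthchecks module also ships a ready-made ThreadDeadlockHealthCheck if you prefer):

// Hypothetical health check: no deadlocked threads and a sane number of blocked threads.
Collection<?> deadlocks = (Collection<?>) Metrics.registry().getGauges().get("threads.deadlocks").getValue();
int blocked = ((Number) Metrics.registry().getGauges().get("threads.blocked.count").getValue()).intValue();
boolean healthy = deadlocks.isEmpty() && blocked < 50;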

Logging Level Hit Counts

The InstrumentedAppender is a great way to monitor your logging levels over time. It can help you track whether errors are trending up or down, or spot a burst at one specific level. For instance, if you log validation errors at the INFO level and notice a very large spike in INFO logs, it might be worthwhile to check out what is going on.

"ch.qos.logback.core.Appender.all": {
  "count": 383,
  "m15_rate": 34.36128532787729,
  "m1_rate": 39.7663347840205,
  "m5_rate": 49.414412160499325,
  "mean_rate": 42.023122957333975,
  "units": "events/minute"
},
"ch.qos.logback.core.Appender.debug": {
  "count": 367,
  "m15_rate": 26.895453559439556,
  "m1_rate": 36.48022526729547,
  "m5_rate": 45.363948799581394,
  "mean_rate": 40.26758054657317,
  "units": "events/minute"
},
"ch.qos.logback.core.Appender.error": {
  "count": 0,
  "m15_rate": 0,
  "m1_rate": 0,
  "m5_rate": 0,
  "mean_rate": 0,
  "units": "events/minute"
},
"ch.qos.logback.core.Appender.info": {
  "count": 15,
  "m15_rate": 0.8800921353094169,
  "m1_rate": 3.2846285990759982,
  "m5_rate": 2.066876702258907,
  "mean_rate": 1.6458133427254675,
  "units": "events/minute"
},
"ch.qos.logback.core.Appender.trace": {
  "count": 0,
  "m15_rate": 0,
  "m1_rate": 0,
  "m5_rate": 0,
  "mean_rate": 0,
  "units": "events/minute"
},
"ch.qos.logback.core.Appender.warn": {
  "count": 1,
  "m15_rate": 6.585739633128305,
  "m1_rate": 0.0014809176490401623,
  "m5_rate": 1.9835866586590396,
  "mean_rate": 0.10972083238473278,
  "units": "events/minute"
}
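
The appender registers these as meters on the registry, so the same numbers can be read back programmatically, for instance to verify a deploy hasn't introduced a burst of errors. A minimal sketch; the meter name matches the output above. Note that the Meter API itself reports rates in events per second, so the events/minute units shown here come from the reporter's rate-unit configuration:

// Hypothetical check: count and one-minute moving rate of ERROR-level log events.
Meter errors = Metrics.registry().getMeters().get("ch.qos.logback.core.Appender.error");
long errorCount = errors.getCount();          // total ERROR events since startup
double perSecond = errors.getOneMinuteRate(); // exponentially weighted, events per second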

JVM Memory Stats

The MemoryUsageGaugeSet exposes useful metrics for the JVM's heap and non-heap memory as well as the individual pools (Eden, Survivor, Old Gen, Metaspace, Compressed Class Space, and Code Cache).

"memory.heap.committed": {
  "value": 294649856
},
"memory.heap.init": {
  "value": 268435456
},
"memory.heap.max": {
  "value": 3817865216
},
"memory.heap.usage": {
  "value": 0.030356478671456587
},
"memory.heap.used": {
  "value": 116056056
},
"memory.non-heap.committed": {
  "value": 77692928
},
"memory.non-heap.init": {
  "value": 2555904
},
"memory.non-heap.max": {
  "value": -1
},
"memory.non-heap.usage": {
  "value": -75373272
},
"memory.non-heap.used": {
  "value": 75373336
},
"memory.pools.Code-Cache.committed": {
  "value": 16252928
},
"memory.pools.Code-Cache.init": {
  "value": 2555904
},
"memory.pools.Code-Cache.max": {
  "value": 251658240
},
"memory.pools.Code-Cache.usage": {
  "value": 0.06367696126302083
},
"memory.pools.Code-Cache.used": {
  "value": 16024832
},
"memory.pools.Compressed-Class-Space.committed": {
  "value": 6340608
},
"memory.pools.Compressed-Class-Space.init": {
  "value": 0
},
"memory.pools.Compressed-Class-Space.max": {
  "value": 1073741824
},
"memory.pools.Compressed-Class-Space.usage": {
  "value": 0.00545993447303772
},
"memory.pools.Compressed-Class-Space.used": {
  "value": 5862560
},
"memory.pools.Metaspace.committed": {
  "value": 55099392
},
"memory.pools.Metaspace.init": {
  "value": 0
},
"memory.pools.Metaspace.max": {
  "value": -1
},
"memory.pools.Metaspace.usage": {
  "value": 0.9707341961232531
},
"memory.pools.Metaspace.used": {
  "value": 53486864
},
"memory.pools.PS-Eden-Space.committed": {
  "value": 95420416
},
"memory.pools.PS-Eden-Space.init": {
  "value": 67108864
},
"memory.pools.PS-Eden-Space.max": {
  "value": 1401421824
},
"memory.pools.PS-Eden-Space.usage": {
  "value": 0.06529679960228735
},
"memory.pools.PS-Eden-Space.used": {
  "value": 91508360
},
"memory.pools.PS-Old-Gen.committed": {
  "value": 184025088
},
"memory.pools.PS-Old-Gen.init": {
  "value": 179306496
},
"memory.pools.PS-Old-Gen.max": {
  "value": 2863661056
},
"memory.pools.PS-Old-Gen.usage": {
  "value": 0.006594411709595844
},
"memory.pools.PS-Old-Gen.used": {
  "value": 18884160
},
"memory.pools.PS-Survivor-Space.committed": {
  "value": 15204352
},
"memory.pools.PS-Survivor-Space.init": {
  "value": 11010048
},
"memory.pools.PS-Survivor-Space.max": {
  "value": 15204352
},
"memory.pools.PS-Survivor-Space.usage": {
  "value": 0.3724944016029095
},
"memory.pools.PS-Survivor-Space.used": {
  "value": 5663536
},
"memory.total.committed": {
  "value": 372342784
},
"memory.total.init": {
  "value": 270991360
},
"memory.total.max": {
  "value": 3817865215
},
"memory.total.used": {
  "value": 191603408
}
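
JSON output like the snippets throughout this post can be produced by serializing the registry with Jackson's MetricsModule from the metrics-json module. A minimal sketch, assuming that module and Jackson are on the classpath; the rate and duration units passed to MetricsModule drive the units strings (events/minute above) in the output:

import java.util.concurrent.TimeUnit;

import com.codahale.metrics.json.MetricsModule;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

// Serialize the whole registry to pretty-printed JSON, e.g. behind a /metrics endpoint
// on the embedded server. A rate unit of MINUTES yields "events/minute" in the output.
static String metricsJson() throws JsonProcessingException {
    ObjectMapper mapper = new ObjectMapper()
            .registerModule(new MetricsModule(TimeUnit.MINUTES, TimeUnit.MILLISECONDS, false));
    return mapper.writerWithDefaultPrettyPrinter().writeValueAsString(Metrics.registry());
}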