Metrics & Monitoring
Comprehensive metrics collection and monitoring for ML inference operations in streaming applications.
Metrics Overview
Otter Streams collects metrics with Micrometer to monitor inference operations, performance, and system health. All metrics can be exported through standard monitoring systems.
Latency Metrics
Track inference latency at each stage of the pipeline.
- inference.latency - Total inference time
- preprocessing.latency - Feature extraction time
- postprocessing.latency - Result transformation time
- cache.latency - Cache lookup time
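As a rough sketch of how such a latency metric behaves, the snippet below records a stubbed inference call into a Micrometer `Timer` named `inference.latency`. The `SimpleMeterRegistry` and the stubbed call are illustrative assumptions, not Otter Streams API; in a real application the engine records this timer for you.

```java
import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class LatencyMetricsSketch {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        // Timer backing the inference.latency metric, with common percentiles
        Timer inferenceLatency = Timer.builder("inference.latency")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);

        // Time a (stubbed) inference call; record() handles the stopwatch
        inferenceLatency.record(() -> {
            // model.predict(features) would go here
        });

        System.out.println(inferenceLatency.count());
        System.out.println(inferenceLatency.totalTime(TimeUnit.MILLISECONDS) >= 0.0);
    }
}
```

The same pattern applies to the preprocessing, postprocessing, and cache timers: one `Timer` per stage, queried later for counts, totals, and percentiles.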
Throughput Metrics
Monitor request volume and processing rates.
- inference.requests - Total inference requests
- inference.throughput - Requests per second
- batch.size - Average batch size
- queue.size - Pending request queue size
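A metric like `queue.size` is typically a gauge sampled from a live data structure rather than a counter. Here is a minimal sketch of that pattern, assuming an in-memory request queue (the queue and registry are illustrative, not Otter Streams API):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class QueueGaugeSketch {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        Queue<String> pending = new ConcurrentLinkedQueue<>();

        // Gauge backing queue.size: sampled from the live queue on each scrape
        Gauge queueSize = Gauge.builder("queue.size", pending, Queue::size)
                .description("Pending request queue size")
                .register(registry);

        pending.add("request-1");
        pending.add("request-2");
        System.out.println((int) queueSize.value());
    }
}
```

Because the gauge holds only a weak reference to the queue and reads it lazily, it adds no overhead to the hot path.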
Error Metrics
Track errors and failures in inference operations.
- inference.errors - Total inference errors
- inference.error_rate - Error percentage
- timeout.errors - Timeout occurrences
- retry.count - Retry attempts
Cache Metrics
Monitor caching performance and efficiency.
- cache.hits - Cache hit count
- cache.misses - Cache miss count
- cache.hit_ratio - Cache hit percentage
- cache.size - Current cache size
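A derived metric like `cache.hit_ratio` can be computed from the hit and miss counters at scrape time. The sketch below shows one way to wire that up with Micrometer (the registry and the manual counter increments are illustrative assumptions; Otter Streams maintains these counters itself):

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CacheHitRatioSketch {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        Counter hits = registry.counter("cache.hits");
        Counter misses = registry.counter("cache.misses");

        // Derived gauge for cache.hit_ratio, recomputed from the counters on each read
        Gauge.builder("cache.hit_ratio", () -> {
                    double total = hits.count() + misses.count();
                    return total == 0 ? 0.0 : hits.count() / total;
                })
                .register(registry);

        hits.increment(8);
        misses.increment(2);
        System.out.println(registry.get("cache.hit_ratio").gauge().value());
    }
}
```

Note that a ratio computed this way is cumulative over the process lifetime; for a windowed view, compute the ratio from counter rates in your query layer instead.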
Model Metrics
Track model-specific performance and usage.
- model.load.time - Model loading time
- model.memory.usage - Memory consumption
- model.predictions - Total predictions made
- model.version - Model version tracking
System Metrics
Monitor system resources and infrastructure.
- cpu.usage - CPU utilization
- memory.usage - Memory usage
- thread.count - Active thread count
- jvm.metrics - JVM health metrics
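System-level metrics like these usually come from Micrometer's standard binders rather than application code; the names above presumably map onto Micrometer's `jvm.*`, `system.*`, and `process.*` meters. A minimal sketch of registering those binders:

```java
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.binder.system.ProcessorMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class SystemMetricsSketch {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        // Standard Micrometer binders cover memory, threads, and CPU
        new JvmMemoryMetrics().bindTo(registry);   // jvm.memory.* gauges
        new JvmThreadMetrics().bindTo(registry);   // jvm.threads.* gauges
        new ProcessorMetrics().bindTo(registry);   // system.cpu.* / process.cpu.*

        // Sample one of the registered meters
        double liveThreads = registry.get("jvm.threads.live").gauge().value();
        System.out.println(liveThreads >= 1.0);    // at least the main thread
        System.out.println(registry.getMeters().size() > 10);
    }
}
```

Bind these once at startup, to the same registry your inference metrics use, so everything lands in one scrape endpoint.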
Configuring Metrics
Enable and configure metrics collection in your Otter Streams application.
```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

import com.sun.net.httpserver.HttpServer;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

// 1. Configure inference with metrics enabled
InferenceConfig config = InferenceConfig.builder()
        .modelConfig(modelConfig)
        .enableMetrics(true)        // Enable metrics collection
        .metricsPrefix("app.ml")    // Custom metrics prefix
        .build();

// 2. Set up a Micrometer registry (e.g., Prometheus)
PrometheusMeterRegistry prometheusRegistry =
        new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

// 3. Create the metrics collector
MetricsCollector metricsCollector = new MetricsCollector(prometheusRegistry);

// 4. Initialize the inference engine with metrics
OnnxInferenceEngine engine = new OnnxInferenceEngine();
engine.initialize(config);
engine.setMetricsCollector(metricsCollector);

// 5. Expose a metrics endpoint for Prometheus to scrape
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.createContext("/metrics", httpExchange -> {
    byte[] response = prometheusRegistry.scrape().getBytes(StandardCharsets.UTF_8);
    httpExchange.sendResponseHeaders(200, response.length);
    try (OutputStream os = httpExchange.getResponseBody()) {
        os.write(response);
    }
});
server.start();
```

You can verify the endpoint with `curl localhost:8080/metrics`.
📊 Monitoring Dashboards
Build dashboards for these metrics in popular monitoring systems.
Supported Monitoring Systems
| System | Integration | Key Features | Setup Difficulty |
|---|---|---|---|
| Prometheus | Pull-based metrics | Time series, alerting, Grafana integration | Easy |
| InfluxDB | Push-based metrics | High throughput, downsampling, retention policies | Medium |
| DataDog | Agent-based | APM, logs, traces, AI-powered insights | Easy |
| New Relic | Agent-based | Full-stack observability, AIOps | Easy |
| Grafana Cloud | Push/Pull metrics | Dashboards, alerting, Loki logs | Medium |
```promql
# Example: Prometheus + Grafana dashboard queries

# 1. Inference latency (95th percentile)
histogram_quantile(0.95,
  rate(inference_latency_seconds_bucket[5m])
)

# 2. Throughput (requests per second)
rate(inference_requests_total[5m])

# 3. Error rate percentage (over a 5-minute window)
(rate(inference_errors_total[5m]) / rate(inference_requests_total[5m])) * 100

# 4. Cache hit ratio (over a 5-minute window)
rate(cache_hits_total[5m])
  / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))

# 5. Average batch size
avg_over_time(batch_size[5m])

# 6. Requests by model version
sum by (model_version) (inference_requests_total)
```
🔍 Custom Metrics
Add custom business metrics to track specific application requirements.
```java
import java.util.concurrent.atomic.AtomicInteger;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.Timer;

// Assumes a MeterRegistry named meterRegistry (e.g., the Prometheus registry above)

// 1. Create custom counters
Counter fraudCounter = Counter.builder("fraud.detections")
        .tag("risk_level", "HIGH")
        .description("Number of high-risk fraud detections")
        .register(meterRegistry);

// 2. Create custom timers
Timer predictionTimer = Timer.builder("custom.prediction.time")
        .publishPercentiles(0.5, 0.95, 0.99)
        .description("Custom prediction timing")
        .register(meterRegistry);

// 3. Create distribution summaries
DistributionSummary predictionSize = DistributionSummary.builder("prediction.size")
        .baseUnit("bytes")
        .description("Size of prediction payloads")
        .register(meterRegistry);

// 4. Use them in your application
void processTransaction(Transaction transaction) {
    Timer.Sample sample = Timer.start(meterRegistry);
    try {
        InferenceResult result = engine.infer(extractFeatures(transaction));
        // Record custom metrics
        if (result.getScore() > 0.8) {
            fraudCounter.increment();
        }
        predictionSize.record(result.getPayloadSize());
    } finally {
        sample.stop(predictionTimer);
    }
}

// 5. Create a gauge for business metrics
AtomicInteger activeSessions = new AtomicInteger(0);
Gauge.builder("active.sessions", activeSessions, AtomicInteger::get)
        .description("Number of active user sessions")
        .register(meterRegistry);
```
Best Practices for Metrics
✅ 1. Enable metrics in production

```java
InferenceConfig.builder()
    .enableMetrics(true)
    .build();
```

✅ 2. Set an appropriate metrics prefix

```java
InferenceConfig.builder()
    .metricsPrefix("myapp.ml")
    .build();
```

✅ 3. Monitor key SLAs
- P95 latency < 100 ms
- Error rate < 0.1%
- Cache hit ratio > 80%

✅ 4. Set up alerts
- Alert on error rate > 1%
- Alert on P99 latency > 500 ms
- Alert on cache hit ratio < 50%

✅ 5. Use dimensional metrics

```java
Counter.builder("inference.requests")
    .tag("model", "fraud-detector")
    .tag("version", "v2.1")
    .tag("environment", "production")
    .register(registry);
```

✅ 6. Review metrics regularly
- Weekly metric analysis
- Capacity planning based on trends
- Performance optimization driven by metrics