Metrics & Monitoring

Observability

Comprehensive metrics collection and monitoring for ML inference operations in streaming applications.

Metrics Overview

Otter Streams collects metrics through Micrometer for monitoring inference operations, performance, and system health. Because Micrometer is vendor-neutral, the same instrumentation can be exported to any supported backend (Prometheus, InfluxDB, Datadog, and others).

⏱️ Latency Metrics

Performance

Track inference latency at each stage of the pipeline; a minimal recording sketch follows the list.

  • inference.latency - Total inference time
  • preprocessing.latency - Feature extraction time
  • postprocessing.latency - Result transformation time
  • cache.latency - Cache lookup time
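
With enableMetrics(true), Otter Streams records these timers internally. If you wrap additional pipeline stages yourself, a Micrometer Timer captures the same shape of data. A minimal sketch, assuming an engine is in scope and a hypothetical float[] feature vector; timedInfer is an illustrative name, not an Otter Streams API:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// Sketch: time one inference call under inference.latency.
// Timer.record(Supplier) times the call and returns its result,
// publishing count, total time, and max under the metric name.
InferenceResult timedInfer(MeterRegistry registry, float[] features) {
    Timer timer = Timer.builder("inference.latency")
        .description("Total inference time")
        .register(registry);
    return timer.record(() -> engine.infer(features)); // engine assumed in scope
}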

📈 Throughput Metrics

Volume

Monitor request volume and processing rates; a queue-depth gauge sketch follows the list.

  • inference.requests - Total inference requests
  • inference.throughput - Requests per second
  • batch.size - Average batch size
  • queue.size - Pending request queue size
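
Note that queue.size is a gauge rather than a counter: it samples the current depth at scrape time instead of accumulating events. If you manage your own request queue, registering such a gauge looks like the sketch below; InferenceRequest is a hypothetical type for illustration:

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Gauges sample state on every scrape, which suits "current depth"
// metrics; the registry holds only a weak reference to the queue.
Queue<InferenceRequest> pending = new LinkedBlockingQueue<>(); // hypothetical request type

void registerQueueGauge(MeterRegistry registry) {
    Gauge.builder("queue.size", pending, Queue::size)
        .description("Pending request queue size")
        .register(registry);
}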

⚠️ Error Metrics

Reliability

Track errors and failures in inference operations; an error-counting sketch follows the list.

  • inference.errors - Total inference errors
  • inference.error_rate - Error percentage
  • timeout.errors - Timeout occurrences
  • retry.count - Retry attempts
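
These counters are maintained by the engine when metrics are enabled. The sketch below shows the equivalent manual pattern, tagging inference.errors by exception type so dashboards can break failures down by cause; inferCountingErrors is an illustrative name:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Sketch: count failures under inference.errors, tagged by cause.
// One metric name with a cause tag keeps dashboards simple while
// still allowing per-failure-type breakdowns.
InferenceResult inferCountingErrors(MeterRegistry registry, float[] features) {
    try {
        return engine.infer(features); // engine assumed in scope
    } catch (RuntimeException e) {
        Counter.builder("inference.errors")
            .tag("cause", e.getClass().getSimpleName())
            .register(registry)
            .increment();
        throw e;
    }
}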

💾 Cache Metrics

Efficiency

Monitor caching performance and efficiency; a hit-ratio gauge sketch follows the list.

  • cache.hits - Cache hit count
  • cache.misses - Cache miss count
  • cache.hit_ratio - Cache hit percentage
  • cache.size - Current cache size
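
The hit ratio is derived from the two counters. If you need to compute it yourself (for a backend without recording rules, say), a gauge can derive it at scrape time. A sketch, guarding against division by zero before any traffic arrives:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

void registerCacheMetrics(MeterRegistry registry) {
    Counter hits = registry.counter("cache.hits");
    Counter misses = registry.counter("cache.misses");

    // Derive the ratio from the two counters at scrape time.
    Gauge.builder("cache.hit_ratio",
            () -> {
                double total = hits.count() + misses.count();
                return total == 0 ? 0.0 : hits.count() / total;
            })
        .description("Cache hit percentage")
        .register(registry);
}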

🧠 Model Metrics

Model Info

Track model-specific performance and usage; a recording sketch follows the list.

  • model.load.time - Model loading time
  • model.memory.usage - Memory consumption
  • model.predictions - Total predictions made
  • model.version - Model version tracking
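
The first and third of these map onto standard Micrometer types (a timer and a counter). A sketch of recording them manually, with a hypothetical load step supplying the duration and version string:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;

// Sketch: record one model load and one prediction, tagging the
// prediction count by model version for per-version queries.
void recordModelMetrics(MeterRegistry registry, Duration loadTime, String version) {
    Timer.builder("model.load.time")
        .description("Model loading time")
        .register(registry)
        .record(loadTime);

    Counter.builder("model.predictions")
        .tag("model_version", version)
        .description("Total predictions made")
        .register(registry)
        .increment();
}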

🖥️ System Metrics

Infrastructure

Monitor system resources and infrastructure; a binder sketch follows the list.

  • cpu.usage - CPU utilization
  • memory.usage - Memory usage
  • thread.count - Active thread count
  • jvm.metrics - JVM health metrics
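
These need no Otter Streams-specific wiring: Micrometer ships standard binders that publish JVM memory, thread, and CPU metrics once bound to the registry.

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.binder.system.ProcessorMetrics;

// Bind Micrometer's standard JVM and system meters once at startup;
// each binder registers its gauges against the given registry.
void bindSystemMetrics(MeterRegistry registry) {
    new JvmMemoryMetrics().bindTo(registry);
    new JvmThreadMetrics().bindTo(registry);
    new ProcessorMetrics().bindTo(registry);
}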

Configuring Metrics

Enable and configure metrics collection in your Otter Streams application.

import com.sun.net.httpserver.HttpServer;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// 1. Configure inference with metrics enabled
InferenceConfig config = InferenceConfig.builder()
    .modelConfig(modelConfig)
    .enableMetrics(true)           // Enable metrics collection
    .metricsPrefix("app.ml")       // Custom metrics prefix
    .build();

// 2. Setup Micrometer registry (e.g., Prometheus)
PrometheusMeterRegistry prometheusRegistry =
    new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

// 3. Create metrics collector
MetricsCollector metricsCollector = new MetricsCollector(prometheusRegistry);

// 4. Initialize inference engine with metrics
OnnxInferenceEngine engine = new OnnxInferenceEngine();
engine.initialize(config);
engine.setMetricsCollector(metricsCollector);

// 5. Expose metrics endpoint (for Prometheus)
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.createContext("/metrics", httpExchange -> {
    byte[] response = prometheusRegistry.scrape().getBytes(StandardCharsets.UTF_8);
    httpExchange.sendResponseHeaders(200, response.length);
    try (OutputStream os = httpExchange.getResponseBody()) {
        os.write(response);
    }
});
server.start();

📊 Monitoring Dashboards

Build monitoring dashboards on top of the exported metrics with any of the popular monitoring systems below.

Supported Monitoring Systems

| System        | Integration        | Key Features                                      | Setup Difficulty |
|---------------|--------------------|---------------------------------------------------|------------------|
| Prometheus    | Pull-based metrics | Time series, alerting, Grafana integration        | Easy             |
| InfluxDB      | Push-based metrics | High throughput, downsampling, retention policies | Medium           |
| DataDog       | Agent-based        | APM, logs, traces, AI-powered insights            | Easy             |
| New Relic     | Agent-based        | Full-stack observability, AIOps                   | Easy             |
| Grafana Cloud | Push/pull metrics  | Dashboards, alerting, Loki logs                   | Medium           |

# Example: Prometheus + Grafana dashboard queries (PromQL)

# 1. Inference latency (95th percentile)
histogram_quantile(0.95,
    rate(inference_latency_seconds_bucket[5m])
)

# 2. Throughput (requests per second)
rate(inference_requests_total[5m])

# 3. Error rate percentage (over the last 5 minutes)
rate(inference_errors_total[5m])
  / rate(inference_requests_total[5m]) * 100

# 4. Cache hit ratio (over the last 5 minutes)
rate(cache_hits_total[5m])
  / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))

# 5. Average batch size
avg_over_time(batch_size[5m])

# 6. Requests by model version
sum by (model_version) (inference_requests_total)

🔍 Custom Metrics

Add custom business metrics to track specific application requirements.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.Timer;

import java.util.concurrent.atomic.AtomicInteger;

// 1. Create custom counters
Counter fraudCounter = Counter.builder("fraud.detections")
    .tag("risk_level", "HIGH")
    .description("Number of high-risk fraud detections")
    .register(meterRegistry);

// 2. Create custom timers
Timer predictionTimer = Timer.builder("custom.prediction.time")
    .publishPercentiles(0.5, 0.95, 0.99)
    .description("Custom prediction timing")
    .register(meterRegistry);

// 3. Create distribution summaries
DistributionSummary predictionSize = DistributionSummary.builder("prediction.size")
    .baseUnit("bytes")
    .description("Size of prediction payloads")
    .register(meterRegistry);

// 4. Use in your application
void processTransaction(Transaction transaction) {
    Timer.Sample sample = Timer.start(meterRegistry);

    try {
        InferenceResult result = engine.infer(extractFeatures(transaction));

        // Record custom metrics
        if (result.getScore() > 0.8) {
            fraudCounter.increment();
        }

        predictionSize.record(result.getPayloadSize());

    } finally {
        sample.stop(predictionTimer);
    }
}

// 5. Create gauge for business metrics
AtomicInteger activeSessions = new AtomicInteger(0);
Gauge.builder("active.sessions", activeSessions, AtomicInteger::get)
    .description("Number of active user sessions")
    .register(meterRegistry);

Best Practices for Metrics

Best practices checklist:

✅ 1. Enable metrics in production
InferenceConfig.builder()
    .enableMetrics(true)
    .build();

✅ 2. Set appropriate metrics prefix
InferenceConfig.builder()
    .metricsPrefix("myapp.ml")
    .build();

✅ 3. Monitor key SLOs
- P95 latency < 100ms
- Error rate < 0.1%
- Cache hit ratio > 80%

✅ 4. Set up alerts
- Alert on error rate > 1%
- Alert on P99 latency > 500ms
- Alert on cache hit ratio < 50%

✅ 5. Use dimensional metrics
Counter.builder("inference.requests")
    .tag("model", "fraud-detector")
    .tag("version", "v2.1")
    .tag("environment", "production")
    .register(registry);

✅ 6. Regular metric review
- Weekly metric analysis
- Capacity planning based on trends
- Performance optimization based on metrics