ml-inference-core

Foundation Module

The foundation module providing core abstractions, configurations, and utilities for all ML inference operations in Otter Streams.

Module Overview

The Core module provides the essential building blocks for ML inference in streaming applications. It includes configuration management, engine abstractions, caching, metrics, and Flink integration utilities.

⚙️ Configuration (Core)

Comprehensive configuration system for inference operations, model settings, and engine options.

  • InferenceConfig - Configure batch size, timeouts, retries, and metrics
  • ModelConfig - Define model paths, formats, and authentication
  • AuthConfig - API keys, tokens, and custom headers for remote endpoints

🧠 Inference Engines (Core)

Abstract base classes and interfaces for building inference engine implementations.

  • InferenceEngine - Core interface for all engines
  • LocalInferenceEngine - Base class for local model execution
  • EngineCapabilities - Define engine features and limits
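
The list above names the abstractions; as a rough sketch of the contract they imply, here is a minimal engine interface with a toy local implementation. The method shape and the `ToyLocalEngine` class are illustrative assumptions, not the module's actual signatures:

```java
import java.util.Map;

// Illustrative sketch only: a single-method engine contract and a trivial
// "local" engine that labels a feature map by the sign of its score.
interface InferenceEngine<I, O> {
    O infer(I input);   // run one prediction
}

class ToyLocalEngine implements InferenceEngine<Map<String, Object>, String> {
    @Override
    public String infer(Map<String, Object> input) {
        double score = ((Number) input.getOrDefault("score", 0)).doubleValue();
        return score >= 0 ? "positive" : "negative";
    }
}
```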

💾 Caching (Performance)

High-performance caching for inference results using the Caffeine library.

  • ModelCache - Thread-safe LRU cache with time expiration
  • CacheStrategy - INPUT_HASH, MODEL_OUTPUT, FEATURE_BASED
  • Automatic eviction based on size and time constraints
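
ModelCache delegates the actual eviction to Caffeine; to illustrate the combined size-and-time policy with only the JDK, here is a minimal sketch built on an access-ordered `LinkedHashMap` (the `TinyLruCache` class and its method names are invented for the example, not the module's API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache with a max size and a write-time TTL, mimicking
// the size + time eviction that ModelCache gets from Caffeine.
class TinyLruCache<K, V> {
    private record Entry<V>(V value, long writtenAt) {}

    private final long ttlMillis;
    private final LinkedHashMap<K, Entry<V>> map;

    TinyLruCache(int maxSize, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true makes iteration order least-recently-used first
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxSize;   // size-based eviction
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() - e.writtenAt() > ttlMillis) {
            map.remove(key);               // time-based eviction
            return null;
        }
        return e.value();
    }
}
```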

📊 Metrics & Monitoring (Observability)

Comprehensive metrics collection using Micrometer for monitoring inference operations.

  • InferenceMetrics - Latency, throughput, error rates
  • MetricsCollector - Central metrics management
  • Prometheus, InfluxDB, DataDog, New Relic support
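
As a rough illustration of what InferenceMetrics aggregates, the JDK-only sketch below tracks request count, error rate, and mean latency. In the module these figures are registered with Micrometer so any supported backend can scrape them; the class and method names here are assumptions for the example:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustrative stand-in for InferenceMetrics: counts requests and errors,
// accumulates latency, and derives error rate and mean latency on demand.
class ToyInferenceMetrics {
    private final LongAdder requests = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final AtomicLong totalLatencyNanos = new AtomicLong();

    void record(long latencyNanos, boolean success) {
        requests.increment();
        totalLatencyNanos.addAndGet(latencyNanos);
        if (!success) errors.increment();
    }

    double errorRate() {
        long n = requests.sum();
        return n == 0 ? 0.0 : (double) errors.sum() / n;
    }

    double meanLatencyMillis() {
        long n = requests.sum();
        return n == 0 ? 0.0 : totalLatencyNanos.get() / (n * 1_000_000.0);
    }
}
```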

🔄 Async Functions (Flink Integration)

Asynchronous Flink functions for non-blocking ML inference in streaming pipelines.

  • AsyncModelInferenceFunction - Non-blocking async I/O
  • Customizable feature extraction and result transformation
  • Automatic retry and timeout handling
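
The automatic-retry behavior can be sketched with plain `CompletableFuture` composition: if a call completes exceptionally, re-submit it until the retry budget is exhausted. This is an illustrative stand-in, not the module's actual implementation inside AsyncModelInferenceFunction:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Illustrative retry helper: on failure, re-invoke the supplier with one
// less retry remaining; on success, propagate the value.
class Retry {
    static <T> CompletableFuture<T> withRetries(
            Supplier<CompletableFuture<T>> call, int maxRetries) {
        return call.get().handle((value, err) -> {
            if (err == null) return CompletableFuture.completedFuture(value);
            if (maxRetries <= 0) return CompletableFuture.<T>failedFuture(err);
            return withRetries(call, maxRetries - 1);   // spend one retry
        }).thenCompose(f -> f);                          // flatten nested future
    }
}
```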

📝 Model Metadata (Model Info)

Rich metadata containers for ML models including schema, format, and versioning.

  • ModelMetadata - Name, version, format, schemas
  • ModelFormat - ONNX, TensorFlow, PyTorch, XGBoost, PMML
  • InferenceResult - Predictions, timing, success status
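
As a rough picture of what an InferenceResult carries, here is an illustrative record pairing predictions with timing and success status. The field names and factory methods are assumptions for the example, not the module's API:

```java
import java.util.List;

// Illustrative result container: predictions plus timing and success status.
record ToyInferenceResult(List<Double> predictions,
                          long inferenceTimeMillis,
                          boolean success,
                          String errorMessage) {
    static ToyInferenceResult ok(List<Double> preds, long millis) {
        return new ToyInferenceResult(preds, millis, true, null);
    }
    static ToyInferenceResult failure(String message) {
        return new ToyInferenceResult(List.of(), 0, false, message);
    }
}
```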

🚀 Quick Start: Core Module

Basic example showing how to configure and use the core inference components.

// 1. Configure your model
ModelConfig modelConfig = ModelConfig.builder()
    .modelId("sentiment-analyzer")
    .modelPath("/models/sentiment.onnx")
    .format(ModelFormat.ONNX)
    .modelVersion("2.0")
    .build();

// 2. Configure inference settings
InferenceConfig inferenceConfig = InferenceConfig.builder()
    .modelConfig(modelConfig)
    .batchSize(32)                    // Process 32 records at once
    .timeout(Duration.ofSeconds(30))   // 30 second timeout
    .maxRetries(3)                     // Retry up to 3 times
    .enableMetrics(true)               // Collect performance metrics
    .build();

// 3. Create inference function with engine factory
Function<InferenceConfig, InferenceEngine<Map<String, Object>, InferenceResult>> engineFactory =
    cfg -> new OnnxInferenceEngine();

AsyncModelInferenceFunction<Map<String, Object>, InferenceResult> inferenceFunction =
    new AsyncModelInferenceFunction<>(inferenceConfig, engineFactory);

// 4. Apply to Flink stream
DataStream<InferenceResult> predictions = AsyncDataStream.unorderedWait(
    inputStream,             // DataStream<Map<String, Object>> of feature maps
    inferenceFunction,
    30000,                   // timeout in milliseconds
    TimeUnit.MILLISECONDS,
    100                      // max concurrent async requests
);

Maven Dependency

<dependency>
    <groupId>com.codedstreams</groupId>
    <artifactId>ml-inference-core</artifactId>
    <version>1.0.16</version>
</dependency>