ml-inference-core
The foundation module providing core abstractions, configurations, and utilities for all ML inference operations in Otter Streams.
Module Overview
The Core module provides the essential building blocks for ML inference in streaming applications. It includes configuration management, engine abstractions, caching, metrics, and Flink integration utilities.
Configuration
Comprehensive configuration system for inference operations, model settings, and engine options.
- InferenceConfig - Configure batch size, timeouts, retries, and metrics
- ModelConfig - Define model paths, formats, and authentication
- AuthConfig - API keys, tokens, and custom headers for remote endpoints
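For remote endpoints, an AuthConfig is attached to the model configuration. The snippet below is a rough sketch only: the builder methods it uses (apiKey, header, authConfig) and the URL-style model path are illustrative assumptions modeled on the builder pattern in the Quick Start, not confirmed API.

// Hypothetical sketch -- method names (apiKey, header, authConfig) are assumptions.
AuthConfig authConfig = AuthConfig.builder()
    .apiKey(System.getenv("MODEL_API_KEY"))   // token for the remote endpoint
    .header("X-Tenant", "otter-streams")      // example custom header
    .build();

ModelConfig remoteModel = ModelConfig.builder()
    .modelId("sentiment-analyzer-remote")
    .modelPath("https://models.example.com/v2/sentiment")  // remote endpoint URL
    .authConfig(authConfig)
    .build();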
Inference Engines
Abstract base classes and interfaces for building inference engine implementations.
- InferenceEngine - Core interface for all engines
- LocalInferenceEngine - Base class for local model execution
- EngineCapabilities - Define engine features and limits
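A custom engine implements InferenceEngine (or extends LocalInferenceEngine for in-process models). The outline below shows the general shape only; the method names and signatures are illustrative guesses, so check the interface for the real lifecycle.

// Illustrative shape of a custom engine -- method names are assumptions, not the real interface.
public class MyCustomEngine implements InferenceEngine {

    public void initialize(ModelConfig config) {
        // load model artifacts from the configured model path
    }

    public InferenceResult infer(float[] features) {
        // run the model and wrap predictions, timing, and success status
        return null; // placeholder
    }

    public void close() {
        // release model/session resources
    }
}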
Caching
High-performance caching for inference results using the Caffeine library.
- ModelCache - Thread-safe LRU cache with time expiration
- CacheStrategy - INPUT_HASH, MODEL_OUTPUT, FEATURE_BASED
- Automatic eviction based on size and time constraints
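ModelCache is described above as a Caffeine-backed, thread-safe LRU cache with time expiration. For orientation, the equivalent raw Caffeine configuration looks roughly like this; ModelCache wraps these settings behind its own API, so treat this as an illustration of the eviction behavior rather than the library's interface.

// Plain Caffeine (com.github.benmanes.caffeine.cache) shown for reference only.
Cache<String, InferenceResult> cache = Caffeine.newBuilder()
    .maximumSize(10_000)                       // evict by size (LRU-style)
    .expireAfterWrite(Duration.ofMinutes(10))  // evict by age
    .build();

cache.put("input-hash-abc123", result);                        // key derived from e.g. INPUT_HASH
InferenceResult cached = cache.getIfPresent("input-hash-abc123");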
Metrics & Monitoring
Comprehensive metrics collection using Micrometer for monitoring inference operations.
- InferenceMetrics - Latency, throughput, error rates
- MetricsCollector - Central metrics management
- Prometheus, InfluxDB, DataDog, New Relic support
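Because the metrics layer builds on Micrometer, exporting to Prometheus follows the standard Micrometer pattern. The wiring below is plain Micrometer shown for orientation (engine and features are placeholders from your own pipeline); exactly how InferenceMetrics and MetricsCollector register their meters is up to the library.

// Plain Micrometer for reference (io.micrometer:micrometer-registry-prometheus).
PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

Timer latency = Timer.builder("inference.latency")
    .tag("model", "sentiment-analyzer")
    .register(registry);

InferenceResult result = latency.record(() -> engine.infer(features));  // time one inference call
registry.counter("inference.errors", "model", "sentiment-analyzer").increment();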
Async Functions
Asynchronous Flink functions for non-blocking ML inference in streaming pipelines (the Quick Start below shows the end-to-end wiring).
- AsyncModelInferenceFunction - Non-blocking async I/O
- Customizable feature extraction and result transformation
- Automatic retry and timeout handling
Model Metadata
Rich metadata containers for ML models, including schema, format, and versioning.
- ModelMetadata - Name, version, format, schemas
- ModelFormat - ONNX, TensorFlow, PyTorch, XGBoost, PMML
- InferenceResult - Predictions, timing, success status
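ModelMetadata and InferenceResult are plain data carriers, so the read side typically looks like the sketch below. The accessor names are illustrative guesses based on the field list above (and engine/features are placeholders), not confirmed API.

// Illustrative accessors only -- check the actual classes for the real method names.
ModelMetadata meta = engine.getMetadata();
System.out.printf("model=%s version=%s format=%s%n",
    meta.getName(), meta.getVersion(), meta.getFormat());

InferenceResult result = engine.infer(features);
if (result.isSuccess()) {
    System.out.println("predictions: " + result.getPredictions());
}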
🚀 Quick Start: Core Module
Basic example showing how to configure and use the core inference components.
// 1. Configure your model
ModelConfig modelConfig = ModelConfig.builder()
    .modelId("sentiment-analyzer")
    .modelPath("/models/sentiment.onnx")
    .format(ModelFormat.ONNX)
    .modelVersion("2.0")
    .build();

// 2. Configure inference settings
InferenceConfig inferenceConfig = InferenceConfig.builder()
    .modelConfig(modelConfig)
    .batchSize(32)                    // Process 32 records at once
    .timeout(Duration.ofSeconds(30))  // 30 second timeout
    .maxRetries(3)                    // Retry up to 3 times
    .enableMetrics(true)              // Collect performance metrics
    .build();
// 3. Create inference function with engine factory
//    (MyEvent is a placeholder for your input event type; the constructor arguments
//     shown are an assumption based on the config objects above)
Function<InferenceConfig, InferenceEngine> engineFactory =
    cfg -> new OnnxInferenceEngine();

AsyncModelInferenceFunction<MyEvent, InferenceResult> inferenceFunction =
    new AsyncModelInferenceFunction<>(inferenceConfig, engineFactory);
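Once the async function is constructed, it plugs into a pipeline through Flink's standard async I/O operator. The step below uses the regular AsyncDataStream API; inputStream and MyEvent are placeholders for your own source and event type, and it assumes AsyncModelInferenceFunction implements Flink's AsyncFunction as described above.

// 4. Wire into your Flink pipeline with standard async I/O
DataStream<InferenceResult> predictions = AsyncDataStream.unorderedWait(
    inputStream,            // DataStream<MyEvent> of raw records
    inferenceFunction,      // the AsyncModelInferenceFunction built in step 3
    30, TimeUnit.SECONDS,   // per-record timeout
    100);                   // max in-flight async requests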
Maven Dependency
<dependency>
    <groupId>com.codedstreams</groupId>
    <artifactId>ml-inference-core</artifactId>
    <version>1.0.16</version>
</dependency>