🏗️ System Architecture Overview
Figure 1: High-level system architecture showing complete inference pipeline
Architecture Principles
Otter Streams is built on these core architectural principles:
- Modular Design: Each component is independent and replaceable
- Async-First: Non-blocking operations for maximum throughput
- Extensible Framework: Easy to add new model formats and inference engines
- Production-Ready: Built-in monitoring, caching, and fault tolerance
- Resource Efficient: Intelligent batching and memory management
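The async-first principle means an inference call returns a future immediately rather than blocking the caller. A minimal sketch of that pattern using plain Java `CompletableFuture` and a toy stand-in `predict` function (hypothetical names, not the actual Otter Streams API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncInferenceSketch {
    // Hypothetical stand-in for a model call; a real engine would run ONNX/TF/etc.
    static double predict(double[] features) {
        double sum = 0;
        for (double f : features) sum += f;
        return sum / features.length; // toy "score": mean of the features
    }

    // Non-blocking submission: the caller gets a future immediately and can
    // compose further work onto it instead of waiting for the model.
    static CompletableFuture<Double> predictAsync(double[] features, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> predict(features), pool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletableFuture<Double> score = predictAsync(new double[] {1.0, 2.0, 3.0}, pool)
                .thenApply(s -> s * 100); // post-processing composed onto the future
        System.out.println(score.get()); // 200.0
        pool.shutdown();
    }
}
```

Composing on the future (rather than blocking) is what lets a pipeline keep many requests in flight per thread.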
🧩 Module Architecture
Figure 2: Module dependencies and relationships
ml-inference-core
Foundation for all inference operations with async processing, caching, and configuration management.
otter-stream-onnx
Universal model format support with ONNX Runtime integration and GPU acceleration.
otter-stream-tensorflow
Native TensorFlow SavedModel execution with automatic signature discovery.
otter-stream-pytorch
PyTorch model inference via the Deep Java Library (DJL), with automatic GPU detection.
otter-stream-xgboost
Gradient boosting inference for tabular data using XGBoost4J.
otter-stream-pmml
PMML support via JPMML for portable model deployment across platforms.
🎯 Engine Architecture
ONNX Engine Architecture
Figure 3: ONNX Runtime engine architecture and workflow
TensorFlow Engine Architecture
Figure 4: TensorFlow SavedModel execution architecture
📊 Data Flow Architecture
Figure 5: End-to-end data flow from input to output
Fraud Detection Data Flow
Figure 6: Real-time fraud detection pipeline sequence
🚀 Use Case Architecture
Recommendation System
Architecture Features:
- Real-time user behavior processing
- Personalization engine integration
- A/B testing framework
- Content ranking service
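The core loop of a content ranking service can be sketched as score-and-sort over candidate items. This toy example (plain Java, hypothetical names, not the library API) weights a base content score by a per-user affinity, standing in for the personalization engine:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class RankingSketch {
    record Item(String id, double contentScore) {}

    // Rank candidates by contentScore weighted by per-user affinity, highest
    // first -- the shape of a content-ranking service's core loop.
    static List<String> rank(List<Item> candidates, Map<String, Double> affinity) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (Item i) -> i.contentScore() * affinity.getOrDefault(i.id(), 1.0))
                        .reversed())
                .map(Item::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("a", 0.9), new Item("b", 0.5), new Item("c", 0.7));
        Map<String, Double> affinity = Map.of("b", 2.0); // this user strongly prefers "b"
        System.out.println(rank(items, affinity)); // [b, a, c]
    }
}
```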
Anomaly Detection
Architecture Features:
- Time-series window processing
- Multiple detection algorithms
- Automatic threshold adjustment
- Real-time alerting system
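Time-series window processing with automatic thresholding can be illustrated with a sliding-window z-score detector: a point is anomalous when it falls far outside the statistics of the trailing window. A simplified, single-threaded sketch (plain Java, not the library's detector):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WindowAnomalySketch {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int size;
    private final double zThreshold;

    WindowAnomalySketch(int size, double zThreshold) {
        this.size = size;
        this.zThreshold = zThreshold;
    }

    // Flag a point as anomalous if it sits more than zThreshold standard
    // deviations from the mean of the trailing window, then slide the window.
    boolean observe(double value) {
        boolean anomaly = false;
        if (window.size() == size) {
            double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
            double var = window.stream()
                    .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
            double std = Math.sqrt(var);
            anomaly = std > 0 && Math.abs(value - mean) / std > zThreshold;
            window.removeFirst();
        }
        window.addLast(value);
        return anomaly;
    }

    public static void main(String[] args) {
        WindowAnomalySketch d = new WindowAnomalySketch(5, 3.0);
        for (double v : new double[] {10, 11, 10, 12, 11}) d.observe(v); // warm-up
        System.out.println(d.observe(10.5)); // false: within the normal band
        System.out.println(d.observe(50.0)); // true: far outside the window
    }
}
```

Because the threshold is expressed in standard deviations rather than raw units, it adjusts automatically as the window's statistics drift.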
📈 Monitoring Architecture
Figure 7: Complete monitoring and observability architecture
Metrics Collection Architecture
Figure 8: Metrics collection and dashboard architecture
```java
// Monitoring Configuration Example
InferenceConfig config = InferenceConfig.builder()
    .enableMetrics(true)
    .metricsPrefix("myapp.ml.inference")
    .collectLatencyMetrics(true)
    .collectThroughputMetrics(true)
    .collectErrorMetrics(true)
    .collectCacheMetrics(true)
    .metricsExportInterval(Duration.ofSeconds(30))
    .build();
```
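Behind a configuration like this, a collector typically aggregates request counts, error counts, and latencies under the configured metric prefix. A minimal, single-threaded sketch of that aggregation (plain Java, hypothetical names, not the actual collector):

```java
import java.util.HashMap;
import java.util.Map;

public class MetricsSketch {
    private final String prefix;
    private final Map<String, Long> counters = new HashMap<>();
    private final Map<String, Long> latencyTotals = new HashMap<>();

    MetricsSketch(String prefix) { this.prefix = prefix; }

    // Record one request outcome under the configured metric prefix.
    void recordRequest(long latencyMillis, boolean error) {
        counters.merge(prefix + ".requests", 1L, Long::sum);
        if (error) counters.merge(prefix + ".errors", 1L, Long::sum);
        latencyTotals.merge(prefix + ".latency_ms", latencyMillis, Long::sum);
    }

    // Average latency across all recorded requests, in milliseconds.
    double meanLatencyMillis() {
        long n = counters.getOrDefault(prefix + ".requests", 0L);
        return n == 0 ? 0 : (double) latencyTotals.getOrDefault(prefix + ".latency_ms", 0L) / n;
    }

    public static void main(String[] args) {
        MetricsSketch m = new MetricsSketch("myapp.ml.inference");
        m.recordRequest(12, false);
        m.recordRequest(20, false);
        m.recordRequest(40, true);
        System.out.println(m.meanLatencyMillis()); // 24.0
    }
}
```

A production collector would additionally keep histograms for percentile latencies and export snapshots on the configured interval.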
⚡ Performance Architecture
Optimization Features
Async Processing
Non-blocking I/O operations with configurable parallelism and backpressure control.
Intelligent Batching
Dynamic batching based on load with timeout and size-based triggers.
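Dynamic batching with size- and timeout-based triggers can be sketched as follows; this is a simplified, single-threaded illustration of the two triggers just described (with an injected clock for clarity), not the library's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatcherSketch {
    private final int maxSize;
    private final long maxWaitMillis;
    private final Consumer<List<String>> flushFn;
    private final List<String> pending = new ArrayList<>();
    private long firstArrival = -1;

    BatcherSketch(int maxSize, long maxWaitMillis, Consumer<List<String>> flushFn) {
        this.maxSize = maxSize;
        this.maxWaitMillis = maxWaitMillis;
        this.flushFn = flushFn;
    }

    // Add a request; flush when the batch is full (size trigger) or the
    // oldest request has waited past the timeout (timeout trigger).
    void submit(String request, long nowMillis) {
        if (pending.isEmpty()) firstArrival = nowMillis;
        pending.add(request);
        if (pending.size() >= maxSize || nowMillis - firstArrival >= maxWaitMillis) {
            flushFn.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        BatcherSketch b = new BatcherSketch(3, 100, batches::add);
        b.submit("r1", 0);
        b.submit("r2", 10);
        b.submit("r3", 20);   // size trigger: batch of 3 flushes
        b.submit("r4", 30);
        b.submit("r5", 140);  // timeout trigger: r4 has waited 110 ms > 100 ms
        System.out.println(batches); // [[r1, r2, r3], [r4, r5]]
    }
}
```

Under load the size trigger dominates (large batches, high throughput); when traffic is sparse the timeout caps per-request latency.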
Multi-Level Caching
Model, result, and feature caching with TTL and eviction policies.
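The TTL-plus-eviction combination can be sketched with a `LinkedHashMap` in access order, which gives LRU eviction when capacity is exceeded while each entry carries its own expiry. A minimal, thread-unsafe sketch (plain Java, not the library's cache):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TtlCacheSketch<K, V> {
    private record Entry<V>(V value, long expiresAt) {}

    private final long ttlMillis;
    private final int maxEntries;
    // access-order LinkedHashMap gives LRU eviction when capacity is exceeded
    private final LinkedHashMap<K, Entry<V>> map;

    public TtlCacheSketch(long ttlMillis, int maxEntries) {
        this.ttlMillis = ttlMillis;
        this.maxEntries = maxEntries;
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> e) {
                return size() > TtlCacheSketch.this.maxEntries;
            }
        };
    }

    public void put(K key, V value, long nowMillis) {
        map.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    // Return the cached value, or null if absent or past its TTL.
    public V get(K key, long nowMillis) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (nowMillis >= e.expiresAt()) { map.remove(key); return null; }
        return e.value();
    }

    public static void main(String[] args) {
        TtlCacheSketch<String, Double> cache = new TtlCacheSketch<>(1000, 2);
        cache.put("score:user42", 0.87, 0);
        System.out.println(cache.get("score:user42", 500));  // 0.87 (still fresh)
        System.out.println(cache.get("score:user42", 1500)); // null (expired)
    }
}
```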
Resource Pooling
Efficient reuse of model sessions and connection pools for reduced overhead.
```java
// Performance Configuration Example
InferenceConfig config = InferenceConfig.builder()
    .modelConfig(modelConfig)
    .batchSize(32)                          // Optimal batch size
    .batchTimeout(Duration.ofMillis(100))   // Max wait before a partial batch flushes
    .enableCaching(true)                    // Enable result caching
    .cacheSize(10000)                       // Max cache entries
    .cacheTtl(Duration.ofMinutes(10))       // Cache TTL
    .parallelism(4)                         // Concurrent model instances
    .maxConcurrentRequests(100)             // Per-instance limit
    .queueSize(1000)                        // Request queue size
    .build();
```