otter-stream-onnx


High-performance ONNX Runtime integration for cross-platform neural network inference with GPU acceleration support.

Module Overview

The ONNX module provides production-ready integration with ONNX Runtime, supporting multiple execution providers (CPU, CUDA, TensorRT, DirectML) for high-performance neural network inference in streaming applications.


ONNX Inference Engine

Neural Networks

Production-ready ONNX Runtime inference with automatic provider selection and optimization.

  • CPU, CUDA, TensorRT, DirectML execution providers
  • Automatic tensor conversion and shape handling
  • Batch inference with configurable sizes
  • Thread-safe session management
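The "batch inference with configurable sizes" behavior above can be sketched as a helper that splits a large flat input into fixed-size batches. This is an illustrative sketch only; `BatchSplitter` and `splitIntoBatches` are hypothetical names, not part of the otter-stream-onnx API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of configurable batch sizing: split N samples,
// each of `sampleSize` floats, into chunks of at most `batchSize` samples.
// Names here are assumptions for the example, not the module's API.
public class BatchSplitter {
    public static List<float[]> splitIntoBatches(float[] samples, int sampleSize, int batchSize) {
        int total = samples.length / sampleSize;  // number of samples in the flat array
        List<float[]> batches = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            int count = Math.min(batchSize, total - start);
            float[] batch = new float[count * sampleSize];
            System.arraycopy(samples, start * sampleSize, batch, 0, batch.length);
            batches.add(batch);
        }
        return batches;
    }
}
```

The last batch may be smaller than `batchSize`, which is why a configurable rather than fixed batch shape matters for dynamic-axis ONNX models.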

Session Management

Core

Efficient ONNX Runtime session lifecycle management with metadata extraction.

  • InferenceSession - Wrapper for OrtSession
  • Load from file paths or byte arrays
  • Input/output metadata retrieval
  • Automatic resource cleanup
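The "automatic resource cleanup" point above follows the standard try-with-resources pattern: a session wrapper implements `AutoCloseable` so the native ONNX Runtime handle is released even when inference throws. The stub below is a minimal sketch of that pattern; `ManagedSession` stands in for the real `InferenceSession` and its fields are illustrative.

```java
// Sketch of the automatic-cleanup pattern: an AutoCloseable session wrapper.
// This stub stands in for the real InferenceSession; in real code the
// constructor would create an OrtSession and close() would release it.
public class ManagedSession implements AutoCloseable {
    private final String modelPath;
    private boolean closed = false;

    public ManagedSession(String modelPath) {
        this.modelPath = modelPath;  // hypothetical: real code loads the model here
    }

    public boolean isClosed() { return closed; }

    @Override
    public void close() {
        closed = true;               // hypothetical: real code frees native resources here
    }
}

// Usage: the session is released when the block exits, even on exceptions.
// try (ManagedSession session = new ManagedSession("model.onnx")) { ... }
```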

Implementing ONNX Inference

Step-by-step guide to integrate ONNX models into your Flink pipeline.

  1. Add Maven Dependency
    <dependency>
        <groupId>com.codedstreams</groupId>
        <artifactId>otter-stream-onnx</artifactId>
        <version>1.0.16</version>
    </dependency>
  2. Export Your PyTorch/TensorFlow Model to ONNX
    # PyTorch to ONNX (Python; assumes `model` is your trained torch.nn.Module)
    import torch
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy_input, "model.onnx")
    
    # TensorFlow to ONNX (shell command; requires the tf2onnx package)
    python -m tf2onnx.convert --saved-model tensorflow_model \
        --output model.onnx
  3. Configure and Initialize Engine
    ModelConfig modelConfig = ModelConfig.builder()
        .modelId("image-classifier")
        .modelPath("s3://my-bucket/models/resnet50.onnx")
        .format(ModelFormat.ONNX)
        .modelOptions(Map.of(
            "providers", List.of("CUDAExecutionProvider", "CPUExecutionProvider"),
            "intra_op_num_threads", 4
        ))
        .build();
    
    OnnxInferenceEngine engine = new OnnxInferenceEngine();
    engine.initialize(modelConfig);
  4. Perform Inference
    Map<String, Object> inputs = new HashMap<>();
    inputs.put("input", imagePixels);  // float[] array matching the model's input shape
    
    InferenceResult result = engine.infer(inputs);
    float[] predictions = result.getOutput("output");
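The `imagePixels` array in step 4 must be flattened in the layout the model expects; for the export example above that is NCHW with shape (1, 3, 224, 224), while decoded images are usually interleaved HWC. A sketch of that conversion (method name and lack of normalization are assumptions for the example):

```java
// Illustrative conversion from interleaved HWC pixels (r,g,b per pixel)
// to the flat NCHW float[] layout ONNX image models typically expect.
public class TensorLayout {
    public static float[] hwcToNchw(float[] hwc, int height, int width, int channels) {
        float[] nchw = new float[hwc.length];
        for (int c = 0; c < channels; c++) {
            for (int h = 0; h < height; h++) {
                for (int w = 0; w < width; w++) {
                    // gather channel c across all pixels into one contiguous plane
                    nchw[c * height * width + h * width + w] =
                        hwc[(h * width + w) * channels + c];
                }
            }
        }
        return nchw;
    }
}
```

Real pipelines typically also normalize pixel values (e.g. mean/std scaling) before inference; that step is model-specific and omitted here.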

Supported Execution Providers

| Provider                  | Platform       | Hardware               | Performance         |
|---------------------------|----------------|------------------------|---------------------|
| CUDAExecutionProvider     | Linux, Windows | NVIDIA GPU             | Highest             |
| TensorRTExecutionProvider | Linux, Windows | NVIDIA GPU             | Highest (Optimized) |
| CPUExecutionProvider      | All            | CPU                    | Good                |
| DirectMLExecutionProvider | Windows        | GPU (AMD/Intel/NVIDIA) | High                |
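The provider list passed in the `ModelConfig` above is ordered by preference, with CPU as the universal fallback. The selection logic can be sketched as first-available matching; `ProviderSelector` and `select` are hypothetical names for illustration, not the module's API.

```java
import java.util.List;
import java.util.Set;

// Sketch of ordered provider fallback: return the first preferred provider
// that is available at runtime, defaulting to CPU, which is always present.
public class ProviderSelector {
    public static String select(List<String> preferred, Set<String> available) {
        for (String provider : preferred) {
            if (available.contains(provider)) {
                return provider;
            }
        }
        return "CPUExecutionProvider";  // CPU execution is always available
    }
}
```

This mirrors how a preference list like `["CUDAExecutionProvider", "CPUExecutionProvider"]` degrades gracefully on machines without an NVIDIA GPU.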

Maven Dependency

<dependency>
    <groupId>com.codedstreams</groupId>
    <artifactId>otter-stream-onnx</artifactId>
    <version>1.0.16</version>
</dependency>