Class OnnxInferenceEngine

  • All Implemented Interfaces:
    InferenceEngine<ai.onnxruntime.OrtSession>

    public class OnnxInferenceEngine
    extends LocalInferenceEngine<ai.onnxruntime.OrtSession>
    ONNX Runtime implementation of LocalInferenceEngine for ML inference.

    This engine provides inference capabilities for ONNX models using the ONNX Runtime library. It supports both single and batch inference and handles the common tensor element types (float, int, long, double, string, boolean).

    Supported Features:

    • Single & Batch Inference: Process individual inputs or batches
    • Multiple Data Types: Float, Int, Long, Double, String, Boolean tensors
    • Thread Optimization: Configurable inter/intra-op threads
    • Automatic Cleanup: Proper resource management and cleanup
    • Shape Validation: Optional tensor shape validation

    Performance Configuration:

    
     ModelConfig config = ModelConfig.builder()
         .modelPath("model.onnx")
         .modelOption("interOpThreads", 2)        // parallelism across independent graph nodes
         .modelOption("intraOpThreads", 4)        // parallelism within a single operator
         .modelOption("optimizationLevel", "all") // enable all graph optimizations
         .build();
    
     OnnxInferenceEngine engine = new OnnxInferenceEngine();
     engine.initialize(config);
     

    Inference Example:

    
     Map<String, Object> input = new HashMap<>();
     input.put("input_ids", new int[]{1, 2, 3, 4});
     input.put("attention_mask", new int[]{1, 1, 1, 1});
    
     InferenceResult result = engine.infer(input);
     float[] predictions = result.getOutput("logits");
     
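    A common next step after retrieving the logits is an argmax to pick the predicted class. This sketch is plain Java with no engine dependency; the logits values are made up for illustration:

    ```java
    public class ArgmaxDemo {
        // Returns the index of the largest logit, i.e. the predicted class.
        static int argmax(float[] logits) {
            int best = 0;
            for (int i = 1; i < logits.length; i++) {
                if (logits[i] > logits[best]) {
                    best = i;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // In real code this array would come from result.getOutput("logits").
            float[] logits = {0.1f, 2.3f, 0.4f};
            System.out.println(argmax(logits)); // prints 1
        }
    }
    ```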

    Batch Inference:

    
     @SuppressWarnings("unchecked") // generic array creation
     Map<String, Object>[] batch = new Map[32];
     // ... populate batch
    
     InferenceResult batchResult = engine.inferBatch(batch);
     // Process batch outputs
     
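    Populating the batch is ordinary map construction; each element uses the same input names the model expects. A minimal sketch (the input names and token values are illustrative, not part of this API):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class BatchBuildDemo {
        public static void main(String[] args) {
            @SuppressWarnings("unchecked") // generic array creation
            Map<String, Object>[] batch = new Map[32];
            for (int i = 0; i < batch.length; i++) {
                Map<String, Object> input = new HashMap<>();
                input.put("input_ids", new int[]{1, 2, 3, 4});      // illustrative tokens
                input.put("attention_mask", new int[]{1, 1, 1, 1});
                batch[i] = input;
            }
            // batch is now ready to pass to engine.inferBatch(batch)
            System.out.println(batch.length); // prints 32
        }
    }
    ```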

    Tensor Type Support:

    Java Type    ONNX Type   Supported Shapes
    float[]      FLOAT       1D, 2D arrays
    int[]        INT32       1D, 2D arrays
    long[]       INT64       1D, 2D arrays
    double[]     DOUBLE      1D, 2D arrays
    String[]     STRING      1D, 2D arrays
    boolean[]    BOOL        1D, 2D arrays

    Thread Safety:

    This class is not thread-safe for concurrent inference calls. For multi-threaded scenarios, create separate engine instances or synchronize access to infer(java.util.Map<java.lang.String, java.lang.Object>) and inferBatch(java.util.Map<java.lang.String, java.lang.Object>[]) methods.
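    One way to serialize access from multiple threads is a small synchronized wrapper around a shared engine. The Engine stub below is a hypothetical stand-in for OnnxInferenceEngine so the sketch stays self-contained; in real code you would wrap the actual engine instance:

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedEngineSketch {
        // Hypothetical stand-in for the engine's infer(Map) call.
        static class Engine {
            Map<String, Object> infer(Map<String, Object> input) {
                Map<String, Object> out = new HashMap<>();
                out.put("logits", new float[]{0.1f, 0.9f});
                return out;
            }
        }

        private final Engine engine = new Engine();
        private final Object lock = new Object();

        Map<String, Object> inferSafely(Map<String, Object> input) {
            synchronized (lock) { // at most one inference call at a time
                return engine.infer(input);
            }
        }

        public static void main(String[] args) {
            SynchronizedEngineSketch s = new SynchronizedEngineSketch();
            Map<String, Object> out = s.inferSafely(new HashMap<>());
            System.out.println(out.containsKey("logits")); // prints true
        }
    }
    ```

    The alternative named above, one engine instance per thread, avoids the lock entirely at the cost of loading the model once per thread.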

    Resource Management:

    Always call close() when finished with the engine to release native resources. The engine implements AutoCloseable for use with try-with-resources.
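    Because the engine implements AutoCloseable, try-with-resources guarantees close() runs even if inference throws. A self-contained sketch (the Engine stub is hypothetical, standing in for OnnxInferenceEngine):

    ```java
    public class TryWithResourcesSketch {
        // Hypothetical AutoCloseable stub; the real engine releases
        // native ONNX Runtime resources in close().
        static class Engine implements AutoCloseable {
            @Override
            public void close() {
                System.out.println("resources released");
            }
        }

        public static void main(String[] args) {
            try (Engine engine = new Engine()) {
                // engine.initialize(config); engine.infer(input); ...
            } // close() is invoked here automatically, even on exceptions
        }
    }
    ```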

    Since:
    1.0.0
    Author:
    Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
    See Also:
    LocalInferenceEngine, InferenceSession, OnnxModelLoader