Class OnnxInferenceEngine
- java.lang.Object
-
- com.codedstream.otterstream.inference.engine.LocalInferenceEngine<ai.onnxruntime.OrtSession>
-
- com.codedstream.otterstream.onnx.OnnxInferenceEngine
-
- All Implemented Interfaces:
InferenceEngine<ai.onnxruntime.OrtSession>
public class OnnxInferenceEngine extends LocalInferenceEngine<ai.onnxruntime.OrtSession>
ONNX Runtime implementation of LocalInferenceEngine for ML inference.

This engine provides inference capabilities for ONNX models using the ONNX Runtime library. It supports both single and batch inference with comprehensive type handling for various tensor data types (float, int, long, double, string, boolean).
Supported Features:
- Single & Batch Inference: Process individual inputs or batches
- Multiple Data Types: Float, Int, Long, Double, String, Boolean tensors
- Thread Optimization: Configurable inter/intra-op threads
- Automatic Cleanup: Proper resource management and cleanup
- Shape Validation: Optional tensor shape validation
Performance Configuration:
ModelConfig config = ModelConfig.builder()
    .modelPath("model.onnx")
    .modelOption("interOpThreads", 2)
    .modelOption("intraOpThreads", 4)
    .modelOption("optimizationLevel", "all")
    .build();

OnnxInferenceEngine engine = new OnnxInferenceEngine();
engine.initialize(config);

Inference Example:
Map<String, Object> input = new HashMap<>();
input.put("input_ids", new int[]{1, 2, 3, 4});
input.put("attention_mask", new int[]{1, 1, 1, 1});

InferenceResult result = engine.infer(input);
float[] predictions = result.getOutput("logits");

Batch Inference:
Map<String, Object>[] batch = new Map[32];
// ... populate batch
InferenceResult batchResult = engine.inferBatch(batch);
// Process batch outputs

Tensor Type Support:
Java Type    ONNX Type    Supported Shapes
float[]      FLOAT        1D, 2D arrays
int[]        INT32        1D, 2D arrays
long[]       INT64        1D, 2D arrays
double[]     DOUBLE       1D, 2D arrays
String[]     STRING       1D, 2D arrays
boolean[]    BOOL         1D, 2D arrays

Thread Safety:
This class is not thread-safe for concurrent inference calls. For multi-threaded scenarios, create separate engine instances or synchronize access to the infer(java.util.Map<java.lang.String, java.lang.Object>) and inferBatch(java.util.Map<java.lang.String, java.lang.Object>[]) methods.

Resource Management:
Always call close() when finished with the engine to release native resources. The engine implements AutoCloseable for use with try-with-resources.
- Since:
- 1.0.0
- Author:
- Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
- See Also:
LocalInferenceEngine, InferenceSession, OnnxModelLoader
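Since the class is not thread-safe for concurrent inference, one common pattern is one engine per worker thread. The sketch below shows only the ThreadLocal wiring, using a hypothetical stand-in class because constructing a real OnnxInferenceEngine requires a model file; a real supplier would create the engine and call initialize(config).

```java
public class PerThreadEngineSketch {
    // Stand-in for OnnxInferenceEngine; a real supplier would also
    // build a ModelConfig and call engine.initialize(config).
    static class StubEngine {}

    static final ThreadLocal<StubEngine> ENGINE =
            ThreadLocal.withInitial(StubEngine::new);

    // Returns true if two threads each received their own engine instance.
    public static boolean distinctPerThread() throws InterruptedException {
        StubEngine[] seen = new StubEngine[2];
        Thread t1 = new Thread(() -> seen[0] = ENGINE.get());
        Thread t2 = new Thread(() -> seen[1] = ENGINE.get());
        t1.start(); t2.start();
        t1.join(); t2.join();
        return seen[0] != null && seen[1] != null && seen[0] != seen[1];
    }

    public static void main(String[] args) throws InterruptedException {
        // Each thread gets a private instance, so no synchronization is
        // needed around infer() / inferBatch() calls.
        System.out.println(distinctPerThread()); // true
    }
}
```

This avoids lock contention on a shared engine at the cost of one loaded model per thread; the alternative mentioned above, synchronizing access to a single engine, trades throughput for memory.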
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.codedstream.otterstream.inference.engine.InferenceEngine
InferenceEngine.EngineCapabilities
-
-
Field Summary
-
Fields inherited from class com.codedstream.otterstream.inference.engine.LocalInferenceEngine
initialized, loadedModel, modelConfig, modelLoader
-
-
Constructor Summary
Constructors

OnnxInferenceEngine()
-
Method Summary
void close()
    Closes the engine and releases all native resources.
InferenceEngine.EngineCapabilities getCapabilities()
    Gets the engine's capabilities.
ModelMetadata getMetadata()
    Gets metadata about the loaded model.
InferenceResult infer(Map<String,Object> inputs)
    Performs single inference on the provided inputs.
InferenceResult inferBatch(Map<String,Object>[] batchInputs)
    Performs batch inference on multiple input sets.
void initialize(ModelConfig config)
    Initializes the ONNX inference engine with the provided configuration.
-
Methods inherited from class com.codedstream.otterstream.inference.engine.LocalInferenceEngine
getModelConfig, isReady, loadModelDirectly
-
-
-
-
Method Detail
-
initialize
public void initialize(ModelConfig config) throws InferenceException
Initializes the ONNX inference engine with the provided configuration.
- Specified by:
  initialize in interface InferenceEngine<ai.onnxruntime.OrtSession>
- Overrides:
  initialize in class LocalInferenceEngine<ai.onnxruntime.OrtSession>
- Parameters:
  config - model configuration containing path and runtime options
- Throws:
  InferenceException - if initialization fails
-
infer
public InferenceResult infer(Map<String,Object> inputs) throws InferenceException
Performs single inference on the provided inputs.- Specified by:
inferin interfaceInferenceEngine<ai.onnxruntime.OrtSession>- Specified by:
inferin classLocalInferenceEngine<ai.onnxruntime.OrtSession>- Parameters:
inputs- map of input names to values (arrays of supported types)- Returns:
- inference result containing outputs and timing information
- Throws:
InferenceException- if inference fails
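As a sketch of the type handling described in the class summary, an input map may mix any of the supported array types. The tensor names below are hypothetical; real names must match the ONNX model's declared graph inputs, and only the map construction is exercised here (no engine is created).

```java
import java.util.HashMap;
import java.util.Map;

public class InferInputSketch {
    // Builds an input map mixing supported tensor types
    // (FLOAT, INT64, STRING, BOOL per the type-support table).
    public static Map<String, Object> buildInputs() {
        Map<String, Object> inputs = new HashMap<>();
        inputs.put("features", new float[]{0.1f, 0.2f, 0.3f}); // FLOAT
        inputs.put("ids", new long[]{7L, 8L});                 // INT64
        inputs.put("labels", new String[]{"a", "b"});          // STRING
        inputs.put("mask", new boolean[]{true, false});        // BOOL
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, Object> inputs = buildInputs();
        System.out.println(inputs.size()); // 4
        // With a live, initialized engine this map would be passed directly:
        // InferenceResult result = engine.infer(inputs);
    }
}
```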
-
inferBatch
public InferenceResult inferBatch(Map<String,Object>[] batchInputs) throws InferenceException
Performs batch inference on multiple input sets.

All inputs in the batch must have identical structure (same input names and compatible data types). This method is optimized for throughput by processing multiple inputs in a single ONNX Runtime call.
- Specified by:
  inferBatch in interface InferenceEngine<ai.onnxruntime.OrtSession>
- Specified by:
  inferBatch in class LocalInferenceEngine<ai.onnxruntime.OrtSession>
- Parameters:
  batchInputs - array of input maps, each representing one sample
- Returns:
  inference result containing batch outputs
- Throws:
  InferenceException - if batch inference fails
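The identical-structure requirement can be satisfied by building every sample through the same code path. The sketch below constructs a uniform batch using only the JDK; input names are hypothetical, and the commented line shows where the engine call would go.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchInputSketch {
    // Builds a batch where every element has identical structure,
    // as inferBatch requires (same input names, compatible types).
    @SuppressWarnings("unchecked")
    public static Map<String, Object>[] buildBatch(int batchSize) {
        Map<String, Object>[] batch = new Map[batchSize];
        for (int i = 0; i < batchSize; i++) {
            Map<String, Object> sample = new HashMap<>();
            // Hypothetical names; must match the model's graph inputs.
            sample.put("input_ids", new long[]{1, 2, 3, 4});
            sample.put("attention_mask", new long[]{1, 1, 1, 1});
            batch[i] = sample;
        }
        return batch;
    }

    public static void main(String[] args) {
        Map<String, Object>[] batch = buildBatch(32);
        // Every sample exposes the same key set, so the whole batch can be
        // processed in a single ONNX Runtime call:
        // InferenceResult r = engine.inferBatch(batch);
        System.out.println(batch.length); // 32
        System.out.println(batch[0].keySet().equals(batch[31].keySet())); // true
    }
}
```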
-
getCapabilities
public InferenceEngine.EngineCapabilities getCapabilities()
Gets the engine's capabilities.
- Specified by:
  getCapabilities in interface InferenceEngine<ai.onnxruntime.OrtSession>
- Specified by:
  getCapabilities in class LocalInferenceEngine<ai.onnxruntime.OrtSession>
- Returns:
  engine capabilities including batch support and max batch size
-
close
public void close() throws InferenceException
Closes the engine and releases all native resources.
- Specified by:
  close in interface InferenceEngine<ai.onnxruntime.OrtSession>
- Overrides:
  close in class LocalInferenceEngine<ai.onnxruntime.OrtSession>
- Throws:
  InferenceException - if resource cleanup fails
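Because the engine implements AutoCloseable, try-with-resources guarantees close() runs even when inference throws. The demonstration below uses a hypothetical stand-in with the same close() contract, since a real OnnxInferenceEngine needs a model file to construct.

```java
public class CloseSketch {
    // Stand-in sharing the AutoCloseable contract of OnnxInferenceEngine.
    static class StubEngine implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() { closed = true; }
    }

    // Returns true if close() was invoked automatically by
    // try-with-resources, with no explicit call in the body.
    public static boolean demoClose() {
        StubEngine ref;
        try (StubEngine engine = new StubEngine()) {
            ref = engine;
            // engine.initialize(config); engine.infer(inputs); ...
        }
        return ref.closed;
    }

    public static void main(String[] args) {
        System.out.println(demoClose()); // true
    }
}
```

With the real engine, the same shape releases the native ONNX Runtime session deterministically instead of waiting for garbage collection.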
-
getMetadata
public ModelMetadata getMetadata()
Gets metadata about the loaded model.
- Returns:
  model metadata (this implementation currently returns null; override to supply metadata)
-
-