Class OnnxInferenceEngine

  • All Implemented Interfaces:
    InferenceEngine<ai.onnxruntime.OrtSession>

    public class OnnxInferenceEngine
    extends LocalInferenceEngine<ai.onnxruntime.OrtSession>
    ONNX Runtime implementation of LocalInferenceEngine for ML inference.

    This engine provides inference capabilities for ONNX models using the ONNX Runtime library. It supports both single and batch inference and handles the common tensor element types (float, int, long, double, string, boolean).

    Supported Features:

    • Single & Batch Inference: Process individual inputs or batches
    • Multiple Data Types: Float, Int, Long, Double, String, Boolean tensors
    • Thread Optimization: Configurable inter/intra-op threads
    • Automatic Cleanup: Proper resource management and cleanup
    • Shape Validation: Optional tensor shape validation

    Performance Configuration:

    
     ModelConfig config = ModelConfig.builder()
         .modelPath("model.onnx")
         .modelOption("interOpThreads", 2)        // parallelism across independent graph nodes
         .modelOption("intraOpThreads", 4)        // parallelism within a single operator
         .modelOption("optimizationLevel", "all") // enable all graph optimizations
         .build();
    
     OnnxInferenceEngine engine = new OnnxInferenceEngine();
     engine.initialize(config);
     

    Inference Example:

    
     Map<String, Object> input = new HashMap<>();
     input.put("input_ids", new int[]{1, 2, 3, 4});
     input.put("attention_mask", new int[]{1, 1, 1, 1});
    
     InferenceResult result = engine.infer(input);
     float[] predictions = result.getOutput("logits");
     
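    A common next step after retrieving the logits is an argmax to pick the predicted class. This sketch is plain Java with no engine dependency; the logits values are made up for illustration:

    ```java
    public class ArgmaxDemo {
        // Returns the index of the largest logit, i.e. the predicted class.
        static int argmax(float[] logits) {
            int best = 0;
            for (int i = 1; i < logits.length; i++) {
                if (logits[i] > logits[best]) {
                    best = i;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // In real code this array would come from result.getOutput("logits").
            float[] logits = {0.1f, 2.3f, 0.4f};
            System.out.println(argmax(logits)); // prints 1
        }
    }
    ```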

    Batch Inference:

    
     @SuppressWarnings("unchecked") // generic array creation
     Map<String, Object>[] batch = new Map[32];
     // ... populate batch
    
     InferenceResult batchResult = engine.inferBatch(batch);
     // Process batch outputs
     
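    Populating the batch is ordinary map construction; each element uses the same input names the model expects. A minimal sketch (the input names and token values are illustrative, not part of this API):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class BatchBuildDemo {
        public static void main(String[] args) {
            @SuppressWarnings("unchecked") // generic array creation
            Map<String, Object>[] batch = new Map[32];
            for (int i = 0; i < batch.length; i++) {
                Map<String, Object> input = new HashMap<>();
                input.put("input_ids", new int[]{1, 2, 3, 4});      // illustrative tokens
                input.put("attention_mask", new int[]{1, 1, 1, 1});
                batch[i] = input;
            }
            // batch is now ready to pass to engine.inferBatch(batch)
            System.out.println(batch.length); // prints 32
        }
    }
    ```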

    Tensor Type Support:

    Java Type    ONNX Type   Supported Shapes
    float[]      FLOAT       1D, 2D arrays
    int[]        INT32       1D, 2D arrays
    long[]       INT64       1D, 2D arrays
    double[]     DOUBLE      1D, 2D arrays
    String[]     STRING      1D, 2D arrays
    boolean[]    BOOL        1D, 2D arrays

    Thread Safety:

    This class is not thread-safe for concurrent inference calls. For multi-threaded scenarios, create separate engine instances or synchronize access to infer(java.util.Map<java.lang.String, java.lang.Object>) and inferBatch(java.util.Map<java.lang.String, java.lang.Object>[]) methods.
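    One way to serialize access from multiple threads is a small synchronized wrapper around a shared engine. The Engine stub below is a hypothetical stand-in for OnnxInferenceEngine so the sketch stays self-contained; in real code you would wrap the actual engine instance:

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedEngineSketch {
        // Hypothetical stand-in for the engine's infer(Map) call.
        static class Engine {
            Map<String, Object> infer(Map<String, Object> input) {
                Map<String, Object> out = new HashMap<>();
                out.put("logits", new float[]{0.1f, 0.9f});
                return out;
            }
        }

        private final Engine engine = new Engine();
        private final Object lock = new Object();

        Map<String, Object> inferSafely(Map<String, Object> input) {
            synchronized (lock) { // at most one inference call at a time
                return engine.infer(input);
            }
        }

        public static void main(String[] args) {
            SynchronizedEngineSketch s = new SynchronizedEngineSketch();
            Map<String, Object> out = s.inferSafely(new HashMap<>());
            System.out.println(out.containsKey("logits")); // prints true
        }
    }
    ```

    The alternative named above, one engine instance per thread, avoids the lock entirely at the cost of loading the model once per thread.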

    Resource Management:

    Always call close() when finished with the engine to release native resources. The engine implements AutoCloseable for use with try-with-resources.
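    Because the engine implements AutoCloseable, try-with-resources guarantees close() runs even if inference throws. A self-contained sketch (the Engine stub is hypothetical, standing in for OnnxInferenceEngine):

    ```java
    public class TryWithResourcesSketch {
        // Hypothetical AutoCloseable stub; the real engine releases
        // native ONNX Runtime resources in close().
        static class Engine implements AutoCloseable {
            @Override
            public void close() {
                System.out.println("resources released");
            }
        }

        public static void main(String[] args) {
            try (Engine engine = new Engine()) {
                // engine.initialize(config); engine.infer(input); ...
            } // close() is invoked here automatically, even on exceptions
        }
    }
    ```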

    Since:
    1.0.0
    Author:
    Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
    See Also:
    LocalInferenceEngine, InferenceSession, OnnxModelLoader