Class TorchScriptInferenceEngine
-
- All Implemented Interfaces:
InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
public class TorchScriptInferenceEngine extends LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
TorchScript (PyTorch) implementation of LocalInferenceEngine using Deep Java Library (DJL). This engine provides inference capabilities for PyTorch models saved in TorchScript format. It leverages DJL's PyTorch engine to load and execute models with automatic GPU acceleration when available. The engine handles PyTorch's dynamic graph execution and tensor operations.
Supported PyTorch Features:
- TorchScript Models: PyTorch models exported via torch.jit.trace or torch.jit.script
- Data Types: Float and integer tensors with automatic dimension handling
- Batch Dimension: Automatic addition of batch dimension via expandDims(0)
- GPU Acceleration: Automatic GPU detection and execution when available
Model Loading:
ModelConfig config = ModelConfig.builder()
    .modelPath("model.pt")      // TorchScript model file
    .modelId("pytorch-model")
    .build();
TorchScriptInferenceEngine engine = new TorchScriptInferenceEngine();
engine.initialize(config);

Inference Example:

Map<String, Object> inputs = new HashMap<>();
inputs.put("input1", new float[]{0.1f, 0.2f, 0.3f});
inputs.put("input2", new int[]{1, 2, 3});
InferenceResult result = engine.infer(inputs);
float[] predictions = (float[]) result.getOutput("output_0");

Input Processing:
The TorchScriptInferenceEngine.MapTranslator automatically processes inputs:
- float[]: Converts to FloatTensor with added batch dimension
- int[]: Converts to IntTensor with added batch dimension
- Dimension Expansion: Adds batch dimension via expandDims(0)
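The batch-dimension handling described above can be illustrated without DJL. The following sketch is a plain-Java analog (class and method names are illustrative, not part of the engine's API) of what expandDims(0) does conceptually: a shape-[n] input becomes shape-[1, n] so the model always sees a leading batch axis.

```java
import java.util.Arrays;

// Plain-Java sketch (no DJL) of the translator's batch-dimension handling:
// expandDims(0) turns a shape-[n] array into shape-[1, n].
public class BatchDimSketch {

    // float[] input -> float[1][n], mirroring NDArray.expandDims(0)
    static float[][] expandDims0(float[] values) {
        return new float[][] { Arrays.copyOf(values, values.length) };
    }

    // int[] input -> int[1][n]
    static int[][] expandDims0(int[] values) {
        return new int[][] { Arrays.copyOf(values, values.length) };
    }

    public static void main(String[] args) {
        float[][] batched = expandDims0(new float[] {0.1f, 0.2f, 0.3f});
        // One batch row of three values
        System.out.println(batched.length + "x" + batched[0].length);
    }
}
```

In the real engine this expansion happens on DJL NDArrays inside the MapTranslator, not on raw Java arrays.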
Output Processing:
Outputs are automatically converted back to Java arrays:
- Tensor to Array: Converts DJL NDArrays to float arrays
- Named Outputs: Generates output names as "output_0", "output_1", etc.
- Type Preservation: Maintains original tensor data types
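The sequential naming convention above ("output_0", "output_1", ...) can be sketched as a small helper. This is an illustrative reconstruction of the documented behavior, not the engine's actual code; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the documented output-naming convention: each output tensor,
// in order, is exposed under a sequential key "output_0", "output_1", ...
public class OutputNamingSketch {

    static Map<String, float[]> nameOutputs(List<float[]> outputTensors) {
        Map<String, float[]> named = new LinkedHashMap<>(); // preserves order
        for (int i = 0; i < outputTensors.size(); i++) {
            named.put("output_" + i, outputTensors.get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        List<float[]> tensors = new ArrayList<>();
        tensors.add(new float[] {0.9f, 0.1f});
        tensors.add(new float[] {0.5f});
        System.out.println(nameOutputs(tensors).keySet());
    }
}
```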
Capabilities:
Feature            Supported  Notes
Batch Inference    Yes        Via batch dimension in tensors
Native Batching    Yes        Through DJL's batch processing
Max Batch Size     64         Configurable based on memory
GPU Support        Yes        Automatic CUDA detection via DJL
Dynamic Graphs     Yes        Supports TorchScript dynamic execution

Dependencies:
Requires DJL PyTorch engine:
- ai.djl:api (runtime)
- ai.djl.pytorch:pytorch-engine (runtime)
- ai.djl.pytorch:pytorch-native-auto (runtime)
Performance Features:
- Automatic GPU: DJL automatically uses GPU if CUDA is available
- Memory Management: NDManager for efficient tensor lifecycle
- Batch Optimization: Native batch processing through tensor operations
- Model Caching: DJL caches loaded models for repeated use
Thread Safety:
DJL Predictor instances are not thread-safe. For concurrent inference:
- Create separate engine instances per thread
- Use Predictor pooling for high-throughput scenarios
- Synchronize access to the infer(java.util.Map<java.lang.String, java.lang.Object>) method
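The "Predictor pooling" option above can be sketched with a small blocking pool. This is a hedged, generic illustration (the class name and structure are assumptions, not part of this library): each thread borrows exclusive use of one pooled object, which in practice would be a DJL Predictor created from the shared ZooModel.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal blocking pool for objects that are not thread-safe (such as a
// DJL Predictor). acquire() hands a thread exclusive use of one instance;
// release() returns it for reuse by other threads.
public class ResourcePool<T> {
    private final BlockingQueue<T> pool;

    public ResourcePool(int size, Supplier<T> factory) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(factory.get());
        }
    }

    // Borrow a resource; blocks until one is free.
    public T acquire() {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    // Return the resource so another thread can use it.
    public void release(T resource) {
        pool.offer(resource);
    }
}
```

A typical use would wrap every infer call in acquire/release, sizing the pool to the number of concurrent threads.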
Resource Management:
Always call close() to release native resources (GPU memory, file handles). The engine implements AutoCloseable for use with try-with-resources:

try (TorchScriptInferenceEngine engine = new TorchScriptInferenceEngine()) {
    engine.initialize(config);
    InferenceResult result = engine.infer(inputs);
}

- Since:
- 1.0.0
- Author:
- Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
- See Also:
LocalInferenceEngine, Predictor, ZooModel, Deep Java Library Documentation
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.codedstream.otterstream.inference.engine.InferenceEngine
InferenceEngine.EngineCapabilities
-
-
Field Summary
-
Fields inherited from class com.codedstream.otterstream.inference.engine.LocalInferenceEngine
initialized, loadedModel, modelConfig, modelLoader
-
-
Constructor Summary
Constructors

Constructor                      Description
TorchScriptInferenceEngine()
-
Method Summary
All Methods  Instance Methods  Concrete Methods

Modifier and Type                    Method                                         Description
void                                 close()                                        Closes the engine and releases all DJL and native resources.
InferenceEngine.EngineCapabilities   getCapabilities()                              Gets the engine's capabilities for PyTorch inference.
ModelMetadata                        getMetadata()                                  Gets metadata about the loaded PyTorch model.
InferenceResult                      infer(Map<String,Object> inputs)               Performs single inference on the provided inputs using the PyTorch model.
InferenceResult                      inferBatch(Map<String,Object>[] batchInputs)   Batch inference implementation.
void                                 initialize(ModelConfig config)                 Initializes the PyTorch inference engine by loading a TorchScript model.
Methods inherited from class com.codedstream.otterstream.inference.engine.LocalInferenceEngine
getModelConfig, isReady, loadModelDirectly
-
-
-
-
Method Detail
-
initialize
public void initialize(ModelConfig config) throws InferenceException
Initializes the PyTorch inference engine by loading a TorchScript model. The initialization process:
- Creates NDManager for tensor memory management
- Builds Criteria for model loading with Map-based I/O
- Configures TorchScriptInferenceEngine.MapTranslator for input/output processing
- Loads model using DJL's Criteria.loadModel()
- Creates Predictor for inference execution
DJL Automatic Features:
- Engine Selection: Automatically selects PyTorch engine
- GPU Detection: Uses CUDA if available, falls back to CPU
- Native Libraries: Loads PyTorch native libraries automatically
- Specified by:
  initialize in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Overrides:
  initialize in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Parameters:
  config - model configuration containing TorchScript model path
- Throws:
  InferenceException - if model loading fails or DJL is not properly configured
-
infer
public InferenceResult infer(Map<String,Object> inputs) throws InferenceException
Performs single inference on the provided inputs using the PyTorch model. The inference process:
- Inputs are processed by TorchScriptInferenceEngine.MapTranslator.processInput(ai.djl.translate.TranslatorContext, java.util.Map<java.lang.String, java.lang.Object>)
- Converted to DJL NDList with batch dimensions
- Executed through PyTorch engine (GPU if available)
- Outputs converted back to Map via TorchScriptInferenceEngine.MapTranslator.processOutput(ai.djl.translate.TranslatorContext, ai.djl.ndarray.NDList)
Input Requirements:
- float[] arrays: Converted to Float32 tensors
- int[] arrays: Converted to Int32 tensors
- Batch Dimension: Automatically added (expandDims(0))
Output Format:
Outputs are named sequentially as "output_0", "output_1", etc., containing float arrays extracted from output tensors.
- Specified by:
  infer in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Specified by:
  infer in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Parameters:
  inputs - map of input names to arrays (float[] or int[])
- Returns:
- inference result containing predictions and timing
- Throws:
InferenceException - if inference fails or inputs are invalid
-
inferBatch
public InferenceResult inferBatch(Map<String,Object>[] batchInputs) throws InferenceException
Batch inference implementation. TODO: Implement batch inference for PyTorch models. Potential approaches:
- Stack individual tensors into batch tensors
- Use DJL's batch predictor capabilities
- Implement custom batch translator
- Leverage PyTorch's native batch processing
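The first approach listed above (stacking individual tensors into batch tensors) can be sketched in plain Java. This is a hedged illustration of the idea, not an implementation of inferBatch; in the real engine the stacking would happen on DJL NDArrays for one input name across all requests.

```java
import java.util.Arrays;

// Sketch of the tensor-stacking approach for batch inference: per-request
// float[] inputs of shape [n] are stacked into a single [batch, n] array
// that a model could consume in one forward pass.
public class BatchStackSketch {

    static float[][] stack(float[][] rows) {
        int n = rows[0].length;
        float[][] batch = new float[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != n) {
                // Stacking requires every request to share one input shape.
                throw new IllegalArgumentException("all rows must share one shape");
            }
            batch[i] = Arrays.copyOf(rows[i], n);
        }
        return batch;
    }

    public static void main(String[] args) {
        float[][] batch = stack(new float[][] {{1f, 2f}, {3f, 4f}});
        System.out.println(batch.length + "x" + batch[0].length);
    }
}
```

After the single batched forward pass, the results would be split back into one InferenceResult per request.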
- Specified by:
  inferBatch in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Specified by:
  inferBatch in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Parameters:
  batchInputs - array of input maps for batch processing
- Returns:
- batch inference results (currently returns null)
- Throws:
InferenceException - not currently implemented
-
close
public void close() throws InferenceException

Closes the engine and releases all DJL and native resources. Releases resources in reverse initialization order:
- Predictor: Stops inference execution threads
- ZooModel: Unloads PyTorch model from memory
- NDManager: Releases all tensor memory (GPU/CPU)
- Calls parent cleanup
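The reverse-order teardown above follows the usual rule for dependent native resources: release in the opposite order of creation. A minimal sketch (the class name and string stand-ins are illustrative, not the engine's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of close() ordering: resources initialized as NDManager, ZooModel,
// Predictor are released as Predictor, ZooModel, NDManager, so nothing is
// torn down while a dependent object still needs it.
public class CloseOrderSketch {

    static List<String> closeInReverse(List<String> initOrder) {
        List<String> closed = new ArrayList<>();
        for (int i = initOrder.size() - 1; i >= 0; i--) {
            closed.add(initOrder.get(i)); // real code would call close() here
        }
        return closed;
    }

    public static void main(String[] args) {
        List<String> initOrder = List.of("NDManager", "ZooModel", "Predictor");
        System.out.println(closeInReverse(initOrder));
    }
}
```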
-
getMetadata
public ModelMetadata getMetadata()
Gets metadata about the loaded PyTorch model. TODO: Implement PyTorch metadata extraction via DJL. Potential metadata includes:
- Model architecture information
- Input/output tensor shapes and types
- GPU/CPU execution mode
- PyTorch version and model format
- Returns:
- model metadata (currently returns null, override for implementation)
-
getCapabilities
public InferenceEngine.EngineCapabilities getCapabilities()
Gets the engine's capabilities for PyTorch inference. PyTorch engine capabilities:
- Batch Inference: Supported through tensor batching
- Native Batching: Yes, via DJL's batch processing
- Max Batch Size: 64 (conservative default)
- GPU Support: Yes, automatic CUDA detection
- Specified by:
  getCapabilities in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Specified by:
  getCapabilities in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,Object>,Map<String,Object>>>
- Returns:
- engine capabilities indicating full PyTorch support
-
-