Class TorchScriptInferenceEngine

  • All Implemented Interfaces:
    InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>

    public class TorchScriptInferenceEngine
    extends LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>
    TorchScript (PyTorch) implementation of LocalInferenceEngine using Deep Java Library (DJL).

    This engine provides inference capabilities for PyTorch models saved in TorchScript format. It leverages DJL's PyTorch engine to load and execute models with automatic GPU acceleration when available. The engine handles PyTorch's dynamic graph execution and tensor operations.

    Supported PyTorch Features:

    • TorchScript Models: PyTorch models exported via torch.jit.trace or torch.jit.script
    • Data Types: Float and integer tensors with automatic dimension handling
    • Batch Dimension: Automatic addition of batch dimension via expandDims(0)
    • GPU Acceleration: Automatic GPU detection and execution when available

    Model Loading:

    
     ModelConfig config = ModelConfig.builder()
         .modelPath("model.pt")  // TorchScript model file
         .modelId("pytorch-model")
         .build();
    
     TorchScriptInferenceEngine engine = new TorchScriptInferenceEngine();
     engine.initialize(config);
     

    Inference Example:

    
     Map<String, Object> inputs = new HashMap<>();
     inputs.put("input1", new float[]{0.1f, 0.2f, 0.3f});
     inputs.put("input2", new int[]{1, 2, 3});
    
     InferenceResult result = engine.infer(inputs);
     float[] predictions = (float[]) result.getOutput("output_0");
     

    Input Processing:

    The TorchScriptInferenceEngine.MapTranslator automatically processes inputs:

    • float[]: Converts to FloatTensor with added batch dimension
    • int[]: Converts to IntTensor with added batch dimension
    • Dimension Expansion: Adds batch dimension via expandDims(0)
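
    The conversion described above can be illustrated with DJL's NDManager directly. This is a minimal sketch, not the engine's actual translator code; the array values are arbitrary:

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;

public class InputConversionSketch {
    public static void main(String[] args) {
        // The NDManager owns the native memory behind every NDArray it creates,
        // so it is closed in a try-with-resources block.
        try (NDManager manager = NDManager.newBaseManager()) {
            float[] raw = {0.1f, 0.2f, 0.3f};
            NDArray tensor = manager.create(raw);    // Float32 tensor, shape (3)
            NDArray batched = tensor.expandDims(0);  // batch dimension added, shape (1, 3)
            System.out.println(batched.getShape());
        }
    }
}
```

    An int[] input follows the same path through manager.create(int[]), yielding an Int32 tensor before the batch dimension is added.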

    Output Processing:

    Outputs are automatically converted back to Java arrays:

    • Tensor to Array: Converts DJL NDArrays to float arrays
    • Named Outputs: Generates output names as "output_0", "output_1", etc.
    • Type Preservation: Maintains original tensor data types
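
    The sequential naming scheme can be sketched in plain Java. The helper below is hypothetical (not the engine's actual code) and shows only how ordered output arrays map to "output_N" keys:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputNamingSketch {
    // Keys each output array as "output_0", "output_1", ... in tensor order.
    static Map<String, float[]> nameOutputs(List<float[]> outputs) {
        Map<String, float[]> named = new LinkedHashMap<>();
        for (int i = 0; i < outputs.size(); i++) {
            named.put("output_" + i, outputs.get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        List<float[]> outputs = new ArrayList<>();
        outputs.add(new float[]{0.9f, 0.1f});
        outputs.add(new float[]{0.5f});
        System.out.println(nameOutputs(outputs).keySet()); // [output_0, output_1]
    }
}
```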

    Capabilities:

    Feature          | Supported | Notes
    -----------------|-----------|----------------------------------------
    Batch Inference  | Yes       | Via batch dimension in tensors
    Native Batching  | Yes       | Through DJL's batch processing
    Max Batch Size   | 64        | Configurable based on memory
    GPU Support      | Yes       | Automatic CUDA detection via DJL
    Dynamic Graphs   | Yes       | Supports TorchScript dynamic execution

    Dependencies:

     Requires DJL PyTorch engine:
     - ai.djl:api (runtime)
     - ai.djl.pytorch:pytorch-engine (runtime)
     - ai.djl.pytorch:pytorch-native-auto (runtime)
     

    Performance Features:

    • Automatic GPU: DJL automatically uses GPU if CUDA is available
    • Memory Management: NDManager for efficient tensor lifecycle
    • Batch Optimization: Native batch processing through tensor operations
    • Model Caching: DJL caches loaded models for repeated use

    Thread Safety:

    DJL Predictor instances are not thread-safe. For concurrent inference, create a separate engine instance per thread (for example via a ThreadLocal), or serialize access to a shared instance with external synchronization.
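
    One common pattern is a ThreadLocal that gives each thread its own engine. The sketch below uses a stand-in NonThreadSafeEngine class to stay self-contained; in practice the supplier would construct and initialize a TorchScriptInferenceEngine:

```java
public class PerThreadEngineSketch {
    // Stand-in for a non-thread-safe engine such as TorchScriptInferenceEngine.
    static class NonThreadSafeEngine {}

    // Each thread lazily receives its own engine instance on first access.
    static final ThreadLocal<NonThreadSafeEngine> ENGINE =
            ThreadLocal.withInitial(NonThreadSafeEngine::new);

    public static void main(String[] args) throws InterruptedException {
        NonThreadSafeEngine[] seen = new NonThreadSafeEngine[2];
        Thread t1 = new Thread(() -> seen[0] = ENGINE.get());
        Thread t2 = new Thread(() -> seen[1] = ENGINE.get());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Distinct threads receive distinct instances.
        System.out.println(seen[0] != seen[1]); // true
    }
}
```

    Note that per-thread engines multiply native memory usage; a bounded pool of engines is an alternative when many threads are involved.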

    Resource Management:

    Always call close() to release native resources (GPU memory, file handles). The engine implements AutoCloseable for use with try-with-resources:

    
     try (TorchScriptInferenceEngine engine = new TorchScriptInferenceEngine()) {
         engine.initialize(config);
         InferenceResult result = engine.infer(inputs);
     }
     
    Since:
    1.0.0
    Author:
    Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
    See Also:
    LocalInferenceEngine, Predictor, ZooModel, Deep Java Library Documentation
    • Constructor Detail

      • TorchScriptInferenceEngine

        public TorchScriptInferenceEngine()
    • Method Detail

      • initialize

        public void initialize​(ModelConfig config)
                        throws InferenceException
        Initializes the PyTorch inference engine by loading a TorchScript model.

        The initialization process:

        1. Creates NDManager for tensor memory management
        2. Builds Criteria for model loading with Map-based I/O
        3. Configures TorchScriptInferenceEngine.MapTranslator for input/output processing
        4. Loads model using DJL's Criteria.loadModel()
        5. Creates Predictor for inference execution

        DJL Automatic Features:

        • Engine Selection: Automatically selects PyTorch engine
        • GPU Detection: Uses CUDA if available, falls back to CPU
        • Native Libraries: Loads PyTorch native libraries automatically
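
        The loading steps above correspond roughly to the following DJL calls. This is a sketch, not the engine's actual code: "model.pt" stands in for the configured model path, and mapTranslator for a Translator<Map<String,Object>, Map<String,Object>> such as the engine's MapTranslator.

```
Criteria<Map<String, Object>, Map<String, Object>> criteria = Criteria.builder()
        .setTypes((Class<Map<String, Object>>) (Class<?>) Map.class,
                  (Class<Map<String, Object>>) (Class<?>) Map.class)
        .optModelPath(Paths.get("model.pt"))   // configured TorchScript file
        .optEngine("PyTorch")                  // force the PyTorch engine
        .optTranslator(mapTranslator)          // Map <-> NDList conversion
        .build();

ZooModel<Map<String, Object>, Map<String, Object>> model = criteria.loadModel();
Predictor<Map<String, Object>, Map<String, Object>> predictor = model.newPredictor();
```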
        Specified by:
        initialize in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>
        Overrides:
        initialize in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>
        Parameters:
        config - model configuration containing TorchScript model path
        Throws:
        InferenceException - if model loading fails or DJL is not properly configured
      • infer

        public InferenceResult infer​(Map<String,​Object> inputs)
                              throws InferenceException
        Performs single inference on the provided inputs using the PyTorch model.

        The inference process:

        1. Inputs are processed by TorchScriptInferenceEngine.MapTranslator.processInput(ai.djl.translate.TranslatorContext, java.util.Map<java.lang.String, java.lang.Object>)
        2. Converted to DJL NDList with batch dimensions
        3. Executed through PyTorch engine (GPU if available)
        4. Outputs converted back to Map via TorchScriptInferenceEngine.MapTranslator.processOutput(ai.djl.translate.TranslatorContext, ai.djl.ndarray.NDList)

        Input Requirements:

        • float[] arrays: Converted to Float32 tensors
        • int[] arrays: Converted to Int32 tensors
        • Batch Dimension: Automatically added (expandDims(0))

        Output Format:

        Outputs are named sequentially as "output_0", "output_1", etc., containing float arrays extracted from output tensors.

        Specified by:
        infer in interface InferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>
        Specified by:
        infer in class LocalInferenceEngine<ai.djl.repository.zoo.ZooModel<Map<String,​Object>,​Map<String,​Object>>>
        Parameters:
        inputs - map of input names to arrays (float[] or int[])
        Returns:
        inference result containing predictions and timing
        Throws:
        InferenceException - if inference fails or inputs are invalid
      • getMetadata

        public ModelMetadata getMetadata()
        Gets metadata about the loaded PyTorch model.

        TODO: Implement PyTorch metadata extraction via DJL. Potential metadata includes:

        • Model architecture information
        • Input/output tensor shapes and types
        • GPU/CPU execution mode
        • PyTorch version and model format
        Returns:
        model metadata (currently returns null; subclasses should override this method to provide an implementation)