Class VertexAIInferenceClient

  • All Implemented Interfaces:
    InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>

    public class VertexAIInferenceClient
    extends Object
    implements InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
    Google Vertex AI remote inference client for Google Cloud ML models.

    This engine integrates with Vertex AI endpoints to run inference on models hosted in Google Cloud. It uses the Google Cloud Vertex AI Java client library to communicate with the PredictionService API.

    Supported Features:

    • Vertex AI Endpoints: Integration with deployed Vertex AI model endpoints
    • Google Cloud Authentication: Automatic credentials via Application Default Credentials
    • Protobuf Payloads: Automatic conversion between Java Maps and protobuf Values
    • Batch Inference: Native support for batch predictions
    • Region Configuration: Configurable Google Cloud regions

    Configuration Example:

    
     InferenceConfig inferenceConfig = InferenceConfig.builder()
         .modelConfig(ModelConfig.builder()
             .modelName("vertex-model")
             .modelVersion("v1")
             .build())
         .engineOption("endpoint", "projects/my-project/locations/us-central1/endpoints/my-endpoint")
         .engineOption("project_id", "my-project")
         .engineOption("location", "us-central1")
         .build();
    
     VertexAIInferenceClient client = new VertexAIInferenceClient(inferenceConfig);
     client.initialize(inferenceConfig.getModelConfig());
     

    Authentication:

    Uses Google Cloud Application Default Credentials (ADC), which automatically searches for credentials in this order:

    1. GOOGLE_APPLICATION_CREDENTIALS environment variable
    2. Google Cloud SDK default credentials
    3. Google App Engine credentials
    4. Google Cloud Shell credentials
    5. Google Compute Engine credentials
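
    For local development, the first step of the chain above is usually the simplest: point the GOOGLE_APPLICATION_CREDENTIALS variable at a service-account key file. The path below is a placeholder, not a path from this project:

```shell
# Placeholder path: substitute the location of your own service-account key file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/vertex-sa.json"
```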

    Endpoint Name Formats:

    • Full Resource Name: projects/{project}/locations/{location}/endpoints/{endpoint}
    • Separate Components: Provide project_id, location, and endpoint separately
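
    Both formats can be normalized to one full resource name. The helper below is a hypothetical sketch of that resolution, using the us-central1 default region documented for this client; it is not the actual implementation:

```java
// Hypothetical helper sketching how the two documented endpoint formats can be
// normalized to a single full resource name. Names and logic are illustrative.
public class EndpointNameResolver {

    /** Returns a full endpoint resource name, building it from components if needed. */
    static String resolve(String endpoint, String projectId, String location) {
        if (endpoint.startsWith("projects/")) {
            return endpoint; // already a full resource name
        }
        if (projectId == null || projectId.isEmpty()) {
            throw new IllegalArgumentException(
                    "project_id is required when 'endpoint' is not a full resource name");
        }
        // us-central1 is the default region documented for this client
        String region = (location == null || location.isEmpty()) ? "us-central1" : location;
        return String.format("projects/%s/locations/%s/endpoints/%s",
                projectId, region, endpoint);
    }

    public static void main(String[] args) {
        System.out.println(resolve("my-endpoint", "my-project", null));
        // -> projects/my-project/locations/us-central1/endpoints/my-endpoint
    }
}
```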

    Vertex AI Request Format:

     {
       "instances": [
         {
           "feature1": value1,
           "feature2": value2,
           ...
         }
       ]
     }
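
    Using plain Java collections in place of protobuf Values, the request shape above can be sketched like this (feature1/feature2 are placeholder feature names):

```java
import java.util.List;
import java.util.Map;

// Sketch of the Vertex AI request envelope using plain Java collections in
// place of protobuf Values; feature names here are placeholders.
public class VertexRequestShape {

    /** Wraps one instance's features in the {"instances": [...]} envelope. */
    static Map<String, Object> buildRequest(Map<String, Object> features) {
        return Map.of("instances", List.of(features));
    }

    public static void main(String[] args) {
        System.out.println(buildRequest(Map.of("feature1", 1.0, "feature2", "abc")));
    }
}
```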
     

    Performance Features:

    • gRPC-based: Uses high-performance gRPC protocol
    • Connection Pooling: Managed by Google Cloud client library
    • Request Batching: Native batch support for throughput optimization
    • Automatic Retry: Built-in retry with exponential backoff

    Thread Safety:

    PredictionServiceClient is thread-safe when created with default settings. The client manages its own connection pooling and request lifecycle.

    Since:
    1.0.0
    Author:
    Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
    See Also:
    InferenceEngine, PredictionServiceClient, Vertex AI Predictions Documentation
    • Constructor Detail

      • VertexAIInferenceClient

        public VertexAIInferenceClient(InferenceConfig inferenceConfig)
        Constructs a new Vertex AI inference client with the provided configuration.
        Parameters:
        inferenceConfig - inference configuration containing Vertex AI options
    • Method Detail

      • initialize

        public void initialize(ModelConfig modelConfig)
                        throws InferenceException
        Initializes the Vertex AI inference client with Google Cloud configuration.

        Initialization process:

        1. Parses endpoint configuration from engine options
        2. Creates EndpointName from resource name or components
        3. Builds PredictionServiceSettings with automatic region endpoint
        4. Creates PredictionServiceClient with Application Default Credentials
        5. Initializes ModelMetadata with Vertex AI format

        Required Engine Options:

        • endpoint: Full resource name OR endpoint name
        • project_id: Google Cloud project ID (if not in endpoint)
        • location: Google Cloud region (default: us-central1)
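
        A minimal sketch of this option handling (the option keys and the us-central1 default come from the list above; the helper itself is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the engine-option validation described above.
// Only the option keys and the default region come from this documentation.
public class EngineOptionCheck {

    static Map<String, String> withDefaults(Map<String, String> engineOptions) {
        Map<String, String> resolved = new HashMap<>(engineOptions);
        if (!resolved.containsKey("endpoint")) {
            throw new IllegalArgumentException("engine option 'endpoint' is required");
        }
        resolved.putIfAbsent("location", "us-central1"); // documented default region
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(withDefaults(Map.of("endpoint", "my-endpoint")));
    }
}
```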
        Specified by:
        initialize in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        modelConfig - model configuration containing model name and version
        Throws:
        InferenceException - if initialization fails or required options are missing
        IllegalArgumentException - if endpoint format is invalid
      • infer

        public InferenceResult infer(Map<String, Object> inputs)
                              throws InferenceException
        Performs single inference using Vertex AI PredictionService.

        Request flow:

        1. Converts input Map to protobuf Value using JSON serialization
        2. Creates PredictRequest with single instance
        3. Calls PredictionServiceClient.predict(com.google.cloud.aiplatform.v1.EndpointName, java.util.List<com.google.protobuf.Value>, com.google.protobuf.Value)
        4. Extracts first prediction from PredictResponse
        5. Converts protobuf Value back to Java Map
        6. Returns InferenceResult with timing information

        Vertex AI Response Format:

        Vertex AI returns predictions in the format configured during model deployment. Common formats include:

        • Classification: {"classes": ["class1", "class2"], "scores": [0.8, 0.2]}
        • Regression: {"value": 42.5}
        • Custom outputs based on model signature
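
        As an illustration, the classification shape above could be consumed like this. The "classes" and "scores" keys depend on how the model was deployed, so treat them as assumptions:

```java
import java.util.List;
import java.util.Map;

// Illustrative consumer of the classification response shape shown above.
// The "classes"/"scores" keys are deployment-specific assumptions.
public class PredictionParser {

    /** Returns the class label with the highest score. */
    @SuppressWarnings("unchecked")
    static String topClass(Map<String, Object> prediction) {
        List<String> classes = (List<String>) prediction.get("classes");
        List<Double> scores = (List<Double>) prediction.get("scores");
        int best = 0;
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(best)) {
                best = i;
            }
        }
        return classes.get(best);
    }

    public static void main(String[] args) {
        System.out.println(topClass(Map.of(
                "classes", List.of("class1", "class2"),
                "scores", List.of(0.8, 0.2))));
        // -> class1
    }
}
```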
        Specified by:
        infer in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        inputs - map of input names to values (must be JSON-serializable)
        Returns:
        inference result containing Vertex AI predictions
        Throws:
        InferenceException - if Vertex AI API call fails or no predictions returned
      • inferBatch

        public InferenceResult inferBatch(Map<String, Object>[] batchInputs)
                                   throws InferenceException
        Performs batch inference using Vertex AI native batch support.

        Vertex AI natively supports batch predictions, which is more efficient than sequential single inferences. This method:

        1. Converts all batch inputs to protobuf Values
        2. Sends single batch request with all instances
        3. Receives batch response with all predictions
        4. Aggregates results with index prefixes ("result_0", "result_1", etc.)
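
        Step 4's aggregation convention can be sketched with plain collections (the helper name is hypothetical; only the "result_<index>" keying comes from the description above):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of step 4: flattening a list of batch predictions into one map keyed
// by the "result_<index>" convention noted above. Helper name is hypothetical.
public class BatchAggregator {

    static Map<String, Object> aggregate(List<Map<String, Object>> predictions) {
        Map<String, Object> combined = new LinkedHashMap<>();
        for (int i = 0; i < predictions.size(); i++) {
            combined.put("result_" + i, predictions.get(i));
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of(
                Map.of("value", 1.0), Map.of("value", 2.0))));
    }
}
```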

        Batch Size Considerations:

        Vertex AI has limits on batch size (varies by model and endpoint configuration). Check Vertex AI documentation for current limits. The batch size from InferenceConfig is used but may be limited by Vertex AI quotas.

        Specified by:
        inferBatch in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        batchInputs - array of input maps for batch processing
        Returns:
        aggregated inference result containing all batch predictions
        Throws:
        InferenceException - if batch inference fails or batch size exceeds limits
      • isReady

        public boolean isReady()
        Checks if the engine is ready for inference.
        Specified by:
        isReady in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        true if initialized and prediction client is available
      • getMetadata

        public ModelMetadata getMetadata()
        Gets metadata about the Vertex AI model.
        Specified by:
        getMetadata in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        model metadata created during initialization
      • getModelConfig

        public ModelConfig getModelConfig()
        Gets the model configuration.
        Specified by:
        getModelConfig in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        model configuration from inference config
      • getCapabilities

        public InferenceEngine.EngineCapabilities getCapabilities()
        Gets the engine's capabilities for Vertex AI inference.

        Vertex AI capabilities:

        • Remote: Yes, communicates with cloud service
        • Batch: Yes, native batch support
        • Max Batch Size: From InferenceConfig.getBatchSize()
        • Async: Yes, platform supports async operations
        Specified by:
        getCapabilities in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        engine capabilities for Vertex AI
      • close

        public void close()
        Closes the Vertex AI client and releases Google Cloud resources.

        Closes the PredictionServiceClient which releases gRPC channels and connection pools. Always call this method when finished to prevent resource leaks.

        Specified by:
        close in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>