Class VertexAIInferenceClient

  • All Implemented Interfaces:
    InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>

    public class VertexAIInferenceClient
    extends Object
    implements InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
    Google Vertex AI remote inference client for Google Cloud ML models.

    This engine integrates with Vertex AI endpoints to run inference on models hosted in Google Cloud. It uses the Google Cloud Vertex AI Java client library to communicate with the PredictionService API.

    Supported Features:

    • Vertex AI Endpoints: Integration with deployed Vertex AI model endpoints
    • Google Cloud Authentication: Automatic credentials via Application Default Credentials
    • Protobuf Payloads: Automatic conversion between Java Maps and protobuf Values
    • Batch Inference: Native support for batch predictions
    • Region Configuration: Configurable Google Cloud regions

    Configuration Example:

    
     InferenceConfig inferenceConfig = InferenceConfig.builder()
         .modelConfig(ModelConfig.builder()
             .modelName("vertex-model")
             .modelVersion("v1")
             .build())
         .engineOption("endpoint", "projects/my-project/locations/us-central1/endpoints/my-endpoint")
         .engineOption("project_id", "my-project")
         .engineOption("location", "us-central1")
         .build();
    
     VertexAIInferenceClient client = new VertexAIInferenceClient(inferenceConfig);
     client.initialize(inferenceConfig.getModelConfig());
     

    Authentication:

    Uses Google Cloud Application Default Credentials (ADC), which automatically searches for credentials in this order:

    1. GOOGLE_APPLICATION_CREDENTIALS environment variable
    2. Google Cloud SDK default credentials
    3. Google App Engine credentials
    4. Google Cloud Shell credentials
    5. Google Compute Engine credentials
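
    For local development, the first step of the chain above is usually the simplest: point the GOOGLE_APPLICATION_CREDENTIALS variable at a service-account key file. The path below is a placeholder, not a path from this project:

```shell
# Placeholder path: substitute the location of your own service-account key file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/vertex-sa.json"
```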

    Endpoint Name Formats:

    • Full Resource Name: projects/{project}/locations/{location}/endpoints/{endpoint}
    • Separate Components: Provide project_id, location, and endpoint separately
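
    Both formats can be normalized to one full resource name. The helper below is a hypothetical sketch of that resolution, using the us-central1 default region documented for this client; it is not the actual implementation:

```java
// Hypothetical helper sketching how the two documented endpoint formats can be
// normalized to a single full resource name. Names and logic are illustrative.
public class EndpointNameResolver {

    /** Returns a full endpoint resource name, building it from components if needed. */
    static String resolve(String endpoint, String projectId, String location) {
        if (endpoint.startsWith("projects/")) {
            return endpoint; // already a full resource name
        }
        if (projectId == null || projectId.isEmpty()) {
            throw new IllegalArgumentException(
                    "project_id is required when 'endpoint' is not a full resource name");
        }
        // us-central1 is the default region documented for this client
        String region = (location == null || location.isEmpty()) ? "us-central1" : location;
        return String.format("projects/%s/locations/%s/endpoints/%s",
                projectId, region, endpoint);
    }

    public static void main(String[] args) {
        System.out.println(resolve("my-endpoint", "my-project", null));
        // -> projects/my-project/locations/us-central1/endpoints/my-endpoint
    }
}
```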

    Vertex AI Request Format:

     {
       "instances": [
         {
           "feature1": value1,
           "feature2": value2,
           ...
         }
       ]
     }
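
    Using plain Java collections in place of protobuf Values, the request shape above can be sketched like this (feature1/feature2 are placeholder feature names):

```java
import java.util.List;
import java.util.Map;

// Sketch of the Vertex AI request envelope using plain Java collections in
// place of protobuf Values; feature names here are placeholders.
public class VertexRequestShape {

    /** Wraps one instance's features in the {"instances": [...]} envelope. */
    static Map<String, Object> buildRequest(Map<String, Object> features) {
        return Map.of("instances", List.of(features));
    }

    public static void main(String[] args) {
        System.out.println(buildRequest(Map.of("feature1", 1.0, "feature2", "abc")));
    }
}
```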
     

    Performance Features:

    • gRPC-based: Uses high-performance gRPC protocol
    • Connection Pooling: Managed by Google Cloud client library
    • Request Batching: Native batch support for throughput optimization
    • Automatic Retry: Built-in retry with exponential backoff

    Thread Safety:

    PredictionServiceClient is thread-safe when created with default settings. The client manages its own connection pooling and request lifecycle.

    Since:
    1.0.0
    Author:
    Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
    See Also:
    InferenceEngine, PredictionServiceClient, Vertex AI Predictions Documentation
    • Constructor Detail

      • VertexAIInferenceClient

        public VertexAIInferenceClient(InferenceConfig inferenceConfig)
        Constructs a new Vertex AI inference client with the provided configuration.
        Parameters:
        inferenceConfig - inference configuration containing Vertex AI options
    • Method Detail

      • initialize

        public void initialize(ModelConfig modelConfig)
                        throws InferenceException
        Initializes the Vertex AI inference client with Google Cloud configuration.

        Initialization process:

        1. Parses endpoint configuration from engine options
        2. Creates EndpointName from resource name or components
        3. Builds PredictionServiceSettings with automatic region endpoint
        4. Creates PredictionServiceClient with Application Default Credentials
        5. Initializes ModelMetadata with Vertex AI format

        Required Engine Options:

        • endpoint: Full resource name OR endpoint name
        • project_id: Google Cloud project ID (if not in endpoint)
        • location: Google Cloud region (default: us-central1)
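
        A minimal sketch of this option handling (the option keys and the us-central1 default come from the list above; the helper itself is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the engine-option validation described above.
// Only the option keys and the default region come from this documentation.
public class EngineOptionCheck {

    static Map<String, String> withDefaults(Map<String, String> engineOptions) {
        Map<String, String> resolved = new HashMap<>(engineOptions);
        if (!resolved.containsKey("endpoint")) {
            throw new IllegalArgumentException("engine option 'endpoint' is required");
        }
        resolved.putIfAbsent("location", "us-central1"); // documented default region
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(withDefaults(Map.of("endpoint", "my-endpoint")));
    }
}
```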
        Specified by:
        initialize in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        modelConfig - model configuration containing model name and version
        Throws:
        InferenceException - if initialization fails or required options are missing
        IllegalArgumentException - if endpoint format is invalid
      • infer

        public InferenceResult infer(Map<String, Object> inputs)
                              throws InferenceException
        Performs single inference using Vertex AI PredictionService.

        Request flow:

        1. Converts input Map to protobuf Value using JSON serialization
        2. Creates PredictRequest with single instance
        3. Calls PredictionServiceClient.predict(com.google.cloud.aiplatform.v1.EndpointName, java.util.List<com.google.protobuf.Value>, com.google.protobuf.Value)
        4. Extracts first prediction from PredictResponse
        5. Converts protobuf Value back to Java Map
        6. Returns InferenceResult with timing information

        Vertex AI Response Format:

        Vertex AI returns predictions in the format configured during model deployment. Common formats include:

        • Classification: {"classes": ["class1", "class2"], "scores": [0.8, 0.2]}
        • Regression: {"value": 42.5}
        • Custom outputs based on model signature
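
        As an illustration, the classification shape above could be consumed like this. The "classes" and "scores" keys depend on how the model was deployed, so treat them as assumptions:

```java
import java.util.List;
import java.util.Map;

// Illustrative consumer of the classification response shape shown above.
// The "classes"/"scores" keys are deployment-specific assumptions.
public class PredictionParser {

    /** Returns the class label with the highest score. */
    @SuppressWarnings("unchecked")
    static String topClass(Map<String, Object> prediction) {
        List<String> classes = (List<String>) prediction.get("classes");
        List<Double> scores = (List<Double>) prediction.get("scores");
        int best = 0;
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(best)) {
                best = i;
            }
        }
        return classes.get(best);
    }

    public static void main(String[] args) {
        System.out.println(topClass(Map.of(
                "classes", List.of("class1", "class2"),
                "scores", List.of(0.8, 0.2))));
        // -> class1
    }
}
```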
        Specified by:
        infer in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        inputs - map of input names to values (must be JSON-serializable)
        Returns:
        inference result containing Vertex AI predictions
        Throws:
        InferenceException - if Vertex AI API call fails or no predictions returned
      • inferBatch

        public InferenceResult inferBatch(Map<String, Object>[] batchInputs)
                                   throws InferenceException
        Performs batch inference using Vertex AI native batch support.

        Vertex AI natively supports batch predictions, which is more efficient than sequential single inferences. This method:

        1. Converts all batch inputs to protobuf Values
        2. Sends single batch request with all instances
        3. Receives batch response with all predictions
        4. Aggregates results with index prefixes ("result_0", "result_1", etc.)
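
        Step 4's aggregation convention can be sketched with plain collections (the helper name is hypothetical; only the "result_<index>" keying comes from the description above):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of step 4: flattening a list of batch predictions into one map keyed
// by the "result_<index>" convention noted above. Helper name is hypothetical.
public class BatchAggregator {

    static Map<String, Object> aggregate(List<Map<String, Object>> predictions) {
        Map<String, Object> combined = new LinkedHashMap<>();
        for (int i = 0; i < predictions.size(); i++) {
            combined.put("result_" + i, predictions.get(i));
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of(
                Map.of("value", 1.0), Map.of("value", 2.0))));
    }
}
```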

        Batch Size Considerations:

        Vertex AI has limits on batch size (varies by model and endpoint configuration). Check Vertex AI documentation for current limits. The batch size from InferenceConfig is used but may be limited by Vertex AI quotas.

        Specified by:
        inferBatch in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Parameters:
        batchInputs - array of input maps for batch processing
        Returns:
        aggregated inference result containing all batch predictions
        Throws:
        InferenceException - if batch inference fails or batch size exceeds limits
      • isReady

        public boolean isReady()
        Checks if the engine is ready for inference.
        Specified by:
        isReady in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        true if initialized and prediction client is available
      • getMetadata

        public ModelMetadata getMetadata()
        Gets metadata about the Vertex AI model.
        Specified by:
        getMetadata in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        model metadata created during initialization
      • getModelConfig

        public ModelConfig getModelConfig()
        Gets the model configuration.
        Specified by:
        getModelConfig in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        model configuration from inference config
      • getCapabilities

        public InferenceEngine.EngineCapabilities getCapabilities()
        Gets the engine's capabilities for Vertex AI inference.

        Vertex AI capabilities:

        • Remote: Yes, communicates with cloud service
        • Batch: Yes, native batch support
        • Max Batch Size: From InferenceConfig.getBatchSize()
        • Async: Yes, platform supports async operations
        Specified by:
        getCapabilities in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
        Returns:
        engine capabilities for Vertex AI
      • close

        public void close()
        Closes the Vertex AI client and releases Google Cloud resources.

        Closes the PredictionServiceClient which releases gRPC channels and connection pools. Always call this method when finished to prevent resource leaks.

        Specified by:
        close in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>