Class VertexAIInferenceClient
- java.lang.Object
-
- com.codedstream.otterstream.remote.vertex.VertexAIInferenceClient
-
- All Implemented Interfaces:
InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
public class VertexAIInferenceClient extends Object implements InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
Google Vertex AI remote inference client for Google Cloud ML models. This engine provides integration with Google Cloud Vertex AI endpoints for inference on models hosted in Google Cloud. It uses the Google Cloud Vertex AI Java client library to communicate with the PredictionService API.
Supported Features:
- Vertex AI Endpoints: Integration with deployed Vertex AI model endpoints
- Google Cloud Authentication: Automatic credentials via Application Default Credentials
- Protobuf Payloads: Automatic conversion between Java Maps and protobuf Values
- Batch Inference: Native support for batch predictions
- Region Configuration: Configurable Google Cloud regions
Configuration Example:
InferenceConfig inferenceConfig = InferenceConfig.builder()
    .modelConfig(ModelConfig.builder()
        .modelName("vertex-model")
        .modelVersion("v1")
        .build())
    .engineOption("endpoint", "projects/my-project/locations/us-central1/endpoints/my-endpoint")
    .engineOption("project_id", "my-project")
    .engineOption("location", "us-central1")
    .build();

VertexAIInferenceClient client = new VertexAIInferenceClient(inferenceConfig);
client.initialize(inferenceConfig.getModelConfig());

Authentication:
Uses Google Cloud Application Default Credentials (ADC) which automatically searches for credentials in this order:
- GOOGLE_APPLICATION_CREDENTIALS environment variable
- Google Cloud SDK default credentials
- Google App Engine credentials
- Google Cloud Shell credentials
- Google Compute Engine credentials
Endpoint Name Formats:
- Full Resource Name: projects/{project}/locations/{location}/endpoints/{endpoint}
- Separate Components: Provide project_id, location, and endpoint separately
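To illustrate how the two endpoint styles relate, here is a minimal sketch using plain string handling (no SDK required); the method names are assumptions for illustration, not part of the client API:

```java
// Sketch: the two endpoint configuration styles resolve to one resource name.
class EndpointNames {
    // Builds the full resource name from separate components.
    static String fromComponents(String projectId, String location, String endpoint) {
        return String.format("projects/%s/locations/%s/endpoints/%s",
                projectId, location, endpoint);
    }

    // Returns true if the value is already a full resource name.
    static boolean isFullResourceName(String value) {
        return value.matches("projects/[^/]+/locations/[^/]+/endpoints/[^/]+");
    }

    public static void main(String[] args) {
        String full = fromComponents("my-project", "us-central1", "my-endpoint");
        System.out.println(full);
        System.out.println(isFullResourceName(full)); // true
    }
}
```

In the real client library, `com.google.cloud.aiplatform.v1.EndpointName.of(project, location, endpoint)` performs the equivalent composition.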
Vertex AI Request Format:
{
  "instances": [
    { "feature1": value1, "feature2": value2, ... }
  ]
}

Performance Features:
- gRPC-based: Uses high-performance gRPC protocol
- Connection Pooling: Managed by Google Cloud client library
- Request Batching: Native batch support for throughput optimization
- Automatic Retry: Built-in retry with exponential backoff
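The request format above can be mirrored with plain Java collections before conversion to protobuf Values. A minimal sketch (field names are placeholders, and `toRequestBody` is a hypothetical helper, not part of the client API):

```java
import java.util.List;
import java.util.Map;

// Sketch of the Vertex AI "instances" request envelope using plain Java
// collections; the client converts such maps to protobuf Values.
class RequestShape {
    // Wraps a single input map in the "instances" envelope.
    static Map<String, Object> toRequestBody(Map<String, Object> instance) {
        return Map.of("instances", List.of(instance));
    }

    public static void main(String[] args) {
        Map<String, Object> instance = Map.of("feature1", 1.0, "feature2", "a");
        System.out.println(toRequestBody(instance));
    }
}
```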
Error Handling:
- Vertex AI API errors throw InferenceException
- Authentication failures throw InferenceException
- Network/timeout errors are wrapped in InferenceException
Thread Safety:
PredictionServiceClient is thread-safe when created with default settings. The client manages its own connection pooling and request lifecycle.
- Since:
- 1.0.0
- Author:
- Nestor Martourez, Sr Software and Data Streaming Engineer @ CodedStreams
- See Also:
InferenceEngine, PredictionServiceClient, Vertex AI Predictions Documentation
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface com.codedstream.otterstream.inference.engine.InferenceEngine
InferenceEngine.EngineCapabilities
-
Constructor Summary
Constructors:
- VertexAIInferenceClient(InferenceConfig inferenceConfig): Constructs a new Vertex AI inference client with the provided configuration.
-
Method Summary
All Methods | Instance Methods | Concrete Methods
- void close(): Closes the Vertex AI client and releases Google Cloud resources.
- InferenceEngine.EngineCapabilities getCapabilities(): Gets the engine's capabilities for Vertex AI inference.
- ModelMetadata getMetadata(): Gets metadata about the Vertex AI model.
- ModelConfig getModelConfig(): Gets the model configuration.
- InferenceResult infer(Map<String,Object> inputs): Performs single inference using Vertex AI PredictionService.
- InferenceResult inferBatch(Map<String,Object>[] batchInputs): Performs batch inference using Vertex AI native batch support.
- void initialize(ModelConfig modelConfig): Initializes the Vertex AI inference client with Google Cloud configuration.
- boolean isReady(): Checks if the engine is ready for inference.
-
Constructor Detail
-
VertexAIInferenceClient
public VertexAIInferenceClient(InferenceConfig inferenceConfig)
Constructs a new Vertex AI inference client with the provided configuration.
- Parameters:
inferenceConfig- inference configuration containing Vertex AI options
-
-
Method Detail
-
initialize
public void initialize(ModelConfig modelConfig) throws InferenceException
Initializes the Vertex AI inference client with Google Cloud configuration.

Initialization process:
- Parses endpoint configuration from engine options
- Creates EndpointName from resource name or components
- Builds PredictionServiceSettings with automatic region endpoint
- Creates PredictionServiceClient with Application Default Credentials
- Initializes ModelMetadata with Vertex AI format
Required Engine Options:
- endpoint: Full resource name OR endpoint name
- project_id: Google Cloud project ID (if not in endpoint)
- location: Google Cloud region (default: us-central1)
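The option-resolution rules above can be sketched as follows; the class and method names are hypothetical, and the exact validation behavior of the real client is an assumption:

```java
import java.util.Map;

// Sketch: resolve the documented engine options into a full endpoint
// resource name, applying the documented us-central1 default location.
class EngineOptionResolver {
    static String resolveEndpoint(Map<String, String> options) {
        String endpoint = options.get("endpoint");
        if (endpoint == null) {
            throw new IllegalArgumentException("engine option 'endpoint' is required");
        }
        if (endpoint.startsWith("projects/")) {
            return endpoint; // already a full resource name
        }
        String projectId = options.get("project_id");
        if (projectId == null) {
            throw new IllegalArgumentException(
                "'project_id' is required when 'endpoint' is not a full resource name");
        }
        String location = options.getOrDefault("location", "us-central1");
        return "projects/" + projectId + "/locations/" + location + "/endpoints/" + endpoint;
    }

    public static void main(String[] args) {
        System.out.println(resolveEndpoint(
                Map.of("endpoint", "my-endpoint", "project_id", "my-project")));
    }
}
```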
- Specified by:
initialize in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Parameters:
modelConfig - model configuration containing model name and version
- Throws:
InferenceException - if initialization fails or required options are missing
IllegalArgumentException - if endpoint format is invalid
-
infer
public InferenceResult infer(Map<String,Object> inputs) throws InferenceException
Performs single inference using Vertex AI PredictionService.

Request flow:
- Converts input Map to protobuf Value using JSON serialization
- Creates PredictRequest with single instance
- Calls PredictionServiceClient.predict(com.google.cloud.aiplatform.v1.EndpointName, java.util.List<com.google.protobuf.Value>, com.google.protobuf.Value)
- Extracts first prediction from PredictResponse
- Converts protobuf Value back to Java Map
- Returns InferenceResult with timing information
Vertex AI Response Format:
Vertex AI returns predictions in the format configured during model deployment. Common formats include:
- Classification: {"classes": ["class1", "class2"], "scores": [0.8, 0.2]}
- Regression: {"value": 42.5}
- Custom outputs based on model signature
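As an illustration of the classification format above, a small sketch that picks the top-scoring class from a classes/scores prediction map (a hypothetical helper for illustration, not part of the client API):

```java
import java.util.List;
import java.util.Map;

// Sketch: extract the top-scoring class from a Vertex AI classification
// prediction of the form {"classes": [...], "scores": [...]}.
class ClassificationParser {
    @SuppressWarnings("unchecked")
    static String topClass(Map<String, Object> prediction) {
        List<String> classes = (List<String>) prediction.get("classes");
        List<Double> scores = (List<Double>) prediction.get("scores");
        int best = 0;
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(best)) {
                best = i;
            }
        }
        return classes.get(best);
    }

    public static void main(String[] args) {
        Map<String, Object> pred = Map.of(
                "classes", List.of("class1", "class2"),
                "scores", List.of(0.8, 0.2));
        System.out.println(topClass(pred)); // class1
    }
}
```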
- Specified by:
infer in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Parameters:
inputs - map of input names to values (must be JSON-serializable)
- Returns:
inference result containing Vertex AI predictions
- Throws:
InferenceException - if Vertex AI API call fails or no predictions returned
-
inferBatch
public InferenceResult inferBatch(Map<String,Object>[] batchInputs) throws InferenceException
Performs batch inference using Vertex AI native batch support. Vertex AI natively supports batch predictions, which is more efficient than sequential single inferences. This method:
- Converts all batch inputs to protobuf Values
- Sends single batch request with all instances
- Receives batch response with all predictions
- Aggregates results with index prefixes ("result_0", "result_1", etc.)
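The index-prefix aggregation described in the last step can be sketched as follows, assuming each batch prediction arrives as a map (the helper name is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the documented aggregation: each batch prediction is stored
// under a "result_<index>" key in one combined output map.
class BatchAggregator {
    static Map<String, Object> aggregate(List<Map<String, Object>> predictions) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < predictions.size(); i++) {
            out.put("result_" + i, predictions.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> preds =
                List.of(Map.of("value", 1.0), Map.of("value", 2.0));
        System.out.println(aggregate(preds).keySet()); // [result_0, result_1]
    }
}
```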
Batch Size Considerations:
Vertex AI has limits on batch size (varies by model and endpoint configuration). Check Vertex AI documentation for current limits. The batch size from InferenceConfig is used but may be limited by Vertex AI quotas.
- Specified by:
inferBatch in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Parameters:
batchInputs - array of input maps for batch processing
- Returns:
- aggregated inference result containing all batch predictions
- Throws:
InferenceException - if batch inference fails or batch size exceeds limits
-
isReady
public boolean isReady()
Checks if the engine is ready for inference.
- Specified by:
isReady in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Returns:
- true if initialized and prediction client is available
-
getMetadata
public ModelMetadata getMetadata()
Gets metadata about the Vertex AI model.
- Specified by:
getMetadata in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Returns:
- model metadata created during initialization
-
getModelConfig
public ModelConfig getModelConfig()
Gets the model configuration.
- Specified by:
getModelConfig in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Returns:
- model configuration from inference config
-
getCapabilities
public InferenceEngine.EngineCapabilities getCapabilities()
Gets the engine's capabilities for Vertex AI inference.

Vertex AI capabilities:
- Remote: Yes, communicates with cloud service
- Batch: Yes, native batch support
- Max Batch Size: From InferenceConfig.getBatchSize()
- Async: Yes, platform supports async operations
- Specified by:
getCapabilities in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>
- Returns:
- engine capabilities for Vertex AI
-
close
public void close()
Closes the Vertex AI client and releases Google Cloud resources. Closes the PredictionServiceClient, which releases gRPC channels and connection pools. Always call this method when finished to prevent resource leaks.
- Specified by:
close in interface InferenceEngine<com.google.cloud.aiplatform.v1.PredictionServiceClient>