Commit 8c000a0

Vertex AI Endpoint Stress Tester (#1336)
* Updated test files names for issue #1169
* Vertex AI Endpoint Stress Tester utility: First push
* Updated the vegata script as per Trigger build errors
* Further fixes for build failures
* Further fixes for build failures
* Updated README.md
---------
Co-authored-by: Andrew Gold <[email protected]>
1 parent 91bb52d commit 8c000a0

11 files changed: +673 -0 lines changed

Diff for: README.md

+4
@@ -564,6 +564,10 @@ Platform usage.
 * [STS Job Manager](tools/sts-job-manager/) - A petabyte-scale bucket
   migration tool utilizing
   [Storage Transfer Service](https://cloud.google.com/storage-transfer-service)
+* [Vertex AI Endpoint Tester](tools/vertex-ai-endpoint-load-tester) - This
+  utility helps to methodically load-test a variety of Vertex AI Endpoint
+  sizes, so that one can decide the right size for deploying an ML model on
+  Vertex AI, given a sample request JSON and an expected queries-per-second rate.
 * [VM Migrator](tools/vm-migrator) - This utility automates migrating Virtual
   Machine instances within GCP. You can migrate VM's from one zone to another
   zone/region within the same project or different projects while retaining

Diff for: tools/vertex-ai-endpoint-load-tester/README.md

+79
@@ -0,0 +1,79 @@
```
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

# Vertex AI Endpoint Stress Tester

go/vertex-endpoint-stress-tester

## Introduction

Vertex AI Endpoints are a great managed solution for deploying ML models at scale. Architecturally, Vertex AI Endpoints use GKE or similar infrastructure components in the background to enable seamless deployment and inference for any ML model, be it AutoML or custom.

In some of our recent engagements, we have seen questions raised about the scalability of Vertex AI Endpoints. There is a sample notebook available on GitHub under the Google Cloud Platform account which explains one of the many ways to check how much load a particular instance type can handle. However, it is not an automated solution that anyone from GCC can use with ease, and it involves tedious manual activities: creating and deleting endpoints, and deploying ML models on them to test how much load a specific VM type can handle. Given that the Vertex AI Endpoint service continues to grow and supports a variety of instance types, this procedure needs improvement, so that anyone from GCC can easily deploy a given ML model on a series of endpoints of various sizes and check which one is most suitable for the given workload, based on estimates of how much traffic the ML model is expected to receive once it goes to production.

This is where we propose our automated tool (proposed to be open sourced in the PSO GitHub and KitHub), whose objective is to automatically stress-test one particular model over various endpoint configurations, with and without autoscaling, so that we have a data-driven approach to deciding the right size of the endpoint.

## Assumptions

1. The ML model is already built; this tool does not train it, but simply references it from BQML or the Vertex AI Model Registry.
2. The deployed ML model can accept a valid JSON request as input and return online predictions as output, preferably in JSON.
3. The user of this utility has at least one example JSON request file, placed in the [requests](requests/) folder. Please see the existing [example](requests/request_movie.json) for clarity.

## How to Install & Run?

Out of the box, the utility can be run from the command line, so the best way to try it for the first time is to:

1. Edit the [config](config/config.ini) file and select only 1 or 2 VM types, as sketched below.
2. Place the request JSON file into the [requests](requests/) folder. Please see the existing [example](requests/request_movie.json) for reference.
3. Run the utility as follows:

```
cd vertex-ai-endpoint-load-tester/
gcloud auth login
gcloud config set project PROJECT_ID
python main.py
```

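For step 1, a minimal first-run configuration might look like the sketch below; the keys are the ones defined in [config/config.ini](config/config.ini), and the values shown here are illustrative placeholders:

```
[config]
log_level = INFO
MODEL_ID = <your-model-id-from-the-model-registry>
RATE = [25, 50]
DURATION = 10
OUTPUT_BQ_TBL_ID = <dataset.table_for_results>
PROJECT = <your-project-id>
LOCATION = us-central1
TIMEOUT = 300
MIN_NODES = 1
MAX_NODES = 2
MACHINE_TYPES_LST = n1-standard-4,n1-standard-8
REQUEST_FILE = request_movie.json
```
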
## Logging

When run from the command line, all logs are printed to the console (STDOUT) for the user to inspect; they are NOT stored anywhere else for historical reference.
Hence we recommend packaging this solution as a container and running it as a Cloud Run service or job (as applicable), so that all logs can then be found in Cloud Logging.

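As a sketch of that recommendation, assuming the utility has been containerized and the image pushed to Artifact Registry (the job name, image path, and region below are all placeholders), it could be deployed and executed as a Cloud Run job like so:

```
# Deploy the container as a Cloud Run job (placeholder image path and region).
gcloud run jobs deploy vertex-endpoint-load-tester \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/tools/vertex-endpoint-load-tester:latest \
  --region=us-central1

# Execute the job; its logs will then be available in Cloud Logging.
gcloud run jobs execute vertex-endpoint-load-tester --region=us-central1
```
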
## Reporting/Analytics

TODO: This is an open feature and will be added shortly.
The idea is to use a Looker Studio dashboard to visualize the results of the load testing, so that they are easily consumable by anyone!

## Troubleshooting

1. Check that the user or service account running the job (on Cloud Run, for example) has the requisite IAM permissions.
2. Ensure the [config](config/config.ini) file has no typos or extraneous entries.
3. Check the logs for any specific errors captured, to debug further.

## Known Errors

TODO

## Roadmap

In the future, we aim to extend this utility to LLMs and other types of ML models.
Further, the same approach can be extended to load-test other GCP services, such as GKE, which are frequently used to deploy ML solutions.

## Authors

Ajit Sonawane - AI Engineer, Google Cloud
Suddhasatwa Bhaumik - AI Engineer, Google Cloud

Diff for: tools/vertex-ai-endpoint-load-tester/config/config.ini

+64

@@ -0,0 +1,64 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Input configurations
[config]

# logging level
log_level = INFO

# deployed model ID
MODEL_ID = 888526341522063360

# the QPS rates to try
RATE = [25, 50]

# duration (in seconds) for which tests will be run
DURATION = 10

# BigQuery table to store results
OUTPUT_BQ_TBL_ID = load_test_dataset.test9

# project ID
PROJECT = rare-signer-355918

# region
LOCATION = us-central1

# amount of sleep time (in seconds) before
# the endpoint is tested, after
# the model is deployed
TIMEOUT = 300

# autoscaling details
MIN_NODES = 1
MAX_NODES = 2

# types of machines to be used during
# testing; needs to be a comma-separated
# list of VM types
MACHINE_TYPES_LST = n1-standard-4,n1-standard-8

# name of the request body file in the requests folder,
# used for making POST calls to the API under stress test.
# Please do not enclose file names in quotes.
REQUEST_FILE = request_movie.json

# Other machine types that can be added to MACHINE_TYPES_LST, for example:
# n1-standard-16, n1-standard-32, n1-standard-64,
# n1-highmem-2, n1-highmem-4, n1-highmem-8, n1-highmem-16, n1-highmem-32,
# n1-highcpu-2, n1-highcpu-4, n1-highcpu-8, n1-highcpu-16, n1-highcpu-32,
# c3-standard-4, c3-standard-8, c3-standard-22, c3-standard-44, c3-standard-88, c3-standard-176

# End.
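
Note: main.py (below) reads this file through utils.read_config, but the utils module itself is not included in this diff. As an assumption, a minimal implementation could wrap Python's standard configparser; note that configparser lower-cases option names by default, which is why main.py looks up keys such as MODEL_ID via config_data["config"]["model_id"]:

```
# Hypothetical sketch of utils.read_config; the real utils/utils.py
# is not shown in this commit.
import configparser


def read_config(path: str) -> configparser.ConfigParser:
    """Parses an INI file and returns the ConfigParser object.

    Option names are case-insensitive by default, so MODEL_ID in
    config.ini is read back as config["config"]["model_id"].
    """
    config = configparser.ConfigParser()
    config.read(path)
    return config
```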

Diff for: tools/vertex-ai-endpoint-load-tester/main.py

+211
@@ -0,0 +1,211 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# Script deploys a Vertex AI endpoint
# and captures endpoint performance to BQ.
#
# Authors: ajitsonawane@, suddhasatwa@
# Team: Google Cloud Consulting
# Date: 25.01.2024

# Imports
import sys
import logging
import traceback
import uuid
import time
import json

from google.cloud import aiplatform

from utils import utils


# function to process requests to the endpoint.
def process(machine_type: str, latencies: list, log_level: str):
    """
    Deploys the model on an endpoint backed by the given machine type,
    and measures latencies.

    Takes the latencies list as input.
    Calls the Vegeta utility to update latencies for each machine type.
    Passes them to another utility to generate the full results.
    Returns the results.

    Inputs:
        machine_type: the machine type to be tested.
        latencies: list (usually empty) to collect results from Vegeta.
        log_level: level of logging.

    Outputs:
        results: combined results for the machine type.
    """

    # set up logging.
    logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    # start logging.
    logging.info("Reading configuration.")

    # read config.
    config_data = utils.read_config("config/config.ini")
    MODEL_ID = config_data["config"]["model_id"]  # model ID
    RATE = json.loads(config_data["config"]["rate"])  # the QPS rates to try
    DURATION = str(config_data["config"]["duration"])  # duration for which tests will be run
    PROJECT = config_data["config"]["project"]  # project ID
    LOCATION = config_data["config"]["location"]  # region
    TIMEOUT = int(config_data["config"]["timeout"])  # sleep (in seconds) after deployment; must be numeric for time.sleep()
    MIN_NODES = int(config_data["config"]["min_nodes"])  # min nodes for scaling
    MAX_NODES = int(config_data["config"]["max_nodes"])  # max nodes for scaling
    REQUEST_FILE = str(config_data["config"]["request_file"])

    # initialise results, so the exception path below
    # cannot return an unbound variable.
    results = []

    # deploy model on endpoint.
    logging.info("Deploying endpoint on machine: %s for model: %s", machine_type, MODEL_ID)
    try:
        # create client for Vertex AI.
        logging.info("Creating AI Platform object.")
        aiplatform.init(project=PROJECT, location=LOCATION)

        # load the model from the registry.
        logging.info("Loading %s from the Model Registry.", MODEL_ID)
        model = aiplatform.Model(model_name=MODEL_ID)

        # generate a random UUID for the endpoint name.
        logging.info("Generating random UUID for endpoint creation.")
        ep_uuid = uuid.uuid4().hex
        display_name = f"ep_{machine_type}_{ep_uuid}"

        # create the endpoint instance.
        logging.info("Creating endpoint instance.")
        endpoint = aiplatform.Endpoint.create(display_name=display_name)

        # deploy the model on the specific machine type.
        logging.info("Deploying model %s on endpoint %s", model, display_name)
        endpoint.deploy(model, min_replica_count=MIN_NODES,
                        max_replica_count=MAX_NODES, machine_type=machine_type)

        # sleep for TIMEOUT seconds (300 by default), a general
        # best practice with Vertex AI Endpoints.
        logging.info("Sleeping for %s seconds, for the endpoint to be ready!", TIMEOUT)
        time.sleep(TIMEOUT)

        # register latencies for predictions.
        logging.info("Calling utility to register the latencies.")
        ret_code, latencies = utils.register_latencies(RATE, DURATION, endpoint, machine_type, endpoint.display_name, latencies, REQUEST_FILE, log_level)
        if ret_code == 1:
            logging.info("Latencies recorded for %s", machine_type)
        else:
            logging.error("Error in recording latencies for %s", machine_type)
            sys.exit(1)

        # preprocess registered latencies.
        logging.info("Calling utility to prepare latencies for BigQuery.")
        results = utils.log_latencies_to_bq(MODEL_ID, latencies, log_level)
        if results:
            logging.info("Latencies information processed successfully.")
        else:
            logging.error("Error in recording all latencies. Exiting.")
            sys.exit(1)

        # un-deploy the endpoint.
        logging.info("Un-deploying endpoint: %s", endpoint.resource_name)
        endpoint.undeploy_all()

        # delete the endpoint.
        logging.info("Deleting endpoint: %s", endpoint.resource_name)
        endpoint.delete()

        logging.info("Processing completed for machine: %s", machine_type)

    except Exception as ex:
        # positional arguments work across Python versions;
        # the etype keyword was removed in Python 3.10.
        logging.error(''.join(traceback.format_exception(type(ex), ex, ex.__traceback__)))

    # return results.
    return results


# entrypoint function.
def main():
    """ Entrypoint """

    # read config.
    config_data = utils.read_config("config/config.ini")
    MACHINE_TYPES_LST = config_data["config"]["machine_types_lst"].split(',')  # list of machine types
    LOG_LEVEL = config_data["config"]["log_level"]  # level of logging
    OUTPUT_BQ_TBL_ID = config_data["config"]["output_bq_tbl_id"]  # BigQuery table to store results
    PROJECT = config_data["config"]["project"]  # project ID

    # log setup.
    logging.basicConfig(level=LOG_LEVEL, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    # start logging.
    logging.info("Vertex Endpoint Stress Tester Utility.")

    # variables
    logging.info("Prepping local variables.")
    LATENCIES = []
    RESULTS = []

    # record start time.
    start = time.time()

    # loop through each machine type
    # and process the records.
    try:
        for machine_type in MACHINE_TYPES_LST:
            # log calling the utility
            logging.info("Calling data processing utility.")

            # append the results from the utility
            RESULTS.extend(process(machine_type, LATENCIES, LOG_LEVEL))

            # log end.
            logging.info("Results utility completed.")

            # reset the latencies variable.
            LATENCIES = []
    except Exception as e:
        # log error
        logging.error("Got error while running load tests.")
        logging.error(e)
        # exit
        sys.exit(1)

    # log summary counts.
    logging.info("Collected %d results across all machine types.", len(RESULTS))

    # write collected results to BigQuery
    logging.info("Writing load test results for all machine types to BigQuery.")
    bq_write_ret_code = utils.write_results_to_bq(RESULTS, OUTPUT_BQ_TBL_ID, PROJECT, LOG_LEVEL)
    if bq_write_ret_code == 1:
        # log success
        logging.info("Successfully written data into BQ in %s table.", OUTPUT_BQ_TBL_ID)
    else:
        # log error
        logging.error("Errors in writing data into BigQuery. Exiting.")
        # exit
        sys.exit(1)

    # print the total time taken.
    # this is for all machines.
    logging.info(f"Total time taken for execution: {time.time() - start:.2f} seconds.")


# Call entrypoint
if __name__ == "__main__":
    main()

# End.
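
The latency-collection and BigQuery helpers referenced above (utils.register_latencies, utils.log_latencies_to_bq, utils.write_results_to_bq) also live in the utils module that is absent from this diff. As one hedged example, a minimal write_results_to_bq could stream the prepared result rows into BigQuery via the standard client library (the return-code convention of 1 for success and 0 for failure mirrors how main.py checks bq_write_ret_code):

```
# Hypothetical sketch of utils.write_results_to_bq; the real
# implementation is not shown in this commit.
import logging

from google.cloud import bigquery


def write_results_to_bq(results: list, table_id: str, project: str, log_level: str) -> int:
    """Streams a list of JSON-serializable row dicts into BigQuery.

    table_id is expected in dataset.table form (e.g. load_test_dataset.test9,
    as in config.ini). Returns 1 on success, 0 on any insert error.
    """
    logging.basicConfig(level=log_level)
    client = bigquery.Client(project=project)
    errors = client.insert_rows_json(f"{project}.{table_id}", results)
    if errors:
        logging.error("BigQuery insert errors: %s", errors)
        return 0
    return 1
```

Similarly, register_latencies presumably drives the Vegeta load-testing CLI against the endpoint's REST predict URL at each configured QPS rate; a rough sketch of a single attack, with every value a placeholder, might be:

```
# Hypothetical Vegeta invocation; the actual integration in utils/utils.py is not shown.
echo "POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict" \
  | vegeta attack \
      -rate=25 \
      -duration=10s \
      -header="Authorization: Bearer $(gcloud auth print-access-token)" \
      -header="Content-Type: application/json" \
      -body=requests/request_movie.json \
  | vegeta report -type=json
```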
Diff for: tools/vertex-ai-endpoint-load-tester/requests/request_movie.json

+20

@@ -0,0 +1,20 @@
{
    "instances": [
        {
            "Id": 3837,
            "name": "The",
            "rating": "R",
            "genre": "Comedy",
            "year": 2000,
            "released": "8/3/2001",
            "director": "John",
            "writer": "John",
            "star": "Michael",
            "country": "United",
            "budget": 35524924.14,
            "company": "Pictures",
            "runtime": 104,
            "data_cat": "TRAIN"
        }
    ]
}
