Open
Name of Feature or Improvement
Create an integration test case to validate the DSP, CodeFlare, and KubeRay implementation.
Describe the Solution You Would Like to See
Test environment assumptions:
- Data Science Pipeline v1.
- The Ray cluster shall consist of no more than 2 worker pods, each with 2 CPU cores and less than 6 GB of memory.
- Total integration test execution time shall be less than 20 minutes.
- S3 storage may be available, if needed.
- Free of proprietary intellectual property.
- Public data only.
Proposed test case: the "Clustering text documents using k-means" example from the scikit-learn documentation.
https://scikit-learn.org/stable/auto_examples/text/plot_document_clustering.html
Data Science Pipeline stages:
- Download test data (https://scikit-learn.org/stable/auto_examples/text/plot_document_clustering.html#loading-text-data)
- Launch Ray cluster with two worker pods.
- Ray driver launches two Ray actors, one per worker pod. The first actor runs TfidfVectorizer, followed by KMeans clustering and evaluation. The second actor runs HashingVectorizer, followed by KMeans clustering and evaluation.
- Ray driver collects the evaluation results from the two actors and reports a summary of each.
- Ray cluster is stopped and shut down.
- Pipeline run is completed.
Expected test assets:
- DSP pipeline YAML to deploy and kick off test runs.
- Test image with Ray and document clustering code.
- CodeFlare image to deploy the test image.
- Preconfigured credentials and configmaps in the test environment.