✨ Helm Chart for VideoQnA Application #497
base: main
Changes from 31 commits
````diff
@@ -1,14 +1,18 @@
-# data-prep
+# Data-Prep Microservice
 
-Helm chart for deploying data-prep microservice.
+Helm chart for deploying data-prep microservice. Data-Prep is consumed by several reference applications present in [GenAIExamples](https://github.com/opea-project/GenAIExamples/tree/main).
 
-data-prep will use redis and tei service, please specify the endpoints.
+There are two versions of the Data-Prep microservice. The first version is unimodal: it is based on redis-vector-db and TEI and performs data preparation for textual data. An alternative multimodal version, based on the `vdms-values.yaml` file, performs data preparation for visual data input. Follow along to select and install the version that suits your use case.
 
-## (Option1): Installing the chart separately
+Data-Prep uses redis-vector-db and TEI, while the multimodal version uses the vdms-vector-db service. The endpoints for these dependencies must be set correctly before installing the chart.
+
+## Install the chart for data preparation using Redis Vector DB
+
+### (Option1): Installing the chart separately
 
 First, you need to install the tei and redis-vector-db chart, please refer to the [tei](../tei/README.md) and [redis-vector-db](../redis-vector-db/README.md) for more information.
 
-After you've deployted the tei and redis-vector-db chart successfully, please run `kubectl get svc` to get the service endpoint and URL respectively, i.e. `http://tei`, `redis://redis-vector-db:6379`.
+After you've deployed the tei and redis-vector-db charts successfully, run `kubectl get svc` to get the service endpoint and URL respectively, e.g. `http://tei` and `redis://redis-vector-db:6379`.
 
 To install data-prep chart, run the following:
 
````
````diff
@@ -20,7 +24,7 @@ helm dependency update
 helm install data-prep . --set REDIS_URL=${REDIS_URL} --set TEI_EMBEDDING_ENDPOINT=${TEI_EMBEDDING_ENDPOINT}
 ```
 
-## (Option2): Installing the chart with dependencies automatically
+### (Option2): Installing the chart with dependencies automatically
 
 ```console
 cd GenAIInfra/helm-charts/common/data-prep
````
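When the Redis-based chart is installed separately, both `REDIS_URL` and `TEI_EMBEDDING_ENDPOINT` must point at the services reported by `kubectl get svc`. The bash pre-flight check below is a minimal sketch and is not part of the chart. The function name `dataprep_redis_args` is hypothetical, and the scheme checks assume the `redis://` and `http://` forms shown in the README.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for Option 1 of the Redis-based install.
# It validates REDIS_URL / TEI_EMBEDDING_ENDPOINT and prints the matching
# `helm install` arguments one per line; it does not run helm itself.
dataprep_redis_args() {
  local redis_url="${REDIS_URL:-}"
  local tei="${TEI_EMBEDDING_ENDPOINT:-}"
  # Both endpoints are required when the dependencies are installed separately.
  if [[ -z "$redis_url" || -z "$tei" ]]; then
    echo "REDIS_URL and TEI_EMBEDDING_ENDPOINT must both be set" >&2
    return 1
  fi
  # Assumption: the URL forms shown in the README (redis://..., http://...).
  if [[ "$redis_url" != redis://* ]]; then
    echo "REDIS_URL should start with redis://" >&2
    return 1
  fi
  if [[ "$tei" != http://* && "$tei" != https://* ]]; then
    echo "TEI_EMBEDDING_ENDPOINT should be an http(s) URL" >&2
    return 1
  fi
  # One argument per line, so the output can be read back into an array.
  printf '%s\n' --set "REDIS_URL=${redis_url}" \
    --set "TEI_EMBEDDING_ENDPOINT=${tei}"
}
```

For example, `out="$(dataprep_redis_args)" && mapfile -t args <<< "$out" && helm install data-prep . "${args[@]}"` runs the same install as the README command, but only when the check passes.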
````diff
@@ -29,6 +33,52 @@ helm install data-prep . --set redis-vector-db.enabled=true --set tei.enabled=tr
 ```
 
+## Install the chart for multimodal data preparation using VDMS Vector DB
+
+### (Option1): Installing the chart separately
+
+First, you need to install the `vdms-vector-db` chart. Please refer to the [vdms-vector-db](../vdms-vector-db/README.md) README for more information.
+
+After you've deployed the `vdms-vector-db` chart successfully, run `kubectl get svc` to get the service host and port, for example `http://vdms-vector-db:8001`.
+
+Next, run the following commands to install the data-prep chart:
+
+```bash
+cd GenAIInfra/helm-charts/common/data-prep
+
+# Use the host and port obtained in the previous step as VDMS_HOST and VDMS_PORT.
+export VDMS_HOST="vdms-vector-db"
+export VDMS_PORT="8001"
+export INDEX_NAME="mega-videoqna"
+export HFTOKEN="<your huggingface token>"
+# Set a directory to cache embedding models
+export CACHEDIR="/home/$USER/.cache"
+
+# Export the proxy variables. Assign an empty string if no proxy is required.
+export https_proxy="your_https_proxy"
+export http_proxy="your_http_proxy"
+
+helm dependency update
+helm install data-prep . -f ../variant_videoqna-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set indexName=${INDEX_NAME} --set global.cacheUseHostPath=${CACHEDIR} --set vdmsHost=${VDMS_HOST} --set vdmsPort=${VDMS_PORT} --set global.https_proxy=${https_proxy} --set global.http_proxy=${http_proxy}
+```
+
+### (Option2): Installing the chart with dependencies automatically
+
+```bash
+cd GenAIInfra/helm-charts/common/data-prep
+export INDEX_NAME="mega-videoqna"
+export HFTOKEN="<your huggingface token>"
+# Set a directory to cache embedding models
+export CACHEDIR="/home/$USER/.cache"
+
+# Export the proxy variables. Assign an empty string if no proxy is required.
+export https_proxy="your_https_proxy"
+export http_proxy="your_http_proxy"
+
+helm dependency update
+helm install data-prep . -f ./variant_videoqna-values.yaml --set vdms-vector-db.enabled=true --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set indexName=${INDEX_NAME} --set global.cacheUseHostPath=${CACHEDIR} --set global.https_proxy=${https_proxy} --set global.http_proxy=${http_proxy}
+```
+
 ## Verify
 
 To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
 
````
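The two VDMS options differ only in one respect. Option 1 points `vdmsHost`/`vdmsPort` at an existing service, while Option 2 sets `vdms-vector-db.enabled=true` so the chart deploys the database itself. The bash sketch below, offered as an illustration only, assembles those `--set` flags from the exported variables. `dataprep_vdms_args` is a hypothetical helper, and its fallback defaults (`mega-videoqna`, `8001`) are copied from the values files in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the --set flags for the VDMS-based data-prep install.
# Prints one helm argument per line; it does not run helm itself.
dataprep_vdms_args() {
  local token="${HFTOKEN:-}"
  # Refuse an empty token or the unedited placeholder from the README.
  if [[ -z "$token" || "$token" == "<your huggingface token>" ]]; then
    echo "HFTOKEN must be set to a real Hugging Face token" >&2
    return 1
  fi
  # Flags shared by both options; defaults mirror the values files in this PR.
  local -a args=(
    --set "global.HUGGINGFACEHUB_API_TOKEN=${token}"
    --set "indexName=${INDEX_NAME:-mega-videoqna}"
    --set "global.cacheUseHostPath=${CACHEDIR:-}"
    --set "global.https_proxy=${https_proxy:-}"
    --set "global.http_proxy=${http_proxy:-}"
  )
  if [[ -n "${VDMS_HOST:-}" ]]; then
    # Option 1: use an already deployed vdms-vector-db service.
    args+=(--set "vdmsHost=${VDMS_HOST}" --set "vdmsPort=${VDMS_PORT:-8001}")
  else
    # Option 2: let the chart deploy vdms-vector-db as a dependency.
    args+=(--set "vdms-vector-db.enabled=true")
  fi
  printf '%s\n' "${args[@]}"
}
```

The printed flags can be read into an array with `mapfile -t` and appended after the `-f` values file used in the README commands. Checking the token first avoids installing a pod that would only fail later when it tries to download models.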
````diff
@@ -37,21 +87,40 @@ Then run the command `kubectl port-forward svc/data-prep 6007:6007` to expose th
 Open another terminal and run the following command to verify the service if working:
 
-```console
+### 1. For Data-prep service using redis-vector-db:
+
+```bash
+
 curl http://localhost:6007/v1/dataprep \
 -X POST \
 -H "Content-Type: multipart/form-data" \
 -F "files=@./README.md"
 ```
 
+### 2. For multimodal data prep service using vdms-vector-db:
+
+```bash
+# 1) Download a sample video into the current directory:
+curl -svLO "https://github.com/opea-project/GenAIExamples/raw/refs/heads/main/VideoQnA/docker_compose/intel/cpu/xeon/data/op_1_0320241830.mp4"
````

Review comment on the sample video download line: Would it not be better for this sample video to live in some common VideoQnA directory versus in `docker_compose/intel/cpu/xeon/data`?

Reply: I suppose the requested change needs a PR in GenAIExamples rather than here.

````diff
+
+# 2) Verify using the video above
+curl -X POST http://localhost:6007/v1/dataprep \
+-H "Content-Type: multipart/form-data" \
+-F "files=@./op_1_0320241830.mp4"
+```
+
 ## Values
 
-| Key                    | Type   | Default                 | Description |
-| ---------------------- | ------ | ----------------------- | ----------- |
-| image.repository       | string | `"opea/dataprep-redis"` |             |
-| service.port           | string | `"6007"`                |             |
-| REDIS_URL              | string | `""`                    |             |
-| TEI_EMBEDDING_ENDPOINT | string | `""`                    |             |
+| Key                          | Type   | Default                           | Description |
+| ---------------------------- | ------ | --------------------------------- | ----------- |
+| image.repository             | string | `"opea/dataprep-redis"`           |             |
+| service.port                 | string | `"6007"`                          |             |
+| REDIS_URL                    | string | `""`                              |             |
+| TEI_EMBEDDING_ENDPOINT       | string | `""`                              |             |
+| vdms-values:image.repository | string | `"opea/dataprep-multimodal-vdms"` |             |
+| vdms-values:vdmsHost         | string | `""`                              |             |
+| vdms-values:vdmsPort         | string | `"8001"`                          |             |
+| vdms-values:indexName        | string | `"mega-videoqna"`                 |             |
 
 ## Milvus support
 
````
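Both verification requests post a single file to `/v1/dataprep` on the port-forwarded service. A minimal bash wrapper could fail fast on a missing file or an HTTP error. It assumes the `kubectl port-forward svc/data-prep 6007:6007` session is still running, and `dataprep_upload` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the verification request.
# Assumes `kubectl port-forward svc/data-prep 6007:6007` is running.
dataprep_upload() {
  local file="${1:-}"
  local url="${2:-http://localhost:6007/v1/dataprep}"
  # Fail early with a clear message instead of a curl read error.
  if [[ -z "$file" || ! -f "$file" ]]; then
    echo "usage: dataprep_upload <existing file> [url]" >&2
    return 1
  fi
  # -F sends a multipart/form-data body; --fail turns HTTP errors (>= 400)
  # into a non-zero exit status.
  curl --fail --silent --show-error -X POST "$url" -F "files=@${file}"
}
```

Usage: `dataprep_upload ./README.md` for the Redis variant, or `dataprep_upload ./op_1_0320241830.mp4` for the VDMS variant after downloading the sample video. Because `-F` already makes curl send a multipart/form-data request, the explicit `Content-Type` header from the README is left out here.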
```diff
@@ -0,0 +1,21 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+vdms-vector-db:
+  enabled: true
+
+image:
+  repository: opea/dataprep-multimodal-vdms
+  pullPolicy: IfNotPresent
+  # Overrides the image tag whose default is the chart appVersion.
+  tag: "latest"
+
+indexName: "mega-videoqna"
+vdmsHost: ""
+vdmsPort: "8001"
+entryCommand: ["/bin/sh"]
+extraArgs: ["-c", "sleep 15 && python ingest_videos.py"]
+
+# Set cacheUseHostPath to a host directory for caching encoding/embedding models and other related data
+global:
+  cacheUseHostPath: ""
```

yongfengdu marked the conversation on the `extraArgs` line as resolved.
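In Kubernetes, a container's `command` overrides the image ENTRYPOINT and `args` overrides its CMD. The container runs `command` followed by `args`, with one argv entry per list item. With the values above, the data-prep container therefore starts `/bin/sh -c "sleep 15 && python ingest_videos.py"`, so the whole `sleep 15 && ...` string must stay a single argument. The Deployment template in this PR preserves that by emitting each `extraArgs` item with `quote`. The local bash illustration below reproduces the same argv shape with a harmless payload (`sleep 0` and `echo` stand in for `sleep 15` and `ingest_videos.py`). It only demonstrates the semantics and is not how the chart runs anything.

```bash
#!/usr/bin/env bash
# Illustration only: mimic command + args from the values file with bash arrays.
run_container_cmd() {
  # entryCommand: ["/bin/sh"]
  local -a entry_command=(/bin/sh)
  # extraArgs: ["-c", "<script>"]; the script is one element, as in the values file.
  local -a extra_args=(-c 'sleep 0 && echo "ingest step would run here"')
  # "${arr[@]}" keeps every element as exactly one argument.
  "${entry_command[@]}" "${extra_args[@]}"
}
```

Running `run_container_cmd` prints the echoed message. If the script string were split into several list items, `sh -c` would execute only the first word and treat the rest as positional parameters.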
```diff
@@ -41,11 +41,28 @@ spec:
             {{- toYaml .Values.securityContext | nindent 12 }}
           image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
           imagePullPolicy: {{ .Values.image.pullPolicy }}
+          {{- if .Values.entryCommand }}
+          command: {{ .Values.entryCommand }}
+          {{- end }}
+          {{- if .Values.extraArgs }}
+          args:
+          {{- range .Values.extraArgs }}
+          - {{ . | quote }}
+          {{- end }}
+          {{- end }}
           ports:
             - name: data-prep
-              containerPort: {{ .Values.port }}
+              containerPort: {{ .Values.service.containerPort }}
               protocol: TCP
           volumeMounts:
+            {{- if .Values.global.cacheUseHostPath }}
+            - mountPath: /home/user/.cache/clip
+              name: cache-volume
+              subPath: clip
+            - mountPath: /home/user/.cache/huggingface/hub
+              name: cache-volume
+              subPath: huggingface/hub
+            {{- end }}
             - mountPath: /tmp
               name: tmp
           {{- if .Values.livenessProbe }}
```

yongfengdu marked the conversations on the `/home/user/.cache/clip` and `/home/user/.cache/huggingface/hub` mount paths as resolved.

Review comment on `subPath: huggingface/hub`: When using model-volume, the subPath is not necessary.

Reply: Agreed, it is not necessary. But since the same volume is being reused to mount different paths from different pods/containers, having a subPath keeps a proper directory hierarchy inside model-volume and shows what comes from which container path, instead of dumping everything in the root of model-volume.

Reply: The model-volume was used to save the models downloaded from the Hugging Face hub, which is the HUGGINGFACE_HUB_CACHE here: https://huggingface.co/docs/text-generation-inference/en/reference/launcher#huggingfacehubcache

```diff
@@ -63,6 +80,12 @@ spec:
           resources:
             {{- toYaml .Values.resources | nindent 12 }}
       volumes:
+        {{- if .Values.global.cacheUseHostPath }}
+        - name: cache-volume
+          hostPath:
+            path: {{ .Values.global.cacheUseHostPath }}
+            type: Directory
+        {{- end }}
         - name: tmp
           emptyDir: {}
       {{- with .Values.nodeSelector }}
```
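The `cache-volume` above is a `hostPath` volume with `type: Directory`. That means the directory passed as `global.cacheUseHostPath` must already exist on the node before the pod starts, or the volume cannot be mounted. Since hostPath is node-local, this applies to every node the data-prep pod may be scheduled on. The minimal sketch below prepares that directory. It assumes it is run on the node itself (for example, on a single-node cluster host) with the same path used for `CACHEDIR`, and `prepare_cache_dir` is a hypothetical name. It also creates the `clip` and `huggingface/hub` subdirectories so the layout matches the two `subPath` mounts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pre-create the hostPath cache directory used by the chart.
prepare_cache_dir() {
  # The directory passed as global.cacheUseHostPath, e.g. "$CACHEDIR".
  local root="${1:-}"
  if [[ -z "$root" ]]; then
    echo "usage: prepare_cache_dir <cache dir>" >&2
    return 1
  fi
  # mkdir -p creates the root as well as the two subPath directories.
  mkdir -p "${root}/clip" "${root}/huggingface/hub"
}
```

Usage: `prepare_cache_dir "$CACHEDIR"` before running `helm install`.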
```diff
@@ -0,0 +1,18 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+image:
+  repository: opea/dataprep-multimodal-vdms
+  pullPolicy: IfNotPresent
+  # Overrides the image tag whose default is the chart appVersion.
+  tag: "latest"
+
+indexName: "mega-videoqna"
+vdmsHost: ""
+vdmsPort: "8001"
+entryCommand: ["/bin/sh"]
+extraArgs: ["-c", "sleep 15 && python ingest_videos.py"]
+
+# Set cacheUseHostPath to a host directory for caching encoding/embedding models and other related data
+global:
+  cacheUseHostPath: ""
```