> [!NOTE]
> If you modified any files and want that change introduced in this step, add `--build` to the end of the command to build the container image instead of pulling it from dockerhub.
## Ingest data into Redis
Each time the Redis container is launched, data should be ingested into the container using the following commands:
```bash
docker exec -it qna-rag-redis-server bash
cd /ws
python ingest.py
```
Note: `ingest.py` will download the embedding model. Please set the proxy if necessary.
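If the model download requires a proxy, the standard proxy environment variables can be exported inside the container before running `ingest.py`. This is a minimal sketch; the proxy URL below is a placeholder you must replace with your own:

```shell
# Inside the qna-rag-redis-server container, before running ingest.py.
# Replace the placeholder URL with your actual proxy endpoint.
export HTTP_PROXY="http://proxy.example.com:3128"
export HTTPS_PROXY="http://proxy.example.com:3128"
# Keep local traffic (e.g. the Redis service) off the proxy.
export NO_PROXY="localhost,127.0.0.1"
```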
The LangChain backend service listens on port 8000; you can customize it by changing the code in `docker/qna-app/app/server.py`.
You can then make a request like the one below to check the LangChain backend service status:
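As one hedged example (the exact route depends on how the app is wired in `server.py`; the root path below is an assumption), a quick liveness probe against port 8000 might look like:

```shell
# Assumes the backend is reachable on localhost:8000; prints only the HTTP status code.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/
```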
# Enable TGI Gaudi FP8 for higher throughput (Optional)
The TGI Gaudi utilizes BFLOAT16 optimization as the default setting. If you aim to achieve higher throughput, you can enable FP8 quantization on the TGI Gaudi. Note that currently only Llama2 series and Mistral series models support FP8 quantization. Please follow the steps below to enable FP8 quantization.
## Prepare Metadata for FP8 Quantization
Now the TGI Gaudi will launch the FP8 model by default, and you can make a request like the one below to check the service status:
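For instance, assuming the TGI Gaudi endpoint is exposed on port 8080 (adjust the host and port to your deployment), TGI's standard `/generate` endpoint can be queried:

```shell
# Assumes TGI Gaudi is serving on localhost:8080; adjust to your deployment.
curl http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'
```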