Commit 1ae9119

Updated documentation
Signed-off-by: Shmuel Kallner <[email protected]>
1 parent 9fc75b5 commit 1ae9119

2 files changed

+100
-18
lines changed

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ nav:
7171
- InferencePool Rollout: guides/inferencepool-rollout.md
7272
- Metrics and Observability: guides/metrics-and-observability.md
7373
- Configuration Guide:
74-
- Configuring the plugins via configuration YAML file: guides/epp-configuration/config-text.md
74+
- Configuring the EndPoint Picker via configuration YAML file: guides/epp-configuration/config-text.md
7575
- Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
7676
- Troubleshooting Guide: guides/troubleshooting.md
7777
- Implementer Guides:

site-src/guides/epp-configuration/config-text.md

Lines changed: 99 additions & 17 deletions
@@ -1,23 +1,14 @@
1-
# Configuring Plugins via YAML
1+
# Configuring via YAML
22

3-
The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
4-
it is configured. The IGW is primarily configured via a configuration file.
3+
The Inference Gateway (IGW) can be configured via a YAML file.
54

6-
The YAML file can either be specified as a path to a file or in-line as a parameter. The configuration defines the set of
7-
plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
8-
the same plugin type to be instantiated multiple times, if needed (such as when configuring multiple scheduling profiles).
5+
At this time the YAML file based configuration allows for:
96

10-
Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request.
11-
If no scheduling profile is specified, a default profile, named `default` will be added and will reference all of the
12-
instantiated plugins.
7+
1. The set of lifecycle hooks (plugins) that are used by the IGW.
8+
2. The configuration of the saturation detector.
9+
3. A set of feature gates that are used to enable experimental features.
1310

14-
The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
15-
will be used for a particular request. A Profile Handler must be specified, unless the configuration only
16-
contains one profile, in which case the `SingleProfileHandler` will be used.
17-
18-
In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
19-
the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
20-
instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
11+
The YAML file can either be specified as a path to a file or in-line as a parameter.
2112

2213
***NOTE***: While the configuration text looks like a Kubernetes CRD, it is
2314
**NOT** a Kubernetes CRD. Specifically, the config is not reconciled upon, and is only read on startup.
@@ -33,10 +24,46 @@ plugins:
3324
schedulingProfiles:
3425
- ....
3526
- ....
27+
saturationDetector:
28+
...
29+
featureGates:
30+
...
3631
```
3732
3833
The first two lines of the configuration are constant and must appear as is.
3934
35+
The plugins section defines the set of plugins that will be instantiated and their parameters. This section is described in more detail in the section [Configuring Plugins via YAML](#configuring-plugins-via-yaml).
36+
37+
The schedulingProfiles section defines the set of scheduling profiles that can be used in scheduling
38+
requests to pods. This section is described in more detail in the section [Configuring Plugins via YAML](#configuring-plugins-via-yaml)
39+
40+
The saturationDetector section configures the saturation detector, which is used to determine if special
41+
action needs to be taken due to the system being overloaded or saturated. This section is described in more detail in the section [Saturation Detector configuration](#saturation-detector-configuration).
42+
43+
The featureGates section allows experimental features of the IGW to be enabled. This section is
44+
described in more detail in the section [Feature Gates](#feature-gates)
45+
46+
## Configuring Plugins via YAML
47+
48+
The set of plugins that are used by the IGW is determined by how it is configured. The IGW is
49+
primarily configured via a configuration file.
50+
51+
The configuration defines the set of plugins to be instantiated along with their parameters.
52+
Each plugin can also be given a name, enabling the same plugin type to be instantiated multiple
53+
times, if needed (such as when configuring multiple scheduling profiles).
54+
55+
Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling
56+
a request. If none is defined, a default profile named `default` will be added and will reference all of
57+
the instantiated plugins.
58+
59+
The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
60+
will be used for a particular request. A Profile Handler must be specified, unless the configuration only
61+
contains one profile, in which case the `SingleProfileHandler` will be used.
62+
63+
In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
64+
the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
65+
instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
66+
4067
The plugins section defines the set of plugins that will be instantiated and their parameters.
4168
Each entry in this section has the following form:
4269

@@ -184,7 +211,7 @@ schedulingProfiles:
184211
- pluginRef: max-score-picker
185212
```
186213

187-
## Plugin Configuration
214+
### Plugin Configuration
188215

189216
This section describes how to set up the various plugins that are available with the IGW.
190217

@@ -260,3 +287,58 @@ scored higher (since it's more available to serve new request).
260287

261288
- *Type*: lora-affinity-scorer
262289
- *Parameters*: none
290+
291+
## Saturation Detector configuration
292+
293+
The Saturation Detector is used to determine if the cluster is overloaded, i.e. saturated. When
294+
the cluster is saturated, special actions will be taken depending on what has been enabled. At this time, sheddable requests will be dropped.
295+
296+
The Saturation Detector determines that the cluster is saturated by looking at the following metrics provided by the inference servers:
297+
298+
- Backend waiting queue size
299+
- KV cache utilization
300+
- Metrics staleness
301+
302+
The Saturation Detector is configured via the saturationDetector section of the overall configuration.
303+
It has the following form:
304+
305+
```yaml
306+
saturationDetector:
307+
  queueDepthThreshold: 8
308+
  kvCacheUtilThreshold: 0.75
309+
  metricsStalenessThreshold: 150ms
310+
```
311+
312+
The various sub-fields of the saturationDetector section are:
313+
314+
- The `queueDepthThreshold` field defines the backend waiting queue size above which a
315+
pod is considered to have insufficient capacity for new requests. This field is optional; if
316+
omitted, a value of `5` will be used.
317+
- The `kvCacheUtilThreshold` field defines the KV cache utilization (0.0 to 1.0) above
318+
which a pod is considered to have insufficient capacity. This field is optional; if omitted,
319+
a value of `0.8` will be used.
320+
- The `metricsStalenessThreshold` field defines how old a pod's metrics can be. If a pod's
321+
metrics are older than this, it might be excluded from "good capacity" considerations or treated
322+
as having no capacity for safety. This field is optional; if omitted, a value of `200ms` will be used.
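
The way the three thresholds interact can be illustrated with a small Python sketch. This is not the IGW implementation: the `SaturationConfig` and `PodMetrics` types and both function names are hypothetical, and the rule that the cluster counts as saturated when no pod has capacity is an assumption made for illustration. Only the field meanings and default values come from the descriptions above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SaturationConfig:
    # Defaults mirror the documented fallback values (5, 0.8, 200ms).
    queue_depth_threshold: int = 5
    kv_cache_util_threshold: float = 0.8
    metrics_staleness_threshold_s: float = 0.2


@dataclass
class PodMetrics:
    # Hypothetical per-pod snapshot; field names are illustrative only.
    waiting_queue_size: int
    kv_cache_utilization: float
    metrics_age_s: float


def pod_has_capacity(m: PodMetrics, cfg: SaturationConfig) -> bool:
    """Return True if the pod is under every threshold."""
    if m.metrics_age_s > cfg.metrics_staleness_threshold_s:
        # Stale metrics: treat the pod as having no capacity, for safety.
        return False
    if m.waiting_queue_size > cfg.queue_depth_threshold:
        return False
    if m.kv_cache_utilization > cfg.kv_cache_util_threshold:
        return False
    return True


def is_saturated(pods: Iterable[PodMetrics], cfg: SaturationConfig) -> bool:
    # ASSUMPTION: the cluster is saturated when no pod has capacity.
    return not any(pod_has_capacity(p, cfg) for p in pods)
```

With the values from the example configuration (`8`, `0.75`, `150ms`), a pod with nine queued requests would be treated as lacking capacity, while a pod with three queued requests, 50% KV cache utilization, and fresh metrics would not.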
323+
324+
## Feature Gates
325+
326+
The Feature Gates section allows for the enabling of experimental features of the IGW. These experimental
327+
features are all disabled unless you explicitly enable them one by one.
328+
329+
The Feature Gates section has the following form:
330+
331+
```yaml
332+
featureGates:
333+
  enableDataLayer: true
334+
  enableFlowControl: false
335+
```
336+
337+
Each sub-field of the Feature Gates section enables one experimental feature. The sub-fields are:
338+
339+
- `enableDataLayer`, which, if present and set to `true`, enables the experimental Datalayer APIs.
340+
- `enableFlowControl`, which, if present and set to `true`, enables the experimental FlowControl
341+
feature.
342+
343+
In all cases if the sub-field isn't present or has a value of false, that experimental feature will
344+
be disabled.
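
The defaulting rule above (a feature is enabled only when its sub-field is present and `true`) can be sketched in Python as follows. The function name `resolve_feature_gates` and the `KNOWN_GATES` tuple are hypothetical and are not part of the IGW code base; the sketch assumes the `featureGates` section has already been parsed into a dictionary.

```python
from typing import Dict, Optional

# The two gates documented above; hypothetical constant for this sketch.
KNOWN_GATES = ("enableDataLayer", "enableFlowControl")


def resolve_feature_gates(feature_gates: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Return the effective on/off state of every known gate.

    A gate is enabled only if it is present and set to True;
    a missing section, a missing gate, or False all mean disabled.
    """
    gates = feature_gates or {}
    return {name: gates.get(name) is True for name in KNOWN_GATES}
```

For the example configuration shown earlier, this yields `enableDataLayer` enabled and `enableFlowControl` disabled; omitting the `featureGates` section entirely leaves both disabled.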
