site-src/guides/epp-configuration/config-text.md
Lines changed: 99 additions & 17 deletions
@@ -1,23 +1,14 @@
1
-
# Configuring Plugins via YAML
1
+
# Configuring via YAML
2
2
3
-
The set of lifecycle hooks (plugins) that are used by the Inference Gateway (IGW) is determined by how
4
-
it is configured. The IGW is primarily configured via a configuration file.
3
+
The Inference Gateway (IGW) can be configured via a YAML file.
5
4
6
-
The YAML file can either be specified as a path to a file or in-line as a parameter. The configuration defines the set of
7
-
plugins to be instantiated along with their parameters. Each plugin can also be given a name, enabling
8
-
the same plugin type to be instantiated multiple times, if needed (such as when configuring multiple scheduling profiles).
5
+
At this time, the YAML-file-based configuration allows for:
9
6
10
-
Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling a request.
11
-
If no scheduling profile is specified, a default profile, named `default` will be added and will reference all of the
12
-
instantiated plugins.
7
+
1. The set of lifecycle hooks (plugins) that are used by the IGW.
8
+
2. The configuration of the saturation detector.
9
+
3. A set of feature gates that are used to enable experimental features.
13
10
14
-
The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
15
-
will be used for a particular request. A Profile Handler must be specified, unless the configuration only
16
-
contains one profile, in which case the `SingleProfileHandler` will be used.
17
-
18
-
In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
19
-
the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
20
-
instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
11
+
The YAML file can either be specified as a path to a file or in-line as a parameter.
21
12
22
13
***NOTE***: While the configuration text looks like a Kubernetes CRD, it is
23
14
**NOT** a Kubernetes CRD. Specifically, the config is not reconciled upon, and is only read on startup.
@@ -33,10 +24,46 @@ plugins:
33
24
schedulingProfiles:
34
25
- ....
35
26
- ....
27
+
saturationDetector:
28
+
...
29
+
featureGates:
30
+
...
36
31
```
37
32
38
33
The first two lines of the configuration are constant and must appear as is.
39
34
35
+
The plugins section defines the set of plugins that will be instantiated and their parameters. This section is described in more detail in the section [Configuring Plugins via YAML](#configuring-plugins-via-yaml).
36
+
37
+
The schedulingProfiles section defines the set of scheduling profiles that can be used in scheduling
38
+
requests to pods. This section is described in more detail in the section [Configuring Plugins via YAML](#configuring-plugins-via-yaml).
39
+
40
+
The saturationDetector section configures the saturation detector, which is used to determine if special
41
+
action needs to be taken due to the system being overloaded or saturated. This section is described in more detail in the section [Saturation Detector configuration](#saturation-detector-configuration).
42
+
43
+
The featureGates section allows the enablement of experimental features of the IGW. This section is
44
+
described in more detail in the section [Feature Gates](#feature-gates).
45
+
46
+
## Configuring Plugins via YAML
47
+
48
+
The set of plugins that are used by the IGW is determined by how it is configured. The IGW is
49
+
primarily configured via a configuration file.
50
+
51
+
The configuration defines the set of plugins to be instantiated along with their parameters.
52
+
Each plugin can also be given a name, enabling the same plugin type to be instantiated multiple
53
+
times, if needed (such as when configuring multiple scheduling profiles).
54
+
55
+
Also defined is a set of SchedulingProfiles, which determine the set of plugins to be used when scheduling
56
+
a request. If one is not defined, a default one named `default` will be added and will reference all of
57
+
the instantiated plugins.
58
+
59
+
The set of plugins instantiated can include a Profile Handler, which determines which SchedulingProfiles
60
+
will be used for a particular request. A Profile Handler must be specified, unless the configuration only
61
+
contains one profile, in which case the `SingleProfileHandler` will be used.
62
+
63
+
In addition, the set of instantiated plugins can also include a picker, which chooses the actual pod to which
64
+
the request is scheduled after filtering and scoring. If one is not referenced in a SchedulingProfile, an
65
+
instance of `MaxScorePicker` will be added to the SchedulingProfile in question.
66
+
40
67
The plugins section defines the set of plugins that will be instantiated and their parameters.
41
68
Each entry in this section has the following form:
42
69
@@ -184,7 +211,7 @@ schedulingProfiles:
184
211
- pluginRef: max-score-picker
185
212
```
186
213
187
-
## Plugin Configuration
214
+
### Plugin Configuration
188
215
189
216
This section describes how to set up the various plugins that are available with the IGW.
190
217
@@ -260,3 +287,58 @@ scored higher (since it's more available to serve new request).
260
287
261
288
- *Type*: lora-affinity-scorer
262
289
- *Parameters*: none
290
+
291
+
## Saturation Detector configuration
292
+
293
+
The Saturation Detector is used to determine if the cluster is overloaded, i.e. saturated. When
294
+
the cluster is saturated, special actions will be taken depending on what has been enabled. At this time, sheddable requests will be dropped.
295
+
296
+
The Saturation Detector determines that the cluster is saturated by looking at the following metrics provided by the inference servers:
297
+
298
+
- Backend waiting queue size
299
+
- KV cache utilization
300
+
- Metrics staleness
301
+
302
+
The Saturation Detector is configured via the saturationDetector section of the overall configuration.
303
+
It has the following form:
304
+
```yaml
saturationDetector:
  queueDepthThreshold: 8
  kvCacheUtilThreshold: 0.75
  metricsStalenessThreshold: 150ms
```
311
+
312
+
The various sub-fields of the saturationDetector section are:
313
+
314
+
- The `queueDepthThreshold` field which defines the backend waiting queue size above which a
315
+
pod is considered to have insufficient capacity for new requests. This field is optional; if
316
+
omitted a value of `5` will be used.
317
+
- The `kvCacheUtilThreshold` field which defines the KV cache utilization (0.0 to 1.0) above
318
+
which a pod is considered to have insufficient capacity. This field is optional; if omitted,
319
+
a value of `0.8` will be used.
320
+
- The `metricsStalenessThreshold` field which defines how old a pod's metrics can be. If a pod's
321
+
metrics are older than this, it might be excluded from "good capacity" considerations or treated
322
+
as having no capacity for safety. This field is optional; if omitted, a value of `200ms` will be used.
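
As a rough illustration of the defaults described above, the fragment below overrides only `queueDepthThreshold`. The value `10` is an arbitrary number chosen for this sketch, not a recommendation. The other two fields are left out, so according to the field descriptions they should fall back to `0.8` and `200ms`.

```yaml
saturationDetector:
  # Illustrative value only; the documented default is 5.
  queueDepthThreshold: 10
  # kvCacheUtilThreshold is omitted, so the documented default of 0.8 applies.
  # metricsStalenessThreshold is omitted, so the documented default of 200ms applies.
```

Since the configuration is only read on startup, a change like this would take effect only after the IGW is restarted.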
323
+
324
+
## Feature Gates
325
+
326
+
The Feature Gates section allows for the enabling of experimental features of the IGW. These experimental
327
+
features are all disabled unless you explicitly enable them one by one.
328
+
329
+
The Feature Gates section has the following form:
330
+
```yaml
featureGates:
  enableDataLayer: true
  enableFlowControl: false
```
336
+
337
+
Each sub-field of the Feature Gates section enables one experimental feature. The sub-fields are:
338
+
339
+
- `enableDataLayer`, which, if present and set to `true`, enables the experimental Datalayer APIs.
340
+
- `enableFlowControl`, which, if present and set to `true`, enables the experimental FlowControl
341
+
feature.
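
A minimal sketch of enabling a single gate, based only on the two sub-fields listed above: `enableFlowControl` is switched on and `enableDataLayer` is simply left out, which per the description should leave the Datalayer APIs disabled.

```yaml
featureGates:
  # Turn on only the experimental FlowControl feature.
  enableFlowControl: true
  # enableDataLayer is not listed, so that feature stays disabled.
```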
342
+
343
+
In all cases if the sub-field isn't present or has a value of false, that experimental feature will