Commit 831c782: document "Service Discovery & Setup" (#860)
1 parent bad9719

22 files changed: +556 -52 lines


crowdsec-docs/docs/appsec/alerts_and_scenarios.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ We can now create a scenario that will trigger when a single IPs triggers this r
 type: leaky
 format: 3.0
 name: crowdsecurity/foobar-enum
-description: "Ban IPs repeateadly triggering out of band rules"
+description: "Ban IPs repeatedly triggering out of band rules"
 filter: "evt.Meta.log_type == 'appsec-info' && evt.Meta.rule_name == 'crowdsecurity/foobar-access'"
 distinct: evt.Meta.target_uri
 leakspeed: "60s"

crowdsec-docs/docs/appsec/benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ sidebar_position: 80

 -->

-The Application Security Component benchmarks have been run on a AWS EC2 Instance `t2.medium` (2vCPU/4Go RAM).
+The Application Security Component benchmarks have been run on a AWS EC2 Instance `t2.medium` (2vCPU/4GiB RAM).

 All the benchmarks have been run with only one `routine` configured for the Application Security Component.


crowdsec-docs/docs/appsec/configuration.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ sidebar_position: 6

 ## Overview

-This page explains the interraction between various files involved in AppSec configuration and the details about the processing pipeline AppSec request processing.
+This page explains the interaction between various files involved in AppSec configuration and the details about the processing pipeline AppSec request processing.

 **Prerequisites**:
 - Familiarity with [AppSec concepts](/appsec/intro.md)
@@ -24,7 +24,7 @@ The goals of the acquisition file are:
 - To specify the **address** and **port** where the AppSec-enabled Remediation Component(s) will forward the requests to.
 - And specify one or more [AppSec configuration files](#appsec-configuration) to use as definition of what rules to apply and how.

-Details can be found in the [AppSec Datasource page](/log_processor/data_sources/apps).
+Details can be found in the [AppSec Datasource page](/log_processor/data_sources/appsec.md).

 ### Defining Multiple AppSec Configurations

crowdsec-docs/docs/appsec/quickstart/traefik.mdx

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@ Additionally, we'll show how to monitor these alerts through the [console](https
 - Traefik Plugin **[Remediation Component](/u/bouncers/intro)**: Thanks to [maxlerebourg](https://github.com/maxlerebourg) and team they created a [Traefik Plugin](https://plugins.traefik.io/plugins/6335346ca4caa9ddeffda116/crowdsec-bouncer-traefik-plugin) that allows you to block requests directly from Traefik.

 :::info
-Prior to starting the guide ensure you are using the [Traefik Plugin](https://plugins.traefik.io/plugins/6335346ca4caa9ddeffda116/crowdsec-bouncer-traefik-plugin) and **NOT** the older [traefik-crowdsec-bouncer](https://app.crowdsec.net/hub/author/fbonalair/remediation-components/traefik-crowdsec-bouncer) as it hasnt recieved updates to use the new AppSec Component.
+Prior to starting the guide ensure you are using the [Traefik Plugin](https://plugins.traefik.io/plugins/6335346ca4caa9ddeffda116/crowdsec-bouncer-traefik-plugin) and **NOT** the older [traefik-crowdsec-bouncer](https://app.crowdsec.net/hub/author/fbonalair/remediation-components/traefik-crowdsec-bouncer) as it hasnt received updates to use the new AppSec Component.
 :::

 :::warning
@@ -77,7 +77,7 @@ If you have a folder in which you are persisting the configuration files, you ca
 There steps will change depending on how you are running the Security Engine. If you are running via `docker run` then you should launch the container within the same directory as the `appsec.yaml` file. If you are using `docker-compose` you can use a relative file mount to mount the `appsec.yaml` file.

 Steps:
-1. Change to the location where you exectued the `docker run` or `docker compose` command.
+1. Change to the location where you executed the `docker run` or `docker compose` command.
 2. Create a `appsec.yaml` file at the base of the directory.
 3. Add the following content to the `appsec.yaml` file.

@@ -96,11 +96,11 @@ Since CrowdSec is running inside a container you must set the `listen_addr` to `

 <FormattedTabs
   docker={`# Note if you have a docker run already running you will need to stop it before running this command
-docker run -d --name crowdsec -v /path/to/orginal:/etc/crowdsec -v ./appsec.yaml:/etc/crowdsec/acquis.d/appsec.yaml crowdsecurity/crowdsec`}
+docker run -d --name crowdsec -v /path/to/original:/etc/crowdsec -v ./appsec.yaml:/etc/crowdsec/acquis.d/appsec.yaml crowdsecurity/crowdsec`}
   dockerCompose={`services:
   crowdsec:
     volumes:
-      - /path/to/orginal:/etc/crowdsec ## or named volumes
+      - /path/to/original:/etc/crowdsec ## or named volumes
       - ./appsec.yaml:/etc/crowdsec/acquis.d/appsec.yaml`}
 />


crowdsec-docs/docs/getting_started/crowdsec_tour.mdx

Lines changed: 1 addition & 1 deletion
@@ -250,7 +250,7 @@ Those metrics are a great way to know if your configuration is correct:
 The `Acquisition Metrics` is a great way to know if your parsers are setup correctly:

 - If you have 0 **LINES PARSED** for a source : You are probably *missing* a parser, or you have a custom log format that prevents the parser from understanding your logs.
-- However, it's perfectly OK to have a lot of **LINES UNPARSED** : Crowdsec is not a SIEM, and only parses the logs that are relevant to its scenarios. For example, [ssh parser](https://hub.crowdsec.net/author/crowdsecurity/configurations/sshd-logs), only cares about failed authentication events (at the time of writting).
+- However, it's perfectly OK to have a lot of **LINES UNPARSED** : Crowdsec is not a SIEM, and only parses the logs that are relevant to its scenarios. For example, [ssh parser](https://hub.crowdsec.net/author/crowdsecurity/configurations/sshd-logs), only cares about failed authentication events (at the time of writing).
 - **LINES POURED TO BUCKET** tell you that your scenarios are matching your log sources : it means that some events from this log source made all their way to an actual scenario


crowdsec-docs/docs/log_processor/data_sources/introduction.md

Lines changed: 26 additions & 12 deletions
@@ -1,20 +1,25 @@
 ---
 id: intro
-title: Acquisition Datasources Introduction
+title: Acquisition Datasources
 sidebar_position: 1
 ---

-## Datasources
+To monitor applications, the Security Engine needs to read logs.
+DataSources define where to access them (either as files, or over the network from a centralized logging service).

-To be able to monitor applications, the Security Engine needs to access logs.
-DataSources are configured via the [acquisition](/configuration/crowdsec_configuration.md#acquisition_path) configuration, or specified via the command-line when performing cold logs analysis.
+They can be defined:

+- in [Acquisition files](/configuration/crowdsec_configuration.md#acquisition_path). Each file can contain multiple DataSource definitions. This configuration can be generated automatically, please refer to the [Service Discovery documentation](/log_processor/service-discovery-setup/intro.md)
+- for cold log analysis, you can also specify acquisitions via the command line.
+
+
+## Datasources modules

 Name | Type | Stream | One-shot
 -----|------|--------|----------
 [Appsec](/log_processor/data_sources/appsec.md) | expose HTTP service for the Appsec component | yes | no
 [AWS cloudwatch](/log_processor/data_sources/cloudwatch.md) | single stream or log group | yes | yes
-[AWS kinesis](/log_processor/data_sources/kinesis.md)| read logs from a kinesis strean | yes | no
+[AWS kinesis](/log_processor/data_sources/kinesis.md)| read logs from a kinesis stream | yes | no
 [AWS S3](/log_processor/data_sources/s3.md)| read logs from a S3 bucket | yes | yes
 [docker](/log_processor/data_sources/docker.md) | read logs from docker containers | yes | yes
 [file](/log_processor/data_sources/file.md) | single files, glob expressions and .gz files | yes | yes
@@ -46,6 +51,7 @@ An expression that will run after the acquisition has read one line, and before
 It allows to modify an event (or generate multiple events from one line) before parsing.

 For example, if you acquire logs from a file containing a JSON object on each line, and each object has a `Records` array with multiple events, you can use the following to generate one event per entry in the array:
+
 ```
 map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#))
 ```
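To make the transform's effect concrete, here is a Python sketch of the same logic (the sample line and field names are illustrative; crowdsec itself evaluates the expr expression above, not Python):

```python
import json

def split_records(raw_line: str) -> list[str]:
    # Mirrors map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#)):
    # parse the line, take the "Records" array, and re-serialize each entry
    # so that each one becomes its own event.
    return [json.dumps(record) for record in json.loads(raw_line)["Records"]]

line = '{"Records": [{"eventName": "GetObject"}, {"eventName": "PutObject"}]}'
print(split_records(line))
```

One input line with two `Records` entries yields two separate JSON strings, i.e. two events for the parsing pipeline.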
@@ -62,39 +68,47 @@ By default, when reading logs in real-time, crowdsec will use the time at which

 Setting this option to `true` will force crowdsec to use the timestamp from the log as the time of the event.

-It is mandatory to set this if your application buffers logs before writting them (for example, IIS when writing to a log file, or logs written to S3 from almost any AWS service).<br/>
+It is mandatory to set this if your application buffers logs before writing them (for example, IIS when writing to a log file, or logs written to S3 from almost any AWS service).<br/>
 If not set, then crowdsec will think all logs happened at once, which can lead to some false positive detections.

 ### `labels`

 A map of labels to add to the event.
 The `type` label is mandatory, and used by the Security Engine to choose which parser to use.

-## Acquisition configuration example
+## Acquisition configuration examples

-```yaml title="/etc/crowdsec/acquis.yaml"
+```yaml title="/etc/crowdsec/acquis.d/nginx.yaml"
 filenames:
   - /var/log/nginx/*.log
 labels:
   type: nginx
----
+```
+
+```yaml title="/etc/crowdsec/acquis.d/linux.yaml"
 filenames:
   - /var/log/auth.log
   - /var/log/syslog
 labels:
   type: syslog
----
+```
+
+```yaml title="/etc/crowdsec/acquis.d/docker.yaml"
 source: docker
 container_name_regexp:
   - .*caddy*
 labels:
   type: caddy
 ---
-...
+source: docker
+container_name_regexp:
+  - .*nginx*
+labels:
+  type: nginx
 ```

 :::warning
 The `labels` and `type` fields are necessary to dispatch the log lines to the right parser.

-Also note between each datasource is `---` this is needed to separate multiple YAML documents (each datasource) in a single file.
+In the last example we defined multiple datasources separated by the line `---`, which is the standard YAML marker.
 :::
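The options covered in this hunk can be combined in a single acquisition. A hypothetical sketch for S3-delivered logs (the bucket name and label type are illustrative, not from this commit):

```yaml
# /etc/crowdsec/acquis.d/cloudtrail.yaml -- hypothetical sketch
source: s3
bucket_name: my-log-bucket
use_time_machine: true  # S3 delivery is buffered, so trust the log timestamps
transform: 'map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#))'
labels:
  type: cloudtrail
```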

crowdsec-docs/docs/log_processor/data_sources/syslog_service.md

Lines changed: 2 additions & 2 deletions
@@ -51,6 +51,6 @@ This module does not support command-line acquisition.

 :::warning
 This syslog datasource is currently intended for small setups, and is at risk of losing messages over a few hundreds events/second.
-To process significant amounts of logs, rely on dedicated syslog server such as [rsyslog](https://www.rsyslog.com/), with this server writting logs to files that Security Engine will read from.
+To process significant amounts of logs, rely on dedicated syslog server such as [rsyslog](https://www.rsyslog.com/), with this server writing logs to files that Security Engine will read from.
 This page will be updated with further improvements of this data source.
-:::
+:::

crowdsec-docs/docs/log_processor/data_sources/troubleshoot.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ sidebar_position: 10
 ---

 The [prometheus](/observability/prometheus.md) instrumentation exposes metrics about acquisition and data sources.
-Those can as well be view via `cscli metrics` :
+Those can as well be viewed via `cscli metrics` :

 ```bash
 INFO[19-08-2021 06:33:31 PM] Acquisition Metrics:

crowdsec-docs/docs/log_processor/intro.mdx

Lines changed: 21 additions & 16 deletions
@@ -4,24 +4,23 @@ title: Introduction
 sidebar_position: 1
 ---

-The Log Processor is one of the core component of the Security Engine to:
+The Log Processor is a core component of the Security Engine. It:

-- Read logs from [Data Sources](log_processor/data_sources/introduction.md) in the form of Acquistions.
-- Parse the logs and extract relevant information using [Parsers](log_processor/parsers/introduction.mdx).
-- Enrich the parsed information with additional context such as GEOIP, ASN using [Enrichers](log_processor/parsers/enricher.md).
-- Monitor the logs for patterns of interest known as [Scenarios](log_processor/scenarios/introduction.mdx).
-- Push alerts to the Local API (LAPI) for alert/decisions to be stored within the database.
-
-!TODO: Add diagram of the log processor pipeline
+- Reads logs from [Data Sources](log_processor/data_sources/introduction.md) via Acquistions.
+- Parses logs and extract relevant information using [Parsers](log_processor/parsers/introduction.mdx).
+- Enriches the parsed information with additional context such as GEOIP, ASN using [Enrichers](log_processor/parsers/enricher.md).
+- Monitors patterns of interest via [Scenarios](log_processor/scenarios/introduction.mdx).
+- Pushes alerts to the Local API (LAPI), where alert/decisions are stored.
 - Read logs from datasources
 - Parse the logs
 - Enrich the parsed information
 - Monitor the logs for patterns of interest

+<!-- !TODO: Add diagram of the log processor pipeline -->

-## Introduction
+## Log Processor

-The Log Processor is an internal core component of the Security Engine in charge of reading logs from Data Sources, parsing them, enriching them, and monitoring them for patterns of interest.
+The Log Processor reads logs from Data Sources, parses and enriches them, and monitors them for patterns of interest.

 Once a pattern of interest is detected, the Log Processor will push alerts to the Local API (LAPI) for alert/decisions to be stored within the database.

@@ -35,19 +34,19 @@ Data Sources are individual modules that can be loaded at runtime by the Log Pro

 Acquisitions are the configuration files that define how the Log Processor should read logs from a Data Source. Acquisitions are defined in YAML format and are loaded by the Log Processor at runtime.

-We have two ways to define Acquisitions within the [configuration directory](/u/troubleshooting/security_engine#where-is-configuration-stored) :
+We support two ways to define Acquisitions in the [configuration directory](/u/troubleshooting/security_engine#where-is-configuration-stored):

-- `acquis.yaml` file: This used to be only place to define Acquisitions prior to `1.5.0`. This file is still supported for backward compatibility.
-- `acquis.d` folder: This is a directory where you can define multiple Acquisitions in separate files. This is useful when you want to auto generate files using an external application such as ansible.
+- `acquis.yaml` file: the legacy, single-file configuration (still supported)
+- `acquis.d` directory: a directory of multiple acquisition files (since v1.5.0, recommended for any non-trivial setup)

 ```yaml title="Example Acquisition Configuration"
 ## /etc/crowdsec/acquis.d/file.yaml
 source: file ## The Data Source module to use
 filenames:
-   - /tmp/foo/*.log
-   - /var/log/syslog
+  - /tmp/foo/*.log
+  - /var/log/syslog
 labels:
-   type: syslog
+  type: syslog
 ```

 For more information on Data Sources and Acquisitions, see the [Data Sources](log_processor/data_sources/introduction.md) documentation.
@@ -87,3 +86,9 @@ You can see more information on Whitelists in the [documentation](log_processor/
 Alert Context is additional context that can sent with an alert to the LAPI. This context can be shown locally via `cscli` or within the [CrowdSec Console](https://app.crowdsec.net/signup) if you opt in to share context when you enroll your instance.

 You can read more about Alert Context in the [documentation](log_processor/alert_context/intro.md).
+
+### Service Discovery & Setup
+
+On installation, CrowdSec can automatically detect existing services, download the relevant Hub collections, and generate acquisitions based on discovered log files.
+
+You can [customize or override these steps](log_processor/service-discovery-setup/intro.md), for example when provisioning multiple systems or using configuration management tools.
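For instance, on a host where Apache is detected, service discovery could generate an acquisition along these lines (a sketch consistent with the detect-yaml page added in this commit; the exact file name and paths depend on the detection plan):

```yaml
# /etc/crowdsec/acquis.d/setup.apache2.yaml -- generated, sketch
source: file
filenames:
  - /var/log/apache2/*.log
labels:
  type: apache2
```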
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+---
+id: detect-yaml
+title: Syntax
+sidebar_position: 1
+---
+
+# Syntax
+
+A minimal detection file is a YAML map with a top-level `detect:` key.
+
+Under it, each entry describes one service plan:
+
+```yaml
+# detect.yaml
+---
+detect:
+  apache2-file-apache2:
+    when:
+      - Systemd.UnitInstalled("apache2.service") or len(Path.Glob("/var/log/apache2/*.log")) > 0
+    hub_spec:
+      collections:
+        - crowdsecurity/apache2
+    acquisition_spec:
+      filename: apache2.yaml
+      datasource:
+        source: file
+        filenames:
+          - /var/log/apache2/*.log
+        labels:
+          type: apache2
+```
+
+## Fields
+
+### `when`
+
+A list of expression that must return a boolean.
+
+If multiple expressions are provided, they must all return `true` for the service to be included.
+
+```yaml
+when:
+  - Host.OS == "linux"
+  - Systemd.UnitInstalled("<unit>")
+```
+
+You can use any of the helper referenced [here](/log_processor/service-discovery-setup/expr.md).
+
+### `hub_spec`
+
+A map of hub items to install.
+
+Specifying an invalid item type or item will log an error but will not prevent the detection to continue.
+
+```yaml
+hub_spec:
+  collections:
+    - crowdsecurity/linux
+  parsers:
+    - crowdsecurity/nginx-logs
+  scenarios:
+    - crowdsecurity/http-bf
+```
+
+### `acquisition_spec`
+
+This item defines the acquisition that will be written to disk
+
+```yaml
+acquisition_spec:
+  filename: foobar.yaml
+  datasource:
+    source: docker
+    container_name: foo
+    labels:
+      type: bar
+```
+
+The `filename` attribute will be used to generate the name of file in the form of `acquis.d/setup.<filename>.yaml`.
+
+The content of `datasource` will be validated (syntax, required fields depending on the datasource configured) and be written as-is to the file.
+
+## Examples
+
+Basic OS / Hub only:
+
+```yaml
+detect:
+  linux:
+    when:
+      - Host.OS == "linux"
+    hub_spec:
+      collections:
+        - crowdsecurity/linux
+```
+
+`journalctl` source with a filter:
+
+```yaml
+detect:
+  caddy-journal:
+    when:
+      - Systemd.UnitInstalled("caddy.service")
+      - len(Path.Glob("/var/log/caddy/*.log")) == 0
+    hub_spec:
+      collections:
+        - crowdsecurity/caddy
+    acquisition_spec:
+      filename: caddy.yaml
+      datasource:
+        source: journalctl
+        labels:
+          type: caddy
+        journalctl_filter:
+          - "_SYSTEMD_UNIT=caddy.service"
+```
+
+Windows event log:
+
+```yaml
+detect:
+  windows_auth:
+    when:
+      - Host.OS == "windows"
+    hub_spec:
+      collections:
+        - crowdsecurity/windows
+    acquisition_spec:
+      filename: windows_auth.yaml
+      datasource:
+        source: wineventlog
+        event_channel: Security
+        event_ids:
+          - 4625
+          - 4623
+        event_level: information
+        labels:
+          type: eventlog
+```
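The `acquis.d/setup.<filename>.yaml` naming rule documented in this new page can be sketched in Python under one plausible reading (assuming `<filename>` means the stem of the declared `filename` attribute, and assuming the default acquisition directory; both are interpretations, not stated in the commit):

```python
from pathlib import Path

def acquisition_path(filename: str, acquis_dir: str = "/etc/crowdsec/acquis.d") -> str:
    # acquis.d/setup.<filename>.yaml, where <filename> is read as the stem
    # of the declared `filename` attribute (hypothetical interpretation).
    stem = Path(filename).stem
    return f"{acquis_dir}/setup.{stem}.yaml"

print(acquisition_path("apache2.yaml"))
```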
