NOTE: These docs are in progress. Feel free to direct any questions, feedback, or improvements to your private channel on the Semgrep Slack!
The Semgrep Network Broker facilitates secure access between Semgrep and a private network.
The broker accomplishes this by establishing a Wireguard VPN tunnel with the Semgrep backend, and then proxying inbound (Semgrep --> customer) HTTP requests through this tunnel. This approach allows Semgrep to interact with on-prem resources without having to expose them to the public internet.
Examples of inbound traffic include:
- Pull Request comments
- JIRA integrations
- Webhooks
NOTE: The Semgrep Network Broker uses Buf for protobuf compilation. If you are building the broker from scratch outside of Docker, make sure you have the Buf CLI installed: https://buf.build/docs/installation
- Run make build to build the semgrep-network-broker binary locally
- Run make docker to build a docker image
- Docker images are also published to ghcr.io/semgrep/semgrep-network-broker
The broker requires a Wireguard keypair in order to establish a secure connection.
- semgrep-network-broker genkey generates a random private key in base64 and prints it to stdout
- semgrep-network-broker pubkey reads a base64 private key from stdin and prints the corresponding base64 public key to stdout
> semgrep-network-broker genkey
some_private_key
> echo "some_private_key" | semgrep-network-broker pubkey
some_public_key

Your public key is safe to share. Do not share your private key with anyone (including Semgrep).
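WireGuard keys are 32-byte values, so a base64-encoded key is always 44 characters long. As a quick sanity check before pasting a key into a config file, a minimal Python sketch along these lines can confirm that a value has the right shape. The helper name looks_like_wireguard_key is hypothetical and not part of the broker; it checks only the encoding, not whether the key is the one registered with Semgrep.

```python
import base64
import binascii


def looks_like_wireguard_key(value: str) -> bool:
    """Return True if value is strict base64 that decodes to exactly 32 bytes.

    Hypothetical helper for illustration; not part of semgrep-network-broker.
    """
    try:
        raw = base64.b64decode(value.strip(), validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # 32 zero bytes stand in for a real key; never hard-code a real private key.
    placeholder = base64.b64encode(bytes(32)).decode()
    print(looks_like_wireguard_key(placeholder))
    # The literal placeholder text from the example is not a valid key.
    print(looks_like_wireguard_key("some_private_key"))
```

A check like this catches truncated or mangled copy-paste values early, before the broker tries to bring up the tunnel.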
Semgrep will help you create a configuration file tailored to your Semgrep deployment.
Do not alter the wireguard section.
Do not share the value of inbound.wireguard.privateKey. This is your organization's private key. Reach out to Semgrep on Slack if you need to rotate your Wireguard keys.
Example:
inbound:
  wireguard:
    localAddress: ...
    privateKey: ...
    peers:
      - endpoint: ...
  allowlist: [...]

The httpClient configuration section modifies the HTTP client used for proxying requests.
Example:
inbound:
  httpClient:
    additionalCACerts: # Optional. Certificates here will be appended to the Root CA trust of the container. Necessary when the SCM(s) the broker interacts with have self-signed certificates.
      - /path/to/custom/cert.pem
    tlsMinVersion: "1.2" # Optional. Valid values: "1.2", "1.3". Defaults to "1.3" if unset.

An alternative to specifying additionalCACerts is to set the $SSL_CERT_DIR environment variable when the container is created.
Example:
# -v mounts a path from the host machine as a container volume
# -e sets the $SSL_CERT_DIR environment variable to the mounted volume
$ docker run \
  ...
  -v /path/containing/your/certs:/certs \
  -e SSL_CERT_DIR=/certs \
  ...
  -it semgrep-network-broker:latest -c /emt/config.yaml

Refer to the network broker docs on semgrep.dev for more detail on docker setup.
The github configuration section simplifies granting Semgrep access to leave PR comments.
Example:
inbound:
  github:
    baseUrl: https://github.example.com/api/v3
    token: ...
    allowCodeAccess: false # default is false, set to true to allow Semgrep to read file contents

Under the hood, this config adds these allowlist items:
- GET https://github.example.com/api/v3/app
- GET https://github.example.com/api/v3/app/hook/config
- GET https://github.example.com/api/v3/installation/repositories
- GET https://github.example.com/api/v3/organizations
- GET https://github.example.com/api/v3/orgs/:org
- GET https://github.example.com/api/v3/orgs/:org/hooks
- GET https://github.example.com/api/v3/orgs/:org/installation
- GET https://github.example.com/api/v3/orgs/:org/members
- GET https://github.example.com/api/v3/orgs/:org/repos
- GET https://github.example.com/api/v3/orgs/:org/teams
- GET https://github.example.com/api/v3/orgs/:org/teams/:team_slug/members
- GET https://github.example.com/api/v3/repos/:org/:repo/actions/secrets/public-key
- GET https://github.example.com/api/v3/repos/:owner/:repo
- GET https://github.example.com/api/v3/repos/:owner/:repo/branches
- GET https://github.example.com/api/v3/repos/:owner/:repo/collaborators/:username/permission
- GET https://github.example.com/api/v3/repos/:owner/:repo/compare/:basehead
- GET https://github.example.com/api/v3/repos/:owner/:repo/contents/.github/workflows/semgrep.yml
- GET https://github.example.com/api/v3/repos/:owner/:repo/installation
- GET https://github.example.com/api/v3/repos/:owner/:repo/pulls
- GET https://github.example.com/api/v3/repos/:owner/:repo/pulls/comments/:comment_id/reactions
- GET https://github.example.com/api/v3/user
- GET https://github.example.com/api/v3/user/repos
- GET https://github.example.com/api/v3/users/:user/installation
- GET https://github.example.com/api/v3/users/:user/installation/repositories
- GET https://github.example.com/api/v3/users/:username
- POST https://github.example.com/api/v3/app-manifests/:code/conversions
- POST https://github.example.com/api/v3/app/installations/:id/access_tokens
- POST https://github.example.com/api/v3/orgs/:org/hooks
- POST https://github.example.com/api/v3/repos/:owner/:repo/check-runs
- POST https://github.example.com/api/v3/repos/:owner/:repo/issues/:number/comments
- POST https://github.example.com/api/v3/repos/:owner/:repo/pulls/:number/comments
- POST https://github.example.com/api/v3/repos/:owner/:repo/pulls/:number/comments/:comment_id/replies
- POST https://github.example.com/api/v3/repos/:owner/:repo/statuses/:commit
- PUT https://github.example.com/api/v3/repos/:org/:repo/actions/secrets/SEMGREP_APP_TOKEN
- PUT https://github.example.com/api/v3/repos/:owner/:repo/contents/.github/workflows/semgrep.yml
- PATCH https://github.example.com/api/v3/orgs/:org/hooks/:hook_id
- PATCH https://github.example.com/api/v3/repos/:owner/:repo/check-runs/:check_run_id
- PATCH https://github.example.com/api/v3/repos/:owner/:repo/pulls/:number/comments/:comment_id
- PATCH https://github.example.com/api/v3/repos/:owner/:repo/pulls/comments/:comment_id
- DELETE https://github.example.com/api/v3/orgs/:org/hooks/:hook_id
And if allowCodeAccess is set, additionally:
- GET https://github.example.com/:owner/:repo/info/refs
- GET https://github.example.com/api/v3/repos/:owner/:repo/commits
- GET https://github.example.com/api/v3/repos/:owner/:repo/contents
- GET https://github.example.com/api/v3/repos/:owner/:repo/contents/*
- POST https://github.example.com/:owner/:repo/git-upload-pack
Similarly, the gitlab configuration section grants Semgrep access to leave MR comments.
Example:
inbound:
  gitlab:
    baseUrl: https://gitlab.example.com/api/v4
    token: ...
    allowCodeAccess: false # default is false, set to true to allow Semgrep to read file contents

Under the hood, this config adds these allowlist items:
- GET https://gitlab.example.com/api/v4/:entity_type/:namespace/projects
- GET https://gitlab.example.com/api/v4/groups/:namespace/hooks
- GET https://gitlab.example.com/api/v4/groups/:namespace/members/all
- GET https://gitlab.example.com/api/v4/groups/:namespace/members/all/:user
- GET https://gitlab.example.com/api/v4/namespaces/:namespace
- GET https://gitlab.example.com/api/v4/personal_access_tokens/self
- GET https://gitlab.example.com/api/v4/projects/:project
- GET https://gitlab.example.com/api/v4/projects/:project/members/all/:user
- GET https://gitlab.example.com/api/v4/projects/:project/merge_requests
- GET https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions
- GET https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions/:discussion/notes/:note/award_emoji
- GET https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/versions
- GET https://gitlab.example.com/api/v4/projects/:project/repository/branches
- POST https://gitlab.example.com/api/v4/groups/:namespace/hooks
- POST https://gitlab.example.com/api/v4/projects/:project/hooks
- POST https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions
- POST https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions/:discussion/notes
- PUT https://gitlab.example.com/api/v4/groups/:namespace/hooks
- PUT https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions/:discussion
- PUT https://gitlab.example.com/api/v4/projects/:project/merge_requests/:number/discussions/:discussion/notes/:note
- DELETE https://gitlab.example.com/api/v4/groups/:namespace/hooks/:hook
- DELETE https://gitlab.example.com/api/v4/projects/:project/hooks/:hook
And if allowCodeAccess is set, additionally:
- GET https://gitlab.example.com/:namespace/:project/info/refs
- GET https://gitlab.example.com/api/v4/projects/:project/repository/commits
- GET https://gitlab.example.com/api/v4/projects/:project/repository/compare
- GET https://gitlab.example.com/api/v4/projects/:project/repository/files/*
- GET https://gitlab.example.com/api/v4/projects/:project/repository/merge_base
- POST https://gitlab.example.com/:namespace/:project/git-upload-pack
- POST https://gitlab.example.com/api/v4/projects/:project/statuses/:commit
Similarly, the bitbucket configuration section grants Semgrep access to leave PR comments.
inbound:
  bitbucket:
    baseUrl: https://bitbucket.example.com/rest/api/latest
    token: ...
    allowCodeAccess: false # default is false, set to true to allow Semgrep to read file contents

Under the hood, this config adds these allowlist items:
- GET https://bitbucket.example.com/rest/api/latest/application-properties
- GET https://bitbucket.example.com/rest/api/latest/projects/:project
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/default-branch
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/pull-requests
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/pull-requests/:number/comments/:comment
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/webhooks
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/webhooks
- POST https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/pull-requests/:number/blocker-comments
- POST https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/pull-requests/:number/comments
- POST https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/webhooks
- POST https://bitbucket.example.com/rest/api/latest/projects/:project/webhooks
- PUT https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/pull-requests/:number/comments/:comment
- PUT https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/webhooks/:webhook
- PUT https://bitbucket.example.com/rest/api/latest/projects/:project/webhooks/:webhook
- DELETE https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/webhooks/:webhook
- DELETE https://bitbucket.example.com/rest/api/latest/projects/:project/webhooks/:webhook
And if allowCodeAccess is set, additionally:
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/browse/*
- GET https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/commits
- GET https://bitbucket.example.com/scm/:project/:repo/info/refs
- POST https://bitbucket.example.com/rest/api/latest/projects/:project/repos/:repo/commit/:commit/builds
- POST https://bitbucket.example.com/scm/:project/:repo/git-upload-pack
Similarly, the azureDevOps configuration section grants Semgrep access to Azure DevOps.
inbound:
  azureDevOps:
    baseUrl: https://[email protected]/
    token: ...
    allowCodeAccess: false # default is false, set to true to allow Semgrep to read file contents

Under the hood, this config adds these allowlist items:
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/iterations
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/iterations/:iterationId/changes
- GET https://dev.azure.com/:namespace/:project/_apis/hooks/subscriptions
- GET https://dev.azure.com/:namespace/_apis/connectionData
- GET https://dev.azure.com/:namespace/_apis/projects/:project
- GET https://vsaex.dev.azure.com/:namespace/_apis/groupentitlements
- POST https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/threads
- POST https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/threads/:threadId/comments
- POST https://dev.azure.com/:namespace/:project/_apis/hooks/subscriptions
- PUT https://dev.azure.com/:namespace/:project/_apis/hooks/subscriptions
- PATCH https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/threads
- PATCH https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/pullRequests/:number/threads/:threadId/comments/:commentId
And if allowCodeAccess is set, additionally:
- GET https://dev.azure.com/:namespace/:project/_apis/git/pullrequests/:number
- GET https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/items
- GET https://dev.azure.com/:namespace/:project/_git/:repo/info/refs
- POST https://dev.azure.com/:namespace/:project/_apis/git/repositories/:repo/commits/:commit/statuses
- POST https://dev.azure.com/:namespace/:project/_git/:repo/git-upload-pack
The allowlist configuration section provides finer-grained control over what HTTP requests are allowed to be forwarded out of the broker. The first matching allowlist item is used. No allowlist match means the request will not be proxied.
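To make the first-match rule concrete, here is a minimal Python sketch of how such an allowlist could be evaluated. It is an illustration only, not the broker's implementation: the names AllowlistItem, pattern_to_regex, and first_match are hypothetical, and the pattern semantics (a :name placeholder matches one path segment, * matches anything, and an item matches only when both the URL and the method match) are assumptions based on the examples in this document.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A ":name" placeholder directly after a slash, or a "*" wildcard.
_TOKEN = re.compile(r"((?<=/):[A-Za-z_]\w*|\*)")


@dataclass
class AllowlistItem:
    url: str
    methods: list[str]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an allowlist URL pattern into a regex (assumed semantics)."""
    pieces = _TOKEN.split(pattern)
    regex = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex.append(re.escape(piece))  # literal text between tokens
        elif piece == "*":
            regex.append(".*")              # wildcard: anything, including "/"
        else:
            regex.append("[^/]+")           # ":name": exactly one path segment
    return re.compile("".join(regex))


def first_match(allowlist: list[AllowlistItem], method: str, url: str) -> Optional[AllowlistItem]:
    """Return the first item whose URL pattern and method both match, else None."""
    for item in allowlist:
        if method.upper() in item.methods and pattern_to_regex(item.url).fullmatch(url):
            return item
    return None  # no match: the request would not be proxied


if __name__ == "__main__":
    allowlist = [
        AllowlistItem("http://example.com/foo", ["GET"]),
        AllowlistItem("http://example.com/*", ["GET", "POST"]),
    ]
    for method in ("GET", "POST", "DELETE"):
        print(method, first_match(allowlist, method, "http://example.com/foo"))
```

Because evaluation stops at the first match, more specific items (for example, ones that set headers) should be listed before broader wildcard items.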
Examples:
inbound:
  allowlist:
    # allow GET requests from http://example.com/foo (exact URL match)
    - url: http://example.com/foo
      methods: [GET]
    # allow GET or POST requests from any path on http://example.com
    - url: http://example.com/*
      methods: [GET, POST]
    # allow GET requests from a URL that looks like a GitHub Enterprise review comments URL, and add a bearer token to the request
    - url: http://example.com/api/v3/repos/:owner/:repo/pulls/:number/comments
      methods: [GET]
      setRequestHeaders:
        Authorization: "Bearer ...snip..."

Here's an example of allowing PR comments for a GitHub Enterprise instance hosted on https://git.example.com. Replace <GH TOKEN> with a GitHub PAT.
allowlist:
  - url: https://git.example.com/api/v3/repos/:owner/:repo
    methods: [GET]
    setRequestHeaders:
      Authorization: "Bearer <GH TOKEN>"
  - url: https://git.example.com/api/v3/repos/:owner/:repo/pulls
    methods: [GET]
    setRequestHeaders:
      Authorization: "Bearer <GH TOKEN>"
  - url: https://git.example.com/api/v3/repos/:owner/:repo/pulls/:number/comments
    methods: [POST]
    setRequestHeaders:
      Authorization: "Bearer <GH TOKEN>"
  - url: https://git.example.com/api/v3/repos/:owner/:repo/issues/:number/comments
    methods: [POST]
    setRequestHeaders:
      Authorization: "Bearer <GH TOKEN>"

The logging configuration section allows you to set additional logging options for requests that are proxied through the broker.
inbound:
  logging:
    logRequestBody: false # If true, the contents of any proxied HTTP request matching the allowlist will be logged in the request_body field in the proxy.request event
    logResponseBody: false # If true, the contents of any proxied HTTP response will be logged in the response_body field in the proxy.response event

Here's an example log output of curl -X POST -H "Content-Type: application/json" "https://httpbin.org/anything" -d '{"foo": "bar"}' being proxied through the network broker:
INFO[0006] request.start client_ip="::1" id=1 method=POST path="/proxy/https://httpbin.org/anything" query= user_agent=curl/8.2.1
INFO[0006] proxy.request allowlist_match="https://httpbin.org/*" client_ip="::1" destinationUrl="https://httpbin.org/anything" id=1 method=POST path="/proxy/https://httpbin.org/anything" query= request_body="{\"foo\": \"bar\"}" user_agent=curl/8.2.1
INFO[0006] proxy.response allowlist_match="https://httpbin.org/*" client_ip="::1" destinationUrl="https://httpbin.org/anything" id=1 method=POST path="/proxy/https://httpbin.org/anything" query= response_body="{\n \"args\": {}, \n \"data\": \"{\\\"foo\\\": \\\"bar\\\"}\", \n \"files\": {}, \n \"form\": {}, \n \"headers\": {\n \"Accept\": \"*/*\", \n \"Accept-Encoding\": \"gzip\", \n \"Content-Length\": \"14\", \n \"Content-Type\": \"application/json\", \n \"Host\": \"httpbin.org\", \n \"User-Agent\": \"curl/8.2.1\", \n \"X-Amzn-Trace-Id\": \"Root=1-650469a8-0032596526902b563d7e5ebc\"\n }, \n \"json\": {\n \"foo\": \"bar\"\n }, \n \"method\": \"POST\", \n \"origin\": \"::1, ...snip..., ...snip...\", \n \"url\": \"https://httpbin.org/anything\"\n}\n" user_agent=curl/8.2.1
INFO[0006] request.response body_size=511 client_ip="::1" id=1 latency=341.905708ms method=POST path="/proxy/https://httpbin.org/anything" query= status_code=200 user_agent=curl/8.2.1
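If you want to filter or aggregate these events, the lines above can be split into fields with a few lines of Python. This is a rough sketch that assumes the log format stays exactly as shown in the sample output (a level prefix, an event name, then key=value pairs with optional double quoting); the function name parse_log_line is hypothetical, and the format should not be treated as a stable interface.

```python
import shlex


def parse_log_line(line: str) -> dict[str, str]:
    """Split a broker log line into its level, event name, and key=value fields."""
    # shlex handles the double-quoted values and backslash-escaped quotes.
    level, event, *pairs = shlex.split(line)
    fields = {"level": level.split("[", 1)[0], "event": event}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        'INFO[0006] request.response body_size=511 client_ip="::1" id=1 '
        'latency=341.905708ms method=POST '
        'path="/proxy/https://httpbin.org/anything" query= status_code=200 '
        "user_agent=curl/8.2.1"
    )
    parsed = parse_log_line(sample)
    print(parsed["event"], parsed["status_code"], parsed["latency"])
```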
logRequestBody and logResponseBody can also be set on a per-allowlist basis:
inbound:
  allowlist:
    - url: https://httpbin.org/*
      methods: [GET, POST, DELETE]
      logRequestBody: true
      logResponseBody: true

The broker can be run in Kubernetes, as a bare Docker container, or simply as a standalone binary on a machine. If more than one instance of the broker is run at a time to manage availability, you may see some noise in the logs as the broker is not yet architected with this specific configuration in mind. However, it should still perform correctly without duplicating requests.
Config file(s) are passed to the app with -c:
semgrep-network-broker -c config.yaml

Multiple config files can be overlaid on top of each other by passing multiple -c args (e.g. semgrep-network-broker -c config1.yaml -c config2.yaml -c config3.yaml). Note that while maps will be merged together, arrays will be replaced.
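The merge rule can be illustrated with a short Python sketch operating on already-parsed config dictionaries. This is not the broker's code: the function name overlay is hypothetical, and the sketch assumes that later files take precedence over earlier ones. Use semgrep-network-broker dump to see the result the broker actually computes.

```python
from functools import reduce
from typing import Any


def overlay(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override onto base: maps merge recursively, everything else is replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)  # maps are merged key by key
        else:
            merged[key] = value  # arrays and scalars are replaced wholesale
    return merged


if __name__ == "__main__":
    config1 = {
        "inbound": {
            "allowlist": [{"url": "http://example.com/foo", "methods": ["GET"]}],
            "logging": {"logRequestBody": False},
        }
    }
    config2 = {
        "inbound": {
            "allowlist": [{"url": "http://example.com/*", "methods": ["GET", "POST"]}],
            "logging": {"logResponseBody": True},
        }
    }
    # Equivalent in spirit to passing -c config1.yaml -c config2.yaml.
    print(reduce(overlay, [config1, config2]))
```

In this sketch the logging map ends up with both keys, while the allowlist array from config2 replaces the one from config1 entirely. The practical consequence is that an allowlist split across several files is not concatenated.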
Requirements:
- internet access to wireguard.semgrep.dev on UDP port 51820
semgrep-network-broker dump dumps the current config. This is useful for seeing the result of overlaying multiple configuration files.
semgrep-network-broker genkey generates a base64 private key to stdout
semgrep-network-broker pubkey generates a base64 public key for a given private key (via stdin)
semgrep-network-broker relay launches an HTTP server that relays requests that match a certain rule.
outbound:
  listenPort: 8080
  relay:
    test:
      destinationUrl: https://httpbin.org/anything
      jsonPath: "$.foo"
      equals:
        - bar

This configuration would result in requests addressed to http://localhost:8080/relay/test being relayed to https://httpbin.org/anything as long as the jsonpath query $.foo, executed on the request body, yields the string bar.
Check out an example here for how to use the relay for GitHub PR comments.
You can also define additional relay mappings via the additionalConfigs field:
outbound:
  listenPort: 8080
  relay:
    test:
      destinationUrl: https://httpbin.org/anything
      jsonPath: "$.foo"
      equals:
        - bar
      additionalConfigs:
        - destinationUrl: https://example.com/fallback

The example above would relay traffic to https://httpbin.org/anything if the request body contains {"foo": "bar"}; otherwise, it would relay traffic to https://example.com/fallback.
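The routing decision described above can be sketched in Python as follows. This is a simplified model rather than the broker's implementation: RelayRule, lookup, and pick_destination are hypothetical names, the path lookup supports only plain dotted paths such as $.foo instead of full JSONPath, and it assumes that a rule without a jsonPath accepts every request and that additionalConfigs are tried in order after the primary rule.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RelayRule:
    destination_url: str
    json_path: Optional[str] = None
    equals: list[str] = field(default_factory=list)
    additional_configs: list["RelayRule"] = field(default_factory=list)


def lookup(doc: Any, path: str) -> Any:
    """Resolve a simple dotted path such as "$.foo.bar"; return None if absent."""
    current = doc
    for key in path.removeprefix("$").strip(".").split("."):
        if not key:
            continue
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(rule: RelayRule, body: Any) -> bool:
    if rule.json_path is None:
        return True  # assumed: a rule with no filter accepts every request
    return lookup(body, rule.json_path) in rule.equals


def pick_destination(rule: RelayRule, raw_body: str) -> Optional[str]:
    """Return the destination URL of the first matching rule, or None."""
    body = json.loads(raw_body)
    for candidate in [rule, *rule.additional_configs]:
        if matches(candidate, body):
            return candidate.destination_url
    return None


if __name__ == "__main__":
    rule = RelayRule(
        destination_url="https://httpbin.org/anything",
        json_path="$.foo",
        equals=["bar"],
        additional_configs=[RelayRule("https://example.com/fallback")],
    )
    print(pick_destination(rule, '{"foo": "bar"}'))
    print(pick_destination(rule, '{"foo": "something else"}'))
```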
For other questions or feedback, join us on the Semgrep Community Slack.