From 28b7cb0a69221532e5c11b0c6e238917fe4a0a7d Mon Sep 17 00:00:00 2001 From: Lukasz Gryglicki Date: Thu, 31 Jul 2025 10:41:02 +0200 Subject: [PATCH 1/4] Add support for skipping CLA requirement for bots Signed-off-by: Lukasz Gryglicki --- README.md | 4 + WHITELISTING_BOTS.md | 127 +++++++++ cla-backend-go/events/event_data.go | 17 ++ cla-backend-go/events/event_types.go | 2 + cla-backend-go/github/bots.go | 251 ++++++++++++++++++ cla-backend-go/github_organizations/models.go | 26 +- cla-backend-go/signatures/service.go | 9 +- .../swagger/common/github-organization.yaml | 25 ++ cla-backend-go/v2/sign/helpers.go | 20 +- cla-backend/cla/models/dynamo_models.py | 15 +- cla-backend/cla/models/event_types.py | 1 + cla-backend/cla/models/github_models.py | 219 ++++++++++++++- utils/describe_all.sh | 17 ++ utils/describe_table.sh | 11 +- utils/list_tables.sh | 10 + utils/skip_cla_entry.sh | 86 ++++++ 16 files changed, 816 insertions(+), 24 deletions(-) create mode 100644 WHITELISTING_BOTS.md create mode 100644 cla-backend-go/github/bots.go create mode 100755 utils/describe_all.sh create mode 100755 utils/list_tables.sh create mode 100755 utils/skip_cla_entry.sh diff --git a/README.md b/README.md index dfe4d040c..6adb618d0 100644 --- a/README.md +++ b/README.md @@ -61,6 +61,10 @@ The following diagram explains the EasyCLA architecture. ![CLA Architecture](.gitbook/assets/easycla-architecture-overview.png) +## Bot Whitelisting + +For whitelisting bots please see the [Whitelisting Bots](WHITELISTING_BOTS.md) documentation. + ## EasyCLA Release Process The following diagram illustrates the EasyCLA release process: diff --git a/WHITELISTING_BOTS.md b/WHITELISTING_BOTS.md new file mode 100644 index 000000000..f2a479390 --- /dev/null +++ b/WHITELISTING_BOTS.md @@ -0,0 +1,127 @@ +## Whitelisting Bots + +You can allow specific bot users to automatically pass the CLA check. 
+ +This can be done at the GitHub organization level by setting the `skip_cla` property on the `cla-{stage}-github-orgs` DynamoDB table. + +Replace `{stage}` with either `dev` or `prod`. + +This property is a Map attribute that maps a repository pattern to a bot username (GitHub login) pattern, an email pattern, and a name pattern. + +An example username/login is `lukaszgryglicki` (any username/login that can be accessed via `https://github.com/username`). + +An example name is "Lukasz Gryglicki". + +The email pattern and name pattern are optional; `*` is assumed for them if not specified. + +Each pattern is a string and can be one of three types (checked in this order): +- `"name"` - exact match for the repository name, GitHub login/username, email address, or GitHub name. +- `"re:regexp"` - regular expression search for the repository name, GitHub login/username, email address, or GitHub name. The search is unanchored, so use `^` and `$` to require a full match. +- `"*"` - matches all. + +So the format is `"repository_pattern": "github_username_pattern;email_pattern;name_pattern"`, with `;` as the separator. + +You can also specify multiple pattern sets, so that different users are matched by different sets. In that case the value must start with `[`, end with `]`, and separate the sets with `||`. + +For example: `"[copilot-swe-agent[bot];*;*||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`. + +One GitHub organization item in DynamoDB can hold multiple entries, one per repository pattern. + +Example: ``` { (...) "organization_name": { "S": "linuxfoundation" }, "skip_cla": { "M": { "*": { "S": "copilot-swe-agent[bot];*;*" }, "re:(?i)^repo[0-9]+$": { "S": "re:vee?rendra;*;*" } } }, (...) } ``` + +The matching algorithm is as follows: +- First, we check the repository name for an exact match. The repository name excludes the organization name, so for `https://github.com/linuxfoundation/easycla` it is just `easycla`. If `skip_cla` has an entry for `easycla`, that entry is used and the search stops. +- If no exact match is found, we check for a regular expression match.
Only keys starting with `re:` are considered. If a key matches, that entry is used and the search stops. +- If there is still no match, we check for a `*` entry. If it exists, that entry is used. +- If there is no `*` entry either, the CLA check is not skipped. +- Once an entry is found, its value has the format `github_username_pattern;email_pattern;name_pattern` or, as an array, `"[github_username_pattern;email_pattern;name_pattern||...]"`. +- The GitHub username/login, email address, and name are checked against the patterns using the same rules: each pattern can be an exact match, `re:regexp`, or `*`. +- If the username, email, and name all match their patterns, the CLA check is skipped for that actor. A username, email, or name that is not set only matches the `*` pattern. +- So setting the value to `username_pattern;*;*`, or equivalently just `username_pattern`, checks only the username and accepts any email and name. +- An actor that matches any of the entries in an array is skipped (logical OR). +- Setting the repository pattern to `*` applies the configuration to all repositories in the organization. It is only a fallback: repositories that match an exact or `re:` key use that entry instead. + + +The `utils/skip_cla_entry.sh` script updates the `skip_cla` property in the DynamoDB table. You can run it like this: +- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'github-username-pattern;email-pattern;name-pattern' ``. +- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot];*;*' ``. +- Complex example: `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]' ``. + +`MODE` can be one of: +- `put-item`: Overwrites/adds the entire `skip_cla` property. Needs all 3 arguments: org, repo, and pattern.
+- `add-key`: Adds or updates a key/value inside the `skip_cla` map (preserves other keys). Needs all 3 arguments: org, repo, and pattern. +- `delete-key`: Removes a key from the `skip_cla` map. Needs 2 arguments: org and repo. +- `delete-item`: Deletes the entire `skip_cla` property from the item. Needs 1 argument: org. + + +You can also use the AWS CLI to update the `skip_cla` property. Example commands: + +To set (or overwrite) the entire `skip_cla` map: + +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \ + --table-name "cla-prod-github-orgs" \ + --key '{"organization_name": {"S": "linuxfoundation"}}' \ + --update-expression 'SET skip_cla = :val' \ + --expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"copilot-swe-agent[bot];*;*"}}}}' +``` + +To add a key to an existing `skip_cla` map (or replace an existing key); the `skip_cla` map must already exist: + +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \ + --table-name "cla-prod-github-orgs" \ + --key '{"organization_name": {"S": "linuxfoundation"}}' \ + --update-expression "SET skip_cla.#repo = :val" \ + --expression-attribute-names '{"#repo": "re:^easycla"}' \ + --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot]"}}' +``` + +To delete a key from an existing `skip_cla` map: + +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \ + --table-name "cla-prod-github-orgs" \ + --key '{"organization_name": {"S": "linuxfoundation"}}' \ + --update-expression "REMOVE skip_cla.#repo" \ + --expression-attribute-names '{"#repo": "re:^easycla"}' +``` + +To delete the entire `skip_cla` property: + +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \ + --table-name "cla-prod-github-orgs" \ + --key '{"organization_name": {"S": "linuxfoundation"}}' \ + --update-expression "REMOVE skip_cla" +``` + +To see a given organization's entry: `./utils/scan.sh github-orgs organization_name sun-test-org`.
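The repository-resolution and actor-matching rules described above can be exercised with the self-contained Go sketch below. It is an illustration, not the `bots.go` code in this patch: the function names `matches`, `resolveConfig` and `actorSkipped` are hypothetical, the sketch takes plain strings instead of `UserCommitSummary` values, and it sorts the `re:` keys so that its result is deterministic.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Sketch only: matches, resolveConfig and actorSkipped are hypothetical names
// illustrating the documented skip_cla rules; this is not the bots.go code.

// matches applies one pattern to one value: "*" matches anything (even an
// empty value), "re:..." is an unanchored regexp search on a non-empty value,
// and any other pattern must equal the value exactly.
func matches(pattern, value string) bool {
	if pattern == "*" {
		return true
	}
	if value == "" {
		return false
	}
	if strings.HasPrefix(pattern, "re:") {
		re, err := regexp.Compile(pattern[3:])
		return err == nil && re.MatchString(value)
	}
	return value == pattern
}

// resolveConfig picks the skip_cla value for a repository name (without the
// org): exact key first, then "re:" keys (sorted here for determinism), then
// the "*" fallback. The bool is false when nothing applies.
func resolveConfig(skipCLA map[string]string, repo string) (string, bool) {
	if v, ok := skipCLA[repo]; ok {
		return v, true
	}
	keys := make([]string, 0, len(skipCLA))
	for k := range skipCLA {
		if strings.HasPrefix(k, "re:") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		if re, err := regexp.Compile(k[3:]); err == nil && re.MatchString(repo) {
			return skipCLA[k], true
		}
	}
	v, ok := skipCLA["*"]
	return v, ok
}

// actorSkipped reports whether login/email/name match ANY entry of a config
// value, which is either "login;email;name" or "[entry||entry||...]".
func actorSkipped(config, login, email, name string) bool {
	config = strings.TrimSpace(config)
	entries := []string{config}
	if len(config) >= 2 && strings.HasPrefix(config, "[") && strings.HasSuffix(config, "]") {
		entries = strings.Split(config[1:len(config)-1], "||")
	}
	for _, entry := range entries {
		parts := strings.Split(strings.TrimSpace(entry), ";")
		for len(parts) < 3 {
			parts = append(parts, "*") // missing email/name patterns default to "*"
		}
		if matches(parts[0], login) && matches(parts[1], email) && matches(parts[2], name) {
			return true
		}
	}
	return false
}

func main() {
	// The skip_cla map from the DynamoDB example in this document.
	skipCLA := map[string]string{
		"*":                   "copilot-swe-agent[bot];*;*",
		"re:(?i)^repo[0-9]+$": "re:vee?rendra;*;*",
	}
	for _, repo := range []string{"easycla", "repo12"} {
		cfg, _ := resolveConfig(skipCLA, repo)
		fmt.Printf("%s -> %q, copilot skipped: %v\n", repo, cfg,
			actorSkipped(cfg, "copilot-swe-agent[bot]", "", ""))
	}
}
```

With the example `skip_cla` map, `easycla` falls back to the `*` entry, while `repo12` resolves to the `re:(?i)^repo[0-9]+$` entry and therefore does not inherit the `*` entry: a matching repository key replaces the wildcard rather than merging with it. Sorting the `re:` keys is a choice made for this sketch only; `SkipWhitelistedBots` ranges over a Go map, whose iteration order is unspecified, so when several `re:` keys match the same repository, which one is used is not guaranteed.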
+ +To see an organization's entry using the AWS CLI: + +``` +aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"linuxfoundation\"}}" --max-items 100 | jq -r '.Items' +``` + +To find log entries related to skipped CLA checks, use: `` STAGE=dev DTFROM='1 hour ago' DTTO='1 second ago' ./utils/search_aws_log_group.sh 'cla-backend-dev-githubactivity' 'skip_cla' ``. + diff --git a/cla-backend-go/events/event_data.go b/cla-backend-go/events/event_data.go index 04653d498..df4944a04 100644 --- a/cla-backend-go/events/event_data.go +++ b/cla-backend-go/events/event_data.go @@ -457,6 +457,23 @@ type CorporateSignatureSignedEventData struct { SignatoryName string } +// BypassCLAEventData event data model +type BypassCLAEventData struct { + Repo string + Config string + Actor string +} + +func (ed *BypassCLAEventData) GetEventDetailsString(args *LogEventArgs) (string, bool) { + data := fmt.Sprintf("repo='%s', config='%s', actor='%s'", ed.Repo, ed.Config, ed.Actor) + return data, true +} + +func (ed *BypassCLAEventData) GetEventSummaryString(args *LogEventArgs) (string, bool) { + data := fmt.Sprintf("repo='%s', config='%s', actor='%s'", ed.Repo, ed.Config, ed.Actor) + return data, true +} + func (ed *CorporateSignatureSignedEventData) GetEventDetailsString(args *LogEventArgs) (string, bool) { data := fmt.Sprintf("The signature was signed for the project %s and company %s by %s", args.ProjectName, ed.CompanyName, ed.SignatoryName) if args.UserName != "" { diff --git a/cla-backend-go/events/event_types.go b/cla-backend-go/events/event_types.go index cc6435a1c..8b762b4c6 100644 --- a/cla-backend-go/events/event_types.go +++ b/cla-backend-go/events/event_types.go @@ -99,4 +99,6 @@ const ( IndividualSignatureSigned = "individual.signature.signed" CorporateSignatureSigned = "corporate.signature.signed" + + BypassCLA = "Bypass CLA" ) diff --git
a/cla-backend-go/github/bots.go b/cla-backend-go/github/bots.go new file mode 100644 index 000000000..5e13a1abb --- /dev/null +++ b/cla-backend-go/github/bots.go @@ -0,0 +1,251 @@ +// Copyright The Linux Foundation and each contributor to CommunityBridge. +// SPDX-License-Identifier: MIT + +package github + +import ( + "fmt" + "regexp" + "strings" + + "github.com/linuxfoundation/easycla/cla-backend-go/events" + "github.com/linuxfoundation/easycla/cla-backend-go/gen/v1/models" + log "github.com/linuxfoundation/easycla/cla-backend-go/logging" + "github.com/sirupsen/logrus" +) + +// propertyMatches returns true if value matches the pattern. +// - "*" matches anything +// - "re:..." matches regex (value must be non-empty) +// - otherwise, exact match +func propertyMatches(pattern, value string) bool { + f := logrus.Fields{ + "functionName": "github.propertyMatches", + "pattern": pattern, + "value": value, + } + if pattern == "*" { + return true + } + if value == "" { + return false + } + if strings.HasPrefix(pattern, "re:") { + regex := pattern[3:] + re, err := regexp.Compile(regex) + if err != nil { + log.WithFields(f).Debugf("Error in propertyMatches: bad regexp: %s, error: %v", regex, err) + return false + } + return re.MatchString(value) + } + return value == pattern +} + +// stripOrg removes the organization part from the repository name. +// If input is "org/repo", returns "repo". If no "/", returns input unchanged. +func stripOrg(repoFull string) string { + idx := strings.Index(repoFull, "/") + if idx >= 0 && idx+1 < len(repoFull) { + return repoFull[idx+1:] + } + return repoFull +} + +// isActorSkipped returns true if the actor should be skipped according to ANY pattern in config. 
+// Each config entry is "github_username_pattern;email_pattern;name_pattern" +// Any missing pattern defaults to "*" func isActorSkipped(actor *UserCommitSummary, config []string) bool { + for _, pattern := range config { + parts := strings.Split(pattern, ";") + for len(parts) < 3 { + parts = append(parts, "*") + } + loginPattern, emailPattern, namePattern := parts[0], parts[1], parts[2] + + var login, email, name string + if actor != nil && actor.CommitAuthor != nil { + if actor.CommitAuthor.Login != nil { + login = *actor.CommitAuthor.Login + } + if actor.CommitAuthor.Email != nil { + email = *actor.CommitAuthor.Email + } + if actor.CommitAuthor.Name != nil { + name = *actor.CommitAuthor.Name + } + } + + if propertyMatches(loginPattern, login) && + propertyMatches(emailPattern, email) && + propertyMatches(namePattern, name) { + return true + } + } + return false +} + +// actorToString converts a UserCommitSummary actor to a string representation. +func actorToString(actor *UserCommitSummary) string { + const nullStr = "(null)" + if actor == nil { + return nullStr + } + id, login, username, email := nullStr, nullStr, nullStr, nullStr + if actor.CommitAuthor != nil && actor.CommitAuthor.ID != nil { + id = fmt.Sprintf("%v", *actor.CommitAuthor.ID) + } + if actor.CommitAuthor != nil && actor.CommitAuthor.Login != nil { + login = *actor.CommitAuthor.Login + } + if actor.CommitAuthor != nil && actor.CommitAuthor.Name != nil { + username = *actor.CommitAuthor.Name + } + if actor.CommitAuthor != nil && actor.CommitAuthor.Email != nil { + email = *actor.CommitAuthor.Email + } + return fmt.Sprintf("id='%v',login='%v',username='%v',email='%v'", id, login, username, email) +} + +// parseConfigPatterns takes a config string and returns a slice of pattern strings. +// If the config starts with '[' and ends with ']', splits by '||' inside; else returns []string{config}. +// Trims whitespace from each pattern.
+func parseConfigPatterns(config string) []string { + config = strings.TrimSpace(config) + if len(config) >= 2 && strings.HasPrefix(config, "[") && strings.HasSuffix(config, "]") { + inner := config[1 : len(config)-1] + parts := strings.Split(inner, "||") + for i, p := range parts { + parts[i] = strings.TrimSpace(p) + } + return parts + } + return []string{config} +} + +// SkipWhitelistedBots checks whether the actors are whitelisted based on the skip_cla configuration. +// Returns two lists: +// - actors still missing cla: actors who still need to sign the CLA after checking skip_cla +// - whitelisted actors: actors who are skipped due to skip_cla configuration +// :param orgModel: The GitHub organization model instance. +// :param orgRepo: The repository name in the format 'org/repo'. +// :param actorsMissingCLA: List of UserCommitSummary objects representing actors who are missing CLA. +// :return: two arrays (actors still missing CLA, whitelisted actors) +// : in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure: +// +// { +// "repo-name": "github_username_pattern;email_pattern;name_pattern", +// "re:repo-regexp": "[github_username_pattern;email_pattern;name_pattern||...]", +// "*": "github_username_pattern;email_pattern;name_pattern" +// } +// +// where: +// - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo") +// - re:repo-regexp is a regex pattern to match repository names +// - * is a wildcard that applies to all repositories +// - github_username_pattern is a GitHub username pattern (exact match or regex prefixed by re: or match all '*') +// - email_pattern is a GitHub email pattern (exact match or regex prefixed by re: or match all '*'); defaults to '*' if not specified +// - name_pattern is a GitHub name pattern (exact match or regex prefixed by re: or match all '*'); defaults to '*' if not specified +// The username/login, email and name patterns are separated by a semicolon (;). Email and name parts are optional. +// There can be an array of patterns for a single repository, separated by ||.
It must start with a '[' and end with a ']': "[...||...||...]" +// If the skip_cla is not set, it will skip the whitelisted bots check. +func SkipWhitelistedBots(ev events.Service, orgModel *models.GithubOrganization, orgRepo, projectID string, actorsMissingCLA []*UserCommitSummary) ([]*UserCommitSummary, []*UserCommitSummary) { + repo := stripOrg(orgRepo) + f := logrus.Fields{ + "functionName": "github.SkipWhitelistedBots", + "orgRepo": orgRepo, + "repo": repo, + "projectID": projectID, + } + outActorsMissingCLA := []*UserCommitSummary{} + whitelistedActors := []*UserCommitSummary{} + + skipCLA := orgModel.SkipCla + if skipCLA == nil { + log.WithFields(f).Debug("skip_cla is not set, skipping whitelisted bots check") + return actorsMissingCLA, []*UserCommitSummary{} + } + + var config string + // 1. Exact match + if val, ok := skipCLA[repo]; ok { + config = val + log.WithFields(f).Debugf("skip_cla config found for repo (exact hit): '%s'", config) + } + + // 2. Regex match (if no exact hit) + if config == "" { + log.WithFields(f).Debug("No skip_cla config found for repo, checking regex patterns") + for k, v := range skipCLA { + if !strings.HasPrefix(k, "re:") { + continue + } + pattern := k[3:] + re, err := regexp.Compile(pattern) + if err != nil { + log.WithFields(f).Warnf("Invalid regex in skip_cla: '%s': %+v", pattern, err) + continue + } + if re.MatchString(repo) { + config = v + log.WithFields(f).Debugf("Found skip_cla config for repo via regex pattern: '%s'", config) + break + } + } + } + + // 3. Wildcard fallback + if config == "" { + if val, ok := skipCLA["*"]; ok { + config = val + log.WithFields(f).Debugf("No skip_cla config found for repo, using wildcard config: '%s'", config) + } + } + + // 4. 
No match + if config == "" { + log.WithFields(f).Debug("No skip_cla config found for repo, skipping whitelisted bots check") + return actorsMissingCLA, []*UserCommitSummary{} + } + + configArray := parseConfigPatterns(config) + + // Log full configuration + actorDebugData := make([]string, 0, len(actorsMissingCLA)) + for _, a := range actorsMissingCLA { + actorDebugData = append(actorDebugData, actorToString(a)) + } + log.WithFields(f).Debugf("final skip_cla config for repo %s is %+v; actorsMissingCLA: [%s]", orgRepo, configArray, strings.Join(actorDebugData, ", ")) + + for _, actor := range actorsMissingCLA { + if actor == nil { + continue + } + actorData := actorToString(actor) + log.WithFields(f).Debugf("Checking actor: %s for skip_cla config: %+v", actorData, configArray) + if isActorSkipped(actor, configArray) { + msg := fmt.Sprintf( + "Skipping CLA check for repo='%s', actor: %s due to skip_cla config: %+v", + orgRepo, actorData, configArray, + ) + log.WithFields(f).Info(msg) + eventData := events.BypassCLAEventData{ + Repo: orgRepo, + Config: config, + Actor: actorData, + } + ev.LogEvent(&events.LogEventArgs{ + EventType: events.BypassCLA, + EventData: &eventData, + ProjectID: projectID, + }) + log.WithFields(f).Debugf("event logged") + actor.Authorized = true + whitelistedActors = append(whitelistedActors, actor) + } else { + outActorsMissingCLA = append(outActorsMissingCLA, actor) + } + } + + return outActorsMissingCLA, whitelistedActors +} diff --git a/cla-backend-go/github_organizations/models.go b/cla-backend-go/github_organizations/models.go index 7e11e7954..6ec8e0167 100644 --- a/cla-backend-go/github_organizations/models.go +++ b/cla-backend-go/github_organizations/models.go @@ -7,18 +7,19 @@ import "github.com/linuxfoundation/easycla/cla-backend-go/gen/v1/models" // GithubOrganization is data model for github organizations type GithubOrganization struct { - DateCreated string `json:"date_created,omitempty"` - DateModified string 
`json:"date_modified,omitempty"` - OrganizationInstallationID int64 `json:"organization_installation_id,omitempty"` - OrganizationName string `json:"organization_name,omitempty"` - OrganizationNameLower string `json:"organization_name_lower,omitempty"` - OrganizationSFID string `json:"organization_sfid,omitempty"` - ProjectSFID string `json:"project_sfid"` - Enabled bool `json:"enabled"` - AutoEnabled bool `json:"auto_enabled"` - BranchProtectionEnabled bool `json:"branch_protection_enabled"` - AutoEnabledClaGroupID string `json:"auto_enabled_cla_group_id,omitempty"` - Version string `json:"version,omitempty"` + DateCreated string `json:"date_created,omitempty"` + DateModified string `json:"date_modified,omitempty"` + OrganizationInstallationID int64 `json:"organization_installation_id,omitempty"` + OrganizationName string `json:"organization_name,omitempty"` + OrganizationNameLower string `json:"organization_name_lower,omitempty"` + OrganizationSFID string `json:"organization_sfid,omitempty"` + ProjectSFID string `json:"project_sfid"` + Enabled bool `json:"enabled"` + AutoEnabled bool `json:"auto_enabled"` + BranchProtectionEnabled bool `json:"branch_protection_enabled"` + AutoEnabledClaGroupID string `json:"auto_enabled_cla_group_id,omitempty"` + Version string `json:"version,omitempty"` + SkipCLA map[string]string `json:"skip_cla,omitempty"` } // ToModel converts to models.GithubOrganization @@ -35,6 +36,7 @@ func ToModel(in *GithubOrganization) *models.GithubOrganization { AutoEnabledClaGroupID: in.AutoEnabledClaGroupID, BranchProtectionEnabled: in.BranchProtectionEnabled, ProjectSFID: in.ProjectSFID, + SkipCla: in.SkipCLA, } } diff --git a/cla-backend-go/signatures/service.go b/cla-backend-go/signatures/service.go index 1b44f395a..81f2ef8c0 100644 --- a/cla-backend-go/signatures/service.go +++ b/cla-backend-go/signatures/service.go @@ -1114,6 +1114,13 @@ func (s service) updateChangeRequest(ctx context.Context, ghOrg *models.GithubOr } 
log.WithFields(f).Debugf("commit authors status => signed: %+v and missing: %+v", signed, unsigned) + var whitelisted []*github.UserCommitSummary + unsigned, whitelisted = github.SkipWhitelistedBots(s.eventsService, ghOrg, gitHubRepoName, projectID, unsigned) + if len(whitelisted) > 0 { + log.WithFields(f).Debugf("adding %d whitelisted actors to signed list", len(whitelisted)) + signed = append(signed, whitelisted...) + } + log.WithFields(f).Debugf("commit authors status after whitelisting bots => signed: %+v, missing: %+v, whitelisted: %+v", signed, unsigned, whitelisted) // update pull request updateErr := github.UpdatePullRequest(ctx, ghOrg.OrganizationInstallationID, int(pullRequestID), gitHubOrgName, gitHubRepoName, githubRepository.ID, *latestSHA, signed, unsigned, s.claBaseAPIURL, s.claLandingPage, s.claLogoURL) @@ -1132,7 +1139,7 @@ func (s service) updateChangeRequest(ctx context.Context, ghOrg *models.GithubOr // true, true, nil if user has an ECLA (authorized, with company affiliation, no error) func (s service) HasUserSigned(ctx context.Context, user *models.User, projectID string) (*bool, *bool, error) { f := logrus.Fields{ - "functionName": "v1.signatures.service.updateChangeRequest", + "functionName": "v1.signatures.service.HasUserSigned", "projectID": projectID, "user": user, } diff --git a/cla-backend-go/swagger/common/github-organization.yaml b/cla-backend-go/swagger/common/github-organization.yaml index d66f4d5b0..c2abf202c 100644 --- a/cla-backend-go/swagger/common/github-organization.yaml +++ b/cla-backend-go/swagger/common/github-organization.yaml @@ -69,7 +69,32 @@ properties: x-nullable: true example: "https://github.com/organizations/deal-test-org-2/settings/installations/1235464" format: uri + skipCla: + type: object + additionalProperties: + type: string + description: | + Map of repository name or pattern (e.g. 'repo1', '*', 're:pattern') to a string or array-string of pattern entries for skipping CLA checks for certain bots. 
+ + Each value can be either: + - A string in the form 'username_pattern;email_pattern;name_pattern' (email and name patterns are optional, default to '*'). + - Or an OR-array in the form '[entry||entry||...]', where each entry uses the same pattern format above. + + Patterns can be: + - An exact match (e.g. 'repo1', 'username', 'email@domain'). + - A wildcard '*' to match all. + - A regular expression prefixed with 're:' (e.g. 're:(?i)^bot.*'). + Example formats: + - "copilot-swe-agent[bot];*;*" + - "re:vee?rendra;*;*" + - "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]" + - "username;*" + - "username;email@domain.com;Real Name" + example: + "*": "copilot-swe-agent[bot];*;*" + "repo1": "re:vee?rendra;*;*" + "re:(?i)^repo[0-9]+$": "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]" repositories: type: object properties: diff --git a/cla-backend-go/v2/sign/helpers.go b/cla-backend-go/v2/sign/helpers.go index 8e785b6db..9a77a669a 100644 --- a/cla-backend-go/v2/sign/helpers.go +++ b/cla-backend-go/v2/sign/helpers.go @@ -41,6 +41,15 @@ func (s service) updateChangeRequest(ctx context.Context, installationID, reposi return errors.New(msg) } + var ghOrg *models.GithubOrganization + if githubRepository.Owner.Login != nil { + var ghOrgErr error + ghOrg, ghOrgErr = s.githubOrgService.GetGitHubOrganizationByName(ctx, *githubRepository.Owner.Login) + if ghOrgErr != nil { + log.WithFields(f).WithError(ghOrgErr).Warnf("unable to lookup GitHub organization by name: %s - skipping whitelisted bots check", *githubRepository.Owner.Login) + } + } + gitHubOrgName := utils.StringValue(githubRepository.Owner.Login) gitHubRepoName := utils.StringValue(githubRepository.Name) @@ -138,6 +147,15 @@ func (s service) updateChangeRequest(ctx context.Context, installationID, reposi } log.WithFields(f).Debugf("commit authors status => signed: %+v and missing: %+v", signed, unsigned) + if ghOrg != nil { + var whitelisted []*github.UserCommitSummary + unsigned, whitelisted =
github.SkipWhitelistedBots(s.eventsService, ghOrg, gitHubRepoName, projectID, unsigned) + if len(whitelisted) > 0 { + log.WithFields(f).Debugf("adding %d whitelisted actors to signed list", len(whitelisted)) + signed = append(signed, whitelisted...) + } + log.WithFields(f).Debugf("commit authors status after whitelisting bots => signed: %+v, missing: %+v, whitelisted: %+v", signed, unsigned, whitelisted) + } // update pull request updateErr := github.UpdatePullRequest(ctx, installationID, int(pullRequestID), gitHubOrgName, gitHubRepoName, githubRepository.ID, *latestSHA, signed, unsigned, s.ClaV1ApiURL, s.claLandingPage, s.claLogoURL) @@ -156,7 +174,7 @@ func (s service) updateChangeRequest(ctx context.Context, installationID, reposi // true, true, nil if user has an ECLA (authorized, with company affiliation, no error) func (s service) hasUserSigned(ctx context.Context, user *models.User, projectID string) (*bool, *bool, error) { f := logrus.Fields{ - "functionName": "v1.signatures.service.updateChangeRequest", + "functionName": "v1.signatures.service.hasUserSigned", "projectID": projectID, "user": user, } diff --git a/cla-backend/cla/models/dynamo_models.py b/cla-backend/cla/models/dynamo_models.py index cf900b733..5424b2604 100644 --- a/cla-backend/cla/models/dynamo_models.py +++ b/cla-backend/cla/models/dynamo_models.py @@ -3901,6 +3901,7 @@ class Meta: branch_protection_enabled = BooleanAttribute(null=True) enabled = BooleanAttribute(null=True) note = UnicodeAttribute(null=True) + skip_cla = MapAttribute(of=UnicodeAttribute, null=True) class GitHubOrg(model_interfaces.GitHubOrg): # pylint: disable=too-many-public-methods @@ -3910,7 +3911,7 @@ class GitHubOrg(model_interfaces.GitHubOrg): # pylint: disable=too-many-public- def __init__( self, organization_name=None, organization_installation_id=None, organization_sfid=None, - auto_enabled=False, branch_protection_enabled=False, note=None, enabled=True + auto_enabled=False, branch_protection_enabled=False, 
note=None, enabled=True, skip_cla=None, ): super(GitHubOrg).__init__() self.model = GitHubOrgModel() @@ -3923,6 +3924,7 @@ def __init__( self.model.branch_protection_enabled = branch_protection_enabled self.model.note = note self.model.enabled = enabled + self.model.skip_cla = skip_cla def __str__(self): return ( @@ -3933,8 +3935,9 @@ def __str__(self): f'organization company id: {self.model.organization_company_id}, ' f'auto_enabled: {self.model.auto_enabled},' f'branch_protection_enabled: {self.model.branch_protection_enabled},' - f'note: {self.model.note}' - f'enabled: {self.model.enabled}' + f'note: {self.model.note},' + f'enabled: {self.model.enabled},' + f'skip_cla: {self.model.skip_cla}' ) def to_dict(self): @@ -3980,6 +3983,9 @@ def get_auto_enabled(self): def get_branch_protection_enabled(self): return self.model.branch_protection_enabled + def get_skip_cla(self): + return self.model.skip_cla + def get_note(self): """ Getter for the note. @@ -4017,6 +4023,9 @@ def set_auto_enabled(self, auto_enabled): def set_branch_protection_enabled(self, branch_protection_enabled): self.model.branch_protection_enabled = branch_protection_enabled + def set_skip_cla(self, skip_cla): + self.model.skip_cla = skip_cla + def set_note(self, note): self.model.note = note diff --git a/cla-backend/cla/models/event_types.py b/cla-backend/cla/models/event_types.py index 4f6ca5d66..b70574e7e 100644 --- a/cla-backend/cla/models/event_types.py +++ b/cla-backend/cla/models/event_types.py @@ -48,3 +48,4 @@ class EventType(Enum): RepositoryRemoved = "Repository Removed" RepositoryDisable = "Repository Disabled" RepositoryEnabled = "Repository Enabled" + BypassCLA = "Bypass CLA" diff --git a/cla-backend/cla/models/github_models.py b/cla-backend/cla/models/github_models.py index cec72be9f..e660a8f3c 100644 --- a/cla-backend/cla/models/github_models.py +++ b/cla-backend/cla/models/github_models.py @@ -7,19 +7,21 @@ import concurrent.futures import json import os +import re import base64 
import binascii import threading import time import uuid -from typing import List, Optional, Union +from typing import List, Optional, Union, Tuple import cla import falcon import github from cla.controllers.github_application import GitHubInstallation from cla.models import DoesNotExist, repository_service_interface -from cla.models.dynamo_models import GitHubOrg, Repository +from cla.models.dynamo_models import GitHubOrg, Repository, Event +from cla.models.event_types import EventType from cla.user import UserCommitSummary from cla.utils import (append_project_version_to_url, get_project_instance, set_active_pr_metadata) @@ -595,7 +597,7 @@ def update_merge_group_status( f"signing url: {sign_url}" ) cla.log.warning( - "{fn} - This is an error condition - " + f"{fn} - This is an error condition - " f"should have at least one committer in one of these lists: " f"{len(signed)} passed, {missing}" ) @@ -736,6 +738,12 @@ def update_merge_group(self, installation_id, github_repository_id, merge_group_ for user_commit_summary in commit_authors: handle_commit_from_user(project, user_commit_summary, signed, missing) + # Skip whitelisted bots per org/repo GitHub login/email regexps + missing, whitelisted = self.skip_whitelisted_bots(github_org, repository.get_repository_name(), missing) + if whitelisted is not None and len(whitelisted) > 0: + cla.log.debug(f"{fn} - adding {len(whitelisted)} whitelisted actors to signed list") + signed.extend(whitelisted) + # update Merge group status self.update_merge_group_status( installation_id, github_repository_id, pull_request, merge_group_sha, signed, missing, project.get_version() @@ -896,6 +904,11 @@ def update_change_request(self, installation_id, github_repository_id, change_re for future in concurrent.futures.as_completed(futures): cla.log.debug(f"{fn} - ThreadClosed for handle_commit_from_user") + # Skip whitelisted bots per org/repo GitHub login/email regexps + missing, whitelisted = self.skip_whitelisted_bots(github_org, 
repository.get_repository_name(), missing) + if whitelisted is not None and len(whitelisted) > 0: + cla.log.debug(f"{fn} - adding {len(whitelisted)} whitelisted actors to signed list") + signed.extend(whitelisted) # At this point, the signed and missing lists are now filled and updated with the commit user info cla.log.debug( @@ -915,6 +928,204 @@ def update_change_request(self, installation_id, github_repository_id, change_re project_version=project.get_version(), ) + def property_matches(self, pattern, value): + """ + Returns True if value matches the pattern. + - '*' matches anything + - 're:...' matches regex (unanchored re.search) - value must be set + - otherwise, exact match + """ + try: + if pattern == '*': + return True + if value is None or value == '': + return False + if pattern.startswith('re:'): + regex = pattern[3:] + return re.search(regex, value) is not None + return value == pattern + except Exception as exc: + cla.log.warning("Error in property_matches: pattern=%s, value=%s, exc=%s", pattern, value, exc) + return False + + def is_actor_skipped(self, actor, config): + """ + Returns True if the actor should be skipped (whitelisted) based on the config pattern(s). + config: 'github_username_pattern;email_pattern;name_pattern', or a list of such strings + If any pattern is missing, it defaults to '*' + It returns True if ANY config entry matches, or False if no config entry matches.
+ """ + try: + # If config is a list/array, check all + if isinstance(config, (list, tuple)): + for entry in config: + if self.is_actor_skipped(actor, entry): + return True + return False + # Otherwise, treat as string pattern + parts = config.split(';') + while len(parts) < 3: + parts.append('*') + login_pattern, email_pattern, name_pattern = parts[:3] + login = getattr(actor, "author_login", None) + email = getattr(actor, "author_email", None) + name = getattr(actor, "author_username", None) + return ( + self.property_matches(login_pattern, login) and + self.property_matches(email_pattern, email) and + self.property_matches(name_pattern, name) + ) + except Exception as exc: + cla.log.warning("Error in is_actor_skipped: config=%s, actor=%s, exc=%s", config, actor, exc) + return False + + + def strip_org(self, repo_full): + """ + Removes the organization part from the repository name. + """ + if '/' in repo_full: + return repo_full.split('/', 1)[1] + return repo_full + + def parse_config_patterns(self, config): + """ + Returns a list of pattern strings. + - If config starts with '[' and ends with ']', splits by '||'. + - Otherwise, returns [config]. + """ + config = config.strip() + if config.startswith('[') and config.endswith(']'): + inner = config[1:-1] + return [p.strip() for p in inner.split('||')] + else: + return [config] + + + def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tuple[List[UserCommitSummary], List[UserCommitSummary]]: + """ + Check if the actors are whitelisted based on the skip_cla configuration. + Returns a tuple of two lists: + - actors_missing_cla: actors who still need to sign the CLA after checking skip_cla + - whitelisted_actors: actors who are skipped due to skip_cla configuration + :param org_model: The GitHub organization model instance. + :param org_repo: The repository name in the format 'org/repo'. + :param actors_missing_cla: List of UserCommitSummary objects representing actors who are missing CLA. 
+ :return: Tuple of (actors_missing_cla, whitelisted_actors) + : in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure: + { + "repo-name": ";;", + "re:repo-regexp": "[;;||...]", + "*": "" + } + where: + - repo-name is the exact repository name under the given org (e.g., "my-repo" not "my-org/my-repo") + - re:repo-regexp is a regex pattern to match repository names + - * is a wildcard that applies to all repositories + - is a GitHub username pattern (exact match or regex prefixed by re: or match all '*') + - is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set + - is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set + :note: The username/login, email and name patterns are separated by a semicolon (;). + :note: There can be an array of patterns - it must start with [, end with ] and be || separated. + :note: If skip_cla is not set, the whitelisted bots check is skipped. + """ + try: + repo = self.strip_org(org_repo) + skip_cla = org_model.get_skip_cla() + if skip_cla is None: + cla.log.debug("skip_cla is not set on '%s', skipping whitelisted bots check", org_repo) + return actors_missing_cla, [] + + if hasattr(skip_cla, "as_dict"): + skip_cla = skip_cla.as_dict() + config = '' + # 1. Exact match + if repo in skip_cla: + cla.log.debug("skip_cla config found for repo %s: %s (exact hit)", org_repo, skip_cla[repo]) + config = skip_cla[repo] + + # 2.
Regex pattern (if no exact hit) + if config == '': + cla.log.debug("No skip_cla config found for repo %s, checking regex patterns", org_repo) + for k, v in skip_cla.items(): + if not isinstance(k, str) or not k.startswith("re:"): + continue + pattern = k[3:] + try: + if re.search(pattern, repo): + config = v + cla.log.debug("Found skip_cla config for repo %s: %s via regex pattern: %s", org_repo, config, pattern) + break + except re.error as e: + cla.log.warning("Invalid regex in skip_cla: %s (%s) for repo: %s", k, e, org_repo) + continue + + # 3. Wildcard fallback + if config == '' and '*' in skip_cla: + cla.log.debug("No skip_cla config found for repo %s, using wildcard config", org_repo) + config = skip_cla['*'] + + # 4. No match + if config == '': + cla.log.debug("No skip_cla config found for repo %s, skipping whitelisted bots check", org_repo) + return actors_missing_cla, [] + + actor_debug_data = [ + f"id='{getattr(a, 'author_id', '(null)')}'," + f"login='{getattr(a, 'author_login', '(null)')}'," + f"username='{getattr(a, 'author_username', '(null)')}'," + f"email='{getattr(a, 'author_email', '(null)')}'" + for a in actors_missing_cla + ] + config = self.parse_config_patterns(config) + cla.log.debug("final skip_cla config for repo %s is %s; actors_missing_cla: [%s]", org_repo, config, ", ".join(actor_debug_data)) + out_actors_missing_cla = [] + whitelisted_actors = [] + for actor in actors_missing_cla: + if actor is None: + continue + try: + actor_data = "id='{}',login='{}',username='{}',email='{}'".format( + getattr(actor, "author_id", "(null)"), + getattr(actor, "author_login", "(null)"), + getattr(actor, "author_username", "(null)"), + getattr(actor, "author_email", "(null)"), + ) + cla.log.debug("Checking actor: %s for skip_cla config: %s", actor_data, config) + if self.is_actor_skipped(actor, config): + msg = "Skipping CLA check for repo='{}', actor: {} due to skip_cla config: '{}'".format( + org_repo, + actor_data, + config, + ) + cla.log.info(msg) + 
Event.create_event( + event_type=EventType.BypassCLA, + event_data=msg, + event_summary=msg, + event_user_name=actor_data, + contains_pii=True, + ) + actor.authorized = True + whitelisted_actors.append(actor) + continue + except Exception as e: + cla.log.warning( + "Error checking skip_cla for actor '%s' (login='%s', email='%s'): %s", + actor, getattr(actor, "author_login", None), getattr(actor, "author_email", None), e, + ) + out_actors_missing_cla.append(actor) + + return out_actors_missing_cla, whitelisted_actors + except Exception as exc: + cla.log.error( + "Exception in skip_whitelisted_bots: %s (repo=%s, actors=%s). Disabling skip_cla logic for this run.", + exc, org_repo, actors_missing_cla + ) + # Always return all actors if something breaks + return actors_missing_cla, [] + + def get_pull_request(self, github_repository_id, pull_request_number, installation_id): """ Helper method to get the pull request object from GitHub. @@ -1744,7 +1955,7 @@ def update_pull_request( f"signing url: {sign_url}" ) cla.log.warning( - "{fn} - This is an error condition - " + f"{fn} - This is an error condition - " f"should have at least one committer in one of these lists: " f"{len(signed)} passed, {missing}" ) diff --git a/utils/describe_all.sh b/utils/describe_all.sh new file mode 100755 index 000000000..0f23cd39c --- /dev/null +++ b/utils/describe_all.sh @@ -0,0 +1,17 @@ +#!/bin/bash +if [ -z "$STAGE" ] +then + export STAGE=dev +fi +if [ -z "$REGION" ] +then + export REGION=us-east-1 +fi +> all-tables.secret +./utils/list_tables.sh | sed 's/[", ]//g' | grep -v '^$' | while read -r table; do + tab="${table#cla-${STAGE}-}" + echo -n "Processing table $tab ..." 
+ echo "Table: $tab" >> all-tables.secret + ALL=1 ./utils/scan.sh "${tab}" >> all-tables.secret + echo 'done' +done diff --git a/utils/describe_table.sh b/utils/describe_table.sh index be5c6006a..1be51cc3f 100755 --- a/utils/describe_table.sh +++ b/utils/describe_table.sh @@ -8,9 +8,14 @@ if [ -z "${STAGE}" ] then export STAGE=dev fi - +if [ -z "$REGION" ] +then + REGION=us-east-1 +fi if [ ! -z "${DEBUG}" ] then - echo "aws --profile \"lfproduct-${STAGE}\" dynamodb describe-table --table-name \"cla-${STAGE}-${1}\"" + echo "aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb describe-table --table-name \"cla-${STAGE}-${1}\"" + aws --profile "lfproduct-${STAGE}" --region "${REGION}" dynamodb describe-table --table-name "cla-${STAGE}-${1}" +else + aws --profile "lfproduct-${STAGE}" --region "${REGION}" dynamodb describe-table --table-name "cla-${STAGE}-${1}" | jq -r '.Table.AttributeDefinitions' fi -aws --profile "lfproduct-${STAGE}" dynamodb describe-table --table-name "cla-${STAGE}-${1}" diff --git a/utils/list_tables.sh b/utils/list_tables.sh new file mode 100755 index 000000000..d83281a5f --- /dev/null +++ b/utils/list_tables.sh @@ -0,0 +1,10 @@ +#!/bin/bash +if [ -z "$STAGE" ] +then + STAGE=dev +fi +if [ -z "$REGION" ] +then + REGION=us-east-1 +fi +aws --profile "lfproduct-${STAGE}" --region "${REGION}" dynamodb list-tables | grep "cla-${STAGE}-" diff --git a/utils/skip_cla_entry.sh b/utils/skip_cla_entry.sh new file mode 100755 index 000000000..35f45f2b1 --- /dev/null +++ b/utils/skip_cla_entry.sh @@ -0,0 +1,86 @@ +#!/bin/bash +# MODE=mode ./utils/skip_cla_entry.sh sun-test-org '*' 'copilot-swe-agent[bot]' '*' +# put-item Overwrites/adds the entire `skip_cla` entry. +# add-key Adds or updates a key/value inside the skip_cla map (preserves other keys) +# delete-key Removes a key from the skip_cla map +# delete-item Deletes the entire `skip_cla` entry. 
+# +# MODE=add-key ./utils/skip_cla_entry.sh sun-test-org 'repo1' 're:vee?rendra;*;*' +# MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 'repo1' 'lukaszgryglicki;re:gryglicki' +# MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]' +# ./utils/scan.sh github-orgs organization_name sun-test-org +# STAGE=dev DTFROM='1 hour ago' DTTO='1 second ago' ./utils/search_aws_log_group.sh 'cla-backend-dev-githubactivity' 'skip_cla' +# MODE=delete-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' + +if [ -z "$MODE" ] +then + echo "$0: MODE must be set, valid values are: put-item, add-key, delete-key, delete-item" + exit 1 +fi + +if [ -z "$STAGE" ]; then + STAGE='dev' +fi +if [ -z "$REGION" ]; then + REGION='us-east-1' +fi + +case "$MODE" in + put-item) + if ( [ -z "${1}" ] || [ -z "${2}" ] || [ -z "${3}" ] ); then + echo "Usage: $0 " + exit 1 + fi + CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ + --table-name \"cla-${STAGE}-github-orgs\" \ + --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ + --update-expression 'SET skip_cla = :val' \ + --expression-attribute-values '{\":val\": {\"M\": {\"${2}\":{\"S\":\"${3}\"}}}}'" + ;; + add-key) + if ( [ -z "${1}" ] || [ -z "${2}" ] || [ -z "${3}" ] ); then + echo "Usage: $0 " + exit 1 + fi + CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ + --table-name \"cla-${STAGE}-github-orgs\" \ + --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ + --update-expression 'SET skip_cla.#repo = :val' \ + --expression-attribute-names '{\"#repo\": \"${2}\"}' \ + --expression-attribute-values '{\":val\": {\"S\": \"${3}\"}}'" + ;; + delete-key) + if ( [ -z "${1}" ] || [ -z "${2}" ] ); then + echo "Usage: $0 " + exit 1 + fi + CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ + --table-name 
\"cla-${STAGE}-github-orgs\" \ + --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ + --update-expression 'REMOVE skip_cla.#repo' \ + --expression-attribute-names '{\"#repo\": \"${2}\"}'" + ;; + delete-item) + if [ -z "${1}" ]; then + echo "Usage: $0 " + exit 1 + fi + CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ + --table-name \"cla-${STAGE}-github-orgs\" \ + --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ + --update-expression 'REMOVE skip_cla'" + ;; + *) + echo "$0: Unknown MODE: $MODE" + echo "Valid values are: put-item, add-key, delete-key, delete-item" + exit 1 + ;; +esac + +if [ ! -z "$DEBUG" ] +then + echo "$CMD" +fi + +eval $CMD + From e0702b4090bc5d2977d663b0a916543dd9d47780 Mon Sep 17 00:00:00 2001 From: Lukasz Gryglicki Date: Thu, 31 Jul 2025 11:14:26 +0200 Subject: [PATCH 2/4] Fix escaping patterns in util script Signed-off-by: Lukasz Gryglicki --- utils/skip_cla_entry.sh | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/utils/skip_cla_entry.sh b/utils/skip_cla_entry.sh index 35f45f2b1..de3ee46dd 100755 --- a/utils/skip_cla_entry.sh +++ b/utils/skip_cla_entry.sh @@ -31,34 +31,39 @@ case "$MODE" in echo "Usage: $0 " exit 1 fi + repo=$(echo "${2}" | sed 's/\\/\\\\/g') + pat=$(echo "${3}" | sed 's/\\/\\\\/g') CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ --table-name \"cla-${STAGE}-github-orgs\" \ --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ --update-expression 'SET skip_cla = :val' \ - --expression-attribute-values '{\":val\": {\"M\": {\"${2}\":{\"S\":\"${3}\"}}}}'" + --expression-attribute-values '{\":val\": {\"M\": {\"${repo}\":{\"S\":\"${pat}\"}}}}'" ;; add-key) if ( [ -z "${1}" ] || [ -z "${2}" ] || [ -z "${3}" ] ); then echo "Usage: $0 " exit 1 fi + repo=$(echo "${2}" | sed 's/\\/\\\\/g') + pat=$(echo "${3}" | sed 's/\\/\\\\/g') CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ 
--table-name \"cla-${STAGE}-github-orgs\" \ --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ --update-expression 'SET skip_cla.#repo = :val' \ - --expression-attribute-names '{\"#repo\": \"${2}\"}' \ - --expression-attribute-values '{\":val\": {\"S\": \"${3}\"}}'" + --expression-attribute-names '{\"#repo\": \"${repo}\"}' \ + --expression-attribute-values '{\":val\": {\"S\": \"${pat}\"}}'" ;; delete-key) if ( [ -z "${1}" ] || [ -z "${2}" ] ); then echo "Usage: $0 " exit 1 fi + repo=$(echo "${2}" | sed 's/\\/\\\\/g') CMD="aws --profile \"lfproduct-${STAGE}\" --region \"${REGION}\" dynamodb update-item \ --table-name \"cla-${STAGE}-github-orgs\" \ --key '{\"organization_name\": {\"S\": \"${1}\"}}' \ --update-expression 'REMOVE skip_cla.#repo' \ - --expression-attribute-names '{\"#repo\": \"${2}\"}'" + --expression-attribute-names '{\"#repo\": \"${repo}\"}'" ;; delete-item) if [ -z "${1}" ]; then From d0f3880db5c16a17ae052e5b0d844ec18192501b Mon Sep 17 00:00:00 2001 From: Lukasz Gryglicki Date: Thu, 31 Jul 2025 11:49:30 +0200 Subject: [PATCH 3/4] Add prod setup example Signed-off-by: Lukasz Gryglicki --- WHITELISTING_BOTS.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/WHITELISTING_BOTS.md b/WHITELISTING_BOTS.md index f2a479390..c3beebe6c 100644 --- a/WHITELISTING_BOTS.md +++ b/WHITELISTING_BOTS.md @@ -125,3 +125,37 @@ aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" To check for log entries related to skipping CLA check, you can use the following command: `` STAGE=dev DTFROM='1 hour ago' DTTO='1 second ago' ./utils/search_aws_log_group.sh 'cla-backend-dev-githubactivity' 'skip_cla' ``. 
+### Example setup on prod + +To set the first `skip_cla` value for an organization (`SET skip_cla = :val` replaces the whole map, so any existing entries are lost): +``` +aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"otel-arrow":{"S":"copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}' +aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"vscode-ext":{"S":"copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}' +``` + +To add or update a single repository entry without overwriting the rest of the `skip_cla` map (the map must already exist): +``` +aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}' +aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}' +``` + +To delete a single repository entry from `skip_cla`: +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'REMOVE skip_cla.#repo' --expression-attribute-names '{"#repo": "*"}' +aws 
--profile "lfproduct-prod" --region
"us-east-1" dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'REMOVE skip_cla.#repo' --expression-attribute-names '{"#repo": "*"}' +``` + +To delete the entire `skip_cla` attribute: +``` +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'REMOVE skip_cla' +aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'REMOVE skip_cla' +``` + +To check values: +``` +aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"open-telemetry\"}}" --max-items 100 | jq -r '.Items' +aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"openfga\"}}" --max-items 100 | jq -r '.Items' +aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"open-telemetry\"}}" --max-items 100 | jq -r '.Items[0].skip_cla.M["otel-arrow"]["S"]' +aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"openfga\"}}" --max-items 100 | jq -r '.Items[0].skip_cla.M["vscode-ext"]["S"]' +``` + From 02bdd4561ec6116c952a455d1a22b19a71a45b64 Mon Sep 17 00:00:00 2001 From: Lukasz Gryglicki Date: Thu, 31 Jul 2025 12:14:45 +0200 Subject: [PATCH 4/4] One more docs update Signed-off-by: Lukasz Gryglicki --- WHITELISTING_BOTS.md | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) 
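The matching rules documented in this series can be tried locally before anything is written to DynamoDB. The following is a minimal standalone sketch, not the backend code. It re-implements the logic of `property_matches`, `parse_config_patterns`, `is_actor_skipped` and the repository lookup in `skip_whitelisted_bots`: exact repository name first, then the first matching `re:` key, then `*`. Each entry is `login;email;name`, with missing parts defaulting to `*`, and several entries can go inside `[...]` separated by `||`. The `Actor` dataclass and the `resolve_repo_config` helper are hypothetical simplifications. The backend reads `author_login`, `author_email` and `author_username` from `UserCommitSummary` objects. It also logs each match, records a `BypassCLA` event and marks the actor as authorized.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Actor:
    """Hypothetical stand-in for UserCommitSummary (author_login, author_email, author_username)."""
    login: Optional[str] = None
    email: Optional[str] = None
    name: Optional[str] = None


def property_matches(pattern: str, value: Optional[str]) -> bool:
    """'*' matches anything; 're:...' is a regex search; anything else must be equal."""
    if pattern == "*":
        return True
    if not value:
        return False  # a non-wildcard pattern never matches a missing value
    if pattern.startswith("re:"):
        try:
            return re.search(pattern[3:], value) is not None
        except re.error:
            return False  # treat an invalid regex as "no match"
    return value == pattern


def parse_config_patterns(config: str) -> List[str]:
    """'[a||b]' -> ['a', 'b']; a plain string is a single entry."""
    config = config.strip()
    if config.startswith("[") and config.endswith("]"):
        return [p.strip() for p in config[1:-1].split("||")]
    return [config]


def resolve_repo_config(skip_cla: Dict[str, str], repo: str) -> Optional[str]:
    """Exact repo name, then the first matching 're:' key, then the '*' fallback."""
    if repo in skip_cla:
        return skip_cla[repo]
    for key, value in skip_cla.items():
        if not key.startswith("re:"):
            continue
        try:
            if re.search(key[3:], repo):
                return value
        except re.error:
            continue  # skip invalid repository regexes
    return skip_cla.get("*")


def is_actor_skipped(actor: Actor, config: str) -> bool:
    """True if ANY 'login;email;name' entry matches all three fields."""
    for entry in parse_config_patterns(config):
        parts = entry.split(";")
        parts += ["*"] * (3 - len(parts))  # missing email/name patterns default to '*'
        login_p, email_p, name_p = parts[:3]
        if (property_matches(login_p, actor.login)
                and property_matches(email_p, actor.email)
                and property_matches(name_p, actor.name)):
            return True
    return False


if __name__ == "__main__":
    # The "complex example" from WHITELISTING_BOTS.md
    skip_cla = {
        r"re:(?i)^repo\d*$": "[veerendra||re:(?i)^l(ukasz)?gryglicki$;lukaszgryglicki@o2.pl||*;*;Lukasz Gryglicki]",
    }
    config = resolve_repo_config(skip_cla, "repo12")
    print(is_actor_skipped(Actor(login="veerendra"), config))  # True
    print(is_actor_skipped(Actor(login="LGryglicki", email="other@example.com"), config))  # False
```

Matching uses `re.search`, so unanchored regexes match anywhere in the value. For example, `re:vee?rendra` also matches a login such as `xveerendray`. Anchor patterns with `^...$` when an exact shape is intended. Plain (non-`re:`) patterns are case-sensitive exact matches, so put `(?i)` inside a `re:` pattern when case should not matter. If several `re:` keys could match the same repository, the winner depends on the map's iteration order, so keeping repository regexes non-overlapping avoids surprises.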
diff --git a/WHITELISTING_BOTS.md b/WHITELISTING_BOTS.md index c3beebe6c..4c36c6884 100644 --- a/WHITELISTING_BOTS.md +++ b/WHITELISTING_BOTS.md @@ -8,9 +8,9 @@ Replace `{stage}` with either `dev` or `prod`. This property is a Map attribute that contains mapping from repository pattern to bot username (GitHub login), email and name pattern. -Example username/login is lukaszgryglicki (like any username/login that can be accessed via `https://github.com/username`). +An example `username/login` is `lukaszgryglicki` (the login that appears in a profile URL such as `https://github.com/username`). -Example name is "Lukasz Gryglicki". +An example name is `"Lukasz Gryglicki"`. Email pattern and name pattern are optional and `*` is assumed for them if not specified. @@ -25,6 +25,15 @@ You can also specify multiple patterns so different set is used for multiple use For example: `"[copilot-swe-agent[bot];*;*||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`. +The full format is `"repository_pattern": "[github_username_pattern;email_pattern;name_pattern||...]"`. + +Another, more complex example: `"re:(?i)^repo\d*$": "[veerendra||re:(?i)^l(ukasz)?gryglicki$;lukaszgryglicki@o2.pl||*;*;Lukasz Gryglicki]"`. + +This matches an actor if any one of the following holds: +- The GitHub username/login is exactly `veerendra`, regardless of email and name. +- The GitHub username/login matches `(?i)^l(ukasz)?gryglicki$` (for example `lgryglicki` or `LukaszGryglicki`) and the email is exactly `lukaszgryglicki@o2.pl`, regardless of name. +- The GitHub name is exactly `Lukasz Gryglicki`, regardless of username/login and email. + There can be multiple entries under one Github Organization DynamoDB entry. Example: @@ -37,7 +46,7 @@ Example: "skip_cla": { "M": { "*": { - "S": "copilot-swe-agent[bot];*;*" + "S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*" }, "re:(?i)^repo[0-9]+$": { "S": "re:vee?rendra;*;*"