-
Notifications
You must be signed in to change notification settings - Fork 24
V1.41.0 #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V1.41.0 #168
Conversation
WalkthroughAdds OSADL-backed copyleft data and a Changes
Sequence Diagram(s)sequenceDiagram
participant CLI
participant Copyleft
participant ScanResultProcessor
participant LicenseUtil
participant Osadl
CLI->>Copyleft: init(license_sources)
Copyleft->>ScanResultProcessor: init(license_sources)
Copyleft->>LicenseUtil: init(include/exclude/explicit)
LicenseUtil->>Osadl: lazy load data (on first call)
Note over ScanResultProcessor,LicenseUtil: For each component
ScanResultProcessor->>ScanResultProcessor: selected = _select_licenses(licenses_data)
alt filtering mode (license_sources provided)
ScanResultProcessor->>ScanResultProcessor: keep licenses with source in license_sources + 'unknown'
else priority mode (no license_sources)
ScanResultProcessor->>ScanResultProcessor: use legacy source priority
end
ScanResultProcessor->>LicenseUtil: is_copyleft(spdxid)
alt explicit list present
LicenseUtil->>LicenseUtil: check explicit_licenses only
else
LicenseUtil->>Osadl: is_copyleft(spdxid)
Osadl-->>LicenseUtil: boolean
end
LicenseUtil-->>ScanResultProcessor: copyleft flag
ScanResultProcessor-->>Copyleft: aggregated results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/scanoss/scanner.py (1)
920-923: Wrap long post‑size warning inscan_wfp_file_threadedto fix E501The post‑size warning line is over the 120‑character limit. Wrapping it avoids the lint error and keeps the message readable:
- if self.debug and cur_size > self.max_post_size: - Scanner.print_stderr(f'Warning: Post size {cur_size} greater than limit {self.max_post_size}') + if self.debug and cur_size > self.max_post_size: + Scanner.print_stderr( + f'Warning: Post size {cur_size} greater than limit {self.max_post_size}' + )This should clear the remaining E501 violation flagged by the pipeline.
src/scanoss/inspection/utils/scan_result_processor.py (1)
167-189: Addfile_spdx_tagto the priority sources listThe review's concern is confirmed:
file_spdx_tagexists in real test data but is omitted frompriority_sourcesat line 337. When onlyfile_spdx_taglicenses are present, the code incorrectly falls back to returning all licenses instead of prioritizing that source.Update line 337 from:
priority_sources = ['component_declared', 'license_file', 'file_header', 'scancode']to:
priority_sources = ['component_declared', 'license_file', 'file_header', 'file_spdx_tag', 'scancode']
🧹 Nitpick comments (9)
src/scanoss/scanners/folder_hasher.py (1)
179-184: Consider adding a comment to clarify the tree-building logic.The loop appends
file_itemto each ancestor directory's files list, which means each file appears in multiple directory nodes. While this is intentional for hierarchical folder hashing (where each directory's hash includes all files in its subtree), a brief comment would make the design more immediately clear to future maintainers.src/scanoss/threadedscanning.py (1)
107-112: Consider alternatives to__del__for cleanup.While the
__del__method provides a safety net for progress bar cleanup, Python destructors have unreliable invocation timing and are generally discouraged for critical cleanup logic. Combined with theatexitregistration (line 82), this creates dual cleanup paths that may execute in any order or potentially twice.Consider:
- Using context managers (
__enter__/__exit__) for deterministic cleanup, allowing usage likewith ThreadedScanning(...) as scanner:- Documenting that callers should explicitly call
complete_bar()after scanning completes- Choosing a single cleanup mechanism (atexit OR
__del__) rather than bothThe bare
except Exception:is acceptable in destructors since errors during cleanup cannot be meaningfully handled, but it may hide diagnostic information during development.src/scanoss/filecount.py (1)
152-155: Address PLC0206 by iteratingfile_typeswith.items()You can clear the PLC0206 warning and improve readability by iterating key/value pairs directly:
- for k in file_types: - d = file_types[k] - csv_dict.append({'extension': k, 'count': d[0], 'size(MB)': f'{d[1] / (1 << 20):,.2f}'}) + for extension, (count, size_bytes) in file_types.items(): + csv_dict.append( + { + 'extension': extension, + 'count': count, + 'size(MB)': f'{size_bytes / (1 << 20):,.2f}', + } + )src/scanoss/scanners/scanner_hfh.py (1)
113-152: Threaded gRPC scan helper is correct; consider aligning spinner gating with other scanners
_execute_grpc_scancleanly encapsulates the gRPC call and error handling, and thescan()method’s thread + spinner loop is straightforward and safe (join()after the loop guaranteesscan_resultsvisibility).For consistency with
Scanner.scan_folder/scan_filesandFileCount.count_files, you might optionally gate the spinner on quiet/TTY and usenullcontext:+import sys +from contextlib import nullcontext @@ - spinner_ctx = Spinner('Scanning folder...') - - with spinner_ctx as spinner: + spinner_ctx = ( + Spinner('Scanning folder...') + if (not self.base.quiet and sys.stderr.isatty()) + else nullcontext() + ) + + with spinner_ctx as spinner: grpc_thread = threading.Thread(target=self._execute_grpc_scan, args=(hfh_request,)) grpc_thread.start() @@ - while grpc_thread.is_alive(): - spinner.next() - time.sleep(0.1) + while grpc_thread.is_alive(): + if spinner: + spinner.next() + time.sleep(0.1)Not required, but it would make spinner behaviour uniform across the codebase.
tests/test_osadl.py (1)
34-99: Good coverage; tweak final shared‑data assertionThe tests exercise the key Osadl behaviors (loading, case‑insensitive lookups, copyleft vs non‑copyleft, and empty/None handling) nicely. The last assertion in
test_multiple_instances_share_data:self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data)is a tautology and doesn’t verify that instances share the same data. If you want to assert sharing, consider checking through the instances instead, e.g.:
- self.assertIs(Osadl._shared_copyleft_data, Osadl._shared_copyleft_data) + self.assertIs(osadl1._shared_copyleft_data, osadl2._shared_copyleft_data)or simply rely on the existing behavioral checks and drop this line.
CHANGELOG.md (1)
8-28: Changelog accurately reflects new copyleft and-lsbehaviorThe Unreleased notes line up with the code:
--license-sourcesis limited to copyleft inspection, OSADL is now the copyleft authority,-or-latervariants are called out, and the default of component‑level sources with-lsoverride matches the CLI wiring. If you want to nitpick wording, “OSADL authoritative copyleft license data” could be hyphenated as “OSADL‑authoritative copyleft license data,” but that’s purely stylistic.src/scanoss/cli.py (2)
704-715:--license-sourcesargument is well‑wired; default handled downstreamThe
-ls/--license-sourcesdefinition withaction='extend'andnargs='+'correctly supports both syntaxes you document and constrains values viaVALID_LICENSE_SOURCES. Since you don’t set a default here, it assumesCopyleft(or its collaborators) treatlicense_sources=Noneas “useDEFAULT_COPYLEFT_LICENSE_SOURCES,” which matches the help text. That implicit contract looks fine but is worth keeping in sync if defaults change later.
1758-1771: Newlicense_sourcesplumbing intoCopyleftlooks correctPassing
license_sources=args.license_sourcesintoCopyleft(...)completes the CLI→policy wiring for both raw and legacy copyleft commands, and falls back toNonewhen-lsisn’t used. Behaviorally this matches the changelog. As a minor improvement, consider updating theinspect_copyleftdocstring to mention the newlicense_sourcesoption so the function‑level documentation stays aligned with the CLI help.src/scanoss/osadl.py (1)
69-76: Load failures silently degrade all copyleft checks to “non‑copyleft”If
_load_copyleft_datafails (I/O, JSON error, packaging issue), it logs to stderr and returnsFalse, but__init__ignores the return value andis_copyleftsimply consults an empty_shared_copyleft_data, treating every license as non‑copyleft. That’s a fairly silent failure mode for something described as “authoritative”.Consider at least surfacing the failure to callers (e.g., via a boolean attribute or raising on first failure), so upstream logic can detect that OSADL data wasn’t available and either abort or emit a clear warning.
For example:
class Osadl: @@ - _shared_copyleft_data = {} - _data_loaded = False + _shared_copyleft_data: dict[str, str] = {} + _data_loaded: bool = False + _load_failed: bool = False @@ def __init__(self, debug: bool = False): @@ - self._load_copyleft_data() + if not self._load_copyleft_data(): + Osadl._load_failed = True @@ - if Osadl._data_loaded: + if Osadl._data_loaded: return True @@ - self.print_stderr(f'ERROR: Problem loading OSADL copyleft data: {e}') - return False + self.print_stderr(f'ERROR: Problem loading OSADL copyleft data: {e}') + return False @@ Osadl._data_loaded = True @@ def is_copyleft(self, spdx_id: str) -> bool: @@ - if not spdx_id: + if not spdx_id or Osadl._load_failed: return FalseThat keeps behavior simple while making failures visible to policy/CLI layers if they want to react differently.
Also applies to: 92-107, 108-130
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
CHANGELOG.md(1 hunks)README.md(1 hunks)src/scanoss/cli.py(4 hunks)src/scanoss/constants.py(1 hunks)src/scanoss/data/osadl-copyleft.json(1 hunks)src/scanoss/filecount.py(2 hunks)src/scanoss/inspection/policy_check/scanoss/copyleft.py(4 hunks)src/scanoss/inspection/utils/license_utils.py(1 hunks)src/scanoss/inspection/utils/scan_result_processor.py(3 hunks)src/scanoss/osadl.py(1 hunks)src/scanoss/scanner.py(5 hunks)src/scanoss/scanners/folder_hasher.py(1 hunks)src/scanoss/scanners/scanner_hfh.py(2 hunks)src/scanoss/threadedscanning.py(3 hunks)tests/test_osadl.py(1 hunks)tests/test_policy_inspect.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
src/scanoss/osadl.py (1)
src/scanoss/inspection/utils/license_utils.py (1)
is_copyleft(71-102)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (3)
Osadl(31-130)print_debug(62-67)is_copyleft(108-130)src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_debug(58-63)
src/scanoss/filecount.py (2)
src/scanoss/scanossbase.py (2)
print_msg(51-56)print_trace(65-70)src/scanoss/cli.py (1)
file_count(1312-1343)
src/scanoss/scanners/scanner_hfh.py (2)
src/scanoss/scanossgrpc.py (1)
folder_hash_scan(429-447)src/scanoss/scanossbase.py (1)
print_stderr(45-49)
src/scanoss/inspection/utils/scan_result_processor.py (1)
src/scanoss/inspection/utils/license_utils.py (2)
LicenseUtil(30-111)init(47-69)
src/scanoss/scanner.py (3)
src/scanoss/file_filters.py (2)
get_filtered_files_from_folder(311-352)get_filtered_files_from_files(354-404)src/scanoss/threadedscanning.py (3)
stop_scanning(154-158)queue_add(141-149)run(168-191)src/scanoss/scanners/scanner_hfh.py (1)
scan(126-152)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)
src/scanoss/inspection/utils/scan_result_processor.py (1)
ScanResultProcessor(53-412)
tests/test_osadl.py (1)
src/scanoss/osadl.py (2)
Osadl(31-130)is_copyleft(108-130)
src/scanoss/scanners/folder_hasher.py (2)
src/scanoss/scanossbase.py (1)
print_debug(58-63)src/scanoss/utils/crc64.py (2)
CRC64(29-96)get_hash_buff(82-96)
tests/test_policy_inspect.py (1)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
Copyleft(50-240)run(220-240)
🪛 GitHub Actions: Lint
src/scanoss/filecount.py
[warning] 100-100: PLR0912 Too many branches (15 > 12) in count_files.
[warning] 100-100: PLR0915 Too many statements (54 > 50) in count_files.
[warning] 152-152: PLC0206 Extracting value from dictionary without calling .items().
src/scanoss/scanner.py
[error] 397-397: E501 Line too long (121 > 120).
[error] 416-416: E501 Line too long (124 > 120).
[error] 802-802: E501 Line too long (121 > 120).
🪛 LanguageTool
README.md
[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
CHANGELOG.md
[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (12)
src/scanoss/scanners/folder_hasher.py (1)
162-192: LGTM! Context manager improves resource cleanup.The switch to a context manager for the progress bar ensures proper cleanup even when exceptions occur or the user interrupts with Ctrl+C. Moving the exception handling inside the loop also improves resilience by allowing file processing to continue when individual files fail.
src/scanoss/threadedscanning.py (2)
25-25: LGTM!The
atexitimport is necessary for the cleanup registration added below.
81-82: Review comment contains inaccurate claims about double cleanup and idempotency.The underlying premise is flawed:
Bar.finish()from theprogress.barlibrary is already idempotent—it sets an internal flag and safely handles multiple calls. The suggested fix (addingself.bar = None) is unnecessary.- The "double cleanup risk" is not a real issue because both the
atexithandler and__del__are guarded byif self.bar:, and the underlying method is designed to handle repeated invocations safely.The valid concerns here are:
- The bound method reference in
atexit.register(self.complete_bar)does keepselfalive until program exit (minor memory lifecycle consideration).- The lack of signal handler means
atexitmay not fire on Ctrl+C (valid but separate from cleanup idempotency).The suggested fix does not address the actual memory concern and adds unnecessary code given the idempotent nature of
Bar.finish().Likely an incorrect or invalid review comment.
src/scanoss/filecount.py (1)
100-147: Spinner context usage is sound and keeps behaviour intactUsing
spinner_ctx = Spinner(...) if (not self.quiet and self.isatty) else nullcontext()andwith spinner_ctx as spinner:cleanly scopes spinner lifetime and avoids manualfinish()calls. Theif spinner: spinner.next()guard correctly handles thenullcontext()branch, and the counting logic is unchanged aside from indentation.src/scanoss/scanner.py (1)
1051-1061: Spinner context inwfp_foldermatches the new pattern and looks correctThe new
spinner_ctx = Spinner(...) if (not self.quiet and self.isatty) else nullcontext()pluswith spinner_ctx as spinner:pattern is consistent withscan_folder/scan_filesand properly guardsspinner.next()behind anif spinnercheck. The fingerprinting logic is unchanged apart from the visual feedback.src/scanoss/constants.py (1)
21-22: License source constants are well‑defined for downstream filteringDefining
VALID_LICENSE_SOURCESandDEFAULT_COPYLEFT_LICENSE_SOURCEShere centralizes the license source vocabulary and matches howCopyleftandScanResultProcessor._select_licensesuse these strings. No issues spotted.README.md (1)
139-142: Dataset license notice is clear and appropriately attributedThe new section clearly distinguishes the MIT license of the application from the CC‑BY‑4.0 license of the embedded OSADL copyleft dataset and provides proper attribution and source URL. Looks good.
tests/test_policy_inspect.py (1)
31-645: New copyleftlicense_sourcestests give solid coverage of the filtering semanticsImporting
DEFAULT_COPYLEFT_LICENSE_SOURCES/VALID_LICENSE_SOURCESand adding thetest_copyleft_policy_*license_sources*cases thoroughly exercise:
- Default vs explicit
license_sources(includingNone).- Single‑source and multi‑source scenarios.
- Interaction with include/exclude filters and different result files.
- JSON and markdown/Jira markdown outputs.
The expectations (component counts, status values, and allowed
license['source']sets) are consistent withCopyleftandScanResultProcessor._select_licensesbehaviour. This is a good, targeted test extension for the new OSADL‑backed logic.src/scanoss/inspection/policy_check/scanoss/copyleft.py (1)
29-102: Copyleftlicense_sourceswiring is consistent with ScanResultProcessor and testsImporting
DEFAULT_COPYLEFT_LICENSE_SOURCES, extending__init__withlicense_sources: list = None, and then doing:self.license_sources = license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES self.results_processor = ScanResultProcessor(..., explicit, self.license_sources)cleanly propagates the configuration into
ScanResultProcessor._select_licenses, matching how the new tests drive behaviour. The default now explicitly favours['component_declared', 'license_file'], which aligns with the intended “component‑level first” semantics. Just be aware that an empty list would also resolve to the default (which seems fine for current usage).src/scanoss/data/osadl-copyleft.json (1)
1-133: OSADL dataset structure looks compatible with the loaderThe JSON shape and keys (including the
copyleftmapping and the status values like"Yes"/"Yes (restricted)"/"No") align with the expectations inOsadl._load_copyleft_dataandOsadl.is_copyleft. I don’t see structural or syntactic issues in the excerpt, and the included metadata (title/license/attribution/timestamp) satisfies attribution requirements.src/scanoss/inspection/utils/license_utils.py (1)
71-102: Copyleft decision precedence is clear and matches OSADL semanticsThe
is_copyleftflow (early False on empty, explicit list override, include > exclude, then fallback toself.osadl.is_copyleft) is straightforward and matches the documented logic. The consistent lowercasing for all filter sets and lookups also aligns with howOsadlnormalizes IDs, so you avoid case‑sensitivity issues.src/scanoss/inspection/utils/scan_result_processor.py (1)
65-81: Remove the defensivelicense_sourcesnormalization—it is not needed.The original concern about raw CLI strings being passed is invalid. The
--license-sourcesoption uses argparse'saction='extend'withchoicesvalidation, which guaranteesargs.license_sourcesis always a list of validated strings. This list is passed through toScanResultProcessorconsistently, making the suggested defensive code unnecessary.All call sites pass
license_sourcesas either:
- A list (from CLI via copyleft inspection)
None(other call sites using defaults)The current code at line 80 is safe as-is.
Likely an incorrect or invalid review comment.
7455238 to
6db2eac
Compare
SCANOSS SCAN Completed 🚀
View more details on SCANOSS Action Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/scanoss/osadl.py (1)
79-88: Fix OSADL data resource path or all licenses will be treated as non‑copyleftThe loader currently resolves the JSON path as:
f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json'With
osadl.pyinscanoss/osadl.py,__name__isscanoss.osadl, so this looks forscanoss/osadl/data/osadl-copyleft.json. The actual file is packaged atscanoss/data/osadl-copyleft.json, so the load will fail,_shared_copyleft_datastays empty, andis_copyleft()will effectively returnFalsefor all licenses.Consider resolving from the top-level
scanosspackage instead:- try: - f_name = importlib_resources.files(__name__) / 'data/osadl-copyleft.json' - with importlib_resources.as_file(f_name) as f: - with open(f, 'r', encoding='utf-8') as file: - data = json.load(file) + try: + # Resolve resource from the root "scanoss" package + # Expects file at scanoss/data/osadl-copyleft.json + f_name = importlib_resources.files('scanoss') / 'data' / 'osadl-copyleft.json' + with importlib_resources.as_file(f_name) as path: + with open(path, 'r', encoding='utf-8') as file: + data = json.load(file)Also worth re-running tests in an installed environment (not just from source tree) to confirm
Osadl._data_loadedisTrueand_shared_copyleft_datais non-empty afterOsadl()construction.
🧹 Nitpick comments (7)
src/scanoss/__init__.py (1)
25-25: Version bump looks consistent with release intent
__version__ = '1.41.0'matches the PR title and the new Unreleased entries inCHANGELOG.md. No issues here; just remember to convert the Unreleased section into a## [1.41.0] - YYYY-MM-DDentry when you cut the actual release.CHANGELOG.md (1)
8-28: Changelog entry clearly documents new copyleft behavior and CLI optionThe Unreleased section accurately explains the
--license-sourcesoption, OSADL-based copyleft data, and the new default to component-level licenses only. This aligns well with the new constants and Copyleft behavior.Two minor optional polish ideas:
- When you finalize the release, rename this to
## [1.41.0] - YYYY-MM-DDand add a[1.41.0]compare link at the bottom for consistency with previous versions.- Consider adding a short example command (e.g.
scanoss-py inspect copyleft -ls component_declared license_file) to make the new option immediately actionable for readers.LICENSE (1)
23-31: OSADL dataset attribution in LICENSE is clear and well-scopedThe added section cleanly separates the MIT license from the OSADL copyleft dataset, names the data file, states the CC-BY-4.0 license, and notes that the file carries its own header. This is a good, explicit attribution.
If you ever expand to additional third-party datasets, you might reference this section from the README’s Dataset License Notice (or vice versa) to keep documentation in sync, but what you have now is already solid.
README.md (1)
139-142: Dataset license notice is accurate and user-friendlyThe new section clearly distinguishes the MIT application license from the OSADL copyleft dataset, references the correct JSON path, and provides both the CC-BY-4.0 and original OSADL URLs. This should give users sufficient clarity on licensing.
Optionally, you could add “See LICENSE for full details” at the end of the paragraph to tie it to the main license file, but it’s not strictly necessary.
tests/test_policy_inspect.py (1)
393-646: License source filtering tests are comprehensive; consider one stronger assertionThe new “Copyleft License Source Filtering Tests” block does a good job validating:
- default vs.
Nonelicense_sources fall back toDEFAULT_COPYLEFT_LICENSE_SOURCES- single-source filters (
component_declared,license_file,file_header) produce the expected status and component set- multi-source and “all valid sources” paths behave as intended
- markdown output, include, exclude, and no-copyleft scenarios are all covered.
Two minor, optional improvements:
- In
test_copyleft_policy_license_sources_with_include_filter, you might explicitly assert that at least one license hasspdxid == 'MIT'to prove the include filter is taking effect, not just that some components exist.- Where you use
assertGreaterEqualon component counts, decide whether you want strict regression detection (exact counts) or flexibility as OSADL data evolves; if stability is expected, tightening those assertions could catch unintended changes earlier.Overall, this is strong coverage of the new
license_sourcesbehavior.src/scanoss/cli.py (1)
54-69: CLI wiring for--license-sourcesis coherent; consider explicit default + confirm runtime supportThe new
license_sourcesflow (constants import, argparse options on raw/legacyinspect copyleft, and passingargs.license_sourcesintoCopyleft) is consistent and should allow source-filtered copyleft checks while keeping behavior unchanged when the option is omitted, assumingCopyleftappliesDEFAULT_COPYLEFT_LICENSE_SOURCESwhenlicense_sourcesisNone.Two minor points to verify/consider:
action='extend'forargparserequires Python 3.8+; confirm this matches your supported runtime matrix.- Since the help string documents the default, you could reflect it at the argparse layer via
default=DEFAULT_COPYLEFT_LICENSE_SOURCES(leaving Copyleft’s defaulting logic as a safety net). Not required, but it would make the default visible directly onargs.license_sources.Also applies to: 704-714, 1757-1769
src/scanoss/inspection/utils/scan_result_processor.py (1)
74-74: Consider a more specific type hint.The parameter uses a generic
listtype hint. For better type safety and IDE support, consider usingOptional[List[str]]orlist[str] | None.- license_sources: list = None, + license_sources: list[str] | None = None,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
CHANGELOG.md(1 hunks)LICENSE(1 hunks)README.md(1 hunks)src/scanoss/__init__.py(1 hunks)src/scanoss/cli.py(4 hunks)src/scanoss/constants.py(1 hunks)src/scanoss/data/osadl-copyleft.json(1 hunks)src/scanoss/inspection/policy_check/scanoss/copyleft.py(4 hunks)src/scanoss/inspection/utils/license_utils.py(1 hunks)src/scanoss/inspection/utils/scan_result_processor.py(3 hunks)src/scanoss/osadl.py(1 hunks)src/scanoss/scanner.py(5 hunks)tests/test_osadl.py(1 hunks)tests/test_policy_inspect.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/scanoss/inspection/policy_check/scanoss/copyleft.py
- src/scanoss/scanner.py
- src/scanoss/constants.py
🧰 Additional context used
🧬 Code graph analysis (5)
src/scanoss/inspection/utils/scan_result_processor.py (1)
src/scanoss/inspection/utils/license_utils.py (2)
LicenseUtil(30-116)init(47-74)
tests/test_policy_inspect.py (2)
src/scanoss/inspection/policy_check/scanoss/copyleft.py (2)
Copyleft(50-240)run(220-240)src/scanoss/inspection/policy_check/policy_check.py (2)
run(87-102)PolicyStatus(33-44)
tests/test_osadl.py (2)
src/scanoss/osadl.py (2)
Osadl(31-130)is_copyleft(108-130)src/scanoss/inspection/utils/license_utils.py (1)
is_copyleft(76-107)
src/scanoss/inspection/utils/license_utils.py (2)
src/scanoss/osadl.py (3)
Osadl(31-130)print_debug(62-67)is_copyleft(108-130)src/scanoss/scanossbase.py (2)
ScanossBase(28-107)print_debug(58-63)
src/scanoss/osadl.py (1)
src/scanoss/inspection/utils/license_utils.py (1)
is_copyleft(76-107)
🪛 LanguageTool
README.md
[uncategorized] ~140-~140: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...vecommons.org/licenses/by/4.0/) by the [Open Source Automation Development Lab (OSADL) eG](...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...g/). Attribution: A project by the Open Source Automation Development Lab (OSADL) eG. ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
CHANGELOG.md
[uncategorized] ~17-~17: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - Copyleft detection now uses [OSADL (Open Source Automation Development Lab)](https://ww...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (8)
tests/test_policy_inspect.py (1)
31-31: ****The review comment suggests aligning line 31 to use
src.scanoss.*instead ofscanoss.*. However, the project configuration inpyproject.tomldeclaresknown-first-party = ["scanoss"], which specifies that the correct import style isfrom scanoss.*(notsrc.scanoss.*). This aligns with how the source code itself imports from constants (src/scanoss/cli.pyandsrc/scanoss/inspection/policy_check/scanoss/copyleft.pyboth usefrom scanoss.constants).Line 31 is actually correct. The inconsistency in the test file is in lines 32-39, which use
src.scanoss.*and should be aligned to match the package-style imports. The review comment identified a real inconsistency but suggested the wrong direction for the fix.Likely an incorrect or invalid review comment.
tests/test_osadl.py (1)
34-82: Good OSADL behavior coverageThe test suite exercises initialization, GPL/MIT decisions, case-insensitivity, and empty/None handling in a way that closely matches
Osadl.is_copyleftsemantics. This should give good confidence in the OSADL-backed copyleft logic.src/scanoss/data/osadl-copyleft.json (1)
1-133: OSADL copyleft dataset structure matches loader expectationsThe JSON layout (metadata +
copyleftmapping) aligns withOsadl._load_copyleft_data(top-levelcopyleftobject, license IDs as keys, status strings as values) and should work correctly with the lowercasing done in the loader.src/scanoss/inspection/utils/license_utils.py (1)
25-75: OSADL-backed copyleft logic and idempotent filters look solidThe refactor to use
Osadlplusinclude/exclude/explicitfilters is clean and matches the documented 4-step decision flow:
init()now clearsinclude_licenses,exclude_licenses, andexplicit_licensesup front, making it safe to call multiple times and addressing the prior “sticky state” concern.explicit_licensescorrectly short-circuits OSADL and other filters; otherwise include → True, exclude → False, and fallback toself.osadl.is_copyleft(spdxid).Overall this keeps the API stable while making behavior more predictable and easier to reason about.
Also applies to: 76-107
src/scanoss/inspection/utils/scan_result_processor.py (4)
332-332: Verify that 'unknown' source should always be included in filtering mode.Line 332 automatically adds
'unknown'to the filter set, meaning licenses with an unknown source will always be included regardless of the user's--license-sourcesselection. This prevents users from excluding unknown-source licenses when filtering.Please confirm this is the intended behavior. If so, consider adding a comment explaining the rationale (e.g., "Always include unknown sources as a safety fallback").
167-171: Implementation looks good.The replacement of the previous priority-based logic with the new
_select_licensesmethod is well-integrated. The method cleanly separates filtering mode from priority mode, and the selected licenses are properly processed in the iteration loop.
316-361: LGTM - dual-mode selection logic is sound.The
_select_licensesmethod correctly implements both filtering and priority modes:
- Filtering mode (lines 331-334): Filters licenses by source when
license_sourcesis specified- Priority mode (lines 336-361): Preserves the original priority-based selection (component_declared > license_file > file_header > scancode)
- De-duplication within each source (line 351) prevents duplicate license names
- Graceful fallback to all licenses when no priority sources are found (line 361)
The logic is well-structured and maintains backward compatibility.
331-334: The current implementation is correct and the review comment is inaccurate.The condition
if self.license_sources:correctly implements the intended behavior. While an empty list[]is technically falsy, this is the correct semantic choice. Here's why:
Design pattern prevents empty lists: The Copyleft class (line 92) uses
license_sources or DEFAULT_COPYLEFT_LICENSE_SOURCES, so empty lists never reach ScanResultProcessor in practice.Intended semantics are clear: Filtering mode activates only for non-empty lists. An empty list should fall through to priority mode, not filter to zero results.
The suggested fix would introduce a bug: Changing to
if self.license_sources is not None:would cause empty lists to filter licenses to nothing (empty result), contradicting the class design where Copyleft prevents empty lists via itsoroperator.No code paths use empty lists: A search of the codebase found no instances of
license_sources = [], confirming this is never a real use case.Likely an incorrect or invalid review comment.
Summary by CodeRabbit
New Features
Changed
Documentation
Bug Fixes