
Conversation

pranjalg1331
Contributor

(Please add to the PR name the issue(s) that this PR would close if merged, using a GitHub keyword. Example: <feature name>. Closes #999. If your PR is made of a single commit, please add that clause to the commit message too. This is all required to automate the closure of related issues.)

Description

Please include a summary of the change and link to the related issue.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).
  • [x] New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist

  • [x] I have read and understood the rules about how to Contribute to this project
  • [x] The pull request is for the branch develop
  • A new plugin (analyzer, connector, visualizer, playbook, pivot or ingestor) was added or changed, in which case:
    • I strictly followed the documentation "How to create a Plugin"
    • Usage file was updated. A link to the PR to the docs repo has been added as a comment here.
    • Advanced-Usage was updated (in case the plugin provides additional optional configuration). A link to the PR to the docs repo has been added as a comment here.
    • I have dumped the configuration from Django Admin using the dumpplugin command and added it in the project as a data migration. ("How to share a plugin with the community")
    • If a File analyzer was added and it supports a mimetype which is not already supported, you added a sample of that type inside the archive test_files.zip and you added the default tests for that mimetype in test_classes.py.
    • If you created a new analyzer and it is free (does not require any API key), please add it in the FREE_TO_USE_ANALYZERS playbook by following this guide.
    • Check if it could make sense to add that analyzer/connector to other freely available playbooks.
    • I have provided the resulting raw JSON of a finished analysis and a screenshot of the results.
    • If the plugin interacts with an external service, I have created an attribute called precisely url that contains this information. This is required for Health Checks (HEAD HTTP requests).
    • If the plugin requires mocked testing, _monkeypatch() was used in its class to apply the necessary decorators.
    • I have added that raw JSON sample to the MockUpResponse of the _monkeypatch() method. This provides a valid sample for testing.
    • I have created the corresponding DataModel for the new analyzer following the documentation
  • I have inserted the copyright banner at the start of the file: # This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl # See the file 'LICENSE' for copying permission.
  • Please avoid adding new libraries as requirements whenever possible. Use new libraries only if strictly needed to solve the issue you are working on. In case of doubt, ask a maintainer for permission to use a specific library.
  • If external libraries/packages with restrictive licenses were added, they were added in the Legal Notice section.
  • Linters (Black, Flake8, Isort) gave 0 errors. If you have correctly installed pre-commit, it performs these checks and adjustments on your behalf.
  • [x] I have added tests for the feature/bug I solved (see tests folder). All the tests (new and old ones) gave 0 errors.
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.
  • After submitting the PR, if DeepSource, Django Doctors or other third-party linters triggered any alerts during the CI checks, I have solved those alerts.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • Every time you make changes to the PR and you think the work is done, you should explicitly ask for a review by using GitHub's reviewing system detailed here.

@pranjalg1331
Contributor Author

Hello mentors, I hope you’re all doing well.
In this PR, as discussed in the proposal, I have implemented a parent test class that runs unittest subtests for all supported analyzer observable types. I also updated test_nvd_cve to work with this new base test class. Could you please review it before I extend this approach to the other analyzers?

[screenshot attached]

Contributor

@fgibertoni fgibertoni left a comment


Thanks for your work!
I like the general approach you're proposing here. My only consideration is about the analyzer_mocks.py file: I like having a centralized place to hold the configurations, but I think each mock should be related to one or two classes at most. So I don't see the benefit of having a separate file and a separate dictionary to hold them, which would have to be updated every time a new analyzer/plugin is created. This also implies that for every plugin there will be a similar file, right?

I would also like to hear what you think about this, and I'm asking @mlodic and @drosetti for an opinion.

@mlodic
Member

mlodic commented Jun 7, 2025

Thank you for asking for an early review. I agree with Federico. The general approach is really good and I love it. I think that the mocks should stay inside the related analyzer files. An idea is to make BaseAnalyzerTest an ABC and add an abstract property, containing the patch, that must be declared by each analyzer test class.
In this way, in case some different analyzers use the same patch, you could use another level of inheritance to make it re-usable between different instances.

@pranjalg1331
Contributor Author

Thanks for the feedback! I tried using an ABC with abstract properties and shared test methods, but Django's test discovery tries to instantiate all TestCase subclasses, even abstract ones. This causes a TypeError when abstract properties aren't implemented, so I cannot define a base test inside an abstract class.

@fgibertoni
Contributor

Can you post a little example of how you did the implementation? So we can help you better 😄

@pranjalg1331
Contributor Author

[screenshot: base class implementation]
Currently, I have implemented the base class without using ABC. The analyzer's mocked data lives in the same file as the analyzer test class. The base class has a get_mocked_response() method that must be overridden by every test class that inherits from it.
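
For orientation, a minimal sketch of this pattern, assuming Django's standard unittest runner. analyzer_class, mock_patch_key, get_mocked_response() and the skipTest guard are the names used in this thread; the observable list and the patching details are hypothetical:

from unittest.mock import patch

from django.test import TestCase


class BaseAnalyzerTest(TestCase):
    analyzer_class = None  # each concrete test class sets the analyzer under test
    mock_patch_key = None  # dotted path to patch, e.g. "requests.get" (hypothetical)

    @classmethod
    def get_mocked_response(cls):
        # Must be overridden by every test class that inherits from this one.
        raise NotImplementedError

    def test_all_observable_types(self):
        if self.analyzer_class is None:
            self.skipTest("analyzer_class is not set")  # skips on the bare base class
        for observable_type in ("ip", "domain", "url", "hash"):  # hypothetical list
            with self.subTest(observable_type=observable_type):
                # patch() forwards return_value to the MagicMock it creates
                with patch(self.mock_patch_key, return_value=self.get_mocked_response()):
                    pass  # run the analyzer against a sample observable of this type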

@drosetti
Contributor

Thanks for the feedback! I tried using an ABC with abstract properties and shared test methods, but Django’s test discovery tries to instantiate all TestCase subclasses—even abstract ones. This causes a TypeError when abstract properties aren’t implemented (So I cannot define a base test inside an abstract class).

The auto-discovery should only look for files named test_*.py; if you put the abstract classes into files with different names, they shouldn't run.

@pranjalg1331
Contributor Author

Yes, @drosetti. The base test case is not directly discovered in base_test_class.py, but when another class inherits from it, the base test case is also run for the base class itself.
Thus, with an ABC we get an error like this:
TypeError: Can't instantiate abstract class BaseAnalyzerTest with abstract methods analyzer_class, mock_patch_key
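
For context, the failing ABC variant looks roughly like this; a sketch reconstructed from the error above, not the actual code from the PR:

import abc

from django.test import TestCase


class BaseAnalyzerTest(TestCase, abc.ABC):
    @property
    @abc.abstractmethod
    def analyzer_class(self): ...

    @property
    @abc.abstractmethod
    def mock_patch_key(self): ...

    def test_all_observable_types(self):
        ...  # shared subtest logic

# As soon as a test_*.py module imports or subclasses BaseAnalyzerTest, the
# unittest loader collects it as a TestCase and tries to instantiate it, raising:
# TypeError: Can't instantiate abstract class BaseAnalyzerTest with abstract
# methods analyzer_class, mock_patch_key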

@pranjalg1331
Contributor Author

Hello mentors,

Over the past few days, I’ve been exploring the possibility of defining a common base structure for unit testing file-based analyzers. However, I’ve noticed that these analyzers vary significantly — some are Docker-based, while others like Androguard don’t rely on mock data at all.

Given this diversity, I’m finding it challenging to apply a uniform strategy across all of them. I’d appreciate your thoughts on whether we should still aim for a shared base test structure (similar to what we have for observable analyzers), or if it would be more practical to focus on writing well-structured, analyzer-specific unit tests for file_analyzers.

Looking forward to your guidance. 😄

@AnshSinghal
Contributor

Hi @pranjalg1331 ,

I had previously explored this task a bit and just wanted to share a quick suggestion in case it helps.
How about a base test class, say BaseFileAnalyzerTest, that handles common setup like creating test jobs and files? Then, each analyzer can have its own test class inheriting from this base, adding specific mocks for Docker or skipping mocks for analyzers like Androguard. Since analyzers support different file types, the base class could parameterize the file type or keep a list of supported types per analyzer, testing each type with _analyze_sample.
Totally appreciate your work on this — feel free to ignore if you’ve already considered it or have a better approach!
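
A rough sketch of this suggestion, assuming Django's test framework; apart from BaseFileAnalyzerTest and _analyze_sample, which are quoted from the comment above, every name here is hypothetical:

from django.test import TestCase


class BaseFileAnalyzerTest(TestCase):
    analyzer_class = None
    supported_mimetypes = ()  # each analyzer's test lists the file types it covers

    def _analyze_sample(self, mimetype):
        # Concrete tests load a sample of this mimetype and run the analyzer,
        # adding Docker-specific mocks or skipping mocks (e.g. Androguard) as needed.
        raise NotImplementedError

    def test_supported_mimetypes(self):
        if self.analyzer_class is None:
            self.skipTest("analyzer_class is not set")
        for mimetype in self.supported_mimetypes:
            with self.subTest(mimetype=mimetype):
                self._analyze_sample(mimetype)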

@fgibertoni
Contributor

Hello @pranjalg1331,
I think that the approach suggested by @AnshSinghal should be the way to go. A common class for all file analyzers and then the specific classes for each analyzer.
You can also extend the base BaseFileAnalyzerTest with different subclasses, e.g. an XXXFileAnalyzerTest for each "classic" file analyzer and an XXXDockerFileAnalyzerTest that contains specific code for Docker-based analyzers. The analyzers that don't rely on mocks should also be treated specially, using generic testing even without mocks; see the sketch after this message.
What do you guys think? @mlodic @drosetti

Also, let me know if you have any other doubts on this 😃
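
As a sketch, the proposed hierarchy could look like this; all names besides BaseFileAnalyzerTest are illustrative:

from django.test import TestCase


class BaseFileAnalyzerTest(TestCase):
    """Common setup shared by every file-analyzer test; "classic" analyzers inherit this directly."""


class DockerBaseFileAnalyzerTest(BaseFileAnalyzerTest):
    """Adds the Docker-specific mocking; Docker-based analyzer tests inherit this instead."""


class AndroguardAnalyzerTest(BaseFileAnalyzerTest):
    """An analyzer that relies on no mocks uses only the generic checks from the base."""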

@pranjalg1331
Contributor Author

Hello @fgibertoni
I’m a bit hesitant to introduce jobs into our unit tests, because that would move them away from “pure” unit testing and would largely replicate what we already cover elsewhere. 
I’ve also noticed that some analyzers rely on mocked responses, while others—such as APKID and BoxJS—don’t appear to use any. Given that the testing logic varies so much across analyzers with no mocked responses, I’m not sure a common base test class would offer many advantages.

Could you shed some light on why APKID and BoxJS don’t include mocked data? Is that a deliberate choice, or simply an area we haven’t addressed yet?

@mlodic
Member

mlodic commented Jun 19, 2025

Could you shed some light on why APKID and BoxJS don’t include mocked data? Is that a deliberate choice, or simply an area we haven’t addressed yet?

Mocked data was introduced at a later point and those are old analyzers; that's the reason. Ideally, all analyzers should have a decent mock.

@mlodic
Member

mlodic commented Jun 19, 2025

I’m a bit hesitant to introduce jobs into our unit tests, because that would move them away from “pure” unit testing and would largely replicate what we already cover elsewhere. 

It makes sense to me

Given this diversity, I’m finding it challenging to apply a uniform strategy across all of them. I’d appreciate your thoughts on whether we should still aim for a shared base test structure (similar to what we have for observable analyzers), or if it would be more practical to focus on writing well-structured, analyzer-specific unit tests for file_analyzers.

To me, it makes sense to use common strategies only for the analyzers that are similar to each other. Yes, there are differences, but they are not many. I think you can start low and easy, implementing the tests for each analyzer, and when you find two of them that are different from all the others but similar to each other, create a structure for them.

What Federico suggested (e.g. an XXXFileAnalyzerTest for each "classic" file analyzer and an XXXDockerFileAnalyzerTest that contains specific code for Docker-based analyzers) could be an idea. The important thing is to avoid repetition and keep the code clean. When we/you see repetitive patterns, create an additional structure to handle them.

@pranjalg1331
Contributor Author

Hello @fgibertoni,

I've implemented a base class for file analyzers that supports both Docker-based and non-Docker analyzers. It works by loading sample files based on mimetypes and mocking any external dependencies. I've also written unit tests for a few analyzers for you to review.

I'd appreciate it if you could take a look at the implementation and share your thoughts—especially on the maintainability of the current structure—before I begin scaling it to all of the analyzers.

Thank you!
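
Piecing together the identifiers that appear in the review excerpts below (MIMETYPE_TO_FILENAME, get_sample_file_bytes, get_extra_config, test_analyzer_on_supported_filetypes), the base class presumably looks something like this. This is a reconstruction for orientation, not the merged code:

from django.test import TestCase


class BaseFileAnalyzerTest(TestCase):
    analyzer_class = None
    MIMETYPE_TO_FILENAME = {}  # maps each supported mimetype to a sample file

    @classmethod
    def get_supported_mimetypes(cls):
        return set(cls.MIMETYPE_TO_FILENAME.keys())

    @classmethod
    def get_extra_config(cls) -> dict:
        return {}  # per-analyzer runtime configuration overrides

    def get_sample_file_bytes(self, mimetype) -> bytes:
        # Loads the sample mapped in MIMETYPE_TO_FILENAME; raises ValueError or
        # OSError when no sample is available for this mimetype.
        raise NotImplementedError

    def test_analyzer_on_supported_filetypes(self):
        if self.analyzer_class is None:
            self.skipTest(f"analyzer_class is not set ({type(self).__name__})")
        for mimetype in self.get_supported_mimetypes():
            with self.subTest(mimetype=mimetype):
                try:
                    file_bytes = self.get_sample_file_bytes(mimetype)
                except (ValueError, OSError):
                    continue  # no sample for this mimetype
                assert isinstance(file_bytes, bytes)  # run the analyzer here, externals mocked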

Contributor

@fgibertoni fgibertoni left a comment


Hello @pranjalg1331,
I like the general approach that you followed. Great work!
I also think that some new users may find it a bit tricky if they have no experience with IntelOwl, so it will certainly require some documentation. But at the moment I can't think of any way to improve it.
Let's hear also from @mlodic and @drosetti if they have some other suggestions 😄

Member

@mlodic mlodic left a comment


seems good work :)


def test_analyzer_on_supported_filetypes(self):
    if self.analyzer_class is None:
        self.skipTest("analyzer_class is not set")
Member


Print the test name so that we can track where the problem is when this triggers.

Member


can you please address this comment?

Contributor Author


updated

config = AnalyzerConfig.objects.get(
    python_module=self.analyzer_class.python_module
)
print(config)
Member


Leftover; remove it or use logging.debug.

    file_bytes = self.get_sample_file_bytes(mimetype)
except (ValueError, OSError):
    print(f"SKIPPING {mimetype}")
    continue
Member


Specify the test case in the log message and use logging.

@pranjalg1331 pranjalg1331 force-pushed the gsoc25_refactor_analyzer_tests branch from 0637a79 to e1b2a66 on July 15, 2025 18:17
Contributor

@code-review-doctor code-review-doctor bot left a comment


Looks good. Worth considering though. View full project report here.

        return set(cls.MIMETYPE_TO_FILENAME.keys())

    @classmethod
    def get_extra_config(self) -> dict:
Contributor


Suggested change:
-    def get_extra_config(self) -> dict:
+    def get_extra_config(cls) -> dict:

Class methods should take cls as the first argument. More info.

        raise NotImplementedError

    @classmethod
    def _apply_patches(self, patches):
Contributor


Suggested change:
-    def _apply_patches(self, patches):
+    def _apply_patches(cls, patches):

Likewise, consider using cls instead.


gitguardian bot commented Jul 26, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend that you revoke them.
While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about risks here.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Contributor

@code-review-doctor code-review-doctor bot left a comment


Looks good. Worth considering though. View full project report here.

    fingerprint_report_mode: int = 2

    def run(self):
        reports = dict()
Contributor


Suggested change:
-        reports = dict()
+        reports = {}

Using dict literal syntax is simpler and computationally quicker. More details.

@pranjalg1331 pranjalg1331 force-pushed the gsoc25_refactor_analyzer_tests branch from 7da6181 to 1f6d8e7 on July 26, 2025 08:15
Contributor

@code-review-doctor code-review-doctor bot left a comment


Some things to consider. View full project report here.

    fingerprint_report_mode: int = 2

    def run(self):
        reports = dict()
Contributor


Suggested change:
-        reports = dict()
+        reports = {}

Using dict literal syntax is simpler and computationally quicker. Explained here.

Contributor

@code-review-doctor code-review-doctor bot left a comment


Some food for thought. View full project report here.

            raise AnalyzerRunException(f"{self.name} An unexpected error occurred: {e}")

    @classmethod
    def update(self):
Contributor


Suggested change:
-    def update(self):
+    def update(cls):

Class methods should take cls as the first argument. Explained here.

@pranjalg1331 pranjalg1331 force-pushed the gsoc25_refactor_analyzer_tests branch from 09aa2c7 to 5d980ff on August 27, 2025 09:11
@drosetti drosetti merged commit 1bbf2a7 into develop Sep 4, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in GSoC 2025 Refactor Analyzer Tests Sep 4, 2025
@drosetti drosetti deleted the gsoc25_refactor_analyzer_tests branch September 4, 2025 17:49