Historical Data Management OSS-Fuzz SDK Implementation #1150

Open: wants to merge 8 commits into base: main

zewei-wang (Collaborator)

Summary

This PR introduces a comprehensive Historical Data SDK for OSS-Fuzz, providing a unified interface for accessing, storing, and analyzing historical fuzzing data. The SDK enables researchers and developers to track fuzzing progress over time, analyze trends, and generate detailed reports across builds, crashes, corpus, and coverage data.

Features

  • Main SDK Facade: Implemented OSSFuzzSDK class providing unified access to all historical data functionality
  • History Managers: Added specialized managers for different data types:
    • BuildHistoryManager - Build history, success rates, and artifacts
    • CrashHistoryManager - Crash data, deduplication, and analysis
    • CorpusHistoryManager - Corpus growth, statistics, and effectiveness
    • CoverageHistoryManager - Coverage data, trends, and reporting
  • Storage Infrastructure: Enhanced storage system with:
    • StorageManager - Unified storage backend management
    • StorageAdapter - Abstract interface with file and GCS implementations
  • Data Models: Extended core data models with historical data structures
  • Error Handling: Added comprehensive error handling for historical data operations
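
The layering in the list above (a facade, per-data-type managers, and a pluggable storage adapter) can be pictured with a minimal sketch. Only the class names OSSFuzzSDK, BuildHistoryManager, and StorageAdapter come from this PR; FileStorageAdapter, every method name, the record fields, and the JSON-lines layout are assumptions made for illustration, and the actual interfaces may differ.

```python
from __future__ import annotations

import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageAdapter(ABC):
    """Abstract storage backend. Method names here are assumptions."""

    @abstractmethod
    def append(self, category: str, project: str, record: dict) -> None:
        """Persist one history record."""

    @abstractmethod
    def load(self, category: str, project: str) -> list[dict]:
        """Return all records stored for a project, oldest first."""


class FileStorageAdapter(StorageAdapter):
    """Hypothetical file backend: one JSON-lines file per category/project."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, category: str, project: str) -> Path:
        return self.root / category / f"{project}.jsonl"

    def append(self, category: str, project: str, record: dict) -> None:
        path = self._path(category, project)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def load(self, category: str, project: str) -> list[dict]:
        path = self._path(category, project)
        if not path.exists():
            return []
        with path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]


class BuildHistoryManager:
    """Manager for one data type; it only talks to the abstract adapter."""

    def __init__(self, storage: StorageAdapter, project: str) -> None:
        self.storage = storage
        self.project = project

    def record_build(self, build_id: str, success: bool) -> None:
        record = {"build_id": build_id, "success": success}
        self.storage.append("build", self.project, record)

    def success_rate(self) -> float:
        builds = self.storage.load("build", self.project)
        if not builds:
            return 0.0
        return sum(b["success"] for b in builds) / len(builds)


class OSSFuzzSDK:
    """Facade: a single entry point that wires managers to one backend."""

    def __init__(self, project: str, storage: StorageAdapter) -> None:
        self.build = BuildHistoryManager(storage, project)


def demo() -> float:
    """Record one passing and one failing build, then report the rate."""
    with tempfile.TemporaryDirectory() as tmp:
        sdk = OSSFuzzSDK("example-project", FileStorageAdapter(Path(tmp)))
        sdk.build.record_build("build-1", success=True)
        sdk.build.record_build("build-2", success=False)
        return sdk.build.success_rate()
```

Because the manager depends only on the abstract adapter, a GCS-backed implementation could be swapped in without touching the manager or the facade, which is presumably the point of the adapter split.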

Testing

  • New Test Suite: Added test_historical_data_sdk.py with comprehensive coverage of:
    • SDK initialization and configuration
    • History manager functionality
    • Storage adapter operations
    • Error handling scenarios
    • Data model validation
  • Compatibility Updates: Modified existing test files (test_cloud_builder_pipeline.py, test_local_builder_pipeline.py) to fix path resolution for the benchmark YAML file
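
The diff for the path-resolution change is not shown in this conversation. One common way to make such a test robust, sketched here as an assumption and not as the PR's actual fix, is to anchor the data path to the test module's own location instead of the current working directory. The helper name resolve_test_data and the example path segments are hypothetical.

```python
from pathlib import Path


def resolve_test_data(anchor_file: str, *parts: str) -> Path:
    """Build a path relative to the directory containing anchor_file.

    The result does not depend on the process's working directory, so the
    test behaves the same whether it is run from the repository root or
    from inside the unittests directory.
    """
    return Path(anchor_file).resolve().parent.joinpath(*parts)
```

Inside a test module this would typically be called as resolve_test_data(__file__, "data", "benchmark.yaml"), where the trailing segments are placeholders for wherever the benchmark YAML actually lives.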

Breaking Changes

None. This is a purely additive feature that extends the existing SDK without modifying existing APIs or functionality.

- Add comprehensive data models for build, crash, corpus, and coverage history
- Implement HistoricalSummary model for aggregated statistics
- Add specialized error classes for SDK configuration and validation
- Include proper type hints and Pydantic validation

- Extend storage adapters with history-specific functionality
- Add support for time-series data storage and retrieval
- Implement environment variable utilities for configuration
- Improve error handling and logging in storage operations

- Add abstract HistoryManager base class with common functionality
- Implement BuildHistoryManager for build statistics and trends
- Add CoverageHistoryManager for coverage data analysis
- Include data validation and storage abstraction
- Add comprehensive logging and error handling

- Implement CorpusHistoryManager for corpus growth analysis
- Add CrashHistoryManager for crash tracking and statistics
- Include duplicate detection and data validation
- Complete the historical data management infrastructure

- Add OSSFuzzSDK class as main entry point for historical data
- Implement project report generation and analysis features
- Add fuzzing efficiency analysis and health scoring
- Include environment configuration and error handling
- Provide unified interface for all history managers

- Export OSSFuzzSDK and history managers in package __init__
- Add data models and error classes to public API
- Maintain backward compatibility with existing exports
- Complete integration of historical data functionality

- Add test suite for OSSFuzzSDK main functionality
- Include tests for all history managers (build, crash, corpus, coverage)
- Test configuration, error handling, and edge cases
- Ensure proper integration with storage and data validation
- Add mocking for external dependencies
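
The commit notes above mention duplicate detection for crash records without describing the mechanism. A minimal sketch of one widely used approach follows: hash the crash type together with the top few stack frames and keep the first crash seen for each signature. The function names, the "type" and "stack" record fields, and the choice of three frames are assumptions, not taken from this PR.

```python
from __future__ import annotations

import hashlib


def crash_signature(crash_type: str, stack_frames: list[str], top_n: int = 3) -> str:
    """Hash the crash type plus the top_n innermost frames (hypothetical scheme)."""
    key = "\n".join([crash_type, *stack_frames[:top_n]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(crashes: list[dict]) -> list[dict]:
    """Keep the first crash for each signature, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for crash in crashes:
        signature = crash_signature(crash["type"], crash["stack"])
        if signature not in seen:
            seen.add(signature)
            unique.append(crash)
    return unique
```

Limiting the signature to the top frames treats crashes that differ only deeper in the call chain as the same bug; a larger top_n makes deduplication stricter at the cost of more near-duplicates.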
zewei-wang (Collaborator, Author)

/gcbrun exp -n zewei -m vertex_ai_gemini-2-5-flash-chat -ag -b quick-test

zewei-wang requested review from Copilot and DonggeLiu on July 22, 2025, 20:08

Copilot AI left a comment

Pull Request Overview

This PR implements a comprehensive Historical Data SDK for OSS-Fuzz, providing unified access to historical fuzzing data with specialized managers for builds, crashes, corpus, and coverage analysis. The implementation includes storage infrastructure, data models, and extensive testing capabilities.

Key Changes:

  • Introduces the main OSSFuzzSDK facade class for unified historical data access
  • Adds specialized history managers for builds, crashes, corpus, and coverage data
  • Extends storage infrastructure with history-specific operations and multiple backend support

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| ossfuzz_py/utils/env_vars.py | Adds environment variables for historical data storage configuration |
| ossfuzz_py/unittests/test_local_builder_pipeline.py | Updates path resolution for benchmark YAML file |
| ossfuzz_py/unittests/test_historical_data_sdk.py | Comprehensive test suite for new SDK functionality |
| ossfuzz_py/unittests/test_cloud_builder_pipeline.py | Updates path resolution for benchmark YAML file |
| ossfuzz_py/history/*.py | New history manager classes and base functionality |
| ossfuzz_py/errors/*.py | New error types for historical data operations |
| ossfuzz_py/data/storage_*.py | Extended storage infrastructure with history operations |
| ossfuzz_py/core/ossfuzz_sdk.py | Main SDK facade implementation |
| ossfuzz_py/core/data_models.py | New data models for historical data structures |
| ossfuzz_py/__init__.py | Updates to public API exports |

…atures

- Update cloud builder pipeline tests for new SDK integration
- Modify local builder pipeline tests to work with enhanced functionality
- Ensure backward compatibility and proper error handling
- Fix any test conflicts with new historical data features
zewei-wang (Collaborator, Author)

/gcbrun exp -n zewei -m vertex_ai_gemini-2-5-flash-chat -ag -b quick-test
