Copilot AI commented Sep 20, 2025

This PR begins the refactoring work to isolate PyPI download daily summarization as a standalone product, as requested in the issue.

Current Architecture Analysis

I've analyzed the existing pipeline structure:

  • Main workflow: Runs weekly via .github/workflows/deploy.yml
  • Data source: Processes the massive bigquery-public-data.pypi.file_downloads table (100+ TiB)
  • Key model: daily_package_downloads creates incremental daily aggregations
  • Downstream: Joins with Safety DB in downloads_and_vulnerabilities for vulnerability analysis
  • Output: Final metrics like vulnerable_downloads_by_package
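To make the incremental behaviour of daily_package_downloads concrete, here is a minimal Python sketch of the idea. It is not the actual dbt model: the real implementation is SQL running against BigQuery, and the row shape, the function name, and the "only days after the latest aggregated day" rule are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def aggregate_new_days(raw_rows, existing):
    """Add per-day, per-package download counts for days not yet aggregated.

    raw_rows: iterable of (download_date, package) tuples (hypothetical shape).
    existing: dict mapping (download_date, package) -> count from earlier runs.
    """
    # Find the most recent day already present in the aggregate, if any.
    latest = max((day for day, _ in existing), default=None)

    # Count only rows for days strictly after that day, so earlier days
    # are never rescanned (assumed incremental rule).
    new_counts = Counter(
        (day, package)
        for day, package in raw_rows
        if latest is None or day > latest
    )

    # Return a new dict so the caller's existing aggregate is left untouched.
    merged = dict(existing)
    merged.update(new_counts)
    return merged


existing = {(date(2025, 9, 18), "requests"): 2}
raw = [
    (date(2025, 9, 18), "requests"),  # already aggregated, ignored
    (date(2025, 9, 19), "requests"),
    (date(2025, 9, 19), "requests"),
    (date(2025, 9, 19), "numpy"),
]
result = aggregate_new_days(raw, existing)
```

The point of the sketch is that each run touches only new days, which is what keeps repeated scans of the 100+ TiB source table affordable.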

Clarification Questions

Before implementing the refactoring, I need clarification on several architectural decisions:

  1. Frequency: Should the new standalone workflow run daily, or keep the current weekly schedule?
  2. Data location: Should the daily model remain in the same BigQuery project or move to a separate one?
  3. Safety DB: Should vulnerability processing remain in the main workflow or move to the standalone one?
  4. Isolation level: Should this be separate folders/namespaces within this repository, or a completely separate repository?
  5. Target confirmation: Is daily_package_downloads the specific model to isolate?

Next Steps

Once clarifications are provided, the implementation will:

  • Create a new standalone GitHub Actions workflow for daily downloads processing
  • Isolate the expensive PyPI data processing from the vulnerability analysis
  • Update the main workflow to treat the daily model as an external source
  • Ensure a clear separation of concerns between the two workflows
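As a rough illustration of the intended separation, the sketch below shows the vulnerability side consuming the daily summary purely as input, without knowing how it was produced. The row fields (package, version, downloads), the set of vulnerable (package, version) pairs, and the function body are assumptions; the real downloads_and_vulnerabilities join and the Safety DB format may look quite different.

```python
from collections import defaultdict


def vulnerable_downloads_by_package(daily_rows, vulnerable_versions):
    """Sum downloads of known-vulnerable versions per package.

    daily_rows: iterable of dicts with "package", "version", "downloads"
        keys, standing in for rows read from the external daily summary.
    vulnerable_versions: set of (package, version) pairs, a simplified
        stand-in for whatever the Safety DB lookup provides.
    """
    totals = defaultdict(int)
    for row in daily_rows:
        if (row["package"], row["version"]) in vulnerable_versions:
            totals[row["package"]] += row["downloads"]
    return dict(totals)


daily_rows = [
    {"package": "examplepkg", "version": "1.0", "downloads": 10},
    {"package": "examplepkg", "version": "1.1", "downloads": 5},
    {"package": "otherpkg", "version": "2.0", "downloads": 7},
]
vulnerable = {("examplepkg", "1.0")}
summary = vulnerable_downloads_by_package(daily_rows, vulnerable)
```

Because the function depends only on the shape of the daily rows, the daily summarization could move to its own workflow, project, or repository without changing this side, provided that shape stays stable.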

Addressing #17.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • hub.getdbt.com
    • Triggering command: /usr/bin/python3 /home/REDACTED/.local/bin/dbt deps (dns block)

If you need me to access, download, or install something from one of these locations, you can either:




Copilot AI commented Sep 20, 2025

@brabster 👋 This repository doesn't have Copilot instructions. With Copilot instructions, I can understand the repository better, work faster and produce higher quality PRs.

I can generate a .github/copilot-instructions.md file for you automatically. Click here to open a pre-filled issue and assign it to me. I'll write the instructions, and then tag you for review.

Copilot AI changed the title from "[WIP] Refactor pypi download daily summarisation to be a standalone product" to "Initial analysis for refactoring PyPI download daily summarization" on Sep 20, 2025
Copilot finished work on behalf of brabster September 20, 2025 22:30
Copilot AI requested a review from brabster September 20, 2025 22:30