[Feature Request]: Is there a way to estimate resource usage before deeply crawling a URL? #1022
QuangTQV started this conversation in Feature requests
What needs to be done?
Add a feature that estimates the potential resource consumption (e.g., number of pages, redirects, content size, link depth) of a URL before initiating a full deep crawl. This would allow developers to apply guardrails to reject overly complex or heavy URLs early.
What problem does this solve?
In chatbot systems where users can provide URLs for data ingestion, deeply crawling large or complex websites can lead to excessive CPU and memory usage, potentially crashing the system. A resource estimation step before deep crawling would prevent system overload by rejecting problematic URLs upfront with a clear warning to users.
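For illustration only, here is a minimal sketch of the kind of pre-crawl probe described above, written against the plain Python standard library rather than crawl4ai's API (which has no such feature today, hence this request). It fetches only the entry page, then scores it on redirects, reported content size, and link count before a full deep crawl is allowed. All names and thresholds are hypothetical; a real estimator would also need a bounded shallow crawl to gauge link depth.

```python
# Hypothetical pre-crawl probe: one capped GET, no recursion.
# Not part of crawl4ai; thresholds are illustrative placeholders.
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.request


class _LinkCounter(HTMLParser):
    """Collects href targets from <a> tags on a single page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


@dataclass
class CrawlEstimate:
    final_url: str
    redirected: bool          # True if the request was redirected
    content_length: int       # bytes reported by the server, or bytes actually read
    outbound_links: int       # unique hrefs on the entry page
    internal_links: int       # hrefs pointing at the same host


def estimate_crawl_cost(url: str, max_bytes: int = 2_000_000) -> CrawlEstimate:
    """Probe a URL cheaply: a single GET with a capped download."""
    req = urllib.request.Request(url, headers={"User-Agent": "crawl-probe/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        final_url = resp.geturl()
        reported = int(resp.headers.get("Content-Length") or 0)
        body = resp.read(max_bytes).decode("utf-8", errors="replace")

    parser = _LinkCounter()
    parser.feed(body)
    host = urlparse(final_url).netloc
    internal = sum(
        1 for href in parser.links
        if urlparse(urljoin(final_url, href)).netloc == host
    )
    return CrawlEstimate(
        final_url=final_url,
        redirected=final_url != url,
        content_length=reported or len(body),
        outbound_links=len(parser.links),
        internal_links=internal,
    )


def is_too_heavy(est: CrawlEstimate,
                 max_links: int = 500,
                 max_page_bytes: int = 5_000_000) -> bool:
    """Guardrail: reject URLs whose entry page already looks expensive."""
    return est.outbound_links > max_links or est.content_length > max_page_bytes
```

A chatbot ingestion pipeline could run this probe when the user submits a URL and return a clear "site too large/complex" message if `is_too_heavy` trips, instead of starting a deep crawl that may exhaust CPU and memory.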
Target users/beneficiaries
No response
Current alternatives/workarounds
No response
Proposed approach
No response