[Feature Request]: Is there a way to estimate resource usage before deeply crawling a URL? #1022
QuangTQV started this conversation in Feature requests
What needs to be done?
Add a feature that estimates the potential resource consumption (e.g., number of pages, redirects, content size, link depth) of a URL before initiating a full deep crawl. This would allow developers to apply guardrails to reject overly complex or heavy URLs early.
What problem does this solve?
In chatbot systems where users can provide URLs for data ingestion, deeply crawling large or complex websites can lead to excessive CPU and memory usage, potentially crashing the system. A resource estimation step before deep crawling would prevent system overload by rejecting problematic URLs upfront with a clear warning to users.
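For illustration only, here is a minimal sketch of the kind of pre-crawl probe described above, written against the plain Python standard library rather than crawl4ai's API (which has no such feature today, hence this request). It fetches only the entry page, then scores it on redirects, reported content size, and link count before a full deep crawl is allowed. All names and thresholds are hypothetical; a real estimator would also need a bounded shallow crawl to gauge link depth.

```python
# Hypothetical pre-crawl probe: one capped GET, no recursion.
# Not part of crawl4ai; thresholds are illustrative placeholders.
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.request


class _LinkCounter(HTMLParser):
    """Collects href targets from <a> tags on a single page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


@dataclass
class CrawlEstimate:
    final_url: str
    redirected: bool          # True if the request was redirected
    content_length: int       # bytes reported by the server, or bytes actually read
    outbound_links: int       # unique hrefs on the entry page
    internal_links: int       # hrefs pointing at the same host


def estimate_crawl_cost(url: str, max_bytes: int = 2_000_000) -> CrawlEstimate:
    """Probe a URL cheaply: a single GET with a capped download."""
    req = urllib.request.Request(url, headers={"User-Agent": "crawl-probe/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        final_url = resp.geturl()
        reported = int(resp.headers.get("Content-Length") or 0)
        body = resp.read(max_bytes).decode("utf-8", errors="replace")

    parser = _LinkCounter()
    parser.feed(body)
    host = urlparse(final_url).netloc
    internal = sum(
        1 for href in parser.links
        if urlparse(urljoin(final_url, href)).netloc == host
    )
    return CrawlEstimate(
        final_url=final_url,
        redirected=final_url != url,
        content_length=reported or len(body),
        outbound_links=len(parser.links),
        internal_links=internal,
    )


def is_too_heavy(est: CrawlEstimate,
                 max_links: int = 500,
                 max_page_bytes: int = 5_000_000) -> bool:
    """Guardrail: reject URLs whose entry page already looks expensive."""
    return est.outbound_links > max_links or est.content_length > max_page_bytes
```

A chatbot ingestion pipeline could run this probe when the user submits a URL and return a clear "site too large/complex" message if `is_too_heavy` trips, instead of starting a deep crawl that may exhaust CPU and memory.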
Target users/beneficiaries
No response
Current alternatives/workarounds
No response
Proposed approach
No response