Commit 5d453c5

fix: remove max_pages parameter from confluence loader parameters if it is a string (#86)
This pull request improves the handling of the `max_pages` parameter in the Confluence extractor. The parameter is now passed to the `ConfluenceLoader` only if it is set and is not a string, which prevents invalid values from reaching the loader.

Parameter handling improvements:

* In `extractors/confluence_extractor.py`, the code now removes the `max_pages` parameter from `confluence_loader_parameters` if it is missing or is a string, so that only valid integer values are passed to the loader.

This PR fixes the following issue: #85
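The rule described above can be sketched as a small standalone function. This is a minimal illustration, not the code from the commit: the helper name `drop_invalid_max_pages` and the `space_key` entry in the usage lines are assumptions made for the example, and the sketch calls `pop` with a default so that a missing key is tolerated.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def drop_invalid_max_pages(params: dict[str, Any]) -> dict[str, Any]:
    """Drop max_pages when it is missing, falsy, or a string (hypothetical helper)."""
    cleaned = dict(params)  # work on a copy so the caller's dict is untouched
    value = cleaned.get("max_pages")
    if not value or isinstance(value, str):
        logger.warning("max_pages is not set or invalid; discarding it.")
        # A default of None keeps pop() from raising when the key is absent.
        cleaned.pop("max_pages", None)
    return cleaned


# Illustrative inputs only; space_key is just a placeholder extra parameter.
kept = drop_invalid_max_pages({"max_pages": 10, "space_key": "DOC"})
dropped = drop_invalid_max_pages({"max_pages": "ten", "space_key": "DOC"})
```

With these inputs, `kept` still contains `max_pages`, while `dropped` only contains `space_key`.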
1 parent 0fb959a commit 5d453c5

1 file changed: +10 -0 lines changed

libs/extractor-api-lib/src/extractor_api_lib/impl/extractors/confluence_extractor.py

Lines changed: 10 additions & 0 deletions
@@ -1,5 +1,6 @@
 """Module for the DefaultConfluenceExtractor class."""
 
+import logging
 from langchain_community.document_loaders import ConfluenceLoader
 
 from extractor_api_lib.impl.types.extractor_types import ExtractorTypes
@@ -10,6 +11,8 @@
     ConfluenceLangchainDocument2InformationPiece,
 )
 
+logger = logging.getLogger(__name__)
+
 
 class ConfluenceExtractor(InformationExtractor):
     """Implementation of the InformationExtractor interface for confluence."""
@@ -54,6 +57,13 @@ async def aextract_content(
         confluence_loader_parameters = {
             x.key: int(x.value) if x.value.isdigit() else x.value for x in extraction_parameters.kwargs
         }
+        if not confluence_loader_parameters.get("max_pages") or isinstance(
+            confluence_loader_parameters.get("max_pages"), str
+        ):
+            logging.warning(
+                "max_pages parameter is not set or invalid discarding it. ConfluenceLoader will use default value."
+            )
+            confluence_loader_parameters.pop("max_pages")
         # Drop the document_name parameter as it is not used by the ConfluenceLoader
         if "document_name" in confluence_loader_parameters:
             confluence_loader_parameters.pop("document_name", None)
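
To see why `max_pages` can still be a string after the dictionary comprehension in the last hunk, it may help to run that comprehension on its own. The sketch below is an illustration under assumptions: `KeyValue` is a hypothetical stand-in for the items in `extraction_parameters.kwargs` (only the `key` and `value` attributes used in the diff are modeled), and `parse_kwargs` is a name invented for this example.

```python
from dataclasses import dataclass


@dataclass
class KeyValue:
    """Hypothetical stand-in for one item of extraction_parameters.kwargs."""

    key: str
    value: str


def parse_kwargs(kwargs: list[KeyValue]) -> dict[str, object]:
    # Same comprehension as in the diff: digit-only strings become ints,
    # every other value is passed through unchanged as a string.
    return {x.key: int(x.value) if x.value.isdigit() else x.value for x in kwargs}


parsed = parse_kwargs(
    [
        KeyValue("max_pages", "25"),    # digits only, converted to int
        KeyValue("page_limit", "-5"),   # "-5".isdigit() is False, stays a str
        KeyValue("empty_value", ""),    # "".isdigit() is False, stays a str
    ]
)
```

Because `str.isdigit()` is false for empty, negative, or decimal strings, such a `max_pages` value survives the comprehension as a string, which is the case the added `isinstance(..., str)` check discards.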
