Open
Description
Steps to reproduce:
- Create a Notion page
- Add a 'Callout' block
- Remove the callout's icon
- Try to sync the page with unstructured-ingest

Here is a trace:
failed to sync page content for 1e75b4e6-7d26-80f2-934f-e5325297b4c3: failed to fetch page content: failed to fetch Notion resources: python script error: Error during Notion API connection or download: Pipeline did not run successfully
Script output: 2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - notion_api_key: ntn_2...
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - page_ids: ['1e75b4e67d2680f2934fe5325297b4c3']
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - database_ids: []
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - output_dir: /tmp/notion_resource_2377192960
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - recursive: False
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Starting Notion download for 1 pages
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Page IDs: 1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Recursive mode: disabled
2025-05-02 00:24:36,994 MainProcess INFO created indexer with configs: {"page_ids":["1e75b4e67d2680f2934fe5325297b4c3"],"database_ids":[],"recursive":false}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO Created download with configs: {"download_dir":null}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO created partition with configs: {"strategy":"hi_res","ocr_languages":null,"encoding":null,"additional_partition_args":null,"skip_infer_table_types":null,"fields_include":["element_id","text","type","metadata","embeddings"],"flatten_metadata":false,"metadata_exclude":[],"element_exclude":[],"metadata_include":[],"partition_endpoint":"https://api.unstructuredapp.io/general/v0/general","partition_by_api":false,"api_timeout_ms":null,"api_key":null,"hi_res_model_name":null,"raise_unsupported_filetype":false}
2025-05-02 00:24:36,994 MainProcess INFO Created upload with configs: {"output_dir":"/tmp/notion_resource_2377192960/structured-output"}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO created upload_stage with configs: {}
2025-05-02 00:24:37,111 MainProcess INFO HEAD https://api.notion.com/v1/users
2025-05-02 00:24:37,111 MainProcess DEBUG => None -- None
2025-05-02 00:24:37,250 MainProcess INFO running local pipeline: indexer (NotionIndexer) -> download (NotionDownloader) -> partition (hi_res) -> upload_stage (BlobStoreUploadStager) -> upload (LocalUploader) with configs: {"reprocess":false,"verbose":true,"tqdm":false,"work_dir":"/root/.cache/unstructured/ingest/pipeline/process_32","num_processes":1,"max_connections":null,"raise_on_error":true,"disable_parallelism":false,"preserve_downloads":false,"download_only":false,"re_download":false,"uncompress":false,"iter_delete":false,"delete_cache":true,"otel_endpoint":null,"status":{}}
2025-05-02 00:24:37,250 MainProcess INFO indexer finished in 2.1233e-05s
2025-05-02 00:24:37,256 MainProcess INFO GET https://api.notion.com/v1/pages/1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:37,256 MainProcess DEBUG => {} -- None
2025-05-02 00:24:37,955 MainProcess DEBUG => {'object': 'page', 'id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3', 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:24:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'cover': None, 'icon': None, 'parent': {'type': 'workspace', 'workspace': True}, 'archived': False, 'in_trash': False, 'properties': {'title': {'id': 'title', 'type': 'title', 'title': [{'type': 'text', 'text': {'content': 'This Should Fail', 'link': None}, 'annotations': {'bold': False, 'italic': False, 'strikethrough': False, 'underline': False, 'code': False, 'color': 'default'}, 'plain_text': 'This Should Fail', 'href': None}]}}, 'url': 'https://www.notion.so/This-Should-Fail-1e75b4e67d2680f2934fe5325297b4c3', 'public_url': None, 'request_id': '7125d9ad-88a1-470f-8bb2-03c2eaab3676'}
2025-05-02 00:24:37,955 MainProcess DEBUG generated file data: {"identifier":"1e75b4e67d2680f2934fe5325297b4c3","connector_type":"notion","source_identifiers":{"filename":"1e75b4e67d2680f2934fe5325297b4c3.html","fullpath":"1e75b4e67d2680f2934fe5325297b4c3.html","rel_path":"1e75b4e67d2680f2934fe5325297b4c3.html"},"metadata":{"url":null,"version":null,"record_locator":{"page_id":"1e75b4e67d2680f2934fe5325297b4c3"},"date_created":"2025-05-02T00:11:00.000Z","date_modified":"2025-05-02T00:24:00.000Z","date_processed":"1746145477.955145","permissions_data":null,"filesize_bytes":null},"additional_metadata":{"created_by":{"id":"1b9d872b-594c-81c9-99ef-0002089759a7","object":"user"},"last_edited_by":{"id":"1b9d872b-594c-81c9-99ef-0002089759a7","object":"user"},"parent":{"type":"workspace","workspace":true},"url":"https://www.notion.so/This-Should-Fail-1e75b4e67d2680f2934fe5325297b4c3"},"reprocess":false,"local_download_path":null,"display_name":null}
2025-05-02 00:24:37,955 MainProcess INFO calling DownloadStep with 1 docs
2025-05-02 00:24:37,955 MainProcess INFO processing content async
2025-05-02 00:24:37,956 MainProcess DEBUG /root/.cache/unstructured/ingest/pipeline/process_32/indexer/eed2596e408f.json not detected as batch file data
2025-05-02 00:24:37,964 MainProcess INFO GET https://api.notion.com/v1/blocks/1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:37,965 MainProcess DEBUG => None -- None
2025-05-02 00:24:38,066 MainProcess DEBUG => {'object': 'block', 'id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3', 'parent': {'type': 'workspace', 'workspace': True}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:24:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': True, 'archived': False, 'in_trash': False, 'type': 'child_page', 'child_page': {'title': 'This Should Fail'}, 'request_id': '908b817f-d1dd-4308-b53d-1ec9fc738fac'}
2025-05-02 00:24:38,066 MainProcess DEBUG processing page id: 1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:38,066 MainProcess INFO GET https://api.notion.com/v1/blocks/1e75b4e6-7d26-80f2-934f-e5325297b4c3/children
2025-05-02 00:24:38,066 MainProcess DEBUG => {} -- None
2025-05-02 00:24:39,814 MainProcess DEBUG => {'object': 'list', 'results': [{'object': 'block', 'id': '1e75b4e6-7d26-806a-9dad-da59d9ab84a8', 'parent': {'type': 'page_id', 'page_id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3'}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:12:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': False, 'archived': False, 'in_trash': False, 'type': 'callout', 'callout': {'rich_text': [{'type': 'text', 'text': {'content': 'HEY THIS IS A CALLOUT', 'link': None}, 'annotations': {'bold': False, 'italic': False, 'strikethrough': False, 'underline': False, 'code': False, 'color': 'default'}, 'plain_text': 'HEY THIS IS A CALLOUT', 'href': None}], 'icon': None, 'color': 'gray_background'}}, {'object': 'block', 'id': '1e75b4e6-7d26-8027-90c4-d342a39e4400', 'parent': {'type': 'page_id', 'page_id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3'}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:11:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': False, 'archived': False, 'in_trash': False, 'type': 'paragraph', 'paragraph': {'rich_text': [], 'color': 'default'}}], 'next_cursor': None, 'has_more': False, 'type': 'block', 'block': {}, 'request_id': 'c24fed34-744d-4731-ad8c-c93541a7a7f4'}
2025-05-02 00:24:39,814 MainProcess ERROR Error downloading page 1e75b4e67d2680f2934fe5325297b4c3: 'NoneType' object has no attribute 'get'
2025-05-02 00:24:39,815 MainProcess INFO download finished in 1.859335789s, attributes: file_id=eed2596e408f
2025-05-02 00:24:39,815 MainProcess ERROR Exception raised while running download
Traceback (most recent call last):
File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/interfaces.py", line 171, in run_async
return await self._run_async(fn=fn, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/steps/download.py", line 113, in _run_async
return self.create_step_results(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/steps/download.py", line 129, in create_step_results
download_path = download_results["path"]
~~~~~~~~~~~~~~~~^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
2025-05-02 00:24:39,817 MainProcess INFO download step finished in 1.86135445s
2025-05-02 00:24:39,818 MainProcess INFO ingest process finished in 2.823138445s
2025-05-02 00:24:39,818 MainProcess ERROR 1 failed documents:
2025-05-02 00:24:39,818 MainProcess ERROR /root/.cache/unstructured/ingest/pipeline/process_32/indexer/eed2596e408f.json: [download] 'NoneType' object is not subscriptable
2025-05-02 00:24:39,818 MainProcess INFO deleting cache directory: /root/.cache/unstructured/ingest/pipeline/process_32
Error during Notion download: Pipeline did not run successfully
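
For reference, here is a minimal sketch of what seems to be going on, based only on the trace above. The `/children` response returns the callout with `'icon': None`, and the first error (`'NoneType' object has no attribute 'get'`) is what you get when code calls `.get` on that value. The two functions below are hypothetical stand-ins, not the actual unstructured-ingest parsing code, and the payload is copied from the log with the rich-text annotations trimmed.

```python
from typing import Any, Optional

# Callout payload as returned by GET /v1/blocks/<id>/children in the trace
# (annotations trimmed for brevity). Note that "icon" is None.
CALLOUT_PAYLOAD: dict[str, Any] = {
    "rich_text": [{"type": "text", "plain_text": "HEY THIS IS A CALLOUT"}],
    "icon": None,
    "color": "gray_background",
}


def icon_type_unsafe(callout: dict[str, Any]) -> Optional[str]:
    """Hypothetical parser that assumes `icon` is always a dict."""
    icon = callout["icon"]
    return icon.get("type")  # raises AttributeError when icon is None


def icon_type_safe(callout: dict[str, Any]) -> Optional[str]:
    """Hypothetical null-safe variant: a missing or null icon means 'no icon'."""
    icon = callout.get("icon") or {}
    return icon.get("type")


if __name__ == "__main__":
    try:
        icon_type_unsafe(CALLOUT_PAYLOAD)
    except AttributeError as exc:
        print(f"unsafe parser failed: {exc}")
    print(f"safe parser returned: {icon_type_safe(CALLOUT_PAYLOAD)!r}")
```

The later `TypeError: 'NoneType' object is not subscriptable` in `create_step_results` looks like a follow-on failure: the downloader logs the first error, apparently returns nothing, and the download step then indexes into that result. Guarding the icon access would presumably make both errors go away, but that is an assumption I have not verified against the library source.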