Notion page fails to sync when there's a Callout block with no Icon #500

@teddysupercuts

Steps to reproduce:

  1. Create a Notion page
  2. Add a 'Callout' block
  3. Remove the icon
  4. Try to sync with unstructured-ingest
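
To check whether a page is affected before syncing, the block list returned by `GET /v1/blocks/{id}/children` can be scanned for callouts whose `icon` is null. This is only a sketch: `callouts_without_icon` is a hypothetical helper, not part of unstructured-ingest, and it assumes `blocks` is the `results` list from that response.

```python
def callouts_without_icon(blocks: list[dict]) -> list[str]:
    """Return the ids of callout blocks whose icon is null or missing.

    `blocks` is assumed to be the `results` list of a Notion
    "retrieve block children" response (hypothetical helper).
    """
    return [
        block["id"]
        for block in blocks
        if block.get("type") == "callout"
        # `callout.icon` is None once the icon has been removed in the UI.
        and (block.get("callout") or {}).get("icon") is None
    ]
```

Any id it returns points at a callout that is likely to trigger the failure described here.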

Here is a trace:

failed to sync page content for 1e75b4e6-7d26-80f2-934f-e5325297b4c3: failed to fetch page content: failed to fetch Notion resources: python script error: Error during Notion API connection or download: Pipeline did not run successfully
Script output: 2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - notion_api_key: ntn_2...
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - page_ids: ['1e75b4e67d2680f2934fe5325297b4c3']
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - database_ids: []
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - output_dir: /tmp/notion_resource_2377192960
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - recursive: False
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Starting Notion download for 1 pages
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Page IDs: 1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:36,993 - notion_resource_fetcher - INFO - Recursive mode: disabled
2025-05-02 00:24:36,994 MainProcess INFO     created indexer with configs: {"page_ids":["1e75b4e67d2680f2934fe5325297b4c3"],"database_ids":[],"recursive":false}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO     Created download with configs: {"download_dir":null}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO     created partition with configs: {"strategy":"hi_res","ocr_languages":null,"encoding":null,"additional_partition_args":null,"skip_infer_table_types":null,"fields_include":["element_id","text","type","metadata","embeddings"],"flatten_metadata":false,"metadata_exclude":[],"element_exclude":[],"metadata_include":[],"partition_endpoint":"https://api.unstructuredapp.io/general/v0/general","partition_by_api":false,"api_timeout_ms":null,"api_key":null,"hi_res_model_name":null,"raise_unsupported_filetype":false}
2025-05-02 00:24:36,994 MainProcess INFO     Created upload with configs: {"output_dir":"/tmp/notion_resource_2377192960/structured-output"}, connection configs: {"access_config":"**********"}
2025-05-02 00:24:36,994 MainProcess INFO     created upload_stage with configs: {}
2025-05-02 00:24:37,111 MainProcess INFO     HEAD https://api.notion.com/v1/users
2025-05-02 00:24:37,111 MainProcess DEBUG    => None -- None
2025-05-02 00:24:37,250 MainProcess INFO     running local pipeline: indexer (NotionIndexer) -> download (NotionDownloader) -> partition (hi_res) -> upload_stage (BlobStoreUploadStager) -> upload (LocalUploader) with configs: {"reprocess":false,"verbose":true,"tqdm":false,"work_dir":"/root/.cache/unstructured/ingest/pipeline/process_32","num_processes":1,"max_connections":null,"raise_on_error":true,"disable_parallelism":false,"preserve_downloads":false,"download_only":false,"re_download":false,"uncompress":false,"iter_delete":false,"delete_cache":true,"otel_endpoint":null,"status":{}}
2025-05-02 00:24:37,250 MainProcess INFO     indexer finished in 2.1233e-05s
2025-05-02 00:24:37,256 MainProcess INFO     GET https://api.notion.com/v1/pages/1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:37,256 MainProcess DEBUG    => {} -- None
2025-05-02 00:24:37,955 MainProcess DEBUG    => {'object': 'page', 'id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3', 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:24:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'cover': None, 'icon': None, 'parent': {'type': 'workspace', 'workspace': True}, 'archived': False, 'in_trash': False, 'properties': {'title': {'id': 'title', 'type': 'title', 'title': [{'type': 'text', 'text': {'content': 'This Should Fail', 'link': None}, 'annotations': {'bold': False, 'italic': False, 'strikethrough': False, 'underline': False, 'code': False, 'color': 'default'}, 'plain_text': 'This Should Fail', 'href': None}]}}, 'url': 'https://www.notion.so/This-Should-Fail-1e75b4e67d2680f2934fe5325297b4c3', 'public_url': None, 'request_id': '7125d9ad-88a1-470f-8bb2-03c2eaab3676'}
2025-05-02 00:24:37,955 MainProcess DEBUG    generated file data: {"identifier":"1e75b4e67d2680f2934fe5325297b4c3","connector_type":"notion","source_identifiers":{"filename":"1e75b4e67d2680f2934fe5325297b4c3.html","fullpath":"1e75b4e67d2680f2934fe5325297b4c3.html","rel_path":"1e75b4e67d2680f2934fe5325297b4c3.html"},"metadata":{"url":null,"version":null,"record_locator":{"page_id":"1e75b4e67d2680f2934fe5325297b4c3"},"date_created":"2025-05-02T00:11:00.000Z","date_modified":"2025-05-02T00:24:00.000Z","date_processed":"1746145477.955145","permissions_data":null,"filesize_bytes":null},"additional_metadata":{"created_by":{"id":"1b9d872b-594c-81c9-99ef-0002089759a7","object":"user"},"last_edited_by":{"id":"1b9d872b-594c-81c9-99ef-0002089759a7","object":"user"},"parent":{"type":"workspace","workspace":true},"url":"https://www.notion.so/This-Should-Fail-1e75b4e67d2680f2934fe5325297b4c3"},"reprocess":false,"local_download_path":null,"display_name":null}
2025-05-02 00:24:37,955 MainProcess INFO     calling DownloadStep with 1 docs
2025-05-02 00:24:37,955 MainProcess INFO     processing content async
2025-05-02 00:24:37,956 MainProcess DEBUG    /root/.cache/unstructured/ingest/pipeline/process_32/indexer/eed2596e408f.json not detected as batch file data
2025-05-02 00:24:37,964 MainProcess INFO     GET https://api.notion.com/v1/blocks/1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:37,965 MainProcess DEBUG    => None -- None
2025-05-02 00:24:38,066 MainProcess DEBUG    => {'object': 'block', 'id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3', 'parent': {'type': 'workspace', 'workspace': True}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:24:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': True, 'archived': False, 'in_trash': False, 'type': 'child_page', 'child_page': {'title': 'This Should Fail'}, 'request_id': '908b817f-d1dd-4308-b53d-1ec9fc738fac'}
2025-05-02 00:24:38,066 MainProcess DEBUG    processing page id: 1e75b4e67d2680f2934fe5325297b4c3
2025-05-02 00:24:38,066 MainProcess INFO     GET https://api.notion.com/v1/blocks/1e75b4e6-7d26-80f2-934f-e5325297b4c3/children
2025-05-02 00:24:38,066 MainProcess DEBUG    => {} -- None
2025-05-02 00:24:39,814 MainProcess DEBUG    => {'object': 'list', 'results': [{'object': 'block', 'id': '1e75b4e6-7d26-806a-9dad-da59d9ab84a8', 'parent': {'type': 'page_id', 'page_id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3'}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:12:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': False, 'archived': False, 'in_trash': False, 'type': 'callout', 'callout': {'rich_text': [{'type': 'text', 'text': {'content': 'HEY THIS IS A CALLOUT', 'link': None}, 'annotations': {'bold': False, 'italic': False, 'strikethrough': False, 'underline': False, 'code': False, 'color': 'default'}, 'plain_text': 'HEY THIS IS A CALLOUT', 'href': None}], 'icon': None, 'color': 'gray_background'}}, {'object': 'block', 'id': '1e75b4e6-7d26-8027-90c4-d342a39e4400', 'parent': {'type': 'page_id', 'page_id': '1e75b4e6-7d26-80f2-934f-e5325297b4c3'}, 'created_time': '2025-05-02T00:11:00.000Z', 'last_edited_time': '2025-05-02T00:11:00.000Z', 'created_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'last_edited_by': {'object': 'user', 'id': '1b9d872b-594c-81c9-99ef-0002089759a7'}, 'has_children': False, 'archived': False, 'in_trash': False, 'type': 'paragraph', 'paragraph': {'rich_text': [], 'color': 'default'}}], 'next_cursor': None, 'has_more': False, 'type': 'block', 'block': {}, 'request_id': 'c24fed34-744d-4731-ad8c-c93541a7a7f4'}
2025-05-02 00:24:39,814 MainProcess ERROR    Error downloading page 1e75b4e67d2680f2934fe5325297b4c3: 'NoneType' object has no attribute 'get'
2025-05-02 00:24:39,815 MainProcess INFO     download finished in 1.859335789s, attributes: file_id=eed2596e408f
2025-05-02 00:24:39,815 MainProcess ERROR    Exception raised while running download
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/interfaces.py", line 171, in run_async
    return await self._run_async(fn=fn, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/steps/download.py", line 113, in _run_async
    return self.create_step_results(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/unstructured_ingest/pipeline/steps/download.py", line 129, in create_step_results
    download_path = download_results["path"]
                    ~~~~~~~~~~~~~~~~^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
2025-05-02 00:24:39,817 MainProcess INFO     download step finished in 1.86135445s
2025-05-02 00:24:39,818 MainProcess INFO     ingest process finished in 2.823138445s
2025-05-02 00:24:39,818 MainProcess ERROR    1 failed documents:
2025-05-02 00:24:39,818 MainProcess ERROR    /root/.cache/unstructured/ingest/pipeline/process_32/indexer/eed2596e408f.json: [download] 'NoneType' object is not subscriptable
2025-05-02 00:24:39,818 MainProcess INFO     deleting cache directory: /root/.cache/unstructured/ingest/pipeline/process_32
Error during Notion download: Pipeline did not run successfully
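
In the trace, the callout block comes back with `'icon': None`, and the download then fails with `'NoneType' object has no attribute 'get'`. That suggests the callout parsing calls `.get` on the icon without a null check, although I have not confirmed this in the connector source. A minimal sketch of a None-safe parse follows; `Callout`, `parse_callout` and `icon_label` are hypothetical names for illustration, not the unstructured-ingest API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Callout:
    # Hypothetical stand-in for the connector's callout model.
    color: str
    icon: Optional[dict] = None
    rich_text: list = field(default_factory=list)


def parse_callout(data: dict) -> Callout:
    """Build a Callout from the `callout` payload of a Notion block."""
    # The key "icon" is present but its value is None when the icon was
    # removed, so a default passed to dict.get() would not be used here.
    icon = data.get("icon")
    return Callout(
        color=data.get("color", "default"),
        icon=icon if isinstance(icon, dict) else None,
        rich_text=list(data.get("rich_text") or []),
    )


def icon_label(callout: Callout) -> str:
    """Return the emoji for the callout, or an empty string if there is none."""
    # Fall back to an empty dict so .get() is never called on None.
    return (callout.icon or {}).get("emoji", "")


# Payload shaped like the callout in the trace above.
payload = {"rich_text": [], "icon": None, "color": "gray_background"}
print(repr(icon_label(parse_callout(payload))))
```

The point of the sketch is that `data.get("icon", {})` alone is not enough, because the key exists with a `None` value; the fallback has to be applied to the value itself.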
