Feature request: Add default merge strategy to merge_items and support merging >2 items #916

@dpriskorn

The current signature of the helper is:

def merge_items(from_id: str, to_id: str, login: _Login | None = None, ignore_conflicts: list[str] | None = None, is_bot: bool = False, **kwargs: Any) -> dict:

The caller has to decide which item is kept. The API itself (see https://www.wikidata.org/w/api.php?action=help&modules=wbmergeitems) seems to lack a default strategy for this.
I looked at the gadget JS merge tool, which has a strategy built in, but the wikibaseintegrator helper function currently does not.
See https://www.wikidata.org/wiki/User:So9q/Gadget-Merge.js

Suggested implementation of the default strategy (keep the lowest QID) by ChatGPT:

    # Sort QIDs numerically to keep the lowest as target
    sorted_qids = sorted(qids, key=lambda x: int(x[1:]))
    to_id = sorted_qids[0]  # keep lowest QID
    from_ids = sorted_qids[1:]  # merge all others into to_id
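
The numeric sort key matters because a plain string sort would order "Q100" before "Q9". A minimal, self-contained sketch of the same idea follows. The helper name `split_merge_target` and the QID validation are my own assumptions for illustration, not part of wikibaseintegrator:

```python
import re

# Hypothetical helper, not part of wikibaseintegrator.
# Assumes IDs look like "Q" followed by a positive integer.
_QID_RE = re.compile(r"^Q[1-9]\d*$")


def split_merge_target(qids: list[str]) -> tuple[str, list[str]]:
    """Return (to_id, from_ids), keeping the numerically lowest QID as target."""
    unique = set(qids)  # drop duplicates so an item is never merged into itself
    for qid in unique:
        if not _QID_RE.match(qid):
            raise ValueError(f"Not a valid QID: {qid!r}")
    if len(unique) < 2:
        raise ValueError("You must provide at least two distinct QIDs to merge")
    # Sort on the integer part; a string sort would put "Q100" before "Q9".
    ordered = sorted(unique, key=lambda q: int(q[1:]))
    return ordered[0], ordered[1:]
```

For example, `split_merge_target(["Q100", "Q9", "Q25"])` keeps `Q9` as the target and returns `["Q25", "Q100"]` as the items to merge into it.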

The function signature could also be changed to allow merging more than two items at once, which is convenient for users. Suggestion by ChatGPT:

def merge_items(
    qids: list[str],
    login: _Login | None = None,
    ignore_conflicts: list[str] | None = None,
    is_bot: bool = False,
    **kwargs: Any
) -> None:
    """
    Merge multiple Wikibase items into the lowest QID.

    :param qids: List of item QIDs to merge. The lowest QID will be kept.
    :param login: A wbi_login.Login instance
    :param ignore_conflicts: List of elements to ignore conflicts for. Can contain "description", "sitelink", "statement"
    :param is_bot: Mark this edit as bot
    :param kwargs: Additional parameters to pass to mediawiki_api_call_helper
    """
    if not qids or len(qids) < 2:
        raise ValueError("You must provide at least two QIDs to merge")

    # Sort QIDs numerically to keep the lowest
    sorted_qids = sorted(qids, key=lambda x: int(x.lstrip('Q')))
    to_id = sorted_qids[0]          # keep the lowest QID
    from_ids = sorted_qids[1:]      # merge all other QIDs into to_id

    for from_id in from_ids:
        params = {
            'action': 'wbmergeitems',
            'fromid': from_id,
            'toid': to_id,
            'format': 'json'
        }

        if ignore_conflicts is not None:
            params['ignoreconflicts'] = '|'.join(ignore_conflicts)

        if is_bot:
            params['bot'] = ''

        # Make the API call
        try:
            mediawiki_api_call_helper(data=params, login=login, is_bot=is_bot, **kwargs)
            print(f"Merged {from_id} into {to_id}")
        except Exception as e:
            print(f"Error merging {from_id} into {to_id}: {e}")
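
One possible refinement of the suggestion above is to return the per-item results instead of printing them, since the existing helper returns a dict and callers may want to inspect failures. The sketch below is only an illustration under assumptions: `merge_many` and its `merge_one` parameter are hypothetical names, and the actual API call is injected as a callable so the example runs without network access.

```python
from typing import Any, Callable


def merge_many(qids: list[str], merge_one: Callable[[str, str], dict]) -> dict[str, Any]:
    """Merge all given items into the lowest QID and report what happened.

    merge_one(from_id, to_id) is a hypothetical callable that performs a single
    merge, e.g. a thin wrapper around the existing two-item merge_items helper.
    """
    unique = set(qids)
    if len(unique) < 2:
        raise ValueError("You must provide at least two distinct QIDs to merge")

    ordered = sorted(unique, key=lambda q: int(q[1:]))
    to_id, from_ids = ordered[0], ordered[1:]

    merged: dict[str, dict] = {}
    failed: dict[str, str] = {}
    for from_id in from_ids:
        try:
            merged[from_id] = merge_one(from_id, to_id)
        except Exception as exc:  # keep going so one conflict does not stop the rest
            failed[from_id] = str(exc)

    return {"to_id": to_id, "merged": merged, "failed": failed}
```

With the current helper this might be called as `merge_many(qids, lambda f, t: merge_items(from_id=f, to_id=t, login=login))`, assuming the two-item signature shown at the top of this issue.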

One could make it a new helper, "merge_multiple_items", or add strategy handling that supports different strategies.
I suggest we just stick to the default strategy. I have never heard of anyone not wanting to keep the lowest ID, and if someone really does want that, it's just a ChatGPT response away.
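
For completeness, if strategy handling were wanted after all, it could stay very small. This is a rough sketch only; the names `STRATEGIES` and `choose_target` and the "highest" option are hypothetical and not taken from the library or the gadget:

```python
from typing import Callable


def _qid_number(qid: str) -> int:
    """Numeric part of a QID such as "Q42"."""
    return int(qid[1:])


# Hypothetical strategy table: each entry picks the item to keep.
STRATEGIES: dict[str, Callable[[list[str]], str]] = {
    "lowest": lambda qids: min(qids, key=_qid_number),
    "highest": lambda qids: max(qids, key=_qid_number),
}


def choose_target(qids: list[str], strategy: str = "lowest") -> str:
    """Pick the merge target; defaults to keeping the lowest QID."""
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown merge strategy: {strategy!r}")
    return STRATEGIES[strategy](qids)
```

Keeping "lowest" as the default means existing callers would not need to pass a strategy at all.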
