Skip to content

feat: update document index status model and related components #985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 24, 2025

Conversation

iziang
Copy link
Collaborator

@iziang iziang commented Jun 24, 2025

  • Refactored document index status to use a single status model with new states: CREATING, ACTIVE, DELETING, and DELETION_IN_PROGRESS.
  • Updated related database models, API components, and service logic to accommodate the new status model.
  • Adjusted index management and reconciliation processes to align with the new status definitions.
  • Modified tests to reflect changes in index status handling and ensure compatibility with the updated API responses.

- Refactored document index status to use a single status model with new states: CREATING, ACTIVE, DELETING, and DELETION_IN_PROGRESS.
- Updated related database models, API components, and service logic to accommodate the new status model.
- Adjusted index management and reconciliation processes to align with the new status definitions.
- Modified tests to reflect changes in index status handling and ensure compatibility with the updated API responses.
@apecloud-bot apecloud-bot added the size/XXL Denotes a PR that changes 1000+ lines. label Jun 24, 2025
cursor[bot]

This comment was marked as outdated.

iziang added 3 commits June 24, 2025 15:46
- Introduced a new IndexAction class to standardize index operation types (CREATE, UPDATE, DELETE).
- Updated DocumentIndexReconciler to utilize the new IndexAction constants for improved clarity and maintainability.
- Refactored document deletion logic in DocumentService to streamline the deletion process and ensure proper index reconciliation.
- Enhanced error handling and logging during document deletion and index management operations.
- Updated documentation to reflect changes in the indexing architecture and operational flow.
cursor[bot]

This comment was marked as outdated.

Copy link

cursor bot commented Jun 24, 2025

🚨 BugBot couldn't run

Something went wrong. Try again by commenting "bugbot run", or contact support (requestId: serverGenReqId_8ad2d329-132f-4331-81eb-9e80d2d182ac).

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Incorrect Return Type Causes Serialization Errors

The delete_document service method in aperag/service/document_service.py has an incorrect return type annotation (dict) and returns a raw database model (db_models.Document) instead of the expected view_models.Document. This creates a type mismatch with the delete_document_view API endpoint in aperag/views/main.py, which is annotated to return view_models.Document, leading to runtime serialization errors.

aperag/service/document_service.py#L288-L299

async def delete_document(self, user: str, collection_id: str, document_id: str) -> dict:
"""Delete a single document and trigger index reconciliation."""
async def _delete_document_atomically(session: AsyncSession):
return await self._delete_document(session, user, collection_id, document_id)
result = await self.db_ops.execute_with_transaction(_delete_document_atomically)
# Trigger reconciliation to process the deletion
_trigger_index_reconciliation()
return result

aperag/views/main.py#L126-L135

ApeRAG/aperag/views/main.py

Lines 126 to 135 in 8c580b6

@router.delete("/collections/{collection_id}/documents/{document_id}")
@audit(resource_type="document", api_name="DeleteDocument")
async def delete_document_view(
request: Request,
collection_id: str,
document_id: str,
user: User = Depends(current_user),
) -> view_models.Document:
return await document_service.delete_document(str(user.id), collection_id, document_id)

Fix in Cursor


Bug: Incorrect Return Type Causes Serialization Errors

The delete_document method's signature declares a dict return type, but it incorrectly returns a db_models.Document object. This returned object is problematic as it has just been deleted from the session, making it detached and potentially unusable. This type inconsistency will cause serialization errors in the API response, which expects a view_models.Document or a dict. The previous implementation correctly converted the database model to a view model before returning.

aperag/service/document_service.py#L282-L299

# Delete the document record from the database
await session.delete(document)
await session.flush()
logger.info(f"Successfully marked document {document.id} and its indexes for deletion.")
return document
async def delete_document(self, user: str, collection_id: str, document_id: str) -> dict:
"""Delete a single document and trigger index reconciliation."""
async def _delete_document_atomically(session: AsyncSession):
return await self._delete_document(session, user, collection_id, document_id)
result = await self.db_ops.execute_with_transaction(_delete_document_atomically)
# Trigger reconciliation to process the deletion
_trigger_index_reconciliation()
return result

Fix in Cursor


Was this report helpful? Give feedback by reacting with 👍 or 👎

@iziang iziang merged commit aa5585c into main Jun 24, 2025
6 of 7 checks passed
@iziang iziang deleted the support/indexing branch June 24, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants