Skip to content

Conversation

@obenland
Copy link
Member

@obenland obenland commented Oct 13, 2025

Fixes #2277

Proposed changes:

This PR introduces a new Attachments processor class that provides unified handling of ActivityPub attachments (images, videos, audio, documents) for both remote URLs and local files.

  • Unified attachment processing: New Attachments class centralizes all attachment handling logic previously scattered across multiple classes
  • Remote URL support: Downloads and imports media from ActivityPub network using media_handle_sideload()
  • Local file support: Processes files from Mastodon archive imports without moving originals
  • Smart media markup generation: Automatically generates appropriate WordPress blocks (gallery for multiple images, video/audio blocks for media files)
  • Classic Editor compatibility: New integration adds filter-based extensibility for custom markup generation
  • Metadata preservation: Stores source URLs and alt text for images via post meta

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

Test remote attachments (ActivityPub network):

  1. Set up a test ActivityPub post with images from another server (e.g., Mastodon instance)
  2. Send a Create activity with attachments to your WordPress site
  3. Verify the post is created with images properly attached and displayed
  4. Check that alt text from the ActivityPub name field is preserved as image alt text
  5. Verify images are saved to WordPress media library with correct metadata

Test local attachments (Mastodon import):

  1. Export a Mastodon archive that includes posts with media attachments
  2. Go to Tools > Import > Mastodon
  3. Upload the archive ZIP file and run the import
  4. Verify posts are imported with their original media preserved
  5. Check that images appear correctly in the post content (as gallery blocks or individual images)
  6. Confirm attachment metadata (source URL) is preserved

Test Classic Editor integration:

  1. Activate Classic Editor plugin
  2. Create a post that receives ActivityPub attachments
  3. Verify that attachments are properly appended to post content
  4. Test the activitypub_attachments_media_markup filter for custom markup

Run automated tests:

npm run env-test

All tests should pass, including the 14 new tests in class-test-attachments.php.

Changelog entry

  • Automatically create a changelog entry from the details below.
Changelog Entry Details

Significance

  • Patch
  • Minor
  • Major

Type

  • Added - for new features
  • Changed - for changes in existing functionality
  • Deprecated - for soon-to-be removed features
  • Removed - for now removed features
  • Fixed - for any bug fixes
  • Security - in case of vulnerabilities

Message

Add unified attachment processor for handling ActivityPub media imports from both remote URLs and local files, with automatic media block generation and Classic Editor support.

pfefferle and others added 30 commits October 13, 2025 10:45
Introduces a new 'ap_object' post type and related taxonomies for handling ActivityPub objects, including registration, CRUD operations, and taxonomy management. Updates create and update handlers to process non-interaction objects using the new Objects collection. Enhances debug functionality to support the new post type and taxonomies.
Introduced Sanitize::content() to process and format content for ActivityPub, including block support and HTML sanitization. Updated Objects class to use the new sanitizer for post content.
Introduces a new Blocks::html_to_blocks() method to convert HTML content into block format using DOMDocument parsing and block mapping. Refactors Sanitize to use this new method, replacing the previous regex-based paragraph splitting logic for improved accuracy and maintainability.
The get_node_attributes private method was removed and the logic for setting the 'ordered' attribute on 'ol' nodes is now handled inline. This simplifies the code by reducing indirection and focusing only on the required attribute for ordered lists.
Refactored the Blocks class by renaming the html_to_blocks method to convert_from_html for improved clarity. Updated all references to use the new method name.
Introduces new PHPUnit tests for content sanitization in Sanitize::content, covering scenarios such as block support, malicious content, URLs, empty content, and safe HTML preservation. Also adds a test for Blocks::convert_from_html to verify HTML-to-block conversion when blocks are supported.
Introduces tests to verify content sanitization in Create handler, handling of private activities, and behavior with malformed object data. Also ensures required post types are registered during test setup.
Introduces a comprehensive PHPUnit test suite for the Activitypub\Collection\Objects class, covering object addition, updating, retrieval by GUID, and activity-to-post conversion, including edge cases and error handling. Mocks HTTP requests for remote actor fetching and ensures correct post type registration for test isolation.
Renamed update_actor and update_object to handle_actor_update and handle_object_update for improved clarity and consistency. Combined logic for handling object and interaction updates into handle_object_update. Updated corresponding test method names to match the new handler method names.
Changed the PHPUnit @Covers annotation from ::update_actor to ::handle_actor_update in Test_Update to accurately reflect the method being tested.
Changed the post type labels in the registration array from 'Posts'/'Post' to 'Objects'/'Object' to better reflect the content type.
Inserted a blank line after the use statements in class-update.php to improve code readability and adhere to coding standards.
Deleted the Blocks::convert_from_html method and its related test, and updated the Sanitize class to no longer convert HTML content to blocks. This simplifies content sanitization by removing automatic block conversion.
Refactored the Objects collection to Posts, updating class names, constants, and references in all relevant files. Adjusted post type from 'ap_object' to 'ap_post' and updated related taxonomy registrations, handlers, and tests to reflect the new naming convention. This improves clarity and consistency in the codebase.
Refactored the method name from create_object to create_post to better reflect its purpose of handling post creation. Updated all relevant references and docblocks for clarity.
Changed the @Covers annotation from ::create_object to ::create_post in the test_handle_create_object_with_sanitization method to accurately reflect the method being tested.
Adds registration of the '_activitypub_remote_actor_id' post meta for the custom post type, storing the local ID of the remote actor that created the object. This includes type enforcement, a description, and a sanitize callback.
The 'ap_object_type' taxonomy has been added to the registered taxonomies array for the relevant post type, allowing posts to be associated with both 'ap_tag' and 'ap_object_type'.
Deleted the 'labels' arrays from the registration of 'ap_tag' and 'ap_object_type' taxonomies. This will cause WordPress to use default labels for these taxonomies.
Eliminated the 'rewrite' arguments for 'ap_tag' and 'ap_object_type' taxonomies, reverting to default rewrite behavior. This may affect the URLs generated for these taxonomies.
Refactors the method name from handle_actor_update to update_actor for clarity and updates all references accordingly.
Renamed the test method from test_handle_actor_update to test_update_actor and updated the @Covers annotation and method calls to reference update_actor instead of handle_actor_update. This aligns the test with the updated function name.
Initializes $result as WP_Error by default and updates the docblock for the 'activitypub_handled_update' action to specify more precise types for the $result parameter. Removes unreachable assignment in the comment update branch.
Adds explicit WP_Error objects when Create or Update operations fail, ensuring consistent error reporting. Also updates the docblock for the Create handler to reflect possible result types.
Moved the default WP_Error assignment to the start of handle_object_update, ensuring $result is always initialized. Removed redundant error assignment after update attempts.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@Jiwoon-Kim
Copy link
Contributor

For example, if the hosting policy offers unlimited databases but only 1 GB of storage space, would that still be sufficient for normal use?

@pfefferle
Copy link
Member

pfefferle commented Oct 23, 2025

The image is still added twice, if it is inline and an attachment. I think we should de-duplicate them and not add the attachment if the image is already inline.

Pardon me! This works like a charm and I simply messed up my test!

The previous method name 'process' was too vague and didn't clearly convey what the method does. The new name 'import' better reflects that the method downloads and imports external attachments from ActivityPub objects into the WordPress media library.

Updates all method calls and @Covers annotations in tests.
@obenland obenland requested a review from pfefferle October 23, 2025 16:04
@pfefferle pfefferle requested a review from Copilot October 23, 2025 16:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Refactored Attachments class to rename the private method process_inline_images to import_inline_images for clarity. Updated all references and related test to use the new method name.
Replaces the use of the short ternary operator with the get() method's default value parameter for retrieving 'meta_query' in the Attachments class. This improves code clarity and consistency.
@Menrath
Copy link
Contributor

Menrath commented Oct 24, 2025

I haven't reviewed that, but just by reading the title and desc, does this also target #1914 ?

@Jiwoon-Kim
Copy link
Contributor

Jiwoon-Kim commented Oct 24, 2025

@Menrath Personally, I think it’s better to handle post-type objects like Article and Note separately from media-type objects like Image, Video, and Audio.
Since the uploadMedia endpoint is already defined and object retrieval is supported,
I believe we could implement a local user uploadMedia endpoint on the frontend through extensions of the Media Library and Attachment Page features.

However, I’m not entirely sure whether it’s truly necessary to parse the attachment property from a remote actor’s Note or Article and store it in the local media library.
For example, even if a WordPress hosting plan provides an unlimited database, it might only offer 1 GB of storage.
If the system automatically caches all external media from remote actors, the local site’s storage could quickly fill up with spam, effectively preventing any further media uploads.

That’s why I suggested that direct file links should be treated as hotlinks,
and that we should only record them in the database (as something like remote_attachment_page)
when the actor has an uploadMedia endpoint and the object is a genuine media object.

In any case, implementing remote media local caching now could still be useful later,
since it would provide reusable infrastructure when implementing selective local caching in the future.
So while I don’t oppose the current PR, I can’t say I fully agree with its direction.


I wanted to suggest that WordPress could independently implement a sharedMedia endpoint to enable the reuse of remote media objects.

@obenland
Copy link
Member Author

does this also target #1914 ?

@Menrath Probably not this one specifically, but mayve a combination of #2311 and #2363

@obenland
Copy link
Member Author

@pfefferle This should be ready for another look when you get a chance

@Jiwoon-Kim
Copy link
Contributor

I discovered something quite shocking: you should never use an Image object directly inside the attachment property of a WordPress post.

@pfefferle
Copy link
Member

Why do you use the word "shocking" in this regards! I can't see why this is shocking!

@pfefferle
Copy link
Member

@Jiwoon-Kim can you please stop using clickbaity titles/wordings like that!

@Jiwoon-Kim
Copy link
Contributor

I’m sorry about that. I had just finished reviewing all the related object types, properties, and endpoints, and the conclusion I reached was honestly quite shocking to me personally. I wrote the post before I had fully calmed down from that excitement.

@pfefferle
Copy link
Member

If you think there is something wrong with the implementation, please file a bug! Explain the issue, be precise and link to the relevant code/documentation!

@pfefferle
Copy link
Member

pfefferle commented Oct 27, 2025

@obenland What about hooking the Attachment importer into do_action( "save_post_{$post->post_type}", $post_id, $post, $update ); instead of a direct calling in the create and update functions?

Not every user has enough storage or maybe uses a different form of caching, so it might be nice to be able to unhook the caching mechanism!?

@Jiwoon-Kim
Copy link
Contributor

I will make my final statement regarding this PR.
An image file must always be wrapped within an Image object.
However, if the Image object does not contain an id, there is no way for WordPress to assign an attachment ID to it.
Therefore, if the Image object itself cannot be cached, the image file must not be cached either.

I believe the media upload mechanism should be perfectly symmetrical with the remote media object/file caching process.
To remain consistent with WordPress’s native media upload workflow, this approach should be enforced.

In particular, servers MUST append an id property to the object, and SHOULD include the uploaded and/or processed file paths in the object's url property.
https://www.w3.org/wiki/SocialCG/ActivityPub/MediaUpload#uploadMedia

@pfefferle
Copy link
Member

pfefferle commented Oct 27, 2025

@Jiwoon-Kim as I already tried to explain to you multiple times: this is not about "uploading" data! we are working on S2S communication, not on C2S!

this is a simple local caching of remote images that are part of a post (Note or Article), the exact same way, any other platform (including Mastodon) is doing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants