-
Notifications
You must be signed in to change notification settings - Fork 1.5k
ENH: Automatically preserve links in added pages #3298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d0b2c8a
to
460139e
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3298 +/- ##
==========================================
+ Coverage 96.69% 96.71% +0.02%
==========================================
Files 53 54 +1
Lines 9023 9084 +61
Branches 1674 1685 +11
==========================================
+ Hits 8725 8786 +61
Misses 176 176
Partials 122 122 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fb8c123
to
7e394e4
Compare
Thanks for the PR.
At first sight it looks okay, but unless there are specific aspects to talk about, I tend to prefer a proper review once the automated checks were successful. It seems like CI is still failing due to coverage, typing and code style issues - this is something to consider in a second step. Nevertheless, regarding the code style, I would prefer to move the new classes into a submodule of
In theory, each release could break the code of some user when doing special stuff. As this fixes shortcomings of the current implementation without removing anything, I do not see the need to introduce a deprecation period for this at the moment.
What exactly have you tried and what has been the result? |
Yeah, sorry. I had to cook dinner, and now I have a meeting. I didn't intend to leave the build broken like this. The trouble is I can't get the right version of ruff on my laptop right now, so all the checks end up being done in CI, which is slow. Anyway, I will sort all this out.
Will do.
Ack! 👍
Mainly that I couldn't get the reference lookups to work, but I take this to mean that they should work. I'll work on this a bit more. |
eaad222
to
7274240
Compare
Now it should finally be ready for review. Note that I changed the type of the Once this PR is merged I'll look at handling links in merged-in pages, but because of upcoming holiday that will take a while. |
Thanks. I will try to have a look at this as soon as possible - this might take some time as well. Regarding the broken type hints, there is a corresponding issue as well: #3233. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I just had a first look at the changes and added some small comments. As general notes:
- Using abbreviations in the names and docstrings should be avoided. It is completely fine to use "reference" instead of "ref" for example to improve clarity and avoid having to deprecate stuff later on.
- In the type hints, please use
type1, type2
instead oftype1,type2
. I have marked some cases, but not all. - Instead of nesting functions and bloating the already large modules, consider moving the corresponding functionality to the new submodule.
@@ -209,6 +212,11 @@ def __init__( | |||
"""The PDF file identifier, | |||
defined by the ID in the PDF file's trailer dictionary.""" | |||
|
|||
self._unresolved_links: list[tuple[RefLink,RefLink]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._unresolved_links: list[tuple[RefLink,RefLink]] = [] | |
self._unresolved_links: list[tuple[RefLink, RefLink]] = [] |
@@ -209,6 +212,11 @@ def __init__( | |||
"""The PDF file identifier, | |||
defined by the ID in the PDF file's trailer dictionary.""" | |||
|
|||
self._unresolved_links: list[tuple[RefLink,RefLink]] = [] | |||
"Tracks links in pages added to the writer for resolving later." | |||
self._merged_in_pages: Dict[Optional[IndirectObject],Optional[IndirectObject]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._merged_in_pages: Dict[Optional[IndirectObject],Optional[IndirectObject]] = {} | |
self._merged_in_pages: Dict[Optional[IndirectObject], Optional[IndirectObject]] = {} |
@@ -482,12 +490,47 @@ def _add_page( | |||
] | |||
except Exception: | |||
pass | |||
|
|||
def _extract_links(new_page: PageObject, old_page: PageObject) -> List[Tuple[RefLink,RefLink]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of nesting functions, could we please move them to the new module as well? As far as I can see, they already depend on the parameters only.
|
||
|
||
@pytest.mark.enable_socket | ||
def test_named_ref_to_page_thats_gone(pdf_file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_named_ref_to_page_thats_gone(pdf_file_path): | |
def test_named_ref_to_page_that_is_gone(pdf_file_path): |
Here is a draft implementation of the first stage of the issue #3290 implementation. It handles links in pages added via
add_page
andinsert_page
, but it doesn't handle pages merged into those pages before adding.Does this look OK?
I'm wondering if some users may already have written their own link patching code -- will this code break theirs? If so, should we make it possible to turn this behaviour off somehow?
At the moment I'm resolving everything by searching lists to find corresponding indirect references. It would be much faster with a hash, but I haven't been able to make that work. Thoughts?