Add fetch book list method to overdrive api #2767

dbernstein · 2025-09-29T17:44:50Z

Description

As part of converting the overdrive scripts to celery, we want to be able to efficiently download feed data and store it in a redis set for downstream processing. This PR advances that end by providing a means via the Overdrive API to pull a complete "page" of book data, with the option to include metadata or circulation data depending on the set that needs to be built. Long-running (i.e. more than a minute or so) celery tasks require replacing one task with another in order to keep the flow of celery tasks moving, ensuring that no one task commandeers a celery worker for any significant length of time. Therefore, this method will be used to efficiently build a set of book info while not making undue demands on the redis set, all the while ensuring that each chunk of book data can be retried in case of an error.

Motivation and Context

https://ebce-lyrasis.atlassian.net/browse/PP-3015

How Has This Been Tested?

Unit tests added.

Checklist

I have updated the documentation accordingly.
All new and existing tests passed.

codecov · 2025-09-29T17:51:08Z

Codecov Report

❌ Patch coverage is 91.75258% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.41%. Comparing base (5fd6f06) to head (0a98bb6).

Files with missing lines	Patch %	Lines
...alace/manager/integration/license/overdrive/api.py	91.95%	2 Missing and 5 partials ⚠️
src/palace/manager/util/http/async_http.py	90.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2767      +/-   ##
==========================================
- Coverage   92.41%   92.41%   -0.01%     
==========================================
  Files         449      449              
  Lines       42628    42720      +92     
  Branches     5955     5967      +12     
==========================================
+ Hits        39396    39480      +84     
- Misses       2120     2123       +3     
- Partials     1112     1117       +5


jonathangreen

Made some comments here to address before this one can be merged.

I'm also wondering if you have looked into this comment in JIRA yet: https://ebce-lyrasis.atlassian.net/browse/PP-2183?focusedCommentId=27994. Before you build this whole redis set infrastructure, I'd like to make sure overdrive doesn't directly give us what we need.

jonathangreen · 2025-10-01T14:54:16Z

src/palace/manager/integration/license/overdrive/api.py

        return availability_queue, next_link

+    def _get_headers(self, auth_token: str) -> dict[str, str]:
+        return {"Authorization": f"Bearer {auth_token}", "User-Agent": "Palace"}


Why add User-Agent here? Adding this will override the more detailed user agent header set by the HTTP class.

I can remove that. I was refactoring a bit and noticed that we were setting the user agent in another place in this file when formatting the auth header. I can remove if you think it's better.

Looking through the diff and doing a quick grep (grep -ni "User-Agent" src/palace/manager/integration/license/overdrive/api.py), I don't see anywhere we were previously setting User-Agent in this file. Are you sure you didn't pull this in from elsewhere?

If you look in the HTTP code, we always set the user agent if it's not already set:

circulation/src/palace/manager/util/http/http.py

Lines 232 to 236 in 3798ec8

    # Set a user-agent if not already present
    headers = get_default_headers()
    if (additional_headers := kwargs.get("headers")) is not None:
        headers.update(additional_headers)
    kwargs["headers"] = headers

So this change would result in the overdrive code sending a user agent of Palace, rather than the Palace Manager/version header that we tell our integration partners that we send.
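The override described in this thread can be reproduced in isolation. The following is a minimal sketch, not the project's code: `get_default_headers` here is a hypothetical stand-in for the helper in http.py, `merge_headers` is an illustrative name, and the version string is made up.

```python
from typing import Optional


def get_default_headers() -> dict[str, str]:
    # Hypothetical stand-in for the real helper; the version string is invented.
    return {"User-Agent": "Palace Manager/0.0.0"}


def merge_headers(additional_headers: Optional[dict[str, str]]) -> dict[str, str]:
    # Same shape as the quoted logic: start from the defaults, then let any
    # caller-supplied headers overwrite them.
    headers = get_default_headers()
    if additional_headers is not None:
        headers.update(additional_headers)
    return headers


# Only an Authorization header: the default User-Agent survives.
auth_only = merge_headers({"Authorization": "Bearer token"})

# Authorization plus an explicit User-Agent: the default is replaced.
with_user_agent = merge_headers(
    {"Authorization": "Bearer token", "User-Agent": "Palace"}
)

print(auth_only["User-Agent"])
print(with_user_agent["User-Agent"])
```

Under these assumptions, leaving User-Agent out of the per-request headers is what keeps the more detailed default in place.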

jonathangreen · 2025-10-01T15:00:44Z

src/palace/manager/integration/license/overdrive/importer2.py

This empty file should be removed

jonathangreen · 2025-10-01T15:01:17Z

src/palace/manager/integration/license/overdrive/api.py

+            urls: deque[str] = deque()
+            pending_requests: list[asyncio.Task[httpx._models.Response]] = []
+            books: dict[str, Any] = {}
+            retried_requests: defaultdict[str, int] = defaultdict(int)


This appears unused

good catch - I factored that out when I moved to the palace async client.

jonathangreen · 2025-10-01T15:02:54Z

src/palace/manager/integration/license/overdrive/api.py

+            return list(books.values()), next
+
+    def create_async_client(self, connections: int = 5) -> AsyncClient:
+        return AsyncClient.for_web(


Will this be used in the web context? It seems like the plan is to use this as part of a celery task.

Good point. I didn't realize that was the function of this builder. Will fix.

jonathangreen · 2025-10-01T15:04:00Z

src/palace/manager/integration/license/overdrive/api.py

+                    except BadResponseException as e:
+                        if e.response.status_code == 404:
+                            self.log.warning(
+                                f"404 returned: {e.response.url}: ignoring..."


Why ignore 404 errors? At the very least I think we need a comment explaining why

My thought here was that when I was using the palace-tools to download feeds I was seeing 404 for some endpoints (availability I believe) which was causing the download to fail. But I suppose a 404 on a product list endpoint should not be ignored. I'll fix and make some comments.
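One way the distinction discussed here might look is sketched below, assuming that only availability lookups are allowed to 404 and that anything else should fail loudly. `RequestKind`, `FetchError`, and `should_skip` are hypothetical names for illustration and do not appear in the PR.

```python
import logging
from enum import Enum

log = logging.getLogger(__name__)


class RequestKind(Enum):
    # Hypothetical labels for the three kinds of request mentioned in the thread.
    PRODUCT_LIST = "product_list"
    METADATA = "metadata"
    AVAILABILITY = "availability"


class FetchError(Exception):
    """Hypothetical stand-in for the exception raised on a bad response."""


def should_skip(kind: RequestKind, status_code: int, url: str) -> bool:
    """Return True if the failed response can be skipped; raise otherwise."""
    # Assumption: a missing availability document is tolerable, while a 404
    # on the product list (or any other failure) means the page is unusable.
    if status_code == 404 and kind is RequestKind.AVAILABILITY:
        log.warning("404 returned for %s: skipping availability", url)
        return True
    raise FetchError(f"{status_code} returned for {url} ({kind.value})")
```

Raising on a product list 404 would let the surrounding task's retry logic handle the failure instead of silently producing an empty page.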

jonathangreen · 2025-10-01T15:15:45Z

src/palace/manager/integration/license/overdrive/api.py

+                        if not next:
+                            next_url = extractor_class.link(
+                                response.json(), rel_to_follow
+                            )
+                            next = BookInfoEndpoint(next_url) if next_url else None


This seems like a race condition: any request could land here, right? It's whichever request completed successfully first, which might not be the request that was issued first.

Maybe I have it wrong, but I thought that subsequent requests (for availability and metadata) would only be issued after the product list GET request had been read. Therefore, I can safely assume that the next link would appear only in the first response returned. Am I mistaken there?

That said, I do see another problem (and perhaps this is what you are pointing at): I'm assuming that there will be a non-null next link in the product page, which I should not assume. I'll take a closer look.

jonathangreen · 2025-10-01T15:16:58Z

src/palace/manager/integration/license/overdrive/api.py

+                            base_url
+                        )
+                    )
+                id = product["id"].lower()


Minor: We really shouldn't shadow the built-in Python id function here. It's allowed, but it should be avoided.
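A small illustration of why shadowing is worth avoiding. The function names and the `product_id` rename are illustrative choices, not taken from the PR.

```python
def shadowed(product: dict[str, str]) -> str:
    # Assigning to `id` hides the built-in id() for the rest of this function.
    id = product["id"].lower()
    try:
        id(product)  # the built-in is no longer reachable under this name
    except TypeError as exc:
        return f"TypeError: {exc}"
    return id


def renamed(product: dict[str, str]) -> str:
    # A descriptive name avoids the clash and reads more clearly.
    product_id = product["id"].lower()
    return product_id


print(shadowed({"id": "ABC-123"}))
print(renamed({"id": "ABC-123"}))
```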

jonathangreen · 2025-10-01T15:23:34Z

src/palace/manager/integration/license/overdrive/api.py

+            client.headers.update(self._get_headers(self._client_oauth_token))
+            client.base_url = URL(base_url)


Why do this instead of passing them to the client constructor?

jonathangreen · 2025-10-01T15:26:14Z

src/palace/manager/util/http/async_http.py

+    @property
+    def headers(self) -> httpx.Headers:
+        return self._httpx_client.headers
+
+    @property
+    def base_url(self) -> URL:
+        return self._httpx_client.base_url
+
+    @base_url.setter
+    def base_url(self, base_url: URL | str) -> None:
+        self._httpx_client.base_url = base_url
+


Minor: If there isn't a compelling argument why these are needed, I'd rather they be set via the constructor and have these properties removed so we don't expose the internal httpx client state.
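The constructor-based alternative might look roughly like the sketch below. `ClientWrapper` is a hypothetical stand-in written with the standard library only; it is not the project's AsyncClient, and the URL and method names are invented for illustration.

```python
from types import MappingProxyType
from typing import Mapping, Optional


class ClientWrapper:
    """Hypothetical wrapper that is configured once, at construction time."""

    def __init__(
        self, base_url: str = "", headers: Optional[Mapping[str, str]] = None
    ) -> None:
        # State is private; no properties or setters expose it afterwards.
        self._base_url = base_url
        self._headers = dict(headers or {})

    def build_request(self, path: str) -> tuple[str, Mapping[str, str]]:
        # Return the full URL and a read-only view of the configured headers.
        return self._base_url + path, MappingProxyType(self._headers)


client = ClientWrapper(
    base_url="https://api.example.invalid",
    headers={"Authorization": "Bearer token"},
)
url, headers = client.build_request("/v1/products")
print(url, dict(headers))
```

With this shape, callers cannot change the base URL or headers after the client is built, which is the encapsulation the comment asks for.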

jonathangreen · 2025-10-01T15:36:51Z

src/palace/manager/integration/license/overdrive/api.py

+            next: BookInfoEndpoint | None = None
+
+            while pending_requests:
+                done, pending = await asyncio.wait(


I needed to use asyncio.wait in the code in palace tools because the queue there was variable length. Here, if I understand correctly what's happening, you know exactly how many requests you will make after the first request has returned. In this case it seems better to use either asyncio.gather or an asyncio TaskGroup.

That gives you deterministic ordering for the requests, so you don't have to rely on URL matching in order to process the responses.

I'll look into that.

I will take a look at the asyncio docs. Thanks for the explanation.
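The ordering guarantee mentioned in this thread can be seen in a self-contained sketch. `fake_fetch` simulates an HTTP request with a delay, and the URLs are invented for illustration. `asyncio.gather` returns results in the order the awaitables were passed in, even when they finish in a different order.

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> str:
    # Stand-in for an HTTP GET; the delay simulates varying response times.
    await asyncio.sleep(delay)
    return f"response for {url}"


async def fetch_all(planned: list[tuple[str, float]]) -> list[str]:
    # Results come back in submission order, not completion order, so each
    # response can be paired with its request by position.
    return await asyncio.gather(
        *(fake_fetch(url, delay) for url, delay in planned)
    )


planned = [
    ("/products/a/metadata", 0.03),      # finishes last
    ("/products/a/availability", 0.01),  # finishes first
    ("/products/b/metadata", 0.02),
]
results = asyncio.run(fetch_all(planned))
for (url, _), result in zip(planned, results):
    print(url, "->", result)
```

Because the pairing is positional, no URL matching is needed to work out which response belongs to which request.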

dbernstein force-pushed the add_fetch_book_list_method_to_overdrive_api branch from 8149c70 to 2211d83 Compare September 29, 2025 21:04

dbernstein added 4 commits September 30, 2025 09:34

This update adds support for fetching a "page" of overdrive book data…

8c2f1e0

… with the option of including metadata and/or availability using an async http client for optimal performance.

Fix mypy.

d369582

Expand test coverage.

fcbcdf3

Use palace AsyncClient instance of httpx.AsyncClient

b38ab1d

dbernstein force-pushed the add_fetch_book_list_method_to_overdrive_api branch from 2211d83 to b38ab1d Compare September 30, 2025 17:20

dbernstein added 2 commits September 30, 2025 12:30

Fix broken test.

5dee144

Fix mypy

0a98bb6

dbernstein marked this pull request as ready for review September 30, 2025 20:55

dbernstein requested a review from a team September 30, 2025 20:55

jonathangreen requested changes Oct 1, 2025
