Fix code to download the works of a single author #75

raffaem · 2025-05-28T17:48:20Z

By default `get` returns only the first 25 entries

PeterLombaers · 2025-06-02T08:25:48Z

Hi @raffaem! Thanks for noting that the example doesn't return all works of an author and for giving a way to collect all the works. I want to keep the examples section a bit more streamlined though, and avoid every example giving a different handcrafted way to get all the pages from a paginator. The easiest way would be to simply say something like: 'this example gives the first 25 works of an author; to get all works, see the section on pagination.' Alternatively, you could update the example, but make it a bit cleaner. There is no need to write a function in an example; you can get all the works using 2 lines of code. See for example this post.

raffaem · 2025-06-12T18:31:42Z

I don't understand what those two lines would be.

Something like this:

for batch in batched(sample_ids, 10):
    works.extend(Works().filter_or(openalex_id=list(batch)).get(per_page=10))

would still download just the first page.

How do you check the last page was reached?

PeterLombaers · 2025-06-13T07:28:51Z

In the example, every batch will contain 10 records. For every step in the for-loop, you will make a request to OpenAlex for the 10 records in the batch and you will get a response with a page size of 10. So the page will contain all the records from the batch. So there is really no such thing as a 'last page' in the example. There is only the last batch. You can try adding print statements to see what is happening, e.g.:

works = []
for idx, batch in enumerate(batched(sample_ids, 10)):
    print(f"Batch index: {idx}")
    print(f"Getting identifiers: {list(batch)}")
    page = Works().filter_or(openalex_id=list(batch)).get(per_page=10)
    page_identifiers = [record["id"] for record in page]
    print(f"Page contains identifiers: {page_identifiers}")
    works.extend(page)

Does this make sense to you?
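
To illustrate the batching described above without calling the API, here is a minimal sketch. The `batched` helper is a hand-written stand-in (Python 3.12 ships an equivalent `itertools.batched`), and `sample_ids` is made-up data, so treat both as assumptions rather than what the original post used.

```python
from itertools import islice


def batched(iterable, n):
    """Yield successive tuples of at most n items (stand-in for itertools.batched)."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


# Made-up work identifiers, for illustration only.
sample_ids = [f"W{i}" for i in range(25)]

# Size of each batch that would be sent as one request.
batch_sizes = [len(batch) for batch in batched(sample_ids, 10)]
print(batch_sizes)

# Every batch holds at most 10 identifiers, so a request with per_page=10
# can return all of its records in a single page.
fits_in_one_page = all(size <= 10 for size in batch_sizes)
```

Because each request asks for at most 10 known identifiers, the loop ends when the batches run out, not when a last page is detected.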

raffaem · 2025-06-13T08:15:57Z

I don't understand why we are filtering by Works' OpenAlex IDs when we want to filter by Works' Authors' OpenAlex IDs and we don't know how many works that author published in advance.

raffaem · 2025-06-13T08:17:52Z

How would you rewrite my `download_author_works` function? It takes as input the OpenAlex ID of an author.

PeterLombaers · 2025-06-16T10:02:30Z

Oh sorry, you're totally right. I got confused with a different question and pointed you to the wrong place. What I gave only works if you already have a list of work identifiers.

Finding all the works from an author can be done using the basic example from the pagination section:

from pyalex import Works

works = []
pager = Works().filter(author={"id": "A5083411784"}).paginate(per_page=200)
for page in pager:
    works.extend(page)

If you want access to the index of the current page for log statements, you can wrap the pager in `enumerate`. If you only want the first n pages, you can wrap the pager in `itertools.islice`.
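
As a minimal sketch of those two wrappers, the snippet below uses a hypothetical `fake_pager` generator in place of the real pager so that it runs without network access. With PyAlex you would pass the `pager` object from the example above instead. The record contents and page sizes are made up for illustration.

```python
from itertools import islice


def fake_pager(n_records, per_page):
    """Hypothetical stand-in for a pager: yields pages as lists of records."""
    records = [{"id": f"W{i}"} for i in range(n_records)]
    for start in range(0, n_records, per_page):
        yield records[start:start + per_page]


# enumerate gives the index of the current page, e.g. for log statements.
works = []
page_sizes = []
for idx, page in enumerate(fake_pager(n_records=5, per_page=2)):
    print(f"Page {idx}: {len(page)} records")
    page_sizes.append(len(page))
    works.extend(page)

# islice stops after the first n pages (here n=2) and does not consume
# the rest of the iterator.
first_two_pages = []
for page in islice(fake_pager(n_records=5, per_page=2), 2):
    first_two_pages.extend(page)
```

Both wrappers work on any iterator of pages, so the loop body stays the same as in the basic example.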

J535D165 · 2025-06-16T15:46:33Z

I like the simplicity of your example @PeterLombaers. I propose to use that example.

raffaem · 2025-06-16T15:56:43Z

Yes

raffaem · 2025-06-18T19:23:32Z

Thanks, that was exactly what I needed!

Fix code to download the works of a single author

b7c06cc

By default `get` returns only the first 25 entries

raffaem closed this Jun 16, 2025
