-
-
Notifications
You must be signed in to change notification settings - Fork 65
Add Museums Victoria fetch #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@oree-xx I have opened a new pull request. I guess that is much better |
|
@sbarhin ohh okay great. |
|
@TimidRobot There are several file changes in my PR, this is due to pulling from the main branch where I believe you merged a certain PR. These changes have taken effect in my branch hence those file changes in my PR. |
Please revisit the documentation on keeping a branch/fork synchronized with upstream and on resolving merge conflicts. This PR won't be reviewed or merged while these issue are present. |
I will do that please. Thank you |
|
@TimidRobot I believe we are good now |
@TimidRobot Please should the --limit apply individual record types or the total number of records altogether? |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@sbarhin Which implementation will satisfies the stated goal? |
@sbarhin force pushes are generally a worst practice and should be avoided when unnecessary. |
I think per individual record type will do |
|
@TimidRobot Please what are the next steps? Seems everything is resolved now |
| licence_data = media_item.get("licence") | ||
|
|
||
| # COUNTING THE UNIQUE LICENCE TYPES | ||
| license_short_name = licence_data.get("shortName") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the license short name is not reliable (does not include version number).
|
@TimidRobot Please I have made the necessary changes |
|
@TimidRobot Hello there |
| records_processed = 0 | ||
| current_page = 1 | ||
| total_pages = None | ||
| per_page = min(PER_PAGE, args.limit) if args.limit else PER_PAGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any value in changing per_page. All it does is slow down the execution of the script. It does not address the issue of the script taking a very long time to complete.
Again, your comment suggested the limit would apply to the total records of each type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default per_page is 100 from the documentation. With the presence of --limit say 50, we fetch 50 records perpage for each record type instead of the default 100. No fetch is done once we hit the limit (50) regardless of the number of pages for that record type.
If per_page isn't altered, then the --limit won't work. It would fetch the 100 records anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the --limit applies to the total records for each type regardless of the total pages for a record type. So, if I don't override the perpage with the limit (if provided) it would fetch the 100 records per page and proceed to the next page if there are more records for that type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimidRobot Please see this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimidRobot hello there
| def get_requests_session(): | ||
| """ | ||
| Returns a configured requests session with retries and a User-Agent. | ||
| """ | ||
| return shared.get_session() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't actually do anything and should be removed.
Fixes
Description
This PR implements automation for Museum Victoria data fetching as discussed in issue #215. The implementation follows the established patterns from existing fetch scripts.
This purpose of this file is to fetch all the records from the Museum Victoria API, then saving the necessary response fields needed for the next phase (processing phase).
Checklist
Update index.md).mainormaster).visible errors.
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."
Developer Certificate of Origin