Skip to content

refactor: migrate citations.py from pybtex to bibtexparser #4974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: develop
Choose a base branch
from

Conversation

gurukiran7
Copy link

Description

This PR refactors pybamm/citations.py to fully remove the legacy pybtex dependency and migrate all citation management to bibtexparser (v2). The new implementation:

  • Uses bibtexparser for reading, parsing, and validating BibTeX entries
  • Updates the register method to handle both known keys and raw BibTeX strings
  • Adds improved error handling and type validation
  • Implements custom citation string formatting to closely match the previous output
  • Issues warnings when overwriting existing citations with different content

Fixes #3648

Important checks:

Please confirm the following before marking the PR as ready for review:

  • No style issues: nox -s pre-commit
  • All tests pass: nox -s tests
  • The documentation builds: nox -s doctests
  • Code is commented for hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@gurukiran7 gurukiran7 requested a review from a team as a code owner April 15, 2025 08:37
@gurukiran7 gurukiran7 marked this pull request as draft April 15, 2025 08:39
@agriyakhetarpal
Copy link
Member

Thanks for starting this, @gurukiran7! To fix the tests, you'll need to add bibtexparser as in the [cite] set of optional dependencies in pyproject.toml. You may also run the tests locally to see if they pass and ease your development, as we'll need to trigger the workflows every time you push a commit: https://docs.pybamm.org/en/latest/source/user_guide/contributing.html#testing.

@gurukiran7
Copy link
Author

@agriyakhetarpal I've updated the tests to align with the new citations.py implementation. At the moment, only a few tests are passing I'm still working on the rest. I want your feedback on the implementation so far

@agriyakhetarpal
Copy link
Member

I think the implementation is looking good, so far. Thanks! One important thing here will be to retain the same output from pybamm.print_citations() as the previous implementation, or if it isn't possible, to stay as close to it. Please continue to work on the rest of the tests.

@agriyakhetarpal
Copy link
Member

It looks like a stable release for bibtexparser v2 is not yet available, and the resolver picks up pre-release versions only if we specify them. Could you please pin to v2.0.0b8 and see if that works?

@gurukiran7
Copy link
Author

It looks like a stable release for bibtexparser v2 is not yet available, and the resolver picks up pre-release versions only if we specify them. Could you please pin to v2.0.0b8 and see if that works?

done

@gurukiran7
Copy link
Author

@agriyakhetarpal The test_citations test seems to be failing because read_citations() isn't populating _all_citations, so keys like "Sulzer2021" are missing during assertions. I suspect this might be due to the CITATIONS.bib file not loading correctly in the test environment. Could you please take a look and let me know if I’m missing something?

@agriyakhetarpal
Copy link
Member

🤔 The CI logs report a different error: ImportError: cannot import name 'BibDatabase' from 'bibtexparser' (/home/runner/work/PyBaMM/PyBaMM/.nox/unit/lib/python3.11/site-packages/bibtexparser/__init__.py), is it related to what you're seeing locally?

@gurukiran7
Copy link
Author

Actually the current error I'm seeing is an AssertionError, not an ImportError. It's happening because _all_citations is empty, so the test fails when it checks for "Sulzer2021".

@agriyakhetarpal
Copy link
Member

Yes, that's alright, but the CI logs still don't reflect it – it is hence difficult to see all the failures. I pulled your branch to my machine, and I see the same error: ImportError: cannot import name 'BibDatabase' from 'bibtexparser'. I think there might be a few commits on your local branch you haven't pushed?

https://github.com/search?q=repo:sciunto-org/python-bibtexparser+%22bibdatabase%22&type=code also reports no results, so I'm not sure where you're getting that attribute from.

It's not in the documentation either: https://bibtexparser.readthedocs.io/en/main/search.html?q=bibdatabase

@gurukiran7
Copy link
Author

Thanks for pointing that out! the ImportError stems from my use of BibDatabase, which was part of bibtexparser v1 but has been removed in v2. I mistakenly mixed v1-style imports with the newer v2 API. I’ve now updated the code to use Entry from bibtexparser.model instead and removed all references to BibDatabase. That should fix the import issue I’ll push the changes shortly.

@gurukiran7
Copy link
Author

gurukiran7 commented Apr 17, 2025

Hi @agriyakhetarpal , I've pushed the updated code, and most methods and their tests are passing as expected However, there's an issue with the register method, causing tests like test_citations and test_overwrite_citation to fail. Despite debugging, I couldn't resolve the issue, as it seems related to how citations are added to _all_citations and _papers_to_cite.
could u pls review the register method and suggest a fix?

@agriyakhetarpal
Copy link
Member

Hi Gurukiran, I've triggered the workflows and I'm still not sure – the CI logs show otherwise? The tests are failing to load:

ImportError: cannot import name 'UndefinedString' from 'bibtexparser' (/home/runner/work/PyBaMM/PyBaMM/.nox/coverage/lib/python3.12/site-packages/bibtexparser/__init__.py)

May I know the steps you're performing to run them locally, so that I can take a look at the register() problem?

Also, I'd like to note that we shouldn't use bibtexparser v1 here. The README at https://github.com/sciunto-org/python-bibtexparser#python-bibtexparser-v2 states:

Note that all development and maintenance effort is focussed on v2. Small PRs for v1 are still accepted, but only as long as they are backwards compatible and don't introduce much additional technical debt. Development of version one happens on the dedicated v1 branch.

which is code-speak to say that bibtexparser v1 will be deprecated in the near future in favour of bibtexparser v2.

This is a problem for three reasons:

  • we've pinned down to a specific package version, which isn't recommended when trying to use PyBaMM as a library to integrate with other libraries (this is fine if you use it as an application). We do this only in extreme cases where it's needed for stability (Add exact pin #4945)
  • we will have no guarantees whether bibtexparser v1 will receive sufficient updates over time to support newer Python versions, and this is the same problem we have with pybtex right now – it doesn't support Python 3.13 natively and is unmaintained.
  • if there are any bugs in bibtexparser v1's code, it makes it difficult for us to receive fixes for them as they can take a long time to land in 1.x releases and also need to conform to bibtexparser v1's BC (backwards compatibility) guarantees.

@gurukiran7
Copy link
Author

I switched back to bibtexparser==2.0.0b8, I have pushed the updated code.
I run the tests using pytest. I also verify specific tests by running:
pytest -k test_citations -v
Please let me know if there are any additional steps you'd recommend

@agriyakhetarpal
Copy link
Member

Thanks! Could you please run the whole test suite using nox directly? Running pytest -k test_citations is fine locally, but we're running all the tests in CI and that's different from running these tests in specific as you may see.

Also, you would need to fix the style job to allow the test suite to run.

@gurukiran7
Copy link
Author

@agriyakhetarpal I've pushed the latest updated code addressing most of the test failures and feedback

Refactored the register method and _add_citation to support bibtexparser v2+
Made citation handling optional using import_optional_dependency
Cleaned up all formatting/style issues (ruff-format, pre-commit now passes)
if you have any suggestions on expected behavior or test intentn pls let me know

@gurukiran7
Copy link
Author

gurukiran7 commented Apr 21, 2025

@agriyakhetarpal , @kratman , I've pushed the latest code please review when you get a chance

@gurukiran7 gurukiran7 marked this pull request as ready for review April 22, 2025 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to an active citations parser dependency to replace pybtex
3 participants