Skip to content

Conversation

ndaelman-hu
Copy link
Contributor

ndaelman added 3 commits April 28, 2025 14:37
…on electronic parsers + Claude Sonnet 3.7

- Update main file discovery for all quantum espresso workflows
- Correct unit mapping
- TODO: fix remaining bugs
@coveralls
Copy link

coveralls commented Apr 29, 2025

Pull Request Test Coverage Report for Build 15002848715

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.004%) to 82.315%

Totals Coverage Status
Change from base Build 14262472577: 0.004%
Covered Lines: 3733
Relevant Lines: 4535

💛 - Coveralls

@ndaelman-hu
Copy link
Contributor Author

Here are some notes on how this implementation should handle Fermi level searches based on discussions with Elena Molteni:

Band structure calculations typically come in (2 or) 3 steps:

  1. PWSCF module (Self-consistent Calculation): self-consistent calculation at high, grid-like k-point sampling. Reports: Fermi-level (optional), highest occupied level (optional), no. valence electrons, no. valence orbitals.
  2. PWSCF module (Band Structure Calculation): non-self-consistent calculation defined along the k-path of interest. Reports: eigenvalues (multiplicity applied) per k-point, k-point weights.
  3. BANDS module: symmetry analysis of the (electronic) structure. Reports: symmetry group labels, symmetry labels + eigenvalues + multiplicities per orbital.

Match the calculation type based on step 3, and enrich the band structure with information from step 1.
Some form of band gap (with the highest occupied energy) is necessary for aligning / visualizing the band structure. Symmetry information is used for decomposition in segments.

Regarding the level alignment, there is a cascade of sources to check:

  1. "Fermi energy": printed when the smearing is defined in the input (https://www.quantum-espresso.org/faq/faq-self-consistency/#6.7).
  2. "highest occupied level": printed when the no. bands is included in the input (https://www.quantum-espresso.org/faq/faq-self-consistency/#6.7).
  3. QE else estimates the necessary no. bands based on the "number of electrons" (always reported in step 1). These are either a) just the necessary no. of occupied bands for insulators, or b) the no. occupied bands + 20% (minimum of 4 bands) for metals (https://www.quantum-espresso.org/Doc/INPUT_PW.html#nbnd).
  4. In the case of option 3, make sure to correct for the orbital occupation: "Note that in spin-polarized calculations the number of
    k-point, not the number of bands per k-point, is doubled".

@ndaelman-hu
Copy link
Contributor Author

I found an edge case in BANDS calculations (step 3), where none of the k-point information is displayed, reporting instead

                    xk=(   0.00000,   0.00000,   1.00000  )

     zone border point and non-symmorphic group 
     symmetry decomposition not available

In these cases, step 2 still proceeds as regular. An alternative strategy would be to parse this file instead.
Segments would then be determined via a) a symmetry analysis package (e.g. spglib, pymatgen), or b) by computing the whether points align. The b option would cover cases where the analysis fails for similar reasons as with QE BANDS. Note that a lack of symmetry labels would leave the band structure unlabelled.

@ndaelman-hu ndaelman-hu requested a review from ladinesa April 29, 2025 19:49
@ndaelman-hu
Copy link
Contributor Author

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?

Should I delete it from this parser?

@ladinesa
Copy link
Collaborator

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?

Should I delete it from this parser?

i am.noy sure why the other entry is empty. is this for the overall workflow ?

@ndaelman-hu
Copy link
Contributor Author

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?
Should I delete it from this parser?

i am.noy sure why the other entry is empty. is this for the overall workflow ?

Pls take a look at my comment above. Step 2 remains mostly vacant, as I don't process it. It is picked up by the quantumespresso/parser.py. This is a consequence of splitting off the parsing of the band structure into workflows.

@ladinesa
Copy link
Collaborator

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?
Should I delete it from this parser?

i am.noy sure why the other entry is empty. is this for the overall workflow ?

Pls take a look at my comment above. Step 2 remains mostly vacant, as I don't process it. It is picked up by the quantumespresso/parser.py. This is a consequence of splitting off the parsing of the band structure into workflows.

I do not see any problem if it is parsed by another parser. The important thing is, you are able to link it with what is parsed by the band parser in a separate worklow entry.

@ndaelman-hu
Copy link
Contributor Author

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?
Should I delete it from this parser?

i am.noy sure why the other entry is empty. is this for the overall workflow ?

Pls take a look at my comment above. Step 2 remains mostly vacant, as I don't process it. It is picked up by the quantumespresso/parser.py. This is a consequence of splitting off the parsing of the band structure into workflows.

I do not see any problem if it is parsed by another parser. The important thing is, you are able to link it with what is parsed by the band parser in a separate worklow entry.

It is picked up by the mainfile scan of the quantumespresso/parser.py. The entry itself is hardly populated.
Barring the edge cases of faulty symmetry analysis or minimalistic band structure workflows, there's no need for this file.

Let me be clear: our new QE parser should likely target step 2 and perform the symmetry analysis itself.
These edge cases only became clear towards the end of the current implementation that targets step 3. I do not consider that it merits refactoring here.

Ways to handle the empty entry:

  1. Improve the mainfile scan. Idk how to, as the distinguishing line is a ways down.
  2. Have the QE BANDS parser delete it. This presupposes that the QE parser always runs first.
  3. Leave it be, maybe adding a warning.

@ladinesa
Copy link
Collaborator

@ladinesa Pls note that targeting BANDS rather than the Band Structure Calculation PWSCF, means that there is an empty entry being created for the latter. This entry will bear a FAILURE status. Any recommendations on how to handle it?
Should I delete it from this parser?

i am.noy sure why the other entry is empty. is this for the overall workflow ?

Pls take a look at my comment above. Step 2 remains mostly vacant, as I don't process it. It is picked up by the quantumespresso/parser.py. This is a consequence of splitting off the parsing of the band structure into workflows.

I do not see any problem if it is parsed by another parser. The important thing is, you are able to link it with what is parsed by the band parser in a separate worklow entry.

It is picked up by the mainfile scan of the quantumespresso/parser.py. The entry itself is hardly populated. Barring the edge cases of faulty symmetry analysis or minimalistic band structure workflows, there's no need for this file.

Let me be clear: our new QE parser should likely target step 2 and perform the symmetry analysis itself. These edge cases only became clear towards the end of the current implementation that targets step 3. I do not consider that it merits refactoring here.

Ways to handle the empty entry:

  1. Improve the mainfile scan. Idk how to, as the distinguishing line is a ways down.
  2. Have the QE BANDS parser delete it. This presupposes that the QE parser always runs first.
  3. Leave it be, maybe adding a warning.

well, it is an entry matched by the parser, it it has no output it does not matter, as the file contains no info. the other parser should not delete other entries.

@ndaelman-hu
Copy link
Contributor Author

well, it is an entry matched by the parser, it it has no output it does not matter, as the file contains no info. the other parser should not delete other entries.

Largely, yes*. Then pls feel free to proceed with the review.

*: I think being specific here matters, so that's why I'm reiterating. The file in step 2 contains less information than the one in step 3. There are 2 exceptions to that statement:

  • The data producer skipped step 3.
  • The symmetry analysis failed for some k-points in step 3. Then their matching eigenvalues are not printed. These remain present in step 2, though.

# Find PWSCF files and parse
try:
if (pwscf_file := self._find_files()) is not None:
self.pwscf_parser.mainfile = pwscf_file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thsi is already matched and parsed by the quantum espresso parser for PWSCF, you simply have to load the entry if you need to access info from the archive. This means that the BANDS parser should be executed after PWSCF. Set level to 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this approach. I ended up going with a minimalistic parser here, since this is what the other QE workflow parsers did too.
I can apply this, if you're fine with deviating from the old approach.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other qe workflow parsers also do not match and parse pwscf files, look at the parsers init file. there is no deviation afaik, the only thing missing is connecting the entries to pwscf, we should do this in pwscf so we do the child archive generation in only one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should your or I see to add this to the QE parser then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can also add it no problem, just do the bands parsing in this pr then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be appreciated, as you have a better view on what you want and how to handle child archives.
I will take a look during the review or after then.

Just to be clear: I remove all population of run, except for bandstructures, correct? Or do I still copy over e.g. method sections?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes just populate the band structure quantities, then the method and system, if you do not see them in the bands mainfile or if they are the same as in pwscf then do not parse them i simply link them with with the pwscf entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method and system are not in the BANDS file. I'll remove any setting of them here then.
I will only link properties like Fermi level, necessary for the band structure.

)
self._process_data()

def _process_data(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just put this under parse function


if self.pwscf_parser.results:
self._create_system_section(sec_run)
self._create_method_section(sec_run)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps also the method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean here?

Copy link
Collaborator

@ladinesa ladinesa May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was referring to do the same as the previous comment i.e. simply linking the method section of the pwscf entry but if they are diferent then you parse it separately

ndaelman added 4 commits May 5, 2025 13:40
- Rename BANDS text parser to `MainfileParser`
- Remove `_process_data` function header (not the code block)
- TODO: incorporate `_extract_reference_energy`
@ndaelman-hu
Copy link
Contributor Author

@ladinesa I'm setting the missing band structure information from the QE parser (after the workflow setup).
Is it possible to defer triggering of the result normalizer for the bands entry till after it has its information set?

@ladinesa
Copy link
Collaborator

ladinesa commented May 7, 2025

@ladinesa I'm setting the missing band structure information from the QE parser (after the workflow setup). Is it possible to defer triggering of the result normalizer for the bands entry till after it has its information set?

there is this parser function after_normalization where you can call the results normalizer again. but no, since we are using the MatchingParserInterface wrapper no way to inject it. You can write a wrapper function in MatchingParserInterface for this.

- Remove references to old NOMAD search
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants