fneum
Member

@fneum fneum commented Aug 19, 2025

closes #245
closes #241
closes #229
closes #161
closes #233
closes #215
closes #259
closes #214
closes #59
closes #260

Changes

  • Update Marktstammdatenregister data for Germany from open-MaStR (February 25, 2025): https://zenodo.org/records/14783581

  • Drop support for Python 3.9, add support for Python 3.13. Minimum required Python version is now 3.10.

  • Added GeoNuclearData dataset as pm.data.GND().

  • Added European Energy Storage Inventory dataset as pm.data.EESI().

  • Updated ENTSOE, BEYONDCOAL, JRC, IRENASTAT and the Global Energy Monitor datasets to the latest versions.

  • In pm.data.MASTR(), fix the distinction between hydro technologies and between offshore and onshore wind. Storage technologies are now read in as well.

  • Improved recognition of CHP power plants.

  • In Global Energy Monitor datasets, also read entries below capacity threshold.

  • In pm.data.GCPT(), add estimate for coal plant efficiency.

  • Include mothballed gas, oil and coal power plants.

  • Added option to retain blocks for subsets of fuel types (e.g. clean_name: fueltypes_with_blocks: ['Nuclear']).

  • For fully included datasets, add option to only aggregate units included in the matching process (e.g. aggregate_only_matching_sources: ['MASTR']).

  • Added option for multiprocessing when aggregating units of non-matched power plants (e.g. threads_extend_by_non_matched: 16).

  • Updated matching logic configuration.
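
The three new options in the list above could be set in a custom configuration roughly as follows. This is only a minimal sketch: the key names are taken from the release notes, but their exact nesting in config.yaml is an assumption, the default values are placeholders, and deep_update is a hypothetical helper that is not part of powerplantmatching.

```python
from copy import deepcopy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical helper for illustration; not part of powerplantmatching.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-in for the packaged defaults; the real config has many more keys
# and these default values are placeholders.
default_config = {
    "clean_name": {"fueltypes_with_blocks": []},
    "aggregate_only_matching_sources": [],
    "threads_extend_by_non_matched": 1,
}

# Overrides using the option names from the release notes (nesting assumed).
overrides = {
    "clean_name": {"fueltypes_with_blocks": ["Nuclear"]},
    "aggregate_only_matching_sources": ["MASTR"],
    "threads_extend_by_non_matched": 16,
}

config = deep_update(default_config, overrides)
print(config)
```

With the package installed, the same overrides would presumably be applied to the dictionary returned by pm.get_config() before passing it on via pm.powerplants(config=config, update=True).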

Updates

Comparison for Reference Year 2024

Europe

image

Germany (note ENTSO-E statistics don't distinguish solid biomass/biogas)

image

Diff Europe this PR to master (note that ENTSO-E statistics are also known to be inaccurate)

image

@fneum fneum changed the title Data Updates 2025 [DNMY] Data Updates 2025 Aug 19, 2025
@fneum fneum force-pushed the data-update-2025 branch from 24f3571 to c886184 Compare August 19, 2025 13:09
@fneum
Member Author

fneum commented Aug 22, 2025

Ok, @euronion, I requested your review a bit prematurely, but now everything is in a good state! Especially in Germany the accuracy is very good, and all other countries look largely reasonable as well.

Today I focussed on performance (powerplantmatching runs in a handful of minutes again!) and splitting up nuclear power plant blocks in the merged dataset. No clue about the one test failure.

@fneum
Member Author

fneum commented Aug 22, 2025

image image image image image

@fneum fneum requested a review from Copilot August 22, 2025 19:11

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements major data updates for 2025 across multiple power plant datasets and introduces several configuration improvements. The changes focus on updating data sources to their latest versions, improving data processing logic, and adding new functionality for better handling of power plant matching and aggregation.

  • Updated multiple external datasets (ENTSOE, BEYONDCOAL, JRC, Global Energy Monitor series) to 2025 versions
  • Added two new datasets: GeoNuclearData (GND) and European Energy Storage Inventory (EESI)
  • Enhanced configuration system with better matching logic and multiprocessing support

Reviewed Changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| powerplantmatching/utils.py | Removed individual dataset queries from config_filter and enhanced parmap function with flexible threading support |
| powerplantmatching/package_data/config.yaml | Major configuration overhaul with updated dataset URLs, improved matching logic, and enhanced fuel type mappings |
| powerplantmatching/package_data/PLZ_Coords_map.csv | Added new German postal code coordinate mappings |
| powerplantmatching/heuristics.py | Added threading configuration for non-matched power plant processing |
| powerplantmatching/duke.py | Added threading parameter to duke function |
| powerplantmatching/data.py | Extensive updates to data processing functions and added two new dataset importers (EESI, GND) |
| powerplantmatching/collection.py | Added query filtering for matching sources |
| powerplantmatching/cleaning.py | Enhanced name cleaning with block preservation and improved aggregation with threading support |
| doc/release-notes.rst | Documented new features and changes |
| doc/basics.rst | Added documentation for new datasets |

Comments suppressed due to low confidence (1)

powerplantmatching/utils.py:189

  • Removing 'Hydrogen Storage' from the default fueltypes list may cause breaking changes for existing code that relies on this fuel type being automatically converted to 'Other'. Consider documenting this breaking change or providing a migration path.
        "Electro-mechanical",

@euronion
Contributor

Thanks @fneum !

  1. Python 3.9 support has already been dropped in chore: Deprecate python==3.9, officially support python==3.13 #255 :) (related to your PR notes)

Comments based on the figures:

  1. I think the ylabel is still off - MW can't be the correct unit here. Maybe GW for the Europe and Germany figure? The comparison with master has k MW, that seems fine.
  2. Germany: The nuclear phaseout was in 2023, this seems to be correct now in the dataset ✔️ . Are OPSD/GEO/GPD even updated nowadays?
  3. Germany: Solar PV capacities are too little - any idea how to fix? Is that related to filtering out small units from MaStR ?
  4. Germany: Why does it seem like there are more Waste capacities in the combined data than in any of the datasets alone?
  5. Germany: No battery data?
  6. Europe: Nuclear capacities I'd expect to have ENTSOE stats be more reliable on those, any idea why the difference between ENTSOE stats and basically all other datasets?
  7. Europe: Do we also have battery energy capacity from the dataset, or only power capacity?
  8. Europe: More wind capacity in combined dataset than in any individual dataset by a good margin - no double counting I hope? Is an individual country cause of the difference?
  9. Europe: Same problem with solar capacity as Germany? In this case probably not due to MaStR? Any idea why solar is underreported?
  10. Europe: Power plant data for Ukraine is likely off - Should we add a warning somewhere?
  11. Comparison PR/master: master is also reporting 2024 capacities, right? So the PR is only improving the 2024 data, not updating e.g. from 2023 to 2024? Looks plausible!
  12. Are the maps showing all capacities or are they supposed to show only active capacities? E.g. nuclear in Lithuania (probably this plant) is shown, but that has been inactive for more than 20 years. Or it is the Belarus plant, but that shouldn't be in the Europe dataset for lack of other Belarus data.
  13. Faroe islands seem to be missing a (albeit small) power plant.
  14. Lack of solar / battery data for Norway is probably a datasource problem?
  15. Biogas - do we only have data for Germany?

@euronion
Contributor

I'm not the best person to review the code as I'm not too familiar with it, but IMHO it seems ok!

@fneum
Member Author

fneum commented Aug 25, 2025

Python 3.9 support has already been dropped in #255 :) (related to your PR notes)

Yes, I just copied over the release notes.

I think the ylabel is still off - MW can't be the correct unit here. Maybe GW for the Europe and Germany figure? The comparison with master has k MW, that seems fine.

Fair enough. I corrected it.

Germany: The nuclear phaseout was in 2023, this seems to be correct now in the dataset ✔️ . Are OPSD/GEO/GPD even updated nowadays?

No, they are not updated anymore. Therefore, they have been downrated in their reliability scores.

Germany: Solar PV capacities are too little - any idea how to fix? Is that related to filtering out small units from MaStR ?

That's the best we can do. There are a lot of (smaller) rooftop units that do not need to be registered.

Germany: Why does it seem like there are more Waste capacities in the combined data than in any of the datasets alone?

This can happen when two fully included data sources have power plants that the other does not have (non-matching); in this case GEM and MASTR. I don't think it's a problem.
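
As a toy illustration of this effect (not the actual matching logic), consider two fully included sources that share one plant and each have one plant the other lacks. All names and capacities below are made up, and "matching" is simplified to identical names.

```python
# Toy waste-plant capacities in MW; names and numbers are made up.
gem = {"Plant A": 50.0, "Plant B": 30.0}
mastr = {"Plant B": 30.0, "Plant C": 40.0}

# Matched plants (here simply: same name) are counted once, while the
# non-matched plants of both fully included sources are all kept.
combined = {**gem, **mastr}

total_gem = sum(gem.values())
total_mastr = sum(mastr.values())
total_combined = sum(combined.values())

# The combined total exceeds the total of either individual source.
print(total_gem, total_mastr, total_combined)
```

The combined total is larger than either source's total without any plant being counted twice.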

Germany: No battery data?

There is, I just accidentally filtered it out in this plot.

Europe: Nuclear capacities I'd expect to have ENTSOE stats be more reliable on those, any idea why the difference between ENTSOE stats and basically all other datasets?

I tend to trust the agreement of many different datasets more than ENTSO-E Transparency, but I am not sure why they differ. At least it's not a single-country outlier; the deviations are quite even across countries.

Europe: Do we also have battery energy capacity from the dataset, or only power capacity?

EESI and MASTR give energy and power capacity.

Europe: More wind capacity in combined dataset than in any individual dataset by a good margin - no double counting I hope? Is an individual country cause of the difference?

It's similar in all countries. I don't think we have double-counting, as I was quite careful in selecting the fully included sources for wind, solar and hydro. The combined value can be larger because I take all of MASTR plus GEM (without Germany). Also, the latest ENTSO-E statistics for GB (wind-heavy) are outdated (from 2021)... who knows why? ;)

This explains about 7 GW surplus: https://ourworldindata.org/grapher/cumulative-installed-wind-energy-capacity-gigawatts?country=~GBR

Europe: Same problem with solar capacity as Germany? In this case probably not due to MaStR? Any idea why solar is underreported?

All datasets but MASTR have a threshold of 1 MW which doesn't capture distributed solar.

Europe: Power plant data for Ukraine is likely off - Should we add a warning somewhere?

Yes, the numbers are all over the place.

Comparison PR/master: master is also reporting 2024 capacities, right? So the PR is only improving the 2024 data, not updating e.g. from 2023 to 2024? Looks plausible!

Yes, I apply the same filtering -- but there may be some data updates about phase-outs etc. in the PR version.

Are the maps showing all capacities or are they supposed to show only active capacities? E.g. nuclear in Lithuania (probably this plant) is shown, but that has been inactive for more than 20 years. Or it is the Belarus plant, but that shouldn't be in the Europe dataset for lack of other Belarus data.

It shows all, also long-retired, power plants. It's up to the user to filter out retired power plants, since they may want to reproduce a historical year. Belarus is not included.
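
A user-side filter for a given reference year could look roughly like the sketch below. It assumes the dataset's DateIn and DateOut columns hold commissioning and retirement years, and it keeps plants with missing dates; both the retirement convention (DateOut > year) and the handling of missing values are assumptions of this sketch, and active_in is a hypothetical helper.

```python
import pandas as pd


def active_in(plants: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep plants commissioned by `year` and not yet retired in `year`.

    Missing DateIn/DateOut values are treated as unknown and kept,
    which is an assumption of this sketch.
    """
    commissioned = plants["DateIn"].isna() | (plants["DateIn"] <= year)
    not_retired = plants["DateOut"].isna() | (plants["DateOut"] > year)
    return plants[commissioned & not_retired]


# Toy data; names, capacities and years are made up.
plants = pd.DataFrame(
    {
        "Name": ["Old Nuclear", "Running Gas", "New Solar"],
        "Fueltype": ["Nuclear", "Natural Gas", "Solar"],
        "Capacity": [1300.0, 400.0, 20.0],
        "DateIn": [1983, 2005, 2026],
        "DateOut": [2009, None, None],
    }
)

active_2024 = active_in(plants, 2024)
print(active_2024["Name"].tolist())
```

Calling the same helper with an earlier year would keep plants that have since retired, which is what reproducing a historical year requires.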

Faroe islands seem to be missing a (albeit small) power plant.

The Faroe Islands are not covered in the list of countries, just like Malta, Cyprus and Iceland.

Lack of solar / battery data for Norway is probably a datasource problem?

No, there are very few utility-scale solar projects in Norway (according to GEM).

https://globalenergymonitor.org/projects/global-solar-power-tracker/tracker-map/

The one solar project of 1 MW is actually there :)

Biogas - do we only have data for Germany?

Kind of, but it's also just the dominant country for biogas. The other datasets do not offer any distinction.

@fneum
Member Author

fneum commented Aug 25, 2025

Okay, hopefully the final state. It's definitely better than before (although not perfect yet). What one really needs to do is match by block for coal and gas power plants as well, since their units are phased out gradually.

Europe

image

Germany

image image

@fneum fneum requested a review from euronion September 5, 2025 14:46