Data Updates 2025 #256
base: master
for more information, see https://pre-commit.ci
Ok, @euronion, I requested your review a bit prematurely, but now everything is in a good state! The accuracy is especially good in Germany, and all other countries look largely reasonable as well. Today I focused on performance (…
Pull Request Overview
This PR implements major data updates for 2025 across multiple power plant datasets and introduces several configuration improvements. The changes focus on updating data sources to their latest versions, improving data processing logic, and adding new functionality for better handling of power plant matching and aggregation.
- Updated multiple external datasets (ENTSOE, BEYONDCOAL, JRC, Global Energy Monitor series) to 2025 versions
- Added two new datasets: GeoNuclearData (GND) and European Energy Storage Inventory (EESI)
- Enhanced configuration system with better matching logic and multiprocessing support
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| powerplantmatching/utils.py | Removed individual dataset queries from config_filter and enhanced parmap function with flexible threading support |
| powerplantmatching/package_data/config.yaml | Major configuration overhaul with updated dataset URLs, improved matching logic, and enhanced fuel type mappings |
| powerplantmatching/package_data/PLZ_Coords_map.csv | Added new German postal code coordinate mappings |
| powerplantmatching/heuristics.py | Added threading configuration for non-matched power plant processing |
| powerplantmatching/duke.py | Added threading parameter to duke function |
| powerplantmatching/data.py | Extensive updates to data processing functions and added two new dataset importers (EESI, GND) |
| powerplantmatching/collection.py | Added query filtering for matching sources |
| powerplantmatching/cleaning.py | Enhanced name cleaning with block preservation and improved aggregation with threading support |
| doc/release-notes.rst | Documented new features and changes |
| doc/basics.rst | Added documentation for new datasets |
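The table mentions that `parmap` gained flexible threading support. A minimal sketch of what such a helper can look like, assuming a `threads` argument where `None` or `1` means serial execution. The signature is hypothetical and not the actual one in `powerplantmatching/utils.py`, and the sketch uses a thread pool for simplicity, whereas CPU-bound work would usually go to a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parmap(f: Callable[[T], R], items: Iterable[T], threads: Optional[int] = None) -> List[R]:
    """Apply f to every item, optionally in parallel (hypothetical sketch).

    threads=None or threads <= 1 runs serially; otherwise a pool with
    `threads` workers is used. Results keep the order of the input.
    """
    items = list(items)
    if threads is None or threads <= 1:
        # Serial fallback: no pool overhead and easiest to debug.
        return [f(x) for x in items]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(f, items))
```

Keeping a serial path for `threads <= 1` means the same call site works both on small machines and when a setting such as `threads_extend_by_non_matched` requests more workers.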
Comments suppressed due to low confidence (1)
powerplantmatching/utils.py:189
- Removing 'Hydrogen Storage' from the default fueltypes list may cause breaking changes for existing code that relies on this fuel type being automatically converted to 'Other'. Consider documenting this breaking change or providing a migration path.
"Electro-mechanical",
Thanks, @fneum!
Comments based on the figures:
I'm not the best person to review the code, as I'm not too familiar with it, but IMHO it seems OK!
Yes, I just copied over the release notes.
Fair enough. I corrected it.
No, it's not updated anymore. Therefore, its reliability score has been lowered.
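Lowering a source's reliability score matters because, when matched entries disagree, the value from the more trusted source can take precedence. A hedged sketch of one simple way such a score can be used, assuming a "highest score wins" rule; the function, the source name `STALE_SOURCE` and the scores are illustrative only and do not reproduce powerplantmatching's actual aggregation logic.

```python
from typing import Dict, Optional


def pick_most_reliable(values: Dict[str, Optional[float]], reliability: Dict[str, int]) -> Optional[float]:
    """Return the value reported by the most reliable source that has one.

    values: source name -> reported value (None if the source lacks it)
    reliability: source name -> score, higher means more trusted (assumed scale)
    """
    candidates = [source for source, value in values.items() if value is not None]
    if not candidates:
        return None
    # Sources without a score are treated as least reliable.
    best = max(candidates, key=lambda source: reliability.get(source, 0))
    return values[best]
```

With a lowered score, a stale source only contributes a value when no more trusted source reports one.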
That's the best we can do. There are a lot of (smaller) rooftop units that do not need to be registered.
This can happen when two fully included data sources have power plants that the other does not have (non-matching); in this case GEM and MASTR. I don't think it's a problem.
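Why the combination can exceed each individual source: a toy illustration with made-up capacities, assuming matched plants are counted once and every non-matched plant from either fully included source is kept. `GEM` and `MASTR` are only labels here, and the function is hypothetical.

```python
from typing import Dict, List, Tuple


def combined_capacity(
    source_a: Dict[str, float],
    source_b: Dict[str, float],
    matches: List[Tuple[str, str]],
) -> float:
    """Total capacity (MW) when merging two fully included sources.

    source_a / source_b: plant id -> capacity in MW
    matches: (id_in_a, id_in_b) pairs judged to be the same plant
    """
    matched_b = {b for _, b in matches}
    # Every plant in A is kept; matched ones stand for the merged plant
    # (this sketch simply uses A's capacity for them).
    total = sum(source_a.values())
    # B only adds the plants that found no partner in A.
    total += sum(cap for pid, cap in source_b.items() if pid not in matched_b)
    return total
```

For example, with `gem = {"g1": 500.0, "g2": 300.0}`, `mastr = {"m1": 500.0, "m2": 50.0}` and one match `("g1", "m1")`, the total is 850 MW, more than either source alone (800 MW and 550 MW).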
There is, I just accidentally filtered it out in this plot.
I tend to trust the agreement of many different datasets more than ENTSO-E transparency, but I am not sure why. At least it's not a single-country outlier; the differences are quite even across countries.
EESI and MASTR give energy and power capacity.
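Having both energy capacity (MWh) and power capacity (MW) makes it possible to derive a storage unit's duration at full power. A tiny sketch; the function name and units are assumptions and not columns or helpers of the EESI or MASTR importers.

```python
def storage_duration_hours(energy_mwh: float, power_mw: float) -> float:
    """Hours a storage unit can discharge at full power: E / P."""
    if power_mw <= 0:
        raise ValueError("power capacity must be positive")
    return energy_mwh / power_mw
```

A 100 MW battery with 400 MWh of energy capacity, for instance, has a duration of 4 hours.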
It's similar in all countries. I don't think we have double-counting, as I was quite careful to select the fully included sources for wind, solar and hydro. It can be larger because I take all of MASTR plus GEM (without Germany). Also, the latest ENTSO-E statistics for GB (wind-heavy) are outdated, dating from 2021... who knows why? ;) This explains a surplus of about 7 GW: https://ourworldindata.org/grapher/cumulative-installed-wind-energy-capacity-gigawatts?country=~GBR
All datasets except MASTR have a threshold of 1 MW, which excludes distributed solar.
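The source selection described above (all of MASTR, plus GEM without Germany) can be sketched on plain records. The record layout with `Name`, `Country` and `Capacity` keys and the function itself are assumptions for illustration, not part of powerplantmatching.

```python
from typing import Any, Dict, List

Plant = Dict[str, Any]


def select_fully_included(mastr: List[Plant], gem: List[Plant]) -> List[Plant]:
    """Combine all MASTR units with GEM plants outside Germany (sketch)."""
    # Keep every MASTR unit: it is the only source that also lists
    # small distributed units below 1 MW, such as rooftop PV.
    selected = list(mastr)
    # Take GEM for all other countries so German plants are not counted twice.
    selected += [plant for plant in gem if plant["Country"] != "Germany"]
    return selected
```

Because only MASTR reaches below 1 MW, German totals can include distributed solar that the other countries' totals cannot.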
Yes, the numbers are all over the place.
Yes, I apply the same filtering, but the PR version may contain some data updates on phase-outs etc.
It shows all power plants, including long-retired ones. It's up to the user to filter out retired power plants, since they may want to reproduce a historical year. Belarus is not included.
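Reproducing a given year then amounts to keeping plants commissioned by that year and not yet retired. A minimal sketch, assuming `DateIn`/`DateOut` hold commissioning and retirement years (`None` meaning unknown or still operating) and that a plant retiring in a year no longer counts for it; both are conventions chosen here, and the function is hypothetical.

```python
from typing import Any, Dict, List

Plant = Dict[str, Any]


def operating_in(plants: List[Plant], year: int) -> List[Plant]:
    """Keep plants that are in operation in `year` (sketch).

    Plants with unknown DateIn are kept; plants with DateOut <= year
    are treated as retired.
    """
    result = []
    for plant in plants:
        date_in, date_out = plant.get("DateIn"), plant.get("DateOut")
        if date_in is not None and date_in > year:
            continue  # not commissioned yet
        if date_out is not None and date_out <= year:
            continue  # already retired
        result.append(plant)
    return result
```

On a pandas DataFrame, the same two conditions can be combined into a boolean mask.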
The Faroe Islands are not covered in the list of countries, just like Malta, Cyprus and Iceland.
No, there are very few utility-scale solar projects in Norway (according to GEM): https://globalenergymonitor.org/projects/global-solar-power-tracker/tracker-map/ The one 1 MW solar project is actually there :)
Kind of, but it's also simply the dominant country for biogas. The other datasets do not offer this distinction.
Co-authored-by: Johannes HAMPP <[email protected]>
closes #245
closes #241
closes #229
closes #161
closes #233
closes #215
closes #259
closes #214
closes #59
closes #260
Changes
- Updated Marktstammdatenregister data for Germany from open-MaStR (February 25, 2025) <https://zenodo.org/records/14783581>.
- Dropped support for Python 3.9 and added support for Python 3.13. The minimum required Python version is now 3.10.
- Added GeoNuclearData dataset as `pm.data.GND()`.
- Added European Energy Storage Inventory dataset as `pm.data.EESI()`.
- Updated ENTSOE, BEYONDCOAL, JRC, IRENASTAT and the Global Energy Monitor datasets to the latest versions.
- Fixed the distinction of hydro technologies and between offshore and onshore wind in `pm.data.MASTR()`. Storage technologies are now also read in.
- Improved recognition of CHP power plants.
- In Global Energy Monitor datasets, also read entries below the capacity threshold.
- In `pm.data.GCPT()`, added an estimate for coal plant efficiency.
- Include mothballed gas, oil and coal power plants.
- Added option to retain blocks for subsets of fuel types (e.g. `clean_name: fueltypes_with_blocks: ['Nuclear']`).
- For fully included datasets, added option to only aggregate units included in the matching process (e.g. `aggregate_only_matching_sources: ['MASTR']`).
- Added option for multiprocessing when aggregating units of non-matched power plants (e.g. `threads_extend_by_non_matched: 16`).
- Updated the matching logic configuration.
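The `fueltypes_with_blocks` option suggests that block designations stay in plant names for the listed fuel types and are stripped otherwise. A hedged sketch of that behaviour; the regex, the function signature and the `("Nuclear",)` default are assumptions for illustration, not the actual `cleaning.py` implementation.

```python
import re
from typing import Iterable

# Hypothetical pattern for trailing block/unit designations such as "Block C" or "Unit 2".
BLOCK_PATTERN = re.compile(r"\s+(?:block|unit)\s+\w+$", flags=re.IGNORECASE)


def clean_name(name: str, fueltype: str, fueltypes_with_blocks: Iterable[str] = ("Nuclear",)) -> str:
    """Strip a trailing block designation unless the fuel type keeps blocks."""
    if fueltype in set(fueltypes_with_blocks):
        # Keep blocks, e.g. so individual reactors remain distinguishable.
        return name.strip()
    return BLOCK_PATTERN.sub("", name).strip()
```

Keeping blocks for a subset of fuel types lets, for example, individual nuclear reactors remain separate entries while other units are still aggregated by plant name.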
Updates
- entsoe-py
- BNetzA (https://www.bundesnetzagentur.de/EN/Areas/Energy/SecurityOfSupply/GeneratingCapacity/PowerPlantList/start.html): does not make sense, as newer versions only source from MaStR, which is already covered.
- ~/Downloads/IRENASTAT_capacities_2000-2024.csv
Comparison for reference year 2024 (figures):
- Europe
- Germany (note: ENTSO-E statistics don't distinguish solid biomass/biogas)
- Difference for Europe between this PR and `master` (note that ENTSO-E statistics are also known to be inaccurate)
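The difference comparison between this PR and `master` boils down to subtracting aggregated capacities per fuel type. A minimal sketch on plain dictionaries, assuming capacities are already summed per fuel type; the function is hypothetical and any inputs are placeholders, not figures from the plots.

```python
from typing import Dict


def capacity_diff(new: Dict[str, float], old: Dict[str, float]) -> Dict[str, float]:
    """Per-fuel-type capacity difference new - old; missing types count as 0."""
    fueltypes = sorted(set(new) | set(old))
    return {ft: new.get(ft, 0.0) - old.get(ft, 0.0) for ft in fueltypes}
```

Treating missing fuel types as zero makes newly added categories (such as storage from EESI) show up as pure gains rather than being dropped.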