The Dutch government makes all data from the general elections (Tweede Kamerverkiezingen) available. For example:
- Candidate list of the elections in 2025: kiesraad.nl
- Election results of the elections in 2025: kiesraad.nl
The problem: the format is quite horrible, because these data are not created with analysis in mind, but rather reliability, trustworthiness, and auditability on a local scale.
The goal of this repository is to create an efficient analysis-ready relational database, which can be used for all kinds of election analysis, focussing on political parties, municipalities, or candidates. The smallest unit of analysis is the "stembureau" which is the locations where voting takes place. This data is also ready to be connected to external data sources through clear ID variables, efficient subtables.
Warning
This is work in progress, the database was not validated; the data pipeline is provided as-is. There may be several inaccuracies with this data, please do your own consistency / accuracy checks!
This enables plotting and analysis!
The database was created using duckdb, a fast and portable analytics database software. The latest version of the database is available on the releases page. The database can be queried directly in the duckdb CLI, but it also has excellent support in other programming languages e.g., R or python, so it's easy to integrate into a standard analysis workflow:
library(tidyverse)
library(duckdb)
dbc <- dbConnect(duckdb("votes_tk2025.duckdb", read_only = TRUE))
# note that these tbls are "lazy"
vote_tbl <- tbl(dbc, "vote")
party_tbl <- tbl(dbc, "party")
# computation of this join and sort happens entirely in duckdb
left_join(vote_tbl, party_tbl, by = join_by(party_id == id)) |>
arrange(bureau_id, party_id, candidate_nr) |>
head(100) |>
collect()# A tibble: 100 × 7
kieskring_id gemeente_id bureau_id party_id candidate_nr votes name
<chr> <chr> <chr> <int> <int> <int> <chr>
1 1 0014 0014::SB100 1 1 37 PVV (Partij voor de Vrijheid)
2 1 0014 0014::SB100 1 2 1 PVV (Partij voor de Vrijheid)
3 1 0014 0014::SB100 1 3 2 PVV (Partij voor de Vrijheid)
4 1 0014 0014::SB100 1 4 1 PVV (Partij voor de Vrijheid)
5 1 0014 0014::SB100 1 5 0 PVV (Partij voor de Vrijheid)
6 1 0014 0014::SB100 1 6 4 PVV (Partij voor de Vrijheid)
7 1 0014 0014::SB100 1 7 0 PVV (Partij voor de Vrijheid)
8 1 0014 0014::SB100 1 8 0 PVV (Partij voor de Vrijheid)
9 1 0014 0014::SB100 1 9 0 PVV (Partij voor de Vrijheid)
10 1 0014 0014::SB100 1 10 1 PVV (Partij voor de Vrijheid)
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
The available tables in the database are the following:
bureaucontains location and postcode info of the "stembureaus"candidatecontains candidate information (party, name, gender, ...) for each "kieskring"gemeentelinks municipality ID to its namekieskringlinks the kieskring (electoral region) to its namepartycontains the names and IDs of the political partiespostcodecontains some information with spatial coordinates for each postal code areavotecontains the number of votes cast at each "stembureau" for each candidate
The main analytical table is prepared as a view using the abovementioned tables as its source. The main table has the following columns:
> duckdb votes.duckdb -c "SHOW main;"┌──────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ kieskring_id │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ kieskring │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ gemeente_id │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ gemeente │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ bureau_id │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ location │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ postcode │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ party_id │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ party │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ candidate_nr │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ candidate_id │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ first_name │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ infix │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ last_name │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ gender │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ language │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ votes │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ party_votes │ HUGEINT │ YES │ NULL │ NULL │ NULL │
│ bureau_votes │ HUGEINT │ YES │ NULL │ NULL │ NULL │
│ x_centroid │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ y_centroid │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ population │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ area_m2 │ DOUBLE │ YES │ NULL │ NULL │ NULL │
├──────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┤
│ 23 rows 6 columns │
└────────────────────────────────────────────────────────────────────┘
The first and last columns of the first few rows look like this:
> duckdb votes.duckdb -c "select * from main limit 6;"┌──────────────┬───────────┬─────────────┬───────────┬─────────────┬──────────┬───┬───────┬─────────────┬──────────────┬────────────────────┬───────────────────┬────────────┬───────────────────┐
│ kieskring_id │ kieskring │ gemeente_id │ gemeente │ bureau_id │ location │ . │ votes │ party_votes │ bureau_votes │ x_centroid │ y_centroid │ population │ area_m2 │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ │ int32 │ int128 │ int128 │ double │ double │ int32 │ double │
├──────────────┼───────────┼─────────────┼───────────┼─────────────┼──────────┼───┼───────┼─────────────┼──────────────┼────────────────────┼───────────────────┼────────────┼───────────────────┤
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 3 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 3 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 1 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 1 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 42 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
│ 1 │ Groningen │ 0014 │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │ 0 │ 42 │ 1074 │ 237557.97857383578 │ 584304.9422795495 │ 25 │ 5945.412247211905 │
├──────────────┴───────────┴─────────────┴───────────┴─────────────┴──────────┴───┴───────┴─────────────┴──────────────┴────────────────────┴───────────────────┴────────────┴───────────────────┤
│ 6 rows 23 columns (13 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
The schema of the actual tables in the database looks like so. Arrows have been added manually to indicate which tables can be joined. There are no formal primary and foreign keys in the database because it hampers performance.
To create the database, you need a recent version of R with the following packages:
install.packages(c("tidyverse", "emld", "digest", "nanoparquet", "sf"))And you need to install the duckdb CLI: duckdb.org/install.
Go to the file raw_data/source.txt and download the source files from the URLs there. Specifically, download the municipal .eml.xml files for the actual votes, the .csv file for the candidates, and name the postal code information pc6.gpkg.
After downloading the source files, the repo should look like this:
📁 election_database/
├── 📖 README.md
├── LICENSE
├── .gitattributes
├── .gitignore
├── 📄 01_eml_to_votes.R
├── 📄 02_votes_to_parquet.R
├── 📄 03_model.R
├── 📄 03_query.R
├── 📄 04_create_duckdb.sql
├── voting_analysis.Rproj
├── 📁 raw_data/
│ ├── .gitignore
│ ├── 📜 Overzicht+Kandidatenlijsten_TK2025_csv.csv
│ ├── 📦 pc6.gpkg
│ ├── source.txt
│ └── 📁 Gemeente tellingen/
│ ├── 📜 Telling_TK2025_gemeente_Aa_en_Hunze.eml.xml
│ ├── 📜 Telling_TK2025_gemeente_Aalsmeer.eml.xml
│ ├── ...
│ ├── 📜 Telling_TK2025_openbaar_lichaam_Saba.eml.xml
│ └── 📜 Telling_TK2025_openbaar_lichaam_Sint_Eustatius.eml.xml
├── 📁 img/
│ ├── 📊 schema.png
│ ├── schema.drawio
│ └── 📊 votes_per_party.png
└── 📁 processed_data/
# made with J535D165/scitree
First, parse and extract the information we need from the municipal EML files:
Rscript 01_eml_to_votes.RThen, convert the results to parquet files of each individual table:
Rscript 02_votes_to_parquet.RThe processed_data folder should now look like this:
📁 processed_data/
├── 📁 gemeente/
│ ├── 📦 0014.rds
│ ├── 📦 0034.rds
│ ├── 📦 0037.rds
│ ├── ...
│ ├── 📦 1992.rds
│ ├── 📦 9001.rds
│ └── 📦 9010.rds
└── 📁 votes_data/
├── 📦 bureau.parquet
├── 📦 candidate.parquet
├── 📦 gemeente.parquet
├── 📦 kieskring.parquet
├── 📦 party.parquet
├── 📦 postcode.parquet
└── 📦 vote.parquet
Last, load the resulting parquet files and create the main view using duckdb:
duckdb < 03_create_duckdb.sqlthe file votes.duckdb should now appear in the processed_data folder.
This is a project by the ODISSEI Social Data Science team. Do you have questions, suggestions, or remarks on the technical implementation? Create an issue in the issue tracker or feel free to contact Erik-Jan van Kesteren.

