Analysis-ready election votes database

The Dutch government makes all data from the general elections (Tweede Kamerverkiezingen) available. For example:

Candidate list of the elections in 2025: kiesraad.nl
Election results of the elections in 2025: kiesraad.nl

The problem: the format is quite horrible, because these data are not created with analysis in mind, but rather reliability, trustworthiness, and auditability on a local scale.

The goal of this repository is to create an efficient analysis-ready relational database, which can be used for all kinds of election analysis, focussing on political parties, municipalities, or candidates. The smallest unit of analysis is the "stembureau" which is the locations where voting takes place. This data is also ready to be connected to external data sources through clear ID variables, efficient subtables.

Warning

This is work in progress, the database was not validated; the data pipeline is provided as-is. There may be several inaccuracies with this data, please do your own consistency / accuracy checks!

This enables plotting and analysis!

Using the database

The database was created using duckdb, a fast and portable analytics database software. The latest version of the database is available on the releases page. The database can be queried directly in the duckdb CLI, but it also has excellent support in other programming languages e.g., R or python, so it's easy to integrate into a standard analysis workflow:

library(tidyverse)
library(duckdb)

dbc <- dbConnect(duckdb("votes_tk2025.duckdb", read_only = TRUE))

# note that these tbls are "lazy"
vote_tbl  <- tbl(dbc, "vote") 
party_tbl <- tbl(dbc, "party")

# computation of this join and sort happens entirely in duckdb
left_join(vote_tbl, party_tbl, by = join_by(party_id == id)) |> 
    arrange(bureau_id, party_id, candidate_nr) |>
    head(100) |> 
    collect()

# A tibble: 100 × 7  
   kieskring_id gemeente_id bureau_id   party_id candidate_nr votes name                         
   <chr>        <chr>       <chr>          <int>        <int> <int> <chr>                        
 1 1            0014        0014::SB100        1            1    37 PVV (Partij voor de Vrijheid)
 2 1            0014        0014::SB100        1            2     1 PVV (Partij voor de Vrijheid)
 3 1            0014        0014::SB100        1            3     2 PVV (Partij voor de Vrijheid)
 4 1            0014        0014::SB100        1            4     1 PVV (Partij voor de Vrijheid)
 5 1            0014        0014::SB100        1            5     0 PVV (Partij voor de Vrijheid)
 6 1            0014        0014::SB100        1            6     4 PVV (Partij voor de Vrijheid)
 7 1            0014        0014::SB100        1            7     0 PVV (Partij voor de Vrijheid)
 8 1            0014        0014::SB100        1            8     0 PVV (Partij voor de Vrijheid)
 9 1            0014        0014::SB100        1            9     0 PVV (Partij voor de Vrijheid)
10 1            0014        0014::SB100        1           10     1 PVV (Partij voor de Vrijheid)
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows

Tables

The available tables in the database are the following:

bureau contains location and postcode info of the "stembureaus"
candidate contains candidate information (party, name, gender, ...) for each "kieskring"
gemeente links municipality ID to its name
kieskring links the kieskring (electoral region) to its name
party contains the names and IDs of the political parties
postcode contains some information with spatial coordinates for each postal code area
vote contains the number of votes cast at each "stembureau" for each candidate

Analysis view

The main analytical table is prepared as a view using the abovementioned tables as its source. The main table has the following columns:

> duckdb votes.duckdb -c "SHOW main;"

┌──────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name  │ column_type │  null   │   key   │ default │  extra  │
│   varchar    │   varchar   │ varchar │ varchar │ varchar │ varchar │
├──────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ kieskring_id │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ kieskring    │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ gemeente_id  │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ gemeente     │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ bureau_id    │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ location     │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ postcode     │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ party_id     │ INTEGER     │ YES     │ NULL    │ NULL    │ NULL    │
│ party        │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ candidate_nr │ INTEGER     │ YES     │ NULL    │ NULL    │ NULL    │
│ candidate_id │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ first_name   │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ infix        │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ last_name    │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ gender       │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ language     │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ votes        │ INTEGER     │ YES     │ NULL    │ NULL    │ NULL    │
│ party_votes  │ HUGEINT     │ YES     │ NULL    │ NULL    │ NULL    │
│ bureau_votes │ HUGEINT     │ YES     │ NULL    │ NULL    │ NULL    │
│ x_centroid   │ DOUBLE      │ YES     │ NULL    │ NULL    │ NULL    │
│ y_centroid   │ DOUBLE      │ YES     │ NULL    │ NULL    │ NULL    │
│ population   │ INTEGER     │ YES     │ NULL    │ NULL    │ NULL    │
│ area_m2      │ DOUBLE      │ YES     │ NULL    │ NULL    │ NULL    │
├──────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┤
│ 23 rows                                                  6 columns │
└────────────────────────────────────────────────────────────────────┘

The first and last columns of the first few rows look like this:

> duckdb votes.duckdb -c "select * from main limit 6;"

┌──────────────┬───────────┬─────────────┬───────────┬─────────────┬──────────┬───┬───────┬─────────────┬──────────────┬────────────────────┬───────────────────┬────────────┬───────────────────┐
│ kieskring_id │ kieskring │ gemeente_id │ gemeente  │  bureau_id  │ location │ . │ votes │ party_votes │ bureau_votes │     x_centroid     │    y_centroid     │ population │      area_m2      │
│   varchar    │  varchar  │   varchar   │  varchar  │   varchar   │ varchar  │   │ int32 │   int128    │    int128    │       double       │      double       │   int32    │      double       │
├──────────────┼───────────┼─────────────┼───────────┼─────────────┼──────────┼───┼───────┼─────────────┼──────────────┼────────────────────┼───────────────────┼────────────┼───────────────────┤
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │           3 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │           3 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │           1 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │           1 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │          42 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
│ 1            │ Groningen │ 0014        │ Groningen │ 0014::SB443 │ Kajuit 4 │ . │     0 │          42 │         1074 │ 237557.97857383578 │ 584304.9422795495 │         25 │ 5945.412247211905 │
├──────────────┴───────────┴─────────────┴───────────┴─────────────┴──────────┴───┴───────┴─────────────┴──────────────┴────────────────────┴───────────────────┴────────────┴───────────────────┤
│ 6 rows                                                                                                                                                                   23 columns (13 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Full schema

The schema of the actual tables in the database looks like so. Arrows have been added manually to indicate which tables can be joined. There are no formal primary and foreign keys in the database because it hampers performance.

Creating the database

Prerequisites

To create the database, you need a recent version of R with the following packages:

install.packages(c("tidyverse", "emld", "digest", "nanoparquet", "sf"))

And you need to install the duckdb CLI: duckdb.org/install.

Download the source files

Go to the file raw_data/source.txt and download the source files from the URLs there. Specifically, download the municipal .eml.xml files for the actual votes, the .csv file for the candidates, and name the postal code information pc6.gpkg.

After downloading the source files, the repo should look like this:

📁 election_database/
├── 📖 README.md
├── LICENSE
├── .gitattributes
├── .gitignore
├── 📄 01_eml_to_votes.R
├── 📄 02_votes_to_parquet.R
├── 📄 03_model.R
├── 📄 03_query.R
├── 📄 04_create_duckdb.sql
├── voting_analysis.Rproj
├── 📁 raw_data/
│   ├── .gitignore
│   ├── 📜 Overzicht+Kandidatenlijsten_TK2025_csv.csv
│   ├── 📦 pc6.gpkg
│   ├── source.txt
│   └── 📁 Gemeente tellingen/
│       ├── 📜 Telling_TK2025_gemeente_Aa_en_Hunze.eml.xml
│       ├── 📜 Telling_TK2025_gemeente_Aalsmeer.eml.xml
│       ├── ...
│       ├── 📜 Telling_TK2025_openbaar_lichaam_Saba.eml.xml
│       └── 📜 Telling_TK2025_openbaar_lichaam_Sint_Eustatius.eml.xml
├── 📁 img/
│   ├── 📊 schema.png
│   ├── schema.drawio
│   └── 📊 votes_per_party.png
└── 📁 processed_data/

# made with J535D165/scitree

Run the pre-processing pipeline

First, parse and extract the information we need from the municipal EML files:

Rscript 01_eml_to_votes.R

Then, convert the results to parquet files of each individual table:

Rscript 02_votes_to_parquet.R

The processed_data folder should now look like this:

📁 processed_data/
├── 📁 gemeente/
│   ├── 📦 0014.rds
│   ├── 📦 0034.rds
│   ├── 📦 0037.rds
│   ├── ...
│   ├── 📦 1992.rds
│   ├── 📦 9001.rds
│   └── 📦 9010.rds
└── 📁 votes_data/
    ├── 📦 bureau.parquet
    ├── 📦 candidate.parquet
    ├── 📦 gemeente.parquet
    ├── 📦 kieskring.parquet
    ├── 📦 party.parquet
    ├── 📦 postcode.parquet
    └── 📦 vote.parquet

Last, load the resulting parquet files and create the main view using duckdb:

duckdb < 03_create_duckdb.sql

the file votes.duckdb should now appear in the processed_data folder.

Contact

This is a project by the ODISSEI Social Data Science team. Do you have questions, suggestions, or remarks on the technical implementation? Create an issue in the issue tracker or feel free to contact Erik-Jan van Kesteren.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Analysis-ready election votes database

Using the database

Tables

Analysis view

Full schema

Creating the database

Prerequisites

Download the source files

Run the pre-processing pipeline

Contact

About

Uh oh!

Releases 3

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
img		img
processed_data		processed_data
raw_data		raw_data
.gitattributes		.gitattributes
.gitignore		.gitignore
01_eml_to_votes.R		01_eml_to_votes.R
02_votes_to_parquet.R		02_votes_to_parquet.R
03_create_duckdb.sql		03_create_duckdb.sql
04_query.R		04_query.R
05_analyze.R		05_analyze.R
LICENSE		LICENSE
README.md		README.md
election_database.Rproj		election_database.Rproj

License

sodascience/election_database

Folders and files

Latest commit

History

Repository files navigation

Analysis-ready election votes database

Using the database

Tables

Analysis view

Full schema

Creating the database

Prerequisites

Download the source files

Run the pre-processing pipeline

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Languages