Czech FOI Simulation Analysis
Investigates whether there is a reliable statistical way to determine the dAEFI rate when the baseline is unknown (real world).
As far as I know, this (vital) problem is still waiting for the head that can solve it?
Simulates dAEFIs to analyse the impact on the curve and back-calculate the dAEFIs rate (comparing known and unknown baseline). Uses real Czech FOI (Freedom of Information) data, or generates d, dvx, duvx data in modulated sine wave form.
Simulated data can be used to check for calculation errors in your code, it is possible to create a CSV file with the data of all Plot curves (from day 1-1534).
The Python Scripts process and visualize CSV data from the TERRA folder, generating interactive HTML plots.
Each plot compares two age groups. To interact with the plots, click on a legend entry to show/hide curves.
Refactored Scripts AF) and AG) compare AG groups (e.g., 1-year intervals) by calculating differences between closely positioned age groups. The differences are summed, and simulated dAEFIs are added to examine the curves with and without dAEFIs. Multiple AG groups are plotted into a single HTML file for comparison
Download the processed plots for analysis from the Plot Results Folder. Or simply adapt and run the Python script to meet your own analysis requirements!
Dates are counted as the number of days since January 1, 2020, for easier processing. "AGE_2023" represents age on January 1, 2023.
The data can optionally be normalized per 100,000 for comparison.
Access the original Czech FOI data from a Freedom of Information request. To learn how the Pivot CSV files in the TERRA folder were created, see the wiki
Abbreviations: The figures are per age group from the CSV files in the TERRA folder:
Deaths | Definition | Population/Doses | Definition |
---|---|---|---|
NUM_D | Number deaths | NUM_POP | Total people |
NUM_DUVX | Number unvaxed deaths | NUM_UVX | Number of unvaxed people |
NUM_DVX | Number vaxed deaths | NUM_VX | Number of vaxed people |
NUM_DVD1-DVD7 | Number deaths doses 1 - 7 | NUM_VD1-VD7 | Number of vax doses 1 - 7 |
NUM_DVDA | Number deaths from all doses | NUM_VDA | Total number of all vax doses (sum) |
dAEFI | simulated death Adverse Events following imunis. |
DoWhy Analysis of Causal Impact Estimates
Phyton script AI) dowhy diff all-agegrp-in-same-plot
Uses the DoWhy Library https://github.com/py-why/dowhy
DoWhy is a Python library for causal inference that allows modeling and testing of causal assumptions, based on a unified language for causal inference. See the book Models, Reasoning, and Inference by Judea Pearl for deeper insights, that goes far beyond my horizon.
mean (RAW) blue actual raw data averages of doses and deaths for an age group, along with estimated causal impact of doses on deaths by DoWhy.
mean (AEF) red simulation of one additional death per 5000 doses, added to the raw data, along with estimated causal impact of doses on deaths by DoWhy.
DoWhy causal impact estimates: comparing real (RAW) data in green with simulated (AEF) data in yellow, where one additional death per 5000 doses is simulated, for age group 15-84
The difference between DoWhy's estimate of the causal effect of the simulated data (AEF) and the real data (RAW), converted into the number of doses per death.
DoWhy's estimate of the causal effect per number of doses is fairly close to the simulation of 5000 doses per additional death.
Phase diagram of absolute values: D_Curve to Doses_curve for days 0-1533 and age group 15-84 (RAW, AEF)
Download html
Below is the same simulation, but instead of using the absolute values of doses and deaths for each age group, it shows the differences in doses and deaths between neighboring age groups at one-year intervals. This method helps to cancel out external disturbances.
DoWhy causal impact estimates: comparing real (RAW) data in green with simulated (AEF) data in yellow, where one additional death per 5000 doses is simulated, for age group 15-84
The difference between DoWhy's estimate of the causal effect of the simulated data (AEF) and the real data (RAW), converted into the number of doses per death.
DoWhy's estimate of the causal effect per number of doses is fairly close to the simulation of 5000 doses per additional death.
Phase diagram difference values: D_Curve and Doses_curve between neighboring age groups at one-year intervals, for days 0-1533 and age group 15-84 (RAW, AEF)
Download html
Phyton script AX) dowhy diff all-agegrp-in-same-plot
Below is the same simulation. Instead of comparing the real curve with the simulated curve, I compared the real absolute data (RAW) of two neighboring age groups. The population at one-year intervals should be quite similar, thus minimizing possible confounders
DoWhy causal impact estimates: comparing real absolute (RAW) data in yellow with the absolute (RAW) data of neigbouhr age group one year appart, for age group 15-84
The difference between DoWhy's estimate of the causal effect of the absolute real data (RAW) of two neigboughr age groups, converted into the number of doses per death for age group 15-84
DoWhy's estimate of the causal effect per number of doses, the outliers are not visible, because zoomed in
Phase diagram difference values: D_Curve and Doses_curve between neighboring age groups at one-year intervals, for days 0-1533 and age group 15-84 (RAW, AEF)
Download html
Phyton script AY) dowhy diff all-agegrp-in-same-plot
Here Instead of comparing the real curve with the simulated curve, I compared the real (RAW) difference of Doses_Curve and Death_Curve for two neighboring age groups. This method helps to cancel out external disturbances. The population at one-year intervals should be quite similar, thus minimizing possible confounders
DoWhy causal impact estimates: comparing death and doses difference in real (RAW) data between two neigbour age groups one year appart, for age group 15-84
The difference between DoWhy's estimate of the causal effect of difference in doses deaths of real data (RAW) between two neigbour age groups, converted into the number of doses per death.
DoWhy's estimate of the causal effect per number of doses, the outliers are not visible, because zoomed in
Phase diagram difference values: D_Curve and Doses_curve between neighboring age groups at one-year intervals, for days 0-1533 and age group 15-84 (RAW, AEF)
Download html
Phyton script AZ) dowhy diff all-agegrp-in-same-plot
dAEFI simulation known Basline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_50-54
If the baseline is known (which is not the case in practice), the estimated dAEFIs per dose are quite accurate, e.g., 4408 vs. 5000. .
dAEFI simulation known Basline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_75-79
The estimated dAEFIs per dose, e.g., 4179 vs. 5000. .
dAEFI simulation unknown Basline real world.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_50-54
If the baseline is unknown (which is the case in practice), the estimated dAEFI per dose are not reliable , e.g., 136 vs. 5000. .
dAEFI simulation unknown Basline (real world).
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_75-79
The estimated dAEFIs per dose, e.g., 39 vs. 5000. .
D, DVX, DUVX plots.
Added dAEFIs (1/5000 Doses) vs non added AEFIs: AG_50-54 vs 75-79
As you can see, the added dAEFIs have little impact on the top D-curves for age group 75-79, making it hard to detect a signal without knowing the baseline.
I struggled to find a reliable method to back-calculate the dAEFIs ratio using only the moving average as the baseline (real world). This is particularly true for the older age groups.
_________________________________________
dAEFI simulation known Basline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_54-59
Simulation of sinus curves for D, DVX, and DUVX, adding one dAEFI per 5000 doses in a random 1-250 day window after dose . The estimated dAEFIs per dose, e.g., 5138 vs. 5000 - if basline is known.
dAEFI simulation unknown Basline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_54-59
The legend label "pr" calculates the mortality curves for D, DUVX, and DVX, assuming that the vx and uvx populations have the same hypothetical mortality probability (distribution), see upper part of the first plot.
The lower part of the first plot shows the result of the normalized mortality curves (deaths/100,000 people - legend label "n"). Since the mortality probability for D, VX, and UVX is assumed to be identical, the normalized curves overlap.
Additionally, 1/5000 dAEFIs are added (legend label "ae")
The estimated dAEFIs per dose, e.g., 388 vs. 5000 - if basline is unknown.
Phyton script AC) calc dAEFI diff all-agegrp-in-same-plot
The script compares age groups in 1-year intervals.
The idea is that the two populations, which are one year apart, can be considered comparable.
It calculates the difference in normalized death rates (per 100,000 people) and takes a rolling average of this difference as the baseline. It also calculates the difference in normalized doses administered (per 100,000 doses). However, a reliable and accurate method for calculating estimated dAEFIs has not yet been found.
Can also calculate rolling and phaseshift correlation
For Database and CSV File creation in the Terra folder All AG SQL Time.sql was used.
DIF-VDA n all AgeGroups
Shows the estimated mean dAEFI values for AG 1 to 113.
Shows the normalized DIF-VDA (All Doses difference for all AG)
DIF-VDA Basline Mean estimate dAEFI n - Some examples of different AGs
For AG 13-14
For AG 14-15
For AG 17-18
For AG 42-43
For AG 71-72
For AG 81-82
For AG 107-108
Refactored Scripts AF)
The Python script calculates the differences in doses and deaths for similar age bands (one year appart), as specified in the age_band_compare
list. It then summarizes the differences and adds dAEFIs (one per 5,000 doses). Additionally, it compares the rolling and shift correlations of the raw D-curve with the D-curve that includes the added dAEFI events.
With one dAEFI per 5,000 doses, there is no significant change in the D-curves, including the rolling Pearson correlation, making it irrelevant in this context. Although the amplitude of the phase shift correlation has changed significantly, this is not helpful since the baseline is unknown
Zoomed in to highlight the minimal difference at 1/5,000 doses.
Download html
Refactored Scripts AG)
This Python script employs a different approach but produces results similar to those of the AF script. It calculates the rolling Pearson correlation based on changes in cumulative doses, revealing a strong correlation. However, this correlation is not relevant in the context of rare dAEFIs. Additionally, although the amplitude of the phase shift correlation changes significantly, this information is not useful without a known baseline
- Python 3.12.5 to run the scripts.
- Visual Studio Code 1.92.2 to edit and run scripts.
- Optional - DB Browser for SQLite 3.13.0 for database creation, SQL queries, and CSV export.
The results have not been checked for errors. Neither methodological nor technical checks or data cleansing have been performed.