FAERS

The purpose of this repository is the following:

Download files from FAERS and prepare the data for import into a PostgreSQL database.
Obtain the PRR (proportional reporting ratio) of a specific group of drugs, such as antidepressants, as either local values (compared only against the specified group, for example against antidepressants only) or global values (compared against all the drugs in FAERS).
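
For reference, the PRR is conventionally computed from a 2x2 table of report counts. The sketch below illustrates that standard formula only; it is not the code in prr_polars.py, and the function name and all counts are hypothetical. The difference between a local and a global PRR is which reports supply the comparison counts c and d.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the drug of interest and the adverse event
    b: reports with the drug of interest without the adverse event
    c: reports with the comparison drugs and the adverse event
    d: reports with the comparison drugs without the adverse event
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only.
# Local: the comparison counts come from the other drugs in the chosen group.
local_prr = prr(a=20, b=180, c=50, d=950)
# Global: the comparison counts come from all other drugs in FAERS.
global_prr = prr(a=20, b=180, c=400, d=19600)
```

With these made-up counts the global PRR comes out larger than the local one, which is the pattern to expect when the adverse event is common within the drug group but rare across FAERS as a whole.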

Author : Allan Ken Miyazono Ushijima
email : [email protected]

Requirements

20 GB of storage
32 GB of memory
Linux or another OS capable of running bash (WSL, macOS). The following programs are needed on Linux:
unzip
To install unzip on Debian or Ubuntu, run the following command:
sudo apt install unzip
wget
curl
Necessary libraries:
- polars (used in v2)
- pandas
- numpy

Versions

v1: Used data obtained in February 2024. In this version the scripts only clean the data and upload it to PostgreSQL.
v2: To allow people to access the data without PostgreSQL, we use polars to obtain the PRR (prr_polars.py). With polars, the processing time is shorter than with the script sql_prr.py.

Processing

We took the files from 2014 Q3 to 2024 Q4 (v2). This range lets us use the field "prod_ai" (product active ingredient) and reduces the need to normalize drug names. To change the date range, modify the variables start_year, start_quarter, end_year, and end_quarter in the download_FAERS.sh script.
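
To make the range concrete, the following is a minimal Python sketch of how the four variables define the set of quarters. It is an illustration only and does not reproduce the bash download script; the function name and the label format are hypothetical.

```python
def faers_quarters(start_year, start_quarter, end_year, end_quarter):
    """Return (year, quarter) pairs from the start to the end quarter, inclusive."""
    quarters = []
    year, quarter = start_year, start_quarter
    # Tuple comparison orders by year first, then by quarter.
    while (year, quarter) <= (end_year, end_quarter):
        quarters.append((year, quarter))
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return quarters


# The v2 range described above: 2014 Q3 through 2024 Q4.
quarters = faers_quarters(2014, 3, 2024, 4)
labels = [f"{year}Q{quarter}" for year, quarter in quarters]
```

For the v2 range this yields 42 quarters, which is a quick way to check that a download produced the expected number of quarterly archives.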

The steps to run the scripts are:

Create the conda environment:
conda env create --file environment.yml
Make the scripts executable:
chmod +x ./download_FAERS.sh ./script_python.sh ./upload_files.sh
Run the download script:
./download_FAERS.sh
Run the script that merges and cleans the csv files:
./script_python.sh
If you want to host a database with PostgreSQL (please check your server policies, such as user rights, before doing this step):
sudo -u postgres createdb faers postgres
sudo -u postgres psql faers < CREATE_TABLE.sql
Change the variables in the script upload_files.sh to match your database name and user, then run it:
./upload_files.sh
To obtain the PRR of a group of drugs, update Drugs.csv:
nano Drugs.csv
Run the polars script to obtain the PRR:
python prr_polars.py

Limitations

Our approach has some limitations, because we only modified the data as far as needed to obtain the PRR. The limitations we found are the following:

The dates cannot be parsed automatically because they appear in different formats: YYYYMMDD, YYYYMM, and YYYY. We did not process this field, to avoid increasing memory and storage use (either by splitting the dates into year, month, and day, or by parsing the full dates into another column).
In DEMO, the units of age vary. Some rows describe a person older than 30 years with the age given in days or months.
In DEMO, the units of weight vary, so the field cannot be processed directly. Some rows include the unit of mass in the value (example: 70 KG), so it is necessary to first extract the number and then convert it to the required unit of mass.
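
If you need these fields, the cleaning could look roughly like the sketch below. It is not part of this repository: the function names are hypothetical, the unit codes in the two dictionaries are assumptions that should be checked against the values in your DEMO file, and filling a missing month or day with 1 is an arbitrary choice.

```python
import re
from datetime import date
from typing import Optional

# Assumed unit codes; verify them against the values present in your data.
AGE_UNIT_IN_YEARS = {
    "DEC": 10.0, "YR": 1.0, "MON": 1 / 12,
    "WK": 7 / 365.25, "DY": 1 / 365.25, "HR": 1 / (24 * 365.25),
}
WEIGHT_UNIT_IN_KG = {"KG": 1.0, "KGS": 1.0, "LBS": 0.45359237, "GMS": 0.001}


def parse_faers_date(value: str) -> Optional[date]:
    """Parse YYYYMMDD, YYYYMM, or YYYY; a missing month or day becomes 1."""
    match = re.fullmatch(r"([0-9]{4})([0-9]{2})?([0-9]{2})?", value.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    try:
        return date(int(year), int(month or 1), int(day or 1))
    except ValueError:  # e.g. month 13 or day 31 in a 30-day month
        return None


def age_in_years(age: float, unit: str) -> Optional[float]:
    """Convert an age to years; unknown unit codes give None."""
    factor = AGE_UNIT_IN_YEARS.get(unit.strip().upper())
    return None if factor is None else age * factor


def weight_in_kg(text: str, default_unit: str = "KG") -> Optional[float]:
    """Split a value such as '70 KG' into number and unit, then convert to kg.

    Treating a value without a unit as kilograms is an assumption.
    """
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]*)", text.strip())
    if match is None:
        return None
    number, unit = match.groups()
    factor = WEIGHT_UNIT_IN_KG.get((unit or default_unit).upper())
    return None if factor is None else float(number) * factor
```

These helpers only convert units. They do not detect implausible combinations, such as an age above 30 recorded in days, which would still need a separate plausibility check.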

Scripts and Files

CREATE_TABLE.sql : This SQL script creates the table for PostgreSQL. This is not necessary if you are not planning to upload the dataset to PostgreSQL.
DB_drop_dup.py: This script deletes the duplicates found in DEMO, based on primaryid.
DB_merge.py: This script merges all the quarters' files into their respective file (DEMO, REAC, DRUG, INDI, OUTC, RPSR, THER).
download_FAERS.sh: This bash script downloads FAERS and extracts the necessary file (ASCII).
Drugs.csv: This csv file is used for the input of prr_polars.py and sql_prr.py. The csv contains a list of drugs that will be used to obtain their adverse effects along with their respective local and global PRRs.
dtype_FAERS_pl.json: This json file specifies the dtypes of the csv files, to prevent typing issues when importing them into dataframes.
environment.yml: This yml file helps install the proper Python libraries.
prr_polars.py: This script is used to obtain the PRR and adverse effects of a specific group of drugs listed in Drugs.csv.
script_python.sh: This bash script runs the python scripts that are necessary to merge and clean the csv files.
sql_prr.py (v1): This script (version 1) is used to obtain the PRR and adverse effects of a specific group of drugs listed in Drugs.csv by connecting to a PostgreSQL server.
upload_files.sh (v1): This script (version 1) is used to upload the csv files to a PostgreSQL server.
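
As an illustration of the duplicate removal described above, the sketch below keeps the first row seen for each primaryid. It is a plain-Python stand-in with hypothetical rows and a hypothetical function name; the repository's script may choose which duplicate to keep differently.

```python
def drop_duplicate_primaryids(rows):
    """Keep the first row seen for each primaryid, preserving input order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = row["primaryid"]
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical DEMO rows, for illustration only.
demo_rows = [
    {"primaryid": "1001", "sex": "F"},
    {"primaryid": "1002", "sex": "M"},
    {"primaryid": "1001", "sex": "F"},
]
deduplicated = drop_duplicate_primaryids(demo_rows)
```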

CSB-IG/FAERS
