forked from TAKMU/FAERS

Download files from FAERS, and prepare data to import to a Postgresql DB, and obtain PRR.

CSB-IG/FAERS

 
 

FAERS

The purpose of this repository is the following:

  • Download files from FAERS and prepare the data for import into a PostgreSQL DB.
  • Obtain the proportional reporting ratio (PRR) for a specific group of drugs, such as antidepressants, as either local values (compared only against the specified group, for example PRR relative to antidepressants only) or global values (compared against all the drugs in FAERS).
Author : Allan Ken Miyazono Ushijima
email : [email protected]

Requirements

  • 20 GB of storage
  • 32 GB of memory
  • Linux, or another OS capable of running bash (WSL, macOS), with the following programs installed:
    • unzip
      To install unzip on Debian/Ubuntu, run:
      sudo apt install unzip
    • wget
    • curl
  • Required Python libraries:
    • polars (v2)
    • pandas
    • numpy

Versions

  1. v1: Uses data obtained in February 2024. In this version the scripts only clean the data and upload it to PostgreSQL.
  2. v2: To allow access to the data without PostgreSQL, the PRR is computed with polars (prr_polars.py). Because it uses polars, processing is faster than with the script sql_prr.py.

Processing

We used the files from 2014 Q3 to 2024 Q4 (v2). This range was chosen to make use of the field "prod_ai" (product active ingredient), which reduces the need to normalize drug names. To change the date range, modify the variables start_year, start_quarter, end_year, and end_quarter in the download_FAERS.sh script.
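
As an illustration of what the date range covers, the following sketch enumerates the quarters between a start and an end quarter. It is not taken from the download script; the function name iter_quarters and the "2014Q3" label format are assumptions made for this example, and only the four variable names come from the text above.

```python
def iter_quarters(start_year, start_quarter, end_year, end_quarter):
    """Yield (year, quarter) pairs from the start to the end quarter, inclusive."""
    year, quarter = start_year, start_quarter
    while (year, quarter) <= (end_year, end_quarter):
        yield year, quarter
        quarter += 1
        if quarter > 4:
            # Roll over to the first quarter of the next year.
            year, quarter = year + 1, 1


# Default range described above: 2014 Q3 to 2024 Q4.
quarters = list(iter_quarters(2014, 3, 2024, 4))
labels = [f"{y}Q{q}" for y, q in quarters]  # label format is illustrative only
print(len(labels), labels[0], labels[-1])  # 42 2014Q3 2024Q4
```

With the default range this gives 42 quarterly releases, which is a quick way to check how many downloads to expect after changing the variables.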

The steps to run the scripts are:

  1. Create the conda environment:
    conda env create --file environment.yml
  2. Make the scripts executable:
    chmod +x ./download_FAERS.sh ./script_python.sh ./upload_files.sh
  3. Run the download script:
    ./download_FAERS.sh
  4. Run the script that merges and cleans the CSV files:
    ./script_python.sh
  5. Optionally, to host the database with PostgreSQL (check your server policies, such as user rights, before doing this step):
    sudo -u postgres createdb faers postgres
    sudo -u postgres psql faers < CREATE_TABLE.sql
    Then set your database name and user in upload_files.sh and run it:
    ./upload_files.sh
  6. To obtain the PRR for your group of drugs, update Drugs.csv:
    nano Drugs.csv
  7. Run the polars script to obtain the PRR:
    python prr_polars.py
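
For readers unfamiliar with the PRR, the sketch below shows the standard 2x2 contingency-table formula, PRR = [a / (a + b)] / [c / (c + d)]. It is not the code in prr_polars.py. The function name and all counts are hypothetical, and reading "local" versus "global" as a change in the comparator rows (c, d) is an assumption based on the description at the top of this README.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 contingency table.

    a: reports with the drug of interest and the event
    b: reports with the drug of interest, without the event
    c: reports with the comparator drugs and the event
    d: reports with the comparator drugs, without the event
    """
    drug_total = a + b
    comparator_total = c + d
    if drug_total == 0 or comparator_total == 0 or c == 0:
        # The ratio is undefined if either row is empty or the
        # comparator has no reports of the event.
        return math.nan
    return (a / drug_total) / (c / comparator_total)


# Hypothetical counts for one drug and one adverse event.
# Local: the comparator is the other drugs in the group (e.g. other antidepressants).
local_prr = prr(20, 80, 50, 450)
# Global: the comparator is all other drugs in FAERS.
global_prr = prr(20, 80, 1000, 99000)
print(round(local_prr, 2), round(global_prr, 2))
```

The same drug-event pair can therefore have quite different local and global values, because only the comparator counts change.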

Limitations

Our approach has some limitations, because we only modified the data as far as needed to obtain the PRR. The limitations we found are the following:

  • The dates cannot be parsed automatically because they appear in different formats: YYYYMMDD, YYYYMM, and YYYY. We did not process this field, to avoid increasing memory and storage use (which would result from splitting the dates into year, month, and day, and parsing the full dates into another column).
  • In DEMO, the units of age vary. Some rows describe a person older than 30 years with the age given in days or months.
  • In DEMO, the units of weight/mass vary and cannot be processed directly. Some rows include the unit in the value (example: 70 KG), so it is necessary to first extract the float and then convert it to the required unit of mass.
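
If you need these fields, the following is a minimal sketch of how they could be handled after processing. None of it is part of the repository scripts. The helper names are hypothetical, and the unit codes (DEC, YR, MON, WK, DY, HR for age; KG, LBS, GMS for weight) are assumptions that should be checked against the FAERS documentation shipped with each quarter.

```python
import re


def parse_partial_date(value):
    """Split YYYYMMDD, YYYYMM, or YYYY into (year, month, day); missing parts are None."""
    text = (value or "").strip()
    if not text.isdigit() or len(text) not in (4, 6, 8):
        return (None, None, None)
    year = int(text[:4])
    month = int(text[4:6]) if len(text) >= 6 else None
    day = int(text[6:8]) if len(text) == 8 else None
    return (year, month, day)


# Assumed age unit codes, expressed as a factor to years.
AGE_UNIT_IN_YEARS = {
    "DEC": 10.0, "YR": 1.0, "MON": 1 / 12,
    "WK": 7 / 365.25, "DY": 1 / 365.25, "HR": 1 / (24 * 365.25),
}


def age_in_years(age, unit):
    """Convert an age and its unit code to years; None if either is unusable."""
    factor = AGE_UNIT_IN_YEARS.get((unit or "").strip().upper())
    if factor is None:
        return None
    try:
        return float(age) * factor
    except (TypeError, ValueError):
        return None


# Assumed weight unit codes, expressed as a factor to kilograms.
WEIGHT_UNIT_IN_KG = {"KG": 1.0, "LBS": 0.45359237, "GMS": 0.001}
WEIGHT_PATTERN = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]*)\s*$")


def weight_in_kg(value, default_unit="KG"):
    """Parse values such as '70 KG' or '154 LBS' and convert them to kilograms."""
    match = WEIGHT_PATTERN.match(value or "")
    if match is None:
        return None
    number, unit = match.groups()
    factor = WEIGHT_UNIT_IN_KG.get((unit or default_unit).upper())
    return float(number) * factor if factor is not None else None


print(parse_partial_date("202403"))  # (2024, 3, None)
print(age_in_years("3", "DEC"))      # 30.0
print(weight_in_kg("70 KG"))         # 70.0
```

Note that a conversion like this cannot detect miscoded units, for example an adult age recorded with a unit of days, so implausible results still need to be filtered separately.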

Scripts and Files

  • CREATE_TABLE.sql : This SQL script creates the table for PostgreSQL. This is not necessary if you are not planning to upload the dataset to PostgreSQL.
  • DB_drop_du.py: This script removes the duplicate rows found in DEMO, based on primaryid.
  • DB_merge.py: This script merges all the quarters' files into their respective file (DEMO, REAC, DRUG, INDI, OUTC, RPSR, THER).
  • download_FAERS.sh: This bash script downloads FAERS and extracts the necessary file (ASCII).
  • Drugs.csv: This CSV file is the input for prr_polars.py and sql_prr.py. It contains the list of drugs for which the adverse effects are obtained, along with their respective local and global PRRs.
  • dtype_FAERS_pl.json: This JSON file specifies the dtypes of the CSV files, to prevent typing issues when importing them into dataframes.
  • environment.yml: This YAML file defines the conda environment with the required Python libraries.
  • prr_polars.py: This script is used to obtain the PRR and adverse effects of a specific group of drugs listed in Drugs.csv.
  • script_python.sh: This bash script runs the python scripts that are necessary to merge and clean the csv files.
  • sql_prr.py (v1): This script (version 1) is used to obtain the PRR and adverse effects of a specific group of drugs listed in Drugs.csv by connecting to a PostgreSQL server.
  • upload_files.sh (v1): This script (version 1) is used to upload the csv files to a PostgreSQL server.
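
To make the merge and de-duplication steps concrete, here is a small standard-library sketch of the idea. It is not the code in DB_merge.py or DB_drop_du.py. The function names, the sample rows, and the choice to keep the first occurrence of each primaryid are assumptions for illustration; the "$" delimiter is the one used by the FAERS ASCII files.

```python
import csv
import io


def read_faers_table(handle):
    """Read one '$'-delimited FAERS ASCII table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="$"))


def merge_quarters(tables):
    """Concatenate the rows of the same table (e.g. DEMO) across quarters."""
    merged = []
    for rows in tables:
        merged.extend(rows)
    return merged


def drop_duplicate_primaryid(rows):
    """Keep the first row seen for each primaryid (assumed policy)."""
    seen = set()
    unique = []
    for row in rows:
        key = row["primaryid"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Two hypothetical quarterly DEMO extracts with one overlapping report.
q1 = io.StringIO("primaryid$caseid$age\n1001$100$34\n1002$101$58\n")
q2 = io.StringIO("primaryid$caseid$age\n1002$101$58\n1003$102$41\n")

demo = drop_duplicate_primaryid(
    merge_quarters([read_faers_table(q1), read_faers_table(q2)])
)
print([row["primaryid"] for row in demo])  # ['1001', '1002', '1003']
```

For the full dataset a dataframe library such as polars or pandas is the more practical choice, as in the repository scripts; the logic stays the same.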
