Python utilities for working with data source types used by the dsgrid project (i.e., AWS, PostgreSQL, and .parquet)
Install | Configure | Uninstall
pip install webspinner
or
pip install git+ssh://git@github.com/dsgrid/webspinner.git@master
or
pip install git+https://github.com/dsgrid/webspinner.git@master
To get the dependencies required for certain types of resources, add extras as in:
pip install webspinner[pgres]
pip install webspinner[aws]
pip install webspinner[parquet]
pip install webspinner[pgres,aws,parquet]
webspinner provides code in support of accessing various dsgrid data sources. That said, you may only need to access one or two data sources for your particular project. Please configure access to the resources you need.
PostgreSQL | AWS | .parquet
pip install pgpasslib psycopg2
Install pgAdmin or another PostgreSQL client.
Then identify or create your pgpass file.
On Mac and Linux, the file to edit or create is ~/.pgpass. On Windows it is
%APPDATA%/postgresql/pgpass.conf, where %APPDATA% normally expands to the
AppData/Roaming subdirectory under your user profile (e.g.,
C:/Users/$USER/AppData/Roaming). If the file does not yet exist, simply create
a new text file with the name given above.
Once the file exists, add a line like:
POSTGRES_SERVER_ADDRESS:*:*:$USER:YOUR_PASSWORD
replacing POSTGRES_SERVER_ADDRESS with the PostgreSQL server to connect to,
$USER with your actual username, and YOUR_PASSWORD with your actual
password. The two * fields are wildcards that match any port and any database.
The dsgrid team typically connects to 10.20.5.28 or its alias
gds_edit.nrel.gov.
On Mac and Linux, be sure to restrict the file's permissions with:
chmod 600 ~/.pgpass
If the permissions are less strict than this, PostgreSQL clients ignore the
file, and your run may not start.
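The pgpass steps above can also be scripted. The following is a minimal Python sketch, not part of webspinner: the helper name `add_pgpass_entry` and the example values are hypothetical, and it assumes the field values contain no `:` or `\` characters, which the pgpass format would require escaping.

```python
import os
import stat
from pathlib import Path


def add_pgpass_entry(path, host, user, password, port="*", database="*"):
    """Append a host:port:database:user:password line (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = f"{host}:{port}:{database}:{user}:{password}"

    text = path.read_text() if path.exists() else ""
    if entry not in text.splitlines():
        # Make sure the new entry starts on its own line
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + entry + "\n")

    if os.name == "posix":
        # Same effect as `chmod 600`: owner read/write only
        path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return entry
```

On Mac or Linux, a call might look like `add_pgpass_entry(Path.home() / ".pgpass", "gds_edit.nrel.gov", "your_user_name", "your_password")`. Calling it twice with the same arguments leaves a single entry in the file.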
Project-specific defaults can be stored in a text file, e.g., webspinner.config
or config.ini with a [PGRES] section and any or all of the following arguments:
[PGRES]
user = your_user_name
dbase = database_name
host = host_address
port = port_number
The defaults can then be loaded in at any time (e.g., at the top of a script or
a notebook) by passing the configuration filepath into the webspinner.configure
function.
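To illustrate the file format only, the sketch below parses a [PGRES] section with Python's standard `configparser` module. It does not show how webspinner.configure works internally; the helper name `load_pgres_defaults` and the sample values are hypothetical.

```python
import configparser


def load_pgres_defaults(text):
    """Parse the [PGRES] section of a config string (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("PGRES"):
        return {}
    defaults = dict(parser["PGRES"])
    # configparser returns every value as a string; convert the port if given
    if "port" in defaults:
        defaults["port"] = int(defaults["port"])
    return defaults


# Placeholder values for illustration
sample = """
[PGRES]
user = your_user_name
dbase = database_name
host = gds_edit.nrel.gov
port = 5432
"""

pgres_defaults = load_pgres_defaults(sample)
print(pgres_defaults)
```

Because any or all of the arguments may be given, the helper returns only the keys that are actually present in the section.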
pip install pyathena awscli
Set up your AWS access credentials by issuing the following command in the terminal. (FYI there are other ways to set up your credentials if you're interested.)
>> aws configure
AWS Access Key ID [None]: <your key>
AWS Secret Access Key [None]: <your secret key>
Default region name [None]: us-west-2
Default output format [None]: json
Project-specific defaults can be stored in a text file, e.g., webspinner.config
or config.ini with an [AWS] section and any or all of the following arguments:
[AWS]
s3_staging_dir = data_staging_dir_on_s3
region_name = aws_region_name
schema_name = aws_database_schema_name
work_group = aws_work_group_name
The defaults can then be loaded in at any time (e.g., at the top of a script or
a notebook) by passing the configuration filepath into the webspinner.configure
function.
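As a rough sketch of what an [AWS] section holds once parsed, the following reads a config file from disk with the standard `configparser` module and keeps only the four keys listed above. The helper name `load_aws_defaults`, the bucket name, and the temporary file are hypothetical, and this is not a description of what webspinner.configure does with the values.

```python
import configparser
import tempfile
from pathlib import Path

# The four arguments documented for the [AWS] section
AWS_KEYS = ("s3_staging_dir", "region_name", "schema_name", "work_group")


def load_aws_defaults(config_path):
    """Read the [AWS] section of a config file (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # parser.read silently skips files that do not exist
    parser.read(config_path)
    if not parser.has_section("AWS"):
        return {}
    section = parser["AWS"]
    # Keep only the documented keys that are actually present
    return {key: section[key] for key in AWS_KEYS if key in section}


# Usage with a throwaway file; the values are placeholders
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "webspinner.config"
    config_path.write_text(
        "[AWS]\n"
        "s3_staging_dir = s3://example-bucket/staging/\n"
        "region_name = us-west-2\n"
    )
    aws_defaults = load_aws_defaults(config_path)

print(aws_defaults)
```

Keys left out of the file are simply absent from the result, so a project can set only the defaults it cares about.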
If you plan to work with .parquet files, also install pyarrow.
pip install pyarrow
pip uninstall webspinner