Hello, and welcome to the dbt CoP-sponsored learning repository. This repo highlights the power of dbt and Snowflake working together to enable streamlined data transformation.
To run these dbt models successfully, you will need to complete the following steps:
### 1. Snowflake Access
First, you will need to be able to log in to Snowflake. There is a ServiceDesk ticket available for this, specifically Request Access to the Snowflake Partner Sandbox. For a description, you can indicate you are looking to access training materials from the dbt CoP.
Once you have access to Snowflake, please log in using this link. From the Home screen, navigate to the Data tab, which should show a list of available databases. The dbt CoP has created the `DBT_COP_LEARNING` database, with a conveniently titled `START_HERE` example schema, to store all the tables and views defined by the SQL files within the `models` directory.

For source data, we rely on Snowflake-provided sample data from the TPC-H performance benchmarking dataset. You can find the ER diagram, as well as more context on the dataset, in the Snowflake documentation. Feel free to explore these resources and use them as a base reference when creating or updating your own models.
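If you want a feel for the source tables before writing any models, you can query the sample data directly in a Snowflake worksheet. A minimal sketch, assuming the sample data is available under the standard `SNOWFLAKE_SAMPLE_DATA` database name (adjust if it is shared differently in this account):

```sql
-- Peek at the TPC-H customer table from the Snowflake-provided sample data
select
    c_custkey,
    c_name,
    c_mktsegment,
    c_acctbal
from snowflake_sample_data.tpch_sf1.customer
limit 10;
```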
### 2. Install Python
You also need to have Python installed. If you do not, visit the official Python site and download the latest stable version (where the Maintenance status is not 'pre-release'). Follow the guided instructions to set up Python on your machine. You can verify your installation by running the command below in a terminal:

```shell
python --version
```

(Note: if you get a `command not found: python` error, check your PATH environment variable and confirm the path to your Python executable is present. If it is, try running `python{your_version} --version`, e.g. `python3.12 --version`.)
### 3. Clone the Repository
You are now ready to pull down the repository from GitHub. If you do not have a CapTech GitHub account, please fill out the Request Github Access or Repository Creation support ticket. Open a terminal window in your desired repository directory and run:

```shell
git clone https://github.com/captechconsulting/dbt-cop-snowflake
```
This repository operates under the assumption that you are using VS Code as your IDE. If you don't have an IDE installed, or have no idea what that means, go ahead and download VS Code here. Once you install the application, open it and 'Open' the cloned repository: for Mac users this is `cmd-O`, while for Windows it is `ctrl-K ctrl-O`. If you aren't using VS Code, you probably know what you're doing as far as opening a repository directory.
> **TODO:** dbt announced 3/19/2025 that it will be creating a VS Code extension. Once this is finalized, add installation/configuration steps for the new extension.
### 4. Create and Activate a Virtual Environment
A virtual environment is an isolated environment for Python projects. It allows you to manage dependencies for your project separately from the global Python environment. This means you can have different versions of packages for different projects without conflicts. To create and activate a virtual environment, run the following commands in the VS Code terminal window (`` ctrl-` `` on both operating systems, if you don't see a terminal):

```shell
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```
### 5. Install Dependencies
Now that we have a virtual environment in which we can install dependencies, let's go ahead and do so. You can find the full list in our requirements file if you are interested; otherwise, you can just run the command below:

```shell
pip install -r requirements.txt
```
Included in these dependencies are `dbt-core` and `dbt-snowflake`, which together enable model development in our Snowflake environment.
### 6. Configure dbt
Now that we have our dependencies installed, our dev environment is almost complete. First, find your Snowflake username, which should be the same as your CapTech email address without the domain (e.g. an email beginning `testuser@` becomes `TESTUSER` in Snowflake). You can confirm it on your Profile page, available from the user menu in the Snowflake console; the resulting popup shows your Username.

While you are in the Snowflake console, go ahead and create a new schema for yourself within the `DBT_COP_LEARNING` database, using your Username as the schema name. If you select the `DBT_COP_LEARNING` database in the Data tab, you can create the schema using the Snowflake UI.
You can also create the schema via a Snowflake worksheet if you are comfortable with SQL. Ensure you are using the correct role, `APG_SNOWFLAKE_USERS`, and run `CREATE SCHEMA DBT_COP_LEARNING.{your_username}` in a Snowflake worksheet or notebook, as in the sketch below. This will ensure the `APG_SNOWFLAKE_USERS` role owns the schema, which should help prevent any permissions issues from popping up.
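For example, a minimal sketch (`TESTUSER` is a placeholder; substitute your own username):

```sql
-- Run as APG_SNOWFLAKE_USERS so that role owns the new schema
use role APG_SNOWFLAKE_USERS;
create schema if not exists DBT_COP_LEARNING.TESTUSER;
```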
Now that we have our schema, we are ready to tell our local dbt-core installation how to connect to our specific database. This is accomplished via a profile, which can be set up in two ways:
1. You can run a dbt command in your repository terminal, e.g. `dbt debug`. If you don't have a profile file, you will be prompted in the command line to populate all of your information; dbt will then create and save that information to `~/.dbt/profiles.yml`. Please reference the configuration below for the proper values to use. You won't need to add a password, as we are relying on SSO authentication.
2. Alternatively, you can create a `profiles.yml` file in the `~/.dbt/` directory directly (create the directory if it doesn't exist). Add the following configuration to the `profiles.yml` file:

   ```yaml
   dbt_cop_snowflake:
     outputs:
       dev:
         account: OCMUXFQ-CAPTECH_PARTNER # our Snowflake account
         database: DBT_COP_LEARNING
         user: {your user name} # you can find this in the Snowflake console
         authenticator: externalbrowser # this lets us use SSO for authentication with Snowflake
         warehouse: XS_WH # default warehouse, should be plenty of compute and keeps costs low
         role: APG_SNOWFLAKE_USERS # role accessible to us as non-admins
         schema: {your user name} # using your username so that you have a unique schema to save tables/views to
         threads: 2 # enables multithreading
         type: snowflake
     target: dev
   ```
Regardless of the method you chose, you should run the below commands in your IDE terminal. The first validates your connection to Snowflake, while the second ensures you have all dbt dependencies installed.
- Test your connection: `dbt debug`
- Install dbt dependencies (note, these are separate from our Python dependencies): `dbt deps`
We are making use of the dbt_utils package, the most popular external package for dbt. It contains a number of helpful macros and tests; you can find the full list in the project README, and a short example follows below.
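As a taste of what the package offers, here is a hypothetical model (not part of this repo) that uses the `dbt_utils.generate_surrogate_key` macro to hash a set of columns into a single key; the table reference assumes the standard `SNOWFLAKE_SAMPLE_DATA` share:

```sql
-- models/example_orders.sql (hypothetical model, for illustration only)
select
    -- generate_surrogate_key hashes the listed columns into one surrogate key
    {{ dbt_utils.generate_surrogate_key(['o_orderkey', 'o_custkey']) }} as order_sk,
    o_orderkey,
    o_custkey,
    o_totalprice
from snowflake_sample_data.tpch_sf1.orders
```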
One other note: our Snowflake account does not have MFA token caching enabled, which means every time you run a dbt command, a browser window will open for authentication. You don't need to do anything in it; just don't be alarmed when it pops up.
### 7. Run dbt Operations
- Build all models: `dbt build`
- Test the models: `dbt test`
- Generate documentation: `dbt docs generate`
- Serve the documentation: `dbt docs serve`
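Once `dbt build` has completed successfully, you can sanity-check the results from a Snowflake worksheet. A minimal sketch, with `TESTUSER` again standing in for your username/schema:

```sql
-- List the tables and views dbt materialized into your schema
show objects in schema DBT_COP_LEARNING.TESTUSER;
```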