A list of tools, concepts, and learning resources for analytics engineers. It covers essential technologies, frameworks, and best practices. Most of the links focus on analytics engineering, but some cover data engineering, since the two fields overlap so much.
Feel free to contribute to this list by adding links you've found helpful. Submit a Pull Request (PR) with your suggestions.
- What is Analytics Engineering?
- Need real-world data? Check out Sites to Find Public Datasets
- Looking for courses? See Specific Courses
- Need an overview? Check out A guide to the data landscape
- dbt – Modular SQL-based transformations
- SQLMesh – Open-source data transformation framework
- Dataform – Data transformation tool specific to Google BigQuery
- SQL Basics – Querying and transforming structured data
- Apache Spark – Large-scale distributed data processing
- Apache Airflow – Workflow automation & scheduling
- Dagster – Workflow automation & scheduling
- Prefect – Pythonic Workflow orchestration
- Google Cloud Workflows – Google Cloud's managed workflow orchestration service
- Snowflake – Cloud data warehousing
- BigQuery – Serverless, scalable data warehouse
- Databricks – Data Lakehouse from creators of Apache Spark
- PostgreSQL – Relational database
- Git – Version control for data projects
- GitHub Actions – Automate testing and deployment
- Looker – Modern BI platform
- Metabase – Open-source, scalable BI tool
- Power BI – One of the major players, from Microsoft
- Tableau – The other major player, owned by Salesforce
- Docker – Containerization for data apps
- Kubernetes – Orchestrate and scale data pipelines
- Terraform – Infrastructure as code
- Use a Star Schema – Organize data into fact and dimension tables to improve query performance.
- Partition & Cluster Large Tables – Partition by date and cluster by frequently filtered columns to speed up queries.
- Documentation – Document the data models to keep schema and relationships clear.
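
As a rough illustration of the first two practices above, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`fact_orders`, `dim_customer`, `dim_date`, and so on) and the sample rows are hypothetical and chosen only for this example. SQLite does not support partitioning or clustering, so that part is shown only as a BigQuery-style DDL string that is never executed here.

```python
import sqlite3

# Hypothetical star schema: one fact table that references two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,  -- e.g. 20240115
    full_date TEXT NOT NULL,
    month     TEXT NOT NULL
);
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
    date_id     INTEGER NOT NULL REFERENCES dim_date (date_id),
    amount      REAL NOT NULL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Acme", "EMEA"), (2, "Globex", "AMER")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240115, "2024-01-15", "2024-01"), (20240203, "2024-02-03", "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
    [(100, 1, 20240115, 250.0), (101, 2, 20240115, 100.0), (102, 1, 20240203, 50.0)],
)


def revenue_by_region_and_month(connection):
    """Join the fact table to both dimensions and aggregate the measure."""
    query = """
        SELECT c.region, d.month, SUM(f.amount) AS revenue
        FROM fact_orders AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY c.region, d.month
        ORDER BY c.region, d.month
    """
    return connection.execute(query).fetchall()


rows = revenue_by_region_and_month(conn)
for region, month, revenue in rows:
    print(region, month, revenue)

# BigQuery-style DDL for a partitioned and clustered fact table.
# Not executed here; the dataset and column names are hypothetical.
PARTITIONED_FACT_DDL = """
CREATE TABLE analytics.fact_orders (
    order_id    INT64,
    customer_id INT64,
    order_ts    TIMESTAMP,
    amount      NUMERIC
)
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id
"""
```

The fact table holds only keys and measures, while descriptive attributes such as `region` and `month` live in the dimensions, so most reporting queries reduce to a join plus a `GROUP BY`. In a warehouse that supports it, partitioning on the date column and clustering on a frequently filtered key can reduce how much data each query scans.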
- Specific Courses
- Sites to Find Datasets
- LinkedIn Creators
- Books
- Newsletters
- Hands-On Analytics Engineering Project - LinkedIn Learning Course
- Git Immersion - Free guided course on the basics of Git, with examples in Ruby
- Data Engineering with dbt - LinkedIn Learning Course
- Microsoft's Fabric Analytics Engineer Associate
Note: this is very similar to the corresponding section in data-analytics-resources.
- Datahub
- Dataset Search
- Kaggle
- Data.gov
- Maven Analytics Data Playground
- Awesome Public Datasets
- DataCamp Datasets
- NASA Data
- Google BigQuery
- UC Irvine Machine Learning Repository
- Fundamentals of Data Engineering by Joe Reis, Matt Housley
- Data Pipelines Pocket Reference by James Densmore
- The ABCs of Analytics Engineering - Ebook by Madison Schott
- Data Wrangling on AWS by Navnit Shukla, Sankar M, Sam Palani