Skip to content

Saad-learning/Spotify-Data-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spotify Advanced SQL Project and Query Optimization P-6

Project Category: Advanced Click Here to get Dataset

Overview

This project involves analyzing a Spotify dataset with various attributes about tracks, albums, and artists using SQL. It covers an end-to-end process of normalizing a denormalized dataset, performing SQL queries of varying complexity (easy, medium, and advanced), and optimizing query performance. The primary goals of the project are to practice advanced SQL skills and generate valuable insights from the dataset.

-- create table
DROP TABLE IF EXISTS spotify;
CREATE TABLE spotify (
    artist VARCHAR(255),
    track VARCHAR(255),
    album VARCHAR(255),
    album_type VARCHAR(50),
    danceability FLOAT,
    energy FLOAT,
    loudness FLOAT,
    speechiness FLOAT,
    acousticness FLOAT,
    instrumentalness FLOAT,
    liveness FLOAT,
    valence FLOAT,
    tempo FLOAT,
    duration_min FLOAT,
    title VARCHAR(255),
    channel VARCHAR(255),
    views FLOAT,
    likes BIGINT,
    comments BIGINT,
    licensed BOOLEAN,
    official_video BOOLEAN,
    stream BIGINT,
    energy_liveness FLOAT,
    most_played_on VARCHAR(50)
);

Project Steps

1. Data Exploration

Before diving into SQL, it’s important to understand the dataset thoroughly. The dataset contains attributes such as:

  • Artist: The performer of the track.
  • Track: The name of the song.
  • Album: The album to which the track belongs.
  • Album_type: The type of album (e.g., single or album).
  • Various metrics such as danceability, energy, loudness, tempo, and more.

4. Querying the Data

After the data is inserted, various SQL queries can be written to explore and analyze the data. Queries are categorized into easy, medium, and advanced levels to help progressively develop SQL proficiency.

Easy Queries

  • Simple data retrieval, filtering, and basic aggregations.

Medium Queries

  • More complex queries involving grouping, aggregation functions, and joins.

Advanced Queries

  • Nested subqueries, window functions, CTEs, and performance optimization.

5. Query Optimization

In advanced stages, the focus shifts to improving query performance. Some optimization strategies include:

  • Indexing: Adding indexes on frequently queried columns.
  • Query Execution Plan: Using EXPLAIN ANALYZE to review and refine query performance.

15 Practice Questions

Easy Level

  1. Retrieve the names of all tracks that have more than 1 billion streams.
  2. List all albums along with their respective artists.
  3. Get the total number of comments for tracks where licensed = TRUE.
  4. Find all tracks that belong to the album type single.
  5. Count the total number of tracks by each artist.

Medium Level

  1. Calculate the average danceability of tracks in each album.
  2. Find the top 5 tracks with the highest energy values.
  3. List all tracks along with their views and likes where official_video = TRUE.
  4. For each album, calculate the total views of all associated tracks.
  5. Retrieve the track names that have been streamed on Spotify more than YouTube.

Advanced Level

  1. Find the top 3 most-viewed tracks for each artist using window functions.
  2. Write a query to find tracks where the liveness score is above the average.
  3. Use a WITH clause to calculate the difference between the highest and lowest energy values for tracks in each album.

Technology Stack

  • Database: PostgreSQL
  • SQL Queries: DDL, DML, Aggregations, Joins, Subqueries, Window Functions
  • Tools: pgAdmin 4 (or any SQL editor), PostgreSQL (via Homebrew, Docker, or direct installation)

How to Run the Project

  1. Install PostgreSQL and pgAdmin (if not already installed).
  2. Set up the database schema and tables using the provided normalization structure.
  3. Insert the sample data into the respective tables.
  4. Execute SQL queries to solve the listed problems.
  5. Explore query optimization techniques for large datasets

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published