Power BI Project Optimization Reviewer

Introduction

This Python script is designed to help Power BI developers and analysts optimize their Power BI projects by identifying unused columns and measures (DAX expressions). In large and complex Power BI models, it's common for elements to become obsolete or redundant over time. This tool automates the process of finding these unused components, making your models cleaner, more efficient, and easier to maintain.

Benefits

  • Improved Performance: Removing unused columns can significantly reduce the size of your Power BI model, leading to faster refresh times and improved report performance.
  • Enhanced Maintainability: A cleaner model with fewer unnecessary elements is easier to understand, troubleshoot, and update.
  • Reduced Complexity: Simplifies the data model, making it more manageable for both current and future development.
  • Resource Optimization: Frees up memory and processing power by eliminating unneeded data and calculations.
  • Historical Tracking: Each run generates a timestamped report, allowing you to track optimization efforts over time.

Procedure to Use

Follow these steps to set up and run the Power BI Project Optimization Reviewer:

  1. Prerequisites:

    • Ensure you have Python 3.6 or higher installed on your machine. You can download it from python.org.
  2. Download the Script:

    • Save the analyze_pbip.py script to a convenient location on your computer. For example, you can save it in C:\Users\SUYOG\OneDrive\Desktop\CODES.
  3. Configure the Script:

    • Open the analyze_pbip.py file using any text editor (like VS Code, Notepad++, etc.).

    • At the very top of the script, locate the "User Configuration" section:

      # --- User Configuration ---
      # Set the base folder path for your Power BI project here.
      # Example: base_folder = r"C:\Users\YourUser\Documents\MyPowerBIProject"
      base_folder = r"C:\Users\SUYOG\Downloads\New folder" # <--- CHANGE THIS LINE
      
      # Set the relative paths to the Semantic Model and Report folders within the base folder.
      # You typically won't need to change these unless your project structure is unusual.
      semantic_model_subfolder = "SHPMS.SemanticModel"
      report_subfolder = "SHPMS.Report"
      # --------------------------
    • base_folder: Crucially, change the value of base_folder to the absolute path of your Power BI project's main folder. This is the folder that contains your .pbip file as well as the YourProjectName.Report and YourProjectName.SemanticModel subfolders.

      • Important for Windows Paths: Use a raw string (prefix with r, e.g., r"C:\path\to\your\project") or double backslashes ("C:\\path\\to\\your\\project") to avoid issues with backslashes being interpreted as escape characters.
    • semantic_model_subfolder and report_subfolder: These variables define the names of the semantic model and report subfolders relative to your base_folder. For most standard Power BI projects, these will be YourProjectName.SemanticModel and YourProjectName.Report. Only change these if your project uses different naming conventions.

  4. Run the Script:

    • Open your command prompt or terminal.
    • Navigate to the directory where you saved the analyze_pbip.py script. For example:
      cd C:\Users\SUYOG\OneDrive\Desktop\CODES
    • Execute the script using the Python interpreter:
      python analyze_pbip.py
  5. View Results:

    • After the script completes its execution, it will print a message to the terminal indicating that the analysis is complete and where the results are saved.
    • A new text file will be generated inside a subfolder named MODE_QC within your specified base_folder (the main Power BI project folder).
    • The filename will be dynamically generated with a timestamp, following the format optimization_review_DD_MM_YY_HH_MM.txt (e.g., optimization_review_16_08_25_17_30.txt).
    • Open this text file to review the list of unused columns and measures in your Power BI model; a representative excerpt is shown below.
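
The table, column, and measure names in this excerpt are hypothetical, but the layout matches what the script writes:

      Unused Definitions in Power BI Model:
      -------------------------------------

      --- Unused Columns ---
      Table: Sales
        - Column: LegacyOrderCode

      --- Unused Measures ---
      Table/Model: Sales
        - Measure: OldTotalSales

      -------------------------------------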

Code with Sections and Explanations

The analyze_pbip.py script is structured into several functions, each responsible for a specific part of the analysis:

import os
import re
import json
from datetime import datetime

# --- User Configuration ---
# Set the base folder path for your Power BI project here.
# Example: base_folder = r"C:\Users\YourUser\Documents\MyPowerBIProject"
base_folder = r"C:\Users\SUYOG\Downloads\New folder"

# Set the relative paths to the Semantic Model and Report folders within the base folder.
# You typically won't need to change these unless your project structure is unusual.
semantic_model_subfolder = "SHPMS.SemanticModel"
report_subfolder = "SHPMS.Report"
# --------------------------

This section imports necessary Python modules (os for path operations, re for regular expressions, json for JSON parsing, datetime for timestamps) and defines the user-configurable paths for the Power BI project. This is the only part of the code a user needs to modify.
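
As a quick illustration of how main() later combines these settings on Windows (using the placeholder values above; note the r prefix keeping the backslashes literal):

import os

base_folder = r"C:\Users\SUYOG\Downloads\New folder"
semantic_model_subfolder = "SHPMS.SemanticModel"

# os.path.join inserts the path separator between the two segments:
print(os.path.join(base_folder, semantic_model_subfolder))
# prints: C:\Users\SUYOG\Downloads\New folder\SHPMS.SemanticModel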


def get_all_tmdl_files(base_path):
    """Recursively find all .tmdl files in the given base path."""
    tmdl_files = []
    for root, _, files in os.walk(base_path):
        for file in files:
            if file.endswith(".tmdl"):
                tmdl_files.append(os.path.join(root, file))
    return tmdl_files

This helper function takes a directory path and recursively searches through it to find all files ending with .tmdl. These files contain the definitions of tables, columns, and measures in a Power BI semantic model.
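
As a design note, the same traversal can be written more compactly with pathlib; this sketch is behaviorally equivalent, not what the script ships:

from pathlib import Path

def get_all_tmdl_files(base_path):
    """Equivalent traversal using pathlib's recursive glob."""
    return [str(p) for p in Path(base_path).rglob("*.tmdl")]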


def parse_tmdl_definitions(file_path):
    """Parse a .tmdl file to extract table, column, and measure names."""
    table_name = None
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Extract the table name. Anchoring the pattern to the start of a line
    # avoids matching the word "table" inside descriptions or DAX expressions.
    table_match = re.search(r'^\s*table\s+(\w+)', content, re.MULTILINE)
    if table_match:
        table_name = table_match.group(1)

    # Extract column and measure names. Note that TMDL wraps names containing
    # spaces in single quotes; those are not captured by these simple patterns.
    columns = re.findall(r'^\s*column\s+(\w+)', content, re.MULTILINE)
    measures = re.findall(r'^\s*measure\s+(\w+)', content, re.MULTILINE)

    return table_name, columns, measures

This function reads a single .tmdl file. It uses regular expressions to identify and extract the table name (if present), all column names defined within that table, and all measure names defined within that table or model.
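
For example, given a minimal, hypothetical table file such as:

table Sales
    column OrderID
        dataType: int64
        summarizeBy: none

    measure TotalOrders = COUNTROWS(Sales)

the function returns ("Sales", ["OrderID"], ["TotalOrders"]).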


def parse_relationships_tmdl(file_path):
    """Parse relationships.tmdl to find columns used in relationships."""
    used_references = set()
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Collect the Table.Column references named by fromColumn/toColumn entries
    used_references.update(re.findall(r'(?:fromColumn|toColumn):\s+(\w+\.\w+)', content))
    return used_references

This function specifically parses the relationships.tmdl file. It looks for fromColumn and toColumn entries, which indicate that a column is part of a relationship between two tables, thus marking it as "used."
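
For example, a hypothetical relationships.tmdl entry such as the following would mark both Sales.ProductID and Product.ProductID as used:

relationship SalesToProduct
    fromColumn: Sales.ProductID
    toColumn: Product.ProductID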


def find_references_in_json(json_data, all_definitions_flat):
    """Search serialized JSON data for column and measure references."""
    found_references = set()
    json_string = json.dumps(json_data)

    for definition in all_definitions_flat:
        # Simple substring search for now; this could be improved with
        # structured parsing of the report's visual definitions.
        if definition in json_string:
            found_references.add(definition)
    return found_references

This function is crucial for identifying usage within Power BI reports. It takes the content of a JSON file (which defines report visuals, pages, etc.) and a flat list of all known column and measure names. It then performs a string search to see if any of these column or measure names appear within the JSON content, marking them as "used."
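
A small usage sketch; the JSON shape here is illustrative (real visual definitions are much more deeply nested), but report files do carry references in fields such as queryRef:

visual_json = {"query": {"queryRef": "Sales.TotalOrders"}}
definitions = {"Sales.TotalOrders", "Sales.OrderID"}
print(find_references_in_json(visual_json, definitions))
# prints: {'Sales.TotalOrders'}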


def main():
    semantic_model_path = os.path.join(base_folder, semantic_model_subfolder)
    report_path = os.path.join(base_folder, report_subfolder)

    if not os.path.exists(semantic_model_path):
        print(f"Error: Semantic model path not found: {semantic_model_path}")
        return
    if not os.path.exists(report_path):
        print(f"Error: Report path not found: {report_path}")
        return

    all_tables_columns = {}
    all_measures = {}
    all_definitions_flat = set()

    # 1. Get all columns and measures from .tmdl files
    all_tmdl_files = get_all_tmdl_files(os.path.join(semantic_model_path, "definition"))
    for tmdl_file in all_tmdl_files:
        table_name, columns, measures = parse_tmdl_definitions(tmdl_file)
        if table_name:
            if columns:
                all_tables_columns[table_name] = columns
                for col in columns:
                    all_definitions_flat.add(f"{table_name}.{col}")
            if measures:
                all_measures[table_name] = measures
                for measure in measures:
                    all_definitions_flat.add(f"{table_name}.{measure}")
        elif measures:
            model_name = "Model"
            if model_name not in all_measures:
                all_measures[model_name] = []
            all_measures[model_name].extend(measures)
            for measure in measures:
                all_definitions_flat.add(f"{model_name}.{measure}")

    # 2. Get references used in relationships (skipped if no relationships.tmdl exists)
    relationships_file = os.path.join(semantic_model_path, "definition", "relationships.tmdl")
    used_in_relationships = set()
    if os.path.exists(relationships_file):
        used_in_relationships = parse_relationships_tmdl(relationships_file)

    # 3. Get references used in reports (visuals, etc.) and other TMDL files (for DAX expressions)
    all_used_references = set()
    all_used_references.update(used_in_relationships)

    # Search in all TMDL files for DAX expressions referencing other measures/columns.
    # DAX writes references as Table[Name], 'Table'[Name] or [Name] rather than
    # Table.Name, so check the bracketed form as well as the dotted form.
    for tmdl_file in all_tmdl_files:
        with open(tmdl_file, 'r', encoding='utf-8') as f:
            content = f.read()
        for definition in all_definitions_flat:
            name = definition.split('.', 1)[1]
            if f"[{name}]" in content or re.search(r'\b' + re.escape(definition) + r'\b', content):
                all_used_references.add(definition)

    # Search in all JSON files in the report for references
    for root, _, files in os.walk(report_path):
        for file in files:
            if file.endswith(".json"):
                json_file_path = os.path.join(root, file)
                try:
                    with open(json_file_path, 'r', encoding='utf-8') as f:
                        json_data = json.load(f)
                        found_refs = find_references_in_json(json_data, all_definitions_flat)
                        all_used_references.update(found_refs)
                except json.JSONDecodeError:
                    pass  # skip report files that are not valid JSON
                except Exception:
                    pass  # skip files that cannot be read for any other reason

    # Determine unused columns
    unused_columns_by_table = {}
    for table, columns in all_tables_columns.items():
        unused_columns_by_table[table] = []
        for col in columns:
            full_column_name = f"{table}.{col}"
            if full_column_name not in all_used_references:
                unused_columns_by_table[table].append(col)

    # Determine unused measures
    unused_measures_by_table = {}
    for table, measures in all_measures.items():
        unused_measures_by_table[table] = []
        for measure in measures:
            full_measure_name = f"{table}.{measure}"
            if full_measure_name not in all_used_references:
                unused_measures_by_table[table].append(measure)

    # Prepare results for output
    output_lines = []
    output_lines.append("Unused Definitions in Power BI Model:")
    output_lines.append("-------------------------------------")
    found_unused = False

    output_lines.append("\n--- Unused Columns ---")
    column_found = False
    for table, unused_cols in unused_columns_by_table.items():
        if unused_cols:
            column_found = True
            found_unused = True
            output_lines.append(f"Table: {table}")
            for col in unused_cols:
                output_lines.append(f"  - Column: {col}")
    if not column_found:
        output_lines.append("No unused columns found based on this analysis.")

    output_lines.append("\n--- Unused Measures ---")
    measure_found = False
    for table, unused_measures in unused_measures_by_table.items():
        if unused_measures:
            measure_found = True
            found_unused = True
            output_lines.append(f"Table/Model: {table}")
            for measure in unused_measures:
                output_lines.append(f"  - Measure: {measure}")
    if not measure_found:
        output_lines.append("No unused measures found based on this analysis.")

    output_lines.append("\n-------------------------------------")
    if not found_unused:
        output_lines.append("Overall: No unused columns or measures found.")

    # Generate dynamic filename
    current_time = datetime.now()
    timestamp_str = current_time.strftime("%d_%m_%y_%H_%M")
    output_filename = f"optimization_review_{timestamp_str}.txt"
    
    # Define the output directory within the base_folder
    output_dir = os.path.join(base_folder, "MODE_QC")
    os.makedirs(output_dir, exist_ok=True)
    output_file_path = os.path.join(output_dir, output_filename)

    with open(output_file_path, 'w', encoding='utf-8') as f:
        for line in output_lines:
            f.write(line + "\n")
    
    print(f"Analysis complete. Results saved to: {output_file_path}")


if __name__ == "__main__":
    main()

This is the main execution block of the script.

  • It sets up the full paths to the semantic model and report folders using the user-defined configurations.
  • It then calls the parsing functions to gather all column and measure definitions.
  • It identifies used references by checking relationships, DAX expressions within other .tmdl files, and references within .json report files.
  • Finally, it compares the defined elements with the used elements to determine what is unused.
  • The results are formatted and written to a new, timestamped text file within the MODE_QC subfolder of your Power BI project's base directory. This ensures that each analysis run creates a unique report.
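
With the example configuration above, the console message at the end of a run would look like:

Analysis complete. Results saved to: C:\Users\SUYOG\Downloads\New folder\MODE_QC\optimization_review_16_08_25_17_30.txt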
