This Python script is designed to help Power BI developers and analysts optimize their Power BI projects by identifying unused columns and measures (DAX expressions). In large and complex Power BI models, it's common for elements to become obsolete or redundant over time. This tool automates the process of finding these unused components, making your models cleaner, more efficient, and easier to maintain.
- Improved Performance: Removing unused columns and measures can significantly reduce the size of your Power BI model, leading to faster refresh times and improved report performance.
- Enhanced Maintainability: A cleaner model with fewer unnecessary elements is easier to understand, troubleshoot, and update.
- Reduced Complexity: Simplifies the data model, making it more manageable for both current and future development.
- Resource Optimization: Frees up memory and processing power by eliminating unneeded data and calculations.
- Historical Tracking: Each run generates a timestamped report, allowing you to track optimization efforts over time.
Follow these steps to set up and run the Power BI Project Optimization Reviewer:
- Prerequisites:
  - Ensure you have Python 3.6 or higher installed on your machine. You can download it from python.org.
- Download the Script:
  - Save the `analyze_pbip.py` script to a convenient location on your computer. For example, you can save it in `C:\Users\SUYOG\OneDrive\Desktop\CODES`.
- Configure the Script:
  - Open the `analyze_pbip.py` file using any text editor (like VS Code, Notepad++, etc.).
  - At the very top of the script, locate the "User Configuration" section:

    ```python
    # --- User Configuration ---
    # Set the base folder path for your Power BI project here.
    # Example: base_folder = r"C:\Users\YourUser\Documents\MyPowerBIProject"
    base_folder = r"C:\Users\SUYOG\Downloads\New folder"  # <--- CHANGE THIS LINE

    # Set the relative paths to the Semantic Model and Report folders within the base folder.
    # You typically won't need to change these unless your project structure is unusual.
    semantic_model_subfolder = "SHPMS.SemanticModel"
    report_subfolder = "SHPMS.Report"
    # --------------------------
    ```
  - `base_folder`: Crucially, change the value of `base_folder` to the absolute path of your Power BI project's main folder. This is the folder that contains your `.pbip` file and the `YourProjectName.Report` and `YourProjectName.SemanticModel` subfolders.
    - Important for Windows paths: use a raw string (prefix with `r`, e.g., `r"C:\path\to\your\project"`) or double backslashes (`"C:\\path\\to\\your\\project"`) to avoid backslashes being interpreted as escape characters. A quick equivalence check appears after this list.
  - `semantic_model_subfolder` and `report_subfolder`: These variables define the names of the semantic model and report subfolders relative to your `base_folder`. For most standard Power BI projects, these will be `YourProjectName.SemanticModel` and `YourProjectName.Report`. Only change these if your project uses different naming conventions.
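  The raw-string and double-backslash forms produce the same value; here is a minimal sanity check (the path is illustrative only):

  ```python
  # Raw string: backslashes are kept literally
  p1 = r"C:\Users\YourUser\Documents\MyPowerBIProject"
  # Escaped backslashes: equivalent value
  p2 = "C:\\Users\\YourUser\\Documents\\MyPowerBIProject"
  assert p1 == p2  # both spell the same path
  ```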
- Run the Script:
  - Open your command prompt or terminal.
  - Navigate to the directory where you saved the `analyze_pbip.py` script. For example: `cd C:\Users\SUYOG\OneDrive\Desktop\CODES`
  - Execute the script using the Python interpreter: `python analyze_pbip.py`
- View Results:
  - After the script completes, it prints a message to the terminal indicating that the analysis is complete and where the results are saved.
  - A new text file is generated inside a subfolder named `MODE_QC` within your specified `base_folder` (the main Power BI project folder).
  - The filename is generated dynamically with a timestamp, following the format `optimization_review_DD_MM_YY_HH_MM.txt` (e.g., `optimization_review_16_08_25_17_30.txt`).
  - Open this text file to review the list of unused columns and measures in your Power BI model.
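For orientation, a generated report follows the structure below; the table, column, and measure names here are invented placeholders, not real output:

```text
Unused Definitions in Power BI Model:
-------------------------------------

--- Unused Columns ---
Table: Sales
 - Column: LegacyDiscountCode

--- Unused Measures ---
Table/Model: Sales
 - Measure: TotalSalesOld

-------------------------------------
```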
The `analyze_pbip.py` script is structured into several functions, each responsible for a specific part of the analysis:
```python
import os
import re
import json
from datetime import datetime

# --- User Configuration ---
# Set the base folder path for your Power BI project here.
# Example: base_folder = r"C:\Users\YourUser\Documents\MyPowerBIProject"
base_folder = r"C:\Users\SUYOG\Downloads\New folder"

# Set the relative paths to the Semantic Model and Report folders within the base folder.
# You typically won't need to change these unless your project structure is unusual.
semantic_model_subfolder = "SHPMS.SemanticModel"
report_subfolder = "SHPMS.Report"
# --------------------------
```
This section imports the necessary Python modules (`os` for path operations, `re` for regular expressions, `json` for JSON parsing, and `datetime` for timestamps) and defines the user-configurable paths for the Power BI project. This is the only part of the code a user needs to modify.
```python
def get_all_tmdl_files(base_path):
    """Recursively find all .tmdl files in the given base path."""
    tmdl_files = []
    for root, _, files in os.walk(base_path):
        for file in files:
            if file.endswith(".tmdl"):
                tmdl_files.append(os.path.join(root, file))
    return tmdl_files
```
This helper function takes a directory path and recursively searches it for all files ending in `.tmdl`. These files contain the definitions of tables, columns, and measures in a Power BI semantic model.
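Assuming the function above is in scope, a quick usage sketch looks like this (the project path is a hypothetical example):

```python
# Hypothetical semantic model path, purely for illustration
definition_path = r"C:\Projects\MyReport.SemanticModel\definition"

for tmdl_path in get_all_tmdl_files(definition_path):
    print(tmdl_path)  # e.g. ...\definition\tables\Sales.tmdl
```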
```python
def parse_tmdl_definitions(file_path):
    """Parse a .tmdl file to extract table, column, and measure names."""
    table_name = None
    columns = []
    measures = []

    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Extract table name
    table_match = re.search(r'table\s+(\w+)', content)
    if table_match:
        table_name = table_match.group(1)

    # Extract column names
    column_matches = re.findall(r'column\s+(\w+)', content)
    for col in column_matches:
        columns.append(col)

    # Extract measure names
    measure_matches = re.findall(r'measure\s+(\w+)', content)
    for measure in measure_matches:
        measures.append(measure)

    return table_name, columns, measures
```
This function reads a single `.tmdl` file. It uses regular expressions to identify and extract the table name (if present), all column names defined within that table, and all measure names defined within that table or model. Note that the `\w+` pattern matches only simple identifiers; TMDL wraps names containing spaces or special characters in single quotes, and those would need a broader pattern to be captured.
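To see what these patterns extract, here is a minimal sketch run against a hand-written TMDL fragment (the table, column, and measure names are invented):

```python
import re

# Hand-written TMDL fragment, purely illustrative
sample = """
table Sales

    column OrderDate
    column Amount

    measure TotalAmount = SUM(Sales[Amount])
"""

print(re.search(r'table\s+(\w+)', sample).group(1))  # Sales
print(re.findall(r'column\s+(\w+)', sample))         # ['OrderDate', 'Amount']
print(re.findall(r'measure\s+(\w+)', sample))        # ['TotalAmount']
```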
```python
def parse_relationships_tmdl(file_path):
    """Parse relationships.tmdl to find columns used in relationships."""
    used_references = set()

    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Find fromColumn and toColumn in relationships
    relationship_matches = re.findall(r'(?:fromColumn|toColumn):\s+(\w+\.\w+)', content)
    for col_ref in relationship_matches:
        used_references.add(col_ref)

    return used_references
```
This function specifically parses the `relationships.tmdl` file. It looks for `fromColumn` and `toColumn` entries, which indicate that a column participates in a relationship between two tables, and therefore marks it as "used."
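For example, given a hand-written relationship fragment like the one below (the relationship name and columns are invented), the regex captures both `Table.Column` references:

```python
import re

# Illustrative relationships.tmdl fragment
sample = """
relationship f3a2c1
    fromColumn: Sales.ProductKey
    toColumn: Product.ProductKey
"""

print(re.findall(r'(?:fromColumn|toColumn):\s+(\w+\.\w+)', sample))
# ['Sales.ProductKey', 'Product.ProductKey']
```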
```python
def find_references_in_json(json_data, all_definitions_flat):
    """Search a JSON document (serialized to a string) for column and measure references."""
    found_references = set()
    json_string = json.dumps(json_data)

    for definition in all_definitions_flat:
        # Simple string search for now; this could be improved with more sophisticated parsing
        if definition in json_string:
            found_references.add(definition)

    return found_references
```
This function is crucial for identifying usage within Power BI reports. It takes the parsed content of a JSON file (which defines report visuals, pages, filters, etc.) and a flat set of all known `Table.Column` and `Table.Measure` names, serializes the JSON back to a string, and checks whether each name appears anywhere in it, marking matches as "used." Because this is a plain substring search, it can over-match (for example, `Sales.Amount` would also be found inside `Sales.AmountNet`), which errs on the safe side of keeping definitions.
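A minimal sketch of that behaviour, assuming `find_references_in_json` from above is in scope (all names are invented):

```python
# Pretend report JSON referencing a single field (illustrative)
visual = {"projections": [{"queryRef": "Sales.AmountNet"}]}

definitions = {"Sales.Amount", "Sales.AmountNet", "Sales.Quantity"}
print(find_references_in_json(visual, definitions))
# {'Sales.Amount', 'Sales.AmountNet'} (set order may vary):
# 'Sales.Amount' matches as a substring of 'Sales.AmountNet'
```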
```python
def main():
    semantic_model_path = os.path.join(base_folder, semantic_model_subfolder)
    report_path = os.path.join(base_folder, report_subfolder)

    if not os.path.exists(semantic_model_path):
        print(f"Error: Semantic model path not found: {semantic_model_path}")
        return
    if not os.path.exists(report_path):
        print(f"Error: Report path not found: {report_path}")
        return

    all_tables_columns = {}
    all_measures = {}
    all_definitions_flat = set()

    # 1. Get all columns and measures from .tmdl files
    all_tmdl_files = get_all_tmdl_files(os.path.join(semantic_model_path, "definition"))
    for tmdl_file in all_tmdl_files:
        table_name, columns, measures = parse_tmdl_definitions(tmdl_file)
        if table_name:
            if columns:
                all_tables_columns[table_name] = columns
                for col in columns:
                    all_definitions_flat.add(f"{table_name}.{col}")
            if measures:
                all_measures[table_name] = measures
                for measure in measures:
                    all_definitions_flat.add(f"{table_name}.{measure}")
        elif measures:
            model_name = "Model"
            if model_name not in all_measures:
                all_measures[model_name] = []
            all_measures[model_name].extend(measures)
            for measure in measures:
                all_definitions_flat.add(f"{model_name}.{measure}")

    # 2. Get references used in relationships (skip if the project has none)
    relationships_file = os.path.join(semantic_model_path, "definition", "relationships.tmdl")
    used_in_relationships = set()
    if os.path.exists(relationships_file):
        used_in_relationships = parse_relationships_tmdl(relationships_file)

    # 3. Get references used in reports (visuals, etc.) and other TMDL files (for DAX expressions)
    all_used_references = set()
    all_used_references.update(used_in_relationships)

    # Search in all TMDL files for DAX expressions referencing other measures/columns
    for tmdl_file in all_tmdl_files:
        with open(tmdl_file, 'r', encoding='utf-8') as f:
            content = f.read()
        for definition in all_definitions_flat:
            if re.search(r'\b' + re.escape(definition) + r'\b', content):
                all_used_references.add(definition)

    # Search in all JSON files in the report for references
    for root, _, files in os.walk(report_path):
        for file in files:
            if file.endswith(".json"):
                json_file_path = os.path.join(root, file)
                try:
                    with open(json_file_path, 'r', encoding='utf-8') as f:
                        json_data = json.load(f)
                    found_refs = find_references_in_json(json_data, all_definitions_flat)
                    all_used_references.update(found_refs)
                except json.JSONDecodeError:
                    pass  # Skip files that are not valid JSON
                except Exception:
                    pass  # Skip files that cannot be read

    # Determine unused columns
    unused_columns_by_table = {}
    for table, columns in all_tables_columns.items():
        unused_columns_by_table[table] = []
        for col in columns:
            full_column_name = f"{table}.{col}"
            if full_column_name not in all_used_references:
                unused_columns_by_table[table].append(col)

    # Determine unused measures
    unused_measures_by_table = {}
    for table, measures in all_measures.items():
        unused_measures_by_table[table] = []
        for measure in measures:
            full_measure_name = f"{table}.{measure}"
            if full_measure_name not in all_used_references:
                unused_measures_by_table[table].append(measure)

    # Prepare results for output
    output_lines = []
    output_lines.append("Unused Definitions in Power BI Model:")
    output_lines.append("-------------------------------------")
    found_unused = False

    output_lines.append("\n--- Unused Columns ---")
    column_found = False
    for table, unused_cols in unused_columns_by_table.items():
        if unused_cols:
            column_found = True
            found_unused = True
            output_lines.append(f"Table: {table}")
            for col in unused_cols:
                output_lines.append(f" - Column: {col}")
    if not column_found:
        output_lines.append("No unused columns found based on this analysis.")

    output_lines.append("\n--- Unused Measures ---")
    measure_found = False
    for table, unused_measures in unused_measures_by_table.items():
        if unused_measures:
            measure_found = True
            found_unused = True
            output_lines.append(f"Table/Model: {table}")
            for measure in unused_measures:
                output_lines.append(f" - Measure: {measure}")
    if not measure_found:
        output_lines.append("No unused measures found based on this analysis.")

    output_lines.append("\n-------------------------------------")
    if not found_unused:
        output_lines.append("Overall: No unused columns or measures found.")

    # Generate dynamic filename
    current_time = datetime.now()
    timestamp_str = current_time.strftime("%d_%m_%y_%H_%M")
    output_filename = f"optimization_review_{timestamp_str}.txt"

    # Define the output directory within the base_folder
    output_dir = os.path.join(base_folder, "MODE_QC")
    os.makedirs(output_dir, exist_ok=True)
    output_file_path = os.path.join(output_dir, output_filename)

    with open(output_file_path, 'w', encoding='utf-8') as f:
        for line in output_lines:
            f.write(line + "\n")

    print(f"Analysis complete. Results saved to: {output_file_path}")


if __name__ == "__main__":
    main()
```
This is the main execution block of the script.
- It sets up the full paths to the semantic model and report folders using the user-defined configuration.
- It then calls the parsing functions to gather all column and measure definitions.
- It identifies used references by checking relationships, DAX expressions within other `.tmdl` files, and references within `.json` report files. Note that DAX typically references columns as `Table[Column]` rather than `Table.Column`, so a column used only inside DAX expressions may still be reported as unused; treat the report as a list of candidates to verify, not items to delete blindly.
- Finally, it compares the defined elements with the used elements to determine what is unused.
- The results are formatted and written to a new, timestamped text file within the `MODE_QC` subfolder of your Power BI project's base directory, so each analysis run creates a unique report.
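As a small worked example of the timestamp logic, a fixed clock value reproduces the filename format shown earlier:

```python
from datetime import datetime

# Fixed datetime instead of datetime.now(), purely for illustration
stamp = datetime(2025, 8, 16, 17, 30).strftime("%d_%m_%y_%H_%M")
print(f"optimization_review_{stamp}.txt")
# optimization_review_16_08_25_17_30.txt
```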