This module contains a comprehensive collection of utility functions designed to simplify common tasks related to data handling, file operations, statistical comparisons, Excel manipulations, and geospatial processing. Below is an overview of the functions available in the module.
To use this module in your Python project, import the functions you need. Example:
from general_utilities import (
addPrefixesToColumnNames,
convertToDatetime,
compareValueRangesMathematically,
saveToExcel
)
# Example usage:
dataframe = pandas.read_csv("example.csv")
dataframe = addPrefixesToColumnNames(dataframe, column_names=["name", "age"], prefixes="demo_")
date = convertToDatetime("2024-04-20")
scores = compareValueRangesMathematically([1, 2, 3], [1.1, 2.1, 3.2])
saveToExcel(dataframe, "output.xlsx", "Processed Data")
You can also import the entire module if needed:
import general_utilities
Or use an alias for convenience:
import general_utilities as GU
dataframe = GU.addPrefixesToColumnNames(dataframe, column_names=["id", "score"], prefixes="meta_")
addPrefixesToColumnNames(dataframe, column_names=None, prefixes="df")
: Adds specified prefixes to given column names.addSuffixesToColumnNames(dataframe, column_names=None, suffixes="df")
: Adds specified suffixes to given column names.reduceColumns(dataframe, columns_to_keep=None, columns_to_drop=None)
: Reduces the DataFrame to desired columns.reorderColumnsOfDataFrame(dataframe, starter_columns, column_before_starters=None)
: Reorders columns placing certain ones at the start or after a specified column.replaceColumnNameOfDataFrame(dataframe, old_substrings, new_substrings)
: Renames columns by replacing substrings.categorizeColumnsByType(dataframe, keys=[], ignoring_columns=[])
: Categorizes columns into string or numeric types excluding keys and ignored columns.
highlightDataFrameMissingValues(dataframe)
: Highlights missing values in yellow.convertAllStringNumericsToNumeric(dataframe, ignoring_columns=[])
: Converts all string-based numerics to actual numeric values.convertDataFrameStringNumericToNumeric(dataframe, numeric_columns=None, ignoring_columns=None)
: Converts specific columns from string to numeric.removeDuplicateData(dataframe, ignoring_columns=[])
: Removes duplicate rows and duplicate columns.removeEmptyRows(dataframe, columns_to_check)
: Removes rows that have all NaNs in specified columns.
expandColumns(dataframe, source_columns, destination_columns, string_separators, remove_source_columns=False)
: Splits columns into multiple based on delimiters.splitDataFrameHorizontally(dataframe, common_columns, columns_to_split)
: Splits horizontally into multiple DataFrames.splitDataFrameVertically(dataframe, grouping_columns)
: Splits vertically based on groupings.splitDataFrameVerticallyIntoExcelFiles(dataframe, grouping_columns, save_folder, file_name_prefix, data_value_as_file_name)
: Saves vertical splits as separate Excel files.
replaceSubstringsInDataFrame(dataframe, column_names, old_substrings, new_substrings)
: Replaces multiple substrings with new ones in specified columns.roundCoordinates(dataframe, coordinate_columns, precision_digits)
: Rounds coordinates to specified precision.uniqueValuesCount(values)
: Counts unique non-null values.
calculateR2(observed, predicted)
: Calculates R-squared.calculateRMSD(observed, predicted)
: Calculates Root Mean Square Deviation.calculateDegreeOfAgreement(observed, predicted)
: Computes Willmott’s degree of agreement.
compareValueRangesMathematically(first_list, second_list)
: Compares two sets using R², RMSD, and agreement.compareByBiasCorrection(observed, predicted)
: Applies bias correction to predictions.compareKDE(observed, predicted)
: Compares Kernel Density Estimations.compareRangesDifferenceByQuantiles(observed, predicted, quantiles)
: Quantile comparison.compareRangesDifferenceByKLDivergence(first_list, second_list)
: KL Divergence comparison.compareRangesDifferenceByKSTest(first_list, second_list)
: KS Test comparison.compareRangesDifferenceByMannWhitney(first_list, second_list)
: Mann-Whitney U test comparison.getNormalRangesDifference(first_list, second_list)
: Difference by range width.getVariantRangesDifference(first_list, second_list, acceptable_percent)
: Combines multiple statistical differences.
convertToDatetime(value, source_format=None)
: Converts a value to datetime.changeDateTimeFormatInDataFrame(dataframe, column_names, new_formats)
: Changes date format of specified columns.insertDateByTimestampIntoDataFrame(dataframe, timestamp_column, date_column_name)
: Inserts date column from timestamp.insertYearByTimestampIntoDataFrame(dataframe, timestamp_column, year_column_name)
: Extracts year from timestamp.addNewDateColumnByDateRangesToDataFrame(dataframe, column_name, date_ranges, new_date_column_name, new_date_format)
: Creates new date columns based on date range.
convertShapeFileDataToDataFrame(file_path, projection_system)
: Converts shapefile to DataFrame.extractCSVDataIntoDataFrame(file_path, file_proj, dest_proj)
: Extracts CSV and transforms projection.getDominantProjectionSystem(shape_files_path)
: Finds most common CRS in shape files.getDominantProjectionSystemOfCSVFiles(csv_files_path)
: Finds common projection in CSV files.addLatAndLongColumnsToDataframe(dataframe, location_column, lat_column, lon_column, remove_location)
: Splits location string into lat/lon columns.
makeExcelFileColumnsWidthAutoFit(workbook)
: Adjusts the width of all columns in all sheets of a workbook to fit their content.makeExcelFileRowsHeightAutoFit(workbook)
: Adjusts the height of all rows in all sheets of a workbook to fit their content.makeExcelFileColumnsWidthAutoFit(workbook)
: Adjusts the width of all columns in all sheets of a workbook to fit their content.addDataFrameAsSheetToExcel(dataframe, title, file_path)
: Adds a DataFrame as a new sheet.saveToExcel(dataframe, save_path, file_subject)
: Saves to Excel with formatting.saveDataFramesInExcel(dataframes, sheet_names, save_path, file_subject, freeze_position)
: Saves multiple DataFrames to one Excel.makeExcelFileAutoFitWithFrozenPane(file_path, file_subject)
: Applies autofit and freeze to all sheets.makeColumnsWidthAutoFit(worksheet)
: Adjusts column width.makeRowsHeightAutoFit(worksheet)
: Adjusts row height.freezePaneInExcelFile(workbook, freeze_position)
: Freezes pane at given cell.removeSheetsFromExcelFile(file_path, sheet_names)
: Removes specified sheets.
extractDataFrame(file_path, sheet_names, ignored_sheets, headers_row_index, first_data_row, include_file_path)
: Reads data.prepareDataFrame(dataframe, file_path, headers_row_index, first_data_row, include_file_path)
: Cleans and standardizes headers.
findCSVFilesBySubstring(folder_path, file_name_substring)
: Locates CSVs by name.findCSVFilesInFolder(folder_path, subdirectories, check_projection_system)
: Locates CSVs and checks projection.findShapeFilesInFolder(folder_path, subdirectories)
: Locates shapefiles.findXLSXFilesBySubstring(folder_path, file_name_substring)
: Locates XLSX files by name.
mergeDataFramesHorizontallyOnCommonColumns(dataframes, data_frames_names)
: Merge on common columns.mergeDataFramesHorizontallyOnSpecificColumns(dataframes, data_frames_names, merging_columns)
: Merge using specific columns.mergeDataFramesVertically(dataframes, type_names, type_column, insert_index)
: Vertical merge with new label column.mergeSheetsHorizontallyOnSpecificColumns(file_path, merging_columns, selected_sheets)
: Merge Excel sheets horizontally.mergeSheetsVertically(file_path, selected_sheets, column_name_for_sheet_titles, sheet_titles)
: Merge Excel sheets vertically.
generateCombinations(items, max_count)
: Generates combinations of items.fillDataFrameByAnotherDataFrame(source_dataframe, destination_dataframe, source_columns, destination_columns)
: Fills columns in one DataFrame using another.isNumber(value)
: Checks if value is a number.isFileOpen(file_path)
: Checks if file is locked.checkFileIsClosedBeforeSave(save_path)
: Shows warning until file is closed.evaluateAndSplitLocation(location)
: Splits location string into latitude and longitude.normalizeDataFrame(dataframe, keys, ignoring_columns, variance_check)
: Normalizes DataFrame by removing redundant columns.makeAverageOnDataframe(dataframe, keys, check_numerics, fill_missing_values)
: Averages values across groups.