This repository outlines processes and includes tools for interpreting data generated from site crawls to inform content migrations. The tools assist with data migration, data cleaning, and AI-assisted data analysis. The goal is to make data-driven decisions when consolidating and migrating multiple sites.
- Uses `uv` for fast dependency management and isolated execution
- Built with `polars`, `google-genai`, and `python-dotenv`
- Modular script structure runnable via module syntax
- Crawl Analysis: Analyze and extract insights from crawled datasets.
- Deduplication: Remove duplicate items from columns to ensure data integrity.
- JSON Expansion: Expand JSON fields within CSV files into separate columns for easier analysis.
- Header Cleaning: Standardize and clean column headers in tabular data.
- HTML Row Filtering: Filter out unwanted HTML rows from datasets.
- AI Integration: Utilities for making AI calls to enhance or validate data.
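
As a rough illustration of how the cleaning-oriented features above might look in polars, here is a minimal sketch; the input path and column names (`metadata`, `url`) are hypothetical and will differ from a real crawl export.

```python
import polars as pl

# Hypothetical input path and column names, for illustration only.
df = pl.read_csv("data/audit-inputs/sample-crawl.csv")

# JSON expansion: decode a JSON string column into a struct,
# then unnest it into separate columns.
df = df.with_columns(pl.col("metadata").str.json_decode()).unnest("metadata")

# Header cleaning: normalize column names to snake_case.
df = df.rename({c: c.strip().lower().replace(" ", "_") for c in df.columns})

# Deduplication: keep the first row for each URL.
df = df.unique(subset=["url"], keep="first")
```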
A visualization of the project's data processing workflow:
```mermaid
flowchart TB
    subgraph "Input"
        A[Raw Site Crawl CSV]
    end
    subgraph "Step 1: Data Expansion & Cleaning"
        B1[Expand JSON Columns]
        B2[Filter HTML Rows]
        B3[Clean Headers]
        A --> B1 --> B2 --> B3
    end
    subgraph "Step 2: AI-powered Analysis"
        C1[Extract Informational Columns]
        C2[AI Analysis of Site Structure]
        C3[Generate Migration Groups]
        C4[Analyze Sidebar Content]
        B3 --> C1 --> C2 --> C3 --> C4
    end
    subgraph "Step 3: Data Organization & Export"
        D1[Group by Migration Path]
        D2[Generate Statistics]
        D3[Export Individual CSV Groups]
        D4[Create Combined Sorted CSV]
        C4 --> D1 --> D2 --> D3 --> D4
    end
    subgraph "Output Files"
        E1[Expanded CSV]
        E2[Migration Groups JSON]
        E3[Final Analysis JSON]
        E4[Individual Group CSVs]
        E5[All Data Sorted CSV]
        E6[Summary CSV]
        D4 --> E1 & E2 & E3 & E4 & E5 & E6
    end
    classDef inputOutput fill:#f9f,stroke:#333,stroke-width:2px
    classDef process fill:#bbf,stroke:#333,stroke-width:1px
    classDef aiProcess fill:#bfb,stroke:#333,stroke-width:1px
    class A,E1,E2,E3,E4,E5,E6 inputOutput
    class B1,B2,B3,D1,D2,D3,D4 process
    class C1,C2,C3,C4 aiProcess
```
- Python 3.13+
- `uv` (a modern Python package manager)

Install `uv` (if you don't already have it):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
# or
pipx install uv
# or
pip install uv
# or
brew install uv
```

For more options, review the documentation for installing uv.
- Clone the repository:

  ```bash
  git clone https://github.com/civicactions/ai-migrations.git
  cd ai-migrations
  ```
- Install dependencies:
  - Install the required Python version in a virtual environment:

    ```bash
    uv venv --python 3.13.0
    ```

  - Install the other dependencies directly from `pyproject.toml`:

    ```bash
    uv pip install -r pyproject.toml
    ```
There are two options for running the crawl analysis: as an app in the browser or with Python scripts on the command line.
The app provides a visual interface for uploading a CSV file generated from a site crawl. It runs the analysis and provides CSV file downloads of the analyzed and grouped URLs (a rough sketch of this upload/download flow follows the run steps below).
Local development: in the command line, run the following:
- Initialize packages:

  ```bash
  uv sync
  ```

- Start the app:

  ```bash
  python -m streamlit run ai_crawl_analysis/streamlit_app.py
  # or
  uv run -m streamlit run ai_crawl_analysis/streamlit_app.py
  ```
This will open the app at http://localhost:8501/. Upload a CSV file to the upload field to start the analysis.
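
For context, the core upload-and-download flow of a Streamlit app like this reduces to a few calls. The sketch below is illustrative only and does not mirror the actual streamlit_app.py.

```python
import streamlit as st
import polars as pl

st.title("AI Crawl Analysis")

# Accept a crawl CSV from the user.
uploaded = st.file_uploader("Upload a site crawl CSV", type="csv")

if uploaded is not None:
    df = pl.read_csv(uploaded)
    st.write(f"Loaded {df.height} rows")

    # ... run the analysis pipeline here ...

    # Offer the processed data back as a CSV download.
    st.download_button(
        label="Download analyzed CSV",
        data=df.write_csv(),
        file_name="analyzed.csv",
        mime="text/csv",
    )
```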
Cloud environment URL: TBD; this will be updated once the app is deployed. When it is deployed to the cloud, upload a CSV file to start the analysis.
The Python scripts provide a more granular method for executing the analysis. You can run all the steps or run individual steps for better control and debugging.
The processing scripts are structured as modules. You can:

- Run them using `uv run` or standard Python module syntax:

  ```bash
  uv run -m ai_crawl_analysis.main [path_to_crawl_file]  # e.g. data/audit-inputs/sample-seed-fund.csv
  ```

- Run individual scripts with these commands:

  ```bash
  uv run -m ai_crawl_analysis.expand_json_csv
  uv run -m ai_crawl_analysis.deduplicate_column_items
  uv run -m ai_crawl_analysis.crawl_analysis
  ```

- OR run them using Python directly: activate the virtual environment, then run:

  ```bash
  python -m ai_crawl_analysis.main
  ```
The crawl_analysis script requires an API_KEY environment variable. Edit the env.example file at the root of the project to add your AI API key.
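
As a minimal sketch of how such a key is typically loaded and used (assuming python-dotenv reads it from a local `.env` file; the model name and prompt are placeholders, not the project's actual calls):

```python
import os

from dotenv import load_dotenv
from google import genai

load_dotenv()  # pick up API_KEY from a local .env file

client = genai.Client(api_key=os.environ["API_KEY"])

# Placeholder model and prompt; the real analysis prompts live in the scripts.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Suggest migration groups for these page titles: ...",
)
print(response.text)
```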
MIT License