@@ -21,15 +21,16 @@ llm-stylometry/
2121│ ├── utils/ # Helper utilities
2222│ ├── visualization/ # Plotting and visualization
2323│ └── cli_utils.py # CLI helper functions
24- ├── code/ # Original analysis scripts
25- │ ├── main.py # Model training script
24+ ├── code/ # Training and CLI scripts
25+ │ ├── generate_figures.py # Main CLI entry point
26+ │ ├── consolidate_model_results.py # Result consolidation
27+ │ ├── main.py # Model training orchestration
2628│ ├── clean.py # Data preprocessing
27- │ └── ... # Various analysis scripts
29+ │ └── ... # Supporting training modules
2830├── data/ # Datasets and results
2931│ ├── raw/ # Original texts from Project Gutenberg
3032│ ├── cleaned/ # Preprocessed texts by author
31- │ ├── model_results.pkl # Consolidated model training results
32- │ └── model_results.csv # Model results in CSV format
33+ │ └── model_results.pkl # Consolidated model training results
3334├── models/ # Trained models (80 total)
3435│ └── {author}_tokenizer=gpt2_seed={0-9}/
3536├── paper/ # LaTeX paper and figures
@@ -40,7 +41,6 @@ llm-stylometry/
4041│ ├── data/ # Test data and fixtures
4142│ ├── test_*.py # Test modules
4243│ └── check_outputs.py # Output validation script
43- ├── generate_figures.py # Main CLI entry point
4444├── run_llm_stylometry.sh # Shell wrapper for easy setup
4545├── LICENSE # MIT License
4646├── README.md # This file
@@ -168,16 +168,17 @@ fig = generate_all_losses_figure(
168168│ **Note**: Training requires a CUDA-enabled GPU and takes significant time (~80 models total).
169169│
170170│ ```bash
171- # Using the CLI (recommended)
171+ # Using the CLI (recommended - handles all steps automatically)
172172│ ./run_llm_stylometry.sh --train
173-
174- # Or manually
175- conda activate llm-stylometry
176- python code/clean.py # Clean data
177- python code/main.py # Train models
178- python consolidate_model_results.py # Consolidate results
179173│ ```
180174│
175+ This command will:
176+ 1. Clean and prepare the data if needed
177+ 2. Train all 80 models (8 authors × 10 seeds)
178+ 3. Consolidate results into `data/model_results.pkl`
179+
180+ The training pipeline automatically handles data preparation, model training across available GPUs, and result consolidation. Individual model checkpoints and loss logs are saved in the `models/` directory.
181+
181182│ ### Model Configuration
182183│
183184│ Each model uses: