A simple script to test how well AI models can fix software bugs by generating git patches.
This script tests AI models on real software bugs from the SWE-Bench Lite dataset. It:
- Takes bug descriptions from real projects
- Asks an AI model to generate a fix (as a git patch)
- Checks whether the generated patch can actually be applied to the code
- Reports how well the model performed (a sketch of this pipeline follows below)
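At its core, the workflow looks roughly like the following. This is a minimal sketch, assuming the `princeton-nlp/SWE-bench_Lite` dataset ID on Hugging Face and Ollama's standard `/api/generate` endpoint; the prompt wording and response parsing in `script.py` will differ:

```python
# Minimal sketch of the pipeline: fetch a bug report, then ask a local
# Ollama model to produce a fix formatted as a unified diff.
import requests
from datasets import load_dataset

def generate_patch(problem: str, model: str = "qwen2.5-coder:7b") -> str:
    """Ask Ollama for a git patch fixing the described bug."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "Fix this bug and reply with a git patch (unified diff) only:\n\n" + problem,
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

dataset = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
print(generate_patch(dataset[0]["problem_statement"])[:400])
```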
Install the Python dependencies:

```bash
pip install requests datasets tqdm
```

Install Ollama if you haven't already (download from https://ollama.ai/), then start the service and pull a coding model:

```bash
# Start the Ollama service
ollama serve

# Pull a coding model
ollama pull qwen2.5-coder:7b
```

Run the evaluation:

```bash
# Test with 3 bug fixes
python3 script.py --instances 3 --batch-size 1

# Test with 10 bug fixes
python3 script.py --instances 10 --batch-size 2

# Use a different model
python3 script.py --instances 5 --batch-size 1 --model codellama:13b
```

The script will show:
- Progress bars for each bug being processed
- Whether each patch was successfully applied
- A summary at the end with success rates (the sketch below shows roughly how that tally is computed)
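For orientation, the per-run bookkeeping amounts to something like this; `try_apply` is a hypothetical callback standing in for the script's patch-application check, and the real script tracks more detail:

```python
# Illustrative reporting loop: tally how many generated patches applied
# cleanly and print a summary like the one shown further below.
import time
from tqdm import tqdm

def run_eval(instances: list, try_apply) -> None:
    applied, times = 0, []
    for inst in tqdm(instances, desc="Evaluating"):
        start = time.time()
        ok = try_apply(inst)           # True if the patch applied cleanly
        times.append(time.time() - start)
        applied += ok
    total = len(instances)             # assumed non-empty
    print("=" * 50)
    print("EVALUATION SUMMARY")
    print("=" * 50)
    print(f"Total instances processed: {total}")
    print(f"Patches successfully applied: {applied} ({applied / total:.1%})")
    print(f"Patches failed to apply: {total - applied} ({(total - applied) / total:.1%})")
    print(f"Average processing time: {sum(times) / total:.2f} seconds")
```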
The script writes two output files:

- `swebench_results.json`: Detailed results for each bug (an example record is sketched below)
- `swebench_eval.log`: Log file with technical details
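Each entry in `swebench_results.json` looks roughly like the record below; the field names here are illustrative guesses, so check the file itself for the actual schema:

```python
# One illustrative record (field names are guesses, not the script's schema).
example_record = {
    "instance_id": "astropy__astropy-12907",  # SWE-Bench Lite instance name
    "model": "qwen2.5-coder:7b",
    "patch_applied": True,                    # did the patch apply cleanly?
    "processing_time_seconds": 41.2,
    "error": None,                            # error message if it failed
}
```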
The final summary looks like this:

```text
==================================================
EVALUATION SUMMARY
==================================================
Total instances processed: 3
Patches successfully applied: 1 (33.3%)
Patches failed to apply: 2 (66.7%)
Average processing time: 39.58 seconds
==================================================
```
If the script can't reach the model:

- Make sure Ollama is running: `ollama serve`
- Check that the service is responding: `curl http://localhost:11434/api/tags` (or use the Python check below)
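The same check can be scripted; a small sketch against Ollama's `/api/tags` endpoint:

```python
# Quick connectivity check against a local Ollama server.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is up; available models:", models)
except requests.RequestException as exc:
    print("Ollama is not reachable:", exc)
```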
If a patch fails to apply:

- This usually means the model generated an incomplete or malformed patch
- Try using a stronger model or adjusting the prompt
- Check the log file (`swebench_eval.log`) for specific error details, and see the dry-run sketch below to inspect git's complaint directly
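The apply check boils down to `git apply --check`; here is a sketch you can run yourself to see exactly why git rejects a patch (the script's internal version may differ):

```python
# Dry-run a patch against a checkout with `git apply --check`.
import subprocess

def patch_applies(patch_text: str, repo_dir: str) -> bool:
    """Return True if the patch would apply cleanly; print git's complaint otherwise."""
    result = subprocess.run(
        ["git", "apply", "--check", "-"],  # "-" reads the patch from stdin
        input=patch_text,
        text=True,
        capture_output=True,
        cwd=repo_dir,
    )
    if result.returncode != 0:
        print(result.stderr)               # e.g. "corrupt patch at line 12"
    return result.returncode == 0
```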
```bash
# Use a different Ollama model
python3 script.py --model deepseek-coder:6.7b

# Use a different Ollama server
python3 script.py --ollama-url http://192.168.1.100:11434

# Process more bugs at once (faster but uses more memory)
python3 script.py --instances 50 --batch-size 10

# Process one at a time (slower but more reliable)
python3 script.py --instances 10 --batch-size 1

# Save results to a different file
python3 script.py --output my_results.json

# Use a different working directory
python3 script.py --work-dir ./my_work_folder
```
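All of these flags map onto a conventional argparse setup. A sketch of how the CLI is likely wired; the defaults shown are assumptions, not necessarily the script's actual values:

```python
# Sketch of a CLI matching the flags documented above (defaults are guesses).
import argparse

parser = argparse.ArgumentParser(description="Evaluate an Ollama model on SWE-Bench Lite")
parser.add_argument("--instances", type=int, default=3, help="number of bugs to evaluate")
parser.add_argument("--batch-size", type=int, default=1, help="instances processed per batch")
parser.add_argument("--model", default="qwen2.5-coder:7b", help="Ollama model name")
parser.add_argument("--ollama-url", default="http://localhost:11434", help="Ollama server URL")
parser.add_argument("--output", default="swebench_results.json", help="results file path")
parser.add_argument("--work-dir", default="./work", help="working directory for checkouts")
args = parser.parse_args()
```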