An advanced AI-powered code review system that detects AI-generated code patterns and provides intelligent suggestions for improvement. The system combines several analysis components for comprehensive coverage: SBERT sentence embeddings, spaCy NLP, local LLM transformers, and the Wikipedia API.
- SBERT Semantic Analysis: Uses sentence transformers to detect semantic similarity between comments and known AI-generated patterns
- spaCy NLP Analysis: Advanced natural language processing to identify generic language and lack of technical specificity
- Local LLM Classification: Uses Hugging Face transformers for text classification of code comments
- Knowledge Base Verification: Validates code against programming concepts and best practices
- Wikipedia Concept Validation: Verifies technical terms against Wikipedia's knowledge base
- Edit History Analysis: Tracks code changes to detect large blocks added at once (typical of AI generation)
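As a concrete illustration of the edit-history check, a minimal heuristic could flag any single insertion above a line-count threshold. The `edit` fields below mirror the `edit_history` schema in the API section; the threshold value is an illustrative assumption, not the backend's actual setting:

```python
def flag_bulk_insertions(edit_history, max_lines=30):
    """Flag edits that insert a large block in one operation, a pattern
    often associated with pasted or AI-generated code.

    The 30-line threshold is an illustrative assumption.
    """
    suspicious = []
    for edit in edit_history:
        if edit.get("type") == "insert":
            inserted_lines = edit.get("text", "").count("\n") + 1
            if inserted_lines > max_lines:
                suspicious.append(edit)
    return suspicious
```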
Among other signals, the detector flags:

- Placeholder comments (TODO, FIXME, etc.)
- Generic variable names
- Lack of descriptive comments
- AI generation markers
- Overly generic comments
- Multi-language support (Python, JavaScript, TypeScript, C++)
- Real-time analysis and suggestions
- Automated fixes and optimizations
- Code quality scoring
- Seamless VS Code extension integration
- Real-time diagnostics and hover information
- Code action suggestions
- Feedback collection and dashboard
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Spurhacks
  ```

- Install Python dependencies:

  ```bash
  cd Backend
  pip install -r requirements.txt
  ```
- Install the AI model dependencies (optional but recommended):

  ```bash
  python setup_ai_models.py
  ```

  This will install:

  - SBERT (Sentence Transformers)
  - spaCy with the English language model
  - Hugging Face Transformers
  - Wikipedia API
  - scikit-learn
- Start the backend server:

  ```bash
  python run_server.py
  ```

  The server will be available at http://localhost:8000.
- Install Node.js dependencies:

  ```bash
  cd Frontend
  npm install
  ```

- Compile the VS Code extension:

  ```bash
  npm run compile
  ```
- Install the extension in VS Code:
  - Open VS Code
  - Go to Extensions (Ctrl+Shift+X)
  - Click "Install from VSIX..."
  - Select the compiled extension
The backend can be configured through environment variables:
```bash
# API Configuration
API_KEY=your_api_key_here
ANALYSIS_TIMEOUT=30

# AI Model Configuration
ENABLE_SBERT=true
ENABLE_SPACY=true
ENABLE_TRANSFORMERS=true
ENABLE_WIKIPEDIA=true
```
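Inside the backend these variables would typically be read once at startup. A minimal sketch of that pattern follows; the `env_flag` helper is illustrative, not the project's actual code:

```python
import os

def env_flag(name: str, default: bool = True) -> bool:
    """Interpret values like ENABLE_SBERT=true as booleans."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}

API_KEY = os.getenv("API_KEY", "")
ANALYSIS_TIMEOUT = int(os.getenv("ANALYSIS_TIMEOUT", "30"))
ENABLE_SBERT = env_flag("ENABLE_SBERT")
ENABLE_SPACY = env_flag("ENABLE_SPACY")
ENABLE_TRANSFORMERS = env_flag("ENABLE_TRANSFORMERS")
ENABLE_WIKIPEDIA = env_flag("ENABLE_WIKIPEDIA")
```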
Configure the VS Code extension in `settings.json`:
```json
{
  "aiCodeReviewer.apiUrl": "http://localhost:8000/api/v1",
  "aiCodeReviewer.enableRealTimeAnalysis": true,
  "aiCodeReviewer.autoDetectAI": true
}
```
- Compares code comments with known AI-generated patterns
- Uses cosine similarity to detect semantic matches
- Provides confidence scores for detected patterns
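A minimal sketch of this SBERT comparison using the `sentence-transformers` library; the model name and pattern list are illustrative assumptions, not the backend's actual configuration:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative patterns; the real list lives in the backend's knowledge base.
AI_PATTERNS = [
    "This function does something",
    "Add logic here",
    "TODO: implement this",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed lightweight SBERT model
pattern_embeddings = model.encode(AI_PATTERNS, convert_to_tensor=True)

def semantic_score(comment: str) -> float:
    """Return the highest cosine similarity between a comment and known patterns."""
    comment_embedding = model.encode(comment, convert_to_tensor=True)
    return float(util.cos_sim(comment_embedding, pattern_embeddings).max())
```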
- Identifies overly generic language in comments
- Detects lack of technical terms and specificity
- Analyzes comment quality and structure
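One way to approximate "generic language" with spaCy is to measure how few of a comment's content words are technical terms. A sketch with an illustrative vocabulary (the backend's actual term list is not shown here):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative technical vocabulary; the backend's list is larger.
TECHNICAL_TERMS = {"cache", "mutex", "recursion", "endpoint", "serializer", "index"}

def genericity_score(comment: str) -> float:
    """Fraction of content words that are not technical terms (higher = more generic)."""
    doc = nlp(comment)
    content_words = [t.lemma_.lower() for t in doc if t.pos_ in {"NOUN", "VERB", "ADJ"}]
    if not content_words:
        return 1.0
    technical = sum(1 for word in content_words if word in TECHNICAL_TERMS)
    return 1.0 - technical / len(content_words)
```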
- Uses DistilBERT for text classification
- Analyzes comment patterns for AI generation indicators
- Provides probability scores for AI-generated content
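The classification plumbing with Hugging Face's `pipeline` API might look like the sketch below; the checkpoint shown is a stock DistilBERT fine-tune standing in for whatever classifier the backend actually loads:

```python
from transformers import pipeline

# Stand-in checkpoint; the backend presumably loads its own fine-tuned model.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def llm_score(comment: str) -> float:
    """Return the classifier's confidence in its predicted label (0-1)."""
    result = classifier(comment)[0]  # e.g. {"label": "POSITIVE", "score": 0.97}
    return result["score"]
```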
- Validates function and class names against programming best practices
- Checks for meaningful naming conventions
- Suggests improvements for generic names
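A naming check along these lines could be as simple as a deny-list plus a length heuristic; both the list and the thresholds below are illustrative assumptions:

```python
# Illustrative deny-list; the backend's knowledge base is richer than this.
GENERIC_NAMES = {"data", "result", "temp", "value", "x", "y", "foo", "process"}

def check_identifier(name: str) -> list[str]:
    """Return suggestions when an identifier looks too generic to convey intent."""
    issues = []
    if name.lower() in GENERIC_NAMES:
        issues.append(f"'{name}' is generic; name it after the value's role instead")
    elif len(name) <= 2 and name not in {"i", "j", "k"}:  # loop counters get a pass
        issues.append(f"'{name}' is too short to convey meaning")
    return issues
```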
- Verifies technical terms against Wikipedia's knowledge base
- Ensures proper use of programming terminology
- Identifies potentially incorrect or non-standard terms
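Term verification could be a thin wrapper over the `wikipedia` package (assumed to be the dependency installed by `setup_ai_models.py`); treating lookup failures as "unverified" keeps the check non-fatal:

```python
import wikipedia

def term_exists(term: str) -> bool:
    """Check whether a technical term matches a Wikipedia article title."""
    try:
        results = wikipedia.search(term, results=3)
        return any(term.lower() in title.lower() for title in results)
    except wikipedia.exceptions.WikipediaException:
        return False  # treat API failures as "unverified", not as errors
```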
Run the comprehensive test suite:
```bash
cd Backend
python test_enhanced_ai_detection.py
```
This will test:
- SBERT semantic similarity analysis
- spaCy NLP analysis
- Local LLM classification
- Knowledge base verification
- Wikipedia concept validation
- Edit history analysis
To run the basic AI detection tests:

```bash
cd Backend
python test_ai_detection.py
```
```bash
# Health check
curl http://localhost:8000/health

# Analyze code
curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{"code": "def hello(): return \"world\"", "language": "python"}'

# Detect AI-generated code
curl -X POST http://localhost:8000/api/v1/detect-ai \
  -H "Content-Type: application/json" \
  -d '{"code": "def x(): return y", "language": "python"}'
```
```python
# AI-generated code (will be detected)
def process_data(data):
    # TODO: Implement data processing
    # This function does something
    result = []
    for item in data:
        # Add logic here
        result.append(item * 2)
    return result
```
```python
# Well-written code (low AI detection)
def calculate_discount_price(original_price: float, discount_percentage: float) -> float:
    """
    Calculate the final price after applying a discount.

    Args:
        original_price: The original price of the item
        discount_percentage: The discount percentage (0-100)

    Returns:
        The final price after discount

    Raises:
        ValueError: If discount_percentage is not between 0 and 100
    """
    if not 0 <= discount_percentage <= 100:
        raise ValueError("Discount percentage must be between 0 and 100")

    discount_amount = original_price * (discount_percentage / 100)
    final_price = original_price - discount_amount
    return round(final_price, 2)
```
```javascript
// AI-generated code (will be detected)
function processUserData(userData) {
    // This function processes user data
    let x = [];
    let y = {};
    // TODO: Add validation
    for (let i = 0; i < userData.length; i++) {
        x.push(userData[i]);
    }
    return x;
}
```
```javascript
// Well-written code (low AI detection)
/**
 * Calculate the total price including tax for a list of items.
 *
 * @param {Array} items - Array of items with price properties
 * @param {number} taxRate - Tax rate as a decimal (e.g., 0.08 for 8%)
 * @returns {number} Total price including tax
 */
function calculateTotalPrice(items, taxRate) {
    const subtotal = items.reduce((sum, item) => sum + item.price, 0);
    const taxAmount = subtotal * taxRate;
    return subtotal + taxAmount;
}
```
```http
POST /api/v1/analyze
Content-Type: application/json

{
  "code": "your code here",
  "language": "python|javascript|typescript|cpp",
  "edit_history": [
    {
      "type": "insert|delete|replace",
      "text": "code text",
      "line": 1,
      "timestamp": 1234567890
    }
  ]
}
```
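The same request from Python, as a minimal sketch using `requests` (the payload mirrors the schema above; the response schema depends on the backend):

```python
import requests

response = requests.post(
    "http://localhost:8000/api/v1/analyze",
    json={
        "code": 'def hello(): return "world"',
        "language": "python",
        "edit_history": [],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```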
```http
POST /api/v1/detect-ai
Content-Type: application/json

{
  "code": "your code here",
  "language": "python|javascript|typescript|cpp",
  "edit_history": []
}
```
```http
POST /api/v1/optimize
Content-Type: application/json

{
  "code": "your code here",
  "language": "python|javascript|typescript|cpp"
}
```
The enhanced AI detector provides detailed scores from multiple models:
- Semantic Score: SBERT similarity analysis (0-100%)
- NLP Score: spaCy language analysis (0-100%)
- LLM Score: Local transformer classification (0-100%)
- Knowledge Score: Programming concept validation (0-100%)
- Wiki Score: Wikipedia term verification (0-100%)
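How these per-model scores combine into a final verdict is not specified here; an equal-weight average, shown purely as an illustration, would look like:

```python
def combined_ai_probability(scores: dict[str, float]) -> float:
    """Average per-model scores (each 0-100) into a single 0-100 value.

    Equal weights are an illustrative assumption; the detector may weight
    models differently or use a learned combiner.
    """
    return sum(scores.values()) / len(scores)

combined_ai_probability(
    {"semantic": 82.0, "nlp": 64.0, "llm": 71.0, "knowledge": 40.0, "wiki": 55.0}
)  # -> 62.4
```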
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Sentence Transformers for semantic similarity analysis
- spaCy for natural language processing
- Hugging Face Transformers for local LLM capabilities
- Wikipedia API for concept validation
- FastAPI for the backend API
- VS Code Extension API for the frontend integration