A modular machine learning system for financial market directional prediction with Streamlit interface and benchmark strategies.
This project implements machine learning models to predict the direction (up/down) of financial markets with a modular architecture. The system supports multiple asset classes and provides both CLI and web interfaces for evaluation against benchmark strategies.
- π€ Multiple ML Models: Random Forest, MLP (Multi-Layer Perceptron)
- π― Benchmark Strategies: Comparison strategies (Bullish, Bearish, Random, Frequency, Momentum, Mean Reversion)
- οΏ½ Multi-Asset Support: Stocks, Crypto, Commodities
- π Interactive Interface: 3-step Streamlit application with technical indicators
- π Comprehensive Evaluation: Complete metrics and performance analysis
- οΏ½ Stock Indices: S&P 500, NASDAQ, Dow Jones
- π’ Individual Stocks: AAPL, MSFT, GOOGL, TSLA, NVDA, META, AMZN
- βΏ Cryptocurrencies: BTC, ETH, SOL
- π₯ Commodities: Gold, Silver, Crude Oil
# Clone the repository
git clone https://github.com/yaaks7/ml-trading.git
cd ml-trading
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
streamlit run streamlit_app/app.py
β‘οΈ Open http://localhost:8501 in your browser
# Default analysis (S&P 500, 2023)
python main.py
# Custom asset and strategies
python main.py --asset BTC-USD --start-date 2023-01-01 --end-date 2024-12-31 --strategies bullish,random
# All strategies with verbose output
python main.py --asset ^GSPC --strategies all --verbose
# Run system test
python main.py --test
The web interface follows a logical, pedagogical approach with an intuitive three-step workflow:
Configure your analysis setup and visualize technical indicators before model training.
Data configuration showing asset selection, date range, and technical indicator setup
- Select Asset: Choose from S&P 500, BTC-USD, etc.
- Define Period: Start and end dates for analysis
- Configure Technical Indicators:
- Moving Averages (MA 5, 10, 20, 50, 100, 200)
- RSI (Relative Strength Index)
- MACD (Moving Average Convergence Divergence)
- Bollinger Bands
- Load & Process Data: View price charts and generated features
Configure your ML models and benchmark strategies for comprehensive evaluation.
Model configuration panel with ML models and benchmark strategy selection
- Select ML Models: Random Forest, MLP
- Choose Benchmarks: Comparison strategies for rigorous evaluation
- Configure Parameters: Model hyperparameters and training settings
- Train Models: Automated training process with progress tracking
Analyze model performance with comprehensive metrics and visualizations.
Complete results with performance metrics, model comparison, and detailed analysis
- Performance Metrics: Accuracy, Precision, Recall, F1-Score
- Model Comparison: Ranking and detailed analysis
- Visualizations: Confusion matrices, feature importance
- Export Results: Download reports in CSV/JSON format
python main.py [OPTIONS]
Options:
--asset {^GSPC,BTC-USD,AAPL,...} Asset to analyze (default: ^GSPC)
--start-date YYYY-MM-DD Start date (default: 2020-01-01)
--end-date YYYY-MM-DD End date (default: 2024-12-31)
--strategies LIST Comma-separated strategies or "all"
--test Run quick test with synthetic data
--verbose, -v Enable detailed logging
from src.data.fetcher import DataFetcher
from src.models.ml_models import get_all_ml_models
from src.strategies.naive import get_all_naive_strategies
from config.settings import DataConfig
# Configure and fetch data
config = DataConfig()
config.start_date = "2023-01-01"
config.end_date = "2024-12-31"
fetcher = DataFetcher(config)
X, y = fetcher.process_symbol("^GSPC")
# Split data
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]
# Train and evaluate models
ml_models = get_all_ml_models()
for name, model_class in ml_models.items():
model = model_class()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"{name}: {accuracy:.3f}")
Symbol | Name | Description |
---|---|---|
^GSPC | S&P 500 | US large-cap index |
^IXIC | NASDAQ Composite | Tech-heavy index |
^DJI | Dow Jones | 30 large US companies |
Symbol | Name | Sector |
---|---|---|
AAPL | Apple Inc. | Technology |
MSFT | Microsoft Corporation | Technology |
GOOGL | Alphabet Inc. | Technology |
TSLA | Tesla Inc. | Automotive |
NVDA | NVIDIA Corporation | Semiconductors |
META | Meta Platforms Inc. | Social Media |
AMZN | Amazon.com Inc. | E-commerce |
Symbol | Name | Market Cap Rank |
---|---|---|
BTC-USD | Bitcoin | #1 |
ETH-USD | Ethereum | #2 |
SOL-USD | Solana | Top 10 |
Symbol | Name | Category |
---|---|---|
GC=F | Gold Futures | Precious Metals |
CL=F | Crude Oil Futures | Energy |
Basic Features
- OHLCV prices (Open, High, Low, Close, Volume)
- Returns (daily price changes)
- Log returns for volatility analysis
Technical Indicators
- Moving Averages: 5, 10, 20, 50, 100, 200 periods
- RSI: Relative Strength Index (14 periods)
- MACD: Moving Average Convergence Divergence
- Bollinger Bands: Volatility bands
Derived Features
- Price/MA ratios for trend analysis
- Multi-horizon trends (2, 5, 10, 20 days)
- Momentum indicators
- Volatility measures
- Random Forest: Ensemble of decision trees with feature importance analysis
- MLP (Multi-Layer Perceptron): Neural network for non-linear pattern recognition
- Bullish: Always predicts upward movement
- Bearish: Always predicts downward movement
- Random: Random predictions (50/50)
- Frequency: Based on historical up/down frequency
- Momentum: Follows the last price direction
- Mean Reversion: Contrarian approach
- Python 3.8+: Core language
- pandas/numpy: Data manipulation and analysis
- scikit-learn: Machine learning models and metrics
- yfinance: Financial data retrieval
- pandas-ta: Technical indicator library
- plotly: Interactive data visualization
- streamlit: Web application framework
- joblib: Model persistence and caching
- Accuracy: Overall prediction correctness
- Precision: True positive rate
- Recall: Sensitivity to upward movements
- F1-Score: Harmonic mean of precision/recall
- Confusion Matrix: Detailed error analysis
- Overfitting Detection: Train vs Test performance comparison
- Model Ranking: Comparative performance evaluation
- Feature Importance: Model interpretability and feature contribution
- Benchmark Comparison: ML models vs naive strategies
# Quick functionality test
python main.py --test
# Performance validation with real data
python main.py --asset BTC-USD --strategies all --verbose
Edit config/settings.py
:
SUPPORTED_ASSETS = {
'YOUR_SYMBOL': {
'name': 'Your Asset Name',
'type': 'stock', # or 'crypto', 'forex', 'commodity'
'description': 'Asset description',
'currency': 'USD',
'sector': 'Technology'
}
}
Extend src/data/indicators.py
:
def add_custom_indicator(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
"""Add your custom technical indicator"""
df = df.copy()
df['CUSTOM_INDICATOR'] = your_calculation(df)
return df
Extend src/models/
:
from src.models.base import BaseMLModel
class YourCustomModel(BaseMLModel):
def __init__(self, **params):
super().__init__("Your Model", **params)
self.model = YourModelClass(**params)
def fit(self, X, y):
self.model.fit(X, y)
self.is_fitted = True
return self
def predict(self, X):
return self.model.predict(X)
- Use longer time periods (2+ years) for better model training
- Select appropriate technical indicators for your asset class
- Compare multiple models to avoid overfitting
- Always validate against benchmark strategies
- Monitor for data leakage in feature engineering
This project is licensed under the MIT License - see the LICENSE file for details.
Yanis - Machine Learning & Quantitative Finance Portfolio
- GitHub: github.com/yaaks7
- LinkedIn: linkedin.com/in/yanisaks