This project provides two ETL pipelines:
- Stock price data via yfinance
- Financial news via NewsAPI
Deployed with:
- Python + Airflow
- Terraform on AWS (S3, MWAA, IAM, optional Redshift)
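As a sketch of what the two extract steps boil down to (the ticker, query, and parameters below are illustrative, not the project's actual code):

```python
import os

import requests
import yfinance as yf

# Stock prices: yfinance returns a pandas DataFrame of OHLCV data
prices = yf.download("AAPL", period="5d", interval="1d")

# Financial news: NewsAPI's /v2/everything endpoint, keyed by NEWS_API_KEY
resp = requests.get(
    "https://newsapi.org/v2/everything",
    params={"q": "AAPL", "apiKey": os.environ["NEWS_API_KEY"]},
    timeout=30,
)
articles = resp.json().get("articles", [])
```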
Setup:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python -m textblob.download_corpora  # For sentiment analysis
```
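The corpora download enables TextBlob's sentiment scoring. A minimal sketch of how a headline can be scored (the headline here is illustrative):

```python
from textblob import TextBlob

# Polarity ranges from -1.0 (most negative) to 1.0 (most positive)
headline = "Tech stocks surge after strong quarterly earnings"
polarity = TextBlob(headline).sentiment.polarity
print(f"{headline!r} -> polarity {polarity:.2f}")
```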
Set the required environment variables and run the pipeline:

```bash
export NEWS_API_KEY=your_newsapi_key
export S3_BUCKET=your_s3_bucket_name
python main.py
```
Output files are uploaded to your S3 bucket under timestamped object names.
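A sketch of how a timestamped upload can work with boto3 (the key prefix and local filename are assumptions, not necessarily what main.py uses):

```python
import os
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")

# e.g. s3://<bucket>/stock_prices/stock_prices_20240101T120000.csv
key = f"stock_prices/stock_prices_{timestamp}.csv"
s3.upload_file("stock_prices.csv", os.environ["S3_BUCKET"], key)
```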
Provision the AWS infrastructure:

```bash
cd terraform
terraform init
terraform plan -var-file="example.tfvars"
terraform apply -var-file="example.tfvars"
```
This creates the S3 bucket(s), sets up the IAM roles, and configures the MWAA environment.
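A minimal example.tfvars might look like the sketch below (these variable names are assumptions; check the project's variables.tf for the real ones):

```hcl
# Hypothetical variable names -- verify against the project's variables.tf
aws_region   = "us-east-1"
project_name = "marketstream"
environment  = "dev"
```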
Upload the DAGs to the S3 bucket MWAA reads from (the bucket name comes from the Terraform outputs):

```bash
aws s3 cp ../dags/ s3://marketstream-dag-bucket/dags/ --recursive
```
You can now view and trigger DAGs from the Airflow UI hosted on MWAA.
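For orientation, a minimal sketch of what a DAG in dags/ might look like (the dag_id, schedule, and task are assumptions, not this repo's actual code):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_prices():
    # Placeholder body; the real task would call the project's ETL code
    print("fetching stock prices...")


with DAG(
    dag_id="marketstream_prices",  # assumed name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_prices", python_callable=extract_prices)
```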
Planned enhancements:
- Redshift loading logic
- CloudWatch dashboards
- CI/CD via GitHub Actions for DAG deployment
- Slack/Email notifications
- Real-time updates via Kafka or WebSockets
Run the tests:

```bash
PYTHONPATH=. pytest tests/
```
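pytest discovers files under tests/. A minimal sketch of the style such a test can take (this exact test is illustrative, not from the repo):

```python
from textblob import TextBlob


def test_sentiment_polarity_range():
    # TextBlob polarity is always bounded to [-1.0, 1.0]
    polarity = TextBlob("Stocks rallied on strong earnings").sentiment.polarity
    assert -1.0 <= polarity <= 1.0
```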