airflow-topics

This is my first project using Apache Airflow.

What it does

  • Get a sample of the most recent English-language tweets
  • Use Latent Dirichlet Allocation (LDA) to model topics and extract the most significant words / word tokens (see the sketch after this list)
  • Upload the result as a JSON file to an S3 bucket
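
The repository's own implementation isn't reproduced here, but a minimal sketch of the topic-modelling and JSON-dump steps might look like the following, assuming gensim as the LDA library (the function and variable names are illustrative, not taken from this repo):

    import json

    from gensim import corpora, models

    def dump_topics(token_lists, num_topics=5, num_top_words=10):
        """token_lists: tokenized tweets, e.g. [["open", "source"], ["hello", "world"]]."""
        # Map tokens to integer ids and build a bag-of-words corpus.
        dictionary = corpora.Dictionary(token_lists)
        corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]
        # Fit the LDA model; num_topics / num_top_words mirror the optional
        # LDA_NUM_TOPICS / LDA_NUM_TOP_WORDS settings from the configuration below.
        lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
        # show_topic returns (word, probability) pairs for a single topic.
        topics = {
            topic_id: [word for word, _ in lda.show_topic(topic_id, topn=num_top_words)]
            for topic_id in range(num_topics)
        }
        with open("topics.json", "w") as f:
            json.dump(topics, f)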

Prerequisites

  • A Twitter developer account
  • An existing AWS S3 bucket

Configuration

Place a .env file in dags/topics/ containing the following environment variables:

  • TWITTER_CONSUMER_KEY
  • TWITTER_CONSUMER_SECRET
  • TWITTER_ACCESS_TOKEN
  • TWITTER_ACCESS_TOKEN_SECRET
  • TWITTER_SAMPLE_SIZE (optional)
  • LDA_NUM_TOPICS (optional)
  • LDA_NUM_TOP_WORDS (optional)
  • S3_ACCESS_KEY
  • S3_SECRET_KEY
  • S3_BUCKET
  • AIRFLOW_SCHEDULE (optional)
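
For example, a .env file could look like this (all values are placeholders; the variables marked optional above can simply be left out):

    TWITTER_CONSUMER_KEY=your-consumer-key
    TWITTER_CONSUMER_SECRET=your-consumer-secret
    TWITTER_ACCESS_TOKEN=your-access-token
    TWITTER_ACCESS_TOKEN_SECRET=your-access-token-secret
    S3_ACCESS_KEY=your-aws-access-key-id
    S3_SECRET_KEY=your-aws-secret-access-key
    S3_BUCKET=your-bucket-name
    # Optional tuning (placeholder values shown)
    TWITTER_SAMPLE_SIZE=1000
    LDA_NUM_TOPICS=5
    LDA_NUM_TOP_WORDS=10
    AIRFLOW_SCHEDULE=@daily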

Usage

  • Install the dependencies first: pip install -r requirements.txt
  • Set AIRFLOW_HOME to your project path, e.g. export AIRFLOW_HOME=$(pwd)
  • Get Airflow up and running:
    • airflow initdb
    • airflow scheduler
    • airflow webserver -p 8080
  • Run the tasks individually, one execution date at a time (see the DAG sketch after this list):
    • airflow test airflow-topics extract_tweets $(date +%F)
    • airflow test airflow-topics dump_topics $(date +%F)
    • airflow test airflow-topics push_results $(date +%F)
  • Alternatively, trigger the whole DAG at once:
    • airflow trigger_dag airflow-topics
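
For orientation, here is a hypothetical sketch of how the three tasks could be wired together in an Airflow 1.x DAG; the callables below are placeholders, not the repo's actual code:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Placeholder callables; the real implementations live in dags/topics/.
    def extract_tweets(**context):
        pass  # sample recent English-language tweets via the Twitter API

    def dump_topics(**context):
        pass  # run LDA and write the top words per topic to a JSON file

    def push_results(**context):
        pass  # upload the JSON file to the configured S3 bucket

    dag = DAG(
        "airflow-topics",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",  # presumably configurable via AIRFLOW_SCHEDULE
    )

    extract = PythonOperator(task_id="extract_tweets",
                             python_callable=extract_tweets,
                             provide_context=True, dag=dag)
    dump = PythonOperator(task_id="dump_topics",
                          python_callable=dump_topics,
                          provide_context=True, dag=dag)
    push = PythonOperator(task_id="push_results",
                          python_callable=push_results,
                          provide_context=True, dag=dag)

    # Chain the tasks in pipeline order.
    extract >> dump >> push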
