🎬 ETL Pipeline for IMDb Data using Python and MySQL

📌 Project Overview

A Python-based ETL (Extract, Transform, Load) pipeline that processes the IMDb dataset and loads it into a MySQL database. It reads the data in chunks, so large files can be processed without loading them fully into memory.
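
As a rough illustration of chunked processing, the sketch below reads an IMDb-style TSV with Pandas a fixed number of rows at a time. The function name `iter_chunks`, the chunk size, and the sample rows are assumptions made for this example and are not taken from `main.py`.

```python
import csv
import io

import pandas as pd


def iter_chunks(source, chunksize=100_000):
    """Yield DataFrames of at most `chunksize` rows from an IMDb-style TSV.

    `source` may be a file path (a .gz suffix is decompressed automatically
    by pandas) or a file-like object.
    """
    reader = pd.read_csv(
        source,
        sep="\t",
        na_values=["\\N"],       # IMDb files mark missing values with \N
        quoting=csv.QUOTE_NONE,  # treat quote characters in titles as plain text
        dtype=str,               # read everything as text; cast types later
        chunksize=chunksize,
    )
    for chunk in reader:
        yield chunk


# Tiny in-memory sample standing in for title.ratings.tsv.gz
sample = io.StringIO(
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t5.7\t2000\n"
    "tt0000002\t\\N\t270\n"
    "tt0000003\t6.5\t1900\n"
)
sizes = [len(chunk) for chunk in iter_chunks(sample, chunksize=2)]
print(sizes)  # [2, 1]
```

Only one chunk is held in memory at a time, so peak memory depends on the chunk size and not on the size of the file.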

🛠️ Tech Stack

Python
Pandas
MySQL / mysql-connector-python

🚀 Getting Started

Follow these steps to set up and run the project on your local machine.

Prerequisites

Python 3.x: Ensure you have a recent version of Python installed.
MySQL Server: A running MySQL instance is required to host the database.
IMDb Dataset: The download script (see "Download the Data" below) fetches the required files: title.basics.tsv.gz, title.ratings.tsv.gz, and name.basics.tsv.gz.

Clone the repository

git clone https://github.com/Gireeshs02/imdb-data-pipeline.git
cd imdb-data-pipeline

Set up a Virtual Environment

# On Windows
py -m venv venv
venv\Scripts\activate
# On macOS/Linux
python3 -m venv venv
source venv/bin/activate

Install Dependencies
```
pip install -r requirements.txt
```

Configure .env file with your MySQL credentials

DB_USER = user_name 
DB_PASSWORD = your_password
DB_HOST = localhost
DB_PORT = port_number
DB_NAME = database_name
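
To show how these values might be consumed, here is a minimal sketch that parses `KEY = value` lines and maps them to the keyword arguments accepted by `mysql.connector.connect`. The helpers `parse_env` and `connection_kwargs` and the sample values are hypothetical. The project itself may read the file differently (for example, with a dotenv library).

```python
def parse_env(text):
    """Parse KEY = value lines into a dict, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def connection_kwargs(env):
    """Map the .env keys to keyword arguments for mysql.connector.connect."""
    return {
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "3306")),  # 3306 is MySQL's default port
        "database": env["DB_NAME"],
    }


# Example values only; replace them with your own credentials.
example = """
DB_USER = etl_user
DB_PASSWORD = secret
DB_HOST = localhost
DB_PORT = 3306
DB_NAME = imdb
"""
kwargs = connection_kwargs(parse_env(example))
print(kwargs["port"], kwargs["database"])

# With mysql-connector-python installed, a connection could then be opened as:
#   import mysql.connector
#   conn = mysql.connector.connect(**kwargs)
```

Converting `DB_PORT` to an integer up front makes a typo in the port fail early with a clear error instead of at connection time.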

Set up MySQL database
- Run the schema file from your shell (the target database must already exist):
```
mysql -u root -p your_database_name < schema.sql
```
Download the Data
```
py download-data.py
```
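
For orientation, a download step like this can be sketched with the standard library alone. The host `https://datasets.imdbws.com/` is assumed to be IMDb's public dataset location, and the `data` directory and function names are hypothetical. The repository's own script may work differently.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://datasets.imdbws.com/"  # assumption: IMDb's public dataset host
FILES = ["title.basics.tsv.gz", "title.ratings.tsv.gz", "name.basics.tsv.gz"]


def dataset_url(filename):
    """Build the download URL for one dataset file."""
    return BASE_URL + filename


def download(filename, dest_dir="data"):
    """Download one file into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        return dest
    # Stream the response to disk so the archive is never held fully in memory.
    with urllib.request.urlopen(dataset_url(filename)) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def download_all(dest_dir="data"):
    """Download every file listed in FILES and return the local paths."""
    return [download(name, dest_dir) for name in FILES]
```

Calling `download_all()` fetches the three files. Rerunning it skips anything already on disk, which is convenient while iterating on the later pipeline stages.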
Run the Project
```
py main.py
```
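
The load stage of a pipeline like this usually inserts rows in batches instead of one at a time. The sketch below shows that pattern against any DB-API cursor, such as one from mysql-connector-python. The table name `title_ratings`, its columns, and the helper names are hypothetical and need to match whatever `schema.sql` actually defines.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_rows(cursor, rows, batch_size=1000):
    """Insert rows through a DB-API cursor in batches; return the row count."""
    # Hypothetical table and columns; adjust them to match schema.sql.
    sql = (
        "INSERT INTO title_ratings (tconst, average_rating, num_votes) "
        "VALUES (%s, %s, %s)"
    )
    total = 0
    for batch in batched(rows, batch_size):
        cursor.executemany(sql, batch)  # one call per batch
        total += len(batch)
    return total
```

With a real connection, the caller would typically run `conn.commit()` after `load_rows` returns, or after each batch if partial progress should survive a failure.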

🤝 Contributing

Contributions are welcome! Feel free to open issues, submit pull requests, or suggest improvements.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.
