
Building-e-Commerce-Hive-Data-Warehouse

Prerequisites

A few things you need to have before starting the project:

  • Basic understanding of AWS services and how to create an EC2 instance on AWS
  • Good knowledge of Hive, Spark, SQL, and shell scripting, plus a working knowledge of Docker
  • Strong understanding of databases, relational modeling and a zeal to learn :)

Project Motivation

The main motive behind the project is to understand how to perform Hive analysis in Docker containers running on an AWS EC2 instance. In the process, you will learn how to create an AWS EC2 instance, set up Docker containers on that instance, load data from a local file to AWS using the CLI, ingest and transform data using Sqoop, Hive, and Spark, build a data warehouse, and perform data analysis using Hive queries.
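
To make the ingestion step concrete, below is a minimal sketch of moving the data onto the instance and importing a MySQL table into HDFS with Sqoop. The key path, hostnames, database, credentials, and table name are placeholders, not values prescribed by this project.

```bash
# Copy the local dataset file onto the EC2 instance (key file and host are placeholders).
scp -i ~/my-key.pem AdventureWorks.sql ec2-user@<ec2-public-dns>:/home/ec2-user/

# From the container that has Sqoop installed: import one MySQL table into HDFS.
# The JDBC URL, credentials, table, and target directory are illustrative only.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/adventureworks \
  --username sqoop_user -P \
  --table SalesOrderHeader \
  --target-dir /user/hive/adventureworks/sales_order_header \
  --num-mappers 1
```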


Dataset

To run any sort of analytics, we need data. The dataset we will use for Hive analysis in this project is the AdventureWorks dataset. The AdventureWorks database supports standard online transaction processing scenarios for a fictitious bicycle manufacturer, Adventure Works Cycles. Its components cover Manufacturing, Sales, Purchasing, Product Management, Contact Management, and Human Resources.

We will be concentrating mainly on AdventureWorks Sales and Customer Demographics data in this project.


Problem Statement

Perform data analysis on AdventureWorks Sales and Customer Demographics data in Hive and answer the following:

  • find the upper and lower discount limits offered for any product
  • find the top 10 customers with the highest contribution to sales (a sample query follows this list)
  • analyze the purchase pattern of customers based on salary, education, and gender
  • compute the sales contribution percentage of each customer, and the sales contribution percentage by gender and salary
  • identify the top-performing territory based on sales
  • find territory-wise sales and each territory's adherence to its defined sales quota
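
As an illustration, the "top 10 customers" question could be answered with a Hive query along the lines of the sketch below, run from the shell. The table and column names are assumptions for illustration and will differ depending on the warehouse schema you build.

```bash
# Ad-hoc HiveQL from the shell; table and column names are illustrative only.
hive -e "
SELECT customer_id,
       SUM(total_due) AS total_sales
FROM   sales_order_header
GROUP  BY customer_id
ORDER  BY total_sales DESC
LIMIT  10;
"
```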

Project Work

The steps below document the work done to solve the problem statement.

  • Project Setup in AWS - Refer here (a brief container sketch follows this list)
  • Creating Hive data warehouse - Refer here
  • Performing Hive Analytics - Refer here
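
As a rough sketch of what the container setup in the first step involves, the commands below bring up a set of Hadoop/Hive containers on the EC2 instance and open a shell in one of them. The compose file and container names are placeholders; the actual images and services are described in the setup document linked above.

```bash
# Bring up the cluster containers (compose file name is a placeholder).
docker-compose -f docker-compose-hive.yml up -d

# Confirm the containers are running.
docker ps

# Open a shell inside the Hive server container (container name is illustrative).
docker exec -it hive-server bash
```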

About

Automated data processing pipeline that transfers MySQL data to Hadoop using Sqoop, processes XML with Spark, and creates a Hive data warehouse for analytics. Deployed on AWS EC2 with Docker containers for scalable processing. Enables complex analysis of customer demographics and sales patterns.
