📕 Data: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
🔑 Purpose: Build a machine learning model that predicts reservation destinations
👩🏻‍💻 Datasets Used: train_users_2.csv, test_users.csv
Describe the preprocessing steps taken for the numerical (age, signup_flow) and categorical (gender, language) data: null values are filled, and rows that remain unfilled are removed.
Ages below 10 or above 90 were treated as abnormal values, and those rows were dropped.
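The cleaning rules above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `clean_users` helper, the `"unknown"` fill value, and the tiny sample frame are assumptions made for the example. Only the column names and the 10 to 90 age bounds come from the description.

```python
import numpy as np
import pandas as pd


def clean_users(df: pd.DataFrame) -> pd.DataFrame:
    """Fill nulls in categorical columns, then drop rows with missing or abnormal ages."""
    out = df.copy()
    # Assumption: missing categorical values get a placeholder label.
    for col in ("gender", "language"):
        out[col] = out[col].fillna("unknown")
    # Rows that still lack a numerical value are removed.
    out = out.dropna(subset=["age", "signup_flow"])
    # Keep ages from 10 to 90 inclusive; anything outside is treated as abnormal.
    out = out[out["age"].between(10, 90)]
    return out.reset_index(drop=True)


# Hypothetical sample, not taken from train_users_2.csv.
sample = pd.DataFrame({
    "age": [25, 5, 105, np.nan, 40],
    "signup_flow": [0, 3, 0, 25, 0],
    "gender": ["FEMALE", None, "MALE", "MALE", "FEMALE"],
    "language": ["en", "en", None, "fr", "en"],
})
cleaned = clean_users(sample)
print(cleaned)
```

Only the rows with ages 25 and 40 survive in this sample, because the others are missing, below 10, or above 90.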
Final Preprocessing Method:
The numerical data was preprocessed by scaling (MinMax, Robust, Standard).
The categorical data was preprocessed by encoding (OneHot, Ordinal, Label).
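One way to combine a scaler and an encoder is scikit-learn's `ColumnTransformer`. The sketch below pairs `MinMaxScaler` with `OneHotEncoder` as one example pairing. The sample frame is hypothetical, and the project may have wired these steps together differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numeric_cols = ["age", "signup_flow"]
categorical_cols = ["gender", "language"]

# sparse_threshold=0 forces a dense array, which is easier to inspect.
preprocess = ColumnTransformer(
    [
        ("num", MinMaxScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

# Hypothetical sample, not taken from the Kaggle files.
sample = pd.DataFrame({
    "age": [20, 40, 60],
    "signup_flow": [0, 3, 25],
    "gender": ["FEMALE", "MALE", "FEMALE"],
    "language": ["en", "fr", "en"],
})
features = preprocess.fit_transform(sample)
print(features.shape)
```

Swapping in `RobustScaler`, `StandardScaler`, or `OrdinalEncoder` only requires changing the objects in the transformer list. Note that scikit-learn's `LabelEncoder` is designed for 1-D target labels rather than feature columns.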
Visualize the data frames after outlier handling
Describe the models used (Logistic Regression and Decision Tree)
Use k-fold cross-validation to find the optimal k value
Use confusion matrices to visualize the results of classification tasks and evaluate the performance of models
Logistic Regression
Decision Tree
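The evaluation steps above could be sketched like this: compare several k values with k-fold cross-validation for both models, then build a confusion matrix from out-of-fold predictions. The synthetic data from `make_classification`, the candidate k values, and the model settings are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the preprocessed Airbnb features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Mean accuracy for each (model, k) pair; the k values are hypothetical candidates.
results = {}
for name, model in models.items():
    for k in (3, 5, 10):
        cv = KFold(n_splits=k, shuffle=True, random_state=0)
        results[(name, k)] = cross_val_score(model, X, y, cv=cv).mean()

best_name, best_k = max(results, key=results.get)
print("best:", best_name, "k =", best_k)

# Confusion matrix from out-of-fold predictions of the best combination.
cv = KFold(n_splits=best_k, shuffle=True, random_state=0)
predictions = cross_val_predict(models[best_name], X, y, cv=cv)
matrix = confusion_matrix(y, predictions)
print(matrix)
```

Rows of the matrix are true classes and columns are predicted classes, so the diagonal counts the correct predictions. `ConfusionMatrixDisplay` from `sklearn.metrics` can plot it if a figure is wanted.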
Extract the top 5 of all combinations of 3 encoders (OneHot, Ordinal, Label), 3 scalers (MinMax, Standard, Robust), and hyperparameters
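A combination search of this kind could look roughly like the sketch below, which scores every scaler, encoder, and hyperparameter combination with cross-validation and keeps the five best. The random sample data, the `max_depth` grid, and the use of a decision tree are assumptions. The Label encoder is left out here because scikit-learn's `LabelEncoder` only accepts a 1-D target, so the sketch covers 3 scalers, 2 encoders, and 2 depths.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    MinMaxScaler, OneHotEncoder, OrdinalEncoder, RobustScaler, StandardScaler,
)
from sklearn.tree import DecisionTreeClassifier

# Hypothetical random data with the column names used in the project.
rng = np.random.default_rng(0)
n = 200
users = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "signup_flow": rng.integers(0, 26, n),
    "gender": rng.choice(["FEMALE", "MALE", "OTHER"], n),
    "language": rng.choice(["en", "fr", "de"], n),
})
target = rng.choice(["US", "FR", "NDF"], n)

scalers = {"minmax": MinMaxScaler(), "standard": StandardScaler(), "robust": RobustScaler()}
encoders = {
    "onehot": OneHotEncoder(handle_unknown="ignore"),
    "ordinal": OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
}
depths = (3, 5)  # assumed hyperparameter grid

rows = []
for (s_name, scaler), (e_name, encoder), depth in product(
    scalers.items(), encoders.items(), depths
):
    pipeline = Pipeline([
        ("prep", ColumnTransformer([
            ("num", scaler, ["age", "signup_flow"]),
            ("cat", encoder, ["gender", "language"]),
        ])),
        ("model", DecisionTreeClassifier(max_depth=depth, random_state=0)),
    ])
    score = cross_val_score(pipeline, users, target, cv=3).mean()
    rows.append({"scaler": s_name, "encoder": e_name, "max_depth": depth, "score": score})

# Rank all combinations by mean cross-validated accuracy and keep the best five.
top5 = pd.DataFrame(rows).sort_values("score", ascending=False).head(5)
print(top5)
```

Because the sample target is random, the scores here carry no meaning. The point is only the shape of the search loop and the ranking step.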
Analyze critical features in predicting destinations through modeling
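One common way to rank features is the `feature_importances_` attribute of a fitted decision tree. The sketch below is an assumption about how such an analysis might be done, not the project's method. It uses hypothetical data in which the label depends only on `age`, so the expected ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: the label is determined by age alone.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "signup_flow": rng.integers(0, 26, n),
    "noise": rng.normal(size=n),
})
y = np.where(X["age"] > 40, "US", "NDF")

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Importances sum to 1; higher values mean the feature reduced impurity more.
importances = pd.Series(tree.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

For logistic regression, the magnitude of the coefficients on scaled features can play a similar role, with the caveat that correlated features make either ranking harder to interpret.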
Because the project carries out an end-to-end process covering data collection, preprocessing, modeling, evaluation, and analysis of the results, the same workflow can be applied to interpret other data.