Add Airbnb Data Analysis Project under datascience folder #1799 #1801
base: main
Walkthrough: Added a new README.md for the Airbnb Data Analysis project describing project overview, included artifacts (presentation and Tableau workbook), objectives, insights, setup/usage instructions, and potential future enhancements. No code or public APIs were modified.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly! Feel free to join our community on Discord to discuss more!
✅ PR validation passed! Syncing labels and assignees from the linked issue...
Actionable comments posted: 4
🧹 Nitpick comments (4)
Data_Science/Airbnb-Data-Analysis/README.md (4)
Lines 17-22: Define metrics precisely (weekly revenue, zip source, bed_type status, date range). Document formulas and data fields to remove ambiguity and ease validation.
Apply:
## Key Insights
- **Average Price Per Bedroom**: Larger properties generally have higher average prices, with steep increases for properties with 4+ bedrooms.
- **Price by Zip Code**: Some zip codes show significantly higher prices, possibly indicating more desirable or premium areas.
- **Revenue Trends**: Weekly revenue peaks during certain seasons, reflecting high-travel demand periods.
- **Common Listing Types**: Real beds are the most frequent, and properties with 1-2 bedrooms make up the majority of listings.
+
+## Methodology & Metric Definitions
+- Date range analyzed: <insert coverage window>.
+- Weekly revenue: <define formula, e.g., price × (7 − weekly_availability) or based on calendar data>; specify fields used.
+- Zip code source: <state field or geocoding method>. Note any imputation rules.
+- Bed type and bedroom count fields: <list exact field names>; note if `bed_type` exists in this dataset/version.
+- Outlier handling: <winsorization/filtering rules, if any>.
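To make the weekly-revenue placeholder concrete, here is a minimal Python sketch of the price × (7 − weekly_availability) option. It assumes, hypothetically, one row per listing per week with a nightly `price` and an `availability_7` count of unbooked nights out of seven; neither field name is confirmed against the actual dataset. Treating every unavailable night as booked overstates revenue when hosts block dates, which is why calendar data is the more reliable alternative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ListingWeek:
    """One listing's figures for one week (hypothetical field names)."""
    listing_id: str
    price: float          # nightly price
    availability_7: int   # nights still available out of 7 (assumed field)


def weekly_revenue(row: ListingWeek) -> float:
    """Estimate revenue as price x booked nights, with booked = 7 - available."""
    if not 0 <= row.availability_7 <= 7:
        raise ValueError(f"availability_7 out of range for {row.listing_id}")
    booked_nights = 7 - row.availability_7
    return row.price * booked_nights


def total_weekly_revenue(rows: Iterable[ListingWeek]) -> float:
    """Sum the estimated revenue of all listings for the same week."""
    return sum(weekly_revenue(r) for r in rows)


# Invented sample rows, for illustration only.
rows = [
    ListingWeek("a", 100.0, 2),  # 5 booked nights -> 500.0
    ListingWeek("b", 80.0, 7),   # 0 booked nights -> 0.0
]
print(total_weekly_revenue(rows))  # 500.0
```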
Lines 28-33: Note privacy when adding reviews and outline basic ML evaluation. Minor content tweak to future work to set expectations.
Apply:
## Future Enhancements
Potential future improvements include:
- Adding More Data Features: Incorporating additional features like reviews or amenities.
-- Predictive Analysis: Using machine learning to predict pricing trends based on data.
+- Predictive Analysis: Using machine learning to predict pricing trends based on data; report RMSE/MAE and train/test split methodology.
- Interactive Web Dashboard: Creating a web-based, interactive dashboard for public access.
+- Privacy: If incorporating reviews/amenities, ensure text is anonymized and complies with the dataset license.
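For the suggested RMSE/MAE reporting, a minimal standard-library sketch: a seeded train/test split, the two error metrics, and a naive mean-price baseline that any real pricing model should beat. The `(bedrooms, price)` sample values are invented for illustration and say nothing about the project's data.

```python
import math
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Deterministically shuffle a copy of items and return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def _check(y_true, y_pred):
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as price."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE does."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical (bedrooms, price) pairs, for illustration only.
data = [(1, 90.0), (1, 110.0), (2, 150.0), (2, 170.0), (3, 220.0),
        (3, 240.0), (4, 400.0), (1, 100.0), (2, 160.0), (3, 230.0)]
train, test = train_test_split(data, test_fraction=0.3)

# Naive baseline: predict the training-set mean price for every test listing.
baseline = sum(price for _, price in train) / len(train)
y_true = [price for _, price in test]
y_pred = [baseline] * len(test)
print(f"MAE={mae(y_true, y_pred):.1f}  RMSE={rmse(y_true, y_pred):.1f}")
```

RMSE is always at least as large as MAE; a wide gap between them signals that a few expensive listings dominate the error.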
Lines 23-27: Provide a non‑Tableau fallback (exported images/PDF or public link). This widens accessibility for reviewers without Tableau Desktop.
Apply:
- Open **TableauProject.twbx**: Use Tableau Desktop to open the file and explore the visualizations.
- Review the Presentation: Open **AirbnbProject.pptx** to see summarized findings.
+- No Tableau? Include/export key dashboards as PNG/PDF in `docs/` or provide a public Tableau link for viewing.
Lines 17-22: Qualify insight statements with dataset scope to avoid overgeneralization. Add time/place context so claims aren’t read as universal truths.
Apply:
-- **Average Price Per Bedroom**: Larger properties generally have higher average prices, with steep increases for properties with 4+ bedrooms.
-- **Price by Zip Code**: Some zip codes show significantly higher prices, possibly indicating more desirable or premium areas.
-- **Revenue Trends**: Weekly revenue peaks during certain seasons, reflecting high-travel demand periods.
-- **Common Listing Types**: Real beds are the most frequent, and properties with 1-2 bedrooms make up the majority of listings.
+- **Average Price Per Bedroom** (in this dataset/timeframe): Larger properties show higher average prices, with steeper increases for 4+ bedrooms.
+- **Price by Zip Code** (study region): Certain zip codes trend higher, indicating premium areas.
+- **Revenue Trends**: Weekly revenue peaks during specific seasons in the analyzed window.
+- **Common Listing Types**: Real beds are most frequent, and 1–2 bedroom listings dominate this dataset.
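Scoping the bedroom insight is easier when the aggregation behind it is explicit. A small sketch, assuming `(bedrooms, price)` pairs and a single 4+ bucket to match the insight's wording; returning the listing count next to each mean keeps the dataset scope visible. The sample values are made up.

```python
from collections import defaultdict


def bedroom_bucket(bedrooms: int) -> str:
    """Collapse 4 or more bedrooms into one '4+' bucket."""
    return "4+" if bedrooms >= 4 else str(bedrooms)


def price_by_bedrooms(listings):
    """Map bucket -> (mean price, listing count) for (bedrooms, price) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bedrooms, price in listings:
        bucket = bedroom_bucket(bedrooms)
        totals[bucket] += price
        counts[bucket] += 1
    return {b: (totals[b] / counts[b], counts[b]) for b in sorted(totals)}


# Invented sample values, for illustration only.
sample = [(1, 90.0), (1, 110.0), (2, 160.0), (4, 380.0), (5, 420.0)]
print(price_by_bedrooms(sample))
# {'1': (100.0, 2), '2': (160.0, 1), '4+': (400.0, 2)}
```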
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Data_Science/Airbnb-Data-Analysis/AIRBNB PROJECT.pptx is excluded by `!**/*.pptx`
📒 Files selected for processing (1)
Data_Science/Airbnb-Data-Analysis/README.md (1 hunk)
🔇 Additional comments (1)
Data_Science/Airbnb-Data-Analysis/README.md (1)
Line 1: Confirm directory casing: Data_Science vs datascience. Found: ./Data_Science/Airbnb-Data-Analysis. PR references "/datascience/...". Confirm the intended casing and update paths or rename to avoid duplicate/ambiguous paths on case‑insensitive filesystems.
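The casing question can be checked mechanically. This sketch walks the working tree and groups paths that become identical once case and `_`/`-`/space separators are ignored, which would flag a `Data_Science` folder sitting next to a `datascience` one. The normalization rule is an assumption about what counts as ambiguous; note that case-only duplicates can only appear on a case-sensitive filesystem, since a case-insensitive one cannot hold both.

```python
import os
import re
from collections import defaultdict


def normalize(path: str) -> str:
    """Case-fold and drop '_', '-' and whitespace so near-identical names compare equal."""
    return re.sub(r"[_\-\s]", "", path).casefold()


def find_near_duplicates(paths):
    """Group paths whose normalized forms are identical; return only real groups."""
    groups = defaultdict(set)
    for p in paths:
        groups[normalize(p)].add(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def walk_paths(root="."):
    """Yield every directory and file path under root (relative), skipping .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames + filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


if __name__ == "__main__":
    for group in find_near_duplicates(walk_paths()):
        print("Ambiguous paths:", ", ".join(group))
```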
## Overview
This project provides a comprehensive analysis of Airbnb listing data to uncover trends in pricing, availability, and other key factors. Using Tableau for visualization, the project explores the Airbnb rental market, highlighting patterns based on geographic location, room types, and pricing variations.
Add data source, license, and non‑affiliation/privacy notes.
Provenance and licensing are required for compliance and reuse; add a short disclaimer on Airbnb affiliation and PII handling.
Apply:
## Overview
This project provides a comprehensive analysis of Airbnb listing data to uncover trends in pricing, availability, and other key factors. Using Tableau for visualization, the project explores the Airbnb rental market, highlighting patterns based on geographic location, room types, and pricing variations.
+
+## Data Source & Licensing
+- Source: <dataset name/provider, URL, retrieval date>.
+- License: <dataset license>; confirm that redistribution of the workbook and excerpts complies.
+- Privacy: dataset must not include precise addresses, emails, phone numbers, or other PII. Any such fields should be removed or generalized.
+- Non‑affiliation: This project is for educational purposes and is not affiliated with or endorsed by Airbnb.
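A minimal sketch of the privacy step: drop PII columns and coarsen coordinates before the data is committed or packaged into the workbook. The column names in `PII_COLUMNS`, and rounding latitude/longitude to two decimals (roughly 1.1 km of latitude), are assumptions to adapt to the real schema.

```python
# Hypothetical PII column names; replace with the dataset's actual fields.
PII_COLUMNS = {"host_name", "email", "phone", "street_address"}


def scrub_row(row: dict, coord_decimals: int = 2) -> dict:
    """Drop PII columns and round coordinates to coarsen the location."""
    clean = {k: v for k, v in row.items() if k not in PII_COLUMNS}
    for key in ("latitude", "longitude"):
        if clean.get(key) is not None:
            clean[key] = round(float(clean[key]), coord_decimals)
    return clean


# Invented example row, for illustration only.
row = {"id": 1, "host_name": "Alex", "email": "alex@example.com",
       "latitude": 40.712776, "longitude": -74.005974, "price": 120.0}
print(scrub_row(row))
# {'id': 1, 'latitude': 40.71, 'longitude': -74.01, 'price': 120.0}
```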
🤖 Prompt for AI Agents
In Data_Science/Airbnb-Data-Analysis/README.md around lines 3-4, add brief provenance, licensing and privacy/non‑affiliation statements: specify the original data source (e.g., "Dataset obtained from Airbnb public listings via [source/link] on [date]" or the exact file name), include a license or reuse terms (e.g., CC BY‑NC or link to dataset's license) and a short non‑affiliation disclaimer ("This project is not affiliated with Airbnb"), and add a PII/privacy note explaining any personal data removal/anonymization steps and that no attempt was made to contact hosts/guests. Keep each note one or two sentences and place them under a new "Data Sources and License" or "Provenance and Privacy" section near the top of the README.
## Project Files
- **AirbnbProject.pptx**: PowerPoint presentation summarizing the main insights from the analysis.
- **TableauProject.twbx**: Tableau workbook for creating and exploring interactive visualizations.
🧩 Analysis chain
Include the Excel dataset and reconcile file names; avoid spaces/case drift.
- Project files list omits the Excel dataset mentioned in the PR objectives.
- Filenames differ from the PR text (e.g., “AIRBNB PROJECT.pptx” vs “AirbnbProject.pptx”). Normalize and match the actual committed files to prevent broken references across OSes.
Apply this minimal update (adjust names to whatever is actually in the repo):
## Project Files
-- **AirbnbProject.pptx**: PowerPoint presentation summarizing the main insights from the analysis.
-- **TableauProject.twbx**: Tableau workbook for creating and exploring interactive visualizations.
+- **Airbnb-Project.pptx**: PowerPoint presentation summarizing the main insights from the analysis.
+- **TableauProject.twbx**: Tableau workbook for creating and exploring interactive visualizations.
+- **data/Airbnb_Listings.xlsx**: Source dataset used by the workbook (if not embedded in the TWBX).
Run this to verify filenames and the folder that actually exists:
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
for d in Data_Science/Airbnb-Data-Analysis datascience/Airbnb-Data-Analysis; do
  if [ -d "$d" ]; then
    echo "Contents of $d:"
    ls -al "$d"
  fi
done
Length of output: 377
Normalize filenames in README and add the missing Excel dataset
- Repo (Data_Science/Airbnb-Data-Analysis) contains: AIRBNB PROJECT.pptx and TableauProject.twbx; README lists AirbnbProject.pptx and mentions an Excel dataset that is not present.
- Action (choose one):
- Rename Data_Science/Airbnb-Data-Analysis/"AIRBNB PROJECT.pptx" → Data_Science/Airbnb-Data-Analysis/AirbnbProject.pptx (remove spaces/case drift) and add the source dataset at data/Airbnb_Listings.xlsx if it's not embedded in the TWBX.
- OR update README to reference the exact committed filenames (Data_Science/Airbnb-Data-Analysis/"AIRBNB PROJECT.pptx", TableauProject.twbx) and explicitly state whether the source dataset is embedded in the TWBX or where the Excel file is located.
- Use consistent, space-free filenames for cross-OS compatibility.
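If the rename option is chosen, the space-free name can be derived rather than typed by hand. This sketch assumes a PascalCase convention (the review proposes both `AirbnbProject.pptx` and `Airbnb-Project.pptx`; this picks the former): it splits the stem on spaces, underscores and hyphens, title-cases all-caps or all-lowercase words, leaves mixed-case words such as `TableauProject` alone, and lower-cases the extension.

```python
import re
from pathlib import PurePath


def _pascal(word: str) -> str:
    """Title-case ALL-CAPS or lowercase words; keep mixed-case words as they are."""
    if word.isupper() or word.islower():
        return word[0].upper() + word[1:].lower()
    return word[0].upper() + word[1:]


def space_free_name(filename: str) -> str:
    """E.g. 'AIRBNB PROJECT.pptx' -> 'AirbnbProject.pptx' (assumed convention)."""
    path = PurePath(filename)
    words = [w for w in re.split(r"[\s_\-]+", path.stem) if w]
    return "".join(_pascal(w) for w in words) + path.suffix.lower()


for name in ["AIRBNB PROJECT.pptx", "TableauProject.twbx"]:
    print(name, "->", space_free_name(name))
```

The rename itself is best done with `git mv` so history follows the file, e.g. `git mv "AIRBNB PROJECT.pptx" AirbnbProject.pptx` inside Data_Science/Airbnb-Data-Analysis, committed together with the README update.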
🤖 Prompt for AI Agents
In Data_Science/Airbnb-Data-Analysis/README.md around lines 6–9, the README mentions "AirbnbProject.pptx" and an Excel dataset that do not match the repo contents (the repo contains "AIRBNB PROJECT.pptx" and TableauProject.twbx, and the Excel file is missing or embedded). Fix by either (A) renaming the repo file to AirbnbProject.pptx (remove spaces/change case) and adding the source dataset at Data_Science/Airbnb-Data-Analysis/data/Airbnb_Listings.xlsx if it is not embedded in the TWBX, or (B) updating the README to reference the exact committed filenames ("AIRBNB PROJECT.pptx" and "TableauProject.twbx") and explicitly stating whether the source dataset is embedded in the TWBX or where the Excel file can be found. If you choose to rename, use consistent, space-free naming, and commit the README and any file renames/additions together.
🧩 Analysis chain
Track large binaries with Git LFS to keep repo lean.
TWBX/PPTX (and XLSX) often exceed normal size thresholds; ensure they’re LFS‑tracked.
Use this check:
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
echo "Potential large binaries:"
python - <<'PY'
import os
exts={'.twbx','.twb','.pptx','.xlsx','.xls','.csv'}
for root,_,files in os.walk('.'):
    for f in files:
        if os.path.splitext(f)[1].lower() in exts:
            p=os.path.join(root,f)
            try:
                sz=os.path.getsize(p)
            except OSError:
                continue
            if sz>=5*1024*1024:
                print(f"{p}\t{sz/1024/1024:.1f} MB")
PY
echo
echo ".gitattributes entries referencing LFS:"
[ -f .gitattributes ] && rg -n 'filter=lfs' .gitattributes || echo "No .gitattributes found"
Length of output: 1760
Track large binaries with Git LFS and migrate existing large files.
No .gitattributes found; the repo contains multiple files >=5MB. Add a .gitattributes and LFS-track binary/artifact patterns (e.g. *.twbx, *.pptx, *.xlsx, *.xls and large *.csv), then migrate the existing large files into LFS or move them to external storage (git lfs migrate import --include="path,...").
Notable large files found:
- Data_Science/Airbnb-Data-Analysis/TableauProject.twbx — 5.1 MB
- Machine_Learning/Book Recommendation System/books.csv — 73.9 MB
- Deep_Learning/toxicity insult analyser/jigsaw-toxic-comment-classification-challenge/train.csv/train.csv — 65.6 MB
- Deep_Learning/toxicity insult analyser/jigsaw-toxic-comment-classification-challenge/test.csv/test.csv — 57.6 MB
- Machine_Learning/Fake news detection project/dataset/Fake.csv — 59.9 MB
- Machine_Learning/Fake news detection project/dataset/True.csv — 51.1 MB
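`git lfs track "*.twbx"` records a `.gitattributes` line of the form `*.twbx filter=lfs diff=lfs merge=lfs -text`. The sketch below only prints the lines that would be appended for the suggested patterns, skipping any already tracked; it neither edits files nor migrates history, which still requires `git lfs migrate import`. Rules for the large `*.csv` files are left out because they would need to be path-specific.

```python
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"
# Patterns suggested in the review; large *.csv files would need per-path rules.
LFS_PATTERNS = ["*.twbx", "*.pptx", "*.xlsx", "*.xls"]


def missing_lfs_lines(patterns, existing_text=""):
    """Return .gitattributes lines for patterns not already tracked by LFS."""
    tracked = set()
    for line in existing_text.splitlines():
        parts = line.split()
        if len(parts) > 1 and "filter=lfs" in parts[1:]:
            tracked.add(parts[0])
    return [f"{p} {LFS_ATTRS}" for p in patterns if p not in tracked]


if __name__ == "__main__":
    from pathlib import Path

    attrs = Path(".gitattributes")
    current = attrs.read_text(encoding="utf-8") if attrs.exists() else ""
    # Print only; appending to .gitattributes is left to the user.
    for line in missing_lfs_lines(LFS_PATTERNS, current):
        print(line)
```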
## Setup and Usage
To interact with the project and explore the data:
- Open **TableauProject.twbx**: Use Tableau Desktop to open the file and explore the visualizations.
- Review the Presentation: Open **AirbnbProject.pptx** to see summarized findings.
Specify Tableau Desktop version and whether data is embedded; add relinking steps.
Without this, opening the TWBX may fail or silently point to the wrong data.
Apply:
## Setup and Usage
-To interact with the project and explore the data:
-- Open **TableauProject.twbx**: Use Tableau Desktop to open the file and explore the visualizations.
-- Review the Presentation: Open **AirbnbProject.pptx** to see summarized findings.
+To interact with the project and explore the data:
+- Prerequisite: Tableau Desktop (specify tested version, e.g., 2023.3 or later).
+- Open **TableauProject.twbx** in Tableau Desktop.
+ - If the workbook expects an external Excel file (not embedded), go to Data > [Data Source] > Edit Connection and relink to `data/Airbnb_Listings.xlsx`.
+ - If an extract is used, Data > [Data Source] > Extract > Refresh after relinking.
+- Review the presentation: open **Airbnb-Project.pptx** for summarized findings.
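Whether the workbook needs relinking can be checked before opening Tableau: a `.twbx` packaged workbook is a zip archive, so listing its members shows which data files travel with it. This sketch treats `.xlsx`, `.xls`, `.csv`, `.hyper` and `.tde` members as packaged data (an assumed list) and otherwise checks the review's proposed `data/Airbnb_Listings.xlsx` path, which stays hypothetical until such a file is committed.

```python
import os
import sys
import zipfile

# Extensions treated as packaged data sources (assumed list).
DATA_SUFFIXES = (".xlsx", ".xls", ".csv", ".hyper", ".tde")
# Hypothetical external path proposed in the review.
EXTERNAL_DATA = os.path.join("data", "Airbnb_Listings.xlsx")


def embedded_data_files(twbx):
    """Return data-file members packaged inside a .twbx (path or binary file object)."""
    with zipfile.ZipFile(twbx) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(DATA_SUFFIXES)]


def main(twbx_path="TableauProject.twbx"):
    if not os.path.exists(twbx_path):
        print(f"{twbx_path} not found")
        return
    packaged = embedded_data_files(twbx_path)
    if packaged:
        print("Packaged data (relinking likely unnecessary):", ", ".join(packaged))
    elif os.path.exists(EXTERNAL_DATA):
        print(f"No packaged data; relink the data source to {EXTERNAL_DATA}")
    else:
        print(f"No packaged data and {EXTERNAL_DATA} is missing")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```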
🤖 Prompt for AI Agents
In Data_Science/Airbnb-Data-Analysis/README.md around lines 23 to 27, the TWBX instructions lack Tableau version and data-link guidance; update the README to state the minimum Tableau Desktop version used (e.g., Tableau Desktop 2023.1) and explicitly note whether the TWBX contains embedded/extracted data or references external files, then add concise relinking steps: open Tableau Desktop, go to Data > Replace Data Source (or Data > Extract > Refresh) if the workbook points to external CSV/Hyper files, and provide the expected relative path(s) to the data files in the repo (or note that data is embedded so no relinking is needed).
Pull Request for PyVerse 💡
Requesting to submit a pull request to the PyVerse repository.
Issue Title
Add Airbnb Data Analysis Project under datascience folder
Info about the Related Issue
What's the goal of the project?
To contribute an Airbnb Data Analysis Project under the datascience folder, providing a comprehensive analysis of Airbnb listing data using Tableau, Excel, and visual presentations. The aim is to highlight pricing trends, availability, and location patterns in the Airbnb rental market.
Name
Please mention your name.
Abinaya Goud Jandhyala
GitHub ID
Please mention your GitHub ID.
abinayagoudjandhyala
Identify Yourself
Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).
GSSOC 2025
Closes
Enter the issue number that will be closed through this PR.
Closes: #1799
Describe the Add-ons or Changes You've Made
Give a clear description of what you have added or modified.
Added a complete Airbnb Data Analysis project featuring Tableau visualizations, Excel datasets, and a summary presentation. The project offers insights into average prices per bedroom, pricing by zip code, weekly revenue patterns, and listing breakdowns. A README details key insights and suggestions for future enhancements, such as integrating review/amenities data and ML predictive analysis.
Type of Change
Select the type of change:
How Has This Been Tested?
Describe how your changes have been tested.
The files were checked for integrity and correctness. Tableau workbook was loaded to verify interactivity and consistency of visualizations. The Excel dataset was reviewed for completeness and accurate linking to the Tableau workbook. The README was tested for clear instructions and setup reproducibility.
Checklist
Please confirm the following: