Scrape the hotel reviews of a whole city on TripAdvisor.
Requirements:
- Python 3.5
Download and install the required libraries and data:
pip install bs4
Store all reviews of New York City:
python tripadvisor-scrapper.py 60763 New_York_City_New_York
Store all reviews of Paris:
python tripadvisor-scrapper.py 187147 Paris_Ile_de_France
Store all reviews of Vienna:
python tripadvisor-scrapper.py 190454 Vienna
The scraper requires the city location id and the city name as command-line arguments.
Both can be read from the city's hotels URL, for example, https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html
The city location id is the number right after the "g" (here 60763). The city name is the string between the dash following the location id and the dash before "Hotels" (here New_York_City_New_York).
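The URL rule above can be expressed as a small Python sketch. The functions `parse_city_url` and `build_city_url` are hypothetical helpers, not part of tripadvisor-scrapper.py, and they assume the hotels URL always has the form `https://www.tripadvisor.com/Hotels-g<id>-<city_name>-Hotels.html`. The sketch avoids f-strings so it also runs on Python 3.5.

```python
import re

# Assumed URL shape: .../Hotels-g<location_id>-<city_name>-Hotels.html
HOTELS_URL_RE = re.compile(r"/Hotels-g(\d+)-(.+)-Hotels\.html$")


def parse_city_url(url):
    """Return (location_id, city_name) taken from a city hotels URL."""
    match = HOTELS_URL_RE.search(url)
    if match is None:
        raise ValueError("not a TripAdvisor city hotels URL: " + url)
    return match.group(1), match.group(2)


def build_city_url(location_id, city_name):
    """Inverse of parse_city_url: rebuild the hotels URL from its two parts."""
    return "https://www.tripadvisor.com/Hotels-g{}-{}-Hotels.html".format(
        location_id, city_name)


if __name__ == "__main__":
    url = "https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html"
    location_id, city_name = parse_city_url(url)
    # These two values are the command-line arguments for the scraper.
    print(location_id, city_name)
```

Because the city name is matched greedily up to the final `-Hotels.html`, a name that itself contained dashes would be kept intact.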
Store all reviews of Vienna and additionally store the review URL list as a pickle for later rescraping:
python tripadvisor-scrapper.py 190454 vienna --pickle store
The pickle is stored in data/timestamp-cityname.
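A minimal sketch of the store step, assuming the scraper keeps the collected review URLs in a Python list. The README only says the pickle goes to data/timestamp-cityname; this sketch assumes that path is a per-run directory and names the file `<timestamp>-<cityname>.pickle` to match the file name used in the load example. The helper `store_review_urls` and the `%Y%m%d-%H%M%S` timestamp format are assumptions, not the script's actual code.

```python
import os
import pickle
import time


def store_review_urls(review_urls, city_name, data_dir="data"):
    """Pickle the review URL list and return the path of the written file.

    Hypothetical layout: data/<timestamp>-<city>/<timestamp>-<city>.pickle
    """
    # Assumed timestamp format, e.g. 20160716-202314
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    run_name = "{}-{}".format(timestamp, city_name)
    run_dir = os.path.join(data_dir, run_name)
    os.makedirs(run_dir, exist_ok=True)
    path = os.path.join(run_dir, run_name + ".pickle")
    with open(path, "wb") as fh:
        pickle.dump(list(review_urls), fh)
    return path
```

To rescrape later with `--pickle load`, the resulting file would then be copied into the pickle directory.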
Store all reviews of Vienna using a review URL list loaded from pickle/20160601-1522-vienna.pickle:
python tripadvisor-scrapper.py 190454 Vienna --pickle load --filename 20160601-1522-vienna.pickle
A pickle to be loaded must be placed in the pickle directory, which sits at the same directory level as tripadvisor-scrapper.py.
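The command-line interface described above, together with the load step, could look like the following argparse sketch. The argument names mirror the documented usage (`location_id`, `city_name`, `--pickle store|load`, `--filename`), but `parse_args`, `load_review_urls`, and the rule that `--pickle load` needs `--filename` are assumptions rather than the script's actual code.

```python
import argparse
import os
import pickle


def parse_args(argv=None):
    """Parse the documented arguments; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Scrape the hotel reviews of a whole city on TripAdvisor.")
    parser.add_argument("location_id", help="city location id from the URL, e.g. 190454")
    parser.add_argument("city_name", help="city name from the URL, e.g. Vienna")
    parser.add_argument("--pickle", choices=["store", "load"],
                        help="store the review URL list, or load it instead of collecting it")
    parser.add_argument("--filename",
                        help="pickle file inside the pickle directory (with --pickle load)")
    args = parser.parse_args(argv)
    # Assumed rule: loading needs to know which pickle file to read.
    if args.pickle == "load" and not args.filename:
        parser.error("--pickle load requires --filename")
    return args


def load_review_urls(filename, pickle_dir=None):
    """Load a pickled review URL list from the pickle directory."""
    if pickle_dir is None:
        # Assumption: "pickle" sits at the same directory level as this script.
        pickle_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "pickle")
    with open(os.path.join(pickle_dir, filename), "rb") as fh:
        return pickle.load(fh)
```

For example, `parse_args(["190454", "Vienna", "--pickle", "load", "--filename", "20160601-1522-vienna.pickle"])` yields `args.pickle == "load"`, after which `load_review_urls(args.filename)` reads the list from the pickle directory.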
Merge all reviews and hotel information of a city:
python tripadvisor-totalizer.py /Users/admin/tripadvisor-scrapper/data/20160716-202314-vienna