PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data (paper: arXiv:2507.02921)
Install the dependencies from requirements.txt:
pip install -r requirements.txt
Follow the steps below to construct the U.S. POI dataset:
To start, download the raw U.S. POI data from the Foursquare dataset, which contains over 20 million POIs. The dataset can be downloaded from Google Drive.
Then, place the raw file in data/fsq/raw/.
Download the pretrained semantic category embeddings from the SD-CEM repository.
Then, place the embedding files in data/SD-CEM/.
Download the official Foursquare POI category hierarchy CSV from the Foursquare website. The filename is personalization-apis-movement-sdk-categories.csv.
Then, place the file in data/fsq/raw/.
Download the U.S. Census ZIP Code Tabulation Area (ZCTA) shapefiles from the Census database.
Then, place the shapefile in data/fsq/Census/.
At the end, the data/ directory structure should look like this:
data/
├── fsq/
│ ├── raw/
│ │ ├── <US_raw_POI_data_file> (e.g. fsq_us.parquet)
│ │ └── personalization-apis-movement-sdk-categories.csv
│ └── Census/
│ └── <ZCTA_shapefile>
└── SD-CEM/
└── <embedding_files> (e.g. SD-CEM#US#30.csv)
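Before training, a short script can confirm this layout. The following is a minimal sketch using only the standard library; check_data_dir is a hypothetical helper, not part of PlaceFM, and its glob patterns are assumptions drawn from the example file names above (for instance, that the raw POI file is a .parquet file). It also checks that each .shp file has its .shx and .dbf companions, which a shapefile needs in order to be read.

```python
"""Check that data/ matches the layout described in the PlaceFM README.

Hypothetical helper: the glob patterns are assumptions based on the README's
example file names, not rules enforced by PlaceFM itself.
"""
from pathlib import Path

# (relative directory, glob pattern, description used in messages)
EXPECTED = [
    ("fsq/raw", "*.parquet", "raw U.S. POI file (e.g. fsq_us.parquet)"),
    ("fsq/raw", "personalization-apis-movement-sdk-categories.csv", "Foursquare category hierarchy"),
    ("fsq/Census", "*.shp", "ZCTA shapefile"),
    ("SD-CEM", "*.csv", "SD-CEM category embeddings"),
]

# Companion files that must sit next to every .shp file
SHAPEFILE_SIDECARS = (".shx", ".dbf")


def check_data_dir(root):
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(root)
    problems = []
    for subdir, pattern, description in EXPECTED:
        # glob() on a missing directory simply yields nothing
        matches = sorted((root / subdir).glob(pattern))
        if not matches:
            problems.append(f"missing {description}: {root / subdir / pattern}")
        if pattern == "*.shp":
            for shp in matches:
                for ext in SHAPEFILE_SIDECARS:
                    if not shp.with_suffix(ext).exists():
                        problems.append(f"{shp.name} has no matching {ext} file")
    return problems


if __name__ == "__main__":
    issues = check_data_dir("data")
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("data/ layout looks complete")
```

Run it from the directory that contains data/. Reporting every problem at once, rather than stopping at the first, lets you fix all missing downloads in one pass.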
First, change into the placefm/ directory:
cd placefm
To generate state-level embeddings using PlaceFM:
python train.py --dataset fsq --method placefm --state <state abbr> --clustering_method kmeans --verbose
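To build embeddings for several states in one go, the command above can be wrapped in a small driver script. This is a sketch, not part of the repository: it passes only the flags shown in the command above, the script and function names are hypothetical, and the states are whatever abbreviations you pass on the command line, in the form train.py expects for --state.

```python
"""Run PlaceFM's train.py for several states in sequence (hypothetical driver).

Usage, from inside placefm/:  python run_states.py CA NY TX
"""
import subprocess
import sys


def build_command(state, clustering_method="kmeans", verbose=True):
    """Return the argv list for one PlaceFM run on a single state."""
    cmd = [
        sys.executable, "train.py",
        "--dataset", "fsq",
        "--method", "placefm",
        "--state", state,
        "--clustering_method", clustering_method,
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_states(states):
    for state in states:
        cmd = build_command(state)
        print("Running:", " ".join(cmd))
        # check=True raises on the first failing state instead of continuing silently
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # With no arguments nothing is run
    run_states(sys.argv[1:])
```

Running the states sequentially keeps memory use to one state at a time; launching them in parallel would be faster but may not fit in memory for large states.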
You can set default configuration values in configs/placefm/<dataset_name>.json. The generated embeddings are saved in checkpoints/placefm.
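One common way to combine such a JSON file with command-line flags is to load the JSON as argparse defaults, so that any flag given explicitly still overrides it. The sketch below is hypothetical: it assumes the JSON keys match the flag names used above (dataset, method, state, clustering_method), and PlaceFM's own configuration loader and key names may differ.

```python
"""Layer JSON defaults under command-line flags (hypothetical sketch).

Assumes configs/placefm/<dataset_name>.json uses the same keys as the CLI flags.
"""
import argparse
import json
from pathlib import Path


def parse_args(argv=None, config_dir="configs/placefm"):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="fsq")
    parser.add_argument("--method", default="placefm")
    parser.add_argument("--state", default=None)
    parser.add_argument("--clustering_method", default="kmeans")
    parser.add_argument("--verbose", action="store_true")

    # First pass: only needed to learn which dataset's config file to read
    known, _ = parser.parse_known_args(argv)
    config_path = Path(config_dir) / f"{known.dataset}.json"
    if config_path.exists():
        # Values from the JSON file replace the hard-coded defaults...
        parser.set_defaults(**json.loads(config_path.read_text()))
    # ...while flags passed explicitly on the command line still take precedence
    return parser.parse_args(argv)
```

The resulting precedence is: explicit flag, then JSON file, then hard-coded default.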
To evaluate the effectiveness of the generated region embeddings, we have implemented three downstream tasks:
- Population Density (PD) prediction: predict the population density of each U.S. ZIP code.
- Median House Price (HP) prediction: estimate the median house price of each U.S. ZIP code.
- Urban Functionality (UF): TODO
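The PD target is a ratio of population to land area. A minimal sketch, assuming land area comes from the ZCTA shapefile's ALAND20 attribute (square meters) and population counts come from a separate Census table keyed by ZCTA; the function names are hypothetical, and how PlaceFM actually builds its PD labels is not specified here.

```python
"""Population density per ZIP code (ZCTA), in people per square kilometer.

Assumption: land area is given in square meters (as in the ZCTA ALAND20 field).
"""

SQ_METERS_PER_SQ_KM = 1_000_000


def population_density(population, aland_m2):
    """People per km^2, or None when the ZCTA has no land area."""
    if aland_m2 <= 0:
        return None
    return population / (aland_m2 / SQ_METERS_PER_SQ_KM)


def density_by_zcta(population_by_zcta, aland_by_zcta):
    """Join two {zcta: value} dicts; keep ZCTAs present in both with land area > 0."""
    result = {}
    for zcta, population in population_by_zcta.items():
        aland = aland_by_zcta.get(zcta)
        if aland is None:
            continue
        density = population_density(population, aland)
        if density is not None:
            result[zcta] = density
    return result
```

Dropping ZCTAs with zero land area avoids division by zero; such areas cannot receive a meaningful density label.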
Run the following command to evaluate on the tasks above. You can set the following parameters:
- --embeddings: path to the region embeddings file.
- --run_eval: number of evaluation runs (e.g., 10).
- --dt_model: downstream task model; options include rf (Random Forest), xgb (XGBoost), etc.
- --verbose: enable verbose output.
python test.py --embeddings <path to the embeddings> --run_eval 10 --dt_model rf --verbose
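Conceptually, an evaluation run fits a regressor from region embeddings to a target such as population density and scores it on held-out regions, and several runs are averaged. The sketch below illustrates that idea only. It assumes --run_eval counts independent random train/test splits, and it swaps in a k-nearest-neighbour regressor for the rf/xgb models of test.py so that it runs with the standard library alone; it is not test.py's implementation.

```python
"""Repeated train/test evaluation of region embeddings (illustrative sketch).

A k-nearest-neighbour regressor stands in for rf/xgb; scores are R^2.
"""
import math
import random
import statistics


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training embeddings closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination; undefined (nan) if all true values are equal."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        return math.nan
    return 1.0 - ss_res / ss_tot


def evaluate(embeddings, targets, run_eval=10, test_frac=0.2, k=3):
    """Return (mean R^2, standard deviation of R^2) over run_eval random splits."""
    n = len(embeddings)
    n_test = max(1, int(n * test_frac))
    scores = []
    for seed in range(run_eval):
        # A different, reproducible shuffle for every run
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_x = [embeddings[i] for i in train_idx]
        train_y = [targets[i] for i in train_idx]
        y_true = [targets[i] for i in test_idx]
        y_pred = [knn_predict(train_x, train_y, embeddings[i], k) for i in test_idx]
        scores.append(r2_score(y_true, y_pred))
    spread = statistics.stdev(scores) if run_eval > 1 else 0.0
    return statistics.fmean(scores), spread
```

Reporting the mean together with the spread across runs shows whether a difference between two embedding methods is larger than the noise introduced by the choice of split.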
The results are logged in checkpoints/logs/.
If you find this repo helpful, we would appreciate it if you could cite our paper.
@article{hashemi2025placefm,
title={PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data},
author={Hashemi, Mohammad and Amiri, Hossein and Zufle, Andreas},
journal={arXiv preprint arXiv:2507.02921},
year={2025}
}