xcube-stac is a Python package and an xcube plugin that provides a data store for accessing data from STAC (SpatioTemporal Asset Catalogs).
A SpatioTemporal Asset Catalog (STAC) typically consists of three main components:
- Catalogs
- Collections
- Items
Each item represents a spatiotemporal observation and includes:
- a timestamp or temporal range
- a bounding box defining its spatial extent
- one or more assets, each linking to a data source (such as imagery or metadata)
Items within a collection generally share common characteristics. For example, a STAC catalog might have separate collections for different satellite data products. Each item would then correspond to a specific measurement covering a certain area at a particular time. In multi-spectral instruments, different bands are often stored as individual assets.
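To make the item structure concrete, here is a minimal sketch of a STAC item as a plain Python dictionary. The field names follow the STAC item specification, but required fields such as geometry and links are omitted for brevity. The ID, coordinates, and asset URLs are made-up placeholders, and summarize_item is a hypothetical helper that is not part of xcube-stac.

```python
# Illustrative STAC item; every value below is a placeholder.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    "bbox": [9.0, 53.0, 10.0, 54.0],  # west, south, east, north
    "properties": {"datetime": "2020-07-05T10:15:59Z"},
    # One asset per spectral band, as is common for multi-spectral data.
    "assets": {
        "B02": {"href": "https://example.com/B02.tif", "type": "image/tiff"},
        "B03": {"href": "https://example.com/B03.tif", "type": "image/tiff"},
        "B04": {"href": "https://example.com/B04.tif", "type": "image/tiff"},
    },
}


def summarize_item(item: dict) -> dict:
    """Extract the timestamp (or range), bounding box, and asset names."""
    props = item["properties"]
    # An item carries either a single datetime or a start/end range.
    if props.get("datetime") is not None:
        time_range = (props["datetime"], props["datetime"])
    else:
        time_range = (props["start_datetime"], props["end_datetime"])
    return {
        "id": item["id"],
        "bbox": tuple(item["bbox"]),
        "time_range": time_range,
        "asset_names": sorted(item["assets"]),
    }


summary = summarize_item(item)
print(summary)
```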
Most STAC catalogs conform to the STAC API - Item Search specification, enabling efficient server-side queries based on spatial, temporal, or attribute filters. Without this conformance, only client-side searches are possible, which can be slow for large catalogs.
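The cost of a client-side search can be illustrated with a small sketch: every item has to be fetched and inspected locally before it can be kept or discarded. The code below is only an illustration under simplifying assumptions (each item has a single datetime and a bbox); client_side_search and its helpers are hypothetical names and do not reflect how xcube-stac implements its search.

```python
from datetime import datetime


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bbox_intersects(a, b) -> bool:
    """Boxes are (west, south, east, north); they overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def client_side_search(items, bbox, time_range):
    """Yield the items that match; note that ALL items must be visited."""
    start, end = (parse_time(t) for t in time_range)
    for item in items:
        timestamp = parse_time(item["properties"]["datetime"])
        if start <= timestamp <= end and bbox_intersects(item["bbox"], bbox):
            yield item


# Placeholder items standing in for a catalog that was crawled completely.
items = [
    {"id": "a", "bbox": [9.0, 53.0, 10.0, 54.0],
     "properties": {"datetime": "2020-07-05T10:15:59Z"}},
    {"id": "b", "bbox": [20.0, 40.0, 21.0, 41.0],
     "properties": {"datetime": "2020-07-06T10:00:00Z"}},
    {"id": "c", "bbox": [9.5, 53.5, 10.5, 54.5],
     "properties": {"datetime": "2021-01-01T00:00:00Z"}},
]

matches = [
    item["id"]
    for item in client_side_search(
        items,
        bbox=[9.7, 53.3, 10.3, 53.8],
        time_range=("2020-07-01T00:00:00Z", "2020-08-01T00:00:00Z"),
    )
]
print(matches)
```

With a conforming API, the same bbox and time filters are evaluated by the server, so only the matching items are transferred.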
The xcube-stac plugin reads data sources described by a STAC catalog and opens them as xr.Dataset objects that follow the xcube dataset convention.
By default:
- A data ID corresponds to a single STAC item.
- Each item is opened as a dataset, with each asset becoming a data variable within that dataset.
- In ARDC-mode data stores (*-ardc), a data ID can also correspond to a collection ID.
The plugin provides the following data stores:
- "stac": General STAC data store. Uses xcube's file-system data stores to access Zarr, NetCDF, or GeoTIFF sources.
- "stac-xcube": Accesses datasets published by the xcube server STAC API.
- "stac-cdse": Tailored for the CDSE STAC API. Specific support is provided for the collections listed in Special support for the CDSE STAC API.
- "stac-cdse-ardc": Generates 3D spatiotemporal analysis-ready data cubes (ARDCs) from multiple STAC items for the supported CDSE collections listed in Special support for the CDSE STAC API.
- "stac-pc": Tailored for the Planetary Computer STAC API. Specific support is provided for the collections listed in Special support for the Planetary Computer STAC API.
- "stac-pc-ardc": Generates ARDCs from multiple STAC items for the supported Planetary Computer collections listed in Special support for the Planetary Computer STAC API.
Some STAC catalogs are designed to enable the creation of analysis-ready data cubes (ARDCs) from multiple STAC items in a collection.
Currently, ARDC support is provided for:
The workflow for building a 3D analysis-ready cube includes:
- Querying products from the CDSE STAC API for a specified time range and spatial extent.
- Retrieving observations using a lazy-loading reader. (Different collections use different readers depending on the underlying data format.)
- Mosaicking spatial tiles into single images per timestamp.
- Stacking these mosaics along the temporal axis to produce a 3D data cube.
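The last two steps of this workflow can be sketched with a toy example. It assumes that all tiles have already been resampled onto one shared grid, that None marks missing pixels, and that the first valid pixel wins when tiles overlap. These are assumptions made for illustration; the merge rule and the lazy array handling used by xcube-stac may differ, and mosaic and build_cube are hypothetical names.

```python
from collections import defaultdict


def mosaic(tiles):
    """Merge tiles on a shared grid; the first valid (non-None) pixel wins."""
    rows, cols = len(tiles[0]), len(tiles[0][0])
    merged = [[None] * cols for _ in range(rows)]
    for tile in tiles:
        for r in range(rows):
            for c in range(cols):
                if merged[r][c] is None:
                    merged[r][c] = tile[r][c]
    return merged


def build_cube(observations):
    """Group (timestamp, tile) pairs by timestamp, mosaic, then stack."""
    tiles_by_time = defaultdict(list)
    for timestamp, tile in observations:
        tiles_by_time[timestamp].append(tile)
    # ISO date strings sort chronologically.
    times = sorted(tiles_by_time)
    # The cube is indexed as cube[time][row][column].
    cube = [mosaic(tiles_by_time[t]) for t in times]
    return times, cube


observations = [
    ("2020-07-21", [[3, 3], [None, None]]),
    ("2020-07-16", [[1, None], [1, None]]),
    ("2020-07-16", [[None, 2], [None, 2]]),
]
times, cube = build_cube(observations)
print(times)
print(cube)
```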
Note:
During evaluation, we also considered odc-stac and stackstac for stacking STAC items.
However, both libraries rely on rasterio.open (GDAL drivers) to read data, which prevents accessing data directly from the CDSE S3 endpoint due to blocked AWS environments. Of the two, a benchmark shows that odc-stac outperforms stackstac. Additionally, stackstac is less mature, with known issues (e.g., #196) handling COG overviews. Despite this, both are widely used in the community and may be supported in future releases.
Special support for the CDSE STAC API
Currently, we support the following collections and data IDs:
Special support for the Planetary Computer STAC API
Currently, we support the following collections and data IDs:
This section describes three alternative methods you can use to install the xcube-stac plugin.
For installation of conda packages, we recommend mamba. It is also possible to use conda, but note that installation may be significantly slower with conda than with mamba. If using conda rather than mamba, replace the mamba command with conda in the installation commands given below.
This method creates a new environment and installs the latest conda-forge release of xcube-stac, along with all its required dependencies, into the newly created environment.
To do so, execute the following commands:
mamba create --name xcube-stac --channel conda-forge xcube-stac
mamba activate xcube-stac
The name of the environment may be freely chosen.
This method assumes that you have an existing environment, and you want to install xcube-stac into it.
With the existing environment activated, execute this command:
mamba install --channel conda-forge xcube-stac
Once again, xcube and any other necessary dependencies will be installed automatically if they are not already installed.
If you want to install xcube-stac directly from the git repository (for example in order to use an unreleased version or to modify the code), you can do so as follows:
mamba create --name xcube-stac --channel conda-forge --only-deps xcube-stac
mamba activate xcube-stac
git clone https://github.com/xcube-dev/xcube-stac.git
python -m pip install --no-deps --editable xcube-stac/
This installs all the dependencies of xcube-stac into a fresh conda environment, then installs xcube-stac into this environment from the repository.
Note: this step is only needed if you want to use the CDSE STAC API. To access EO data via S3 from CDSE, you need to generate S3 credentials, which are required to initialize a "stac-cdse" data store. So far, only Sentinel-2 L2A is supported. An example is shown in a notebook.
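To avoid hard-coding the generated credentials in scripts or notebooks, they can be read from environment variables. The following is a minimal sketch; the variable names CDSE_S3_KEY and CDSE_S3_SECRET and the helper load_cdse_credentials are hypothetical choices for this example and are not defined by xcube-stac or CDSE.

```python
import os


def load_cdse_credentials(env=os.environ) -> dict:
    """Build the credentials dictionary from environment variables.

    CDSE_S3_KEY and CDSE_S3_SECRET are hypothetical variable names.
    """
    try:
        return {"key": env["CDSE_S3_KEY"], "secret": env["CDSE_S3_SECRET"]}
    except KeyError as err:
        raise RuntimeError(
            f"Missing environment variable: {err.args[0]}"
        ) from None


# Usage (requires xcube and xcube-stac to be installed):
# from xcube.core.store import new_data_store
# store = new_data_store("stac-cdse", **load_cdse_credentials())
```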
The following Jupyter notebooks provide some examples:
- example/notebooks/geotiff_nonsearchable_catalog.ipynb: This notebook shows an example of how to load a GeoTIFF file from a non-searchable STAC catalog.
- example/notebooks/geotiff_searchable_catalog.ipynb: This notebook shows an example of how to load a GeoTIFF file from a searchable STAC catalog.
- example/notebooks/netcdf_searchable_catalog.ipynb: This notebook shows an example of how to load a NetCDF file from a searchable STAC catalog.
- example/notebooks/sentinel_2_cdse.ipynb: This notebook shows an example of how to access Sentinel-2 L1C and L2A data using the CDSE STAC API. It shows how to access individual observation tiles and how to generate spatiotemporal 3D analysis-ready data cubes from multiple STAC items.
- example/notebooks/sentinel_2_planetary_computer.ipynb: This notebook shows an example of how to access Sentinel-2 L2A data using the Planetary Computer STAC API. It shows how to access individual observation tiles and how to generate spatiotemporal 3D analysis-ready data cubes from multiple STAC items.
- example/notebooks/sentinel_3_cdse.ipynb: This notebook shows an example of how to access the Sentinel-3 Synergy Level-2 Land Surface Reflectance and Aerosol product using the CDSE STAC API. It shows how to access individual observation tiles and how to generate spatiotemporal 3D analysis-ready data cubes from multiple STAC items.
- example/notebooks/xcube_server_stac_s3.ipynb: This notebook shows an example of how to open data sources published by xcube server via the STAC API.
The xcube data store framework allows you to access data with a few lines of code. The following examples require S3 credentials for CDSE data access.
from xcube.core.store import new_data_store
credentials = {
    "key": "xxx",
    "secret": "xxx",
}
store = new_data_store("stac-cdse", **credentials)
ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_MSIL2A_20200705T101559_N0500_R065_T32TMT_20230530T175912"
)
ds
The data ID "collections/sentinel-2-l2a/items/S2B_MSIL2A_20200705T101559_N0500_R065_T32TMT_20230530T175912" points to the CDSE STAC item's JSON and is specified by the segment of the URL that follows the catalog's URL.
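The relationship between an item URL and its data ID can be sketched as a simple prefix removal. The catalog URL below is a placeholder and not the real CDSE endpoint, and data_id_from_item_url is a hypothetical helper that is not part of xcube-stac.

```python
def data_id_from_item_url(item_url: str, catalog_url: str) -> str:
    """Return the part of the item URL that follows the catalog URL."""
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError(f"{item_url!r} does not belong to {catalog_url!r}")
    return item_url[len(prefix):]


# Placeholder catalog URL, used for illustration only.
catalog_url = "https://stac.example.com/v1"
item_url = (
    catalog_url
    + "/collections/sentinel-2-l2a/items/"
    + "S2B_MSIL2A_20200705T101559_N0500_R065_T32TMT_20230530T175912"
)

data_id = data_id_from_item_url(item_url, catalog_url)
print(data_id)
```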
To generate a 3D spatiotemporal data cube, execute the following lines of code.
from xcube.core.store import new_data_store
credentials = {
    "key": "xxx",
    "secret": "xxx",
}
store = new_data_store("stac-cdse-ardc", **credentials)
ds = store.open_data(
    data_id="sentinel-2-l2a",
    bbox=[9.7, 53.3, 10.3, 53.8],
    time_range=["2020-07-15", "2020-08-01"],
    spatial_res=10 / 111320,  # 10 meters expressed in degrees
    crs="EPSG:4326",
    asset_names=["B02", "B03", "B04"],
)
In the stac-cdse-ardc data store, the data IDs are the collection IDs within the STAC catalog. To get Sentinel-2 L2A data, we set data_id to "sentinel-2-l2a" in the above example. The bounding box and time range define the spatial and temporal extent of the data cube. The parameters crs and spatial_res are required as well and define the coordinate reference system (CRS) and the spatial resolution, respectively. Note that the bounding box and spatial resolution need to be given in the units of the respective CRS.
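Because EPSG:4326 uses degrees, a resolution given in meters has to be converted first. The sketch below uses the factor 111320 meters per degree from the example above, which is only an approximation (the length of a degree of longitude shrinks with latitude). It also estimates the resulting grid size; the helper names are hypothetical, and the exact grid produced by xcube-stac may differ.

```python
# Approximate length of one degree in meters, as used in the example above.
METERS_PER_DEGREE = 111_320


def meters_to_degrees(meters: float) -> float:
    """Convert a length in meters to an approximate length in degrees."""
    return meters / METERS_PER_DEGREE


def bbox_size_in_pixels(bbox, spatial_res):
    """Estimate (width, height) of a grid covering bbox at spatial_res.

    bbox and spatial_res must use the same units (here: degrees).
    """
    west, south, east, north = bbox
    width = round((east - west) / spatial_res)
    height = round((north - south) / spatial_res)
    return width, height


spatial_res = meters_to_degrees(10)  # about 10 m expressed in degrees
width, height = bbox_size_in_pixels([9.7, 53.3, 10.3, 53.8], spatial_res)
print(spatial_res, width, height)
```

For a projected CRS such as a UTM zone, no conversion is needed: both the bounding box and the resolution are then given in meters.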
The test suite uses pytest-recording to mock STAC catalogs. To run the test suite, pytest and pytest-recording need to be installed. Then, the test suite can be executed as usual by typing:
pytest
To analyze test coverage:
pytest --cov=xcube_stac
To produce an HTML coverage report:
pytest --cov-report html --cov=xcube_stac
The unit test suite uses pytest-recording to mock STAC catalogs. During development, actual HTTP requests are sent to a STAC catalog and the responses are saved in cassettes/**.yaml files. During testing, only the cassettes/**.yaml files are used, and no actual HTTP request is made. To save the responses to cassettes/**.yaml during development, run
pytest -v -s --record-mode new_episodes
Note that --record-mode new_episodes may modify existing cassettes, because requests that are not yet recorded are appended to them. If you only want to write cassettes that are not saved already, use --record-mode once. pytest-recording supports all record modes provided by VCR.py.
After recording the cassettes, testing can be performed as usual.