
Tabular API

An API service that provides RESTful access to tabular data (CSV and similar formats) converted by Hydra. It exposes the PostgreSQL tables that hold the converted data through a REST API, with HTTP querying, pagination, and data streaming for CSV or tabular resources.

This service is mainly used, developed, and maintained by data.gouv.fr, the French open data platform. The production API is deployed on data.gouv.fr infrastructure at https://tabular-api.data.gouv.fr/api. See the product documentation (in French) for usage details and the technical documentation for the API reference.

🛠️ Installation & Setup

📋 Requirements

  • Python >= 3.11, < 3.13
  • Poetry >= 2.0.0 (for dependency management)
  • Docker & Docker Compose

🧪 Run with a test database

  1. Start the Infrastructure

    Start this project via docker compose:

    docker compose up

    This starts a PostgREST container and a PostgreSQL container preloaded with fake test data. You can access the raw PostgREST API at http://localhost:8080.

  2. Launch the main API proxy

    Install dependencies and start the proxy services:

    poetry install
    poetry run adev runserver -p8005 api_tabular/app.py        # API for CSV files apified by udata-hydra
    poetry run adev runserver -p8006 api_tabular/metrics.py    # API for udata's metrics

    The main API provides a controlled layer over PostgREST: exposing PostgREST directly would be too permissive, so this proxy adds a security and access control layer.

  3. Test the API

    Query the API using a resource_id. Several test resources are available in the fake database (a quick example follows the list):

    • aaaaaaaa-1111-bbbb-2222-cccccccccccc - Main test resource with 1000 rows
    • aaaaaaaa-5555-bbbb-6666-cccccccccccc - Resource with database indexes
    • dddddddd-7777-eeee-8888-ffffffffffff - Resource allowed for aggregation
    • aaaaaaaa-9999-bbbb-1010-cccccccccccc - Resource with indexes and aggregation allowed
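
    For example, a quick smoke test against the main test resource (assuming the proxy runs on port 8005, as started above):

    curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/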

🏭 Run with a real Hydra database

To use the API with a real database served by Hydra instead of the fake test database:

  1. Configure the PostgREST endpoint to point to your Hydra database:

    export PGREST_ENDPOINT="http://your-hydra-postgrest:8080"

    Or create a config.toml file:

    PGREST_ENDPOINT = "http://your-hydra-postgrest:8080"
  2. Start only the API services (skip the fake database):

    poetry install
    poetry run adev runserver -p8005 api_tabular/app.py
    poetry run adev runserver -p8006 api_tabular/metrics.py
  3. Use real resource IDs from your Hydra database instead of the test IDs.

Note: Make sure your Hydra PostgREST instance is accessible and the database schema matches the expected structure (tables in the csvapi schema).
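
A quick way to sanity-check that your instance is reachable, assuming PGREST_ENDPOINT is exported as above (PostgREST serves its OpenAPI description on the root path):

curl "$PGREST_ENDPOINT"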

📚 API Documentation

Resource Endpoints

Get Resource Metadata

GET /api/resources/{resource_id}/

Returns basic information about the resource including creation date, URL, and available endpoints.

Example:

curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/

Response:

{
  "created_at": "2023-04-21T22:54:22.043492+00:00",
  "url": "https://data.gouv.fr/datasets/example/resources/fake.csv",
  "links": [
    {
      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
      "type": "GET",
      "rel": "profile"
    },
    {
      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
      "type": "GET",
      "rel": "data"
    },
    {
      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
      "type": "GET",
      "rel": "swagger"
    }
  ]
}

Get Resource Profile

GET /api/resources/{resource_id}/profile/

Returns the CSV profile information (column types, headers, etc.) generated by csv-detective.

Example:

curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/

Response:

{
  "profile": {
    "header": [
        "id",
        "score",
        "decompte",
        "is_true",
        "birth",
        "liste"
    ]
  },
  "...": "..."
}

Get Resource Data

GET /api/resources/{resource_id}/data/

Returns the actual data with support for filtering, sorting, and pagination.

Example:

curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/

Response:

{
  "data": [
    {
        "__id": 1,
        "id": " 8c7a6452-9295-4db2-b692-34104574fded",
        "score": 0.708,
        "decompte": 90,
        "is_true": false,
        "birth": "1949-07-16",
        "liste": "[0]"
    },
    ...
  ],
  "links": {
      "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
      "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
      "next": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=20",
      "prev": null
  },
  "meta": {
      "page": 1,
      "page_size": 20,
      "total": 1000
  }
}

Get Resource Data as CSV

GET /api/resources/{resource_id}/data/csv/

Streams the data directly as a CSV file for download.
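
Example, streaming the test resource to a local file (curl's -o option writes the response to the given file):

curl -o data.csv http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/csv/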

Get Resource Data as JSON

GET /api/resources/{resource_id}/data/json/

Streams the data directly as a JSON file for download.
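
Example, following the same pattern as the CSV download:

curl -o data.json http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/json/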

Get Swagger Documentation

GET /api/resources/{resource_id}/swagger/

Returns OpenAPI/Swagger documentation specific to this resource.
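
Example, using the test resource from above:

curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/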

Query Operators

The data endpoint accepts the following operators as query string parameters (replace column_name with the name of an actual column), provided the column type allows them; see the resource's swagger for each column's allowed operators:

Filtering Operators

# exact value
column_name__exact=value

# differs
column_name__differs=value

# contains (for strings only)
column_name__contains=value

# in (value in list)
column_name__in=value1,value2,value3

# less
column_name__less=value

# greater
column_name__greater=value

# strictly less
column_name__strictly_less=value

# strictly greater
column_name__strictly_greater=value

Sorting

# sort by column
column_name__sort=asc
column_name__sort=desc
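
For instance, to sort the test resource by birth date in descending order (birth is a column of the test resource shown above):

curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?birth__sort=desc"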

Aggregation Operators

⚠️ WARNING: Aggregation requests are only available for resources listed in the ALLOW_AGGREGATION option of the config file; the current list is exposed at the /api/aggregation-exceptions/ endpoint.

# group by values
column_name__groupby

# count values
column_name__count

# mean / average
column_name__avg

# minimum
column_name__min

# maximum
column_name__max

# sum
column_name__sum

Note: passing an aggregation operator (count, avg, min, max, sum) returns a column named <column_name>__<operator>. For instance, ?birth__groupby&score__sum returns a list of dicts with the keys birth and score__sum.

Pagination

page=1          # Page number (default: 1)
page_size=20    # Items per page (default: 20, max: 50)

Column Selection

columns=col1,col2,col3    # Select specific columns only

Example Queries

Basic Filtering

curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&decompte__exact=13"

Returns:

{
  "data": [
    {
      "__id": 52,
      "id": " 5174f26d-d62b-4adb-a43a-c3b6288fa2f6",
      "score": 0.985,
      "decompte": 13,
      "is_true": false,
      "birth": "1980-03-23",
      "liste": "[0]"
    },
    {
      "__id": 543,
      "id": " 8705df7c-8a6a-49e2-9514-cf2fb532525e",
      "score": 0.955,
      "decompte": 13,
      "is_true": true,
      "birth": "1965-02-06",
      "liste": "[0, 1, 2]"
    }
  ],
  "links": {
    "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
    "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
    "next": null,
    "prev": null
  },
  "meta": {
    "page": 1,
    "page_size": 20,
    "total": 2
  }
}

Aggregation with Filtering

With filters and aggregators (filtering is always applied before aggregation, regardless of parameter order):

curl "http://localhost:8005/api/resources/dddddddd-7777-eeee-8888-ffffffffffff/data/?decompte__groupby&birth__less=1996&score__avg"

i.e. the average score for all rows where birth <= "1996", grouped by decompte (the resource must be in ALLOW_AGGREGATION; this one is allowed in the test database). This returns:

{
    "data": [
        {
            "decompte": 55,
            "score__avg": 0.7123333333333334
        },
        {
            "decompte": 27,
            "score__avg": 0.6068888888888889
        },
        {
            "decompte": 23,
            "score__avg": 0.4603333333333334
        },
        ...
    ]
}

Pagination

curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30"

Column Selection

curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?columns=id,score,birth

Metrics API

The metrics service (port 8006) provides similar functionality for udata's metrics:

# Get metrics data
curl http://localhost:8006/api/{model}/data/

# Get metrics as CSV
curl http://localhost:8006/api/{model}/data/csv/

Health Check

# Main API health
curl http://localhost:8005/health/

# Metrics API health
curl http://localhost:8006/health/

⚙️ Configuration

Configuration is handled through TOML files and environment variables. The default configuration is in api_tabular/config_default.toml.

Key Configuration Options

Option              Default                  Description
------              -------                  -----------
PGREST_ENDPOINT     http://localhost:8080    PostgREST server URL
SERVER_NAME         localhost:8005           Server name for URL generation
SCHEME              http                     URL scheme (http/https)
SENTRY_DSN          None                     Sentry DSN for error reporting (optional)
PAGE_SIZE_DEFAULT   20                       Default page size
PAGE_SIZE_MAX       50                       Maximum allowed page size
BATCH_SIZE          50000                    Batch size for streaming
DOC_PATH            /api/doc                 Swagger documentation path
ALLOW_AGGREGATION   ["dddddddd-7777-eeee-8888-ffffffffffff", "aaaaaaaa-9999-bbbb-1010-cccccccccccc"]   List of resource IDs allowed for aggregation

Environment Variables

You can override any configuration value using environment variables:

export PGREST_ENDPOINT="http://my-postgrest:8080"
export PAGE_SIZE_DEFAULT=50
export SENTRY_DSN="https://your-sentry-dsn"

Custom Configuration File

Create a config.toml file in the project root or set the CSVAPI_SETTINGS environment variable:

export CSVAPI_SETTINGS="/path/to/your/config.toml"

🧪 Testing

This project uses pytest for testing, with async support and mocking capabilities. The two test containers (PostgreSQL and PostgREST, started with docker compose up) must be running for the tests to pass.

Running Tests

# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_api.py

# Run tests with verbose output
poetry run pytest -v

# Run tests and show print statements
poetry run pytest -s

Tests Structure

  • tests/test_api.py - API endpoint tests (actually pings the running API)
  • tests/test_config.py - Configuration loading tests
  • tests/test_query.py - Query building and processing tests
  • tests/test_swagger.py - Swagger documentation tests (actually pings the running API)
  • tests/test_utils.py - Utility function tests
  • tests/conftest.py - Test fixtures and configuration

CI/CD Testing

Tests are automatically run in CI/CD. See .circleci/config.yml for the complete CI/CD configuration.

🤝 Contributing

🧹 Code Linting and Formatting

This project follows PEP 8 style guidelines, enforced with Ruff for linting and formatting. Before submitting contributions, either run these commands manually or install the pre-commit hook.

# Lint and sort imports, and format code
poetry run ruff check --select I --fix && poetry run ruff format

🔗 Pre-commit Hooks

This repository uses a pre-commit hook which lints and formats code before each commit. Installing the pre-commit hook is required for contributions.

Install pre-commit hooks:

poetry run pre-commit install

The pre-commit hooks automatically:

  • Check YAML syntax
  • Fix end-of-file issues
  • Remove trailing whitespace
  • Check for large files
  • Run Ruff linting and formatting

🧪 Running Tests

Pull requests cannot be merged unless all CI/CD tests pass. Tests run automatically on every pull request and on pushes to the main branch. See .circleci/config.yml for the complete CI/CD configuration, and the 🧪 Testing section above for detailed testing commands.

📦 Version Management

The release process uses bump'X:

# To perform a dry run of version bumping
poetry run bumpx -v -d
# To bump version
poetry run bumpx

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🌐 Production Resources

  • Production API: https://tabular-api.data.gouv.fr/api
  • Product documentation (in French) and technical API reference: see the links in the introduction above
