feat: optimize pmtiles generation even more #1408
Closes: Too many features in routes.pmtiles for datasets without shapes #1391
Closes: Optimize pmtiles generation even more #1383
Refactored the PMTiles creation code so that each GTFS file is processed by a single class (the one exception being routes.txt, which is processed by both RoutesProcessor and RoutesProcessorForColors). This makes it easier to know when to download each file and when to delete it after use.
Also fixed the problem from #1391 where, for datasets without shapes, each trip resulted in its own feature in routes.pmtiles. This was corrected by merging multiple trips with the same stops into a single feature.
For #1391, tested with mdb-1026-202510060055, which has 431,204 trips in trips.txt.
Before the fix, routes-output.geojson would have 431,204 features.
After the fix, there are 10,292 features.
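A minimal sketch of the grouping idea (the function and variable names below are illustrative, not the actual code in this PR): trips are keyed by their ordered stop sequence, so every trip that visits the same stops in the same order contributes to a single feature.

```python
from collections import defaultdict

# Hypothetical sketch: group trips that share the exact same ordered stop
# sequence so each group becomes one feature instead of one feature per trip.
def group_trips_by_stop_sequence(trip_stop_rows):
    """trip_stop_rows: iterable of (trip_id, stop_sequence, stop_id) tuples,
    e.g. parsed from stop_times.txt."""
    stops_by_trip = defaultdict(list)
    for trip_id, stop_sequence, stop_id in trip_stop_rows:
        stops_by_trip[trip_id].append((int(stop_sequence), stop_id))

    trips_by_key = defaultdict(list)
    for trip_id, stops in stops_by_trip.items():
        # The ordered tuple of stop_ids is the grouping key.
        key = tuple(stop_id for _, stop_id in sorted(stops))
        trips_by_key[key].append(trip_id)

    # Each key now maps to every trip visiting the same stops in the same
    # order; emit one feature per key rather than one per trip.
    return trips_by_key
```

With a grouping like this, the feature count is bounded by the number of distinct stop sequences rather than the number of trips.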
From our AI friend
This pull request refactors and improves the GTFS CSV processing pipeline by introducing a new base processor class, streamlining CSV parsing, and enhancing work directory management. The changes emphasize modularity, performance, and safer file handling, while removing unused or redundant code. Key improvements include the addition of a fast CSV parser, a context-managed work directory utility, and a simplified, robust interface for accessing CSV data.
Core pipeline and processing improvements:
- Added a BaseProcessor class in base_processor.py to standardize CSV processing, including encoding detection, safe file existence checks, and metric tracking. Subclasses override process_file() for custom logic.
- Added AgenciesProcessor as an example subclass of BaseProcessor, demonstrating how to extract agency information from a GTFS file using the new pipeline.
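A rough sketch of how such a base/subclass split could look, assuming a DictReader-based pipeline; apart from BaseProcessor, AgenciesProcessor, and process_file(), the names and details below are illustrative rather than the PR's actual implementation:

```python
import csv
import os


class BaseProcessor:
    """Sketch of a per-file processor: owns encoding detection, safe
    existence checks, and metric tracking; subclasses implement
    process_file() for the file-specific logic."""

    def __init__(self, path):
        self.path = path
        self.rows_processed = 0  # simple metric tracking

    def detect_encoding(self):
        # Assumed heuristic: utf-8-sig strips a BOM when present and
        # degrades gracefully to plain UTF-8 otherwise.
        return "utf-8-sig"

    def run(self):
        if not os.path.exists(self.path):
            return None  # safe handling of optional GTFS files
        with open(self.path, encoding=self.detect_encoding(), newline="") as f:
            return self.process_file(csv.DictReader(f))

    def process_file(self, reader):
        raise NotImplementedError


class AgenciesProcessor(BaseProcessor):
    """Example subclass: extract agency_id -> agency_name from agency.txt."""

    def process_file(self, reader):
        agencies = {}
        for row in reader:
            self.rows_processed += 1
            agencies[row.get("agency_id", "")] = row.get("agency_name", "")
        return agencies
```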
CSV parsing and access enhancements:
- Added FastCsvParser for efficient line-by-line parsing, using a heuristic to optimize for unquoted lines and tracking quoted-line usage.
- Refocused CsvCache on path resolution and safe value extraction by index, removing in-memory caching and several relationship-building methods. Added static helpers for column index lookup and safe type conversion.
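A minimal sketch of the unquoted-line heuristic (the class name FastCsvParser comes from the summary above; the method name and counter are assumptions): lines containing no double quote are split directly, and only quoted lines fall back to the csv module.

```python
import csv


class FastCsvParser:
    """Fast path for the common case of unquoted CSV lines; quoted lines
    fall back to the csv module, and their frequency is tracked."""

    def __init__(self):
        self.quoted_lines = 0

    def parse_line(self, line):
        line = line.rstrip("\r\n")
        if '"' not in line:
            # Fast path: no quoting, so a plain split is already correct.
            return line.split(",")
        # Slow path: let the csv module handle quotes and embedded commas.
        self.quoted_lines += 1
        return next(csv.reader([line]))
```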
Work directory and resource management:
- Added EphemeralOrDebugWorkdir, a context manager for temporary or debug-mode working directories, with automatic cleanup of old directories and configurable behavior via environment variables.
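A hedged sketch of a context-managed work directory; the environment variable name and exact cleanup behavior below are assumptions for illustration, not necessarily what EphemeralOrDebugWorkdir does:

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def ephemeral_or_debug_workdir(prefix="pmtiles-"):
    """Yield a work directory: a temp dir removed on exit by default, or a
    persistent directory when a debug environment variable is set."""
    debug_dir = os.environ.get("PMTILES_DEBUG_WORKDIR")  # assumed variable name
    if debug_dir:
        os.makedirs(debug_dir, exist_ok=True)
        yield debug_dir  # kept on disk for inspection after the run
    else:
        path = tempfile.mkdtemp(prefix=prefix)
        try:
            yield path
        finally:
            shutil.rmtree(path, ignore_errors=True)  # automatic cleanup
```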
Minor fixes and API consistency:
- Renamed stop_txt_is_lat_log_required to stop_txt_is_lat_lon_required in gtfs.py for clarity; added a helper is_lat_lon_required.

These changes collectively modernize the pipeline, improve performance and maintainability, and lay the groundwork for further modular processors and robust CSV handling.

Summary:
Please make sure these boxes are checked before submitting your pull request - thanks!
Run ./scripts/api-tests.sh to make sure you didn't break anything