
Optimize pmtiles generation even more #1383

@jcpitre

Description

This issue describes different ways we could optimize pmtiles generation:

Optimization for process memory:

  • Read stop_times.txt via HTTP and use chunking
    • Right now we copy the whole file to the in-memory file system and then read it into the process, even though we only need a subset of the information in this file. We could skip the copy by reading the file in chunks via HTTP, which would save the in-memory disk space it currently uses (see the first sketch after this list).
  • Download files on demand and delete them right after they are used
    • Currently we download all files in one step and then process them. The in-memory file system uses process memory, pushing the memory requirements higher.
    • When possible, we should download a file, process it, then delete it before downloading the next one (see the second sketch after this list).
    • Example: the mdb-2014 feed has over 8GB of file data and cannot be processed right now.
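
A minimal sketch of the chunked HTTP idea for stop_times.txt, assuming the file is reachable at a URL and that the columns we keep are representative; the URL and column names below are placeholders, not the real configuration.

```python
# Stream stop_times.txt over HTTP instead of copying the whole file to the
# in-memory file system first. URL and column subset are assumptions.
import csv

import requests

STOP_TIMES_URL = "https://example.com/extracted/stop_times.txt"  # hypothetical location
NEEDED_COLUMNS = ("trip_id", "stop_id", "stop_sequence")  # assumed subset


def stream_stop_times(url: str = STOP_TIMES_URL):
    """Yield one dict per row, keeping only the columns we need."""
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        # iter_lines() pulls data from the socket in chunks, so the whole
        # file never has to sit in memory or on the in-memory file system.
        lines = (line.decode("utf-8") for line in response.iter_lines() if line)
        reader = csv.DictReader(lines)
        for row in reader:
            yield {column: row[column] for column in NEEDED_COLUMNS}


if __name__ == "__main__":
    for i, row in enumerate(stream_stop_times()):
        if i >= 3:
            break
        print(row)
```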
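
And a minimal sketch of the download / process / delete pattern, assuming files are fetched over HTTP one at a time into a temporary directory; the base URL, file list, and process_file() hook are hypothetical placeholders for illustration.

```python
# Download one file, process it, then delete it before fetching the next,
# so only a single file occupies space at any time. Names are assumptions.
import os
import tempfile

import requests

FEED_BASE_URL = "https://example.com/extracted"  # hypothetical base URL
FILES_TO_PROCESS = ["stops.txt", "trips.txt", "shapes.txt"]  # assumed order


def process_file(path: str) -> None:
    """Placeholder for the real per-file processing step."""
    print(f"processing {path} ({os.path.getsize(path)} bytes)")


def process_feed_files() -> None:
    with tempfile.TemporaryDirectory() as workdir:
        for name in FILES_TO_PROCESS:
            local_path = os.path.join(workdir, name)
            # Download a single file in chunks...
            with requests.get(f"{FEED_BASE_URL}/{name}", stream=True, timeout=60) as r:
                r.raise_for_status()
                with open(local_path, "wb") as out:
                    for chunk in r.iter_content(chunk_size=1 << 20):
                        out.write(chunk)
            # ...process it, then delete it before downloading the next one.
            process_file(local_path)
            os.remove(local_path)


if __name__ == "__main__":
    process_feed_files()
```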

Optimization for time:

  • We are currently constrained by the 30-minute timeout that Cloud Tasks impose when using an HTTP target; big feeds can exceed this limit.
  • TBD

Feel free to add to these lists. We can create separate issues as we work on individual items.

Metadata

Labels

enhancement (New feature or request)
