Description
It would be useful to treat each layer in a Docker image as a package of its own.
Why? Layers can be fetched individually. Even if a single layer is of little value on its own, it can technically be used alone, and once stored as a package (say, in the PurlDB) it becomes something that can be reused (e.g., reusing its scan, analysis, etc.). Of course, if we start treating each layer as a "package", the approach to combining the results of multiple overlaid layers would change, as we would have two possible ways of scanning a layer (and therefore two different scan contents):
- Scan a layer solo, in which case we may get many details, such as the details of all the system packages if any system package was installed in that layer. In effect, the package databases contain everything: not only the packages installed in that layer, but also all the packages installed in previous layers.
- Scan a layer in context, i.e., after scanning the previous layers, subtracting from this layer's results the packages installed in previous layers. (This is the current behaviour.)
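
To make the difference between the two modes concrete, here is a minimal sketch of the "in context" subtraction. It assumes each scan result can be reduced to a set of package URL strings; the function name `packages_in_context` and the purls (including their version numbers) are made up for illustration and are not existing ScanCode.io code.

```python
def packages_in_context(solo_packages, parent_packages):
    """Return the packages that first appear in this layer.

    solo_packages: purls found when scanning the layer on its own
    (hypothetical input; the package database lists everything).
    parent_packages: purls already reported for the previous layers.
    """
    return sorted(set(solo_packages) - set(parent_packages))


# Illustrative data only: the versions below are invented.
base_layer = {"pkg:deb/debian/libc6@1.0", "pkg:deb/debian/bash@1.0"}

# A solo scan of the second layer sees the whole package database,
# so it reports the base packages plus the newly installed one.
second_layer_solo = base_layer | {"pkg:deb/debian/curl@1.0"}

print(packages_in_context(second_layer_solo, base_layer))
# → ['pkg:deb/debian/curl@1.0']
```

The solo result is the larger set, which is why a solo scan stored as a reusable package would not be directly interchangeable with the current in-context result.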
I am not sure a layer can always be reused independently of its parent layers, as this could lead to aberrations, so there is some research to do before committing to one approach or the other.
These are some of the specific issues to work out:
- FetchCode: Fetch image/layer metadata from container registries
- PurlDB: Identify and index base layers for common images in container registries for lookup and matching
- ScanCode.io: Match container layers to PurlDB
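
As a rough illustration of the index-and-match idea in the last two items, the sketch below keys stored scan results by the SHA-256 digest of a layer's bytes. Everything in it is an assumption for discussion: `LayerIndex`, `layer_digest`, and the in-memory dict stand in for whatever PurlDB would actually store, and none of it is an existing PurlDB or ScanCode.io API.

```python
import hashlib


def layer_digest(layer_bytes):
    """Return a 'sha256:<hex>' digest string for the raw layer bytes."""
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()


class LayerIndex:
    """Hypothetical in-memory stand-in for a layer index in PurlDB."""

    def __init__(self):
        self._by_digest = {}

    def add(self, layer_bytes, scan_summary):
        # Index a known base layer under its digest and return the digest.
        digest = layer_digest(layer_bytes)
        self._by_digest[digest] = scan_summary
        return digest

    def match(self, layer_bytes):
        # Return the stored scan summary for an identical layer, or None.
        return self._by_digest.get(layer_digest(layer_bytes))


index = LayerIndex()
index.add(b"fake base layer tarball", {"packages": ["pkg:deb/debian/bash@1.0"]})

print(index.match(b"fake base layer tarball"))  # stored summary is reused
print(index.match(b"some other layer"))         # no match: None
```

Exact digest matching only finds byte-identical layers, so it would fit the "base layers for common images" case; whether the stored result should be the solo or the in-context scan is the open question above.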