
Avro: Allow reading ManifestList V1 using a V2 reader #1587

@Fokko


Is your feature request related to a problem or challenge?

When reading manifests and manifest lists from PyIceberg, we want to keep the interface as simple as possible. Therefore, we want to enable reading V1 metadata (manifest lists and manifests) with a V2 reader. This is what PyIceberg does today, and it makes upgrading a table to a newer format version much easier.

When trying to read all the manifests through Rust, I'm seeing the following error:

    @cached(cache=LRUCache(maxsize=128), key=lambda io, manifest_list: hashkey(manifest_list))
    def _manifests(io: FileIO, manifest_list: str) -> Tuple[ManifestFile, ...]:
        """Read and cache manifests from the given manifest list, returning a tuple to prevent modification."""
        bs = io.new_input(manifest_list).open().read()
        from pyiceberg_core import manifest
    
>       entries = list(manifest.read_manifest_list(bs).entries())
E       pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: DataInvalid => Failure in conversion with avro
E       
E       Source: Failed to deserialize Avro value into value: missing field `content`

This refers to field 517, `content`, of the manifest-list schema in the Iceberg spec.

The spec comment says to use 0 for all V1 manifests. Therefore, we can set the Avro default value of this field to 0, so that it is populated when reading V1 manifest lists.
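To illustrate the idea, here is a minimal sketch of Avro's reader-side default rule: when the reader schema has a field with a default and the written record lacks that field, the reader fills in the default. This is not the iceberg-rust or PyIceberg implementation. Plain dicts stand in for decoded Avro records, the schema fragment is abbreviated, and `resolve_record` and the example path are hypothetical names chosen for illustration.

```python
# Sketch only: plain dicts stand in for decoded Avro records.
# The field list is an abbreviated, illustrative fragment of a V2 manifest-list
# reader schema, not the full schema.
V2_READER_FIELDS = [
    {"name": "manifest_path", "type": "string", "field-id": 500},
    {"name": "manifest_length", "type": "long", "field-id": 501},
    # Proposed change: default 0 (data), so V1 files that lack this field still resolve.
    {"name": "content", "type": "int", "field-id": 517, "default": 0},
]


def resolve_record(writer_record: dict, reader_fields: list) -> dict:
    """Hypothetical helper: apply reader-schema defaults to a written record."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Field was written: take the written value.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Field absent in the written record: fall back to the reader default.
            resolved[name] = field["default"]
        else:
            # No value and no default: the failure mode reported in this issue.
            raise ValueError(f"missing field `{name}`")
    return resolved


# A V1-style entry has no `content` field (the path is a made-up example).
v1_entry = {"manifest_path": "s3://bucket/metadata/m0.avro", "manifest_length": 1024}
v2_entry = resolve_record(v1_entry, V2_READER_FIELDS)
```

Without the `default` on `content`, the same call raises an error about the missing field, which mirrors the failure in the traceback above.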

Describe the solution you'd like

Able to read a V1 manifest-list using a V2 reader.

Willingness to contribute

None

Labels

enhancement (New feature or request)