Avro write support - will you accept a patch? #615

Closed
@martin-traverse

Describe the enhancement requested

Hi,

We need full Avro read / write support in our project and are working on an implementation. I had a look at what already exists in arrow-java, and I think it would be fairly straightforward to extend it to get full read / write support in the Arrow Java project. Here is what I am proposing:

  • A set of producers to handle the Avro data structures, mirroring the existing consumers
  • Handling of the high-level file structure (header, embedded schema and block structure)
  • Support for compressed blocks (using the existing codecs in the Avro project)
  • High-level APIs for read / write, including incremental read (block by block, with each block corresponding to the VectorSchemaRoot, or VSR)

The last point is important for us because we handle streaming data: if we can check that a whole block is available before reading it, we should be able to avoid blocking on IO calls.
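
A minimal sketch of that availability check, assuming the data block layout from the Avro object container file specification: an object count and a payload byte size (both zig-zag variable-length longs), then the payload, then a 16-byte sync marker. The class and method names (`AvroBlockProbe`, `readLong`, `completeBlockLength`) are hypothetical and are not part of arrow-java or the Avro library; this only illustrates the idea and is not a proposed API.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration only; not an arrow-java or Avro class. */
final class AvroBlockProbe {

    /** Every block in an Avro object container file ends with a 16-byte sync marker. */
    static final int SYNC_SIZE = 16;

    /**
     * Reads one Avro long (zig-zag, variable-length) starting at buf.position().
     * Returns null if the buffer ends before the value is complete.
     */
    static Long readLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        while (buf.hasRemaining()) {
            int b = buf.get() & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalStateException("Malformed variable-length long");
            }
        }
        return null; // ran out of bytes mid-value
    }

    /**
     * Returns the total length in bytes of the block starting at buf.position()
     * (both longs + payload + sync marker) if it is fully buffered,
     * or -1 if more bytes are needed. The position of buf is left unchanged.
     */
    static int completeBlockLength(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // independent position, shared content
        int start = view.position();

        Long objectCount = readLong(view);
        if (objectCount == null) {
            return -1;
        }
        Long payloadSize = readLong(view);
        if (payloadSize == null) {
            return -1;
        }
        if (objectCount < 0 || payloadSize < 0) {
            throw new IllegalStateException("Negative count or size in block prefix");
        }

        long needed = payloadSize + SYNC_SIZE;
        if (view.remaining() < needed) {
            return -1;
        }
        return (int) (view.position() - start + needed);
    }
}
```

A streaming reader could call `completeBlockLength` on its receive buffer and only hand the block to the decoder once the result is non-negative, so the decode step never has to wait for more bytes.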

If I draft a PR along these lines, would there be interest in helping me refine it and get it into arrow-java? If not, we can do our own implementation, which will be simpler because we don't need all the features and data types. But I think the delta is not that large, and IMO it would be a good thing to have in the Arrow Java toolkit.

Thoughts welcome!
