Description
Describe the enhancement requested
Hi,
We need full Avro read / write support in our project and are working on an implementation. I had a look at what already exists in arrow-java, and I think it would be fairly straightforward to extend it to get full read / write support in the Arrow Java project. Here is what I am proposing:
- A set of producers to handle the Avro data structures, mirroring the existing consumers
- Handle the high level file structure (header, embedded schema and block structure)
- Support for compressed blocks (using the existing codecs in the Avro project)
- High level APIs for read / write, including incremental read (block by block, corresponding to the VectorSchemaRoot, VSR)
The last point is important for us because we handle streaming data. If we can check that a whole block is available before reading it, we should be able to avoid blocking on IO calls.
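To make the availability check concrete, here is a minimal sketch of how it could look. It relies only on the block layout from the Avro object container file format: a zigzag varint object count, a zigzag varint byte size, the serialized objects, then a 16-byte sync marker. The class and method names (`AvroBlockProbe`, `readLong`, `isWholeBlockAvailable`) are hypothetical and are not existing arrow-java or Avro APIs. It also assumes the buffered bytes start exactly at a block boundary.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of arrow-java or Avro: checks whether the
// bytes buffered so far contain one complete Avro container-file block.
final class AvroBlockProbe {

    // Each block ends with the file's 16-byte sync marker.
    static final int SYNC_MARKER_SIZE = 16;

    // Decodes one Avro long (zigzag varint), advancing the buffer position.
    // Returns null if the buffer ends before the varint is complete.
    static Long readLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        while (buf.hasRemaining()) {
            int b = buf.get() & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                // Undo zigzag encoding to recover the signed value.
                return (raw >>> 1) ^ -(raw & 1);
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalStateException("Varint longer than 10 bytes");
            }
        }
        return null;
    }

    // Returns true if buf (from its current position) holds a full block:
    // object count, byte size, payload and sync marker.
    // Works on a duplicate, so the caller's position is left untouched.
    static boolean isWholeBlockAvailable(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        Long objectCount = readLong(view);
        Long payloadSize = readLong(view);
        if (objectCount == null || payloadSize == null) {
            return false; // block header itself is still incomplete
        }
        if (payloadSize < 0) {
            throw new IllegalStateException("Negative block size: " + payloadSize);
        }
        // Written as a subtraction to avoid overflow on very large sizes.
        return view.remaining() - SYNC_MARKER_SIZE >= payloadSize;
    }
}
```

A streaming reader could call `isWholeBlockAvailable` each time more bytes arrive and only hand the buffer to the block decoder once it returns true, so the decoder never has to wait on IO mid-block.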
If I draft a PR along these lines, would there be interest in helping me refine it and get it into arrow-java? If not, we can do our own implementation, which would be simpler because we don't need all the features and data types. Still, I think the delta is not that large, and IMO it would be a good thing to have in the Arrow Java toolkit.
Thoughts welcome!