yaqi-zhao

No description provided.

@pitrou
Member

pitrou commented Dec 13, 2022

Hi @yaqi-zhao ,

  1. Can you clarify the PR title and description to explain what this is about?
  2. Can you fill in information about the data files in https://github.com/apache/parquet-testing/blob/master/data/README.md?

@yaqi-zhao
Author

Hi @pitrou, I submitted a PR to Apache Arrow (apache/arrow#14585) that adds a benchmark test which will use these files. The test is intended to analyze Parquet reader performance with different bit-width packing.

@yaqi-zhao yaqi-zhao force-pushed the master branch 3 times, most recently from 5f4a154 to 8834fc5 Compare December 14, 2022 07:18
@pitrou
Member

pitrou commented Dec 14, 2022

How long does it take to generate those files on the fly from the benchmarks?

In general parquet-testing is for interoperability testing between different Parquet implementations, not for benchmarking of individual implementations.

At worst we could use arrow-testing for that, but even then we should strive to make the files much smaller. We don't want to consume hundreds of MB just for a single set of benchmarks, IMHO.
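
As a rough illustration of the on-the-fly option, here is a minimal sketch in Python (standard library only) that generates pseudo-random values of a chosen bit width, packs them, and times the whole step. It is not code from apache/arrow#14585: the function names `generate_values`, `pack_bits`, and `time_generation` are hypothetical, the value counts are arbitrary, and the LSB-first layout is assumed to match the one used by Parquet's bit-packed runs. It produces raw packed bytes, not a Parquet file.

```python
import random
import time


def generate_values(bit_width, count, seed=0):
    """Return `count` pseudo-random non-negative ints that fit in `bit_width` bits."""
    rng = random.Random(seed)  # fixed seed so the input is reproducible across runs
    return [rng.getrandbits(bit_width) for _ in range(count)]


def pack_bits(values, bit_width):
    """Pack values into bytes, `bit_width` bits each, least significant bit first."""
    out = bytearray()
    acc = 0    # bit accumulator holding bits not yet flushed to `out`
    nbits = 0  # number of valid bits currently in `acc`
    mask = (1 << bit_width) - 1
    for v in values:
        acc |= (v & mask) << nbits
        nbits += bit_width
        while nbits >= 8:  # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:  # flush the final partial byte, zero-padded in the high bits
        out.append(acc & 0xFF)
    return bytes(out)


def time_generation(bit_width, count=100_000):
    """Return (elapsed seconds, packed size in bytes) for one bit width."""
    start = time.perf_counter()
    packed = pack_bits(generate_values(bit_width, count), bit_width)
    return time.perf_counter() - start, len(packed)


if __name__ == "__main__":
    # Hypothetical selection of bit widths; a real benchmark would pick its own.
    for width in (1, 3, 8, 17, 32):
        elapsed, size = time_generation(width)
        print(f"bit width {width:2d}: {size} bytes in {elapsed * 1000:.1f} ms")
```

Timing a generator like this alongside the reader benchmark would give a concrete answer to the question above, and the fixed seed keeps the generated input identical from run to run without committing any data files.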
