add parquet test data #30

yaqi-zhao · 2022-12-08T01:14:41Z

No description provided.

pitrou · 2022-12-13T14:52:27Z

Can you clarify the PR title and description to explain what this is about?
Can you fill in information about the data files in https://github.com/apache/parquet-testing/blob/master/data/README.md?

yaqi-zhao · 2022-12-14T06:31:09Z

Hi @pitrou, I submitted a PR to Apache Arrow (apache/arrow#14585) that adds a benchmark which will use these files. The benchmark is intended to analyze Parquet reader performance with different bit-width packing.

pitrou · 2022-12-14T09:17:43Z

How long does it take to generate those files on the fly from the benchmarks?

In general parquet-testing is for interoperability testing between different Parquet implementations, not for benchmarking of individual implementations.

At worst we could use arrow-testing for that, but even then we should strive to make the files much smaller. We don't want to consume hundreds of MB just for a single set of benchmarks, IMHO.

yaqi-zhao force-pushed the master branch 3 times, most recently from 5f4a154 to 8834fc5 on December 14, 2022 07:18

add parquet test data

9ff2c51

yaqi-zhao force-pushed the master branch from 8834fc5 to 9ff2c51 on December 14, 2022 07:20
