-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Is your feature request related to a problem or challenge?
(Raising this as followup this discussion)
It appears that there are multiple crates (included as dependencies
or dev-dependencies
) that are not being used across datafusion. I have run cargo machete
to get a list out
A few notes:
- The majority of the unused detections are correct (marked as
CORRECT
). - There is a minority of the unused detections that are false positive (marked as
FALSE POSITIVE
).
I linked a test PR that results from running cargo machete
: #17545
Below is the extraction of unused deps detected by cargo machete
:
datafusion-cli -- ./datafusion-cli/Cargo.toml:
assert_cmd -- CORRECT
predicates -- CORRECT
datafusion-datasource -- ./datafusion/datasource/Cargo.toml:
parquet -- CORRECT, referred by parquet feature
datafusion-physical-plan -- ./datafusion/physical-plan/Cargo.toml:
tempfile -- CORRECT
datafusion-datasource-json -- ./datafusion/datasource-json/Cargo.toml:
datafusion-catalog -- CORRECT
datafusion-physical-expr -- CORRECT
serde_json -- CORRECT
datafusion-datasource-avro -- ./datafusion/datasource-avro/Cargo.toml:
chrono -- CORRECT
datafusion-catalog -- CORRECT
datafusion-execution -- CORRECT
datafusion-physical-expr -- CORRECT
rstest -- CORRECT
tokio -- CORRECT
datafusion -- ./datafusion/core/Cargo.toml:
dashmap -- CORRECT
datafusion-doc -- CORRECT
datafusion-macros -- CORRECT
hex -- CORRECT, referred by parquet_encryption feature
datafusion-physical-expr -- ./datafusion/physical-expr/Cargo.toml:
log -- CORRECT
datafusion-proto-common -- ./datafusion/proto-common/Cargo.toml:
serde_json -- CORRECT, referred by "json" feature
datafusion-catalog -- ./datafusion/catalog/Cargo.toml:
datafusion-sql -- CORRECT
datafusion-physical-expr-adapter -- ./datafusion/physical-expr-adapter/Cargo.toml:
insta -- CORRECT
rstest -- CORRECT
datafusion-pruning -- ./datafusion/pruning/Cargo.toml:
arrow-schema -- CORRECT
datafusion-functions-aggregate -- ./datafusion/functions-aggregate/Cargo.toml:
datafusion-doc -- FALSE POSITIVE
datafusion-wasmtest -- ./datafusion/wasmtest/Cargo.toml:
chrono -- FALSE POSITIVE, chrono must be compiled with wasmbind feature
getrandom -- FALSE POSITIVE, "The \"wasm_js\" backend requires the `wasm_js` feature for `getrandom`
insta -- CORRECT
datafusion-physical-optimizer -- ./datafusion/physical-optimizer/Cargo.toml:
datafusion-functions-nested -- CORRECT
log -- CORRECT
datafusion-catalog-listing -- ./datafusion/catalog-listing/Cargo.toml:
datafusion-session
datafusion-datasource-parquet -- ./datafusion/datasource-parquet/Cargo.toml:
datafusion-catalog -- CORRECT
datafusion-physical-optimizer -- CORRECT
hex -- CORRECT, referred by parquet_encryption feature
rand -- CORRECT
datafusion-functions-window -- ./datafusion/functions-window/Cargo.toml:
datafusion-doc -- FALSE POSITIVE
datafusion-spark -- ./datafusion/spark/Cargo.toml:
datafusion-macros -- CORRECT
xxhash-rust -- CORRECT
datafusion-datasource-csv -- ./datafusion/datasource-csv/Cargo.toml:
datafusion-catalog -- CORRECT
datafusion-physical-expr --CORRECT
datafusion-session -- ./datafusion/session/Cargo.toml:
arrow -- CORRECT
dashmap -- CORRECT
datafusion-common-runtime -- CORRECT
datafusion-physical-expr -- CORRECT
datafusion-sql -- CORRECT
futures -- CORRECT
itertools -- CORRECT
log -- CORRECT
object_store -- CORRECT
tokio -- CORRECT
datafusion-benchmarks -- ./benchmarks/Cargo.toml:
test-utils -- CORRECT
I believe this topic has been discussed previously (e.g., apache/arrow-rs#6796), and likely I don't have the full picture.
I'm keen to gather feedback from the maintainers on:
- whether this would be something useful
- If it would cause issues, I'm not seeing caused by enabling non-default features.
Describe the solution you'd like
Include the cargo machete
command as part of the CI. We will need to add some ignores at the workspace level. This includes:
datafusion-doc
getrandom
in datafusion-wasmtestchrono
in datafusion-wasmtest
Describe alternatives you've considered
RustRover (or whatever preferred editor) does a good job in spotting unused crates. An alternative would be to keep the process manual and to check the unused crate locally and avoid blocking the CI (as is now).
Additional context
No response