Skip to content

Detecting unused dependencies in CI #17550

@samueleresca

Description

@samueleresca

Is your feature request related to a problem or challenge?

(Raising this as followup this discussion)

It appears that there are multiple crates (included as dependencies or dev-dependencies) that are not being used across datafusion. I have run cargo machete to get a list out

A few notes:

  • The majority of the unused detections are correct (marked as CORRECT).
  • There is a minority of the unused detections that are false positive (marked as FALSE POSITIVE).

I linked a test PR that results from running cargo machete: #17545

Below is the extraction of unused deps detected by cargo machete:

datafusion-cli -- ./datafusion-cli/Cargo.toml:
        assert_cmd -- CORRECT
        predicates -- CORRECT
datafusion-datasource -- ./datafusion/datasource/Cargo.toml:
        parquet  -- CORRECT, referred by parquet feature
datafusion-physical-plan -- ./datafusion/physical-plan/Cargo.toml:
        tempfile -- CORRECT
datafusion-datasource-json -- ./datafusion/datasource-json/Cargo.toml:
        datafusion-catalog -- CORRECT
        datafusion-physical-expr -- CORRECT
        serde_json -- CORRECT
datafusion-datasource-avro -- ./datafusion/datasource-avro/Cargo.toml:
        chrono -- CORRECT
        datafusion-catalog -- CORRECT
        datafusion-execution -- CORRECT
        datafusion-physical-expr -- CORRECT
        rstest -- CORRECT
        tokio -- CORRECT
datafusion -- ./datafusion/core/Cargo.toml:
        dashmap -- CORRECT
        datafusion-doc -- CORRECT
        datafusion-macros -- CORRECT
        hex -- CORRECT, referred by parquet_encryption feature
datafusion-physical-expr -- ./datafusion/physical-expr/Cargo.toml:
        log -- CORRECT
datafusion-proto-common -- ./datafusion/proto-common/Cargo.toml:
        serde_json -- CORRECT, referred by "json" feature
datafusion-catalog -- ./datafusion/catalog/Cargo.toml:
        datafusion-sql -- CORRECT
datafusion-physical-expr-adapter -- ./datafusion/physical-expr-adapter/Cargo.toml:
        insta -- CORRECT
        rstest -- CORRECT
datafusion-pruning -- ./datafusion/pruning/Cargo.toml:
        arrow-schema -- CORRECT
datafusion-functions-aggregate -- ./datafusion/functions-aggregate/Cargo.toml:
        datafusion-doc -- FALSE POSITIVE
datafusion-wasmtest -- ./datafusion/wasmtest/Cargo.toml:
        chrono -- FALSE POSITIVE, chrono must be compiled with wasmbind feature
        getrandom -- FALSE POSITIVE,  "The \"wasm_js\" backend requires the `wasm_js` feature for `getrandom`
        insta -- CORRECT
datafusion-physical-optimizer -- ./datafusion/physical-optimizer/Cargo.toml:
        datafusion-functions-nested -- CORRECT
        log -- CORRECT
datafusion-catalog-listing -- ./datafusion/catalog-listing/Cargo.toml:
        datafusion-session
datafusion-datasource-parquet -- ./datafusion/datasource-parquet/Cargo.toml:
        datafusion-catalog -- CORRECT
        datafusion-physical-optimizer -- CORRECT
        hex -- CORRECT, referred by parquet_encryption feature
        rand -- CORRECT
datafusion-functions-window -- ./datafusion/functions-window/Cargo.toml:
        datafusion-doc -- FALSE POSITIVE
datafusion-spark -- ./datafusion/spark/Cargo.toml:
        datafusion-macros -- CORRECT
        xxhash-rust -- CORRECT
datafusion-datasource-csv -- ./datafusion/datasource-csv/Cargo.toml:
        datafusion-catalog -- CORRECT
        datafusion-physical-expr --CORRECT
datafusion-session -- ./datafusion/session/Cargo.toml:
        arrow -- CORRECT
        dashmap -- CORRECT
        datafusion-common-runtime -- CORRECT
        datafusion-physical-expr -- CORRECT
        datafusion-sql -- CORRECT
        futures -- CORRECT
        itertools -- CORRECT
        log -- CORRECT
        object_store -- CORRECT
        tokio -- CORRECT
datafusion-benchmarks -- ./benchmarks/Cargo.toml:
        test-utils -- CORRECT

I believe this topic has been discussed previously (e.g., apache/arrow-rs#6796), and likely I don't have the full picture.

I'm keen to gather feedback from the maintainers on:

  • whether this would be something useful
  • If it would cause issues, I'm not seeing caused by enabling non-default features.

Describe the solution you'd like

Include the cargo machete command as part of the CI. We will need to add some ignores at the workspace level. This includes:

  • datafusion-doc
  • getrandom in datafusion-wasmtest
  • chrono in datafusion-wasmtest

Describe alternatives you've considered

RustRover (or whatever preferred editor) does a good job in spotting unused crates. An alternative would be to keep the process manual and to check the unused crate locally and avoid blocking the CI (as is now).

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions