
Conversation

kingcrimsontianyu
Contributor

No description provided.

@kingcrimsontianyu kingcrimsontianyu added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Nov 5, 2024
@kingcrimsontianyu kingcrimsontianyu self-assigned this Nov 5, 2024

copy-pr-bot bot commented Dec 12, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kingcrimsontianyu kingcrimsontianyu changed the base branch from branch-24.12 to branch-25.02 December 12, 2024 13:55
raydouglass and others added 22 commits January 23, 2025 15:03
…ang-tidy (rapidsai#594)

This small PR applies east const style using `clang-format`. This makes KvikIO consistent with cuDF in coding style. The following parameters were applied to `.clang-format` for auto-reformatting. The file `.clang-format` itself is not updated in this PR.
```
QualifierAlignment: Custom
QualifierOrder: [inline, static, type, const, volatile]
```
In addition, this PR fixes minor "missing header" issues reported by `clang-tidy`.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#594
Forward-merge branch-25.02 to branch-25.04
This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#605
Contributes to rapidsai/build-planning#146

Proposes:

* setting `[tool.scikit-build].ninja.make-fallback = false`, so `scikit-build-core` will not silently fall back to using GNU Make if `ninja` is not available

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#612
The nightly package versions are currently not being correctly assigned.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#616
`shellcheck` is a fast, static analysis tool for shell scripts. It's good at
flagging up unused variables, unintentional glob expansions, and other potential
execution and security headaches that arise from the wonders of `bash` (and
other shlangs).

This PR adds a `pre-commit` hook to run `shellcheck` on all of the `sh-lang`
files in the `ci/` directory, and the changes requested by `shellcheck` to make
the existing files pass the check.

xref: rapidsai/build-planning#135

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#621
Exposes `build_type` as an input in `test.yaml` so that `test.yaml` can be
manually run against a specific branch/commit as needed.

The default value is still `nightly`, and without maintainer intervention, that
is what will run each night.

xref rapidsai/build-planning#147

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#620
This completes the migration to NVKS runners now that all libraries have been tested and rapidsai/shared-workflows#273 has been merged.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#623
Enables telemetry during kvikio CI runs. This is done by parsing GitHub Actions run log metadata and should have no impact on build or test times.

xref rapidsai/build-infra#139

Authors:
  - Mike Sarahan (https://github.com/msarahan)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#615
This uses the `RAPIDS_PACKAGE_VERSION` values set in rapidsai#616. This ensures we have consistent nightly versions.

This PR is independent of rapidsai#622 (it is needed regardless of whether that PR is closed or merged).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#628
Forward-merge branch-25.02 into branch-25.04
Forward-merge branch-25.02 into branch-25.04
…c and async I/O to improve code readability (rapidsai#608)

This PR improves the readability of compatibility mode handling.

The current way of determining FileHandle's compatibility mode is somewhat complicated and unintuitive. The data member `_compat_mode` accompanied by some utility functions more or less combines 3 different things into one:

- The initially requested compat mode (`ON`/`OFF`/`AUTO`)
- The capability of performing synchronous cuFile I/O (bool)
- The capability of performing asynchronous cuFile I/O (bool)

The disadvantages include:
- `FileHandle::is_compat_mode_preferred()` always derives the preferred compat mode on the fly as opposed to getting an already determined value.
- `FileHandle::is_compat_mode_preferred_for_async(CompatMode)` is potentially throwing, which is asymmetric to `is_compat_mode_preferred()`. Also when the compat mode is `OFF`, it has to invoke `is_stream_api_available()` and `config_path()` on each pass instead of getting an already determined value.
- There is no way to retrieve the originally requested compat mode.

These add to the cognitive burden of rereading the source when introducing new features to FileHandle. This PR attempts to improve the logic by making it concise and crystal clear.

This PR also fixes a line number bug in error handling.

This PR is breaking in that the rarely used public functions to query the compat mode data in the `FileHandle` are removed. These data are instead queryable via the new `CompatModeManager` class.
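
The idea of resolving the three pieces of state once, instead of deriving them on every query, can be sketched roughly as follows. This is only an illustration in Python, not the `CompatModeManager` implementation: the names `ResolvedCompat`, `resolve_compat`, `cufile_available` and `stream_api_available` are hypothetical, and the policy of failing early under `OFF` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class CompatMode(Enum):
    ON = "ON"
    OFF = "OFF"
    AUTO = "AUTO"


@dataclass(frozen=True)
class ResolvedCompat:
    """Hypothetical holder: decided once at construction, then only read."""

    requested: CompatMode    # the mode the caller originally asked for
    compat_for_sync: bool    # True -> synchronous I/O takes the compat path
    compat_for_async: bool   # True -> asynchronous I/O takes the compat path


def resolve_compat(requested, cufile_available, stream_api_available):
    """Resolve the requested mode against the detected capabilities, once."""
    if requested is CompatMode.ON:
        return ResolvedCompat(requested, True, True)
    if requested is CompatMode.OFF:
        # Assumed policy: OFF is a hard requirement, so fail here rather
        # than later on each query.
        if not cufile_available:
            raise RuntimeError("cuFile I/O requested but not available")
        if not stream_api_available:
            raise RuntimeError("cuFile stream API requested but not available")
        return ResolvedCompat(requested, False, False)
    # AUTO: use cuFile where possible, fall back to compat mode otherwise.
    sync_compat = not cufile_available
    async_compat = not (cufile_available and stream_api_available)
    return ResolvedCompat(requested, sync_compat, async_compat)


state = resolve_compat(CompatMode.AUTO, cufile_available=True,
                       stream_api_available=False)
print(state.requested, state.compat_for_sync, state.compat_for_async)
```

With this shape, later queries are plain non-throwing attribute reads, and the originally requested mode stays retrievable.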

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#608
This PR implements the basic feature outlined in rapidsai#631. 
The two good-to-haves are currently blocked.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#630
This PR addresses the tutorial contribution mentioned in rapidsai#580

Authors:
  - Yiheng Wang (https://github.com/yiheng-wang-nv)
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#597
Forward-merge branch-25.02 into branch-25.04
vyasr and others added 30 commits June 13, 2025 16:49
…rt (rapidsai#754)

On arm, cufile was not supported until CUDA 12.2, whereas on x86 architectures it has been supported since CUDA 12.0. To properly reflect these dependencies, we need to build separate variants of cufile on arm for CUDA versions before and after 12.2. This PR updates the recipe to support that.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#754
The KvikIO unit tests have a utility class, `EnvVarContext`, introduced in rapidsai#700 and slightly improved in rapidsai#735. This class has been found to be initialized incorrectly, resulting in UB: the unit tests fail in C++20, and only by fluke was the failure not observed in C++17. This PR fixes the error. Specifically, the constructor of `EnvVarContext` is:
```
EnvVarContext(std::initializer_list<std::pair<std::string_view, std::string_view>> env_var_entries);
```
There are several ways of instantiation:
```
// Direct initialization
EnvVarContext env_var_ctx({{"env_1", "v1"}, {"env_2", "v2"}});

// Direct list initialization
EnvVarContext env_var_ctx{{"env_1", "v1"}, {"env_2", "v2"}};

// Copy list initialization
EnvVarContext env_var_ctx = {{"env_1", "v1"}, {"env_2", "v2"}};
```
The erroneous instantiation performed is:
```
// Extra pair of braces
// {}: brace-enclosed initializer list
// {{"env_1", "v1"}, {"env_2", "v2"}}: one element of type pair<std::string_view, std::string_view>
// {"env_1", "v1"}: first
// {"env_2", "v2"}: second
EnvVarContext env_var_ctx{{{"env_1", "v1"}, {"env_2", "v2"}}};
```
As a result, the initializer list has only one pair, whose key is `{"env_1", "v1"}` and whose value is `{"env_2", "v2"}`. For the key, for instance, the fifth constructor overload (https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view.html) is selected, with `first` pointing to "env_1" and `last` pointing to "v1". Since the two iterators do not form a valid range, UB ensues.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#751
This PR introduces a utility function to clear page cache in C++ and Python.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#741
Simplify the logic necessary to handle compiler versions by folding it into the context.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#755
…sai#758)

We need to drop the CUDA patch version from `cuda_version`.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#758
Now that we have dropped support for CUDA 11 we no longer require the nvidia channel.
With the changes in rapidsai/rapids-dask-dependency#85, RAPIDS now only uses released versions of dask, so we no longer need the dask channel either.
This PR also removes the explicit cufile dependence in the kvikio conda packages, which should no longer be necessary now that we have variants of the libkvikio package for different CUDA versions handling this dependency (see rapidsai#754).

Contributes to rapidsai/build-planning#184

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#759
Use CUDA 12.9 throughout different build and test environments.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#762
Contributes to rapidsai/shared-workflows#376

* adds descriptions for all inputs to workflows triggered by `workflow_dispatch`

## Notes for Reviewers

### Motivation

The input descriptions show up in the UI when you go to trigger these workflows. Like this:

![image](https://github.com/user-attachments/assets/fc62d1ff-39eb-47c7-9a21-57aab959e64f)

I'm hoping that will make it easier for developers to manually trigger workflows. Inspired by being asked multiple times "what format is `date` supposed to be in?".

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: rapidsai#764
AWS S3 object key names are case-sensitive. The current implementation of `open_s3_url` converts the entire URL to lowercase before passing it to the wrapped C++ library. As a result, if the object key name contains any capital letters, the following error occurs:
```
RuntimeError: KvikIO failure at: /home/coder/kvikio/cpp/src/shim/libcurl.cpp:176: curl_easy_perform() error (The requested URL returned error: 404)
```
This PR fixes this issue by forwarding the user-provided URL as-is.
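
To illustrate why lowercasing the whole URL breaks the lookup, here is a small sketch using only the Python standard library. The bucket and key are made up for the example, and the code does not call KvikIO.

```python
from urllib.parse import urlsplit

# Made-up URL for illustration; the key contains capital letters.
url = "https://my-bucket.s3.us-east-1.amazonaws.com/Data/Part-0001.parquet"

# Key as provided by the user versus key after lowercasing the entire URL.
original_key = urlsplit(url).path.lstrip("/")
lowered_key = urlsplit(url.lower()).path.lstrip("/")

print(original_key)                  # Data/Part-0001.parquet
print(lowered_key)                   # data/part-0001.parquet
print(original_key == lowered_key)   # False: S3 treats these as different keys
```

The lowercased key names a different object, which typically does not exist, hence the 404.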

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#765
KvikIO's remote I/O interface requires users to provide a buffer to read the remote data into. The following pattern is often used:
```python
import cupy as cp
import kvikio

# Create a remote handle from the URL
remote_handle = kvikio.RemoteFile.open_s3_url(url)

# Query the remote file size and preallocate the user-provided buffer
buf = cp.empty(remote_handle.nbytes(), dtype=cp.int8)

# Read into the buffer
fut = remote_handle.pread(buf)
fut.get()
```
Currently in Cython, the `extern` method `nbytes()` (remote file size) is declared with a return type of `int`, whereas its actual return type in the C++ library is `std::size_t`. The `int` here is interpreted as the fixed-width C++ `int`, not the arbitrary-precision Python `int`. Consequently, integer overflow occurs when reading from a large file, in which case `nbytes()` can return a negative value.
This PR fixes this bug.
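
The truncation can be reproduced without Cython. The sketch below uses `ctypes.c_int32` as a stand-in for the C++ `int`, assuming the common 32-bit `int`; the helper name `as_c_int` is made up for the example.

```python
import ctypes


def as_c_int(nbytes: int) -> int:
    """Keep only the low 32 bits of a size and read them as a signed int."""
    return ctypes.c_int32(nbytes).value


small = 1 * 1024**3   # 1 GiB, fits in a 32-bit signed int
large = 3 * 1024**3   # 3 GiB, exceeds INT_MAX (2**31 - 1)

print(as_c_int(small))   # 1073741824
print(as_c_int(large))   # -1073741824
```

A negative size then flows into the buffer allocation (`cp.empty(...)` in the pattern above), which is where the failure becomes visible to the user.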

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#766
As part of rapidsai#768, remove CUDA 11 from docs.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - https://github.com/jakirkham

URL: rapidsai#769
…apidsai#771)

In rapidsai/build-planning#187 we switched the docker image tagging scheme
over to include the CalVer information.  This was done to allow us to make
changes to the images during burndown without breaking release pipelines.

This PR moves all of the existing `latest` tags to the newer versioned tag
`25.08-latest` and also modifies the `update_version.sh` script to bump
that version at branch creation time.

xref: rapidsai/build-planning#187

Authors:
  - Gil Forsyth (https://github.com/gforsyth)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#771
Forward-merge branch-25.08 into branch-25.10
The `nvcomp` conda package is being split into a C++ package `libnvcomp` and a Python bindings package `nvcomp`. We want to use the C++ package only, so we are adopting `libnvcomp`.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#774
Forward-merge branch-25.08 into branch-25.10
As part of rapidsai#768, remove CUDA 11 workarounds that should no longer be necessary now that CUDA 11 support is being dropped.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - https://github.com/jakirkham

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

URL: rapidsai#770
Forward-merge branch-25.08 into branch-25.10
rapidsai#740)

This PR introduces memory-mapped I/O (`MmapHandle`) as an alternative to the standard I/O (`FileHandle`).

The benchmark results are at rapidsai#530 (comment)

Partially addresses rapidsai#530

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#740
Forward-merge branch-25.08 into branch-25.10
This PR removes the OS suffix from devcontainers, allowing the upstream devcontainer images to determine the OS version.

Contributes to rapidsai/build-planning#200.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#780
cuDF PR rapidsai/cudf#19164 currently has 4 failed unit tests when `LIBCUDF_MMAP_ENABLED=ON`:
```
28 - CSV_TEST (Failed)
29 - ORC_TEST (Failed)
32 - JSON_TEST (Failed)
40 - DATA_CHUNK_SOURCE_TEST (Failed)
```
The fix entails code changes on both the KvikIO and cuDF sides.
On the KvikIO side, the `MmapHandle::read()` and `MmapHandle::pread()` methods need to:
- Allow the read size to be 0
- Allow `offset` to be equal to `initial_map_offset` (when the read size is 0)

This PR makes this change. In addition, this PR adds more detailed error messages when an out-of-range exception occurs.
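
A relaxed range check of this kind might look like the sketch below. It is an illustration only, not the KvikIO code: the function name `check_read_range`, the `map_size` parameter, and the exact inequalities are assumptions. Only `offset` and `initial_map_offset` come from the description above.

```python
def check_read_range(offset, size, initial_map_offset, map_size):
    """Raise IndexError unless [offset, offset + size) lies in the mapped region.

    Zero-size reads are accepted, including at the boundaries of the region,
    so an empty mapping (map_size == 0) can still be "read".
    """
    map_end = initial_map_offset + map_size
    if offset < initial_map_offset or offset > map_end:
        raise IndexError(
            f"offset {offset} is outside the mapped region "
            f"[{initial_map_offset}, {map_end}]"
        )
    if offset + size > map_end:
        raise IndexError(
            f"read of {size} bytes at offset {offset} ends at {offset + size}, "
            f"past the end of the mapped region ({map_end})"
        )


# Zero-size read on an empty mapping: offset equals initial_map_offset.
check_read_range(offset=0, size=0, initial_map_offset=0, map_size=0)

# Ordinary read fully inside the mapping.
check_read_range(offset=16, size=32, initial_map_offset=0, map_size=64)
```

Putting the offending values into the message, as above, is one way to make an out-of-range failure easier to diagnose.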

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#781
conda-forge is migrating to gcc 14, so this PR is updating for alignment.

See rapidsai/build-planning#188

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#756
rapids_config will use `RAPIDS_BRANCH` contents to determine what branch to use

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#776
This PR changes KvikIO C++ standard from 17 to 20.

Depends on rapidsai#751

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#749