Skip to content

Add support for rm_file in GitHubFileSystem implementation #1839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 7, 2025

Conversation

mecit-san
Copy link
Contributor

@mecit-san mecit-san commented May 4, 2025

This PR adds support for the rm_file method in the GitHubFileSystem implementation of fsspec, enabling file deletion using the GitHub REST API.

The GitHub API for managing file contents requires:

  • A commit message for each action (add/update/delete), similar to editing files via the GitHub web interface.
  • The SHA of the target file’s latest version when deleting or updating, to avoid race conditions.

This implementation adheres to those requirements, making rm_file work reliably for GitHub-backed filesystems.

Note: This PR focuses solely on file deletion. I plan to follow up with a separate PR to implement the remaining methods and functionality (e.g., adding file, updating file, etc.) for full GitHubFileSystem support.

Fixes #1836

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking generally good.

@martindurant martindurant merged commit b0c88ec into fsspec:master May 7, 2025
10 checks passed
@mecit-san
Copy link
Contributor Author

Thanks for the review, @martindurant.

By the way, based on merge workflow, it seems that initializing the filesystem with incorrect credentials (mock data) behaves differently in Python 3.10 compared to other versions — it raises an authentication error. Since this issue doesn't appear to be related to the rm call, should we consider removing this test or handling it differently? 🤔

fs = fsspec.filesystem(
    "github", org="mwaskom", repo="seaborn-data", username="user", token="token"
)

@martindurant
Copy link
Member

You mean as opposed to NotFound? The test can be version-specific if need be.

@mecit-san
Copy link
Contributor Author

This is the line where the test fails due to a 403 error:

fs = fsspec.filesystem(
    "github", org="mwaskom", repo="seaborn-data", username="user", token="token"
)

The error occurs when attempting to retrieve the SHA inside __init__, using the provided token.

At first, I suspected the issue was related to the Python version, since the failure only appeared under Python 3.10 test workflow. However, after running the same test locally using Miniconda with Python 3.10, it passed without issues.

I now believe the failure is due to GitHub rejecting unauthenticated or invalid-token access to certain endpoints. This can happen if the API rate limit for unauthenticated requests has been exceeded.

@martindurant

@martindurant
Copy link
Member

I suppose it's fine. We can add more docs around it if people start noticing this. Particularly around the new delete (and write) functionality, we don't expect anyone to come with an invalid token!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feat: add write to github implementation
2 participants