Skip to content

Conversation

georgiastuart
Copy link
Contributor

What problem this PR solves

This PR is primarily to fix a bug associated with Open Storage Network (OSN) S3 buckets. On OSN, users can't enumerate buckets in the root of the remote, so it leads to OOD believing that a bucket doesn't exist. For example, the bucket georgia-osn exists on my remote, but it doesn't register as valid since lsf on the parent is empty:

image

But navigating to a "subdirectory" works fine since the parent returns a file listing on lsf:

image

How this PR solves the problem

This PR simply adds an alternative route to verifying a bucket exists for S3 remotes. Instead of using lsf to enumerate the parent, it uses rclone test info --check-length and approves any match with an entry that's not maxFileLength = -1.

This has an unexpected side benefit of correctly rejecting an s3 file (not directory) for file browser navigation. Previously, it threw vague JSON error:

image

Now, it correctly identifies the file as not being a directory:

image

Improvements needed

I'm no Ruby expert, so this can probably be cleaned up. I went the straightforward route and duplicated the whole if-statement since efforts to reduce redundancy got more convoluted, somehow...

@johrstrom
Copy link
Contributor

Thanks! I'll take a look shortly.

@robinkar
Copy link
Contributor

robinkar commented Sep 24, 2025

Commenting here since this is going to affect the usage of our S3 storage quite heavily.

Purely from a user experience point of view, this adds quite a long delay to browsing the files for us, as it seems to attempt to write 48 different test files:

$ time rclone test info --check-length s3allas:/mybucket
// s3allas
maxFileLength = 1024 // for 1 byte unicode characters
maxFileLength = 512 // for 2 byte unicode characters
maxFileLength = 341 // for 3 byte unicode characters
maxFileLength = 256 // for 4 byte unicode characters

real	0m1.828s
user	0m0.330s
sys	0m0.101s

vs

$ time rclone lsf --low-level-retries=1 s3allas:/
...

real	0m0.149s
user	0m0.199s
sys	0m0.074s

I suppose --check-length may be relatively safe, but when I implemented the existing solution, I specifically avoided rclone test due to NB this can create undeletable files and other hazards - use with care (https://rclone.org/commands/rclone_test_info/).

I'm not familiar with OSN, but would there be any alternative commands that could be used. Do any of the commands (lsd, lsf, lsjson, etc) give useful output on the directory (bucket) itself if querying the root directory does not give a suitable answer?

@georgiastuart
Copy link
Contributor Author

Hm, not ideal. I was thinking that --check-length avoided writing test files since I was able to use it with RO keys.

Using lsf on the path itself doesn't reject buckets that don't exist, unfortunately, and is nearly equivalent to just returning true for any S3 remote. Since the concept of a directory doesn't map neatly to S3, that might be fine though?

@georgiastuart
Copy link
Contributor Author

Actually, thinking about it, running lsf on the bucket itself might work, though there's not a way to check if it's a directory. I feel that's a safe assumption for s3 though. I think I'm misremembering with checking a nonexistent sub-directory, which never returns an error. Will refactor in a bit.

@robinkar
Copy link
Contributor

I just refreshed my memory a bit on this.

The main reason why it doesn't use lsf on the directory itself is that is not enough information to determine whether that is a directory or a file.
I.e. /robinkar/directory/directory is a file, while /robinkar/directory is a "directory", yet both return the same output:

$ rclone lsf --dir-slash --max-depth 1 s3allas:/robinkar/directory/directory
directory
$ rclone lsf --dir-slash --max-depth 1 s3allas:/robinkar/directory
directory

However, checking the other commands, I noticed that --stat has been added to the lsjson command since the original solution was developed, which has quite promising output on our system at least:

$ rclone lsjson --stat s3allas:/
{
	"Path": "",
	"Name": "",
	"Size": -1,
	"MimeType": "inode/directory",
	"ModTime": "2025-09-24T15:10:22.593125949+03:00",
	"IsDir": true,
	"IsBucket": true
}
$ rclone lsjson --stat s3allas:/robinkar
{
	"Path": "",
	"Name": "",
	"Size": -1,
	"MimeType": "inode/directory",
	"ModTime": "2025-09-24T15:10:27.738932019+03:00",
	"IsDir": true
}
$ rclone lsjson --stat s3allas:/robinkar/directory
{
	"Path": "",
	"Name": "",
	"Size": -1,
	"MimeType": "inode/directory",
	"ModTime": "2025-09-24T15:10:30.748754601+03:00",
	"IsDir": true
}
$ rclone lsjson --stat s3allas:/robinkar/directory/directory
{
	"Path": "directory",
	"Name": "directory",
	"Size": 0,
	"MimeType": "application/octet-stream",
	"ModTime": "2025-09-24T14:59:24.652156511+03:00",
	"IsDir": false,
	"Tier": "STANDARD"
}
$ rclone lsjson --stat s3allas:/robinkar/directory/doesnotexist
{
	"Path": "",
	"Name": "",
	"Size": -1,
	"MimeType": "inode/directory",
	"ModTime": "2025-09-24T15:10:35.721886791+03:00",
	"IsDir": true
}

--stat seems to have been added in Rclone v1.57.0. This also seems generic enough that it could be used for all storage types other than S3 as well.
So a question to Jeff maybe, is there a minimum version we need to support? RHEL8 seems to ship 1.57.0 exactly (in EPEL).

@georgiastuart
Copy link
Contributor Author

georgiastuart commented Sep 29, 2025

@robinkar I refactored to use lsjon --stat preferentially, but to fall back to lsf if it returns anything but success or error code 3. Downside is it'll try both ways if there's something causing a failure, but considering most sites will have newer Rclone, I think it's probably OK.

Signed-off-by: Georgia Stuart <[email protected]>
Signed-off-by: Georgia Stuart <[email protected]>
Signed-off-by: Georgia Stuart <[email protected]>
Signed-off-by: Georgia Stuart <[email protected]>
Signed-off-by: Georgia Stuart <[email protected]>
@georgiastuart georgiastuart force-pushed the bug-fix/s3-remote-bucket-check branch from 2b6f867 to 043ac68 Compare September 30, 2025 13:58
@robinkar
Copy link
Contributor

robinkar commented Oct 2, 2025

@robinkar I refactored to use lsjon --stat preferentially, but to fall back to lsf if it returns anything but success or error code 3. Downside is it'll try both ways if there's something causing a failure, but considering most sites will have newer Rclone, I think it's probably OK.

Seems to work nicely with our object storage at least (S3 and Swift).
I think this kind of fallback is OK. Alternative could be some configuration, but I think most sites do indeed have newer Rclone nowadays.

@johrstrom
Copy link
Contributor

Thank you so much for looking into this @robinkar! I can't get to this right away, but I'll try to circle back shortly @georgiastuart. Seems like from @robinkar's last comment it's in a good state so I just need a bit to refresh my memory as well and get some other things off my plate.

@georgiastuart
Copy link
Contributor Author

Sounds good! I'm happy to refactor more if there's a better way.

@johrstrom johrstrom self-requested a review October 13, 2025 16:18
johrstrom
johrstrom previously approved these changes Oct 13, 2025
Copy link
Contributor

@johrstrom johrstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be touched up, but I'm good with this as it is.

Though I'd appreciate @robinkar's second opinion.

@johrstrom johrstrom closed this Oct 13, 2025
@johrstrom johrstrom reopened this Oct 13, 2025
@github-project-automation github-project-automation bot moved this from Awaiting Review to Merged/Closed in PR Review Pipeline Oct 13, 2025
@github-project-automation github-project-automation bot moved this from Merged/Closed to Awaiting Review in PR Review Pipeline Oct 13, 2025
@robinkar
Copy link
Contributor

I think it could be touched up, but I'm good with this as it is.

Though I'd appreciate @robinkar's second opinion.

Left a small comment regarding the regex, but I'm fine with this.

Copy link
Contributor

@johrstrom johrstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and made the change suggested by Robin.

Thanks for this addition @georgiastuart! It'll be in 4.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Awaiting Review

Development

Successfully merging this pull request may close these issues.

4 participants