@nopcoder nopcoder commented Sep 9, 2025

Make a shallow copy of an object, copy metadata only.
Limited to the same repository and branch.
Experimental

Closes #9499

@nopcoder nopcoder requested a review from guy-har September 9, 2025 12:23
@nopcoder nopcoder self-assigned this Sep 9, 2025
@nopcoder nopcoder added area/API Improvements or additions to the API include-changelog PR description should be included in next release changelog labels Sep 9, 2025

github-actions bot commented Sep 9, 2025

📚 Documentation preview at https://pr-9500.docs-lakefs-preview.io/

(Updated: 9/9/2025, 2:46:00 PM - Commit: 6c92dbd)

Comment on lines +2769 to +2774
if srcRepository != destRepository {
	return nil, fmt.Errorf("%w: clone must be between the same repository", graveler.ErrInvalid)
}
if srcRef != destBranch {
	return nil, fmt.Errorf("%w: clone must be between the same branch", graveler.ErrInvalid)
}
Contributor

I think this should default to the CopyEntry behaviour

dstEntry := *srcEntry
dstEntry.Path = destPath
dstEntry.CreationDate = time.Now()
err = c.CreateEntry(ctx, destRepository, destBranch, dstEntry, opts...)
Contributor

You're not checking if you're allowed to use the same physical address! There are possible races with GC!

Contributor Author

What are the conditions under which it is allowed? This version does not take responsibility, and it can cause issues if the caller triggers this race with GC.
It assumes that the caller will use this API on uncommitted data.

@@ -2765,6 +2765,29 @@ func (c *Catalog) PrepareGCUncommitted(ctx context.Context, repositoryID string,
}, nil
}

func (c *Catalog) CloneEntry(ctx context.Context, srcRepository, srcRef, srcPath, destRepository, destBranch, destPath string, opts ...graveler.SetOptionsFunc) (*DBEntry, error) {
Contributor

Why not change CopyEntry behavior and create CloneEntry?

Contributor Author

The GC grace time is associated with the creation of the physical object. Here I'm not accessing the underlying storage, so I can't verify from the entry alone whether it is a valid 'clone' candidate.

This is why there is an additional 'mode' in which the caller is responsible for the clone; the assumption is that lakeFSFS will create clones while the data is uncommitted on the same branch.

Contributor Author

@itaiad200 do you have a suggestion when to verify that the copy is a safe clone on lakeFS?

Contributor

@guy-har guy-har left a comment

LGTM
IMO the documentation should be clear about what this does, and clear that there is some risk the caller should consider.

expectedError: graveler.ErrNotFound,
},
{
name: "successful_clone_with_metadata",
Contributor

🥇

Successfully merging this pull request may close these issues.

API: Add support to copy object logical mode
3 participants