Skip to content

Conversation

silverweed
Copy link
Contributor

@silverweed silverweed commented Sep 23, 2025

This PR adds a first version of the new RFile prototype, implementing basic Open/Append/Recreate/Get/Put methods.

For a sneak peek of the final shape of the prototype, see here

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link
Member

@hageboeck hageboeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the work!

Seeing the code sparked a lot of questions. Many of those might be either addressed in later PRs and some might be irrelevant, so feel free to "Resolve" those.

Copy link

github-actions bot commented Sep 23, 2025

Test Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit bb01c06.

♻️ This comment has been updated with latest results.

@silverweed silverweed force-pushed the rfile_01_1 branch 3 times, most recently from 978733c to 7430133 Compare September 24, 2025 11:41
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, but in general I'll leave review to others...

@silverweed silverweed force-pushed the rfile_01_1 branch 2 times, most recently from 868ac3e to 6cbc229 Compare September 25, 2025 13:14
Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice take-off!

Copy link
Member

@hageboeck hageboeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Just a few small things below.

@silverweed silverweed added the clean build Ask CI to do non-incremental build on PR label Oct 7, 2025
@silverweed silverweed closed this Oct 7, 2025
@silverweed silverweed reopened this Oct 7, 2025
@silverweed silverweed merged commit 24d3272 into root-project:master Oct 7, 2025
29 of 55 checks passed
@silverweed silverweed deleted the rfile_01_1 branch October 7, 2025 11:32
/// Given `path`, returns the TKey corresponding to the object at that path (assuming the path is fully split, i.e.
/// "a/b/c" always means "object 'c' inside directory 'b' inside directory 'a'").
/// IMPORTANT: `path` must have been validated/normalized via ValidateAndNormalizePath() (see RFile.cxx).
TKey *GetTKey(std::string_view path) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider:

Suggested change
TKey *GetTKey(std::string_view path) const;
TKey *GetKey(std::string_view path) const;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use TKey in RFile whenever I want to point out that it's specifically a TKey - which should only be used internally and never exposed in the public API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is internal, it does not really matter that much. I still find it a bit awkward unless we envisioned the possibility that there would eventually be a RKey.

Comment on lines +293 to +296
} else if (key && !GetROOT()->IsBatch()) {
R__LOG_WARNING(RFileLog()) << "Tried to get object '" << path << "' of type " << cls->GetName()
<< " but that path contains an object of type " << key->GetClassName();
}
Copy link
Member

@pcanal pcanal Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is debatable. What if the user wants to use this in a script used interactively? For example, the inner part of the script might be interrogating what 'user type' of file this before moving forward (eg. do something for file containing TTree with one name and another for files containing a RNTuple with a same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly debatable and I'm still not sure about the best compromise. I guess a flag could help, though it's a bit unergonomic. Another option would be to use R__LOG_INFO which iirc is off by default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this as the I/O meeting

Comment on lines +39 to +49
RFile is a modern and minimalistic interface to ROOT files, both local and remote, that can be used instead of TFile
when the following conditions are met:
- you want a simple interface that makes it easy to do things right and hard to do things wrong;
- you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities;
- you want more robustness and better error reporting for those operations;
- you want clearer ownership semantics expressed through the type system rather than having objects "automagically"
handled for you via implicit ownership of raw pointers.
RFile doesn't try to cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not
a 1:1 replacement for them. It is meant to simplify the most common use cases and make them easier to handle by
minimizing the amount of ROOT-specific quirks and conforming to more standard C++ practices.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the tone a bit harsh. Consider:

RFile is a modern and minimalistic interface to ROOT files, both local and remote, that can be used instead of TFile
when you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities.  It provides:
- a simple interface that makes it easy to do things right and hard to do things wrong,
- more robustness and better error reporting for those operations,
- clearer ownership semantics expressed through the type system.

RFile doesn't cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not
a 1:1 replacement for them.  It is meant to simplify the most common use cases by following newer standard C++ practices.

## Directories
Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all.
Unlike TFile, the RFile class does not function as a directory. Moreover, there is no RDirectory class provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think this is better than the current version?

Copy link
Member

@pcanal pcanal Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reads (better) to me 'english-wise'. Unlike is more (american) English than Differently from. The rest of that sentence is rephrase in a function way rather than indirectly focusing on the C++ semantic of inheritance. The In fact ... at all. was making the 2nd sentences stand (too much) as a reinforcement of the first rather than its owns message. Instead of saying 'we also do not plan to offer a standalone RDirectory facility', the In fact ... at all can be (mis?) read as 'RFile is not a directory solely because we are not implementing RDirectory [but if we did implement it, RFile would inherit from it]' (the last part is a slight exaggeration to clarify the drift)'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for the "unlike", I agree.
The point of the sentence for me was specifically to focus on the C++ semantic rather than the functionality (see here), as I think it's a relatively big change from both TFile and the older RFile prototype. However, rereading the sentence, maybe the intention was not that clear.
I have a bit of a problem with the "function as a directory" wording as it's not very clear in my opinion.
Perhaps I should just drop the paragraph altogether, I don't think it adds much value to the documentation after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can indeed make sense to drop the first 2 sentences and start at Directories are still ... maybe rephrased now as (for example)

Directories are semantically supported by the `RFile` interfaces as they are a concrete part of the ROOT binary format.
Directories are interacted with solely via the use of filesystem-like string-based paths. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if the updated phrasing works for you :)

Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all.
Directories are still an existing concept in RFile (since they are a concept in the ROOT binary format),
but they are usually interacted with indirectly, via the use of filesystem-like string-based paths. If you Put an object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but they are usually interacted with indirectly, via the use of filesystem-like string-based paths. If you Put an object

Is the "usually" intentional? It implies that there is (or will be) other to interact with the directories. Is that the intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it implies that we "reserve the right" to also provide directory querying functionality at some point (which we know we'll probably need in some form)

Copy link
Member

@pcanal pcanal Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that 'usually' is effectively expressing "reserve the right". In the contrary, it reads as if there is already other ways. A closer adverb might 'for now only interacted with indirectly.'

Comment on lines +100 to +101
kPutAllowOverwrite = 0x1,
kPutOverwriteKeepCycle = 0x2,
Copy link
Member

@pcanal pcanal Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we document these flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are internal and pretty self-explanatory, but I can add doc comments to them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it took me to look at the code to understand the 2nd one, I first read it as:
"kPut Overwrite[/change] [the default behavior] Keep Cycle" [aka 'problably' not keep the cycle]
while I guess it is meant to be read as
"[The] kPut Overwrite [operation will] Keep [the existing last] Cycle"

The misunderstanding is based on my misunderstanding in that context of "Overwrite" as a noun or as a verb. The fact that in kPutAllowOverwrite the 2nd word is treated as a verb lead my brain to infer that the 2nd word of the other enum was also to be treated as verb :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I added documentation to them in the followup PR.


EXPECT_FALSE(file->Get<TH1D>("inexistent"));
EXPECT_FALSE(file->Get<TH1F>("hist"));
EXPECT_TRUE(file->Get<TH1>("hist"));
Copy link
Member

@pcanal pcanal Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delete the result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, the return value is a unique_ptr :)

{
FileRaii fileGuard("does_not_exist.root");

// make sure that the file really does not exist, in case a previous test didn't clean it up.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any test that intentionally create the file named 'does_not_exist.root' ? The comment seems to imply there is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// This succeeds because Update creates the file if it doesn't exist.
EXPECT_NO_THROW(RFile::Update("does_not_exist.root"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This answer: https://github.com/root-project/root/pull/19958/files#r2411952125.

I recommend using a different name:

Suggested change
EXPECT_NO_THROW(RFile::Update("does_not_exist.root"));
FileRaii updateFileGuard("created_by_update.root");
gSystem->Unlink(updateFileGuard.GetPath().c_str());
EXPECT_NO_THROW(RFile::Update("created_by_update.root"));

Comment on lines +151 to +152
EXPECT_EQ(hist->GetDirectory(), nullptr);
ASSERT_NE(hist, nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EXPECT_EQ(hist->GetDirectory(), nullptr);
ASSERT_NE(hist, nullptr);
ASSERT_NE(hist, nullptr);
EXPECT_EQ(hist->GetDirectory(), nullptr);

Otherwise in case of failure of the ASSERT_NE, the actual result is a segmentation fault in the EXPECT_EQ

{
FileRaii fileGuard("test_rfile_wrong.root.1");
ROOT::TestSupport::CheckDiagsRAII diagsRaii;
diagsRaii.requiredDiag(kWarning, "ROOT.File", "preferred file extension is \".root\"", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a warning rather than an Info? (Reminder: tools can tell whether a file is a ROOT file by reading the first few bytes of the file (MIME format)).


static void CheckExtension(std::string_view path)
{
if (ROOT::EndsWith(path, ".xml")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is .xml so special?

Copy link
Contributor Author

@silverweed silverweed Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because TFile::Open has a plugin that tries to open the file as XML if that's the extension and we don't want to support XML files in RFile (there might be more unwanted extensions that I missed, but this is the one I'm aware of).

Edit: I guess there's at least also .zip, we should ban that as well for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also .zip:

etc/plugins/TArchiveFile/P010_TZIPFile.C:   gPluginMgr->AddHandler("TArchiveFile", ".+[.]zip$", "TZIPFile",
etc/plugins/TFile/P080_TXMLFile.C:   gPluginMgr->AddHandler("TFile", ".+[.]xml$", "TXMLFile",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clean build Ask CI to do non-incremental build on PR in:I/O
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants